English Pages 373 [364] Year 2023
Natural Computing Series
Dimo Brockhoff Michael Emmerich Boris Naujoks Robin Purshouse Editors
Many-Criteria Optimization and Decision Analysis State-of-the-Art, Present Challenges, and Future Perspectives
Natural Computing Series Founding Editor Grzegorz Rozenberg
Series Editors
Thomas Bäck, Natural Computing Group–LIACS, Leiden University, Leiden, The Netherlands
Lila Kari, School of Computer Science, University of Waterloo, Waterloo, ON, Canada
Susan Stepney, Department of Computer Science, University of York, York, UK
Scope
The Natural Computing book series covers theory, experiment, and implementations at the intersection of computation and natural systems. This includes:
• Computation inspired by Nature: Paradigms, algorithms, and theories inspired by natural phenomena. Examples include cellular automata, simulated annealing, neural computation, evolutionary computation, swarm intelligence, and membrane computing.
• Computing using Nature-inspired novel substrates: Examples include biomolecular (DNA) computing, quantum computing, chemical computing, synthetic biology, soft robotics, and artificial life.
• Computational analysis of Nature: Understanding nature through a computational lens. Examples include systems biology, computational neuroscience, quantum information processing.
Editors
Dimo Brockhoff, Inria Saclay-Île-de-France and CMAP, École Polytechnique, Palaiseau, France
Boris Naujoks, Informatik und Ingenieurwissenschaften, TH Köln, Gummersbach, Germany
Michael Emmerich, Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
Robin Purshouse, Automatic Control and Systems Engineering, University of Sheffield, Sheffield, UK
ISSN 1619-7127 Natural Computing Series
ISBN 978-3-031-25262-4
ISBN 978-3-031-25263-1 (eBook)
https://doi.org/10.1007/978-3-031-25263-1
© Springer Nature Switzerland AG 2023
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Optimization problems must be tackled in many areas of our modern lives. More often than not, we have to optimize several conflicting criteria simultaneously: performance and cost, risk and return, or quality, cost and environmental issues. In recent years, the availability of fast computing resources, more complex models and paradigm shifts such as the move towards sustainable solutions have led to ever more criteria being included in real-world optimization formulations. The field of many-criteria optimization aims at developing optimization algorithms for such problems and at shedding light on the mathematical foundations of many-criteria problems.
Why This Book?
Since the term many-criteria or many-objective optimization first appeared in 2002, the field of many-criteria optimization has become one of the most active subfields of multi-criteria optimization, with many independent research groups working on various aspects. In our opinion, previous research on many-criteria optimization was uncoordinated and often rather incremental, and the aspects that strongly distinguish optimization with a very small number of objectives (typically two or three) from optimization with larger numbers of objective functions were often neglected. For example, more and more algorithms for many-criteria optimization have been proposed in recent years without addressing fundamental issues such as proper performance assessment/benchmarking and theoretical aspects. Hence, we were convinced that it was important to organize a workshop around the topic of many-criteria optimization to change this. Finally, in September 2019, the ‘Many-Criteria Optimisation and Decision Analysis (MACODA)’ workshop was held at the Lorentz Center in Leiden, The Netherlands, co-organized by the four of us. Fortunately, we were able to gather more than 50 leading experts and young researchers on MACODA and related fields of science for a week-long workshop to discuss the current status of MACODA, to initiate collaborations and, most importantly, to propose a joint, coordinated research agenda for the years to come.
The original goal of the workshop was to compile a comprehensive list of important topics to work on and to define the possible path of the research field as a whole in a joint journal publication by the workshop participants. As suggested by the Lorentz Center staff, we followed the idea of an ‘open space’ environment that allowed the participants to discuss freely the many different aspects of MACODA without a pre-specified schedule. Soon after the open space sessions began, it became apparent that a single article was not sufficient to summarize the workshop output. Instead, the participants resolved to write this book, to be edited by the MACODA workshop organizers. It should not only serve as a summary of the state of the art in many-criteria optimization and of the workshop discussions in particular, but also provide a starting point for future contributions to the advancement of the field. To address the latter aspect, the contributing chapters explicitly discuss important open research questions.
How Did We Organize the Writing?
Based on the most important open questions identified at the workshop, the editors approached a few lead authors to organize the writing of five key chapters of this book. Those lead authors joined forces with other authors interested in these topics to write the chapters. Right after the main topics of the book had been chosen, we asked every workshop participant to contribute additional book chapters of interest. Other experts in the research field who had not been able to participate in the workshop were also approached, to give the widest possible view of the field to date. The reviewing and proofreading of the individual chapters involved authors of related chapters, to increase consistency, to reduce repetition throughout the book and to make sure that the most important aspects of many-criteria optimization were covered. Further reviewers, who did not participate in the original workshop, gave additional feedback. We are very grateful to the following reviewers for the time they spent helping to ensure the scientific rigour of the MACODA book: Richard Allmendinger, Vitor Basto-Fernandes, Jürgen Branke, Tinkle Chugh, Carlos Coello Coello, António Gaspar Cunha, João Duro, Jonathan Fieldsend, Bogdan Filipič, Peter Fleming, Carlos M. Fonseca, Hisao Ishibuchi, Yaochu Jin, Pascal Kerschke, Kathrin Klamroth, Joshua Knowles, Rodica Lung, Frank Neumann, Alma Rahat, Günter Rudolph, Pradyumn Shukla, Ricardo Takahashi, Tea Tušar, Hao Wang, Margaret Wieczek and Iryna Yevseyeva.
Thanks
Because the book would not have been written without the support of the Lorentz Center, we warmly thank everybody who was involved in the MACODA workshop,
especially Michelle Grandia-Jonkers, Aimée Reinards and Henriette Jensenius. Pointing us to the open space idea and assisting us in using it was a big plus for the workshop and thus for the success of this book. Many thanks also go to Mio Sugino and Ronan Nugent at Springer for their support and assistance with technical issues. But most of all, we thank all workshop participants and all authors for their input, ideas and hard work in making this book possible.
A Peek into the Book
We eventually have 13 chapters to offer, including an ontology, plus an appendix with a glossary. We start with a general introduction and two chapters about real-world applications, followed by chapters on fundamental aspects of many-criteria optimization, namely order relations, quality measures, benchmarking, visualization and theory, before more specialized chapters on correlated objectives, heterogeneous objectives, Bayesian optimization and game theory. We wish you a pleasant read and hope that you will be able to apply the information in this book to your own applications, or even contribute to the advancement of the field of many-criteria optimization, as intended by the book.
Boris Naujoks, Gummersbach, Germany
Dimo Brockhoff, Palaiseau, France
Michael Emmerich, Leiden, The Netherlands
Robin Purshouse, Sheffield, UK
Contents
Part I Key Research Topics
1 Introduction to Many-Criteria Optimization and Decision Analysis
  Dimo Brockhoff, Michael Emmerich, Boris Naujoks, and Robin Purshouse
2 Key Issues in Real-World Applications of Many-Objective Optimisation and Decision Analysis
  Kalyanmoy Deb, Peter Fleming, Yaochu Jin, Kaisa Miettinen, and Patrick M. Reed
3 Identifying Properties of Real-World Optimisation Problems Through a Questionnaire
  Koen van der Blom, Timo M. Deist, Vanessa Volz, Mariapia Marchi, Yusuke Nojima, Boris Naujoks, Akira Oyama, and Tea Tušar
4 Many-Criteria Dominance Relations
  Andre H. Deutz, Michael Emmerich, and Yali Wang
5 Many-Objective Quality Measures
  Bekir Afsar, Jonathan E. Fieldsend, Andreia P. Guerreiro, Kaisa Miettinen, Sebastian Rojas Gonzalez, and Hiroyuki Sato
6 Benchmarking
  Vanessa Volz, Dani Irawan, Koen van der Blom, and Boris Naujoks
7 Visualisation for Decision Support in Many-Objective Optimisation: State-of-the-art, Guidance and Future Directions
  Jussi Hakanen, David Gold, Kaisa Miettinen, and Patrick M. Reed
8 Theoretical Aspects of Subset Selection in Multi-Objective Optimisation
  Andreia P. Guerreiro, Kathrin Klamroth, and Carlos M. Fonseca
9 Identifying Correlations in Understanding and Solving Many-Objective Optimisation Problems
  T. Chugh, A. Gaspar-Cunha, A. H. Deutz, J. A. Duro, D. C. Oara, and A. Rahat
Part II Emerging Topics
10 Bayesian Optimization
  Hao Wang and Kaifeng Yang
11 A Game Theoretic Perspective on Bayesian Many-Objective Optimization
  Mickaël Binois, Abderrahmane Habbal, and Victor Picheny
12 Heterogeneous Objectives: State-of-the-Art and Future Research
  Richard Allmendinger and Joshua Knowles
13 Many-Criteria Optimisation and Decision Analysis Ontology and Knowledge Management
  Vitor Basto-Fernandes, Diana Salvador, Iryna Yevseyeva, and Michael Emmerich
Glossary
Contributors
Bekir Afsar, Faculty of Information Technology, University of Jyvaskyla, Jyvaskyla, Finland
Richard Allmendinger, Alliance Manchester Business School, The University of Manchester, Manchester, UK
Vitor Basto-Fernandes, Instituto Universitário de Lisboa (ISCTE-IUL), University Institute of Lisbon, ISTAR-IUL, Av. das Forças Armadas, Lisboa, Portugal; School of Computer Science and Informatics, Faculty of Technology, De Montfort University, Leicester, United Kingdom; Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden, The Netherlands
Mickaël Binois, LJAD, CNRS, Inria, Université Côte d’Azur, Nice, France
Koen van der Blom, Leiden University, Leiden, The Netherlands; Sorbonne Université, CNRS, LIP6, Paris, France; Leiden Institute of Advanced Computer Science, Leiden, Netherlands
Dimo Brockhoff, Inria, Ecole Polytechnique, IP, Paris, France
Tinkle Chugh, University of Exeter, Exeter, UK
Kalyanmoy Deb, Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA
Timo M. Deist, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Andre H. Deutz, Leiden University, Leiden, The Netherlands
João A. Duro, University of Sheffield, Sheffield, UK
Michael Emmerich, Leiden University, Leiden, The Netherlands
Jonathan E. Fieldsend, Department of Computer Science, University of Exeter, Exeter, UK
Peter Fleming, Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield, UK
Carlos M. Fonseca, Department of Informatics Engineering, Polo II, University of Coimbra, CISUC, Coimbra, Portugal
Antonio Gaspar-Cunha, University of Minho, Braga, Portugal
David Gold, Department of Civil and Environmental Engineering, Cornell University, Ithaca, NY, USA
Andreia P. Guerreiro, INESC-ID, Rua Alves Redol, Lisbon, Portugal
Abderrahmane Habbal, LJAD, UMR 7351, CNRS, Inria, Université Côte d’Azur, Nice, France
Jussi Hakanen, University of Jyvaskyla, Faculty of Information Technology, Jyvaskyla, Finland
Dani Irawan, TH Köln–University of Applied Sciences, Cologne, Germany
Yaochu Jin, Department of Computer Science, University of Surrey, Guildford, Surrey, UK
Kathrin Klamroth, University of Wuppertal, Wuppertal, Germany
Joshua Knowles, Alliance Manchester Business School, The University of Manchester, Manchester, UK; Schlumberger Cambridge Research, Cambridge, UK
Mariapia Marchi, ESTECO SpA, Trieste, Italy
Kaisa Miettinen, Faculty of Information Technology, University of Jyvaskyla, Jyvaskyla, Finland
Boris Naujoks, TH Köln–University of Applied Sciences, Cologne, Gummersbach, Germany
Yusuke Nojima, Osaka Metropolitan University, Sakai, Osaka, Japan
Daniel C. Oara, University of Sheffield, Sheffield, UK
Akira Oyama, Japan Aerospace Exploration Agency, Sagamihara, Japan
Victor Picheny, Secondmind, Cambridge, UK
Robin Purshouse, University of Sheffield, Sheffield, UK
Alma Rahat, Swansea University, Swansea, UK
Patrick M. Reed, Department of Civil and Environmental Engineering, Cornell University, Ithaca, NY, USA
Sebastian Rojas Gonzalez, Department of Information Technology, Ghent University, Gent, Belgium; Data Science Institute, University of Hasselt, Diepenbeek, Belgium
Diana Salvador, Instituto Universitário de Lisboa (ISCTE-IUL), University Institute of Lisbon, ISTAR-IUL, Av. das Forças Armadas, Lisboa, Portugal; School of Computer Science and Informatics, Faculty of Technology, De Montfort University, Leicester, United Kingdom; Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden, The Netherlands
Hiroyuki Sato, The University of Electro-Communications, Chofu, Tokyo, Japan
Tea Tušar, Jožef Stefan Institute, Ljubljana, Slovenia
Vanessa Volz, modl.ai, Copenhagen, Denmark
Hao Wang, Leiden Institute of Advanced Computer Science, Leiden, The Netherlands
Yali Wang, Leiden University, Leiden, The Netherlands
Kaifeng Yang, University of Applied Sciences Upper Austria, Hagenberg, Austria
Iryna Yevseyeva, Instituto Universitário de Lisboa (ISCTE-IUL), University Institute of Lisbon, ISTAR-IUL, Av. das Forças Armadas, Lisboa, Portugal; School of Computer Science and Informatics, Faculty of Technology, De Montfort University, Leicester, United Kingdom; Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden, The Netherlands
Part I
Key Research Topics
Chapter 1
Introduction to Many-Criteria Optimization and Decision Analysis
Dimo Brockhoff, Michael Emmerich, Boris Naujoks, and Robin Purshouse
Abstract Many-objective optimization problems (MaOPs) are problems that feature four or more objectives, criteria or attributes that must be considered simultaneously. MaOPs often arise in real-world situations, and the development of algorithms for solving MaOPs has become one of the hot topics in the field of evolutionary multi-criteria optimization (EMO). However, much of the energy devoted to MaOP research is arguably detached from the challenges of, and decision analysis requirements for, MaOPs. Motivated by this gap, the authors of this chapter organized a Lorentz Center workshop in 2019 entitled Many-Criteria Optimization and Decision Analysis (MACODA), bringing researchers and practitioners together to reflect on the challenges in many-objective optimization and analysis, and to develop a vision for the next decade of MACODA research. The workshop gave rise to the MACODA book, for which this chapter forms the introduction. The chapter describes the organizers’ perspectives on the challenges of MaOPs. It introduces the history of MaOPs principally from the perspective of EMO, where the terminology originated, while drawing important connections to pre-existing work in the field of multi-criteria decision-making (MCDM), which was the source of or inspiration for many EMO ideas. The chapter then offers a brief review of the present state of MACODA research, covering major algorithms, scalarization approaches, objective-space reduction, order extensions to Pareto dominance, preference elicitation, wider decision-maker interaction methods and visualization. In drawing together the vision for MACODA in 2030, the chapter provides synopses of the unique and varied contributions that comprise the MACODA book and identifies further under-explored topics worthy of consideration by researchers over the next decade and beyond.
D. Brockhoff (corresponding author), Inria, Ecole Polytechnique, IP, Paris, France
M. Emmerich, Leiden University, Leiden, The Netherlands
B. Naujoks, TH Köln, Cologne, Germany
R. Purshouse, University of Sheffield, Sheffield, UK
© Springer Nature Switzerland AG 2023
D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_1
1.1 Motivation
Myriad decision problems require the simultaneous consideration of multiple performance criteria, objectives, or solution attributes. Oftentimes, there are many such criteria that need to be weighed in the balance when deciding upon a solution. Several research communities have been established that seek to provide theory, methods and tools to support decision-makers in solving problems of this type. The two principal communities are multiple criteria decision-making (MCDM) and evolutionary multi-criteria optimization (EMO), although multi-criteria problem-solving is also a topic in the more general fields of operational research and optimization theory. The types of MCDM approaches vary according to whether the specific solution options to be considered are few in number and already known, or whether the set of potential candidate solutions is much larger and yet to be determined. The former situations belong to the field of multi-criteria decision analysis (MCDA), whilst the latter belong to the field of multi-objective optimization. Excellent introductions to each field can be found in the texts of Goodwin and Wright [45] and Miettinen [66], respectively. The EMO community also focuses almost exclusively on multi-objective optimization problems, the main difference from classical MCDM being the use of population-based randomized search heuristics, rather than analytical or deterministic methods, to identify optimal candidate solutions. However, over the past two decades, the distinction between the MCDM and EMO communities has become purposefully blurred, due to successful initiatives such as a programme of shared Dagstuhl seminars [9]. Good introductions to EMO can be found in the texts of Deb [23] and Coello Coello, Lamont and Van Veldhuizen [20], and to MCDM in those of Miettinen [66], Ehrgott [30], Steuer [77], Belton and Stewart [7], Zopounidis and Pardalos [89] and Greco et al. [46].
Historically, the EMO community has focused on intensive algorithm development driven by artificial bi-objective benchmark problems, with the aim of identifying high-quality sample-based representations of Pareto fronts and the associated Pareto sets. A plethora of methods and tools are now available to support decision-makers in identifying Pareto fronts for such problems, and these have been demonstrably useful across a broad range of applications. More recently, as will be summarized in Sect. 1.2.2 of this chapter, the EMO community began to recognize that the algorithms developed for bi-objective problems did not scale well to problems with many more objectives. This is an important issue, since many real-world problems possess this characteristic. This realization has triggered a new phase of intensive algorithm development on artificial benchmark problems featuring many objectives, but using the same conceptual framework as the earlier methods. Where these research efforts claim successful results, the studies use performance metrics designed for low numbers of criteria. Since there is no good understanding of how these metrics behave in many-criteria spaces, it is not clear whether the findings from these now copious studies are meaningful. Whilst this rather pathological characterization of EMO research is a generalization that some researchers would undoubtedly
raise objection to, it does raise concerns about the true usefulness of a large corpus of contemporary EMO research for real-world decision-making. As four researchers who had been working on a variety of many-criteria optimization topics over the past decade or more, we reached a joint view that recent EMO research on many-criteria optimization had become rather uncoordinated and would benefit from redirecting its efforts from algorithmic development towards a better understanding of the special context presented by many-objective optimization problems (MaOPs).¹ We argued that, once this new context is understood, the aims of solving MaOPs are articulated and the fundamental questions posed by MaOPs are addressed, the EMO community would be much better positioned to contribute beneficially to designing methods and tools for MaOPs, including for real-world problems. To this end, we proposed the Many-Criteria Optimization and Decision Analysis (MACODA) workshop at the Lorentz Center in Leiden, the Netherlands, in 2019. In this week-long, invitation-only workshop, experts from EMO and other disciplines came together to discuss and develop new concepts to tackle the challenging problems in many-criteria optimization, as well as the wider decision aid systems needed for practical applications of EMO algorithms. Whilst we organizers developed an initial set of research questions, participants had the opportunity to propose alternative directions when registering for the workshop; these suggestions were the basis for 34 extensive self-organized “open space” sessions in which the participants organized themselves around the central topic of the workshop and discussed many aspects of many-criteria optimization [49]. During the MACODA workshop, the participants collectively decided to compile and extend the many outcomes of the workshop within a single book.
We are happy that this plan worked, and that we have collected both the state of the art and open research directions in many-criteria optimization. This book can help the reader to get a good overview of the field, serve as a reference and provide guidelines on where and how to start contributing to the field. Each of the following eleven chapters focuses on a certain aspect of many-criteria optimization, such as benchmarking, visualization, or theoretical aspects. An ontology of the field rounds off the book. Although most of the authors who have contributed to the present book would arguably identify themselves foremost as being part of the EMO community, the topics covered are relevant to a broader community of researchers, in particular the MCDM community, and the book also features contributions from this field. Before the specific chapters, we provide the fundamentals and a broader overview in this introductory chapter. We explain more formally what many-criteria optimization is in Sect. 1.2, including a discussion of the additional challenges a MaOP poses compared to a problem with fewer criteria. Section 1.3 gives a more detailed overview of what had been achieved collectively by the time of the MACODA workshop, whilst Sect. 1.4 discusses the open questions and the organizers’ vision for where the research field should focus its efforts. Finally, Sect. 1.5 briefly introduces the constituent chapters of this book and their main contributions.

¹ The terms many-criteria and many-objective will be used interchangeably throughout this book, since both are common within the research community.
1.2 What is Many-Criteria Optimization?
Many practical optimization problems involve multiple, conflicting objective functions or criteria, such as risk, sustainability and return in portfolio optimization; performance and energy efficiency in chip design; or cost, reliability, comfort, fuel consumption and tailpipe emissions in the design of cars or planes. Other application areas include urban planning, staff scheduling, embedded systems design, forest and environmental management, medical therapy design and drug discovery [19]. Note that both methodological and applied works tend to use the terms ‘criterion’ and ‘objective’ synonymously, as we do throughout this chapter and as is also the case in the other chapters of this book. From an abstract point of view, the evaluation of a system, policy, or schedule can be viewed as the evaluation of a mathematical function that represents the performance of the system. Formally, we are interested in minimizing a vector-valued objective function $f\colon X \to \mathbb{R}^m$,
$$f(x) = \bigl(f_1(x), \ldots, f_m(x)\bigr) \quad \text{with } x \in X.$$
We refer to $X$ as the search or decision space and to the image $f(X) \subseteq \mathbb{R}^m$ as the objective space of the $m$-criteria problem. Depending on the context, we use the terms criteria, objectives and objective functions interchangeably, with a preference for the term objective function(s) when we address the mathematical function(s) defined above. When $m = 1$, the problem has a single objective, whilst $m \geq 2$ denotes a multi-objective problem. Problems with $m = 2$ objectives are sometimes known as bi-objective problems. Common to multi-criteria problems is the typical conflict among the objective functions: in general, no single solution is optimal for all objectives at once, so, without knowledge of the preferences of the decision-maker(s) (DMs), we aim at finding a set of trade-off solutions. Depending on when a decision-maker is involved in the optimization process, we talk about a priori, a posteriori, or interactive approaches. In a priori approaches, a single-criterion optimization problem is created before the optimization process begins, based on the preferences of the decision-maker. In a posteriori approaches, the decision-maker bases her decisions on the knowledge gained from an optimization process that identifies a set of candidate solutions offering trade-offs between the different criteria. In an interactive approach, search and decision-making are intertwined before a final solution is obtained, iteratively refining the search towards solutions of interest to the decision-maker. Note that the above formalization does not include constraints explicitly but that, in many practical problems, constraints have to be dealt with in addition to the
trade-offs, as is also the case for other typical challenges of optimization problems such as large search space dimension, mixed-type variables, nonlinearities, ill-conditioning or multi-modality. With the introduced conflict and the corresponding trade-offs, basic properties of optimization problems change when moving from single-criterion to multi-criteria problems. Most prominently, mutual non-dominance can occur, i.e. two solutions are no longer necessarily comparable. We say that a solution $x \in X$ dominates another solution $y \in X$ if $x$ is not worse than $y$ in any objective and strictly better in at least one objective. We denote such a dominance of $x$ over $y$ as $x \prec y$. For minimization, this means
$$x \prec y \iff \forall\, i \in \{1, \ldots, m\}: f_i(x) \leq f_i(y) \ \text{ and } \ \exists\, j \in \{1, \ldots, m\}: f_j(x) < f_j(y).$$
We call the set of solutions $X^* \subseteq X$ that are not dominated by any other solution in $X$ the efficient or Pareto(-optimal) set, and its image $f(X^*)$ in objective space the efficient frontier or Pareto(-optimal) front. For additional notation related to multi-criteria problems used later in this book, we refer to the Glossary 13.4. Interestingly, there is another important change when the number of objective functions rises beyond three. In this case, our spatial cognition and intuition, by which we are used to visualizing and comparing solutions, no longer support us well. Besides, in higher-dimensional spaces many, often counter-intuitive, phenomena occur, and the qualitative and quantitative characteristics of dominance orders and related concepts change quite drastically. Following the historical terminology (see also Sect. 1.2.2), we therefore define many-criteria optimization problems as problems with four or more objective functions.
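The dominance definition above translates directly into code. The following minimal Python sketch (the names `dominates` and `non_dominated` are our own, hypothetical helpers, not a library API) checks whether one objective vector dominates another under minimization and filters a finite sample down to the points that no other sample point dominates. The result is only the non-dominated subset of that sample; it approximates the Pareto front only to the extent that the sample covers the front.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(fx: Vector, fy: Vector) -> bool:
    """Return True if objective vector fx Pareto-dominates fy under minimization.

    fx dominates fy iff fx is no worse in every objective and strictly
    better in at least one objective.
    """
    if len(fx) != len(fy):
        raise ValueError("objective vectors must have the same length m")
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better


def non_dominated(points: Sequence[Vector]) -> List[Vector]:
    """Return the points of a finite set that no other point dominates.

    Brute-force O(m * N^2) comparison. A point never dominates itself
    (no strict improvement), so comparing against the whole set is safe.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Four objective vectors of a bi-objective (m = 2) minimization problem.
    sample = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
    print(dominates((2.0, 2.0), (3.0, 3.0)))  # True
    # (1, 4) and (2, 2) are incomparable: neither dominates the other.
    print(dominates((1.0, 4.0), (2.0, 2.0)), dominates((2.0, 2.0), (1.0, 4.0)))  # False False
    print(non_dominated(sample))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

The brute-force filter is written for clarity rather than speed; it performs on the order of $mN^2$ comparisons for $N$ points.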
Note that this definition is not established in classical research fields such as operations research or multiple criteria decision-making, where a distinction may be drawn between bi-objective ($m = 2$) and multi-objective ($m > 2$) problems, but not beyond that. Note further that some authors consider many-objective problems to be a subset of multi-objective problems, whilst others now use multi-objective to mean problems with exclusively two or three objectives, i.e. $m \in \{2, 3\}$. This terminological issue has not yet been resolved by the community, and examples of both kinds of use can be found in the chapters of this book. In the following, we review the main differences between many-criteria problems and multi-criteria problems with fewer objective functions (Sect. 1.2.1) and survey the history of the research field of many-criteria optimization (Sect. 1.2.2), which arose because of these differences.
1.2.1 Salient Challenges in Many-Criteria Optimization
Formally, many-criteria optimization problems differ from classical 2- and 3-criteria and even single-criteria optimization problems only in their increased number of objective functions. Practically, however, the higher number of objectives results in
D. Brockhoff et al.
additional properties and challenges that are less prevalent, or even absent, when the number of objective functions is lower.
The Curse(s) of Dimensionality
Coined by Richard E. Bellman [6], the term “curse of dimensionality” describes the many difficulties that arise in high-dimensional spaces, related to optimization, sampling, sparse data, etc. In our context of many-objective problems, we use this term with respect to the objective space. The number of solutions needed to cover the Pareto front or Pareto set with a given precision typically increases exponentially with the number of objectives. The Pareto dominance relation also provides less and less guidance towards the Pareto front as the number of objectives grows (because, in uniformly distributed point sets, solutions tend to become mutually non-dominated at an exponential rate, see [55]). As a consequence, the region of improvement around a solution shrinks exponentially. More precisely, it was shown that the probability of obtaining dominating solutions using isotropic mutation decreases exponentially fast [2, 60]. All these aspects of the curse of dimensionality are highly important both in algorithm design and in any analysis of many-criteria data (for example, in performance assessment).
Computational Aspects
Related to the curse of dimensionality, optimizing many-objective problems also often entails a high computation time. Besides the obvious need to calculate more objective function values, several algorithmic components of multi-objective optimization algorithms have a runtime that grows with the number of objectives, sometimes even exponentially. Prominent examples are non-dominated sorting [25], whose runtime is polynomially bounded in the number of objectives, and the calculation of the hypervolume indicator, which is even #P-hard in the number of objectives [12], as well as other hypervolume-related questions [83].
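Two of the effects described above can be reproduced in a small simulation. The sketch below is a hypothetical illustration (function names and sample sizes are our own): it estimates the share of uniformly sampled points in $[0, 1]^m$ that no other sample dominates, and the share of isotropic Gaussian steps that improve all objectives at once. The second estimate rests on a simplifying assumption stated in the code: the step acts directly and linearly on the objective vector, so an improving step must fall into the negative orthant, whose probability is $2^{-m}$.

```python
import numpy as np


def nondominated_fraction(n_points: int, m: int, rng: np.random.Generator) -> float:
    """Share of n_points uniform samples in [0, 1]^m that no other sample dominates (minimization)."""
    pts = rng.random((n_points, m))
    # le[i, j] / lt[i, j]: sample j is <= sample i everywhere / < sample i somewhere
    le = np.all(pts[None, :, :] <= pts[:, None, :], axis=2)
    lt = np.any(pts[None, :, :] < pts[:, None, :], axis=2)
    dominated = np.any(le & lt, axis=1)
    return float(np.mean(~dominated))


def improving_step_probability(m: int, n_steps: int, rng: np.random.Generator) -> float:
    """Share of isotropic Gaussian steps that improve all m objectives.

    Assumes (hypothetically) that the step acts directly and linearly on the
    objective vector, so an improving step must land in the negative orthant.
    """
    steps = rng.standard_normal((n_steps, m))
    return float(np.mean(np.all(steps < 0, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    for m in (2, 3, 5, 10):
        frac = nondominated_fraction(200, m, rng)
        p = improving_step_probability(m, 100_000, rng)
        print(f"m={m:2d}  non-dominated share={frac:.2f}  "
              f"improving steps={p:.4f}  2^-m={0.5 ** m:.4f}")
```

With a fixed sample size, the non-dominated share rises steeply with $m$, while the improving-step probability tracks $2^{-m}$.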
Correlation Structure Among the Objectives
With more objective functions, more structure can also be observed among the objectives. Two objective functions can be anything from negatively correlated, through uncorrelated, to positively correlated (up to the point that the resulting problem is, in mathematical terms, actually a single-objective one). With more objective functions, however, richer correlation patterns as well as redundant objectives can be observed. In addition, as more and more objectives are introduced into the problem formulation, it becomes increasingly likely that not all objective functions are equally important in practice. To optimize efficiently, a solver might identify and explain such structural properties of the objectives.
Cognitive Challenges
Finally, novel cognitive challenges appear with many-objective problems, especially related to visualization. Approximations of Pareto sets, and solution sets in general, are still easily visualizable with two or three objectives, but starting with four
1 Introduction to Many-Criteria Optimization and Decision Analysis
objectives, objective vectors can only be visualized indirectly (by projections, linear or nonlinear mappings, etc. that embed them into 2D or 3D real space). See, for example, the work of Tušar [82, 83] for an extensive overview of visualization techniques for multi-objective optimization. Another cognitive challenge with many objective functions is the articulation of preferences. Already difficult when only two objective functions are involved, preference articulation becomes even harder when more objectives have to be taken into account.
Related Scientific Challenges
The above-mentioned, rather generic challenges of many-objective problems relate to more concrete research areas in many-objective optimization, such as
• Algorithm Design: algorithmic strategies that work well with two or three objectives might no longer work well in higher dimensions, due to the curse of dimensionality, the computational cost of internal procedures, or the presence of correlation structures (which could be exploited). Hence, new concepts need to be introduced to optimize efficiently in high-dimensional objective spaces.
• Performance Assessment and Benchmarking: performance assessment and algorithm benchmarking become more challenging, not only because of the absence of valid and efficient performance indicators but also because the choice of algorithmic and quality indicator parameters becomes more crucial for the algorithms’ behaviour, for example, when choosing (a few) reference point(s) in the higher-dimensional space. The lack of approaches for easily visualizing large, high-dimensional sets plays an important role here as well. Finally, the objective space dimension, as a new adjustable parameter, calls for assessing the scalability of algorithms in both search and objective space. Naturally, more objective functions also mean more computation time in benchmarking experiments, and including decision-makers in the performance assessment becomes more involved.
Automated tuning of algorithms, automated benchmarking and parallelization play an important role here.
• Preference Articulation/User Interaction: once we go beyond the optimization process itself by putting a decision-maker or a group of decision-makers into the loop (for example, after the optimization or within interactive methods), additional challenges appear in terms of visualization and the formulation of preferences. Navigation in high-dimensional objective spaces becomes more difficult than with two or three objectives. We can no longer see all information at once, and the decision-making process needs additional preference information to focus on small parts of a huge Pareto front. The articulation of preferences itself also becomes more complex with more objective functions and might even involve more decision-makers than if only a few objectives are present (for example, in a multidisciplinary approach). Finally, the larger number of incomparable solutions affects the decision-finding process as well: not only is it more difficult to reach agreement among the decision-makers, but it also makes the involvement of the decision-maker(s) during the optimization process more
important, since such involvement increases the probability of actual improvements within the optimization.
• Objective Space Reduction: as a direct consequence of the presence of many objective functions and potential correlations among them, the question arises whether all defined objective functions are necessary to describe the optimization problem at hand, or whether some of the objectives are redundant and can therefore be removed in a reformulation of the problem with a smaller objective space dimension. This has obvious applications not only in visualizing solution sets after the optimization but also in reducing the actual optimization time of algorithms.
A good portion of the research topics mentioned will be discussed in more detail in the other chapters of this book. At an abstract level, we describe in Sect. 1.3 where the research field currently stands with respect to (some of) these topics, and in Sect. 1.4 where we would like it to be in the mid-term future. But first, let us review the beginnings of the many-objective optimization research field.
1.2.2 History of Many-Criteria Optimization
1.2.2.1 Origins of the Terminology
To our knowledge, the terms many-criteria and many-objective were both introduced by Farina and Amato [34] in the title and abstract of a paper in the Proceedings of the 2002 Annual Meeting of the North American Fuzzy Information Processing Society, which proposed replacing the conventional Pareto dominance and optimality definitions with fuzzy alternatives. The authors did not go on to define either term in that paper, but the concluding sentence, “The Pareto optimality definition is unsatisfactory when considering multi-criteria search problems with more than three objectives”, implied that the terms referred to problems with more than three objectives. This definition was clarified in the authors’ subsequent journal paper [35]. The terminology was first adopted by the EMO field in 2003 by Purshouse and Fleming [69, 72, 73]. These authors were motivated by the observation that methodological papers in the field were typically developed in the context of bi-objective (and, occasionally, three-objective) benchmark problems for a posteriori optimization, whilst applications of these methods typically grappled with problems with more than three criteria. The term evolutionary many-objective optimization was introduced to emphasize this distinction, exemplified as problems featuring four to 20 objectives [37], and to highlight the need for more methodological research set in the context of many-objective problems. Some time after this, ‘many-objective’ became terminology particular to the EMO field. It was not adopted by other research communities in the field of multi-objective optimization, such as MCDM. Indeed, many of the methods subsequently developed for many-objective problems in the EMO field have reused, been inspired by, or simply
rediscovered ideas already developed within MCDM for handling large numbers of objectives. The historical review that follows focuses on developments in the EMO community, but aims to draw attention to those conceptual links with MCDM.
1.2.2.2 Pre-terminology Works: 1995–2003
It is important to note that early applications work in the EMO field did, in selected cases, feature many-objective problem formulations. The rich survey of applications in [20] identifies a good number of such works across a range of domains. Here, we focus on journal examples that aim to identify trade-off surfaces (rather than adopting a single cost function based on a priori weighting). In an early example, Flynn and Sherman presented a helicopter panel design problem with four objectives [38]. Chipperfield and Fleming presented a gas turbine engine control problem with nine objectives [18]. Hinchliffe and colleagues developed a four-objective formulation for modelling chemical processes [50]. Yu considered a four-objective formulation of radiotherapy treatment planning [86]. Reynolds and Ford worked with a 10-objective calibration problem for ecology models. Kumar and Rockett framed classification problems using seven objectives [61]. Fonseca and Fleming had already identified potential issues with Pareto-based approaches by this point, demonstrating the high proportion of non-dominated solutions in their analysis of a seven-objective gas turbine engine control problem [39]. These authors proposed the use of decision-maker preferences to refine the ranking induced by Pareto dominance. The result was the seminal preferability relation, which explicitly unified a range of European-school MCDM preference approaches into a single operator [40]. Alternative operators to modify the dominance relation were also subsequently proposed by other researchers [29, 35, 78]. In his seminal book introducing MOEAs [23], Deb highlighted scalability in the objective space as one of the “salient issues” for the research field. Reinforcing Fonseca and Fleming’s applied analysis, Deb identified the issue of a large increase in the number of non-dominated solutions in an approximation set for steadily increasing numbers of objectives:
... as the number of objective functions increases, more and more solutions tend to lie in the first non-dominated front ... this may cause a difficulty to most multi-objective evolutionary algorithms.
A major step forward in this area came with the subsequent publication of the so-called DTLZ problems by Deb and colleagues [28]. Crucially, the DTLZ problems were scalable in the number of objectives (although this scalability also increased the number of decision variables and was therefore subject to potential confounding). An empirical analysis of the then state-of-the-art EMO algorithms on the DTLZ test suite was first published by Khare and colleagues in 2003 [59]. This was a broad study across algorithms and problems, and it did not identify the mechanisms responsible for variations in performance. In a narrower but more focused study, Purshouse and Fleming [72] took a component-level approach [62] and a specific
test problem (DTLZ2) to explore scalability issues for different configurations of Pareto-based optimizers. This study identified that:
• the sweet spot of algorithm configurations (controlling the balance between exploration and exploitation) that produce effective convergence to the Pareto front shrank as the number of objectives increased;
• conventional parameterizations for variation operators that had arisen from bi-objective studies were no longer located in the sweet spot for problems with greater numbers of objectives;
• poor convergence (little or no better than random search) was associated with an inability to find new solutions that dominate existing solutions (related to the notion of dominance resistance [53]);
• reverse convergence, in which the optimizer performed worse than a random search, was driven by over-promotion of diversity in the presence of dominance resistance (since diverse solutions can be found in poorly performing areas of the objective space).
Purshouse and Fleming suggested, but did not demonstrate, that these findings generalize to the wider class of EMO algorithms that rely on dominance and diversity mechanisms to drive the search, and to other many-objective problems that exhibit conflict across four or more objectives.
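The scalability of DTLZ2 mentioned above can be made concrete. The sketch below implements the standard DTLZ2 objective functions for minimization; the function name `dtlz2` and the choice of $k = 10$ distance variables (so $n = m + k - 1$ decision variables) are our own illustrative assumptions. The first $m - 1$ variables position a point on the front, and the remaining ones enter through $g(x_M) = \sum_i (x_i - 0.5)^2$; Pareto-optimal points satisfy $g = 0$ and hence $\sum_i f_i^2 = 1$.

```python
import numpy as np


def dtlz2(x: np.ndarray, m: int) -> np.ndarray:
    """DTLZ2 objective vector (minimization) for x in [0, 1]^n with n >= m.

    The first m-1 variables are angles that place the point on the front;
    the remaining k = n - m + 1 variables form g = sum((x_i - 0.5)^2).
    """
    x = np.asarray(x, dtype=float)
    if x.size < m:
        raise ValueError("DTLZ2 needs at least m decision variables")
    g = np.sum((x[m - 1:] - 0.5) ** 2)
    theta = x[: m - 1] * np.pi / 2
    f = np.full(m, 1.0 + g)
    for i in range(m):
        # product of cosines of the first m-1-i angles (empty product = 1)
        f[i] *= np.prod(np.cos(theta[: m - 1 - i]))
        if i > 0:
            f[i] *= np.sin(theta[m - 1 - i])
    return f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, k = 5, 10                     # k = 10 distance variables: an assumed, commonly used choice
    x = rng.random(m + k - 1)
    x[m - 1:] = 0.5                  # g = 0, so the point lies on the Pareto front
    f = dtlz2(x, m)
    print(f.round(3), "sum of squares:", float(np.sum(f ** 2)))
```

Changing `m` changes both the number of objectives and the length of `x`. This is the coupling between objective and decision space dimension that the text flags as a potential confounder.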
1.2.2.3 Early Post-terminology Works: 2003–2008
Subsequently, a number of key works took up the challenge of evolutionary many-objective optimization. Hughes developed important insights in a many-objective empirical analysis of his Multiple Single Objective Pareto Sampling (MSOPS) [51] algorithm [52]. Hughes examined two decomposition approaches based on the “traditional” weighted min-max (i.e. Chebyshev) scalarization function from the MCDM community: (i) individual single-objective optimization runs for each scalarization function; (ii) a population-based approach (MSOPS), in which all the scalarizations are considered together. Hughes found that both scalarization approaches outperformed a Pareto-based optimizer (NSGA-II [25]) on four-objective and seven-objective problems. In a prelude to the success of the later MOEA/D algorithm [87], the study showed that considering the scalarizations simultaneously also provided a more effective search than a set of isolated scalarization runs. These findings were reinforced in further empirical studies by Corne and Knowles [21] and Wagner et al. [84]. The former study considered five-, 10-, 15- and 20-objective instances of travelling salesman problems [21], finding that a scalarization function based on the weighted average ranking for each objective performed better than Pareto-based approaches. The latter study extended Hughes’ earlier findings to DTLZ problems and also found that indicator-based algorithms (using set-based fitness metrics) showed promising performance on many-objective problems, although the high time complexity of computing indicator values or contributions when the number of objectives is large should be noted.
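A ranking-based scalarization of the kind mentioned for the Corne and Knowles study can be sketched in a few lines. The version below is an unweighted average rank: each solution is ranked separately in every objective, and the ranks are averaged. Any weighting used in [21] is not reproduced here, and the function name `average_rank` is our own. Note that such a score yields a complete ranking even when many solutions are mutually non-dominated, but it ignores the magnitude of objective differences.

```python
import numpy as np


def average_rank(F: np.ndarray) -> np.ndarray:
    """Mean of per-objective ranks (0 = best) for each solution (row) of F, minimization.

    Ties get distinct ranks in index order here; averaging tied ranks would be a refinement.
    """
    # the argsort of an argsort gives each entry's rank within its column
    ranks = np.argsort(np.argsort(F, axis=0, kind="stable"), axis=0)
    return ranks.mean(axis=1)


if __name__ == "__main__":
    F = np.array([[1.0, 1.0, 2.0],
                  [2.0, 3.0, 1.0],
                  [3.0, 2.0, 3.0]])
    scores = average_rank(F)
    print(scores, "best first:", np.argsort(scores).tolist())
```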
Whilst most works in evolutionary many-objective optimization have assumed, either explicitly or implicitly, that the objectives are all in conflict with each other, other problem formulations are possible. Purshouse and Fleming proposed a typology consisting of three types of relationship in many-objective problems [71]: conflict, harmony (in which objectives work in synergy) and independence (in which the objectives are unrelated). The concept of harmony drew explicitly on existing ideas in the MCDM field on what were variously called ‘redundant’ [42], ‘supportive’ [17] or ‘nonessential’ [41] objectives, i.e. objectives whose removal from the problem formulation leaves the Pareto set unchanged. Whilst independence has received little attention [70], harmony has been considered more extensively by the EMO community, alongside parallel developments in MCDM [64, 81]. The early focus here was on methods for reducing the dimensionality of the objective space by automatically identifying and removing harmonious (or redundant) objectives during the optimization process. Brockhoff and Zitzler [14, 16] proposed a mathematical framework for objective reduction, based on the extent to which the dominance relations considering the full set of objectives are preserved. The authors also proposed algorithms to identify minimum objective subsets under specific preservation conditions. In a separate approach, López Jaimes and colleagues used feature selection to identify candidates for objective reduction [56], also based on minimum subset definitions. Deb and Saxena investigated principal component analysis as a method for reducing the dimension of the objective space [26], and later provided an extension for nonlinear geometries [74, 75]. In [15], the authors investigated how reducing the number of objectives can reduce the internal computation time of a hypervolume-based algorithm.
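The idea of preserving dominance relations when dropping objectives can be illustrated on a finite sample. The following is a simplified, hypothetical sketch in the spirit of the framework described above, not the published algorithms: it greedily removes objectives as long as the weak dominance relation on the sample stays exactly unchanged. Greedy removal yields a dominance-preserving subset, but not necessarily a minimum one.

```python
import numpy as np


def weak_dominance_matrix(F: np.ndarray, subset) -> np.ndarray:
    """W[i, j] is True if solution i is no worse than solution j in every objective of `subset`."""
    G = F[:, list(subset)]
    return np.all(G[:, None, :] <= G[None, :, :], axis=2)


def greedy_objective_reduction(F: np.ndarray) -> list:
    """Drop objectives one at a time while the weak dominance relation on the sample F is unchanged.

    Greedy, so the returned subset is dominance-preserving but not necessarily of minimum size.
    """
    m = F.shape[1]
    full = weak_dominance_matrix(F, range(m))
    kept = list(range(m))
    for k in range(m):
        trial = [j for j in kept if j != k]
        if trial and np.array_equal(weak_dominance_matrix(F, trial), full):
            kept = trial
    return kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random(50)
    # f0 and f2 = 2 * f0 order all solutions identically (harmony); f1 conflicts with both
    F = np.column_stack([x, 1.0 - x, 2.0 * x])
    print(greedy_objective_reduction(F))  # [1, 2]
```

Dropping objectives can only add weak dominance pairs, never remove them. The equality test therefore checks that no pair which was incomparable in the full problem becomes comparable.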
Further fundamental studies considered the effect of adding objectives to, or removing them from, a problem formulation [13, 48]. The culmination of this early phase of many-criteria optimization research was the review paper by Ishibuchi and colleagues [55]. Combining new experiments with reviews of the existing literature, these authors found that scalarization approaches, and Pareto dominance relations modified to include preference information, both offered promise for many-criteria optimization going forward. The authors also identified further work with the MCDM community as a promising route to progress.
1.3 Where Are We Now? MACODA by the Time of the 2019 Lorentz Center Workshop
The number of publications in many-objective optimization has increased rapidly over the decade since the Ishibuchi review [55], and several of the ideas and techniques already published deserve further attention. This section provides a brief overview of state-of-the-art methods and salient topics.
1.3.1 Algorithmic Aspects
In the early period of EMO research, the distinctive features of many-objective optimization problems were largely ignored, and it was assumed that the same algorithms could be applied to problems with up to three objectives and to problems with more than three. The development, implementation, and testing of methods tailored towards problems with four or more objectives began in the early 2010s. Now, in the very early 2020s, several methods are well established. Three of these deserve a closer look.
1.3.1.1 MOEA/D
Building on the MSOPS concept and earlier MCDM foundations, decomposition-based approaches were popularized by the Multi-objective Evolutionary Algorithm based on Decomposition (MOEA/D) [87]. The method decomposes the objective space by defining reference vectors pointing towards the true Pareto front. The task for solutions is to identify promising nearby directions and to follow these as fast as possible towards the Pareto front. Different decomposition approaches can be used within MOEA/D; the most prominent ones are weighted sum, Chebyshev and boundary intersection. All of them decompose the original problem into scalar aggregation functions that are then optimized simultaneously by an evolutionary algorithm (EA). The EA essentially assigns one incumbent solution to each scalarization function, which then undergoes reproduction, variation and selection. This approach provides an easy way to introduce decomposition into MOEAs. Further decomposition schemes can be incorporated easily and used interchangeably. In fact, boundary intersection and its variants have become a quasi-standard here. As a consequence of the approach, scalar optimization problems are addressed instead of the original multi-objective problem (MOP). This makes fitness assignment as well as diversity maintenance much easier. A further advantage is the low computational complexity per generation, compared with Pareto-based and set-based indicator approaches. The disadvantage of MOEA/D, and of other scalarizing methods, is the need to normalize objectives that may not be commensurable. The quality of normalization can substantially influence the efficacy of the algorithm [58].
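The three scalarizations named above can be written as small functions of an objective vector $f$, a weight vector $w$ and an ideal point $z^*$. The sketch below is illustrative only: function names are our own, $\theta = 5$ is an assumed, frequently used penalty setting, and the neighbourhood and replacement machinery of MOEA/D is omitted. The two example vectors have the same weighted sum, but Chebyshev and boundary intersection both prefer the one lying on the weight direction.

```python
import numpy as np


def weighted_sum(f, w):
    """Weighted sum of objective values."""
    return float(np.dot(w, f))


def chebyshev(f, w, z_star):
    """Weighted Chebyshev (min-max) distance to the ideal point z_star."""
    return float(np.max(w * np.abs(f - z_star)))


def pbi(f, w, z_star, theta=5.0):
    """Penalty-based boundary intersection: distance d1 along w plus theta times the distance d2 off it."""
    direction = w / np.linalg.norm(w)
    diff = f - z_star                                  # nonnegative when z_star is the ideal point
    d1 = float(np.dot(diff, direction))                # progress along the weight direction
    d2 = float(np.linalg.norm(diff - d1 * direction))  # deviation from the reference line
    return d1 + theta * d2


if __name__ == "__main__":
    w, z_star = np.array([0.5, 0.5]), np.zeros(2)
    for f in (np.array([0.2, 0.8]), np.array([0.5, 0.5])):
        print(f, weighted_sum(f, w), chebyshev(f, w, z_star), round(pbi(f, w, z_star), 3))
```

In an MOEA/D-style loop, each weight vector would keep the incumbent that minimizes one of these functions.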
1.3.1.2 NSGA-III
Comparing MOEA/D with a second prominent approach in many-criteria optimization, the Non-dominated Sorting Genetic Algorithm III (NSGA-III, cf. [24, 57]), reveals both similarities and differences. One major similarity is that NSGA-III also decomposes the original problem into scalar optimization subproblems. However, both the way this decomposition is performed and the way these subproblems are handled are completely different.
In general, NSGA-III follows an approach very similar to that of its predecessor NSGA-II [25]. Only the secondary selection criterion within the selection operator underwent significant changes. Instead of the crowding distance, a new criterion is applied that is based on reference points and, thus, on lines from the reference points towards the origin of a transformed/normalized objective space. A hyperplane is defined in each generation and, typically, the reference points are uniformly distributed on this hyperplane, e.g. by the method proposed by Das and Dennis [22]. Each population member is then associated with the nearest reference line, i.e. the line connecting a reference point with the adapted origin of the objective space. The secondary selection criterion then prefers members associated with lines that have fewer associated members. If ties exist, the distance to the corresponding line is considered first. If that does not help, remaining ties are broken randomly.
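The reference-point machinery described above can be sketched as follows. This is a simplified illustration under stated assumptions: the objectives are taken to be already normalized (NSGA-III's own normalization via ideal and extreme points is omitted), and the names `das_dennis` and `associate` are hypothetical. The code generates simplex-lattice reference points, associates each member with the nearest reference line through the origin, and counts members per line (the niche counts used by the secondary criterion).

```python
from itertools import combinations

import numpy as np


def das_dennis(m: int, H: int) -> np.ndarray:
    """Simplex-lattice reference points: components in {0, 1/H, ..., 1}, summing to 1."""
    rows = []
    # "stars and bars": choosing m-1 bar positions among H+m-1 slots splits H units into m parts
    for bars in combinations(range(H + m - 1), m - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(H + m - 2 - prev)
        rows.append(parts)
    return np.array(rows, dtype=float) / H


def associate(F_norm: np.ndarray, refs: np.ndarray):
    """Nearest reference line (through the origin) and perpendicular distance for each point."""
    dirs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    proj = F_norm @ dirs.T                               # scalar projections, shape (n, R)
    sq = np.sum(F_norm ** 2, axis=1, keepdims=True) - proj ** 2
    dist = np.sqrt(np.maximum(sq, 0.0))                  # guard against tiny negative rounding
    idx = np.argmin(dist, axis=1)
    return idx, dist[np.arange(len(F_norm)), idx]


if __name__ == "__main__":
    refs = das_dennis(3, 4)                              # 15 reference points for m = 3, H = 4
    F_norm = np.random.default_rng(0).random((10, 3))    # stand-in for normalized objective vectors
    idx, dist = associate(F_norm, refs)
    niche_count = np.bincount(idx, minlength=len(refs))
    print(niche_count)  # members per reference line; lines with small counts are favoured
```

The tie-breaking rules from the text (perpendicular distance, then random choice) would operate on `dist` and `niche_count`.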
1.3.1.3 HypE
The third method regularly applied to many-objective optimization problems (MaOPs) is the hypervolume estimation algorithm HypE (cf. [4]), which is similar to the algorithm proposed by Zitzler and Künzli [88] and to SMS-EMOA [8]. The underlying algorithm is NSGA-II with its (µ + µ) selection scheme, with the secondary ranking criterion again replaced. This time, the crowding distance of a solution is replaced by a fitness based on the hypervolume indicator value that would be lost if the solution were removed. To cope with the high computational complexity of hypervolume calculations in high-dimensional objective spaces, HypE approximates these hypervolume contributions using Monte Carlo sampling.
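The Monte Carlo idea can be illustrated with a small sketch. HypE's actual fitness is a more elaborate sampling-based measure that also shares regions dominated by several points. The sketch below only estimates each point's exclusive hypervolume contribution, the quantity used in SMS-EMOA-style selection. It assumes minimization and a reference point `ref` that every point dominates. The function names are hypothetical, and an exact 2-D routine is included for comparison.

```python
import numpy as np


def mc_hv_contributions(F, ref, n_samples, rng):
    """Monte Carlo estimate of each point's exclusive hypervolume contribution (minimization).

    Samples uniformly in the box [min(F), ref]; a sample counts for point i when i is
    the only point of F that weakly dominates it. Assumes every point is better than ref.
    """
    lower = F.min(axis=0)
    box_volume = float(np.prod(ref - lower))
    S = lower + rng.random((n_samples, F.shape[1])) * (ref - lower)
    dom = np.all(F[None, :, :] <= S[:, None, :], axis=2)   # dom[s, i]: point i dominates sample s
    exclusive = dom & (dom.sum(axis=1) == 1)[:, None]
    return box_volume * exclusive.sum(axis=0) / n_samples


def exact_hv_contributions_2d(F, ref):
    """Exact exclusive contributions for a mutually non-dominated 2-D set (for comparison)."""
    order = np.argsort(F[:, 0])
    P = F[order]
    right = np.append(P[1:, 0], ref[0])        # next point's f1, or the reference point
    upper = np.insert(P[:-1, 1], 0, ref[1])    # previous point's f2, or the reference point
    contrib = np.empty(len(F))
    contrib[order] = (right - P[:, 0]) * (upper - P[:, 1])
    return contrib


if __name__ == "__main__":
    F = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
    ref = np.array([4.0, 4.0])
    print(exact_hv_contributions_2d(F, ref))
    print(mc_hv_contributions(F, ref, 100_000, np.random.default_rng(0)))
```

The sampling cost grows with the number of samples and points, not exponentially with $m$. The price is estimation noise, which shrinks only with the square root of the sample size.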
1.3.1.4 Discussion
Comparing the results of the three methods mentioned, there is no clear winner. The results depend on the specific problem, its search and objective space dimensions, and the parameterization of the specific algorithm. Each algorithm has its pros and cons. What they have in common is that the objective space dimension has a severe influence on their computational performance. This holds for HypE with respect to the number of samples needed to approximate the hypervolume adequately, for NSGA-III with respect to the number of reference points needed to obtain an adequate distribution on the considered hyperplane, and for MOEA/D with respect to the number of subproblems, i.e. the number of scalar aggregation functions needed to achieve well-distributed results. Thus, all three methods still suffer from the curse of dimensionality. Because of these limitations, deriving new or improved methods for many-objective optimization remains an ongoing challenge. When working on such methods, however, it will be valuable to pay attention to salient topics such as performance assessment, visualization, alternative problem
formulations and also some difficulties in reaching convergence that result from the high dimensionality of the objective space.
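A worked count makes the growth in reference points or subproblems concrete. It assumes a simplex-lattice design in the style of Das and Dennis with $H$ divisions per objective, which is one common choice for both NSGA-III reference points and MOEA/D weight vectors. The number of lattice points is

$$
N(m, H) = \binom{H + m - 1}{m - 1}, \qquad N(3, 12) = \binom{14}{2} = 91, \quad N(5, 12) = \binom{16}{4} = 1820, \quad N(10, 12) = \binom{21}{9} = 293\,930 .
$$

Keeping the same resolution $H$ quickly becomes infeasible as $m$ grows. This is why coarser lattices, or layered designs, are typically used for larger $m$, at the price of sparser coverage of the front.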
1.3.2 Salient Topics
In the existing literature on many-objective optimization, a major topic has been how to design algorithms for many-objective optimization, or how to generalize existing algorithmic concepts for multi-objective optimization. The MOEAs mentioned above already show that the motivations for adapting existing algorithms are diverse and relate to various topics, such as diversity measures and the properties of performance indicators in high-dimensional spaces. Moreover, there are several proposals for algorithms that exploit structures which are not prevalent in low-dimensional objective spaces, first and foremost algorithms that make use of correlations between objectives in order to reduce the problem complexity. Alternative approaches that might be worth a closer look are also available, cf. also [55].
1.3.2.1 Use of Scalarization Functions
One major aspect of two of the methods described above is the use of scalarization functions such as the weighted sum. These methods leave open which scalarization function to use, and how the functions are parameterized and integrated. This leaves room for several improvements. A major advantage of scalarization functions is their computational efficiency: they are simple and cheap to evaluate. However, if the number of scalarization functions becomes too large, this advantage diminishes. This number can depend on the population size of the chosen algorithm or on the number of objectives of the (many-objective) optimization problem. In both cases, it may become too large for the corresponding algorithmic approach to be used efficiently. More research is needed to identify the best way to integrate such functions into the overall algorithmic context, which scalarizations to use for which optimization problems, and how to parameterize the functions adequately. In addition, the apparent empirical success of decomposition-based methods for MaOPs, in comparison to Pareto-based methods, has been questioned by some researchers. Giagkiozis and Fleming noted the equivalence of Chebyshev scalarization to Pareto dominance in terms of convergence trajectories, and argued that the performance of decomposition approaches based on Chebyshev scalarization could not be explained by an improved probability of identifying better solutions [43]. Further analysis by Giagkiozis et al. suggested that the performance advantage may arise from the stability offered by retaining fixed scalarization directions during the optimization process (also cautioning against the use of adaptive weighting schemes)
[44]. A recent analysis by Takahashi has demonstrated convergence problems with Chebyshev-based decomposition at a finite distance from the Pareto front [79]. If the Pareto front is convex, weighted-sum scalarization functions can be used; they provide effective comparability between solutions and, consequently, efficient convergence (on non-convex parts of a front, however, no weighted sum has its optimum, so those solutions cannot be reached). More generally, it is important to consider the relationship between the geometries of the objective space and of the scalarization function in achieving success with decomposition-based methods [1].
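The geometric point about weighted sums can be checked numerically. The sketch below uses a hypothetical non-convex bi-objective front for minimization: the quarter circle $f_1^2 + f_2^2 = 1$, i.e. the DTLZ2-type front, sampled at 101 points. For every weight vector, the weighted sum selects one of the two extreme points. The Chebyshev scalarization with ideal point $z^* = 0$ selects interior points, at $\tan t = w_1 / w_2$. Function names are our own.

```python
import numpy as np

# Hypothetical non-convex front (minimization): the quarter circle f1^2 + f2^2 = 1
t = np.linspace(0.0, np.pi / 2, 101)
FRONT = np.column_stack([np.cos(t), np.sin(t)])


def best_by_weighted_sum(F, w):
    """Index of the point minimizing the weighted sum w @ f."""
    return int(np.argmin(F @ w))


def best_by_chebyshev(F, w, z_star=None):
    """Index of the point minimizing the weighted Chebyshev distance to z_star (default: origin)."""
    z_star = np.zeros(F.shape[1]) if z_star is None else z_star
    return int(np.argmin(np.max(w * np.abs(F - z_star), axis=1)))


if __name__ == "__main__":
    for w1 in (0.1, 0.3, 0.5, 0.7, 0.9):
        w = np.array([w1, 1.0 - w1])
        print(w1, FRONT[best_by_weighted_sum(FRONT, w)].round(3),
              FRONT[best_by_chebyshev(FRONT, w)].round(3))
```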
1.3.2.2 Objective Space Reduction
The use of dimensionality reduction techniques in many-objective optimization has been an active topic of research over the past ten years, and various techniques are now available; we refer to the discussion in Sect. 1.2.2.3. Challenges may still remain when the number of objective functions is very large, and when relationships between objectives other than correlation have to be considered.
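Correlation is the simplest relationship to screen for. The following is a hypothetical first-pass heuristic, not one of the published reduction methods. It flags objective pairs whose sample Pearson correlation over a set of evaluated solutions exceeds a threshold (here an assumed 0.9). A high positive correlation only hints at harmony or redundancy. Whether the Pareto set is really preserved must still be checked, for example with dominance-based criteria.

```python
import numpy as np


def correlated_objective_pairs(F: np.ndarray, threshold: float = 0.9):
    """Return (i, j, rho) for objective pairs whose sample Pearson correlation is at least `threshold`.

    F has shape (n_solutions, m); column k holds the values of objective k.
    A strongly positive rho only hints at harmony/redundancy; it does not prove it.
    """
    corr = np.corrcoef(F, rowvar=False)
    m = F.shape[1]
    return [
        (i, j, float(corr[i, j]))
        for i in range(m)
        for j in range(i + 1, m)
        if corr[i, j] >= threshold
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((500, 2))
    F = np.column_stack([
        x[:, 0],                                          # f0
        1.0 - x[:, 0],                                    # f1: conflicts with f0
        2.0 * x[:, 0] + 0.01 * rng.standard_normal(500),  # f2: nearly harmonious with f0
        x[:, 1],                                          # f3: independent of the others
    ])
    for i, j, rho in correlated_objective_pairs(F):
        print(f"f{i} ~ f{j}: rho = {rho:.3f}")
```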
1.3.2.3 Order Extensions to Pareto Dominance
The study of order relations alternative to the Pareto order is another promising research direction in many-objective optimization. Order relations, such as the Pareto dominance order, can be viewed as special cases of binary relations, i.e. sets of ordered pairs. An extension of an order relation is an order relation that includes all ordered pairs of the original order relation plus additional ordered pairs. As a consequence, it reduces the number of indifferent and/or incomparable solution pairs. Under the Pareto dominance relation, the number of incomparable solution pairs, and thus also the number of non-dominated solutions, tends to be very high in many-criteria optimization. This effect has been analysed in detail in a recent paper by Allmendinger et al. [2], and it is inherent to problems with many objectives, in particular if there is no strong correlation between them. Hence, order extensions of the Pareto dominance relation have been proposed as a remedy for the undesirable situation of having too many non-dominated solutions to choose from. The Pareto order and the total order introduced by scalarization can be viewed as two extreme cases: in the first case, there is a maximum number of incomparable solutions (given the general problem definition) and, in the second case, all solutions are mutually comparable. In other words, Pareto dominance can be seen as a minimalistic order relation that requires no preference information beyond specifying which objectives are to be minimized and which are to be maximized. At the other extreme, scalarizations such as a weighted sum of objectives introduce a total order on the set of candidate solutions, in which every pair of solutions is comparable.
In many-criteria optimization, extensions of the Pareto dominance relation that fall between these extremes and form partial, non-total orders have been proposed recently. Examples are k-optimality [35], l-optimality [90], and the winning score [65]. To give an example, k-optimality considers in how many objectives one solution is better than another. Another common idea is to use information about the trade-offs between objectives when comparing two solutions. This can be done by using dominance cones that expand the Pareto dominance cone [85]. Although initial analyses comparing these alternative orderings with the Pareto order exist, work comparing these alternative order relations with each other is missing. Chapter 4 of this book is devoted to an in-depth discussion of these various partial order relations.
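One generic way to build an expanded dominance cone is via a matrix $A$: $x$ dominates $y$ if $A(f(x) - f(y)) \le 0$ componentwise, with at least one strict inequality. This is a hypothetical illustration, not necessarily the construction used in [85]. With $A = I$ this is Pareto dominance. For a nonnegative $A$ with positive diagonal, every Pareto-dominance pair remains a dominance pair, and positive off-diagonal entries (an assumed trade-off parameter `alpha` below) make additional pairs comparable.

```python
import numpy as np


def cone_dominates(fx, fy, A) -> bool:
    """True if fx dominates fy w.r.t. the polyhedral cone {d : A d <= 0} (minimization).

    A = identity gives Pareto dominance; a nonnegative A with positive diagonal and
    positive off-diagonal entries enlarges the cone, so more pairs become comparable.
    """
    d = np.asarray(A) @ (np.asarray(fx, dtype=float) - np.asarray(fy, dtype=float))
    return bool(np.all(d <= 0) and np.any(d < 0))


def comparable_pairs(F, A) -> int:
    """Number of unordered pairs in F where one point cone-dominates the other."""
    n = len(F)
    return sum(
        cone_dominates(F[i], F[j], A) or cone_dominates(F[j], F[i], A)
        for i in range(n)
        for j in range(i + 1, n)
    )


if __name__ == "__main__":
    m, alpha = 5, 0.25
    pareto = np.eye(m)
    # hypothetical trade-off matrix: each objective also "feels" alpha of every other one
    cone = np.eye(m) + alpha * (np.ones((m, m)) - np.eye(m))
    F = np.random.default_rng(0).random((60, m))
    print("comparable pairs, Pareto:", comparable_pairs(F, pareto))
    print("comparable pairs, cone:  ", comparable_pairs(F, cone))
```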
1.3.2.4 Preference Elicitation and Interactive Methods
Besides using general ways of extending orderings, it is also possible to ask the decision-maker for additional preference information. This can be done a priori or progressively in an interactive way, e.g. by asking for additional information at certain moments of the optimization process. For many-objective optimization, both approaches have been exemplified. Preference elicitation is defined as the process of asking the decision-maker simple questions from which the algorithm infers utility functions or preference relations. A common method for preference elicitation is ordinal regression [10], where the decision-maker is asked for pairwise comparisons of a subset of all pairs in $X \times X$; based on this information, a compatible utility function is inferred that can be used to rank all solutions in $X$. Other ways of eliciting preferences are to ask for reference points, target regions on the Pareto front, or information on the relative importance of objective functions. These methods will be discussed in the next subsection on interactive methods.
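The inference step of ordinal regression can be illustrated in a deliberately simplified form. Ordinal regression methods typically solve linear programs over more general (e.g. additive) utility functions. The hypothetical sketch below instead assumes a linear cost $w^\top f$ to be minimized and uses rejection sampling: it draws weight vectors uniformly from the simplex and keeps those that reproduce every stated pairwise comparison. It then ranks all solutions with the mean compatible weight vector.

```python
import numpy as np


def compatible_weights(F, preferences, n_samples=20_000, rng=None):
    """Weight vectors of a linear cost w @ f (to be minimized) that reproduce every stated comparison.

    F: (n, m) objective values; preferences: list of (a, b) meaning "the decision-maker
    prefers solution a to solution b", i.e. w @ F[a] < w @ F[b] must hold.
    """
    rng = np.random.default_rng() if rng is None else rng
    W = rng.dirichlet(np.ones(F.shape[1]), size=n_samples)  # uniform samples on the weight simplex
    cost = W @ F.T                                           # cost[s, i] = w_s @ F[i]
    ok = np.ones(n_samples, dtype=bool)
    for a, b in preferences:
        ok &= cost[:, a] < cost[:, b]
    return W[ok]


def rank_with_mean_weight(F, W):
    """Rank all solutions (best first) with the average of the compatible weight vectors."""
    return np.argsort(F @ W.mean(axis=0)).tolist()


if __name__ == "__main__":
    F = np.array([[1.0, 9.0], [5.0, 5.0], [9.0, 1.0], [6.0, 6.0]])
    W = compatible_weights(F, [(0, 1)], rng=np.random.default_rng(0))  # DM: solution 0 beats 1
    print(len(W), "compatible weight vectors; ranking:", rank_with_mean_weight(F, W))
```

Each comparison is a strict linear inequality in $w$, so the set of compatible weights is convex. The mean of compatible samples is therefore itself compatible, and the resulting ranking respects all stated comparisons.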
1.3.2.5
Navigation and Interactive Methods
As mentioned earlier in this introduction, our spatial cognition offers little support when we face the problem of visualizing high-dimensional manifolds. Instead of computing and trying to visualize the entire non-dominated set, navigation methods focus on the gradual changes that occur when moving from one solution to another. By repeatedly asking the decision-maker for preferable moves, they enable the decision-maker to navigate along a path through the higher-dimensional objective space. Classical Pareto navigation methods navigate across a set of non-dominated solutions, where each move entails losses in some objectives and gains in others. The so-called NAUTILUS method tries to avoid such trade-offs: starting from a very poor, dominated solution, it follows a path of gradual improvement until it reaches a solution on the Pareto front [67]. This way, the decision-maker has the positive experience
1 Introduction to Many-Criteria Optimization and Decision Analysis
that the solutions are always improving in some objectives without losses in other objectives. The decision-maker has control over which objectives to improve. In general, navigation methods are well suited for many-criteria optimization: by working in a path-oriented manner, they sidestep the curse of dimensionality. Within the MCDM community, there has been some work on navigation methods [31, 32], but these methods remain underexploited compared with the plethora of approaches that seek to cover the Pareto set in its entirety. In interactive methods [66, 68], the decision-maker not only provides additional information but also learns about possibilities, limitations and/or trade-offs and, in the light of new information, can adjust their preferences. Reference vectors, whose components (called aspiration levels) represent desirable objective function values, are a common way to formulate preferences. Reference points can also be processed and visualized in many-objective optimization. Interactive multi-objective optimization has been an active topic of research (see, e.g. [68] and references therein).
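As a hedged illustration of how a reference point and a path of gradual improvement can fit together, the sketch below uses the augmented achievement scalarizing function, a common choice in reference point methods, to pick the member of a finite non-dominated set that best meets the decision-maker's aspiration levels. It then generates intermediate points moving from a poor starting vector toward that solution with the step rule z ← ((it − 1)/it)·z + (1/it)·target, where it is the number of iterations left. This mimics only the improvement-without-trade-off flavour of NAUTILUS [67]: the actual method lets the decision-maker revise preferences at every step and computes new solutions accordingly, whereas here the target is fixed. The weights w, the value of rho and all names are assumptions.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def asf(f: Sequence[float], ref: Sequence[float], w: Sequence[float],
        rho: float = 1e-6) -> float:
    """Augmented achievement scalarizing function (minimization);
    ref holds the aspiration levels, w the (assumed) scaling weights."""
    d = [wi * (fi - ri) for fi, ri, wi in zip(f, ref, w)]
    return max(d) + rho * sum(d)


def project(front: List[Vector], ref: Sequence[float], w: Sequence[float]) -> Vector:
    """Pick the member of a finite non-dominated set that minimizes the ASF."""
    return min(front, key=lambda f: asf(f, ref, w))


def improvement_path(start: Sequence[float], target: Sequence[float],
                     n_steps: int) -> List[Vector]:
    """Move from a poor start toward target. Each new point is a convex
    combination of the current point and target, so no objective gets worse
    as long as target is no worse than start in every objective."""
    z = tuple(start)
    path = [z]
    for h in range(1, n_steps + 1):
        it = n_steps - h + 1  # iterations left, including this one
        z = tuple((it - 1) / it * zi + ti / it for zi, ti in zip(z, target))
        path.append(z)
    return path


if __name__ == "__main__":
    front = [(1.0, 5.0), (2.0, 3.0), (3.0, 2.0), (5.0, 1.0)]
    target = project(front, ref=(2.0, 2.5), w=(1.0, 1.0))
    print(target)  # (2.0, 3.0)
    for point in improvement_path(start=(5.0, 5.0), target=target, n_steps=4):
        print(point)
```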
1.3.2.6
Visualization Tools
To better support the user in recognizing the structure of the non-dominated set, methods from visual analytics that can deal with high-dimensional multivariate data sets have recently been discussed [47, 82]. For a very large number of objectives, even graph-theoretical methods that visualize networks of (compatible, conflicting) objective functions have been proposed. Last but not least, the so-called Pareto scanner is an approach in which constraints or weights on objective function values can be set using sliders, enabling fast interactive exploration of the set of non-dominated solutions [5, 80]. Another direction might be to describe the set of non-dominated solutions by human-understandable rule sets that describe trade-offs, as proposed, for instance, in so-called ‘innovization’ techniques [27].
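To make the slider idea concrete, here is a minimal, hypothetical sketch of Pareto-scanner-style filtering; it does not reproduce the interfaces of the tools in [5, 80]. It assumes minimized objectives and a pre-computed finite non-dominated set, treats each slider as an upper bound on one objective, and reports the range of values that remains reachable in every objective, which is how tightening one slider reveals its cost in the others.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def scan(front: List[Vector], upper: Dict[int, float]) -> List[Vector]:
    """Keep solutions whose objective i does not exceed the slider bound upper[i];
    objectives without a slider are unconstrained (minimization assumed)."""
    return [f for f in front if all(f[i] <= bound for i, bound in upper.items())]


def reachable_ranges(front: List[Vector],
                     upper: Dict[int, float]) -> List[Tuple[float, float]]:
    """Best and worst value of each objective among the solutions that survive
    the sliders; an empty list means the bounds exclude every solution."""
    kept = scan(front, upper)
    if not kept:
        return []
    return [(min(f[i] for f in kept), max(f[i] for f in kept))
            for i in range(len(kept[0]))]


if __name__ == "__main__":
    front = [(1, 5, 3), (2, 3, 4), (3, 2, 2), (5, 1, 1)]
    print(reachable_ranges(front, {0: 3}))        # [(1, 3), (2, 5), (2, 4)]
    print(reachable_ranges(front, {0: 3, 1: 3}))  # [(2, 3), (2, 3), (2, 4)]
```

In an interactive tool, these ranges would be recomputed and redrawn each time a slider moves; with a small pre-computed set, the linear-time filter keeps this fast.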
1.4
What Remains to be Done? A Vision for MACODA in 2030
In this introduction, we have so far covered the history and current state of the art in MACODA. But what remains to be done? When setting up a research agenda, several salient topics need to be addressed. First and foremost, to proceed in a scientific way, we require not only a collection of ideas, which is perhaps already available, but also means to measure progress in implementing these ideas. In this context, it will be paramount to define scalable performance indicators. Scalability should be achieved in two ways. Firstly, in terms of computational resources: this rules out some indicators that are attractive for low-dimensional spaces, such as the hypervolume indicator, whose exact computation becomes very expensive as the number of objectives grows; coverage of manifolds will also be more difficult to achieve, due to the curse of dimensionality. Secondly, the indicators should assess the performance of solution sets in a way that is meaningful in many-objective optimization. Even when the goal remains to compute an evenly spaced coverage of the non-dominated set, performance indicators often suffer from biases that become more prominent in high dimensions, and we are only beginning to understand how to counteract these biases [3, 54, 76]. There are, however, diversity indicators that favour uniformly distributed sets and that scale with the dimension, such as the Riesz s-energy [33]. Future work could investigate how they can best be integrated into MaOP frameworks. The visualization and interactive exploration of solution sets in high dimensions is another active topic of research: how can techniques of visual analytics be better adapted to the specific structure of Pareto non-dominated point sets, or of non-dominated point sets under other orders proposed for many-criteria optimization? How to assess the performance of interactive and path-oriented methods, such as navigation methods, is largely uncharted terrain. Assessing the performance of interactive algorithms may require a radical shift of mindset; first ideas in this direction come from recent efforts to create artificial decision-makers [63]. One might also consider minimizing the cognitive load, for example the number of questions, the complexity of the questions, and the number of interaction steps [11, 36]. Once performance metrics are established and agreed upon, algorithm performance can be measured and improved. The creation of benchmarks that resemble characteristics of real-world problems and typical difficulties in many-objective optimization will then also become an important topic.
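As an illustration of a diversity measure whose cost scales gently with the dimension, the sketch below computes the Riesz s-energy of a finite set of objective vectors: the sum of ||a − b||^(−s) over all ordered pairs of distinct points, so that lower values indicate a more uniform spread. The choice of s and the normalization vary between papers, and how [33] integrates the measure into an algorithm is not reproduced here; both should be treated as assumptions. The cost is quadratic in the number of points but only linear in the number of objectives, because each distance takes O(m) operations.

```python
import math
from itertools import combinations
from typing import List, Sequence


def riesz_s_energy(points: List[Sequence[float]], s: float) -> float:
    """Sum of ||a - b||^(-s) over all ordered pairs of distinct members.
    Lower values indicate a more uniform spread; coincident points give inf."""
    total = 0.0
    for a, b in combinations(points, 2):
        d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
        if d == 0.0:
            return math.inf
        total += 2.0 * d ** (-s)  # each unordered pair counts as (a, b) and (b, a)
    return total


if __name__ == "__main__":
    even = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    clustered = [(0.0, 1.0), (0.1, 0.9), (1.0, 0.0)]
    print(riesz_s_energy(even, s=1.0), riesz_s_energy(clustered, s=1.0))
```

Because close pairs dominate the sum, minimizing the energy pushes points apart, which is why the evenly spaced set above scores lower than the clustered one.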
Sub-classes of many-objective optimization, such as problems with a large number of homogeneous objectives or with a hierarchical structure, may become independent research areas. Compared with classical multi-criteria optimization with a small number of objectives, the number of different concepts and the variety of general solution approaches are much larger in many-criteria optimization. This makes it challenging to establish a terminology and to define sub-fields of research in an orderly manner. One way to proceed in this direction is to define an ontology of concepts and link this ontology to existing work. This book takes a first step in this direction by initiating the MyCODA ontology, a curated effort to define taxonomies and sort existing work.
1.5
Synopsis
The topics in this book are structured into lead chapters addressing major themes in MACODA (Part I) and chapters addressing emerging and more specialized topics (Part II).
1.5.1
Key Topics
In the first contributed chapter of the book, Chap. 2, Peter Fleming and colleagues introduce the real-world context for many-objective optimization. In addition to providing examples of real-world applications that highlight the efficacy of many-objective techniques, the authors set their sights on seven key issues for the successful resolution of real-world many-objective problems. The chapter covers, for example, issues such as problem formulation, the selection of adequate algorithms, and the handling of uncertainties. All such issues are addressed with real-world problems in mind and often from the viewpoint of a domain expert, not necessarily a many-objective optimization expert. In this way, the chapter bridges the gaps between domain experts and optimization experts, as well as between the application-oriented approach and the more formal methods that follow in the remaining chapters of the book.

In Chap. 3, Koen van der Blom and colleagues present the findings of a questionnaire among practitioners on the characteristics of real-world problems. Surprisingly, little is known about the general characteristics of such problems, or about their relationship to the prolific field of artificial benchmarking in the research community. Whilst the findings will be of interest to algorithm designers faced with single- and bi-objective problems, there are particular points of interest for those designing for many-objective problems, including a notable prevalence of positively correlated objectives and the existence of preference information to guide the search. In just over two fifths of cases, the respondent was able to identify that the Pareto front was convex, which presents efficiency opportunities for the configuration of scalarization functions in decomposition-based algorithms, particularly in many-objective settings.

In Chap. 4, André Deutz, Michael Emmerich and Yali Wang discuss extensions of the classical Pareto dominance relation to better cope with multi-criteria problems. After introducing the basic concepts of binary relations, (partial) orders, cone orders, and their properties, the authors discuss the idea of order extensions: a new relation that does not contradict the previous one but is allowed to introduce additional dominance relationships among objective vectors. Based on this definition, a large number of previously suggested alternative dominance relations from the literature are introduced, discussed and compared. A list of open questions and ideas for future research directions concludes the chapter.

In Chap. 5, Bekir Afsar and colleagues consider the impact of many-objective problem features on the quality measures, or indicators, that are often used to assess the performance of MOEAs and are sometimes used within optimizers to direct the search. Moreover, the authors discuss indicators developed for preference-based MOEAs, where preference information is incorporated into the indicators. In addition to the computational demands of computing the indicators in the presence of many objectives, the authors highlight concerns associated with the relationship between indicator parameters (e.g. reference points) and the optimal distribution of an approximation set of a given size. These findings lead to strong recommendations for authors to publish full details of indicator parameter settings in reported studies, particularly for many-objective analyses. The authors also argue the case for retaining the use of Pareto-compliant indicators for MaOPs. The chapter further, and uniquely, highlights the many-objective challenges associated with stochastic objective functions, accompanying robustness metrics and quality measures for interactive methods, all areas for fruitful investigation in future MACODA research.

In Chap. 6, Vanessa Volz and colleagues give an overview of existing benchmark suites for multi- and many-objective optimization and discuss their shortcomings. They also expand on more general aspects of benchmarking, such as common pitfalls in performing numerical benchmarking experiments and presenting their results. Finally, they suggest how to avoid those pitfalls by following guidelines and a concrete checklist for benchmarking algorithms.

In Chap. 7, Jussi Hakanen and colleagues explore the visualization of (sets of) objective vectors and solutions within decision support processes for optimization problems with many objectives. Visualization is important, as is an understanding of the different ways of utilizing visualization in many-objective applications. Guidance is provided for choosing and applying visualization techniques, including recommendations from the field of visual analytics. This is illustrated through a complex real-world decision problem with ten objectives.

In Chap. 8, Andreia Guerreiro, Kathrin Klamroth and Carlos Fonseca explore theoretical aspects and challenges related to multi- and many-objective optimization problems. The aim of this chapter is to explore the connections between set-quality indicators and scalarizations by considering to what extent indicators can be seen as a generalization of scalarizations.
This serves as a motivation for the discussion of the corresponding theoretical properties, including monotonicity, independence/covariance (e.g. with respect to translation and/or scaling), as well as theoretical aspects related to the specification of parameters, such as weights and reference points, and their possible connection to DM preferences. The authors also discuss indicator-based subset selection in the context of the classical problems in MCDM (choice, ranking and sorting), as subset selection may be seen as a generalization of choice that differs from ranking, itself another generalization of choice.

In the next chapter, Chap. 9, special emphasis is put on the correlation between objectives. Tinkle Chugh and colleagues provide insight into solving multiobjective optimization problems by considering the correlation among objective functions. After reviewing six methods and approaches that have been used to find correlations from data, the authors shed more light on conflict and harmony between objectives and on how these can be compared using correlation measures. The use of correlations in fields such as data mining, innovization and objective reduction is described. The chapter finishes with an overview of problems (benchmarks and one real-world problem) with correlated objectives and reviews articles that focus on implicit as well as explicit control of the corresponding correlations.
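A very simple proxy for the conflict and harmony between objectives is the Pearson correlation over a sample of objective vectors. The sketch below is a generic, assumption-laden illustration and is not meant to reproduce the methods reviewed in Chap. 9: it assumes minimized objectives, labels strongly negative correlation as conflict and strongly positive correlation as harmony using an arbitrary threshold of 0.5, and ignores non-linear relationships as well as the strong dependence of such estimates on how the sample was drawn.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # convention assumed here: a constant objective counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def classify_pairs(samples: List[Sequence[float]],
                   threshold: float = 0.5) -> Dict[Tuple[int, int], Tuple[float, str]]:
    """Correlate every pair of (minimized) objectives over the sample and label it."""
    m = len(samples[0])
    columns = [[s[i] for s in samples] for i in range(m)]
    labels = {}
    for i in range(m):
        for j in range(i + 1, m):
            r = pearson(columns[i], columns[j])
            if r <= -threshold:
                label = "conflict"  # improving one tends to worsen the other
            elif r >= threshold:
                label = "harmony"   # the two tend to improve together
            else:
                label = "weak or non-linear"
            labels[(i, j)] = (r, label)
    return labels


if __name__ == "__main__":
    sample = [(1, 5, 2), (2, 4, 4), (3, 3, 6), (4, 2, 8)]
    for pair, (r, label) in classify_pairs(sample).items():
        print(pair, round(r, 2), label)
```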
1.5.2
Emerging Topics
In Chap. 10, Hao Wang and Kaifeng Yang provide an overview of generalizations of Bayesian optimization to multi-objective and many-objective optimization problems. These modern methods blend machine learning and optimization technology and are distinguished by their acquisition functions. A list of real-world applications is also provided, mainly covering problems with costly or time-consuming function evaluations. Such problems play an important role in industrial applications of optimization, e.g. when simulators are used to evaluate objective functions.

In Chap. 11, Mickaël Binois, Abderrahmane Habbal and Victor Picheny note the substantial, possibly intractable, challenges that many-objective problems present to conventional Bayesian optimization algorithms. In this exciting chapter, Binois and colleagues then demonstrate how recently proposed game-theoretic perspectives offer a route to resolving these challenges, transforming the scalability potential of Bayesian optimization in the number of objectives. The authors show how a many-objective problem can be expressed in game-theoretic terms, before presenting Nash and Kalai-Smorodinsky (KS) solutions. State-of-the-art approaches are introduced, resulting in uncertainty-based infill criteria that incorporate the different learning tasks required to find the KS solution. The approach is demonstrated on a many-objective engineering design problem.

Chapter 12 considers heterogeneous objectives, i.e. (many-objective) problems where the objective functions differ in various aspects, most notably in their evaluation time. The authors, Richard Allmendinger and Joshua Knowles, give an exhaustive overview of the few existing results in this area.
They list possible properties in which objective functions might differ, discuss the algorithmic concepts that have been proposed to deal with different evaluation times, and discuss which effects heterogeneous objectives have on benchmarking. A review of related work and a discussion of possible future research directions complete the chapter.

Chapter 13 introduces a web ontology for concepts in many-objective optimization and decision analysis. The intention of Vitor Basto-Fernandes and colleagues is to allow for the collaborative development of an ontology representing the MACODA knowledge domain, and to make a set of integrated tools for its use available to researchers and practitioners. For example, it will enable the MACODA research community to identify gaps in the research domain or to train new learners. When introducing ontologies, the authors cover ontologies in knowledge management and the semantic web, alongside other related work. With respect to the MyCODA platform, they first present the conceptual model before elaborating on ontology design best practices as implemented in the platform. The current platform is called MyCODA and is available at http://macoda.club. The authors will maintain and further enrich the platform and invite researchers and practitioners to contribute.
1.5.3
Coda
Whilst the above chapters cover many of the topics raised during the open space workshop sessions and the discussions that followed, it was of course not possible to do justice to the full spectrum of ideas seeded and issues raised at MACODA. Some areas were given the moniker “Cinderella topics” during the open space sessions at the MACODA workshops: overlooked or under-researched topics that are metaphorically still waiting for a group of researchers to ‘take them to the ball’. Future topics that we consider of substantive interest from a many-criteria perspective are the processes of preference articulation and cognitive bias, expert system support for transparent decision-making (e.g. through the use of explainable artificial intelligence methods), and software contexts such as efficient algorithmic design, exploitation of hardware (e.g. via parallelism) and MCDM workflow tools.
References
1. N.R. Aghamohammadi, S. Salomon, Y. Yan, R.C. Purshouse, On the effect of scalarising norm choice in a ParEGO implementation, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2017), pp. 1–15
2. R. Allmendinger, A. Jaszkiewicz, A. Liefooghe, C. Tammer, What if we increase the number of objectives? Theoretical and empirical implications for many-objective optimization (2021)
3. A. Auger, J. Bader, D. Brockhoff, E. Zitzler, Theory of the hypervolume indicator: optimal µ-distributions and the choice of the reference point, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2009), pp. 87–102
4. J. Bader, E. Zitzler, HypE: An algorithm for fast hypervolume-based many-objective optimization. Evol. Comput. 19(1), 45–76 (2011)
5. R.J. Balling, J.T. Taber, K. Day, S. Wilson, City planning with a multiobjective genetic algorithm and a Pareto set scanner, in Evolutionary Design and Manufacture (Springer, 2000), pp. 237–247
6. R. Bellman, Dynamic Programming (Princeton University Press, 1957)
7. V. Belton, T. Stewart, Multiple Criteria Decision Analysis: An Integrated Approach (Springer, 2002)
8. N. Beume, B. Naujoks, M.T.M. Emmerich, SMS-EMOA: Multiobjective selection based on dominated hypervolume. Eur. J. Oper. Res. 181(3), 1653–1669 (2007)
9. J. Branke, K. Deb, K. Miettinen, R. Słowiński, Multiobjective Optimization: Interactive and Evolutionary Approaches (Springer, 2008)
10. J. Branke, S. Greco, R. Słowiński, P. Zielniewicz, Learning value functions in interactive evolutionary multiobjective optimization. IEEE Trans. Evol. Comput. 19(1), 88–102 (2015)
11. R. Breukelaar, M.T.M. Emmerich, T. Bäck, On interactive evolution strategies, in Workshops on Applications of Evolutionary Computation (Springer, 2006), pp. 530–541
12. K. Bringmann, T. Friedrich, Approximating the volume of unions and intersections of high-dimensional geometric objects. Comput. Geom. 43(6–7), 601–610 (2010)
13. D. Brockhoff, T. Friedrich, N. Hebbinghaus, C. Klein, F. Neumann, E. Zitzler, On the effects of adding objectives to plateau functions. IEEE Trans. Evol. Comput. 13(3), 591–603 (2009)
14. D. Brockhoff, E. Zitzler, Are all objectives necessary? On dimensionality reduction in evolutionary multiobjective optimization, in Parallel Problem Solving from Nature (PPSN) (Springer, 2006), pp. 533–542
15. D. Brockhoff, E. Zitzler, Improving hypervolume-based multiobjective evolutionary algorithms by using objective reduction methods, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2007), pp. 2086–2093
16. D. Brockhoff, E. Zitzler, Objective reduction in evolutionary multiobjective optimization: theory and applications. Evol. Comput. 17(2), 135–166 (2009)
17. C. Carlsson, R. Fullér, Multiple criteria decision making: the case for interdependence. Comput. & Oper. Res. 22(3), 251–260 (1995)
18. A. Chipperfield, P. Fleming, Multiobjective gas turbine engine controller design using genetic algorithms. IEEE Trans. Ind. Electron. 43(5), 583–587 (1996)
19. C.A.C. Coello, G.B. Lamont, Applications of Multi-objective Evolutionary Algorithms, vol. 1 (World Scientific, 2004)
20. C.A.C. Coello, G.B. Lamont, D.A. Van Veldhuizen, Evolutionary Algorithms for Solving Multi-Objective Problems (Springer, 2007)
21. D. Corne, J.D. Knowles, Techniques for highly multiobjective optimisation: some nondominated points are better than others, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2007), pp. 773–780
22. I. Das, J.E. Dennis, A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Struct. Optim. 14(1), 63–69 (1997)
23. K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms (Wiley, Chichester, UK, 2001)
24. K. Deb, H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Trans. Evol. Comput. 18(4), 577–601 (2014)
25. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
26. K. Deb, D. Saxena, Searching for Pareto-optimal solutions through dimensionality reduction for certain large-dimensional multi-objective optimization problems, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2006), pp. 3353–3360
27. K. Deb, A. Srinivasan, Innovization: innovating design principles through optimization, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2006), pp. 1629–1636
28. K. Deb, L. Thiele, M. Laumanns, E. Zitzler, Scalable multi-objective optimization test problems, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2002), pp. 825–830
29. N. Drechsler, R. Drechsler, B. Becker, Multi-objective optimisation based on relation favour, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2001), pp. 154–166
30. M. Ehrgott, Multicriteria Optimization, 2nd edn. (Springer, 2005)
31. A. Engau, M.M. Wiecek, 2D decision-making for multicriteria design optimization. Struct. Multidiscip. Optim. 34, 301–315 (2007)
32. A. Engau, M.M. Wiecek, Interactive coordination of objective decompositions in multiobjective programming. Manag. Sci. 54(7), 1350–1363 (2008)
33. J.G. Falcón-Cardona, C.A.C. Coello, M.T.M. Emmerich, CRI-EMOA: a Pareto-front shape invariant evolutionary multi-objective algorithm, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2019), pp. 307–318
34. M. Farina, P. Amato, On the optimal solution definition for many-criteria optimization problems, in NAFIPS-FLINT International Conference’2002 (IEEE Press, 2002), pp. 233–238
35. M. Farina, P. Amato, A fuzzy definition of “optimality” for many-criteria optimization problems. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 34(3), 315–326 (2004)
36. B. Filipič, T. Tušar, A taxonomy of methods for visualizing Pareto front approximations, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2018), pp. 649–656
37. P.J. Fleming, R.C. Purshouse, R.J. Lygoe, Many-objective optimization: an engineering design perspective, in Evolutionary Multi-criterion Optimization (EMO) (2005), pp. 14–32
38. R. Flynn, P.D. Sherman, Multicriteria optimization of aircraft panels: determining viable genetic algorithm configurations. Int. J. Intell. Syst. 10(11), 987–999 (1995)
39. C.M. Fonseca, P.J. Fleming, Multiobjective optimization and multiple constraint handling with evolutionary algorithms (I): a unified formulation. IEEE Trans. Syst. Man Cybern. - Part A 28(1), 26–37 (1998)
40. C.M. Fonseca, P.J. Fleming, Multiobjective optimization and multiple constraint handling with evolutionary algorithms (II): application example. IEEE Trans. Syst. Man Cybern. - Part A 28(1), 38–44 (1998)
41. T. Gal, T. Hanne, Consequences of dropping nonessential objectives for the application of MCDM methods. Eur. J. Oper. Res. 119(2), 373–378 (1999)
42. T. Gal, H. Leberling, Redundant objective functions in linear vector maximum problems and their determination. Eur. J. Oper. Res. 1(3), 176–184 (1977)
43. I. Giagkiozis, P.J. Fleming, Methods for multi-objective optimization: an analysis. Inf. Sci. 293, 338–350 (2015)
44. I. Giagkiozis, R.C. Purshouse, P.J. Fleming, Towards understanding the cost of adaptation in decomposition-based optimization algorithms, in Systems, Man, and Cybernetics (IEEE Press, 2013), pp. 615–620
45. P. Goodwin, G. Wright, Decision Analysis for Management Judgment (Wiley, 2014)
46. S. Greco, M. Ehrgott, J.R. Figueira, Multiple Criteria Decision Analysis (Springer, 2016)
47. J. Hakanen, K. Miettinen, K. Matković, Task-based visual analytics for interactive multiobjective optimization. J. Oper. Res. Soc. 72(9), 2073–2090 (2021)
48. J. Handl, S.C. Lovell, J. Knowles, Multiobjectivization by decomposition of scalar cost functions, in Parallel Problem Solving from Nature (PPSN) (Springer, 2008), pp. 31–40
49. M. Herman, What is open space technology? https://openspaceworld.org/wp2/what-is/. Accessed 18 April 2022
50. M. Hinchliffe, M. Willis, M. Tham, Chemical process systems modelling using multiobjective genetic programming, in Proceedings of the Third Conference on Genetic Programming, ed. by K.J. et al. (Morgan Kaufmann, 1998), pp. 134–139
51. E.J. Hughes, Multiple single objective Pareto sampling, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2003), pp. 2678–2684
52. E.J. Hughes, Evolutionary many-objective optimisation: many once or one many?, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2005), pp. 222–227
53. K. Ikeda, H. Kita, S. Kobayashi, Failure of Pareto-based MOEAs: does non-dominated really mean near to optimal?, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2001), pp. 957–962
54. H. Ishibuchi, R. Imada, N. Masuyama, Y. Nojima, Comparison of hypervolume, IGD and IGD+ from the viewpoint of optimal distributions of solutions, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2019), pp. 332–345
55. H. Ishibuchi, N. Tsukamoto, Y. Nojima, Evolutionary many-objective optimization: a short review, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2008), pp. 2419–2426
56. A.L. Jaimes, C.A.C. Coello, D. Chakraborty, Objective reduction using a feature selection technique, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2008), pp. 673–680
57. H. Jain, K. Deb, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach. IEEE Trans. Evol. Comput. 18(4), 602–622 (2014)
58. O.P.H. Jones, J.E. Oakley, R.C. Purshouse, Component-level study of a decomposition-based multi-objective optimizer on a limited evaluation budget, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2018), pp. 689–696
59. V. Khare, X. Yao, K. Deb, Performance scaling of multi-objective evolutionary algorithms, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2003), pp. 376–390
60. J.D. Knowles, D. Corne, Quantifying the effects of objective space dimension in evolutionary multiobjective optimization, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2007), pp. 757–771
61. R. Kumar, P. Rockett, Decomposition of high dimensional pattern spaces for hierarchical classification. Kybernetika 34(4), 435–442 (1998)
62. M. Laumanns, E. Zitzler, L. Thiele, On the effects of archiving, elitism, and density based selection in evolutionary multi-objective optimization, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2001), pp. 181–196
63. M. López-Ibáñez, J.D. Knowles, Machine decision makers as a laboratory for interactive EMO, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2015), pp. 295–309
64. A.B. Malinowska, D.F. Torres, Computational approach to essential and nonessential objective functions in linear multicriteria optimization. J. Optim. Theory Appl. 139(3), 577–590 (2008)
65. K. Maneeratana, K. Boonlong, N. Chaiyaratana, Compressed-objective genetic algorithm, in Parallel Problem Solving from Nature (PPSN) (Springer, 2006), pp. 473–482
66. K. Miettinen, Nonlinear Multiobjective Optimization (Kluwer Academic Publishers, 1999); K. Miettinen, Nonlinear Multiobjective Optimization (Springer, 2012)
67. K. Miettinen, F. Ruiz, NAUTILUS framework: towards trade-off-free interaction in multiobjective optimization. J. Bus. Econ. 86(1–2), 5–21 (2016)
68. K. Miettinen, F. Ruiz, A. Wierzbicki, Introduction to multiobjective optimization: interactive approaches, in Multiobjective Optimization: Interactive and Evolutionary Approaches (Springer, 2008), pp. 27–57
69. R. Purshouse, On the evolutionary optimisation of many objectives. Ph.D. Thesis, University of Sheffield, UK (2004)
70. R.C. Purshouse, P.J. Fleming, An adaptive divide-and-conquer methodology for evolutionary multi-criterion optimisation, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2003), pp. 133–147
71. R.C. Purshouse, P.J. Fleming, Conflict, harmony, and independence: relationships in evolutionary multi-criterion optimisation, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2003), pp. 16–30
72. R.C. Purshouse, P.J. Fleming, Evolutionary many-objective optimisation: an exploratory analysis, in Congress on Evolutionary Computation (CEC), vol. 3 (IEEE Press, 2003), pp. 2066–2073
73. R.C. Purshouse, P.J. Fleming, On the evolutionary optimization of many conflicting objectives. IEEE Trans. Evol. Comput. 11(6), 770–784 (2007)
74. D.K. Saxena, K. Deb, Non-linear dimensionality reduction procedures for certain large-dimensional multi-objective optimization problems: employing correntropy and a novel maximum variance unfolding, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2007), pp. 772–787
75. D.K. Saxena, J.A. Duro, A. Tiwari, K. Deb, Q. Zhang, Objective reduction in many-objective optimization: linear and nonlinear algorithms. IEEE Trans. Evol. Comput. 17(1), 77–99 (2013)
76. K. Shang, H. Ishibuchi, W. Chen, L. Adam, Hypervolume optimal µ-distributions on line-based Pareto fronts in three dimensions, in Parallel Problem Solving from Nature (PPSN) (Springer, 2020), pp. 257–270
77. R.E. Steuer, Multiple Criteria Optimization: Theory, Computation and Application (Wiley, 1986)
78. A. Sülflow, N. Drechsler, R. Drechsler, Robust multi-objective optimization in high dimensional spaces, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2007), pp. 715–726
79. R.H. Takahashi, On the convergence of decomposition algorithms in many-objective problems, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2019), pp. 39–50
80. C. Thieke, K.-H. Küfer, M. Monz, A. Scherrer, F. Alonso, U. Oelfke, P.E. Huber, J. Debus, T. Bortfeld, A new concept for interactive radiotherapy planning with multicriteria optimization: first clinical evaluation. Radiother. Oncol. 85(2), 292–298 (2007)
81. N.V. Thoai, Criteria and dimension reduction of linear multiple criteria optimization problems. J. Global Optim. 52(3), 499–508 (2012)
82. T. Tušar, Visualizing solution sets in multiobjective optimization. Ph.D. Thesis, Jožef Stefan International Postgraduate School, Slovenia (2014)
83. T. Tušar, B. Filipič, Visualization of Pareto front approximations in evolutionary multiobjective optimization: a critical review and the prosection method. IEEE Trans. Evol. Comput. 19(2), 225–245 (2014)
84. T. Wagner, N. Beume, B. Naujoks, Pareto-, aggregation-, and indicator-based methods in many-objective optimization, in Evolutionary Multi-criterion Optimization (EMO) (2007), pp. 742–756
85. M.M. Wiecek, Advances in cone-based preference modeling for decision making with multiple criteria. Decis. Making Manuf. Serv. 1(2), 153–173 (2007)
86. Y. Yu, Multiobjective decision theory for computational optimization in radiation therapy. Med. Phys. 24(9), 1445–1454 (1997)
87. Q. Zhang, H. Li, MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007)
88. E. Zitzler, S. Künzli, Indicator-based selection in multiobjective search, in Parallel Problem Solving from Nature (PPSN) (Springer, 2004), pp. 832–842
89. C. Zopounidis, P.M. Pardalos, Handbook of Multicriteria Analysis (Springer, 2010)
90. X. Zou, Y. Chen, M. Liu, L. Kang, A new evolutionary algorithm for solving many-objective optimization problems. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(5), 1402–1412 (2008)
Chapter 2
Key Issues in Real-World Applications of Many-Objective Optimisation and Decision Analysis
Kalyanmoy Deb, Peter Fleming, Yaochu Jin, Kaisa Miettinen, and Patrick M. Reed
Abstract The insights and benefits to be realised through the optimisation of multiple independent but conflicting objectives are well recognised by practitioners seeking effective and robust solutions to real-world application problems. This chapter discusses key issues encountered by users of many-objective optimisation (>3 objectives) in a real-world environment. These include how to formulate the problem and develop a suitable decision-making framework, together with the different ways in which decision-makers may be involved. Ways to reduce the computational load, and to reduce the sensitivity of candidate solutions to the inevitable uncertainties that arise in real-world applications, are addressed. Other state-of-the-art topics, such as the use of machine learning and the management of complex issues arising from multidisciplinary applications, are also examined. It is recognised that optimisation in real-world applications is commonly undertaken by users and decision-makers who need not have specialist expertise in many-objective optimisation and decision analysis methods. Advice is offered to experts and non-experts alike.
K. Deb Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
P. Fleming (corresponding author) Department of Automatic Control and Systems Engineering, The University of Sheffield, Mappin Street, Sheffield, S1 3JD, UK
Y. Jin Department of Computer Science, University of Surrey, Guildford, Surrey GU2 7XH, UK; Faculty of Technology, Bielefeld University, 33619 Bielefeld, Germany
K. Miettinen University of Jyvaskyla, Faculty of Information Technology, P.O. Box 35 (Agora), FI-40014 University of Jyvaskyla, Finland
P. M. Reed Department of Civil and Environmental Engineering, Cornell University, Ithaca, NY, USA
© Springer Nature Switzerland AG 2023 D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_2
2.1 Introduction
There is a close coupling between many-objective optimisation and decision analysis. Many-objective optimisation rarely results in a single solution; instead, we seek to identify a satisfactory compromise from a family of estimated Pareto-optimal solutions, using tools and methods that facilitate preference-informed decision-making. Because in most real-world applications (RWAs) the problem formulation is not known perfectly in advance, many-objective optimisation should be seen as one step in a broader deliberative and iterative learning process that discovers the trade-offs across new or novel decision alternatives. Consequently, it is important to strive to avoid cognitive bias throughout the entire process: problem formulation, the decision-making framework, the elicitation of preferences, and the tools and techniques used to perform the underlying search. While there are many decision-making approaches for solving real-world problems with several objectives, in this chapter we favour evolutionary optimisation methods and, in particular, many-objective evolutionary algorithms (MaOEAs), thus placing emphasis on problems with many (>3) objectives, since these are a common feature of RWAs.1 We discuss seven key themes for the successful resolution of real-world many-objective optimisation problems. We take the view that the participants in an RWA design optimisation need not be experts in many-objective optimisation; the chapter is therefore targeted at experts and non-experts alike. The chapter’s seven themes are:
• Problem Formulation (Sect. 2.2);
• Developing a Decision-Making Framework (Sect. 2.3);
• Algorithm Selection (Sect. 2.4);
• Interactive Methods, Preference Articulation and Use of Surrogates (Sect. 2.5);
• Uncertainty Handling (Sect. 2.6);
• Machine Learning Techniques (Sect. 2.7);
• More Advanced Topics (Sect. 2.8).
At the start of any real-world application, it is imperative to formulate the problem painstakingly. Section 2.2 describes the steps in this process, emphasising the need for full consultation and feedback and the importance of safeguarding against cognitive bias. Section 2.3 describes the process of developing decision-making frameworks, in which unintentional partiality continues to be guarded against through careful testing of candidate frameworks and elicitation of stakeholder feedback. Following some guidance on the selection of a suitable MaOEA (Sect. 2.4), the role of preference articulation and decision-maker (DM) interaction is studied in Sect. 2.5, where examples connect with problem formulation and framework development issues. These examples also illustrate the value of surrogate models in reducing computational effort. As already noted, RWAs are rarely, if ever, known perfectly: models may be imprecise, objectives may lack clarity and constraints may be expressed inexactly. Section 2.6 addresses the important area of uncertainty handling and provides insights into how it can be managed. Section 2.7 considers machine learning (ML), a valuable technique that can be deployed at various points in the design cycle, and identifies four example areas where it can be useful: problem formulation and decomposition in large-scale problems, learning details of problem structure for model-based solution generation, combating data paucity in surrogate-assisted optimisation, and managing knowledge transfer across optimisation problems. For the last theme, three advanced topics are considered in Sect. 2.8: design optimisation across disciplines, RWAs that change during the course of an optimisation, and problems in which the nature or number of design variables varies from one solution to another.
1 It should be noted that the term many-objective optimisation is used by the evolutionary community. In the multiple criteria decision-making field, the distinction is typically made between bi-objective problems (with two objectives) and the rest, since the performance of the methods is not as dependent on the number of objectives as in the evolutionary approaches. Therefore, when discussing non-evolutionary methods in this chapter, the term multi-objective optimisation is used, and it does not indicate that the number of objective functions is limited to 3.
2.2 Problem Formulation
By contemplating many objectives explicitly and simultaneously, decision-makers have the opportunity to reduce the risk of cognitive bias, in which decision-relevant insights (e.g. key trade-offs or new families of decision alternatives) are overlooked. Cognitive myopia may arise from focusing on a narrow set of alternatives or, perhaps, from replacing the full set of objectives with a weighted aggregation of a limited number of them. Our experience is drawn from a wide range of application domains, such as engineering design, environmental management and drug discovery, e.g. [36, 39, 43, 120, 129]. Faced with a specific problem, the first step is to enlist stakeholders to describe the challenge. Stakeholder analysis [98] identifies who is involved in decision support, categorises stakeholders according to their influence and reveals relationships between them. Many stakeholders will be associated with only one aspect of the problem and will naturally have specific objectives that represent their perspective. A few stakeholders will have broader views, and it is important to engage at least one of them to champion the overarching objectives. Each stakeholder’s viewpoint is obtained, including his or her assumptions, goals and preferences. It is normal for an individual stakeholder to have a limited view of the problem, often with a partiality towards the importance of their own expertise and its impact on an acceptable solution. This may lead to a bias that can influence both the formulation of the problem and the ensuing decision-making process. Having assembled a group of representative stakeholders, it is therefore vital to seek a negotiated problem formulation. One recent example involved the design optimisation of a powertrain for use in two quite different automobile applications. The two vehicles were intended to represent the extremes of a single powertrain application, for example, a light sedan and a much larger, heavier off-road-capable vehicle. A powertrain comprises the main components that generate power and deliver it. Three groups contributed to this study: Engine, Transmission and Aftertreatment (for reducing harmful emissions). Through requirements elicitation, the problem was framed and formulated, and goals, objectives, decision variables, constraints and uncertainties were identified. (In the context of problem formulation, ‘goals’ refer to significant factors to be taken into account by DMs, such as the evaluation of performance in given scenarios.) Connectivity diagrams that illustrate the dependencies and interactions between different parts of the application can often be revealing and aid an individual stakeholder’s grasp of the overall problem. These diagrams can also be an effective tool for identifying how to evaluate a candidate solution against the overall requirements. Trial runs of simplified versions of the problem assisted the powertrain design group’s understanding and led to a holistic view of the application. The simplified versions were constructed using methods such as surrogate models generated from historical data, estimates of the impact of decision variables on objectives, and look-up tables, and were designed to ‘feel’ sufficiently real for participants.
An iterative process of stakeholder feedback then followed until a problem description and formulation were agreed. In this case, one stakeholder, the Chief Engineer, provided overall guidance to the team in pursuit of the overarching objectives.2 In more complex problems, such as certain environmental systems, there may be a set of lead stakeholders, each championing their own sector, for instance, reliable performance, efficiency or environmental impact. Before moving on to develop the many-objective optimisation problem (MaOP) framework, it is prudent to identify the models that might be used for problem simulation. These are likely to take a variety of forms, such as continuous and discrete simulations, and some may prove so compute-intensive that, given computational or time budget limitations, they must be replaced by surrogate models. Interoperability and co-simulation are often important considerations. Moallemi et al. [91] stress the importance of stakeholder feedback at every step of problem formulation, often requiring iterations of the process.
2 This exercise was undertaken with financial support from EPSRC, United Kingdom, and Jaguar Land Rover as part of the jointly funded Programme for Simulation Innovation (PSi) (EP/L025760/1). A publication on this particular application exercise is currently being prepared.
2.3 Developing a Decision-Making Framework
Following problem formulation, the next phase is to develop the framework that enables decision-making, that is, the decision-support processes to be adopted. Again, an iterative process based on stakeholder feedback is recommended. To take full advantage of the many-objective optimisation approach, there should be few, if any, a priori decisions. The default mode will be a posteriori decision support although, as will be discussed, DM interaction during the optimisation can be used to streamline progress. In real-world applications, stakeholders may want the multi-objective optimisation problem (MOP) framework to emulate or approximate an existing design approach, as this eases the design team’s transition to the new approach. Note that when a solution must conform to an engineering product family, that requirement should be recognised as an a priori decision that constrains the optimisation search space [130]. In a detailed review of methods for developing decision-support frameworks, Moallemi et al. [91] highlight the tensions that can arise from competing framework proposals. They stress the importance of stakeholder participation and engagement and point out that models can be used to test rival frameworks and help resolve contested schemes. Initial steps in developing a framework include systematic sampling of the decision space, for example by Latin Hypercube sampling, to gain an appreciation of the characteristics of the problem. Again, as in problem formulation, initial trials using a simplified version of the MOP can help stakeholders and framework developers alike. For example, it may be possible to gain insight into relationships between objectives, such as degrees of conflict, harmony or independence [97], along with pairwise correlation data.
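As an illustration of this sampling step, the sketch below draws a Latin Hypercube sample of the decision space and computes pairwise Pearson correlations between objectives, reading a strong negative correlation as a sign of conflict and a strong positive one as harmony. The two-variable problem `toy_objectives` and the ±0.5 interpretation thresholds are hypothetical assumptions, not taken from the studies cited above, and linear correlation is only a rough proxy for the relationships discussed in [97].

```python
import random
from itertools import combinations


def latin_hypercube(n_samples, bounds, seed=0):
    """Latin Hypercube sample: each variable's range is split into n_samples
    equal strata, and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One random point inside each stratum, then shuffle the strata order.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-variable columns into one point per sample.
    return [list(point) for point in zip(*columns)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def toy_objectives(x):
    """Hypothetical three-objective problem (all minimised)."""
    f1 = x[0]                      # e.g. a cost-like objective
    f2 = 1.0 - x[0] + 0.1 * x[1]   # conflicts with f1
    f3 = 2.0 * x[0] + x[1]         # largely in harmony with f1
    return [f1, f2, f3]


def objective_correlations(samples, evaluate):
    """Pairwise correlations between objectives over the sampled points."""
    values = [evaluate(x) for x in samples]
    n_obj = len(values[0])
    cols = [[v[i] for v in values] for i in range(n_obj)]
    return {(i, j): pearson(cols[i], cols[j])
            for i, j in combinations(range(n_obj), 2)}


if __name__ == "__main__":
    X = latin_hypercube(50, [(0.0, 1.0), (0.0, 1.0)])
    for (i, j), r in objective_correlations(X, toy_objectives).items():
        label = "conflict" if r < -0.5 else "harmony" if r > 0.5 else "weak/independent"
        print(f"f{i + 1} vs f{j + 1}: r = {r:+.2f} ({label})")
```

Latin Hypercube sampling is used rather than plain random sampling because it guarantees that every stratum of every variable is covered, which helps when only a few expensive evaluations can be afforded.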
These correlation estimates may subsequently help reduce the dimensionality of the decision variables and objectives. Insights may also be gained that later help promote early decisions on parts of the problem, such as fixing certain decision variables. The visualisation and information requirements of the DMs can be established through consultation. Certain problem constraints, such as legislative constraints, are inviolable; in other cases, for instance financial constraints, it may be prudent not to pose a priori constraints but rather to express them as objectives, so that the DM(s) can determine their threshold values (effectively ‘soft’ constraints) as the optimisation progresses. Such preference articulation [37, 38] introduces the notion of DM interaction with the optimisation process and is discussed in Sect. 2.5. Thus, during development of the framework, there is the flexibility to interpret constraints identified during problem formulation as soft constraints, provided that the framework permits DM interaction. This interaction may occur at pre-determined design ‘freeze’ points or whenever it benefits the progress of the optimisation. It is not uncommon for design team leaders to prefer a ‘design funnel’ approach, starting from a wide search space and gradually narrowing it by introducing progressively more focused preferences or goals.
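A minimal sketch of the ‘soft constraint’ idea, under stated assumptions: a negotiable constraint g(x) ≤ 0 is turned into an extra objective measuring its violation, max(0, g(x)), so that its threshold can be set by the DM later, while an inviolable constraint remains a hard feasibility check. The functions `cost`, `budget_constraint` and `legal_limit` are hypothetical and only show the mechanics.

```python
def violation(g_value):
    """Amount by which a constraint of the form g(x) <= 0 is violated (0 if satisfied)."""
    return max(0.0, g_value)


# Hypothetical problem functions, for illustration only.
def cost(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2


def budget_constraint(x):
    # Negotiable (e.g. financial) constraint x0 + x1 <= 2, written as g(x) <= 0.
    return x[0] + x[1] - 2.0


def legal_limit(x):
    # Inviolable (e.g. legislative) constraint x0 <= 3, written as g(x) <= 0.
    return x[0] - 3.0


def evaluate(x):
    """Return the objective vector, or None if the hard constraint is violated."""
    if legal_limit(x) > 0:
        return None  # hard constraint: the candidate is simply infeasible
    # The soft constraint becomes an additional objective to be minimised.
    return [cost(x), violation(budget_constraint(x))]


if __name__ == "__main__":
    for x in ([1.0, 1.0], [2.0, 1.0], [3.5, 0.0]):
        print(x, evaluate(x))
```

Once the trade-off between cost and budget violation is visible, the DM can impose a threshold on the second objective, which then acts as the negotiated constraint.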
It is normal in real-world applications to be concerned about the sensitivity of any selected solution to uncertainties. Sources of endogenous uncertainty include model fidelity, the environment, decision variable tolerances and the fact that evaluations of objectives may be qualitative, black-box or statistical. Exogenous uncertainties are those in the modelled system that are not under the control of the stakeholders, such as unforeseen future developments. DMs seek solutions that are robust to uncertainties and are therefore willing to accept solutions that are merely close to the Pareto-optimal surface. Constructs for evaluating uncertainty may be robustness metrics (e.g. performance subject to a percentage confidence requirement) or solution performance in specific situations (e.g. worst-case scenarios or sufficiently many scenarios), and provision should be made for them in the decision-making framework in accordance with stakeholder requirements. Alternatively, a DM can be involved in robustness considerations interactively, so that preferences are iteratively expressed not only on objective function values but also on aspects of robustness. Examples of such interactive approaches are proposed in [140] for uncertainty in objective functions and in [141] for uncertainty in decision variables. Moallemi et al. [91] expand on much of this in a recent review; Ide and Schöbel [63] provide another recent review that includes an interesting analysis of different robustness concepts. The final decision-making step is to select a satisfactory compromise solution, taking competing objectives into account. Examples of decision evaluation methods are those in which:
1. DMs progressively tighten their preference thresholds (goals) for the different objectives until a single solution remains,
2. DMs select a small number of representative solutions (e.g. cost-biased or performance-biased) and use pairwise comparisons to present arguments that promote convergence to an agreed solution, or
3. DMs test a small number of candidate solutions on pre-selected design scenarios to facilitate selection of the most acceptable solution.
Woodruff et al. [130] draw attention to the fact that comparing different MOP formulations and decision-making frameworks can reveal unintended consequences of any specific formulation or framework. It can therefore be useful to apply these decision evaluation methods to solutions that emerge from competing MOP formulations and frameworks in order to overcome inadvertent cognitive myopia. Advances continue to be made in methodologies for developing MOP frameworks suitable for a range of real-world applications. For example, complex real-world problems sometimes involve a distributed set of design teams focused on different disciplines or different sub-systems that interact to address the overall problem. The teams may also operate on different timescales. Such multi-node networks operating asynchronously present an ongoing challenge to researchers. Klamroth et al. [73] have reviewed recent developments in this field and, while some progress has been made, more research is required.
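To make two of the constructs discussed above concrete, the sketch below scores each candidate by its worst-case objective values over a set of scenarios (one possible robustness metric) and then applies the first decision evaluation method: goals are tightened step by step until a single candidate remains. The candidate names, scenario values, starting goals and tightening factor are hypothetical, and the fixed round-robin order of tightening is an assumption; in practice the DM, not a rule, would decide which goal to tighten next.

```python
# Hypothetical scenario results: for each candidate, one row per scenario,
# each row holding [cost, emissions] (both minimised).
SCENARIOS = {
    "A": [[11.0, 5.0], [12.0, 5.5], [11.5, 6.0]],
    "B": [[9.0, 7.0], [10.0, 8.0], [9.5, 7.5]],
    "C": [[13.0, 3.5], [14.0, 4.0], [13.5, 3.8]],
}


def worst_case(scenario_values):
    """Robustness score: the worst (largest) value of each objective over all scenarios."""
    return [max(column) for column in zip(*scenario_values)]


def meets_goals(objectives, goals):
    return all(f <= g for f, g in zip(objectives, goals))


def tighten_until_one(candidates, goals, factor=0.95, max_rounds=200):
    """Tighten one goal at a time (multiplying it by `factor`, so goals are assumed
    positive), accepting a tighter goal only if some candidate still meets all goals.
    Stops when one candidate remains or no goal can be tightened any further."""
    goals = list(goals)
    survivors = {n: f for n, f in candidates.items() if meets_goals(f, goals)}
    for _ in range(max_rounds):
        if len(survivors) <= 1:
            break
        progressed = False
        for i in range(len(goals)):
            trial = goals[:i] + [goals[i] * factor] + goals[i + 1:]
            remaining = {n: f for n, f in survivors.items() if meets_goals(f, trial)}
            if remaining:
                goals, survivors = trial, remaining
                progressed = True
                if len(survivors) == 1:
                    break
        if not progressed:
            break
    return survivors, goals


if __name__ == "__main__":
    robust = {name: worst_case(rows) for name, rows in SCENARIOS.items()}
    survivors, final_goals = tighten_until_one(robust, [15.0, 9.0])
    print("worst-case scores:", robust)
    print("remaining:", survivors, "final goals:", final_goals)
```

The worst case is the most conservative choice; replacing `max` with a percentile over scenarios would correspond more closely to the ‘percentage confidence’ metrics mentioned above.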
2.4 Algorithm Selection
A specific part of developing a decision-making framework is the selection of a suitable MaOEA. In real-world applications, optimisation objectives are often evaluated by simulation or experiment, which can be carried out efficiently by using optimisation algorithms embedded in a comprehensive design optimisation platform. Companies and organisations frequently use such an approach. Commercial examples include HEEDS (https://www.redcedartech.com), modeFRONTIER (https://www.esteco.com/modefrontier) and Isight (https://www.3ds.com/products-services/simulia/products/isight-simulia-execution-engine). However, while platforms such as these offer MOEAs (multi-objective evolutionary algorithms), they do not appear to offer algorithms that specifically address many-objective optimisation, although some platforms allow user software to be incorporated. Open-source design environments such as DESDEO [92] (https://desdeo.it.jyu.fi) and Liger [31] (https://github.com/ligerdev/liger) continue to be developed and are able to include many-objective optimisation; jMetal (https://github.com/jMetal), pymoo (https://pymoo.org/) and PlatEMO (https://github.com/BIMK/PlatEMO) are useful MOEA and MaOEA platforms. It is now well accepted that traditional MOEAs (those designed for 2 or 3 objectives) have convergence difficulties when confronted with a many-objective problem. As the number of objectives increases, a growing proportion of the search population becomes non-dominated, so it becomes increasingly difficult to discriminate between solutions to guide the search, and selective pressure diminishes. Before determining which MaOEA to select, let us assume that, during problem formulation and the development of a decision-making framework, the number of objectives has already been reduced through means such as the removal of redundant objectives or, perhaps, the aggregation of certain objectives (possibly due to close correlation) [6].
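The loss of selective pressure can be seen in a small experiment: among objective vectors drawn uniformly at random from the unit hypercube, the fraction that no other vector dominates grows quickly with the number of objectives. The sketch assumes minimisation and uses uniform random points as a stand-in for a real search population.

```python
import random


def dominates(a, b):
    """Pareto dominance for minimisation: a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_fraction(n_points, n_objectives, seed=0):
    """Fraction of a random population that no other member dominates."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_objectives)] for _ in range(n_points)]
    nondominated = [p for p in population
                    if not any(dominates(q, p) for q in population)]
    return len(nondominated) / n_points


if __name__ == "__main__":
    for m in (2, 3, 5, 10):
        print(f"{m:2d} objectives: {nondominated_fraction(100, m):.0%} non-dominated")
```

When most of the population is mutually non-dominated, Pareto ranking alone assigns nearly everyone the same rank, which is why the three algorithm families described next add a further selection criterion.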
The first MaOEA selection decision, then, should be to determine:
– Category (i): whether revealing the full Pareto front (PF) is essential, or
– Category (ii): whether the DM is comfortable focusing on a specific region of interest (a subset of the Pareto front).
Category (i) suits an RWA that requires disclosure of the whole PF, for example, in order to appreciate the full range of compromises involved in selecting a suitable solution. Category (ii) involves preference elicitation and requires the DM to be a domain expert, able and willing to express domain preferences to guide the solution search. While the development of MaOEAs continues apace, three families of algorithms have emerged as the main choices for searching over the whole PF: indicator-based methods, relaxed dominance-based approaches, and methods that decompose the problem into a set of scalar optimisation problems. Typical algorithms are described in Chap. 1. In an indicator-based method, the indicator measures the quality of a member of the search population. In one version, a traditional MOEA, such as a variant of NSGA-II [24], ranks the population with non-dominated sorting and applies the indicator, rather than crowding distance, for second-level ranking. Shortcomings include the validity of the indicator choice and, in some cases, the computational cost of evaluating the indicator, for example, when the indicator is the hypervolume. The modified dominance approach uses a dominance concept that is more selective than Pareto dominance, thereby addressing the loss of selective pressure of the original, e.g. [74]. One of its advantages is that a suitably modified traditional MOEA can be adopted for the search. Borg (http://borgmoea.org/) is an algorithm that uses a version of NSGA-II reworked to contain modified dominance; it also includes features to support the user in an RWA, such as adaptive population sizing and auto-adaptive multi-operator recombination [52]. Decomposition of the problem into a set of scalar optimisation problems is realised by selecting a user-defined number of reference vectors to direct the search towards the PF. In MOEA/D [136], these vectors are uniformly spaced and each sub-problem is typically a weighted Chebyshev problem, which, unlike a weighted sum, can reach solutions on non-convex parts of the PF. Local information from neighbouring sub-problems is used to improve results. Variants of MOEA/D offer alternative schemes for choosing the direction vectors. In NSGA-III [64], the search is guided by a set of reference solutions where, often, each direction is associated with a member of the search population. NSGA-III is an improved version of NSGA-II in which the second level of ranking is based on reference points that are equally distributed on a hyperplane. Both methods have been shown to perform well, and the user can be involved in determining the direction vectors.
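Why decomposition methods favour the weighted Chebyshev scalarisation can be illustrated on a small, hypothetical two-objective front with a non-convex ‘dent’ at (0.45, 0.45). With uniformly spaced weight vectors, the weighted sum never selects the dented point, whereas the Chebyshev scalarisation, max_i w_i |f_i − z*_i|, does. This is a standalone illustration of the two scalarising functions only, not an implementation of MOEA/D (there are no neighbourhoods, variation operators or population updates), and the ideal point z* is assumed to be known.

```python
def uniform_weights(n):
    """n uniformly spaced weight vectors (w1, w2) with w1 + w2 = 1 (two objectives)."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


def weighted_sum(f, w):
    return sum(wi * fi for wi, fi in zip(w, f))


def weighted_chebyshev(f, w, ideal):
    """Weighted Chebyshev (Tchebycheff) scalarisation: max_i w_i * |f_i - z*_i|."""
    return max(wi * abs(fi - zi) for wi, fi, zi in zip(w, f, ideal))


# Hypothetical non-dominated set (minimisation) with a non-convex dent at (0.45, 0.45).
FRONT = [(0.0, 1.0), (0.15, 0.5), (0.45, 0.45), (0.5, 0.15), (1.0, 0.0)]
IDEAL = (0.0, 0.0)


def solutions_found(scalarise, weights):
    """Set of front members that minimise the scalarised value for some weight vector."""
    return {min(FRONT, key=lambda f: scalarise(f, w)) for w in weights}


if __name__ == "__main__":
    weights = uniform_weights(11)
    ws = solutions_found(weighted_sum, weights)
    tch = solutions_found(lambda f, w: weighted_chebyshev(f, w, IDEAL), weights)
    print("weighted sum finds:", sorted(ws))
    print("Chebyshev finds:   ", sorted(tch))
```

The dented point lies above the straight line joining its neighbours, so no positive weighting can make it the weighted-sum minimiser; the Chebyshev function selects it for the balanced weight (0.5, 0.5).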
A reference vector-guided evolutionary algorithm (RVEA) [11], which has become popular for solving many-objective optimisation problems, uses the angle instead of the Euclidean distance to control the diversity of solutions. These algorithms form part of the decision-making process, for which there are three approaches to preference articulation: a posteriori, interactive and a priori [62, 83]. Category (i) methods rely on a posteriori preference articulation to isolate an acceptable solution. Category (ii) methods may be based on interactive or a priori preference articulation. In a priori decision-making, a DM expresses preferences before the optimisation, essentially defining a direction vector for the search; however, a priori preferences can seldom be conveyed adequately. Expressing preferences interactively affords greater flexibility. If the DM is able to select an achievable goal vector for the objectives, a preferability operator [37, 38] may be combined with the Pareto ranking process, focusing the search on a region of interest on the PF and thereby reducing computational load. This resembles the modified dominance approach, with the advantage that the DM has direct control of the search process. Additionally, as more is learnt during the optimisation about the nature of the PF, the DM can be given the opportunity to pause the search and refine the goal vector, further narrowing the region of interest. To reduce the cognitive load on the DM, the stages at which preferences are further clarified could be scheduled. This is one of many methods for expressing DM preferences [17]; it has the advantages of being straightforward to implement and readily understood by the DM. Many interactive schemes have been devised, such as trade-off methods, reference point approaches and classification-based methods; [85, 90, 96] provide overviews. Example studies of preference articulation and DM interaction follow in the next section. Furthermore, a new paradigm for interactive evolutionary optimisation without a limitation on the number of objectives is proposed in [103]. When no particular user preferences are available, knee solutions on the Pareto front may be of interest to DMs. This is particularly true when a DM has little a priori knowledge about the problem and when the number of objectives is large. A number of evolutionary algorithms for identifying knee points have therefore been proposed, mainly for two- or three-objective optimisation problems [126]. Most recently, an evolutionary algorithm for identifying knee solutions of many-objective optimisation problems has been proposed based on two localised modified dominance relationships; it was validated on a set of benchmark problems and a seven-objective hybrid electric vehicle design problem [134]. This, of course, is by no means an exhaustive summary of the MaOEA options suitable for RWAs; recent reviews of MaOEAs include [75, 122]. One glaring shortcoming is the absence of an adequate means of comparing approaches with an RWA context in mind; an overview of indicators is given in Chap. 5. Recently, an interesting preliminary attempt has been made in [119], where a method for automatically selecting a suitable MOEA for an RWA is presented. The main idea is to train a support vector machine that learns the relationship between the efficiency of representative MOEAs and widely used benchmark problems, and then to use it to recommend an MOEA based on the similarity between the RWA and the benchmark problems.
However, benchmark problems do not necessarily reflect the properties of real applications, although some steps towards developing test problems with more tractable properties have been made, e.g. in [35], and the different means of decision-making are themselves notoriously difficult to assess. Nonetheless, the first step is clear: to decide whether to follow an a posteriori approach or an interactive one, i.e. whether to search over the whole PF or to focus on a predetermined region or on some iteratively selected regions of interest. Beyond that, several features are important. The approach should be efficient and effective in attaining high-quality approximations of the Pareto set. Search outcomes should be reliable irrespective of the random seed, and methods should be scalable, including the potential for parallel deployment to exploit available computing resources.
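For the knee solutions mentioned above, one simple and widely used heuristic, shown here purely for illustration, picks the point of a bi-objective front that lies farthest from the straight line joining the two extreme solutions, on the side of the ideal point. It is not the method of [134], which uses localised modified dominance and targets many objectives; the front below is hypothetical, and objectives are assumed to be normalised, minimised and mutually non-dominated.

```python
def knee_by_max_distance(front):
    """Return the member of a bi-objective front (minimisation) that lies farthest
    from the line through the two extreme solutions, on the side of the ideal point;
    return None if no point bulges towards the ideal point."""
    a = min(front, key=lambda p: p[0])  # extreme solution, best in f1
    b = min(front, key=lambda p: p[1])  # extreme solution, best in f2
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = (dx ** 2 + dy ** 2) ** 0.5  # assumes a and b are distinct

    def signed_distance(p):
        # Positive for points on the ideal-point side of the line through a and b.
        return (dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / norm

    best = max(front, key=signed_distance)
    return best if signed_distance(best) > 0 else None


# Hypothetical normalised front with a pronounced bend near (0.2, 0.25).
FRONT = [(0.0, 1.0), (0.1, 0.6), (0.2, 0.25), (0.5, 0.15), (1.0, 0.0)]

if __name__ == "__main__":
    print(knee_by_max_distance(FRONT))
```

On a linear front every point lies on the extreme line, so the sketch returns `None`: there is no knee to recommend.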
2.5 Interactive Methods, Preference Articulation and the Use of Surrogates
Even though multi-objective optimisation problems typically have many Pareto-optimal solutions, when solving real-world problems one eventually needs to identify a single solution to be implemented. Since vectors cannot be ordered completely, some additional information is needed, and the choice is usually based on the preferences of a domain expert, a DM. Interactive methods (see, e.g. [83, 85, 92] and references therein) have proven very useful in solving real-world multi-objective optimisation problems, since they only generate solutions that are of interest to the DM. As a result, less computation is needed and the cognitive load placed on the DM is more reasonable. The most important benefit is the opportunity for the DM to learn about the interdependencies and trade-offs among the objectives, as well as about the feasibility of his/her preferences. Thanks to this learning, the DM can also adapt and change preferences and, importantly, gain confidence in the quality of the final solution. As we have seen in Sect. 2.2, problem formulation is far from trivial in many real-world cases, and several iterations may be needed before the essential characteristics have been captured well enough to make a good decision. The situation is easier if the starting point is, for example, a weighted sum formulation, in which the objective functions have already been identified as functions of the variables but have been merged into a scalar objective instead of being considered individually. The weighted sum method is sometimes wrongly used as a synonym for multi-objective optimisation. This is unfortunate, since the weighted sum has many shortcomings, e.g. it cannot find unsupported Pareto-optimal solutions, which means that for nonconvex problems some Pareto-optimal solutions may remain unfound no matter how the weights are altered [111]. A problem formulated as a weighted sum was the starting point of the case considered in [55], where it was shown that advanced multi-objective optimisation methods, such as interactive methods, can be genuinely beneficial in solving shape design problems. The problem in [55] concerned the optimal shape design of a headbox in a paper machine, which significantly influences the quality of the produced paper.
Three objectives, depending on the fluid dynamics in the headbox, were formulated. Besides weighting methods, a genetic algorithm had also been applied before an interactive method was tried. In [55], the interactive method applied was NIMBUS (the version of [86]), which is a classification-based method. This means that the DM sees the current Pareto-optimal objective vector and provides preference information indicating how its components should be changed to obtain a more preferred objective vector. The DM can use up to five classes to indicate desired changes:
1. functions that should improve as much as possible from the current value,
2. functions that should improve up to a given value but not more,
3. functions whose current value is acceptable,
4. functions whose values may impair until a given bound, or
5. functions that can change freely.
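The five classes can be captured in a small data structure. The sketch below (all names hypothetical, minimisation assumed) checks consistency conditions that follow from the class definitions: an aspiration level must be better than the current value, a bound must be worse, and, because the current objective vector is Pareto optimal, at least one objective must be allowed to improve and at least one to impair. It only validates the DM’s input; how NIMBUS turns a classification into scalarised sub-problems is not reproduced here.

```python
from enum import Enum


class NimbusClass(Enum):
    IMPROVE = 1      # improve as much as possible
    IMPROVE_TO = 2   # improve up to a given aspiration level
    ACCEPT = 3       # current value is acceptable
    IMPAIR_TO = 4    # may impair until a given bound
    FREE = 5         # may change freely


def check_classification(current, classes, levels):
    """Return a list of problems with a classification (empty if consistent).

    current: current Pareto-optimal objective vector (minimisation)
    classes: one NimbusClass per objective
    levels:  aspiration level (IMPROVE_TO) or bound (IMPAIR_TO) per objective, else None
    """
    problems = []
    for i, (f, c, lvl) in enumerate(zip(current, classes, levels)):
        if c is NimbusClass.IMPROVE_TO and not (lvl is not None and lvl < f):
            problems.append(f"f{i + 1}: aspiration level must be below the current value {f}")
        if c is NimbusClass.IMPAIR_TO and not (lvl is not None and lvl > f):
            problems.append(f"f{i + 1}: bound must be above the current value {f}")
    # A Pareto-optimal vector cannot improve in one objective without impairing another.
    if not any(c in (NimbusClass.IMPROVE, NimbusClass.IMPROVE_TO) for c in classes):
        problems.append("at least one objective must be allowed to improve")
    if not any(c in (NimbusClass.IMPAIR_TO, NimbusClass.FREE) for c in classes):
        problems.append("at least one objective must be allowed to impair")
    return problems


if __name__ == "__main__":
    current = [3.0, 5.0, 2.0]
    print(check_classification(current,
                               [NimbusClass.IMPROVE_TO, NimbusClass.ACCEPT, NimbusClass.IMPAIR_TO],
                               [2.5, None, 2.8]))
```

Checks of this kind are where an interactive method can surface modelling problems early: a DM who cannot sensibly choose levels for an objective may be looking at a mis-formulated function, as in the case described next.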
In this case, it was difficult for the DM to provide preferences, although we normally assume that DMs have domain expertise and understand the significance of each objective function value. Further consideration showed that one of the objective functions had been formulated incorrectly, which is why its values were not meaningful to the DM. This indicates that the interactive method was useful in model verification, revealing a need to revise the problem model. It is alarming that two methods had been applied before NIMBUS without the shortcoming in the objective function formulation being noticed. As in the previous case, in the problem of wastewater treatment plant design and operation [53, 54], the problem owners originally thought that all objectives must be converted into a single objective function before optimisation is possible. The single objective to be minimised was the total cost. However, such an approach hides the interdependencies among the different objective functions and makes it impossible to see trade-offs. A commercial wastewater treatment simulator was connected to an interactive multi-objective optimisation method. In this way, it was possible to balance different conflicting objectives and gain insight into the phenomena involved. The formulation in [53] had three objectives and the final problem in [54] had five. By considering all important aspects as individual objectives, one could study their trade-offs and, importantly, avoid introducing unnecessary uncertainties into the problem formulation (which is unavoidable if they are all converted into money in order to minimise total cost). For example, estimating future energy prices unavoidably introduces uncertainty, which is avoided by optimising energy consumption instead. One can also end up with a multi-objective optimisation problem if the constraints define an empty feasible set, as in the optimal control problem in [87] related to continuous casting of steel. More specifically, the task was to control the secondary cooling, and the original constraints characterising the quality of steel were so demanding that the feasible region was empty. In such a case, there is nothing to be optimised. However, by converting each of the constraints into an objective function, it was possible to learn more about the problem by minimising constraint violations.
The original problem had four constraints and one objective; thus, the reformulated problem had five objectives. Applying an interactive method made it possible to learn how strongly the constraints actually conflicted and, eventually, to find a solution in which only one constraint had to be violated.
The ability of an interactive method to verify the correctness of the problem formulation was emphasised in [106] and demonstrated on a dynamic process simulation of a two-stage separation process in process plant design. Ideally, the correctness of the behaviour of the problem model should be checked before optimisation, but this may not be possible because of the complexity of the model. As pointed out in [106], the challenges of the problem formulation phase are not always sufficiently emphasised in the literature. For example, much of the evolutionary multi-objective optimisation literature focuses on testing with benchmark problems, where problem formulation receives no attention. In [106], a so-called augmented interactive multi-objective optimisation method was proposed, in which the validation of the problem formulation is incorporated into the interactive solution process. It can be applied with any interactive method; in the case considered, the synchronous NIMBUS method [88] was coupled with a dynamic plant-wide process simulator.
An example of a case where formulating the multi-objective optimisation problem was far from trivial is related to optimal shape design in an
K. Deb et al.
air intake ventilation system [14, 16]. The problem concerned ventilation in a tractor cabin, where the air intake plays an important part in maintaining a uniform temperature. The component of the ventilation system considered had four outlets, which should ideally have the same flow rates to guarantee a uniform temperature with minimal pressure loss. However, this was not possible since the outlets had different shapes and sizes. As mentioned in [16], deciding how the objective functions should be formulated to capture the essential needs required several iterations and consultations with the domain expert. The agreed problem formulation had three objectives. Because evaluating them required computational fluid dynamics simulations, the problem was computationally expensive. For this reason, the multi-objective optimisation method K-RVEA [13], which uses computationally less expensive Kriging models as surrogates, was applied. A version of the method incorporating preference information was applied in [14]. Kriging metamodels were also used as surrogates for computationally expensive objective functions in [1], where an interactive evolutionary method called interactive K-RVEA was proposed and the challenges of incorporating preference information into model management were addressed. It was applied to a simulation-based problem in the energy system design of buildings.
As this discussion shows, one characteristic of some real-world applications is the computational cost of function evaluations, which may slow down the solution process significantly. In particular, we do not want to keep the DM waiting while solutions reflecting his/her preferences are generated. There are different ways to speed up the calculation. As mentioned above, one can replace each of the computationally expensive objective functions by a fitted surrogate model and solve the problem with the surrogate models as objectives.
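As a rough illustration of what a Kriging surrogate provides, the following Python sketch fits a zero-mean Gaussian process with a squared-exponential kernel and a fixed, hand-picked length scale. Both choices are assumptions: practical Kriging tools, such as those behind K-RVEA [13], estimate hyperparameters from the data. All names are hypothetical. Besides a prediction, the model returns a variance, which surrogate-assisted methods can use to decide where to spend the next expensive evaluation.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: Sequence[float], b: Sequence[float], length: float) -> float:
    """Squared-exponential correlation between points a and b."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(k: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = k for a symmetric positive definite k."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = k[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L y = b."""
    y: List[float] = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][p] * y[p] for p in range(i))) / low[i][i])
    return y


def solve_upper_t(low: List[List[float]], y: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[p][i] * x[p] for p in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x


class KrigingSurrogate:
    """Zero-mean Gaussian-process (simple Kriging) model with a fixed
    length scale; no hyperparameter fitting is done."""

    def __init__(self, xs, ys, length: float = 0.3, nugget: float = 1e-8):
        self.xs, self.length = xs, length
        k = [[rbf(a, b, length) for b in xs] for a in xs]
        for i in range(len(xs)):
            k[i][i] += nugget  # small jitter for numerical stability
        self.low = cholesky(k)
        self.alpha = solve_upper_t(self.low, solve_lower(self.low, ys))  # K^-1 y

    def predict(self, x: Sequence[float]) -> Tuple[float, float]:
        """Posterior mean and variance of the objective at x."""
        kx = [rbf(x, xi, self.length) for xi in self.xs]
        mean = sum(a * b for a, b in zip(kx, self.alpha))
        v = solve_lower(self.low, kx)
        var = max(0.0, 1.0 - sum(vi * vi for vi in v))
        return mean, var


# Usage with a cheap stand-in for an expensive simulation.
def expensive(x):
    return (x[0] - 0.5) ** 2


train_x = [[0.0], [0.25], [0.5], [0.75], [1.0]]
model = KrigingSurrogate(train_x, [expensive(x) for x in train_x])
mean, var = model.predict([0.375])  # cheap prediction plus an uncertainty estimate
```

At a training point the predicted variance is close to zero, and it grows between and beyond the sampled points, which is the information an optimiser uses when choosing where to evaluate next.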
Alternatively, one can approximate the Pareto front in the objective space, as in the PAINT method [57], or fit a surrogate to the scalarising function used in the interactive method (to combine the original problem with preference information). In the latter case, only one function needs to be replaced by a surrogate, as in the so-called SURROGATE-ASF method [116].
Another approach to speeding up the calculation, so that the DM is not kept waiting, is the so-called three-stage process, which was defined and demonstrated in [110] on an integrated design and control problem related to designing paper mills. In the first stage, a representative set of Pareto-optimal solutions is generated with an a posteriori-type method. The DM is not involved at this stage, so it does not matter if it takes time. In the second stage, a computationally inexpensive surrogate of the original multi-objective optimisation problem is constructed and solved with an appropriate interactive method. In the third stage, once the DM has found the most preferred solution for the surrogate problem, this solution is projected onto the closest solution of the original problem. Encouraging results from applying the three-stage approach with PAINT and NIMBUS to the wastewater treatment operation case are reported, for example, in [58]. The three-stage solution process can be applied in various ways, for example, in interactive navigation methods, as characterised formally and in general in [56]. Examples of such navigation methods are [32, 56], where an approximation of the
Pareto front is first made, and the DM can then move around in it in real time and direct the search with his/her preferences. In this way, the DM can conveniently study the trade-offs. The three-stage approach is also applied in E-NAUTILUS [100], where the DM starts from a solution that is unsatisfactory in terms of all objectives and gradually approaches the Pareto front. Since pre-calculated solutions are used in the background, there is no waiting time. In addition, trading off is avoided; in other words, the DM does not need to sacrifice some objectives in order to gain improvement in others. This is expected to enable a freer search. Otherwise, the DM might anchor around the starting point because, according to prospect theory [72], losses loom larger than gains. For more information about the NAUTILUS philosophy, see also [84, 89]. Furthermore, NAUTILUS Navigator [99] hybridises ideas of navigation and NAUTILUS. All NAUTILUS methods start from an unsatisfactory solution and move closer to the Pareto front. While the Pareto front is being approached, the ranges of objective function values that are still reachable without trading off shrink. In NAUTILUS Navigator, the DM can see in real time how these reachable ranges shrink during navigation.
When applying interactive methods, a good graphical user interface is important, and the DM can also be supported with different visualisations. This topic is discussed in Chap. 7.
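In the spirit of the three-stage approach and E-NAUTILUS [100], the sketch below assumes a pre-computed set of Pareto-optimal solutions (stage 1, minimisation) and shows how an iteration point that starts worse than all of them can approach a chosen target while the reachable ranges shrink. It is a simplified illustration, not the E-NAUTILUS algorithm itself; the names `reachable`, `reachable_ranges` and `step`, as well as the data, are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def reachable(front: Sequence[Point], z: Point) -> List[Point]:
    """Pre-computed solutions at least as good as z in every objective
    (minimisation), i.e. reachable from z without trading off."""
    return [p for p in front if all(pi <= zi for pi, zi in zip(p, z))]


def reachable_ranges(front: Sequence[Point], z: Point) -> List[Tuple[float, float]]:
    """Per objective: (best still reachable value, current value)."""
    pts = reachable(front, z)
    return [(min(p[i] for p in pts), z[i]) for i in range(len(z))]


def step(z: Point, target: Point, steps_left: int) -> Point:
    """Move 1/steps_left of the remaining way from z towards target, so no
    objective gets worse at any step."""
    return tuple(zi + (ti - zi) / steps_left for zi, ti in zip(z, target))


front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]  # hypothetical stage-1 solutions
z, target = (5.0, 5.0), (2.0, 2.0)            # start worse than all; DM's pick
for steps_left in (3, 2, 1):
    z = step(z, target, steps_left)
    print(z, reachable_ranges(front, z))
# (4.0, 4.0) [(1.0, 4.0), (1.0, 4.0)]
# (3.0, 3.0) [(2.0, 3.0), (2.0, 3.0)]
# (2.0, 2.0) [(2.0, 2.0), (2.0, 2.0)]
```

Each step only filters the stored set, so the DM does not wait for new optimisation runs, and since the target is never worse than the current point, the target always stays reachable.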
2.6 Uncertainty Handling
In Sect. 2.3, we identified a major concern when using optimisation methods in practical problem solving: in most cases, the optimised solution obtained by solving a simplified model of the real-world problem does not represent the true optimal solution of the real-world problem. This is due to the discrepancies that always exist between the model being optimised and the true real-world problem. While the model can be brought closer to the real problem by introducing more details and nonlinearities in the objectives and constraints and by using more time-consuming but more accurate simulation procedures, in this section we focus on the uncertainty in the variables and problem parameters. Every manufacturing or implementation procedure for deploying an optimal solution in practice involves a finite tolerance, so the desired solution can never be implemented exactly. This introduces uncertainty in implementing certain optimised variable values ($\mathbf{x}_u$). Other variables ($\mathbf{x}_d$) are considered deterministic. Moreover, the modelling of objectives and constraints involves certain parameters ($\mathbf{p}$), such as material properties, which can also introduce uncertainty into the optimisation process. Quantifying the uncertainty of the uncertain variables and parameters is an important matter for uncertainty-based optimisation methods. It requires a careful identification of the sources of uncertainty and an estimation of an aggregate quantification. In most studies involving uncertainty handling, a Gaussian uncertainty model with its mean at the designated value and an appropriate pre-defined standard deviation $\sigma$ is used. Thus, an uncertain variable $x_i^u$ is represented by $\mathcal{N}(x_i^u, \sigma(x_i^u))$ and an uncertain parameter $p_i$ by $\mathcal{N}(p_i, \sigma(p_i))$. The optimisation problem of minimising $m$ conflicting objectives $\mathbf{f}$ subject to $J$ inequality constraints ($g_j(\mathbf{x}) \le 0$ for $j = 1, \ldots, J$) and specified variable bounds ($\mathbf{x} \in [\mathbf{x}^{(L)}, \mathbf{x}^{(U)}]$) is modified as follows:
$$
\begin{aligned}
\min_{(\mathbf{x}_u, \mathbf{x}_d)} \quad & \mu_{f_i}(\mathbf{x}_u, \mathbf{x}_d, \mathbf{p}) + \kappa\, \sigma_{f_i}(\mathbf{x}_u, \mathbf{x}_d, \mathbf{p}), && i = 1, \ldots, m,\\
\text{subject to} \quad & P\big(g_j(\mathbf{x}_u, \mathbf{x}_d, \mathbf{p}) \le 0\big) \ge R_j, && j = 1, 2, \ldots, J,\\
& \mathbf{x}^{(L)} \le \mathbf{x} \le \mathbf{x}^{(U)}.
\end{aligned}
\tag{2.1}
$$
Because of the distributions of $\mathbf{x}_u$ and $\mathbf{p}$, the objective vector $\mathbf{f}$ has a mean vector $\boldsymbol{\mu}_f$ and a standard deviation vector $\boldsymbol{\sigma}_f$. In the above chance-constrained problem, the quantity minimised for each objective is its mean plus $\kappa$ standard deviations (a value of $\kappa = 2$ or $3$ is often used). In addition, each original constraint $g_j \le 0$ is converted into a chance constraint $P(g_j \le 0) \ge R_j$, meaning that the probability of satisfying the constraint under the uncertainties in $\mathbf{x}_u$ and $\mathbf{p}$ must be at least $R_j$. To account for all constraints together and ensure an overall reliability $R$ of the final solution, $R_j$ can be set equal to $R$, or chosen in any other way that reflects the relative importance of the constraints to the user. Uncertainty handling methods differ in how the probability $P(\cdot)$ is computed. A naive approach is to estimate it by Monte Carlo sampling, but more sophisticated first-order reliability methods (FORMs) exist [19]. The above idea has been extended to multi-objective optimisation problems in several studies [22, 109]. For a given reliability value $R$, these studies find the reliable frontier, instead of the Pareto frontier, on which every solution satisfies the chance constraints. For a car side-impact design problem with uncertain variables modelled by Gaussian distributions around their nominal values, Fig. 2.1 shows how the reliable frontier moves into the feasible region as the desired reliability increases. Each solution's reliability index is computed using a fast-RIA approach [20]. Such a study provides two different insights: (i) it finds a reliable frontier for a desired reliability, and (ii) it allows an appropriate reliability to be estimated from the extent of the frontier's movement.
Fig. 2.1 The reliable frontier moves to the interior of the feasible space with an increase in reliability index on a bi-objective problem of minimising weight of a car body and average deflection on a dummy driver subject to several constraints (plot "Weight vs. beta"; axes: Weight and Average Deflection; frontiers shown for the deterministic ("Fixed") case and for 1.5σ, 2σ and 3σ). Taken from [22]
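The naive Monte Carlo route for evaluating the robust objective and the chance constraint in problem (2.1) can be sketched as follows. This is a minimal illustration, assuming one objective, one constraint and independent Gaussian perturbations of each variable (an uncertain parameter can be handled the same way by appending it to `x`, and deterministic variables get a standard deviation of zero). FORM [19] or fast-RIA [20] would replace the sampling loop in practice; the function name and the example problem are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def chance_constrained_eval(
    f: Callable[[List[float]], float],
    g: Callable[[List[float]], float],
    x: Sequence[float],
    sigma: Sequence[float],
    kappa: float = 3.0,
    samples: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Estimate mu_f + kappa * sigma_f and P(g <= 0) by Monte Carlo sampling,
    perturbing each variable x_i with independent Gaussian noise N(x_i, sigma_i)."""
    rng = random.Random(seed)
    f_values: List[float] = []
    satisfied = 0
    for _ in range(samples):
        perturbed = [rng.gauss(xi, si) for xi, si in zip(x, sigma)]
        f_values.append(f(perturbed))
        if g(perturbed) <= 0.0:
            satisfied += 1
    mean = statistics.fmean(f_values)
    std = statistics.pstdev(f_values)
    return mean + kappa * std, satisfied / samples


# Hypothetical one-variable example: minimise f(x) = x subject to
# g(x) = 1 - x <= 0, when x = 2.0 can only be implemented with sigma = 0.5.
robust_f, reliability = chance_constrained_eval(
    lambda x: x[0], lambda x: 1.0 - x[0], x=[2.0], sigma=[0.5]
)
# robust_f should be close to 2.0 + 3 * 0.5 = 3.5, and reliability close to
# P(Z >= -2), about 0.977, so a chance constraint with R_j = 0.95 would be met.
```

The estimated probability converges slowly with the number of samples, which is why more efficient reliability methods are preferred inside an optimisation loop that evaluates many candidate solutions.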
2.7 Machine Learning Techniques
Machine learning techniques have become increasingly indispensable in solving real-world many-objective optimisation problems. Traditional supervised and unsupervised learning methods [66] have been widely adopted for identifying important decision variables and objectives for problem formulation and decomposition, for efficient solution generation, and for constructing computationally efficient surrogate models for data-driven optimisation. Most recently, new machine learning techniques, such as semi-supervised learning [142], transfer learning [132] and deep learning [44], have also found increasing use in assisting the search, handling data paucity, and recommending efficient surrogates or search algorithms for a given optimisation problem with little a priori knowledge. In the following, we briefly review research developments in applying machine learning techniques to multi- and many-objective optimisation.
2.7.1 Problem Formulation and Decomposition
Formulating an optimisation problem in the real world is nontrivial if the problem is large-scale and involves a large number of possible objectives or constraints. For example, in the aerodynamic design of cars, the decision space will be huge if the whole surface of the car is involved in the optimisation. Thus, given historical data collected by aerodynamic engineers, it is natural to apply sensitivity analysis to extract the parts of the car surface that have the most significant impact on aerodynamic quantities such as drag and rear lift. To this end, sensitivity analysis methods, in particular multivariate global sensitivity analysis methods such as the mutual information index, a measure of the mutual dependence between two variables [18], can be used for correlation analysis. By combining clustering methods with interaction information [65], a multivariate extension of mutual information, interesting insights into the influence of the shape of passenger cars on their aerodynamic performance have been gained [46].
Although several evolutionary algorithms for many-objective optimisation have been developed, it is still desirable to reduce the number of objectives, for better visualisation and more efficient optimisation, by identifying the essentially conflicting objectives [36, 104, 107]. The main ideas for reducing the number of objectives are broadly similar to those for reducing the number of decision variables. For example, linear and nonlinear correlation-based methods [104, 127] or dominance-based methods [6, 21] can be used for objective reduction. A comparative study and analysis of both
approaches is provided in [135], on the basis of which a three-objective optimisation method for objective reduction is proposed.
Many-objective optimisation problems become even more challenging if the number of decision variables is also large, e.g. larger than 500, mainly because the efficiency of most evolutionary algorithms deteriorates seriously as the search dimension increases. A large body of research on large-scale single- and multi-objective optimisation has been reported, and the main ideas can be divided into two categories [67]: those based on decision space decomposition [95], and those based on dedicated search techniques such as competitive particle swarm optimisation [9], where particles compete pairwise and the weaker particle learns from the stronger one. Decomposition of the search space is usually based on the relationships between the decision variables, identified using random grouping or differential grouping [93], after which the subproblems are solved simultaneously using a cooperative co-evolutionary algorithm [5]. Apart from correlation detection, other ideas have also been proposed for large-scale many-objective optimisation problems. For example, in [138], decision variables are grouped into convergence-related and diversity-related ones. This is achieved by perturbing each decision variable and observing how the solution moves in the objective space. The k-means clustering algorithm is then used to categorise the decision variables according to the angle between the line along which each solution moves when it is perturbed and the line perpendicular to the hyperplane. Similar ideas have been adopted in [30], where the decision variables are divided into strongly and weakly robustness-related ones according to the sensitivity of the objective values to perturbations in the decision variables. In this way, a more efficient search for robust optimal solutions can be achieved.
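The interaction test behind differential grouping [93] can be sketched as follows: two variables are treated as interacting if the change in f caused by perturbing x_i differs depending on the value of x_j. This simplified Python version checks every pair (which costs O(n²) function evaluations) and merges interacting pairs with a union-find structure. The threshold `eps`, the choice of perturbation points, and all names are assumptions for illustration, not the exact procedure of [93].

```python
from typing import Callable, Dict, List, Sequence

Objective = Callable[[List[float]], float]


def interacts(f: Objective, lower: Sequence[float], upper: Sequence[float],
              i: int, j: int, eps: float = 1e-6) -> bool:
    """Does the effect of changing x_i depend on the value of x_j?"""
    base = list(lower)
    moved = list(base)
    moved[i] = upper[i]
    delta1 = f(moved) - f(base)          # effect of x_i with x_j at its lower bound
    base[j] = moved[j] = (lower[j] + upper[j]) / 2.0
    delta2 = f(moved) - f(base)          # same change, with x_j moved to mid-range
    return abs(delta1 - delta2) > eps


def group_variables(f: Objective, lower: Sequence[float], upper: Sequence[float],
                    eps: float = 1e-6) -> List[List[int]]:
    """Partition variable indices into groups of (transitively) interacting variables."""
    n = len(lower)
    parent = list(range(n))              # union-find forest over variable indices

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if interacts(f, lower, upper, i, j, eps):
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())


# Hypothetical test function: x0/x1 interact, x2 is separable, x3/x4 interact.
def f(x: List[float]) -> float:
    return x[0] * x[1] + x[2] ** 2 + (x[3] - x[4]) ** 2


print(group_variables(f, [-1.0] * 5, [1.0] * 5))  # [[0, 1], [2], [3, 4]]
```

Each resulting group could then be optimised as a subproblem by a cooperative co-evolutionary algorithm, as described above.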
Most recently, a restricted Boltzmann machine and a denoising autoencoder [121] have been used to learn a sparse distribution and a compact representation of the decision variables in solving large-scale sparse optimisation problems [118].
Finally, problem decomposition in the objective space can also be improved with the help of machine learning techniques, in particular when the Pareto front of the problem is irregular, i.e. does not span the whole objective space [61]. The reason is that decomposition-based approaches, which decompose a multi-objective problem into a number of single-objective subproblems using a set of reference vectors or weights, implicitly assume that the Pareto front spans the whole objective space. Consequently, the weights or reference vectors are distributed evenly over the objective space. This causes serious problems when the Pareto front is irregular, for example discrete, degenerate or inverted. For such irregular problems, the reference vectors must be adapted, or generated adaptively, during the optimisation using machine learning techniques, so that the distribution of the reference vectors matches the shape of the Pareto front. Among many others, the most frequently used techniques are clustering algorithms, including partitional clustering algorithms such as k-means [79] and hierarchical clustering methods [60]. Another, more powerful method for generating reference vectors adaptively is the growing neural gas network [40], which is used to adapt both the reference vectors and the scalarising function [81]. A similar idea was independently reported in [80],
which adapts multiple reference vectors one at a time; the proposed algorithm was successfully applied to the design of a hybrid electric vehicle controller with seven objectives to be minimised.
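Following the clustering idea mentioned above (k-means [79]), the sketch below derives reference vectors from the current non-dominated solutions instead of spreading them evenly. Min-max normalisation, reference vectors taken as unit directions from the origin towards cluster centres, and all function names are assumptions for illustration; the cited algorithms differ in their details.

```python
import math
import random
from typing import List, Sequence

Vec = List[float]


def normalise(points: Sequence[Sequence[float]]) -> List[Vec]:
    """Min-max scale every objective to [0, 1] (constant objectives map to 0)."""
    m = len(points[0])
    lo = [min(p[i] for p in points) for i in range(m)]
    hi = [max(p[i] for p in points) for i in range(m)]
    return [[(p[i] - lo[i]) / ((hi[i] - lo[i]) or 1.0) for i in range(m)]
            for p in points]


def kmeans(points: List[Vec], k: int, iters: int = 50, seed: int = 0) -> List[Vec]:
    """Plain Lloyd's k-means; an empty cluster keeps its previous centre."""
    rng = random.Random(seed)
    centres = [list(c) for c in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            clusters[d.index(min(d))].append(p)
        for c, members in enumerate(clusters):
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def adaptive_reference_vectors(front, k: int, seed: int = 0) -> List[Vec]:
    """Unit vectors pointing at k cluster centres of the normalised front."""
    vectors = []
    for c in kmeans(normalise(front), k, seed=seed):
        norm = math.sqrt(sum(x * x for x in c)) or 1.0
        vectors.append([x / norm for x in c])
    return vectors


# Hypothetical degenerate 3-objective front: f3 is constant, so the front is a
# line segment and evenly spread vectors with a positive f3 part would be wasted.
front = [(t / 20, 1 - t / 20, 0.5) for t in range(21)]
refs = adaptive_reference_vectors(front, k=4)
```

All resulting vectors have a zero third component, i.e. they lie along the degenerate front rather than covering the whole objective space.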
2.7.2 Model-Based Solution Generation
Solving real-world optimisation problems often involves challenges such as a lack of knowledge about the problem structure and a limited computational budget, which makes it extremely important to develop efficient search methods. However, the genetic operators in conventional evolutionary algorithms often search randomly and cannot make use of the knowledge that can be acquired during the search. Meanwhile, most advanced algorithms for solving many-objective optimisation problems concentrate on developing new environmental selection mechanisms, e.g. the scalarising function in decomposition-based methods or the dominance relationship in Pareto-based approaches. A few exceptions aim to generate more promising solutions during reproduction by using a model that learns the problem structure, thereby making the search more efficient. In [115], the regularity model-based approach of [137] is extended to many-objective optimisation by introducing a diversity repairing mechanism based on a set of reference vectors and a dimension reduction technique [107]. The regularity model-based approach builds a probabilistic model in an (m − 1)-dimensional latent space instead of in the decision space, where m is the number of objectives, making it more scalable than other estimation of distribution algorithms [8]. Another interesting idea is to promote the diversity of solutions when generating candidate solutions rather than during selection. To this end, a Gaussian process is employed to build an inverse model from the objective space to the decision space for controlled solution generation [10]. In the third wave of artificial intelligence, deep generative models [45] have achieved impressive successes in a wide range of applications.
Most recently, generative adversarial networks (GANs) have also been employed to generate candidate solutions in evolutionary multi-objective optimisation, by learning the distribution of the parent individuals and then generating new solutions from the trained GAN [59]. It has been shown that only a relatively small amount of training data is needed, which is of great interest when objective evaluations are expensive.
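To illustrate the idea of building a model in an (m − 1)-dimensional latent space rather than in the full decision space, the sketch below fits a single global linear model by principal component analysis (power iteration), samples latent coordinates uniformly over a slightly extended range, maps them back, and adds Gaussian noise. It is a deliberately simplified stand-in for the regularity model-based idea of [137], assuming one linear model, uniform latent sampling, and hand-picked `extend` and `noise` values; all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def principal_axes(data: Sequence[Sequence[float]], d: int, iters: int = 200,
                   seed: int = 0) -> Tuple[Vec, List[Vec]]:
    """Mean and top-d principal directions (power iteration with deflation)."""
    n, dim = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    centred = [[x - m for x, m in zip(row, mean)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    axes: List[Vec] = []
    for _ in range(d):
        v = [rng.random() + 0.1 for _ in range(dim)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(dim) for j in range(dim))
        axes.append(v)
        # Deflate: remove the found component so the next axis is orthogonal.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)] for i in range(dim)]
    return mean, axes


def sample_offspring(population, m: int, count: int, extend: float = 0.25,
                     noise: float = 0.01, seed: int = 0) -> List[Vec]:
    """Sample new solutions from an (m - 1)-dimensional linear model of the
    population, widened by `extend` and blurred with Gaussian noise."""
    rng = random.Random(seed)
    mean, axes = principal_axes(population, m - 1, seed=seed)
    latent = [[sum((x - mu) * a for x, mu, a in zip(p, mean, ax)) for ax in axes]
              for p in population]
    offspring = []
    for _ in range(count):
        child = list(mean)
        for k, ax in enumerate(axes):
            lo = min(z[k] for z in latent)
            hi = max(z[k] for z in latent)
            t = rng.uniform(lo - extend * (hi - lo), hi + extend * (hi - lo))
            child = [c + t * a for c, a in zip(child, ax)]
        offspring.append([c + rng.gauss(0.0, noise) for c in child])
    return offspring


# Hypothetical bi-objective case (m = 2): the population lies on a 1-D segment
# in a 3-D decision space, so the offspring should stay close to that segment.
population = [[t / 10, 2 * t / 10, 0.5] for t in range(11)]
children = sample_offspring(population, m=2, count=5)
```

Sampling only along the learned low-dimensional structure is what makes such model-based generation less random than conventional genetic operators.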
2.7.3 Data-Driven Surrogate-Assisted Optimisation
The benefits of surrogate models were introduced in Sect. 2.3. Data-driven surrogate-assisted optimisation addresses a large class of complex real-world optimisation problems in which candidate solutions must be evaluated using numerical simulations, physical experiments, or other data collected from real life
[70]. Data-driven optimisation can generally be categorised into online optimisation [13, 16], in which a small amount of new data can be actively collected during the optimisation, and offline optimisation, in which no new data can be made available [124]. One key challenge in data-driven surrogate-assisted evolutionary multi-objective optimisation is that only a limited amount of data is available, because collecting data is expensive [2, 15]. Thus, powerful machine learning techniques for handling data paucity have been introduced into data-driven optimisation. For example, semi-supervised learning, which can make use of unlabelled data (i.e. solutions that have not been evaluated with the expensive function evaluation, such as numerical simulations or physical experiments), has been shown to be effective in enhancing performance [112, 114]. In addition, active learning, which aims to find the point whose sampling most effectively improves the model quality, has also been adopted in surrogate-assisted optimisation, typically by choosing both the best candidate according to the predicted fitness and the most uncertain one [123], similar to the ideas behind the infill criteria (acquisition functions) used in efficient global optimisation [71] or Bayesian optimisation [105]. The biggest concern in surrogate-assisted optimisation is dealing with high-dimensional problems with a large number of decision variables, since most surrogate-assisted algorithms are validated on problems with fewer than 30 dimensions, or even fewer when, for example, a Gaussian process is used as the surrogate [13]. To address the curse of dimensionality, one straightforward idea is to use an ensemble of local Gaussian processes [82], a combination of global and local surrogates [113], or multiple surrogates for a better approximation [51].
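The active-learning rule described above, choosing both the candidate with the best predicted fitness and the most uncertain one [123], can be sketched as follows. As a stand-in surrogate, the code uses a bootstrap ensemble of k-nearest-neighbour regressors and takes the spread of the ensemble's predictions as the uncertainty; the surrogate type, ensemble size, data and function names are assumptions, not the models used in the cited work.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def knn_predict(train: Sequence[Sample], x: Sequence[float], k: int = 3) -> float:
    """Average target of the k training points nearest to x."""
    nearest = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:k]
    return statistics.fmean([y for _, y in nearest])


def choose_infill(train: Sequence[Sample], candidates: Sequence[Sequence[float]],
                  members: int = 10, seed: int = 0) -> Tuple[int, int]:
    """Return (index of best predicted mean, index of largest ensemble spread)
    among unevaluated candidates, using a bootstrap ensemble of k-NN models."""
    rng = random.Random(seed)
    ensemble = [[rng.choice(train) for _ in train] for _ in range(members)]
    stats: List[Tuple[float, float]] = []
    for x in candidates:
        preds = [knn_predict(sample, x) for sample in ensemble]
        stats.append((statistics.fmean(preds), statistics.pstdev(preds)))
    best = min(range(len(candidates)), key=lambda i: stats[i][0])
    most_uncertain = max(range(len(candidates)), key=lambda i: stats[i][1])
    return best, most_uncertain


# Hypothetical data: 11 evaluated points of the 1-D objective (x - 0.5)^2.
train = [([i / 10], (i / 10 - 0.5) ** 2) for i in range(11)]
best, explore = choose_infill(train, candidates=[[0.5], [5.0]])
# Both selected candidates would then be evaluated with the expensive function.
```

Infill criteria such as expected improvement in efficient global optimisation [71] combine these two goals into a single score instead of picking two separate points.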
Cooperative co-evolution assisted by a simple fitness inheritance strategy and a radial-basis-function network has been shown to be effective in handling high-dimensional problems. To tackle the computational difficulties from which Gaussian processes suffer, a multi-objective infill criterion has been proposed for more effective model management [117] when the amount of data is limited. Most recently, a random feature sub-sampling technique has been introduced to alleviate the curse of dimensionality [41].
As the number of objectives increases, one main challenge in surrogate-assisted optimisation is how to handle the additional objectives efficiently. Instead of building a model for each objective, a classification-based surrogate modelling method is presented in [94], where the surrogate is used to determine whether or not a candidate solution is dominated. A heterogeneous ensemble surrogate has been employed in place of the Gaussian process to assist high-dimensional multi-objective optimisation [47]; it can use infill criteria for model management without suffering from prohibitive computational cost. An extension of this work uses a dropout deep neural network to assist high-dimensional many-objective optimisation, which scales better with increases in both the number of decision variables and the number of objectives [48].
2.7.4 Transfer Optimisation
As in machine learning, solving multiple optimisation problems simultaneously has been shown to be more efficient than solving each of them separately. Here, we distinguish between two situations: multi-tasking optimisation [49], where knowledge transfer is achieved via the exchange of genetic material during the optimisation, and transfer optimisation [50], where knowledge transfer is mainly achieved by means of machine learning models via domain adaptation. One challenge in transfer optimisation is to ensure that knowledge can be transferred when the sizes of the search spaces and the locations of the optima of the tasks differ. To address this problem, decision variable translation and shuffling have been introduced to facilitate knowledge transfer between multiple tasks [29]. Multiple and adaptive crossovers have also been proposed to facilitate knowledge transfer between multiple tasks [139]. In addition, the same optimisation task in a time-varying environment can be seen as multiple related tasks, so knowledge can be transferred between different environments to speed up the optimisation [68]. More interestingly, the concept of multi-tasking optimisation has been generalised to surrogate-assisted optimisation, so that knowledge can be transferred between different surrogates. For example, knowledge can be transferred from a global surrogate to a local one [131], from a low-fidelity surrogate to a high-fidelity one [125], from one task to another [78], and from cheap to expensive objectives of the same task [128].
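To show how genetic material can be exchanged between tasks whose search spaces have different sizes, the sketch below uses a unified [0, 1]^D encoding, a convention commonly used in evolutionary multi-tasking (an assumption here, not a detail taken from [49] or [29]). Each task decodes only the leading genes it needs, so a crossover between individuals evolved for different tasks moves material across tasks. The sketch does not align differently located optima, which is the problem the translation and shuffling of [29] address. All names and bounds are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def decode(u: Sequence[float], bounds: Bounds) -> List[float]:
    """Map a unified vector in [0, 1]^D to one task's box, using only as many
    leading genes as the task has variables."""
    return [lo + ui * (hi - lo) for ui, (lo, hi) in zip(u, bounds)]


def encode(x: Sequence[float], bounds: Bounds, dim: int, rng: random.Random) -> List[float]:
    """Inverse map into [0, 1]^dim; genes the task does not use are random."""
    u = [(xi - lo) / (hi - lo) for xi, (lo, hi) in zip(x, bounds)]
    return u + [rng.random() for _ in range(dim - len(u))]


def crossover(a: Sequence[float], b: Sequence[float], rng: random.Random) -> List[float]:
    """Uniform crossover in the unified space; if a and b were evolved for
    different tasks, genetic material is transferred between the tasks."""
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]


rng = random.Random(0)
task_a = [(-5.0, 5.0)] * 2        # hypothetical task with 2 variables
task_b = [(0.0, 10.0)] * 3        # hypothetical task with 3 variables
u_a = encode([0.0, 0.0], task_a, dim=3, rng=rng)
u_b = encode([2.0, 8.0, 5.0], task_b, dim=3, rng=rng)
child = crossover(u_a, u_b, rng)
print(decode(child, task_a), decode(child, task_b))
```

The same child is meaningful in both tasks, which is what allows the population to carry knowledge from one task to the other.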
2.8 More Advanced Topics
As we have seen, solving a real-world optimisation problem with many objectives requires many practical considerations. Three additional topics are highlighted here. There are, of course, many others, such as multi-level multi- and many-objective optimisation, which involves the hierarchical nature of multiple interconnected optimisation problems [108], many-objective optimisation with a budget of solution evaluations [26], and ‘customized’ many-objective optimisation methods designed using specific problem information [42].
2.8.1 Multidisciplinary Considerations
Many complex real-world optimisation problems involve several different disciplines. For example, an aircraft wing must be optimal with respect to strength, aerodynamics, vibration, space, storage, and other equally important considerations.
Each discipline, when considered alone, will result in an optimal design that is completely different from those of the other disciplines. Thus, variables, constraints and objectives must be gathered from multiple disciplines, resulting in a truly high-dimensional optimisation problem. Since each solution must also be evaluated, often using computationally expensive software for each discipline, the overall many-objective optimisation problem can be intractable if approached naively. The disciplines must therefore be prioritised and considered in sequence, in a lexicographic sense, or by constructing a series of combined multi-objective problems that gradually grows from small to large. The literature so far indicates that the success of multidisciplinary optimisation (MDO) lies in the art of constructing this sequence of problems [34, 69]. The use of MaOEAs to handle a large number of objectives simultaneously (10–15 objectives are commonly solved to date [23]) will be helpful in arriving at a large number of trade-off solutions, allowing more disciplines or more details to be considered in a sequential MDO process. Such studies have not yet been pursued with any rigour, but they seem to offer a promising path towards solving MDO problems.
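The lexicographic prioritisation of disciplines mentioned above can be expressed as a comparison operator. This hedged sketch assumes that each discipline contributes one objective, that the objectives are listed in priority order, and that a per-discipline tolerance (an assumption chosen here, not taken from [34, 69]) lets lower-priority disciplines break near-ties; names and data are hypothetical.

```python
from typing import List, Sequence


def lex_better(a: Sequence[float], b: Sequence[float], tol: Sequence[float]) -> bool:
    """True if objective vector a beats b lexicographically (minimisation).
    The first objective whose values differ by more than its tolerance decides."""
    for ai, bi, t in zip(a, b, tol):
        if ai < bi - t:
            return True
        if ai > bi + t:
            return False
    return False  # equal within tolerance in every objective


def lex_best(designs: List[Sequence[float]], tol: Sequence[float]) -> Sequence[float]:
    """Keep the incumbent unless a later design is lexicographically better."""
    best = designs[0]
    for d in designs[1:]:
        if lex_better(d, best, tol):
            best = d
    return best


# Hypothetical wing designs scored as (structural mass, drag); mass has priority,
# but mass differences within 0.1 are treated as ties.
designs = [(1.05, 2.0), (1.0, 5.0), (1.5, 0.5)]
print(lex_best(designs, tol=(0.1, 0.0)))  # (1.05, 2.0)
print(lex_best(designs, tol=(0.0, 0.0)))  # (1.0, 5.0)
```

With non-zero tolerances the comparison is no longer transitive, so the winner can depend on the order in which designs are examined.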
2.8.2 Dynamic Environments
In time-varying dynamic or online problems, objective functions, constraint functions and/or problem parameters may change while an optimisation algorithm works its way towards the optimal solutions over its iterations [25, 33]. Problems whose feasible region changes with time (ephemeral resource constraints, ERCs) are other instances [4]. However, optimisation algorithms usually assume that the underlying problem remains fixed during their iterative process, which consumes a finite and, on some occasions, considerable amount of computational time. An algorithm with a convergence proof for certain deterministic problems is then not guaranteed to behave correctly if the problem is allowed to change during the optimisation process. Such problems often arise in practice. For example, in a power dispatch problem, where the optimisation algorithm must determine how several power generators should meet the overall demand of a locality optimally, a change in the actual power demand during the optimisation process will suddenly alter the underlying problem, resulting in an optimal dispatch schedule different from that of the problem with which the algorithm started. In the context of multi-objective optimisation, such a change may alter part or all of the Pareto-optimal front (POF) as the algorithm progresses. In such problems, an algorithm can at best track the POF, rather than converge on the actual POF at any time instant. If the change in the problem is minor and gradual, an optimisation algorithm can sense the change, adapt to the new problem, and shift its focus towards the new POF. Several such modifications to the NSGA-II algorithm [24] were proposed in an earlier study [25].
The so-called frozen-time evolutionary multi-objective optimisation (EMO) algorithm assumed the problem to be fixed for a short duration (τ), after which the algorithm was restarted with part of the existing population combined with a new population, either random or
mutated, to search for the new POF. An offline study was then suggested to determine the minimum frozen-time window τ that achieves a pre-defined acceptable performance over multiple changes in the problem. Test problems for dynamic multi-objective optimisation [33] and various other studies exist, and renewed interest in the EMO field has clearly indicated their practical significance [7, 133]. The interactive approaches mentioned in Sect. 2.5 can be applied to dynamic problems to reduce the overall complexity of solving them.
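A minimal sketch of the restart step described for the frozen-time EMO algorithm [25]: when a change is noticed, part of the population is kept and the rest is replaced by random or mutated members. Detecting changes by re-evaluating a few stored "sentinel" solutions, the kept fraction, the mutation strength, the toy problem and all names are assumptions for illustration, not details from [25].

```python
import random
from typing import Callable, List, Sequence, Tuple

Evaluate = Callable[[Sequence[float]], List[float]]


def change_detected(evaluate: Evaluate, sentinels: Sequence[Sequence[float]],
                    stored: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Re-evaluate a few stored solutions; any shift in their objective
    values indicates that the problem has changed."""
    return any(
        any(abs(new - old) > tol for new, old in zip(evaluate(x), values))
        for x, values in zip(sentinels, stored)
    )


def restart_population(population: Sequence[Sequence[float]],
                       bounds: Sequence[Tuple[float, float]], keep: float = 0.5,
                       p_mutate: float = 0.5, sigma: float = 0.1,
                       seed: int = 0) -> List[List[float]]:
    """Keep a random part of the old population and refill the rest with new
    members, each either a random point or a mutated copy of a kept member."""
    rng = random.Random(seed)
    n = len(population)
    kept = [list(p) for p in rng.sample(list(population), max(1, int(keep * n)))]
    fresh: List[List[float]] = []
    while len(kept) + len(fresh) < n:
        if rng.random() < p_mutate:
            parent = rng.choice(kept)
            child = [min(hi, max(lo, x + rng.gauss(0.0, sigma * (hi - lo))))
                     for x, (lo, hi) in zip(parent, bounds)]
        else:
            child = [rng.uniform(lo, hi) for lo, hi in bounds]
        fresh.append(child)
    return kept + fresh


# Hypothetical bi-objective problem whose Pareto set shifts with a parameter.
shift = 0.0


def evaluate(x: Sequence[float]) -> List[float]:
    return [(x[0] - shift) ** 2, (x[0] - shift - 1.0) ** 2]


sentinels = [[0.2], [0.7]]
stored = [evaluate(x) for x in sentinels]
shift = 0.3  # the environment changes
if change_detected(evaluate, sentinels, stored):
    population = restart_population([[i / 10] for i in range(10)], bounds=[(0.0, 1.0)])
```

In the frozen-time setting, the restart could instead simply be triggered at the end of every window of length τ.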
2.8.3 Mixed and Metameric Nature of Variables
In many real-world problems, the basic structure of a solution can be defined with different numbers of variables, resulting in metameric problems, as often found in nature. For example, a composite plate can be fabricated from many thin laminates oriented in different principal directions, or from a few relatively thick laminates. In the former case, the number of variables representing the principal direction and thickness of each laminate will be large compared to the latter case, although both designs may satisfy all specified constraints. Thus, in such metameric problems, two solutions may be constructed using different numbers of decision variables, violating one of the fundamental assumptions of the optimisation literature: x ∈ ℝⁿ, where n (the number of variables) is fixed. In other problems, the variables may have a fixed dimension, but their nature can be mixed: some variables can take only discrete or integer values, some can take real values, some can take values that are many orders of magnitude higher or lower than others, and some can even be categorical or combinatorial. Evolutionary algorithms make it easy to represent mixed and metameric variables in a population and to apply variable-specific recombination and mutation operators to create meaningful new solutions. Additional care must be taken with metameric representations, as a naive recombination of two differently sized population members may generate arbitrary and not-so-meaningful new solutions [101]. Multi-objective optimisation with metameric variables is challenging even for EMO algorithms, as the population must now carry differently sized solutions covering the entire POF.
A mating restriction between similarly sized solutions is necessary to produce meaningful offspring solutions. This matter has received some research attention [76, 102], but more studies are needed to establish the ability of EMO to solve such challenging problems.
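As an illustration of mating restriction for metameric variables, the sketch below pairs solutions of equal length where possible and lets the child keep its first parent's length. It is an assumption-laden toy (genomes as plain lists of ply orientations, uniform crossover on shared genes, hypothetical names and data), not the schemes studied in [76, 102].

```python
import random
from typing import List, Sequence

Genome = List[float]


def choose_mate(parent: Genome, population: Sequence[Genome], rng: random.Random) -> Genome:
    """Mating restriction: prefer a partner with the same number of variables,
    otherwise take the partner whose length is closest."""
    others = [p for p in population if p is not parent]
    same = [p for p in others if len(p) == len(parent)]
    if same:
        return rng.choice(same)
    return min(others, key=lambda p: abs(len(p) - len(parent)))


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover on the genes both parents share; the child inherits
    the first parent's length and thus its metameric structure."""
    shared = min(len(a), len(b))
    head = [x if rng.random() < 0.5 else y for x, y in zip(a[:shared], b[:shared])]
    return head + list(a[shared:])


# Hypothetical laminates: one ply orientation (degrees) per variable.
rng = random.Random(1)
population = [[0, 45, -45, 90], [0, 0, 90, 90], [0, 45, 90, -45, 0, 45, 90, -45], [90] * 8]
mate = choose_mate(population[0], population, rng)   # the other 4-ply design
child = crossover(population[0], mate, rng)          # again a 4-ply design
```

Restricting mating in this way keeps offspring structurally consistent with their parents, at the cost of less mixing between differently sized regions of the population.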
2.9 Conclusions
The recognition by solvers of real-world application problems of the insights to be gained by optimising multiple independent objectives has spurred the development of many-objective optimisation, as practitioners sought the means to consider
K. Deb et al.
more than two or three objectives effectively. In turn, this has increased the emphasis on pairing decision analysis with the optimisation process. In this chapter, we have explored key issues that arise in optimising RWAs with many objectives. For instance, we have stressed the importance of taking time and care over problem formulation and of developing a decision-making framework. We have also considered different forms of DM participation. RWAs often pose formidable computational challenges for certain types of objectives, such as those obtained from simulation outputs, and surrogate functions have proved effective in reducing this computational load. We have discussed uncertainty handling, since a common concern in RWA optimisation is how sensitive a prospective solution is to uncertainties. Such uncertainties are an inevitable part of the problem formulation and the implementation environment. We have seen that machine learning offers clear advantages, for example in extracting and interpreting information from large data sets and in coping with data paucity. Finally, we have considered some advanced topics: complex RWAs involving several disciplines, dynamic environments and metameric problems. Inevitably, there are many research challenges that we have been unable to address within the space permitted. One example is interwoven systems [73], where an RWA involves group decision-making that represents the different perspectives of teams collaborating on the overall application.
Examples of other practical issues demanding attention are:
– hierarchical optimisation involving multiple conflicting objectives [27],
– problems involving objectives with non-uniform latencies [3, 12, 128],
– utilising data from various sources for enhanced data-driven decision-making and decision analytics [70],
– parallel and distributed computing methods for efficiently finding a widely distributed set of high-dimensional Pareto-optimal solutions [28], and
– problems with a massively large number of objectives [77].
Chap. 7 addresses the important topic of visualisation methods that enable interaction with DMs and the analysis of the high-dimensional PFs arising in many-objective problems. Finally, the question of how best to formulate and solve a given RWA deserves further work, including the development of guidelines.
Author Contribution Statement
Kalyanmoy Deb was primarily involved in writing Sects. 2.6 and 2.8. Peter Fleming was primarily involved in writing Sects. 2.1 and 2.9 and in the chapter organisation. Yaochu Jin was primarily involved in writing Sect. 2.7, and Kaisa Miettinen in writing Sect. 2.5. Peter Fleming and Patrick Reed were jointly involved in writing Sects. 2.2, 2.3 and 2.4. All authors contributed to determining the structure of the chapter, its coherence and its drafting.
2 Key Issues in Real-World Applications…
Acknowledgements
Research reported in this chapter is related to the thematic research areas of:
– the Computational Optimization and Innovation (COIN) Laboratory at Michigan State University (https://www.coin-lab.org),
– Intelligent Systems, Decision and Control at The University of Sheffield,
– Decision Analytics utilizing Causal Models and Multiobjective Optimization (DEMO, jyu.fi/demo) at the University of Jyvaskyla,
– the Nature Inspired Computing and Engineering (NICE) Group at the University of Surrey, and
– the Decision Analytics for Complex Systems Group at Cornell University.
References
1. P. Aghaei Pour, T. Rodemann, J. Hakanen, K. Miettinen, Surrogate assisted interactive multiobjective optimization in energy system design of buildings. Optim. Eng. 23, 303–327 (2022)
2. R. Allmendinger, M.T.M. Emmerich, J. Hakanen, Y. Jin, E. Rigoni, Surrogate-assisted multicriteria optimization: complexities, prospective solutions, and business case. J. Multi-Criteria Decis. Anal. 24(1–2), 5–24 (2017)
3. R. Allmendinger, J. Handl, J.D. Knowles, Multiobjective optimization: when objectives exhibit non-uniform latencies. Eur. J. Oper. Res. 243(2), 497–513 (2015)
4. R. Allmendinger, J.D. Knowles, On handling ephemeral resource constraints in evolutionary search. Evol. Comput. 21(3), 497–531 (2013)
5. L.M. Antonio, C.A.C. Coello, Coevolutionary multiobjective evolutionary algorithms: survey of the state-of-the-art. IEEE Trans. Evol. Comput. 22(6), 851–865 (2018)
6. D. Brockhoff, E. Zitzler, Objective reduction in evolutionary multiobjective optimization: theory and applications. Evol. Comput. 17(2), 135–166 (2009)
7. R. Chen, K. Li, X. Yao, Dynamic multiobjectives optimization with a changing number of objectives. IEEE Trans. Evol. Comput. 22(1), 157–171 (2018)
8. R. Cheng, C. He, Y. Jin, X. Yao, Model-based evolutionary algorithms: a short survey. Complex & Intell. Syst. 4, 283–292 (2018)
9. R. Cheng, Y. Jin, A competitive swarm optimizer for large scale optimization. IEEE Trans. Cybern. 45(2), 191–205 (2015)
10. R. Cheng, Y. Jin, K. Narukawa, B. Sendhoff, A multiobjective evolutionary algorithm using Gaussian process based inverse modeling. IEEE Trans. Evol. Comput. 19(6), 761–856 (2015)
11. R. Cheng, Y. Jin, M. Olhofer, B. Sendhoff, A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput. 20(5), 773–791 (2016)
12. T. Chugh, R. Allmendinger, V. Ojalehto, K. Miettinen, Surrogate-assisted evolutionary biobjective optimization for objectives with non-uniform latencies, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2018), pp. 609–616
13. T. Chugh, Y. Jin, K. Miettinen, J. Hakanen, K. Sindhya, A surrogate-assisted reference vector guided evolutionary algorithm for computationally expensive many-objective optimization. IEEE Trans. Evol. Comput. 22(1), 129–142 (2018)
14. T. Chugh, T. Kratky, K. Miettinen, Y. Jin, P. Makkonen, Multiobjective shape design in a ventilation system with a preference-driven surrogate-assisted evolutionary algorithm, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2019), pp. 1147–1155
15. T. Chugh, K. Sindhya, J. Hakanen, K. Miettinen, A survey on handling computationally expensive multiobjective optimization problems with evolutionary algorithms. Soft. Comput. 23, 3137–3166 (2019)
16. T. Chugh, K. Sindhya, K. Miettinen, Y. Jin, T. Kratky, P. Makkonen, Surrogate-assisted evolutionary multiobjective shape optimization of an air intake ventilation system, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2017), pp. 1541–1548
17. C.A. Coello, G.B. Lamont, D.A. Van Veldhuizen, Evolutionary Algorithms for Solving Multi-Objective Problems (Springer, Berlin, 2007)
18. G. Critchfield, K. Willard, D. Connely, Probabilistic sensitivity analysis methods for general decision models. Comput. Biomed. Res. 19, 254–265 (1986)
19. T.R. Cruse, Reliability-based Mechanical Design (Marcel Dekker, New York, 1997)
20. D. Daum, K. Deb, J. Branke, Reliability-based optimization for multiple constraint with evolutionary algorithms, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2007), pp. 911–918
21. A.R.R. de Freitas, P.J. Fleming, F.G. Guimarães, Aggregation trees for visualization and dimension reduction in many-objective optimization. Inf. Sci. 298, 288–314 (2015)
22. K. Deb, S. Gupta, D. Daum, J. Branke, A. Mall, D. Padmanabhan, Reliability-based optimization using evolutionary algorithms. IEEE Trans. Evol. Comput. 13(5), 1054–1074 (2009)
23. K. Deb, H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Trans. Evol. Comput. 18(4), 577–601 (2014)
24. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
25. K. Deb, U.B. Rao, K. Sindhya, Dynamic multi-objective optimization and decision-making using modified NSGA-II: a case study on hydro-thermal power scheduling bi-objective optimization problems, in Evolutionary Multi-criterion Optimization (EMO) (2007), pp. 803–817
26. K. Deb, P.C. Roy, R. Hussein, Surrogate modeling approaches for multiobjective optimization: methods, taxonomy, and results. Math. Comput. Appl. 26(1), 5 (2021)
27. K. Deb, A. Sinha, An efficient and accurate solution methodology for bilevel multi-objective programming problems using a hybrid evolutionary-local-search algorithm. Evol. Comput. 18(3), 403–449 (2010)
28. K. Deb, P. Zope, A. Jain, Distributed computing of Pareto-optimal solutions using multiobjective evolutionary algorithms, in Evolutionary Multi-criterion Optimization (EMO) (Springer, Berlin, 2003), pp. 535–549
29. J. Ding, C. Yang, Y. Jin, T. Chai, Generalized multi-tasking for evolutionary optimization of expensive problems. IEEE Trans. Evol. Comput. 23(1), 44–58 (2019)
30. W. Du, W. Zhong, Y. Tang, W. Du, Y. Jin, High-dimensional robust multi-objective optimization for order scheduling: a decision variable classification approach. IEEE Trans. Ind. Inf. 15(1), 293–304 (2019)
31. J.A. Duro, Y. Yan, I. Giagkiozis, S. Giagkiozis, S. Salomon, D.C. Oara, A.K. Sriwastava, J. Morison, C.M. Freeman, R.J. Lygoe, R.C. Purshouse, P.J. Fleming, Liger: a cross-platform open-source integrated optimization and decision-making environment. Appl. Soft Comput. 98, 106851 (2021)
32. P. Eskelinen, K. Miettinen, K. Klamroth, J. Hakanen, Pareto Navigator for interactive nonlinear multiobjective optimization. OR Spectrum 32, 211–227 (2010)
33. M. Farina, K. Deb, P. Amato, Dynamic multiobjective optimization problems: test cases, approximations, and applications. IEEE Trans. Evol. Comput. 8(5), 425–442 (2004)
34. H.R. Fazeley, H. Taei, H. Naseh, A multi-objective, multidisciplinary design optimization methodology for the conceptual design of a spacecraft bi-propellant propulsion system. Struct. Multidiscip. Optim. 53, 145–160 (2016)
35. J. Fieldsend, T. Chugh, R. Allmendinger, K. Miettinen, A visualizable test problem generator for many-objective optimization. IEEE Trans. Evol. Comput. 26(1), 1–11 (2022)
36. P.J. Fleming, R.C. Purshouse, R.J. Lygoe, Many-objective optimization: an engineering design perspective, in Evolutionary Multi-criterion Optimization (EMO) (2005), pp. 14–32
37. C.M. Fonseca, P.J. Fleming, Genetic algorithms for multiobjective optimization: formulation, discussion and generalization, in International Conference on Genetic Algorithms (ICGA) (Morgan Kaufmann, 1993), pp. 416–423
38. C.M. Fonseca, P.J. Fleming, Multiobjective optimization and multiple constraint handling with evolutionary algorithms (I): a unified formulation. IEEE Trans. Syst. Man Cybern. - Part A 28(1), 26–37 (1998)
39. C.M. Fonseca, P.J. Fleming, Multiobjective optimization and multiple constraint handling with evolutionary algorithms (II): application example. IEEE Trans. Syst. Man Cybern. - Part A 28(1), 38–44 (1998)
40. B. Fritzke, A growing neural gas network learns topologies, in Neural Information Processing Systems (NIPS) (MIT Press, 1995), pp. 625–632
41. G. Fu, C. Sun, Y. Tan, G. Zhang, Y. Jin, A surrogate-assisted evolutionary algorithm with random feature selection for large-scale expensive problems, in Parallel Problem Solving from Nature (PPSN) (Springer, 2020), pp. 125–139
42. A. Gaur, A.K. Talukder, K. Deb, S. Tiwari, S. Xu, D. Jones, Unconventional optimization for achieving well-informed design solutions for the automobile industry. Eng. Optim. 52(9), 1542–1560 (2020)
43. V.J. Gillet, W. Khatib, P. Willett, P.J. Fleming, D.V.S. Green, Combinatorial library design using a multiobjective genetic algorithm. J. Chem. Inf. Comput. Sci. 42(2), 375–385 (2002)
44. I. Goodfellow, Y. Bengio, A. Courville, F. Bach, Deep Learning (MIT Press, 2017)
45. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Neural Information Processing Systems (NIPS) (2014), pp. 2672–2680
46. L. Graening, B. Sendhoff, Shape mining: a holistic data mining approach for engineering design. Adv. Eng. Inf. 28, 166–185 (2014)
47. D. Guo, Y. Jin, J. Ding, T. Chai, Heterogeneous ensemble-based infill criterion for evolutionary multiobjective optimization of expensive problems. IEEE Trans. Cybern. 49(3), 1012–1025 (2019)
48. D. Guo, X. Wang, K. Gao, Y. Jin, J. Ding, T. Chai, Evolutionary optimization of high-dimensional multi- and many-objective expensive problems assisted by a dropout neural network. IEEE Trans. Syst. Man Cybern.: Syst. 52(4), 2084–2097 (2020)
49. A. Gupta, Y.S. Ong, L. Feng, Multifactorial evolution: toward evolutionary multitasking. IEEE Trans. Evol. Comput. 20(3), 343–357 (2016)
50. A. Gupta, Y.S. Ong, L. Feng, Insights on transfer optimization: because experience is the best teacher. IEEE Trans. Emerg. Topics Comput. Intell. 2(1), 51–64 (2018)
51. A. Habib, H.K. Singh, T. Chugh, T. Ray, K. Miettinen, A multiple surrogate assisted decomposition-based evolutionary algorithm for expensive multi/many-objective optimization. IEEE Trans. Evol. Comput. 23(6), 1000–1014 (2019)
52. D. Hadka, P.M. Reed, Borg: An auto-adaptive many-objective evolutionary computing framework. Evol. Comput. 21(2), 231–259 (2013)
53. J. Hakanen, K. Miettinen, K. Sahlstedt, Wastewater treatment: New insight provided by interactive multiobjective optimization. Decis. Support Syst. 51, 328–337 (2011)
54. J. Hakanen, K. Sahlstedt, K. Miettinen, Wastewater treatment plant design and operation under multiple conflicting objective functions. Environ. Model. Softw. 46(1), 240–249 (2013)
55. J. Hämäläinen, K. Miettinen, P. Tarvainen, J. Toivanen, Interactive solution approach to a multiobjective optimization problem in paper machine headbox design. J. Optim. Theory Appl. 116(2), 265–281 (2003)
56. M. Hartikainen, K. Miettinen, K. Klamroth, Interactive Nonconvex Pareto Navigator for multiobjective optimization. Eur. J. Oper. Res. 275(1), 238–251 (2019)
57. M. Hartikainen, K. Miettinen, M. Wiecek, PAINT: Pareto front interpolation for nonlinear multiobjective optimization. Comput. Optim. Appl. 52(3), 845–867 (2012)
58. M. Hartikainen, V. Ojalehto, K. Sahlstedt, Applying the approximation method PAINT and the interactive method NIMBUS to the multiobjective optimization of operating a wastewater treatment plant. Eng. Optim. 47(3), 328–346 (2015)
59. C. He, S. Huang, R. Cheng, K.C. Tan, Y. Jin, Evolutionary multiobjective optimization driven by generative adversarial networks (GANs). IEEE Trans. Cybern. 51(6), 3129–3142 (2020)
60. Y. Hua, Y. Jin, K. Hao, A clustering based adaptive evolutionary algorithm for multi-objective optimization with irregular Pareto fronts. IEEE Trans. Cybern. 49(7), 2758–2770 (2019)
61. Y. Hua, Q. Liu, K. Hao, Y. Jin, A survey of evolutionary algorithms for multi-objective optimization problems with irregular Pareto fronts. IEEE/CAA J. Autom. Sinica 8(2), 303–318 (2021)
62. C.-L. Hwang, A.S.M. Masud, Multiple Objective Decision Making – Methods and Applications (Springer, 1979)
63. J. Ide, A. Schöbel, Robustness for uncertain multi-objective optimization: a survey and analysis of different concepts. OR Spectrum 38(1), 235–271 (2016)
64. H. Jain, K. Deb, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach. IEEE Trans. Evol. Comput. 18(4), 602–622 (2014)
65. A. Jakulin, I. Bratko, Testing the significance of attribute interactions, in International Conference on Machine Learning (ICML) (ACM Press, 2004), pp. 409–416
66. G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning (Springer, 2013)
67. J.-R. Jian, Z.-H. Zhan, J. Zhang, Large-scale evolutionary optimization: a survey and experimental comparative study. Int. J. Mach. Learn. Cybern. 11, 729–745 (2020)
68. M. Jiang, Z. Huang, L. Qiu, W. Huang, G.G. Yen, Transfer learning-based dynamic multiobjective optimization algorithms. IEEE Trans. Evol. Comput. 22(4), 501–514 (2018)
69. C.D. Jilla, D.W. Miller, Multi-objective, multidisciplinary design optimization methodology for distributed satellite systems. J. Spacecr. Rocket. 41(1), 39–50 (2004)
70. Y. Jin, H. Wang, T. Chugh, D. Guo, K. Miettinen, Data-driven evolutionary optimization: an overview and case studies. IEEE Trans. Evol. Comput. 23(3), 442–458 (2019)
71. D.R. Jones, M. Schonlau, W.J. Welch, Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4), 455–492 (1998)
72. D. Kahneman, A. Tversky, Prospect theory: an analysis of decision under risk. Econometrica 47(2), 263–291 (1979)
73. K. Klamroth, S. Mostaghim, B. Naujoks, S. Poles, R. Purshouse, G. Rudolph, S. Ruzika, S. Sayın, M.M. Wiecek, X. Yao, Multiobjective optimization for interwoven systems. J. Multi-Criteria Decis. Anal. 24, 71–81 (2017)
74. M. Laumanns, L. Thiele, K. Deb, E. Zitzler, Combining convergence and diversity in evolutionary multiobjective optimization. Evol. Comput. 10(3), 263–282 (2002)
75. B. Li, J. Li, K. Tang, X. Yao, Many-objective evolutionary algorithms: a survey. ACM Comput. Surv. 48(1), 1–35 (2015)
76. H. Li, K. Deb, Q. Zhang, Variable-length Pareto optimization via decomposition-based evolutionary multiobjective algorithm. IEEE Trans. Evol. Comput. 23(6), 987–999 (2019)
77. K. Li, K. Deb, T. Altinoz, X. Yao, Empirical investigations of reference point based methods when facing a massively large number of objectives: first results, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2017), pp. 390–405
78. J. Lin, H.-L. Liu, K.C. Tan, F. Gu, An effective knowledge transfer approach for multiobjective multitasking optimization. IEEE Trans. Cybern. 51(6), 3238–3248 (2021)
79. Q. Lin, S. Liu, K.-C. Wong, M. Gong, C.A.C. Coello, A clustering-based evolutionary algorithm for many-objective optimization problems. IEEE Trans. Evol. Comput. 24(3), 391–405 (2019)
80. Q. Liu, Y. Jin, M. Heiderich, T. Rodemann, An adaptive reference vector guided evolutionary algorithm using growing neural gas for many-objective optimization of irregular problems. IEEE Trans. Cybern., 1–14 (2020)
81. Y. Liu, H. Ishibuchi, N. Masuyama, Y. Nojima, Adapting reference vectors and scalarizing functions by growing neural gas to handle irregular Pareto fronts. IEEE Trans. Evol. Comput. 24(3), 439–453 (2020)
82. J. Lu, B. Li, Y. Jin, An evolution strategy assisted by an ensemble of local Gaussian process models, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2013), pp. 447–454
83. K. Miettinen, Nonlinear Multiobjective Optimization (Kluwer Academic Publishers, 1999)
84. K. Miettinen, P. Eskelinen, F. Ruiz, M. Luque, NAUTILUS method: An interactive technique in multiobjective optimization based on the nadir point. Eur. J. Oper. Res. 206(2), 426–434 (2010)
85. K. Miettinen, J. Hakanen, D. Podkopaev, Interactive nonlinear multiobjective optimization methods, in Multiple Criteria Decision Analysis: State of the Art Surveys, 2nd edn., ed. by S. Greco, M. Ehrgott, J. Figueira (Springer, 2016), pp. 931–980
86. K. Miettinen, M. Mäkelä, Interactive bundle-based method for nondifferentiable multiobjective optimization: NIMBUS. Optimization 34, 231–246 (1995)
87. K. Miettinen, M. Mäkelä, T. Männikkö, Optimal control of continuous casting by nondifferentiable multiobjective optimization. Comput. Optim. Appl. 11, 177–194 (1998)
88. K. Miettinen, M.M. Mäkelä, Synchronous approach in interactive multiobjective optimization. Eur. J. Oper. Res. 170, 909–922 (2006)
89. K. Miettinen, F. Ruiz, NAUTILUS framework: towards trade-off-free interaction in multiobjective optimization. J. Bus. Econ. 86(1–2), 5–21 (2016)
90. K. Miettinen, F. Ruiz, A. Wierzbicki, Introduction to multiobjective optimization: interactive approaches, in Multiobjective Optimization: Interactive and Evolutionary Approaches, ed. by J. Branke, K. Deb, K. Miettinen, R. Slowinski (Springer, 2008), pp. 27–57
91. E.A. Moallemi, F. Zare, P.M. Reed, S. Elsawah, M.J. Ryan, B.A. Bryan, Structuring and evaluating decision support processes to enhance the robustness of complex human-natural systems. Environ. Model. Softw. 123, 104551 (2020)
92. G. Misitano, B.S. Saini, B. Afsar, B. Shavazipour, K. Miettinen, DESDEO: the modular and open source framework for interactive multiobjective optimization. IEEE Access 9, 148277–148295 (2021)
93. M.N. Omidvar, X. Li, Y. Mei, X. Yao, Cooperative co-evolution with differential grouping for large scale optimization. IEEE Trans. Evol. Comput. 18(3), 378–393 (2013)
94. L. Pan, C. He, Y. Tian, H. Wang, X. Zhang, Y. Jin, A classification based surrogate-assisted evolutionary algorithm for expensive many-objective optimization. IEEE Trans. Evol. Comput. 23(1), 74–88 (2019)
95. X. Peng, Y. Jin, H. Wang, Multi-modal optimization enhanced cooperative coevolution for large-scale optimization. IEEE Trans. Cybern. 49(9), 3507–3520 (2019)
96. R. Purshouse, K. Deb, M. Mansor, S. Mostaghim, R. Wang, A review of hybrid evolutionary multiple criteria decision making methods, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2014), pp. 1147–1154
97. R.C. Purshouse, P.J. Fleming, Conflict, harmony, and independence: relationships in evolutionary multi-criterion optimisation, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2003), pp. 16–30
98. M.S. Reed, A. Graves, N. Dandy, H. Posthumus, K. Hubacek, J. Morris, C. Prell, C.H. Quinn, L.C. Stringer, Who’s in and why? a typology of stakeholder analysis methods for natural resource management. J. Environ. Manag. 90(5), 1933–1949 (2009)
99. A. Ruiz, F. Ruiz, K. Miettinen, L. Delgado-Antequera, V. Ojalehto, NAUTILUS Navigator: Free search interactive multiobjective optimization without trading-off. J. Global Optim. 74(2), 213–231 (2019)
100. A.B. Ruiz, K. Sindhya, K. Miettinen, F. Ruiz, M. Luque, E-NAUTILUS: A decision support system for complex multiobjective optimization problems based on the NAUTILUS method. Eur. J. Oper. Res. 246, 218–231 (2015)
101. M.L. Ryerkerk, R.C. Averill, K. Deb, E.D. Goodman, Meaningful representation and recombination of variable length genomes, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2012), pp. 1471–1472
102. M.L. Ryerkerk, R.C. Averill, K. Deb, E.D. Goodman, Solving metameric variable-length optimization problems using genetic algorithms. Genet. Program Evolvable Mach. 18, 247–277 (2017)
103. B. Saini, J. Hakanen, K. Miettinen, A new paradigm in interactive evolutionary multiobjective optimization, in Parallel Problem Solving from Nature (PPSN) (Springer, 2020), pp. 243–256
104. D.K. Saxena, J.A. Duro, A. Tiwari, K. Deb, Q. Zhang, Objective reduction in many-objective optimization: linear and nonlinear algorithms. IEEE Trans. Evol. Comput. 17(1), 77–99 (2013)
105. B. Shahriari, K. Swersky, Z. Wang, R.P. Adams, N. de Freitas, Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104(1), 148–175 (2016)
106. K. Sindhya, V. Ojalehto, J. Savolainen, H. Niemisto, J. Hakanen, K. Miettinen, Coupling dynamic simulation and interactive multiobjective optimization for complex problems: an APROS-NIMBUS case study. Expert Syst. Appl. 41(5), 2546–2558 (2014)
107. H.K. Singh, A. Isaacs, T. Ray, A Pareto corner search evolutionary algorithm and dimensionality reduction in many-objective optimization problems. IEEE Trans. Evol. Comput. 15(4), 539–556 (2011)
108. A. Sinha, P. Malo, K. Deb, A review on bilevel optimization: From classical to evolutionary approaches and applications. IEEE Trans. Evol. Comput. 22(2), 276–295 (2018)
109. R. Srivastava, K. Deb, R. Tulsyan, An evolutionary algorithm based approach to design optimization using evidence theory. J. Mech. Des. 135(8), 081003 (2013)
110. I. Steponavice, S. Ruuska, K. Miettinen, A solution process for simulation-based multiobjective design optimization with an application in paper industry. Comput. Aided Des. 47, 45–58 (2014)
111. R.E. Steuer, Multiple Criteria Optimization: Theory, Computation and Application (Wiley, 1986)
112. C. Sun, Y. Jin, Y. Tan, Semi-supervised learning assisted particle swarm optimization of computationally expensive problems, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2019), pp. 45–52
113. C. Sun, Y. Jin, J. Zeng, Y. Yu, A two-layer surrogate-assisted particle swarm optimization algorithm. Soft. Comput. 19(6), 1461–1475 (2015)
114. X. Sun, D. Gong, Y. Jin, S. Chen, A new surrogate-assisted interactive genetic algorithm with weighted semi-supervised learning. IEEE Trans. Cybern. 43(2), 685–698 (2013)
115. Y. Sun, G.G. Yen, Y. Zhang, Improved regularity model-based EDA for many-objective optimization. IEEE Trans. Evol. Comput. 22(5), 662–678 (2018)
116. M. Tabatabaei, M. Hartikainen, K. Sindhya, J. Hakanen, K. Miettinen, An interactive surrogate-based method for computationally expensive multiobjective optimization. J. Oper. Res. Soc. 70(6), 898–914 (2019)
117. J. Tian, Y. Tan, J. Zeng, C. Sun, Y. Jin, Multi-objective infill criterion driven Gaussian process assisted particle swarm optimization of high-dimensional expensive problems. IEEE Trans. Evol. Comput. 23(3), 459–472 (2019)
118. Y. Tian, C. Lu, X. Zhang, K.C. Tan, Y. Jin, Solving large-scale multiobjective optimization problems with sparse optimal solutions via unsupervised neural networks. IEEE Trans. Cybern. (2021). To appear. https://doi.org/10.1109/TCYB.2020.2979930
119. Y. Tian, S. Peng, T. Rodemann, X. Zhang, Y. Jin, Automated selection of evolutionary multiobjective optimization algorithms, in IEEE Symposium Series on Computational Intelligence (SSCI) (IEEE Press, 2019), pp. 3225–3232
120. B. Trindade, P. Reed, G. Characklis, Deeply uncertain pathways: integrated multi-city regional water supply infrastructure investment and portfolio management. Adv. Water Resour. 134, 103442 (2019)
121. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
122. C. von Lücken, B. Barán, C. Brizuela, A survey on multi-objective evolutionary algorithms for many-objective problems. Comput. Optim. Appl. 58(3), 707–756 (2014)
123. H. Wang, Y. Jin, J. Doherty, Committee-based active learning for surrogate-assisted particle swarm optimization of expensive problems. IEEE Trans. Cybern. 47(9), 2664–2677 (2017)
124. H. Wang, Y. Jin, C. Sun, J. Doherty, Offline data-driven evolutionary optimization using selective surrogate ensembles. IEEE Trans. Evol. Comput. 23(2), 203–216 (2019)
125. H. Wang, Y. Jin, C. Yang, L. Jiao, Transfer stacking from low- to high-fidelity: a surrogate-assisted bi-fidelity evolutionary algorithm. Appl. Soft Comput. 92, 106276 (2020)
126. H. Wang, M. Olhofer, Y. Jin, A mini-review on preference modeling and articulation in multiobjective optimization: current status and challenges. Complex & Intell. Syst. 3(4), 233–245 (2017)
127. H. Wang, X. Yao, Objective reduction based on nonlinear correlation information entropy. Soft. Comput. 20(6), 2393–2407 (2016)
128. X. Wang, Y. Jin, S. Schmitt, M. Olhofer, Transfer learning for Gaussian process assisted evolutionary bi-objective optimization for objectives with different evaluation times, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2020), pp. 587–594
129. T.B. Wild, P.M. Reed, D.P. Loucks, M. Mallen-Cooper, E.D. Jensen, Balancing hydropower development and ecological impacts in the Mekong: tradeoffs for Sambor Mega Dam. J. Water Resour. Plan. Manag. 145(2), 05018019 (2019)
130. M.J. Woodruff, P.M. Reed, T.W. Simpson, Many objective visual analytics: rethinking the design of complex engineered systems. Struct. Multidiscip. Optim. 48(1), 201–219 (2013)
131. C. Yang, J. Ding, Y. Jin, T. Chai, Off-line data-driven multi-objective optimization: knowledge transfer between surrogates and generation of final solutions. IEEE Trans. Evol. Comput. 24(3), 409–423 (2020)
132. Q. Yang, Y. Zhang, W. Dai, S. Pan, Transfer Learning (Cambridge University Press, Cambridge, 2020)
133. S. Yang, X. Yao, Evolutionary Computation for Dynamic Optimization Problems (Springer, 2013)
134. G. Yu, Y. Jin, M. Olhofer, A multi-objective evolutionary algorithm for finding knee regions using two localized dominance relationships. IEEE Trans. Evol. Comput. 25(1), 145–158 (2021)
135. Y. Yuan, Y.S. Ong, A. Gupta, H. Xu, Objective reduction in many-objective optimization: evolutionary multiobjective approaches and comprehensive analysis. IEEE Trans. Evol. Comput. 22(2), 189–210 (2018)
136. Q. Zhang, H. Li, MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007)
137. Q. Zhang, A. Zhou, Y. Jin, RM-MEDA: a regularity model-based multiobjective estimation of distribution algorithm. IEEE Trans. Evol. Comput. 12(1), 41–63 (2008)
138. X. Zhang, Y. Tian, R. Cheng, Y. Jin, A decision variable clustering-based evolutionary algorithm for large-scale many-objective optimization. IEEE Trans. Evol. Comput. 22(1), 97–112 (2018)
139. L. Zhou, L. Feng, K.C. Tan, J. Zhong, Z. Zhu, K. Liu, C. Chen, Toward adaptive knowledge transfer in multifactorial evolutionary computation. IEEE Trans. Cybern. (2021). To appear. https://doi.org/10.1109/TCYB.2020.2974100
140. Y. Zhou-Kangas, K. Miettinen, Decision making in multiobjective optimization problems under uncertainty: balancing between robustness and quality. OR Spectrum 41(2), 391–413 (2019)
141. Y. Zhou-Kangas, K. Miettinen, K. Sindhya, Solving multiobjective optimization problems with decision uncertainty: an interactive approach. J. Bus. Econ. 89(1), 25–51 (2019)
142. X. Zhu, A.B. Goldberg, Introduction to Semi-Supervised Learning (Morgan & Claypool, 2009)
Chapter 3
Identifying Properties of Real-World Optimisation Problems Through a Questionnaire
Koen van der Blom, Timo M. Deist, Vanessa Volz, Mariapia Marchi, Yusuke Nojima, Boris Naujoks, Akira Oyama, and Tea Tušar
Abstract
Optimisation algorithms are commonly compared on benchmarks to get insight into performance differences. However, it is not clear how closely benchmarks match the properties of real-world problems because these properties are largely unknown. This work investigates the properties of real-world problems through a questionnaire to enable the design of future benchmark problems that more closely resemble those found in the real world. The results, while not representative as they are based on only 45 responses, indicate that many problems possess at least one of the following properties: they are constrained, deterministic, have only continuous variables, require substantial computation times for both the objectives and the constraints, or allow a limited number of evaluations. Properties like known optimal solutions and analytical gradients are rarely available, limiting the options in guiding the optimisation process. These are all important aspects to consider when designing realistic benchmark problems. At the same time, the design of realistic benchmarks is difficult, because objective functions are often reported to be black-box and many problem properties are unknown. To further improve the understanding of real-world problems, readers working on a real-world optimisation problem are encouraged to fill out the questionnaire: https://tinyurl.com/opt-survey.
K. van der Blom (corresponding author), Leiden University, Leiden, The Netherlands; Sorbonne Université, CNRS, LIP6, Paris, France
T. M. Deist, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
V. Volz, modl.ai, Copenhagen, Denmark
M. Marchi, ESTECO SpA, Trieste, Italy
Y. Nojima, Osaka Metropolitan University, Sakai, Osaka, Japan
B. Naujoks, TH Köln, Gummersbach, Germany
A. Oyama, Japan Aerospace Exploration Agency, Sagamihara, Japan
T. Tušar, Jožef Stefan Institute, Ljubljana, Slovenia
© Springer Nature Switzerland AG 2023
D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_3
K. van der Blom et al.
3.1 Introduction
Optimisation algorithms are ultimately used to solve real-world problems. In contrast, the benchmarks used in academic research to assess algorithmic performance often consist of artificial mathematical functions. As previous studies have indicated [34], it is questionable whether the performance of algorithms on such benchmarks translates to real-world problems. While artificial mathematical functions are designed to have specific properties, it is not clear how closely they reflect the properties seen in the real world. In fact, work by Ishibuchi et al. [10] points out that differences in algorithm performance exist between commonly used benchmark functions and real-world problems. In addition, Tanabe et al. [23] establish that both C-DTLZ problems and common test problems that aim to imitate real-world problems contain a number of unnatural characteristics. Further, it has been shown that in some cases simple algorithms can be surprisingly effective on real-world-like problems, in contrast to what artificial benchmarks would predict [34].
As a consequence, the design of benchmarks would benefit from a better understanding of which properties appear in real-world problems, how common these properties are, and in which combinations they are found. Further, it is of particular interest to identify characteristics of real-world problems that are not yet represented in artificial benchmarks, and to identify real-world problems that might be usable as part of a benchmark suite. All these aspects would then provide better guidance for the development of algorithms that perform well in the real world. Although such reality-inspired benchmarks may still, in part, consist of artificial functions, those functions would then be designed based on clearly identified needs. In addition, actual real-world problems, or simplified problems that are correlated with them, could be included in benchmarks.
By contributing problems to benchmarks, industry can benefit from better algorithms and solutions. At the same time, academia benefits from greater insight into the particularities of the specific problem. Although, understandably, not all real-world problems can be made publicly available, some of their properties might still be shared and could thus guide the design of useful benchmark problems.
A better understanding of real-world problems and their properties is clearly important. This is why we have designed a questionnaire to gather information on real-world problems faced by researchers and practitioners. The questionnaire was designed by researchers from the Evolutionary Computation community, which may have introduced biases both in the design of the questionnaire and in the audience it reached. We hope that the results obtained are directly or indirectly useful for other communities as well, even though those communities might have added, changed, or formulated some questions differently. Most importantly, each reader is strongly encouraged to fill out the questionnaire themselves and to advertise it in their own community.
This chapter presents the questionnaire, and analyses and discusses the responses we collected between October 2019 and July 2020. Our motivation for creating the questionnaire was to (a) identify properties of existing real-world problems and (b) identify or create practical benchmarks that exhibit these properties. Based on our findings, we are able to make some first (tentative) suggestions for constructing future benchmarking suites. For example, we find that many real-world problems include constraints and that several have high-dimensional search spaces. Currently, such problems are not well reflected in popular benchmark suites. To make benchmark suites better match real-world problems, they should be adapted to reflect these observations. What could be used directly in benchmark suites are problems with short evaluation times ( 0

Remark 4.14 In Ikeda et al. [18], $\alpha_{ij} \geq 0$ and $\alpha_{ii} = 1$ for all $i, j \in [m]$.

Proposition 4.12 For any set of real numbers $\alpha_{ij}$, $i, j \in [m]$, the relation $\prec_\alpha$ is:
1. a strict partial order on $\mathbb{R}^m$,
2. asymmetric (thus also antisymmetric), and
3. ${\preceq_\alpha} := {\prec_\alpha} \cup \Delta$ is a partial order, where $\Delta := \{(x, x) \mid x \in \mathbb{R}^m\}$.
A. Deutz et al.
Proof Clearly, $\prec_\alpha$ is irreflexive, as $g_i(y, y) = 0$ for all $i \in [m]$ and all $y \in \mathbb{R}^m$.
Secondly, $\prec_\alpha$ is transitive. Assume $y^{(1)} \prec_\alpha y^{(2)}$ and $y^{(2)} \prec_\alpha y^{(3)}$. For all $i \in [m]$, $g_i(y^{(1)}, y^{(3)}) = g_i(y^{(1)}, y^{(2)}) + g_i(y^{(2)}, y^{(3)})$ is non-negative, as by assumption each of the summands is non-negative. Moreover, there exists a $k \in [m]$ such that $g_k(y^{(1)}, y^{(2)}) > 0$. Thus, also $g_k(y^{(1)}, y^{(3)}) = g_k(y^{(1)}, y^{(2)}) + g_k(y^{(2)}, y^{(3)}) > 0$.
Thirdly, $\prec_\alpha$ is asymmetric. Let $y^{(1)} \prec_\alpha y^{(2)}$. Then there exists a $k \in [m]$ such that $g_k(y^{(1)}, y^{(2)}) > 0$. Assume we also have $y^{(2)} \prec_\alpha y^{(1)}$. Then $g_k(y^{(2)}, y^{(1)}) \geq 0$. But $g_k(y^{(2)}, y^{(1)}) = -g_k(y^{(1)}, y^{(2)}) < 0$, a contradiction. Thus, $\prec_\alpha$ is asymmetric (and therefore, trivially, also antisymmetric). □
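Remark 4.15 The binary relation $\prec_\alpha$ on $\mathbb{R}^m$ could also be defined in matrix language, that is, we view the $m^2$ given numbers $\alpha_{ij}$ as an $m \times m$ matrix:
$$A := \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1m} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mm} \end{pmatrix},$$
then $y^{(1)} \prec_\alpha y^{(2)} :\Leftrightarrow 0 \prec_{\mathrm{Pareto}} A(y^{(2)} - y^{(1)})$. In the sequel, we will predominantly use the matrix view.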
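The matrix view of Remark 4.15 translates directly into a dominance test. The sketch below is illustrative only: it assumes minimization and assumes, consistent with the matrix view, that $g_i(y, y')$ is the $i$-th component of $A(y' - y)$; the function name `alpha_dominates` is a hypothetical choice of ours.

```python
def alpha_dominates(y1, y2, A):
    """Return True if y1 alpha-dominates y2 (minimization).

    Matrix view of Remark 4.15: y1 dominates y2 iff the vector
    g = A (y2 - y1) is componentwise >= 0 and not identically zero.
    """
    d = [b - a for a, b in zip(y1, y2)]  # y2 - y1
    # g_i(y1, y2) = sum_j alpha_ij * (y2_j - y1_j)  (assumed form of g_i)
    g = [sum(a_ij * d_j for a_ij, d_j in zip(row, d)) for row in A]
    return all(v >= 0 for v in g) and any(v > 0 for v in g)
```

With the identity matrix this is plain Pareto dominance. With $A = \begin{pmatrix}1 & 0.5\\ 0.5 & 1\end{pmatrix}$, the Pareto-incomparable points (0, 5) and (5, 4) become comparable, since $A\,((5, 4) - (0, 5)) = (4.5, 1.5) \geq 0$.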
For the following proposition, recall from Proposition 4.12 the definition of the relation $\preceq_\alpha$ associated with the numbers $\alpha_{ij}$.
Proposition 4.13 Let $A$ be an $m \times m$ matrix with real entries and let $\preceq_\alpha$ be the partial order associated with $A$. Then ${\preceq_\alpha} = \{(y, y') \mid A(y' - y) \geq 0\}$ if and only if $A$ is invertible.
Proof Let $R := \{(y, y') \mid A(y' - y) \geq 0\}$.
Part 1: we show that ${\preceq_\alpha} = R \Rightarrow A$ is invertible by showing that the contrapositive of this statement is true, that is, $A$ is not invertible $\Rightarrow {\preceq_\alpha} \neq R$. So assume $A$ is not invertible. Then there exists $v \neq 0$ such that $A(v - 0) = 0$; in other words, $(0, v) \in R$. But clearly $(0, v) \notin {\prec_\alpha}$ and $(0, v) \notin \Delta$. Thus, $R \neq {\preceq_\alpha}$.
In general, also when $A$ is not invertible, we have ${\preceq_\alpha} \subseteq R$, since
$$R = \{(y, y') \mid 0 \prec_{\mathrm{Pareto}} A(y' - y)\} \cup \{(y, y') \mid 0 = A(y' - y)\},$$
this union consists of disjoint sets, the first set of the union is equal to $\prec_\alpha$, and the second set contains $\Delta$.
We now assume $A$ to be invertible and show that $R \subseteq {\preceq_\alpha}$. Given the decomposition above, this is equivalent to showing that $\{(y, y') \mid 0 = A(y' - y)\} \subseteq \Delta$. Suppose there exist $y$ and $y'$ such that $y \neq y'$ and $0 = A(y' - y)$. Then $y' - y$ is a non-zero vector in the kernel of $A$, so $A$ is not invertible, a contradiction. Thus, $R \subseteq {\preceq_\alpha}$. □
In the next proposition, we give a characterization of the matrices for which $\prec_\alpha$ is an extension of the Pareto order.
4 Many-Criteria Dominance Relations
Proposition 4.14 The strict partial order $\prec_\alpha$ is an extension of the Pareto order if and only if the entries $\alpha_{ij}$, $i, j \in [m]$, of the matrix $A$ are non-negative and each column of $A$ is non-zero.
Proof (a) Let $\alpha_{ij} \geq 0$ for all $i, j \in [m]$ and let each column of $A$ contain a strictly positive entry. We show ${\prec_{\mathrm{Pareto}}} \subseteq {\prec_\alpha}$: let $y \prec_{\mathrm{Pareto}} y'$, that is, $y'_i - y_i \geq 0$ for all $i \in [m]$ and $y' - y \neq 0$. As the entries of the matrix $A$ are non-negative and $y'_i - y_i \geq 0$, $i \in [m]$, we have $A(y' - y) \geq 0$. Secondly, since $y' - y \neq 0$, there exists a $j \in [m]$ with $y'_j - y_j > 0$, and since column $j$ of $A$ is non-zero, there is a $k \in [m]$ with $\alpha_{kj} > 0$; hence $(A(y' - y))_k \geq \alpha_{kj}(y'_j - y_j) > 0$. That is, $y \prec_\alpha y'$.
(b) Let ${\prec_{\mathrm{Pareto}}} \subseteq {\prec_\alpha}$. We show that the entries of $A$ are non-negative. Consider $0 \prec_{\mathrm{Pareto}} e^{(i)}$, where $e^{(i)} \in \mathbb{R}^m$ and $e^{(i)}_j = \delta_{ij}$, $i, j \in [m]$, with $\delta_{ij}$ the Kronecker delta. As ${\prec_{\mathrm{Pareto}}} \subseteq {\prec_\alpha}$, we get (for all $i \in [m]$) $A(e^{(i)} - 0) \geq 0$, i.e., the entries of the $i$th column of $A$ are non-negative. Moreover, again because of $\alpha$-dominance, one of the entries in this column is strictly greater than zero. □
Remark 4.16 In the sequel, we shall assume that the $\alpha_{ij}$ are non-negative and $\alpha_{ii} = 1$, though many results remain valid under weaker conditions on the $\alpha_{ij}$. The $g_i$-functions take the role of the $f_i$-functions in the Pareto dominance order: $g_i(y, y')$ is the relative fitness of two solutions on the $i$th objective. In the approach of Ikeda et al. [18], the $\alpha_{ij}$ are designed to be the trade-off rates between objectives $i$ and $j$. There is no explicit formula for calculating the $\alpha_{ij}$, and their choice is usually left to the designer. Higher values of the $\alpha_{ij}$ extend the order relation, so that dominance is more likely to occur.
By Proposition 4.13, for invertible $A$ the $\alpha$-dominance relation $\preceq_\alpha$ is described by a closed, convex cone (as it is an intersection of halfspaces). The shape of this cone is determined by the estimated trade-off rates. In the case of vanishing trade-off rates $\alpha_{ij} \approx 0$ ($i \neq j$), we get the special case of the Pareto dominance cone. We now introduce terminology for cones that are described as an intersection of halfspaces and for cones that are finitely generated. Subsequently, we show that these two descriptions are equivalent.
Definition 4.20
1. A set $C \subseteq \mathbb{R}^m$ is a finite polar cone if there is a $t \times m$ matrix $A$ such that $C = \{x \mid Ax \geq 0\}$.
2. A set $C \subseteq \mathbb{R}^m$ is a finitely generated cone if there is an $m \times s$ matrix $D$ such that $C = \{x \in \mathbb{R}^m \mid x = Dy \text{ for some } y \in \mathbb{R}^s,\ y \geq 0\}$; the column vectors of $D$ form a set of generators.
For a matrix $A$, we can find generators for the closed, convex cone $C_A = \{x \in \mathbb{R}^m \mid Ax \geq 0\}$. In case $A$ is invertible, we thereby find generators for the cone by which $\preceq_\alpha$ is determined.
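Proposition 4.14 gives a condition that is easy to check mechanically. The following sketch tests that condition and, as an informal sanity check rather than a proof, samples random Pareto-ordered integer pairs to see whether $\alpha$-dominance agrees. It assumes minimization and, as in Remark 4.15, that $y \prec_\alpha y'$ iff $A(y' - y) \geq 0$ with $A(y' - y) \neq 0$; all function names are hypothetical.

```python
import random


def extends_pareto(A):
    """Condition of Proposition 4.14: non-negative entries, no all-zero column."""
    m = len(A)
    nonneg = all(a >= 0 for row in A for a in row)
    cols_nonzero = all(any(A[i][j] != 0 for i in range(m)) for j in range(m))
    return nonneg and cols_nonzero


def alpha_dominates(y1, y2, A):
    """Matrix view: A (y2 - y1) >= 0 componentwise and not zero."""
    d = [b - a for a, b in zip(y1, y2)]
    g = [sum(a * dj for a, dj in zip(row, d)) for row in A]
    return all(v >= 0 for v in g) and any(v > 0 for v in g)


def pareto_dominates(y1, y2):
    return all(a <= b for a, b in zip(y1, y2)) and any(a < b for a, b in zip(y1, y2))


def spot_check(A, trials=2000, seed=0):
    """Return False as soon as a Pareto-ordered pair is not alpha-ordered."""
    rng = random.Random(seed)
    m = len(A)
    for _ in range(trials):
        y1 = [rng.randint(0, 5) for _ in range(m)]
        y2 = [v + rng.randint(0, 2) for v in y1]  # y2 >= y1 componentwise
        if pareto_dominates(y1, y2) and not alpha_dominates(y1, y2, A):
            return False
    return True
```

For instance, a negative trade-off entry as in $\begin{pmatrix}1 & -1\\ 0 & 1\end{pmatrix}$, or an all-zero column, fails the condition, and the sampling quickly finds a Pareto-ordered pair that is not $\alpha$-ordered.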
We begin with a theorem (Weyl's Theorem) which, given generators for a cone, yields a representation of this cone as an intersection of halfspaces. This theorem also plays a crucial role in obtaining the generators of a cone that is described as an intersection of halfspaces (Minkowski's Theorem).
Theorem 4.1 (Weyl's Theorem for Cones) Let $E$ be an $m \times s$ matrix with real entries and let $\mathcal{E} = \{x \in \mathbb{R}^m \mid \exists y \in \mathbb{R}^s \text{ such that } y \geq 0 \text{ and } x = Ey\}$. Then there exists a $t \times m$ matrix $F$ such that $\mathcal{E} = \{x \in \mathbb{R}^m \mid Fx \geq 0\}$. In other words, for each finitely generated, convex cone there is a matrix $F$ such that the cone is representable in the form $\{x \mid Fx \geq 0\}$.
Proof Apply Fourier–Motzkin elimination to the consistent system $x - Ey = 0$, $y \geq 0$. □
We now show how to obtain the generators of a cone that is represented as an intersection of halfspaces.
Definition 4.21 (Dual of a set $S$) Let $S$ be a non-empty subset of $\mathbb{R}^s$. Then the dual of $S$, denoted $S^*$, is the set $\{x \in \mathbb{R}^s \mid x^T z \geq 0\ \ \forall z \in S\}$.
Lemma 4.2 Let the cone $\mathcal{D}$ be a finite polar cone. Then $(\mathcal{D}^*)^* = \mathcal{D}$.
Proof For the proof of the statement, see page 167 in Dimitri et al. [3]. □
Remark 4.17 The above Lemma holds for any closed, convex cone. For an arbitrary non-empty set $S$, $S^*$ is the dual of $S$; in case $S$ is finite, $S^*$ is a finite polar cone. Clearly, $\mathcal{D}$ in the above Lemma is closed and convex, as it is an intersection of halfspaces. The above Lemma holds for any finite polar cone, also if it is defined by a zero matrix. In that case, the associated cone is trivial, that is, the whole space $\mathbb{R}^s$.
Let $A$ be a matrix with real entries of size $u \times v$:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1v} \\ a_{21} & a_{22} & \cdots & a_{2v} \\ \vdots & \vdots & \ddots & \vdots \\ a_{u1} & a_{u2} & \cdots & a_{uv} \end{pmatrix}, \quad \text{then} \quad A^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{u1} \\ a_{12} & a_{22} & \cdots & a_{u2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1v} & a_{2v} & \cdots & a_{uv} \end{pmatrix},$$
and let
$$\mathcal{A} := \{x \in \mathbb{R}^u \mid x = Ay,\ y \in \mathbb{R}^v,\ y \geq 0\}, \qquad \mathcal{B} := \{x \in \mathbb{R}^u \mid A^T x \geq 0\}.$$
Then the following proposition holds.
Proposition 4.15 $\mathcal{A}^* = \mathcal{B}$ and $\mathcal{A} = \mathcal{B}^*$.
In other words, $\mathcal{A}$ and $\mathcal{B}$ are each other's dual.
Theorem 4.2 (Minkowski's Theorem for Cones) Let the cone $\mathcal{E}$ be representable as a finite intersection of halfspaces: $\mathcal{E} = \{x \in \mathbb{R}^s \mid Ex \geq 0\}$, where $E$ is a matrix (say of size $t \times s$) with real entries. Then $\mathcal{E}$ is also finitely generated.
Proof Define $\mathcal{D} := \{x \in \mathbb{R}^s \mid x = E^T y,\ y \geq 0\}$. By Proposition 4.15, we have $\mathcal{E} = \mathcal{D}^*$. The cone $\mathcal{D}$ is finitely generated, and according to Weyl's Theorem, there exists a matrix $D$ of size $l \times s$, for some $l$, such that $\mathcal{D}$ is represented as an intersection of halfspaces: $\mathcal{D} = \{x \in \mathbb{R}^s \mid Dx \geq 0\}$. (The matrix can be obtained via Fourier–Motzkin elimination.) Define the auxiliary cone $\mathcal{C} = \{x \in \mathbb{R}^s \mid x = D^T y,\ y \geq 0\}$. Again by Proposition 4.15, we have $\mathcal{C} = \mathcal{D}^*$. Thus, $\mathcal{E} = \mathcal{C}$. Hence, $\mathcal{E}$ is finitely generated (and the column vectors of $D^T$ form a set of generators). □
Remark 4.18 The proof was designed so as to give rise to a naive (not an efficient) algorithm for computing generators of the cone $\mathcal{E}$. Let the matrix $E$ give rise to a representation of $\mathcal{E}$ as a finite intersection of halfspaces (a finite polar cone). Then proceed as follows. Take the transpose of $E$ and consider $\mathcal{D} := \{x \in \mathbb{R}^s \mid x = E^T y,\ y \geq 0\}$. The cone $\mathcal{D}$ is finitely generated, so by applying Fourier–Motzkin elimination to the system $x = E^T y$, $y \geq 0$, we obtain a matrix $D$ by which the cone is representable as a finite intersection of halfspaces, $\mathcal{D} = \{x \in \mathbb{R}^s \mid Dx \geq 0\}$. Taking the transpose of $D$ yields a set of generators for $\mathcal{E}$ as the columns of $D^T$. This algorithm does not in general compute a minimal set of generators for $\mathcal{E}$. There are algorithms which produce a minimal set of generators (the extremal rays of the cone); a variation of the Double Description method achieves this goal, see Fukuda et al. [17].
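The naive procedure of Remark 4.18 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names `_primitive`, `fourier_motzkin`, and `generators_of_polar_cone` are hypothetical, exact rational arithmetic via `fractions.Fraction` is our choice, and rows are only deduplicated up to positive scaling (no further redundancy removal), so the number of inequalities can grow quickly with the size of `E`. It needs Python 3.9+ for `math.lcm`.

```python
from fractions import Fraction
from math import gcd, lcm  # math.lcm needs Python 3.9+


def _primitive(row):
    """Rescale a rational row by a positive factor to a primitive integer tuple."""
    denom = lcm(*(Fraction(c).denominator for c in row))
    ints = [int(Fraction(c) * denom) for c in row]
    g = gcd(*ints)
    return tuple(v // g for v in ints) if g else tuple(ints)


def fourier_motzkin(rows, k):
    """Eliminate variable k from the homogeneous system {r . z >= 0 : r in rows}."""
    if not rows:
        return []
    zero = (0,) * len(rows[0])
    out = {r for r in rows if r[k] == 0}
    for p in (r for r in rows if r[k] > 0):
        for n in (r for r in rows if r[k] < 0):
            # Non-negative combination of p and n in which variable k cancels.
            out.add(_primitive([-n[k] * pi + p[k] * ni for pi, ni in zip(p, n)]))
    out.discard(zero)
    return list(out)


def generators_of_polar_cone(E):
    """Generators of {x : E x >= 0} via the naive route of Remark 4.18.

    E is a t x s matrix given as a list of rows. The result is a set of
    primitive integer direction vectors; it need not be minimal.
    """
    t, s = len(E), len(E[0])
    rows = []
    # Variables z = (x_1..x_s, y_1..y_t); encode x - E^T y = 0 as two inequalities.
    for i in range(s):
        eq = [Fraction(0)] * (s + t)
        eq[i] = Fraction(1)
        for j in range(t):
            eq[s + j] = -Fraction(E[j][i])
        rows.append(eq)
        rows.append([-c for c in eq])
    for j in range(t):  # y_j >= 0
        nonneg = [Fraction(0)] * (s + t)
        nonneg[s + j] = Fraction(1)
        rows.append(nonneg)
    rows = [_primitive(r) for r in rows]
    # Project out y: the remaining rows form D with {E^T y : y >= 0} = {x : D x >= 0}.
    for k in range(s + t - 1, s - 1, -1):
        rows = fourier_motzkin(rows, k)
    # By Minkowski's Theorem, the rows of D (columns of D^T) generate {x : E x >= 0}.
    return {r[:s] for r in rows}
```

For the trade-off matrix $E = \begin{pmatrix}1 & 1/2\\ 1/2 & 1\end{pmatrix}$ the sketch returns the directions (2, −1), (−1, 2), (1, 0), and (0, 1). The last two are redundant, since $(1, 0) = \tfrac{2}{3}(2, -1) + \tfrac{1}{3}(-1, 2)$ and symmetrically for (0, 1), which illustrates that the naive algorithm does not in general return a minimal set of generators.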
In case $A$ is invertible, the two aforementioned representations of the cone that defines the relation $\preceq_\alpha$ can be converted into each other. More specifically, if $\alpha_{ii} = 1$, $i \in [m]$, and $A$ is invertible with non-negative entries that are interpreted as trade-offs, we obtain the connection between generator-based and trade-off-based representations of the cone of $\preceq_\alpha$. For low values of $m$, a fortiori when $m = 2$, it is easy to find the different representations of the cone geometrically.
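For invertible $A$, this interchange can be made explicit: $\{d \mid Ad \geq 0\} = \{A^{-1}u \mid u \geq 0\}$, so the columns of $A^{-1}$ are generators of the cone (and, being linearly independent, a minimal set). The following sketch computes them with exact Gauss–Jordan elimination; the helper names `inverse` and `alpha_cone_generators` are hypothetical.

```python
from fractions import Fraction


def inverse(A):
    """Gauss-Jordan inverse of a square matrix with exact rational arithmetic."""
    m = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(A)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[m:] for row in M]


def alpha_cone_generators(A):
    """Generators of {d : A d >= 0} for invertible A: the columns of A^{-1}."""
    inv = inverse(A)
    m = len(A)
    return [tuple(inv[i][j] for i in range(m)) for j in range(m)]
```

For $A = \begin{pmatrix}1 & 1/2\\ 1/2 & 1\end{pmatrix}$ the generators are $(4/3, -2/3)$ and $(-2/3, 4/3)$, positive multiples of $(2, -1)$ and $(-1, 2)$; moving along either generator keeps one row of $A d$ at zero, which is exactly where the trade-off constraints become tight.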
ε-dominance and Cone ε-dominance
Laumanns et al. proposed the concept of Pareto ε-dominance [23]. There is an additive and a multiplicative version of this concept; here we introduce only the additive version:
Definition 4.22 Let $y, y' \in \mathbb{R}^m$ and $\varepsilon \in \mathbb{R}$ with $\varepsilon > 0$. Then $y \prec_\varepsilon y' :\Leftrightarrow \forall i \in \{1, \ldots, m\}: y_i - \varepsilon \leq y'_i$.
For objective functions with bounded ranges, optimizing based on ε-dominance creates a finite set of non-dominated solutions that are placed at a distance from each other. If the decision-maker wants to maintain a set of at most $T$ non-dominated solutions and $K$ denotes a constant such that $0 \leq f_i \leq K$, $\forall i \in \{1, \ldots, m\}$, then $\varepsilon$ can be adjusted to $\varepsilon = (K/T)^{1/(m-1)}$. An alleged disadvantage of ε-dominance is that certain regions of the Pareto front with steep trade-offs are underrepresented. Batista et al. [1] proposed Pareto cone ε-dominance to improve the way solutions are distributed compared with Pareto ε-dominance. To this end, they propose to use a polyhedral cone given by an $m \times m$ generator matrix and exemplify their approach in two dimensions, while some of the concepts are also introduced for higher dimensions.
Angle Dominance
Liu et al. defined angle dominance [26]. For each point $y$, a point-dependent "cone" is constructed as follows. For each $i$ ($i = 1, \ldots, m$), a point $P^{(i)} := (0, \ldots, 0, v_i, 0, \ldots, 0)^T \in \mathbb{R}^m$ is introduced: all coordinates of $P^{(i)}$ are zero except the $i$-th coordinate, which is derived from the worst point and a parameter $k > 0$. (Recall that the worst point $w$, which in general is not the nadir point, is the point whose $i$-th coordinate is $w_i := \sup\{f_i(x) \mid x \in X\}$, where $f_i$ is the $i$-th objective, $i = 1, \ldots, m$, and $X$ is the search space.) Using the parameter $k$, one defines $P^{(i)} := (0, \ldots, 0, k w_i, 0, \ldots, 0)^T$. The second ingredient is the ideal point (or, if needed, the utopian point), denoted by $z^{\text{ideal}}$. Then to a point $y$ they associate $m$ angles $(\alpha_1, \ldots, \alpha_m)$, where the cosine of $\alpha_i$ is
$$\cos(\alpha_i) = \frac{(P^{(i)} - y) \cdot (P^{(i)} - z^{\text{ideal}})}{|P^{(i)} - y|\,|P^{(i)} - z^{\text{ideal}}|},$$
with the inner product in the numerator. The angle dominance relation is defined as follows.
Definition 4.23 (Angle Dominance) $y \prec_{\text{angle}} y' :\Leftrightarrow \forall i \in \{1, \ldots, m\}: \alpha_i \leq \alpha'_i$ and $\exists i \in \{1, \ldots, m\}: \alpha_i < \alpha'_i$, where $\alpha_i$ are the $m$ angles associated with $y$ and $\alpha'_i$ are the $m$ angles associated with $y'$.
Trivially, for any set $S$ and any function $G: S \to \mathbb{R}^m$, we obtain an irreflexive, asymmetric (hence also antisymmetric), and transitive binary relation on $S$ as follows: for $s_1, s_2 \in S$, $s_1 \prec_G s_2 :\Leftrightarrow \forall i \in [m]: G_i(s_1) \leq G_i(s_2)$ and $\exists j \in [m]: G_j(s_1) < G_j(s_2)$. (A constant $G$ gives rise to the empty relation.) Using this definition with $S = [z^{\text{ideal}}, z^{\text{worst}}]$ and $G$ associating to each point in $[z^{\text{ideal}}, z^{\text{worst}}]$ the
m-tuple of angles as in Definition 4.23, we see that angle dominance is an irreflexive, asymmetric, and transitive binary relation on $[z^{\text{ideal}}, z^{\text{worst}}]$; in other words, it is a strict partial order. Moreover, for $k \geq 1$, angle dominance is a Pareto order extension. For $0 < k < 1$, angle dominance is not a Pareto order extension, and for all $k$, the larger $k$, the smaller the hypervolume dominated by a point. Clearly, one can think of the $m$ angles associated with a point as a cone, albeit in general different cones are associated with different points. In this approach, the angles are computed and do not have to be provided by the decision-maker, leaving only one parameter to be provided, namely $k$. However, there is limited control over the dynamics of the algorithm (in NSGA-II, Pareto-dominance-based sorting is replaced by angle-dominance-based sorting), and there is the danger that extreme angles might occur in the case of degenerate problems. On typical many-criteria optimization benchmark problems, however, the reported results achieved with this order relation are very good.
Control Dominance Area of Solutions (CDAS)
Sato et al. proposed an approach to control the dominance area of solutions (CDAS) [32]. In CDAS, the objective values are modified, and the $i$-th objective value of $x$ after modification is defined as
$$\hat{f}_i(x) = \frac{r \cdot \sin(\omega_i + S_i \cdot \pi)}{\sin(S_i \cdot \pi)},$$
where $r$ is the norm of $f(x)$ and $\omega_i$ is the declination angle between $f(x)$ and the $i$-th coordinate axis. See Fig. 4.3 for an illustration of the geometric concepts (to illustrate CDAS, the problem is maximized). The degree of expansion or contraction of the dominance area of solutions can be controlled by the user-defined parameter $S$: $\hat{f}_i(x) > f_i(x)$ when $S_i < 0.5$; in case $S_i = 0.5$, $f_i(x)$ does not change; and when $S_i > 0.5$, $\hat{f}_i(x) < f_i(x)$. Thus, by decreasing or increasing $S$, the dominance area of solutions expands or contracts, respectively.
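Returning to angle dominance (Definition 4.23), the construction of Liu et al. can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes minimization, takes the ideal and worst points as given inputs, and assumes $y \neq P^{(i)}$ and $z^{\text{ideal}} \neq P^{(i)}$ so that no norm vanishes. The names `angles` and `angle_dominates` are hypothetical.

```python
import math


def angles(y, ideal, worst, k):
    """The m angles associated with point y in angle dominance (sketch)."""
    m = len(y)
    out = []
    for i in range(m):
        P = [0.0] * m
        P[i] = k * worst[i]  # P^(i) = (0, ..., 0, k * w_i, 0, ..., 0)
        to_y = [p - c for p, c in zip(P, y)]          # P^(i) - y
        to_ideal = [p - z for p, z in zip(P, ideal)]  # P^(i) - z_ideal
        c = sum(a * b for a, b in zip(to_y, to_ideal))
        c /= math.hypot(*to_y) * math.hypot(*to_ideal)
        out.append(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise
    return out


def angle_dominates(y, yp, ideal, worst, k=1.0):
    """y angle-dominates y' iff its angles are all <= and at least one is <."""
    a, b = angles(y, ideal, worst, k), angles(yp, ideal, worst, k)
    return all(x <= z for x, z in zip(a, b)) and any(x < z for x, z in zip(a, b))
```

With ideal point (0, 0), worst point (1, 1), and $k = 1$, the point (0.2, 0.2) gets angles of about 14° and (0.5, 0.5) gets 45° on both axes, so the former angle-dominates the latter, while (0, 0.5) and (0.5, 0) remain incomparable.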
Note also that CDAS orders are in fact the aforementioned orders based on edge-rotated cones in disguise. Figure 4.4 shows examples of the CDAS binary relations one gets in the case of two objectives: the left picture is created for values of the $S_i$ smaller than $\frac{1}{2}$ and the right picture for values of $S_i$ bigger than $\frac{1}{2}$. Consult the Appendix for these figures.
Fig. 4.3 Illustration of geometrical concepts in CDAS dominance. Left: The declination angle for $f_1$ is denoted by $\omega_1$; the new coordinate which replaces $y_1$ is $y_1'$, and it occurs at an angle of $\varphi_1 = S_1 \times \pi$; for the other coordinates, $\varphi_i = S_i \times \pi$. Right: Dominating regions for two points $y$ (dark gray) and $z$ (light gray). Clearly, $z$ dominates $y$.
Fig. 4.4 The left picture represents an example of Pareto-incomparable points which become comparable in the CDAS order. The points A and B are incomparable in the Pareto order (for there is no inclusion between the translated Pareto cones of A and B). The convex, pointed cone emanating downward from the corner A (in the picture, this cone is determined by the blue halflines emanating from A) and the convex, pointed cone emanating from the corner B (determined by the red halflines) obey an inclusion; therefore, A and B become comparable in this cone order. Note that the "blue line" cone and the "red line" cone are translations of each other and the corresponding blue and red halflines are parallel. In this example, $S_1 = S_2 \leq \frac{1}{2}$, and thus $\varphi_1 = \varphi_2$. In case all $S_i \leq \frac{1}{2}$, the relation is an extension of the Pareto order. The right picture shows a two-objective example of Pareto-comparable points which become incomparable in the CDAS order. In other words, this is an example showing that in case some $S_i > \frac{1}{2}$, the CDAS relation is not an extension of the Pareto order. Also in this example $S_1 = S_2$, and thus $\varphi_1 = \varphi_2$, but the angle is obtuse as $S_1 = S_2 > \frac{1}{2}$. The points A' and B' are comparable in the Pareto order (for there is an inclusion relation between the translated Pareto cones of A' and B'). The convex, pointed cone emanating downward from the corner A' (determined by the blue halflines emanating from A') and the convex, pointed cone emanating from the corner B' (determined by the red halflines) do not obey an inclusion; therefore, A' and B' are incomparable in the CDAS cone order. Again, the "blue line" cone and the "red line" cone are translations of each other and the corresponding blue and red halflines are parallel.
The example in the left picture of Fig. 4.4 is a case where points that are incomparable in the Pareto order are comparable in the CDAS order. This is not surprising, because in case the angles $\varphi_i$, $i = 1, \ldots, m$, are acute, the cone by which pairs of points are compared is larger than the Pareto cone. In the example in the right picture of Fig. 4.4, we have points comparable in the Pareto order which are incomparable in the CDAS order (in this case, the angles $\varphi_i$ are obtuse, giving rise to a cone which is smaller than the Pareto cone). Note that if some $S_i > \frac{1}{2}$ (equivalently, if some of the angles $\varphi_i$ are obtuse), the CDAS order is not an extension of the Pareto order.
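The CDAS transformation can be sketched as below, following the formula $\hat{f}_i(x) = r \sin(\omega_i + S_i\pi)/\sin(S_i\pi)$ and the maximization convention used for the figures. The sketch assumes non-negative objective vectors with $f(x) \neq 0$, so that $r > 0$ and the declination angles $\omega_i = \arccos(f_i(x)/r)$ are well defined, and it applies ordinary Pareto dominance (for maximization) to the transformed vectors. The function names are hypothetical.

```python
import math


def cdas_transform(f, S):
    """CDAS-modified objective vector: r * sin(omega_i + S_i*pi) / sin(S_i*pi)."""
    r = math.hypot(*f)  # Euclidean norm of f(x); assumed > 0
    out = []
    for fi, si in zip(f, S):
        omega = math.acos(max(-1.0, min(1.0, fi / r)))  # declination angle to axis i
        phi = si * math.pi
        out.append(r * math.sin(omega + phi) / math.sin(phi))
    return out


def cdas_dominates(fa, fb, S):
    """Maximization: Pareto dominance applied to the CDAS-transformed vectors."""
    ga, gb = cdas_transform(fa, S), cdas_transform(fb, S)
    return all(x >= y for x, y in zip(ga, gb)) and any(x > y for x, y in zip(ga, gb))
```

In two dimensions with $S_1 = S_2 = 0.25$ the transform reduces to $\hat{f}_1 = \hat{f}_2 = f_1 + f_2$, so (3, 1) comes to dominate the Pareto-incomparable (1, 2). With $S_1 = S_2 = 0.75$ it reduces to $\hat{f}_1 = f_1 - f_2$ and $\hat{f}_2 = f_2 - f_1$, so the Pareto-comparable (3, 2) and (1, 1.5) become incomparable, mirroring the two pictures of Fig. 4.4.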
4.4.3 Volume- and Area-Based Order Relations
Volume Dominance
The concept of volume dominance was proposed by Le and Landa-Silva [24]. This form of dominance is based on the volume of the objective space that a solution dominates. Let $r$ denote a reference point in objective space, chosen by the user such that, ideally, it is dominated by all relevant points in objective space that occur in the search process. Then the dominated hypervolume [44] of a point $y$ is given by $HV(\{y\}) = \prod_{i=1}^{m} (r_i - y_i)$. The shared dominated volume of two points $y$ and $y'$ is given as $SHV(y, y') = \prod_{i=1}^{m} (r_i - \max\{y_i, y'_i\})$. It is said that, for a ratio $r_{SHV}$ (a threshold parameter), the point $y$ dominates the point $y'$ iff either [$HV(y') = SHV(y, y')$ and $HV(y) > HV(y')$] or [$HV(y) > HV(y')$ and $\frac{HV(y) - HV(y')}{SHV(y, y')} > r_{SHV}$]. Pareto dominance is obviously preserved, but a point can also dominate another point if it exclusively dominates sufficiently more volume than the other point.
Grid Dominance and ε-MOEA
Yang et al. proposed a grid dominance relation [40] in the grid-based evolutionary algorithm (GrEA). Grid dominance adds selection pressure by adopting an adaptive grid construction. A similar idea is used in the classical ε-MOEA algorithm [9]. Here, a grid is spanned over the objective space and only one solution is allowed per grid cell. If there are two solutions in the same grid cell, the Pareto-dominated one is removed or, in case they are Pareto incomparable, the one that is further away from the lower corner of the grid cell is discarded.
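The volume-dominance test of Le and Landa-Silva described above can be written down directly from $HV$, $SHV$, and the threshold $r_{SHV}$. This minimal sketch assumes minimization and points that are strictly better than the reference point $r$ in every coordinate; where $SHV = 0$ the ratio is undefined, and the sketch then simply reports no dominance (our assumption). The names are hypothetical.

```python
from math import prod


def hv(y, r):
    """Hypervolume of the box between point y and reference point r."""
    return prod(ri - yi for yi, ri in zip(y, r))


def shv(y, yp, r):
    """Shared dominated volume of y and y'."""
    return prod(ri - max(a, b) for a, b, ri in zip(y, yp, r))


def volume_dominates(y, yp, r, r_shv):
    """Volume dominance of y over y' with threshold ratio r_shv (sketch)."""
    hy, hyp, s = hv(y, r), hv(yp, r), shv(y, yp, r)
    if hy > hyp and hyp == s:
        return True  # the box of y' lies inside the box of y: Pareto case
    # Undefined ratio for s == 0 is treated as "no dominance" (assumption).
    return hy > hyp and s > 0 and (hy - hyp) / s > r_shv
```

With $r = (10, 10)$, the Pareto-incomparable points (1, 5) and (5, 2) have $HV = 45$ and $40$ and $SHV = 25$, so (1, 5) volume-dominates (5, 2) exactly when $r_{SHV} < 0.2$.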
4.4.4 Preference-Information and Utility Functions
Preference information can be expressed by a preference region or a preference point when it is integrated into the search algorithm. The preference region, or region of interest (ROI), can be explicitly defined by desirability functions [14, 25, 37], density functions [6], or a Gaussian on a hyperplane [30]. In some cases,
a direction in the objective space is given to direct the search. An example would be a preference-based approach proposed by Karahan and Köksalan, which assigns different sizes of territories to preferred and non-preferred regions [21]. A reference point is sometimes provided by the decision-maker (DM) as preference information, and this point could also be the knee point. A knee point is a point on the Pareto front for which a small improvement in any objective would lead to a large deterioration in at least one other objective. Knee points are considered the most interesting solutions for the DM in the absence of explicitly provided preferences [8]. R-NSGA-II [10] replaces the crowding distance by a weighted Euclidean distance to the reference point, so that a set of Pareto optimal solutions near a supplied set of reference points can be found. Branke [5] also modifies the second criterion in NSGA-II and replaces the crowding distance with either an angle-based measure or a utility-based measure. The angle-based method calculates the angle between an individual and its two neighbors in the objective space. The smaller the angle, the more clearly the individual can be classified as a knee point. In the utility-based method, a marginal utility function is suggested to approximate the angle-based measure in the case of more than two objectives. The larger the external angle between a solution and its neighbors, the larger the gain in terms of linear utility obtained from substituting the neighbors with the solution of interest. It is worth mentioning that the MCDM community has created a plethora of classes of utility functions, many of which are not volume related. Among others, there are lexicographic preferences, ordered weighted averages, and max-orderings; financial economics is known for its use of quadratic utility functions, giving rise to general quadratic utility functions.
Also, some of the MCDM approaches deliberately focus on the reduction of the Pareto set.
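The angle-based knee measure mentioned above can be illustrated for two objectives. This is a simplified sketch under our own assumptions, not Branke's implementation [5]: it takes a mutually non-dominated 2-D front, sorts it by the first objective, and measures at each inner point the angle between the segments to its two neighbors; boundary points receive no value. The name `knee_angles` is hypothetical.

```python
import math


def knee_angles(front):
    """Angle (degrees) at each inner point of a 2-D front between its neighbors."""
    pts = sorted(front)  # sort by the first objective
    result = {}
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        ax, ay = prev[0] - cur[0], prev[1] - cur[1]
        bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
        # atan2(|cross|, dot) is numerically safer than acos(dot / norms).
        angle = math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by)
        result[tuple(cur)] = math.degrees(angle)
    return result
```

For the front (0, 3), (1, 1), (3, 0), the middle point gets about 143.1°, whereas a point on a straight segment, such as (1, 1) in (0, 2), (1, 1), (2, 0), gets 180°; the smaller angle marks the more pronounced knee.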
4.5 Discussion and Comparison
In this chapter, we discussed ten dominance relations. Liu et al. [26] also discuss these dominance relations, except for fuzzy k-dominance. Moreover, Liu et al. [26] used a modest list of properties that the dominance relations may or may not have and created Table 4.1. Uniformity and extensity¹ are properties defined in a rather informal way. In Sayin [34], coverage, uniformity (and cardinality) are defined. Uniformity (aka evenness) refers to the property that a dominance relation promotes a uniform distribution and spacing of the solutions. Extensity refers to the property of capturing the entire Pareto front. We here report the results of Liu et al. and note that a more formal treatment of these properties would be desirable in future work. Besides, the main axiomatic properties are discussed, particularly those that constitute a strict partial order. In their paper, Liu et al. [26] also compare the order relations empirically. The table shows that something is known about how the proposed order relations relate to each other, but the list of relevant properties might not be complete, and uniformity and extensity require a formal definition. Moreover, general properties of such metrics must be viewed in combination with the specific application setting. Only then can it be decided which of the order extensions are suitable for the many-objective optimization problem at hand.
¹ Very often, no distinction is made between extensity and coverage (aka spread).
Table 4.1 Properties of dominance order relations according to Liu et al. [26]. A "✓" in a cell indicates that the dominance relation has the corresponding property. S.P.O. stands for strict partial order.
Columns: Order relation | Uniformity | Extensity | Irreflexive | Asymmetric | Transitive | S.P.O.
Rows: Pareto-dominance; favor relation; (k-1)-dominance; α-dominance; Cone ε-dominance; ε-dominance; Angle dominance; CDAS; Volume-dominance; Grid-dominance
It is worth mentioning that very recently Saxena et al. [33] introduced an interesting extension of the Pareto order called localized high fidelity domination (lhf-domination for short). For a solution $x$ to be better than a solution $y$, firstly, the number of objectives in which $x$ is better than $y$ must exceed the number of objectives in which $y$ is better than $x$, and, secondly, the weighted gain in the objectives in which $x$ is better must exceed the weighted loss in the objectives in which $x$ is worse than $y$. The weights can also be derived from a reference vector: in this case, the weights are the multiplicative inverses of the coordinates of the reference vector, which makes it possible to gauge the gains and losses with respect to the reference values.
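The two conditions described for lhf-domination can be sketched as follows. This is only our reading of the description above (minimization; equal weights unless weights are supplied, e.g., the multiplicative inverses of a reference vector's coordinates); it does not capture every aspect of the method of Saxena et al. [33], and the names `lhf_better` and `reference_weights` are hypothetical.

```python
def lhf_better(x, y, w=None):
    """Check the two lhf-domination conditions described above (minimization)."""
    m = len(x)
    if w is None:
        w = [1.0] * m  # equal weights unless a weight vector is supplied
    better = [i for i in range(m) if x[i] < y[i]]
    worse = [i for i in range(m) if x[i] > y[i]]
    gain = sum(w[i] * (y[i] - x[i]) for i in better)  # weighted gain of x
    loss = sum(w[i] * (x[i] - y[i]) for i in worse)   # weighted loss of x
    return len(better) > len(worse) and gain > loss


def reference_weights(ref):
    """Weights as multiplicative inverses of a reference vector's coordinates."""
    return [1.0 / r for r in ref]
```

For example, (1, 1, 3) is better than (2, 2, 2.5) with equal weights (gain 2 against loss 0.5), but not with weights derived from the reference vector (10, 10, 1), where the gain shrinks to 0.2.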
4.6 Open Questions
In this summary, we discuss some topics of future research in the context of alternative dominance relations. To this end, we highlight different open questions and the state of the art with respect to these questions.
Q1: The need for an updated comprehensive study of order relations and the need for more refined definitions. In this chapter, we summarized a few alternative order relations; they can be grouped by their main ideas, such as counting-based dominance, trade-off constraints/cone dominance, dominated volume/area, grid-based dominance, and ε-dominance. The questions arise whether fundamental ideas are missing and, secondly, whether
refinements of the concepts are needed to achieve better theoretical properties or to remove parameters and thereby simplify the analysis and application of the order relation.
Q2: How do relevant theoretical properties of order relations for many-criteria optimization compare among these different orders? It will be important that the order relations which have been proposed as alternatives to the Pareto dominance relation are compared not only to the Pareto dominance relation itself but also among each other. We discussed ten dominance relations in this chapter (nine of which are also discussed by Liu et al. [26]); we added fuzzy k-dominance. The comparison provided by Liu et al. [26], which we summarized and extended in this chapter, can be seen as a first step in this direction. The comparison was based on the concepts of uniformity, convergence, and extensity, and on a few selected axiomatic properties of binary relations. It will be meaningful to extend this work further and also consider additional properties, such as the ease of choosing parameters (e.g., angle, k-value, etc.) and how useful relaxed forms of transitivity are, e.g., whether or not they offer a meaningful definition of minimal elements. In addition, in view of the existence of a plethora of indicators with very fine-tuned distinctions, commonly classified as convergence, cardinality, and diversity (with subclasses distribution and coverage) indicators, it will be useful to subject the different alternatives to Pareto dominance to a study that takes this very rich, diverse approach to quality assessment into account and compares them with each other.
Q3: Once the order relation is fixed, how to design algorithms that efficiently find/approximate minimal sets? The question of what is the best extension of the Pareto dominance order might remain a matter of subjective debate. But once a DM has selected an order relation that suits her, there is a need for algorithms that are able to converge efficiently to the set of minimal solutions. The state of the art in the literature is that algorithms for finding minimal solutions are often analyzed only by the authors who suggested the order relation, using slightly modified off-the-shelf algorithms such as NSGA-II, and there are usually no in-depth studies of how convergence is affected by the properties of the new order relation and how algorithms can be improved. Moreover, one might also reconsider the performance indicators, bearing in mind the preference the DM has expressed by choosing a particular order relation.
Q4: Is it useful to change the order relations during optimization runs? An order relation has been introduced as a fixed binary relation. Alternatively, one might consider other ways to perform selection and steer progress that do not require the order relation to be fixed.
• Interactive methods: the DM might change the order relation during the run based on learning what possibilities are available. This is addressed, for example, in interactive methods.
4 Many-Criteria Dominance Relations
• Alternating between different orderings to achieve faster convergence: this is based on the observation that algorithms that emphasize diversity typically do not converge very well [19]. Wang et al. [38], for instance, proposed switching between an extension of the Pareto order and the Pareto dominance order, depending on whether the search is focusing on convergence or diversity.
• Variable ordering relations: one option in algorithm design is to change the order relation during search, for instance, depending on the position in the search space or on time. From a more theoretical perspective, the resulting phenomena have been analyzed by Eichfelder [13], who describes order relations with cones that change their shape depending on the position in the objective space. The aforementioned angle-dominance relation can be viewed as an example of a variable ordering relation.

Q5: Ranking methods versus binary relations. In this chapter, we focused mainly on orders that compare two points. Alternatively, one might rank an entire population and assess the merit of each individual relative to the others in that population. Statistical ranking methods have a long tradition in the MCDM literature and often readily extend to higher numbers of criteria. A comprehensive overview is beyond the scope of this chapter, but typical examples are the average ranking (AR) and maximum ranking (MR) preference ordering relations proposed in [2]. The solutions of a population are sorted with respect to each objective, which yields one rank per objective for each solution. The AR value of a solution is calculated by summing its ranks over all objectives, whereas the MR relation takes the best of these ranks as the global rank of the solution. AR lacks a diversity maintenance mechanism, and MR emphasizes solutions that perform well in some objectives. The winning score (WS) was proposed by Maneeratana et al. [27]; it is calculated heuristically from the number of superior and inferior objectives between a pair of non-dominated individuals in a population. Despite these various proposals, a systematic study of ranking methods in the context of many-criteria optimization is still missing.

Q6: Portfolio and set-preference methods. Order relations and utility functions can also be defined directly on the level of sets of solutions. In many-objective optimization, the question could be which set to present to the decision-maker and, in a pairwise comparison, whether or not set A is preferred to set B. Note that comparing two sets can itself result in a binary order relation. The general idea of how to use order relations on sets in evolutionary multicriteria optimization is outlined in Zitzler et al. [43], but to our knowledge, there is no further work that discusses the particular complications arising in many-criteria optimization. Yevseyeva et al. [41] discuss how the Sharpe indicator from financial portfolio management can be used to compare sets of solutions based on diversity and Pareto convergence.
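The AR and MR relations described under Q5 can be sketched in a few lines. This is a minimal illustration, not the formulation of [2]: it assumes minimisation, assigns rank 1 to the best value on each objective, and lets tied solutions share a rank (1 plus the number of strictly better solutions); the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def objective_ranks(population: List[Vector]) -> List[List[int]]:
    """Per-objective ranks of each solution (minimisation, rank 1 is best).

    Assumed tie rule: the rank on objective j is 1 + the number of solutions
    that are strictly better on j, so tied solutions share a rank.
    """
    m = len(population[0])
    return [
        [1 + sum(other[j] < sol[j] for other in population) for j in range(m)]
        for sol in population
    ]


def average_ranking(population: List[Vector]) -> List[int]:
    """AR value: sum of the per-objective ranks (lower is better)."""
    return [sum(ranks) for ranks in objective_ranks(population)]


def maximum_ranking(population: List[Vector]) -> List[int]:
    """MR value: best (smallest) per-objective rank (lower is better)."""
    return [min(ranks) for ranks in objective_ranks(population)]


if __name__ == "__main__":
    pop = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0)]
    print(average_ranking(pop))  # [5, 4, 5, 6]
    print(maximum_ranking(pop))  # [1, 2, 1, 3]
```

In this toy population AR prefers the balanced solution (2, 2), while MR prefers the two extreme solutions, which mirrors the remark that MR emphasizes solutions that perform well in some objectives.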
A. Deutz et al.
Q7: Reconcile various uses and interpretations of cone orders. Cone orders and other ordering relations have been introduced for a wide range of reasons and contexts [4, 13, 20, 28, 31, 35], only a few of which are specific to many-criteria optimization (these were emphasized in this chapter). Future work will be required to bring order into these various interpretations and usages, and also to transfer theoretical results obtained for mathematically closely related objects and thereby enhance the understanding of theoretical aspects.

Q8: Performance indicators. Whenever a new order relation is defined to govern the search for minimal sets, we may ask how the performance metrics should be adapted as well. For cone-based orders, some work has been done on generalizing the dominated hypervolume indicator to cone orders, and it turns out that the computation is not difficult for a larger number of objectives [15]. The concept of computing the size of the dominated space is generic to order relations on $\mathbb{R}^m$, but in each case it needs to be checked whether the computations are tractable and can be applied to a larger number of objectives.

As this list demonstrates, besides finding appropriate order relations for pairwise comparison, it could also be interesting to consider changing and variable order relations, ranking methods, or order relations between sets of solutions. In all these cases, defining and motivating a clear goal for the optimization, and understanding the fundamental properties of these order relations and rankings, will be of essential importance.
Appendix 1

In this Appendix, we provide the proofs of Lemma 4.1, Propositions 4.4 and 4.5, and Corollary 4.1.

Proof of Lemma 4.1. The statement is easily seen geometrically; we give an alternative, algebraic proof. Let $y^{(2)} - y^{(1)} = x^{(2)} - x^{(1)}$. Then also $x^{(2)} - y^{(2)} = x^{(1)} - y^{(1)}$, and thus $\big(y^{(1)} + (x^{(1)} - y^{(1)}),\ y^{(2)} + (x^{(2)} - y^{(2)})\big) = (x^{(1)}, x^{(2)})$, i.e., $(x^{(1)}, x^{(2)})$ is the translate of $(y^{(1)}, y^{(2)})$ by $d = x^{(1)} - y^{(1)} = x^{(2)} - y^{(2)}$. Conversely, suppose $(x^{(1)}, x^{(2)}) = (y^{(1)} + d, y^{(2)} + d)$ for some $d \in \mathbb{R}^m$. Then $x^{(2)} - y^{(2)} = d$ and $x^{(1)} - y^{(1)} = d$. Thus, $y^{(2)} - y^{(1)} = x^{(2)} - x^{(1)}$. □

Proof of Proposition 4.4.
1a.–1c. We omit the proofs of 1a–1c.
1d. Clearly, if $x^{(2)} - x^{(1)} \in S$, then also $(x^{(2)} + d) - (x^{(1)} + d) \in S$ for all $d \in \mathbb{R}^m$. Therefore, $R_S$ is translation invariant.
1e. Let $\alpha > 0$ and let $(x^{(1)}, x^{(2)}) \in R_S$, that is, $x^{(2)} - x^{(1)} \in S$. As $S$ is a cone, we get $\alpha x^{(2)} - \alpha x^{(1)} \in S$. Therefore, also $(\alpha x^{(1)}, \alpha x^{(2)}) \in R_S$; in other words, $R_S$ is positive multiplication invariant.
2a., 2b. We skip the proofs of 2a. and 2b.
2c. Note that, in general, $S_R = \mathbb{R}^m$ does not imply $R = \mathbb{R}^m \times \mathbb{R}^m$. For instance, for $m = 1$, let $R = (\mathbb{R} \times \mathbb{R}) \setminus \{(5, 7)\}$. Clearly, $S_R = \mathbb{R}$: the only difference contributed by the removed pair is $7 - 5 = 2$, and $2 \in S_R$ because $(7, 9) \in R$. By requiring $R$ to be translation invariant, we do get $R = \mathbb{R}^m \times \mathbb{R}^m$: let $(x^{(1)}, x^{(2)}) \in \mathbb{R}^m \times \mathbb{R}^m$ and consider $x^{(2)} - x^{(1)} \in \mathbb{R}^m = S_R$. Thus, there exists $(y^{(1)}, y^{(2)}) \in R$ such that $y^{(2)} - y^{(1)} = x^{(2)} - x^{(1)}$. By Lemma 4.1, $(x^{(1)}, x^{(2)})$ and $(y^{(1)}, y^{(2)})$ are translates of each other. Therefore, $(x^{(1)}, x^{(2)}) \in R$. Hence, $R = \mathbb{R}^m \times \mathbb{R}^m$.
2d. Let $c \in S_R$, that is, there exists $(x^{(1)}, x^{(2)}) \in R$ such that $c = x^{(2)} - x^{(1)}$. Let $\alpha > 0$. As $(\alpha x^{(1)}, \alpha x^{(2)}) \in R$, we get $\alpha c \in S_R$.
3. First we show that $S \subseteq T_{R_S}$. Let $s \in S$. There exists $(x^{(1)}, x^{(2)}) \in \mathbb{R}^m \times \mathbb{R}^m$ such that $x^{(2)} - x^{(1)} = s$; such a pair is a member of $R_S$, so there exists $(x^{(1)}, x^{(2)}) \in R_S$ with $x^{(2)} - x^{(1)} = s$, and therefore $s \in T_{R_S}$. Secondly, we show $T_{R_S} \subseteq S$. Let $t \in T_{R_S}$, that is, there exists $(x^{(1)}, x^{(2)}) \in R_S$ such that $x^{(2)} - x^{(1)} = t$. Hence, $t \in S$, as $(x^{(1)}, x^{(2)}) \in R_S$.
4. We need to show that $(x^{(1)}, x^{(2)}) \in R$ implies $(x^{(1)}, x^{(2)}) \in Q_{S_R}$. By definition, $(x^{(1)}, x^{(2)}) \in R$ implies $x^{(2)} - x^{(1)} \in S_R$. As $Q_{S_R} = \{(q^{(1)}, q^{(2)}) \mid q^{(2)} - q^{(1)} \in S_R\}$, we see that $(x^{(1)}, x^{(2)}) \in Q_{S_R}$. If $R$ is translation invariant, we also get the inclusion $Q_{S_R} \subseteq R$. Let $(q^{(1)}, q^{(2)}) \in Q_{S_R}$. This implies $q^{(2)} - q^{(1)} \in S_R$. Therefore, there exists $(x^{(1)}, x^{(2)}) \in R$ such that $q^{(2)} - q^{(1)} = x^{(2)} - x^{(1)}$. By Lemma 4.1, $(q^{(1)}, q^{(2)})$ is a translate of $(x^{(1)}, x^{(2)})$. As $R$ is translation invariant, we see that $(q^{(1)}, q^{(2)}) \in R$. □

Proof of Proposition 4.5.
1. Note that $R_C$ is non-trivial. It is clearly non-empty, as $C$ is non-empty. Also, $R_C \neq \mathbb{R}^m \times \mathbb{R}^m$.
Indeed, since $C \neq \mathbb{R}^m$, there exists $c \in \mathbb{R}^m \setminus C$, and then $(0, c) \notin R_C$. Next, assume $(x, y) \in R_C$, i.e., $y - x \in C$. As $C$ is a cone, we get $\alpha(y - x) \in C$ for $\alpha > 0$. Therefore, $(\alpha x, \alpha y) \in R_C$, so $R_C$ is positive multiplication invariant. To show that $R_C$ is translation invariant, again assume $(x, y) \in R_C$, i.e., $y - x \in C$. Then also $(y + d) - (x + d) \in C$ for all $d$, and therefore $(x + d, y + d) \in R_C$ for all $d$.
2. Assume $R_C$ is antisymmetric and assume there exists $d \neq 0$ such that $d \in C$ and $-d \in C$. Choose $x, y$ such that $y - x = d$. Thus, $(x, y) \in R_C$. Since also $x - y = -d \in C$, we get $(y, x) \in R_C$. Since $d \neq 0$, $x \neq y$, which contradicts antisymmetry. Thus, if $d \in C$ and $-d \in C$, then $d = 0$. It can still happen that for all $d \in \mathbb{R}^m$, $d \in C \Rightarrow -d \notin C$; in this case $C \cap -C = \emptyset$. Conversely, assume $(a, b) \in R_C$, $(b, a) \in R_C$, and $C \cap -C = \{0\}$. Hence, $b - a \in C$ and $a - b \in C$. Thus, $b - a \in C \cap -C$; in other words, $b = a$. In case $(a, b) \in R_C$, $(b, a) \in R_C$, and $C \cap -C = \emptyset$, it follows that $b - a \in C \cap -C$, so $C \cap -C \neq \emptyset$, a contradiction.
3. Assume $R_C$ is transitive. Let $c^{(1)}, c^{(2)} \in C$. It suffices to show that $c^{(1)} + c^{(2)} \in C$. There exist $a, b \in \mathbb{R}^m$ such that $b - a = c^{(1)}$, and there exist $d, e \in \mathbb{R}^m$ such that $e - d = c^{(2)}$. It follows that $(a, b) \in R_C$ and $(d, e) \in R_C$. As $R_C$ is translation invariant, we get $(a - a, b - a) \in R_C$, that is, $(0, c^{(1)}) \in R_C$. Similarly, $(0, c^{(2)}) \in R_C$. Again, translation invariance gives $(c^{(1)} + 0, c^{(1)} + c^{(2)}) \in R_C$. From $(0, c^{(1)}) \in R_C$, $(c^{(1)}, c^{(1)} + c^{(2)}) \in R_C$, and the transitivity of $R_C$, we get $(0, c^{(1)} + c^{(2)}) \in R_C$, i.e., $c^{(1)} + c^{(2)} \in C$. As $C$ is closed under addition and positive multiplication, it is convex. Conversely, assume $C$ to be convex. Let $(a, b) \in R_C$ and $(b, c) \in R_C$; in other words, $b - a \in C$ and $c - b \in C$. As $C$ is convex, we get $\tfrac{1}{2}(b - a) + \tfrac{1}{2}(c - b) = \tfrac{1}{2}(c - a) \in C$. Hence, since $C$ is a cone, $c - a \in C$, and therefore $(a, c) \in R_C$.
4. Assume $R_C$ is reflexive. As $R_C$ is non-trivial, it is non-empty. Therefore, there exists $x$ such that $(x, x) \in R_C$, and so $x - x = 0 \in C$. Conversely, let $0 \in C$. Then for all $x \in \mathbb{R}^m$ we have $x - x = 0 \in C$. Therefore, $(x, x) \in R_C$ for all $x$. □

A direct proof of Corollary 4.1.
1. (We also assume $R$ to be non-trivial. In case $R = \emptyset$, $R$ is vacuously antisymmetric. In case $R = \mathbb{R}^m \times \mathbb{R}^m$, $R$ is not antisymmetric and $C_R = \mathbb{R}^m$ is not pointed. Therefore, the equivalence also holds for a trivial $R$.) Let $C_R$ be pointed and let $R$ be non-trivial. We show that $R$ is antisymmetric. Let $x, y \in \mathbb{R}^m$ and consider the cases:
Case I: $(x, y) \in R$ and $(y, x) \in R$;
Case II: $(x, y) \in R$ and $(y, x) \notin R$;
Case III: $(x, y) \notin R$ and $(y, x) \in R$.
There is nothing to prove in Cases II and III (nor when neither pair belongs to $R$). So assume Case I, and let $c = y - x$; then $c \in C_R$ and $-c = x - y \in C_R$.
Case I(i): $C_R \cap -C_R = \emptyset$. This case cannot occur, as $\{c\} \subseteq C_R \cap -C_R$, which is a contradiction.
Case I(ii): $C_R \cap -C_R = \{0\}$. Then $c \in C_R \cap -C_R = \{0\}$. Hence, $c = 0$, and therefore $x = y$.
Conversely, assume $R$ is antisymmetric. We will show that $C_R$ is pointed. Suppose $c \in C_R$ and $-c \in C_R$.
This implies that there exist $x, y, a, b$ such that $(x, y) \in R$ with $c = y - x$, and $(a, b) \in R$ with $-c = b - a$ (i.e., $c = a - b$). Therefore, $0 = (a - y) - (b - x)$. Thus, $a - y = b - x$ and $y - a = x - b$. As $(a, b) \in R$ and $R$ is translation invariant, we get $(a + (y - a), b + (x - b)) \in R$, i.e., $(y, x) \in R$. Since also $(x, y) \in R$, antisymmetry gives $y = x$, and hence $c = 0$. We now look at the only two cases that can occur:
(a) for all $c \in C_R$ we have $-c \notin C_R$;
(b) there exists $c \in C_R$ with $-c \in C_R$.
In case (a), $C_R \cap -C_R = \emptyset$, and in case (b), $C_R \cap -C_R = \{0\}$ (as any $c$ for which (b) holds is necessarily equal to $0$). In both cases, $C_R$ is pointed.
2. Assume $C_R$ is convex. We will show that $R$ is transitive. Let $(a, b) \in R$ and $(b, c) \in R$. We want to show that $(a, c) \in R$. As $C_R$ is convex, $b - a \in C_R$, and $c - b \in C_R$, we get $\tfrac{1}{2}(b - a) + \tfrac{1}{2}(c - b) = \tfrac{1}{2}(c - a) \in C_R$. Thus, $c - a \in C_R$, as $C_R$ is a cone. Now we invoke the translation invariance of $R$ (via Lemma 4.1) to get $(a, c) \in R$. Conversely, assume $R$ is transitive. We need to show that $C_R$ is convex. Since
$C_R$ is a cone ($R$ is positive multiplication invariant), it is enough to show that $c^{(1)} \in C_R$ and $c^{(2)} \in C_R$ imply $c^{(1)} + c^{(2)} \in C_R$. That $c^{(1)} \in C_R$ means there exist $a, b$ such that $(a, b) \in R$ and $b - a = c^{(1)}$. Similarly, there exist $d, e$ such that $(d, e) \in R$ and $e - d = c^{(2)}$. As $R$ is translation invariant, we get $(a - a, b - a) = (0, c^{(1)}) \in R$. Similarly, we get $(0, c^{(2)}) \in R$. The latter implies $(0 + c^{(1)}, c^{(1)} + c^{(2)}) \in R$ (translation invariance of $R$). Using the transitivity of $R$, we get from $(0, c^{(1)}) \in R$ and $(c^{(1)}, c^{(1)} + c^{(2)}) \in R$ that $(0, c^{(1)} + c^{(2)}) \in R$. Therefore, $c^{(1)} + c^{(2)} \in C_R$. That is, $C_R$ is closed under addition and, as a cone, also under positive multiplication. In other words, $C_R$ is convex.
3. Assume $0 \in C_R$. We will show that $R$ is reflexive. By the assumption, there exists $y$ such that $(y, y) \in R$. Let $\tilde{y} \in \mathbb{R}^m$. As $R$ is translation invariant, we get $(y + (\tilde{y} - y), y + (\tilde{y} - y)) \in R$. Hence, $(\tilde{y}, \tilde{y}) \in R$ for all $\tilde{y} \in \mathbb{R}^m$. Conversely, assume $R$ is reflexive. As $R$ is non-trivial, we have $R \neq \emptyset$. Since $(y, y) \in R$ for all $y \in \mathbb{R}^m$, we obtain $0 = y - y \in C_R$. □
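The relation $R_C = \{(x, y) \mid y - x \in C\}$ used in the proofs above can also be explored numerically. The sketch below is an illustration only: it assumes a polyhedral cone written as $C = \{d \mid A d \ge 0\}$ (this representation, the example matrices, and all function names are hypothetical choices, not taken from the chapter) and checks translation invariance, positive multiplication invariance, antisymmetry, transitivity, and reflexivity on a finite sample of integer points, which avoids floating-point round-off at the cone boundary. With $A$ the identity, $C$ is the non-negative orthant and $(x, y) \in R_C$ is exactly weak Pareto dominance of $x$ over $y$ under minimisation. Passing the checks on a sample is not a proof; it merely illustrates Proposition 4.5.

```python
import itertools
import random


def in_cone(A, d):
    """True if d lies in the polyhedral cone C = {d : A d >= 0} (rows of A)."""
    return all(sum(a * v for a, v in zip(row, d)) >= 0 for row in A)


def related(A, x, y):
    """(x, y) belongs to R_C iff y - x lies in C."""
    return in_cone(A, [yi - xi for xi, yi in zip(x, y)])


def check_order_properties(A, points, shifts, scales):
    """Empirically test the properties of Proposition 4.5 on a finite sample.

    Assumes A has full column rank, so C is pointed; C = {d : A d >= 0} is
    always a convex cone containing 0.
    """
    for x, y in itertools.product(points, repeat=2):
        r = related(A, x, y)
        for d in shifts:  # translation invariance
            assert r == related(A, [xi + di for xi, di in zip(x, d)],
                                [yi + di for yi, di in zip(y, d)])
        for alpha in scales:  # positive multiplication invariance
            assert r == related(A, [alpha * xi for xi in x], [alpha * yi for yi in y])
        if r and related(A, y, x):  # antisymmetry (C pointed)
            assert x == y
    for x, y, z in itertools.product(points, repeat=3):  # transitivity (C convex)
        if related(A, x, y) and related(A, y, z):
            assert related(A, x, z)
    assert all(related(A, x, x) for x in points)  # reflexivity (0 in C)
    return True


if __name__ == "__main__":
    pareto = [[1, 0], [0, 1]]  # C = non-negative orthant
    wider = [[2, 1], [1, 2]]   # hypothetical wider, still pointed cone
    rng = random.Random(1)
    pts = [[rng.randint(-3, 3) for _ in range(2)] for _ in range(25)]
    for A in (pareto, wider):
        print(check_order_properties(A, pts, shifts=[[1, -2], [5, 5]], scales=[1, 2, 7]))
    print(related(wider, [0, 0], [2, -1]), related(pareto, [0, 0], [2, -1]))  # True False
```

Replacing `A` by a single row such as `[[1, 0]]` gives a half-space that is not pointed; then both `related(A, [0, 0], [0, 1])` and `related(A, [0, 1], [0, 0])` hold, so antisymmetry fails, as item 2 of Proposition 4.5 predicts.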
References
1. L. Batista, F. Campelo, F.G. Guimaraes, J.A. Ramírez, Pareto cone ε-dominance: improving convergence and diversity in multiobjective evolutionary algorithms, in Evolutionary Multi-Criterion Optimization (EMO) (Springer, 2011), pp. 76–90
2. P. Bentley, J. Wakefield, Finding acceptable solutions in the Pareto-optimal range using multiobjective genetic algorithms, in Soft Computing in Engineering Design and Manufacturing (Springer, 1998), pp. 231–240
3. D. Bertsekas, A. Nedic, A. Ozdaglar, Convex Analysis and Optimization, vol. 1 (Athena Scientific, 2003)
4. J. Borwein, The geometry of Pareto efficiency over cones. Mathematische Operationsforschung und Statistik. Series Optimization 11(2), 235–248 (1980)
5. J. Branke, K. Deb, H. Dierolf, M. Osswald, Finding knees in multi-objective optimization, in Parallel Problem Solving from Nature (PPSN) (Springer, 2004), pp. 722–731
6. D. Brockhoff, J. Bader, L. Thiele, E. Zitzler, Directed multiobjective optimization based on the weighted hypervolume indicator. J. Multi-Criteria Decis. Anal. 20(5–6), 291–317 (2013)
7. C. Daskalakis, R.M. Karp, E. Mossel, S.J. Riesenfeld, E. Verbin, Sorting and selection in posets. SIAM J. Comput. 40(3), 597–622 (2011)
8. K. Deb, Multi-objective evolutionary algorithms: introducing bias among Pareto-optimal solutions, in Advances in Evolutionary Computing (Springer, 2003), pp. 263–292
9. K. Deb, M. Mohan, S. Mishra, Evaluating the ε-domination based multi-objective evolutionary algorithm for a quick computation of Pareto-optimal solutions. Evol. Comput. 13(4), 501–525 (2005)
10. K. Deb, J. Sundar, Reference point based multi-objective optimization using evolutionary algorithms, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2006), pp. 635–642
11. N. Drechsler, R. Drechsler, B. Becker, Multi-objective optimisation based on relation favour, in Evolutionary Multi-Criterion Optimization (EMO) (Springer, 2001), pp. 154–166
12. M. Ehrgott, Multicriteria Optimization, 2nd edn. (Springer, 2005)
13. G. Eichfelder, Variable Ordering Structures in Vector Optimization (Springer, 2014)
14. M.T.M. Emmerich, A. Deutz, I. Yevseyeva, On reference point free weighted hypervolume indicators based on desirability functions and their probabilistic interpretation. Proc. Technol. 16, 532–541 (2014)
15. M.T.M. Emmerich, A.H. Deutz, J. Kruisselbrink, P.K. Shukla, Cone-based hypervolume indicators: construction, properties, and efficient computation, in Evolutionary Multi-Criterion Optimization (EMO) (Springer, 2013), pp. 111–127
16. M. Farina, P. Amato, Fuzzy optimality and evolutionary multiobjective optimization, in Evolutionary Multi-Criterion Optimization (EMO) (Springer, 2003), pp. 58–72
17. K. Fukuda, A. Prodon, Double description method revisited, in Combinatorics and Computer Science, vol. 11210 (Springer, 1995), pp. 91–111
18. K. Ikeda, H. Kita, S. Kobayashi, Failure of Pareto-based MOEAs: does non-dominated really mean near to optimal? in Congress on Evolutionary Computation (CEC) (IEEE Press, 2001), pp. 957–962
19. H. Ishibuchi, N. Tsukamoto, Y. Nojima, Evolutionary many-objective optimization, in International Workshop on Genetic and Evolving Systems (GEFS) (IEEE Press, 2008), pp. 47–52
20. I. Kaliszewski, Quantitative Pareto Analysis by Cone Separation Technique (Kluwer Academic Press, 1994)
21. I. Karahan, M. Köksalan, A territory defining multiobjective evolutionary algorithms and preference incorporation. IEEE Trans. Evol. Comput. 14(4), 636–664 (2010)
22. H. Kung, F. Luccio, F. Preparata, On finding the maxima of a set of vectors. J. ACM 22(4), 469–476 (1975)
23. M. Laumanns, L. Thiele, K. Deb, E. Zitzler, Combining convergence and diversity in evolutionary multiobjective optimization. Evol. Comput. 10(3), 263–282 (2002)
24. K. Le, D. Landa-Silva, Obtaining better non-dominated sets using volume dominance, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2007), pp. 3119–3126
25. L. Li, Y. Wang, H. Trautmann, N. Jing, M.T.M. Emmerich, Multiobjective evolutionary algorithms based on target region preferences. Swarm Evol. Comput. 40, 196–215 (2018)
26. Y. Liu, N. Zhu, K. Li, M. Li, J. Zheng, K. Li, An angle dominance criterion for evolutionary many-objective optimization. Inf. Sci. 509, 376–399 (2020)
27. K. Maneeratana, K. Boonlong, N. Chaiyaratana, Compressed-objective genetic algorithm, in Parallel Problem Solving from Nature (PPSN) (Springer, 2006), pp. 473–482
28. K. Miettinen, Nonlinear Multiobjective Optimization (Kluwer Academic Publishers, 1999)
29. K. Miettinen, M. Mäkelä, On cone characterizations of weak, proper and Pareto optimality in multiobjective optimization. Math. Methods Oper. Res. 53(4), 233–245 (2001)
30. K. Narukawa, Y. Setoguchi, Y. Tanigaki, M. Olhofer, B. Sendhoff, H. Ishibuchi, Preference representation using Gaussian functions on a hyperplane in evolutionary multi-objective optimization. Soft Comput. 20(7), 2733–2757 (2016)
31. V. Noghin, Relative importance of criteria: a quantitative approach. J. Multi-Criteria Decis. Anal. 6, 355–363 (1997)
32. H. Sato, H. Aguirre, K. Tanaka, Controlling dominance area of solutions in multiobjective evolutionary algorithms and performance analysis on multiobjective 0/1 knapsack problems. IPSJ Digit. Cour. 3, 703–718 (2007)
33. D.K. Saxena, S. Mittal, S. Kapoor, K. Deb, A localized high fidelity dominance based many-objective evolutionary algorithm. Technical report, Michigan State University (2021)
34. S. Sayın, Measuring the quality of discrete representations of efficient sets in multiple objective mathematical programming. Math. Program. 87(3), 543–560 (2000)
35. J. Serna, M. Monz, K.-H. Küfer, C. Thieke, Trade-off bounds for the Pareto surface approximation in multi-criteria IMRT planning. Phys. Med. Biol. 54, 6299–6311 (2009)
36. P. Shukla, C. Hirsch, H. Schmeck, In search of equitable solutions using multi-objective evolutionary algorithms, in Parallel Problem Solving from Nature (PPSN) (Springer, 2010), pp. 687–696
37. H. Trautmann, J. Mehnen, Preference-based Pareto optimization in certain and noisy environments. Eng. Optim. 41(1), 23–38 (2009)
38. Y. Wang, A. Deutz, T. Bäck, M. Emmerich, Improving many-objective evolutionary algorithms by means of edge-rotated cones, in Parallel Problem Solving from Nature (PPSN) (Springer, 2008), pp. 313–326
39. M.M. Wiecek, Advances in cone-based preference modeling for decision making with multiple criteria. Decis. Mak. Manufact. Ser. 1(2), 153–173 (2007)
40. S. Yang, M. Li, X. Liu, J. Zheng, A grid-based evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput. 17(5), 721–736 (2013)
41. I. Yevseyeva, A.P. Guerreiro, M.T.M. Emmerich, C.M. Fonseca, A portfolio optimization approach to selection in multiobjective evolutionary algorithms, in Parallel Problem Solving from Nature (PPSN) (Springer, 2014), pp. 672–681
42. P. Yu, G. Leitmann, Nondominated decisions and cone convexity in dynamic multicriteria decision problems, in Multicriteria Decision Making and Differential Games (Springer, 1976), pp. 61–72
43. E. Zitzler, L. Thiele, J. Bader, SPAM: set preference algorithm for multiobjective optimization, in Parallel Problem Solving from Nature (PPSN) (Springer, 2008), pp. 847–858
44. E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V. Grunert da Fonseca, Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)
Chapter 5
Many-Objective Quality Measures
Bekir Afsar, Jonathan E. Fieldsend, Andreia P. Guerreiro, Kaisa Miettinen, Sebastian Rojas Gonzalez, and Hiroyuki Sato
Abstract A key concern when undertaking any form of optimisation is how to characterise the quality of the putative solution returned. In many-objective optimisation, an added complication is that such measures must be defined on a set of trade-off solutions. We present and discuss the commonly used quality measures for many-objective optimisation, which are a subset of those used in multi-objective optimisation. We discuss the computational aspects and theoretical properties of these measures, highlighting measures for both a posteriori and a priori approaches, where the latter incorporate preference information from a decision maker (DM). We also discuss open areas in this field, including forms of many-objective optimisation that are relatively under-explored and for which appropriate quality measures are much less developed, as well as challenges related to developing measures for interactive methods.
B. Afsar · K. Miettinen
Faculty of Information Technology, University of Jyvaskyla, P.O. Box 35 (Agora), FI-40014 Jyvaskyla, Finland
J. E. Fieldsend (corresponding author)
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
A. P. Guerreiro
INESC-ID, Rua Alves Redol, 9, 1000-029 Lisbon, Portugal
S. Rojas Gonzalez
Department of Information Technology, Ghent University, Technologiepark-Zwijnaarde 126, Gent B-9052, Belgium
Data Science Institute, University of Hasselt, Agoralaan Building, Diepenbeek 3590, Belgium
H. Sato
The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan
© Springer Nature Switzerland AG 2023
D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_5
5.1 Introduction

The key difference between evolutionary multi- and many-objective optimisation (EMO and EMaO) is the number of objectives considered. For multi-objective optimisation, this is regularly described as being two or three objectives, whereas for many-objective optimisation it is four or more objectives [53, 67, 81]. It should be noted, however, that in related fields like Multiple Criteria Decision Making (MCDM), such a distinction is generally not made (instead, bi-objective optimisation problems are typically separated from the rest).

Why is the distinction between multi- and many-objective problems particularly important for the evolutionary optimisation community? The main reason is that in higher objective dimensions, Pareto dominance tends not to disambiguate solutions effectively. As such, many optimisers that are well adapted to problems with two or three objectives drastically under-perform on problems with a larger number of objectives, as they lose convergence pressure [82]. This has led to the development of optimisation heuristics specifically for the many-objective case. Inextricably linked to this are the quality measure(s) employed for the optimisation task. Not all measures used for EMO are appropriate for, or scale well to, EMaO, and there may be additional concerns specific to the EMaO domain that need to be captured. For example, quality measures (or indicators) have known biases and issues, some of which worsen as the number of objectives increases. These include computation time, variability with respect to any required reference samples, and bias towards particular regions or edges of the Pareto front.

It will be helpful at this point to consider, at a high level, what the desired properties of a many-objective quality measure might be. First, we need to consider the measure's task. Is it being used to drive an optimisation? To compare optimisers? Or both?
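To make the loss of convergence pressure mentioned above more concrete, the following minimal sketch (not taken from the chapter) draws uniformly random points in $[0, 1]^m$ as a hypothetical stand-in for a population and reports the fraction of points that no other point Pareto-dominates, assuming minimisation. Under these assumptions the fraction grows quickly with $m$, so dominance alone separates fewer and fewer solutions.

```python
import random


def dominates(p, q):
    """Pareto dominance (minimisation): p <= q in every objective and p < q in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def nondominated_fraction(n_points, m, seed=0):
    """Fraction of n uniform random points in [0, 1]^m not dominated by any other point."""
    rng = random.Random(seed)
    pts = [tuple(rng.random() for _ in range(m)) for _ in range(n_points)]
    nondominated = [p for p in pts if not any(dominates(q, p) for q in pts)]
    return len(nondominated) / n_points


if __name__ == "__main__":
    for m in (2, 3, 5, 10, 15):
        print(m, round(nondominated_fraction(200, m), 3))
```

The quadratic all-pairs check is chosen for clarity; faster maxima-finding algorithms exist but are not needed to see the trend.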
Certainly, the computational overhead of a measure used repeatedly during an optimisation is more burdensome than when it is used to compare algorithms, as the latter tends to be done at a lower frequency with regard to the number of solutions evaluated. In practice, by assigning a real value to each set, quality indicators induce preferences over sets. As such, it is important to understand the biases associated with each indicator, and to be aware of the guarantees, shortcomings, and difficulties of evaluating the quality of approximation sets with the chosen indicator(s). Monotonicity [104], optimal μ-distributions [5], parameter dependence [61, 104], and scaling invariance/independence [61, 104] properties help characterise the preferences reflected by each indicator. It is important to understand how sensitive these are to different scenarios, e.g., different parameter settings. The computational effort required to compute an indicator may also be an obstacle, with some indicators requiring time exponential in the number of dimensions (objectives); this is thus also an important aspect in many-objective optimisation.

Monotonicity properties formalise the agreement between indicator values and a binary relation, such as (weak/strict) Pareto dominance [40, 104]. A quality indicator should be, at least, weakly monotonic with respect to weak Pareto dominance. Such a
property indicates that the values assigned by the indicator do not contradict weak Pareto dominance, i.e., a set dominated by another will never be assigned a better value than the dominating set [104]. Judging the quality of approximation sets based (solely) on a quality indicator that does not possess such a property may be misleading.

Typically, the goal of EMO and EMaO is to find a good limited-size approximation of the Pareto front, and therefore of the Pareto set. From an indicator point of view, the goal is to find a limited-size subset of the objective space that optimises the indicator, i.e., an indicator-optimal subset. The indicator-optimal subsets of μ points are called the optimal μ-distributions [5]. The possession of particular (strict) monotonicity properties allows us to infer information on indicator-optimal subsets. Depending on the set-relation with respect to which an indicator is weakly/strictly monotonic, it may be possible to guarantee that there is a non-empty subset of the Pareto front that is indicator-optimal, and/or that all indicator-optimal subsets contain a Pareto-front point (see Chap. 8). When indicator-optimal subsets are subsets of the Pareto front, an important aspect is the distribution of the points in such subsets. The study of optimal μ-distributions seeks to characterise such distributions, e.g., how the μ points are distributed along the Pareto front and how the indicator parameters influence their distribution.

Many quality indicators depend on parameter settings. Although these may offer flexibility to integrate some degree of preference information, they can also be a source of difficulty, particularly if they require knowledge not possessed by the user. Setting such parameters may require problem knowledge and, ideally, an understanding of whether and how such parameters affect the ranking of sets imposed by the indicator.
Typically, the fewer parameters that need to be set, the better. Besides parameter settings, (re)scaling of the objective space may also affect indicator-based assessment. An indicator is scaling invariant if the assigned values remain the same when the objective space is monotonically (re)scaled, and scaling independent if the order imposed on sets remains the same [104]. The following properties may be desirable:
• Interpretable by a DM or a practitioner.
• Cheap to calculate.
• Insensitive to objective rescaling (invariant versus independent: the value does not change versus the ranking does not change).
• Insensitive to Pareto front shape.
• Ability to capture the characteristics of a successful solution process.
• Ability to embed and represent preference information.
• No requirement to have access to the Pareto set.
• No requirement to have access to the Pareto front.
• Clearly understood biases/preferences inherent in the measure.
• Relatively few meta-parameters, which are easy to understand and report (for reproducibility).
• Ability to be applied to noisy problems.
• Ability to be applied to uncertain problems where robust solutions are desired.
As can be seen from this list, the burden on an indicator can be quite high, and indeed no single existing indicator satisfies all these properties. As such, the goal of this chapter is threefold:
1. to provide the reader with an awareness of how the existing indicators scale, and hence of their usability in the many-objective domain;
2. to outline the typical need to involve DMs in many-objective optimisation problems, and how this need interacts with indicators/measures;
3. to highlight additional areas of complication that are much less explored for many-objective optimisation, and how these may impact on the quality measures used.
To support these goals, the chapter proceeds as follows. In Sect. 5.2, an overview is presented of the quality measures currently exploited in the many-objective optimisation domain where a good approximation to the whole Pareto front is sought. Alongside these descriptions, details are also provided regarding why these particular measures from the EMO literature have found favour in the many-objective context. Following this, Sect. 5.3 introduces quality measures developed for evolutionary approaches that include preference information in the form of a reference point. Section 5.4 presents some areas of many-objective optimisation that are relatively under-explored (e.g., noisy, robust, interactive), and the characteristics of the quality measures required in such areas. This is followed in Sect. 5.5 by a summary of some future directions in the area and issues to consider when using quality measures in the many-objective optimisation setting. The chapter concludes with Sect. 5.6.
5.2 Currently Used Measures in Many-Objective Optimisation and Their Scalability, Complexity and Properties

There exist over 100 quantitative performance indicators in the literature. In the surveys [25, 70, 83, 104], most of the existing quality indicators are reviewed and classified, and the most relevant ones are scrutinized and compared. Evidently, not all of these indicators serve the same purpose, and their specific characteristics determine their particular usability. The literature also offers more specific reviews of certain indicators, such as studies of their consistencies and contradictions (see, e.g., [55]) and analyses of the performance of indicators using approximated optimal distributions in the objective space (see, e.g., [93]). In this section, we formally define the most widely used indicators and describe their most relevant properties, such as their complexity and monotonicity. We emphasize the difference between (i) using quality indicators to compare the performance of algorithms (e.g., solution set A is better than solution set B according to a given indicator), and (ii) using quality indicators to search for solutions (i.e., indicator-based algorithms). In later sections, we will discuss the usage of indicators in
practice (i.e., to efficiently converge interactively to the most preferred solution for the DM).
5.2.1 Most Commonly Used Indicators

Among the many existing quality indicators in multi-/many-objective optimisation, a few have been widely adopted, particularly for performance assessment: the hypervolume indicator, the indicators related to the generational distance, the R2 indicator, and the ε-indicator. All of these assess the quality of a set of solutions by evaluating the corresponding images in the objective space, i.e., they map a (non-dominated) set of points in the objective space, $\mathbb{R}^m$, where $m$ is the number of objectives, to a scalar value that reflects the quality of the set. All of them are parameter-dependent; for example, they depend on a (set of) reference point(s). Note, however, that the semantics of the reference point(s) may differ between indicators (it may represent an ideal point or a point that is meant to be dominated by all points in the input set), and may differ from the semantics of the reference points used in the following sections, which are directly connected to preference information.

Currently, there is no perfect quality indicator that is simultaneously easy to compute, to interpret, and to use (regarding parameter settings), that is biased towards evenly distributed sets independently of the shape of the Pareto front, and that possesses desirable properties. This section reviews the characteristics of the most common indicators considering all these aspects. Regarding desirable properties, particular attention will be given to Pareto compliance. Assuming (w.l.o.g.) minimisation, a point $p \in \mathbb{R}^m$ is said to weakly dominate a point $q \in \mathbb{R}^m$, denoted by $p \le q$, if $p$ is less than or equal to $q$ in every objective, i.e., $p_i \le q_i$ for all $i = 1, \ldots, m$. Such a definition may be extended to sets of points. In the context of performance assessment, it is fundamental that an indicator does not contradict Pareto set-dominance. There are a few Pareto set-dominance relations (see Chap. 8 for formal definitions).
In this context, we consider the following. A point set P ⊂ ℝ^m is said to (strictly) dominate a point set Q ⊂ ℝ^m if, for every point in Q, there is a point in P that weakly dominates it, and there is at least one point in P that is not weakly dominated by any point in Q. In such a case, a (good) quality indicator should not contradict this relation; that is, it should assign to P a value that is at least as good as the value assigned to Q, i.e., it should be weakly monotonic with respect to strict dominance. This property is also referred to as weak Pareto compliance, and it is a minimum requirement for a quality indicator not to be misleading. Ideally, the indicator should also be strictly Pareto compliant, i.e., if P dominates Q, then it should assign a strictly better value to P than to Q. In the following descriptions of the most commonly used quality indicators, bi-objective examples are given for simplicity. Nevertheless, most examples are easily extendable to an arbitrary number of objectives.
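The following Python sketch makes these relations concrete. It checks weak dominance between points and the set-dominance relation defined above, with minimisation assumed. The function names are our own and serve only as illustration.

```python
from typing import Sequence

Point = Sequence[float]


def weakly_dominates(p: Point, q: Point) -> bool:
    """True if p weakly dominates q under minimisation (p_i <= q_i for all i)."""
    return all(pi <= qi for pi, qi in zip(p, q))


def set_dominates(P: Sequence[Point], Q: Sequence[Point]) -> bool:
    """True if P (strictly) dominates Q: every q in Q is weakly dominated by
    some p in P, and at least one p in P is not weakly dominated by any q in Q."""
    every_q_covered = all(any(weakly_dominates(p, q) for p in P) for q in Q)
    some_p_new = any(not any(weakly_dominates(q, p) for q in Q) for p in P)
    return every_q_covered and some_p_new


def respects_weak_compliance(value_P: float, value_Q: float,
                             lower_is_better: bool = True) -> bool:
    """For a pair where P dominates Q, P must score at least as well as Q."""
    return value_P <= value_Q if lower_is_better else value_P >= value_Q


if __name__ == "__main__":
    P = [(2.0, 2.0)]
    Q = [(2.0, 3.0), (3.0, 2.0)]
    print(set_dominates(P, Q))  # True
    print(set_dominates(Q, P))  # False
```

A set never strictly dominates itself under this definition, because each of its points is weakly dominated by itself.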
118
B. Afsar et al.
Fig. 5.1 Example of (a) the HV (grey area), (b) the Pareto compliance property, where HV(P, r) > HV(Q, r), and (c) the optimal μ-distribution given a linear Pareto front and μ = 5
5.2.1.1 Hypervolume Indicator (HV)
The hypervolume indicator (HV) measures the size of the region of the objective space that is enclosed by a (non-dominated) point set P ⊂ ℝ^m and an upper (reference) point r ∈ ℝ^m [106] (see the example in Fig. 5.1a):

$$HV(P, r) = \lambda\left(\bigcup_{p \in P} \{\, q \in \mathbb{R}^m \mid p \le q \wedge q \le r \,\}\right),$$
where λ denotes the Lebesgue measure. The greater the HV, the better the approximation of the Pareto front from multiple viewpoints: the convergence of the non-dominated set towards the Pareto front, the spread of the non-dominated set in the objective space, the distribution of the non-dominated set, and its size all contribute to increasing the HV. The HV depends on a single parameter, the reference point; it does not require knowledge of the Pareto front, it is easy to interpret, and it is scaling independent [61]. The monotonicity properties of the HV ensure that the HV of a point set Q ⊂ ℝ^m is never greater than the HV of a point set P ⊂ ℝ^m that strictly dominates Q. This can be observed in Fig. 5.1b, where the region dominated by Q is contained in the region dominated by P. These monotonicity properties also ensure that, provided that all points in the Pareto front dominate r, any indicator-optimal subset either contains the Pareto front (if the maximum set size is larger than the number of elements in the Pareto front) or is a subset of the Pareto front [40, 104]. The exact distribution of the points in such indicator-optimal subsets, the so-called optimal μ-distributions, is known only for specific Pareto fronts in low-dimensional cases [20, 91]. For example, the optimal μ-distribution for a bi-objective linear Pareto front is such that the points are evenly spaced between the two outer points (see the example in Fig. 5.1c), and the density of points on arbitrary bi-objective Pareto fronts depends on the slope of the front [5]. The absolute position of the points depends on the reference point.
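For two objectives, the HV can be computed exactly in O(n log n) time. The points are sorted by the first objective, and the horizontal slabs of the dominated "staircase" are summed. The following Python function is a minimal sketch of this standard sweep under minimisation. The name is our own, and it is not one of the algorithms cited above for m ≥ 4.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Exact HV of a bi-objective point set w.r.t. reference point ref (minimisation).

    Points are swept in order of increasing first objective; each point that
    lowers the current second-objective level adds one rectangular slab.
    """
    # Points that are not strictly better than ref in both objectives enclose no area.
    candidates = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume = 0.0
    level = ref[1]  # lowest second-objective value seen so far in the sweep
    for f1, f2 in candidates:
        if f2 < level:  # otherwise the point is dominated and adds nothing
            volume += (ref[0] - f1) * (level - f2)
            level = f2
    return volume


if __name__ == "__main__":
    ref = (4.0, 4.0)
    P = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    Q = [(2.0, 3.0), (3.0, 2.0)]  # strictly dominated by P
    print(hypervolume_2d(P, ref), hypervolume_2d(Q, ref))  # 6.0 3.0
```

For P, the sweep adds slabs of area 3, 2 and 1. The set Q is dominated by P and encloses only 3, which agrees with the monotonicity property discussed above.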
5 Many-Objective Quality Measures
For more general fronts with more than two objectives, mainly approximate results are known [93]. In particular, it has been shown and observed how the reference point influences the optimal μ-distributions [5]. For example, in bi-objective cases, whether the extreme points are included in the optimal μ-distributions depends on where the reference point is placed. In fact, in some cases there is no setting of the reference point that ensures the inclusion of the extreme points in the optimal μ-distributions [4, 5]. It has been observed for some fronts that the points are spread more evenly over the whole front if the reference point is placed near the front, whereas if it is placed far away, the points tend to be concentrated on the boundary [50]. Recommended specifications of the reference point can be found in [5, 50]. The main drawback of HV is its computational cost, which increases exponentially with the number of objectives. The asymptotically fastest algorithm known for m ≥ 4 objectives computes the HV of a set of n points in O(n^{m/3} polylog n) time [22]. Although the efficient calculation of HV is an active research topic [41, 89], computing HV exactly remains difficult for about ten or more objectives. For these high-dimensional cases, estimation techniques are more popular, as they can be more cost-effective than exact methods. Examples are approaches based on Monte Carlo sampling [6, 31, 54]. Monte Carlo sampling was first used in an HV-indicator-based evolutionary algorithm (HypE) [6], since HV calculations are repeated to determine the HV contribution of each solution in the selection process. The R2 indicator has also been used to approximate HV values [90]. The integration of HV in EMaO for selection or archiving purposes may be computationally expensive.
A natural aim is to always select, from the available set of n points, the subset of k points that maximises the indicator, where k is the population or archive size. This is known as the Hypervolume Subset Selection Problem (HSSP) [6]. It can be solved efficiently in the bi-objective case in O(n(n − k) + n log n) time [65], and it can be solved in polynomial time in n and k in the case of k = n − 1, where it consists of finding the point that contributes the least [18, 37]. However, the HSSP is NP-hard for m ≥ 3 [17], and current algorithms are exponential in n, k, or n − k [18, 37], which makes them viable only for very small cases [41]. For this reason, HV-based selection and archiving have mostly been restricted to the particular case of k = n − 1 and to (greedy) approximations [41, 89].
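The greedy strategy mentioned above solves the k = n − 1 case repeatedly, each time discarding the point with the smallest HV contribution. The following Python sketch implements it for two objectives and compares it with exhaustive search on a tiny instance. Both helpers are hypothetical illustrations. Greedy elimination is not guaranteed to return an optimal subset in general, and the exhaustive search is practical only for very small n.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Exact bi-objective HV (minimisation) via a sweep over the first objective."""
    volume, level = 0.0, ref[1]
    for f1, f2 in sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1]):
        if f2 < level:  # the point extends the dominated staircase
            volume += (ref[0] - f1) * (level - f2)
            level = f2
    return volume


def greedy_hssp_2d(points: Sequence[Point], k: int, ref: Point) -> List[Point]:
    """Greedy backward elimination: repeatedly drop the point whose removal
    loses the least HV (its contribution) until only k points remain."""
    selected = list(points)
    while len(selected) > k:
        total = hypervolume_2d(selected, ref)
        contributions = [
            total - hypervolume_2d(selected[:i] + selected[i + 1:], ref)
            for i in range(len(selected))
        ]
        del selected[contributions.index(min(contributions))]
    return selected


def exhaustive_hssp_2d(points: Sequence[Point], k: int, ref: Point) -> List[Point]:
    """Exact HSSP by enumerating all k-subsets; exponential, tiny inputs only."""
    return list(max(combinations(points, k), key=lambda s: hypervolume_2d(s, ref)))


if __name__ == "__main__":
    pts = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
    ref = (5.0, 5.0)
    greedy = greedy_hssp_2d(pts, 2, ref)
    print(greedy, hypervolume_2d(greedy, ref))                # [(2.0, 2.0), (4.0, 1.0)] 10.0
    print(hypervolume_2d(exhaustive_hssp_2d(pts, 2, ref), ref))  # 10.0
```

In this instance, the first greedy step removes the dominated point (3, 3), whose contribution is zero. The greedy result then matches the optimal value of 10.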
5.2.1.2 Generational Distance (GD)
The Generational Distance (GD) is a function of the distances from the points in a non-dominated set to the closest points in a reference point set in the objective space. Let R ⊂ ℝ^m be a reference point set, which is assumed to be an ideal set of points approximating the Pareto front. Then, the GD of a non-dominated set P ⊂ ℝ^m is calculated by [97]:
Fig. 5.2 Example of (a) the computation of GD, and (b) an example showing that GD is not Pareto compliant, where GD(P, R) > GD(Q, R)
$$GD(P, R) = \frac{1}{|P|}\left(\sum_{p \in P} \min_{r \in R} d(p, r)^t\right)^{1/t},$$

where d(p, r) is the Euclidean distance between points p and r, |P| denotes the cardinality of P, and a commonly accepted setting of t ∈ ℝ is t = 1. For t = 1, GD is the arithmetic mean of the distances between each point in P and its closest point in the reference set R. The arrows in the example in Fig. 5.2a show the distance between each point p ∈ P and its closest point in R, and GD is the average of these four distances. Computing this measure requires O(m · |P| · |R|) operations. Whatever the dimension of the Pareto front, it is represented by a finite reference set R. A linear increase in the number of objectives may require an exponential increase in the number of points needed to achieve a good coverage of the entire Pareto front, which is a particular issue in the many-objective case [49]. Although its interpretation is intuitive, GD is not a comprehensive metric for evaluating the overall approximation quality of the Pareto front. It is mainly viewed as a metric of the convergence of solutions towards the Pareto front [70]. Generally, the lower the GD, the better the convergence is expected to be. However, this measure is not Pareto compliant. Therefore, a point set P ⊂ ℝ^m that dominates a point set Q ⊂ ℝ^m may be assigned a higher GD value than Q (given the same reference set) [61, 107]. This can be observed in the example of Fig. 5.2b, where GD(P, R) = (2 × √(2² + 1²))/2 ≈ 2.24 and GD(Q, R) = (2 × √(2²))/2 = 2. Hence, GD(P, R) > GD(Q, R) even though P dominates Q. This measure can therefore be misleading, as lower values do not always mean better sets with respect to Pareto dominance. Furthermore, GD is sensitive to the reference set R, i.e., the optimal μ-distributions depend on the reference set. GD is a good measure of neither diversity nor uniformity. For example, assuming that R is a (sub)set of the Pareto front, any non-empty subset of R is an optimal subset with a GD value of zero.
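The GD formula translates directly into code, as the following Python sketch shows (function name our own). The toy instance is hypothetical. It reproduces the values 2.24 and 2 quoted for Fig. 5.2b, but not necessarily the points of the figure. P = {(2, 2)} dominates Q = {(2, 3), (3, 2)}, yet P receives the larger GD value with respect to R = {(0, 3), (3, 0)}.

```python
import math
from typing import Sequence

Point = Sequence[float]


def generational_distance(P: Sequence[Point], R: Sequence[Point], t: float = 1.0) -> float:
    """GD(P, R) = (1/|P|) * (sum over p in P of min_{r in R} d(p, r)^t)^(1/t)."""
    total = sum(min(math.dist(p, r) for r in R) ** t for p in P)
    return total ** (1.0 / t) / len(P)


if __name__ == "__main__":
    R = [(0.0, 3.0), (3.0, 0.0)]
    P = [(2.0, 2.0)]              # dominates every point of Q
    Q = [(2.0, 3.0), (3.0, 2.0)]
    print(round(generational_distance(P, R), 2))  # 2.24
    print(generational_distance(Q, R))            # 2.0
```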
Fig. 5.3 Example of (a) the IGD computation, and (b) an example showing that IGD is not Pareto compliant, where IGD(P, R) > IGD(Q, R)
5.2.1.3 Inverted Generational Distance (IGD and IGD+)
The Inverted Generational Distance (IGD) takes the inverse viewpoint of GD. It is the arithmetic mean of the distances between each point in a reference set R ⊂ ℝ^m and the closest point in a given non-dominated set P ⊂ ℝ^m [14]:

$$IGD(P, R) = \frac{1}{|R|}\sum_{r \in R} \min_{p \in P} d(r, p),$$

where d(r, p) is the Euclidean distance between a reference point r and an objective vector p. The arrows in the example in Fig. 5.3a show the distance between each point r ∈ R and its closest point in P, and IGD is the average of these three distances. As for GD, computing IGD requires O(m · |P| · |R|) operations. Generally, the lower the IGD, the better the approximation of the Pareto front from multiple viewpoints. The proximity of the non-dominated set to the Pareto front, the spread of the non-dominated set in the objective space, and its size all contribute to decreasing the IGD. However, IGD shares the main drawback of GD: it is not Pareto compliant [52], which means that this measure can also be misleading. Figure 5.3b shows an example where the set P has a worse IGD value than the set Q, i.e., IGD(P, R) > IGD(Q, R), even though P dominates Q. Here, IGD(P, R) = (2 × √(3² + 1²) + √(1² + 1²))/3 ≈ 2.58 and IGD(Q, R) = (√(2² + 1²) + √(3²) + √(2²))/3 ≈ 2.41. Finally, the optimal μ-distributions of IGD also depend on the reference set [51]. IGD+ is a modified version of IGD in which the Euclidean distance d(r, p) is replaced by the following distance. It considers only the components of the vector p ∈ P that are greater than the corresponding components of the reference point r ∈ R [51, 52]:

$$d^{+}(r, p) = \sqrt{(\max\{p_1 - r_1, 0\})^2 + \dots + (\max\{p_m - r_m, 0\})^2}.$$

Fig. 5.4 Example of (a) the IGD+ computation, and (b) an example reflecting the Pareto compliance property, where IGD+(P, R) ≤ IGD+(Q, R)
Figure 5.4a shows an example of the computation of IGD+, where the arrows show the modified distances, which are then averaged. Unlike GD and IGD, IGD+ is weakly Pareto compliant [52], which is an important advantage over the former two. The example in Fig. 5.4b illustrates this property. Since only the vector components greater than the corresponding components of the reference vector are taken into account, for each point q ∈ Q, the point p ∈ P that dominates it is at least as close to every reference point r (with respect to the modified distance). In the example, IGD+(P, R) = (3 × √(1²))/3 = 1 and IGD+(Q, R) = (2 × √(2²) + √(3²))/3 ≈ 2.33, and therefore IGD+(P, R) < IGD+(Q, R). Still, IGD+ depends on an appropriate setting of the reference set. Note, for example, that R should not be dominated by P, as any set Q ⊂ ℝ^m that weakly dominates R has an IGD+ value of zero, which would not be useful for comparison purposes. Ideally, if no a priori preference information is known, R should be (a good representation of) the Pareto front. Since the setting of the reference set has an impact on the calculated IGD+, a reference set that is large compared with the obtained non-dominated set is generally used and recommended. In practice, it may be difficult to obtain such a representation of the Pareto front, so the union of the known approximation sets is sometimes used as R [51]. It is, however, important to understand the implicit biases that may arise from this [69]. For example, if such a set R contains denser regions, then sets containing more points in those regions are preferred with respect to IGD+. To favour well-distributed sets, the reference set should also be well distributed. Approximations of optimal μ-distributions of IGD+ (and other indicators) were investigated in [93]. Although the computational cost of IGD+ depends on the size of the reference point set, it is generally cheaper than that of HV, especially in many-objective optimisation problems.
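The following Python sketch of IGD and IGD+ (function names our own) makes the difference between the two distances explicit. In the hypothetical toy instance, P = {(2, 2)} dominates Q = {(2, 3), (3, 2)}. IGD prefers Q, whereas IGD+ assigns both sets the same value. This is consistent with weak, but not strict, Pareto compliance.

```python
import math
from typing import Sequence

Point = Sequence[float]


def igd(P: Sequence[Point], R: Sequence[Point]) -> float:
    """Mean Euclidean distance from each reference point to its nearest point in P."""
    return sum(min(math.dist(r, p) for p in P) for r in R) / len(R)


def d_plus(r: Point, p: Point) -> float:
    """IGD+ distance: only components where p is worse (larger) than r count."""
    return math.sqrt(sum(max(pi - ri, 0.0) ** 2 for pi, ri in zip(p, r)))


def igd_plus(P: Sequence[Point], R: Sequence[Point]) -> float:
    """Mean modified distance from each reference point to its nearest point in P."""
    return sum(min(d_plus(r, p) for p in P) for r in R) / len(R)


if __name__ == "__main__":
    R = [(0.0, 3.0), (3.0, 0.0)]
    P = [(2.0, 2.0)]              # dominates every point of Q
    Q = [(2.0, 3.0), (3.0, 2.0)]
    print(round(igd(P, R), 2), igd(Q, R))   # 2.24 2.0  (IGD prefers Q)
    print(igd_plus(P, R), igd_plus(Q, R))   # 2.0 2.0   (IGD+ does not)
    print(igd_plus(R, R))                   # 0.0 (a set weakly dominating R scores zero)
```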
Fig. 5.5 Example of the optimal μ-distribution of the R2 indicator on a linear Pareto front where μ = |Λ| = 3
5.2.1.4 R2 Indicator Family
The R2 indicator family [42], in its unary version [21], evaluates the quality of a point set P ⊂ ℝ^m using a (discrete and finite) set of utility functions. There are different choices for the utility functions; the most commonly used ones are based on the weighted Chebyshev function [21, 42]. In such a case, R2 is also a comprehensive metric [70], i.e., proximity to the Pareto front, spread, uniformity, and cardinality are aspects that influence the indicator value of a given point set. Let Λ ⊂ ℝ^m be a predefined set of weight vectors and z* ∈ ℝ^m a utopian point (i.e., a point that is better than every feasible point in every objective). The R2 indicator for a non-dominated point set P ⊂ ℝ^m is then calculated by [21]

$$R2(P, \Lambda, z^*) = \frac{1}{|\Lambda|}\sum_{\lambda \in \Lambda} \min_{p \in P}\left\{\max_{i \in \{1, 2, \dots, m\}} \lambda_i \cdot |z_i^* - p_i|\right\}.$$
For every weight vector λ, the minimum (best) scalarising function value is found among the points in P. R2 is the average of these values over all weight vectors, so lower values are better. The time complexity of computing it is O(m · |Λ| · |P|). The R2 indicator is weakly Pareto compliant and scaling dependent. The optimal μ-distributions with respect to R2 depend on the set of weight vectors and the utopian point, and have been studied theoretically [21] and empirically [21, 93]. Each weight vector corresponds to a ray that goes through the utopian point (see Fig. 5.5). Assuming that all such rays intersect the Pareto front, if μ ≥ |Λ|, then an optimal μ-distribution is the set of intersection points between the rays and the Pareto front. If μ = |Λ|, then such a distribution is unique (see the example in Fig. 5.5); otherwise (μ ≠ |Λ|), there are cases where it is not unique [21]. In fact, in some cases there are infinitely many optimal μ-distributions [21]. Uniformly distributed weight vectors are typically used as Λ, and in some cases they lead to uniformly distributed point sets (e.g., for linear Pareto fronts) [21]. However, such weight vectors may not have this effect for problems with complicated Pareto front shapes, at least if |Λ| is small.
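The unary R2 indicator with weighted Chebyshev utilities can be sketched in a few lines of Python. The weight vectors, the utopian point z* = (−0.1, −0.1) and the point sets below are hypothetical. They were chosen only so that the result can be checked by hand.

```python
from typing import Sequence

Point = Sequence[float]


def r2_indicator(P: Sequence[Point], weights: Sequence[Point], z_star: Point) -> float:
    """Unary R2 with weighted Chebyshev utilities (lower is better): the average,
    over all weight vectors, of the best (smallest) utility value found in P."""
    total = 0.0
    for lam in weights:
        total += min(
            max(l * abs(z - p) for l, z, p in zip(lam, z_star, point))
            for point in P
        )
    return total / len(weights)


if __name__ == "__main__":
    weights = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
    z_star = (-0.1, -0.1)  # better than every point below in every objective
    P = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    Q = [(p1 + 0.1, p2 + 0.1) for p1, p2 in P]  # dominated by P
    print(round(r2_indicator(P, weights, z_star), 4))  # 0.1667
    print(round(r2_indicator(Q, weights, z_star), 4))  # 0.25
```

For P, the best utility values for the three weight vectors are 0.1, 0.3 and 0.1, giving R2 = 0.5/3. The dominated set Q gets the larger value 0.75/3 = 0.25.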
Fig. 5.6 Example of (a) the additive ε-indicator, where R′ is the set R translated by ε = 2, and (b) an example reflecting the weak Pareto compliance property, where I_ε+(P, R) < I_ε+(Q, R)
5.2.1.5 ε-Indicator
The additive (multiplicative) ε-indicator [107] is the minimal value ε ∈ ℝ that must be added to (multiplied with) every coordinate of every point in a reference set R ⊂ ℝ^m such that it becomes weakly dominated by a given (non-dominated) point set P ⊂ ℝ^m. The additive ε-indicator is given by:

$$I_{\varepsilon+}(P, R) = \max_{r \in R}\, \min_{p \in P}\, \max_{i \in \{1, \dots, m\}} \left(p_i - r_i\right).$$

The multiplicative ε-indicator is given by:

$$I_{\varepsilon}(P, R) = \max_{r \in R}\, \min_{p \in P}\, \max_{i \in \{1, \dots, m\}} \frac{p_i}{r_i}.$$
In practice, only one pair of points (one from P and the other from R) is responsible for the indicator value. Figure 5.6a shows an example of the additive ε-indicator. The left-hand side shows the set P, the reference set R, and the pair of points that define ε. The right-hand side of Fig. 5.6a (the middle panel of Fig. 5.6) additionally shows the set R′, which is obtained by adding ε = 2 to each coordinate of every point in R. It also shows that every point in R′ is weakly dominated by some point in P. The indicator is weakly Pareto compliant [107], and in this case, the lower the indicator value, the better. Figure 5.6b shows an example where the set P dominates Q and is therefore assigned a lower indicator value (I_ε+(P, R) = ε^(1) = 1) than Q (I_ε+(Q, R) = ε^(2) = 3). This indicator is viewed as a comprehensive metric that takes convergence, spread and distribution into account. It can be computed in O(m · |P| · |R|) time. As for other indicators that require a reference set, the optimal μ-distributions depend on that set. However, since in practice a single coordinate of a single pair of points defines the indicator value, the optimal μ-distributions may not be unique. There is some tolerance with respect to the exact positioning of the remaining μ − 1 points.
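Both ε-indicators follow directly from the max–min–max formulas above, as the following Python sketch shows (function names our own, toy data hypothetical). The multiplicative version assumes strictly positive objective values. The last lines check the defining property: R translated by ε is weakly dominated by P.

```python
from typing import Sequence

Point = Sequence[float]


def additive_epsilon(P: Sequence[Point], R: Sequence[Point]) -> float:
    """I_eps+(P, R): smallest eps such that every r + eps is weakly dominated by some p."""
    return max(min(max(pi - ri for pi, ri in zip(p, r)) for p in P) for r in R)


def multiplicative_epsilon(P: Sequence[Point], R: Sequence[Point]) -> float:
    """I_eps(P, R): smallest eps such that every eps * r is weakly dominated by some p.
    Assumes strictly positive objective values."""
    return max(min(max(pi / ri for pi, ri in zip(p, r)) for p in P) for r in R)


if __name__ == "__main__":
    R = [(1.0, 3.0), (3.0, 1.0)]
    P = [(2.0, 4.0), (4.0, 2.0)]
    Q = [(3.0, 5.0), (5.0, 3.0)]  # dominated by P
    eps = additive_epsilon(P, R)
    print(eps, additive_epsilon(Q, R))                                # 1.0 2.0
    print(multiplicative_epsilon(P, R), multiplicative_epsilon(Q, R))  # 2.0 3.0
    # Translating R by eps makes every reference point weakly dominated by P.
    shifted = [tuple(ri + eps for ri in r) for r in R]
    print(all(any(all(pi <= si for pi, si in zip(p, s)) for p in P) for s in shifted))  # True
```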
Selecting a subset of (at most) k points from P such that the ε-indicator is minimised is known as the ε-indicator subset selection problem (EPSSSP) [19, 98]. In the two-dimensional case, this problem can be solved exactly in O(k|P| + |P| log |P|) time [98]. Alternatively, a randomised algorithm can be used to solve it in expected O(|P| log |P| + |R| log |R|) time [19].
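The following Python sketch illustrates the problem definition, not the efficient algorithms of [19, 98]. It solves a tiny EPSSSP instance by exhaustive search. The reference set R is a parameter, and in the example we simply use P itself as R. This is an assumption made for illustration, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def additive_epsilon(S: Sequence[Point], R: Sequence[Point]) -> float:
    """I_eps+(S, R) = max over r of min over s of max_i (s_i - r_i)."""
    return max(min(max(si - ri for si, ri in zip(s, r)) for s in S) for r in R)


def epsilon_subset_exhaustive(P: Sequence[Point], k: int, R: Sequence[Point]) -> List[Point]:
    """Choose k points of P minimising I_eps+(S, R) by trying every k-subset.
    Exponential in |P|; meant only to illustrate the problem on tiny inputs."""
    return list(min(combinations(P, k), key=lambda S: additive_epsilon(S, R)))


if __name__ == "__main__":
    P = [(0.0, 4.0), (1.0, 2.0), (2.0, 1.0), (4.0, 0.0)]
    best = epsilon_subset_exhaustive(P, 2, R=P)  # reference set = P (assumption)
    print(best, additive_epsilon(best, P))       # [(0.0, 4.0), (2.0, 1.0)] 1.0
    ties = [S for S in combinations(P, 2) if additive_epsilon(S, P) == 1.0]
    print(len(ties))                             # 3
```

Three different pairs attain the optimal value 1 here. This echoes the remark above that ε-optimal subsets need not be unique.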
5.2.2 Indicator-Based Algorithms

The quality measures described above are often employed to evaluate a solution set obtained by multi- and many-objective optimisation. However, these indicators can also be employed as criteria to compare solutions in evolutionary algorithms, which are commonly called indicator-based algorithms. Indicator-based algorithms rank solutions based on a scalar value. For a given solution, the indicator values are computed with and without that solution. The difference between these values, referred to as its contribution, is used as the fitness of the solution. The IBEA algorithm [105] is one of the first multi-objective algorithms to employ quality indicator contributions to rank solutions during the evolutionary phase of the algorithm (it was originally tested with up to 4 objectives). In IBEA, the (binary) additive ε and hypervolume indicators were used to determine the fitness of each solution. Since then, the use of the hypervolume indicator for selection has frequently been proposed in the literature [9, 35, 60, 62, 63]. However, evaluating the hypervolume at least twice per solution at each generation causes a computational overhead. This overhead grows with the number of objectives and the population size, and can thus become infeasible in practice. To reduce this computational cost, the HypE algorithm [6] uses Monte Carlo sampling to approximate hypervolume contributions specifically for many-objective problems, at the price of a less accurate metric [10]. Moreover, recent research also suggests using the R2 indicator with several forms of modified Chebyshev decomposition functions [73] to better approximate the HV [90]. This R2-based approximation of the hypervolume is used in the recent R2HCA-EMaO algorithm [88], which shows promising results with up to 15 objectives. IGD has also been employed in indicator-based EMaO algorithms.
The recent AR-MOEA [95], which has been evaluated with up to 10 objectives, uses an enhanced IGD metric referred to as IGD-NS. It distinguishes the non-dominated solutions that do not contribute to the calculation of IGD. IGD-NS uses the distance from every non-dominated solution to the nearest reference point. The reference points are distributed on an approximate Pareto front using an adaptive method, so that the true Pareto front is not required by the algorithm. Furthermore, the R2 indicator is employed to search for solutions in the MOMBI algorithm [38], which was tested with up to 8 objectives. The main idea of the algorithm is to rank individuals using a set of utility functions. This ranking is a non-dominated sorting of solutions based on the number of utility functions each individual optimises simultaneously (as opposed to standard non-dominated sorting based on Pareto dominance).
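The following Python sketch illustrates indicator-based fitness assignment using the binary additive ε-indicator. It follows the IBEA fitness as we understand it from [105]: F(x) = Σ_{y ≠ x} −exp(−I(y, x)/(c·κ)), where c is the largest absolute indicator value in the population and κ is a scaling factor. The value κ = 0.05 and the omission of objective normalisation are simplifying assumptions, and the function names are hypothetical. The solution with the lowest fitness is then the natural candidate for removal during environmental selection.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def eps_pair(a: Point, b: Point) -> float:
    """Binary additive epsilon between single points: I({a}, {b}) = max_i (a_i - b_i)."""
    return max(ai - bi for ai, bi in zip(a, b))


def ibea_fitness(pop: Sequence[Point], kappa: float = 0.05) -> List[float]:
    """Fitness F(x) = sum over y != x of -exp(-I(y, x) / (c * kappa)); higher is better.
    c is the largest absolute pairwise indicator value (objectives not normalised)."""
    n = len(pop)
    ind = [[eps_pair(pop[i], pop[j]) for j in range(n)] for i in range(n)]
    c = max((abs(ind[i][j]) for i in range(n) for j in range(n) if i != j), default=0.0)
    c = c or 1.0  # avoid division by zero when all points coincide
    return [
        sum(-math.exp(-ind[j][i] / (c * kappa)) for j in range(n) if j != i)
        for i in range(n)
    ]


if __name__ == "__main__":
    pop = [(1.0, 3.0), (3.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # (3, 3) is dominated by (2, 2)
    fitness = ibea_fitness(pop)
    worst = min(range(len(pop)), key=fitness.__getitem__)
    print(pop[worst])  # (3.0, 3.0): the first candidate for removal
```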
Recently, IBEA, HypE and other, more advanced EMaO algorithms were compared in [10]. Surprisingly, IBEA (using both the additive ε and hypervolume indicators) showed competitive results in terms of convergence for up to 10 objectives. This result is reinforced by the recent work in [84], where several state-of-the-art many-objective optimisation algorithms are benchmarked on a 5-objective practical problem from the construction engineering domain, and IBEA outperforms all of them in terms of hypervolume. As also noted in [84], the design of indicator-based algorithms largely depends on the problem at hand, as not all indicators serve the same purpose. We believe that designing more comprehensive test suites for many-objective algorithms (see Chap. 6) will substantially aid the development of more advanced indicator-based approaches.
5.3 Quality Indicators for a Priori Methods

The indicators discussed in Sect. 5.2 are general by nature. They aim to judge the quality of a set of solutions (by solutions, we mean points in the objective space) approximating the whole Pareto front. In doing so, they measure the closeness to the Pareto front and the diversity of solutions in cases when no preference information is available. However, when solving problems with many objectives, one solution must eventually be selected to be implemented. The steps taken to determine the final feasible solution to the given problem are called a solution process. Typically, the final choice is made by a DM, an expert in the problem domain. In particular, as the number of objectives increases, it becomes computationally expensive to prepare a good representation of the whole Pareto front. At the same time, it becomes cognitively demanding for the DM to compare and analyse large amounts of high-dimensional data. The effort needed may be reduced if preference information from a DM is available to be incorporated in the solution process. Methods that incorporate a priori preference information are sometimes called preference-based EMO or EMaO methods in the literature. One should, however, note that there are also so-called interactive methods, which are based on preferences but incorporate them iteratively during the solution process. For further information about the aforementioned types of methods, see, e.g., [74, 75, 77]. Some indicators have been developed for cases when the DM can provide preference information a priori, that is, before the actual solution process. In these cases, the DM is interested in only a part of the Pareto front. In the a priori evolutionary methods proposed in the literature, the preference information is typically given as a reference point.
As components of a reference point, the DM can indicate aspiration levels and/or reservation levels, referring to desired and undesired objective function values, respectively. Naturally, the corresponding indicators also take this reference point into account. Some of the indicators proposed in the literature to measure the performance of a priori evolutionary methods are described in what follows. In all of them, the size of the region of interest (ROI), that is, the part of the Pareto front that the DM is interested in, depends on a parameter that has to be provided by the user, and the size of the ROI directly affects the indicator values.

Fig. 5.7 Illustrative example of a composite front and a ROI
5.3.1 User-Preference Metric Based on a Composite Front

The user-preference metric based on a composite front (UPCF) [78] has been developed for comparing several evolutionary algorithms that involve a reference point provided a priori by a DM. It adapts regular indicators (IGD, HV), but before applying them, a ROI is defined based on the above-mentioned reference point. The ROI is constructed on a composite front, which acts as a replacement for the Pareto front and is used as the reference point set in the IGD calculation. The composite front is formed by first merging the solutions of the algorithms to be compared and then selecting the non-dominated ones. An example is given in Fig. 5.7a, where two solution sets are merged and the dominated solutions are removed. To define the ROI, the Euclidean distance between each solution in the composite front and the provided reference point is calculated, and the closest solution is selected as the centre of the ROI (see Fig. 5.7b). The size of the ROI is controlled by a parameter denoted by r ∈ ℝ. Once a ROI has been defined on the composite front, the IGD and HV indicators are calculated on the basis of the solutions within the ROI. For the IGD indicator, the solutions in the composite front are used as the reference set instead of points in the Pareto front. One of the advantages of UPCF is that no knowledge of the true Pareto front is required in the indicator calculations. In addition, since it incorporates preference information when applying existing indicators, it can measure convergence and diversity with respect to the ROI. On the other hand, UPCF also has some flaws. The parameter r plays a significant role, and the indicator values are highly dependent on it. Solutions outside the ROI are not considered in the indicator calculation, and they are regarded as equally bad regardless of their distance to the provided reference point. Furthermore, if some of the compared algorithms have not found solutions in the ROI, their performance cannot be evaluated, since only solutions in the ROI are considered. If IGD is used as the indicator, this issue could be addressed by measuring the distances between the solutions in the ROI and those obtained by the algorithm, even though this was not proposed originally in [78].
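The UPCF procedure for the IGD variant can be sketched in Python as follows. Two details are assumptions made for illustration, since the description above leaves them open. First, the ROI is taken to be the set of composite-front points within Euclidean distance r of the centre. Second, an algorithm's solutions count as being in the ROI if they lie within that same distance. The function names are hypothetical. None marks an algorithm whose performance cannot be evaluated because it has no solutions in the ROI.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Pareto dominance under minimisation."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def composite_front(solution_sets: Sequence[Sequence[Point]]) -> List[Point]:
    """Merge all solution sets and keep the non-dominated points (duplicates removed)."""
    merged = list(dict.fromkeys(tuple(p) for S in solution_sets for p in S))
    return [p for p in merged if not any(dominates(q, p) for q in merged)]


def upcf_igd(solution_sets: Sequence[Sequence[Point]], ref_point: Point,
             radius: float) -> List[Optional[float]]:
    """IGD of each algorithm inside an (assumed) spherical ROI on the composite front."""
    front = composite_front(solution_sets)
    centre = min(front, key=lambda p: math.dist(p, ref_point))
    roi = [p for p in front if math.dist(p, centre) <= radius]
    scores: List[Optional[float]] = []
    for S in solution_sets:
        inside = [p for p in S if math.dist(p, centre) <= radius]
        if not inside:
            scores.append(None)  # cannot be evaluated: no solutions in the ROI
            continue
        scores.append(sum(min(math.dist(r, p) for p in inside) for r in roi) / len(roi))
    return scores


if __name__ == "__main__":
    A = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    B = [(0.2, 0.9), (0.6, 0.6), (0.9, 0.3)]
    wide = upcf_igd([A, B], ref_point=(0.4, 0.4), radius=0.3)
    print([None if s is None else round(s, 4) for s in wide])   # [0.0, 0.1414]
    print(upcf_igd([A, B], ref_point=(0.4, 0.4), radius=0.05))  # [0.0, None]
```

Shrinking the radius from 0.3 to 0.05 turns a comparable result into one where the second algorithm cannot be evaluated at all. This reflects the sensitivity to r noted above.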
5.3.2 R-metric

Similar to UPCF, the R-metric [68] also constructs a composite front by merging the solutions obtained by the evolutionary algorithms to be compared and keeping the non-dominated solutions of this merged set. A representative solution (referred to as a pivot point) is then defined for the solutions of each algorithm in the composite front. It is the solution with the minimum achievement scalarizing function (ASF) [100] value with respect to the reference point provided by the DM. Once the pivot point is found, a ROI for each solution set is defined as a hypercube centred at the pivot point, with a side length determined by a threshold parameter called delta. In Fig. 5.8a, b, the eligible solutions of two algorithms are illustrated. Solutions located outside the corresponding ROI are eliminated, as shown in Fig. 5.8b. Only solutions in the ROI are considered for performance assessment. Before applying regular indicators, the R-metric transfers the solutions to virtual positions according to a so-called iso-ASF line. This line is calculated using an ASF between the reference point and a so-called worst point, as shown in Fig. 5.8c. Finally, the IGD and HV indicators can be applied to the transferred solutions. Values of the aforementioned parameters delta and worst point need to be provided by the user of the R-metric. As with the UPCF indicator, if one of the compared algorithms has no solutions in the composite front (i.e., all its solutions are dominated by other solutions), the R-metric cannot evaluate its performance. Additionally, the parameter delta, which controls the size of the ROI, can significantly affect the comparisons, as does the parameter r in UPCF. Unlike in UPCF, a ROI is defined by considering the solutions of each algorithm in the composite front individually. This means that if any of the compared algorithms has some solutions in the composite front, some of these solutions (at least its pivot point) will remain in its ROI after elimination. Therefore, the R-metric can evaluate the performance of such an algorithm even though only the solutions in the ROI are used for performance assessment.
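The following Python sketch covers the first steps of the R-metric: building the composite front, choosing each algorithm's pivot point by the ASF, and trimming to a cubic ROI. It is a partial, hypothetical sketch. We assume an ASF of the form max_i (x_i − z_i)/w_i with unit weights by default, and a cube of side length delta centred at the pivot. The iso-ASF transfer and the final IGD/HV computation are omitted.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Pareto dominance under minimisation."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def asf(x: Point, z: Point, w: Sequence[float]) -> float:
    """Assumed (non-augmented) achievement scalarizing function w.r.t. reference point z."""
    return max((xi - zi) / wi for xi, zi, wi in zip(x, z, w))


def rmetric_trim(solution_sets: Sequence[Sequence[Point]], ref_point: Point, delta: float,
                 weights: Optional[Sequence[float]] = None) -> List[List[Point]]:
    """Per algorithm: keep its composite-front points, pick the pivot (minimum ASF)
    and keep the points inside a cube of side length delta centred at the pivot."""
    merged = list(dict.fromkeys(tuple(p) for S in solution_sets for p in S))
    front = {p for p in merged if not any(dominates(q, p) for q in merged)}
    w = weights or (1.0,) * len(ref_point)
    half = delta / 2.0
    trimmed: List[List[Point]] = []
    for S in solution_sets:
        eligible = [tuple(p) for p in S if tuple(p) in front]
        if not eligible:
            trimmed.append([])  # no solution in the composite front: cannot be evaluated
            continue
        pivot = min(eligible, key=lambda p: asf(p, ref_point, w))
        trimmed.append([p for p in eligible
                        if all(abs(pi - ci) <= half for pi, ci in zip(p, pivot))])
    return trimmed


if __name__ == "__main__":
    A = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    B = [(0.2, 0.9), (0.6, 0.6), (0.9, 0.3)]
    C = [(0.7, 0.7)]  # entirely dominated by A
    print(rmetric_trim([A, B, C], ref_point=(0.4, 0.45), delta=0.5))
    # [[(0.5, 0.5)], [(0.2, 0.9)], []]
```

Every algorithm with at least one composite-front solution keeps its pivot point. The fully dominated set C yields an empty list, matching the limitation described above.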
Fig. 5.8 Illustrative example of defining ROI and transferring solutions to calculate R-metric

5.3.3 Other Indicators for a Priori Evolutionary Methods

The PMOD indicator [45] has been developed for assessing the performance of an a priori evolutionary method. It constructs a hyperplane based on the reference point provided by a DM. So-called mapping points for each solution are obtained on the constructed hyperplane, and a ROI is defined with the reference point as the centre, as shown in Fig. 5.9. The size of the ROI is controlled by a user-defined parameter r. Solutions whose mapping points are in the ROI are called preferred solutions. To measure convergence, the distances from the preferred solutions to an ideal point are calculated. To measure diversity, the standard deviation of the distances from each mapping point to the nearest point on the hyperplane is calculated. A penalty coefficient is applied if the mapping point of a solution is not in the ROI.

Fig. 5.9 Illustration of the PMOD metric

The indicator proposed in [101], called PMDA (preference metric based on distances and angles), also assesses the performance of an a priori evolutionary method. It uses the idea of decomposition to transform the preference information provided as a reference point. The reference point is decomposed into m + 1 light beams originating from the ideal point to define a ROI (where m is the number of objectives), as can be seen in Fig. 5.10. The central light beam (L1) is the one launched from the ideal point towards the provided reference point. Then, a preference-based hyperplane is constructed, and a mapping point (A) is defined as the intersection point between the central light beam and the preference-based hyperplane. The size of the ROI is controlled by a user-defined parameter that defines the distances A–B and A–C (see Fig. 5.10). The value of the PMDA indicator is computed by means of distances and angles to evaluate the convergence and diversity of the solutions in the ROI, taking preference information into consideration.

Fig. 5.10 Illustration of the ROI for the PMDA metric. L1, L2, and L3 are light beams

As mentioned, all indicators discussed in this section involve parameters that control the size of the ROI, and these parameters affect the indicator values directly. Thus, these indicators must be used with care. Naturally, the limitations of the indicators mentioned in the preceding sections also apply to the indicators discussed in this section. Furthermore, indicators that can handle types of a priori preference information other than reference points are still lacking.
5.4 Under-Explored Areas for Quality Indicators

In this section, we outline areas where quality indicators are under-explored, even in the multi-objective case, let alone the many-objective case. We explain the additional problems that stem from them, as well as those that may worsen as we move from the multi- to the many-objective case.
5.4.1 Noisy Multi- and Many-Objective Optimisation

The problem of many-objective optimisation under output uncertainty can be defined as

$$\min\, [f_1(x, \varepsilon_1), \dots, f_m(x, \varepsilon_m)], \quad m \ge 4,$$

where the decision variables x = [x_1, ..., x_n]^T are contained in the decision space X (usually X ⊂ ℝ^n), and f : X → ℝ^m is the vector-valued function with coordinates f_1, ..., f_m in the objective space (a subset of ℝ^m). The observational noise is denoted by ε_j, j = 1, ..., m. It is commonly assumed to be (i) additive, because it is added to any noise that is extrinsic to the system, and (ii) independent among the different objectives and identically distributed across replications. The observed performance for each objective is estimated by averaging the values of r_i replications at a given input vector x_i:

$$\bar{f}_j(x_i) = \frac{1}{r_i}\sum_{k=1}^{r_i} f_j^k(x_i), \quad j = 1, \dots, m,$$

where f_j^k(x_i) denotes the performance of the kth observation of x_i on objective j. In the literature, the noise is commonly assumed to be homogeneous (i.e., the objective values are perturbed with constant noise levels across the search space): f_j(x, ε_j) = f_j(x) + ε_j, j = 1, ..., m. In practice, however, the noise is often heterogeneous [58] (i.e., its level depends on the decision variables and is thus not constant throughout the search space): f_j(x, ε_j) = f_j(x) + ε_j(x), j = 1, ..., m. Since the observed performance of the objectives is uncertain, we aim to find the true Pareto set, rather than relying only on the observed Pareto front to determine the non-dominated solutions. We refer to [33] for a discussion of the different types of output noise in multi- and many-objective optimisation. Static resampling (i.e., evaluating each solution a fixed number of times and using the mean of these replications as an approximation of the response value) is the most commonly used method for handling noise during optimisation [56].
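Static resampling under heterogeneous noise can be simulated in a few lines of Python. The following sketch uses hypothetical choices made only for illustration: the bi-objective test function, the independent Gaussian noise model and its standard deviation 0.05 + 0.15x.

```python
import random
from statistics import fmean
from typing import Callable, List


def f(x: float) -> List[float]:
    """Hypothetical bi-objective test function on x in [0, 1] (both minimised)."""
    return [x, 1.0 - x ** 0.5]


def noise_sd(x: float) -> float:
    """Hypothetical heterogeneous noise level that grows with x."""
    return 0.05 + 0.15 * x


def observe(func: Callable[[float], List[float]], x: float,
            sd: Callable[[float], float], rng: random.Random) -> List[float]:
    """One noisy replication: f_j(x) + eps_j(x), independent Gaussian noise per objective."""
    s = sd(x)
    return [fj + rng.gauss(0.0, s) for fj in func(x)]


def static_resampling(func: Callable[[float], List[float]], x: float,
                      sd: Callable[[float], float], replications: int,
                      rng: random.Random) -> List[float]:
    """Estimate each objective by the mean of a fixed number of replications."""
    samples = [observe(func, x, sd, rng) for _ in range(replications)]
    return [fmean(values) for values in zip(*samples)]


if __name__ == "__main__":
    rng = random.Random(42)
    for r in (1, 10, 1000):
        est = static_resampling(f, 1.0, noise_sd, r, rng)
        err = max(abs(e - t) for e, t in zip(est, f(1.0)))
        print(r, round(err, 4))  # estimation error typically shrinks as r grows
```

Because the noise level here is largest at x = 1, a fixed number of replications buys less accuracy there than elsewhere. This is one reason why a uniform resampling budget can be wasteful under heterogeneous noise.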
In general, relying on the observed mean objective values to determine the non-dominated points may lead to two possible errors due to sampling variability: solutions that actually belong to the non-dominated set can be wrongly considered dominated, or solutions that are truly dominated can be considered Pareto-optimal. In [23], these errors are referred to as Error Type 1 and Error Type 2, respectively, whereas [46] refer to them as misclassification by exclusion (MCE) and misclassification by inclusion (MCI), respectively. See Fig. 5.11 for an illustration. The detrimental effect of noise on identifying the non-dominated set of points becomes more severe as the number of objectives increases. To exemplify this effect, Table 5.1 shows the deterministic versus noisy values of HV and IGD for a concave test problem with $m \in \{2, 3, 5, 7, 9\}$ objectives, as well as the number of misclassification errors. These problem settings are analogous to those shown in Fig. 5.11: the (discrete) sets of non-dominated and dominated points are artificially generated and perturbed with heterogeneous noise. To compute the indicator values, the objectives are normalised to the [0, 1] interval, and we assume minimisation of all objectives. We use the point $(1, \ldots, 1) \in \mathbb{R}^m$ as the reference point for HV, and the true Pareto front of the function for the computation of IGD. Clearly, inference on the true performance of a given algorithm in noisy settings can be very misleading when it simply relies on the observed mean objective values.

B. Afsar et al.

Fig. 5.11 Identification errors in noisy multi- and many-objective optimisation. To minimise the misclassification errors, the available computational budget must be (smartly) allocated during and after the search in order to improve the accuracy of the indicator values

Table 5.1 Example of the detrimental effect of noise on indicator values

                                   m=2      m=3      m=5      m=7      m=9
Total number of points             100      300      1000     1200     1500
Number of non-dominated points     20       200      500      700      900
Deterministic HV                   0.3323   0.5585   0.8167   0.9172   0.9658
Noisy HV                           0.4658   0.7020   0.8842   0.9418   0.3566
Deterministic IGD                  0.0637   0.1844   0.8489   1.7357   3.0591
Noisy IGD                          0.4136   0.4833   0.9108   1.7645   3.0121
Number of MCE errors               15       145      209      97       3
Number of MCI errors               4        28       303      415      493

[Figure 5.12: flowchart. Boxes: INITIAL SAMPLE (design experiments); EVALUATION (compute exact or simulated responses); NOISE (characterize the intrinsic uncertainty); SEARCH (using a metaheuristic and/or a Bayesian approach); INTERACTION (incorporate the preferences of the decision maker); RANKING AND SELECTION (smartly allocate budget to improve accuracy); INDIFFERENCE ZONE (detect only significant differences); ADD NEW SOLUTIONS; decision node "Stopping criterion met?" (No / Yes); IDENTIFICATION (return the observed optimal points); QUALITY INDICATORS (assess the quality of stochastic Pareto fronts).]
Fig. 5.12 Generic procedure of noisy multi- and many-objective optimisation algorithms. The dashed boxes depict the possible extra steps to be considered with noisy evaluations

Noise drastically affects the number of misclassification errors and, thus, the indicator values. Note that for HV the noisy indicator value is in most cases (much) better than the deterministic one, except for m = 9; for IGD the behaviour is the opposite. In particular, the increasing number of MCI errors may have a significant negative effect on decision-making, as these are truly dominated points wrongly considered Pareto-optimal. Multi- and many-objective optimisation algorithms thus need to take into account the noise disturbing the observations, both during the search for solutions and when identifying the observed Pareto-optimal set; otherwise, the algorithm may lead to incorrect inference about the system's performance. When the noise level is high and/or its structure is strongly heterogeneous, static resampling may be inefficient, which is especially important when function evaluations are expensive and the user has a limited computational budget [36]. Moreover, the level and structure of the noise may differ considerably among the objectives, which makes allocating the limited computational budget non-trivial. The literature on noisy multi- and many-objective optimisation is scarce [46, 87]; most of it focuses on metaheuristic approaches, such as noisy evolutionary optimisation algorithms (see e.g., [33]). More recently, the simulation optimisation (see e.g., [59]) and Bayesian optimisation/machine learning (see e.g., [86, 108]) communities have shown increasing interest in such problems, due to their practical relevance. Even though there are overlapping elements between the different communities, these links are often not clearly recognized. While the first (reluctant) steps are being made to create synergies between communities, these research areas have so far evolved rather independently, terminology may differ, and research results are not systematically shared.
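The two kinds of misclassification can be counted directly by comparing the non-dominated set computed from the true objective vectors with the one computed from the observed means. The sketch below assumes minimisation and uses a small hypothetical point set (not the data behind Table 5.1) in which noise swaps the status of two points.

```python
def dominates(a, b):
    """Pareto dominance under minimisation: a is no worse in every objective and better in one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def nondominated_indices(points):
    """Indices of the points that are not dominated by any other point in the list."""
    return {i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)}

def misclassification_errors(true_points, observed_points):
    """Return (MCE, MCI) when observed means are used in place of the true values.

    MCE: truly non-dominated points that are observed as dominated.
    MCI: truly dominated points that are observed as non-dominated.
    """
    true_nd = nondominated_indices(true_points)
    observed_nd = nondominated_indices(observed_points)
    return len(true_nd - observed_nd), len(observed_nd - true_nd)

# Hypothetical example: point 1 is truly non-dominated, point 3 is truly dominated,
# but noise in the observed means swaps their status.
true_pts = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6)]
observed_pts = [(0.1, 0.9), (0.55, 0.62), (0.9, 0.1), (0.52, 0.58)]
print(misclassification_errors(true_pts, observed_pts))  # (1, 1)
```

The quadratic pairwise check is kept for clarity; it is only meant for small illustrative sets.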
Figure 5.12 shows how, in the presence of uncertainty, both the search for and the identification of solutions are equally relevant problems, although they are often addressed separately in the literature. The noisy evolutionary multi- and many-objective optimisation community has put effort into developing techniques that consider the noise during the evolutionary phase of the algorithms, such as probabilistic dominance. Probabilistic dominance specifies the minimum probability $\alpha$ with which a solution $x_i$ dominates another
solution $x_k$ (i.e., $P(x_i \prec x_k) \geq \alpha$) [32]. Thus, instead of using outright dominance, domination is determined with a degree of confidence. Some of the early works that discussed probabilistic dominance [8, 32] propose to use the expected values of any deterministic indicator to compare the quality of different Pareto fronts with a certain confidence level, under the assumption that each solution is inherently associated with a probability distribution over the objective space. With many objectives this becomes infeasible to compute, as the population sizes increase exponentially with the number of objectives. In [104], the authors propose to use non-parametric statistical tests for comparing deterministic indicator values of two or more Pareto fronts. In [99], probabilistic dominance is defined by comparing the volumes of the confidence intervals in the objective space, and the center points of these volumes are used to determine the dominance relationship. Similarly, in [96], the standard deviation is added to the mean such that dominance is determined with the quantile objective values. More recently, instead of using probabilistic dominance, the RTEA algorithm proposed in [33] tracks the improvement of the Pareto set (as opposed to the Pareto front) during the evolutionary phase. The RTEA algorithm seems promising, but has only been tested with up to 3 objectives. Under output uncertainty, the question of whether to replicate already sampled points or to explore interesting areas of the search space further remains an active research area (see e.g., [11]), even in the single-objective case. Ranking and selection methods are standard in single-objective stochastic (simulation) optimisation; for multiple objectives the problem is known as multi-objective ranking and selection (MORS), and for many objectives the field is still in its infancy.
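One simple way to operationalise the probabilistic dominance test $P(x_i \prec x_k) \geq \alpha$ is to estimate the probability empirically from replications. The sketch below assumes the replications of the two solutions are independent and uses the share of replication pairs in which $x_i$'s observation dominates $x_k$'s; this estimator and the replication values are illustrative assumptions, not necessarily the procedure used in [32].

```python
from itertools import product

def dominates(a, b):
    """Pareto dominance under minimisation."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def dominance_probability(reps_i, reps_k):
    """Empirical estimate of P(x_i dominates x_k): the share of all pairs of
    replications (one of x_i, one of x_k) in which x_i's observation dominates."""
    pairs = list(product(reps_i, reps_k))
    return sum(dominates(a, b) for a, b in pairs) / len(pairs)

def probably_dominates(reps_i, reps_k, alpha=0.9):
    """Declare x_i dominant over x_k only if the estimated probability reaches alpha."""
    return dominance_probability(reps_i, reps_k) >= alpha

# Hypothetical replications of two solutions on two objectives.
reps_a = [(0.20, 0.30), (0.25, 0.28), (0.22, 0.35)]
reps_b = [(0.30, 0.40), (0.27, 0.33), (0.35, 0.45)]
print(round(dominance_probability(reps_a, reps_b), 3))  # 0.889 (8 of 9 pairs)
print(probably_dominates(reps_a, reps_b, alpha=0.9))    # False
print(probably_dominates(reps_a, reps_b, alpha=0.8))    # True
```

With $\alpha = 0.9$ the evidence here is judged insufficient, even though $x_a$ looks better on average; lowering $\alpha$ trades confidence for discrimination.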
MORS procedures (see e.g., [16, 23, 85]) aim to ensure a high probability of correctly selecting a non-dominated design by smartly distributing the available computational budget between the search for solutions and the replication of critically competitive designs, in order to achieve sufficient accuracy. Likewise, they aim to avoid unnecessary resampling (i.e., resampling that provides little benefit). Ranking and selection methods are often augmented with an indifference zone (IZ) procedure. IZ procedures give a probabilistic guarantee on the selection of the best solution(s) (i.e., those solutions with the true best expected performance), or of a solution within a given user-defined quantity from the best. This user-specified quantity defines the indifference zone, and it represents the smallest difference worth detecting [12]. To the best of our knowledge, there are only two multi-objective IZ procedures in the literature [16, 94]; the former clearly shows the shortcomings of the latter, but remains limited to the bi-objective case. Thus, substantial work remains to be done in this regard. In practice, it is highly likely that the DM will be indifferent between marginally different solutions. Thus, effective computing budget allocation and IZ procedures will not only reduce the computational burden, but will also provide a statistical degree of confidence that the selected solutions are within the best possible range. Ranking and selecting the solutions with the true best expected performance is crucial before computing a quality indicator. All too often, multi- and many-objective evolutionary algorithms applied to stochastic problems simply resort to computing deterministic performance measures (such as hypervolume and IGD) on the
estimated objective outcomes, thereby ignoring the inherent noise. The results in [64, 86] highlight that standard quality indicators can be very misleading in stochastic settings (especially in settings with heterogeneous noise). The current literature lacks solutions or guidelines to reliably assess the Pareto front quality in such cases. The integration of MORS procedures during and after stochastic optimisation is essential to determine the solutions with the true best expected performance, and thus to reliably assess the Pareto front quality using deterministic indicators. Even though such procedures are standard in single-objective ranking and selection, in multi- and many-objective settings it is not even clear how to define them.
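To illustrate what computing a deterministic indicator on estimated outcomes amounts to, the sketch below computes the exact two-objective hypervolume under minimisation with reference point (1, 1), the same reference point convention described for Table 5.1, once on the true objective vectors and once on estimated means. Both point sets are made-up numbers; the estimated means are assumed to be optimistically perturbed by noise.

```python
def hypervolume_2d(points, ref=(1.0, 1.0)):
    """Exact hypervolume of a 2-D point set under minimisation, w.r.t. reference point ref."""
    # Only points strictly better than the reference point in both objectives contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:            # sweep in increasing order of f1
        if f2 < best_f2:          # dominated points (f2 >= best_f2) add nothing
            volume += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return volume

# Hypothetical example: true objective vectors of two solutions, and their
# (optimistically perturbed) estimated means after a few noisy replications.
true_front = [(0.2, 0.6), (0.5, 0.3)]
estimated_front = [(0.15, 0.55), (0.48, 0.25)]
print(round(hypervolume_2d(true_front), 4))       # 0.47
print(round(hypervolume_2d(estimated_front), 4))  # 0.5385
```

The indicator itself is exact; the overestimate comes entirely from feeding it noisy estimates, which is why ranking and selection before indicator computation matters.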
5.4.2 Robust Many-Objective Optimisation
In Sect. 5.4.1 we were largely concerned with uncertainty manifest in observations. Often this uncertainty is modelled as coming from a known probability distribution. In contrast, in robust optimisation the uncertainty enters the objective function(s), $\mathbf{f} = (f_1, \ldots, f_m)$. This may be through the design space or through other unobserved inputs to the functions, for instance (i) machining tolerances, or other factors which make the exact realisation of a putative design impossible, so that it is instead subject to some uncontrollable perturbation; (ii) different operational scenarios over which a design must perform well (for instance the different flight conditions a wing design may experience [71]); (iii) changes in the environment in which it must operate. A popular formulation of the robust optimisation problem is to define an uncertainty set $U$ whose elements, in combination with a putative solution $x$, lead to the quality assignment for a solution. That is, for a single objective the task becomes to find an $x$ that minimises

$$f(x, u) \tag{5.1}$$

for all $u \in U$. It is assumed that we have no control over the particular element of $U$ experienced when $x$ is used in practice, but can draw from $U$ when performing an optimisation. The set of evaluations under $U$ can be denoted as

$$f_U(x) = \{ f(x, u) : u \in U \}. \tag{5.2}$$
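For a finite uncertainty set, Eq. (5.2) simply enumerates the evaluations of a design under every scenario. The sketch below assumes a hypothetical bi-objective function in which the scenario $u$ perturbs the realised design, in the spirit of case (i); `f_toy` and the six-element `U` are illustrative choices, not taken from the text.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]

def evaluations_under_U(f: Callable[[float, float], Vector], x: float,
                        U: Sequence[float]) -> List[Vector]:
    """Return f_U(x) = {f(x, u) : u in U} for a finite uncertainty set U.

    A list (not a set) is used so that identical vectors from different scenarios are kept.
    """
    return [f(x, u) for u in U]

def f_toy(x: float, u: float) -> Vector:
    """Hypothetical bi-objective function; u perturbs the realised design x."""
    z = x + u
    return (z ** 2, (z - 2.0) ** 2)

U = [-0.1, -0.05, 0.0, 0.05, 0.1, 0.15]  # |U| = 6, as in Fig. 5.13
for vector in evaluations_under_U(f_toy, 1.0, U):
    print(tuple(round(v, 4) for v in vector))
```

Every robustness measure discussed below needs some or all of these $|U|$ evaluations per design, which is where the cost of large uncertainty sets comes from.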
By extension, we denote the multi-/many-objective version using vector notation as $\mathbf{f}_U(x)$. See Fig. 5.13 for an illustration of the performance of a design on a bi-objective problem with a small uncertainty set. We generally seek a solution whose performance is good under the objective function(s) and tolerant to the uncertainty, i.e., a solution that is robust to the variations observed through $\mathbf{f}(x, u)$ caused by the elements of $U$. How "performs well" is quantified often varies with the situation. It may be characterised by an approximation of the average/expected performance of a design (which often requires a set of evaluations per putative solution) [26], some worst case or lower
Fig. 5.13 Illustration of the performance of a putative solution x in a bi-objective case with |U | = 6
percentile guaranteed performance or similar, for instance minimising the maximum performance over the entire set of scenarios a design may operate in [66]. Alternatively, when the minmax approach may be too restrictive (perhaps not all scenarios, i.e., combinations of a design and members of the uncertainty set, can be evaluated in a feasible time), a light robustness approach can be taken, where only the likely scenarios are considered [34]. In contrast, in distributionally robust optimisation the distribution of the scenario set is unknown. Instead, a set $D$ of possible distributions (scenario sets) containing the true scenario set is utilised to maximise the expectation of the robust performance (the distributionally robust stochastic program); see e.g., [27]. A lightly robust interactive method is proposed in [102], and in [103] a new robustness measure is proposed and added to the problem as an additional objective function. Both support the DM in balancing between the quality of objective function values and robustness in an interactive manner, under uncertainty in the objective functions and decision uncertainty, respectively. When multiple (or indeed many) objectives are to be optimised, i.e., when we are concerned with $\mathbf{f}(x, u)$, the complexity of the robust optimisation task grows greatly. In [48], the authors identify 10 distinct ways to characterise robustness (or efficiency) in a multi-objective optimisation setting, drawn from the literature and from their own development. However, many of these rely on enumerating $U$ for each design. Some measures considered in both the MCDM and EMO literature are:
1. Flimsily robust: a design is flimsily robust if it is not dominated by any other in the feasible set when evaluated with at least one element $u$ of the uncertainty set. All problems are guaranteed to have at least one flimsily robust solution [48].
As the number of objectives increases, the proportion of flimsily robust solutions is also likely to increase, due to the general increase in the likelihood of solutions being mutually non-dominating as the number of objectives grows. As such, flimsy robustness is likely to be a poor discriminator in many-objective settings, as nearly all evaluated designs may attain this designation when there are many objectives. Furthermore, this is compounded by the cardinality of the uncertainty set, as each additional scenario gives a design another "chance" to be identified as flimsily robust.
2. Highly robust: a design $x$ is highly robust if it is not dominated by any other in the feasible set when assessed under any element $u$ of the uncertainty set. It is possible that there is no highly robust solution for particular problems [48]. For the same reason as for the previous measure, the highly robust measure is likely to be poorly discriminative in many-objective settings: as the number of objectives increases, it becomes increasingly likely that there is no highly robust solution, since it becomes increasingly likely to find a design that is mutually non-dominated with any putative solution. Again, this is further compounded by the size of the uncertainty set itself.
3. Minmax set robust: a design $x$ is minmax set robust if no element of the set $\mathbf{f}_U(x)$ lies behind the dominated partition defined by $\mathbf{f}_U(x')$, for all $x' \in X$. These partitions are defined by the worst performing evaluations in $\mathbf{f}_U(x)$, taking an optimistic axis-parallel piece-wise boundary [29]. See the illustration in Fig. 5.14a. In the many-objective setting, many of the objective vectors that a design attains under the uncertainty set are likely to be mutually non-dominating, and therefore to lie on the induced boundary. As any crossing of the boundaries induced by two designs renders them incomparable under this measure, the measure's discriminative power would appear to degrade faster than that of single point-based dominance in the transition to the many-objective scenario; however, this conjecture would need to be validated.
4. Hull-based minmax robust: similar to minmax set robust, but using a convex hull [13]. See the illustration in Fig. 5.14b. Its scaling in many-objective domains is likely to be similar to that of minmax set robust, though, as the convex hull is employed, fewer objective vectors under the uncertainty set will define the boundary.
5. Point-based minmax robust: a design is point-based minmax robust if the vector generated from the maxima of each objective in $\mathbf{f}_U(x)$ is not dominated by the corresponding maxima vector of any other feasible design [66]. See the illustration in Fig. 5.15a. As this collapses the set evaluation of a single design down to a point, and points are compared under dominance, the discriminative power of this measure will degrade in the same fashion as dominance in the deterministic setting in the shift from multi- to many-objective optimisation. However, unlike some other measures in this list, the rate of this degradation is not compounded by the uncertainty set size.
6. Expected performance robust: a design is expected performance robust if the vector generated from the mean of each objective in $\mathbf{f}_U(x)$ is not dominated by the corresponding mean vector of any other feasible design. This has also been cast in terms of effective fitness [26]. See the illustration in Fig. 5.15b. As this too collapses the set evaluation of a single design down to a point compared under dominance, its discriminative capability will tend to scale as in the deterministic setting.
7. Lower set less ordered robust: here the non-dominated set of objective vectors is extracted from each $\mathbf{f}_U(x)$ and compared. Where such a set associated with a design is not dominated by any other, the design is said to be lower set less ordered robust [47]. See the illustration in Fig. 5.16a. The discriminative scaling behaviour of this measure in many-objective cases will be similar to that of minmax set robust.
Fig. 5.14 Illustration of the minmax set robust measure (a) and hull-based minmax robust (b). The design associated with the black circles is better than that of the unfilled circles, as the surface defined by the red line lies entirely in front of the dashed line. Both the filled circle design and the unfilled circle design are incomparable with the concentric circle design
8. % robust performance: a design is a multi-/many-objective % robust optimum if the vector constructed from the sorted worst N% (e.g., 5%) performance on each objective under $\mathbf{f}_U(x)$ is not dominated by the respective worst-N% vectors of any other design. This measure is not as extreme as point-based minmax robust (where just one poor performance mapping under $U$ will rule out a design), but it ensures a generally high confidence (depending on the value of N used) of a minimum baseline performance. This is often cast in terms of a desired confidence level of performance (see e.g., [28] for its recent use in expensive robust multi-objective optimisation). An illustration is provided in Fig. 5.16b. As it transforms the set into a single representative vector for dominance comparison, its discriminative capability will tend to scale as in the deterministic setting as the number of objectives increases.
Additional characterisations can be generated by constraining such robustness measures with a minimal acceptable performance vector on some specified member of $U$ (a nominal member), which represents some standard scenario for the problem [48], for instance the cruising conditions for an aircraft wing. As in our earlier discussions, our primary concern is how measures scale as the number of objectives increases (and whether some become untenable). Similarly to interactive methods (see next subsection), the impact of the transition from multi-objective robust optimisation to its many-objective variants is largely unexplored in the literature thus far. Nevertheless, we have indicated above how we expect these measures to scale in terms of their ability to discriminate between designs as the number of objectives increases. Furthermore, some measures, like point-based minmax robust and expected performance robust, may be optimised directly with the indicators mentioned in Sect. 5.3, such as the hypervolume, as they rely on a single quality vector per design in the set of solutions being summarised. In the case of
Fig. 5.15 Illustration of the minmax point robust measure (a) and expected performance robust (b). Derived vectors for each symbol set are shown as red triangles. Under the minmax point robust measure the filled circle design and concentric ring design are incomparable; however, the unfilled circle design is worse than both. There is a complete order under the expected performance robust measure (concentric circle design < filled circle design < unfilled circle design)
Fig. 5.16 Illustration of the lower set robust measure (a) and 20% robust performance (b). All three designs can be seen to be incomparable under the lower set robust measure, but there is a complete order under the 20% robust performance measure (concentric circle design < filled circle design < unfilled circle design)
flimsily robust, this may also be achieved by optimising over the tuple $(x, u)$ as the surrogate design vector. Other robustness definitions are much less amenable to such direct application of existing indicators. This is largely because, in the set-based robustness measures, the quality of an individual design is described by a set or a partitioning surface. Designing an appropriate indicator for a set of designs then becomes non-trivial, as the indicator effectively must operate on a set of sets. With all such robust approaches the size of $U$ has a significant effect on the time cost, which may actually have a larger impact than the number of objectives, and approximations to the quality of a design may be required. The sensitivity of the various robustness measures, and indeed of indicators built on them, to such approximations is another under-explored area.
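The single-vector measures in the list above (point-based minmax, expected performance and % robust performance) all reduce $\mathbf{f}_U(x)$ to one vector that is then compared under Pareto dominance. The sketch below implements these reductions for a finite $U$. Reading "the sorted worst N%" as the per-objective value at the boundary of the worst N% of evaluations is an assumption, as are the two hypothetical designs `fu_a` and `fu_b`.

```python
import math
import statistics

def dominates(a, b):
    """Pareto dominance under minimisation."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def minmax_point(fu):
    """Point-based minmax robust vector: the maximum of each objective over f_U(x)."""
    return tuple(max(values) for values in zip(*fu))

def expected_point(fu):
    """Expected performance robust vector: the mean of each objective over f_U(x)."""
    return tuple(statistics.fmean(values) for values in zip(*fu))

def worst_percent_point(fu, n_percent):
    """% robust vector (one reading): per objective, the value at the boundary of
    the worst N% of evaluations, i.e. the k-th largest value with k = ceil(N% of |U|)."""
    k = max(1, math.ceil(len(fu) * n_percent / 100))
    return tuple(sorted(values)[-k] for values in zip(*fu))

def compare(fu_a, fu_b, aggregate):
    """Compare two designs by Pareto dominance between their aggregated vectors."""
    a, b = aggregate(fu_a), aggregate(fu_b)
    if dominates(a, b):
        return "a better"
    if dominates(b, a):
        return "b better"
    return "incomparable"

# Hypothetical f_U(x) for two designs under the same three scenarios.
fu_a = [(1.0, 2.0), (1.5, 1.0), (3.0, 0.5)]
fu_b = [(2.0, 2.5), (2.5, 1.5), (2.8, 1.2)]
print(compare(fu_a, fu_b, minmax_point))                            # incomparable
print(compare(fu_a, fu_b, expected_point))                          # a better
print(compare(fu_a, fu_b, lambda fu: worst_percent_point(fu, 50)))  # a better
```

Since each design ends up as one vector, a set of such vectors could in principle be passed to a standard indicator such as the hypervolume, whereas the set-based measures offer no such reduction.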
5.4.3 Quality Measures for Interactive Methods
As mentioned, interactive methods have also shown potential in many practical situations [15, 74, 76]. In them, a DM takes part in the solution process and iteratively directs the generation of solutions with their preferences. In the a priori indicators discussed so far, the only type of preference information was a reference point. However, there are many other types of preference information (see, e.g., [76]). Interactive methods aim to save both computational and cognitive cost by considering a limited number of solutions at a time. An important element of interactive methods is learning: in the interactive solution process, the DM learns about the trade-offs among the objectives and about which preferences are feasible, and can adjust their hopes based on the insight gained. Considering only solutions that are of interest to the DM saves computation cost. Typically, limitations related to the number of objective functions stem from the DM's cognitive capabilities rather than from technical aspects of interactive methods. Unfortunately, there are currently no quality indicators that properly reflect the nature of interactive methods. Desired properties of interactive methods and solution processes must first be characterized before their performance and quality can be assessed. For this purpose, a systematic literature review of the assessments of interactive methods is provided in [2], which also proposes desirable properties for interactive methods. This can be seen as the first step towards developing quality indicators for interactive methods. Naturally, one can consider an interactive solution process as a series of a priori type steps and accumulate indicator values developed for a priori methods [1]. However, this does not reflect all relevant aspects of interactive methods, such as learning.
Some steps have been taken to enable the comparison of interactive reference point based methods in [1, 7, 79], but these do not contain indicators of the above-mentioned type; they mainly generate preference information without human involvement. Moreover, some studies have generated other types of preference information, such as pairwise comparisons or rankings of the solutions in a given subset, by using different types of utility functions [24, 57, 72]. It should be noted that methods where a DM can be replaced by a utility function are called non-ad hoc methods [92]. By this reasoning, reference point based interactive methods are ad hoc methods, and they cannot be compared by replacing the DM with utility functions. The main aim of an interactive method is to support the DM in finding the most preferred solution for a given problem. Therefore, the quality of an interactive solution process is judged by how well the method supports the DM in reaching a satisfactory solution and gaining confidence in it. Such a quality measure should depend on various aspects and consider the problem to be solved and the type of preference information provided by the DM. Hence, we need characterizations of what kind
of solutions are regarded as being of good quality. Thus, developing quality indicators for interactive methods is an important future research direction.
5.4.4 Summary
In this section, we have discussed three under-explored areas for indicators: noisy, robust and interactive many-objective optimisation. This is partly due to the additional complications these settings bring, for instance that a design's quality may often be characterised by a set rather than a vector, but also because these areas are under-explored in the multi-objective optimisation context too. Additionally, there are other areas which we have not discussed here and which also need addressing, such as indicators for dynamic many-objective optimisation (where a problem may change over time) [44] and measures based on the decision space in many-objective optimisation.
5.5 Open Issues and Considerations
Most quality indicators are fairly well understood, can be efficiently computed, and are easy to use when applied to low-dimensional optimisation problems, particularly in the bi-objective case. This is not, in general, the case for many-objective optimisation. Firstly, the distributions of point sets towards which indicators are biased (optimal μ-distributions) are, in general, not well known. On the one hand, their visualization is more difficult, and on the other hand, it is not trivial to formally characterize such sets and the influence of parameters on them. Secondly, computing the indicator may become exponentially more expensive with an increasing number of dimensions (e.g., the hypervolume indicator), or may require a large reference set, possibly of exponential size in the number of objectives (e.g., IGD and IGD+). Moreover, computing the subsets of a given bounded size that maximise the indicator, which is useful in the context of EMO selection, is NP-hard in some cases (e.g., the hypervolume subset selection problem). (Weak) Pareto-compliance is an important property that quality indicators should possess. Although Pareto dominance is frequently seen in many-objective optimisation as not being discriminative enough, compliance with it is still required because it reflects common sense. Even if the percentage of solutions mapping to non-dominated points among all feasible solutions is large, it is still desirable never to prefer a dominated solution over a non-dominated one; as such, Pareto compliance remains a desirable property. Nevertheless, diversity measures (e.g., S-Riesz energy [30, 43]) and (weak) Pareto-compliant indicators based on preference models (e.g., the portfolio-based indicator [40]) become more relevant in many-objective optimisation, particularly when almost all solutions map to non-dominated points.
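The loss of discriminative power of Pareto dominance can be observed with a quick experiment: sample points uniformly at random in $[0, 1]^m$ and measure the share that no other sample dominates. Uniform random points are only an illustrative assumption; real approximation sets are not uniform, but the trend with $m$ is the one referred to above.

```python
import random

def dominates(a, b):
    """Pareto dominance under minimisation."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def nondominated_fraction(n_points, m, seed=0):
    """Share of uniformly random points in [0, 1]^m that no other point dominates."""
    rng = random.Random(seed)
    pts = [tuple(rng.random() for _ in range(m)) for _ in range(n_points)]
    nd = sum(1 for i, p in enumerate(pts)
             if not any(dominates(q, p) for j, q in enumerate(pts) if j != i))
    return nd / n_points

for m in (2, 3, 5, 10, 15):
    print(m, nondominated_fraction(200, m))
```

With 200 points, only a few percent are non-dominated for $m = 2$, whereas for $m = 10$ the large majority are, so dominance alone separates very few of them.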
Regarding the usage of quality indicators in experimental analysis, we emphasise the importance of always explicitly describing the parameter settings used in the experiments, as these may bias the indicator towards very different distributions of points in objective space. The problem of determining the dominance relationship between solutions in noisy multi- and many-objective optimisation remains an open question. Very few works have attempted to formally define stochastic Pareto dominance (see e.g., [39]); moreover, there is no consensus among the different communities about the meaning of this concept with a view to accurate decision-making. Even though the noise perturbing the observations cannot be removed, it can certainly be modelled and learned in order to mitigate its effects. This area in particular requires further research on appropriate indicators, and on the impact of noise on them as the number of objectives increases. This also has implications for approaches such as Bayesian optimisation (Chap. 10), which have emerged as methodologies for noisy settings (see e.g., [3, 85, 108]). These are capable of characterising the intrinsic noise (e.g., by using Gaussian process metamodels), and can incorporate indicators as part of the acquisition function or infill criterion used in their search.
5.6 Conclusions
In this chapter, we have been concerned with how to measure the quality of approximation sets for many-objective optimisation problems. The vast majority of the measures commonly in use for this task are drawn from those originally conceived for multi-objective optimisation. However, as we (and others) highlight, not all scale well to the many-objective case. If a measure is to be used as an indicator within the optimisation process, particular care must be taken that it does not contradict Pareto dominance, and that its calculation does not become the dominant computational cost of the process. In practice, HV tends to be the indicator most heavily relied upon, as IGD+ and many others rely on information that is not known or available a priori (e.g., the Pareto front). The usefulness of a posteriori methods in general in the many-objective setting is also up for debate, given the cost of approximating the entire front well when just one or a few solutions are eventually utilised by a DM. We have also considered quality measures for less well explored areas of many-objective optimisation, such as a priori, interactive, noisy and robust problems. Relatively little work has been done in these areas specifically concerned with many-objective optimisation. However, there is work on defining the quality of individuals which may be extended to set-level measures in the future. Two solution sets that differ in some characteristics (spacing, richness, etc.) may still yield very similar or identical indicator values using current approaches, leading to equivalently performing solutions being discarded. Measures relating to solution representation, which try to factor in the design space, are still relatively under-explored, but will be crucial for the advancement of multi-modal multi-objective optimisation [80].
Author Contribution Statement
Bekir Afsar was primarily involved in writing Sects. 5.3 and 5.4.3. Jonathan E. Fieldsend was primarily involved in writing Sects. 5.1 and 5.4.2, and in chapter organisation. Andreia P. Guerreiro was primarily involved in writing Sects. 5.1 and 5.2. Kaisa Miettinen was primarily involved in writing Sects. 5.3 and 5.4.3. Sebastian Rojas Gonzalez was primarily involved in writing Sects. 5.2 and 5.4.1. Hiroyuki Sato was primarily involved in writing Sect. 5.2. All authors contributed to Sects. 5.5 and 5.6, paper coherence, and drafting.
Acknowledgements
This work was initiated during the MACODA: Many Criteria Optimisation and Decision Analysis Workshop at the Lorentz Center (Leiden, The Netherlands), 2019. We are grateful to the other participants of the workshop and the Lorentz Center for their support. Jonathan E. Fieldsend was supported in attending the MACODA workshop by Innovate UK [grant number 104400]. Bekir Afsar's research was funded by the Academy of Finland [grant numbers 322221 and 311877]. The research is related to the thematic research area Decision Analytics utilizing Causal Models and Multiobjective Optimization (DEMO), jyu.fi/demo, at the University of Jyväskylä. Andreia P. Guerreiro acknowledges the financial support by national funds through the FCT – Foundation for Science and Technology, I.P. [within the scope of the project PTDC/CCI-COM/31198/2017]. Sebastian Rojas Gonzalez was supported by the Fonds Wetenschappelijk Onderzoek – Vlaanderen, grant number 1216021N.
References
1. B. Afsar, K. Miettinen, A.B. Ruiz, An artificial decision maker for comparing reference point based interactive evolutionary multiobjective optimization methods, in Proceedings of the 11th International Conference on Evolutionary Multi-Criterion Optimization (EMO), ed. by H. Ishibuchi, Q. Zhang, R. Cheng, K. Li, H. Li, H. Wang, A. Zhou (Springer, Cham, 2021), pp. 619–631
2. B. Afsar, K. Miettinen, F. Ruiz, Assessing the performance of interactive multiobjective optimization methods: a survey. ACM Comput. Surv. 54(4) (2021)
3. R. Astudillo, P. Frazier, Multi-attribute Bayesian optimization with interactive preference learning, in Artificial Intelligence and Statistics (JMLR.org, 2020), pp. 4496–4507
4. A. Auger, J. Bader, D. Brockhoff, Theoretically investigating optimal μ-distributions for the hypervolume indicator: first results for three objectives, in Parallel Problem Solving from Nature (PPSN) (Springer, 2010), pp. 586–596
5. A. Auger, J. Bader, D. Brockhoff, E. Zitzler, Theory of the hypervolume indicator: optimal μ-distributions and the choice of the reference point, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2009), pp. 87–102
6. J. Bader, E. Zitzler, HypE: an algorithm for fast hypervolume-based many-objective optimization. Evol. Comput. 19(1), 45–76 (2011)
7. C. Barba-González, V. Ojalehto, J. García-Nieto, A.J. Nebro, K. Miettinen, J.F. Aldana-Montes, Artificial decision maker driven by PSO: an approach for testing reference point based interactive methods, in Parallel Problem Solving from Nature (PPSN) (Springer, 2018), pp. 274–285
8. M. Basseur, E. Zitzler, A preliminary study on handling uncertainty in indicator-based multiobjective optimization, in Applications of Evolutionary Computing (Springer, 2006), pp. 727–739
B. Afsar et al.
9. N. Beume, B. Naujoks, M.T.M. Emmerich, SMS-EMOA: multiobjective selection based on dominated hypervolume. Eur. J. Oper. Res. 181(3), 1653–1669 (2007)
10. L.C.T. Bezerra, M. López-Ibáñez, T. Stützle, A large-scale experimental evaluation of high-performing multi- and many-objective evolutionary algorithms. Evol. Comput. 26(4), 621–656 (2018)
11. M. Binois, J. Huang, R.B. Gramacy, M. Ludkovski, Replication or exploration? Sequential design for stochastic simulation experiments. Technometrics 61(1), 7–23 (2019)
12. J. Boesel, B.L. Nelson, S.-H. Kim, Using ranking and selection to “clean up” after simulation optimization. Oper. Res. 51(5), 814–825 (2003)
13. R. Bokrantz, A. Fredriksson, Necessary and sufficient conditions for Pareto efficiency in robust multiobjective optimization. Eur. J. Oper. Res. 262(2), 682–692 (2017)
14. P.A.N. Bosman, D. Thierens, The balance between proximity and diversity in multiobjective evolutionary algorithms. IEEE Trans. Evol. Comput. 7(2), 174–188 (2003)
15. J. Branke, K. Deb, K. Miettinen, R. Słowiński (eds.), Multiobjective Optimization: Interactive and Evolutionary Approaches, Lecture Notes in Computer Science, vol. 5252 (Springer, Heidelberg, 2008)
16. J. Branke, W. Zhang, Identifying efficient solutions via simulation: myopic multi-objective budget allocation for the bi-objective case. OR Spect. 41(3), 831–865 (2019)
17. K. Bringmann, S. Cabello, M.T.M. Emmerich, Maximum volume subset selection for anchored boxes, in Symposium on Computational Geometry (SoCG), ed. by B. Aronov, M.J. Katz (Dagstuhl Zentrum für Informatik, 2017), pp. 22:1–22:15
18. K. Bringmann, T. Friedrich, An efficient algorithm for computing hypervolume contributions. Evol. Comput. 18(3), 383–402 (2010)
19. K. Bringmann, T. Friedrich, P. Klitzke, Two-dimensional subset selection for hypervolume and epsilon-indicator, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2014), pp. 589–596
20. D. Brockhoff, Optimal μ-distributions for the hypervolume indicator for problems with linear bi-objective fronts: exact and exhaustive results, in Simulated Evolution and Learning, ed. by K. Deb et al. (Springer, 2010), pp. 24–34
21. D. Brockhoff, T. Wagner, H. Trautmann, On the properties of the R2 indicator, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2012), pp. 465–472
22. T.M. Chan, Klee’s measure problem made easy, in Symposium on Foundations of Computer Science (FOCS) (IEEE Computer Society, 2013), pp. 410–419
23. C.-H. Chen, L.H. Lee, Stochastic Simulation Optimization, vol. 1 (World Scientific, Singapore, 2010)
24. L. Chen, B. Xin, J. Chen, J. Li, A virtual-decision-maker library considering personalities and dynamically changing preference structures for interactive multiobjective optimization, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2017), pp. 636–641
25. S. Cheng, Y. Shi, Q. Qin, On the performance metrics of multiobjective optimization, in Advances in Swarm Intelligence (Springer, 2012), pp. 504–512
26. K. Deb, H. Gupta, Introducing robustness in multi-objective optimization. Evol. Comput. 14(4), 463–494 (2006)
27. E. Delage, Y. Ye, Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3), 595–612 (2010)
28. J.A. Duro, R.C. Purshouse, S. Salomon, D.C. Oara, V. Kadirkamanathan, P.J. Fleming, SParEGO: a hybrid optimization algorithm for expensive uncertain multi-objective optimization problems, in Evolutionary Multi-criterion Optimization (EMO), ed. by K. Deb, E. Goodman, C.A.C. Coello, K. Klamroth, K. Miettinen, S. Mostaghim, P. Reed (Springer, 2019), pp. 424–438
29. M. Ehrgott, J. Ide, A. Schöbel, Minmax robustness for multi-objective optimization problems. Eur. J. Oper. Res. 239(1), 17–31 (2014)
30. J.G. Falcón-Cardona, C.A.C. Coello, M.T.M. Emmerich, CRI-EMOA: a Pareto-front shape invariant evolutionary multi-objective algorithm, in Evolutionary Multi-criterion Optimization (EMO), ed. by K. Deb, E. Goodman, C.A.C. Coello, K. Klamroth, K. Miettinen, S. Mostaghim, P. Reed (Springer, Cham, 2019), pp. 307–318
31. J.E. Fieldsend, Efficient real-time hypervolume estimation with monotonically reducing error, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2019), pp. 532–540
32. J.E. Fieldsend, R.M. Everson, Multi-objective optimisation in the presence of uncertainty, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2005), pp. 243–250
33. J.E. Fieldsend, R.M. Everson, The rolling tide evolutionary algorithm: a multiobjective optimizer for noisy optimization problems. IEEE Trans. Evol. Comput. 19(1), 103–117 (2015)
34. M. Fischetti, M. Monaci, Light robustness, in Robust and Online Large-Scale Optimization, ed. by R.K. Ahuja, R.H. Möhring, C. Zaroliagis (Springer, 2009), pp. 61–84
35. M. Fleischer, The measure of Pareto optima: applications to multi-objective metaheuristics, in Proceedings of the 2nd International Conference on Evolutionary Multi-Criterion Optimization (EMO), ed. by C.M. Fonseca, P.J. Fleming, E. Zitzler, L. Thiele, K. Deb (Springer, Berlin, Heidelberg, 2003), pp. 519–533
36. A. Forrester, A. Sobester, A. Keane, Engineering Design via Surrogate Modelling: A Practical Guide (Wiley, 2008)
37. R.J. Gomes, A.P. Guerreiro, T. Kuhn, L. Paquete, Implicit enumeration strategies for the hypervolume subset selection problem. Comput. Oper. Res. 100, 244–253 (2018)
38. R.H. Gómez, C.A. Coello Coello, MOMBI: a new metaheuristic for many-objective optimization based on the R2 indicator, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2013), pp. 2488–2495
39. S. Greco, B. Matarazzo, R. Słowiński, Dominance-based rough set approach to decision under uncertainty and time preference. Ann. Oper. Res. 176(1), 41–75 (2010)
40. A.P. Guerreiro, C.M. Fonseca, An analysis of the hypervolume Sharpe-ratio indicator. Eur. J. Oper. Res. 283(2), 614–629 (2020)
41. A.P. Guerreiro, C.M. Fonseca, L. Paquete, The hypervolume indicator: problems and algorithms (2020)
42. M.P. Hansen, A. Jaszkiewicz, Evaluating the quality of approximations to the non-dominated set. Technical Report IMM-REP-1998-7, Institute of Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark (1998)
43. D.P. Hardin, E.B. Saff, Discretizing manifolds via minimum energy points. Not. AMS 51(10), 1186–1194 (2004)
44. M. Helbig, A.P. Engelbrecht, Analysing the performance of dynamic multi-objective optimisation algorithms, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2013), pp. 1531–1539
45. Z. Hou, S. Yang, J. Zou, J. Zheng, G. Yu, G. Ruan, A performance indicator for reference-point-based multiobjective evolutionary optimization, in IEEE Symposium Series on Computational Intelligence (SSCI) (IEEE Press, 2018), pp. 1571–1578
46. S.R. Hunter, E.A. Applegate, V. Arora, B. Chong, K. Cooper, O. Rincon-Guevara, C. Vivas-Valencia, An introduction to multiobjective simulation optimization. ACM Trans. Model. Comput. Simul. 29(1), 7:1–7:36 (2019)
47. J. Ide, E. Köbis, Concepts of efficiency for uncertain multi-objective optimization problems based on set order relations. Math. Methods Oper. Res. 80(1), 99–127 (2014)
48. J. Ide, A. Schöbel, Robustness for uncertain multi-objective optimization: a survey and analysis of different concepts. OR Spect. 38(1), 235–271 (2016)
49. H. Ishibuchi, N. Akedo, Y. Nojima, Behavior of multiobjective evolutionary algorithms on many-objective knapsack problems. IEEE Trans. Evol. Comput. 19(2), 264–283 (2015)
50. H. Ishibuchi, R. Imada, Y. Setoguchi, Y. Nojima, How to specify a reference point in hypervolume calculation for fair performance comparison. Evol. Comput. 26(3), 411–440 (2018)
51. H. Ishibuchi, H. Masuda, Y. Tanigaki, Y. Nojima, Difficulties in specifying reference points to calculate the inverted generational distance for many-objective optimization problems, in IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making (IEEE Press, 2014), pp. 170–177
52. H. Ishibuchi, H. Masuda, Y. Tanigaki, Y. Nojima, Modified distance calculation in generational distance and inverted generational distance, in Evolutionary Multi-criterion Optimization (EMO), Part II, ed. by A. Gaspar-Cunha, C.H. Antunes, C.A.C. Coello (Springer, 2015), pp. 110–125
53. H. Ishibuchi, N. Tsukamoto, Y. Nojima, Evolutionary many-objective optimization: a short review, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2008), pp. 2419–2426
54. A. Jaszkiewicz, R. Susmaga, P. Zielniewicz, Approximate hypervolume calculation with guaranteed or confidence bounds, in Parallel Problem Solving from Nature (PPSN) (Springer, 2020), pp. 215–228
55. S. Jiang, Y.-S. Ong, J. Zhang, L. Feng, Consistencies and contradictions of performance metrics in multiobjective optimization. IEEE Trans. Cybernet. 44(12), 2391–2404 (2014)
56. Y. Jin, J. Branke, Evolutionary optimization in uncertain environments: a survey. IEEE Trans. Evol. Comput. 9(3), 303–317 (2005)
57. M. Kadziński, M.K. Tomczyk, R. Słowiński, Preference-based cone contraction algorithms for interactive evolutionary multiple objective optimization. Swarm Evol. Comput. 52, 100602 (2020)
58. S.-H. Kim, B.L. Nelson, Selecting the best system, in Simulation (Elsevier, 2006), pp. 501–534
59. J.P.C. Kleijnen, Design and Analysis of Simulation Experiments (Springer, 2015)
60. J. Knowles, D. Corne, Bounded Pareto archiving: theory and practice, in Metaheuristics for Multiobjective Optimisation, ed. by X. Gandibleux, M. Sevaux, K. Sörensen, V. T’kindt (Springer, 2004), pp. 39–64
61. J.D. Knowles, D. Corne, On metrics for comparing non-dominated sets, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2002), pp. 711–716
62. J.D. Knowles, D. Corne, Properties of an adaptive archiving algorithm for storing nondominated vectors. IEEE Trans. Evol. Comput. 7(2), 100–116 (2003)
63. J.D. Knowles, D. Corne, M. Fleischer, Bounded archiving using the Lebesgue measure, in Congress on Evolutionary Computation (CEC) (2003), pp. 2490–2497
64. J.D. Knowles, D. Corne, A. Reynolds, Noisy multiobjective optimization on a budget of 250 evaluations, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2009), pp. 36–50
65. T. Kuhn, C.M. Fonseca, L. Paquete, S. Ruzika, M. Duarte, J.R. Figueira, Hypervolume subset selection in two dimensions: formulations and algorithms. Evol. Comput. 24(3), 411–425 (2016)
66. D. Kuroiwa, G.M. Lee, On robust multiobjective optimization. Vietnam J. Math. 40, 305–317 (2012)
67. B. Li, J. Li, K. Tang, X. Yao, Many-objective evolutionary algorithms: a survey. ACM Comput. Surv. 48(1), 1–35 (2015)
68. K. Li, K. Deb, X. Yao, R-metric: evaluating the performance of preference-based evolutionary multiobjective optimization using reference points. IEEE Trans. Evol. Comput. 22(6), 821–835 (2018)
69. M. Li, S. Yang, X. Liu, A performance comparison indicator for Pareto front approximations in many-objective optimization, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2015), pp. 703–710
70. M. Li, X. Yao, Quality evaluation of solution sets in multiobjective optimisation: a survey. ACM Comput. Surv. 52(2), 26 (2019)
71. Y. Liang, X.-Q. Cheng, Z.-N. Li, J.-W. Xiang, Robust multi-objective wing design optimization via CFD approximation model. Eng. Appl. Comput. Fluid Mech. 5(2), 286–300 (2011)
72. M. López-Ibáñez, J.D. Knowles, Machine decision makers as a laboratory for interactive EMO, in Evolutionary Multi-criterion Optimization (EMO), ed. by A. Gaspar-Cunha, C.H. Antunes, C.C. Coello (Springer, Cham, 2015), pp. 295–309
73. X. Ma, Q. Zhang, G. Tian, J. Yang, Z. Zhu, On Tchebycheff decomposition approaches for multiobjective evolutionary optimization. IEEE Trans. Evol. Comput. 22(2), 226–244 (2018)
74. K. Miettinen, Nonlinear Multiobjective Optimization (Kluwer Academic Publishers, 1999)
75. K. Miettinen, Introduction to multiobjective optimization: noninteractive approaches, in Multiobjective Optimization: Interactive and Evolutionary Approaches, ed. by J. Branke, K. Deb, K. Miettinen, R. Słowiński (Springer, 2008), pp. 1–26
76. K. Miettinen, J. Hakanen, D. Podkopaev, Interactive nonlinear multiobjective optimization methods, in Multiple Criteria Decision Analysis: State of the Art Surveys, ed. by S. Greco, M. Ehrgott, J. Figueira, 2nd edn. (Springer, 2016), pp. 931–980
77. K. Miettinen, F. Ruiz, A. Wierzbicki, Introduction to multiobjective optimization: interactive approaches, in Multiobjective Optimization: Interactive and Evolutionary Approaches (Springer, 2008), pp. 27–57
78. A. Mohammadi, M.N. Omidvar, X. Li, A new performance metric for user-preference based multi-objective evolutionary algorithms, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2013), pp. 2825–2832
79. V. Ojalehto, D. Podkopaev, K. Miettinen, Towards automatic testing of reference point based interactive methods, in Parallel Problem Solving from Nature (PPSN) (Springer, 2016), pp. 483–492
80. Y. Peng, H. Ishibuchi, K. Shang, Multi-modal multi-objective optimization: problem analysis and case studies, in IEEE Symposium Series on Computational Intelligence (SSCI) (IEEE Press, 2019), pp. 1865–1872
81. R.C. Purshouse, P.J. Fleming, Evolutionary many-objective optimisation: an exploratory analysis, in Congress on Evolutionary Computation (CEC), vol. 3 (IEEE Press, 2003), pp. 2066–2073
82. R.C. Purshouse, P.J. Fleming, On the evolutionary optimization of many conflicting objectives. IEEE Trans. Evol. Comput. 11(6), 770–784 (2007)
83. N. Riquelme, C. Von Lucken, B. Baran, Performance metrics in multi-objective optimization, in Latin American Computing Conference (IEEE Press, 2015), pp. 286–296
84. T. Rodemann, A comparison of different many-objective optimization algorithms for energy system optimization, in Applications of Evolutionary Computation, ed. by P. Kaufmann, P.A. Castillo (Springer, 2019), pp. 3–18
85. S. Rojas-Gonzalez, J. Branke, I. Van Nieuwenhuyse, Multiobjective ranking and selection with correlation and heteroscedastic noise, in Winter Simulation Conference (IEEE Press, 2019), pp. 3392–3403
86. S. Rojas-Gonzalez, H. Jalali, I. Van Nieuwenhuyse, A multiobjective stochastic simulation optimization algorithm. Eur. J. Oper. Res. 284(1), 212–226 (2020)
87. S. Rojas-Gonzalez, I. Van Nieuwenhuyse, A survey on Kriging-based infill algorithms for multiobjective simulation optimization. Comput. Oper. Res. 116, 104869 (2020)
88. K. Shang, H. Ishibuchi, A new hypervolume-based evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput. 24(5), 839–852 (2020)
89. K. Shang, H. Ishibuchi, L. He, L.M. Pang, A survey on the hypervolume indicator in evolutionary multiobjective optimization. IEEE Trans. Evol. Comput. 25(1), 1–20 (2021)
90. K. Shang, H. Ishibuchi, X. Ni, R2-based hypervolume contribution approximation. IEEE Trans. Evol. Comput. 24(1), 185–192 (2020)
91. P.K. Shukla, N. Doll, H. Schmeck, A theoretical analysis of volume based Pareto front approximations, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2014), pp. 1415–1422
92. R.E. Steuer, Multiple Criteria Optimization: Theory, Computation and Application (Wiley, 1986)
93. R. Tanabe, H. Ishibuchi, An analysis of quality indicators using approximated optimal distributions in a 3-D objective space. IEEE Trans. Evol. Comput. 24(5), 853–867 (2020)
94. S. Teng, L.H. Lee, E.P. Chew, Integration of indifference-zone with multi-objective computing budget allocation. Eur. J. Oper. Res. 203(2), 419–429 (2010)
95. Y. Tian, R. Cheng, X. Zhang, F. Cheng, Y. Jin, An indicator-based multiobjective evolutionary algorithm with reference point adaptation for better versatility. IEEE Trans. Evol. Comput. 22(4), 609–622 (2018)
96. H. Trautmann, J. Mehnen, B. Naujoks, Pareto-dominance in noisy environments, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2009), pp. 3119–3126
97. D.A. Van Veldhuizen, Multiobjective Evolutionary Algorithms: Classifications, Analyses, and New Innovations. Ph.D. thesis, Air University, Air Force Institute of Technology, Ohio, USA (1999)
98. D. Vaz, L. Paquete, C.M. Fonseca, K. Klamroth, M. Stiglmayr, Representation of the nondominated set in biobjective discrete optimization. Comput. Oper. Res. 63, 172–186 (2015)
99. T. Voss, H. Trautmann, C. Igel, New uncertainty handling strategies in multi-objective evolutionary optimization, in Parallel Problem Solving from Nature (PPSN) (Springer, 2010), pp. 260–269
100. A.P. Wierzbicki, On the completeness and constructiveness of parametric characterizations to vector optimization problems. OR Spect. 8, 73–87 (1986)
101. G. Yu, J. Zheng, X. Li, An improved performance metric for multiobjective evolutionary algorithms with user preferences, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2015), pp. 908–915
102. Y. Zhou-Kangas, K. Miettinen, Decision making in multiobjective optimization problems under uncertainty: balancing between robustness and quality. OR Spect. 41(2), 391–413 (2019)
103. Y. Zhou-Kangas, K. Miettinen, K. Sindhya, Solving multiobjective optimization problems with decision uncertainty: an interactive approach. J. Bus. Econ. 89(1), 25–51 (2019)
104. E. Zitzler, J.D. Knowles, L. Thiele, Quality assessment of Pareto set approximations, in Multiobjective Optimization: Interactive and Evolutionary Approaches (Springer, 2008), pp. 373–404
105. E. Zitzler, S. Künzli, Indicator-based selection in multiobjective search, in Parallel Problem Solving from Nature (PPSN) (Springer, 2004), pp. 832–842
106. E. Zitzler, L. Thiele, Multiobjective optimization using evolutionary algorithms: a comparative case study, in Parallel Problem Solving from Nature (PPSN) (Springer, 1998), pp. 292–301
107. E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V. Grunert da Fonseca, Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)
108. M. Zuluaga, A. Krause, M. Püschel, ε-PAL: an active learning approach to the multi-objective optimization problem. J. Mach. Learn. Res. 17(1), 3619–3650 (2016)
Chapter 6
Benchmarking
Vanessa Volz, Dani Irawan, Koen van der Blom, and Boris Naujoks
Abstract The evaluation and analysis of optimisation algorithms through benchmarks is an important aspect of research in evolutionary computation. This is especially true in the context of many-objective optimisation, where the complexity of the problems usually makes theoretical analysis difficult. However, suitable benchmarking problems are lacking in many research areas within the field of evolutionary computation, for example, optimisation under noise or with constraints. Several other open issues in common benchmarking practice exist as well, for instance related to reproducibility and the interpretation of results. In this book chapter, we discuss these issues specifically for multi- and many-objective optimisation (MMO). We first provide an overview of existing MMO benchmarks and find that, besides lacking in number and diversity, they need improvement in terms of ease of use and of the ability to characterise and describe benchmarking functions. In addition, we provide a concise list of common pitfalls to look out for when using benchmarks, along with suggestions for avoiding them. This part of the chapter is intended as a guide to help improve the usability of benchmarking results in the future.
V. Volz (corresponding author)
modl.ai, Copenhagen, Denmark
D. Irawan · B. Naujoks
TH Köln - University of Applied Sciences, Cologne, Germany
K. van der Blom
Sorbonne Université, CNRS, LIP6, Paris, France
Leiden Institute of Advanced Computer Science, Leiden, Netherlands
© Springer Nature Switzerland AG 2023
D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_6
V. Volz et al.
6.1 Introduction
6.1.1 Definition
The term benchmarking is well known in different areas such as business, geolocation, and computer science, but it is associated with different meanings. In the field of evolutionary computation (EC), it refers to running one or more algorithms on a set of optimisation problems, also referred to as a benchmarking suite, and assessing their performance. Benchmarking studies are conducted with various goals, for example to compare the performance of different algorithms or to find good hyperparameters for a specific optimiser. Whatever the goal, the benchmarking procedure should be selected accordingly. A recent attempt to compile an overview of common benchmarking goals and existing best practices can be found in [1]. In this chapter, we mostly assume performance assessment as the benchmarking goal, as it is one of the most common ones.
Often, the goal is to answer questions about specific algorithms, approaches, or problems, or to discover patterns in behaviour with the intention of drawing general conclusions. Generality in this case means that conclusions also hold for untested functions and other versions of the problem, thus allowing some form of extrapolation. In contrast, competitions are aimed more towards answering questions with a narrower scope, for example for a single application or a single configuration of an algorithm, while indicating a clear winner (cf. [47]). This need for generality is why coverage of problem types and robust analysis of results are important topics in the context of benchmarking.
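The definition above (run several algorithms on a suite of problems and assess their performance) can be illustrated with a small harness. The following Python sketch performs several independent, seeded runs of every algorithm on every problem and collects the final objective values. Everything in it is a hypothetical stand-in chosen for brevity. It uses two single-objective test functions (sphere and Rastrigin), pure random search, and a (1+1)-ES with a fixed step size and no step-size adaptation. An MMO study would instead record a quality indicator of each final approximation set.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Problem = Callable[[Sequence[float]], float]
Algorithm = Callable[[Problem, int, int, random.Random], float]


def sphere(x: Sequence[float]) -> float:
    return sum(xi * xi for xi in x)


def rastrigin(x: Sequence[float]) -> float:
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def random_search(f: Problem, dim: int, budget: int, rng: random.Random) -> float:
    """Best value among `budget` uniform samples from [-5, 5]^dim."""
    return min(f([rng.uniform(-5, 5) for _ in range(dim)]) for _ in range(budget))


def one_plus_one_es(f: Problem, dim: int, budget: int, rng: random.Random) -> float:
    """(1+1)-ES with a fixed mutation step size of 0.5 (illustrative choice)."""
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    fx = f(x)
    for _ in range(budget - 1):  # the initial point used one evaluation
        y = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
        fy = f(y)
        if fy <= fx:  # keep the offspring if it is not worse
            x, fx = y, fy
    return fx


def run_benchmark(
    algorithms: Dict[str, Algorithm],
    problems: Dict[str, Problem],
    dim: int = 5,
    budget: int = 500,
    runs: int = 10,
) -> Dict[Tuple[str, str], List[float]]:
    """Final values of each (algorithm, problem) pair over `runs` seeded runs."""
    results: Dict[Tuple[str, str], List[float]] = {}
    for a_name, alg in algorithms.items():
        for p_name, f in problems.items():
            # One explicit seed per run makes the whole study reproducible.
            results[(a_name, p_name)] = [
                alg(f, dim, budget, random.Random(seed)) for seed in range(runs)
            ]
    return results


if __name__ == "__main__":
    algs = {"random search": random_search, "(1+1)-ES": one_plus_one_es}
    probs = {"sphere": sphere, "rastrigin": rastrigin}
    for (a, p), finals in run_benchmark(algs, probs).items():
        print(f"{a:>14} on {p:<9}: median {statistics.median(finals):.3g}")
```

The harness keeps all per-run values rather than only a summary. Later analysis can then report distributions instead of single numbers, which supports the robust analysis of results mentioned above.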
6.1.2 Historical and Current Context
In this chapter, we focus on multi- and many-objective optimisation, in line with the main theme of the book. In doing so, we often refer to several current streams of research on benchmarking in general. Below, we provide some historical context to make our writing more accessible.
Benchmarking has been a popular tool in evolutionary computation from the very beginning. Even at the first conferences, papers presenting new algorithms or algorithmic variants compared the obtained performance with that of alternative methods. Since then, our knowledge of good benchmarking practice has grown continuously, and introductions and tutorials on various areas of benchmarking are well attended at EC conferences every year. In particular, setting up the COCO/BBOB framework was a milestone in benchmarking, making a well-developed set of functions featuring different instances available to
6 Benchmarking
the community. COCO/BBOB also provides measurement and analysis tools that enable an easy comparison of algorithms [16, 17]. Lately, there has been a renewed focus on benchmarking in the field, apparent in the large number of tutorials and workshops held at the main EC conferences. This interest culminated in a general meeting of the associated organisers at GECCO 2019, which was called to foster connections and the exchange of ideas, and to create a common strategy for promoting research on benchmarking in EC. As a result of that meeting, a benchmarking network [41] was set up to continue the discussion, coordinate activities, and help organise common events such as workshops and competitions. The aim of all such activities is to improve benchmarking and, thus, the applicability of the benchmarked techniques. Several complementary initiatives were also under way in the same timeframe, including an IEEE CIS Task Force on Benchmarking [43] and a survey of existing benchmarking research and best practices [1].
Best practice in benchmarking has possibly been discussed for as long as benchmarking itself. Due to the many facets of the topic and the multitude of perspectives, various sub-topics exist around best practices. These include experimental setup, performance measurement, and result analysis, to name a few. Because there is a significant body of research on most of these sub-topics, the purpose of the survey [1] is to provide an overview of existing literature and best practices, as well as to identify issues that remain unsolved. The survey thus covers a wide range of topics and points to specialised literature for more details.
6.1.3 Motivation and Overview
In this chapter, we add value by viewing benchmarking through a specific lens, namely multi- and many-objective optimisation (MMO). This focus also allows us to address some of the most important open issues more concretely, by listing common pitfalls and providing a checklist to avoid them.
One commonly mentioned issue in MMO is the lack of diversity in the benchmarking suites employed for the empirical evaluation of algorithms. We hope to help improve variety by compiling a list of existing benchmarks, presented in Sect. 6.2. The intention is to make fellow researchers aware of other options, and potentially to identify types of problems for which no suitable benchmark exists. The latter is especially relevant in the context of real-world applications, which are an important topic in MMO and are discussed in more detail in Chap. 2. Ideally, it should be possible to identify benchmarking suites that contain problems sufficiently similar to a given problem. Only then can conclusions drawn from the chosen benchmark be trusted to transfer to the problem at hand. We thus dedicate a part of Sect. 6.2 entirely to problems suitable for benchmarking algorithms intended for real-world applications. However, especially in the context of real-world applications, it is not well understood which properties the most common problems have and which benchmarks
cover them. Knowing such properties would also help in understanding a problem better and in identifying suitable benchmarks, or potentially generating them if none are available. As a first step towards this goal, a questionnaire for practitioners was developed to identify the most common characteristics of real-world problems. A more detailed description of the motivation for the questionnaire, as well as of the results obtained at the time of writing, can be found in Chap. 3. We thus hope to highlight the availability (or lack thereof) of different MMO benchmarks and to inspire more development in this area.
However, even if benchmarking suites are available, there are numerous pitfalls to be aware of when running benchmarking studies. These include potential issues when choosing problems and analysis methods, as well as in the application of benchmarks in general. In Sect. 6.3, we thus list the most common and problematic pitfalls encountered in benchmarking in EC. The aim is to provide a concise list of instructions that allows a practitioner to avoid many of the potential problems, as well as to provide reasoning for the best practices listed in [1].
We conclude this chapter with a summary and a list of open issues specifically related to benchmarking in MMO. We find that important next steps for existing benchmarks are to increase their ease of use and to analyse and describe the fitness landscapes of benchmarking problems. First steps in this direction have already been made, but further research is needed, in particular regarding how such analyses are applied. We further find that a majority of pitfalls in benchmarking are tied to an often unconscious misalignment between research goals and the benchmarking approach.
6.2 Existing Benchmarks
Benchmarks are a popular tool for evaluating MMO algorithms. However, identifying benchmarking suites suitable for a given research question requires an overview of the available options. In this section, we thus cover several well-known benchmarks, focusing on those containing multi- and many-objective problems. Judging suitability, however, also requires a characterisation of the different problems. Instead of just listing different benchmarks, we therefore also aim to characterise them and describe what sets them apart from each other. The properties used for this characterisation are explained below.
In this context, it should be noted that in MMO there is generally no single optimal point when the objectives conflict. As different solutions can be better than each other in different objectives, algorithms designed for MMO usually aim to approximate the Pareto front of the problem as closely as possible, both in terms of proximity to the front and coverage of it. Their performance is thus assessed by a quality indicator that measures both of these aspects of the achieved approximation. Readers should refer to Chap. 5 for more details and discussions on quality indicators.
We split our discussion into two parts, based on whether the benchmarks originate from real-world problems or not, i.e. are artificial. This is because these different types of benchmarks usually support different types of goals; artificial benchmarks
are very well suited for gaining a detailed understanding of the different strengths and weaknesses of a given algorithm. This is because the functions contained in them are usually relatively well understood, which allows conclusions to be drawn based on function types. Artificial functions are also usually not prohibitively expensive to compute, thus allowing for benchmarking suites with a large coverage of different problem types. In contrast, real-world inspired benchmarks are usually intended to represent a specific genre of problems as faithfully as possible while remaining practical for a benchmarking study. They are thus better suited for characterising behaviour in a more specific but highly relevant situation.
We do not limit our overview to benchmarks that are coupled to benchmarking software, but also include sets of proposed functions as well as functions used in competitions. The reasoning here is that all of these functions could be integrated into benchmarking frameworks with moderate effort. Best practices for designing such frameworks, including the post-processing of results, are also a major open issue in current research and are out of scope for our purposes.
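The remark at the start of this section, that conflicting objectives generally admit no single optimal point, rests on Pareto dominance. A minimal Python sketch of that notion follows. The helper names `dominates` and `non_dominated` are illustrative, all objectives are assumed to be minimised, and the quadratic-time filter is chosen for clarity rather than efficiency.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a Pareto-dominates b (minimisation): a is no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Vector]) -> List[Vector]:
    """Points not dominated by any other point (O(n^2) pairwise check)."""
    # A point never dominates itself, so no self-exclusion is needed.
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    pts = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    print(non_dominated(pts))  # [(1, 5), (2, 3), (4, 1)]
    # (1, 5) and (4, 1) are incomparable: each is better in one objective.
    print(dominates((1, 5), (4, 1)), dominates((4, 1), (1, 5)))  # False False
```

The three surviving points form the best available approximation of the front. Which of them is preferred can only be decided with additional information, such as the preferences of a decision maker.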
6.2.1 Artificial Benchmarks

Benchmark problems are often designed to test an algorithm’s ability to solve specific types of problems. With a well-defined mathematical formulation, specific characteristics can be included in the benchmarking problems; hence the term “artificial”. Examples of such characteristics are discussed below in more detail. Some artificial problems are also created to have a known Pareto front, sometimes with a specific shape. This allows for performance evaluation that is independent of other algorithms. Often, such problems can be configured with different parameters, which change the problem’s landscape and characteristics to a certain, knowable degree. This allows the creation of tests that are suited to a given research goal. However, this flexibility can also result in tests that specifically play to an algorithm’s strengths or weaknesses, depending on the narrative, and thus bias the results. In order to ensure a balanced evaluation, research communities often define benchmarking suites that combine sets of problems with predefined parameters and cover a wide range of problems with different characteristics [1]. Table 6.1 lists several artificial benchmarking suites along with basic descriptions.

Even in the case of artificial problems, not all properties/characteristics are fully known (cf. Sect. 6.3.1). However, some simple characteristics are commonly used in formulating artificial problems, such as modality and deceptivity. In many-objective settings, these characteristics may present themselves differently due to the interactions between the objectives. As an example, when a multimodal function is used as an objective in a many-objective problem, it can potentially create a “multifrontal” problem [19]. In such problems, algorithms may be attracted to and converge to a sub-optimal front corresponding to a local optimum of the objective functions.
V. Volz et al.

Table 6.1 Some existing benchmarking suites with artificial problems. For more detailed information regarding the benchmarking suites, readers should directly consult the referenced articles.

| Benchmarking suite | #Obj. | Description |
|---|---|---|
| BBOB [16] | 1 or 2 | Spans many classes of problems, including large-scale, noisy, and mixed-integer problems |
| MOrepo [38] | 2 | Contains several bi-objective combinatorial problems |
| MOCOlib [13] | 2 | Contains several bi-objective combinatorial problems |
| ZDT [50] | 2 | A set of 6 test problems with scalable input vector size |
| MOP [45] | 2–3 | A set of 7 test problems. Input vector size is scalable |
| DTLZ [7] | ≥2 | A set of 7 test problems. Both input and output vector size are scalable |
| WFG [19] | ≥2 | Contains 9 test problems, scalable. Separates distance- and position-related variables |
| DMP [36] | ≥2 | Minimise distance to vertices on a polygon. Has more complex constraints than the common box constraint |
| SDP [25] | ≥2 | Scalable and dynamic problems |
| MaOP [28] | ≥2 | Scalable problems with focus on difficult Pareto sets and degeneracy |
| BP [33] | ≥2 | Scalable problems based on B-splines with hexagon-shaped Pareto fronts |
| GPD [34] | ≥2 | The problems are scalable and can be deceptive, multimodal or sensitive to noise |
| ETMOF [30] | 2–50 | A set of 40 test problems used in the CEC2021 competition on Evolutionary Transfer Multi-objective Optimisation |
| CEC2021 path [29] | 2–7 | Path planning problems used in a competition as part of CEC2021 |

To understand what such a local front means, let us call the inverse map of the local front to the search space a local Pareto set. If new points are generated by applying small perturbations to points in the local Pareto set, their objective vectors will be dominated by the local front. Fig. 6.1 shows a practical example: a local front of the bi-objective WFG4 problem with a 2-dimensional search space.

An interesting aspect of multi- and many-objective optimisation is that Pareto fronts have distinct curvature shapes, depending on how the different objectives interact. Curvature shape functions can create linear, convex, concave, degenerate, and/or disconnected fronts. It is also possible to have a mixture of the different curvature functions. Figure 6.2 shows these shapes. In the figure, there are three disconnected sections. The left-most section has a mixed shape, combining convex and concave fronts. The middle section depicts a degenerate front consisting of a single point in a 2-dimensional objective space. The right section depicts a linear front.
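The curvature classes can also be illustrated numerically. The Python sketch below samples three bi-objective fronts; the formulas (a straight line, a ZDT1-style convex front and a DTLZ2-style concave quarter circle) are assumptions chosen for illustration, not the functions behind Fig. 6.2. Under minimisation, a convex front bends towards the ideal point and a concave one bends away from it, which the sum f1 + f2 at the middle of the front reveals.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]

# Illustrative bi-objective fronts (both objectives minimised), each parametrised by t in [0, 1].
# The formulas are assumptions chosen for clarity, not the ones behind Fig. 6.2.
FRONTS: Dict[str, Callable[[float], Point]] = {
    "linear": lambda t: (t, 1.0 - t),                    # f1 + f2 = 1
    "convex": lambda t: (t, 1.0 - math.sqrt(t)),         # ZDT1-style: f2 = 1 - sqrt(f1)
    "concave": lambda t: (math.sin(0.5 * math.pi * t),
                          math.cos(0.5 * math.pi * t)),  # DTLZ2-style: f1^2 + f2^2 = 1
}


def sample_front(name: str, n: int = 11) -> List[Point]:
    """Map n evenly spaced parameter values onto the chosen front."""
    return [FRONTS[name](i / (n - 1)) for i in range(n)]


def dominates(a: Point, b: Point) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


for name in FRONTS:
    pts = sample_front(name)
    # Whatever the curvature, points on a Pareto front never dominate each other.
    assert not any(dominates(p, q) for p in pts for q in pts)
    # Curvature shows up relative to the line f1 + f2 = 1 through the two extreme points.
    f1, f2 = FRONTS[name](0.5)
    print(f"{name:8s} f1 + f2 at t = 0.5: {f1 + f2:.3f}")
```

Running it prints 1.000, 0.793 and 1.414 for the linear, convex and concave fronts, respectively.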
Fig. 6.1 The local front (dashed line) is dominated by the Pareto front (solid line). If small perturbations are applied to the local Pareto set (circles), the obtained objective values are dominated by the local front
Fig. 6.2 Different curvatures of a Pareto front. Each colour represents a different shape of the Pareto front
In addition to the curvature, the overall shape of the Pareto front is also one of the interesting features of multi- and many-objective problems. The shape of the Pareto front has a significant effect on the performance of algorithms based on the decomposition of the objectives [33]. Several benchmark problems use a simplex as the shape of the Pareto front. The first variant is created by having (normalised) extreme solutions at (1, 0, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, 0, 0, ..., 1). In 3-objective problems, this is referred to as a triangle Pareto front. The test functions DTLZ1–DTLZ4 and WFG4–WFG9 use this shape, which is illustrated in Fig. 6.3. Another distinct shape is known as the inverted triangle. In this variant of a simplex-shaped Pareto front, the (normalised) extreme solutions are located at the inverse of the triangle Pareto front, i.e. (0, 1, 1, ..., 1), (1, 0, 1, ..., 1), ..., (1, 1, 1, ..., 0). In [22], the minus versions of DTLZ1–DTLZ4 and WFG4–WFG9 are introduced; these test functions use the inverted triangle shape, which is illustrated in Fig. 6.4.

Fig. 6.3 In triangle Pareto fronts, each vertex serves as the best value for several objectives and the worst value for one objective
Fig. 6.4 In inverted triangle Pareto fronts, each vertex serves as the worst value for several objectives and the best value for one objective
Fig. 6.5 In hexagon Pareto fronts, each vertex can only serve as either the best or the worst value for one objective

In the simplex shapes mentioned above, some solutions optimise multiple objective functions at the same time. This property is unlikely to be found
in real-world problems [32]. To address this issue, hexagon-shaped fronts have been proposed as a different class of problems with a distinct shape [33]. In hexagon-shaped problems, when one objective is at its best attainable value, no other objective is at its best attainable value; the same holds for the worst values. The shape is illustrated in Fig. 6.5. Such Pareto front topologies have been a popular focus of research, but recently there has also been interest in Pareto set topologies, e.g. [28, 31, 36]. Li et al. [28] designed a benchmarking suite focusing on complex Pareto set topologies and degenerate Pareto fronts. Nojima et al. [36] also proposed a benchmarking suite
with difficult topologies by using constraints, i.e. many parts of the search space are considered infeasible.

Another important aspect to consider is the scale of the problems, especially if the scaling behaviour of algorithms is of interest. Some benchmarking suites specifically focus on modelling so-called “large-scale problems”, which are usually problems with a large search space, i.e. many input variables. However, there is no consensus on when to classify a problem as large or small scale. The CEC2013 LSGO test problems, for example, use 1000 continuous input variables. However, this number is still small in comparison to the billion-variable problem addressed in [6]. Note that the billion-variable problem is a discrete optimisation problem, while the other problems are continuous. In order to test the scalability of different algorithms, many benchmarks contain functions at different scales. The DTLZ and WFG benchmarking suites, for example, are scalable in both search and objective space. This way, problems can be configured to be large scale in both search space and objective space, i.e. large-scale many-objective problems. Such problems inherit the difficulties of both large-scale problems and many-objective problems. Algorithms usually have difficulty finding improvements on these types of problems, because the search space is so large and the number of solutions needed to approximate the Pareto front grows exponentially with the number of objectives [40]. In the case of algorithms relying on Pareto dominance, the selection pressure towards the Pareto front is also weaker [23]. Consequently, the convergence rate of known algorithms on these problems is often comparatively slow.

Another way in which problems are sometimes configurable is by assigning specific effects to different dimensions of the search space. Varying a solution along such a dimension will thus result in a predictable effect on the objective values. As an example, consider the problem in Eq. 6.1:

$$
\begin{aligned}
\min_{\mathbf{x}} \; \mathbf{f}(\mathbf{x}) &= \{f_1(\mathbf{x}), f_2(\mathbf{x})\}\\
f_1(\mathbf{x}) &= 2\,\bigl(x_1 + \sin(x_2)\bigr)\\
f_2(\mathbf{x}) &= 4\,\bigl(x_1 + \cos(x_2)\bigr)\\
0 \le x_1 &\le 1, \quad 0 \le x_2 \le 1
\end{aligned}
\tag{6.1}
$$
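Evaluating Eq. 6.1 directly shows the two roles its variables play. The following Python sketch (the function names are hypothetical) checks that varying only x2 yields mutually non-dominated objective vectors, while lowering x1 at a fixed x2 improves both objectives at once.

```python
import math
from typing import Sequence, Tuple


def eq61(x1: float, x2: float) -> Tuple[float, float]:
    """Objective vector (f1, f2) of the example problem in Eq. 6.1; both are minimised."""
    return 2.0 * (x1 + math.sin(x2)), 4.0 * (x1 + math.cos(x2))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

# Vary x2 at a fixed x1: sin(x2) rises while cos(x2) falls on [0, 1],
# so f1 and f2 trade off and no point dominates another.
same_x1 = [eq61(0.5, x2) for x2 in grid]
assert not any(dominates(a, b) for a in same_x1 for b in same_x1)

# Vary x1 at a fixed x2: lowering x1 improves both objectives,
# so x1 = 0 gives the Pareto-optimal point for that x2.
same_x2 = [eq61(x1, 0.3) for x1 in grid]
assert all(dominates(same_x2[0], p) for p in same_x2[1:])
```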
In Eq. 6.1, convergence to the Pareto front is governed by x1, and the Pareto front is reached when x1 is zero (see Fig. 6.6). The other variable, x2, does not affect convergence to the Pareto front; however, changing x2 while keeping x1 constant always creates points that do not dominate each other, as shown in Fig. 6.7. Variables that only affect convergence to the Pareto front, like x1, are referred to as convergence-variables. Variables that do not affect convergence to the Pareto front and instead distribute the solutions over the objective space, like x2, are referred to as distribution-variables.

Fig. 6.6 Effect of changing only convergence-variables, keeping distribution-variables constant, on a concave WFG test problem. Different colours represent different values of convergence-variables for a population of 30 individuals each
Fig. 6.7 Effect of changing only distribution-variables, keeping convergence-variables constant, on a concave WFG test problem. In the search space (not shown), all points have identical convergence-variables but different distribution-variables. In the objective space, all points have the same distance to their corresponding closest point on the Pareto front

With respect to convergence- and distribution-variables, all problems in the DTLZ suite use m − 1 distribution-variables, where m is the number of objectives, so this number cannot be tuned. The WFG suite, on the other hand, requires the number of distribution-variables
as a parameter. While these predictable effects probably do not model real-world problems, they do allow for easier interpretability of the results. In contrast, there are some artificial benchmarks that intend to model some aspects of real-world problems, such as uncertainties and noise. Real-world systems are often affected by uncertainties which can stem from defects, inaccurate measurements, manufacturing limitations, or many other factors. In optimisation under uncertainty, solutions that may seem to be the optimum can produce completely different results
when perturbed. Robust solutions are solutions that give similar performance irrespective of the realisation of the uncertainties in the system [46]. The GPD benchmark proposed in [34] can be configured to simulate noise in the search space in order to test the robustness of algorithms. In the resulting problems, there is a globally optimal front as well as a robust optimal front, so the performance measures used in this context need to be adapted accordingly. Dynamic problems are another example of complexities that can occur in the real world, and they have recently been modelled with artificial benchmarks [25]. A dynamic problem is a problem whose objective functions change over time. As the objective functions change, so do the locations of the optimal points. In multi- and many-objective problems, this translates to a moving Pareto set and Pareto front.
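A dynamic problem can be sketched by letting the location of the Pareto set depend on a time parameter t. The toy problem below is a hypothetical illustration modelled loosely on the FDA1 construction (f1 = x1 and a distance term g centred on G(t) = sin(πt/2)); it is not the SDP suite cited above. It shows that a point which is Pareto-optimal at t = 0 is no longer optimal once the optimum has moved.

```python
import math
from typing import Sequence, Tuple


def moving_optimum(t: float) -> float:
    """Location G(t) of the Pareto set in x[1:] at time t (an FDA1-style assumption)."""
    return math.sin(0.5 * math.pi * t)


def dynamic_problem(x: Sequence[float], t: float) -> Tuple[float, float]:
    """Hypothetical dynamic bi-objective problem; both objectives are minimised.

    x[0] in [0, 1] spreads solutions along the front; x[1:] control convergence.
    Since f2 grows with g, the Pareto set is {x : x[i] = G(t) for all i >= 1},
    so it moves whenever G(t) changes.
    """
    g = 1.0 + sum((xi - moving_optimum(t)) ** 2 for xi in x[1:])
    f1 = x[0]
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2


old_optimum = [0.25, 0.0, 0.0]  # Pareto-optimal at t = 0, where G(0) = 0
new_optimum = [0.25, 1.0, 1.0]  # Pareto-optimal at t = 1, where G(1) = 1

assert dynamic_problem(old_optimum, 0.0) == (0.25, 0.5)
assert dynamic_problem(new_optimum, 1.0) == (0.25, 0.5)
# After the change, the previously optimal solution is dominated by the new optimum.
assert dynamic_problem(old_optimum, 1.0)[1] > dynamic_problem(new_optimum, 1.0)[1]
```

An optimiser for such problems therefore has to detect the change and track the moving front rather than converge once.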
6.2.2 Real-World Benchmarks

Besides artificial benchmarks, there are also benchmarks based on real-world problems. Although there are fewer of them than artificial ones, these real-world benchmarks are very important for getting an impression of how algorithm performance transfers to real-world problems. As for the artificial benchmarking suites, we collected real-world suites and their main properties, again focusing on multi- and many-objective suites. Table 6.2 summarises existing real-world optimisation benchmarking problems and suites. This overview gives insight into which types of real-world problems are available, and where there are still gaps.

Table 6.2 Some existing benchmarking suites with real-world problems. For more detailed information regarding the benchmarking suites, readers should directly consult the referenced articles.

| Benchmarking suite | #Obj. | Description |
|---|---|---|
| CFD [4] | 1–2 | A collection of 3 real-world CFD problems with long objective evaluation times |
| GBEA [48] | 1–2 | Various functions for two game optimisation problems |
| Car structure [27] | 2 | Two variants of a car structure design problem with many constraints |
| EMO 2017 [14] | 2 | A collection of 10 real-world problems |
| JSEC 2019 [42] | 1 or 5 | A single problem with 32 continuous decision variables and 22 constraints |
| RE + CRE [39] | ≥2 | Contains 16 test problems with at most 7 decision variables. Includes 8 constrained variants with up to 11 constraints |
| Radar waveform [20] | 9 | A single problem that can be scaled between 4 and 12 integer decision variables |

Since the number of available suites is quite limited, this also highlights the need for more real-world
benchmarking problems. Note that many MMO problems are presented as specific use cases in the EC literature. Due to this specificity, there is no easy way of integrating them into practical benchmarking suites, and we therefore do not include them in our overview. In the following, we briefly cover some interesting details about each of these benchmarks.

In [4], a suite of computational fluid dynamics (CFD) problems is introduced, consisting of problems that are computationally expensive in comparison to commonly used benchmarking suites. The suite contains three problems in total: two single-objective problems and one bi-objective problem. Besides its obvious value for real-world problems involving CFD, the suite is also of interest because its computational cost makes it a model of expensive problems. As a result, it can serve to evaluate the performance of algorithms focused on using a small budget of function evaluations on a real-world test case. The CFD suite also allows its three problems to be instantiated with different search space dimensionalities, which also influences their difficulty. By providing a black box constraint evaluator, the CFD suite makes it possible to test whether a decision vector is feasible. However, since the constraint evaluator only indicates feasible or not feasible, the use of more sophisticated constraint handling methods is limited. The computation times for some base geometries, as given by the authors, range roughly between half a minute and 15 minutes.

The games benchmark for evolutionary algorithms (GBEA) was first introduced in [48] and contains 4 benchmarking suites (2 single- and 2 bi-objective) that are integrated with the COCO benchmarking framework [16] for ease of use.
The problems contained in the benchmark are all formulated as continuous parameter-tuning problems for two games, with a wide range of objectives based on the state of the art in AI-assisted game design research. For example, one problem is finding values for a deck of cards so that the final outcome of played rounds tends to be close. The problems are modelled after real problems encountered when fine-tuning games in the game industry, and also contain some features that are common in real-world applications in general, such as requiring simulations with associated noise. Despite this, evaluations are relatively quick: a single evaluation takes less than 5 s on average for most functions, and around 35 s on average for some problems. The benchmark is set up to be scalable in search space and allows for convenient post-processing using the functionality implemented in COCO. It also contains variations (instances) of each problem in order to avoid artefacts in the results tied to the initialisation of the optimiser.

In [27], a car structure design problem with two objectives was proposed as a benchmark. This problem considers the simultaneous optimisation of car parts for different car models. The goal is to minimise the total weight of the car models and to maximise the number of parts that use a standard thickness. Either two or three different models are considered at the same time, corresponding to either 148 or 222 discrete design variables. In addition, the former case requires 36 constraints to be satisfied, while the latter has 54 constraints. The constraints are approximated using the response surface method, which keeps evaluation times for this problem low.
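The practical consequence of a binary feasibility oracle, as in the CFD suite, can be illustrated with the constrained-domination rule popularised by NSGA-II: a feasible solution beats an infeasible one, and among infeasible solutions the one with the smaller total violation wins. The Python sketch below assumes constraints written as g_j(x) ≤ 0, and its encoding of an unknown violation magnitude as None is a hypothetical convention. When only feasible or not feasible is reported, two infeasible solutions become incomparable, so the search receives no guidance back towards the feasible region.

```python
from typing import Optional, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def total_violation(g_values: Sequence[float]) -> float:
    """Sum of violations for inequality constraints written as g_j(x) <= 0."""
    return sum(max(0.0, g) for g in g_values)


def constrained_dominates(fa: Sequence[float], cva: Optional[float],
                          fb: Sequence[float], cvb: Optional[float]) -> bool:
    """Constrained-domination in the style of NSGA-II (minimisation).

    cv == 0.0 marks a feasible solution; cv > 0 is a known violation magnitude;
    cv is None means "infeasible, magnitude unknown", as with a binary oracle.
    """
    feas_a, feas_b = cva == 0.0, cvb == 0.0
    if feas_a and feas_b:
        return dominates(fa, fb)  # both feasible: ordinary Pareto dominance
    if feas_a != feas_b:
        return feas_a             # a feasible solution beats an infeasible one
    if cva is None or cvb is None:
        return False              # both infeasible, magnitude unknown: no preference
    return cva < cvb              # both infeasible: the smaller violation wins


near_miss = total_violation([0.1, -1.0])  # 0.1
far_off = total_violation([2.0, 0.5])     # 2.5
print(constrained_dominates((5.0, 5.0), near_miss, (1.0, 1.0), far_off))  # True
print(constrained_dominates((5.0, 5.0), None, (1.0, 1.0), None))          # False
```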
For the 2017 Evolutionary Multi-Criterion Optimization (EMO) conference, a competition with real-world optimisation problems was organised [14]. References to the original problem descriptions are given in the source code. Since, by design, the competition was truly black box, not much information is available besides the source code published after the competition. However, based on personal communication with the organisers, the following is known:
(a) The problems consider exclusively continuous variables; for one problem, the discrete variables were fixed to default values to achieve this.
(b) Constraints (except boundary constraints) were removed by parametrising the feasible region with the unit hypercube. This was done so that participation was not limited to algorithms capable of handling constraints and, as it turned out, the existing linear constraints were not essential.
(c) As a basic measure to prevent cheating, each competition participant saw a different permutation of variables and objectives. Since this was done just for the competition, this is not the case in the published source code.
Although the competition considered previously published problems, the above changes mean that they do not exactly match their original descriptions in all cases. The welded beam [5] and disc brake [37] problems used in this competition were also included in the RE/CRE suites discussed below.

In the 2019 evolutionary computation competition held by the Japanese Society of Evolutionary Computation (JSEC), a wind turbine design optimisation problem was used [42]. Two versions were considered: a single-objective one and one with five objectives. In both cases, all 32 decision variables were used. In addition, with 22 inequality constraints, the problem is quite heavily constrained. For the competition, the number of evaluations was limited to 10 000. With an evaluation time of three seconds, this amounts to about eight hours of computation (10 000 × 3 s = 30 000 s).
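The permutation measure in point (c) can be pictured as a thin wrapper around the original problem. The Python sketch below is hypothetical and does not reproduce the organisers' code: it shuffles decision variables and objectives with a participant-specific seed, while the underlying function is still evaluated in its original order. The `toy` function is a made-up stand-in for a real-world problem.

```python
import random
from typing import Callable, List, Sequence

Objectives = Callable[[Sequence[float]], List[float]]


def permuted_problem(f: Objectives, n_vars: int, n_objs: int, seed: int) -> Objectives:
    """Return a view of f in which decision variables and objectives appear shuffled.

    Each participant would receive a different seed; the optimiser only ever sees
    the permuted order, while f itself is evaluated in its original order.
    """
    rng = random.Random(seed)
    var_perm = list(range(n_vars))
    rng.shuffle(var_perm)
    obj_perm = list(range(n_objs))
    rng.shuffle(obj_perm)

    def wrapped(x_seen: Sequence[float]) -> List[float]:
        x = [0.0] * n_vars
        for seen_idx, orig_idx in enumerate(var_perm):
            x[orig_idx] = x_seen[seen_idx]  # undo the variable permutation
        y = f(x)
        return [y[k] for k in obj_perm]     # present objectives in shuffled order

    return wrapped


def toy(x: Sequence[float]) -> List[float]:
    # hypothetical stand-in for a real-world problem with three objectives
    return [x[0] + x[1], x[1] * x[2], x[0] - x[2]]


participant = permuted_problem(toy, n_vars=3, n_objs=3, seed=42)
print(participant([1.0, 2.0, 3.0]))
```

Because the same seed always reproduces the same permutation, the organiser can map submitted solutions back to the original variable order for evaluation.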
The RE benchmarking suite described in [39] introduces an easy-to-use collection of real-world multi-objective optimisation problems. With the aim of being practical and easy to use, the RE suite only considers problems for which human-understandable mathematical formulations exist. Out of a total of 16 problems, five are bi-objective, seven have three objectives, and the other four are many-objective problems with up to nine objectives. Although most of these problems have exclusively continuous variables, there is also one with just integer variables, and there are four mixed-integer problems. Most notable with regard to variables is perhaps that these are all relatively low-dimensional problems with at most seven decision variables. Although Pareto front properties are not known for the majority of problems, convexity and connectedness properties are given for six of them. Something to consider when using this benchmarking suite is that four problems are partially based on surrogate responses, and thus do not perfectly match their corresponding real-world problems. Further, most of the included problems are reformulated versions in which the sum of constraint violation values was added as an objective.

In addition to the RE benchmarking suite, the same work also introduced the CRE suite with eight constrained problems [39]. These problems are the original versions, before the reformulation included in the RE benchmarking suite. Out of the eight problems, five are bi-objective, two have three objectives, and a single five-objective problem is included. The number of constraints ranges from one to eleven. Like for
the RE benchmarking suite, these are all relatively low-dimensional problems, and primarily continuous, with just one integer and one mixed-integer problem.

The radar waveform problem [20] is an unmodified real-world problem with relatively cheap function evaluations, allowing for many evaluations per second. In addition to the nine objectives, there are nine corresponding inequality constraints in the objective space. The number of decision variables can range from four to twelve, and all of them are integers. Some problem properties are also known. For instance, if the number of decision variables is smaller than the number of objectives (nine), the lower-dimensional search space is projected onto a manifold in a higher-dimensional space, and thus not all possible objective vectors can be attained. Conversely, when there are more than nine decision variables, multiple points in the search space map to the same point in the objective space. Some relations between objectives also exist; e.g. the first eight objectives form four pairs, each consisting of the mean and the minimum of the same measure.

Besides the above benchmarking suites and problems, the black box optimisation competition (BBComp) [15] is also worth mentioning. Although there is no claim that any of its problems come from the real world, by making the problems truly black box (not revealing them to the user), it modelled¹ a property that is relevant for some real-world situations. Further, it also reflects real-world scenarios by restricting participants to a single attempt with a fixed evaluation budget. In other words, the final result is what is achieved within a predefined computational budget without prior access to the specific setup.
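The fixed, single-attempt budget enforced by BBComp can likewise be mimicked with a wrapper that counts evaluations and refuses further calls once the budget is spent. The class below is a hypothetical Python sketch, not BBComp's actual interface; the recorded history is all an optimiser can report as its final result.

```python
import random
from typing import Callable, List, Sequence, Tuple


class BudgetExhausted(Exception):
    """Raised when an optimiser asks for more evaluations than the budget allows."""


class BudgetedProblem:
    """Hypothetical wrapper enforcing a fixed evaluation budget on a black-box problem.

    Only objective values are returned; everything the optimiser learns
    about the problem has to come through these calls.
    """

    def __init__(self, f: Callable[[Sequence[float]], List[float]], budget: int) -> None:
        self._f = f
        self.budget = budget
        self.history: List[Tuple[List[float], List[float]]] = []

    @property
    def used(self) -> int:
        return len(self.history)

    def __call__(self, x: Sequence[float]) -> List[float]:
        if self.used >= self.budget:
            raise BudgetExhausted(f"budget of {self.budget} evaluations used up")
        y = list(self._f(x))
        self.history.append((list(x), y))
        return y


# Usage: plain random search until the budget runs out (a single attempt).
problem = BudgetedProblem(lambda x: [x[0], 1.0 - x[0] ** 2], budget=100)
rng = random.Random(1)
try:
    while True:
        problem([rng.random()])
except BudgetExhausted:
    pass
print(problem.used)  # 100
```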
¹ Since no future competition is planned, the problems used have since been made publicly available.

6.2.3 Shortcomings in Existing Benchmarks

Considering the aforementioned benchmarking suites, a few conclusions can be drawn. Benchmarking suites are designed to cover a wide range of problems with different characteristics. Some attributes are easily recognisable; however, other attributes may be unknown. When testing algorithms, it is important to understand which classes of problems the algorithms are capable of solving, and unknown attributes may lead to wrong conclusions (cf. Sect. 6.3.1).

From the list of available benchmarks, it should be pointed out that almost all benchmarking suites are just collections of problems, without integrated quality measures or performance analysis functionality. This increases the risk that these problems are used outside of their original intentions, for example with performance measures that are inappropriate for the intended use case (cf. misinterpretation of results covered in Sect. 6.3.2.3). At the same time, keeping evaluation and analysis functionality separate from the problems has the advantage of allowing researchers to introduce new perspectives.

In terms of currently available benchmarking suites, the suites for artificial problems are predominantly focused on bi-objective problems. Further, the many-objective problems often consider (inverted) triangular Pareto fronts. For both multi- and many-objective problems, constraints (beyond box constraints) are rarely included in benchmark suites. Other common issues are the use of distance and position variables, Pareto sets that are aligned with the (box-)constraints, the lack of irregularities, and the predominance of separable problems.

With regard to real-world applications, although many different real-world applications are researched in the optimisation community, benchmarking suites are only available for a few of them. For example, engineering design (e.g. airfoil optimisation) and trajectory optimisation have active communities, but no common benchmarks seem to exist. The available real-world benchmark suites also primarily focus on problems with a convenient mathematical definition, while benchmarking problems based on simulations are rare. It should be noted here that it is not clear whether the properties of such benchmark problems differ, and thus whether this is really a problem. However, the risk exists that an important class of problems is neglected. Chapter 3 aims to provide the first evidence-based study on properties of a varied set of real-world problems, and discusses key shortcomings of existing benchmarking suites based on the discovered properties. In addition, each of the real-world suites covered here has its own interface, making it difficult to compare an algorithm on multiple suites. As a community, we would benefit from a common interface.

To improve the variety of available artificial problems, it would be interesting to develop problems that can be conveniently adjusted to exhibit different (combinations of) challenging properties, including a degree of “realness”. Beyond the difficulty of designing problems that can be adjusted for many different properties, there is unfortunately no indicator to measure realness.
In addition to many properties being unknown for most real-world problems, Chap. 3 shows that the properties exhibited by different real-world problems are also very diverse, which makes developing such an indicator challenging. Despite this, a configurable artificial problem would still be interesting for testing algorithm performance on many properties, regardless of whether those properties are associated with a (known) real-world problem. Another possible direction for developing new benchmark problems is to apply transformations to real-world problems, so that properties like the Pareto front shape and deceptivity can be altered. However, without knowing the real-world characteristics of such a problem, this may lead to testing for unknown properties in addition to the known ones. This may not always be desirable, but could be employed in situations where we want to test how robust a method is to the presence of properties other than those it is being tested for.
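One transformation that targets the predominance of separable problems noted above is to rotate the search space, so that every internal variable becomes a linear combination of all input variables, similar in spirit to the rotated functions in BBOB. The Python sketch below is a hypothetical, library-free illustration: it builds a random orthogonal matrix by Gram–Schmidt orthonormalisation of Gaussian vectors and composes it with an arbitrary objective function. It changes variable interactions, not the Pareto front shape, and it ignores box constraints: rotated points may leave the original bounds, which a real transformation would have to handle.

```python
import math
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Objectives = Callable[[Sequence[float]], List[float]]


def random_rotation(n: int, seed: int) -> Matrix:
    """Random n x n orthogonal matrix (orthonormal rows) via Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    rows: Matrix = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for r in rows:  # remove the components along the rows found so far
            proj = sum(vi * ri for vi, ri in zip(v, r))
            v = [vi - proj * ri for vi, ri in zip(v, r)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:  # skip (numerically) dependent draws
            rows.append([vi / norm for vi in v])
    return rows


def rotate_search_space(f: Objectives, rotation: Matrix, center: Sequence[float]) -> Objectives:
    """Return g(x) = f(center + R (x - center)); R mixes the variables f sees."""
    def g(x: Sequence[float]) -> List[float]:
        d = [xi - ci for xi, ci in zip(x, center)]
        z = [ci + sum(rij * dj for rij, dj in zip(row, d)) for row, ci in zip(rotation, center)]
        return f(z)
    return g


# Usage: a separable bi-objective function becomes non-separable after rotation.
def separable(z: Sequence[float]) -> List[float]:
    return [z[0] ** 2 + 10 * z[1] ** 2, (z[0] - 1) ** 2 + 10 * z[1] ** 2]


rotated = rotate_search_space(separable, random_rotation(2, seed=3), center=[0.0, 0.0])
print(rotated([0.5, 0.0]))
```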
6.3 (Avoiding) Pitfalls

Benchmarking can be a useful tool when done right, but it can also have detrimental effects in some situations. Drawing incorrect but seemingly reliable conclusions can misdirect future research efforts based on them. In industry, misleading results can have even more severe implications, such as incurring higher costs when implementing suboptimal solutions. It is therefore in every researcher’s and practitioner’s best interest to adhere to scientific best practices when it comes to benchmarking. In (single-objective) evolutionary optimisation, a set of best practices has been established through the common use of benchmarking frameworks such as COCO [16]. Still, further guidelines for benchmarking in evolutionary computation are being actively discussed at this point in time [1]. To contribute further to this cause, and complementary to these guidelines, we present a list of common pitfalls that impede benchmarking in evolutionary computation. For each of these potential issues, we also discuss how to avoid it. We have structured the list of pitfalls into three subsections based on the area of experimental design they relate to: Problem Choice (PC), Analysis and Performance Evaluation (AP), and Benchmark Usage (BU). A visual overview can be found in Fig. 6.8.

Fig. 6.8 Overview of common pitfalls in benchmarking of evolutionary algorithms (Benchmarking Pitfalls)
- Problem Choice (PC)
  - PC-1: Misrepresentation of target problems (1. Misrepresentation of complexity; 2. Misrepresentation of other problem properties; 3. Properties unknown (target))
  - PC-2: Undisclosed bias in problem selection (1. Lack of coverage; 2. Baked-in assumptions; 3. Properties unknown (benchmark))
- Analysis and Performance Evaluation (AP)
  - AP-1: Generality in experimental setup (1. Undisclosed assumptions; 2. Lack of hyperparameter tuning)
  - AP-2: Incorrect application of statistical tests (1. Inappropriate choice of approach; 2. Lack of relevance; 3. Multiple testing)
  - AP-3: Misinterpretation of results (1. Biases in quality indicators; 2. Overrepresentation of intended narrative; 3. Confounding effects)
- Benchmark Usage (BU)
  - BU-1: Individual misuse of benchmarks (1. Unsupported generalisation; 2. Reporting an incomplete picture; 3. Missing context)
  - BU-2: Cultural misuse of benchmarks (1. Unquestioned inheritance of benchmarking setups; 2. Benchmark-driven research)

6.3.1 Problem Choice (PC)

The choice of problems in a benchmark effectively determines the type of landscapes an optimisation algorithm is tested on. As a result, it constrains the type of conclusions that can be drawn from the obtained results. The choice of problems or benchmarking suites is thus of great importance, and issues arise if there is a misalignment between the chosen problems and the benchmarking goals. Naturally, the set of benchmarking functions should reflect the type of problems that are of interest for the analysis; if the goals of the analysis do not align with the problems tested on, the desired interpretations are not possible. For example, if the goal is to test the general performance of an algorithm, a diverse set of problems should be chosen. It is thus of utmost importance that the goal of the benchmarking study is determined first, and that appropriate problems are identified accordingly (cf., for example, Sect. 2 in [1]). Several pitfalls can introduce such a misalignment; we discuss them in the following.

6.3.1.1 Misrepresentation of Target Problems (PC-1)

In some cases, algorithms and related empirical studies focus on specific target applications or problems. However, there are several reasons why it could be necessary to use stand-in functions for such target problems in a benchmarking setup. Especially
in the case of real-world applications or large-scale problems, using the original functions might not be practical due to limits on the computational budget or the resulting high level of complexity, which does not lend itself to theoretical analysis. It is therefore common practice to represent the problems using simplified versions. Problems arise if this representation is not faithful to the targeted problems.

(PC-1.1) Misrepresentation of Complexity
Ideally, benchmarking problems and the corresponding target problems have similar complexity. If the problems in a benchmarking suite are too easy or too hard, the performance of different algorithms will be indistinguishable. A good way to identify mismatched complexity is to compute baseline performances for naive algorithms (such as random search) and compare them to the state of the art. A common way of scaling down problem complexity is to reduce the number of interactions between different components of a problem. (Partial) Correlations between variables in search or objective space are a commonly
V. Volz et al.
observed example of interactions (see Chap. 9) that can be especially complex if the relationship involves more than two variables or is not constant over the whole problem. However, to save computational budget and improve the interpretability of the results, users of benchmarks often focus on scaled-down versions of their target problems. It is important to consider whether this modification significantly reduces variable interactions, in which case the scaled-down problems would no longer accurately represent the complexity of the problems of interest. To identify whether the downscaling unduly simplifies variable interactions in a problem, the scaling behaviour of well-understood algorithms can be observed and analysed for unexpected behaviour. Alternatively, the correlations between different sets of variables can be measured using different machine learning techniques, ranging from simple correlation coefficients to more complex models mostly used in dimensionality reduction. However, the more is known in advance about the type of interactions, the more reliable the answers obtained this way usually are.
(PC-1.2) Misrepresentation of Other Problem Properties
In addition to missing interactions, it is often not possible to find stand-in problems with exactly the same characteristics and properties (cf. Sect. 6.2) as the targeted ones. This is aggravated by the fact that, for practical reasons, these stand-in functions need to be relatively computationally cheap. Mixed-integer benchmarks, for example, are very scarce, especially if other properties such as noise need to be modelled as well [44]. Often, however, the differences in problem properties are more subtle: stand-in and target problems might differ in terms of deceptiveness, the shape of the Pareto front, or the number of local optima.
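As a minimal sketch of the simplest of these measurement approaches, the code below draws uniformly random points and estimates the Pearson correlation between the two objectives of a hypothetical toy problem (two sphere functions with different centres). The toy problem, the sampling box [-5, 5]^n and all function names are illustrative assumptions, not part of any cited benchmark. In practice, the same estimate could be computed for the scaled-down benchmark and for a sample of the target problem, where a large discrepancy would hint that downscaling changed the interaction structure. Note that a correlation coefficient only captures linear, pairwise relationships, which is one reason why prior knowledge about the type of interactions matters.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def toy_biobjective(x):
    """Hypothetical toy problem: two sphere objectives centred at 0 and at (1, ..., 1)."""
    f1 = sum(xi ** 2 for xi in x)
    f2 = sum((xi - 1.0) ** 2 for xi in x)
    return f1, f2


def objective_correlation(problem, dim, n_samples=1000, seed=1):
    """Estimate the correlation between the two objectives from uniform samples in [-5, 5]^dim."""
    rng = random.Random(seed)
    f1_values, f2_values = [], []
    for _ in range(n_samples):
        x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        f1, f2 = problem(x)
        f1_values.append(f1)
        f2_values.append(f2)
    return pearson(f1_values, f2_values)


print(round(objective_correlation(toy_biobjective, dim=10), 3))
```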
In cases where it is not possible to identify or generate stand-in problems with the appropriate properties, and it is impractical to use the targeted ones directly, we recommend aiming for a balanced set of benchmarking problems instead. This means that the selected problems should still be aligned with the targeted ones, but contain enough diversity to allow robust conclusions to be drawn.
(PC-1.3) Properties Unknown (Target)
In some cases, the properties of the problems in the area of interest are unknown. This is especially an issue when optimising real-world problems. Benchmarking suites are often based on artificial test functions for practical reasons, which also means that their similarity to real-world problems is unclear. Take, for instance, the BBOB [18] problems or, for the multi-objective case, the DTLZ [7] problems. These suites make it possible to compare algorithms and to learn how they behave for a selection of function properties. The properties of fitness landscapes constructed from real-world problems, however, are often unknown. In these cases, it is thus not clear how the obtained benchmark results translate to the application in question; cf. also Chap. 3. This translation is important to be able to draw meaningful conclusions about the performance potential
6 Benchmarking
of the benchmarked algorithms on real-world problems. There is initial evidence that this is an actual issue for some popular benchmarks such as DTLZ, where the performance of algorithms on the benchmark differs from their performance on real-world problems [21]. To address this, the authors of Chap. 3 developed a questionnaire to identify higher-level properties of real-world problems. While their early results have led to some insights, more data and more detailed analysis are needed to identify statistically significant patterns.
6.3.1.2 Undisclosed Bias in Problem Selection (PC-2)
Most benchmarking studies aim at a general assessment of the performance of different algorithms across a variety of problems. How broad the intended generalisations are usually depends on the application in question. To allow for generalisations, however, the problems selected for benchmarking need to be a good representation of the targeted problem space and must therefore be an unbiased selection. If this is not possible, the bias must be disclosed and the scope of any conclusions drawn must be restricted accordingly. For example, if the problems encountered in a setting are expected to mostly have four objectives, the benchmark used should not exclusively contain bi-objective problems. While this may be obvious, bias in other problem properties, such as the shape of the Pareto front or specific correlations between variables, is more difficult to recognise, but just as important. Furthermore, it is common to draw conclusions about problem types based on only a few instances of each problem. This is another level of generalisation, which, in turn, requires ensuring that a diverse set of instances of each problem is tested. There are several common pitfalls that can result in an undisclosed and often unconscious bias in the selection of problems, which we describe below.
(PC-2.1) Lack of Coverage
If some area of the problem space, i.e. the space of potential problems, is not covered, no data can be gathered about problems in that area. While it is of course not possible to test all conceivable problems, it is important to ensure that enough similar problems have been tested to allow conclusions to be drawn for a given unseen problem. This is why measuring coverage requires some notion of similarity between problems and their properties. Otherwise, a benchmarking suite could consist of a large number of problems but still only cover a small area of the space of target problems.
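One way to make coverage measurable, assuming each problem has already been summarised by a numeric feature vector, is to count the fraction of target problems that lie within a chosen distance of at least one benchmark problem. The features (number of objectives and log10 of the search-space dimension), the Euclidean distance and the radius below are hypothetical choices for illustration; the text only requires some notion of similarity. In a real study the features would need to be scaled comparably, since Euclidean distance mixes their units.

```python
import math


def coverage(benchmark_features, target_features, radius):
    """Fraction of target problems within `radius` (Euclidean) of at least one benchmark problem."""
    covered = 0
    for t in target_features:
        if any(math.dist(t, b) <= radius for b in benchmark_features):
            covered += 1
    return covered / len(target_features)


# Hypothetical feature vectors: (number of objectives, log10 of search-space dimension)
benchmark = [(2, 1.0), (2, 1.5), (2, 2.0), (3, 1.0)]
targets = [(2, 1.2), (4, 1.0), (4, 2.0), (5, 1.5)]

print(coverage(benchmark, targets, radius=0.5))  # 0.25
```

Although the benchmark contains several problems, only one of the four hypothetical targets is covered, because the targets mostly have four or more objectives while the benchmark is dominated by bi-objective problems.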
To assess coverage, it is therefore also necessary to give a clear definition of the target problem space. Coverage is important because algorithm performance can vary significantly depending on the choice of problems. By consciously selecting specific problems, it is thus possible to demonstrate good performance for a specific algorithm. However, because bias in the selection is not always
immediately obvious, the interpretation of these results can be misleading if the bias is not disclosed.
(PC-2.2) Baked-in Assumptions
In some cases, no suitable set of problems might be available, so that appropriate test problems have to be collected or generated. In this case, it is of utmost importance to avoid generating problems that are tailored to the research hypothesis. One way of avoiding this issue is to use real-world problems as an inspiration and/or source for the problem suite.
(PC-2.3) Properties Unknown (Benchmark)
In many cases, it is not clear which properties a given benchmarking problem possesses (cf. Sect. 6.2). This is especially an issue if some of these unknown properties are shared by all problems in a benchmark. In this case, even results from a seemingly diverse set of problems are not generalisable. For example, in simulation-based benchmarks with unknown optima, it could be the case that all functions have convex Pareto fronts. To combat this issue, the properties of benchmarks should be analysed thoroughly using diverse approaches. In addition, any unknowns about the targeted problems should be disclosed and the resulting risks for the generality of the results discussed. This pitfall is also a further reason to detail how the different problems were chosen: doing so indicates the range of targeted problems more clearly and makes it easier to identify applications that differ significantly. Several of the pitfalls mentioned above thus call for better analysis methods for black-box functions, so that the challenges posed by a benchmark can be characterised and the obtained results interpreted accordingly. Exploratory Landscape Analysis (ELA), for example, is a first step in that direction, but due to the data-driven nature of the approach, it does not readily yield an interpretable understanding or characterisation of a given problem [26, 35].
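As a small example of checking a benchmark property that often remains unknown, the sketch below classifies a sampled two-objective (minimisation) front as convex, concave, linear or mixed from the signs of the cross products of consecutive segments. It assumes the points are mutually non-dominated and sampled densely enough; it is a crude heuristic for illustration, not an ELA feature. The test fronts mimic the well-known ZDT1 (convex) and ZDT2 (concave) front shapes.

```python
def front_shape(points, tol=1e-12):
    """Classify a sampled 2-D minimisation front as 'convex', 'concave', 'linear' or 'mixed'.

    Points are sorted by the first objective. Along a convex front the path turns
    counter-clockwise (positive cross product) at every interior point, along a
    concave front clockwise (negative cross product).
    """
    pts = sorted(points)
    signs = set()
    for (x0, y0), (x1, y1), (x2, y2) in zip(pts, pts[1:], pts[2:]):
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross > tol:
            signs.add(1)
        elif cross < -tol:
            signs.add(-1)
    if not signs:
        return "linear"
    if signs == {1}:
        return "convex"
    if signs == {-1}:
        return "concave"
    return "mixed"


grid = [i / 20 for i in range(21)]
print(front_shape([(f1, 1 - f1 ** 0.5) for f1 in grid]))  # convex (ZDT1-like)
print(front_shape([(f1, 1 - f1 ** 2) for f1 in grid]))    # concave (ZDT2-like)
print(front_shape([(f1, 1 - f1) for f1 in grid]))         # linear
```

For simulation-based benchmarks with unknown optima, the same check would have to be applied to the best available approximation of each front, so its verdict is only as reliable as that approximation.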
Another useful approach is visualisation, as visual perception helps a human observer to identify patterns. Visualisation for many-objective problems is discussed in Chap. 7. Further work in this area is clearly required.
6.3.2 Analysis and Performance Evaluation (AP)
The manner in which results are collected, aggregated, and interpreted naturally has a large impact on which conclusions can be drawn. Each of these steps therefore needs to be aligned with the benchmarking goals in order to draw meaningful conclusions in the target context. In addition, as evolutionary algorithms are non-deterministic, statistical analysis is required to interpret the results appropriately. This type of analysis comes with its own set of potential pitfalls, especially for inexperienced users.
Some important examples of pitfalls for each of the steps are described in the following.
6.3.2.1 Generality in Experimental Setup (AP-1)
The experimental setup should reflect the target context and not add any additional constraints. Striving for generality in experimental setups is not only practical because it makes them reusable, but also reduces the chance of unconsciously adding assumptions to the execution of the experiments. This, in turn, facilitates their interpretation.
(AP-1.1) Undisclosed Assumptions
An important aspect of interpreting benchmarking results is to acknowledge the assumptions made in the experimental setup itself. For example, to draw conclusions from a performance comparison between two algorithms on a given problem, it is not sufficient to consider the solutions found after an arbitrary number of function evaluations. Instead, the comparison should be independent of such design decisions unless they are motivated by the application or research hypothesis. In general, benchmarks should therefore ideally show the performance of algorithms for different budgets of function evaluations. The COCO benchmarking framework, for instance, implements the concept of anytime performance for this purpose [16]. However, if the target application of the study imposes a specific limit on the number of allowed function evaluations, it is of course not necessary to analyse setups beyond that limit. Other important examples of design decisions that are often made without sufficient motivation in benchmarking are algorithm hyperparameters and the dimension of the search space.
(AP-1.2) Lack of Hyperparameter Tuning
For most algorithms, hyperparameters affect their behaviour significantly, although the sensitivity depends on the algorithm [11, 12]. In practice, however, it is difficult to determine hyperparameters that lead to a fair comparison for every algorithm. This is due to several interacting factors:
• Runtime adaptation is an integral part of some algorithms, which therefore perform robustly with their default parameters.
This robust performance of course still requires a suitably unbiased initialisation instead of defaults. However, not all algorithms are designed to adapt dynamically, and some can only be used as intended after some tuning.
• The difficulty of finding optimal parameter settings can vary significantly between algorithms. Therefore, if the research hypothesis does not involve tuning, the algorithms should be compared based on their respective optimal configurations, which might be impossible or impractical to identify. Allowing the same budget to identify the
best parameters does not alleviate the problem either, because tuning was not considered part of the algorithm.
• Thorough tuning can require a considerable number of additional experiments, which can be impractical.
• Authors usually have varying familiarity with the different algorithms they intend to compare. A better understanding of some algorithms can, however, lead to an unconscious bias that gives those algorithms an unfair advantage through more appropriate tuning. Similar problems occur with regard to finding appropriate default parameters in the literature and corresponding code.
Based on these factors, there seems to be no single best way to approach hyperparameter tuning in the context of benchmarking. However, the purpose and context of the study can give some indications to support decision-making. For example, the amount and manner of tuning should reflect the practical restrictions of the intended scenario. Unconscious biases in tuning efforts are difficult to address, but can be alleviated by using the parameters from recent publications by the algorithms' authors, as long as the problem choice and experimental setup are comparable. Otherwise, automatic optimisation of hyperparameters should be considered as a more objective option.
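A minimal sketch of automatic hyperparameter tuning under an explicit budget, following the idea of tuning on different but similar problems: the fixed step size of a simple (1+1)-ES is chosen by random search on training instances (shifted spheres) and then evaluated on held-out instances. The algorithm, the instance generator, the candidate range and all budgets are illustrative assumptions; real studies would typically rely on a dedicated algorithm configurator of the kind discussed in [11].

```python
import random


def shifted_sphere(shift):
    """Hypothetical instance generator: sphere function with its optimum at `shift`."""
    return lambda x: sum((xi - si) ** 2 for xi, si in zip(x, shift))


def one_plus_one_es(f, dim, sigma, evals, rng):
    """(1+1)-ES with a fixed step size `sigma` (the hyperparameter being tuned).

    Uses `evals` function evaluations in total and returns the best value found.
    """
    x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    fx = f(x)
    for _ in range(evals - 1):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy <= fx:  # elitist acceptance
            x, fx = y, fy
    return fx


def make_instances(n, dim, rng):
    """n shifted-sphere instances with optima drawn uniformly from [-2, 2]^dim."""
    return [shifted_sphere([rng.uniform(-2.0, 2.0) for _ in range(dim)]) for _ in range(n)]


def tune_sigma(train, dim, n_candidates, evals, rng):
    """Random-search tuning: return the candidate sigma with the best mean result on `train`."""
    best_sigma, best_score = None, float("inf")
    for _ in range(n_candidates):
        sigma = 10 ** rng.uniform(-3.0, 1.0)  # log-uniform candidate in [1e-3, 10]
        score = sum(one_plus_one_es(f, dim, sigma, evals, rng) for f in train) / len(train)
        if score < best_score:
            best_sigma, best_score = sigma, score
    return best_sigma


rng = random.Random(42)
dim, evals = 5, 200
train = make_instances(5, dim, rng)
held_out = make_instances(5, dim, rng)  # never seen during tuning

sigma = tune_sigma(train, dim, n_candidates=20, evals=evals, rng=rng)
held_out_score = sum(one_plus_one_es(f, dim, sigma, evals, rng) for f in held_out) / len(held_out)
print(f"tuned sigma = {sigma:.3g}, mean final value on held-out instances = {held_out_score:.3g}")
```

The tuning itself consumed 20 × 5 × 200 = 20,000 function evaluations; this budget is part of the experimental setup and should be reported. Applying the same tuning procedure to every compared algorithm limits the familiarity bias described above, although, as noted, an equal tuning budget does not fully settle the question of fairness.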
6.3.2.2 Incorrect Application of Statistical Tests (AP-2)
As has been demonstrated in the related literature [9], the choice and application of statistics can have a significant impact on the conclusions that can be drawn. Best practices have been formulated and implemented elsewhere [1, Chap. 6] [10]. Here, we only list some of the often overlooked pitfalls that should be avoided when processing benchmarking results using statistical approaches.
(AP-2.1) Inappropriate Choice of Approach
It is often not straightforward to pick a suitable statistical test for a given hypothesis. Different tests make different assumptions about the data they are applied to, and it is often not immediately clear whether the obtained benchmarking results fulfil these assumptions. If they are not fulfilled, the conclusion suggested by the test is not actually supported. To avoid such issues, we suggest consulting appropriate literature such as [3, 8], where the authors give a detailed overview of suitable statistical tests as well as their advantages and drawbacks. In general, to avoid making unsupported assumptions, non-parametric (distribution-free) tests should be chosen when there is uncertainty about the distribution of the data.
(AP-2.2) Lack of Relevance
The results should include multiple runs of a given (non-deterministic) algorithm on the same problem in order to account for the variation in its behaviour. This can also mean running the algorithm on different variations
(instances) of the same problem. However, in practice, some variation will always be observed, and there will thus be some uncertainty in the obtained results. If benchmarks are used to distinguish between different algorithms, the number of runs should therefore reflect the conditions under which the algorithms will be executed in practice. Otherwise, it is possible to produce statistically significant performance differences that are not meaningful in practice, because either (a) the algorithm would not be restarted as often or (b) the measured effect is negligible. The desired effect size should therefore be specified in advance: when comparing the performance of two algorithms, for example, it should be decided beforehand how large a difference in performance has to be to count as meaningful. This judgement of course depends on the application as well as on the chosen hypothesis. To avoid this pitfall, a relevance analysis can be performed (see [1, Chap. 6.4] for more details).
(AP-2.3) Multiple Testing
In many benchmarking studies, multiple hypotheses are tested and/or multiple pairwise comparisons are made. This is, for example, the case when comparing the performance of two algorithms on different problems; the report would then usually include statistical tests on the ranks of the algorithms for all the different problems. The issue here is that each test carries a risk of a false positive. The probabilities of avoiding an error in the individual tests multiply, so that for m independent tests at significance level α, the probability of at least one false positive in the complete results table grows to 1 − (1 − α)^m. Conclusions about the overall performance of an algorithm based on the complete table are thus made with a higher uncertainty than single comparisons. To still draw general conclusions with the intended uncertainty, the significance levels of all tests need to be adjusted accordingly (for example using the Bonferroni correction). See [1, Chap. 6.3.3] for a more detailed explanation of which adjustments need to be made.
Many of the issues discussed above could be alleviated by integrating the appropriate solutions directly into the benchmarking process, e.g. into the software used. For example, the right statistical tests, including appropriate corrections of significance levels, could be executed automatically in a post-processing step. The COCO benchmarking platform, for example, already includes statistical tests and Bonferroni correction of significance levels in its post-processing [16]. If the IOHprofiler [49] is used to interpret COCO results, it is also possible to set desired effect sizes by specifying appropriate targets for algorithm performance.
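A minimal sketch of such an automated post-processing step, assuming final objective values (lower is better) from several independent runs per algorithm and problem. Three pieces are combined: a two-sided permutation test on the difference in means as a distribution-free test (chosen here only because it fits in a few lines of standard-library Python; rank-based tests such as the Mann-Whitney U test, e.g. `scipy.stats.mannwhitneyu`, are common alternatives), the Vargha-Delaney A12 statistic as an effect size, and a Bonferroni correction that tests each of the m problems at α/m. For m = 10 independent tests at α = 0.05, the chance of at least one false positive would otherwise be 1 − 0.95^10 ≈ 0.40; with the correction it is bounded by α, even without independence. The effect-size threshold `min_effect` and the synthetic data are illustrative assumptions; the threshold should be fixed before running the experiments.

```python
import random


def a12(xs, ys):
    """Vargha-Delaney A12 effect size, oriented for minimisation: the probability
    that a run of algorithm X ends with a lower (better) value than a run of Y,
    counting ties as one half. 0.5 means no effect."""
    wins = 0.0
    for x in xs:
        for y in ys:
            if x < y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(xs) * len(ys))


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Makes no assumption about the shape of the distributions; under the null
    hypothesis, the run labels of the two samples are exchangeable."""
    rng = random.Random(seed)
    n = len(xs)
    observed = abs(sum(xs) / n - sum(ys) / len(ys))
    pooled = list(xs) + list(ys)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / len(ys))
        if diff >= observed:
            at_least_as_extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself
    return (at_least_as_extreme + 1) / (n_perm + 1)


def compare(results_a, results_b, alpha=0.05, min_effect=0.64):
    """Per-problem comparison with a Bonferroni-corrected significance level and
    an effect-size threshold fixed in advance (min_effect is illustrative)."""
    alpha_corrected = alpha / len(results_a)  # Bonferroni: alpha / number of tests
    report = {}
    for problem, xs in results_a.items():
        ys = results_b[problem]
        p = permutation_p_value(xs, ys)
        effect = a12(xs, ys)
        relevant = effect >= min_effect or effect <= 1.0 - min_effect
        report[problem] = {"p": p, "A12": effect,
                           "significant_and_relevant": p < alpha_corrected and relevant}
    return report


# Synthetic final values of 15 runs per algorithm on three hypothetical problems,
# generated here only to exercise the functions (not real benchmark results).
rng = random.Random(7)
results_a = {name: [rng.gauss(1.0, 0.3) for _ in range(15)]
             for name in ("problem_1", "problem_2", "problem_3")}
results_b = {
    "problem_1": [rng.gauss(2.0, 0.3) for _ in range(15)],   # clearly worse
    "problem_2": [rng.gauss(1.02, 0.3) for _ in range(15)],  # negligible difference
    "problem_3": [rng.gauss(1.0, 0.3) for _ in range(15)],   # same distribution
}
for problem, row in compare(results_a, results_b).items():
    print(problem, row)
```

Requiring both a corrected p-value and a minimum effect size guards against the two failure modes described above: spurious significance from many tests, and significant but practically irrelevant differences.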
6.3.2.3 Misinterpretation of Results (AP-3)
In addition to misinterpretation due to the incorrect application of statistical tests, there are several pitfalls to look out for when drawing conclusions from benchmarking results. However, we can only discuss some general issues here, as there is a plethora of potential pitfalls that are specific to a given setup.
(AP-3.1) Biases in Quality Indicators
It is important to acknowledge implicit and potentially unknown biases in quality indicators.
• Targets: In single-objective optimisation, for example, target-based approaches are popular. This means that the performance of an algorithm is assessed based on the computational effort required to reach a specified target value in an optimisation problem. However, the choice of target values influences the results greatly. If targets can be chosen consciously, this can be considered an advantage for the interpretation of results. However, in problems with unknown optima, it is difficult to find appropriate absolute values for such targets. These issues are compounded in multi- and many-objective optimisation, where the biases and assumptions of quality indicators are still an important topic of research. For more details, refer to Chap. 5.
• Multiple Indicators: The current state of the art is to use multiple quality indicators in order to allow a balanced interpretation of the results that takes the characteristics of the different indicators into account. For example, in multi-objective optimisation, some indicators reward the spread of solutions while others reward the progress of the solution set towards a reference set. It is important to be mindful of the diversity of the chosen indicators, as some already combine several of these characteristics (take for example IGD [2]). Combining such mixed indicators can thus result in an unintended weighting of different characteristics.
• Alignment: It is of course important to choose quality indicators that correspond to the aim of the study, which should have been formulated clearly and in advance.
• Indicator-based Algorithms: Indicator-based algorithms use some quality indicator as feedback to steer the search through the search space. Using the same indicators afterwards to compare the performance of different algorithms can thus create an unfair advantage.
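To illustrate why indicators with different biases can disagree, the sketch below computes the two-dimensional hypervolume (area dominated up to a reference point, larger is better) and the basic IGD (mean distance from each reference-set point to its nearest solution, smaller is better) for two hypothetical approximation sets of the linear front f1 + f2 = 1. The sets, the uniformly sampled reference set and the reference point (1.1, 1.1) are illustrative assumptions; IGD variants such as IGD+ behave differently.

```python
import math


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by the reference point `ref`
    (two objectives, both minimised). Larger is better."""
    # Only points that strictly dominate the reference point contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:  # sweep in order of increasing f1
        if f2 < best_f2:  # skip points dominated by an earlier one
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def igd(points, reference_set):
    """Inverted generational distance: mean Euclidean distance from each
    reference point to its nearest point in `points`. Smaller is better."""
    return sum(min(math.dist(r, p) for p in points) for r in reference_set) / len(reference_set)


# Hypothetical setting: linear front f1 + f2 = 1 with a uniformly sampled reference set
reference_set = [(i / 10, 1 - i / 10) for i in range(11)]
ref_point = (1.1, 1.1)
knee_only = [(0.5, 0.5)]                  # one point in the middle of the front
extremes_only = [(0.0, 1.0), (1.0, 0.0)]  # the two end points of the front

for name, approx in (("knee_only", knee_only), ("extremes_only", extremes_only)):
    print(name, round(hypervolume_2d(approx, ref_point), 3), round(igd(approx, reference_set), 3))
```

Both sets lie exactly on the front, yet hypervolume prefers the single knee point (0.36 versus 0.21), whereas IGD prefers the two extremes (about 0.321 versus 0.386). Moving the reference point or changing the reference set would shift these preferences again, which is exactly the kind of implicit bias the bullet points above warn about.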
(AP-3.2) Overrepresentation of Intended Narrative
If the goal of a study is to provide support for a specific hypothesis, it is tempting to only perform tests that would support the presented narrative. However, it is important to present the results in a balanced fashion in order not to misrepresent the facts. This can, for example, be achieved by providing a detailed description of how results and conclusions were obtained, as well as some high-level statistics (e.g. on runtime and performance averages). Ideally, additional visualisations of the data should be made available alongside the raw data itself, to allow readers to understand and reproduce the experiments.
(AP-3.3) Confounding Effects
Lastly, interpreting benchmarking results usually involves making educated guesses about what could explain observed algorithm
behaviours. Ideally, however, these guesses should be verified in suitable additional experiments. A common pitfall in this regard is confounding effects, i.e. attributing observed behaviour to the wrong mechanisms because the interactions between different mechanisms are not fully understood. While such issues are very difficult to identify, a good approach is to run experiments with algorithm variants that use different combinations of mechanisms. In addition, theoretical analysis of abstracted versions of these mechanisms can offer valuable insights into their interactions. Regardless, unverified guesses should always be clearly presented as such.
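One way to run experiments on different combinations of mechanisms is a full factorial ablation: every on/off combination is run, and main effects and interactions are estimated from the results. The sketch below assumes two mechanisms and a user-supplied `run(config)` function returning a performance value (lower is better). The mechanism names `restarts` and `adaptation` and the two stub performance models are hypothetical and stand in for real algorithm runs; they only show how an interaction becomes visible.

```python
import itertools


def full_factorial(mechanisms):
    """All on/off combinations of the named mechanisms, as a list of dicts."""
    return [dict(zip(mechanisms, flags))
            for flags in itertools.product([False, True], repeat=len(mechanisms))]


def effects_2x2(run, a, b):
    """Main effects of mechanisms `a` and `b` and their interaction in a 2x2 design.

    Negative values mean that switching a mechanism on lowers (improves) the result.
    """
    y = {(cfg[a], cfg[b]): run(cfg) for cfg in full_factorial([a, b])}
    main_a = (y[True, False] + y[True, True] - y[False, False] - y[False, True]) / 2
    main_b = (y[False, True] + y[True, True] - y[False, False] - y[True, False]) / 2
    interaction = (y[True, True] - y[True, False] - y[False, True] + y[False, False]) / 2
    return main_a, main_b, interaction


# Hypothetical performance models standing in for real algorithm runs
def additive(cfg):
    # each mechanism helps independently of the other
    return 10 - 2 * cfg["restarts"] - 3 * cfg["adaptation"]


def synergistic(cfg):
    # the mechanisms only help when used together
    return 10 - 6 * (cfg["restarts"] and cfg["adaptation"])


print(effects_2x2(additive, "restarts", "adaptation"))     # (-2.0, -3.0, 0.0)
print(effects_2x2(synergistic, "restarts", "adaptation"))  # (-3.0, -3.0, -3.0)
```

In the synergistic case, adding either mechanism on its own to the baseline yields no improvement at all, so a one-at-a-time comparison would wrongly suggest that neither mechanism matters; only the full design reveals that the observed gain stems from their interaction.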
6.3.3 Benchmark Usage
There are also several pitfalls regarding the general manner of using benchmarks that are important to avoid. Some of these arise when a single group of researchers uses an existing benchmark in the wrong way, for example by using results out of context. However, several issues are also cultural: repeated usage of benchmarks and referencing of previous results eventually culminate in overspecialisation, for instance. We discuss both types of issues below.
6.3.3.1 Individual Misuse of Benchmarks (BU-1)
(BU-1.1) Unsupported Generalisation
Great care has to be taken when drawing general conclusions from benchmarking results. As stated in Sect. 6.3.1, the generality of conclusions is necessarily limited by the choice of problems and the intended purpose of the benchmark. This issue becomes especially apparent in the context of real-world problems. It is important to consider in what way the benchmarking problems reflect those likely to be encountered in real-world applications. While evolutionary computation has undoubtedly been used successfully in various industrial applications (cf. Chap. 2), there is no clear consensus on the most promising approaches, nor a shared understanding of real-world problems. There are, however, recent efforts to mitigate this issue by surveying optimisation problems in industry in order to construct representative problems (see Chap. 3). Another important aspect is the use of existing benchmarks outside of their intended purpose. The original authors might have had a specific purpose in mind and made concessions, e.g. in terms of the diversity of the problems included. It is thus necessary to be aware of such limitations in order to interpret the obtained results appropriately.
(BU-1.2) Reporting an Incomplete Picture
The experimental process and resulting conclusions should be reported as thoroughly as possible. It is especially important to clearly disclose the
aim and structure of the process without retroactive editing. While it is important to conduct a wide range of evaluations, this does not mean that algorithms should be benchmarked aimlessly. Running experiments and discovering patterns retroactively is not conducive to good research, as the likelihood that such observations are an artefact of the chosen problems is too high. Instead, the experiments should be conducted starting from an initial research hypothesis. Any preliminary experiments and changes to the hypothesis should therefore be included in the discussion of the benchmarking results. Negative benchmarking results should be included in reports for the same reason. Not excluding any sets of experiments is important both for appropriately describing the experimental setup and for giving a complete picture of the performance of an algorithm. Being able to identify the weaknesses of an algorithm should be as important as identifying its strengths.
(BU-1.3) Missing Context
Benchmarking is most useful if baseline data is available to provide context for the interpretation of the obtained results. Otherwise, it is usually not possible to judge the success of a given algorithm. Several baselines should be included, even if the experiments only focus on an improvement to a specific algorithm. This makes it possible to detect cases where an existing algorithm was improved in some respect, but the original algorithm was already underperforming in comparison to the state of the art. To avoid drawing conclusions from missing or even misleading data, it is thus advisable to include a random baseline, a simple naive baseline, and at least one state-of-the-art approach. Ideally, several state-of-the-art algorithms representing different approaches are included. To avoid the additional computational effort otherwise required for including further algorithms in the analysis, benchmarking results are already available for several popular algorithms.
The COCO framework and the IOHprofiler, for example, allow simply loading these results into their post-processing tools [16, 49].
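A random-search baseline is cheap to add. The sketch below assumes a box-constrained, single-objective minimisation problem given as a plain Python function and records the best-so-far value after every evaluation, so that the baseline can be compared with other algorithms at any budget rather than only at one arbitrary stopping point. The sphere function, the bounds and the budget are illustrative choices.

```python
import random


def random_search(f, bounds, evals, seed=0):
    """Uniform random search; returns the best-so-far value after each evaluation."""
    rng = random.Random(seed)
    best = float("inf")
    trace = []
    for _ in range(evals):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        best = min(best, f(x))
        trace.append(best)
    return trace


def sphere(x):
    """Simple test function with its optimum 0 at the origin."""
    return sum(xi * xi for xi in x)


trace = random_search(sphere, bounds=[(-5.0, 5.0)] * 10, evals=1000)
for budget in (10, 100, 1000):
    print(budget, trace[budget - 1])
```

In a real study, this would be repeated over several seeds (independent runs) and instances, just like the algorithms it is compared against.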
6.3.3.2 Cultural Misuse of Benchmarks (BU-2)
(BU-2.1) Unquestioned Inheritance of Benchmarking Setups
Even if a benchmarking suite is diverse and has proven useful in the past, it might still be worthwhile to periodically modify it with the intent of diversifying it further. Regular updates to popular benchmarks can help to ensure that algorithms (and the whole research field) do not overfit to unknown properties of the problems included in the most popular benchmarks. This can be achieved, for example, by transforming existing problems (e.g. by inverting multi-objective problems [24]) or by including problems from different sources. Running experiments on a wide range of problems helps to avoid overspecialisation of algorithms by ensuring
that they encounter different challenges, where their strengths and weaknesses become apparent.
(BU-2.2) Benchmark-driven Research
Besides all technical issues, research is a creative process and should never be limited solely to benchmarking. For example, continued use of benchmarks with major issues can lead to stagnating improvement of algorithms. Generally, a benchmark should be continuously monitored for its relevance to meaningful performance comparisons, and should function as a tool to enable high-quality research, not as a goal in its own right.
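The transformation of existing problems mentioned under BU-2.1 can be illustrated with a minimal sketch. Rather than the inversion of multi-objective problems from [24], it uses a different, common technique: shifting and rotating an ill-conditioned ellipsoid, so that each random seed yields a new instance with a known optimum, similar in spirit to how instances are generated in suites such as BBOB. The rotation also turns the separable ellipsoid into a non-separable problem. The function names, the condition number of 10^6 and the shift range are illustrative assumptions.

```python
import math
import random


def ellipsoid(z):
    """Separable ellipsoid with condition number 1e6 (coefficients from 1 to 1e6)."""
    n = len(z)
    scale = max(n - 1, 1)  # avoid division by zero for n = 1
    return sum(10 ** (6 * i / scale) * zi ** 2 for i, zi in enumerate(z))


def random_rotation(n, rng):
    """Random orthogonal matrix (list of rows) via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for r in rows:  # remove the components along previously accepted rows
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip (numerically) linearly dependent draws
            rows.append([a / norm for a in v])
    return rows


def make_instance(base, n, seed):
    """New instance f(x) = base(R (x - shift)) with a seed-specific shift and rotation R."""
    rng = random.Random(seed)
    shift = [rng.uniform(-4.0, 4.0) for _ in range(n)]
    rot = random_rotation(n, rng)

    def f(x):
        d = [xi - si for xi, si in zip(x, shift)]
        z = [sum(r_ij * d_j for r_ij, d_j in zip(row, d)) for row in rot]
        return base(z)

    return f, shift


f1, opt1 = make_instance(ellipsoid, n=5, seed=1)
f2, opt2 = make_instance(ellipsoid, n=5, seed=2)
print(f1(opt1), f2(opt2))  # both optima have value 0.0
print(f1([0.0] * 5), f2([0.0] * 5))  # the instances differ away from their optima
```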
6.3.4 Checklist to Avoid Pitfalls
While there are numerous potential pitfalls when benchmarking evolutionary algorithms, many of them can be avoided by following the general recommendations in the checklist below. For each recommendation, we indicate the pitfalls it is related to. The checklist will not be fully applicable in all cases, but we hope it will provide a helpful and practicable guideline for both researchers and practitioners interested in benchmarks.
• Start with a hypothesis (PC-2, AP-1.1, AP-2.2, BU)
• Compute baselines (state of the art, a different type of algorithm, random search) (PC-1.1, PC-1.2, PC-2.3, BU-1.3)
• Learn about target and test problems (visualisation, exploratory landscape analysis, evolutionary path) (PC, BU-2.1)
• Consciously choose test problems that reflect the target application in an appropriate distribution (PC-1.1, PC-1.2, PC-2.1, PC-2.2, AP-1.1, AP-2.2, BU-1.1)
• Use existing peer-reviewed benchmarking frameworks with built-in analysis features where possible; at least use statistical methods (AP-2, AP-3, BU-1.3)
• Avoid arbitrary decisions on the experimental setup, including hyperparameters (ideally: tune on different, but similar problems) (AP-1, BU-1)
• Report complete results, including negative ones (BU-1)
• Verify interpretations by isolating potential causes, form new hypotheses and reflect on original expectations (AP-3, BU)
6.4 Summary and Open Issues
In this chapter, we have given an overview of existing benchmarks, including a characterisation of the various problems contained in them. While some such surveys of benchmarks are available in previous work [1, Chap. 3], here we focus on multi- and many-objective optimisation problems and are thus able to go into more detail. We further specifically investigate both real-world and artificial benchmarks.
We find that artificial benchmarks usually allow for scalability in the search space, which enables an analysis of how dimensionality influences performance. However, in many cases, detailed characteristics of the fitness landscape are unknown. For real-world-based benchmarks, this is even more apparent. In addition, the optima in such settings are often unknown, making it difficult to use measures that rely on a target solution (set) in a manner that is meaningful to the application in question. From our survey, we conclude that
• A reasonable number of benchmarking problems as well as frameworks already exist. However, newly proposed benchmarking functions, especially those inspired by real-world applications, tend to come with their own framework, making it cumbersome to combine them into a more extensive experiment. What is required is benchmarking software that is able to handle different standardised benchmark formats and incorporate them without much overhead.
• Not enough reliable information exists on the properties of existing benchmarks. This hinders the ability to find patterns in benchmarking results based on problem properties. More studies that characterise existing benchmarking problems should be conducted. Based on this characterisation, properties that lack representation in existing benchmarks can be identified and new problems can be constructed accordingly.
• The set of popular benchmarks is very small compared to the number of known problems. Aspects such as noise, correlated or preferred objectives, dynamic problems, and many objectives are less commonly represented. This means that the community using these benchmarks risks overfitting to the specific intricacies of the chosen benchmarks, making results less general than desired.
A more extensive list of open issues specifically regarding benchmarks and the problems contained in them can also be found in [1, Chap. 3], but the most important ones according to our findings are ease of use and the ability to analyse and describe fitness landscapes. In addition, in this chapter we give a list of pitfalls to be aware of when conducting benchmarking studies. This list is intended to complement the best practices for benchmarking described in [1]. There, a detailed overview is given with the purpose of making the reader aware of existing literature. Because of that, it is not only very thorough and precise but also long and not self-contained. Here, we instead opt for a concise format, with the intent of allowing readers to easily identify weak points in their experimental setups and avoid them. Most of the identified issues seem to be tied to some form of misalignment between the research intent and the benchmarking problems used. Research goals and tests should generally be defined a priori and fit together. The second theme of issues revolves around unknowns in target applications, algorithms, and benchmarking problems. Some of the pitfalls are technical in nature and thus have easy resolutions, which we point out while referring the reader to related literature. One example of this is the group of pitfalls related to the application of statistical tests. We hope that these solutions can become
the standard in all benchmarking software, which would remove the related problems permanently. Other pitfalls require more research into the topic, for example, issues related to appropriate performance measurement and different problems. Several of the chapters in this book (see Chaps. 3, 5, and 7) address parts of these issues. The third category of pitfalls is more cultural in nature and requires community-wide acceptance of a solution (e.g. via reviewing standards) in order to be resolved. This will only be possible by raising awareness, for example through initiatives such as the benchmarking network [41].
References
1. T. Bartz-Beielstein, C. Doerr, J. Bossek, S. Chandrasekaran, T. Eftimov, A. Fischbach, P. Kerschke, M. López-Ibáñez, K. M. Malan, J.H. Moore, B. Naujoks, P. Orzechowski, V. Volz, M. Wagner, T. Weise, Benchmarking in optimization: Best practice and open issues (2020)
2. L.C.T. Bezerra, M. López-Ibáñez, T. Stützle, An empirical assessment of the properties of inverted generational distance indicators on multi- and many-objective optimization, in Evolutionary Multi-criterion Optimization (EMO) (2017), pp. 31–45
3. M. Chiarandini, L. Paquete, M. Preuss, E. Ridge, Experiments on metaheuristics: Methodological overview and open issues. Technical Report DMF-2007-03-003, The Danish Mathematical Society, Denmark (2007)
4. S.J. Daniels, A.A. Rahat, R.M. Everson, G.R. Tabor, J.E. Fieldsend, A suite of computationally expensive shape optimisation problems using computational fluid dynamics, in Parallel Problem Solving from Nature (PPSN) (Springer, 2018), pp. 296–307
5. K. Deb, Evolutionary algorithms for multi-criterion optimization in engineering design, in Evolutionary Algorithms in Engineering and Computer Science (EUROGEN) (1999), pp. 135–161
6. K. Deb, C. Myburgh, Breaking the billion-variable barrier in real-world optimization using a customized evolutionary algorithm, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2016), pp. 653–660
7. K. Deb, L. Thiele, M. Laumanns, E. Zitzler, Scalable multi-objective optimization test problems, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2002), pp. 825–830
8. T. Eftimov, P. Korošec, Identifying practical significance through statistical comparison of meta-heuristic stochastic optimization algorithms. Appl. Soft Comput. 85(105862) (2019)
9. T. Eftimov, P. Korošec, The impact of statistics for benchmarking in evolutionary computation research, in Genetic and Evolutionary Computation Conference (GECCO) Companion (ACM Press, 2018), pp. 1329–1336
10. T. Eftimov, G. Petelin, P. Korošec, DSCTool: a web-service-based framework for statistical comparison of stochastic optimization algorithms. Appl. Soft Comput. 87(105977) (2019)
11. K. Eggensperger, M. Lindauer, F. Hutter, Pitfalls and best practices in algorithm configuration. J. Artif. Intell. Res. 64, 861–893 (2019)
12. A. Eiben, S. Smit, Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol. Comput. 1(1), 19–31 (2011)
13. X. Gandibleux, The MOCO numerical instances library. http://xgandibleux.free.fr/MOCOlib/, Accessed 20 July 2020
14. T. Glasmachers, M.T.M. Emmerich, EMO'2017 Real-World Problems. https://www.ini.rub.de/PEOPLE/glasmtbl/projects/bbcomp/. Online, Accessed 22 August 2020
15. T. Glasmachers, I. Loshchilov, Black Box Optimization Competition BBComp. https://www.ini.rub.de/PEOPLE/glasmtbl/projects/bbcomp/. Online, Accessed 22 August 2020
16. N. Hansen, A. Auger, O. Mersmann, T. Tušar, D. Brockhoff, COCO: a platform for comparing continuous optimizers in a black-box setting. Optim. Methods Softw. 36, 114–144 (2021)
V. Volz et al.
17. N. Hansen, D. Brockhoff, O. Mersmann, T. Tušar, D. Tušar, O.A. ElHara, P.R. Sampaio, A. Atamna, K. Varelas, U. Batu, D.M. Nguyen, F. Matzner, A. Auger, COmparing Continuous Optimizers: numbbo/COCO on Github (2019)
18. N. Hansen, S. Finck, R. Ros, A. Auger, Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829, Inria, France (2009). [Updated February 2010]
19. S. Huband, P. Hingston, L. Barone, L. While, A review of multiobjective test problems and a scalable test problem toolkit. IEEE Trans. Evol. Comput. 10(5), 477–506 (2006)
20. E.J. Hughes, Radar waveform optimisation as a many-objective application benchmark, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2007), pp. 700–714
21. H. Ishibuchi, L. He, K. Shang, Regular Pareto front shape is not realistic, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2019), pp. 2034–2041
22. H. Ishibuchi, Y. Setoguchi, H. Masuda, Y. Nojima, Performance of decomposition-based many-objective algorithms strongly depends on Pareto front shapes. IEEE Trans. Evol. Comput. 21(2), 169–190 (2017)
23. H. Ishibuchi, N. Tsukamoto, Y. Nojima, Evolutionary many-objective optimization: a short review, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2008), pp. 2419–2426
24. H. Jain, K. Deb, An improved adaptive approach for elitist nondominated sorting genetic algorithm for many-objective optimization, in Evolutionary Multi-Criterion Optimization (EMO) (Springer, 2013), pp. 307–321
25. S. Jiang, M. Kaiser, S. Yang, S. Kollias, N. Krasnogor, A scalable test suite for continuous dynamic multiobjective optimization. IEEE Trans. Cybernet. 50(6), 2814–2826 (2020)
26. P. Kerschke, H. Trautmann, Comprehensive Feature-based Landscape Analysis of Continuous and Constrained Optimization Problems Using the R-package flacco, in Applications in Statistical Computing (Springer, 2019), pp. 93–123
27. T. Kohira, H. Kemmotsu, O. Akira, T. Tatsukawa, Proposal of benchmark problem based on real-world car structure design optimization, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2018), pp. 183–184
28. H. Li, K. Deb, Q. Zhang, P. Suganthan, L. Chen, Comparison between MOEA/D and NSGA-III on a set of novel many and multi-objective benchmark problems with challenging difficulties. Swarm Evol. Comput. 46, 104–117 (2019)
29. J. Liang, C. Yue, G. Li, B. Qu, P.N. Suganthan, K. Yu, Problem definitions and evaluation criteria for the CEC 2021 on multimodal multiobjective path planning optimization. Technical report, Computational Intelligence Laboratory - Zhengzhou University, China and Nanyang Technological University, Singapore (2020)
30. S. Liu, Q. Lin, K.C. Tan, Q. Li, Benchmark problems for CEC2021 competition on evolutionary transfer multiobjective optimization. Technical report, City University of Hong Kong (2021)
31. Y. Marca, H. Aguirre, S. Z. Martinez, A. Liefooghe, B. Derbel, S. Verel, K. Tanaka, Approximating Pareto set topology by cubic interpolation on bi-objective problems, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2019), pp. 386–398
32. H. Masuda, Y. Nojima, H. Ishibuchi, Common properties of scalable multiobjective problems and a new framework of test problems, in 2016 IEEE Congress on Evolutionary Computation (CEC) (2016), pp. 3011–3018
33. T. Matsumoto, N. Masuyama, Y. Nojima, H. Ishibuchi, A multiobjective test suite with hexagon Pareto fronts and various feasible regions, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2019), pp. 2058–2065
34. I.R. Meneghini, M.A. Alves, A. Gaspar-Cunha, F.G. Guimarães, Scalable and customizable benchmark problems for many-objective optimization. Appl. Soft Comput. 90, 106139 (2020)
35. O. Mersmann, B. Bischl, H. Trautmann, M. Preuss, C. Weihs, G. Rudolph, Exploratory landscape analysis, in Conference on Genetic and Evolutionary Computation (GECCO) (ACM Press, 2011), pp. 829–836
36. Y. Nojima, T. Fukase, Y. Liu, N. Masuyama, H. Ishibuchi, Constrained multiobjective distance minimization problems, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2019), pp. 586–594
37. T. Ray, K. Liew, A swarm metaphor for multiobjective design optimization. Eng. Optim. 34(2), 141–153 (2002)
38. L. Relund, Multi-objective optimization repository (MOrepo). https://github.com/MCDMSociety/MOrepo, Accessed 20 July 2020
39. R. Tanabe, H. Ishibuchi, An easy-to-use real-world multi-objective optimization problem suite. Appl. Soft Comput. 89, 106078 (2020). https://github.com/ryojitanabe/reproblems, Accessed 15 April 2020
40. K. Tang, X. Li, P.N. Suganthan, Z. Yang, T. Weise, Benchmark functions for the CEC'2010 special session and competition on large-scale global optimization. Technical report, Nature Inspired Computation and Applications Laboratory (2009)
41. The Benchmarking Network, Benchmarking Network Homepage (2019). https://sites.google.com/view/benchmarking-network, Accessed 13 September 2020
42. The Japanese Society of Evolutionary Computation (JSEC), The 3rd Evolutionary Computation Competition - Wind Turbine Design Optimization (2019). http://www.jpnsec.org/files/competition2019/EC-Symposium-2019-Competition-English.html, Accessed 1 September 2020
43. The Task Force on Benchmarking, IEEE CIS Task Force on Benchmarking Homepage (2019). https://cmte.ieee.org/cis-benchmarking/, Accessed 8 October 2020
44. T. Tušar, D. Brockhoff, N. Hansen, Mixed-integer benchmark problems for single- and bi-objective optimization, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2019), pp. 718–726
45. D.A. Van Veldhuizen, Multiobjective Evolutionary Algorithms: Classifications, Analyses, and New Innovations. Ph.D. thesis, Air University, USA, Air Force Institute of Technology, Ohio (1999)
46. M. Vasile, Robust optimisation of trajectories intercepting dangerous NEO, in AIAA/AAS Astrodynamics Specialist Conference and Exhibit. AIAA (2002)
47. V. Volz, B. Naujoks, Towards game-playing AI benchmarks via performance reporting standards, in Conference on Games (CoG) (IEEE Press, 2020), pp. 764–777
48. V. Volz, B. Naujoks, P. Kerschke, T. Tušar, Single- and multi-objective game-benchmark for evolutionary algorithms, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2019), pp. 647–655. http://www.gm.fh-koeln.de/~naujoks/gbea/, Accessed 8 October 2020
49. H. Wang, D. Vermetten, F. Ye, C. Doerr, T. Bäck, IOHanalyzer: Performance Analysis for Iterative Optimization Heuristic (2020). arXiv:2007.03953
50. E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results. Evol. Comput. 8(2), 173–195 (2000)
Chapter 7
Visualisation for Decision Support in Many-Objective Optimisation: State-of-the-art, Guidance and Future Directions
Jussi Hakanen, David Gold, Kaisa Miettinen, and Patrick M. Reed
Abstract This chapter describes the state-of-the-art in visualisation for decision support processes in problems with many objectives. Visualisation is an important part of a constructive decision making process for examining real-world many-objective problems. The chapter first illustrates how visualisation can be applied to problem framing, guided optimisation, trade-off assessment and solution selection. Next, the chapter reviews state-of-the-art visualisation approaches in terms of what is available and what is typically used. Guidance is provided for choosing and applying visualisation techniques, including recommendations from the field of visual analytics. These recommendations are illustrated through a complex real-world decision problem with ten objectives. Lastly, the chapter concludes with suggested future research directions for advancing the scope and impact of many-objective optimisation when confronting complex decision making contexts.
J. Hakanen (corresponding author) · K. Miettinen
University of Jyvaskyla, Faculty of Information Technology, P.O. Box 35 (Agora), FI-40014 Jyvaskyla, Finland
D. Gold · P. M. Reed
Department of Civil and Environmental Engineering, Cornell University, Ithaca, NY, USA
© Springer Nature Switzerland AG 2023
D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_7
7.1 Introduction
The purpose of this chapter is to provide an overview of how visualisations can be used in the context of multiobjective optimisation and decision support, especially in cases with more than three objectives. In the field of evolutionary computation, such optimisation problems are referred to as many-objective optimisation problems. In addition to describing the state-of-the-art, we provide a detailed example of the use of visualisation to aid decision makers in decision making. While many papers provide examples and guidance for using visualisation to examine solution set properties and reveal trade-offs, little attention has been given to the implications of these properties for real-world decision making.
In real-world decision contexts, multiple, often conflicting, objectives have to be taken into account when identifying solutions to complex problems to be implemented or further tested. This task often has to be performed under constraints affecting the feasible decisions. With an increasing number of objectives, the resulting potential for a large number of candidate solutions and complex geometries renders data visualisation an essential tool for navigating and evaluating candidate decisions for these problems. Effective visualisation translates data generated by many-objective optimisation algorithms [7] and simulation models into useful information for human decision makers (DMs) who are experts in the problem domain and seek a better understanding of the implications of their candidate decisions.
The most common way of using visualisations in many-objective optimisation has been to show the performance of potential solutions in the objective space (see, e.g. [17, 43, 49, 53, 81]). To augment this, it is often equally important to visually analyse the potential solutions in the decision space, because it determines how the solutions are implemented in practice [88].
In addition, visualisations can be used in other ways that are not covered in this chapter. To name a few: supporting DMs in properly structuring the optimisation problem [2, 38]; analysing the performance of optimisation algorithms, e.g., by using attainment functions [12, 80] or visualisable test problems [15]; understanding topological characteristics of the decision space, e.g., by cost landscapes [17], local optima networks [94], gradient field heatmaps [95] or Plot of Landscapes with Optimal Trade-offs [96]; setting parameters of the optimisation algorithms used [11]; and expressing preferences during interactive optimisation [9, 23, 73].
In the decision making context, many-objective optimisation seeks to advance DMs' understanding of trade-offs by facilitating the discovery of Pareto optimal (PO) solutions, for which improvement in any one objective can only be achieved by degrading performance in one or more of the other objectives. Note that PO solutions cannot be preferentially ordered in the absence of information on the DM's preferences, which may evolve through interaction with candidate solutions. The image of the set of all PO solutions in the objective space is referred to as the Pareto front. In this chapter, we will refer to the solutions produced by many-objective optimisation algorithms as Pareto approximate (PA) ones since, for example, multiobjective evolutionary algorithms are only guaranteed to produce solutions that are non-dominated within the set of solutions they have found [13]. A solution consists of a decision variable vector and a corresponding objective vector. In evolutionary computation, the decision variables are sometimes characterised through genotypes, the representation of a solution used for crossover and mutation, and phenotypes, the actual solution used for evaluation [97]. In this chapter, we will focus on the visualisation of phenotypes, as they are what is communicated to DMs.
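The distinction between PO and PA solutions rests on the dominance relation. The minimal sketch below shows a pairwise dominance check and a brute-force O(n²) filter that keeps the mutually non-dominated vectors of a set. It assumes that all objectives are minimised, a convention the chapter does not fix at this point. The function names and data are illustrative.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point in the list dominates.

    The result is only non-dominated *within this set* (Pareto approximate);
    it is Pareto optimal only if the set happens to contain the true front.
    """
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Three minimised objectives; the last vector is dominated by the first.
F = [(1.0, 4.0, 2.0), (2.0, 2.0, 2.0), (4.0, 1.0, 3.0), (1.5, 4.0, 2.5)]
print(non_dominated(F))  # [0, 1, 2]
```

The three remaining vectors cannot be ordered further without preference information, which is the point the paragraph above makes about PO solutions.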
Effective design and usage of visualisation in applications of many-objective optimisation remains an outstanding challenge. Visualising trade-offs is a complex
7 Visualisation for Decision Support in Many-Objective Optimisation …
task that becomes increasingly challenging as the number of objectives increases. Recent innovations in multi-dimensional visualisation based on data transformation, visual encoding and interactive exploration have contributed tools that assist DMs when navigating high-dimensional solution spaces [47]. Well-designed visualisations provide insights about the solutions and the optimisation problem itself. Understanding the possibilities and limitations of the decision problem at hand, and extracting and utilising the preferences of different DMs, are fundamental needs when supporting decision making. In this context, interactive multiobjective optimisation (IMO) approaches hold significant value (see, e.g. [51, 54, 89]). In IMO, the DM's preferences are taken into account iteratively during optimisation, and the idea is to find PA solutions that best reflect the given and possibly changing preferences. Discovering these preferences and contextualising their implications with respect to the problem formulation and its characteristics is far from easy in complex decision problems, which may involve multiple DMs. Different visualisation-based approaches can be used to support this process and gain valuable insights for structuring compromises. Visualisation should not be seen as a static process; it is better framed as a class of tools seeking to advance iterative and deliberative learning when confronting complex high-dimensional decision problems. In the visualisation community, visual analytics (VA) specifically addresses learning and dynamic workflows aimed at facilitating analytical reasoning by using interactive visual interfaces [40, 58, 75]. VA focuses on enhancing perception, hypothesis testing (e.g., bottom-up or top-down sense-making), and gaining insights from complex data.
Many-objective optimisation is a good example of such problems, due to the large number of conflicting objective and constraint functions as well as the potentially complex relationships between decision variables, depending on the application. Understanding relationships between the conflicting objectives, identifying critical constraint boundaries and exploring the implications of decision variables are all concerns that fundamentally benefit DMs. VA is thus closely related to decision making involving many objectives, and the relationship between these two fields has recently been studied [23, 46]. VA systems are typically based on coordinated multiple views that enable interactive exploration of the data. The idea is to show the data in different ways, while interaction enables, e.g., linking and brushing, which the DMs can use to manipulate the data elements shown and gain insight. Although many optimisation-based decision making approaches with some built-in visualisation capabilities have been proposed in the literature, these frameworks are typically designed by the developers of the optimisation algorithms rather than in collaboration with experts in visualisation. Typically, these approaches are based on a single type of visualisation that the user can interact with at a time (see, e.g., [25, 30, 33, 48, 56, 72, 77, 85]). There are examples where more advanced visualisation techniques adopted from the visualisation community are used [9, 23, 46, 70, 88]. However, software implementations of these approaches are rarely openly available, which is a serious limitation to their practical use. Examples of existing software include commercial process integration and design optimisation software such as modeFrontier (https://esteco.com/modefrontier), Optimus (https://noesissolutions.com/our-products/optimus) and HEEDS (https://redcedartech.com). The proprietary nature of this software can be a limitation for many practitioners and, especially, for researchers who want to implement their novel ideas.
In practice, there are situations where DMs do not use the optimisation/decision making software directly but are supported in decision making by a person called an analyst, who has knowledge about optimisation and acts as a facilitator, providing potential solution candidates along with explanations for the DMs. Usually, different DMs who are interested in the decision problem have their own views on the problem formulation and their own preferences for the objectives. The role of the analyst is then crucial in providing appropriate visual support for communicating different problem formulations, candidate PA solutions and the feasibility of the given preferences to the DMs. This describes a constructive decision making approach, where the problem and its solution are constructed collaboratively in facilitated decision support processes [78]. To accomplish this, the analyst needs a very good understanding and command of advanced visualisation tools. It is not enough to know the existing visualisation techniques; the analyst must also be able to select appropriate techniques for different tasks and to use them effectively, given the diversity of audiences and contexts that can arise in real-world decision support applications.
In this chapter, we provide illustrated guidance for using advanced visualisation techniques in a challenging real-world application context. The structure of the chapter is as follows. Section 7.2 discusses the different ways in which visualisation can support many-objective optimisation applications. Section 7.3 provides a brief review of state-of-the-art methods. Section 7.4 illustrates the use of visualisation in a complex real-world example concerning general aviation aircraft design.
We conclude with a discussion of current challenges in visualisation for many-objective problems and propose a set of outstanding needs that should be addressed in future research in Sect. 7.5. Finally, the chapter ends with conclusions in Sect. 7.6.
7.2 Different Ways of Using Visualisations to Facilitate Many-Objective Decision Making
The challenging nature of real-world many-objective optimisation problems requires a constructive decision making approach, where analysts and DMs iteratively explore problem formulations, preferences and potential solutions [78]. Figure 7.1 presents a conceptual diagram of the constructive decision making process, including four main components: Problem Formulation, Guided Optimisation, Trade-offs Assessment and Solution Selection. Each box below a main component represents one way in which visualisation is utilised within that component. While the majority of the literature on visualisation for many-objective optimisation concerns visualising solution sets, Fig. 7.1 highlights that this is only one part of the whole constructive decision making process, and that other issues also need to be visually communicated to analysts and/or DMs.
Fig. 7.1 A conceptual framework for understanding the role of visualisation in constructive decision making
Problem formulation is the process of translating a real-world problem into a mathematical formulation that can be "solved" with the assistance of algorithms [78] (see also Chap. 2). To formulate a problem, analysts and DMs must determine 1) how the system should be modelled, 2) the DM's values and how to measure them through objectives, 3) the full set of actions that a DM can take, and 4) relevant uncertainties [44]. In other words, the objective functions, constraints, and decision variables need to be determined. For real-world problems, the proper problem formulation is not known a priori, and more than one formulation may be explored. As such, rival framings may be developed, in which candidate formulations are examined in parallel to facilitate a better understanding of the problem [84]. Visualisation tools are a key part of defining and exploring rival problem formulations. Tools such as means-ends objectives networks [38], value trees, causal mapping, and rich pictures aid analysts and DMs in understanding values and objectives, specifying candidate action sets and selecting an appropriate system model [2]. Visualisation can also be used to study uncertainty related to the constraints [71].
After a problem formulation has been chosen, analysts and DMs can begin the process of searching for candidate solutions using many-objective optimisation. On a general level, many-objective optimisation can be used either to find an approximation of the whole set of PO solutions or to focus on an interesting subset by including the DM's preferences in the optimisation. The latter approach is often more desirable, as it avoids solutions that might not be practically relevant. When the DM is actively involved during optimisation, as in IMO, visualisation plays a crucial role by allowing DMs to evaluate candidate solutions and to specify preferences on how the solutions could be improved. IMO provides a link between automated optimisation processes and the utilisation of expert knowledge [54]. In addition, the DM's preferences can be visualised to show how they have evolved during the iterative optimisation process.
Population-based search strategies for many-objective optimisation produce candidate solution sets, with which analysts and DMs can examine trade-offs and assess the quality of candidate solutions. The high dimensionality of many-objective problems makes this exploration a challenging task, as both the trade-offs between objectives and the quality of a solution set are difficult to assess from raw numbers. Visualisation tools logically arrange solution sets and allow analysts and DMs to learn about their mathematical properties. Recent literature has presented many tools for examining solution set properties and tracking the convergence of optimisation algorithms [16]. Visualisation tools are also helpful for assessing solution robustness, uncertainty and sensitivity.
Tools such as factor mapping allow analysts and DMs to visualise how uncertainty affects candidate solutions and to target specific mitigation actions [44]. Visualisation is also key to solution selection, where DMs assess trade-offs between conflicting objectives and make a choice that reflects their preferences. The navigation of such trade-offs is aided by coordinated multiple views, brushing and mappings of decision variables to the objective space [9, 88]. Dynamic, linked visualisations of the decision and objective spaces are especially useful for mapping actions to consequences. Uncertainty, robustness, and sensitivity considerations are also important in solution selection. When multiple DMs are involved, visualisation is a key tool for conflict resolution, as it provides negotiating parties with a dynamic tool for exploring the sources of conflict [82]. It is important to note that not all decision making processes necessarily include all the elements shown in Fig. 7.1. For example, the number of DMs, the optimisation tools used, and the nature of the decision to be made may affect which elements are used. In the next section, we review state-of-the-art methodologies demonstrating how visualisation is being used in research and practice.
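Two interaction mechanisms from this section can each be sketched in a few lines. The first is a brush: a range filter on objectives whose result would be highlighted in every linked view. The second is a way of turning a DM's preferences into a choice. For the second mechanism we assume a reference point and an augmented achievement scalarising function, a standard device in reference-point-based interactive methods, but the chapter does not prescribe it here. All function names, data, weights and the value of `rho` are illustrative, and all objectives are assumed to be minimised.

```python
from typing import Dict, List, Sequence, Tuple


def brush(points: List[Sequence[float]], ranges: Dict[int, Tuple[float, float]]) -> List[int]:
    """Indices of the solutions that lie inside every brushed interval.

    `ranges` maps an objective index to a (low, high) interval. In coordinated
    multiple views, the same index list would be highlighted in every linked
    view (e.g. a scatter plot and a parallel coordinate plot).
    """
    return [
        i for i, p in enumerate(points)
        if all(lo <= p[k] <= hi for k, (lo, hi) in ranges.items())
    ]


def asf(f: Sequence[float], ref: Sequence[float], weights: Sequence[float], rho: float = 1e-6) -> float:
    """Augmented achievement scalarising function (smaller is better).

    The max term is the worst weighted deviation from the reference point;
    the small rho-weighted sum breaks ties between equal worst deviations.
    """
    terms = [w * (fi - zi) for fi, zi, w in zip(f, ref, weights)]
    return max(terms) + rho * sum(terms)


def select(points: List[Sequence[float]], candidates: List[int],
           ref: Sequence[float], weights: Sequence[float]) -> int:
    """Index (among `candidates`) of the solution that best matches the reference point."""
    return min(candidates, key=lambda i: asf(points[i], ref, weights))


# Hypothetical PA objective vectors (three minimised objectives).
F = [(1.0, 4.0, 2.0), (2.0, 2.0, 2.0), (4.0, 1.0, 3.0), (3.0, 3.0, 1.0)]
kept = brush(F, {0: (0.0, 3.5)})   # the DM brushes objective 1 to [0, 3.5]
print(kept)                        # [0, 1, 3]
print(select(F, kept, ref=(2.5, 2.5, 1.5), weights=(1.0, 1.0, 1.0)))  # 1
```

In this example, solutions 1 and 3 have the same worst deviation (0.5) from the reference point. The augmentation term prefers solution 1 because its deviations are smaller on balance. Changing the reference point and re-running `select` mimics one iteration of an interactive loop.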
7.3 State-of-the-Art
This section first reviews state-of-the-art visualisation techniques for many-objective problems. Then, we describe how those techniques have been integrated into existing many-objective optimisation frameworks. To do this, we select some papers that are closely related to the four components of the conceptual framework shown in Fig. 7.1 and describe how they use visualisation to support DMs in the tasks of focus.
7.3.1 Overview of Individual Visualisation Techniques for Solution Sets
Visualising many-objective solution sets in both the objective and the decision spaces is a key element of all four components shown in Fig. 7.1. It may serve multiple purposes, including assessment of algorithmic convergence, identification of dominance relations between solutions, examination of Pareto front geometry and the facilitation of a posteriori or interactive decision making. The high dimensionality of solution sets, in both the objective and decision spaces, presents a challenge for effectively visualising many-objective solution sets. Effective visualisation is an area of active research that has seen many advances in recent years. As mentioned, visualisations in many-objective optimisation often focus on the objective space. In many cases, it has a lower dimension than the decision space. Solutions in the objective space can also be visualised in an application-independent manner. On the other hand, visualisations of the decision space are often the source of important application-specific insights and innovations, and the DM in question may be familiar with special application-specific visualisations. Visualisation strategies for high-dimensional solution sets can be roughly divided into two categories: strategies that plot the individual solutions and strategies that plot transformations of solutions to convey set properties [16]. In what follows, we will use a set of 216 PA solutions for a four-objective version of the DTLZ7 test problem [98]. These solutions are obtained by sampling the analytical Pareto front. One common example of a strategy that visualises individual solutions is the scatter plot matrix [79], shown in Fig. 7.2. All solutions are plotted in pairwise comparisons for each pair of objectives. Figure 7.2 reveals that the Pareto front of DTLZ7 is highly non-linear and disconnected.
Objectives 1–3 show eight disjoint clusters, while the fourth objective appears to have a continuous front. Scatter plot matrices such as Fig. 7.2 are helpful for examining pairwise trade-offs between objectives; however, they are limited in their ability to convey information about the geometry of the many-objective Pareto front. The authors of [86] suggest using colour mappings in scatter plot matrices to visualise data patterns involving more objectives. In this case, symbols contain varying amounts of different colours depending on the objective values. However, this can be difficult to interpret.
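The disconnected front described above can be reproduced with a small sketch based on the standard DTLZ7 definition. On the optimal manifold, the distance function g takes its minimum value of 1. In that case the last objective is a closed-form function of the first m − 1 objectives, and a non-dominance filter then removes the parts of that surface that do not belong to the Pareto front. The grid resolution and the brute-force filter are our own choices. The chapter does not say how its 216 solutions were sampled, so this does not recreate the data behind Fig. 7.2.

```python
import numpy as np


def dtlz7_front_candidates(m: int = 4, steps: int = 21) -> np.ndarray:
    """Objective vectors of DTLZ7 on the optimal manifold g = 1, on a regular grid.

    DTLZ7 (minimisation): f_i = x_i for i < m and
    f_m = (1 + g) * (m - sum_i f_i / (1 + g) * (1 + sin(3*pi*f_i))), with g >= 1.
    For g = 1 this simplifies to f_m = 2*m - sum_i f_i * (1 + sin(3*pi*f_i)).
    """
    axis = np.linspace(0.0, 1.0, steps)
    grids = np.meshgrid(*([axis] * (m - 1)), indexing="ij")
    head = np.stack([g.ravel() for g in grids], axis=1)           # f_1 .. f_{m-1}
    last = 2 * m - np.sum(head * (1 + np.sin(3 * np.pi * head)), axis=1)
    return np.column_stack([head, last])


def non_dominated_mask(F: np.ndarray) -> np.ndarray:
    """Boolean mask of the rows of F not dominated by any other row (minimisation)."""
    mask = np.ones(len(F), dtype=bool)
    for i, p in enumerate(F):
        dominated_by = np.all(F <= p, axis=1) & np.any(F < p, axis=1)
        mask[i] = not dominated_by.any()
    return mask


F = dtlz7_front_candidates(m=4, steps=21)
front = F[non_dominated_mask(F)]
print(F.shape, front.shape)  # (9261, 4) (1331, 4)
# The retained f_1 values fall into two separate intervals (about 0-0.25 and
# 0.65-0.85 on this grid), so objectives 1-3 split into 2**3 = 8 clusters.
print(np.unique(np.round(front[:, 0], 2)))
```

The two intervals per coordinate are what produce the eight disjoint clusters visible in the first three objectives of the scatter plot matrix.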
Fig. 7.2 A scatter plot matrix of the four-objective version of the DTLZ7 test problem
Five other strategies for visualising individual solutions for DTLZ7 are demonstrated in Fig. 7.3. Figure 7.3a shows a bubble plot, a 2D scatter plot that utilises the size and colour of points to represent two additional dimensions. For DTLZ7, the first two objectives are plotted on the x- and y-axes, a third as the colour of each point and a fourth as the size of each point. The disjoint nature of the Pareto front is evident in the visualisation of objectives 1 and 2, but the qualities of objectives 3 and 4 are harder to comprehend from the visualisation. Figure 7.3b contains a 3D scatter plot, which plots three objectives on the x-, y- and z-axes and a fourth as the colour. In this case, the overall shape of the Pareto front is easier to observe. 3D scatter plots are often interactive, allowing the user to explore the shape of the Pareto front from multiple vantage points. While scatter plots are effective for problems with two or three objectives, when used alone they are limited in their ability to display high-dimensional data. More objectives can be added to these
Fig. 7.3 The Pareto front of the four objective DTLZ7 test problem visualised in five different ways: a Bubble Plot, b 3D Scatter Plot, c Heatmap, d RadVis, and e Parallel Coordinate Plot
plots through the use of attributes such as the size and orientation of the points, though these may overwhelm the viewer. Scatter plots are most effective when used in conjunction with other visualisation techniques, allowing DMs to examine and interact with multiple subspaces of the objective space simultaneously. Radial visualisation (RadVis), shown in Fig. 7.3d, takes a different approach to representing high-dimensional data. Using an analogy from physics, the objectives are placed uniformly spaced along the circumference of a unit circle, and each solution is represented as a point that is attached to all m objectives by a set of hypothetical springs (one spring for each objective). A solution's value for the ith objective defines the spring constant of the ith spring, and the point's location is the equilibrium point between all springs [32]. For DTLZ7, RadVis effectively illustrates the disjoint clusters in the Pareto front. While RadVis provides information about the shape and distribution of points within the solution set, it provides no information on the quality of the solutions obtained: a solution set containing only high-quality solutions may look similar to one containing only poor solutions if the relative distribution of solutions between objectives is similar in the two sets. To include information on convergence within the visualisation, 3D-RadVis and 3D-RadVis antenna have recently been proposed [33, 34]. The insights gained from RadVis may also be sensitive to the ordering of the objectives around the circle, which should be taken into consideration when using this technique. Figure 7.3c contains a heatmap [62], which plots each objective as a column, each solution as a row and objective values as coloured shading. Heatmaps can illuminate general patterns in the solution set, but are not suited for assessing trade-offs between individual solutions.
J. Hakanen et al.
For DTLZ7, a heatmap of the solutions in the objective space provides very little information about the properties of the Pareto front or the nature of the trade-offs between objectives. To enhance the utility of heatmaps in decision making contexts, the authors of [83] provide guidance on arranging the columns optimally. An effective technique for viewing trade-offs is the parallel coordinate plot [36], shown in Fig. 7.3e. Parallel coordinate plots represent each objective as a vertical axis and each solution as a polyline; a solution's value for an objective is given by the point at which its polyline crosses the corresponding axis. Parallel coordinate plots are especially useful for navigating trade-offs between candidate solutions but are less useful for examining overall solution set properties. When a large number of solutions is visualised, parallel coordinate plots also become cluttered by overlapping polylines. In some cases, brushing can be used to reduce clutter (see, e.g., the ideas presented in [77]). The utility of parallel coordinate plots for decision making can be improved through proper formatting and arrangement of the axes. Further information and guidance on the use of parallel coordinate plots can be found in [1, 28, 37, 45, 67] and [90]. In [91], parallel coordinates are adjusted to reflect information on relations between objective functions and dominance among solutions to illustrate the progress of evolutionary algorithms.
Another strategy for visualising high-dimensional solution sets is to display properties of the set, rather than individual solutions [16]. Figure 7.4 shows five common ways in which solution set properties can be displayed visually. Figure 7.4a shows the DTLZ7 problem through Multi-dimensional Scaling (MDS). MDS maps a high-dimensional solution set to a two-dimensional plane while seeking to preserve the Euclidean distances between points. This is accomplished through gradient descent.
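As a rough illustration of the gradient-descent formulation, the following sketch (assuming NumPy; the helper names are ours and are not taken from any tool cited here) minimises the raw stress, i.e., the sum over pairs i < j of (d_ij − δ_ij)², where δ_ij are the distances between objective vectors and d_ij the distances in the two-dimensional embedding. With `sammon=True`, each term is instead weighted by 1/(c·δ_ij), where c is the sum of all δ_ij, which gives the Sammon stress used by the Sammon mapping discussed below. The fixed step size is a conservative assumption rather than a tuned choice; established implementations (e.g., SMACOF for metric MDS) use more refined update rules.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress_embedding(F, dim=2, sammon=False, lr=0.25, iters=300, seed=0):
    """Embed the rows of F (n solutions x m objectives) in `dim` dimensions.

    Minimises raw MDS stress (sammon=False) or Sammon stress (sammon=True)
    by plain gradient descent. Assumes the rows of F are distinct.
    Returns the embedding and the stress value recorded at each iteration.
    """
    F = np.asarray(F, dtype=float)
    n = len(F)
    D = pairwise_distances(F)                  # target distances delta_ij
    if sammon:
        c = D.sum() / 2.0                      # sum of delta_ij over i < j
        with np.errstate(divide="ignore"):
            W = 1.0 / (c * D)                  # Sammon weights 1 / (c * delta_ij)
    else:
        W = np.ones((n, n))                    # raw stress: equal weights
    np.fill_diagonal(W, 0.0)                   # ignore self-pairs
    step = lr / W.sum(axis=1).max()            # conservative, weight-aware step size
    Y = np.random.default_rng(seed).normal(scale=0.1, size=(n, dim))
    history = []
    for _ in range(iters):
        d = pairwise_distances(Y)
        history.append(0.5 * (W * (d - D) ** 2).sum())  # each pair counted once
        coef = 2.0 * W * (d - D) / np.maximum(d, 1e-12)
        grad = (coef[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
        Y = Y - step * grad
    return Y, history

# Usage on a hypothetical random four-objective solution set.
F = np.random.default_rng(1).random((40, 4))
Y_mds, stress_mds = stress_embedding(F)
Y_sammon, stress_sammon = stress_embedding(F, sammon=True)
print(f"MDS stress:    {stress_mds[0]:.3f} -> {stress_mds[-1]:.3f}")
print(f"Sammon stress: {stress_sammon[0]:.3f} -> {stress_sammon[-1]:.3f}")
```

Because the two variants differ only in their weights, the sketch also shows why Sammon mapping favours small distances: pairs that are close in objective space receive larger weights.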
Figure 7.4a illustrates that this method is quite effective for highlighting the disjoint nature of DTLZ7. Isomapping, shown in Fig. 7.4b, extends MDS by first connecting each solution to its k nearest neighbours, then embedding the points so as to preserve the geodesic (shortest-path) distances through the resulting neighbourhood graph [74]. Isomapping can better account for nonlinearities within high-dimensional solution sets that challenge traditional MDS. For DTLZ7, Isomap displays the eight distinct clusters within the Pareto front. It should be noted that for DTLZ7, these clusters are not apparent with low values of the neighbourhood size k, highlighting Isomapping's sensitivity to parameterisation. Another form of low-dimensional mapping is Sammon mapping [66], shown in Fig. 7.4c. Like MDS, Sammon mapping seeks to preserve high-dimensional distances between points. It does so by minimising a different error function, known as Sammon stress, which weights each pairwise distance error by the inverse of the original distance, again through gradient descent. For DTLZ7, Sammon mapping also effectively illustrates the disjoint clusters within the Pareto front. T-distributed Stochastic Neighbor Embedding (t-SNE), shown in Fig. 7.4d, places a Gaussian distribution over each data point and computes the density of all other points under the distribution centred on that point. It then uses a heavy-tailed Student-t distribution to compute point-to-point similarity probabilities in a two-dimensional space and positions the points so that the two sets of probabilities match as closely as possible [99]. This enables t-SNE to preserve local neighbourhoods while still revealing larger-scale structure such as clusters. Finally, Self Organising Maps (SOMs) [41], shown in Fig. 7.4e, also effectively show the disjoint nature of the DTLZ7 Pareto front. SOMs train a neural network in which a predetermined grid of neurons competes to represent the data. The shading on the neurons,
7 Visualisation for Decision Support in Many-Objective Optimisation …
Fig. 7.4 The Pareto front of the four-objective DTLZ7 test problem visualised using five different lower-dimensional mappings: a Multi-dimensional Scaling, b IsoMapping, c Sammon Mapping, d t-SNE and e Self Organising Maps
represented as hexagons, denotes the distance between neighbouring neurons, with darker shading corresponding to larger distances. In Fig. 7.4e, areas of dark shading therefore represent cluster boundaries. Other techniques for visualising individual solutions include prosection plots, polar coordinate visualisation, Neuroscale, empirical attainment functions, hyper-space diagonal counting, spider-web or radar charts, star coordinate systems, level diagrams, petal diagrams and icons such as Chernoff faces. For further information, see [16, 53] and the references therein. In [18], the authors empirically compared the solution set visualisation techniques described above and concluded that Pareto front visualisation follows the no-free-lunch theorem, as no single visualisation technique was able to adequately capture all salient properties of a solution set. This finding highlights the challenge of effectively utilising visualisation in the constructive decision making
process described in Fig. 7.1. In the next section, we connect our literature review to the core decision support steps outlined in Fig. 7.1, placing the available tools in the context of the core steps of many-objective optimisation decision support workflows.
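The RadVis construction described earlier in this section can be sketched in a few lines. In this minimal version (assuming NumPy), objective j is anchored at angle 2πj/m on the unit circle and, after min-max normalisation of each objective (an assumption; tools differ here), a solution's normalised values act as spring constants. The equilibrium of the springs is then the spring-constant-weighted mean of the anchor positions.

```python
import numpy as np

def radvis(F):
    """Place each solution (row of F) at the equilibrium of m springs.

    Objective j is an anchor on the unit circle at angle 2*pi*j/m; after
    min-max normalisation, a solution's value for objective j is used as the
    spring constant of the spring pulling it towards anchor j.
    """
    F = np.asarray(F, dtype=float)
    n, m = F.shape
    theta = 2.0 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(theta), np.sin(theta)])  # shape (m, 2)
    lo, hi = F.min(axis=0), F.max(axis=0)
    S = (F - lo) / np.where(hi > lo, hi - lo, 1.0)             # constants in [0, 1]
    total = S.sum(axis=1, keepdims=True)
    # Equilibrium = weighted mean of the anchors; all-zero rows go to the centre.
    P = np.divide(S @ anchors, total, out=np.zeros((n, 2)), where=total > 0)
    return P, anchors

F = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])
P, anchors = radvis(F)
print(np.round(P, 3))
```

The third and fourth rows differ only by a common scale factor and land on the same point, even though, for minimisation, the fourth dominates the third. This is the loss of quality information noted above.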
7.3.2 Integrating Visualisation into Many-Objective Decision Support
Visualisation in many-objective decision support has seen many important contributions in recent years. Recent innovations have improved the user's ability to interact dynamically (rotate, brush, change colouring/order, etc.) with many-objective solutions through coordinated multiple views during and after optimisation. These approaches have been influenced by ideas arising from visualisation research, especially visual analytics (VA). A pioneering paper [40] explains that “visual analytics combines automated analysis techniques with interactive visualisations for an effective understanding, reasoning and decision making on the basis of very large and complex data sets”. VA methods are designed to support decision making and are therefore readily transferable to many-objective optimisation contexts [23, 46]. The overarching theme in VA is the use of coordinated multiple views to gain insight into the data [40, 64]. Some VA approaches consider the analysis of an existing data set (see, e.g., [3]), while others also support the generation of new data (see, e.g., [50, 57]). The use of coordinated multiple views is one way in which analysts and DMs can cope with the no-free-lunch finding described by [18]. Another important concept is linking and brushing, where the selected data points are automatically highlighted in different views at the same time to help users gain insight [39, 87]. Visualisations have typically been included in many-objective optimisation software by using the visualisation capabilities of the language in which the optimisation methods have been implemented (e.g., matplotlib for Python and visualisation toolboxes for Matlab). In addition, there are JavaScript-based visualisation tools, such as D3 (https://d3js.org/) [6] and Plotly, that can be connected to different programming languages.
Dash (https://plotly.com/dash/), which is built on Plotly, is used to build web apps and can be used with the Python, R and Julia languages. Further, React (https://reactjs.org/) is another JavaScript library for building user interfaces, and Victory (formidable.com/open-source/victory/) provides modular visualisation components for React. In the remainder of this section, we review the state of the art in visualisation in the context of the constructive decision support process outlined in Fig. 7.1. Many papers have been published on visualisation and many-objective optimisation; our aim is not to provide a comprehensive review but rather to highlight studies that are especially relevant to the decision making elements shown in Fig. 7.1.
7.3.2.1 Problem Formulation
Visualisation has been shown to be an important aid for supporting problem framing or the identification of problem formulations of interest [9, 46, 88]. Many-objective optimisation and VA were first combined in [88], where a many-objective visual analytics framework was proposed to support the design of complex engineered systems. The paper emphasised how VA can be used to analyse different formulations of the same problem and how these formulations behave with respect to each other. The case study of the paper was the general aviation aircraft product family design problem, which we also use in the next section to illustrate visualisation workflow examples. To analyse formulations with 1, 2 and 10 objectives, coordinated multiple views were used to highlight interrelationships between the solutions of these problems [88]. The authors used 2D/3D scatter plots and parallel coordinate plots as coordinated multiple views. The VA techniques were aimed at learning through problem reformulation, which emphasises the problem formulation component. The VA used in this study made three important contributions: (1) it showed that the standard single- and bi-objective formulations neglected important alternatives, (2) goal programming aggregation schemes were shown to be biased towards only a few performance objectives, and (3) the broader ten-objective formulation yielded significant performance gains as well as diverse designs that could be of interest to different markets.
7.3.2.2 Guided Optimisation
Interactive optimisation methods allow DMs to guide optimisation algorithms towards preferred solutions. The active involvement of DMs in interactive methods necessitates visual interfaces for evaluating computed solutions and for guiding the generation of new solutions with their preferences. Approaches in which DMs guide optimisation from a visualisation include the DM specifying (1) a primary objective to be optimised and bounds for the others from a parallel coordinate plot [9], (2) a reference point (i.e., a desirable value for each objective) from a parallel coordinate plot [23], interactive decision maps [48] or real-time visualisations of the generated solutions [24, 65, 73], (3) a set of desirable solutions from a heatmap [30], (4) bounds for objectives/decision variables from a parallel coordinate plot [29, 31], (5) bounds for objectives from application-specific 2D/3D views [46], (6) bounds for decision variables or a preferred solution from a parallel coordinate plot or a scatter plot [70], (7) desirable changes in objective values in a bar chart [60], and (8) ranges for objectives from a spider web plot (i.e., a parallel coordinate plot whose ends are connected) [77]. These approaches also relate to the solution selection component. The connection between visual analytics and interactive optimisation was studied in [46] to identify their commonalities. It was found that both try to leverage the distinct strengths of humans and computers while learning from complex data to support decision making. The usefulness of interactive optimisation (through the involvement of a human DM) was acknowledged throughout the whole decision making process, from problem formulation to solution selection. Further, the authors noted two drawbacks of previous studies in interactive optimisation: they did not concentrate on visualisation and interaction techniques, and they almost never evaluated existing approaches in terms of user satisfaction. A problem-solving loop covering the whole decision making process, from problem structuring to final decision, was proposed in [46]. It aimed to provide a theoretical framework similar to the sense-making loop that is commonly used in VA to formalise the VA process. As a case study, the authors developed a prototype tool for brachytherapy in collaboration with radiation oncology professionals to address the drawbacks mentioned above. To better understand what kind of tasks DMs face when using interactive methods, a task abstraction for interactive many-objective optimisation was carried out in [23]. Task abstraction is a common starting point for visual interface design in VA, as it clarifies the requirements of the analysis task. Seven high-level tasks were identified, analysed in detail and decomposed into low-level tasks from the visualisation literature [23]. A better understanding of these needs makes the design and implementation of visual interfaces for interactive optimisation easier. In addition to the task abstraction, [23] gives an example of implementing visualisations to support these tasks, together with recommendations for implementing interactive methods.
7.3.2.3 Trade-Off Assessment
Parallel coordinate plots have often been used to analyse solution sets with more than three objectives. Importantly, parallel coordinate plots need to be interactive to yield the most insight in trade-off assessment. Although they scale well with respect to the number of objectives, interpreting properties of solution sets from them is not necessarily easy. For example, the order of the axes strongly influences how one can assess trade-offs from a parallel coordinate plot: for a given order of the objectives, a parallel coordinate plot shows only the m − 1 pairwise trade-offs between adjacent axes, whereas the total number of objective pairs is m(m − 1)/2. Guidance for observing properties related to quality measures and the distribution of the solution set, and for ordering the axes, was given in [45]. An alternative for showing all pairwise trade-offs at once is a scatter plot matrix but, as mentioned earlier, it does not scale well with respect to m. While a static parallel coordinate plot shows only limited information, more pairwise trade-offs can be uncovered by allowing DMs to change the order of the axes. As m increases, however, the number of possible orderings (m!) grows quickly and manual reordering fails to provide much additional information. In light of this limitation, systematic methods to define the best ordering as a starting point are needed (see, e.g., [1, 91]). The capabilities of parallel coordinate plots can be increased by linking them to other views such as scatter plots (see, e.g., [9, 23, 88]).
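Since a single axis order exposes only m − 1 of the m(m − 1)/2 objective pairs, one simple systematic remedy is sketched below as an illustration, not as the ordering methods of [1, 91]: for an even number of objectives, Walecki's classical zigzag construction yields m/2 axis orders that together place every pair of objectives on neighbouring axes exactly once. The helper names are hypothetical, and odd m is deliberately not handled.

```python
from itertools import combinations

def adjacent_pairs(order):
    """Objective pairs that sit on neighbouring axes for a given axis order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def zigzag_orders(m):
    """m/2 axis orders that together put every pair of objectives side by side once.

    Walecki's zigzag construction: start at s, then s+1, s-1, s+2, s-2, ... (mod m).
    This sketch handles an even number of objectives only.
    """
    if m % 2:
        raise ValueError("this sketch assumes an even number of objectives m")
    orders = []
    for start in range(m // 2):
        order = [start]
        for k in range(1, m // 2 + 1):
            order.append((start + k) % m)
            if len(order) < m:
                order.append((start - k) % m)
        orders.append(order)
    return orders

m = 10  # e.g., the ten-objective GAA problem of Sect. 7.4
orders = zigzag_orders(m)
covered = set().union(*(adjacent_pairs(o) for o in orders))
print(len(orders), "orders cover", len(covered), "of", m * (m - 1) // 2, "pairs")
for o in orders:
    print(o)
```

For the ten-objective GAA problem, five parallel coordinate plots would thus suffice to show all 45 pairwise trade-offs on adjacent axes, although the construction says nothing about which of these orders is most informative.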
Several visualisation techniques have been developed especially for many-objective optimisation by extending existing methods such as parallel coordinates [1, 91], the scatter plot matrix [35], RadVis [25, 33, 34, 72, 83] and heatmaps [83]. An example of these approaches is PaletteViz, which is tailored for analysing many-objective solution sets [72]. The intent of the method is to allow DMs to quickly identify interesting solutions within a large PA set. The approach has three steps: (1) decompose the solution set into layers based on their location within the set (from boundary to central), (2) visualise each layer using a 2D RadVis plot, stacking the plots on top of each other, and (3) use the colour and size of the markers to communicate geometric, performance and preferential properties of individual solutions. In addition to presenting the layers as stacked RadVis plots, a colouring scheme is used to represent the Euclidean distance of each solution to the centroid of the solution set. Another colouring scheme shows the closeness of solutions to constraint boundaries in the case of constrained problems. Finally, solutions with large trade-offs are highlighted by larger, red markers (a colour distinct from the colouring schemes above). The two colouring schemes are alternatives and cannot be used at the same time. The authors mention that other metrics can be used in place of these properties to differentiate solutions. However, the more features are added to a visualisation, the more time DMs and analysts need to spend learning to interpret it. PaletteViz was compared against parallel coordinates, RadVis and heatmap visualisations in [72]. One of the problems tested was the general aviation aircraft problem mentioned earlier.
From the PaletteViz visualisation, the authors made observations about the solutions' positions relative to constraint boundaries and about solutions with large trade-offs, but did not discuss the practical implications of the identified solutions in detail. A limitation of PaletteViz is that it only considers the objective space and does not directly show how the solutions are located in the decision space, other than through their closeness to constraint boundaries. Further, no implementation of PaletteViz is available yet.
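Since no implementation of PaletteViz is available, the following sketch (assuming NumPy) only illustrates two ingredients under stated assumptions: the distance of each min-max normalised solution to the centroid of the set, which one PaletteViz colouring scheme encodes, and a hypothetical quantile-based split of the set into layers from boundary to centre. The layering is a stand-in of ours and is not the decomposition used in [72].

```python
import numpy as np

def centroid_distance(F):
    """Euclidean distance of each normalised objective vector to the set's centroid."""
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    Z = (F - lo) / np.where(hi > lo, hi - lo, 1.0)   # min-max normalisation (assumed)
    return np.linalg.norm(Z - Z.mean(axis=0), axis=1)

def boundary_layers(dist, k=3):
    """Hypothetical layering by quantiles of centroid distance; 0 = outermost layer."""
    edges = np.quantile(dist, np.linspace(0.0, 1.0, k + 1)[1:-1])
    return (k - 1) - np.digitize(dist, edges)

F = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.25, 0.25]])
dist = centroid_distance(F)        # could drive a colour map in a RadVis plot
layers = boundary_layers(dist, k=2)
print(np.round(dist, 3), layers)
```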
7.3.2.4 Solution Selection
When DMs are selecting solutions for many-objective optimisation problems, coordinated multiple views help them to identify solutions that match their preferences. The ability to interact with the views, and to see the effects of those interactions shown in different ways, helps in analysing the properties of the solutions. When the solutions are visualised in both the objective and the decision spaces, the consequences of different decisions are easier to understand [88]. As mentioned before, most visualisation approaches have focused only on the objective space, which means that the connection between objective and decision variable values cannot easily be seen. In addition to linking, brushing is an important tool, especially when the solution sets are large. Tools such as Parasol, an open source, interactive D3 JavaScript library for creating parallel coordinate plots [63], allow users
to dynamically brush linked parallel coordinate plots of the objective and decision spaces. The interactive nature of the JavaScript D3 library allows users to flexibly explore trade-offs across objectives and examine their implications in the decision space. In the solution selection phase, information about uncertainty, robustness [21] and sensitivity, as well as the closeness of solutions to constraint boundaries [71], is important in real-world decision making. Such additional information can be added to the views, e.g., by using colouring schemes, different types and sizes of markers, or separate views. In [14], the authors include scenario-defining uncertainties within parallel coordinate plots of the objective space to examine how uncertainties affect solution performance. In [19], the authors use visualisation to identify the effects of uncertainty in the decision space on the performance of PA solutions. Multivariate measures of robustness may also be visualised using the same tools as many-objective trade-offs, as demonstrated in [26, 27]. Visualisation examples of communicating robustness information in interactive methods are given in [92, 93]. Problems with large numbers of objectives often have large PA sets, as the potential number of non-dominated solutions increases rapidly with the number of objectives. Strategies for navigating this challenge draw from both decision science and VA. In [8], the authors recommend presenting DMs with a maximally diverse set of alternatives to select from. These solutions can be found through interactive brushing of the PA set, a technique from VA that can help DMs combat the information overload resulting from the size of the PA set. An alternative way of thinking is to avoid trying to represent the whole Pareto front well, since the higher the dimension, the more solutions this would necessitate.
Instead, one could start with a rough approximation and make it denser only in those parts of the Pareto front that the DM is interested in. An example of using linked views to support solution selection was presented in [9], where an urban planning problem was used as a case study. The idea is that DMs can control the whole decision process from a visual interface that uses extended parallel coordinates as the main view. This view is linked to other views such as 3D scatter plots of the objective space and application-specific plots of the decision space. Numerical values for the most promising candidate solutions are also shown to provide more detailed information. As noted above, DMs can guide optimisation by interacting with the parallel coordinate plot, and different actions of the DMs (e.g., brushing and selection) are shown in different ways in the plot. DMs can also add axes for quantities other than objective values to the main parallel coordinate plot to augment the objective values (e.g., clustering results or aggregated scores of the solutions). Properties of solutions can be highlighted by different colours and thicknesses of the polylines. As for the benefits of their approach in the case study, the authors state that “by laying down side-by-side the main criteria for a subselection of solutions in the synthesis phase, the user is equipped to take an informed decision, and justify and communicate it to other stakeholders” [9].
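In code, a brush is simply a set of range predicates, and linking means sharing the resulting boolean mask across all views, whether they show objectives or decision variables. The sketch below (assuming NumPy) uses the thresholds that DMs set in the GAA example of Sect. 7.4 (Fig. 7.7); the derived column SEATW_MIN and the four designs are hypothetical.

```python
import numpy as np

INF = np.inf

def brush(data, ranges):
    """Boolean mask of solutions lying strictly inside every (low, high) range.

    `data` maps column names (objectives or decision variables) to equal-length
    arrays, so one mask can highlight the same solutions in every linked view.
    """
    n = len(next(iter(data.values())))
    mask = np.ones(n, dtype=bool)
    for column, (low, high) in ranges.items():
        values = np.asarray(data[column], dtype=float)
        mask &= (values > low) & (values < high)
    return mask

# Brushes modelled on the GAA criteria of Fig. 7.7. SEATW_MIN (the narrowest seat
# across the three aircraft families) is a hypothetical derived column.
BRUSHES = {
    "economy": {"PURCH": (-INF, 42500), "DOC": (-INF, 60), "PFPF": (-INF, 0.8)},
    "comfort": {"ROUGH": (-INF, 1.8), "NOISE": (-INF, 73.3), "SEATW_MIN": (18.5, INF)},
    "performance": {"RANGE": (2160, INF), "LDMAX": (15, INF), "VCMAX": (200, INF)},
}

# Four hypothetical designs (not taken from the GAA PA set).
designs = {
    "PURCH": [42400, 43500, 42450, 42300], "DOC": [59.5, 70.0, 61.0, 58.0],
    "PFPF": [0.70, 0.30, 1.20, 0.90], "ROUGH": [1.90, 1.75, 1.95, 1.85],
    "NOISE": [73.6, 73.2, 73.3, 73.5], "SEATW_MIN": [17.0, 19.0, 18.0, 18.6],
    "RANGE": [2120, 2000, 2170, 2165], "LDMAX": [15.0, 14.4, 15.7, 15.8],
    "VCMAX": [199, 193, 201, 201],
}
masks = {name: brush(designs, ranges) for name, ranges in BRUSHES.items()}
for name, mask in masks.items():
    print(name, np.flatnonzero(mask).tolist())
```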
7.3.2.5 Summary
Table 7.1 presents a summary of methodologies and software that focus on visualisation for many-objective optimisation applications. We divide the methodologies into three categories: commercial software, shown in the top section; open source software, shown in the middle; and theoretical frameworks without implementation, shown at the bottom. For each paper highlighted, we examine seven relevant properties: first, the types of views the methodology/software allows DMs to use when examining the PA set; second, the use of linking between different views; third, whether the user can interact with the views in real time; fourth, the spaces visualised by the methodology/software (objective and/or decision); fifth, the name or location of the methodology's implementation (if any); sixth, any external tools used by the visual implementation; and, finally, the nature of the software license. Table 7.1 uses the following acronyms for different views: P=parallel coordinate plot, S2=2D scatter, S3=3D scatter, SM=scatter plot matrix, HM=heatmap, HG=histogram, B=boxplot, R=radviz, R3=3D radviz, M=multi-dimensional scaling, CM=correlation matrix plot, S=spider web plot, C=chord plot, L=level diagrams, GH=gradient field heatmap, CL=cost landscape, PL=Plot of Landscapes with Optimal Trade-offs. In the Spaces column, o denotes the objective space and d the decision space.
7.4 Illustrative Example of Visualisation in Real-World Decision Making
Much of the literature concerning visualisation for many-objective optimisation applications focuses on optimisation (Fig. 7.1, box II) or analysing trade-offs (Fig. 7.1, box III). In this section, we demonstrate how visualisation may be used to aid decision making for a real-world decision problem (Fig. 7.1, box IV). We intend this demonstration to highlight the potential of interactive visualisation to overcome challenges stemming from information overload [40] due to the large number of candidate alternatives often present when solving many-objective optimisation problems with evolutionary methods. Rather than seeking a compromise solution based on initial preferences, we seek a diverse set of alternatives that serves as a starting point for DMs to explore their own preferences. This should not be confused with interactive methods, where the DM directs the solution process and the generation of new solutions with preference information. Our demonstration focuses on a real-world decision problem related to General Aviation Aircraft (GAA) product family design [69]. The GAA problem has several variants; here we concentrate on the version with ten objective functions, which is clearly a many-objective problem. General Aviation refers to any aircraft that is not used for commercial or military purposes. GAAs are small, usually propeller-driven aircraft purchased as flexible transport options not constrained by airports or schedules. GAAs represent many fewer
Table 7.1 Summary of visualisation methodology and software for many-objective optimisation

Methodology/software | Views | Linking | Interaction | Spaces | Implementation | Visualisation tools | License
--- | --- | --- | --- | --- | --- | --- | ---
modeFrontier | many | yes | yes | o, d | https://esteco.com/modefrontier | N/A | Commercial
Optimus | many | yes | yes | o, d | https://noesissolutions.com/optimus | id8decide | Commercial
HEEDS | many | yes | yes | o, d | https://redcedartech.com | N/A | Commercial
Raseman et al. [63] | P | yes | yes | o, d | https://parasoljs.github.io | D3 | open source
Blasco et al. [100] | L | yes | yes | o, d | [5] | matlab | open source
Schäpermeier et al. [101] | GH, CL, PL | yes | yes | o, d | https://schaepermeier.shinyapps.io/moPLOT/ | R | open source
Hadjimichael et al. [20] | P, S2/3, SM | yes | yes | o, d | https://github.com/Project-Platypus | J3 | open source
Hadka et al. [21] | P, S2/3, SM, HG | yes | yes | o, d | https://github.com/OpenMORDM | R | open source
Ojalehto & Miettinen [59] | P, S2/3, S | no | yes | o, d | https://desdeo.it.jyu.fi | Plotly | open source
Benítez-Hidalgo et al. [3] | P, S2/3, C | no | yes | o | https://github.com/jMetal/jMetalPy | matplotlib | open source
Blank & Deb [4] | S2/3, P, HM, R | no | no | o | https://pymoo.org | matplotlib | free for research
Tian et al. [76] | P, S2/3 | no | no | o, d | https://github.com/BIMK/PlatEMO | matlab | free for research
Miettinen & Mäkelä [55] | P, HG | no | no | o | https://nimbus.it.jyu.fi | N/A | free for research
Miettinen [52] | P, HG | no | no | o | https://ind-nimbus.it.jyu.fi | N/A | demo available
Cajot et al. [9] | P, S3 | yes | yes | o, d | SAGESSE (https://urb.io) | N/A | N/A
Kollat & Reed [42] | S2/3 | yes | yes | o, d | VIDEO | Visualisation ToolKit | N/A
Stump et al. [70] | S2, SM, P, H | yes | yes | o, d | ATSV | Visualisation ToolKit | N/A
Pajer et al. [61] | P, H | yes | yes | o | WeightLifter | Visplore | N/A
Mühlbacher et al. [57] | S2 | yes | yes | o, d | TreePOD | Visplore | N/A
Trinkaus & Hanne [77] | S | no | yes | o | knowCube Navigator | N/A | N/A
Woodruff et al. [88] | P, S2/3 | yes | N/A | o, d | no | – | –
Hakanen et al. [23] | P, S2, B | yes | N/A | o, d | no | – | –
Walker et al. [83] | HM, R | no | N/A | o, d | no | – | –
Ibrahim et al. [35] | CM | no | N/A | o, d | no | – | –
Talukder & Deb [72] | R | no | N/A | o | no | – | –
Li et al. [45] | P | no | N/A | o | no | – | –
Zhen et al. [91] | P | no | N/A | o | no | – | –
Ibrahim et al. [33] | R3 | no | N/A | o | no | – | –
Ibrahim et al. [34] | R3 | no | N/A | o | no | – | –
He & Yen [25] | R | no | N/A | o | no | – | –
Table 7.2 GAA decision variables

Name | Unit | Min value | Max value
--- | --- | --- | ---
Activity factor (AF) | ratio | 85 | 110
Aspect ratio (AR) | ratio | 7 | 11
Nominal cruising speed (CSPD) | Mach | 0.24 | 0.48
Propeller diameter (DPROP) | feet | 5.5 | 5.968
Tail elongation (ELODT) | ratio | 3 | 3.75
Seat width (SEATW) | inches | 14 | 20
Wing sweep (SWEEP) | degrees | 0 | 6
Wing taper (TAPER) | ratio | 0.46 | 1
Wing loading (WINGLD) | lb/ft^2 | 19 | 25
Table 7.3 GAA objectives (note: dollar values are in $1970)

Name | Description | Unit | Min/Max
--- | --- | --- | ---
Takeoff noise (NOISE) | Measurement of noise at takeoff | dB | Minimise
Empty weight (WEMP) | Weight of the aircraft without passengers | lb | Minimise
Direct operating cost (DOC) | Cost of flying the aircraft | $/hr | Minimise
Ride roughness (ROUGH) | Measure of flight roughness | ratio | Minimise
Fuel weight (WFUEL) | Weight of fuel | lb | Minimise
Purchase price (PURCH) | Price of purchase | $ | Minimise
Range (RANGE) | Flight range | nmi | Maximise
Lift-drag ratio (LDMAX) | A measure of flight performance | ratio | Maximise
Cruising speed (VCMAX) | Aircraft cruising speed | knots | Maximise
Product family penalty function (PFPF) | A measure of commonality across aircraft families (2, 4 and 6 seats) | Unitless | Minimise
Fig. 7.5 A parallel coordinate plot of the PA set of the ten-objective GAA problem. The solution set contains 3,462 possible designs, creating a challenge for DMs
passengers, but many more flights and ten times as many airfields, compared with commercial aircraft [88]. A pioneering paper [10] proposed a new approach to GAA design that focused on designing a family of aircraft with two, four and six seats. This strategy allows manufacturers to address different segments of the GAA market while taking advantage of a common platform to simplify the manufacturing process. To increase production efficiency, DMs seek designs with high commonality across the parameters of the two-, four- and six-seat aircraft. To facilitate commonality, the authors of [69] proposed using a popular aircraft, the Beechcraft Bonanza B36TC, as a baseline, which simplified the design process by fixing many design parameters. Once these parameters are fixed, designers are left with nine parameters to customise for each aircraft size. Subsequent work building on [68, 69] framed the GAA design problem as a many-objective optimisation problem, using the design parameters as decision variables and specifying ten performance objectives. Decision variables are listed in Table 7.2 and objectives are defined in Table 7.3. For this demonstration, the Borg MOEA [22] was used to discover solutions to the GAA problem. Five random-seed trials of the Borg MOEA were run, and the overall reference set across all runs (the non-dominated union of their results) was taken as the PA set. Figures 7.5 and 7.6 show the PA set using a parallel coordinate plot and a scatter plot matrix, respectively. The PA set contains 3,462 non-dominated solutions. This large number of solutions poses a challenge for DMs, who must select a single design (or a small subset) to move forward with. While two-dimensional patterns may be discerned from Fig. 7.6 and the parallel coordinate representation in Fig. 7.5 may be optimised (see guidance in [1, 37] and [90]), the sheer number of candidate solutions makes design selection a challenging task.
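The step of merging several runs into one reference set can be sketched as follows (assuming NumPy). The Borg MOEA itself is not reproduced here, and the exact procedure used for this demonstration may differ. Objectives to be maximised, such as RANGE, LDMAX and VCMAX in Table 7.3, are negated so that a single Pareto-dominance test applies; the pairwise check is quadratic in the number of solutions, which remains manageable for a few thousand designs.

```python
import numpy as np

def nondominated(F):
    """Indices of the rows of F not dominated by any other row (all columns minimised)."""
    F = np.asarray(F, dtype=float)
    keep = []
    for i, f in enumerate(F):
        no_worse = np.all(F <= f, axis=1)   # rows at least as good in every objective
        better = np.any(F < f, axis=1)      # rows strictly better in some objective
        if not np.any(no_worse & better):
            keep.append(i)
    return np.array(keep, dtype=int)

def reference_set(runs, maximise):
    """Non-dominated union of the approximation sets returned by several runs.

    `maximise` flags the objectives to be maximised; they are negated internally
    so that every objective is treated as minimised.
    """
    F = np.unique(np.vstack(runs), axis=0)  # merge runs and drop duplicate vectors
    sign = np.where(np.asarray(maximise), -1.0, 1.0)
    return F[nondominated(F * sign)]

run_a = [[1.0, 5.0], [2.0, 3.0]]
run_b = [[1.5, 6.0]]
print(reference_set([run_a, run_b], maximise=[False, True]))
```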
This challenge is an example of information overload [40], which can paralyse DMs or motivate them to seek simpler formulations. One way DMs can break this paralysis is to brush the solution set down to a small set of maximally diverse alternatives [8, 77]. This set of solutions may then be used as a starting point for further exploration. Linked views are critical tools for this task. Suppose DMs are interested in three potential subsets of aircraft designs: (1) an economical design, featuring low direct operating cost (DOC), low purchase price (PURCH) and high commonality between design parameters of the three families
Fig. 7.6 A scatter plot matrix of the PA set of the ten-objective GAA problem illustrating the pairwise trade-offs. The solution set contains 3,462 possible designs
Fig. 7.7 Brushing criteria set by DMs. Purple brushes represent comfort criteria (roughness < 1.8 and takeoff noise < 73.3 dB), green brushes represent economic criteria (purchase price < $42,500, direct operating cost < 60 $/hr and PFPF < 0.8) and orange brushes represent performance criteria (range > 2160 nmi, lift/drag ratio > 15 and max cruising speed > 200 knots). Note that the solutions brushed for comfort were also brushed in the decision space (SEATW > 18.5”), which is not shown here
Fig. 7.8 A set of linked views showing the three families of solutions across different subspaces. Solutions that represent economical designs are shown in green, comfortable designs in purple and high-performance designs in orange. Subplot a contains a parallel coordinate plot of the ten-objective GAA PA set; subplot b shows solution performance in the economy subspace, subplot c in the comfort subspace and subplot d in the performance subspace
of aircraft (represented by a low PFPF value), (2) a comfortable design featuring low takeoff noise (NOISE), low roughness (ROUGH) and seat widths (a decision variable, SEATW) always greater than 18.5”, and (3) a high-performance design featuring a high lift/drag ratio (LDMAX), a long range (RANGE) and a high maximum cruising speed (VCMAX). To discover these subsets of solutions, DMs put forth the three sets of specifications shown in Fig. 7.7. In this version of the parallel coordinate plot, the axes have been reordered to improve the visibility of trade-offs, a task that remains difficult while viewing the entire PA set. Note that the comfort criteria include a specification in the decision space (seat width > 18.5” for all three aircraft families), which is not shown in Fig. 7.7. While seat width is not a modelled objective, seats that are too narrow will cause discomfort for larger pilots and passengers. Figure 7.8 shows the three subsets of objective vectors generated through brushing, both on a parallel coordinate plot and in scatter plots within the three subspaces of the objective space that correspond to the brushing criteria above (here termed “Economy Subspace”, “Comfort Subspace” and “Performance Subspace”). The solutions shown are the only ones that meet the criteria for each subspace. Numerical values of the highlighted solutions are also shown in Table 7.4. Complementing visual representations with numerical data is important so that DMs can access details of individual solutions [40]. Examination of the brushed solutions reveals several interesting findings about the solution set. First, designs that favor each set of DM
J. Hakanen et al.
Table 7.4 Solutions found through brushing with three sets of criteria. Solutions E1 and E2 were brushed to favor economic objectives PURCH and DOC as well as commonality objective PFPF. Solutions P1 and P2 were brushed to favor performance objectives LDMAX, VCMAX and RANGE. Solutions C1, C2 and C3 were brushed to favor comfort objectives ROUGH and NOISE as well as decision variable SEATW

Sol.  ROUGH    NOISE  PFPF  RANGE  LDMAX    VCMAX    WEMP  PURCH  DOC     WFUEL
      MIN      MIN    MIN   MAX    MAX      MAX      MIN   MIN    MIN     MIN
      (ratio)  (dB)   (–)   (nmi)  (ratio)  (knots)  (lb)  ($)    ($/hr)  (lb)
E1    1.89     73.65  0.76  2123   15.02    199      1890  42444  60      449
E2    1.86     73.58  0.72  2108   14.92    199      1890  42435  60      452
P1    1.95     73.28  1.40  2170   15.67    200      1887  42445  61      465
P2    1.98     73.77  0.85  2165   15.74    200      1889  42551  58      459
C1    1.79     73.29  0.25  2002   14.48    193      1961  43539  72      441
C2    1.79     73.29  0.27  2002   14.33    193      1960  43511  65      435
C3    1.79     73.29  0.35  2005   14.41    195      1938  43222  63      434
criteria exist within the PA set. Should DMs seek to produce three separate lines of aircraft, each tailored to a different market segment, these designs represent a promising starting point. Second, although the comfort criteria did not include PFPF, solutions brushed for comfort had very low PFPF values, indicating high commonality across the three families of aircraft. This may be due to the inclusion of seat width as a comfort criterion, suggesting that the seat width decision variable (SEATW) may have a large influence on the commonality objective PFPF. Third, although the family of solutions selected for comfort has a strong trade-off with the other two families, no such trade-off exists between the performance-favoring and economy-favoring solutions. In fact, the high performance solutions require only very small sacrifices in the economic criteria, as can be seen in Fig. 7.8b. This illustrates a strength of many-objective decision making: if DMs had optimised a priori for economic criteria, they might have settled for lower performing solutions in the name of economy, when high performing solutions were available at only a slightly higher cost. DMs in the GAA problem are also interested in selecting designs that share similar parts across the three families of aircraft (2 seat, 4 seat and 6 seat) to increase production efficiency. Figure 7.9 contains heatmaps of the decision variable values of candidate designs. Each subplot represents a decision variable, rows represent the aircraft families and columns represent individual solutions. The colour of each cell reflects the value of the decision variable for that solution and family; similarly coloured columns indicate similar decision variable values across families. Solutions brushed with economic criteria tend to have similar decision variable values across families of aircraft, which is consistent with their low values of the PFPF objective.
High performance aircraft have different decision variable values across the three design families, particularly in variables DPROP, CSPD, AR, WINGLD and SWEEP, a potential drawback that is not apparent in Fig. 7.8. The solutions highlighted in this demonstration may or may not contain a final design chosen by DMs. Rather than providing a one-step methodology for selecting
7 Visualisation for Decision Support in Many-Objective Optimisation …
Fig. 7.9 The decision variable vectors of selected solutions. Each subplot represents a different aircraft design parameter (i.e., a decision variable), each row represents the parameter value for the 2, 4 and 6 seat aircraft family and each column represents a different candidate solution. The value of each decision variable is represented by the shading of each cell, with dark shading indicating high variable values and light indicating low. Columns with uniform vertical colours indicate high similarity across aircraft families
a design, we intend this example to demonstrate one part of the broader constructive decision making process shown in Fig. 7.1. Learning from the initial exploration, DMs not satisfied with the highlighted solutions can choose to either adjust their criteria and explore alternative solutions or use the information gained from this analysis to adjust the problem formulation, restarting the constructive decision making cycle.
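In this demonstration, brushing keeps only the solutions whose objective values fall inside the bounds a DM sets on selected axes. A minimal Python sketch of that step follows. It assumes simple per-objective interval bounds. The objective values are transcribed from Table 7.4, but the thresholds are hypothetical placeholders chosen to separate the seven tabulated rows. They are not the DMs' actual specifications from Fig. 7.7, which were applied to the full PA set. The seat-width (SEATW) criterion is omitted because the table does not list decision-variable values. The names `brush`, `pct_change`, `ECONOMY`, `COMFORT` and `PERFORMANCE` are introduced only for this illustration.

```python
import math

# Objective values transcribed from Table 7.4 (column order as in the table).
COLUMNS = ["ROUGH", "NOISE", "PFPF", "RANGE", "LDMAX",
           "VCMAX", "WEMP", "PURCH", "DOC", "WFUEL"]
TABLE_7_4 = {
    "E1": [1.89, 73.65, 0.76, 2123, 15.02, 199, 1890, 42444, 60, 449],
    "E2": [1.86, 73.58, 0.72, 2108, 14.92, 199, 1890, 42435, 60, 452],
    "P1": [1.95, 73.28, 1.40, 2170, 15.67, 200, 1887, 42445, 61, 465],
    "P2": [1.98, 73.77, 0.85, 2165, 15.74, 200, 1889, 42551, 58, 459],
    "C1": [1.79, 73.29, 0.25, 2002, 14.48, 193, 1961, 43539, 72, 441],
    "C2": [1.79, 73.29, 0.27, 2002, 14.33, 193, 1960, 43511, 65, 435],
    "C3": [1.79, 73.29, 0.35, 2005, 14.41, 195, 1938, 43222, 63, 434],
}

# HYPOTHETICAL brushing bounds (lo, hi) per objective; not the DMs' specifications.
ECONOMY = {"PFPF": (-math.inf, 0.80), "PURCH": (-math.inf, 42450), "DOC": (-math.inf, 60)}
COMFORT = {"ROUGH": (-math.inf, 1.80), "NOISE": (-math.inf, 73.30)}
PERFORMANCE = {"LDMAX": (15.5, math.inf), "RANGE": (2150, math.inf)}


def brush(data, bounds):
    """Return the names of solutions lying inside every (lo, hi) interval in `bounds`."""
    col = {name: i for i, name in enumerate(COLUMNS)}
    return [sol for sol, row in data.items()
            if all(lo <= row[col[obj]] <= hi for obj, (lo, hi) in bounds.items())]


def pct_change(data, base, other, obj):
    """Relative change (in %) of objective `obj` when moving from `base` to `other`."""
    i = COLUMNS.index(obj)
    return 100.0 * (data[other][i] - data[base][i]) / data[base][i]


if __name__ == "__main__":
    for label, bounds in [("economy", ECONOMY), ("comfort", COMFORT),
                          ("performance", PERFORMANCE)]:
        print(label, brush(TABLE_7_4, bounds))
    # Purchase-price penalty of the performance designs relative to E1.
    for p in ("P1", "P2"):
        print(p, round(pct_change(TABLE_7_4, "E1", p, "PURCH"), 3), "%")
```

The `pct_change` helper quantifies the observation made above for this excerpt. The purchase prices of P1 and P2 differ from that of E1 by well under one percent.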
7.5 Future Research Directions
Next, we turn our attention to important challenges for visualisation in support of many-objective decision making. First, we see a need for more accessible, flexible and interactive visualisation toolkits to accompany many-objective optimisation software. By accessible, we mean open-source/non-commercial toolkits that do not require coding knowledge to use. While visualisation is widely recognised as an important tool for many-objective optimisation, few currently available implementations meet these criteria (see Table 7.1). When developing future software packages, developers should include appropriate visualisation capabilities or be mindful of how their packages can easily be linked to external capabilities [23]. Building collaborations with visualisation experts is the most effective way to improve visualisation capabilities, though this is not always feasible. As an alternative, developers
should seek guidance from the literature on visualisation for many-objective optimisation (see, e.g., [9, 23, 46]). An important concern is user interface design, which cannot be separated from the visualisations themselves. Of the visualisation toolkits that are available for many-objective optimisation, many lack flexibility, which we define as the ability to incorporate coordinated multiple views and/or to manipulate and brush views interactively in real time. Many of the currently available interactive toolkits support only a limited number of views, while existing toolkits that support multiple views are often not interactive. Increasing the number of accessible, flexible and interactive visualisation toolkits that allow DMs and analysts to easily import data, customise their own visualisations and dynamically link and explore data across multiple views will improve the visual capabilities of many-objective optimisation software. In addition, more effort should be devoted to developing tools for understanding algorithmic challenges. These may take the form of visualisable test problems that help in understanding the progress of optimisation methods [15], or of new indicators that can be used to conveniently visualise the progress of optimisation. Future visualisation toolkits should be designed to co-evolve easily with rapid advances in algorithmic development. Beyond software, there is a need for continued methodological innovation to address the challenges of visualisation for many-objective decision making. In addition to continued innovation in visualising solution set properties, which has been the subject of many recent works, more focus should be given to other elements of the constructive decision making process.
There is a need for new tools that aid constructive problem formulation, link the objective and decision spaces, support decision making with multiple DMs, incorporate DMs’ preferences into solution generation and selection, and help in understanding how uncertainty, robustness and sensitivity affect these preferences. Overall, advanced and customisable visualisations play an important role in supporting the DM, both when analysing solutions and when directing the solution process towards new solutions of interest. On the algorithmic side, new tools should focus on improving understanding of the challenge of balancing diversity and convergence in solution sets by capturing the dynamic behaviour of algorithms. With appropriate visualisations, DMs can gain valuable insight into the phenomena involved. Finally, visualisation techniques tailored to the touch-based interactions of mobile devices are urgently needed, since users increasingly rely on mobile devices. The use of virtual/augmented reality in visualisation for many-objective decision making is an area that has not yet received enough attention.
7.6 Conclusions
In this chapter, we have discussed how visualisation can support different phases of constructive decision making for many-objective optimisation problems by enhancing the interaction between algorithms and DMs. Our focus has been on decision support perspectives. In addition, we have provided an overview of state-of-the-art techniques for visualising many-objective solution sets and of how visualisation has been integrated into existing many-objective optimisation approaches. We have demonstrated the usefulness of advanced visualisation techniques based on coordinated multiple views in real-world decision making for general aviation aircraft family product design, where they helped to obtain insight from a large solution set with ten objectives. Finally, we discussed current challenges and outlined outstanding needs that should be addressed in future research. To summarise, much research has been done on connecting visualisation to many-objective optimisation, but significant potential for improvement remains: increasing the accessibility and flexibility of advanced visualisation toolkits, and utilising novel technologies such as touch-based interaction and virtual/augmented reality.
Author Contribution Statement
All the authors contributed to the development of the ideas for the chapter, paper coherence, drafting, and Sect. 7.6. The authors were primarily involved with writing sections as follows: Jussi Hakanen (7.1, 7.3.2, and 7.5), David Gold (7.1, 7.2, 7.3.1, and 7.4), Kaisa Miettinen (7.1, 7.5), and Patrick Reed (7.1, 7.3.2).
Acknowledgements
This research was supported by the Academy of Finland (grant no 311877) and is related to the thematic research area DEMO (Decision Analytics utilising Causal Models and Multiobjective Optimisation, jyu.fi/demo) of the University of Jyväskylä. Partial funding for this work was provided by the National Science Foundation (NSF), Innovations at the Nexus of Food-Energy-Water Systems, Track 2 (Award 1639268).
References
1. K. Aldwib, A.A. Bidgoli, S. Rahnamayan, A. Ibrahim, Proposing a pareto-VIKOR ranking method for enhancing parallel coordinates visualization, in International Conference on Computer Science & Education (ICCSE 2019), (IEEE Press, 2019), pp. 895–902
2. V. Belton, T. Stewart, Problem structuring and multiple criteria decision analysis, in Trends in Multiple Criteria Decision Analysis, ed. by M. Ehrgott, J. Figueira, S. Greco (Springer, 2010), pp. 209–239
3. W. Berger, H. Piringer, Interactive visual analysis of multiobjective optimizations, in IEEE Symposium on Visual Analytics Science and Technology, (IEEE Press, 2010), pp. 215–216
4. J. Blank, K. Deb, pymoo: multi-objective optimization in python. IEEE Access. 8, 89497–89509 (2020)
5. X. Blasco, Interactive tool for decision making in multiobjective optimization with level diagrams. https://www.mathworks.com/matlabcentral/fileexchange/62224-interactive-tool-fordecision-making-in-multiobjective-optimization-with-level-diagrams. Accessed 19 Feb 2021
6. M. Bostock, V. Ogievetsky, J. Heer, D³: data-driven documents. IEEE Trans. Vis. Comput. Graph. 17(12), 2301–2309 (2011)
7. J. Branke, K. Deb, K. Miettinen, R. Słowiński (eds), Multiobjective Optimization: Interactive and Evolutionary Approaches. Lecture Notes in Computer Science, vol. 5252 (Springer, Heidelberg, 2008)
8. E.D. Brill, J.M. Flach, L.D. Hopkins, S. Ranjithan, MGA: a decision support system for complex, incompletely defined problems. IEEE Trans. Syst. Man Cybern. 20(4), 745–757 (1990)
9. S. Cajot, N. Schüler, M. Peter, A. Koch, F. Maréchal, Interactive optimization with parallel coordinates: exploring multidimensional spaces for decision support. Front. ICT. 5, 1–28 (2019)
10. W. Chen, J.G. Elliot, T.W. Simpson, J. Virasak, Designing a general aviation aircraft as an open engineering system. Design Report for ME8104, Georgia Institute of Technology, (USA, 1995)
11. J.C.M. Climaco, C.H. Antunes, Implementation of a user-friendly software package–a guided tour of TRIMAP. Math. Comput. Model. 12(10–11), 1299–1309 (1989)
12. V.G. da Fonseca, C.M. Fonseca, A.O. Hall, Inferential performance assessment of stochastic optimisers and the attainment function, in Evolutionary Multi-criterion Optimization (EMO), (Springer, 2001), pp. 213–225
13. K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, (Wiley, Chichester, UK, 2001)
14. S. Eker, J.H. Kwakkel, Including robustness considerations in the search phase of many-objective robust decision making. Environ. Model. & Softw. 105, 201–216 (2018)
15. J. Fieldsend, T. Chugh, R. Allmendinger, K. Miettinen, A feature rich distance-based many-objective visualisable test problem generator, in Genetic and Evolutionary Computation Conference (GECCO), (ACM Press, 2019), pp. 541–549
16. B. Filipič, T. Tušar, A taxonomy of methods for visualizing pareto front approximations, in Genetic and Evolutionary Computation Conference (GECCO), (ACM Press, 2018), pp. 649–656
17. C.M. Fonseca, C.A. Antunes, R. Lacour, K. Miettinen, P.M. Reed, T. Tušar, Visualization in multiobjective optimization, in Understanding Complexity in Multiobjective Optimization (Dagstuhl Seminar 15031), (Dagstuhl Zentrum für Informatik, 2015), pp. 129–139
18. H. Gao, H. Nie, K. Li, Visualisation of pareto front approximation: a short survey and empirical comparisons, in Congress on Evolutionary Computation (CEC), (IEEE Press, 2019), pp. 1750–1757
19. D.F. Gold, P.M. Reed, B.C. Trindade, G.W. Characklis, Identifying actionable compromises: navigating multi-city robustness conflicts to discover cooperative safe operating spaces for regional water supply portfolios. Water Resour. Res. 55(11), 9024–9050 (2019)
20. A. Hadjimichael, D. Gold, D. Hadka, P. Reed, Rhodium: Python library for many-objective robust decision making and exploratory modeling. J. Open Res. Softw. 8(1), 12 (2020)
21. D. Hadka, J. Herman, P.M. Reed, K. Keller, An open source framework for many-objective robust decision making. Environ. Model. & Softw. 74, 114–129 (2015)
22. D. Hadka, P.M. Reed, Borg: an auto-adaptive many-objective evolutionary computing framework. Evol. Comput. 21(2), 231–259 (2013)
23. J. Hakanen, K. Miettinen, K. Matković, Task-based visual analytics for interactive multiobjective optimization. J. Oper. Res. Soc. 72(9), 2073–2090 (2021)
24. M. Hartikainen, K. Miettinen, K. Klamroth, Interactive nonconvex pareto navigator for multiobjective optimization. Eur. J. Oper. Res. 275(1), 238–251 (2019)
25. Z. He, G.G. Yen, Visualization and performance metric in many-objective optimization. IEEE Trans. Evol. Comput. 20(3), 386–402 (2016)
26. J.D. Herman, P.M. Reed, H.B. Zeff, G.W. Characklis, How should robustness be defined for water systems planning under change? J. Water Resour. Plan. Manag. 141(10), 04015012 (2015)
27. J.D. Herman, H.B. Zeff, P.M. Reed, G.W. Characklis, Beyond optimality: multistakeholder robustness tradeoffs for regional water portfolio planning under deep uncertainty. Water Resour. Res. 50(10), 7692–7713 (2014)
28. R. Hernández Gómez, C.A.C. Coello, E. Alba, A multi-objective evolutionary algorithm based on parallel coordinates, in Genetic and Evolutionary Computation Conference (GECCO), (ACM Press, 2016), pp. 565–572
29. J. Hettenhausen, A. Lewis, T. Kipouros, A web-based system for visualisation-driven interactive multi-objective optimisation. Procedia Comput. Sci. 29, 1915–1925 (2014)
30. J. Hettenhausen, A. Lewis, S. Mostaghim, Interactive multi-objective particle swarm optimization with heatmap-visualization-based user interface. Eng. Optim. 42(2), 119–139 (2010)
31. J. Hettenhausen, A. Lewis, M. Randall, T. Kipouros, Interactive multi-objective particle swarm optimisation using decision space interaction, in Congress on Evolutionary Computation (CEC), (IEEE Press, 2013), pp. 3411–3418
32. P. Hoffman, G. Grinstein, K. Marx, I. Grosse, E. Stanley, DNA visual and analytic data mining, in Visualization’97 (Cat. No. 97CB36155), (IEEE Press, 1997), pp. 437–441
33. A. Ibrahim, S. Rahnamayan, M.V. Martin, K. Deb, 3d-radvis: visualization of Pareto front in many-objective optimization, in Congress on Evolutionary Computation (CEC), (IEEE Press, 2016), pp. 736–745
34. A. Ibrahim, S. Rahnamayan, M.V. Martin, K. Deb, 3d-radvis antenna: visualization and performance measure for many-objective optimization. Swarm Evol. Comput. 39, 157–176 (2018)
35. A. Ibrahim, S. Rahnamayan, M.V. Martin, K. Deb, Enhanced correlation matrix based visualization for multi- and many-objective optimization, in IEEE Symposium Series on Computational Intelligence (SSCI), (IEEE Press, 2018), pp. 2345–2352
36. A. Inselberg, Multidimensional detective, in VIZ’97: Visualization Conference, Information Visualization Symposium and Parallel Rendering Symposium, (IEEE Press, 1997), pp. 100–107
37. J. Johansson, C. Forsell, Evaluation of parallel coordinates: overview, categorization and guidelines for future research. IEEE Trans. Vis. Comput. Graph. 22(1), 579–588 (2016)
38. R.L. Keeney, Value-focused thinking: identifying decision opportunities and creating alternatives. Eur. J. Oper. Res. 92(3), 537–549 (1996)
39. J. Kehrer, H. Hauser, Visualization and visual analysis of multifaceted scientific data: a survey. IEEE Trans. Vis. Comput. Graph. 19(3), 495–513 (2013)
40. D. Keim, G. Andrienko, J.-D. Fekete, C. Görg, J. Kohlhammer, G. Melançon, Visual analytics: definition, process, and challenges, in Information Visualization: Human-Centered Issues and Perspectives, (Springer, 2008), pp. 154–175
41. T. Kohonen, The self-organizing map. Proc. IEEE. 78(9), 1464–1480 (1990)
42. J.B. Kollat, P. Reed, A framework for visually interactive decision-making and design using evolutionary multi-objective optimization (VIDEO). Environ. Model. Softw. 22, 1691–1704 (2007)
43. P. Korhonen, J. Wallenius, Visualization in the multiple objective decision-making framework, in Multiobjective Optimization: Interactive and Evolutionary Approaches, ed. by J. Branke, K. Deb, K. Miettinen, R. Słowiński (Springer, 2008), pp. 195–212
44. R.J. Lempert, Shaping the Next One Hundred Years: New Methods for Quantitative Long-term Policy Analysis, (Rand Corporation, 2003)
45. M. Li, L. Zhen, X. Yao, How to read many-objective solution sets in parallel coordinates. IEEE Comput. Intell. Mag. 12(4), 88–100 (2017)
46. J. Liu, T. Dwyer, K. Marriott, J. Millar, A. Haworth, Understanding the relationship between interactive optimisation and visual analytics in the context of prostate brachytherapy. IEEE Trans. Vis. Comput. Graph. 24(1), 319–329 (2018)
47. S. Liu, D. Maljovec, B. Wang, P.-T. Bremer, V. Pascucci, Visualizing high-dimensional data: advances in the past decade. IEEE Trans. Vis. Comput. Graph. 23(3), 1249–1268 (2017)
48. A.V. Lotov, V.A. Bushenkov, G.K. Kamenev, Interactive Decision Maps: Approximation and Visualization of Pareto Frontier, (Kluwer Academic Publishers, 2004)
49. A.V. Lotov, K. Miettinen, Visualizing the Pareto frontier, in Multiobjective Optimization: Interactive and Evolutionary Approaches, ed. by J. Branke, K. Deb, K. Miettinen, R. Słowiński (Springer, 2008), pp. 213–243
50. K. Matković, D. Gračanin, R. Splechtna, M. Jelović, B. Stehno, H. Hauser, W. Purgathofer, Visual analytics for complex engineering systems: hybrid visual steering of simulation ensembles. IEEE Trans. Vis. Comput. Graph. 20(12), 1803–1812 (2014)
51. K. Miettinen, Nonlinear Multiobjective Optimization, (Kluwer Academic Publishers, 1999)
52. K. Miettinen, IND-NIMBUS for demanding interactive multiobjective optimization, in Multiple Criteria Decision Making (MCDM), The Karol Adamiecki University of Economics in Katowice, (2006), pp. 137–150
53. K. Miettinen, Survey of methods to visualize alternatives in multiple criteria decision making problems. OR Spectr. 36(1), 3–37 (2014)
54. K. Miettinen, J. Hakanen, D. Podkopaev, Interactive nonlinear multiobjective optimization methods, in Multiple Criteria Decision Analysis: State of the Art Surveys, ed. by S. Greco, M. Ehrgott, J. Figueira, 2nd edn. (Springer, 2016), pp. 931–980
55. K. Miettinen, M.M. Mäkelä, Interactive multiobjective optimization system WWW-NIMBUS on the internet. Comput. & Oper. Res. 27(7), 709–723 (2000)
56. K. Miettinen, M.M. Mäkelä, Synchronous approach in interactive multiobjective optimization. Eur. J. Oper. Res. 170, 909–922 (2006)
57. T. Mühlbacher, L. Linhardt, T. Möller, H. Piringer, TreePOD: sensitivity-aware selection of pareto-optimal decision trees. IEEE Trans. Vis. Comput. Graph. 24(1), 174–183 (2018)
58. T. Munzner, Visualization Analysis & Design, (CRC Press, 2014)
59. V. Ojalehto, K. Miettinen, DESDEO: an open framework for interactive multiobjective optimization, in Multiple Criteria Decision Making and Aiding: Cases on Models and Methods with Computer Implementations, ed. by S. Huber, M. Geiger, A. de Almeida (Springer, 2019), pp. 67–94
60. V. Ojalehto, K. Miettinen, T. Laukkanen, Implementation aspects of interactive multiobjective optimization for modeling environments: the case of GAMS-NIMBUS. Comput. Optim. Appl. 58(3), 757–779 (2014)
61. S. Pajer, M. Streit, T. Torsney-Weir, F. Spechtenhauser, T. Möller, H. Piringer, Weightlifter: visual weight space exploration for multi-criteria decision making. IEEE Trans. Vis. Comput. Graph. 23(1), 611–620 (2017)
62. A. Pryke, S. Mostaghim, A. Nazemi, Heatmap visualization of population based multi objective algorithms, in Evolutionary Multi-criterion Optimization (EMO), (Springer, 2007), pp. 361–375
63. W.J. Raseman, J. Jacobson, J.R. Kasprzyk, Parasol: an open source, interactive parallel coordinates library for multi-objective decision making. Environ. Model. & Softw. 116, 153–163 (2019)
64. J.C. Roberts, State of the art: coordinated multiple views in exploratory visualization, in Coordinated and Multiple Views in Exploratory Visualization (CMV), (IEEE Press, 2007), pp. 61–71
65. A. Ruiz, F. Ruiz, K. Miettinen, L. Delgado-Antequera, V. Ojalehto, NAUTILUS navigator: free search interactive multiobjective optimization without trading-off. J. Glob. Optim. 74(2), 213–231 (2019)
66. J.W. Sammon, A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 100(5), 401–409 (1969)
67. H. Sato, K. Tomita, M. Miyakawa, Preferred region based evolutionary multi-objective optimization using parallel coordinates interface, in International Symposium on Computational and Business Intelligence, (IEEE Press, 2015), pp. 33–38
68. R.A. Shah, P.M. Reed, T.W. Simpson, Many-objective evolutionary optimisation and visual analytics for product family design, in Multi-Objective Evolutionary Optimisation for Product Design and Manufacturing, ed. by L. Wang, A.H.C. Ng, K. Deb (Springer, 2011), pp. 137–159
69. T. Simpson, J. Allen, W. Chen, F. Mistree, Conceptual design of a family of products through the use of the robust concept extrapolation method, in Symposium on Multidisciplinary Analysis and Optimization, (1996), pp. 1535–1545
70. G. Stump, S. Lego, M. Yukish, T.W. Simpson, J.A. Donndelinger, Visual steering commands for trade space exploration: user-guided sampling with example. J. Comput. Inf. Sci. Eng. 9(4), 044501–1–044501–10 (2009)
71. G.M. Stump, S.W. Miller, M.A. Yukish, C.M. Farrell, Employing multidimensional data visualization tools to assess the impact of constraint uncertainties on complex design problems, in ASME 2017 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, (ASME, 2017), V02AT03A016
72. A.K.A. Talukder, K. Deb, PaletteViz: a visualization method for functional understanding of high-dimensional pareto-optimal data-sets to aid multi-criteria decision making. IEEE Comput. Intell. Mag. 15(2), 36–48 (2020)
73. S. Tarkkanen, K. Miettinen, J. Hakanen, H. Isomäki, Incremental user-interface development for interactive multiobjective optimization. Expert. Syst. Appl. 40, 3220–3232 (2013)
74. J.B. Tenenbaum, V. De Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction. Sci. 290(5500), 2319–2323 (2000)
75. J.J. Thomas, K.A. Cook, A visual analytics agenda. IEEE Comput. Graph. Appl. 26(1), 10–13 (2006)
76. Y. Tian, R. Cheng, X. Zhang, Y. Jin, PlatEMO: a MATLAB platform for evolutionary multiobjective optimization. IEEE Comput. Intell. Mag. 12(4), 73–87 (2017)
77. H.L. Trinkaus, T. Hanne, knowCube: a visual and interactive support for multicriteria decision making. Comput. Oper. Res. 32(5), 1289–1309 (2005)
78. A. Tsoukiàs, From decision theory to decision aiding methodology. Eur. J. Oper. Res. 187(1), 138–161 (2008)
79. P.A. Tukey, J.W. Tukey, Preparation; prechosen sequences of views, in Interpreting Multivariate Data, ed. by V. Barnett (Wiley, 1981), pp. 189–213
80. T. Tušar, B. Filipič, Visualizing exact and approximated 3D empirical attainment functions. Math. Probl. Eng. 2014, 1–18 (2014)
81. T. Tušar, B. Filipič, Visualization of pareto front approximations in evolutionary multiobjective optimization: a critical review and the prosection method. IEEE Trans. Evol. Comput. 19(2), 225–245 (2015)
82. T. Wachowicz, G.E. Kersten, E. Roszkowska, How do I tell you what I want? agent’s interpretation of principal’s preferences and its impact on understanding the negotiation process and outcomes. Oper. Res. 19(4), 993–1032 (2019)
83. D.J. Walker, R.M. Everson, J.E. Fieldsend, Visualizing mutually nondominating solution sets in many-objective optimization. IEEE Trans. Evol. Comput. 17(2), 165–184 (2013)
84. W.E. Walker, P. Harremoës, J. Rotmans, J.P. Van Der Sluijs, M.B.A. Van Asselt, P. Janssen, M.P. Krayer von Krauss, Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support. Integr. Assess. 4(1), 5–17 (2003)
85. R. Wang, R.C. Purshouse, P.J. Fleming, Whatever Works Best for You – a new method for a priori and progressive multi-objective optimisation, in Evolutionary Multi-criterion Optimization (EMO), (Springer, 2013), pp. 337–351
86. C. Ware, Information Visualization: Perception for Design, 4th edn. (Morgan Kaufmann, 2019)
87. G.H. Weber, H. Hauser, Interactive visual exploration and analysis, in Scientific Visualization, ed. by C.D. Hansen, M. Chen, C.R. Johnson, A.E. Kaufman, H. Hagen. Mathematics and Visualization, (Springer, 2014), pp. 161–173
88. M.J. Woodruff, P.M. Reed, T.W. Simpson, Many objective visual analytics: rethinking the design of complex engineered systems. Struct. Multidiscip. Optim. 48(1), 201–219 (2013)
89. B. Xin, L. Chen, J. Chen, H. Ishibuchi, K. Hirota, B. Liu, Interactive multiobjective optimization: a review of the state-of-the-art. IEEE Access. 6, 41256–41279 (2018)
90. M.-H. Xiong, W. Xiong, P. Jian, Visualization of the non-dominated solutions in many-objective optimization, in 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), (IEEE Press, 2019), pp. 188–195
91. L. Zhen, M. Li, R. Cheng, D. Peng, X. Yao, Adjusting parallel coordinates for investigating multi-objective search, in Simulated Evolution and Learning, (Springer, 2017), pp. 224–235
92. Y. Zhou-Kangas, K. Miettinen, Decision making in multiobjective optimization problems under uncertainty: balancing between robustness and quality. OR Spectr. 41(2), 391–413 (2019)
93. Y. Zhou-Kangas, K. Miettinen, K. Sindhya, Solving multiobjective optimization problems with decision uncertainty: an interactive approach. J. Bus. Econ. 89(1), 25–51 (2019)
94. G. Ochoa, S. Verel, F. Daolio, M. Tomassini, Local optima networks: a new model of combinatorial fitness landscapes, in Recent Advances in the Theory and Application of Fitness Landscapes, ed. by H. Richter, A. Engelbrecht (Springer, 2014), pp. 233–262
95. P. Kerschke, C. Grimme, An expedition to multimodal multi-objective optimization landscapes, in Evolutionary Multi-Criterion Optimization: 9th International Conference, EMO, Proceedings, ed. by H. Trautmann, G. Rudolph, K. Klamroth, O. Schütze, M. Wiecek, Y. Jin, C. Grimme (Springer, Berlin, Heidelberg, 2017), pp. 329–343
96. L. Schäpermeier, C. Grimme, P. Kerschke, One plot to show them all: visualization of efficient sets in multi-objective landscapes, in Parallel Problem Solving from Nature (PPSN), ed. by T. Bäck, M. Preuss, A. Deutz, H. Wang, C. Doerr, M. Emmerich, H. Trautmann (Springer, 2020), pp. 154–167
97. J. Branke, Evolutionary Optimization in Dynamic Environments, (Springer, 2012)
98. K. Deb, L. Thiele, M. Laumanns, E. Zitzler, Scalable test problems for evolutionary multiobjective optimization, in Evolutionary Multiobjective Optimization, ed. by A. Abraham, L. Jain, R. Goldberg (Springer, 2005), pp. 105–145
99. L. Van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)
100. X. Blasco, J.M. Herrero, G. Reynoso-Meza, M.A.M. Iranzo, Interactive tool for analysing multiobjective optimization results with level diagrams, in The Genetic and Evolutionary Computation Conference (GECCO), (2017), pp. 1689–1696
101. L. Schäpermeier, C. Grimme, P. Kerschke, To boldly show what no one has seen before: A dashboard for visualizing multi-objective landscapes, in Evolutionary Multi-criterion Optimization (EMO 2021), (Springer, Berlin, Heidelberg, 2021). To appear
Chapter 8
Theoretical Aspects of Subset Selection in Multi-Objective Optimisation
Andreia P. Guerreiro, Kathrin Klamroth, and Carlos M. Fonseca
Abstract
In multi-objective optimisation, it is common to transform the multi-objective optimisation problem into a single-objective problem (or a sequence of them), and then compute or approximate the solution(s) of these transformed problems. Scalarisation methods are one such example, where a set of solutions is determined by solving a sequence of single-objective problems. Indicator-based methods are another example: their aim is to determine, at once, a set of solutions that maximises a given set-quality indicator, i.e., a single-objective function defined over sets of solutions. The aim of this chapter is to explore the connections between set-quality indicators and scalarisations, and to discuss the corresponding theoretical properties. In particular, we consider the connection between the optimal solutions of the original multi-objective problem and the optimal solutions of the single-objective problems into which it is transformed.
A. P. Guerreiro (corresponding author)
INESC-ID, Rua Alves Redol, 9, 1000-029 Lisbon, Portugal
K. Klamroth
University of Wuppertal, Gaußstr. 20, 42119 Wuppertal, Germany
C. M. Fonseca
Department of Informatics Engineering, Polo II, University of Coimbra, CISUC, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
© Springer Nature Switzerland AG 2023
D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_8
8.1 Introduction
Many real-world optimisation problems are naturally modelled with multiple objective functions. Such objectives typically conflict, which usually prevents the existence of a single optimal solution that simultaneously minimises (or maximises) all of them. Instead, a set of Pareto-optimal solutions is sought, whose image in objective space, the so-called Pareto front, reflects the trade-offs between solutions regarding objective values. The Decision Maker (DM), i.e., the person or entity interested in solving the problem, usually seeks a single solution that best fits
A. P. Guerreiro et al.
their (subjective) preferences. Due to imperfect knowledge and, consequently, uncertainty regarding the DM's preferences, determining and searching for the best solution is nontrivial. Challenges in multi-objective optimisation arise at several levels, and each gives rise to a wide range of theoretical questions. One challenge concerns the nature of the optimisation problems and the analysis of their computational complexity. Optimisation problems may already be NP-hard with two objectives even if the single-objective case can be solved efficiently (e.g., the multi-objective minimum spanning tree problem and the multi-objective shortest path problem) [21, 23]. This analysis is typically related to the size of the Pareto front, which for discrete problems usually grows exponentially with the input size; a more recent trend is to measure complexity with respect to the output size, i.e., the size of the Pareto front (see [5, 27]). The construction of preference models, and how well they represent the DM’s preferences, is another challenge (see [53]). Lastly, there are challenges associated with the different optimisation methods, such as exact and (meta)heuristic methods (e.g., evolutionary multi-objective optimisation algorithms). For these, relevant theoretical aspects include, for example, the ability of the methods to guarantee (the most preferred) Pareto-optimal solutions, runtime analysis, optimal parameter choices, and performance assessment (see, for example, [11, 22]). Optimisation algorithms frequently solve the multi-objective optimisation problem by considering a related but simplified version of it, for example, by transforming it into a single-objective problem (or a sequence of them).
This is the case, for example, of scalarisation methods, and also of indicator-based evolutionary algorithms, which search for a set of solutions that maximises a given quality indicator. This chapter focuses on the theoretical aspects of the connection between the optimal solutions of the simplified and the original optimisation problems. It discusses to what extent the solutions of the single-objective problems are indeed Pareto-optimal solutions of the original multi-objective problem, as well as other related aspects, with a particular focus on problems with many (more than four) objective functions. Section 8.2 provides background information, and Sect. 8.3 introduces important definitions for the remainder of the text. Theoretical aspects related to scalarisation functions and quality indicators are discussed in Sects. 8.4 and 8.5, respectively. Concluding remarks are drawn in Sect. 8.6.
8.2 Background
Enumerating all Pareto-optimal solutions is not always viable, because the Pareto set (and the Pareto front) can be very large, even infinite. Furthermore, it is usually impractical to ask the DM to select their most preferred solution from a large set of solutions. Thus, optimisation algorithms frequently have to decide which solutions to seek, which to keep, and which to discard. Although discarding dominated solutions is trivial, deciding among Pareto-optimal solutions cannot be done with certainty without preference information. This is particularly problematic when the Pareto front is large and/or there are many (four or more) objectives, which allows for increasingly complex trade-offs (see Chap. 1).
8 Theoretical Aspects of Subset Selection in Multi-Objective Optimisation
8.2.1 Preference Articulation
Preference information may be provided at different stages of the optimisation process. Traditionally, optimisation methods are classified according to when, if at all, preference information is articulated [36, 50]. The four common classes are no-preference, a posteriori, progressive (or interactive), and a priori methods. In the no-preference case, the DM never provides preference information, and the methods return an arbitrary Pareto-optimal solution. These methods are adequate if the DM has no expectations and is satisfied with any Pareto-optimal solution. However, even if the DM cannot express preference information upfront, it does not follow that any solution is acceptable. In such cases, preferences may be articulated a posteriori: the optimisation method provides a set of solutions and the DM selects their most preferred solution among them. In that case, optimisation algorithms are typically used either to enumerate all Pareto-optimal solutions (which may not always be possible, or desirable), or to find a good representation (or at least a good approximation) of the Pareto front, commonly understood as a finite set close to and well spread along the Pareto front. The DM may also provide information progressively, i.e., during the optimisation process. As the objective space is explored, the DM is periodically asked to express preferences regarding (some of) the current solutions, so that the search progresses towards the most preferred one. Finally, the DM may express their preferences a priori, for example, in a (mathematical) form that fully ranks the solutions, or in the form of expectations. This allows the methods to discard some (uninteresting) Pareto-optimal solutions and to focus on the preferred regions. The optimisation process need not be static, in the sense that the DM is not restricted to expressing preferences in a way that fits only one of the above classes.
Preference articulation can be viewed as an iterative process in which the DM may learn from the methods and refine their preferences, while the methods progressively focus on a reduced search region, which is particularly important in many-objective optimisation. In this process, the DM may or may not provide preference information upfront, and the optimisation method then searches for Pareto-optimal solutions according to the available preference information. The method may provide a diverse (finite) set of solutions that, in the absence of a priori preference information, can be taken as an approximation of the Pareto front from which the DM can learn about the search space. The DM can then either pick their preferred solution, or refine their preferences in light of what they have learnt so that a new run of an optimisation algorithm performs a more focused search. This process may be repeated until a satisfactory solution is found [28]. During this process, the DM is also free to change their mind regarding their own preferences, and to provide information that will make the method focus on different regions of the search space.
Preference information can be expressed in different ways [53, 58]. The DM may compare pairs of solutions, indicating whether one solution is preferable or indifferent to another and, if applicable, the intensity of preference. The DM may also classify solutions, for example, by stating whether they are good or bad. Other examples include setting desirable objective function values, defining weights over the objective space, and establishing the relative importance of objectives. Such preferences are usually modelled either through a value function or through a binary relation. In the former case, the function induces a total pre-order over the solutions, and the problem becomes a single-objective problem. In the latter case, such a relation over pairs of solutions ideally induces a partial order on the objective space, as well as a partial pre-order on the search space, and is a refinement of Pareto dominance.
8.2.2 Decision-Making Problems
The field of Multi-Criteria Decision Aiding (MCDA) encompasses all stages of the aiding process, in which an analyst helps the DM formalise the problem and express preferences, and develops the methodology that leads to a recommendation to the DM in the end. It assumes the existence of a set of potential actions to choose from, which in general may or may not be feasible. In an optimisation scenario, actions represent solutions, and the analyst's aim is to recommend a solution. The type of recommendation sought is categorised according to three main problematics [57, 58]: the choice, sorting, and ranking problematics. Here, these will be referred to as decision-making problems, as the involvement of the analyst and the interactions with the DM will be disregarded, and it will be assumed that the (optimisation) problem at hand is fully formalised.
The choice problem is focused on determining the best solution for the DM. The aim is thus to choose a subset of solutions, as small as possible, from which a solution will eventually be selected by, or recommended to, the DM. In the sorting problem, the goal is to assign the alternative solutions to pre-defined categories, which are not necessarily ordered. These categories are defined based on the fate of the actions assigned to them. An example is the classification of an exam as "approved" or "not approved". In the ranking problem, solutions are split into equivalence classes according to preferences (i.e., solutions in the same class are deemed equivalent preference-wise). These classes are not pre-specified; instead, they depend on the ordering of solutions according to preferences, which may be only partial.
The classical decision-making problems take a solution-quality point of view. Solutions are compared, categorised, and chosen based on whether their quality is deemed to be similar to, better than, or worse than the quality of the other solutions. As a consequence, the typical goal of a posteriori methods does not quite fit any of these problems. Indeed, a posteriori methods focus neither on the best solution nor on determining a set as small as possible. Their aim is usually to provide a good approximation of the Pareto front, from which the DM can then select a solution, and to support the DM in defining and expressing preferences. Evolutionary Multi-objective Optimisation Algorithms (EMOAs) are an example of a posteriori methods. Modern EMOAs take a set-quality point of view, where the principal aim is not exactly to provide the best solution (as they cannot determine it with certainty) but to provide a good and diverse set of solutions. Solutions are judged based on their contribution to the quality of the set. As the "best" individual solution is not necessarily part of the best sets with cardinality larger than one, this can be viewed as a (new) problem that is different from choice, albeit more general.
We emphasize that in classical decision-making problems, particularly choice and ranking problems, preferences are defined over solutions. In contrast, preferences are defined over sets of solutions when seeking solution sets. Considering that such preferences are expressed through a function that maps (the image in the objective space of) each subset of solutions to a scalar value that reflects its agreement with the set-preferences, this problem may be formalised as a subset selection problem [31, 55].
Problem 8.1 (Subset Selection Problem (SSP)) Given a ground set S, an integer k ≥ 0, and a function I : 2^S → ℝ, find a discrete subset A ⊆ S such that |A| ≤ k and
$$ I(A) = \max_{B \subseteq S,\ |B| \le k} I(B). $$
When the ground set S is the image in the objective space of some set of solutions of a multi-objective optimisation problem, S ⊂ ℝ^m, and the function I defines the set-preferences of the DM, I is known as a set-quality indicator, and A is an indicator-optimal subset of size up to k. SSPs arise in many other contexts, from the classic Knapsack and Maximum Coverage problems to feature selection, sparse regression [55], and sensor placement [4], among many others, and may involve other types of constraints, such as general cost, dynamic cost, and matroid constraints [4, 19, 56].
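Problem 8.1 can be made concrete with a brute-force solver for very small instances. The Python sketch below assumes a bi-objective minimisation setting and uses, as the set-quality indicator I, the hypervolume (area) dominated by a subset and bounded by a reference point; the function names, the reference point and the three-point ground set are illustrative assumptions, not taken from the chapter. Enumerating all subsets of size at most k is exponential in |S|, so this is only a specification-level check. The example also illustrates the remark in Sect. 8.2.2: the best single point, (2.1, 2.1), is not part of the best subset of size 2.

```python
from itertools import combinations


def hypervolume_2d(points, ref):
    """Area weakly dominated by `points` and bounded by the reference point `ref`
    (bi-objective minimisation)."""
    # Only points strictly better than `ref` in both objectives contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    # Sweep in increasing f1; a point adds a rectangle only if it improves f2.
    for f1, f2 in pts:
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def best_subset(S, k, indicator):
    """Exhaustively solve Problem 8.1: a subset A of S with |A| <= k maximising indicator(A)."""
    best, best_value = (), indicator(())
    for size in range(1, k + 1):
        for A in combinations(S, size):
            value = indicator(A)
            if value > best_value:  # ties keep the first subset found
                best, best_value = A, value
    return best, best_value


# Illustrative ground set (assumed data) and reference point.
S = [(2.1, 2.1), (1.0, 3.0), (3.0, 1.0)]
ref = (4.0, 4.0)


def hv(B):
    return hypervolume_2d(B, ref)


print(best_subset(S, 1, hv))  # the single best point
print(best_subset(S, 2, hv))  # the best pair does not contain it
```

Exhaustive search is used here only because it mirrors the definition literally; practical indicator-based algorithms rely on greedy or specialised subset selection instead.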
8.2.3 Remarks
Independently of the type of decision-making problem and of the type and amount of preference information involved in a multi-objective optimisation process, there are theoretical properties of the associated preference models, including scalarising functions, binary relations, and set-quality indicators, that can be used to characterise those models and to better understand their implicit biases. Two key properties are not contradicting common sense (e.g., by preferring a dominated solution over one that dominates it) and not contradicting the DM's preferences (e.g., by explicitly favouring a solution over another that the DM prefers to it). Other related properties pertain to whether optimisation algorithms based on such models are able to achieve and retain Pareto-optimal solutions, and to how the solutions returned by the algorithm are distributed in objective space.
8.3 Notation and Definitions
For the sake of completeness, we briefly review the concept of Pareto dominance for an m-objective optimisation problem represented by the vector-valued function f : X → ℝ^m [18, 20].
Definition 8.1 (Dominance) A point u ∈ ℝ^m in the objective space is said to weakly dominate a point v ∈ ℝ^m if u_i ≤ v_i for all 1 ≤ i ≤ m. This is represented as u ≤ v. If, in addition, v ≰ u, then u is said to (strictly) dominate v, which is represented here as u < v. If u_i < v_i for all 1 ≤ i ≤ m, then u is said to strongly dominate v, and this is represented as u ≪ v. Moreover, if neither u ≤ v nor v ≤ u, then u and v are said to be incomparable, or mutually nondominated.
Accordingly, a (feasible) solution x ∈ X in the decision space is said to weakly, strictly, or strongly dominate a (feasible) solution y ∈ X if f(x) weakly, strictly, or strongly dominates f(y) in the objective space, respectively. Additionally, x and y are said to be indifferent if they both weakly dominate each other, i.e., if f(x) = f(y), whereas they are said to be incomparable if neither weakly dominates the other. Recall the following definitions in connection with the notion of optimality in multi-objective optimisation [20].
Definition 8.2 (Efficiency and nondominance) A feasible solution x ∈ X is said to be an efficient solution (or Pareto-optimal solution) if there is no other feasible solution y ∈ X such that y strictly dominates x, i.e., such that v < u, where u = f(x) and v = f(y). In that case, u is said to be a nondominated point. Moreover, a feasible solution x ∈ X is said to be a weakly efficient solution if there is no other solution y ∈ X such that y strongly dominates x, i.e., such that v ≪ u, in which case u is called a weakly nondominated point.
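The relations of Definitions 8.1 and 8.2 can be checked mechanically for finite point sets. The following Python sketch assumes minimisation and objective vectors given as tuples; the function names are illustrative, and `nondominated` simply filters a finite list of points rather than a full feasible set X.

```python
from typing import List, Sequence

Point = Sequence[float]


def weakly_dominates(u: Point, v: Point) -> bool:
    """u ≤ v: u is no worse than v in every objective."""
    return all(ui <= vi for ui, vi in zip(u, v))


def strictly_dominates(u: Point, v: Point) -> bool:
    """u < v: u weakly dominates v and v does not weakly dominate u."""
    return weakly_dominates(u, v) and not weakly_dominates(v, u)


def strongly_dominates(u: Point, v: Point) -> bool:
    """u ≪ v: u is strictly better than v in every objective."""
    return all(ui < vi for ui, vi in zip(u, v))


def incomparable(u: Point, v: Point) -> bool:
    """Neither point weakly dominates the other."""
    return not weakly_dominates(u, v) and not weakly_dominates(v, u)


def nondominated(points: List[Point]) -> List[Point]:
    """Points of the list that no other point of the list strictly dominates.
    Indifferent duplicates are all kept, since none strictly dominates another."""
    return [u for u in points if not any(strictly_dominates(v, u) for v in points)]


print(nondominated([(1, 2), (1, 3), (2, 1), (0.5, 5)]))
```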
Recall that the set of all efficient solutions is called the efficient set or the Pareto(-optimal) set, and the set of all nondominated points is called the nondominated set or the Pareto(-optimal) front, here represented by P∗. Moreover, a set that consists of mutually nondominated points is called a nondominated (point) set. The dominance relations in the objective space formulated in Definition 8.1 are extended to arbitrary point sets [31]:
Definition 8.3 (Set dominance) A set A ⊂ ℝ^m is said to weakly dominate a set B ⊂ ℝ^m if ∀b ∈ B, ∃a ∈ A | a ≤ b. This is represented as A ≼ B. If A ≼ B and B ⋠ A, then the set A is said to (strictly) dominate the set B, which is represented as A ≺ B.
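Definition 8.3 translates directly into code for finite point sets. The Python sketch below assumes minimisation and uses hypothetical function names; it implements weak and strict set dominance only. The example pair A = {(1, 1)} and B = {(1, 1), (2, 2)} shows that two different sets can weakly dominate each other.

```python
def point_weakly_dominates(a, b):
    """a ≤ b componentwise (minimisation)."""
    return all(ai <= bi for ai, bi in zip(a, b))


def set_weakly_dominates(A, B):
    """A ≼ B: every b in B is weakly dominated by some a in A (vacuously true if B is empty)."""
    return all(any(point_weakly_dominates(a, b) for a in A) for b in B)


def set_strictly_dominates(A, B):
    """A ≺ B: A ≼ B and not B ≼ A."""
    return set_weakly_dominates(A, B) and not set_weakly_dominates(B, A)


A = [(1, 1)]
B = [(1, 1), (2, 2)]
# A and B weakly dominate each other although A != B, so neither strictly dominates.
print(set_weakly_dominates(A, B), set_weakly_dominates(B, A), set_strictly_dominates(A, B))
```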
If A ≠ ∅ and ∀b ∈ B, ∃a ∈ A | a < b, then A is said to (strictly) dominate B elementwise, which is denoted here by A ≺· B. Finally, A is said to strongly dominate B, which is denoted by A ≺≺ B, if A ≠ ∅ and ∀b ∈ B, ∃a ∈ A | a ≪ b. Additionally, two point sets are said to be incomparable if neither weakly dominates the other, and they are said to be indifferent if each weakly dominates the other. In contrast to the point-based version of weak dominance, in the set-based version, A ≼ B and B ≼ A together do not imply A = B for two sets A, B ⊂ ℝ^m. Note that the set-dominance relations formulated in Definition 8.3 are well known from the field of set-valued optimisation. In this context, the set-dominance relations ≼, ≺·, and ≺≺ correspond to lower-type set less order relations w.r.t. the dominance relations ≤,
M−1 possible to investigate all positive and negative correlations in the range of [−1, 1] beyond two objectives. It should be noted that the correlation structure is defined on the component functions y_ij rather than between the objectives. However, they provided a proof that the expected correlation between a pair of objectives is the same as the correlation defined in the relevant cell of the correlation matrix C: E[corr(f_d(x), f_e(x))] = C_de. Thus, it is possible to precisely control the correlations between the objectives in this class of test problems. Clearly, the correlation structure here is global, that is, the correlation between a pair of objectives is always constant. This suite of test problems is therefore restrictive, but it certainly allows us to explore different characteristics of the interactions between solvers and landscapes. For a detailed treatment of the nature of these problems, and of successful algorithms, please refer to [34–36].
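The statement that a correlation defined on component functions carries over, in expectation, to the objectives can be checked numerically. The Python sketch below is a deliberately simplified stand-in, not the correlated MNK-landscape generator of [34–36]: it ignores epistatic interactions, draws each pair of component values (y_i1, y_i2) from a bivariate standard normal distribution with correlation ρ, and defines each objective as the mean of its n components. Under these assumptions the objective correlation equals ρ, and the empirical Pearson estimate over many random solutions should be close to it. The function names, sample sizes and seed are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_objectives(rho, n_components, rng):
    """Two objectives of one random solution, each the mean of n_components components.

    Component pairs (y_i1, y_i2) are bivariate standard normal with correlation rho
    (a simplifying assumption; there are no epistatic links as in MNK-landscapes)."""
    f1 = f2 = 0.0
    for _ in range(n_components):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = z1
        y2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(y1, y2) = rho
        f1 += y1 / n_components
        f2 += y2 / n_components
    return f1, f2


rng = random.Random(42)
estimates = {}
for rho in (-0.9, 0.0, 0.7):
    sample = [correlated_objectives(rho, 10, rng) for _ in range(5000)]
    estimates[rho] = pearson([a for a, _ in sample], [b for _, b in sample])
    print(f"target {rho:+.1f}  estimated {estimates[rho]:+.3f}")
```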
T. Chugh et al.
9.5.2 Implicit Correlation
9.5.2.1 DTLZ and WFG Problems
DTLZ and WFG problems have been widely used in the literature to test multi-objective optimisation algorithms. They have also been used to validate objective reduction approaches (see Sect. 9.4.3), especially DTLZ2, DTLZ5, DTLZ5(I, M) and WFG3. These problems are scalable in terms of the number of objectives and of decision variables. Consider the formulation of DTLZ2 in Eq. 9.20, where the total number of decision variables is N = M + k − 1, and k is a user-adjustable parameter that controls the number of decision variables. On the POF, all decision variables in the g function (i.e. x_M to x_{M+k−1}) are fixed to 0.5, implying that g = 0, while the other decision variables (i.e. x_1 to x_{M−1}) can take any value within their bounds. The latter variables, which are unconstrained on the POF, are referred to as free variables.
$$
\begin{aligned}
\text{Min. } & f_1 = (1+g)\prod_{i=1}^{M-1}\cos(\theta_i),\\
\text{Min. } & f_j = (1+g)\left(\prod_{i=1}^{M-j}\cos(\theta_i)\right)\sin(\theta_{M-j+1}), \quad j = 2,\dots,M-1,\\
\text{Min. } & f_M = (1+g)\sin(\theta_1),\\
\text{s.t. } & 0 \le x_i \le 1, \quad i = 1,\dots,M+k-1,\\
& \theta_i = \frac{\pi}{2}\,x_i, \quad i = 1,\dots,M-1,\\
& g = \sum_{i=M}^{M+k-1}(x_i - 0.5)^2.
\end{aligned}
\tag{9.20}
$$
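To make Eq. 9.20 concrete, the Python sketch below is a direct transcription of DTLZ2 with 0-based list indices; the function name is our own choice. Because the products of cosines and sines telescope, the objectives satisfy Σ_j f_j² = (1 + g)², so any point whose g-variables are all 0.5 lies on the unit hypersphere, which is the POF shown in Fig. 9.3.

```python
import math
import random


def dtlz2(x, M):
    """Evaluate the M objectives of DTLZ2 (Eq. 9.20) for a decision vector x in [0, 1]^N."""
    theta = [0.5 * math.pi * xi for xi in x[:M - 1]]  # theta_i = (pi/2) x_i, i = 1..M-1
    g = sum((xi - 0.5) ** 2 for xi in x[M - 1:])      # variables x_M .. x_{M+k-1}
    f = []
    for j in range(1, M + 1):                         # objective index j = 1..M
        value = 1.0 + g
        for i in range(M - j):                        # cos(theta_1) ... cos(theta_{M-j})
            value *= math.cos(theta[i])
        if j > 1:                                     # sin(theta_{M-j+1}), 0-based index M-j
            value *= math.sin(theta[M - j])
        f.append(value)
    return f


# A point on the POF: random free variables, all g-variables fixed to 0.5 (g = 0).
rng = random.Random(1)
M, k = 3, 10
x = [rng.random() for _ in range(M - 1)] + [0.5] * k
f = dtlz2(x, M)
print(f, sum(fj ** 2 for fj in f))  # the sum of squares is 1 up to rounding
```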
The number of free variables is M − 1, implying that for M = 2 there is one free variable (θ_1), for M = 3 there are two free variables (θ_1 and θ_2), and so on. For M = 2, the POF is therefore a one-dimensional curve, since θ_1 provides the only degree of freedom, as indicated by the arrow in Fig. 9.3a. For M = 3, the POF is a two-dimensional surface whose two degrees of freedom are provided by θ_1 and θ_2, one sweeping across the horizontal plane and the other across the vertical one, as depicted in Fig. 9.3b. For higher dimensions, the POF is a hypersurface. Hence, two conflicting objectives imply one free variable and a one-dimensional POF; three conflicting objectives imply two free variables and a two-dimensional POF; and, in general, m ≤ M conflicting objectives imply m − 1 free variables and an (m − 1)-dimensional POF. In DTLZ2, m = M and all objectives are equally conflicting, implying that they are not positively correlated. Other DTLZ problems, such as DTLZ1, DTLZ3 and DTLZ4, share the same property, as pointed out in [30]. In DTLZ5 and DTLZ6, the problem can be reduced to only two conflicting objectives, f_{M−1} and f_M; the remaining objectives are positively correlated (i.e. not in conflict) with f_{M−1}. Similarly, the POF of WFG3 degenerates into a linear hyperplane such that the first M − 1 objectives are perfectly correlated, while the last objective is in conflict with all other objectives. Although WFG3 was proposed as a degenerate test problem by [18], its Pareto front is actually not degenerate, as argued by [20].
9 Identifying Correlations in Understanding and Solving Multi-objective Problems
Fig. 9.3 Representation of the POF for the DTLZ2 problem. (a) DTLZ2 with 2 objectives (axes f_1, f_2); (b) DTLZ2 with 3 objectives (axes f_1, f_2, f_3)
DTLZ5(I, M) is an extended version of DTLZ5. First described in [13], this test problem has been used as a benchmark in the objective reduction literature because the dimensionality of its POF can be controlled by the parameter I, with I ≤ M. This is achieved by restricting the number of free variables, as shown in the problem formulation in Eq. 9.21. On the POF, the variables in the g function are fixed to 0.5, implying that g = 0 and θ_i = π/4 for i = I, …, M − 1. The number of free variables is I − 1, so the dimensionality of the POF is I − 1. As a result, the first M − I + 1 objectives are perfectly correlated, while the rest are in conflict with every other objective in the problem.
$$
\begin{aligned}
\text{Min. } & f_1 = (1+g)\prod_{i=1}^{M-1}\cos(\theta_i),\\
\text{Min. } & f_j = (1+g)\left(\prod_{i=1}^{M-j}\cos(\theta_i)\right)\sin(\theta_{M-j+1}), \quad j = 2,\dots,M-1,\\
\text{Min. } & f_M = (1+g)\sin(\theta_1),\\
\text{s.t. } & 0 \le x_i \le 1, \quad i = 1,\dots,M+k-1,\\
& c_i:\ \sum_{j=0}^{I-2} f_{M-j}^2 + 2^{p_i} f_i^2 \ge 1, \quad i = 1,\dots,M-I+1,\\
\text{where } & g = \sum_{i=M}^{M+k-1}(x_i - 0.5)^2,\\
& \theta_i = \frac{\pi}{2}\,x_i, \quad i = 1,\dots,I-1,\\
& \theta_i = \frac{\pi}{4(1+g)}\,(1 + 2 g x_i), \quad i = I,\dots,M-1,\\
& p_1 = M - I, \quad p_i = (M - I + 2) - i, \quad i = 2,\dots,M-I+1.
\end{aligned}
\tag{9.21}
$$
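To see how Eq. 9.21 controls the dimensionality of the POF, the Python sketch below evaluates DTLZ5(I, M). It reads the constraints as Σ_{j=0}^{I−2} f_{M−j}² + 2^{p_i} f_i² ≥ 1 with p_1 = M − I and p_i = (M − I + 2) − i for i ≥ 2, which is the reading under which every constraint holds with equality on the POF, and returns each constraint as a slack value (left-hand side minus 1, feasible when non-negative). The function name and the slack convention are our own assumptions. With g = 0, M = 5 and I = 3, it shows that f_1 = f_2 and f_3 = √2·f_1, i.e. the first M − I + 1 = 3 objectives are perfectly correlated.

```python
import math


def dtlz5_im(x, M, I):
    """Objectives f and constraint slacks c of DTLZ5(I, M) (Eq. 9.21); feasible iff all c >= 0."""
    g = sum((xi - 0.5) ** 2 for xi in x[M - 1:])
    theta = []
    for i in range(1, M):                  # 1-based index i = 1..M-1
        if i <= I - 1:                     # free variables
            theta.append(0.5 * math.pi * x[i - 1])
        else:                              # collapses to pi/4 when g = 0
            theta.append(math.pi / (4.0 * (1.0 + g)) * (1.0 + 2.0 * g * x[i - 1]))
    f = []
    for j in range(1, M + 1):
        value = 1.0 + g
        for i in range(M - j):
            value *= math.cos(theta[i])
        if j > 1:
            value *= math.sin(theta[M - j])
        f.append(value)
    c = []
    for i in range(1, M - I + 2):          # i = 1..M-I+1
        p = (M - I) if i == 1 else (M - I + 2) - i
        lhs = sum(f[M - 1 - j] ** 2 for j in range(I - 1)) + 2 ** p * f[i - 1] ** 2
        c.append(lhs - 1.0)
    return f, c


# M = 5 objectives, I = 3: two free variables; k = 5 variables in g, fixed at 0.5.
M, I = 5, 3
f, c = dtlz5_im([0.2, 0.7, 0.1, 0.9] + [0.5] * 5, M, I)
print([round(v, 4) for v in f])
print(c)  # constraints are active (zero slack up to rounding) on the POF
```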
9.5.2.2 Radar Waveform Problem
The radar waveform problem, first described in [19], deals with the design of a waveform for a pulsed Doppler radar. Radars of this type are typically fitted to aircraft, and their primary aim is to locate and track other aircraft in an air-to-air role. For this, the radar needs to determine range and velocity, taking into account that the target aircraft may travel at very high velocity (possibly Mach 5) and may be more than 100 nautical miles away. The optimisation problem has a total of 9 objective functions: the first 8 are to be minimised, and the last one is to be maximised. The physical meaning of each objective is as follows:
1. Median range/velocity extent of target before schedule is not decodable (f_1/f_2);
2. Median range/velocity extent of target before schedule has blind regions (f_3/f_4);
3. Minimum range/velocity extent of target before schedule is not decodable (f_5/f_6);
4. Minimum range/velocity extent of target before schedule has blind regions (f_7/f_8);
5. Time required to transmit total waveform (f_9).
This real-world problem has been used as a benchmark, in particular to validate objective reduction approaches, because the author of the problem [19] revealed the expected correlations between objectives, and also which objectives should be in conflict. The objective pairs that measure range, f_1 & f_3 and f_5 & f_7, are expected to be correlated. The same holds for the pairs that measure velocity, namely f_2 & f_4 and f_6 & f_8. The objectives that measure range and velocity are expected to be in conflict.
9.6 Summary
In this chapter we have reviewed approaches that employ knowledge extraction techniques (such as correlation measures) to facilitate solving a multi-objective optimisation problem (MOP). These approaches can be found in fields such as data mining, objective reduction and innovization. The knowledge is extracted from solutions generated by an MOEA, whose population-based approach allows multiple optimal solutions to be found in a single optimisation run. The rationale for these approaches is that developing a model of an optimisation problem requires a lot of domain expertise, whereas knowledge about the problem (e.g. relationships between the objectives and decision variables) may not be available or may be far from trivial, especially when dealing with very complex models. As the approaches covered in this review have demonstrated, the knowledge extracted from an MOP can be used to:
1. Reduce the number of objectives, which can counter the limitations of Pareto-based MOEAs in generating a good approximation of the POF, and facilitate the decision-making process;
2. Determine a ranking of the decision variables based on some criterion, which can be used to reduce the dimensionality of the decision space. This can also be used to facilitate the search and decision-making processes.
In this review, we first described the correlation measures that are broadly used in the applied sciences and in numerical optimisation. We have shown that these are useful for indicating whether two objectives are in conflict or in harmony. However, the presence of nonlinearity can affect the accuracy of some of these measures (e.g. the Pearson correlation), whereas others have been shown to be more robust (e.g. the Kendall correlation). Some correlation measures have been used in data mining alongside other methods (e.g. central tendency and variance statistics, or even machine learning approaches such as rough set theory). The same principles are adopted in innovization, where the focus is on giving a designer or practitioner a better understanding of the problem, revealing information that can help the DM find the most desirable solution. In objective reduction, many approaches rely on correlation measures to determine which objectives can be eliminated without affecting the POF, while others rely on the interpretation of the Pareto-dominance structure. In particular, the criterion used by [30, 37] can indicate whether or not two objectives are correlated by comparing their correlation strength against some threshold. Other information that can be of interest to a DM is how much error is incurred if one or more objectives are omitted, as in [6, 22, 30]. The information provided by such an error measure can then be used to conduct δ-MOSS and k-EMOSS analyses, and even to derive a preference ranking of the objectives. Besides the above approaches, this review has covered benchmarks and case studies focusing on correlated objectives.
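As a rough illustration of how a rank correlation can flag harmony or conflict between two objectives, the Python sketch below computes Kendall's τ (the tau-a variant, without tie correction) between two objectives sampled on the same solutions and labels the pair using a user-chosen threshold. The threshold value 0.5, the labels and the function names are illustrative assumptions; this is not the criterion of [30, 37]. Because τ depends only on ranks, a monotone but nonlinear relationship such as f_2 = exp(f_1) still yields τ = 1, whereas the Pearson coefficient of such data is below 1.

```python
import math
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs; ties count as neither."""
    n = len(xs)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if prod > 0:
            score += 1
        elif prod < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def relationship(fa, fb, threshold=0.5):
    """Label two minimised objectives using an assumed threshold on |tau|."""
    tau = kendall_tau(fa, fb)
    if tau >= threshold:
        return "harmony", tau
    if tau <= -threshold:
        return "conflict", tau
    return "weak or none", tau


f1 = [0.1 * i for i in range(1, 21)]
f2 = [math.exp(v) for v in f1]  # increasing but nonlinear in f1
f3 = [1.0 / v for v in f1]      # decreasing in f1
print(relationship(f1, f2))
print(relationship(f1, f3))
```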
Depending on how the correlations are perceived, the problems have been categorised as either explicit or implicit. In the explicit case, it is possible to specify a desired degree of correlation between the objectives, either by tuning a parameter (e.g. the One-Max problem) or by defining a correlation structure (e.g. CMNK and ρMNK). In the implicit case, the correlations between objectives cannot be prescribed (say, by a user), but it is known which objectives are supposed to be correlated. The latter type of problem has been used extensively in the literature to validate objective reduction algorithms.
References
1. H. Abdi, The Kendall rank correlation coefficient, in Encyclopedia of Measurement and Statistics (Sage, Thousand Oaks, CA, 2007), pp. 508–510
2. A. Agresti, Analysis of Ordinal Categorical Data, vol. 656 (Wiley, 2010)
3. H.E. Aguirre, K. Tanaka, Working principles, behavior, and performance of MOEAs on MNK-landscapes. Europ. J. Oper. Res. 181(3), 1670–1690 (2007)
4. S. Bandaru, K. Deb, Automated innovization for simultaneous discovery of multiple rules in bi-objective problems, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2011), pp. 1–15
5. S. Bandaru, A.H. Ng, K. Deb, Data mining methods for knowledge discovery in multi-objective optimization: Part A: Survey. Expert Syst. Appl. 70, 139–159 (2017)
6. D. Brockhoff, E. Zitzler, Improving hypervolume-based multiobjective evolutionary algorithms by using objective reduction methods, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2007), pp. 2086–2093
7. K. Chiba, S. Jeong, S. Obayashi, K. Nakahashi, Knowledge discovery in aerodynamic design space for flyback-booster wing using data mining, in AIAA/AHI Space Planes and Hypersonic System and Technologies Conference (2006), pp. 1–18
8. T. Chugh, R. Allmendinger, V. Ojalehto, K. Miettinen, Surrogate-assisted evolutionary biobjective optimization for objectives with non-uniform latencies, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2018), pp. 609–616
9. H. Cramér, Mathematical Methods of Statistics (Princeton University Press, 1999)
10. F.A. Csaszar, A note on how NK landscapes work. J. Organ. Design 7(1), 15 (2018)
11. S.J. Daniels, A.A.M. Rahat, G.R. Tabor, J.E. Fieldsend, R.M. Everson, Automated shape optimisation of a plane asymmetric diffuser using combined computational fluid dynamic simulations and multi-objective Bayesian methodology. Int. J. Comput. Fluid Dyn. 33, 256–271 (2019)
12. A.R.R. de Freitas, P.J. Fleming, F.G. Guimarães, Aggregation trees for visualization and dimension reduction in many-objective optimization. Inf. Sci. 298, 288–314 (2015)
13. K. Deb, D. Saxena, Searching for Pareto-optimal solutions through dimensionality reduction for certain large-dimensional multi-objective optimization problems, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2006), pp. 3353–3360
14. K. Deb, L. Thiele, M. Laumanns, E. Zitzler, Scalable test problems for evolutionary multiobjective optimization, in Evolutionary Multiobjective Optimization, ed. by A. Abraham, L. Jain, R. Goldberg (Springer, 2005), pp. 105–145
15. J.A. Duro, D.K. Saxena, K. Deb, Q. Zhang, Machine learning based decision support for many-objective optimization problems. Neurocomputing 146, 30–47 (2014)
16. L.A. Goodman, W.H. Kruskal, Measures of association for cross classifications. J. Amer. Stat. Assoc. 49(268), 732–764 (1954)
17. L.L. Havlicek, N.L. Peterson, Robustness of the Pearson correlation against violations of assumptions. Percept. Motor Skills 43(3_suppl), 1319–1334 (1976)
18. S. Huband, L. Barone, L. While, P. Hingston, A scalable multi-objective test problem toolkit, in Evolutionary Multi-Criterion Optimization, ed. by C.A. Coello Coello, A. Hernández Aguirre, E. Zitzler (Springer, Berlin, Heidelberg, 2005), pp. 280–295
19. E.J. Hughes, Radar waveform optimisation as a many-objective application benchmark, in Lecture Notes in Computer Science, vol. 4403 (2007), pp. 700–714
20. H. Ishibuchi, H. Masuda, Y. Nojima, Pareto fronts of many-objective degenerate test problems. IEEE Trans. Evolut. Comput. (2016)
21. H. Ishibuchi, N. Tsukamoto, Y. Nojima, Evolutionary many-objective optimization: a short review, in 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence) (2008), pp. 2419–2426
22. A.L. Jaimes, C.A. Coello Coello, D. Chakraborty, Objective reduction using a feature selection technique, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2008), pp. 673–680
23. M.G. Kendall, A new measure of rank correlation. Biometrika 30(1–2), 81–93 (1938)
24. M.G. Kendall, Rank Correlation Methods (Charles Griffin & Co., 1948)
25. K. Musselman, J. Talavage, A tradeoff cut approach to multiple objective optimization. Oper. Res. 28(6), 1424–1435 (1980)
26. A.J. Onwuegbuzie, L.G. Daniel, Uses and misuses of the correlation coefficient (1999)
27. K. Pearson, Notes on regression and inheritance in the case of two parents. Proc. R. Soc. Lond. 58, 240–242 (1895)
28. R.C. Purshouse, P.J. Fleming, Conflict, harmony, and independence: relationships in evolutionary multi-criterion optimisation, in Evolutionary Multi-Criterion Optimization, ed. by C.M. Fonseca, P.J. Fleming, E. Zitzler, L. Thiele, K. Deb (Springer, Berlin, Heidelberg, 2003), pp. 16–30
29. R.C. Purshouse, P.J. Fleming, On the evolutionary optimization of many conflicting objectives. IEEE Trans. Evolut. Comput. 11(6), 770–784 (2007)
30. D.K. Saxena, J.A. Duro, A. Tiwari, K. Deb, Q. Zhang, Objective reduction in many-objective optimization: linear and nonlinear algorithms. IEEE Trans. Evolut. Comput. 17(1), 77–99 (2013)
31. H.K. Singh, A. Isaacs, T. Ray, A Pareto corner search evolutionary algorithm and dimensionality reduction in many-objective optimization problems. IEEE Trans. Evolut. Comput. 15(4), 539–556 (2011)
32. C. Spearman, The proof and measurement of association between two things. Amer. J. Psychol. 15(1), 72–101 (1904)
33. A.K.A. Talukder, K. Deb, PaletteViz: a visualization method for functional understanding of high-dimensional Pareto-optimal data-sets to aid multi-criteria decision making. IEEE Comput. Intell. Mag. 15(2), 36–48 (2020)
34. S. Verel, A. Liefooghe, L. Jourdan, C. Dhaenens, Analyzing the effect of objective correlation on the efficient set of MNK-landscapes, in Learning and Intelligent Optimization (LION) (Springer, 2011), pp. 116–130
35. S. Verel, A. Liefooghe, L. Jourdan, C. Dhaenens, Pareto local optima of multiobjective NK-landscapes with correlated objectives, in European Conference on Evolutionary Computation in Combinatorial Optimization (Springer, 2011), pp. 226–237
36. S. Verel, A. Liefooghe, L. Jourdan, C. Dhaenens, On the structure of multiobjective combinatorial search space: MNK-landscapes with correlated objectives. Europ. J. Oper. Res. 227(2), 331–342 (2013)
37. H. Wang, X. Yao, Objective reduction based on nonlinear correlation information entropy. Soft Comput. 20(6), 2393–2407 (2016)
38. Q. Wang, Y. Shen, J.Q. Zhang, A nonlinear correlation measure for multivariable data set. Physica D: Nonlinear Phen. 200(3), 287–295 (2005)
39. Y. Yuan, Y.S. Ong, A. Gupta, H. Xu, Objective reduction in many-objective optimization: evolutionary multiobjective approaches and comprehensive analysis. IEEE Trans. Evolut. Comput. 22(2), 189–210 (2018)
Part II
Emerging Topics
Chapter 10
Bayesian Optimization
Hao Wang and Kaifeng Yang
Abstract Bayesian Optimization (BO) is a sequential optimization strategy initially proposed to solve single-objective black-box optimization problems that are costly to evaluate. Built on the surrogate-assisted modeling technique, BO has shown superior empirical performance in many real-world applications, including engineering design and hyper-parameter tuning for machine learning algorithms. Many approaches have also been proposed to generalize the BO algorithm to multi-objective optimization problems, where we aim to find the Pareto front representing the optimal trade-off between conflicting objectives. In this chapter, we provide an overview of the algorithm, discuss important aspects of BO such as the choice of surrogate models, and focus heavily on the single- and multi-objective acquisition functions that underpin this algorithm. We also include some real-world applications in which BO plays a vital role.
10.1 Introduction
In this chapter, we elaborate on the well-known Bayesian Optimization (BO) algorithm with a special focus on its multi-objective extension. BO is a sequential optimization strategy targeting black-box global optimization problems that are very expensive to evaluate. Underpinned by the surrogate-assisted modeling technique, it sequentially selects promising solutions using the so-called acquisition function, which relies on the prediction and uncertainty of the surrogate model and balances the trade-off between exploration and exploitation. In each iteration, the surrogate model is updated with the new solution and its corresponding objective value. The BO algorithm has demonstrated superior efficiency over alternative black-box optimizers, e.g., Evolutionary Algorithms [5], in terms of function evaluations. Therefore, it has been extensively employed in applications where function evaluations are computationally expensive, for instance, automated machine learning [35], where it tunes the hyperparameters of machine learning algorithms, materials science [59], aerospace engineering [65], and structural topology optimization [83]. Although BO stems from the endeavor to efficiently solve single-objective optimization problems, it has already been generalized to the multi-objective scenario, where various approaches modify and extend the acquisition function and the surrogate model to handle multiple objective functions simultaneously.
To keep this chapter self-contained, we start with preliminary notations and definitions for the multi-objective optimization problem (Sect. 10.1.1), which are used frequently in later sections. Then, we give a general introduction to BO in Sect. 10.2, followed by a detailed description of the surrogate-assisted modeling technique (Sect. 10.3). In Sect. 10.4.1, we summarize the acquisition functions most commonly employed in the single-objective case, and in Sect. 10.4.2, we present multi-objective extensions of acquisition functions. In addition, parallelization techniques for BO and constraint handling are described in Sects. 10.4.3 and 10.4.4, respectively. Finally, we conclude this chapter by listing important applications of the BO algorithm.

H. Wang (corresponding author) Leiden Institute of Advanced Computer Science, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands e-mail: [email protected]
K. Yang University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria e-mail: [email protected]
© Springer Nature Switzerland AG 2023 D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_10
10.1.1 Definitions and Notations
In this chapter, we consider, without loss of generality, the minimization of the following multi-objective optimization (MOO) problem:
$$\mathbf{f}: \mathcal{X} \to \mathbb{R}^m, \quad \mathbf{x} \mapsto \left[f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x})\right],$$
with $m$ real-valued objective functions $\mathbf{f} = \{f_1, \ldots, f_m\}$, $f_i: \mathcal{X} \to \mathbb{R}$, $i = 1, \ldots, m$. Here, $m$ indicates the number of objective functions that have to be dealt with simultaneously. Throughout this chapter, we denote the decision space as $\mathcal{X}$, which in practice could be a subset of the Euclidean space, $\mathcal{X} \subseteq \mathbb{R}^d$, a discrete alphabet, $\mathcal{X} \subseteq \{0, 1\}^d$, or even a mixed space, e.g., $\mathcal{X} \subseteq \mathbb{R}^{d_1} \times \mathbb{N}^{d_2} \times \{0, 1\}^{d_3}$. We use $d$ to denote the dimension of the decision space.
For a MOO problem, the Pareto order (Pareto dominance) is commonly assumed in the objective space for comparing solutions. For $\mathbf{y}, \mathbf{y}' \in \mathbb{R}^m$, we say $\mathbf{y}$ dominates $\mathbf{y}'$, written as $\mathbf{y} \prec \mathbf{y}'$, if and only if (iff) $\forall i \in [1..m]: y_i \le y'_i \,\wedge\, \exists j \in [1..m]: y_j < y'_j$. The Pareto order can also be extended to sets of points: for $A, B \subseteq \mathbb{R}^m$, $A$ dominates $B$ (written as $A \prec B$) iff $\forall \mathbf{y} \in B\; \exists \mathbf{x} \in A: \mathbf{x} \prec \mathbf{y}$. A point $\mathbf{x} \in \mathcal{X}$ is called a global efficient point if there is no other point $\mathbf{x}' \in \mathcal{X}$ such that $\mathbf{f}(\mathbf{x}') \prec \mathbf{f}(\mathbf{x})$. The set of all global efficient points constitutes the Pareto efficient set, whose image under $\mathbf{f}$ is known as the Pareto front. We do not repeat further standard definitions here and refer the reader to [4, 26, 121] for definitions pertaining to MOO problems, e.g., local efficiency, performance indicators, and Pareto compliance.
Notations. Denoting by $n$ the current number of evaluated points, we represent those points as $X = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}\}$ and the corresponding objective values as $Y = \{\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \ldots, \mathbf{y}^{(n)}\}$, where $\mathbf{y}^{(i)} = (f_1(\mathbf{x}^{(i)}), f_2(\mathbf{x}^{(i)}), \ldots, f_m(\mathbf{x}^{(i)}))$ for $i \in [1..n]$. When elaborating on surrogate modeling for each objective function (Sect. 10.3.1), we collect all evaluations of each objective function in a vector, namely $\boldsymbol{\psi}_i = (f_i(\mathbf{x}^{(1)}), f_i(\mathbf{x}^{(2)}), \ldots, f_i(\mathbf{x}^{(n)}))^\top$ for $i \in [1..m]$, to ease the discussion. Also, we denote unknown (not yet evaluated) decision points with a star in the subscript, e.g., $\mathbf{x}_*^{(1)}$, to distinguish them from points in $X$. In the objective space, we denote by $P$ the non-dominated subset of $Y$ (a.k.a. the Pareto approximation set). Sometimes it is convenient to use the set of points that are not dominated by $P$, which is denoted by $\mathrm{ndom}(P)$. Other specific notations are introduced where they are needed.
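The dominance relations above translate directly into code. The following Python sketch assumes minimization and implements three things: the point-wise Pareto order, its extension to sets, and the extraction of the non-dominated subset $P$ of $Y$ by a simple pairwise check. The function names `dominates`, `set_dominates`, and `nondominated` are our own hypothetical helpers, not part of any library.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(y: Vector, y_prime: Vector) -> bool:
    """True iff y Pareto-dominates y_prime (minimization):
    y_i <= y'_i for all i and y_j < y'_j for at least one j."""
    if len(y) != len(y_prime):
        raise ValueError("objective vectors must have the same length m")
    no_worse = all(a <= b for a, b in zip(y, y_prime))
    strictly_better = any(a < b for a, b in zip(y, y_prime))
    return no_worse and strictly_better


def set_dominates(A: List[Vector], B: List[Vector]) -> bool:
    """Set extension used in the text: every y in B is dominated by some x in A."""
    return all(any(dominates(x, y) for x in A) for y in B)


def nondominated(Y: List[Vector]) -> List[Vector]:
    """Non-dominated subset P of Y via an O(n^2 m) pairwise check.
    A point never dominates itself, so no self-exclusion is needed."""
    return [y for y in Y if not any(dominates(z, y) for z in Y)]


if __name__ == "__main__":
    Y = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
    print(nondominated(Y))  # → [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

The quadratic check is adequate for the small archives typical of expensive BO runs. For large sets, sorting-based non-dominated filtering would be preferable.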
10.2 Bayesian Optimization
Bayesian Optimization [37, 58, 71, 87, 91] is a sequential optimization algorithm proposed to solve single-objective black-box optimization problems that are costly to evaluate. Here, we first restrict our discussion to the single-objective case, i.e., $f: \mathcal{X} \to \mathbb{R}$, and then generalize it to the multi-objective scenario in the following subsections. BO typically starts by sampling an initial design of experiment (DoE) $X \subseteq \mathcal{X}$ of size $n$, usually generated by simple random sampling, Latin Hypercube Sampling [86], or more sophisticated low-discrepancy sequences [73] (e.g., Sobol sequences). Taking the initial DoE $X$ and its corresponding set of objective values $Y \subseteq \mathbb{R}$, we construct a statistical model $\mathcal{M}$ describing the probability distribution of the objective function $f$ conditioned on the initial evidence, namely $\Pr(f \mid Y)$. In most application scenarios of BO, a priori knowledge about $f$ is lacking. Therefore, nonparametric models (e.g., Gaussian process regression or random forests; see Sect. 10.3) are commonly chosen for $\mathcal{M}$. Such models give rise to a prediction $\hat f(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}$ and an uncertainty quantification $\hat s^2(\mathbf{x})$ that estimates, for instance, the mean squared error of the prediction, $\mathbb{E}[(\hat f(\mathbf{x}) - f(\mathbf{x}))^2]$. Based on $\hat f$ and $\hat s^2$, promising points can be identified via the so-called acquisition function, which balances exploitation with exploration during the optimization process. A variety of acquisition functions have been proposed in the literature, e.g., Probability of Improvement [71], Expected Improvement [58], Generalized Expected Improvement [89], the Moment-Generating Function of Improvement [105], Upper Confidence Bound [95], Predictive Entropy Search [53], and Thompson Sampling [60]. Please see Sect. 10.4 for a detailed discussion of acquisition functions. In addition, we list the pseudo-code of BO in Algorithm 10.1 and demonstrate a simple example of BO on a 1D objective function in Fig. 10.1.
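As a rough illustration of how an initial DoE might be generated, the sketch below implements Latin Hypercube Sampling and a Halton low-discrepancy sequence on the unit hypercube $[0, 1]^d$, using only the Python standard library. It uses Halton rather than Sobol sequences because Halton needs no direction numbers. The function names, the fixed list of prime bases, and the `skip` parameter (which drops the all-zero first point) are our own illustrative assumptions. The resulting points would still have to be rescaled to the actual bounds of $\mathcal{X}$.

```python
import random
from typing import List

PRIME_BASES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one base per dimension (d <= 10 here)


def latin_hypercube(n: int, d: int, rng: random.Random) -> List[List[float]]:
    """n points in [0, 1]^d: each axis is cut into n equal strata and every
    stratum receives exactly one point, placed uniformly inside it."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([(s + rng.random()) / n for s in strata])
    return [[columns[j][i] for j in range(d)] for i in range(n)]


def radical_inverse(i: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-`base` digits of i about the radix point."""
    result, scale = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * scale
        scale /= base
    return result


def halton(n: int, d: int, skip: int = 1) -> List[List[float]]:
    """First n points of the d-dimensional Halton sequence (index offset by `skip`)."""
    if d > len(PRIME_BASES):
        raise ValueError("extend PRIME_BASES for d > 10")
    return [[radical_inverse(i + skip, PRIME_BASES[j]) for j in range(d)] for i in range(n)]


if __name__ == "__main__":
    d = 2
    n0 = 10 * d  # the 10d rule of thumb from [58] for the initial sample size
    print(latin_hypercube(n0, d, random.Random(42))[:3])
    print(halton(3, d))  # [[0.5, 0.333...], [0.25, 0.666...], [0.75, 0.111...]]
```

The LHS guarantees one point per stratum along every axis but leaves the pairing across axes random. The Halton points are deterministic and spread evenly by construction.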
It remains a daunting task to choose the sample size and sampling method for the initial DoE, and the recommended sampling methods differ considerably across the literature; please see [6, 81] for a discussion. As for the initial sample size $n_0$, the value $10d$ recommended in [58] is commonly adopted in many applications. At the other extreme, the SMAC algorithm [54] (Sequential Model-Based Optimization for General Algorithm Configuration), a BO variant for the algorithm configuration task, starts with a single random design point. More recently, [12] empirically compared simple random sampling against Latin Hypercube sampling and low-discrepancy sequences across various sample sizes on the well-known BBOB (Black-Box Optimization Benchmarking) problem set [49]. In that comparison, Halton sequences [73] with a small sample size outperformed the other combinations of sampling method and sample size. In general, how to automatically select a proper sample size and sampling method for a particular application of BO remains an interesting open question.
As for the statistical model $\mathcal{M}$, Gaussian process regression (GPR) is frequently chosen when the decision variables are real-valued, i.e., $\mathcal{X} \subseteq \mathbb{R}^d$. When the search space is composed of real, integer, and categorical values, the random forest regression model is often plugged into the BO framework, e.g., in the SMAC algorithm [54]. We delve into this topic in the next section.

Algorithm 10.1: Bayesian Optimization
1  BO(f, 𝒜, 𝒳, M, θ)
   /* f: objective function, 𝒜: acquisition function, 𝒳: search space, θ: parameter of 𝒜, M: a surrogate model to train */
2  Generate the initial DoE: X = {x1, x2, ..., xn} ⊂ 𝒳;
3  Evaluate Y ← {f(x1), f(x2), ..., f(xn)};
4  Train a surrogate model M on (X, Y);
5  while the stop criteria are not fulfilled do
6      x' ← arg max_{x ∈ 𝒳} 𝒜(x; M, θ);
7      y' ← f(x');
8      X ← X ∪ {x'};
9      Y ← Y ∪ {y'};
10     Re-train the surrogate model M on (X, Y)

Another important aspect of this algorithm is the optimization of the acquisition function (line 6 of Algorithm 10.1). This is not a trivial task, since the landscape of the acquisition function is intrinsically multi-modal. In the original publication of Efficient Global Optimization [58], where the GPR model is coupled with the expected improvement function, the authors tackled this problem with the branch-and-bound method [3], an exact search algorithm that is guaranteed to find the global optimum of $\mathcal{A}$. This method, however, requires a convex relaxation of the predictor $\hat f$ and its uncertainty measure $\hat s^2$, which cannot easily be generalized to other models.
Practically, it is more common to use first- and second-order optimization techniques, e.g., the well-known BFGS algorithm [36], when the decision variables take real values (so that the gradient/Hessian of the acquisition function can be computed). When the decision space consists of real, integer, and discrete variables, one resorts instead to black-box optimization algorithms, for instance, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [50], the Mixed Integer Evolution Strategy (MIES) [67], and the combination of CMA-ES and an Estimation of Distribution Algorithm (EDA) [7]. To increase the probability of locating the global optimum, [102] proposed adopting a niching-based evolution strategy. It essentially spawns several independent evolution strategies and keeps them apart, so that each evolution strategy ideally searches a different basin of attraction. Surprisingly, in automated algorithm selection and configuration [100], where BO is often employed, random search also works well for optimizing the acquisition function [35, 54].

Fig. 10.1 On a sine function (the black curve), we illustrate three steps of a BO algorithm underpinned by a Gaussian process regression (GPR) model (the green curve shows its prediction and the blue shaded band indicates its prediction uncertainty), in which one candidate point (the red dot) is proposed in each iteration by maximizing the expected improvement function (the red curve). The GPR model is updated iteratively using the candidate point and its objective value.

To extend the BO framework to multi-objective optimization problems, one can use an indicator that evaluates the performance of a specific solution relative to an existing Pareto-front approximation set. Maximizing (or minimizing) this indicator transforms the multi-objective optimization problem into a single-objective one. This indicator is often called the infill criterion or the acquisition function. The theoretical properties of the infill criterion determine its search behavior during the optimization process. Specifically, some indicators focus on exploration (e.g., Probability of Improvement), some work in a purely exploitative way (e.g., Hypervolume Improvement), and others balance exploration and exploitation (e.g., Expected Hypervolume Improvement). In multi-objective Bayesian optimization (MOBO), it is usually assumed that there is no correlation among the objectives. Like single-objective Bayesian optimization, MOBO is usually applied to continuous optimization problems. Mixed-integer optimization problems can be handled by replacing the Euclidean metric with a heterogeneous metric [115] or by using a one-hot encoding strategy [39].
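To make the loop of Algorithm 10.1 concrete, here is a minimal, self-contained Python sketch that minimizes a 1D sine function, in the spirit of Fig. 10.1. It is illustrative only and rests on several simplifying assumptions of ours:
- the GPR hyper-parameters ($\sigma^2 = 1$, $\theta = 1$) are fixed rather than estimated by maximum likelihood;
- a tiny nugget is added to the kernel matrix for numerical stability;
- the acquisition function is expected improvement (Sect. 10.4.1);
- line 6 of Algorithm 10.1 is solved by exhaustive search over a uniform grid rather than by BFGS or CMA-ES.

All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def kernel(a: float, b: float, sigma2: float = 1.0, theta: float = 1.0) -> float:
    """1D Gaussian (RBF) kernel."""
    return sigma2 * math.exp(-((a - b) ** 2) / (2.0 * theta ** 2))


def cholesky(A: List[List[float]]) -> List[List[float]]:
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve L z = b for lower-triangular L."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def gp_fit(X: List[float], y: List[float], nugget: float = 1e-8):
    """'Train' the GPR: factorize K and precompute K^{-1} y (hyper-parameters fixed)."""
    K = [[kernel(a, b) + (nugget if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    z = forward_sub(L, y)
    n = len(X)
    alpha = [0.0] * n  # back substitution: L^T alpha = z
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(L[k][i] * alpha[k] for k in range(i + 1, n))) / L[i][i]
    return list(X), L, alpha


def gp_predict(model, x: float) -> Tuple[float, float]:
    """Posterior mean and standard deviation at x."""
    X, L, alpha = model
    k = [kernel(x, xi) for xi in X]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = forward_sub(L, k)
    var = max(kernel(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean: float, sd: float, f_min: float) -> float:
    z = (f_min - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_min - mean) * cdf + sd * pdf


def bayesian_optimization(f: Callable[[float], float], lower: float, upper: float,
                          initial_doe: List[float], n_iter: int = 12,
                          grid_size: int = 401) -> Tuple[float, float]:
    X = list(initial_doe)                      # line 2: initial DoE
    Y = [f(x) for x in X]                      # line 3: evaluate it
    model = gp_fit(X, Y)                       # line 4: train the surrogate
    grid = [lower + (upper - lower) * i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_iter):                    # line 5: fixed budget as stop criterion
        f_min = min(Y)
        candidates = [x for x in grid if all(abs(x - xi) > 1e-9 for xi in X)]
        x_new = max(candidates,                # line 6: maximize EI over the grid
                    key=lambda x: expected_improvement(*gp_predict(model, x), f_min))
        X.append(x_new)                        # lines 7-9: evaluate and augment data
        Y.append(f(x_new))
        model = gp_fit(X, Y)                   # line 10: re-train
    best = min(range(len(Y)), key=Y.__getitem__)
    return X[best], Y[best]


if __name__ == "__main__":
    x_best, y_best = bayesian_optimization(math.sin, 0.0, 2.0 * math.pi, [0.5, 3.0, 5.5])
    print(f"best x = {x_best:.3f}, f(x) = {y_best:.4f}")  # sin attains -1 at 3*pi/2 on [0, 2*pi]
```

A grid is only viable in very low dimensions. In practice, line 6 would be handled by one of the optimizers discussed above.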
10.3 Surrogate-Assisted Modeling
In this section, we discuss in detail the surrogate-assisted modeling technique commonly employed in Bayesian Optimization. We primarily focus on the well-known Gaussian process regression model, since it is deeply rooted in the inner workings of BO and is also extensively exploited for solving multi-objective optimization problems.
10.3.1 Gaussian Process Regression
In the Bayesian setup, we model the objective function $f$ as a centered Gaussian process (GP) prior, $f \sim \mathcal{GP}(0, k)$. Here $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a positive definite function (a.k.a. kernel) that computes the auto-covariance of the process, i.e., $\forall \mathbf{x}, \mathbf{x}' \in \mathcal{X}: k(\mathbf{x}, \mathbf{x}') = \mathrm{Cov}\{f(\mathbf{x}), f(\mathbf{x}')\}$. Commonly, we choose a stationary (namely, translation-invariant) kernel $k$, for example, the so-called Gaussian kernel (a.k.a. radial basis function (RBF)) [16]:
$$k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\sum_{i=1}^{d} \frac{(x_i - x'_i)^2}{2\theta_i^2}\right), \tag{10.1}$$
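where $\sigma^2$ models the variance of function values at each point and the $\theta_i$'s are known as the length-scale parameters, controlling the correlation between different points along each dimension. Also, $\sigma^2$ and the $\theta_i$'s are the hyper-parameters of this model, which are typically estimated from data using the maximum likelihood principle. In practice, it is recommended to employ the Matérn kernel family, as it gives direct control over the smoothness of the prior process [84, 96]. Please see [43] for an in-depth discussion of kernel functions. Consider a realization of $f$ at $n$ locations $X = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}\} \subseteq \mathcal{X}$, expressed as the vector $\boldsymbol{\psi} = (f(\mathbf{x}^{(1)}), f(\mathbf{x}^{(2)}), \ldots, f(\mathbf{x}^{(n)}))^\top$. Bayesian inference yields the posterior distribution of $f$, i.e., $p(f(\mathbf{x}) \mid \boldsymbol{\psi}) \propto p(\boldsymbol{\psi} \mid f(\mathbf{x}))\, p(f(\mathbf{x}))$. Note that this posterior is simply the conditional distribution of $f(\mathbf{x})$ given $\boldsymbol{\psi}$, which can be derived directly because $f(\mathbf{x})$ and $\boldsymbol{\psi}$ are jointly Gaussian:
$$\begin{bmatrix} f(\mathbf{x}) \\ \boldsymbol{\psi} \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \sigma^2 & \mathbf{k}^\top \\ \mathbf{k} & \mathbf{K} \end{bmatrix}\right),$$
where $K_{ij} = k(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})$ and $\mathbf{k} = \mathbf{k}(\mathbf{x}) = (k(\mathbf{x}, \mathbf{x}^{(1)}), k(\mathbf{x}, \mathbf{x}^{(2)}), \ldots, k(\mathbf{x}, \mathbf{x}^{(n)}))^\top$. Conditioning on $\boldsymbol{\psi}$, we obtain the posterior of $f$:
$$f(\mathbf{x}) \mid \boldsymbol{\psi} \sim \mathcal{N}\left(\mathbf{k}^\top \mathbf{K}^{-1} \boldsymbol{\psi},\; \sigma^2 - \mathbf{k}^\top \mathbf{K}^{-1} \mathbf{k}\right). \tag{10.2}$$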
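Given this posterior, the best unbiased predictor of $f(\mathbf{x})$ is the posterior mean function $\hat f(\mathbf{x}) = \mathbf{k}^\top \mathbf{K}^{-1} \boldsymbol{\psi}$, which is also the maximum a posteriori (MAP) estimate. The MSE of $\hat f$ is $\hat s^2(\mathbf{x}) = \mathbb{E}\{(\hat f(\mathbf{x}) - f(\mathbf{x}))^2\} = \sigma^2 - \mathbf{k}^\top \mathbf{K}^{-1} \mathbf{k}$, which is also the posterior variance. We refer to this statistical model as Gaussian Process Regression (GPR) in this chapter. Moreover, to see the covariance structure of the posterior process, consider two query points $\mathbf{x}_*^{(1)}, \mathbf{x}_*^{(2)} \in \mathcal{X}$:
$$\begin{bmatrix} f(\mathbf{x}_*^{(1)}) \\ f(\mathbf{x}_*^{(2)}) \\ \boldsymbol{\psi} \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \sigma^2 & k_{12} & \mathbf{k}_1^\top \\ k_{21} & \sigma^2 & \mathbf{k}_2^\top \\ \mathbf{k}_1 & \mathbf{k}_2 & \mathbf{K} \end{bmatrix}\right),$$
in which $\mathbf{k}_1 = \mathbf{k}(\mathbf{x}_*^{(1)})$, $\mathbf{k}_2 = \mathbf{k}(\mathbf{x}_*^{(2)})$, and $k_{12} = k_{21} = k(\mathbf{x}_*^{(1)}, \mathbf{x}_*^{(2)})$. After conditioning on $\boldsymbol{\psi}$, we obtain the following distribution:
$$\begin{bmatrix} f(\mathbf{x}_*^{(1)}) \\ f(\mathbf{x}_*^{(2)}) \end{bmatrix} \Bigg|\, \boldsymbol{\psi} \sim \mathcal{N}\left(\begin{bmatrix} \mathbf{k}_1^\top \mathbf{K}^{-1} \boldsymbol{\psi} \\ \mathbf{k}_2^\top \mathbf{K}^{-1} \boldsymbol{\psi} \end{bmatrix}, \begin{bmatrix} \sigma^2 - \mathbf{k}_1^\top \mathbf{K}^{-1} \mathbf{k}_1 & k_{12} - \mathbf{k}_1^\top \mathbf{K}^{-1} \mathbf{k}_2 \\ k_{21} - \mathbf{k}_2^\top \mathbf{K}^{-1} \mathbf{k}_1 & \sigma^2 - \mathbf{k}_2^\top \mathbf{K}^{-1} \mathbf{k}_2 \end{bmatrix}\right). \tag{10.3}$$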
In this posterior formulation, the covariance between two arbitrary locations appears in the off-diagonal terms of the posterior covariance matrix. Consequently, we define the posterior kernel $k^{\text{post}}$ as follows:
$$k^{\text{post}}(\mathbf{x}_*^{(1)}, \mathbf{x}_*^{(2)}) := \mathrm{Cov}\{f(\mathbf{x}_*^{(1)}), f(\mathbf{x}_*^{(2)}) \mid \boldsymbol{\psi}\} = k(\mathbf{x}_*^{(1)}, \mathbf{x}_*^{(2)}) - \mathbf{k}_1^\top \mathbf{K}^{-1} \mathbf{k}_2, \tag{10.4}$$
giving rise to the posterior GP, namely $f \mid \boldsymbol{\psi} \sim \mathcal{GP}(\hat f, k^{\text{post}})$. The posterior GP is essential for proposing a batch of points in each iteration, e.g., via the multi-point expected improvement [21] (a.k.a. q-EI, see Sect. 10.4.3), since such a task requires the correlation among multiple points on the posterior process. It is worth noting that we only consider the centered GP prior here for brevity; more complicated priors are also common. For instance, it is sometimes beneficial to implement a prior mean function (possibly with unknown parameters) to incorporate a priori knowledge about the objective function [75, 84]. In [90], the Student-t process is also suggested as an alternative to the GP prior.
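The posterior formulas (10.1), (10.2), and (10.4) can be sketched in plain Python as follows. This is a minimal illustration under our own assumptions: the class name `GPR` is hypothetical, the hyper-parameters are fixed and passed in by the caller, and a small nugget is added to the diagonal of $\mathbf{K}$ for numerical stability. It relies on a Cholesky factorization $\mathbf{K} = \mathbf{L}\mathbf{L}^\top$, so that $\mathbf{k}_1^\top \mathbf{K}^{-1} \mathbf{k}_2 = (\mathbf{L}^{-1}\mathbf{k}_1)^\top (\mathbf{L}^{-1}\mathbf{k}_2)$ is computed without forming $\mathbf{K}^{-1}$ explicitly. Maximum likelihood estimation of $\sigma^2$ and the $\theta_i$ is omitted.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, x2: Vector, sigma2: float, theta: Sequence[float]) -> float:
    """Eq. (10.1): sigma^2 * exp(-sum_i (x_i - x'_i)^2 / (2 theta_i^2))."""
    return sigma2 * math.exp(-sum((a - b) ** 2 / (2.0 * t ** 2) for a, b, t in zip(x, x2, theta)))


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L z = b."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def backward(L: List[List[float]], z: Sequence[float]) -> List[float]:
    """Solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPR:
    """Centered GP prior conditioned on noise-free observations psi at X."""

    def __init__(self, X: List[Vector], psi: Sequence[float], sigma2: float = 1.0,
                 theta: Optional[Sequence[float]] = None, nugget: float = 1e-10):
        self.X, self.sigma2 = list(X), sigma2
        self.theta = list(theta) if theta is not None else [1.0] * len(X[0])
        K = [[rbf_kernel(a, b, sigma2, self.theta) + (nugget if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        self.L = cholesky(K)
        self.alpha = backward(self.L, forward(self.L, psi))  # K^{-1} psi

    def _k(self, x: Vector) -> List[float]:
        """The vector k(x) = (k(x, x^(1)), ..., k(x, x^(n)))."""
        return [rbf_kernel(x, xi, self.sigma2, self.theta) for xi in self.X]

    def mean(self, x: Vector) -> float:
        """Posterior mean f_hat(x) = k^T K^{-1} psi, Eq. (10.2)."""
        return sum(ki * ai for ki, ai in zip(self._k(x), self.alpha))

    def k_post(self, x1: Vector, x2: Vector) -> float:
        """Posterior kernel, Eq. (10.4): k(x1, x2) - k1^T K^{-1} k2."""
        v1, v2 = forward(self.L, self._k(x1)), forward(self.L, self._k(x2))
        return rbf_kernel(x1, x2, self.sigma2, self.theta) - sum(a * b for a, b in zip(v1, v2))

    def variance(self, x: Vector) -> float:
        """Posterior variance s_hat^2(x) = k_post(x, x), clipped at 0 against round-off."""
        return max(self.k_post(x, x), 0.0)
```

For example, `GPR([[0.0], [1.0], [2.0]], [0.0, 1.0, 0.0], theta=[0.7])` interpolates the data: its mean at `[1.0]` is (numerically) 1 with near-zero variance. Far from the data, the mean reverts to 0 and the variance reverts to $\sigma^2$. The off-diagonal `k_post` values are exactly what batch criteria such as q-EI consume.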
10.3.2 GP for Multi-objective Problems
When it comes to multi-objective problems (whose objective function is essentially vector-valued), the surrogate modeling techniques must be generalized to vector-valued functions. The most straightforward approach is to construct independent surrogate models for each objective function. Given $n$ observed points $X = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}\}$ and their corresponding function values $\boldsymbol{\psi}_1, \boldsymbol{\psi}_2, \ldots, \boldsymbol{\psi}_m$ with $\boldsymbol{\psi}_i = (f_i(\mathbf{x}^{(1)}), \ldots, f_i(\mathbf{x}^{(n)}))^\top$, this approach constructs an independent GPR for each objective function, i.e., $f_i \mid \boldsymbol{\psi}_i \sim \mathcal{GP}(\hat f_i, k_i^{\text{post}})$, $i = 1, 2, \ldots, m$. This gives rise to the following posterior process on the vector-valued objective function:
$$\mathbf{f} \mid \boldsymbol{\psi}_1, \boldsymbol{\psi}_2, \ldots, \boldsymbol{\psi}_m \sim \mathcal{GP}\left(\hat{\mathbf{f}}, \bigotimes_{i=1}^{m} k_i^{\text{post}}\right), \tag{10.5}$$
where $\otimes$ stands for the tensor product and $\hat{\mathbf{f}} = (\hat f_1, \hat f_2, \ldots, \hat f_m)$. Despite its simplicity, this approach ignores potential correlations among the objective functions. Hence, it may hamper the performance of the surrogate models and render the multi-objective optimization process less efficient, since modeling the correlations among objective functions can help navigate points in the objective space more effectively. In this regard, substantial work has been devoted to extending GPR to vector-valued functions and to handling correlated GPRs in the objective space. For instance, [11] proposes the so-called multi-task GP, which applies a Kronecker factorization to the covariance structure between objective functions. This separates the auto-covariance function of each objective from the cross-covariance between objectives. Loosely speaking, $\forall i, j \in [1..m]$ and $\forall \mathbf{x}, \mathbf{x}' \in \mathcal{X}$, this approach constructs the covariance structure
$$\mathrm{Cov}\{f_i(\mathbf{x}), f_j(\mathbf{x}')\} = C_{ij}\, k(\mathbf{x}, \mathbf{x}'),$$
where $\mathbf{C} \in \mathbb{R}^{m \times m}$ is a positive semi-definite (PSD) matrix representing the cross-covariance structure, while $k$ is the prior kernel that specifies the auto-covariance and is shared by all objective functions. Alternatively, the so-called dependent Gaussian processes [13] realize the cross-covariance via the convolution of kernel functions from each pair of objective functions. In this chapter, we do not dive into the details of those works; please see [2] for a comprehensive review of modeling vector-valued objective functions. To simplify the following discussions, we only consider the case where independent GPRs are employed to approximate multiple objective functions.
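For intuition about the multi-task covariance $\mathrm{Cov}\{f_i(\mathbf{x}), f_j(\mathbf{x}')\} = C_{ij}\,k(\mathbf{x}, \mathbf{x}')$, the sketch below assembles the joint prior covariance of all $m$ objectives at $n$ inputs. Stacking the observations objective by objective, this joint covariance is the Kronecker product $\mathbf{C} \otimes \mathbf{K}$. The 1D RBF kernel with a fixed length scale, the example matrix $\mathbf{C}$, and all function names are illustrative assumptions of ours. Choosing $\mathbf{C}$ as the identity recovers independent GPRs, because all cross-objective blocks vanish.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product A ⊗ B: block (a, c) equals A[a][c] * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def rbf(x: float, x2: float, theta: float = 1.0) -> float:
    """Shared 1D prior kernel k(x, x') with unit variance (the scale lives in C)."""
    return math.exp(-((x - x2) ** 2) / (2.0 * theta ** 2))


def multitask_covariance(C: Matrix, X: Sequence[float], theta: float = 1.0) -> Matrix:
    """Joint prior covariance of (f_1(X), ..., f_m(X)), an (mn x mn) matrix.
    Row index r corresponds to objective r // n evaluated at point X[r % n]."""
    K = [[rbf(a, b, theta) for b in X] for a in X]
    return kron(C, K)


if __name__ == "__main__":
    C = [[1.0, 0.8], [0.8, 1.0]]  # assumed PSD cross-covariance between two objectives
    X = [0.0, 0.5, 2.0]
    S = multitask_covariance(C, X)
    print(len(S), len(S[0]))  # 6 6
    print(S[1][5])            # Cov{f_1(0.5), f_2(2.0)} = 0.8 * k(0.5, 2.0)
```

Since the Kronecker product of two PSD matrices is PSD, any valid $\mathbf{C}$ and kernel $k$ yield a valid joint covariance.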
10.3.3 Other Surrogate Models
When the decision space $\mathcal{X}$ is a mixture of real, integer, and categorical variables, adapting the GPR model remains a challenging task. Many proposals have been made for this purpose, including a generalization of the kernel function [54] to categorical variables, the embedding of non-real variables into the Euclidean space [39], group kernels [85], and latent representations of categorical variables [118]. Their empirical performance, however, remains questionable. Alternatively, the random forest model [14] is commonly used in practice [35] because it copes with discrete variables by design. Very briefly, a random forest consists of a collection of decision trees. Each tree takes as input a bootstrap sample of $(X, \boldsymbol{\psi})$ and selects a random subset of decision variables before splitting an internal node. The predictor $\hat f$ is simply the average of the predictions of the constituent decision trees, while the uncertainty of a prediction is determined by the variability of the predictions across the trees. Despite its simplicity and ability to handle categorical variables, the random forest might yield untrustworthy uncertainty measures, particularly when it extrapolates beyond the data set $(X, \boldsymbol{\psi})$ [64]. This contrasts with the uncertainty quantification of GPR, which is reliable for interpolation. Despite its well-accepted empirical performance (e.g., in automated machine learning [35]), the random forest model is less theoretically plausible, since the estimates and the uncertainty quantification it yields are purely frequentist. Hence, a BO algorithm equipped with random forests is no longer “Bayesian”. Alternatively, one could consider other Bayesian statistical models that handle categorical variables naturally, for instance, the so-called Bayesian forest [99], which imposes a prior over decision trees to facilitate Bayesian reasoning.
In [92, 94], a Bayesian neural network is employed as the surrogate model; it provides a characterization of uncertainty and is also much more scalable than GPR when the data set becomes large. Often, the decision space $\mathcal{X}$ contains conditional relations among variables, where some decision variables have no effect on the objective value when another variable takes a specific value. For example, when tuning a Support Vector Machine (SVM) on a given data set, a hyperparameter specific to a particular kernel function becomes effective only if this kernel is selected in the first place by the tuning algorithm. In this case, the so-called Tree-structured Parzen Estimator (TPE) [7] is suggested. TPE deals with conditional parameters by modeling the conditional relations as a tree structure and constructing Parzen estimators (a.k.a. kernel density estimators) on each leaf node of this tree.
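To illustrate how a random forest turns per-tree predictions into $\hat f$ and an uncertainty measure, the sketch below builds a deliberately tiny forest. It makes one strong simplification of ours: every tree is a depth-1 regression tree (a "stump"), whereas practical implementations grow much deeper trees. Each stump is fit on a bootstrap sample and considers only a random subset of the decision variables for its single split. The class and function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left mean, right mean)


def fit_stump(X: List[Sequence[float]], y: List[float], features: Sequence[int]) -> Stump:
    """Depth-1 regression tree: best split (minimum SSE) over the given feature subset."""
    best = None
    for j in features:
        for t in sorted({x[j] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if not left or not right:
                continue
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split (all sampled values identical): constant leaf
        m = statistics.fmean(y)
        return (0, float("inf"), m, m)
    return best[1:]


class StumpForest:
    """Random forest of depth-1 trees: bootstrap rows plus a random feature subset per split."""

    def __init__(self, X, y, n_trees: int = 50, max_features: int = 1, seed: int = 0):
        rng = random.Random(seed)
        n, d = len(X), len(X[0])
        self.trees: List[Stump] = []
        for _ in range(n_trees):
            rows = [rng.randrange(n) for _ in range(n)]         # bootstrap sample
            feats = rng.sample(range(d), min(max_features, d))  # random feature subset
            self.trees.append(fit_stump([X[r] for r in rows], [y[r] for r in rows], feats))

    def predict(self, x: Sequence[float]) -> Tuple[float, float]:
        """(f_hat, uncertainty): mean and variance of the per-tree predictions."""
        preds = [ml if x[j] <= t else mr for j, t, ml, mr in self.trees]
        return statistics.fmean(preds), statistics.pvariance(preds)


if __name__ == "__main__":
    X = [[i / 10.0, (i * 7 % 10) / 10.0] for i in range(20)]
    y = [0.0 if x[0] < 1.0 else 1.0 for x in X]
    forest = StumpForest(X, y, n_trees=50, max_features=2)
    print(forest.predict([0.2, 0.5]))
    print(forest.predict([1.7, 0.5]), forest.predict([10.0, 0.5]))  # identical outputs
```

The last line illustrates the extrapolation caveat mentioned above. A query far outside the data (`x[0] = 10`) receives exactly the same prediction and the same (small) spread as a query inside the data range, because every tree falls into the same leaf. A GPR, by contrast, would report growing uncertainty there.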
10.4 Acquisition Functions
When using surrogate modeling for optimization, it is crucial to determine how the model should be explored or exploited, since surrogate models inevitably make prediction errors. On the one hand, we could completely trust the prediction of the model and look for the global minimum of its prediction surface. This should perform well when the model approximates the objective function reasonably well. On the other hand, when the model prediction differs drastically from the objective function, it is more sensible to evaluate locations with higher prediction uncertainty, since the chance of observing a better objective value there is relatively larger. The so-called acquisition functions are devised to balance such exploitation and exploration of the surrogate models in an automated manner. In this section, we first discuss the acquisition functions engineered for single-objective optimization problems and then move on to their multi-objective generalizations.
10.4.1 Single-Objective Acquisition Function
Over the last decades, much research has been put into finding a function $\mathcal{A}$ that provides a good balance between exploration and exploitation for various applications. One category of such functions is formulated through the concept of improvement over the best observed function value. In the single-objective case, the improvement is defined as $I(\mathbf{x}) = \max\{f_{\min} - f(\mathbf{x}), 0\}$, $f_{\min} = \min\{\boldsymbol{\psi}\}$. In the multi-objective case, generalizing the improvement is non-trivial due to the partial order (e.g., the Pareto order) assumed in the objective space; we deal with multi-objective acquisition functions in Sect. 10.4.2. Imposing a Gaussian process prior on $f$, we obtain the distribution of $I$ from the posterior process $f \mid \boldsymbol{\psi} \sim \mathcal{GP}(\hat f, k^{\text{post}})$, which is a rectified Gaussian:
$$p_{I(\mathbf{x})}(u \mid \boldsymbol{\psi}) = \begin{cases} \Phi\left(\dfrac{\hat f - f_{\min}}{\hat s}\right)\delta(u), & u \le 0, \\[2ex] \dfrac{1}{\hat s\sqrt{2\pi}} \exp\left(-\dfrac{\left(u - (f_{\min} - \hat f)\right)^2}{2\hat s^2}\right), & \text{otherwise}. \end{cases} \tag{10.6}$$
Here $\Phi(\cdot)$ stands for the cumulative distribution function (c.d.f.) of the standard Gaussian and $\delta(\cdot)$ is the Dirac delta. Most improvement-based infill criteria are constructed to summarize the statistical properties of the improvement. A short review of these criteria is given as follows.
• Expected Improvement (EI) was originally proposed in [71] and is also used in the standard Efficient Global Optimization (EGO) algorithm [58]. It is defined as the first moment of the improvement:
$$\mathrm{EI}(\mathbf{x}) = \mathbb{E}\{I(\mathbf{x}) \mid \boldsymbol{\psi}\} = (f_{\min} - \hat f)\,\Phi\left(\frac{f_{\min} - \hat f}{\hat s}\right) + \hat s\,\phi\left(\frac{f_{\min} - \hat f}{\hat s}\right). \tag{10.7}$$
Here, $\phi(\cdot)$ represents the probability density function (p.d.f.) of the standard Gaussian. As will be shown in the next section, the EI criterion is highly multi-modal and tries to balance exploration and exploitation.
• Probability of Improvement (PoI) gives the probability of observing an improvement [57, 119]. This acquisition function is more biased towards exploitation than exploration, since it rewards solutions that are more certain to yield an improvement over the current best solution, without taking the amount of the actual improvement into account:
$$\mathrm{PoI}(\mathbf{x}) = \Pr(f(\mathbf{x}) < f_{\min} \mid \boldsymbol{\psi}) = \Phi\left(\frac{f_{\min} - \hat f}{\hat s}\right). \tag{10.8}$$
Loosely speaking, PoI rewards low-risk solutions that typically come with a relatively small amount of improvement. EI, in contrast, rewards solutions that give a high improvement on average, even when realizing that improvement is risky.
• Weighted Expected Improvement (WEI) is an alternative approach to explicitly control the balance between exploration and exploitation. Consider Eq. (10.7): the first term calculates the difference between the current best $f_{\min}$ and the prediction $\hat f$, weighted by the probability of improvement. The second term is large when $\hat s$ is large, meaning that a large uncertainty about the prediction is preferred. Therefore, it is straightforward to explicitly control the balance between those two terms [93]:
$$\mathrm{WEI}(\mathbf{x}; w) = w\,(f_{\min} - \hat f)\,\Phi\left(\frac{f_{\min} - \hat f}{\hat s}\right) + (1 - w)\,\hat s\,\phi\left(\frac{f_{\min} - \hat f}{\hat s}\right), \tag{10.9}$$
where $w \in (0, 1)$ is the balancing weight, which should be chosen in a problem-dependent manner [93].
• Generalized Expected Improvement (GEI) [89] is a generalization of the EI criterion in which an integer parameter $g$ is introduced to compute the $g$th-order moment of the improvement. The larger the value of $g$, the more explorative the locations rewarded by the criterion, and vice versa. GEI is defined as $\mathrm{GEI}(\mathbf{x}; g) = \mathbb{E}\{I(\mathbf{x})^g \mid \boldsymbol{\psi}\}$, where $g \in \mathbb{N}_{\ge 0}$. Note that, in [89], GEI is computed recursively, which brings in extra computational costs and hampers its applicability. Also, the setting of the free parameter $g$ is entirely empirical.
• Moment-Generating Function of the improvement (MGFI) [103, 105] interpolates all moments of the improvement (considering the probability of improvement as the zeroth-order moment) by taking a weighted sum of them:
$$\mathrm{MGFI}(\mathbf{x}; t) = \Pr(I(\mathbf{x}) > 0 \mid \boldsymbol{\psi}) + \sum_{k=1}^{\infty} \frac{t^k}{k!\,e^t}\, \mathbb{E}\{I(\mathbf{x})^k \mid \boldsymbol{\psi}\} = \Phi\left(\frac{f_{\min} - \hat f'}{\hat s}\right) \exp\left((f_{\min} - \hat f - 1)\,t + \frac{\hat s^2}{2}\,t^2\right), \tag{10.10}$$
where $\hat f'(\mathbf{x}) = \hat f(\mathbf{x}) - \hat s^2(\mathbf{x})\,t$, and an additional parameter $t \in \mathbb{R}_{\ge 0}$ (“temperature”) is introduced to balance exploration with exploitation. When $t$ increases, MGFI tends to reward points with high uncertainty, as higher moments of $I(\mathbf{x})$ dominate the lower ones. When $t$ decreases, lower moments contribute more to MGFI and thus points with less uncertainty are preferred. Instead of specifying a single moment as in GEI, MGFI aggregates several consecutive moments and therefore exhibits a smoother change when the temperature $t$ is varied.
• Stepwise Uncertainty Reduction (SUR) [80] aims to estimate the global informational gain of observing an unknown point (or, equivalently, the reduction of uncertainty regarding the global optimum). This resembles the rationale behind the so-called integrated expected conditional improvement (IECI) criterion, discussed in the context of constraint handling (see Sect. 10.4.4). Essentially, this approach quantifies the global progress made by adding a new point $\mathbf{x} \in \mathcal{X}$ as if it were observed. It first conditions on the event $f(\mathbf{x}) = y$ and then integrates the probability of improvement (after adding $y$ to $\boldsymbol{\psi}$) over the whole decision space $\mathcal{X}$, leading to the expected volume of the excursion set (EV):
$$\mathrm{EV}(\mathbf{x}; y) = \int_{\mathcal{X}} \Phi\left(\frac{\min\{f_{\min}, y\} - \hat f_{\text{new}}(\boldsymbol{\xi})}{\hat s_{\text{new}}(\boldsymbol{\xi})}\right) \mathrm{d}\boldsymbol{\xi},$$
where $\hat f_{\text{new}}$ and $\hat s_{\text{new}}$ stand for the updated posterior mean and uncertainty after adding $y$ to the evidence $\boldsymbol{\psi}$. Note that EV is a random variable due to the unobserved value at $\mathbf{x}$, i.e., $y \sim \mathcal{N}(\hat f(\mathbf{x}), \hat s^2(\mathbf{x}))$. Consequently, $\hat f_{\text{new}}$ is also random, while $\hat s_{\text{new}}$ does not depend on $y$. Both can be determined incrementally from $\hat f$, $\hat s$, and $y$ [28]:
$$\hat f_{\text{new}}(\cdot) = \hat f(\cdot) + \frac{k^{\text{post}}(\cdot, \mathbf{x})}{\hat s^2(\mathbf{x})}\left(y - \hat f(\mathbf{x})\right), \qquad \hat s^2_{\text{new}}(\cdot) = \hat s^2(\cdot) - \frac{k^{\text{post}}(\cdot, \mathbf{x})^2}{\hat s^2(\mathbf{x})}.$$
We then take the expectation of EV with respect to the randomness of $y$, namely, the expectation of the expected volume of the excursion set (EEV):
$$\mathrm{EEV}(\mathbf{x}) = \int_{\mathbb{R}} \mathrm{EV}(\mathbf{x}; y)\, \hat s(\mathbf{x})^{-1}\, \phi\left(\frac{y - \hat f(\mathbf{x})}{\hat s(\mathbf{x})}\right) \mathrm{d}y. \tag{10.11}$$
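The closed-form criteria (10.7), (10.8), and (10.9) need only the posterior mean $\hat f$ and standard deviation $\hat s$ at a point, so they are easy to sketch. The Python below is a minimal illustration with function names of our own. It also checks Eq. (10.7) against a plain Monte Carlo estimate of $\mathbb{E}\{I(\mathbf{x}) \mid \boldsymbol{\psi}\}$ drawn from the Gaussian posterior. For $\hat s = 0$ the sketch falls back to the deterministic improvement, a convention of ours. MGFI, GEI, and the SUR-based EEV are not covered.

```python
import math
import random


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, sd: float, f_min: float) -> float:
    """Eq. (10.7) for minimization."""
    if sd <= 0.0:
        return max(f_min - mean, 0.0)
    z = (f_min - mean) / sd
    return (f_min - mean) * std_normal_cdf(z) + sd * std_normal_pdf(z)


def probability_of_improvement(mean: float, sd: float, f_min: float) -> float:
    """Eq. (10.8)."""
    if sd <= 0.0:
        return 1.0 if mean < f_min else 0.0
    return std_normal_cdf((f_min - mean) / sd)


def weighted_expected_improvement(mean: float, sd: float, f_min: float, w: float) -> float:
    """Eq. (10.9); w = 0.5 gives half of EI, larger w favors exploitation."""
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie in (0, 1)")
    if sd <= 0.0:
        return w * max(f_min - mean, 0.0)
    z = (f_min - mean) / sd
    return w * (f_min - mean) * std_normal_cdf(z) + (1.0 - w) * sd * std_normal_pdf(z)


def ei_monte_carlo(mean: float, sd: float, f_min: float, n: int = 200_000, seed: int = 0) -> float:
    """Average of I = max(f_min - f, 0) over samples f ~ N(mean, sd^2)."""
    rng = random.Random(seed)
    return sum(max(f_min - rng.gauss(mean, sd), 0.0) for _ in range(n)) / n


if __name__ == "__main__":
    m, s, best = 0.3, 0.8, 0.0
    print(expected_improvement(m, s, best), ei_monte_carlo(m, s, best))
    print(probability_of_improvement(m, s, best))
    print(weighted_expected_improvement(m, s, best, w=0.9))
```

The Monte Carlo estimate agrees with the closed form up to sampling error (about $10^{-3}$ here). This also gives a quick sanity check for the rectified-Gaussian view of $I(\mathbf{x})$ in Eq. (10.6).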
10.4.2 Multi-objective Acquisition Functions
Additional challenges arise in generalizing acquisition functions to multi-objective problems. The improvement between two arbitrary points, which underpins many of the acquisition functions mentioned above, has no trivial generalization to the objective space $\mathbb{R}^m$, where the Pareto order is commonly taken. Intuitively, one can resort to scalarization-based methods [26, 30, 68, 70], which transform a MOO problem into a single-objective one through some scalarization function, e.g., Chebyshev scalarization [116], linear combinations [30], and boundary intersection methods [23]. Particularly for MOBO, various scalarization-based approaches have been proposed. For instance, the so-called ParEGO [63] algorithm uses the weighted Chebyshev distance and trains a GPR model on the scalarized objective values; in each iteration, the weights are sampled uniformly at random from the probability simplex. In contrast, MOEA/D-EGO [117] follows the principle of MOEA/D [116], which creates a set of single-objective sub-problems via different scalarization functions (Chebyshev or linear) and solves them simultaneously. MOEA/D-EGO constructs an independent GPR (see Eq. (10.5)) for each objective function and subsequently builds estimators for each sub-problem by scalarizing the GPRs as well. Also, [77] delineates a framework for scalarization-based MOBO. It builds independent GPRs for each objective, scalarizes those GPRs using a randomized scalarization function in each iteration, and applies UCB or Thompson sampling to the scalarized GPR. This work provides a theoretical upper bound on the Bayes regret of scalarization-based MOBO. Alternatively, one may consider each objective function separately and define the improvement in the objective space with respect to the Pareto order.
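A minimal sketch of the scalarization step behind ParEGO-style MOBO is shown below. Two assumptions are ours and should be checked against [63] before relying on them. First, we use the augmented Chebyshev function $\max_j w_j y_j + \rho \sum_j w_j y_j$ with $\rho = 0.05$. Second, we normalize each objective by the minimum and maximum observed so far. Following the description above, the weight vector is drawn uniformly at random from the probability simplex (here via normalized exponential variates, i.e., a flat Dirichlet distribution). The scalarized values would then be modeled by a single GPR and optimized with a single-objective acquisition function such as EI. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def random_simplex_weights(m: int, rng: random.Random) -> List[float]:
    """Uniform sample from the probability simplex (flat Dirichlet via normalized Exp(1) draws)."""
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [v / total for v in draws]


def normalize(Y: List[Sequence[float]]) -> List[List[float]]:
    """Min-max normalize each objective to [0, 1] using the observed values."""
    m = len(Y[0])
    lo = [min(y[j] for y in Y) for j in range(m)]
    hi = [max(y[j] for y in Y) for j in range(m)]
    return [[(y[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(m)]
            for y in Y]


def augmented_chebyshev(y: Sequence[float], w: Sequence[float], rho: float = 0.05) -> float:
    """max_j w_j y_j + rho * sum_j w_j y_j (to be minimized)."""
    weighted = [wj * yj for wj, yj in zip(w, y)]
    return max(weighted) + rho * sum(weighted)


def scalarize(Y: List[Sequence[float]], rng: random.Random) -> List[float]:
    """One iteration's single-objective training targets for the GPR."""
    w = random_simplex_weights(len(Y[0]), rng)
    return [augmented_chebyshev(y, w) for y in normalize(Y)]


if __name__ == "__main__":
    Y = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0)]
    print(scalarize(Y, random.Random(1)))
```

Because a fresh weight vector is drawn in every iteration, successive GPR fits target different regions of the Pareto front. The small $\rho$ term breaks ties in favor of Pareto-optimal points.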
The most prominent method is to take advantage of unary performance indicators for MOO problems [4], i.e., $I: 2^{\mathbb{R}^m} \to \mathbb{R}$, which quantify the quality of a Pareto approximation set with a single real value. To quantify the potential improvement brought by observing at a point $x^*$, it suffices to take $I(Y \cup \{f(x^*)\}) - I(Y)$, which is a random variable when a statistical model is assumed on $f$. The hypervolume indicator HV [120] is an important one; it measures, for each $Y \subseteq \mathbb{R}^m$, the volume of the set dominated by $Y$, i.e.,

$$\mathrm{HV}(Y; r) = \lambda_m\left(\{y \in \mathbb{R}^m \mid \exists z \in Y : z \prec y \wedge y \prec r\}\right),$$

where $\lambda_m$ is the Lebesgue measure and the reference point $r$ clips the space to ensure a finite volume. Being theoretically well founded, the hypervolume indicator has drawn great attention in the study of MOBO, since it is Pareto compliant, meaning that $\forall A, B \subseteq \mathbb{R}^m\ (A \prec B \implies \mathrm{HV}(A) > \mathrm{HV}(B))$. Other performance indicators have also been used in MOBO, e.g., the R2 indicator [15, 48]. Assuming a Pareto approximation set $P \subseteq Y$, a reference point $r \in \mathbb{R}^m$, and independent GPRs (see Eq. (10.5)) built for each objective function, we give the statistical model of $f$ as follows:

$$\mathrm{PDF}_{f(x)}(y \mid Y) = \prod_{i=1}^{m} \frac{1}{\hat{s}_i(x)}\, \phi\!\left(\frac{y_i - \hat{f}_i(x)}{\hat{s}_i(x)}\right),$$

and its truncation to a hyperbox $[a, b] = [a_1, b_1] \times \cdots \times [a_m, b_m] \subseteq \mathbb{R}^m$:
H. Wang and K. Yang
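$$\mathrm{TPDF}_{f(x)}(y \mid Y; a, b) = \mathrm{PDF}_{f(x)}(y \mid Y) \prod_{i=1}^{m} \left[\Phi\!\left(\frac{b_i - \hat{f}_i(x)}{\hat{s}_i(x)}\right) - \Phi\!\left(\frac{a_i - \hat{f}_i(x)}{\hat{s}_i(x)}\right)\right]^{-1},$$

where $\Phi$ denotes the standard normal cumulative distribution function,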
which are essential to the following discussion. We are now ready to introduce some commonly used multi-objective acquisition functions.

1. Hypervolume Improvement (HVI), also called Improvement of Hypervolume in [29], measures the change in the HV value of a Pareto-front approximation set $P$ when a point $y \in \mathbb{R}^m$ is added to $P$. It is defined as

$$\mathrm{HVI}(y; P, r) = \mathrm{HV}(P \cup \{y\}; r) - \mathrm{HV}(P; r). \qquad (10.12)$$
Since HVI is evaluated only at the predictive mean of the multivariate normal distribution, it focuses on exploitation during the optimization process. The computational complexity of HVI is $\Theta(n \log n)$ [9] when $m = 2, 3$. In more than 3 dimensions, the algorithm proposed by Chan [18] achieves $O(n^{m/3} \operatorname{polylog} n)$ time complexity.

2. Expected Hypervolume Improvement (EHVI) is a generalization of EI to multi-objective cases and was proposed by Emmerich et al. [31]. EHVI measures how much hypervolume improvement could be achieved by evaluating the new point, taking into account the uncertainty of the prediction. Similar to EI, EHVI balances exploration and exploitation. EHVI is defined as

$$\mathrm{EHVI}(x; P, r) = \int_{\mathbb{R}^m} \mathrm{HVI}(y; P, r) \cdot \mathrm{PDF}_{f(x)}(y \mid Y)\, dy. \qquad (10.13)$$

The computational complexity of EHVI is $\Theta(n \log n)$ [32, 111] when $m = 2, 3$, and $O(2^{m-1} n^{\lfloor m/2 \rfloor})$ in the general case [109].

Example 10.1 An illustration of the 2-D EHVI is shown in Fig. 10.2. The light gray area is the dominated subspace of $P = \{y^{(1)} = (3, 1), y^{(2)} = (2, 1.5), y^{(3)} = (1, 2.5)\}$ cut by the reference point $r = (4, 4)$. The bivariate Gaussian distribution has the parameters $\hat{f}_1 = 2$, $\hat{f}_2 = 1.5$, $\hat{s}_1 = 0.7$, $\hat{s}_2 = 0.6$. The probability density function (p.d.f.) of the bivariate Gaussian distribution is shown as a 3-D plot. Here $y$ is a sample from this distribution, and the area of improvement relative to $P$ is indicated by the dark shaded area. The variable $y_1$ stands for the $f_1$ value and $y_2$ for the $f_2$ value.

3. The Expected Maximin Fitness Improvement (EMmI) [98] determines the improvement of an objective point $y$ with respect to the incumbent approximation set $P$ by first calculating, for each $p \in P$, the largest objective-wise improvement of $y$ over $p$ and then taking the smallest such improvement over $P$, i.e.,

$$\text{maxmin-I}(y; P) = \max\left\{0,\ \min_{p \in P}\ \max_{i \in [1..m]} (p_i - y_i)\right\}.$$
Fig. 10.2 Multivariate normal distributions in 2-D (cf. Example 10.1)
It is then straightforward to define the expected maximin fitness improvement as the expectation of this special improvement:

$$\mathrm{EMmI}(x; P) = \int_{\mathbb{R}^m} \text{maxmin-I}(y; P)\, \mathrm{PDF}_{f(x)}(y \mid Y)\, dy. \qquad (10.14)$$
4. Truncated Expected Hypervolume Improvement (TEHVI) [108, 113] is a generalization of the EHVI. The probability density function used in the TEHVI is a multivariate normal distribution truncated to the objective value domain. In terms of Bayesian reasoning, it uses the conditional distribution given the a-priori knowledge that the true output of the objective function lies within a prescribed range. Practically speaking, the idea behind TEHVI is to focus sampling on more relevant parts of the search space by taking into account a-priori knowledge of objective function value ranges. It is defined as

$$\mathrm{TEHVI}(x; P, r, a, b) = \int_{y \in [a, b]} \mathrm{HVI}(y; P, r) \cdot \mathrm{TPDF}_{f(x)}(y \mid Y; a, b)\, dy. \qquad (10.15)$$

Note that TEHVI has the same computational complexity as the EHVI. Since a-priori knowledge of the objective functions is utilized in the TEHVI, TEHVI can restrict the optimizer to explore only a specific domain of the objective space. Compared to the EHVI, the TEHVI performs better w.r.t. the hypervolume indicator [108] and can generate more solutions around the knee point. The truncated domain in the TEHVI can be utilized to define a region of interest provided by a decision maker. Therefore, it can solve preference-based multi-objective optimization problems [113].

Example 10.2 An illustration of the TEHVI is shown in Fig. 10.3. The truncated probability density function of the bivariate Gaussian distribution is shown as a 3-D plot, with truncated domain $[a, b] = [0, \infty]$. All other parameters are the same as in Example 10.1.

Fig. 10.3 Truncated multivariate normal distributions in 2-D (cf. Example 10.2)

5. Expected Hypervolume Improvement Gradient (EHVIG) [110] is the first-order derivative of the EHVI with respect to a target point $x$ in the search space. The EHVIG allows gradient ascent algorithms to be used as optimizers in MOBO. Moreover, EHVIG can be used as a stopping criterion in global optimization, e.g., in Evolution Strategies, since it should be a zero vector at the optimal decision vector. EHVIG is defined as

$$\mathrm{EHVIG}(x; P, r) = \frac{\partial\, \mathrm{EHVI}(x; P, r)}{\partial x}. \qquad (10.16)$$
The computational complexity of the EHVIG is $\Theta(n \log n)$ for $m = 2$. For $m \geq 3$, it theoretically has the same computational complexity as the EHVI, but explicit formulas are not yet available.

6. Expected R2 Improvement (ER2I) [27] is the expected improvement of the R2 indicator. It provides an alternative to the EHVI, which requires a reference point bounding the Pareto front from above. In contrast, the ER2I works with a utopian reference point that bounds the Pareto front from below. The R2 indicator is evaluated on the given $P$ to which a point in the image of the feasible set is added. The resulting difference is the R2 improvement of the chosen point with respect to the given approximation set. Then, we compute the expected R2 improvement over the set dominated by the utopian point $z \in \mathbb{R}^m$ under the posterior distribution truncated to $[z, \infty)$, i.e.,

$$\mathrm{ER2I}(x; P, z) = \int_{y \in [z, \infty)} \left(\mathrm{R2}(P \cup \{y\}) - \mathrm{R2}(P)\right) \mathrm{TPDF}_{f(x)}(y \mid Y; z, \infty)\, dy. \qquad (10.17)$$

7. Probability of Improvement (PoI) was first introduced by Stuckman [97] and then generalized to multi-objective optimization by Emmerich et al. [31] and Keane [61]. In contrast to HVI, PoI focuses purely on exploration during the optimization process. It is defined as

$$\mathrm{PoI}(x; P) = \int_{\mathbb{R}^m} \mathbb{1}_{\mathrm{ndom}(P)}(y)\, \mathrm{PDF}_{f(x)}(y \mid Y)\, dy, \qquad (10.18)$$
where $\mathbb{1}$ is the characteristic function.

8. Hypervolume-based Probability of Improvement ($P_{hv}$) was proposed by Couckuyt et al. [22]. It is the product of the hypervolume improvement and the probability of improvement, defined as

$$P_{hv}(x; P, r) = \mathrm{HVI}(x; P, r) \cdot \mathrm{PoI}(x; P). \qquad (10.19)$$
9. Lower/Upper confidence bound (L/UCB) is another indicator based on the hypervolume improvement. Denoting by $\hat{s}(x) = (\hat{s}_1(x), \ldots, \hat{s}_m(x))$ the uncertainty of the GPRs, we are interested in the hypervolume improvement of some lower or upper confidence bound of the true objective point at $x$, i.e., $\hat{f}(x) \pm \omega \hat{s}(x)$, where $\omega \in \mathbb{R}_{\geq 0}$ specifies the confidence level:

$$\mathrm{LCB}(x; P, r, \omega) = \mathrm{HVI}(\hat{f}(x) - \omega \hat{s}(x); P, r), \qquad (10.20)$$

$$\mathrm{UCB}(x; P, r, \omega) = \mathrm{HVI}(\hat{f}(x) + \omega \hat{s}(x); P, r). \qquad (10.21)$$
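Several of the acquisition functions listed above can be made concrete for $m = 2$ with the data of Example 10.1. The sketch below assumes minimization, independent Gaussian predictions per objective, and Monte Carlo integration in place of the exact algorithms cited above; for PoI it counts samples that are not weakly dominated by any point of $P$, which is our reading of Eq. (10.18). It computes the exact 2-D hypervolume, HVI, LCB, and Monte Carlo estimates of EHVI, EMmI, PoI, and TEHVI (using inverse-CDF sampling for the truncated density). All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    front, best_y2 = [], math.inf
    # Sweep in ascending y1 order and keep only points that strictly lower y2.
    for y1, y2 in sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1]):
        if y2 < best_y2:
            front.append((y1, y2))
            best_y2 = y2
    hv = 0.0
    for i, (y1, y2) in enumerate(front):
        next_y1 = front[i + 1][0] if i + 1 < len(front) else ref[0]
        hv += (next_y1 - y1) * (ref[1] - y2)
    return hv

def hvi(y, P, ref):
    """Hypervolume improvement HVI(y; P, r), Eq. (10.12)."""
    return hypervolume_2d(list(P) + [tuple(y)], ref) - hypervolume_2d(P, ref)

def maxmin_improvement(y, P):
    """maxmin-I(y; P), the integrand of EMmI in Eq. (10.14)."""
    return max(0.0, min(max(p_i - y_i for p_i, y_i in zip(p, y)) for p in P))

def is_dominated(y, P):
    """True if some p in P weakly dominates y."""
    return any(all(p_i <= y_i for p_i, y_i in zip(p, y)) for p in P)

def sample_posterior(mean, sd, rng, lower=None, upper=None):
    """Draw y from independent Gaussians, optionally truncated to [lower, upper]."""
    y = []
    for i, (mu, s) in enumerate(zip(mean, sd)):
        nd = NormalDist(mu, s)
        lo = 0.0 if lower is None else nd.cdf(lower[i])
        hi = 1.0 if upper is None else nd.cdf(upper[i])
        # Inverse-CDF sampling; keep u strictly inside (0, 1) for inv_cdf.
        u = min(max(lo + (hi - lo) * rng.random(), 1e-12), 1.0 - 1e-12)
        y.append(nd.inv_cdf(u))
    return tuple(y)

def mc_acquisitions(mean, sd, P, ref, n=10000, seed=1, lower=None, upper=None):
    """Monte Carlo estimates of (T)EHVI, EMmI and PoI at one candidate x."""
    rng = random.Random(seed)
    ehvi = emmi = poi = 0.0
    for _ in range(n):
        y = sample_posterior(mean, sd, rng, lower, upper)
        ehvi += hvi(y, P, ref)
        emmi += maxmin_improvement(y, P)
        poi += 0.0 if is_dominated(y, P) else 1.0
    return ehvi / n, emmi / n, poi / n

# Data of Example 10.1.
P = [(3.0, 1.0), (2.0, 1.5), (1.0, 2.5)]
ref = (4.0, 4.0)
mean, sd = (2.0, 1.5), (0.7, 0.6)

print(hypervolume_2d(P, ref))  # 7.0
print(hvi(mean, P, ref))       # 0.0, since the mean coincides with y^(2)
omega = 1.0
lcb = hvi([m - omega * s for m, s in zip(mean, sd)], P, ref)  # Eq. (10.20), about 1.82
ehvi, emmi, poi = mc_acquisitions(mean, sd, P, ref)
tehvi, _, _ = mc_acquisitions(mean, sd, P, ref, lower=(0.0, 0.0))  # [a, b] = [0, inf)
```

The example also illustrates the exploitation/exploration contrast discussed above: HVI at the predictive mean is zero here because the mean lies on the current front, whereas EHVI and LCB are positive because they account for the predictive uncertainty.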
Other MOBO acquisition functions include the extension of the EEV from single- to multi-objective cases [80], the entropy of the posterior distribution over $P$ [51], and $\epsilon$-based acquisition functions that require an improvement of at least $\epsilon$ over an approximation set of the Pareto front [33, 123]. In this section, we have only unveiled a small corner of the MOBO field, namely the extension of acquisition functions to multi-objective problems. It is not our aim to limit the reader to the discussion herein; instead, we encourage the reader to continue exploring the growing variety of algorithmic approaches to MOBO. For instance, in [106], an adaptive scheme is delineated, in which LCB is combined with the so-called Angle-Penalized Distance (APD) indicator [20] and the free parameters of LCB (see Eq. (10.20)) are controlled adaptively. Please also see [82] for a comprehensive overview and comparison of similar algorithmic proposals.

Another critical aspect of MOBO lies in its scalability with respect to the dimensionality of the decision and objective spaces, which, when growing high, substantially hampers the empirical performance of MOBO, as the underlying GPR is prone to under- or overfitting [19, 66, 72]. As a remedy, quite a few proposals have been made to alleviate this issue: in [107], a random embedding is constructed to convert the high-dimensional space to a lower-dimensional subspace; in [40], an engineering-inspired approach is taken to learn a lower-dimensional feature space; in [62], an adaptive approach is devised to identify a one-dimensional subspace. Particularly for multi-objective scenarios, [46] proposed to perform variable selection with the help of a heterogeneous ensemble of surrogate models (not restricted to GPR). Rather than learning which variables to select or drop, the random dropout of decision variables has been theoretically analyzed and tested when coupled with a GPR [66]. More recently, [47] suggests substituting the GPR, which is problematic in high dimensions, with an efficient dropout neural network.
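To illustrate the random-embedding idea mentioned above, the sketch below maps a low-dimensional point $z$ to the original decision space. It assumes a REMBO-style construction, a Gaussian matrix $A \in \mathbb{R}^{D \times d}$ and clipping of $Az$ to the box $[-1, 1]^D$, which may differ in detail from [107]. A BO loop would then model and optimize $g(z) = f(\mathrm{embed}(A, z))$ in only $d$ dimensions. The objective and all function names are hypothetical.

```python
import random

def make_embedding(D, d, seed=0):
    """Random D x d matrix with i.i.d. standard normal entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]

def embed(A, z, lower=-1.0, upper=1.0):
    """Map a low-dimensional point z to x = clip(A z) in the box [lower, upper]^D."""
    x = []
    for row in A:
        value = sum(a_ij * z_j for a_ij, z_j in zip(row, z))
        x.append(min(max(value, lower), upper))  # project back into the feasible box
    return x

def high_dim_objective(x):
    """Hypothetical expensive objective: only the first two of D variables matter."""
    return (x[0] - 0.5) ** 2 + (x[1] + 0.2) ** 2

A = make_embedding(D=100, d=2, seed=0)

def g(z):
    """Objective seen by the low-dimensional BO loop."""
    return high_dim_objective(embed(A, z))

print(g([0.1, -0.3]))
```

The surrogate is fitted on $(z, g(z))$ pairs, so its input dimension is $d = 2$ rather than $D = 100$; the price is that only points in the (clipped) image of $A$ can be reached.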
10.4.3 Parallelization

So far, both the single- and multi-objective acquisition functions only give rise to a sequential BO algorithm, which turns out to be unfavorable in many application scenarios (e.g., tuning the hyper-parameters of machine learning models) where the user could afford to evaluate a small batch of candidate solutions in parallel. In this regard, the design of a well-performing parallelization mechanism, which proposes multiple promising points simultaneously in each iteration, has become increasingly essential to the applicability of BO and is addressed quite often in recent works. For single-objective BO, the well-known expected improvement (EI) criterion has been generalized to the so-called multi-point expected improvement (a.k.a. q-EI) [44, 88], which is defined as the expectation of the largest improvement among $q$ correlated unknown points under the GPR (cf. Eq. (10.7)):

$$\text{q-EI}\left(x^{*(1)}, x^{*(2)}, \ldots, x^{*(q)}\right) = \mathbb{E}\left[\max\left\{I(x^{*(1)}), I(x^{*(2)}), \ldots, I(x^{*(q)})\right\} \,\middle|\, \psi\right].$$

Note that in [69], the closed-form expression and its derivative are provided for q-EI. Despite the elegance of the closed-form expression, computing it remains computationally heavy when $q$ is moderate or large, since it requires many calls to the cumulative distribution function of a multivariate Gaussian. As such, some heuristic approaches have been devised to circumvent this computational bottleneck. For instance, in [44], two heuristics, Kriging believer and constant liar, are designed to produce a batch of points by sequentially maximizing the EI criterion and updating the GPR model with a dummy objective value at the maximizer. The dummy value is taken to be either the prediction of the GPR (Kriging believer) or a user-defined constant (constant liar).

Some parallelization proposals can be applied to both single- and multi-objective problems. For instance, in [102], multiple points are obtained by employing a niching evolution strategy to maximize the acquisition function, which is in principle applicable to most acquisition functions, as their landscapes are very often multi-modal. In [45], a local penalization method is developed, which selects the first point by maximizing an acquisition function and subsequently penalizes the acquisition value in a neighborhood of the first point, resulting in a locally penalized acquisition function. The method generates the remaining points one after another by including each previously found point in the penalty. Please also see [25] for a recent extension of this local penalty method. In particular, for the lower/upper confidence bound (L/UCB), which is well defined for both single- and multi-objective scenarios, Hutter et al. [55] proposed to instantiate multiple different criteria by sampling the free parameter $\omega$ controlling the level of confidence (see Eqs. (10.20) and (10.21)) from a log-normal distribution, and then optimizing each of them.

Particularly for the MOO problem, several approaches have been developed based on the hypervolume improvement, regarding it as the counterpart of the improvement in the single-objective case. In [24], the q-Expected Hypervolume Improvement (q-EHVI) is proposed, which considers the union of the sets dominated by multiple points in the objective space, called q-HVI:

$$\text{q-HVI}\left(y^{*(1)}, y^{*(2)}, \ldots, y^{*(q)}\right) = \lambda_m\left(\mathrm{ndom}\left(\bigcup_{i=1}^{q} \{y^{*(i)}\}\right) \setminus \mathrm{ndom}(P)\right),$$

and q-EHVI is computed by taking the expectation of q-HVI with respect to the posterior of the GPR built for the vector-valued objective function. The computation of q-EHVI is achieved by Monte Carlo integration. The most straightforward heuristic parallelization technique is to extend Kriging believer or constant liar from BO to MOGO. Recently, a heuristic approach has used preference-based MOBO to search in parallel for multiple optimal solutions in divided regions of the objective space, either by using TEHVI [114] or by setting multiple reference points for EHVI [41].
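The q-EI above can also be estimated by Monte Carlo integration, which avoids the multivariate Gaussian CDF calls of the closed form. The sketch below assumes minimization with $I(x) = \max\{f_{\min} - f(x), 0\}$ and a posterior mean vector and covariance matrix for the $q$ candidates supplied by the caller; correlated samples are drawn through a hand-written Cholesky factor. For $q = 1$ it should agree with the closed-form single-point EI. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be positive definite)."""
    q = len(cov)
    L = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def q_ei_mc(mean, cov, f_min, n=50000, seed=0):
    """Monte Carlo estimate of E[max_i max(f_min - Y_i, 0)] for Y ~ N(mean, cov)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    q = len(mean)
    total = 0.0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Joint (correlated) sample of the q unknown objective values.
        y = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(q)]
        total += max(0.0, f_min - min(y))  # largest improvement in the batch
    return total / n

def ei_closed_form(mu, sd, f_min):
    """Single-point EI, used as a sanity check for q = 1."""
    z = (f_min - mu) / sd
    nd = NormalDist()
    return (f_min - mu) * nd.cdf(z) + sd * nd.pdf(z)

# Two correlated candidates with posterior means 0.2 and 0.1.
batch_value = q_ei_mc([0.2, 0.1], [[0.5, 0.2], [0.2, 0.4]], f_min=0.0)
```

Since the maximum of the individual improvements equals $\max\{0, f_{\min} - \min_i Y_i\}$, each Monte Carlo sample only needs the smallest sampled objective value in the batch.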
10.4.4 Constraint Handling

Another crucial concern regarding the applicability of BO lies in handling constraints. Herein, we only discuss inequality constraints, i.e., a set of functions $\{g_1, g_2, \ldots, g_k\}$, $g_i: X \to \mathbb{R}$, $i = 1, 2, \ldots, k$, which indicate that a decision point $x \in X$ is feasible if and only if $\forall i \in [k]\ (g_i(x) \leq 0)$. Such a constrained problem becomes trivial to solve if the constraints are not expensive to evaluate, since in this case one can resort to well-established constraint handling approaches, e.g., adding a penalty, when optimizing the acquisition function. The real challenge arises when the $g_i$'s are as expensive to evaluate as the objective function, which leads to a stochastic programming problem that is typically formulated as minimizing the objective function in expectation while maximizing the probability of satisfying the constraints. Various works have been devoted to adapting the acquisition function to the constrained problem, e.g., the Expected Improvement with Constraints (EIC) criterion [38, 42, 79, 89]:

$$\mathrm{EIC}(x) = \mathbb{E}\left[ I(x) \prod_{i=1}^{k} \mathbb{1}_{g_i(x) \leq 0}(x) \,\middle|\, \psi \right] = \mathrm{EI}(x) \prod_{i=1}^{k} \Pr\left(g_i(x) \leq 0 \mid \psi\right),$$

which is the expectation of the constrained improvement (cf. Eq. (10.6)). It is worth noting that 1) the improvement $I$ here should be determined with respect to the best feasible point; 2) to estimate the probability of each constraint being satisfied, namely $\Pr(g_i(x) \leq 0 \mid \psi)$, we have to construct a GPR (or another statistical model) for each constraint function as well. Also, it is straightforward to generalize EIC to the multi-objective scenario by substituting $I$ with the hypervolume improvement HVI with respect to the current Pareto approximation set $P$, resulting in the Expected HyperVolume Improvement with Constraints (EHVIC):

$$\mathrm{EHVIC}(x; P, r) = \int_{\mathbb{R}^m} \prod_{i=1}^{k} \mathbb{1}_{g_i(x) \leq 0}(x)\, \mathrm{HVI}(y; P, r)\, \mathrm{PDF}_{f(x)}(y \mid Y)\, dy.$$
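The product form of EIC above is easy to evaluate once each constraint has its own Gaussian prediction. The sketch below assumes minimization, independent Gaussian predictive distributions for the objective and for each constraint (given as `(mean, sd)` pairs), and that `f_min_feasible` is the best feasible value observed so far, as required by the remark above. Function names are hypothetical.

```python
from statistics import NormalDist

def expected_improvement(mu, sd, f_min):
    """Closed-form EI for minimization with a Gaussian predictive distribution."""
    if sd <= 0.0:
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sd
    nd = NormalDist()
    return (f_min - mu) * nd.cdf(z) + sd * nd.pdf(z)

def prob_feasible(g_mean, g_sd):
    """Pr(g(x) <= 0) under a Gaussian model of one constraint (g_sd > 0)."""
    return NormalDist(g_mean, g_sd).cdf(0.0)

def eic(mu, sd, f_min_feasible, constraints):
    """EIC(x) = EI(x) * prod_i Pr(g_i(x) <= 0); `constraints` is a list of (mean, sd)."""
    value = expected_improvement(mu, sd, f_min_feasible)
    for g_mean, g_sd in constraints:
        value *= prob_feasible(g_mean, g_sd)
    return value

# Candidate with objective prediction N(0.2, 0.5^2), best feasible value 0.0,
# and two constraints predicted as N(-0.3, 0.2^2) and N(0.1, 0.4^2).
value = eic(0.2, 0.5, 0.0, [(-0.3, 0.2), (0.1, 0.4)])
```

A candidate whose constraints are predicted to be violated with high probability is thus down-weighted, even if its unconstrained EI is large.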
In [1], a method to compute EHVIC is devised, which partitions the objective space into non-overlapping cells based on the Pareto approximation set $P$. As a further generalization of EHVIC, in [34], the objective and constraint functions are handled simultaneously by defining the improvement (e.g., HVI again) in the product space of the ranges of the objective and constraint functions. Motivated differently, the Integrated Expected Conditional Improvement (IECI) [8] of an unknown point $x$ first considers the reduction of EI if $x$ were observed and then averages these reductions over the decision space, where each point is weighted by its probability of satisfying the constraints:

$$\mathrm{IECI}(x) = \int_{X} \left(\mathrm{EI}(\xi) - \mathrm{EI}(\xi \mid x)\right) \prod_{i=1}^{k} \Pr\left(g_i(\xi) \leq 0 \mid \psi\right) d\xi,$$

where $\mathrm{EI}(\xi \mid x) = \mathbb{E}\{\max\{f_{\min} - f(\xi), 0\} \mid f(x), \psi\}$ is the conditional expected improvement. Intuitively, IECI favors a point that decreases the EI values the most (hence making large progress) while having a high probability of being feasible. In contrast to EIC, it does not entirely prohibit generating infeasible solutions, since occasionally evaluating infeasible points may also provide valuable information for the search. Also, IECI can easily be generalized to the multi-objective case by substituting the improvement with HVI. Please see [52] for a review of other related constraint handling approaches for BO.
10.5 Applications

In single-objective optimization, BO has been widely applied to tune the hyper-parameters of machine learning models [7, 35], which typically take a long time to train. In [56], BO is adapted to tackle the so-called Neural Architecture Search (NAS) problem, in which the aim is to find appropriate deep neural network architectures for a given training data set. BO is also often used to optimize the empirical performance of other optimizers on a benchmark problem set, e.g., for tuning the well-known CMA-ES [101] and grammatical evolution systems [104].

In control, PID (proportional-integral-derivative) parameter tuning is a classical optimization problem, which was solved using the EHVI and the TEHVI in [112] and [108], respectively. In [108], prior knowledge of the objective domain is incorporated into TEHVI's truncated domain $[a, b] \subseteq \mathbb{R}^m$. The experimental results indicate that MOBO with TEHVI outperforms EHVI when prior knowledge is utilized in the acquisition function. Robust PID parameter tuning problems were optimized by using EHVIG as the stopping criterion in CMA-ES to search for the optimal solution based on surrogate models [110]. For simulation-based control, Yang et al. [112] applied two versions of MOBO, using the EHVI and LCB as acquisition functions, to optimize financial profit and operational cost simultaneously in a biogas plant control problem. The experimental results show that the MOGOs using LCB and EHVI generate similar Pareto-front approximation sets, with the EHVI-based MOGO slightly better than the LCB-based one on this problem. Both MOGOs outperform SMS-EMOA on this real-world problem within a limited evaluation budget (ca. 100 objective function evaluations). Calandra [17] utilized MOGO to automatically optimize controller parameters in robotics.
MOBO has also been used for expensive real-world multi-objective optimization problems in many other domains. For instance, it was used to minimize the drag and the absolute value of the pitching moment coefficient in airfoil optimization problems using the EHVI and the TEHVI [76]. Yang et al. [114] optimized multi-objective aerodynamic design problems with a multiple-point technique based on the EHVI. Zuhal et al. [122] used MOBO to optimize relatively cheap computational fluid dynamics (CFD) problems. Park et al. [78] optimized chemical reactor design with CFD simulations. Biswas et al. [10] applied MOBO to the task scheduling problem in real-time heterogeneous multiprocessor systems. Olofsson et al. [74] maximized neotissue growth and minimized operating costs using MOGO in tissue engineering.
References

1. M. Abdolshah, A. Shilton, S. Rana, S. Gupta, S. Venkatesh, Expected hypervolume improvement with constraints, in International Conference on Pattern Recognition (ICPR) (IEEE Press, 2018), pp. 3238–3243
2. M.A. Álvarez, L. Rosasco, N.D. Lawrence, Kernels for vector-valued functions: a review. Found. Trends Mach. Learn. 4(3), 195–266 (2012)
3. I.P. Androulakis, C.D. Maranas, C.A. Floudas, αBB: a global optimization method for general constrained nonconvex problems. J. Global Optim. 7(4), 337–363 (1995)
4. C. Audet, J. Bigeon, D. Cartier, S.L. Digabel, L. Salomon, Performance indicators in multiobjective optimization. Eur. J. Oper. Res. 292(2), 397–422 (2021)
5. T. Bäck, Evolutionary Algorithms in Theory and Practice - Evolution Strategies, Evolutionary Programming, Genetic Algorithms (Oxford University Press, 1996)
6. T. Bartz-Beielstein, M. Preuss, Considerations of budget allocation for Sequential Parameter Optimization (SPO), in Workshop on Empirical Methods for the Analysis of Algorithms (2006), pp. 35–40
7. J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyper-parameter optimization, in Neural Information Processing Systems (NIPS) (2011), pp. 2546–2554
8. J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, M. West, Optimization under unknown constraints. Bay. Stat. 9(9), 229 (2011)
9. N. Beume, C.M. Fonseca, M. López-Ibáñez, L. Paquete, J. Vahrenhold, On the complexity of computing the hypervolume indicator. IEEE Trans. Evol. Comput. 13(5), 1075–1082 (2009)
10. S.K. Biswas, A. Rauniyar, P.K. Muhuri, Multi-objective Bayesian optimization algorithm for real-time task scheduling on heterogeneous multiprocessors, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2016), pp. 2844–2851
11. E.V. Bonilla, K.M.A. Chai, C.K.I. Williams, Multi-task Gaussian process prediction, in Neural Information Processing Systems (NIPS), ed. by J.C. Platt, D. Koller, Y. Singer, S.T. Roweis (Curran Associates, Inc., 2007), pp. 153–160
12. J. Bossek, C. Doerr, P. Kerschke, Initial design strategies and their effects on sequential model-based optimization: an exploratory case study based on BBOB, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2020), pp. 778–786
13. P. Boyle, M.R. Frean, Dependent Gaussian processes, in Neural Information Processing Systems (NIPS) (2004), pp. 217–224
14. L. Breiman, Random forests. Mach. Learn. 45(1), 5–32 (2001)
15. D. Brockhoff, T. Wagner, H. Trautmann, On the properties of the R2 indicator, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2012), pp. 465–472
16. M.D. Buhmann, Radial Basis Functions - Theory and Implementations, in Cambridge Monographs on Applied and Computational Mathematics, vol. 12 (Cambridge University Press, 2009)
17. R. Calandra, Bayesian modeling for optimization and control in robotics. PhD thesis, Technische Universität Darmstadt, Germany, Darmstadt (2017)
18. T.M. Chan, Klee's measure problem made easy, in Symposium on Foundations of Computer Science (FOCS) (IEEE Computer Society, 2013), pp. 410–419
19. B. Chen, R.M. Castro, A. Krause, Joint optimization and variable selection of high-dimensional Gaussian processes, in International Conference on Machine Learning (ICML) (jmlr.org, 2012)
20. R. Cheng, Y. Jin, M. Olhofer, B. Sendhoff, A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput. 20(5), 773–791 (2016)
21. C. Chevalier, D. Ginsbourger, Fast computation of the multi-points expected improvement with applications in batch selection, in Learning and Intelligent Optimization (LION), ed. by G. Nicosia, P.M. Pardalos (Springer, 2013), pp. 59–69
22. I. Couckuyt, D. Deschrijver, T. Dhaene, Fast calculation of multiobjective probability of improvement and expected improvement criteria for Pareto optimization. J. Global Optim. 60(3), 575–594 (2014)
23. I. Das, J.E. Dennis, Normal-boundary intersection: a new method for generating the Pareto surface in nonlinear multicriteria optimization problems. SIAM J. Optim. 8(3), 631–657 (1998)
24. S. Daulton, M. Balandat, E. Bakshy, Differentiable expected hypervolume improvement for parallel multi-objective Bayesian optimization, in Neural Information Processing Systems (NIPS), ed. by H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (2020)
25. G. De Ath, R.M. Everson, J.E. Fieldsend, A.A.M. Rahat, ε-shotgun: ε-greedy batch Bayesian optimisation, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2020), pp. 787–795
26. K. Deb, K. Sindhya, J. Hakanen, Multi-objective optimization, in Decision Sciences: Theory and Practice, ed. by R.N. Sengupta, J. Dutta, A. Gupta (CRC Press, 2016), pp. 145–184
27. A. Deutz, M.T.M. Emmerich, K. Yang, The expected R2-indicator improvement for multi-objective Bayesian optimization, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2019), pp. 359–370
28. X. Emery, The Kriging update equations and their application to the selection of neighboring data. Comput. Geosci. 13(3), 269–280 (2009)
29. M.T.M. Emmerich, A. Deutz, J.-W. Klinkenberg, The computation of the expected improvement in dominated hypervolume of Pareto front approximations. Technical report, Leiden University, The Netherlands (2008)
30. M.T.M. Emmerich, A.H. Deutz, A tutorial on multiobjective optimization: fundamentals and evolutionary methods. Nat. Comput. 17(3), 585–609 (2018)
31. M.T.M. Emmerich, K.C. Giannakoglou, B. Naujoks, Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Trans. Evol. Comput. 10(4), 421–439 (2006)
32. M.T.M. Emmerich, K. Yang, A. Deutz, H. Wang, C.M. Fonseca, A multicriteria generalization of Bayesian global optimization, in Advances in Stochastic and Deterministic Global Optimization, ed. by P.M. Pardalos, A. Zhigljavsky, J. Žilinskas (Springer, 2016), pp. 229–243
33. M.T.M. Emmerich, K. Yang, A.H. Deutz, Infill criteria for multiobjective Bayesian optimization, in High-Performance Simulation-Based Optimization (Springer, 2020), pp. 3–16
34. P. Feliot, J. Bect, E. Vázquez, A Bayesian approach to constrained single- and multi-objective optimization. J. Global Optim. 67(1–2), 97–133 (2017)
35. M. Feurer, A. Klein, K. Eggensperger, J.T. Springenberg, M. Blum, F. Hutter, Auto-sklearn: efficient and robust automated machine learning, in Automated Machine Learning - Methods, Systems, Challenges, ed. by H.F. et al. (Springer, 2019), pp. 113–134
36. R. Fletcher, Newton-like methods, in Practical Methods of Optimization (Wiley, 2013), pp. 44–79
37. P.I. Frazier, A Tutorial on Bayesian Optimization (2018)
38. J.R. Gardner, M.J. Kusner, Z.E. Xu, K.Q. Weinberger, J.P. Cunningham, Bayesian optimization with inequality constraints, in Machine Learning (ICML), vol. 32 (JMLR.org, 2014), pp. 937–945
39. E.C. Garrido-Merchán, D. Hernández-Lobato, Dealing with categorical and integer-valued variables in Bayesian optimization with Gaussian processes. Neurocomputing 380, 20–35 (2020)
40. D. Gaudrie, R. Le Riche, V. Picheny, B. Enaux, V. Herbert, From CAD to eigenshapes for surrogate-based optimization, in World Congress of Structural and Multidisciplinary Optimization (2019)
41. D. Gaudrie, R. Le Riche, V. Picheny, B. Enaux, V. Herbert, Targeting solutions in Bayesian multi-objective optimization: sequential and batch versions. Ann. Math. Artif. Intell. 88(1), 187–212 (2020)
42. M.A. Gelbart, J. Snoek, R.P. Adams, Bayesian optimization with unknown constraints, in Uncertainty in Artificial Intelligence (UAI) (AUAI Press, 2014), pp. 250–259
43. M.G. Genton, Classes of kernels for machine learning: a statistics perspective. J. Mach. Learn. Res. 2, 299–312 (2001)
44. D. Ginsbourger, R. Le Riche, L. Carraro, Kriging is well-suited to parallelize optimization, in Computational Intelligence in Expensive Optimization Problems, ed. by Y. Tenne, C.-K. Goh (Springer, 2010), pp. 131–162
45. J. González, Z. Dai, P. Hennig, N.D. Lawrence, Batch Bayesian optimization via local penalization, in Artificial Intelligence and Statistics (AISTATS) (JMLR.org, 2016), pp. 648–657
46. D. Guo, Y. Jin, J. Ding, T. Chai, Heterogeneous ensemble-based infill criterion for evolutionary multiobjective optimization of expensive problems. IEEE Trans. Cybern. 49(3), 1012–1025 (2019)
47. D. Guo, X. Wang, K. Gao, Y. Jin, J. Ding, T. Chai, Evolutionary optimization of high-dimensional multi- and many-objective expensive problems assisted by a dropout neural network. IEEE Trans. Syst. Man Cybern.: Syst. 52(4), 2084–2097 (2020)
48. M.P. Hansen, A. Jaszkiewicz, Evaluating the quality of approximations to the non-dominated set. Technical Report IMM-REP-1998-7, Institute of Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark (1998)
49. N. Hansen, A. Auger, O. Mersmann, T. Tušar, D. Brockhoff, COCO: a platform for comparing continuous optimizers in a black-box setting. Optim. Methods Softw. 36, 114–144 (2021)
50. N. Hansen, S.D. Müller, P. Koumoutsakos, Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evol. Comput. 11(1), 1–18 (2003)
51. D. Hernández-Lobato, J. Hernandez-Lobato, A. Shah, R. Adams, Predictive entropy search for multi-objective Bayesian optimization, in International Conference on Machine Learning (ICML) (JMLR.org, 2016), pp. 1492–1501
52. J.M. Hernández-Lobato, M.A. Gelbart, R.P. Adams, M.W. Hoffman, Z. Ghahramani, A general framework for constrained Bayesian optimization using information-based search. J. Mach. Learn. Res. 17, 160:1–160:53 (2016)
53. J.M. Hernández-Lobato, M.W. Hoffman, Z. Ghahramani, Predictive entropy search for efficient global optimization of black-box functions, in Neural Information Processing Systems (NIPS) (2014), pp. 918–926
54. F. Hutter, H.H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, in Learning and Intelligent Optimization (LION), ed. by C.A.C. Coello (Springer, 2011), pp. 507–523
55. F. Hutter, H.H. Hoos, K. Leyton-Brown, Parallel algorithm configuration, in Learning and Intelligent Optimization (LION), ed. by Y.e.a. Hamadi (Springer, 2012), pp. 55–70
56. H. Jin, Q. Song, X. Hu, Auto-Keras: an efficient neural architecture search system, in SIGKDD Knowledge Discovery & Data Mining (KDD) (ACM Press, 2019), pp. 1946–1956
57. D.R. Jones, A taxonomy of global optimization methods based on response surfaces. J. Global Optim. 21(4), 345–383 (2001)
58. D.R. Jones, M. Schonlau, W.J. Welch, Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4), 455–492 (1998)
59. S. Ju, T. Shiga, L. Feng, Z. Hou, K. Tsuda, J. Shiomi, Designing nanostructures for phonon transport via Bayesian optimization. Phys. Rev. X 7(2), 021024 (2017)
60. K. Kandasamy, A. Krishnamurthy, J. Schneider, B. Póczos, Parallelised Bayesian optimisation via Thompson sampling, in Artificial Intelligence and Statistics (AISTATS), vol. 84 (JMLR.org, 2018), pp. 133–142
61. A.J. Keane, Statistical improvement criteria for use in multiobjective design optimization. Amer. Instit. Aeron. Astron. (AIAA) J. 44(4), 879–891 (2006)
62. J. Kirschner, M. Mutny, N. Hiller, R. Ischebeck, A. Krause, Adaptive and safe Bayesian optimization in high dimensions via one-dimensional subspaces, in International Conference on Machine Learning (ICML) (JMLR.org, 2019), pp. 3429–3438
63. J.D. Knowles, ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Trans. Evol. Comput. 10(1), 50–66 (2006)
64. B. Lakshminarayanan, D.M. Roy, Y.W. Teh, Mondrian forests for large-scale regression when uncertainty matters, in Artificial Intelligence and Statistics (AISTATS) (JMLR, 2016), pp. 1478–1487
10 Bayesian Optimization
65. R. Lam, M. Poloczek, P. Frazier, K.E. Willcox, Advances in Bayesian optimization with applications in aerospace engineering, in AIAA Non-Deterministic Approaches Conference (2018), pp. 1656–1665
66. C. Li, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, A. Shilton, High dimensional Bayesian optimization using dropout, in International Joint Conference on Artificial Intelligence (2017), pp. 2096–2102
67. R. Li, M.T.M. Emmerich, J. Eggermont, T. Bäck, M. Schütz, J. Dijkstra, J.H.C. Reiber, Mixed integer evolution strategies for parameter optimization. Evol. Comput. 21(1), 29–64 (2013)
68. R.T. Marler, J.S. Arora, The weighted sum method for multi-objective optimization: new insights. Struct. Multidiscip. Optim. 41(6), 853–862 (2010)
69. S. Marmin, C. Chevalier, D. Ginsbourger, Differentiating the multipoint expected improvement for optimal batch design, in Machine Learning, Optimization, and Big Data (Springer, 2015), pp. 37–48
70. K. Miettinen, Nonlinear Multiobjective Optimization (Kluwer Academic Publishers, 1999)
71. J. Močkus, On Bayesian methods for seeking the extremum, in Optimization Techniques IFIP Technical Conference (Springer, 1975), pp. 400–404
72. R. Moriconi, M.P. Deisenroth, K.S.S. Kumar, High-dimensional Bayesian optimization using low-dimensional feature spaces. Mach. Learn. 109(9–10), 1925–1943 (2020)
73. H. Niederreiter, Low-discrepancy and low-dispersion sequences. J. Number Theory 30(1), 51–70 (1988)
74. S. Olofsson, M. Mehrian, R. Calandra, L. Geris, M.P. Deisenroth, R. Misener, Bayesian multiobjective optimisation with mixed analytical and black-box functions: application to tissue engineering. IEEE Trans. Biomed. Eng. 66(3), 727–739 (2018)
75. H. Omre, Bayesian Kriging: merging observations and qualified guesses in Kriging. Math. Geol. 19(1), 25–39 (1987)
76. P.S. Palar, K. Yang, K. Shimoyama, M.T.M. Emmerich, T. Bäck, Multi-objective aerodynamic design with user preference using truncated expected hypervolume improvement, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2018), pp. 1333–1340
77. B. Paria, K. Kandasamy, B. Póczos, A flexible framework for multi-objective Bayesian optimization using random scalarizations, in Uncertainty in Artificial Intelligence (JMLR.org, 2020), pp. 766–776
78. S. Park, J. Na, M. Kim, J.M. Lee, Multi-objective Bayesian optimization of chemical reactor design using computational fluid dynamics. Comput. & Chem. Eng. 119, 25–37 (2018)
79. J.M. Parr, Improvement criteria for constraint handling and multiobjective optimization. Ph.D. thesis, University of Southampton, UK (2013)
80. V. Picheny, Multiobjective optimization using Gaussian process emulators via stepwise uncertainty reduction. Stat. Comput. 25(6), 1265–1280 (2015)
81. V. Picheny, T. Wagner, D. Ginsbourger, A benchmark of Kriging-based infill criteria for noisy optimization. Struct. Multidiscip. Optim. 48(3), 607–626 (2013)
82. S. Qin, C. Sun, Y. Jin, G. Zhang, Bayesian approaches to surrogate-assisted evolutionary multiobjective optimization: a comparative study, in IEEE Symposium Series on Computational Intelligence (SSCI) (IEEE Press, 2019), pp. 2074–2080
83. E. Raponi, M. Bujny, M. Olhofer, N. Aulig, S. Boria, F. Duddeck, Kriging-assisted topology optimization of crash structures. Comput. Methods Appl. Mech. Eng. 348, 730–752 (2019)
84. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
85. O. Roustant, E. Padonou, Y. Deville, A. Clément, G. Perrin, J. Giorla, H. Wynn, Group kernels for Gaussian process metamodels with categorical inputs. SIAM/ASA J. Uncert. Quantif. 8(2), 775–806 (2020)
86. T.J. Santner, B.J. Williams, W.I. Notz, The Design and Analysis of Computer Experiments (Springer, 2003)
87. T.J. Santner, B.J. Williams, W.I. Notz, Some criterion-based experimental designs, in The Design and Analysis of Computer Experiments (Springer, 2003), pp. 163–187
H. Wang and K. Yang
88. M. Schonlau, Computer Experiments and Global Optimization. Ph.D. thesis, University of Waterloo, Canada (1997)
89. M. Schonlau, W.J. Welch, D.R. Jones, Global versus local search in constrained optimization of computer models. Lecture Notes-Monograph Series, vol. 34 (1998), pp. 11–25
90. A. Shah, A.G. Wilson, Z. Ghahramani, Student-t processes as alternatives to Gaussian processes, in Artificial Intelligence and Statistics (AISTATS) (JMLR.org, 2014), pp. 877–885
91. B. Shahriari, K. Swersky, Z. Wang, R.P. Adams, N. de Freitas, Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104(1), 148–175 (2016)
92. J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram, M.M.A. Patwary, Prabhat, R.P. Adams, Scalable Bayesian optimization using deep neural networks, in International Conference on Machine Learning (ICML), JMLR Workshop and Conference Proceedings, vol. 37 (JMLR.org, 2015), pp. 2171–2180
93. A. Sobester, S.J. Leary, A.J. Keane, On the design of optimization strategies based on global response surface approximation models. J. Global Optim. 33(1), 31–59 (2005)
94. J.T. Springenberg, A. Klein, S. Falkner, F. Hutter, Bayesian optimization with robust Bayesian neural networks, in Neural Information Processing Systems (NIPS) (2016), pp. 4134–4142
95. N. Srinivas, A. Krause, S.M. Kakade, M.W. Seeger, Gaussian process optimization in the bandit setting: no regret and experimental design, in International Conference on Machine Learning (ICML) (Omnipress, 2010), pp. 1015–1022
96. M.L. Stein, Interpolation of Spatial Data: Some Theory for Kriging. Springer Series in Statistics (Springer, 1999)
97. B.E. Stuckman, A global search method for optimizing nonlinear systems. IEEE Trans. Syst. Man Cybern. 18(6), 965–977 (1988)
98. J. Svenson, T.J. Santner, Multiobjective optimization of expensive-to-evaluate deterministic computer simulator models. Comput. Stat. & Data Anal. 94, 250–264 (2016)
99. M. Taddy, C. Chen, J. Yu, M. Wyle, Bayesian and empirical Bayesian forests, in International Conference on Machine Learning (ICML) (JMLR.org, 2015), pp. 967–976
100. C. Thornton, F. Hutter, H.H. Hoos, K. Leyton-Brown, Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms, in SIGKDD Knowledge Discovery and Data Mining (KDD) (ACM Press, 2013), pp. 847–855
101. D. Vermetten, H. Wang, C. Doerr, T. Bäck, Integrated vs. sequential approaches for selecting and tuning CMA-ES variants, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2020), pp. 903–912
102. H. Wang, T. Bäck, M.T.M. Emmerich, Multi-point efficient global optimization using niching evolution strategy, in EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation. Advances in Intelligent Systems and Computing, vol. 674 (Springer, 2015), pp. 146–162
103. H. Wang, M.T.M. Emmerich, T. Bäck, Cooling strategies for the moment-generating function in Bayesian global optimization, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2018), pp. 1–8
104. H. Wang, Y. Lou, T. Bäck, Hyper-parameter optimization for improving the performance of grammatical evolution, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2019), pp. 2649–2656
105. H. Wang, B. van Stein, M.T.M. Emmerich, T. Bäck, A new acquisition function for Bayesian optimization based on the moment-generating function, in IEEE International Conference on Systems, Man and Cybernetics (SMC) (IEEE Press, 2017), pp. 507–512
106. X. Wang, Y. Jin, S. Schmitt, M. Olhofer, An adaptive Bayesian approach to surrogate-assisted evolutionary multi-objective optimization. Inf. Sci. 519, 317–331 (2020)
107. Z. Wang, F. Hutter, M. Zoghi, D. Matheson, N. de Freitas, Bayesian optimization in a billion dimensions via random embeddings. J. Artif. Intell. Res. 55, 361–387 (2016)
108. K. Yang, A. Deutz, Z. Yang, T. Bäck, M.T.M. Emmerich, Truncated expected hypervolume improvement: exact computation and application, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2016), pp. 4350–4357
109. K. Yang, M.T.M. Emmerich, A. Deutz, T. Bäck, Efficient computation of expected hypervolume improvement using box decomposition algorithms. J. Global Optim. (2019)
110. K. Yang, M.T.M. Emmerich, A. Deutz, T. Bäck, Multi-objective Bayesian global optimization using expected hypervolume improvement gradient. Swarm Evol. Comput. 44, 945–956 (2019)
111. K. Yang, M.T.M. Emmerich, A. Deutz, C.M. Fonseca, Computing 3-D expected hypervolume improvement and related integrals in asymptotically optimal time, in Evolutionary Multi-Criterion Optimization (EMO) (Springer, 2017), pp. 685–700
112. K. Yang, D. Gaida, T. Bäck, M.T.M. Emmerich, Expected hypervolume improvement algorithm for PID controller tuning and the multiobjective dynamical control of a biogas plant, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2015), pp. 1934–1942
113. K. Yang, L. Li, A. Deutz, T. Bäck, M.T.M. Emmerich, Preference-based multiobjective optimization using truncated expected hypervolume improvement, in Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) (IEEE Press, 2016), pp. 276–281
114. K. Yang, P.S. Palar, M.T.M. Emmerich, K. Shimoyama, T. Bäck, A multi-point mechanism of expected hypervolume improvement for parallel multi-objective Bayesian global optimization, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2019), pp. 656–663
115. K. Yang, K. van der Blom, T. Bäck, M.T.M. Emmerich, Towards single- and multiobjective Bayesian global optimization for mixed integer problems, in AIP Conference Proceedings, vol. 2070 (AIP Publishing LLC, 2019), p. 020044
116. Q. Zhang, H. Li, MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007)
117. Q. Zhang, W. Liu, E.P.K. Tsang, B. Virginas, Expensive multiobjective optimization by MOEA/D with Gaussian process model. IEEE Trans. Evol. Comput. 14(3), 456–474 (2010)
118. Y. Zhang, S. Tao, W. Chen, D.W. Apley, A latent variable approach to Gaussian process modeling with qualitative and quantitative factors. Technometrics 62(3), 291–302 (2020)
119. A. Zilinskas, A review of statistical models for global optimization. J. Global Optim. 2(2), 145–153 (1992)
120. E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans. Evol. Comput. 3(4), 257–271 (1999)
121. E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V. Grunert da Fonseca, Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)
122. L.R. Zuhal, C. Amalinadhi, Y.B. Dwianto, P.S. Palar, K. Shimoyama, Benchmarking multi-objective Bayesian global optimization strategies for aerodynamic design, in AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference (2018), p. 0914
123. M. Zuluaga, A. Krause, M. Püschel, ε-PAL: an active learning approach to the multi-objective optimization problem. J. Mach. Learn. Res. 17(1), 3619–3650 (2016)
Chapter 11
A Game Theoretic Perspective on Bayesian Many-Objective Optimization
Mickaël Binois, Abderrahmane Habbal, and Victor Picheny
Abstract This chapter addresses the question of how to efficiently solve many-objective optimization problems in a computationally demanding black-box simulation context. We motivate the question with applications in machine learning and engineering and discuss the specific, severe challenges of using classical Pareto approaches when the number of objectives is four or more. Then, we review solutions that combine approaches from Bayesian optimization, e.g., with Gaussian processes, with concepts from game theory such as Nash equilibria and Kalai–Smorodinsky solutions, and we detail extensions such as Nash–Kalai–Smorodinsky solutions. We finally introduce the corresponding algorithms and provide some illustrative results.
M. Binois (corresponding author)
LJAD, CNRS, Inria, Université Côte d'Azur, Sophia Antipolis, Nice, France
A. Habbal
LJAD, UMR 7351, CNRS, Inria, Université Côte d'Azur, Nice, France
V. Picheny
Secondmind, Cambridge, UK
© Springer Nature Switzerland AG 2023
D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_11

11.1 Introduction
In the previous Chap. 10, methods for multi-objective (MO) Bayesian optimization (BO) were introduced. Scaling to many objectives strains these existing solutions in several respects: (i) The positive-definiteness requirement on the covariance matrix of multi-output Gaussian processes (GPs) severely restricts their modelling ability, leaving independent GPs as the default alternative, which does not directly exploit possible correlations between objectives (see also Chap. 9); (ii) Summing the objectives is a go-to strategy often observed in practice, but with more objectives, picking weights in scalarization methods becomes more arbitrary, with effects that are difficult to apprehend. More generally, specifying preferences for a large number of conflicting objectives can lead to situations where the connection between achievable solutions and the given preferences is not easy to interpret; (iii) For Pareto-based infill criteria, the computational cost grows quickly, for instance with hypervolume computations, and fewer closed-form expressions are available than for two or three objectives; (iv) The effect of more objectives on the acquisition function landscape remains to be studied, but acquisition functions are expected to be more multimodal due to larger Pareto sets. And these are only a few of the additional technical problems faced by many-objective optimization (MaOO).

But even the desired outcome is problematic. As the number of objectives increases, the dimension of the Pareto front generally grows accordingly, and representing the front becomes more complex. With a limited budget of evaluations, this may not even be achievable: a few hundred design points would not give an accurate representation of a complex ten-dimensional manifold. In addition, even if representing the entire Pareto front were achievable, the decision maker would be left with a very large set of incomparable solutions of limited practical relevance. Hence, MaOO is more of an elicitation problem, for which good and principled ways to choose solutions are needed. When the decision maker is involved in the optimization process by giving feedback on pairs of solutions, methods that learn this preference information, such as [5, 33], are available. More diffuse preference information can also be included via the choice of the reference point for hypervolume computation (nullifying the contributions of equivalent or dominated solutions), by interactively modifying ranges for objectives [26], or via an aspiration point serving as a target (see, e.g., [21]). Ordering the importance of objectives is another option, exploited, e.g., in [1]. With many objectives, possibly of various natures and scales, the burden on the decision maker is heavier, and the effect of these preferences on the outcome is less intuitive.
When the decision is only made a posteriori, knee points, i.e., Pareto solutions from which slightly improving one objective would substantially deteriorate some others, are appealing to decision makers and are also used in the design of evolutionary algorithms, see e.g., [52]. But their existence and location on the Pareto front may vary, especially in the MaOO case. Other arguments have been directed toward finding a single solution at the "center" of the Pareto front to keep a balance between objectives, e.g., by [20, 39]. The rationale is that trade-off solutions at the extremes are not desirable unless prior information has been given. Finding such balanced solutions, without input from the decision maker during the optimization, is the default framework considered in this chapter. The difficulty lies in an appropriate and principled definition of a central solution.

A possible approach is to frame the MaOO problem as a game, where each objective is represented by a player, and the players bargain until they agree on a solution that satisfies all of them, referred to as an equilibrium. Game theory is already popular in a variety of applications such as engineering [15], control [25], multi-agent systems [11], or machine learning [32]. Game theory has also proved to be a powerful tool in multidisciplinary optimization, where decentralized strategies for non-cooperative agents take on their full meaning, the agents being discipline-specific solvers in a parallel, asynchronous, and heterogeneous multi-platform setting; see, e.g., the pioneering framework developed in [40]. It comes equipped with principled ways of eliciting solutions (the bargaining axioms of scale invariance, symmetry, efficiency, and independence of irrelevant alternatives), echoing several desirable properties of improvement functions used in MO BO [45]: reflecting Pareto dominance, requiring no input in the form of external parameters (such as a reference point), and invariance to rescaling. Indeed, with more objectives of different natures, finding a reasonable common scale for all is much harder, and the effects of a given choice of scale are difficult to apprehend.

Compared to the more standard multi-objective BO framework described in the previous chapter, the games framework comes with some specific challenges. The definition of solutions such as Nash equilibria or Kalai–Smorodinsky solutions can be quite involved, and finding such equilibria generally implies solving several inference tasks at once. For instance, this makes the use of improvement-based approaches difficult. Still, several approaches have emerged using regret or stepwise uncertainty reduction (SUR) frameworks, see e.g., [1, 42] and references therein.

The structure of the chapter is as follows. In Sect. 11.2, we briefly review key concepts in game theory such as Nash equilibria. Then, in Sect. 11.3, we present high-level BO approaches to solve many-objective problems under the games paradigm. Illustrations are provided in Sect. 11.4, before we discuss remaining challenges and perspectives in Sect. 11.5.
11.2 Game Equilibria to Solution Elicitation
The standard multi-objective optimization problem (MOP) corresponds to the simultaneous optimization of all objectives:
$$(\mathrm{MOP}) \quad \min_{\mathbf{x}\in\mathcal{X}} \big(f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})\big). \tag{11.1}$$
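As a minimal illustration of (11.1) on a finite set of already evaluated designs, the Python sketch below keeps the non-dominated objective vectors (minimization), i.e., an empirical approximation of the Pareto front. The helper names `dominates` and `pareto_front` and the toy data are illustrative assumptions, not part of the chapter.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimization: a <= b everywhere, a < b somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(Y: Sequence[Point]) -> List[Point]:
    """Non-dominated subset of a finite set of objective vectors (brute force, O(n^2 m))."""
    return [y for y in Y if not any(dominates(z, y) for z in Y)]

Y = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (2.5, 2.5)]
print(pareto_front(Y))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```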
Besides treating objectives as the goals of individual players, general non-cooperative games and their solutions, such as Nash equilibria, require a territory splitting, that is, a partitioning of the optimization variables (the input space) among players. Other games and their solution concepts depend on the choice of anchor points, as is the case for the Kalai–Smorodinsky (KS) solution, which depends on ideal and disagreement points, as illustrated in Fig. 11.1. In the following, we first introduce the Nash equilibrium concept and then, motivated by its generic inefficiency, move on to the Kalai–Smorodinsky solution.
11.2.1 Nash Games and Equilibria
When considering the standard (static, under complete information) Nash equilibrium problem [23], each objective becomes a player's outcome. Compared to a standard MOP (11.1), where players share control of the same set of variables (or action space), here variables are uniquely allocated to a player in the so-called territory splitting. Denote by $\mathbf{x}_i$ the variables of player $i$ and by $\mathcal{X}_i$ its corresponding action space, with $\mathcal{X} = \prod_i \mathcal{X}_i$. Accordingly, the variable vector $\mathbf{x}$ consists of block components $\mathbf{x}_1, \ldots, \mathbf{x}_m$, i.e., $\mathbf{x} = (\mathbf{x}_j)_{1 \le j \le m}$. We use the convention $f_i(\mathbf{x}) = f_i(\mathbf{x}_i, \mathbf{x}_{-i})$ when we need to emphasize the role of $\mathbf{x}_i$, where $\mathbf{x}_{-i}$ denotes the variables controlled by the players $j \neq i$.

Fig. 11.1 Illustration of solutions of two different elicitation problems with two different disagreement points $\mathbf{d}$: either the nadir point or a Nash equilibrium, resulting in the KS and NKS points (stars) on the Pareto front $\mathcal{P}$, respectively. The shaded area shows the feasible objective space.

Definition 11.1 A Nash equilibrium problem (NEP) consists of $m \ge 2$ decision makers (i.e., players), where each player $i \in \{1, \ldots, m\}$ wants to solve its optimization problem:
$$(P_i) \quad \min_{\mathbf{x}_i \in \mathcal{X}_i} f_i(\mathbf{x}), \tag{11.2}$$
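where $\mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})] : \mathcal{X} \subset \mathbb{R}^n \to \mathbb{R}^m$ (with $n \ge m$) denotes a vector of cost functions (a.k.a. pay-off or utility functions when maximized), and $f_i$ denotes the specific cost function of player $i$.

Definition 11.2 A Nash equilibrium $\mathbf{x}^* \in \mathcal{X}$ is a strategy such that:
$$(\mathrm{NE}) \quad \forall i,\ 1 \le i \le m, \quad \mathbf{x}_i^* = \arg\min_{\mathbf{x}_i \in \mathcal{X}_i} f_i(\mathbf{x}_i, \mathbf{x}_{-i}^*). \tag{11.3}$$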
In other words, when all players have chosen to play a NE, no single player has an incentive to deviate unilaterally from its $\mathbf{x}_i^*$. The above definition shows some attractive features of NE: it is scale invariant, does not depend on arbitrary parameters (such as the reference point), and conveys a notion of first-order stationarity. On the other hand, it requires partitioning the inputs between players, which may come naturally for some problems but may be problematic for many MaOO problems. Moreover, an arbitrary partitioning can have an unknown effect on the solution. One option, proposed in [17], is to define the partitioning based on a sensitivity analysis of one main objective. A perspective would thus be to define an optimal partitioning, in the sense that it allocates each variable (or linear combination of variables) to the player on which it has the most influence. Besides the need to define a territory splitting, it is important to notice that, generically, Nash equilibria are not efficient, i.e., they do not belong to the underlying set of best compromise solutions, the so-called Pareto front, of the objective vector $(f_i(\mathbf{x}))_{\mathbf{x}\in\mathcal{X}}$. NE efficiency can occur, for instance, when one of the players controls all the optimization variables while the others control none (a somewhat degenerate territory splitting). Still, NE have a few advantages: they convey a notion of stationarity, they are generally well balanced, and, of particular interest for many objectives, they are scale invariant.

Remark 11.1 When the block components $\mathbf{x}_1, \ldots, \mathbf{x}_m$ of the variable vector $\mathbf{x}$ are coupled, e.g., through constraints, one should modify (11.2) and (11.3) accordingly, so that $\mathbf{x}_i \in \mathcal{X}_i(\mathbf{x}_{-i})$. In the literature, the Nash equilibrium problem is then referred to as a generalized Nash equilibrium problem (GNEP).

Remark 11.2 The NE solution is scale invariant, and more generally invariant under any strictly increasing transformation $\Psi_i$ ($1 \le i \le m$) since, in this case, (11.3) is equivalent to
$$\forall i,\ 1 \le i \le m,\ \forall \mathbf{x}_i \in \mathcal{X}_i, \quad \Psi_i\big(f_i(\mathbf{x}_i^*, \mathbf{x}_{-i}^*)\big) \le \Psi_i\big(f_i(\mathbf{x}_i, \mathbf{x}_{-i}^*)\big).$$
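To make Definition 11.2 and Remark 11.2 concrete, the following sketch enumerates the pure-strategy Nash equilibria of a small finite game by checking unilateral deviations. Territory splitting here simply means "player $i$ controls component $i$". The game (a prisoner's dilemma written as costs) and all function names are hypothetical illustrations, since the chapter itself targets continuous, expensive games. The example also shows the generic inefficiency discussed above: the equilibrium costs $(2, 2)$ are dominated by $(1, 1)$.

```python
from itertools import product
from math import exp
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[int, ...]          # x = (x_1, ..., x_m); player i controls x_i only
Cost = Callable[[Profile], float]  # f_i(x), minimized by player i

def is_nash(x: Profile, costs: Sequence[Cost], actions: Sequence[Sequence[int]]) -> bool:
    """Definition 11.2 on a finite set: no player lowers f_i by changing x_i alone."""
    for i, f_i in enumerate(costs):
        for a in actions[i]:
            deviation = x[:i] + (a,) + x[i + 1:]   # (a, x_{-i})
            if f_i(deviation) < f_i(x):
                return False
    return True

def pure_nash_equilibria(costs: Sequence[Cost], actions: Sequence[Sequence[int]]) -> List[Profile]:
    """Brute-force enumeration of all pure-strategy equilibria."""
    return [x for x in product(*actions) if is_nash(x, costs, actions)]

# Prisoner's dilemma as costs; action 0 = cooperate, 1 = defect.
A = [[1, 3], [0, 2]]  # f_1(x_1, x_2)
B = [[1, 0], [3, 2]]  # f_2(x_1, x_2)
costs = [lambda x: A[x[0]][x[1]], lambda x: B[x[0]][x[1]]]
actions = [(0, 1), (0, 1)]

ne = pure_nash_equilibria(costs, actions)
print(ne)                        # [(1, 1)]
print([f(ne[0]) for f in costs]) # [2, 2], dominated by [1, 1] at (0, 0)

# Remark 11.2: strictly increasing transforms Psi_i leave the equilibrium unchanged.
psi_costs = [lambda x: exp(A[x[0]][x[1]]), lambda x: 10 * B[x[0]][x[1]] + 3]
print(pure_nash_equilibria(psi_costs, actions))  # [(1, 1)]
```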
Interestingly, it could be shown, at least in the convex case, that if players share probabilistic control over the optimization variable components (i.e., any player $j$ may control the same component $x_i$ with probability $p_{ij}$), then there exists a probability matrix $(p_{ij})$ such that the associated Nash equilibrium (with an ad hoc definition) lies on the Pareto front; see [2] for a sketch of the approach. From a practical viewpoint, computing NE for games stated in continuous variable settings (e.g., in vector spaces with a Banach or Hilbert structure, as opposed to discrete or finite games) can be based on variational analysis, e.g., with the classical fixed-point algorithms for solving NEPs [6, 35, 48]. A modified notion of Karush–Kuhn–Tucker (KKT) points, adapted to (generalized) NE, is developed in [30] to derive a dedicated augmented Lagrangian method. Yet these methods require too many evaluations to tackle expensive black boxes directly, which calls for the specific BO algorithms presented in Sect. 11.3.
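The fixed-point idea can be sketched as a Gauss–Seidel best-response iteration, where each player in turn minimizes its own cost over its own variable while the others are frozen. This is not a reproduction of the algorithms in [6, 35, 48]. The quadratic two-player game, the golden-section inner solver, and the function names are assumptions, chosen so that the best responses $x_1 = 1 - x_2/2$ and $x_2 = 1 - x_1/2$ form a contraction with equilibrium $(2/3, 2/3)$. An evaluation counter makes the cost issue visible: even this toy problem uses a few thousand calls, which is prohibitive for expensive black boxes.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Cost = Callable[[List[float]], float]
n_calls = [0]  # running count of objective evaluations

def counted(f: Cost) -> Cost:
    """Wrap a cost function so that each evaluation is counted."""
    def wrapper(x: List[float]) -> float:
        n_calls[0] += 1
        return f(x)
    return wrapper

def golden_section(g: Callable[[float], float], lo: float, hi: float, tol: float = 1e-10) -> float:
    """Minimize a unimodal 1-D function on [lo, hi] by golden-section search."""
    inv_phi = (sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if g(c) < g(d):
            b, d = d, c                      # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                      # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

def best_response_iteration(costs: Sequence[Cost], x0: Sequence[float],
                            bounds: Sequence[Tuple[float, float]],
                            max_sweeps: int = 100, tol: float = 1e-8) -> Tuple[List[float], int]:
    """Gauss-Seidel fixed point: player i minimizes f_i over x_i with x_{-i} frozen."""
    x = list(x0)
    for sweep in range(1, max_sweeps + 1):
        x_old = list(x)
        for i, f_i in enumerate(costs):
            lo, hi = bounds[i]
            x[i] = golden_section(lambda xi: f_i(x[:i] + [xi] + x[i + 1:]), lo, hi)
        if max(abs(a - b) for a, b in zip(x, x_old)) < tol:
            return x, sweep
    return x, max_sweeps

# Hypothetical quadratic game with NE = (2/3, 2/3).
costs = [counted(lambda x: (x[0] - 1.0) ** 2 + x[0] * x[1]),
         counted(lambda x: (x[1] - 1.0) ** 2 + x[0] * x[1])]
x_star, sweeps = best_response_iteration(costs, [0.0, 0.0], [(-2.0, 2.0), (-2.0, 2.0)])
print([round(v, 6) for v in x_star], sweeps, n_calls[0])  # [0.666667, 0.666667], then the counts
```

Convergence of such iterations is only guaranteed under contraction-type assumptions, which hold for this toy game but not in general.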
11.2.2 The Kalai–Smorodinsky Solution
The Kalai–Smorodinsky solution was first proposed by Kalai and Smorodinsky in 1975 as an alternative to the Nash bargaining solution in cooperative bargaining. Unlike Nash equilibria, the KS solution concept requires no partitioning of the decision variables among players.
The problem is as follows: starting from a disagreement or status quo point $\mathbf{d}$ in the objective space, the players aim at maximizing their own benefit while moving from $\mathbf{d}$ toward the Pareto front (i.e., the efficiency set). The KS solution is of egalitarian inspiration [14] and states that the selected efficient solution should yield an equal benefit ratio to all the players. Indeed, given the utopia (or ideal, or shadow) point $\mathbf{u} \in \mathbb{R}^m$ defined by
$$u_i = \min_{\mathbf{x}\in\mathcal{X}} f_i(\mathbf{x}),$$
selecting any compromise solution $\mathbf{y} = [f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})]$ would yield, for objective $i$, a benefit ratio
$$r_i(\mathbf{x}) = \frac{d_i - y_i}{d_i - u_i}.$$
Notice that the benefit from staying at $\mathbf{d}$ is zero, while it is maximal (equal to one) for the generically infeasible point $\mathbf{u}$. The KS solution is the Pareto optimal choice $\mathbf{y}^* = [f_1(\mathbf{x}^*), \ldots, f_m(\mathbf{x}^*)]$ for which all the benefit ratios $r_i(\mathbf{x})$ are equal:
$$\mathbf{x}^* \in \{\mathbf{x}\in\mathcal{X} \text{ s.t. } r_1(\mathbf{x}) = \cdots = r_m(\mathbf{x})\} \cap \{\mathbf{x}\in\mathcal{X} \text{ s.t. } \mathbf{f}(\mathbf{x}) \in \mathcal{P}\}.$$
Geometrically, $\mathbf{y}^*$ is the intersection point of the Pareto front and the line $(\mathbf{d}, \mathbf{u})$ (see Fig. 11.1); this intersection may be empty. For $m = 2$, $\mathbf{y}^*$ exists if the Pareto front is continuous. For the general case, we use here the extension of the KS solution proposed by [28] under the name efficient maxmin solution. Since the intersection with the $(\mathbf{d}, \mathbf{u})$ line might not be feasible, there is a necessary trade-off between Pareto optimality and centrality. The efficient maxmin solution is defined as the Pareto-optimal solution that maximizes the smallest benefit ratio among players, that is:
$$\mathbf{x}^{**} \in \arg\max_{\mathbf{x}:\, \mathbf{f}(\mathbf{x}) \in \mathcal{P}} \ \min_{1 \le i \le m} r_i(\mathbf{x}).$$
It is straightforward that when the intersection is feasible, $\mathbf{y}^*$ and $\mathbf{y}^{**} := \mathbf{f}(\mathbf{x}^{**})$ coincide. Figure 11.1 shows $\mathbf{y}^{**}$ in the situation where the feasible space is nonconvex. The point $\mathbf{y}^{**}$ always lies on the Pareto front (hence not necessarily on the $(\mathbf{d}, \mathbf{u})$ line). In the following, we refer indifferently to $\mathbf{y}^*$ (if it exists) and $\mathbf{y}^{**}$ as the KS solution. Note that this definition also extends to discrete Pareto sets. For $m = 2$, the (non-extended) KS solution can be axiomatically characterized as the unique solution fulfilling all the bargaining solution axioms, namely Pareto optimality, symmetry, affine invariance, and restricted monotonicity [29]. For $m \ge 3$, there is no such axiomatic characterization, and the KS solution is only required to fulfill Pareto optimality, affine invariance, and equity in benefit ratio. Still, KS is particularly attractive in a many-objective context since it scales naturally to a large number of objectives and returns a single solution. This avoids the difficulty of exploring and approximating large Pareto fronts in an $m$-dimensional objective space, especially with a limited number of observations.
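Since the efficient maxmin definition applies to discrete Pareto sets, it can be illustrated on a finite sample. The sketch below is a minimal version under stated assumptions: the utopia point is the empirical ideal point of the sample, and by default the disagreement point is the nadir of its non-dominated subset. The data and function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def benefit_ratios(y: Point, d: Sequence[float], u: Sequence[float]) -> List[float]:
    """r_i = (d_i - y_i) / (d_i - u_i): 0 at the disagreement point, 1 at the utopia point."""
    return [(di - yi) / (di - ui) for yi, di, ui in zip(y, d, u)]

def ks_solution(Y: Sequence[Point], d: Optional[Sequence[float]] = None) -> Point:
    """Efficient maxmin (extended KS) point of a finite set of objective vectors.

    u is the empirical ideal point; d defaults to the nadir of the non-dominated set.
    """
    front = [y for y in Y if not any(dominates(z, y) for z in Y)]
    m = len(front[0])
    u = [min(y[i] for y in front) for i in range(m)]
    if d is None:
        d = [max(y[i] for y in front) for i in range(m)]
    if any(di <= ui for di, ui in zip(d, u)):
        raise ValueError("each d_i must be strictly worse than u_i")
    return max(front, key=lambda y: min(benefit_ratios(y, d, u)))

Y = [(0.0, 1.0), (0.2, 0.5), (0.4, 0.35), (0.6, 0.2), (1.0, 0.0), (0.7, 0.7)]
print(ks_solution(Y))  # (0.4, 0.35): its smallest ratio, 0.6, beats every other front point
```

Passing an explicit disagreement point, e.g., `ks_solution(Y, d=(0.5, 1.0))`, returns `(0.2, 0.5)`: a tighter limit on the first objective shifts the compromise in its favor.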
Closely related to the Kalai–Smorodinsky solution is the reference point methodology developed by [51]. The reference point methodology uses achievement functions, which refer to aspiration and reservation levels. Ideal and nadir points are particular instances of such aspiration and reservation points, respectively, and achievement functions play the same role as the benefit ratios in our present setting. Moreover, the notions of neutral compromise and max-min (of achievement functions) were introduced by Wierzbicki in the cited reference, yielding a framework very close to that of Kalai and Smorodinsky. One important difference between the reference point and KS approaches is that the latter relies on a game theoretic axiomatic construction, aimed at remedying the inefficiency of Nash equilibria.
11.2.3 Disagreement Point Choice
Clearly, the KS solution strongly depends on the choice of the disagreement point $\mathbf{d}$. A standard choice is the nadir point $\mathbf{N}$ given by $N_i = \max_{\mathbf{x} \in \text{Pareto set}} f_i(\mathbf{x})$. Some authors introduced alternative definitions, called extended KS, to alleviate the critical dependence on the choice of $\mathbf{d}$ [10], for instance by taking as disagreement point the Nash equilibrium arising from a previously played non-cooperative game. But such a choice requires a pre-bargaining split of the decision variable vector $\mathbf{x}$ among the $m$ players. When a relevant territory splitting is agreed on, Nash provides KS with an interesting and non-arbitrary disagreement point. In return, KS provides the Nash game with an interesting and non-arbitrary efficient solution, starting from the inefficient equilibrium. The resulting Nash–Kalai–Smorodinsky solution is denoted NKS. The nadir point is very useful to rescale the Pareto front when objectives are of different natures or scales, see e.g., [7]. As such, it makes a natural disagreement point. Still, finding the nadir point is a complex task, especially under a limited budget, as it involves exploring regions that are good for a few objectives but have worse values on others. The many-objective context makes this issue more prominent. It is related to the problem of Pareto resistance, i.e., regions of the space with good values on a few objectives that are nonetheless not Pareto optimal (see, e.g., [19] and references therein). Unless some objectives are more important than others, the corresponding extremal regions of the Pareto front are of little interest for selecting a single good solution. A simpler disagreement point is the pseudo-nadir point, defined by the worst objective values over the designs achieving the minimum of each individual objective:
$$\tilde{N}_i = \max_{\mathbf{x} \in \{\mathbf{x}^{(1)*}, \ldots, \mathbf{x}^{(m)*}\}} f_i(\mathbf{x}), \quad \text{where } \mathbf{x}^{(j)*} \in \arg\min_{\mathbf{x}\in\mathcal{X}} f_j(\mathbf{x}).$$
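The two disagreement points can be compared on a small, hypothetical set of three-objective vectors. The pseudo-nadir only needs the pay-off table (one minimizer per objective), whereas the nadir needs the whole non-dominated set. Function names are illustrative. Ties among individual minimizers are broken arbitrarily (first found), which is one reason the pseudo-nadir is not robust.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nadir(Y: Sequence[Point]) -> Point:
    """N_i = max of f_i over the non-dominated points (needs the whole front)."""
    front = [y for y in Y if not any(dominates(z, y) for z in Y)]
    return tuple(max(y[i] for y in front) for i in range(len(front[0])))

def payoff_table(Y: Sequence[Point]) -> List[Point]:
    """Row j is f(x^(j)*), the objective vector of a minimizer of f_j (ties: first found)."""
    m = len(Y[0])
    return [min(Y, key=lambda y: y[j]) for j in range(m)]

def pseudo_nadir(Y: Sequence[Point]) -> Point:
    """N~_i = max of f_i over the pay-off table rows (only needs the m individual minima)."""
    table = payoff_table(Y)
    return tuple(max(row[i] for row in table) for i in range(len(Y[0])))

# (1, 1, 8) is non-dominated but minimizes no single objective.
Y = [(0, 5, 5), (5, 0, 5), (5, 5, 0), (1, 1, 8), (6, 6, 6)]
print(payoff_table(Y))  # [(0, 5, 5), (5, 0, 5), (5, 5, 0)]
print(nadir(Y))         # (5, 5, 8)
print(pseudo_nadir(Y))  # (5, 5, 5): underestimates the third nadir coordinate
```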
It can be extracted directly from the pay-off table, but it is not robust, as it can under- or over-estimate the true nadir point, see e.g., [36] for details. While the nadir and pseudo-nadir points coincide for two objectives, this is no longer the case in higher dimensions. Using the pseudo-nadir point removes the need to search for extremal non-dominated solutions on the Pareto front, and aligns with finding the ideal point, since both rely on the individual minimizers $\mathbf{x}^{(j)*}$. The egalitarian inspiration of the KS solution is based on the assumption that all the objectives have equal importance. However, in many practical cases, the end user may want to favor a subset of objectives of primary importance, while still retaining the others for optimization. One way of incorporating those preferences [43, 46, 50] is to discard solutions with extreme values and instead solve the following constrained version of the original MOP (11.1):
$$\min_{\mathbf{x}\in\mathcal{X}} \{f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})\} \quad \text{s.t. } f_i(\mathbf{x}) \le c_i, \quad i \in J \subset \{1, \ldots, m\},$$
with the $c_i$ predefined (or interactively defined) constants. Choosing a tight value (i.e., difficult to attain) for $c_i$ may discard a large portion of the Pareto set and favor the $i$th objective over the others. Such preferences can simply be incorporated in the KS solution by using $\mathbf{c}$ as the disagreement point if all objectives are constrained, or otherwise by replacing the corresponding coordinates of the nadir point with the $c_i$ values. In a game theory context, this would mean that each player states an acceptance limit for its objective before starting the cooperative bargaining. From geometrical considerations, one may note that $\mathbf{d}$ (and hence $\mathbf{c}$) does not need to be a feasible point: as long as $\mathbf{d}$ is dominated by the utopia point $\mathbf{u}$, the KS solution remains at the intersection of the Pareto front and the $(\mathbf{d}, \mathbf{u})$ line. An example where preferences from a decision maker are integrated this way is provided in [8]. As also shown in [8], applying a monotonic rescaling to the objectives can move the corresponding KS solution along the Pareto front. A typical monotonic transformation of a raw objective is taking its logarithm for a physical or modeling reason. Since Nash equilibria are invariant to such rescaling, NKS solutions are more stable than nadir- or pseudo-nadir-based KS solutions, but they are still affected. One proposed option to alleviate this effect is to use ranks instead of raw objective values. Interestingly, this induces a dependence on the parameterization of the problem. Studying these aspects can involve a measure on the input space and copulas in the objective space [8, 9], and further work is needed to quantify the effect on the Pareto front. The combination with more involved preference learning methods such as [5, 33] could also be investigated.
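The sensitivity of KS to monotonic rescaling can be checked on a toy example. The sketch below uses a hypothetical discrete Pareto front with the nadir as disagreement point. Taking the logarithm of the first objective leaves the Pareto set unchanged but moves the efficient maxmin point from $(4, 3)$ to $(3, 4.5)$. By Remark 11.2, a Nash equilibrium would not move under such a transformation. The function name `ks_index` is an assumption for illustration.

```python
from math import log
from typing import Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def ks_index(Y: Sequence[Point]) -> int:
    """Index of the efficient maxmin (KS) point, with u = ideal and d = nadir of the front."""
    front = [k for k, y in enumerate(Y) if not any(dominates(z, y) for z in Y)]
    m = len(Y[0])
    u = [min(Y[k][i] for k in front) for i in range(m)]
    d = [max(Y[k][i] for k in front) for i in range(m)]

    def worst_ratio(k: int) -> float:
        return min((d[i] - Y[k][i]) / (d[i] - u[i]) for i in range(m))

    return max(front, key=worst_ratio)

# Hypothetical discrete Pareto front (both objectives minimized).
P = [(1.0, 10.0), (2.0, 6.0), (3.0, 4.5), (4.0, 3.0), (6.0, 2.0), (10.0, 1.0)]
# Strictly increasing transform of f_1 only: the Pareto set is unchanged.
P_log = [(log(y1), y2) for y1, y2 in P]

print(P[ks_index(P)])      # (4.0, 3.0)
print(P[ks_index(P_log)])  # (3.0, 4.5)
```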
11.3 Bayesian Optimization Algorithms for Games
Solving expensive black-box games is an emerging topic in Bayesian optimization [3, 4, 8, 42]. Similarly to the single- and multi-objective cases, BO leverages the probabilistic information available from the Gaussian process models to balance exploration and exploitation in searching for the game equilibrium. Other metamodelling options could be considered, as long as they also provide mean and variance predictions at any location, plus the ability to draw realizations (posterior samples) at arbitrary locations. We present here several algorithms to solve such equilibrium or solution finding problems.
11 A Game Theoretic Perspective on Bayesian Many-Objective Optimization
For the Nash equilibrium (Sect. 11.3.1), [3] proposed an upper confidence bound-like acquisition function based on a regret function approximating the game-theoretic regret. The case of potential games [24], in which a single function can summarize the game and which commonly arise in problems with shared resources, is considered by [4], with a dedicated expected improvement-like acquisition function. In [8, 42], the flexible and general stepwise uncertainty reduction framework is applied to estimate different classes of equilibria, including Nash and KS, which we present in Sect. 11.3.2. Although not proposed in the literature (to our knowledge), a related family of acquisition functions based on Thompson sampling is also applicable, as we briefly describe in Sect. 11.3.3.
11.3.1 Fixed Point Approaches for the Nash Equilibrium
Fixed-point methods, such as Algorithm 11.1, are one way to compute Nash equilibria. In the expensive case, a naïve version simply replaces the true objectives by their GP mean predictions $\hat{f}_i$, $1 \le i \le m$. While computationally efficient and possibly sufficient for a coarse estimation, such an approach does not address the exploration-exploitation trade-off, and may lead to poor estimates if the models are not accurate representations of the objectives.
Algorithm 11.1: Pseudo-code for the fixed-point approach [48]
Require: $m$: number of players; $0 < \alpha < 1$: relaxation factor; $n_{\max}$: maximum number of iterations
Construct an initial strategy $x^{(0)}$; set $k = 0$
while $k \le n_{\max}$ do
    Compute in parallel, for all $i$, $1 \le i \le m$: $z_i^{(k+1)} = \arg\min_{x_i \in X_i} \hat{f}_i(x_{-i}^{(k)}, x_i)$
    Update: $x^{(k+1)} = \alpha z^{(k+1)} + (1 - \alpha) x^{(k)}$
    if $\|x^{(k+1)} - x^{(k)}\|$ is small enough then exit
    $k \leftarrow k + 1$
end while
Ensure: for all $i = 1, \dots, m$, $x_i^* = \arg\min_{x_i \in X_i} f_i(x_{-i}^*, x_i)$
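The relaxation loop of Algorithm 11.1 can be sketched in a few lines. This is a minimal sketch, assuming two players with scalar strategies in $[0, 1]$, toy closed-form objectives standing in for the GP means $\hat{f}_i$, and an argmin computed by brute force over a fixed grid; the names `f1`, `f2`, `GRID`, `best_response` and `relaxed_fixed_point` are hypothetical and not taken from [48].

```python
import numpy as np

# Toy stand-ins for the GP means f_hat_i (hypothetical): player 1 controls x[0],
# player 2 controls x[1]; the analytic Nash equilibrium is (7/15, 8/15).
def f1(x):
    return (x[0] - 0.5 * x[1] - 0.2) ** 2

def f2(x):
    return (x[1] - 0.5 * x[0] - 0.3) ** 2

GRID = np.linspace(0.0, 1.0, 3001)  # discretized strategy set X_i (assumption)

def best_response(f, x, i):
    """argmin over x_i in GRID of f(x_{-i}, x_i), other players frozen at x."""
    trial = np.tile(x, (GRID.size, 1))  # one row per candidate strategy of player i
    trial[:, i] = GRID
    return GRID[np.argmin(f(trial.T))]  # f is vectorized over the columns of trial.T

def relaxed_fixed_point(objectives, x0, alpha=0.5, n_max=200, tol=1e-8):
    """Relaxation algorithm: x <- alpha * z + (1 - alpha) * x, z = best responses."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_max):
        # Best responses of all players to the current profile (parallelizable).
        z = np.array([best_response(f, x, i) for i, f in enumerate(objectives)])
        x_new = alpha * z + (1.0 - alpha) * x
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

x_star = relaxed_fixed_point([f1, f2], x0=[0.0, 0.0])
print(x_star)  # close to the analytic equilibrium (7/15, 8/15)
```

The relaxation factor $\alpha$ damps the simultaneous best-response update; with $\alpha = 1$ the loop reduces to plain best-response iteration, which can oscillate in less benign games.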
Al-Dujaili et al. [3] provide an acquisition function based on the upper confidence bound to estimate Nash equilibria of continuous games. They focus on the game-theoretic regret, that is, the most any player can gain by unilaterally deviating from the current solution. Small unilateral deviations by each player in turn are also the underlying principle of fixed-point methods. The corresponding acquisition function selects as the next design to evaluate one that minimizes the regret computed on the GP surrogate. However, rather than optimizing directly to obtain the minimum objective values, these are replaced by an approximation. That is, for a player $i$ at $x_i$, the minimum value
M. Binois et al.
is replaced by the sum of the mean objective value over $x_{-i}$ and the corresponding standard deviation, scaled by a factor $\gamma$. These quantities can be computed analytically for GPs with some separable covariance functions. In the special case of potential games [24], also related to fixed-point methods, the strategy of improving one player as much as possible leads to the Nash equilibrium (and not just any fixed point). This is exploited by [4], who directly consider the potential. By definition, the potential $\phi: X \to \mathbb{R}$ is a function such that $f_i(x', x_{-i}) - f_i(x'', x_{-i}) = \phi(x', x_{-i}) - \phi(x'', x_{-i})$ for all $i$, $x'$, $x''$. As the available measurements are of the $f_i$'s and not of the potential directly, constructing the GP of the potential $\phi$ involves taking integral and gradient operators into account. Once the GP is fitted, they propose an acquisition function similar to expected improvement [37].
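The two notions used above, the game-theoretic regret and the potential, can be checked numerically on a small discretized game. This is a minimal sketch, assuming a hypothetical two-player potential game built as $f_i = \phi + h_i(x_{-i})$ (each extra term depends only on the opponent's strategy, so unilateral differences of $f_i$ equal those of $\phi$). It illustrates the definitions only; it is not the acquisition function of [3] or [4].

```python
import numpy as np

# Hypothetical two-player potential game on a grid: player 1 picks x1, player 2 picks x2.
grid = np.linspace(0.0, 1.0, 101)
X1, X2 = np.meshgrid(grid, grid, indexing="ij")  # X1[a, b] = grid[a], X2[a, b] = grid[b]

phi = (X1 - 0.3) ** 2 + (X2 - 0.7) ** 2 + 0.5 * X1 * X2  # potential
f1 = phi + np.sin(3.0 * X2)  # extra term depends only on the opponent's x2
f2 = phi + np.cos(2.0 * X1)  # extra term depends only on the opponent's x1

def regret(f1, f2, a, b):
    """Largest gain a single player can obtain by unilaterally deviating from
    the profile (grid[a], grid[b]); objectives are minimized."""
    gain1 = f1[a, b] - f1[:, b].min()  # player 1 varies x1, x2 fixed
    gain2 = f2[a, b] - f2[a, :].min()  # player 2 varies x2, x1 fixed
    return max(gain1, gain2)

# Unilateral differences of f1 match those of the potential (definition check).
assert np.allclose(f1[10, :] - f1[60, :], phi[10, :] - phi[60, :])

# A global minimizer of the potential is a Nash equilibrium: its regret is zero.
a_star, b_star = np.unravel_index(np.argmin(phi), phi.shape)
print(grid[a_star], grid[b_star], regret(f1, f2, a_star, b_star))
print(regret(f1, f2, 0, 0))  # an arbitrary profile has positive regret
```

Minimizing the single function $\phi$ is what makes potential games attractive: a one-objective optimizer applied to the potential lands on a Nash equilibrium.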
11.3.2 Stepwise Uncertainty Reduction
Stepwise uncertainty reduction (SUR) methods, see, e.g., [12, 22, 49], require an uncertainty measure $\Gamma$ about a quantity of interest, here the game equilibrium. New evaluations are then selected sequentially in order to reduce this uncertainty, leading to an accurate identification of the quantity of interest. This framework is well adapted to games, as it allows integrating various learning tasks into a single goal (for instance, the definition of KS-type solutions involves several other unknown quantities, such as the ideal and nadir points). For any multivariate function $f$ defined over $X$, with image $Y$, denote by $\Psi: Y \to \mathbb{R}^m$ the mapping that associates its equilibrium (Nash, KS or other). When the function is replaced by a GP emulator $Y_n(\cdot)$ conditioned on $n$ multivariate observations $f(x^{(1)}), \dots, f(x^{(n)})$, $\Psi$ is defined on the corresponding distribution, and $\Psi(Y_n)$ is a random vector (of unknown distribution). Loosely speaking, the spread of this distribution in the objective space reflects the uncertainty on the solution location, quantified by the uncertainty measure $\Gamma(Y_n)$. One measure of variability of a random vector is the determinant of its covariance matrix [18]:
$$\Gamma(Y_n) = \det\left[\operatorname{cov}\left(\Psi(Y_n)\right)\right],$$
while other information-theoretic measures could be used, for instance the conditional entropy as in [49]. The SUR strategy greedily chooses the next observation that reduces this uncertainty the most:
$$\max_{x \in X} \; \Gamma(Y_n) - \Gamma(Y_{n,x}),$$
where $Y_{n,x}$ is the GP conditioned on $\{f(x^{(1)}), \dots, f(x^{(n)}), f(x)\}$. However, such an ideal strategy is not tractable, as it would require evaluating $f(x)$ at every candidate $x$ while maximizing over $X$. A more manageable strategy is to consider the expected uncertainty reduction $\Gamma(Y_n) - \mathbb{E}_{Y_n(x)}\left[\Gamma(Y_{n,x})\right]$, where $\mathbb{E}_{Y_n(x)}$ denotes the expectation taken over $Y_n(x)$. Removing the constant term $\Gamma(Y_n)$, the policy is defined by
$$x_{n+1} \in \arg\min_{x \in X} J(x), \qquad J(x) = \mathbb{E}_{Y_n(x)}\left[\Gamma(Y_{n,x})\right].$$
Informally speaking, the variation of $\Psi$ is partly caused by not knowing the precise location of the solution of interest, and possibly also by not knowing the values of the anchor points. The designs whose evaluation most reduces this variation are thus either close to the solution computed on the GP means, or have a large predictive variance. Hence $J(x)$ defines a trade-off between exploration and exploitation, as well as a trade-off between the different learning tasks ($d$ and $u$ points, Nash equilibrium, Pareto front). The BO loop is then fully defined (see Chap. 10, Algorithm 10.1), and the difficulty boils down to evaluating $J$ efficiently. A crucial aspect in enabling the use of the SUR strategy is the ability to generate realizations of $Y_n$ efficiently. When $X$ is discrete (or can be discretized), the approaches proposed in [8, 42] rely on conditional simulations, i.e., joint posterior samples on $X$ or on a well-chosen subset of it, coupled with fast update formulas for the resulting ensembles, following [13]. This approach hinges on the discretization size used, which needs to remain in the thousands because the computational complexity is cubic in the number of points. The targeted solution is then computed on each realization, resulting in samples of $\Psi(Y_n)$. Note that for discrete Nash equilibrium computations, the realizations must be simulated at all combinations of the strategies selected for each player. An alternative, applicable to continuous $X$, would rely on approximate sample paths with a closed-form expression, as used, e.g., by [38]. Again, the appropriate solution can be obtained from the samples, which is inexpensive compared to evaluating the expensive black-box.
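The sample-based estimate of $\Gamma(Y_n) = \det[\operatorname{cov}(\Psi(Y_n))]$ can be sketched as follows: draw realizations on a discretized design space, compute the targeted solution (here a KS point) on each, and take the determinant of the empirical covariance of the results. This is a minimal sketch under loud assumptions: the bi-objective mean surface is invented, and independent Gaussian noise around it is a crude stand-in for genuine GP conditional simulations (which would be spatially correlated). It only computes $\Gamma(Y_n)$; the expectation $J(x)$ would additionally require fantasized updates $Y_{n,x}$ for each candidate $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretized design space: 200 candidates with 2 objectives each.
t = np.linspace(0.0, 1.0, 200)
mean = np.column_stack([t, (1.0 - t) ** 2])  # stand-in for the GP posterior mean

def ks_point(Y):
    """Objective vector of the discrete KS solution of Y (rows = designs),
    with ideal and nadir points taken from the Pareto set of Y."""
    nondom = [not np.any(np.all(Y <= y, axis=1) & np.any(Y < y, axis=1)) for y in Y]
    P = Y[np.array(nondom)]
    u, d = P.min(axis=0), P.max(axis=0)
    return P[np.argmax(np.min((d - P) / (d - u), axis=1))]

def gamma(psi_samples):
    """Uncertainty measure: det of the covariance of equilibrium samples (rows)."""
    return float(np.linalg.det(np.cov(psi_samples, rowvar=False)))

def simulate_gamma(noise_sd, n_draws=100):
    """Gamma(Y_n) from n_draws realizations; independent Gaussian noise around
    the mean is a crude stand-in for GP conditional simulations (assumption)."""
    psi = [ks_point(mean + noise_sd * rng.standard_normal(mean.shape))
           for _ in range(n_draws)]
    return gamma(np.array(psi))

print(simulate_gamma(0.05), simulate_gamma(0.01))  # less posterior spread, smaller Gamma
```

Because each realization yields one KS point, the spread of these points directly reflects how uncertain the surrogate still is about the solution location.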
The resulting criterion for estimating KS-type solutions automatically interleaves improvements in the estimation of the specific quantities involved, balancing between the estimation of the Nash equilibrium, of the objective-wise minima for the ideal point, and of the intersection with the Pareto front. A different use of the posterior distribution is also possible, with Thompson sampling.
11.3.3 Thompson Sampling
Over the past decade, Thompson sampling (TS) [47] has become a very popular algorithm for sequential optimization problems. In a nutshell, TS proceeds by sequentially sampling from the posterior distribution of the problem solution, which efficiently addresses the exploration-exploitation trade-off. In a GP-based setting, this is simply achieved by sampling from the posterior distribution of the objectives and choosing as the next sampling point the input that realizes the equilibrium on the sample. As TS does not directly try to pinpoint the ideal point (for instance), it cannot be directly transposed to the estimation of KS solutions. Still, it may serve to estimate Nash equilibria. However, to our knowledge this approach has not been studied yet (the closest
proposition is an algorithm in [42] based on the probability of realizing the optimum). Note that the same approximation of the samples as for SUR can be used. Next, we illustrate the results of the SUR approach on a practical application.
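One Thompson-sampling step for a Nash equilibrium, as described above, amounts to: draw one joint sample of the objectives, find the equilibrium of the sampled game, and propose that profile as the next design. The sketch below is hypothetical and untested against any published variant. It assumes a two-player game on a finite strategy grid, approximates the posterior draw by independent Gaussian noise (a real GP draw would be correlated), and returns the minimum-regret profile of the draw, which is a pure Nash equilibrium of the draw whenever one exists.

```python
import numpy as np

def ts_next_design(mu1, mu2, sd1, sd2, rng):
    """One Thompson-sampling step for a two-player discrete game (minimization).

    mu*/sd*: posterior means / standard deviations of each player's objective on
    the strategy grid (rows: player 1's strategies, columns: player 2's). A joint
    draw is approximated by independent Gaussian noise (assumption; a real GP
    draw would be spatially correlated). Returns the profile with the smallest
    regret on the draw, i.e. a pure Nash equilibrium of the draw when one exists.
    """
    F1 = mu1 + sd1 * rng.standard_normal(mu1.shape)
    F2 = mu2 + sd2 * rng.standard_normal(mu2.shape)
    gain1 = F1 - F1.min(axis=0, keepdims=True)  # player 1 deviates along rows
    gain2 = F2 - F2.min(axis=1, keepdims=True)  # player 2 deviates along columns
    regret = np.maximum(gain1, gain2)
    i, j = np.unravel_index(np.argmin(regret), regret.shape)
    return int(i), int(j), float(regret[i, j])

# Hypothetical game with a known pure equilibrium at strategies (0.4, 0.6).
s = np.linspace(0.0, 1.0, 11)
mu1 = np.tile((s[:, None] - 0.4) ** 2, (1, 11))  # player 1 objective, shape (11, 11)
mu2 = np.tile((s[None, :] - 0.6) ** 2, (11, 1))  # player 2 objective, shape (11, 11)
sd = 0.02 * np.ones((11, 11))

rng = np.random.default_rng(42)
print(ts_next_design(mu1, mu2, sd, sd, rng))  # next design (grid indices) and its sampled regret
```

Randomness in the draw is what provides exploration: profiles with large posterior standard deviation occasionally win on a sample even when their mean looks unpromising.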
11.4 Application Example: Engineering Test Case
We consider the switching ripple suppressor design problem for voltage source inversion in powered systems, which was proposed by [53] and also tested in [27]. The device is composed of three components: one with three inductors, one with parallel LC resonant branches, and a capacitor branch. Arguing that this type of problem is naturally many-objective, [53] describes several types of conflicting objectives: harmonic attenuation of the switching ripple suppressor at different frequencies, power factor, resonance suppression and inductor cost. In the proposed problem, where $n_r$ is the number of resonant branches, there are $n_r + 4$ variables and $n_r + 1$ objectives: the inductors $L_1, L_2, L_3$ are related to the inductor cost objective ($f_{n_r+1}$), and the variables $C_1, \dots, C_{n_r}$ are related to the goal of attenuating the harmonics at the resonant frequencies ($f_1, \dots, f_{n_r}$). The last variable is $C_f$. Note that five additional design constraints can also be added. The cost of evaluation is negligible here, but the problem showcases a realistic application where the true Pareto front is unknown. We start by estimating the Nash–Kalai–Smorodinsky (NKS) equilibrium in the $n_r = 4$ case, with the following partitioning: $(f_1: C_1, C_f)$, $(f_2: C_2)$, $(f_3: C_3)$, $(f_4: C_4)$, $(f_5: L_1, L_2, L_3)$. For the SUR strategy, we discretize each input subspace using Latin hypercube samples of sizes (26, 11, 11, 11, 51) and use a budget of 80 evaluations for the initial design followed by 70 sequential iterations. Note that this discretization of the input space with the partitioning results in $\approx 2 \times 10^6$ possible combinations for the Nash equilibrium. To keep the SUR procedure tractable, filtering is performed to select 1000 solutions on which to actually draw posterior realizations, before evaluating the criterion on the 200 most promising candidates. More details on the procedure can be found in [8, 42].
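The size of the Nash search space quoted above follows from multiplying the per-player discretization sizes, since every player's strategy can be combined with every other player's. A short check, using the partitioning and Latin hypercube sizes stated in the text (the dictionary names are hypothetical):

```python
import math

# Partition of the n_r + 4 = 8 design variables among the 5 players (n_r = 4).
partition = {"f1": ["C1", "Cf"], "f2": ["C2"], "f3": ["C3"], "f4": ["C4"], "f5": ["L1", "L2", "L3"]}
lhs_sizes = {"f1": 26, "f2": 11, "f3": 11, "f4": 11, "f5": 51}  # per-player LHS sizes

n_vars = sum(len(v) for v in partition.values())  # total number of design variables
n_profiles = math.prod(lhs_sizes.values())        # joint strategy profiles for the Nash game
budget = 80 + 70                                  # initial design + sequential iterations
print(n_vars, n_profiles, budget)  # 8 1764906 150
```

With about 1.8 million joint profiles, drawing joint posterior realizations on all of them would be prohibitive given the cubic cost of conditional simulation, which motivates the filtering to 1000 candidates.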
The results obtained using the GPGame package [41] are shown in Figs. 11.2 and 11.3. The first four objectives appear to be highly correlated, which does not affect the estimation of the solution. Most of the initial designs (67/80) are dominated, and this is also the case for about two thirds (47/70) of the added ones, possibly because the search targets the inefficient Nash equilibrium. These new designs are all concentrated in the same region of the objective space. The NE is relatively close to the Pareto front in this case, and the estimated NKS solution dominates it. We then consider the regular Kalai–Smorodinsky solution, whose results are shown in Figs. 11.4 and 11.5. This time the input space is discretized with $10^6$ uniformly sampled candidate solutions. Again, most initial designs are dominated (72/80), but this is the case for only 28 of the 70 sequentially added ones. Figure 11.5 shows that the KS solution is almost a straight line, an indication of the centrality of
Fig. 11.2 Scatter plot matrix of the objectives obtained when estimating the NKS solution by the SUR strategy. Black circles: dominated initial designs, black +: nondominated initial designs, green crosses: sequentially added solutions (circles if dominated), blue triangles: estimated Nash equilibrium, and red diamond: estimated NKS solution. The KS solution (yellow square) is marked for reference only
this solution. It can be seen that, in the sequential procedure, new points are added at the extremities of the Pareto front to locate the nadir, as well as in the central part. Choosing between the obtained NKS and KS solutions now depends on the viewpoint. The spread of the KS solution in the objective space is much broader than that of the NKS solution. Note that the same range is used in all the figures. To further ease comparison, we added the KS solution to Fig. 11.2 and the Nash and NKS solutions to Fig. 11.4, but they were neither part of the solutions evaluated in their corresponding BO loops nor their targets. If the range of the objectives is relevant to the decision maker, the centrality and equity of benefits across all objectives may be appealing to the decision
Fig. 11.3 Parallel coordinates plot corresponding to the objectives of Fig. 11.2. Colors are the same; dominated solutions are shown with dotted lines, while the Nash equilibrium is marked by the thick blue dashed line and the estimated NKS solution by the thick red line
maker. If instead the territory splitting between the objectives is important and the relative scaling of the objectives is irrelevant, then the NKS solution would be preferred (also over staying at the Nash equilibrium). As a perspective, it would be pertinent to additionally take into account the constraints defined in [53], with which the Pareto front becomes more complex.
11.5 What Is Done and What Remains
We reviewed the enrichment of many-objective optimization with concepts from game theory. In the context of limited evaluation budgets, when an extensive search of the entire Pareto front is out of reach, game theory can guide the estimation of a single solution chosen for its centrality and possibly additional game-theoretic properties. Several BO-based methodologies have been developed for estimating the corresponding solutions, which can scale to many-objective setups. As this is an emerging topic in BO, many algorithmic developments could be proposed, for instance based on Thompson sampling or on entropy search. For the Nash and NKS solutions, a partitioning of the input space is necessary, whereas such a partitioning is seldom part of the multi-objective context. One perspective is to randomize this partitioning when it does not come directly from the problem definition. The possible advantage would be to focus on a set of different Nash equilibria, all with stationarity properties, providing more than a single solution. As in the engineering example we presented, there are usually additional constraints in the problem definition. In the SUR procedure described above, dealing with constraints could amount to filtering out infeasible solutions on the realizations. Other options include the use of the probability of feasibility as in [44]. For more details on the handling of constraints in BO, we refer to [34] and references
Fig. 11.4 Scatter plot matrix of the objectives obtained when estimating the KS solution by the SUR strategy. The legend is as for Fig. 11.2 except that the yellow square indicates the KS solution. Here the Nash solution (blue diamond) and NKS solution (red triangle) are marked for reference only
therein. Moreover, the distinction between an objective and a constraint is sometimes fuzzy. Especially when the constraints can be modified, treating them as objectives instead, akin to multiobjectivization [31], could help find feasible solutions (or still provide solutions when the constrained problem is infeasible). Conversely, prioritizing some objectives [16] while defining constraints on others can help control the trade-offs when moving along the Pareto front.
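The two constraint-handling options mentioned above can be sketched briefly. This is a minimal sketch, assuming constraints of the form $g_j(x) \le 0$ modeled by independent GPs with Gaussian marginals $\mathcal{N}(\mu_j, \sigma_j^2)$, so that the probability of feasibility is the product of normal CDFs $\prod_j \Phi(-\mu_j/\sigma_j)$. The helper names `prob_feasibility` and `filter_realization` are hypothetical, not an API from [44] or GPGame.

```python
import math
import numpy as np

def prob_feasibility(mu, sigma):
    """Probability that all constraints g_j(x) <= 0 hold, assuming independent
    Gaussian posteriors g_j(x) ~ N(mu_j, sigma_j^2) (as with independent GPs)."""
    cdfs = [0.5 * (1.0 + math.erf(-m / (s * math.sqrt(2.0)))) for m, s in zip(mu, sigma)]
    return math.prod(cdfs)

def filter_realization(F, G):
    """Drop designs that are infeasible on one posterior realization.
    F: (n, m) sampled objectives; G: (n, q) sampled constraint values."""
    return F[np.all(G <= 0.0, axis=1)]

print(prob_feasibility([0.0], [1.0]))              # 0.5: mean exactly on the boundary
print(prob_feasibility([-1.0, -1.0], [1.0, 1.0]))  # about 0.708, product of two CDFs

F = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]])
G = np.array([[-0.2], [0.3], [-0.1]])
print(filter_realization(F, G))  # keeps the first and third designs
```

Filtering acts on each realization before the equilibrium is computed, so constraint uncertainty propagates into the samples of $\Psi$ without changing the SUR machinery.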
Fig. 11.5 Parallel coordinates plot corresponding to the objectives of Fig. 11.4. Colors are the same; dominated solutions are depicted with dotted lines, while the estimated KS solution is marked by the thick yellow line
References
1. M. Abdolshah, A. Shilton, S. Rana, S. Gupta, S. Venkatesh, Multi-objective Bayesian optimisation with preferences over objectives, in Neural Information Processing Systems (NIPS) (Curran Associates, Inc., 2019), pp. 12235–12245
2. R. Aboulaich, R. Ellaia, S. El Moumen, A. Habbal, N. Moussaid, A new algorithm for approaching Nash equilibrium and Kalai Smoridinsky solution. Working paper or preprint (2011)
3. A. Al-Dujaili, E. Hemberg, U.-M. O’Reilly, Approximating Nash equilibria for black-box games: a Bayesian optimization approach (2018)
4. A. Aprem, S. Roberts, A Bayesian optimization approach to compute Nash equilibrium of potential games using bandit feedback. Comput. J. 64(12), 1801–1813 (2021)
5. R. Astudillo, P. Frazier, Multi-attribute Bayesian optimization with interactive preference learning, in Artificial Intelligence and Statistics (JMLR.org, 2020), pp. 4496–4507
6. T. Başar, Relaxation techniques and asynchronous algorithms for on-line computation of noncooperative equilibria. J. Econ. Dyn. Control 11(4), 531–549 (1987)
7. S. Bechikh, L.B. Said, K. Ghedira, Estimating nadir point in multi-objective optimization using mobile reference points, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2010), pp. 1–9
8. M. Binois, V. Picheny, P. Taillandier, A. Habbal, The Kalai-Smorodinsky solution for many-objective Bayesian optimization. J. Mach. Learn. Res. 21(150), 1–42 (2020)
9. M. Binois, D. Rullière, O. Roustant, On the estimation of Pareto fronts from the point of view of copula theory. Inf. Sci. 324, 270–285 (2015)
10. I. Bozbay, F. Dietrich, H. Peters, Bargaining with endogenous disagreement: the extended Kalai-Smorodinsky solution. Games Econom. Behav. 74(1), 407–417 (2012)
11. N. Brown, S. Ganzfried, T. Sandholm, Hierarchical abstraction, distributed equilibrium computation, and post-processing, with application to a champion no-limit Texas hold’em agent, in AAAI Workshop: Computer Poker and Imperfect Information (ACM Press, 2015), pp. 7–15
12. C. Chevalier, J. Bect, D. Ginsbourger, E. Vazquez, V. Picheny, Y. Richet, Fast parallel kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. Technometrics 56(4), 455–465 (2014)
13. C. Chevalier, X. Emery, D. Ginsbourger, Fast update of conditional simulation ensembles. Math. Geosci. 47(7), 771–789 (2015)
14. J.P. Conley, S. Wilkie, The bargaining problem without convexity: Extending the egalitarian and Kalai-Smorodinsky solutions. Econ. Lett. 36(4), 365–369 (1991)
15. J.-A. Désidéri, Cooperation and competition in multidisciplinary optimization. Comput. Optim. Appl. 52(1), 29–68 (2012)
16. J.-A. Désidéri, Platform for prioritized multi-objective optimization by metamodel-assisted Nash games. Research Report RR-9290, Inria Sophia Antipolis, France (2019)
17. J.-A. Désidéri, R. Duvigneau, A. Habbal, Multiobjective design optimization using Nash games, in Computational Intelligence in Aerospace Sciences (American Institute of Aeronautics and Astronautics, Inc., 2014), pp. 583–641
18. V.V. Fedorov, Theory of Optimal Experiments (Elsevier, 1972)
19. J.E. Fieldsend, Enabling dominance resistance in visualisable distance-based many-objective problems, in Genetic and Evolutionary Computation Conference (GECCO) Companion (ACM Press, 2016), pp. 1429–1436
20. D. Gaudrie, R. Le Riche, V. Picheny, B. Enaux, V. Herbert, Budgeted multi-objective optimization with a focus on the central part of the Pareto front–extended version (2018)
21. D. Gaudrie, R. Le Riche, V. Picheny, B. Enaux, V. Herbert, Targeting solutions in Bayesian multi-objective optimization: sequential and batch versions. Ann. Math. Artif. Intell. 88(1), 187–212 (2020)
22. D. Geman, B. Jedynak, An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell. 18(1), 1–14 (1996)
23. R. Gibbons, Game Theory for Applied Economists (Princeton University Press, 1992)
24. D. González-Sánchez, O. Hernández-Lerma, A survey of static and dynamic potential games. Sci. China Math. 59(11), 2075–2102 (2016)
25. A. Habbal, M. Kallel, Neumann-Dirichlet Nash strategies for the solution of elliptic Cauchy problems. SIAM J. Control. Optim. 51(5), 4066–4083 (2013)
26. J. Hakanen, J.D. Knowles, On using decision maker preferences with ParEGO, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2017), pp. 282–297
27. C. He, Y. Tian, H. Wang, Y. Jin, A repository of real-world datasets for data-driven evolutionary multiobjective optimization. Complex & Intell. Syst. 6(1), 189–197 (2020)
28. J.L. Hougaard, M. Tvede, Nonconvex n-person bargaining: efficient maxmin solutions. Econ. Theor. 21(1), 81–95 (2003)
29. E. Kalai, M. Smorodinsky, Other solutions to Nash’s bargaining problem. Econometrica 43, 513–518 (1975)
30. C. Kanzow, D. Steck, Augmented Lagrangian methods for the solution of generalized Nash equilibrium problems. SIAM J. Optim. 26(4), 2034–2058 (2016)
31. J.D. Knowles, R.A. Watson, D. Corne, Reducing local optima in single-objective problems by multi-objectivization, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2001), pp. 269–283
32. M. Lanctot, R. Gibson, N. Burch, M. Zinkevich, M. Bowling, No-regret learning in extensive-form games with imperfect recall, in International Conference on Machine Learning (ICML) (2012), pp. 1035–1042
33. J.R. Lepird, M.P. Owen, M.J. Kochenderfer, Bayesian preference elicitation for multiobjective engineering design optimization. J. Aerosp. Inf. Syst. 12(10), 634–645 (2015)
34. B. Letham, B. Karrer, G. Ottoni, E. Bakshy, et al., Constrained Bayesian optimization with noisy experiments, in Bayesian Analysis (2018)
35. S. Li, T. Başar, Distributed algorithms for the computation of noncooperative equilibria. Automatica 23(4), 523–533 (1987)
36. K. Miettinen, Nonlinear Multiobjective Optimization (Kluwer Academic Publishers, 1999)
37. J. Močkus, On Bayesian methods for seeking the extremum, in Optimization Techniques IFIP Technical Conference (Springer, 1975), pp. 400–404
38. M. Mutny, A. Krause, Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features, in Neural Information Processing Systems (NIPS), ed. by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Curran Associates, Inc., 2018), pp. 9005–9016
39. B. Paria, K. Kandasamy, B. Póczos, A flexible framework for multi-objective Bayesian optimization using random scalarizations, in Uncertainty in Artificial Intelligence (JMLR.org, 2020), pp. 766–776
40. J. Periaux, F. Gonzalez, D.S.C. Lee, Evolutionary Optimization and Game Strategies for Advanced Multi-disciplinary Design: Applications to Aeronautics and UAV Design (Springer, 2015)
41. V. Picheny, M. Binois, GPGame: Solving Complex Game Problems using Gaussian Processes (2020). R package version 1.2.0
42. V. Picheny, M. Binois, A. Habbal, A Bayesian optimization approach to find Nash equilibria. J. Global Optim. 73(1), 171–192 (2019)
43. R. Purshouse, K. Deb, M. Mansor, S. Mostaghim, R. Wang, A review of hybrid evolutionary multiple criteria decision making methods, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2014), pp. 1147–1154
44. M. Schonlau, W.J. Welch, D.R. Jones, Global versus local search in constrained optimization of computer models, in Lecture Notes-Monograph Series, vol. 34, pp. 11–25 (1998)
45. J.D. Svenson, Computer experiments: multiobjective optimization and sensitivity analysis. Ph.D. thesis, The Ohio State University, USA (2011)
46. L. Thiele, K. Miettinen, P.J. Korhonen, J. Molina, A preference-based evolutionary algorithm for multi-objective optimization. Evol. Comput. 17(3), 411–436 (2009)
47. W.R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4), 285–294 (1933)
48. S. Uryas’ev, R.Y. Rubinstein, On relaxation algorithms in computation of noncooperative equilibria. IEEE Trans. Autom. Control 39(6), 1263–1267 (1994)
49. J. Villemonteix, E. Vazquez, E. Walter, An informational approach to the global optimization of expensive-to-evaluate functions. J. Global Optim. 44(4), 509–534 (2009)
50. H. Wang, M. Olhofer, Y. Jin, A mini-review on preference modeling and articulation in multiobjective optimization: current status and challenges. Complex & Intell. Syst. 3(4), 233–245 (2017)
51. A.P. Wierzbicki, The use of reference objectives in multiobjective optimization, in Multiple Criteria Decision Making Theory and Application (Springer, 1980), pp. 468–486
52. X. Zhang, Y. Tian, Y. Jin, A knee point-driven evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput. 19(6), 761–776 (2014)
53. Z. Zhang, C. He, J. Ye, J. Xu, L. Pan, Switching ripple suppressor design of the grid-connected inverters: a perspective of many-objective optimization with constraints handling. Swarm Evol. Comput. 44, 293–303 (2019)
Chapter 12
Heterogeneous Objectives: State-of-the-Art and Future Research
Richard Allmendinger and Joshua Knowles
Abstract
Multiobjective optimization problems with heterogeneous objectives are defined as those that possess significantly different types of objective function components (not just incommensurable in units or scale). For example, in a heterogeneous problem the objective function components may differ in formal computational complexity, practical evaluation effort (time, costs, or resources), determinism (stochastic vs deterministic), or some combination of all three. A particularly challenging variety of heterogeneity may arise from the combination of a time-consuming laboratory-based objective with other objectives that are evaluated using faster computer-based calculations. Perhaps more commonly, all objectives may be evaluated computationally, but some may require a lengthy simulation process while others are computed from a relatively simple closed-form calculation. In this chapter, we motivate the need for more work on the topic of heterogeneous objectives (with reference to real-world examples), expand on a basic taxonomy of heterogeneity types, and review the state of the art in tackling these problems. We give special attention to heterogeneity in evaluation time (latency), as this requires sophisticated approaches. We also present original experimental work on estimating the amount of heterogeneity in evaluation time expected in many-objective problems, given reasonable assumptions, and survey related research threads that could contribute to this area in future.
R. Allmendinger (corresponding author) · J. Knowles: Alliance Manchester Business School, The University of Manchester, Manchester, UK
J. Knowles: Schlumberger Cambridge Research, Madingley Road, Cambridge, UK
© Springer Nature Switzerland AG 2023. D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_12
12.1 Motivation and Overview
It is familiar to anyone who works in optimization that the objective functions of different optimization problems vary along important ‘dimensions’, giving rise to different large classes of problems. For example, we are aware that objectives
R. Allmendinger and J. Knowles
may take discrete, continuous or mixed input variables; they may give bounded or unbounded outputs, which may be discrete or continuous; they may be closed-form expressions or require more or less complicated simulations; if they are closed-form, they may be linear, nonlinear but convex, or nonconvex; they may be non-computable in practice, instead requiring experimental processes to be evaluated; and they may be precise, certain and repeatable, or they may be uncertain, subject to parametric or output noise. Given this natural variety in the objective functions seen in different (single-objective) problems, it should not be surprising that, in multiobjective optimization problems, the different objective function components forming the overall function may also be quite different from one another. Indeed, one aspect of the difference is very well known and well accounted for historically: the function components may give outputs in different units, and those units may be incommensurable. This incommensurability of objective function values is a key motivation of “true” multiobjective methods, i.e., those that do not form combinations of objective values except with explicit reference to a proper preference model or preference elicitation process. Although it is not surprising that different objective functions should differ not just in the scale, range or units of their outputs (incommensurability), but also in the more fundamental ways listed above, very little has been stated explicitly about this heterogeneity in the literature, and there is almost no work that seeks to offer ways to handle it inside multiobjective algorithms. The first such work we are aware of is our own [7].
We were motivated in that work to consider a particular type of heterogeneity that could cause considerable inefficiency in a standard evolutionary multi-objective optimization (EMO) approach, namely that the objective function components had different latencies (i.e., evaluation times). Our motivation came from a number of real problems we had been looking at under the banner of “closed-loop optimization”, in which at least one of the objective function components depends on a non-computational experiment for evaluation, such as a physical, chemical or biological experiment. Allmendinger’s Ph.D. thesis [2] studied many different consequences apparent in these closed-loop problems (including dynamic constraints and interruptions), following much real experimental work in this vein by Knowles and co-authors [27, 33, 34, 38]. Varying per-objective latencies can, of course, also arise in scenarios requiring intensive computational experiments, such as CFD and other complex simulations; for example, obtaining objective function values may require the execution of multiple time-consuming simulations, which may vary in running time. For a broader discussion of real-world problems, please refer to Chap. 2 in this book. Following our publication of the above-mentioned paper [7], which dealt with heterogeneity in latency between the different objectives of a problem, Knowles proposed a broader discussion topic on Heterogeneous Functions [17] at a Dagstuhl Seminar [22], in which the topic was fleshed out by the seminar participants over the course of three or four days. Since then, there have been two significant strands of work. First, we (the authors of this chapter) have presented a further study
12 Heterogeneous Objectives: State-of-the-Art and Future Research
on heterogeneity [5], in which we extended our work on handling differing latencies within a pair of objectives in a bi-objective problem. This was further extended in [15], and then by Jin and co-authors in [50], with both of these studies adapting surrogate-assisted evolutionary algorithms to cope with latencies between objectives. Second, Thomann and co-authors have published a pair of papers and a Ph.D. thesis on heterogeneity [44, 46, 47], particularly concerning non-evolutionary methods. Our aim in this chapter is to review the basic concepts and algorithms explored so far, and to look ahead to future developments. We begin in the next section with some fundamental concepts needed to handle heterogeneity in latency, and then provide a broader categorisation of other types of heterogeneity not yet explored in detail. The remainder of the chapter builds a deeper understanding of how heterogeneity has been handled so far, and of the prospects for further work.
12.2 Fundamental Concepts and Types of Heterogeneity
Handling heterogeneous functions relies first on the usual definitions used in multiobjective optimization (available elsewhere in this book). In addition, we need a well-defined notion of a time budget if we are to handle problems with different latencies across the objectives. Let us review the required definitions.
12.2.1 Fixed Evaluation Budget Definitions
As in some of our previous work [5, 7], we adopt the notion of a fixed evaluation budget in our optimization (cf. [23]), as this framework is central to the practical problem of handling heterogeneity in evaluation costs (specifically, latency differences between objective components).
Definition 12.1 (Total budget) The total budget for solving an optimization problem is the total number of time steps B available for solving it, under the assumption that only solution evaluations consume time.
Definition 12.2 (Limited-capacity parallel evaluation model) We assume that parallel evaluation of solutions is available, in two senses. First, a solution may (but need not) be evaluated on one objective in parallel with its evaluation on another objective. Second, up to λ solutions may be evaluated at the same time (i.e., as a batch or population) on any objective, provided their evaluation starts at the same time step and finishes at the same time step (i.e., batches cannot be interrupted, added to, etc., during evaluation). For the sake of simplicity, we assume λ is the same for all objectives.
Definition 12.3 (Per-objective latency) Assume that each objective i can be evaluated in $k_i \in \mathbb{Z}^+$ time steps (for a whole batch). Here, we consider a bi-objective case,
R. Allmendinger and J. Knowles
and for simplicity, we define $k_1 = 1$ and $k_2 = k_{\text{slow}} > 1$, so that the slower objective is $k_{\text{slow}}$ times slower than the faster one.
Definition 12.4 (Per-objective budgets) From Definitions 12.1–12.3, it follows that the evaluation budgets of the two objectives differ: the budget for $f_1$ is $\lambda B$, whereas the budget for $f_2$ is $\lambda B / k_{\text{slow}}$. In Algorithm 3 (Sect. 12.3.1), we refer to the fast and slow objectives as $f_{\text{fast}}$ and $f_{\text{slow}}$, respectively, and to their per-objective budgets as $\mathit{MaxFE}_{\text{fast}}$ and $\mathit{MaxFE}_{\text{slow}}$.
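A minimal Python sketch of Definitions 12.1–12.4 for the bi-objective case follows; it simply turns the total budget B, the batch capacity λ and the latency k_slow into the two per-objective budgets. The function name is hypothetical, and rounding λB/k_slow down (so that only slow batches finishing within B are counted) is an assumption made here for the case where B is not a multiple of k_slow.

```python
from typing import Tuple


def per_objective_budgets(total_steps: int, batch_size: int, k_slow: int) -> Tuple[int, int]:
    """Return (MaxFE_fast, MaxFE_slow) for the bi-objective case of Sect. 12.2.1.

    total_steps -- total budget B in time steps (Definition 12.1)
    batch_size  -- parallel capacity lambda per objective (Definition 12.2)
    k_slow      -- latency of the slow objective; the fast one has latency 1 (Definition 12.3)
    """
    if total_steps < 0 or batch_size < 1 or k_slow < 2:
        raise ValueError("expected B >= 0, lambda >= 1 and k_slow > 1")
    max_fe_fast = batch_size * total_steps  # a new batch can start on f_fast every time step
    # A new batch can start on f_slow only every k_slow steps; rounding down so that
    # only batches finishing within B are counted is an assumption of this sketch.
    max_fe_slow = batch_size * (total_steps // k_slow)
    return max_fe_fast, max_fe_slow


# B = 100 time steps, lambda = 10, slow objective 4 times slower than the fast one.
print(per_objective_budgets(100, 10, 4))  # (1000, 250)
```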
12.2.2 Types of Heterogeneity
We believe that heterogeneous objectives are the norm in multiobjective optimization rather than the exception. Nevertheless, how each type of heterogeneity causes specific difficulties for existing multiobjective techniques remains a largely unexplored topic. In many cases, e.g., in industry, heterogeneity in the objectives is perhaps just handled in an ad hoc way, with some adaptations to existing algorithms. Whether such ad hoc solutions are effective remains an open question, with a lot of scope for academic and more foundational work. Heterogeneity in the latency of different objectives is perhaps the area where it is most obvious that the usual ad hoc solution (“waiting” for all objectives of a solution to be evaluated) needs re-thinking, as our work has shown (so far for two objectives only; the remainder of this chapter sketches a generalization). More generally, the different types of heterogeneity that might need accounting for in many-objective algorithm design are as follows (based on, but extending, [17]):
(i) Scaling: Different ranges of objective function values are handled by most EMO approaches using either Pareto ranking or dynamic normalization techniques.
(ii) Landscape: Variation in landscape features, such as ruggedness, presence of plateaus, separability, or smoothness. Exploratory landscape analysis (ELA) [32] can help to characterize the landscapes associated with the individual objective functions, but there is a lack of research on characterizing the complexity of a multi-objective optimization (MO) problem as a whole (e.g., front shape, local fronts).
(iii) Parallelization: Batch vs sequential evaluation requirements may vary across objectives, which would complicate the control flow of most types of optimization algorithms.
(iv) White vs grey vs black box objectives: A mix of these across the objectives of a single problem would necessitate the coordination of different types of search behaviour; e.g., evolutionary techniques might be effective for black box objectives, whereas for white box objectives a more efficient search based on knowledge of the objective function may be possible.
(v) Subject to interruptions [4, 35] or ephemeral resource constraints (ERCs) [8]: If evaluating an objective value of a solution depends on an experiment that requires a resource (such as the availability of equipment), then the evaluation of certain solutions may be interrupted if the resource becomes unavailable. This is already an involved problem in the single-objective case (ibid.).
(vi) Variable type (integer, continuous or mixed): These different types usually require different types of algorithm, which would need to be coordinated in a multiobjective one.
(vii) Determinism: In stochastic settings, objective function values may depend on hidden, uncontrolled variables, necessitating some way of accounting for this variation, such as robust or distributionally robust approaches [10].
(viii) Noisy vs noiseless: The usual way of handling output (e.g., measurement) noise on objective values is to re-evaluate them to obtain a mean; re-evaluating deterministic objectives, however, would be wasteful. This seems easily handled, but there may be hidden difficulties.
(ix) Theoretical and practical difficulty: This relates to the difficulty of finding the optimum as a whole, not the cost or complexity of evaluation. Combining a simpler objective with a harder one might cause EMO methods to be biased.
(x) Safety [6, 26]: This relates to safe optimization, and to the tightness of an objective’s safety threshold. Evaluating a (non-safe) solution whose objective value is below the safety threshold causes an irrecoverable loss (e.g., breakage of a machine or equipment, or a threat to life).
(xi) Correlations between objectives: Conflicting objectives are considered the norm in multiobjective optimization. However, anti-correlated objectives in particular can lead to very large Pareto fronts, and hence to difficulties for multiobjective algorithms (see [28, 37]). This may be exacerbated in cases of many conflicting objectives, and algorithm settings would need to be chosen carefully to handle these situations [39]. For a more in-depth discussion of correlation between objectives, please refer to Chap. 9 in this book.
(xii) Evaluation times (latency): This is the topic explored most fully to date, particularly in [5, 7, 16, 44, 46, 47, 50].
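For item (viii), a minimal Python sketch of the “re-evaluate only what is noisy” idea is given below. It assumes that each objective is flagged in advance as noisy or deterministic and that a fixed number of re-samples is used for the noisy ones; the function name and this fixed-resampling rule are illustrative assumptions, not a method proposed in the cited work.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Objective = Callable[[Sequence[float]], float]


def evaluate_mixed_noise(x: Sequence[float], objectives: Sequence[Objective],
                         noisy: Sequence[bool], n_resamples: int) -> Tuple[List[float], List[int]]:
    """Estimate each objective at x, re-evaluating only the objectives flagged as noisy.

    Returns the estimated values and the number of evaluations spent per objective.
    """
    values: List[float] = []
    spent: List[int] = []
    for f, is_noisy in zip(objectives, noisy):
        reps = n_resamples if is_noisy else 1  # a deterministic objective needs one evaluation
        values.append(mean(f(x) for _ in range(reps)))  # sample mean over the repeats
        spent.append(reps)
    return values, spent


rng = random.Random(0)


def f_det(x: Sequence[float]) -> float:
    return sum(v * v for v in x)  # deterministic objective


def f_noisy(x: Sequence[float]) -> float:
    return sum(x) + rng.gauss(0.0, 0.1)  # objective with additive Gaussian output noise


values, spent = evaluate_mixed_noise([1.0, 2.0], [f_det, f_noisy], [False, True], 20)
print(values, spent)  # spent is [1, 20]
```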
Of course, several of these heterogeneities may exist together in a single problem, which would usually make it even more challenging to handle. That said, under certain circumstances, heterogeneity can improve performance. For example, in [5, 15] we observed that a small latency difference between the fast and slow objective can lead to improved results and help reach parts of the Pareto front that might not have been reached otherwise.
12.3 Algorithms and Benchmarking
In this section, we describe existing algorithm schemes, together with benchmark problems and benchmarking considerations, for tackling MO problems with heterogeneous objectives, in particular objectives with differing latencies. For this, we refer back to the definitions provided in Sect. 12.2.1.
12.3.1 Algorithms
This section outlines the three existing algorithm schemes for coping with differing latencies: Waiting, Fast-First, and Interleaving. Figure 12.1 provides a schematic of these three schemes.
Waiting. The most straightforward strategy for dealing with varying latencies of objectives is to proceed at the rate of the slow objective, thus fully exploiting the per-objective budget of the slow objective but only partially exploiting that of the fast objective. This approach avoids the development of customised strategies and is applicable to many-objective problems. In [5], this was referred to as a Waiting strategy. It prevents the introduction of search bias, for example, towards the fast objectives, and it has been shown to perform better as the evaluation budget increases.
Fast-First. Strategies falling into this category neglect the time-consuming objective function(s) for as long as possible in order to make potentially better use of the time budget by performing a search directed by the cheaper objective function(s). The approach was first proposed in [5], where it was called the Fast-First strategy. It evaluates solutions at the rate of the fastest objective (ignoring the other objectives) using a standard (single-objective) EA for part of the optimization, and then switches, as late in the optimization run as seems reasonable, to a final evaluation of some selected solutions on the other (slower) objectives, to ensure that at least some solutions are evaluated on all objectives. That is, this approach fully uses the per-objective budget of the fast objective, but only a fraction of the per-objective budget of the slow objective(s). Fast-First has been shown to perform well for problems with highly positively correlated objectives (since progress on the fast objective then also tends to mean progress on the slow one), being able to reach parts of the Pareto front (extreme solutions on the fast objective) that may not have been reached otherwise.
Moreover, Fast-First is almost unaffected by the length of the latency, and it performed better within a generational multiobjective EA (MOEA) than within a steady-state MOEA [5]. Like Waiting, Fast-First is readily applicable to many-objective problems.
Interleaving. These strategies are less straightforward, since they employ a mechanism to coordinate the evaluation of the objectives during search so as to use the per-objective budgets of all objectives as efficiently and thoroughly as possible. So far, Interleaving strategies have been applied only to bi-objective problems (one slow and one fast objective).
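To illustrate how the three schemes spend the per-objective budgets, the following Python sketch tallies evaluations under a simplified accounting of the model in Sect. 12.2.1 (full batches only, and only batches completing within B counted). The function name, the `switch_step` parameter marking Fast-First's switch to Waiting, and the idealization that Interleaving keeps both objectives busy at every step are assumptions of this sketch.

```python
from typing import Optional, Tuple


def budget_usage(strategy: str, total_steps: int, batch_size: int, k_slow: int,
                 switch_step: Optional[int] = None) -> Tuple[int, int]:
    """Return (evaluations of f_fast, evaluations of f_slow) spent by one scheme.

    Simplified accounting: every batch is full (size lambda) and only batches that
    finish within the time budget are counted.
    """
    slow_batches = total_steps // k_slow
    if strategy == "waiting":
        # One generation every k_slow steps; f_fast sits idle for k_slow - 1 of them.
        return batch_size * slow_batches, batch_size * slow_batches
    if strategy == "fast-first":
        if switch_step is None or not 0 <= switch_step <= total_steps:
            raise ValueError("fast-first needs 0 <= switch_step <= total_steps")
        # Single-objective phase on f_fast, then Waiting for the remaining time steps.
        waiting_gens = (total_steps - switch_step) // k_slow
        return batch_size * (switch_step + waiting_gens), batch_size * waiting_gens
    if strategy == "interleaving":
        # Idealized: f_fast receives a new batch every step while f_slow is never idle.
        return batch_size * total_steps, batch_size * slow_batches
    raise ValueError(f"unknown strategy: {strategy!r}")


for name, kwargs in [("waiting", {}), ("fast-first", {"switch_step": 16}), ("interleaving", {})]:
    print(name, budget_usage(name, total_steps=20, batch_size=10, k_slow=4, **kwargs))
# waiting (50, 50); fast-first (170, 10); interleaving (200, 50)
```

With B = 20, λ = 10 and k_slow = 4, Waiting leaves most of the fast budget unused, Fast-First spends almost all of it but evaluates only one batch on the slow objective, and idealized Interleaving exhausts both budgets.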
Fig. 12.1 Schematic of the main types of strategy for handling heterogeneous latencies. A bi-objective problem is assumed with a slower and a faster objective function (the faster objective is twice as fast as the slower one in the figure, without loss of generality). Further, it is assumed that we are interested in performing well (in a multiobjective sense) given a fixed budget of total evaluation time and a limited-capacity parallel evaluation model (see Sect. 12.2.1). Three general strategies are shown: Waiting, Fast-First, and Interleaving. The time axis runs from top to bottom. After initialization, individual solutions need to be evaluated in a parallel batch (population) before entering the usual evolutionary algorithm phases of ranking, selection and variation (R,S,V). When these phases are carried out using evaluations from both objectives, we denote them as MO R,S,V; when only one objective has been evaluated, we denote them as SO R,S,V. In Waiting, only MO R,S,V is used (and a standard MOEA can be employed). In the Fast-First strategy, SO R,S,V is used for some generations, and Waiting is then used for the remaining generations. In Interleaving, a much more complicated approach, SO R,S,V and MO R,S,V are both used, and partially and fully evaluated solutions are interleaved so that there is less ‘dead time’ than in Waiting and more guidance than in Fast-First. Interleaving is generally the best. Fuller algorithmic details of these strategies are given in pseudocode in the original papers [5, 7].
In our initial work on latencies [7], we proposed an Interleaving strategy embodied by a ranking-based EMOA that maintained a population of unbounded size, and which assigned pseudovalues to a solution’s slow objective until that objective had been evaluated. Different techniques for assigning pseudovalues were proposed, including one based on fitness inheritance. The approach works similarly to a standard ranking-based EMOA, in which offspring are generated by a process of (multiobjective) selection, crossover and mutation. Every time a batch of solutions has been evaluated on the slow objective, their pseudovalues are replaced with the true objective values, a new batch of solutions (selected from the current unbounded population) is submitted for evaluation on the slow objective, and new pseudovalues are assigned to these solutions’ slow objectives. This new batch consists either of the most recently generated solutions that have not been evaluated on the slow
objective yet, or of solutions selected according to their anticipated quality, computed from their non-dominated sorting rank. This very first approach to coping with heterogeneous evaluation times performed well for long latencies when compared with a Waiting scheme. In [5], we proposed two further variants of interleaving strategies, Brood Interleaving (BI) and Speculative Interleaving (SI). As in the approach explained above, both strategies evaluate solutions on both objectives in parallel. However, BI and SI employ a constant population size, and use the time while the slow objective is being evaluated (the interleaving period) to generate and evaluate solutions on the fast objective only. BI generates these solutions using uniform selection and variation applied to the population currently being evaluated on the slow objective, while SI initializes an inner (single-objective) EA with this population and applies it to the optimization of the fast objective for the remainder of the interleaving period. The solutions evaluated on the fast objective are then used as a quality indicator to decide which solutions to evaluate on the slow objective in the next generation.1 The difference between SI and BI thus amounts to deliberately optimizing the fast objective vs maintaining selection pressure where possible. As shown in [5], SI performs well for low evaluation budgets and/or when latencies are long, objectives positively correlated, and fitness landscapes rugged. Like Waiting, BI performs well for larger evaluation budgets, with BI performing better for longer latencies and larger search spaces. Furthermore, BI performs significantly better when used in combination with a steady-state MOEA than with a generational MOEA.
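One Brood Interleaving period can be sketched in Python as follows. Uniform selection and the rule of footnote 1 (keep offspring that outperform at least one parent on the fast objective) follow the text; the variation operators (uniform crossover plus Gaussian mutation), the assumption that the parents themselves occupy the first fast-objective time step, and all names are hypothetical choices for illustration, not the operators used in [5].

```python
import random
from typing import Callable, List, Sequence, Tuple

Solution = List[float]


def brood_interleaving_period(parents: Sequence[Solution], parent_fast: Sequence[float],
                              f_fast: Callable[[Sequence[float]], float], k_slow: int,
                              batch_size: int, rng: random.Random,
                              sigma: float = 0.1) -> List[Tuple[float, Solution]]:
    """Fast-only evaluations made while `parents` are being evaluated on f_slow (sketch).

    parent_fast holds the parents' f_fast values (minimization). Returns the offspring
    that beat at least one of their two parents on f_fast, best first; these are the
    candidates for slow evaluation in the next generation.
    """
    candidates: List[Tuple[float, Solution]] = []
    # Assumed: the parents use the first step on f_fast; each of the other k_slow - 1 steps takes one batch.
    for _ in range(batch_size * (k_slow - 1)):
        i, j = rng.randrange(len(parents)), rng.randrange(len(parents))  # uniform selection
        # Variation: uniform crossover followed by Gaussian mutation (assumed operators).
        child = [a if rng.random() < 0.5 else b for a, b in zip(parents[i], parents[j])]
        child = [v + rng.gauss(0.0, sigma) for v in child]
        value = f_fast(child)
        if value < parent_fast[i] or value < parent_fast[j]:  # outperforms at least one parent
            candidates.append((value, child))
    candidates.sort(key=lambda c: c[0])
    return candidates


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


rng = random.Random(1)
parents = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
parent_fast = [sphere(p) for p in parents]
brood = brood_interleaving_period(parents, parent_fast, sphere, k_slow=3, batch_size=5, rng=rng)
print(len(brood), "of", 5 * (3 - 1), "offspring qualify for slow evaluation")
```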
Two surrogate-based interleaving strategies for coping with latencies in the objectives (one fast and one slow) have been proposed recently in [15] (HK-RVEA) and [50] (T-SAEA); see Algorithm 3 for a sketch of the two algorithms. Although these and the non-surrogate-based strategies outlined above adopt the same limited-capacity parallel evaluation model, the surrogate-based methods use a different value of λ (the number of solutions evaluated in parallel) during search: a large λ is used to create the initial training data set, while a significantly smaller λ is used thereafter. Identifying and evaluating fewer samples in each iteration, but performing more iterations, is better suited to a surrogate-based approach; in fact, traditional surrogate-based methods (see, e.g., [24] and Chap. 10 in this book) use one sample per iteration. Consequently, in [16, 50], the stopping criterion was not the maximum number of time steps (or iterations), as used by the non-surrogate-based methods, but only the maximum number of per-objective function evaluations (ignoring the number of time steps used). In practice, the stopping criterion (time steps vs function evaluations only) is dictated by the problem (context) at hand, which may make certain surrogate-based methods unsuitable. For example, in some of our closed-loop optimization work [27], the experimental platform dictates that a batch
1 Solutions evaluated on the fast objective are considered for evaluation on the slow objective in the next generation if they outperform at least one of their parents on the fast objective.
Algorithm 3: Interleaving surrogate-based strategies (HK-RVEA, T-SAEA)
Input: MaxFE_slow and MaxFE_fast: per-objective budgets of the slow (f_slow) and fast (f_fast) objective; k_slow: latency; λ: initial population size; u: number of new samples per iteration; τ: transfer learning trigger
Output: non-dominated solutions of the archive A
  Create an initial population P; set the archives A := A_fast := ∅, the iteration counter i := 0, and the evaluation counters FE_slow := FE_fast := 0
  while P is being evaluated on f_slow do
    Evaluate P on f_fast, and add the solutions to A_fast
    Run a single-objective EA to optimize f_fast using λ × (k_slow − 1) function evaluations, and add the solutions to A_fast
  Update archive and counters: A := P, FE_slow := λ, FE_fast := λ × k_slow, i := i + 1
  while FE_slow < MaxFE_slow and FE_fast < MaxFE_fast do
    if HK-RVEA then
      Build surrogates for the slow and fast objective functions based on A and A_fast
      Run a multiobjective EA (RVEA [14]) to find samples for updating the surrogates
      Form a new population P by selecting u samples using the acquisition function from the K-RVEA algorithm [16]
    if T-SAEA then
      Build surrogates for the slow and fast objective functions based on A and A_fast, but every τ iterations (if i mod τ = 0) build the surrogate of the slow objective function using a transfer learning approach
      Run a multiobjective EA (RVEA [14]) to find samples for updating the surrogates
      Form a new population P by selecting u samples using an adaptive acquisition function [49] followed by the angle-penalized distance approach (taken from RVEA [14])
    while P is being evaluated on f_slow do
      Evaluate P on f_fast, and add the solutions to A_fast
      if HK-RVEA then create u × (k_slow − 1) solutions via uniform selection and variation applied to P
      if T-SAEA then create u × (k_slow − 1) solutions via Latin hypercube sampling around P
      Evaluate the newly created solutions on f_fast, and add them to A_fast
    Update archive and counters: A := A ∪ P, FE_slow := FE_slow + u, FE_fast := FE_fast + u × k_slow, i := i + 1
of solutions of a certain size should be evaluated in parallel (a 96-well plate was used in some experiments; in others, a microarray for assaying 9,000 DNA strands was available). The strategy proposed in [16], called HK-RVEA, resembles a hybrid of BI and SI combined with a surrogate-assisted approach for selecting the solutions to be evaluated on the slow and fast objective. Following initialization of the population and its submission for evaluation on both objectives, HK-RVEA uses a single-objective EA (without any surrogate) to optimize the fast objective (as in SI) while the slow objective is being evaluated. In the main loop, the EA is replaced by repeatedly applying crossover and mutation to the population (as in BI), since the number of solutions (samples) evaluated on the slow objective per iteration is much lower than the initial population size. HK-RVEA maintains separate archives of evaluated solutions for the slow and fast objective, and uses these archives (i.e., different samples) to build objective-specific surrogates. A multiobjective EA (RVEA [14]) is used to find (three) samples for updating the surrogates, using the infill criteria from the K-RVEA algorithm [16]. HK-RVEA has been shown to perform well for problems with short latencies, occasionally even outperforming a multiobjective EA (K-RVEA) optimizing the same problem without latency. The approach proposed in [50], called T-SAEA, differs from HK-RVEA primarily in that it combines a surrogate-assisted evolutionary algorithm with a transfer learning approach, which is used to update the surrogate for the slow objective (for most of the time). The motivation is that if there is a strong similarity or correlation between the slow and fast objective, then knowledge transfer is beneficial; otherwise, negative transfer may occur. The proposed transfer learning approach is based on a preceding filter-based feature selection method [13], adopted to identify the most relevant decision variables to share between the surrogates of the slow and fast objective. Based on the identified subset of decision variables, an adaptive aggregation method is then used to share the parameters of the two surrogates. In essence, the adaptive component shifts the emphasis from sharing the parameters of the fast-objective surrogate towards the parameters of the slow-objective surrogate as the optimization progresses. T-SAEA has been shown to perform significantly better [50] than non-surrogate methods designed for handling latencies. It also appears to compare well with HK-RVEA, allowing us to tentatively conclude that transfer learning is a promising approach to coping with latencies, provided the non-trivial issue of negative transfer (a decrease in learning performance in the target domain) can be addressed.
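The evaluation bookkeeping of Algorithm 3 can be traced with the Python sketch below, which leaves out surrogate building, RVEA and sample selection entirely. It reproduces only the counter updates, the per-objective stopping rule, and T-SAEA's transfer-learning trigger every τ iterations; each main-loop iteration is charged u slow evaluations and u × k_slow fast evaluations (the u selected samples plus u × (k_slow − 1) fast-only solutions). The function name and the example budgets are hypothetical.

```python
from typing import List, Optional, Tuple


def algorithm3_schedule(max_fe_slow: int, max_fe_fast: int, k_slow: int, lam: int, u: int,
                        tau: Optional[int] = None) -> List[Tuple[int, int, int, bool]]:
    """Trace the counters of Algorithm 3's main loop; surrogates and RVEA are omitted.

    Returns (i, FE_slow, FE_fast, transfer) per iteration, where `transfer` marks the
    T-SAEA iterations (i mod tau == 0) in which the slow surrogate uses transfer learning.
    """
    # State after the initialization phase: lam slow evaluations, lam * k_slow fast ones.
    fe_slow, fe_fast, i = lam, lam * k_slow, 1
    trace: List[Tuple[int, int, int, bool]] = []
    while fe_slow < max_fe_slow and fe_fast < max_fe_fast:
        transfer = tau is not None and i % tau == 0
        # ... build surrogates, run RVEA and select u samples (not modelled here) ...
        fe_slow += u           # u new samples evaluated on f_slow
        fe_fast += u * k_slow  # the same u samples plus u * (k_slow - 1) fast-only solutions
        trace.append((i, fe_slow, fe_fast, transfer))
        i += 1
    return trace


trace = algorithm3_schedule(max_fe_slow=100, max_fe_fast=400, k_slow=4, lam=50, u=5, tau=3)
print(len(trace), trace[-1])                            # 10 (10, 100, 400, False)
print([it for it, *_, transfer in trace if transfer])  # [3, 6, 9]
```

Because both counters grow in lockstep, whichever per-objective budget is reached first ends the run, independently of how many time steps have elapsed, which is the point made above about the stopping criterion in [16, 50].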
Non-evolutionary approaches. The approaches discussed above employ evolutionary search at some stage. Thomann and co-authors [44, 47] propose a non-evolutionary approach based on the trust region method, which optimizes one point at a time. The method is called MHT (short for multiobjective heterogeneous trust region algorithm), and it differs from other trust region methods in the way the search direction is computed and in the replacement of the objectives by quick-to-evaluate surrogates. In fact, MHT replaces all objectives (slow and fast ones) with a local quadratic model that interpolates at the current iteration point, and it is agnostic about the per-objective latencies. The next iteration point is determined by first solving the classical trust region subproblem to obtain the ideal point, followed by solving an auxiliary problem (known as the Tammer-Weidner functional [19]) to determine a trial point. The trial point is accepted as the next iteration point if a multiobjective condition describing the improvement of the function values is fulfilled. Otherwise, the current point is kept and the size of the trust region (the trust region radius) is reduced. MHT stops when the iteration point no longer improves. To keep the number of evaluations of the slow objective to a minimum, MHT evaluates the slow objective only when the surrogate becomes inaccurate. MHT is scalable to any number of objectives and any combination of fast vs slow objectives (while the approaches above assumed a bi-objective problem with one fast
and one slow objective, though Waiting and Fast-First are easily scalable). However, MHT assumes that, while the expensive functions are black box (and slow to compute), the fast objectives are given as analytical functions whose values and derivatives can easily be computed (above, we assumed that the fast objective can be black box and that there is a budget on how often it can be evaluated). Moreover, it is assumed that the slow (black-box) objectives are twice continuously differentiable, which is a strong assumption, as noted by the authors [44]. In [44, 46], Thomann and Eichfelder propose three heuristics that augment the standard version of MHT outlined above. The purpose of these heuristics, which are motivated by ideas for bi-objective optimization problems, is to further exploit the heterogeneity of the objective functions in order to identify additional Pareto optimal solutions spread over the Pareto front. For the heuristics to be applicable, it is assumed that the fast objective can be optimized with reasonable numerical effort and that it is bounded from below (assuming a minimization problem); in these heuristics, the fast objective is not replaced by a surrogate. Below, we explain these heuristics briefly. The idea of the first heuristic, referred to as Spreading, is to minimize the fast objective over local areas that move in the direction of the optima of the fast objective. A local area can be seen as a trust region (defined by a user-provided radius, or spreading distance) around an optimal input point. The initial (weakly) optimal input point can be obtained, e.g., by applying the standard version of MHT. The point in this local area that minimizes the fast objective alone can then be used as the starting point for the next run of MHT.
Repeating this process allows one to successively identify new optimal (weakly efficient) input points until the global minimum value of the fast objective has been reached. Finally, dominated solutions are deleted, leaving only optimal solutions. Note that the closer the initial optimal point is to the global minimum of the fast objective, the fewer successive optimal points can be computed. Also, the larger the spreading distance, the larger the distance between the computed points, and thus the fewer optimal points can be obtained. The second heuristic, Image Space Split, splits the objective space into disjoint areas (which can be thought of as slices in a two-dimensional objective space), in each of which a modified version of MHT is then applied. A modified version of MHT is needed for this heuristic because the presence of a lower bound on the values of the fast objective requires an additional constraint for computing the ideal point and a modified auxiliary problem for determining the descent direction; all other steps in MHT remain unchanged. When the objective space is split into several disjoint search areas before applying any version of MHT, the choice of starting points in these areas is important. A heuristic approach to determine starting points is suggested, as is a heuristic stopping criterion to save function evaluations. The challenge for this heuristic is to decide on a suitable number of disjoint search regions. In general, the greater this number, the more efficient points are computed and the more function evaluations are required. However, this need not always be the case, because of the non-linear relationship between the location of the individual search regions and the position of the starting points, resulting in
scenarios in which not all search regions contain (weakly) optimal points and some starting points are already close to efficient points. The third heuristic is a Combination of Image Space Split and Spreading. This heuristic first executes Image Space Split, and then applies Spreading in the region between each pair of adjacent optimal solutions identified through Image Space Split, to compute further optimal points. The challenges associated with the two individual heuristics persist in this combined heuristic. An initial study [45], comparing MHT against a weighted-sum-based approach and direct multisearch for multiobjective optimization, concluded that the proposed approach can yield good results.
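The accept/shrink logic of an MHT-style iteration can be illustrated with the generic multiobjective trust region test sketched below in Python. The exact improvement condition used by MHT is not reproduced here; as an assumption, this sketch accepts the trial point when, for every objective, the actual reduction is at least a fraction η of the reduction predicted by the local model, and otherwise halves the radius. Parameter names and values are hypothetical.

```python
from typing import List, Sequence, Tuple


def trust_region_update(f_current: Sequence[float], f_trial: Sequence[float],
                        m_current: Sequence[float], m_trial: Sequence[float],
                        radius: float, eta: float = 0.1,
                        shrink: float = 0.5) -> Tuple[bool, float]:
    """Accept/reject a trial point of a multiobjective trust region step (generic sketch).

    f_*: true objective values at the current and trial point.
    m_*: values of the local (e.g. quadratic) models at the same points.
    The trial point is accepted if every objective achieves at least a fraction eta of
    the decrease predicted by its model; otherwise the radius is shrunk.
    Returns (accepted, new_radius).
    """
    ratios: List[float] = []
    for fc, ft, mc, mt in zip(f_current, f_trial, m_current, m_trial):
        predicted = mc - mt
        if predicted <= 0.0:  # model predicts no decrease: treat as a failed step
            return False, shrink * radius
        ratios.append((fc - ft) / predicted)  # actual vs predicted reduction
    if min(ratios) >= eta:
        return True, radius  # accept; a full method might also enlarge the radius here
    return False, shrink * radius  # keep the current point, reduce the trust region radius


print(trust_region_update([1.0, 2.0], [0.8, 1.7], [1.0, 2.0], [0.7, 1.6], radius=1.0))  # (True, 1.0)
print(trust_region_update([1.0, 2.0], [1.1, 1.7], [1.0, 2.0], [0.7, 1.6], radius=1.0))  # (False, 0.5)
```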
12.3.2 Empirical Study: Towards Many-Objective Heterogeneous Latencies
In this section, we present and analyze original empirical results concerning the relationship between the number of objectives and the level of heterogeneity one might expect in a problem, in the context of varying latencies. For this experiment, assume a problem with a certain number of objectives, each associated with a per-objective latency drawn from a given distribution. We want to understand the likely range of latencies among the objectives, since a better understanding of the level of heterogeneity can help us in the design and selection of suitable algorithms.
The experiment consists of creating problems that vary in the number of objectives (1–25 objectives), with per-objective latencies drawn from one of three Beta distributions, each defined on the interval [0, 1]: beta(2, 8) (skewed to the right), beta(8, 2) (skewed to the left), and beta(5, 5) (symmetric). Each combination of number of objectives and distribution (25 × 3 = 75 combinations in total) was realized 100 times, and the mean and standard error of the minimum and maximum differences in per-objective latencies are plotted in Fig. 12.2. A Beta distribution was used because it allows skewness in the per-objective latencies to be simulated conveniently, and because of its widespread use in the literature and in practice to quantify the duration of tasks (see the seminal paper [31]). Two main observations can be made from the figure:
• Regardless of the probability distribution (skewness) used, the mean of the minimum and maximum differences in per-objective latencies starts from roughly the same level for a bi-objective problem, with the mean differences then decreasing (minimum) and increasing (maximum) logarithmically with the number of objectives.
This pattern arises because, when sampling from a wide distribution, it becomes gradually more likely that the per-objective latency of a new objective is either more similar to, or more distinct from, the per-objective latencies of the existing objectives. If all per-objective latencies were identical, then the minimum and maximum differences in per-objective latencies would be identical, with a value of 0, for any number of objectives.
Fig. 12.2 Mean and standard error of the maximum (triangles) and minimum (circles) differences in per-objective latencies (y-axis) as a function of the number of objectives in a problem (x-axis). Per-objective latencies are drawn from one of three Beta distributions.
• There is an asymmetry between the minimum and maximum differences in per-objective latencies as a function of the number of objectives. The mean minimum difference is very similar for the three Beta distributions, flattening quickly beyond around 15 objectives. However, there is a statistical difference between the Beta distributions when considering the mean maximum difference. In particular, the mean maximum difference obtained with the symmetric distribution (beta(5, 5)) increases more rapidly than that obtained with the two skewed Beta distributions. This pattern is due to the higher probability of sampling extreme per-objective latencies with the symmetric Beta distribution.
The conclusion of this experiment is that, given knowledge of how many objectives a problem has, and even limited knowledge about the distribution of per-objective latencies, one can estimate with reasonable accuracy the level of heterogeneity, in terms of the greatest latency difference one will observe in practice. This should facilitate the detailed design of algorithms for handling the heterogeneity.
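The experiment can be re-created in outline with the Python sketch below, which relies only on the standard library’s random.betavariate. It assumes that “difference” means the absolute difference between the latencies of two objectives, it starts at two objectives (a single objective has no pair), and the seed and function names are hypothetical; its numbers are not expected to reproduce Fig. 12.2 exactly.

```python
import random
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def latency_differences(latencies: Sequence[float]) -> Tuple[float, float]:
    """Smallest and largest absolute difference over all pairs of per-objective latencies."""
    if len(latencies) < 2:
        raise ValueError("need at least two objectives to form a pair")
    s = sorted(latencies)
    min_diff = min(b - a for a, b in zip(s, s[1:]))  # the closest pair is adjacent once sorted
    max_diff = s[-1] - s[0]                           # the farthest pair is the two extremes
    return min_diff, max_diff


def run_cell(a: float, b: float, n_objectives: int, n_runs: int = 100,
             seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Mean and standard error of the min/max differences for one Beta(a, b) and problem size."""
    rng = random.Random(seed)
    mins: List[float] = []
    maxs: List[float] = []
    for _ in range(n_runs):
        lat = [rng.betavariate(a, b) for _ in range(n_objectives)]
        lo, hi = latency_differences(lat)
        mins.append(lo)
        maxs.append(hi)

    def std_err(xs: List[float]) -> float:
        return stdev(xs) / len(xs) ** 0.5

    return {"min": (mean(mins), std_err(mins)), "max": (mean(maxs), std_err(maxs))}


for a, b in [(2, 8), (8, 2), (5, 5)]:
    for m in (2, 10, 25):
        print(f"beta({a}, {b}), {m} objectives:", run_cell(a, b, m))
```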
12.3.3 Benchmarking
Validating algorithms designed for coping with heterogeneity in the objective evaluation times requires careful consideration of how evaluations are counted/simulated, what test problems to use, and how to validate performance.
12.3.3.1 How to Count/Simulate Evaluations?
In a standard multi/many-objective optimization problem, where homogeneous evaluation times of the objectives are assumed, one evaluation typically corresponds to evaluating one solution on all of the problem’s objectives, and the stopping criterion is a given maximum number of evaluations. When simulating a real problem with latencies, the maximum number of function evaluations available can differ between objectives, meaning that not all solutions can be evaluated on all objectives. Moreover, depending on the problem at hand, the stopping criterion may indeed be only the maximum number of function evaluations on both objectives, regardless of the number of time steps used up (as in [16, 50]), but it can also be the maximum number of time steps available (as in [5, 7]). The latter applies when the optimization process has to be terminated within a certain time frame. When simulating a real problem with latencies, it is also important to reveal objective values to the optimizer only when the simulated evaluation of an objective is complete. If solutions can be evaluated in a batch, then multiple objective values are revealed in one go. In [44, 46, 47], the difference in evaluation times of the slow and fast objective was not simulated, solutions were not evaluated in parallel, and search was terminated if there was no improvement in a solution’s objective function values. Also, since the slow objective was evaluated only if the surrogate model was not accurate enough, it is not known a priori how often the slow and fast objective will be evaluated. This setup needs to be taken into account when tackling a practical problem.
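A minimal Python sketch of such a simulation is shown below: cheap test functions are wrapped so that a batch submitted on objective i is only revealed k_i time steps later, under the limited-capacity parallel model of Sect. 12.2.1. The class and function names are hypothetical, and charging evaluations at submission time is an assumption; `budget_exhausted` shows both stopping criteria discussed above (time steps vs per-objective evaluations).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Objective = Callable[[Sequence[float]], float]


@dataclass
class _Batch:
    objective: int
    finish_time: int
    solutions: List[Sequence[float]]


class LatencySimulator:
    """Hypothetical wrapper that delays cheap test functions to mimic per-objective latencies."""

    def __init__(self, objectives: Sequence[Objective], latencies: Sequence[int], batch_size: int):
        self.objectives = list(objectives)
        self.latencies = list(latencies)
        self.batch_size = batch_size
        self.time = 0                         # time steps used so far
        self.fe = [0] * len(self.objectives)  # evaluations charged per objective
        self._busy_until = [0] * len(self.objectives)
        self._pending: List[_Batch] = []

    def submit(self, i: int, solutions: Sequence[Sequence[float]]) -> None:
        """Start evaluating a batch on objective i at the current time step."""
        if len(solutions) > self.batch_size:
            raise ValueError("batch exceeds the parallel capacity lambda")
        if self._busy_until[i] > self.time:
            raise RuntimeError(f"objective {i} is still evaluating a batch")
        self._busy_until[i] = self.time + self.latencies[i]
        self._pending.append(_Batch(i, self._busy_until[i], list(solutions)))
        self.fe[i] += len(solutions)  # charged at submission (an assumption)

    def step(self) -> List[Tuple[int, List[Tuple[Sequence[float], float]]]]:
        """Advance the clock one step; return only the batches that finished now."""
        self.time += 1
        finished = [b for b in self._pending if b.finish_time <= self.time]
        self._pending = [b for b in self._pending if b.finish_time > self.time]
        # Objective values are computed and revealed only on completion.
        return [(b.objective, [(x, self.objectives[b.objective](x)) for x in b.solutions])
                for b in finished]


def budget_exhausted(sim: LatencySimulator, max_steps: Optional[int] = None,
                     max_fe: Optional[Sequence[int]] = None) -> bool:
    """Time-step budget (as in [5, 7]) and/or per-objective evaluation budgets (as in [16, 50])."""
    if max_steps is not None and sim.time >= max_steps:
        return True
    if max_fe is not None and any(used >= cap for used, cap in zip(sim.fe, max_fe)):
        return True
    return False


def f_fast(x: Sequence[float]) -> float:
    return x[0] ** 2


def f_slow(x: Sequence[float]) -> float:
    return (x[0] - 1.0) ** 2


sim = LatencySimulator([f_fast, f_slow], latencies=[1, 3], batch_size=2)
batch = [[0.0], [2.0]]
sim.submit(0, batch)
sim.submit(1, batch)
history = [sim.step() for _ in range(3)]
for t, done in enumerate(history, start=1):
    print(f"t={t}:", done)  # f_fast results appear at t=1, f_slow results only at t=3
```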
12 Heterogeneous Objectives: State-of-the-Art and Future Research
12.3.3.2 Test Problems
In an ideal world, one would evaluate an algorithm on a real problem featuring heterogeneous evaluation times. However, this is usually infeasible because it would be too costly and time-consuming, especially since a (stochastic) algorithm needs to be executed multiple times on a problem to establish the statistical significance of performance results. This is a typical issue in expensive optimization. Consequently, the approach suggested by, and accepted in, the community is to adapt existing test problems to simulate heterogeneity in the objectives. This is the approach taken to validate all algorithms described above. To simulate heterogeneity in objective evaluation times, one can take any existing multi/many-objective test problem, declare one or more of its objectives as slow (expensive), and specify by how much these are slower than the other objectives. In principle, one can thereby create a problem in which all objectives differ in their evaluation times. One can then study how different ratios between the evaluation times of the slow and fast objective(s) affect algorithm performance, as done, for example, in [5, 7, 16, 50]. If the underlying problem has configurable parameters, their impact on performance under heterogeneous evaluation times can be investigated too. For example, in [5, 16] we proposed a binary and a continuous bi-objective toy problem with configurable levels of correlation between the fast and slow objectives, in [5] we investigated the impact of landscape ruggedness on algorithm performance (using MNK landscapes), and in [44] the impact of varying the number of decision variables was investigated. All research so far on heterogeneous objective evaluation times has considered problems with exactly one slow objective and at least one fast objective. Typically, a random objective or the most difficult objective (as done in [44]) was declared as the slow objective.
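As an illustration of adapting an existing test problem, the sketch below takes the well-known ZDT1 problem and declares its second objective as slow. The wrapper class `HeterogeneousProblem` and its `latency` tuple are hypothetical names introduced here. The wrapper does not actually slow down the computation; it only reports the simulated time step at which a requested objective value would be revealed, so that an optimizer can be run against a simulated clock. The slow/fast speed ratio is set through the latency tuple.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


def zdt1(x: Sequence[float]) -> Tuple[float, float]:
    """ZDT1 (minimisation); requires len(x) >= 2 and every x_i in [0, 1]."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2


@dataclass
class HeterogeneousProblem:
    """Hypothetical wrapper declaring per-objective evaluation times.

    latency[j] is the number of simulated time steps objective j needs,
    e.g. (1, 5) declares objective 1 to be five times slower than objective 0.
    """
    base: Callable[[Sequence[float]], Tuple[float, ...]]
    latency: Tuple[int, ...]

    def evaluate(self, x: Sequence[float], j: int, start: int) -> Tuple[float, int]:
        """Return (value of objective j, simulated time step it is revealed)."""
        # The base problem is cheap, so all objectives are computed and objective
        # j is picked; only the reported reveal time reflects the simulated cost.
        value = self.base(x)[j]
        return value, start + self.latency[j]


if __name__ == "__main__":
    problem = HeterogeneousProblem(zdt1, latency=(1, 5))
    print(problem.evaluate([0.25, 0.0, 0.0], j=0, start=0))  # (0.25, 1)
    print(problem.evaluate([0.25, 0.0, 0.0], j=1, start=0))  # (0.5, 5)
```

Varying the second entry of `latency` (for example 2, 5 or 10) gives a family of problems with different slow/fast ratios on the same underlying landscape.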
12.3.3.3 Validating Performance
Any newly proposed algorithm for dealing with heterogeneous objective evaluation times should be compared against other methods designed for the same purpose. As a baseline, it makes sense to compare with Waiting (to mimic a naive approach) and, as an upper bound on performance, with optimizing the same problem without heterogeneity (i.e., assuming that all objectives have the same evaluation time). So far, standard multiobjective performance metrics, such as IGD and the hypervolume metric, have been used to measure algorithm performance, and, for the non-evolutionary approaches proposed in [44], the number of Pareto optimal solutions and the number of function evaluations needed to discover them were recorded. To understand visually whether the heterogeneity in evaluation times introduced any search bias (e.g., towards the optimization of the fast objective), plotting the median attainment surface is a reasonable approach, although it is applicable to bi-objective and tri-objective problems only. For an overview of performance metrics for many-objective optimization, please refer to Chap. 5 in this book.
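For bi-objective problems, the hypervolume metric can be computed with a simple sweep. The sketch below assumes that both objectives are minimised and that a user-chosen reference point `ref` is worse than all points of interest; it is a generic textbook computation, not the implementation used in the cited studies. Comparing such values over repeated runs of a method, of Waiting and of the heterogeneity-free setting follows the validation protocol described above.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives minimised)."""
    # Only points strictly better than the reference point in both objectives contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest f2 seen so far in the sweep
    for f1, f2 in pts:  # sweep in order of increasing f1
        if f2 < best_f2:  # points with f2 >= best_f2 are dominated (or duplicates)
            # Horizontal strip between f2 and best_f2, spanning from f1 to ref[0].
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


if __name__ == "__main__":
    front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]  # last point is dominated
    print(hypervolume_2d(front, ref=(4.0, 4.0)))  # 6.0
```

Sorting by the first objective lets each non-dominated point add exactly one rectangular strip, giving O(n log n) time; for three or more objectives, dedicated algorithms are needed.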
12.4 Related Research
Research on heterogeneous objectives can be related to, and gain inspiration from, a number of other areas in the literature. In the following, we briefly discuss some of these relationships. An obvious connection exists with the use of asynchronous evolutionary algorithms in MO in distributed environments, as arising, for example, in grid computing, multi-core CPUs, clusters of CPUs or virtual clouds of CPUs [29, 40, 51]. Here it is assumed that the computing resources induce heterogeneity and/or unreliability, causing the evaluation time of a solution to vary across computing resources. This introduces asynchronicity across the population rather than across the individual objectives, which is our focus. Despite considering a different problem setup, the research questions in the two areas are similar and centre on understanding (i) how to efficiently utilize the available resources to perform MO in the presence of heterogeneous resources (per solution vs per objective) and (ii) the bias that heterogeneity induces on the search path taken by the optimizer (bias towards search regions with solutions that are quick to evaluate vs towards objectives that are quick to evaluate).
Machine learning methods focused on dealing with missing data [1, 18, 30] and surrogate models [3, 41, 42] are important when designing algorithms for coping with heterogeneous objectives. In particular, these methods can be used, respectively, to substitute missing objective function values with proxies and to substitute expensive functions with approximation functions that are cheap to evaluate. This is relevant, for example, when dealing with heterogeneity in the evaluation times of objectives, and for MO problems that are subject to interruptions or ERCs. The application of surrogate-assisted methods to batch optimization [9, 16, 21] and asynchronous batch optimization [20, 41] is also highly relevant to multiobjective problems with differing latencies. Transfer learning [36], potentially combined with dimensionality reduction [48], as done by T-SAEA [50] (see above), is another machine learning methodology that can be applied to problems with heterogeneous objectives beyond problems with latencies. Methods for dimensionality reduction can also be applied to the objective space [11] with the aim of homogenizing the objectives. For example, removing an expensive objective that is positively correlated with a cheap objective would reduce the total number of objectives and homogenize the set of remaining objectives, simplifying their optimization. A potential issue with this approach is that the level of correlation between objectives is a statistical observable that may not hold across the entire search space and, crucially, may not hold at or near optimality.
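The caveat about correlation-based objective reduction can be made concrete with a one-variable toy example, constructed here purely for illustration (it is not taken from the cited works). Take a cheap objective f_a(x) = x and an expensive objective f_b(x) = (x - 0.2)^2 on x in [0, 1], both minimised. Sampled over the whole interval, the two objectives are strongly positively correlated, which might suggest dropping f_b; yet on the Pareto set, x in [0, 0.2], they are in direct conflict. The helper `pearson` is a plain implementation of the sample Pearson correlation coefficient, and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def f_cheap(x: float) -> float:
    return x  # toy "fast" objective (illustrative)


def f_expensive(x: float) -> float:
    return (x - 0.2) ** 2  # toy "slow" objective (illustrative)


if __name__ == "__main__":
    everywhere = [i / 100 for i in range(101)]  # x in [0, 1]
    pareto_set = [i / 100 for i in range(21)]   # x in [0, 0.2]
    for name, xs in (("whole space", everywhere), ("Pareto set", pareto_set)):
        r = pearson([f_cheap(x) for x in xs], [f_expensive(x) for x in xs])
        print(f"{name}: r = {r:+.2f}")
```

On this grid the whole-space correlation is about +0.92, whereas on the Pareto set it is about -0.97: a reduction decision based on globally sampled data would discard exactly the objective that shapes the trade-off.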
Scheduling concepts and methods [12, 21, 25, 43] are also relevant to MO problems with heterogeneous objectives, especially when faced with latencies or ERCs. Here, scheduling methods can help, for example, to decide when to evaluate which solution on which objective such that idle time (during which no objective is being evaluated) is minimized and the resources required for evaluating an objective are utilized as efficiently as possible. To cope with heterogeneous objectives, inspiration can be gained, for example, from classical resource-constrained scheduling [12], parallelization of MO algorithms [43], queuing theory [25] and batch expensive optimization [21].
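As a minimal illustration of the scheduling view, the sketch below assigns (solution, objective) evaluation jobs of known duration to a fixed number of identical evaluators using the classical longest-processing-time-first (LPT) list-scheduling heuristic, and reports the makespan and total idle time. This is one generic heuristic chosen for illustration, not a method from the cited works; it assumes identical evaluators, known durations and no precedence constraints between jobs, and the function name `lpt_schedule` is hypothetical.

```python
import heapq
from typing import List, Tuple

Job = Tuple[int, str, int]  # (solution id, objective name, duration in time steps)


def lpt_schedule(jobs: List[Job], n_workers: int):
    """Longest-processing-time-first list scheduling on identical evaluators.

    Returns (makespan, total idle time, assignment), where each assignment entry
    is (solution id, objective, worker, start, end).
    """
    # Min-heap of (time the worker becomes free, worker id).
    workers = [(0, w) for w in range(n_workers)]
    heapq.heapify(workers)
    assignment = []
    # Longest jobs first; sorted() is stable, so ties keep their input order.
    for sol, obj, dur in sorted(jobs, key=lambda job: job[2], reverse=True):
        free_at, w = heapq.heappop(workers)  # earliest available evaluator
        assignment.append((sol, obj, w, free_at, free_at + dur))
        heapq.heappush(workers, (free_at + dur, w))
    makespan = max(t for t, _ in workers)
    busy = sum(dur for _, _, dur in jobs)
    idle = n_workers * makespan - busy  # evaluator-time steps with nothing to run
    return makespan, idle, assignment


if __name__ == "__main__":
    jobs = [(0, "slow", 3), (1, "slow", 3), (0, "fast", 1), (1, "fast", 1)]
    print(lpt_schedule(jobs, 2)[:2])  # (4, 0)
    print(lpt_schedule(jobs, 3)[:2])  # (3, 1)
```

With three evaluators the two fast evaluations share one evaluator while the slow ones run, and one unit of idle time remains; real problems with latencies would add constraints, such as dedicated evaluators per objective, that this sketch ignores.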
12.5 Conclusions and Future Work
This chapter reviewed the topic of multi/many-objective optimization problems with heterogeneous objectives, meaning that the objectives differ in aspects such as computational complexity, evaluation effort or determinism, or a combination of these. Although heterogeneous objectives exist in practical applications, very little research has been carried out by the community to address this problem feature. This chapter started by describing motivational examples of problems with heterogeneous objectives, followed by the introduction of basic concepts and a taxonomy for modelling heterogeneity. We then discussed different types of heterogeneity, described existing algorithms designed for coping with heterogeneous objectives, and reviewed benchmarking considerations arising due to heterogeneity. Finally, we reviewed related research. The algorithmic part of the chapter focused largely on one particular type of heterogeneity, namely different evaluation times of objectives, as this is the only type that has gained attention in the community, originating from work carried out by the authors of this chapter. The chapter has highlighted that heterogeneous objectives exist in a range of practical applications and that, although the community has started to look at this issue, much more can and needs to be done. First and foremost, we need to raise awareness in the decision and data science community and amongst practitioners about the meaning of heterogeneous objectives and the fact that approaches to cope with this issue exist. This chapter will hopefully contribute to this aspect. At the same time, it needs to be made clear that the existing approaches have limitations, and that the only type of heterogeneity investigated so far is different evaluation times of objectives (where one objective is slow and the others fast to evaluate). More research is needed to extend the existing methods to many-objective problems in which the objectives can have any duration (and not only two modes, slow vs fast). Methods to cope with other types of heterogeneity are needed too, as are methods for problems in which multiple types of heterogeneity occur together. There is also a need for a customized benchmarking process. In particular, configurable many-objective test problems to simulate and adjust heterogeneity are needed, and existing performance metrics and visualisation tools may need to be adjusted or extended to further understand the impact of heterogeneous objectives.
We look forward to future progress by the field in these directions.
References
1. P.D. Allison, Missing Data (Sage Publications, 2001)
2. R. Allmendinger, Tuning evolutionary search for closed-loop optimization. Ph.D. thesis, The University of Manchester, UK (2012)
3. R. Allmendinger, M.T.M. Emmerich, J. Hakanen, Y. Jin, E. Rigoni, Surrogate-assisted multicriteria optimization: complexities, prospective solutions, and business case. J. Multi-Criteria Decis. Anal. 24(1–2), 5–24 (2017)
4. R. Allmendinger, S. Gerontas, N.J. Titchener-Hooker, S.S. Farid, Tuning evolutionary multiobjective optimization for closed-loop estimation of chromatographic operating conditions, in Parallel Problem Solving from Nature (PPSN) (Springer, 2014), pp. 741–750
5. R. Allmendinger, J. Handl, J.D. Knowles, Multiobjective optimization: when objectives exhibit non-uniform latencies. Eur. J. Oper. Res. 243(2), 497–513 (2015)
6. R. Allmendinger, J.D. Knowles, Evolutionary search in lethal environments, in International Conference on Evolutionary Computation Theory and Applications (SciTePress, 2011), pp. 63–72
7. R. Allmendinger, J.D. Knowles, ‘Hang on a minute’: investigations on the effects of delayed objective functions in multiobjective optimization, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2013), pp. 6–20
8. R. Allmendinger, J.D. Knowles, On handling ephemeral resource constraints in evolutionary search. Evol. Comput. 21(3), 497–531 (2013)
9. J. Azimi, A. Fern, X.Z. Fern, Batch Bayesian optimization via simulation matching, in Neural Information Processing Systems (NIPS) (2010), pp. 109–117
10. D. Bertsimas, M. Sim, M. Zhang, Adaptive distributionally robust optimization. Manag. Sci. 65(2), 604–618 (2019)
11. D. Brockhoff, E. Zitzler, Objective reduction in evolutionary multiobjective optimization: theory and applications. Evol. Comput. 17(2), 135–166 (2009)
12. P. Brucker, A. Drexl, R. Möhring, K. Neumann, E. Pesch, Resource-constrained project scheduling: notation, classification, models, and methods. Eur. J. Oper. Res. 112(1), 3–41 (1999)
13. L. Cervante, B. Xue, M. Zhang, L. Shang, Binary particle swarm optimisation for feature selection: a filter based approach, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2012), pp. 1–8
14. R. Cheng, Y. Jin, M. Olhofer, B. Sendhoff, A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput. 20(5), 773–791 (2016)
15. T. Chugh, R. Allmendinger, V. Ojalehto, K. Miettinen, Surrogate-assisted evolutionary biobjective optimization for objectives with non-uniform latencies, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2018), pp. 609–616
16. T. Chugh, Y. Jin, K. Miettinen, J. Hakanen, K. Sindhya, A surrogate-assisted reference vector guided evolutionary algorithm for computationally expensive many-objective optimization. IEEE Trans. Evol. Comput. 22(1), 129–142 (2018)
17. G. Eichfelder, X. Gandibleux, M.J. Geiger, J. Jahn, A. Jaszkiewicz, J.D. Knowles, P.K. Shukla, H. Trautmann, S. Wessing, Heterogeneous functions (WG3), in Understanding Complexity in Multiobjective Optimization (Dagstuhl Seminar 15031) (Dagstuhl Zentrum für Informatik, 2015), pp. 121–129
18. P.J. García-Laencina, J.-L. Sancho-Gómez, A.R. Figueiras-Vidal, Pattern classification with missing data: a review. Neural Comput. Appl. 19(2), 263–282 (2010)
19. C. Gerth, P. Weidner, Nonconvex separation theorems and some applications in vector optimization. J. Optim. Theory Appl. 67(2), 297–320 (1990)
20. D. Ginsbourger, J. Janusevskis, R. Le Riche, Dealing with asynchronicity in parallel Gaussian Process based global optimization. Research report, Ecole des Mines de Saint-Etienne, France (2011)
21. J. González, Z. Dai, P. Hennig, N.D. Lawrence, Batch Bayesian optimization via local penalization, in Artificial Intelligence and Statistics (AISTATS) (JMLR.org, 2016), pp. 648–657
22. S. Greco, K. Klamroth, J.D. Knowles, G. Rudolph (eds.), Understanding Complexity in Multiobjective Optimization (Dagstuhl Seminar 15031), Dagstuhl Reports, vol. 5(1) (Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Germany, 2015)
23. T. Jansen, C. Zarges, Fixed budget computations: a different perspective on run time analysis, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2012), pp. 1325–1332
24. D.R. Jones, M. Schonlau, W.J. Welch, Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4), 455–492 (1998)
25. V.V. Kalashnikov, Mathematical Methods in Queuing Theory (Springer, 1994)
26. Y. Kim, R. Allmendinger, M. López-Ibáñez, Safe learning and optimization techniques: towards a survey of the state of the art, in Trustworthy AI – Integrating Learning, Optimization and Reasoning (TAILOR) (Springer, 2020), pp. 123–139
27. J.D. Knowles, Closed-loop evolutionary multiobjective optimization. IEEE Comput. Intell. Mag. 4, 77–91 (2009)
28. J.D. Knowles, D. Corne, Instance generators and test suites for the multiobjective quadratic assignment problem, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2003), pp. 295–310
29. A. Lewis, S. Mostaghim, I. Scriven, Asynchronous multi-objective optimisation in unreliable distributed environments, in Biologically-Inspired Optimisation Methods (Springer, 2009), pp. 51–78
30. R.J. Little, D.B. Rubin, Statistical Analysis with Missing Data (Wiley, 2019)
31. D.G. Malcolm, J.H. Roseboom, C.E. Clark, W. Fazar, Application of a technique for research and development program evaluation. Oper. Res. 7(5), 646–669 (1959)
32. O. Mersmann, B. Bischl, H. Trautmann, M. Preuss, C. Weihs, G. Rudolph, Exploratory landscape analysis, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2011), pp. 829–836
33. S. O’Hagan, W.B. Dunn, M. Brown, J.D. Knowles, D.B. Kell, Closed-loop, multiobjective optimization of analytical instrumentation: gas chromatography/time-of-flight mass spectrometry of the metabolomes of human serum and of yeast fermentations. Anal. Chem. 77(1), 290–303 (2005)
34. S. O’Hagan, W.B. Dunn, J.D. Knowles, D. Broadhurst, R. Williams, J.J. Ashworth, M. Cameron, D.B. Kell, Closed-loop, multiobjective optimization of two-dimensional gas chromatography/mass spectrometry for serum metabolomics. Anal. Chem. 79(2), 464–476 (2007)
35. L. Orseau, S. Armstrong, Safely interruptible agents, in Conference on Uncertainty in Artificial Intelligence (AUAI Press, 2016), pp. 557–566
36. S.J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009)
37. L. Paquete, T. Stützle, A study of stochastic local search algorithms for the biobjective QAP with correlated flow matrices. Eur. J. Oper. Res. 169(3), 943–959 (2006)
38. M. Platt, W. Rowe, D.C. Wedge, D.B. Kell, J.D. Knowles, P.J.R. Day, Aptamer evolution for array-based diagnostics. Anal. Biochem. 390(2), 203–205 (2009)
39. R.C. Purshouse, P.J. Fleming, On the evolutionary optimization of many conflicting objectives. IEEE Trans. Evol. Comput. 11(6), 770–784 (2007)
40. E.O. Scott, K.A. De Jong, Evaluation-time bias in asynchronous evolutionary algorithms, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2015), pp. 1209–1212
41. J. Snoek, H. Larochelle, R.P. Adams, Practical Bayesian optimization of machine learning algorithms, in Neural Information Processing Systems (NIPS) (2012), pp. 2951–2959
42. M. Tabatabaei, J. Hakanen, M. Hartikainen, K. Miettinen, K. Sindhya, A survey on handling computationally expensive multiobjective optimization problems using surrogates: non-nature inspired methods. Struct. Multidiscip. Optim. 52(1), 1–25 (2015)
43. E.-G. Talbi, S. Mostaghim, T. Okabe, H. Ishibuchi, G. Rudolph, C.A.C. Coello, Parallel approaches for multiobjective optimization, in Multiobjective Optimization (Springer, 2008), pp. 349–372
44. J. Thomann, A trust region approach for multi-objective heterogeneous optimization. Ph.D. thesis, Technische Universität Ilmenau, Germany (2019)
45. J. Thomann, G. Eichfelder, Numerical results for the multiobjective trust region algorithm MHT. Data in Brief 25, 1–18 (2019)
46. J. Thomann, G. Eichfelder, Representation of the Pareto front for heterogeneous multi-objective optimization. J. Appl. Numer. Optim. 1(3), 293–323 (2019)
47. J. Thomann, G. Eichfelder, A trust-region algorithm for heterogeneous multiobjective optimization. SIAM J. Optim. 29(2), 1017–1047 (2019)
48. L. Van Der Maaten, E. Postma, J. Van den Herik, Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10(66–71), 13 (2009)
49. X. Wang, Y. Jin, S. Schmitt, M. Olhofer, An adaptive Bayesian approach to surrogate-assisted evolutionary multi-objective optimization. Inf. Sci. 519, 317–331 (2020)
50. X. Wang, Y. Jin, S. Schmitt, M. Olhofer, Transfer learning for Gaussian process assisted evolutionary bi-objective optimization for objectives with different evaluation times, in Genetic and Evolutionary Computation Conference (GECCO) (ACM Press, 2020), pp. 587–594
51. M. Yagoubi, L. Thobois, M. Schoenauer, Asynchronous evolutionary multi-objective algorithms with heterogeneous evaluation costs, in Congress on Evolutionary Computation (CEC) (IEEE Press, 2011), pp. 21–28
Chapter 13
Many-Criteria Optimisation and Decision Analysis Ontology and Knowledge Management
Vitor Basto-Fernandes, Diana Salvador, Iryna Yevseyeva, and Michael Emmerich
Abstract In this chapter, we present a Many-Criteria Optimisation and Decision Analysis (MACODA) Ontology and a MACODA Knowledge Management Web-Based Platform (named MyCODA, available at http://macoda.club) for the research community. The purpose of this initiative is to allow for the collaborative development of an ontology that represents the MACODA knowledge domain and to make a set of integrated tools available for its use by researchers and practitioners. MyCODA is a knowledge-based platform to identify and describe MACODA research constructs and to explore how these constructs relate to each other. It is designed to model and systematise the knowledge created by the MACODA research community, supporting features such as querying and reasoning by means of formal logic, and use cases such as training new learners and finding research gaps in the MACODA research domain.
The authors acknowledge the support provided by the Lorentz Center of the University of Leiden, The Netherlands, in the Many-Criteria Optimisation and Decision Analysis (MACODA) Workshop, 16–21 September 2019.
V. Basto-Fernandes · D. Salvador
Instituto Universitário de Lisboa (ISCTE-IUL), University Institute of Lisbon, ISTAR-IUL, Av. das Forças Armadas, 1649-026 Lisboa, Portugal
I. Yevseyeva
School of Computer Science and Informatics, Faculty of Technology, De Montfort University, LE1 9BH Leicester, United Kingdom
M. Emmerich
Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, 2333-CA Leiden, The Netherlands
© Springer Nature Switzerland AG 2023
D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1_13
V. Basto-Fernandes et al.
13.1 Introduction
It is well known and documented that a key driver of countries’ economic growth and productivity is their investment in Research and Development (R&D). Over the years, countries have substantially increased their public and private investment in science, technology and innovation [5]. This trend results in more researchers being involved in the production and dissemination of scientific knowledge, for instance, by means of scientific publications such as journal and conference papers, M.Sc. and Ph.D. theses, technical reports and scientific data repositories. Scientific knowledge is a valuable resource: it provides the ability to solve problems, promotes new ideas and stimulates new research topics. The exponential growth of scientific knowledge production and publications represents a great opportunity for knowledge sharing and development on a global scale, but it raises serious difficulties for scientific knowledge management. Knowledge is commonly not well structured, well defined or harmonised (different taxonomies, the same constructs or concepts named differently, different concepts named in the same way, etc.). Scarbrough, Swan and Preston [19] define knowledge management as a process or practice of creating, acquiring, capturing, sharing and using knowledge, wherever it resides, to enhance the learning and performance of organisations and individuals. It enables the creation of value from experts’ domain knowledge. R&D funding agencies around the world, especially in Europe, have highlighted the difficulty of achieving innovation and industrial productivity from the results of research [5, 18]. The struggle with knowledge discovery and utilisation is perceived not only by industry but also by researchers and students, who are overloaded by the amount of knowledge produced in their domains.
Even in narrow fields, such as Multi-Objective Optimisation (MOO) or Many-Criteria Optimisation and Decision Analysis (MACODA), the number of studies is quite extensive. Let us briefly illustrate the situation by looking at the publication trends in many-criteria optimisation. The initial growth of publications is illustrated in Fig. 13.1. Note that this is a keyword-based analysis, and papers on many-criteria optimisation often also use methodologies from classical EMO. With the steep growth of scientific knowledge in the MOO and MACODA fields, the need for a new approach to effectively manage, systematise and retrieve the knowledge produced in these fields has become more evident. Domain knowledge can be captured and made available to both machines and humans by means of an ontology. Ontologies are currently the most suitable way to formally represent concepts within a domain and the relationships that hold between them [7]. They not only provide a common understanding of the structure of information but also enable knowledge sharing and reuse. With the help of an ontology, a new researcher or practitioner can easily learn more about an algorithm for a particular application or find a future research topic, considerably decreasing the effort of searching for, finding and selecting the specific knowledge of interest.
13 Many-Criteria Optimisation and Decision Analysis Ontology …
Fig. 13.1 Number of publications about many-objective optimisation in the Web of Science Core Collection from 2005 to 2019
As an example, consider an expert in an engineering domain who faces an optimisation problem, knows the problem well, but is not an expert in optimisation. This engineering expert would benefit from an ontology, created and managed by the optimisation research community, that can be queried for algorithms that have previously proved successful on problems similar to his/her problem. In the optimisation field, large numbers of methods and algorithms have been proposed and published in the last decades. Thus, obtaining a systematic view of the knowledge produced in this field is becoming very complex. New approaches and techniques are needed to systematise the scientific knowledge in the multi- and many-objective optimisation fields and to make it useful. Non-experts in the MOO and MACODA fields should be able to explore and easily retrieve the information of interest by means of a platform that facilitates knowledge search and retrieval. Experts should be able to share their knowledge with the community. Therefore, the development of a platform that serves this purpose is a priority. The work presented in this chapter proposes the systematisation of the MACODA knowledge domain by means of a standardised ontology representation, a Web-based knowledge management platform (named MyCODA, available at http://macoda.club), and a knowledge management process for the MACODA research and practitioner community. The MyCODA platform allows its users to easily access, learn about and compare existing optimisation methods, seek an appropriate method for a specific problem, share new scientific knowledge and collaborate with other MACODA researchers.
13.2 MACODA Ontology
13.2.1 Ontology Overview
Etymologically, ontology comes from Greek and essentially means ‘the study or theory of being or that which is’. In simple terms, ontology seeks the classification and explanation of entities. In philosophy, an ontology is defined as ‘the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality’ [21]. Over the last decades, ontologies have become more popular in other areas, namely Knowledge Management, Artificial Intelligence and the Semantic Web, given the need for a shared and common understanding of a domain. In Computer Science, Gruber [8] and Borst [4] were the pioneers in defining the notion of ontology. Later, Studer et al. [22] presented the most widely accepted definition of an ontology: ‘An ontology is a formal, explicit specification of a shared conceptualization’. ‘Conceptualization’ refers to an abstract model of a knowledge domain that represents concepts and the relationships between them. ‘Explicit specification’ means that the model should be represented using a coherent, unambiguous and structured language. ‘Formal’ implies that the ontology should be machine interpretable. ‘Shared’ means that the knowledge represented in an ontology should define a common and consensual vocabulary in a given domain that can be shared across people and application systems. Ontologies specify the semantics of an area of knowledge by defining concepts (or classes) that represent existing ‘things’, the relationships among them, the properties that each concept may have, constraints on concepts or properties, and axioms. An instance of a class is known as an individual. Different generality levels of ontologies can be defined [22], namely:
• Domain ontologies, which contain knowledge that is valid for a particular type of domain (e.g. medical, mechanical).
• Generic ontologies, which capture general knowledge about the world and are therefore valid across several domains.
• Application ontologies, which contain all the knowledge necessary for modelling a particular domain.
• Representational ontologies, which do not commit to any particular domain; they provide representational entities without stating what should be represented.
• Metadata ontologies, which provide a vocabulary for describing the content of online information sources.
The process of building an ontology is not straightforward, and various approaches exist for guiding ontology development. A general proposal for the process of building ontologies is given by Noy [15]:
1. Determine the domain and scope of the ontology.
2. Consider reusing existing ontologies.
3. Enumerate important terms in the ontology.
4. Define the classes and the class hierarchy (taxonomy).
5. Define object properties.
6. Define data properties.
7. Create individuals.
The semantic structure provided by ontologies differs from the formatting of information afforded by relational and XML databases, as ontologies provide an objective specification of domain information by representing a consensual agreement on the concepts and relations that characterise the way knowledge in that domain is expressed. By providing a formal and hierarchically structured representation of an area of knowledge with commonly accepted definitions, ontologies minimise misunderstandings and miscommunication and make reasoning possible. By sharing the same underlying vocabulary, ontologies enable interoperation between computer agents, which can understand incoming requests and return the required knowledge. In addition, their semantic structure facilitates precise knowledge indexing and retrieval. A common understanding of a domain among people and application systems fosters knowledge sharing and reuse, not only among communities of experts but also among new learners [4]. In the present work, an ontology is used as the main mechanism to represent and share the domain knowledge of interest.
13.2.2 Ontologies in Knowledge Management
The role of an ontology in knowledge management is to facilitate the representation of knowledge, as it provides a common vocabulary about a particular domain of interest. Through explicit knowledge representation, an ontology provides information in machine-understandable form, which allows reasoning from a given set of facts and rules about the domain. The potential advantages of using an ontology for knowledge management in the MACODA domain are clear. An ontology is especially suited for representing and processing large amounts of information, providing the capabilities required to systematise the scientific knowledge produced in this field. A considerable part of the MACODA knowledge domain can be represented by means of formal logic (predicate logic), supported by the OWL ontology knowledge representation standards. For example, the following excerpt of the MACODA ontology represents a fragment of the MACODA taxonomy (hierarchy of classes/subclasses) using the isA relation, and uses the canSolve relation to express which algorithms can successfully be applied to which optimisation problems: JobShop isA SchedulingProblem; FlowShop isA JobShop; NSGA-II canSolve JobShop. Additionally, we can add our own specific knowledge to the knowledge base, for example, mySpecificSchedulingProblem isA FlowShop. When we represent this knowledge by means of OWL ontologies, we can query/retrieve the explicit knowledge present in the knowledge base (e.g. the query ‘what are the algorithms that canSolve JobShop?’ returns the ‘NSGA-II’ algorithm), but we also benefit from the formal logic-based inference performed on the overall knowledge base by the query engine (e.g. the result of the query ‘what are the algorithms that canSolve mySpecificSchedulingProblem?’ includes the ‘NSGA-II’ algorithm, because mySpecificSchedulingProblem is a FlowShop, FlowShop is a special case of JobShop, and NSGA-II canSolve JobShop). The major benefits of using ontologies in knowledge management are given by [1, 10, 17]:
• Ontologies improve knowledge search and retrieval by exploiting ontological background knowledge about the application domain.
• They provide a solid structure for information gathering, integration and organisation.
• Ontologies avoid semantic ambiguities of terms in a domain.
• They support knowledge visualisation, which is valuable for analysing large amounts of data with complex interconnections and for finding useful knowledge.
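The inference in the JobShop example can be mimicked in a few lines of plain Python, which may help readers unfamiliar with OWL reasoners see what the query engine does. The sketch below is a hypothetical, deliberately simplified stand-in, not the MyCODA implementation and not full OWL semantics: it treats isA as a transitive subclass relation and assumes that an algorithm that canSolve a problem class also solves every subclass of it, which is the inference step described above. The names `IS_A`, `CAN_SOLVE`, `superclasses` and `algorithms_for` are illustrative.

```python
from typing import Dict, List, Set

# Asserted facts from the example (class -> direct superclasses).
IS_A: Dict[str, List[str]] = {
    "JobShop": ["SchedulingProblem"],
    "FlowShop": ["JobShop"],
    "mySpecificSchedulingProblem": ["FlowShop"],
}
# Asserted canSolve facts (algorithm -> problem classes).
CAN_SOLVE: Dict[str, List[str]] = {"NSGA-II": ["JobShop"]}


def superclasses(cls: str) -> Set[str]:
    """All classes reachable via isA, i.e. the transitive closure (cls included)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def algorithms_for(problem: str) -> Set[str]:
    """Algorithms asserted to solve the problem or any of its superclasses."""
    ancestors = superclasses(problem)
    return {alg for alg, classes in CAN_SOLVE.items() if ancestors & set(classes)}


if __name__ == "__main__":
    print(algorithms_for("JobShop"))                      # {'NSGA-II'} (explicit fact)
    print(algorithms_for("mySpecificSchedulingProblem"))  # {'NSGA-II'} (inferred)
    print(algorithms_for("SchedulingProblem"))            # set()
```

Note that canSolve knowledge flows down the hierarchy to subclasses, not up: nothing is inferred for the more general SchedulingProblem, since an algorithm that solves job shop problems need not solve every scheduling problem.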
13.2.3 Semantic Web
Knowledge representation by means of OWL ontologies promotes a standardised and open representation of knowledge at World Wide Web scale. A set of World Wide Web Consortium (W3C) standards, including the OWL standard (Web Ontology Language; the acronym OWL was chosen intentionally), constitutes what has been identified as the Semantic Web or Web of Knowledge, in contrast to the Web of HTML (HyperText Markup Language) content. Berners-Lee et al. [3] describe the Semantic Web as 'an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation'. The Semantic Web (or Web 3.0) provides a set of standards and technologies that enable computers to understand and manipulate data in a similar way to humans. It connects pieces of information contained in a document or application, rather than the documents or applications themselves, i.e. it is concerned with the semantics, not the structure, of the data [20]. The collection, structuring and retrieval of data are enabled by a set of W3C standards that are used to formally represent metadata [9]. These technologies provide a common framework for sharing information across different applications and systems. The architecture of the Semantic Web is presented in Fig. 13.2. The lower-layer standards of the Semantic Web Protocol Stack allow for resource identification and for basic forms of data representation, such as URI/IRI (Uniform Resource Identifier/Internationalised Resource Identifier) to identify OWL ontologies, classes, properties, etc., and the XML (Extensible Markup Language) family of standards to
13 Many-Criteria Optimisation and Decision Analysis Ontology …
Fig. 13.2 Semantic Web Protocol Stack [28]
define lexical and syntactic structures and annotations of OWL ontologies. The upper-layer standards allow for the representation of more abstract concepts and relations and for modelling the knowledge domain of interest (representation of knowledge domain relations, i.e. meaning/semantics), such as the RDF (Resource Description Framework) and OWL (Web Ontology Language) families of standards. Query languages for the data and knowledge representation layers are also defined in the Semantic Web standards stack (e.g. the SPARQL query language).
In this work, we focus on OWL (Web Ontology Language), a knowledge representation language for ontologies. OWL has three sub-languages, namely OWL Full, OWL DL (Description Logic) and OWL Lite [27]. OWL DL is the most suitable for our work, due to its well-balanced trade-off between language expressiveness and formal reasoning capabilities. For the sake of space and clarity, we do not go into the details of the differences between the OWL sub-languages. Some of the OWL DL features relevant for our work are [23]:
• It allows cardinality restrictions on the number of distinct values a property may have (e.g. to express that one algorithm can solve one or more types of optimisation problems, that one algorithm has one or more authors, or that one algorithm has exactly one creation year).
• It allows two classes to be declared disjoint (e.g. to express that an optimisation problem cannot be both combinatorial and continuous).
• It allows classes to be defined as logical combinations (intersections, complements or unions) of other classes.
• It defines functional, reflexive, symmetric, inverse and transitive properties (e.g. declaring the relation isExtensionOf as transitive means that, from algorithmX isExtensionOf algorithmY and algorithmY isExtensionOf algorithmZ, the inference and query engines can derive that algorithmX isExtensionOf algorithmZ).
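To make three of the OWL DL features listed above concrete, the sketch below checks them over a few hypothetical assertions in plain Python: the transitive closure of isExtensionOf, a violation of a disjointness axiom between CombinatorialProblem and ContinuousProblem, and an "at most one creation year" restriction on a data property. It is a closed-world toy, not an OWL reasoner; all class, property, individual and function names are illustrative assumptions rather than parts of the MACODA ontology.

```python
from itertools import combinations


def transitive_closure(pairs):
    """Derive every (a, c) implied by a transitive relation given as (a, b) pairs."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def disjointness_violations(types, disjoint_pairs):
    """Individuals asserted to belong to two classes that are declared disjoint."""
    violations = []
    for individual, classes in types.items():
        for c1, c2 in combinations(sorted(classes), 2):
            if frozenset((c1, c2)) in disjoint_pairs:
                violations.append((individual, c1, c2))
    return violations


def functional_violations(values):
    """Subjects with more than one distinct value for an 'at most one' data property."""
    return {subject for subject, vals in values.items() if len(set(vals)) > 1}


# isExtensionOf declared transitive (example from the text).
is_extension_of = {("algorithmX", "algorithmY"), ("algorithmY", "algorithmZ")}
print(("algorithmX", "algorithmZ") in transitive_closure(is_extension_of))  # True

# Hypothetical individuals: one problem wrongly typed as both kinds.
disjoint = {frozenset(("CombinatorialProblem", "ContinuousProblem"))}
types = {"myProblem": {"CombinatorialProblem", "ContinuousProblem"},
         "Knapsack": {"CombinatorialProblem"}}
print(disjointness_violations(types, disjoint))
# [('myProblem', 'CombinatorialProblem', 'ContinuousProblem')]

# Hypothetical creation years; algorithmY has two conflicting values.
creation_year = {"algorithmX": [2015], "algorithmY": [2016, 2017]}
print(functional_violations(creation_year))  # {'algorithmY'}
```

Only the upper bound of "exactly one creation year" is checked here: under OWL's open-world assumption a missing year does not make the ontology inconsistent, whereas two distinct literal values for a data property restricted to at most one value do.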
OWL is an ontology language for the Semantic Web with formally defined meaning. This allows the use of a reasoner that helps to maintain a consistent and correct class hierarchy, and supports formal inference and ontology querying by means of formal logic. Ontologies are OWL documents that can be published on the Web and may refer to, or be referred to by, other OWL ontologies, enabling richer integration, sharing and reuse of data. In 2009, the W3C announced a new version of OWL, named OWL 2. OWL 2 has a structure very similar to that of OWL but introduces new features, such as increased expressive power for properties, extended support for datatypes, simple metamodelling capabilities, extended annotation capabilities and keys [25]. Moreover, it introduces three new profiles, OWL 2 EL, OWL 2 QL and OWL 2 RL [26]. OWL 2 EL is useful in applications employing large-scale ontologies, i.e. ontologies with very large numbers of classes and/or properties. OWL 2 QL is aimed at applications that use very large volumes of instance data, where query answering is the most important reasoning task. OWL 2 RL is aimed at applications that require scalable reasoning without sacrificing too much expressive power.
An OWL ontology comprises classes, individuals and properties. A class may have subclasses that represent concepts more specific than the superclass. The hierarchy of classes defines the taxonomy adopted in the ontology. Individuals represent class instances in the domain of interest. Properties are of two kinds: object properties and data properties. An object property is a binary relation between individuals (instances of classes), whereas a data property relates individuals to values of a primitive data type (e.g. integer, string, boolean) [12].
Various environments and tools for building ontologies are available, such as OntoStudio, Protégé, NeOn Toolkit, Swoop and TopBraid Composer. With the growing adoption of OWL, Protégé has become the most popular and widely used Semantic Web ontology editor. Protégé [6] Desktop is a free, open-source, Java-based ontology editor and framework for building both simple and complex ontology-based applications.
It is supported by a strong community of academic, government and corporate users, who use Protégé to build knowledge-based solutions in areas as diverse as biomedicine, e-commerce and organisational modelling. Protégé is fully compliant with the latest OWL specifications and supports collaborative ontology editing as well as annotation of both ontology components and ontology changes. The Protégé editor screenshot in Fig. 13.3 shows an excerpt of the MACODA ontology. It shows the look and feel of the Protégé Graphical User Interface (GUI), with three panels displaying the MACODA taxonomy (left panel), the instances of MACODA classes (middle panel) and the relations of MACODA domain concepts (right panel). Knowledge engineers can visualise, inspect and modify the MACODA ontology using the Protégé ontology editor, with the help and input of domain experts (e.g. MACODA researchers), who usually do not have expertise in knowledge engineering or Semantic Web standards. WebProtégé [24] is a lightweight ontology editor and knowledge acquisition tool for the Web that uses the Protégé infrastructure. It is accessible from any Web browser, has extensive support for collaborative ontology development and a highly customisable and pluggable user interface that can be adapted to any level of user expertise. Protégé and WebProtégé are used in the present work for ontology design and editing.
Fig. 13.3 View of MACODA Ontology with Protégé Ontology Editor GUI [6]
13.2.4 Related Work
With the increasing number of publications on many- and multi-objective optimisation, the need to systematise the resulting knowledge in this domain has grown. The advantages of representing this knowledge in the form of ontologies were presented in a few previously published works, which served as the basis of our work, namely:
• In [12] 'Building and Using an Ontology of Preference-Based Multiobjective Evolutionary Algorithms', Li et al. proposed an OWL ontology to model and systematise the knowledge of preference-based multi-objective evolutionary algorithms (PMOEAs). This ontology aims to help researchers understand, access and analyse methods, or identify future research topics. This work also explains how to build/extend the PMOEA ontology and presents simple and practical examples for various use cases. The PMOEA ontology was built with the help of Protégé Desktop and made public in WebProtégé.
• In [13] 'An Ontology of Preference-Based Multi-objective Metaheuristics', Li et al. provide an overview of preference-based multi-objective metaheuristics (PMOMHs) and propose a novel method to systematise and manage the current knowledge in this field. The work also details the process of building the PMOMH ontology using Protégé. It extends and improves the PMOEA ontology, and use cases are provided to demonstrate the benefits of the ontology.
• In [29] 'Presenting the ECO: Evolutionary Computation Ontology', Yaman et al. present an ontology focused exclusively on evolutionary computation, namely genetic algorithms, genetic programming, evolutionary programming and evolution strategies. The ontology supports the selection of strategies, operators and parameters of evolutionary algorithms for solving optimisation problems by means of evolutionary computation.
• In [11] 'Evolutionary Computation Ontology: E-Learning System', an evolutionary computation ontology is designed and implemented by Kaur and Chaudhary, using Protégé. This ontology identifies the essential features of the subject of evolutionary computation. It was designed to help learners enhance their knowledge of evolutionary computation, and it provides them with visualisation and query features.
• In [2] 'A Survey of Diversity Oriented Optimisation: Problems, Indicators, and Algorithms', Basto-Fernandes et al. provide an overview of the various concepts, methods and applications of diversity-oriented optimisation. To represent the domain of diversity-oriented search in a systematic way, an ontology was developed using Protégé, with the intention of helping users to classify algorithms correctly and find related work.
Although countless OWL ontologies exist in a huge variety of knowledge domains (e.g. Internet of Things, cybersecurity), very few ontologies have been published in the multi-objective optimisation domain and none in the MACODA domain. Because an (OWL) ontology is not an end in itself, additional artefacts, tools and knowledge management processes must be created to ensure that the benefits of having a knowledge domain represented in an (OWL) ontology are actually realised.
An organised and comprehensible knowledge management process (concerned with the acquisition, creation, dissemination, sharing and utilisation of knowledge) must be defined for the MACODA domain, to deal with questions such as: What is the relevant knowledge in this domain? How can it be used? In which context may it be used? Are there restrictions on its use? Who provided it? Who created it? Is it of high quality and reliable? How can it be searched, updated and harmonised among the members of the MACODA research and practitioner community? How are change management and quality management supported? Promoting a working environment in which knowledge management is carried out community-wide is a critical success factor in fostering the creation and sharing of MACODA domain knowledge. MyCODA aims to be a community-agreed, purpose-specific platform that supports MACODA domain knowledge management and curation, and fosters knowledge creation, sharing, use and innovation in this domain. To the authors' knowledge, no other platform exists with the same purpose, features and knowledge domain. Since one of the seven steps for developing an ontology suggested by [15] is 'Consider reusing existing ontologies', much attention was paid in this work to adopting the vocabularies and concepts presented in the ontologies mentioned above, with adaptations designed to serve the purpose of the MACODA ontology and platform.
13.3 MyCODA Platform
13.3.1 Conceptual Model
This section presents an overview of the MyCODA platform's features (available at http://macoda.club), its use cases, the best practices to be adopted in the ontology design and the community-built ontology cooperation model. Among the main MyCODA platform features, we highlight the following:
• Allow user registration and user authentication by email address and password.
• Allow user profile management (visitor, learner, optimisation practitioner, expert, moderator).
• Generate a newsletter about upcoming MACODA events and new releases of the MACODA ontology.
• Allow users to subscribe to the newsletter by providing their email address, and to notify the MyCODA platform moderator about upcoming events and news in the MACODA area, by means of an HTML form.
• Allow users to search for and retrieve knowledge from the ontology by means of predefined queries, queries built with assistance, or fully customised queries.
• Allow users to visualise and visually explore the MACODA ontology.
• Allow users to participate in discussions (forum) on MACODA ontology design updates/evolution and other MACODA topics.
• Allow users to propose MACODA ontology updates by filling in an HTML form that is sent to the MyCODA platform moderator.
• Allow users to propose topics for discussion, documents (journal and conference articles, tutorials, etc.), software frameworks and other MACODA-related materials to be indexed/made available in the MyCODA platform, by means of an HTML form that is sent to the platform moderator.
• Allow the MyCODA moderator to access users' proposals, validate them and perform changes in the MACODA ontology and MyCODA platform contents.
• Provide MyCODA platform visit statistics by content and visitor origin.
We foresee six different types of users, who have different perspectives on using the MyCODA platform:
• The Visitor is an unregistered user who accesses the platform and intends to explore it. This actor has the lowest level of privileges.
• The Learner corresponds to a student or a newcomer to the optimisation field who aims to become familiar with the domain quickly and learn from the platform.
• The Optimisation Practitioner does not intend to contribute to the knowledge domain; he/she just needs to solve optimisation problems with the help of the platform.
• The Expert corresponds to an experienced optimisation researcher who can add knowledge to the platform and propose ontology changes or improvements, which will then be evaluated by domain experts.
Fig. 13.4 MyCODA Web-Based Platform
• The Moderator is a special type of Expert who can perform additional actions, such as validating the Experts' suggestions and updating the ontologies and the platform contents.
• The Administrator is responsible for user management, platform design, evolution and maintenance.
This cooperative, community-built ontology model allows the MACODA research and practitioner community to design and evolve a harmonised MACODA ontology in a well-structured, well-defined, systematic, formalised and standard way. The MyCODA platform has the role of promoting knowledge management processes and practices in the MACODA domain (creating, acquiring, capturing, sharing and (re)using knowledge), by means of customised tools specifically designed for MACODA researchers and practitioners (e.g. querying and visualisation tools). Figure 13.4 shows the initial version of the MyCODA platform (available at http://macoda.club), including:
• A (home) welcome area about the MACODA initiative, launched in September 2019 at the Lorentz Center of Leiden University, The Netherlands, and a short introduction to the MyCODA platform.
• An education section pointing to educational and training materials, and courses on MACODA.
• Events, pointing to recent and future events on MACODA.
• Resources, pointing to scientific and technical materials on MACODA (e.g. scientific papers and reference software in this knowledge domain).
• A MyCODA tools area, which provides a set of integrated tools to access, browse, visualise and query the MACODA ontology.
• A forum section to support knowledge sharing among experts and researchers, as well as suggestions and discussions on ontology corrections, improvements, vocabulary and knowledge harmonisation.
• A frequently asked questions section.
Fig. 13.5 View of MACODA taxonomy in the MyCODA platform
• A Contact/Join Us form.
• A list of MACODA ontology and MyCODA platform contributors, and their affiliations, in the About Us section.
Dynamic and flexible visualisation support for the MACODA ontology is provided by means of the WebVOWL service [14], including a variety of visualisation, interaction, filtering and statistical features. Text tree-based browsing of the MACODA taxonomy is also available in MyCODA. Figure 13.5 shows a snapshot of the taxonomy exploration around the Metaheuristic class branch. Other types of ontology entities and relations can be explored, searched and related by providing their names, or patterns or filters on their names (see Fig. 13.6). MyCODA platform users will be able to run predefined queries, or build their own custom queries with the support of the MyCODA platform, without requiring any acquaintance with OWL ontology design or the syntax of OWL query languages. Figure 13.7 shows the predefined query example 'What are the metaheuristics published after 2015?' run on the MACODA ontology. Other predefined (or user custom-made) queries could be: 'What are the Python libraries implementing NSGA-III?', 'What are the metaheuristics that were tested on the Knapsack problem?', 'What are the metaheuristics that were tested on the Knapsack problem and have Java library implementations?', 'Which order relations have been proposed for many-objective optimisation?', 'Who are the researchers working on both decomposition-based and indicator-based metaheuristics?'.
Fig. 13.6 View of MACODA ontology browsing in the MyCODA platform
Fig. 13.7 View of MACODA ontology querying in the MyCODA platform
More advanced uses of the ontology can be achieved by adding knowledge to it in the form of necessary and sufficient conditions and description logic rules. Expert knowledge inserted into the ontology can thus greatly benefit learners and optimisation practitioners in cases where queries involve complex levels of inference and a considerably high number of concepts and relations. For instance, the query 'What are the metaheuristics that can be used to solve the knapsack problem?' might benefit from knowledge inserted into the ontology by a MACODA domain expert, stating that all metaheuristics that have been applied successfully to a combinatorial problem can be used to solve any other combinatorial problem.
As Knapsack is a combinatorial problem, all metaheuristics applied successfully to any combinatorial problem are candidates for solving the knapsack problem. The reasoning ability provided by the OWL inference engine is also expected to support MACODA domain experts in their search for research gaps, emerging research topics and research communities.
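The expert rule described above can be sketched as a single forward-chaining step. In the ontology it would be stated as a description logic rule; here, as a hypothetical stand-in, it is a Python function over placeholder facts (the problem types and successful applications are invented for illustration). The function returns the metaheuristics explicitly recorded for a problem plus, if that problem is combinatorial, every metaheuristic applied successfully to any combinatorial problem.

```python
# Placeholder facts: the type of each problem, and which metaheuristics
# have been applied successfully to which problems.
PROBLEM_TYPE = {
    "Knapsack": "CombinatorialProblem",
    "TravellingSalesman": "CombinatorialProblem",
    "JobShop": "CombinatorialProblem",
    "Rosenbrock": "ContinuousProblem",
}
APPLIED_SUCCESSFULLY = {
    ("AlgorithmA", "TravellingSalesman"),
    ("AlgorithmB", "JobShop"),
    ("AlgorithmC", "Rosenbrock"),
}


def can_be_used_for(problem):
    """Explicit facts plus the expert rule for combinatorial problems."""
    # Metaheuristics recorded directly for this problem.
    candidates = {m for m, p in APPLIED_SUCCESSFULLY if p == problem}
    # Expert rule: success on any combinatorial problem makes a metaheuristic
    # a candidate for every combinatorial problem.
    if PROBLEM_TYPE.get(problem) == "CombinatorialProblem":
        candidates |= {m for m, p in APPLIED_SUCCESSFULLY
                       if PROBLEM_TYPE.get(p) == "CombinatorialProblem"}
    return sorted(candidates)


print(can_be_used_for("Knapsack"))    # ['AlgorithmA', 'AlgorithmB']
print(can_be_used_for("Rosenbrock"))  # ['AlgorithmC']
```

Although no metaheuristic has been recorded for Knapsack itself, the rule returns the two algorithms applied to other combinatorial problems, while the continuous-problem result is unaffected.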
13.3.2 Ontology Design Best Practices
For a community-scale effort of researchers and practitioners to represent, manage and access the MACODA knowledge domain in a systematised and standard way, a set of conventions and best practices (e.g. for naming, design, commenting and annotation) must be adopted. These help to create a comprehensible, consistent, easy-to-maintain, easy-to-update and easy-to-query ontology, and to avoid some common modelling mistakes [15, 16]. The conventions and best practices described below are intended to provide the basic guidelines for the construction and evolution of the MACODA ontology. We group them into three categories: general best practices, naming conventions and versioning conventions.
The following are some general best practices adopted for the MACODA ontology design:
• The ontology must be documented in sufficient quality and detail.
• The structure and vocabularies of existing ontologies should be reused as much as possible, to promote the Semantic Web vision of a harmonised and integrated Web-scale knowledge base, improving the overall reasoning and querying potential.
• External ontologies should be mapped to newly created ontologies to increase the likelihood of sharing and interoperability, i.e. new ontologies should reuse and link their entities (classes, properties, etc.) to the corresponding entities of existing ontologies, stating any synonym and equivalence relations.
• Each class and property should have an IRI as its identifier.
• All classes and properties should have a definition.
• Predicates must be clearly and precisely defined.
• The relationships in the ontology should be coherent.
• Disjoint classes should be used to separate classes from each other, where the logic makes sense and dictates.
• Property restrictions should be assigned sparingly and judiciously.
• Annotation properties should be used to promote the usefulness and human readability of the ontology.
• Information on how to contact the authors and how to contribute to the ontology should be available.
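Two of the general best practices above, that each class and property has an IRI and a definition, lend themselves to an automated check before an update proposal is submitted. The sketch below is not a MyCODA tool: it assumes a hypothetical dictionary of entity annotations (with placeholder IRIs under example.org and placeholder definitions) and reports the entities that lack an absolute IRI or a non-empty definition.

```python
from urllib.parse import urlparse

# Hypothetical annotation records keyed by entity name (placeholders, not MACODA content).
ENTITIES = {
    "Metaheuristic": {"iri": "http://example.org/macoda#Metaheuristic",
                      "definition": "Placeholder definition."},
    "hasAuthor": {"iri": "http://example.org/macoda#hasAuthor", "definition": "   "},
    "PreferenceModel": {"definition": "Placeholder definition."},
}


def best_practice_issues(entities):
    """List (entity, problem) pairs for entities lacking an absolute IRI or a definition."""
    issues = []
    for name, annotations in sorted(entities.items()):
        # An absolute IRI carries a scheme such as http, https or urn.
        if not urlparse(annotations.get("iri", "")).scheme:
            issues.append((name, "missing absolute IRI"))
        # Whitespace-only text does not count as a definition.
        if not annotations.get("definition", "").strip():
            issues.append((name, "missing definition"))
    return issues


for entity, problem in best_practice_issues(ENTITIES):
    print(f"{entity}: {problem}")
# PreferenceModel: missing absolute IRI
# hasAuthor: missing definition
```

Whether a definition is actually good, or a relationship coherent, still requires review by a moderator; a script like this only catches omissions.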
The following are some naming conventions adopted for the MACODA ontology design:
• Class names should start with a capital letter and should not contain spaces.
• When a class name contains more than one word, the words should be concatenated and the first letter of each word capitalised, e.g. 'PreferenceModel'.
• Classes should be named with singular nouns.
• Reserved words such as 'class', 'property' and so on should not be added to class names.
• Abbreviations in class names should be avoided. Exceptions are algorithms that are very commonly referred to by their abbreviation, for instance NSGA-II or SMS-EMOA.
• Property names should start with a lower-case letter, have no spaces and have the remaining words capitalised, e.g. 'hasAuthor'.
• A 'has' or 'is' prefix should be added to property names.
• Properties should be named as verbs.
• The verb sense should be adjusted for inverse properties; for example, Book hasAuthor John would be expressed inversely as John authored Book.
In MyCODA, versioning will be adopted to control the following evolution stages of the ontology:
• Ontology initial version.
• Ontology version resulting from experts' update proposals.
• Ontology version resulting from MyCODA platform moderator validation.
• Ontology version resulting from acceptance by the MACODA research and practitioner community.
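The versioning stages listed above can be read as a simple linear workflow. The sketch below models them as a small Python state machine under the assumption that a version advances one stage at a time and never skips or reverts a stage; the class and stage names are hypothetical, and the actual MyCODA life cycle may differ.

```python
from enum import Enum


class Stage(Enum):
    """Ontology evolution stages, in the order listed in the text."""
    INITIAL = 1
    EXPERT_PROPOSAL = 2
    MODERATOR_VALIDATED = 3
    COMMUNITY_ACCEPTED = 4


class OntologyVersion:
    """Hypothetical record of one ontology version moving through the stages."""

    def __init__(self, label):
        self.label = label
        self.history = [Stage.INITIAL]

    @property
    def stage(self):
        return self.history[-1]

    def advance(self, target):
        """Move to `target` only if it is the immediately following stage."""
        if target.value != self.stage.value + 1:
            raise ValueError(f"cannot go from {self.stage.name} to {target.name}")
        self.history.append(target)


version = OntologyVersion("macoda-ontology-draft")  # hypothetical label
version.advance(Stage.EXPERT_PROPOSAL)
version.advance(Stage.MODERATOR_VALIDATED)
try:
    version.advance(Stage.INITIAL)  # reverting is rejected in this model
except ValueError as err:
    print(err)  # cannot go from MODERATOR_VALIDATED to INITIAL
version.advance(Stage.COMMUNITY_ACCEPTED)
print(version.stage.name)  # COMMUNITY_ACCEPTED
```

Keeping the full history, rather than only the current stage, mirrors what a version control system records and makes it possible to see which validation steps a given version has passed.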
MyCODA relies on ontology storage on the github.com platform, which offers distributed version control and access control. MACODA ontology updates proposed by the research and practitioner community through the MyCODA platform will follow a validation and verification life cycle, according to the progressive acceptance of each proposal.
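Several of the naming conventions listed earlier in this subsection can be checked mechanically, for example when a moderator reviews an update proposal. The sketch below is a hypothetical linter, not part of MyCODA. It checks class names for capitalised concatenated words without spaces, flags apparent abbreviations (runs of capitals) unless whitelisted as NSGA-II and SMS-EMOA are in the text, and flags reserved words such as 'class' and 'property'. It also checks that property names start in lower case without spaces and carry a 'has' or 'is' prefix. Conventions that depend on meaning (singular nouns for classes, verbs for properties, inverse verb sense) still need a human reviewer.

```python
import re

# Abbreviations explicitly allowed by the conventions (examples from the text).
ABBREVIATION_WHITELIST = {"NSGA-II", "SMS-EMOA"}
RESERVED_WORDS = {"class", "property"}

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def class_name_issues(name):
    """Return violations of the class naming conventions (empty list = OK)."""
    if name in ABBREVIATION_WHITELIST:
        return []
    issues = []
    if not UPPER_CAMEL.match(name):
        issues.append("not UpperCamelCase without spaces")
    if re.search(r"[A-Z]{2,}", name):
        issues.append("looks like an abbreviation")
    # Split on capital letters to compare whole words with the reserved list.
    words = [w.lower() for w in re.findall(r"[A-Z][a-z0-9]*", name)]
    if any(w in RESERVED_WORDS for w in words):
        issues.append("contains a reserved word")
    return issues


def property_name_issues(name):
    """Return violations of the property naming conventions (empty list = OK)."""
    issues = []
    if not LOWER_CAMEL.match(name):
        issues.append("not lowerCamelCase without spaces")
    if not re.match(r"^(has|is)[A-Z]", name):
        issues.append("missing 'has'/'is' prefix")
    return issues


print(class_name_issues("PreferenceModel"))   # []
print(class_name_issues("Preference Model"))  # ['not UpperCamelCase without spaces']
print(class_name_issues("AlgorithmClass"))    # ['contains a reserved word']
print(property_name_issues("hasAuthor"))      # []
print(property_name_issues("Author"))
# ['not lowerCamelCase without spaces', "missing 'has'/'is' prefix"]
```

Returning lists of messages instead of a single pass/fail keeps the output usable as review feedback on a proposal form.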
13.4 Conclusions and Future Work
This chapter gave a brief introduction to domain knowledge management using ontologies. It also introduced an initial version of the MACODA ontology, which summarises current knowledge in the field of Many-Criteria Optimisation and Decision Analysis. The content and features of the MyCODA platform were presented, and the role it may play for researchers, practitioners and learners in MACODA scientific knowledge management was highlighted. Ontology design best practices and tools were suggested, and a collaborative ontology development model was proposed.
The MACODA ontology and the MyCODA platform will be maintained and further enriched by the authors. All researchers, experts, practitioners and learners are encouraged to contribute and to help keep MyCODA a knowledge repository of the progress made in this emerging scientific field.
References
1. A. Abecker, L. van Elst, Ontologies for knowledge management, in Handbook on Ontologies, ed. by S. Staab, R. Studer (Springer, 2004), pp. 435–454
2. V. Basto-Fernandes, I. Yevseyeva, A. Deutz, M.T.M. Emmerich, A survey of diversity oriented optimization: problems, indicators, and algorithms, in EVOLVE – A Bridge between Probability, Set Oriented Numerics and Evolutionary Computation (Springer, 2017), pp. 3–23
3. T. Berners-Lee, J. Hendler, O. Lassila, The semantic web. Sci. Am. 284(5), 34–43 (2001)
4. W. Borst, Construction of engineering ontologies for knowledge sharing and reuse. Ph.D. thesis, University of Twente, The Netherlands, 1997
5. P. Busquin et al., Third European report on science & technology indicators: towards a knowledge-based economy. Technical Report (European Commission, Brussels, Belgium, 2003)
6. M. Musen et al., Protégé. https://protege.stanford.edu/. Accessed 13 May 2020
7. M.R. Genesereth, N.J. Nilsson, Logical Foundations of Artificial Intelligence (Morgan Kaufmann Publishers, 1987)
8. T.R. Gruber, A translation approach to portable ontology specifications. Knowl. Acquis. 5(2), 199–220 (1993)
9. Information Resources Management Association, Information Retrieval and Management: Concepts, Methodologies, Tools, and Applications, 1st edn. (IGI Global, Hershey, PA, USA, 2018)
10. I. Jurisica, J. Mylopoulos, E. Yu, Using ontologies for knowledge management: an information systems perspective. Am. Soc. Inf. Sci. 36, 482–496 (1999)
11. G. Kaur, D. Chaudhary, Evolutionary computation ontology: E-learning system, in Reliability, Infocom Technologies and Optimization (ICRITO) (IEEE Press, 2015), pp. 1–6
12. L. Li, I. Yevseyeva, V. Basto-Fernandes, H. Trautmann, N. Jing, M.T.M. Emmerich, Building and using an ontology of preference-based multiobjective evolutionary algorithms, in Evolutionary Multi-criterion Optimization (EMO) (Springer, 2017), pp. 406–421
13. L. Li, I. Yevseyeva, V. Basto-Fernandes, H. Trautmann, N. Jing, M.T.M. Emmerich, An ontology of preference-based multiobjective metaheuristics (2017)
14. S. Lohmann, S. Negru, F. Haag, T. Ertl, Visualizing ontologies with VOWL. Semant. Web 7(4), 399–419 (2016)
15. N.F. Noy, D.L. McGuinness et al., Ontology development 101: a guide to creating your first ontology (2001). https://protege.stanford.edu/publications/ontology_development/ontology101.pdf. Accessed 2 May 2022
16. Open Semantic Framework, Ontology best practices. https://wiki.opensemanticframework.org/index.php/Ontology_Best_Practices#cite_note-odp3-3. Accessed 3 June 2020
17. M. Park, K.-W. Lee, H.-S. Lee, P. Jiayi, J. Yu, Ontology-based construction knowledge retrieval system. KSCE J. Civ. Eng. 17(7), 1654–1663 (2013)
18. A. Rodríguez-Pose, Leveraging Research, Science and Innovation to Strengthen Social and Regional Cohesion. Technical report (European Commission, Brussels, Belgium, 2015)
19. H. Scarbrough, J. Swan, J. Preston, Knowledge Management: A Literature Review. Issues in People Management (Institute of Personnel and Development, London, United Kingdom, 1999)
20. Cambridge Semantics, Introduction to the semantic web. https://www.cambridgesemantics.com/blog/semantic-university/intro-semantic-web/. Accessed 6 May 2020
21. B. Smith, The Blackwell Guide to the Philosophy of Computing and Information (Wiley, 2008), pp. 153–166
22. R. Studer, V.R. Benjamins, D. Fensel, Knowledge engineering: principles and methods. Data Knowl. Eng. 25(1–2), 161–197 (1998)
23. A. Tatnall, Web Technologies: Concepts, Methodologies, Tools, and Applications (Contemporary Research in Information Science and Technology, IGI Global, 2009)
24. T. Tudorache, C. Nyulas, N.F. Noy, M.A. Musen, WebProtégé: a collaborative ontology editor and knowledge acquisition tool for the web. Semant. Web 4(1), 89–99 (2013)
25. W3C, OWL 2 Web Ontology Language document overview (second edition). https://www.w3.org/TR/owl2-overview/. Accessed 11 May 2020
26. W3C, OWL 2 Web Ontology Language profiles (second edition). https://www.w3.org/TR/owl2-profiles/. Accessed 13 May 2020
27. W3C, OWL Web Ontology Language overview. https://www.w3.org/TR/2004/REC-owl-features-20040210/#s2. Accessed 11 May 2020
28. A. Walker, A wiki for business rules in open vocabulary, executable English (2011)
29. A. Yaman, A. Hallawa, M. Coler, G. Iacca, Presenting the ECO: evolutionary computation ontology, in Applications of Evolutionary Computation (Springer, Germany, 2017), pp. 603–619
Glossary
(1-k) dominance Dominance relation based on counting the results of componentwise comparisons.
A posteriori optimisation/decision making The approach of first computing a Pareto front or a set of potentially interesting solutions, based on a general multiobjective problem formulation with several objectives, and then presenting it to the decision maker, who picks the best solution from this set.
A priori optimisation/decision making The approach of first defining a utility function that maps a multiobjective optimisation problem to a single-objective optimisation problem, and then solving this problem mathematically to output a single optimal solution.
Alpha dominance Dominance relation that considers trade-offs between objectives in order to prevent dominance-resistant points.
Asynchronous evolutionary algorithms Evolutionary algorithms that work on parallel architectures supporting asynchronous solution evaluation, selection and search operators.
Bayesian optimisation Optimisation algorithm based on statistical models (Gaussian processes, Kriging) built from previous evaluations.
Benchmarking suite The set of problems a benchmark consists of. This includes the definition of different scales for input and output, as well as instances.
Brood interleaving strategy A form of interleaving strategy based on the concept of brood selection.
Chance-constraint problem A formulation, used in robust optimisation, that is based on the probability of meeting a constraint.
Closed-loop optimisation Optimisation by directly connecting (e.g. physical, chemical or biological) experimental rigs to computer-based search.
Cone orders Generalization of the Pareto order in which the dominated region of a point in objective space is a polyhedral cone.
Constraint (soft, hard) A function that, without loss of generality, has to be zero or negative for a solution to be feasible. A soft constraint, in practice, can be
© Springer Nature Switzerland AG 2023 D. Brockhoff et al. (eds.), Many-Criteria Optimization and Decision Analysis, Natural Computing Series, https://doi.org/10.1007/978-3-031-25263-1
loosened (such as cost) while a hard constraint must be met (such as manufacturing tolerances). Control of Dominance Area of Solutions (CDAS) Alternative order relation to the Pareto dominance that uses the dominated area. Correlated objectives A positive correlation means that improving one objective also improves at least one other objective. Criterion, pl. criteria Synonym to objective or objective function. Decision Maker (DM) Typically a human who makes decisions about which solution will get implemented. Can specify preferences and might be also involved in the problem formulation itself. Decision space Synonym for search space. Decomposition based MOEA A MOEA that decomposes the problem of approximating a Pareto front into several subproblems of targeting subregions of the Pareto front, and solves these subproblems simultaneously. Ephemeral resource constraint In closed-loop optimisation, a pseudoconstraint that exists only at runtime that prevents a solution from being evaluated because a certain resource needed for the evaluation is depleted (temporarily). Epsilon dominance Dominance relation with parameter epsilon. Used in the componentwise comparison to increase/decrease the coordinates of one solution. Evolutionary Algorithm (EA) An evolutionary algorithm mimicks the concepts of biological evolution to optimise by means of (stochastic) mutation and recombination operators and selection mechanisms, filtering out the lesser-quality solutions over time. Evolutionary Computation (EC) Research field about optimisation algorithms that mimick or are inspired by biological evolution, covering areas such as evolutionary algorithms, ant colony optimisation and other randomised search heuristics. Evolutionary Multiobjective Optimisation (EMO) Research field about multiobjective optimisation problems, theory, applications and algorithms based on and/or inspired by concepts like biological evolution, natural selection etc. 
Expected improvement Criterion for selecting the next point to evaluate in Bayesian optimisation: the expected value of the improvement upon the current best solution with respect to a predictive probability distribution.
Fast-first strategy A technique for handling heterogeneous objectives that differ in latency. It bases fitness assignment only on the fast objective(s) of a problem for most of the optimisation run and uses the slow objective(s) only at the end of the run.
Favour relation Order relation proposed for many-objective optimisation. This intransitive relation counts in how many objectives one solution is better than another solution.
Feasible solution A (search) point in the search space that fulfils all constraints; the opposite of an infeasible solution.
Goal (vector) Vector of target (f-) values in goal programming approaches.
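The expected improvement criterion has a well-known closed form when the predictive distribution at a candidate point is Gaussian, as it is for a Gaussian-process surrogate. The sketch below assumes minimisation, a predictive mean `mu`, a standard deviation `sigma`, and a current best value `f_best`. These names are illustrative, and only the standard library is used.

```python
import math


def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """Closed-form expected improvement for minimisation under N(mu, sigma^2).

    EI = (f_best - mu) * Phi(z) + sigma * phi(z), with z = (f_best - mu) / sigma,
    where Phi and phi are the standard normal CDF and PDF.
    """
    improvement = f_best - mu
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf


# A point predicted to be better (lower mu) scores higher than one predicted
# to match the incumbent, at equal uncertainty.
print(expected_improvement(-1.0, 1.0, 0.0) > expected_improvement(0.0, 1.0, 0.0))
```

The `sigma * phi(z)` term rewards uncertainty. As a result, points that are not predicted to improve on average can still be selected, which balances exploration and exploitation.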
Heatmap Matrix where each cell is colored according to the corresponding value. The heatmap is often augmented with dendrograms that represent the distances between rows and columns.
Heterogeneous objectives and constraints Objective function components of a problem that differ in complexity, latency, uncertainty, domain type, co-domain type, black box vs analytically known, or in other ways that might affect search.
Hyperparameter A parameter of an algorithm that can be used to influence its behaviour. The population size, for example, is a common hyperparameter of evolutionary algorithms.
Hyperparameter tuning The process of finding optimal hyperparameters for an algorithm.
Hypervolume indicator A performance indicator for an approximation set of the Pareto front. It is the Lebesgue measure of the union of the regions in objective space that are dominated by the elements of the approximation set and bounded from above by a reference point. It is a Pareto-compliant quality indicator.
Ill-conditioning A problem is badly conditioned if, starting from the same point (in the decision space), moving in one direction gives a small difference in the objective value, while moving in another direction gives a very large difference. For a convex quadratic problem, the condition number is the ratio of the largest to the smallest eigenvalue of the matrix defining the quadratic form (its Hessian).
Indicator-based MOEA A MOEA that has been designed to progress by improving a quality indicator for a Pareto front approximation set, such as the R2 indicator or the hypervolume indicator.
Interactive decision maps An interactive decision-making visualization tool.
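The hypervolume definition can be made concrete for two objectives, where the dominated region is a union of rectangles. The sketch below assumes minimisation and a reference point that bounds the region from above. It sweeps the points in ascending order of the first objective and adds one horizontal slab per new non-dominated level. The function name is illustrative, and the approach applies to the 2-D case only.

```python
def hypervolume_2d(points, ref):
    """Area dominated by a set of 2-objective points (minimisation),
    bounded from above by the reference point ref = (r1, r2)."""
    r1, r2 = ref
    # Only points that strictly dominate the reference point contribute.
    pts = sorted((f1, f2) for f1, f2 in points if f1 < r1 and f2 < r2)
    area = 0.0
    level = r2  # lowest f2 value seen so far in the sweep
    for f1, f2 in pts:
        if f2 < level:  # non-dominated among the points processed so far
            # Slab between f2 and the previous level, extending to r1.
            area += (r1 - f1) * (level - f2)
            level = f2
    return area


print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], (4, 4)))  # 6.0
```

Dominated points and points outside the reference box add nothing. This reflects the Pareto compliance mentioned in the entry: adding a dominated point never changes the value.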
Interactive optimisation/decision making When decision making and the optimisation process are intertwined, with feedback from one to the other, we speak of interactive optimisation or interactive decision making.
Interleaving strategy A technique for handling heterogeneous objectives differing in latency that performs evaluations on slow and fast objectives in parallel, but more on the fast objective, and interleaves the information from both to assign fitness.
Interruption (in closed-loop optimisation) In closed-loop optimisation, an interruption to the usual progress of the evolutionary algorithm at runtime due to an active ephemeral resource constraint. Such interruptions prevent solutions from being evaluated and hence introduce a bias into the search that needs to be accounted for.
Knee point/region Knee points are 'bulges' in the Pareto front, that is, a subset of Pareto optimal solutions for which an improvement in one objective results in a severe degradation in at least one other objective.
Latency The time taken by a process, in particular a function evaluation; see heterogeneous objectives.
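Several heuristics exist for locating the knee points described above. The sketch below uses one common choice for two objectives: the point farthest from the line through the two extreme points, after normalising each objective to [0, 1]. It assumes minimisation and a mutually non-dominated input front that spreads in both objectives. This is an illustrative heuristic, not a definitive knee detector.

```python
import math


def knee_point_2d(front):
    """Return the point of a 2-objective non-dominated front (minimisation)
    farthest from the line through the normalised extreme points."""
    f1 = [p[0] for p in front]
    f2 = [p[1] for p in front]
    lo1, hi1, lo2, hi2 = min(f1), max(f1), min(f2), max(f2)
    if hi1 == lo1 or hi2 == lo2:
        raise ValueError("front must spread in both objectives")

    def distance(p):
        u = (p[0] - lo1) / (hi1 - lo1)
        v = (p[1] - lo2) / (hi2 - lo2)
        # For a non-dominated front the normalised extremes are (0, 1) and
        # (1, 0), so the line is u + v = 1. Positive values lie on the side
        # of the ideal point, i.e. where the front bulges.
        return (1.0 - u - v) / math.sqrt(2.0)

    return max(front, key=distance)


print(knee_point_2d([(0, 10), (1, 2), (5, 1), (10, 0)]))  # (1, 2)
```

Normalisation matters here. Without it, the objective with the larger range would dominate the distance, and the selected knee would depend on the units of measurement.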
Linear order Also called total order; an order that is antisymmetric, transitive, and connected (total). An example is the less-or-equal relation on the reals.
Many-Criteria Optimisation and Decision Analysis (MACODA) Term used for the combination of the optimisation and decision analysis research fields, with a large number of objective functions and/or criteria. It served as the title of the workshop held in September 2019 at the Lorentz Center in Leiden, the Netherlands, which gave rise to this book.
Many-Criteria/Many-objective problem (MaOP) An optimisation problem with four or more objective functions.
Minimal set Given a strict partial order or a strict preorder, the minimal set is the set of solutions that are not dominated by any other solution.
Mixed-integer problem An optimisation problem with both continuous and discrete variables.
Multiobjective Evolutionary Optimisation Algorithm (MOEA) An algorithm that uses evolutionary mechanisms (mutation, recombination, selection, etc.) to find an (approximate) solution to a multiobjective optimisation problem. Sometimes it is interpreted in a broader sense, including other population-based algorithms that apply some variation/selection principle, such as particle swarm optimisation.
Multiobjective optimisation Optimisation scenario in which two or more (conflicting) objectives must be optimised.
Nadir/ideal points The points whose coordinates are the objective-wise worst (nadir) or best (ideal) values over the Pareto front.
Noisy objective For noisy objective functions, evaluating the same solution does not always return the exact same value.
Non-dominated sorting A procedure that identifies the non-dominated solutions within a given solution set as well as all subsequent non-domination fronts. For example, the second non-domination front contains all solutions that are non-dominated once the solutions of the first front have been removed from the set.
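Non-dominated sorting, as defined above, can be sketched in Python following the widely used "fast non-dominated sort" scheme popularised by NSGA-II. For each point, the scheme records which points it dominates and how many points dominate it, then peels off fronts one at a time. Minimisation is assumed, and the function names are illustrative.

```python
def dominates(a, b):
    """Pareto dominance for minimisation: a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Return the non-domination fronts as lists of indices into points."""
    n = len(points)
    dominated_set = [[] for _ in range(n)]  # indices that point p dominates
    domination_count = [0] * n              # number of points dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_set[p].append(q)
            elif dominates(points[q], points[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)  # first front: the minimal set
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            # Removing front i releases the points that p dominated.
            for q in dominated_set[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


print(non_dominated_sort([(1, 4), (2, 2), (4, 1), (3, 3), (4, 4), (5, 5)]))
# [[0, 1, 2], [3], [4], [5]]
```

The first returned front is the minimal set of the input under Pareto dominance. The nested comparison loop makes the procedure quadratic in the number of points.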
Non-injective objective functions Objective functions are non-injective if different solutions map to the same point in the objective space.
Number of objectives Number of objective functions of an optimisation problem, typically denoted by m.
Objective Synonym for criterion and objective function.
Objective function The function to be optimised in a multi- or many-criteria problem, often used synonymously with criterion and objective.
Objective space The image of the (feasible) search space under the objective functions. Sometimes also called evaluation space.
Order extension An order relation extends another order relation when it preserves all ordered pairs of the original order relation and possibly adds some ordered pairs.
Parallel coordinates See parallel coordinates plot.
Pareto dominance A partial preorder on the search space, indicating which search points are better than others. Its minimal elements comprise the Pareto set.
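The order extension and Pareto dominance entries can be connected with a small example. If a Pareto-dominates b, then a is strictly smaller than b at the first coordinate where they differ. The lexicographic order, which Python uses to compare tuples, is therefore a total order that extends Pareto dominance. The sketch below assumes minimisation and tuple-valued objective vectors. The function names are illustrative.

```python
def pareto_dominates(a, b):
    """a Pareto-dominates b (minimisation): no worse in every objective,
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def lexicographic_order_extends(points):
    """Check that every Pareto-dominance pair among points is preserved by
    Python's lexicographic tuple comparison (a < b)."""
    return all(a < b for a in points for b in points if pareto_dominates(a, b))


pts = [(2, 2), (1, 3), (3, 1), (1, 2)]
print(lexicographic_order_extends(pts))  # True
# sorted() therefore lists the points in an order consistent with dominance.
# It also orders pairs that Pareto dominance leaves incomparable,
# such as (1, 3) and (2, 2).
print(sorted(pts))  # [(1, 2), (1, 3), (2, 2), (3, 1)]
```

The extension keeps every original ordered pair and adds new ones. This is exactly the "possibly adds some ordered pairs" part of the definition.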
Pareto front Image of the Pareto set (in objective space).
Pareto set Set of all solutions (in search space) that are not dominated by others.
Pareto-based MOEA A MOEA whose selection strategy sorts the population primarily by Pareto dominance and secondarily by diversity (or coverage).
Partial order An order relation that is reflexive, transitive and antisymmetric. In general, a partial order can have comparable and incomparable pairs; indifference (x ≤ y and y ≤ x) coincides with equality in a partially ordered set.
Performance indicator See quality indicator.
Preference (among objectives) Information from the decision maker on whether some objectives are considered more important than others.
Preference articulation Usually part of interactive decision making: in the light of new information revealed by the ongoing optimisation process, the DM amends the goal/objective vector.
Preference information Information that a decision maker provides in various ways to indicate the types of trade-offs or solutions that are of interest or relevant to her/him.
Progressive method/decision making Decision making and computation/search alternate several times. The decision maker gradually refines preferences in an interactive way.
Quality indicator Cf. quality measures chapter. A numeric representation of the quality of the solution(s) discovered for an optimisation problem. In multi-objective optimisation, indicators often express the distance to and coverage of the Pareto front.
Reference point Either, in the context of the hypervolume indicator, the upper bound of the region considered for the Lebesgue measure of the dominated space, or, in the context of preference articulation, a vector of desirable values for each objective in objective space that represents a decision maker’s preference.
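A reference point used for preference articulation, as in the last entry, is commonly turned into a selection rule via an achievement scalarising function. This technique is not described in the entry itself. The sketch assumes minimisation, the augmented weighted-Chebyshev form max_i w_i (f_i - z_i) + rho * sum_i w_i (f_i - z_i), and a small rho. The function names, weights and rho value are illustrative assumptions.

```python
def achievement_scalarising(f, z, weights, rho=1e-6):
    """Augmented achievement scalarising function (minimisation).

    f: objective vector, z: reference point, weights: positive scaling factors.
    The small rho term breaks ties in favour of Pareto-optimal vectors.
    """
    terms = [w * (fi - zi) for fi, zi, w in zip(f, z, weights)]
    return max(terms) + rho * sum(terms)


def closest_to_reference(candidates, z, weights, rho=1e-6):
    """Pick the candidate objective vector that best meets the reference point."""
    return min(candidates, key=lambda f: achievement_scalarising(f, z, weights, rho))


front = [(1, 5), (3, 3), (5, 1)]
print(closest_to_reference(front, (3, 2), (1, 1)))      # (3, 3)
print(closest_to_reference(front, (4.5, 0.5), (1, 1)))  # (5, 1)
```

Moving the reference point moves the selected solution along the front. This is how a decision maker can steer an interactive or progressive method.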
Safe optimisation Safe optimisation refers to the use of strategies to mitigate losses (of life, resources) when optimising configurable entities that can be damaged by some combinations of parameter settings (or some allele combinations).
Scatter plot matrix Matrix where each entry is a scatter plot of the projection of the vectors in a set onto a pair of coordinates.
Search space The set of solutions in which the Pareto set is sought. Other name for decision space.
Separable objective An objective is separable if each variable can be optimised individually (if the separability is partial, this holds for only some of the variables).
Speculative interleaving strategy A form of interleaving strategy based on looking ahead at how a solution that has not yet been fully evaluated might perform.
Spiderweb plot Plot that uses one ray per coordinate of a high-dimensional space. All rays share the same origin, and each vector is drawn as a polyline connecting its coordinate values on the rays.
Surrogate model Model that replaces the original model, typically a cheaper approximation of an expensive objective or constraint function.
Surrogate-assisted evolutionary algorithms Methods using fitness approximation techniques in place of some objective function evaluations, typically based on neural networks, Gaussian processes, radial basis functions, or similar function regression methods.
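The separable objective entry can be demonstrated with a small example. For a fully separable function f(x) = g_1(x_1) + ... + g_n(x_n), optimising one variable at a time, in a single pass and with the others held at arbitrary values, already yields the joint optimum. The sketch assumes a hypothetical separable test function and a coarse grid of candidate values. Both are illustrative choices, not taken from the text.

```python
# Hypothetical separable objective: a sum of one-dimensional terms g_i(x_i).
CENTRES = (1.0, -2.0, 0.5)


def separable_f(x):
    return sum((xi - c) ** 2 for xi, c in zip(x, CENTRES))


GRID = [i * 0.5 for i in range(-8, 9)]  # candidate values -4.0 .. 4.0


def coordinate_wise_minimum(f, dim, grid):
    """Optimise each variable individually, in a single pass over the variables."""
    x = [0.0] * dim
    for i in range(dim):
        # For a separable f, the best value of x[i] does not depend on the
        # values the other variables currently hold.
        x[i] = min(grid, key=lambda v: f(x[:i] + [v] + x[i + 1:]))
    return x


print(coordinate_wise_minimum(separable_f, 3, GRID))  # [1.0, -2.0, 0.5]
```

The coordinate-wise search needs 3 × 17 evaluations here, whereas a joint grid search needs 17³. This gap is why separability makes a problem much easier, and why test suites often rotate separable functions to remove it.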
Total order A binary relation that is antisymmetric, transitive and total (where total means that for any two elements x, y, either (x, y) or (y, x) belongs to the relation). In other words, a total order is a partial order in which any two elements are comparable.
Parallel coordinates plot Plot (visualisation) that uses one vertical axis per coordinate and represents each vector by a polyline connecting its coordinate values on these axes. It allows interactive selection of solutions by narrowing down coordinate ranges (brushing). Also called parallel coordinates diagram.
Trust region method An optimisation technique that approximates the cost function within a region around the current iterate, usually with a quadratic model, and restricts each step to that region.
Uncertain objective function An objective function giving a stochastic or uncertain output for a given input (decision vector).
Utility function In the context of MCDM, a function that summarises all objective function values in a single scalar.
Volume dominance Hypervolume-indicator-based order relation that considers the overlap of the dominated regions of two points and the non-overlapping parts separately.
Waiting strategy (in the context of heterogeneous objectives) A technique for handling heterogeneous objectives differing in latency that simply waits for the slower objective(s) to be evaluated while the faster objective(s) have already been evaluated.
Weak Pareto dominance A solution weakly dominates another solution if it is not worse in any objective; unlike Pareto dominance, it need not be strictly better in any objective.
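The weak Pareto dominance and utility function entries fit together: a weighted sum with non-negative weights is one simple way to summarise an objective vector in a single scalar, and it never contradicts weak dominance. The sketch assumes minimisation, so a smaller weighted sum is better, and uses illustrative weights. The weighted sum is only one of many possible utility functions.

```python
def weakly_dominates(a, b):
    """a weakly dominates b (minimisation): a is no worse than b in every objective."""
    return all(x <= y for x, y in zip(a, b))


def weighted_sum(f, weights):
    """A simple scalarisation of an objective vector; smaller is better here."""
    return sum(w * fi for w, fi in zip(weights, f))


pts = [(1, 2), (1, 3), (2, 2), (0, 5)]
w = (0.3, 0.7)  # illustrative non-negative weights
# If a weakly dominates b and all weights are >= 0, then sum w_i a_i <= sum w_i b_i.
print(all(weighted_sum(a, w) <= weighted_sum(b, w)
          for a in pts for b in pts if weakly_dominates(a, b)))  # True
```

Unlike Pareto dominance, weak dominance is reflexive: every vector weakly dominates itself, which makes it a preorder.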