Heuristics for Optimization and Learning 3030589293, 9783030589295

English Pages 442 [444] Year 2020

Table of contents:
Preface
Contents
1 Process Plan Generation for Reconfigurable Manufacturing Systems: Exact Versus Evolutionary-Based Multi-objective Approaches
1.1 Introduction
1.2 Literature Review
1.3 Problem Description and Mathematical Formulation
1.3.1 Problem Description
1.3.2 Mathematical Formulation
1.4 Proposed Approaches
1.4.1 Iterative Multi-Objective Integer Linear Program (I-MOILP)
1.4.2 Adapted Archived Multi-Objective Simulated-Annealing (AMOSA)
1.4.3 Adapted Non-dominated Sorting Genetic Algorithm II (NSGA-II)
1.5 Experimental Results and Analyses
1.5.1 Experimental Scheme 1
1.5.2 Experimental Scheme 2
1.6 Conclusion
References
2 On VNS-GRASP and Iterated Greedy Metaheuristics for Solving Hybrid Flow Shop Scheduling Problem with Uniform Parallel Machines and Sequence Independent Setup Time
2.1 Introduction
2.2 Description of the Hybrid Flow Shop Problem
2.3 Resolution
2.3.1 Initialization Heuristics
2.3.2 Metaheuristics
2.4 Numerical Simulation
2.4.1 Simulation Instances
2.4.2 Experimental Results
2.5 Conclusion
References
3 A Variable Block Insertion Heuristic for the Energy-Efficient Permutation Flowshop Scheduling with Makespan Criterion
3.1 Introduction
3.2 Problem Formulation
3.3 Energy-Efficient VBIH Algorithm
3.3.1 Initial Population
3.3.2 Energy-Efficient Block Insertion Procedure
3.3.3 Energy-Efficient Insertion Local Search
3.3.4 Energy-Efficient Uniform Crossover and Mutation
3.3.5 Archive Set
3.4 Computational Results
3.5 Conclusions
References
4 Solving 0-1 Bi-Objective Multi-dimensional Knapsack Problems Using Binary Genetic Algorithm
4.1 Introduction
4.2 Literature Review
4.3 Problem Formulation
4.4 Bi-Objective BGA
4.5 Computational Results
4.6 Conclusion
References
5 An Asynchronous Parallel Evolutionary Algorithm for Solving Large Instances of the Multi-objective QAP
5.1 Introduction
5.2 Related Works
5.3 The APM-MOEA Model
5.3.1 Global Search View of the Organizer
5.3.2 Asynchronous Communications
5.3.3 Control Islands
5.3.4 Local Search
5.4 Experimental Results
5.4.1 Performance Metrics
5.4.2 The GISMOO Algorithm
5.4.3 MQAP Instances
5.4.4 Experimental Conditions
5.4.5 Resolution of Small MQAP Instances
5.4.6 Resolution of Large MQAP Instances
5.5 Conclusion
References
6 Learning from Prior Designs for Facility Layout Optimization
6.1 Introduction
6.2 Related Work
6.3 Facility Layout Model
6.4 Similarity Model
6.4.1 Probabilistic Layout Model
6.4.2 Estimation
6.5 Similarity in Layout Optimization
6.6 Experiments
6.7 Discussion
References
7 Single-Objective Real-Parameter Optimization: Enhanced LSHADE-SPACMA Algorithm
7.1 Enhanced LSHADE with Semi-parameter Adaptation Hybrid with CMA-ES (ELSHADE-SPACMA)
7.1.1 LSHADE Algorithm
7.1.2 CMA-ES Algorithm
7.1.3 Semi-parameter Adaptation of Scaling Factor (F) and Crossover Rate (Cr)
7.1.4 LSHADE-SPACMA Algorithm
7.1.5 AGDE Mutation Strategy
7.1.6 ELSHADE-SPACMA Hybridization Framework
7.2 Experimental Study
7.2.1 Numerical Benchmarks
7.2.2 Parameter Settings and Involved Algorithms
7.3 Experimental Results and Discussions
7.3.1 Results of ELSHADE-SPACMA Algorithm
7.4 Conclusion
References
8 Operations Research at Bulk Terminal: A Parallel Column Generation Approach
8.1 Introduction
8.2 Overview of Bulk Optimization
8.3 Integrated Production Planning and Scheduling
8.4 Literature Review
8.5 Integrated Production Planning and Scheduling
8.6 Column Generation Method
8.6.1 Parallel Approach
8.6.2 Primal Heuristic
8.7 Computational Experiments
8.8 Final Remarks
References
9 Heuristic Solutions for the (α, β)-k Feature Set Problem
9.1 Introduction
9.2 Problem Statement
9.3 Building an Instance of (α, β)-k FSP
9.4 The Proposed Solution Method
9.4.1 Obtaining the Minimum Number of Features
9.4.2 Obtaining the Maximum Value of β
9.5 Computational Results
9.6 Conclusion
References
10 Generic Support for Precomputation-Based Global Routing Constraints in Local Search Optimization
10.1 Introduction
10.2 Background
10.2.1 Constraint-Based Local Search, the OscaR Way
10.2.2 Sequence Variables
10.2.3 Cross-Product of Neighbourhoods
10.2.4 Conventions for Modeling VRP
10.3 A Generic Routing Global Constraint
10.3.1 Template for Precomputation-Based Global Constraints
10.3.2 Route Length with Asymmetric Distance Matrix
10.3.3 Keeping Track of Segments and Precomputations
10.3.4 Segment-Based API
10.3.5 Logarithmic Reduction of Quadratic Precomputations
10.4 A Global Constraint for Drone Autonomy
10.5 Global Time Window Constraint
10.6 Related Work
10.7 Conclusion and Perspectives
References
11 Dynamic Simulated Annealing with Adaptive Neighborhood Using Hidden Markov Model
11.1 Introduction
11.2 Literature Review
11.3 The Proposed Approach
11.3.1 Viterbi Algorithm
11.3.2 Baum Welch Algorithm
11.4 Experiments
11.4.1 Experimental Setup
11.4.2 Numerical Results
11.4.3 Comparison of Convergence Performance
11.4.4 Statistical Analysis
11.5 Conclusion and Future Research
References
12 Hybridization of the Differential Evolution Algorithm for Continuous Multi-objective Optimization
12.1 Introduction
12.2 Literature: DE Algorithms for MO Problems
12.2.1 The DE Algorithm: Basic Notions
12.2.2 NSDE
12.2.3 DEMO
12.2.4 ADE-MOIA
12.3 Hybridization Between DE and IWO: IWODEMO
12.3.1 The Weed Analogy
12.3.2 IWODEMO
12.4 Numerical Experiments
12.4.1 Instances: DTLZ and ZDT
12.4.2 Metrics
12.4.3 Experimental Conditions and Parameters
12.5 Results and Analysis
12.5.1 Results for the DTLZ Instances
12.5.2 Results for the ZDT Instances
12.5.3 Analysis
12.6 Conclusion
References
13 A Steganographic Embedding Scheme Using Improved-PSO Approach
13.1 Introduction
13.2 Particle Swarm Optimization
13.3 Steganographic Scheme Based Improved-PSO
13.3.1 General Description
13.3.2 Improved-PSO Embedding Scheme
13.4 Experimental Setup
13.4.1 Experimental Case 1
13.4.2 Experimental Case 2
13.4.3 Experimental Case 3
13.5 Conclusion
References
14 Algorithms Towards the Automated Customer Inquiry Classification
14.1 Introduction
14.2 Related Works
14.3 Methods
14.3.1 Preprocessing Phase
14.3.2 Training/Testing Phase
14.4 Data Description
14.5 Empirical Analysis
14.6 Optimization with Neural Networks
14.7 Conclusion
14.8 Future Work
References
15 An Heuristic Scheme for a Reaction Advection Diffusion Equation
15.1 Introduction
15.2 New Optimized Domain Decomposition Methods
15.2.1 Optimized Domain Decomposition Methods
15.2.2 The OO2 Method
15.2.3 New Domain Decomposition Method with Two Iteration (AlgDF)
15.3 Evolutionary Algorithm for PDE
15.4 Numerical Results
15.5 Conclusion
References
16 Stock Market Speculation System Development Based on Technico Temporal Indicators and Data Mining Tools
16.1 Introduction
16.2 Related Works
16.3 Proposed Search Algorithm
16.4 Experimental Results
16.5 Conclusion and Perspectives
References
17 A New Hidden Markov Model Approach for Pheromone Level Exponent Adaptation in Ant Colony System
17.1 Introduction
17.2 Related Work
17.3 Proposed Method
17.3.1 Hidden Markov Model
17.4 Experimental Results and Comparison
17.4.1 Comparison on the Solution Accuracy
17.4.2 Comparison on the Convergence Speed
17.4.3 Statistical Test
17.5 Conclusion
References
18 A New Cut-Based Genetic Algorithm for Graph Partitioning Applied to Cell Formation
18.1 Introduction
18.2 Formulation
18.2.1 Input data
18.2.2 Flow Graph Construction
18.2.3 Decision Variables
18.2.4 Intermediate Processing
18.2.5 Constraints
18.2.6 Objective Function
18.3 Theoretic Preliminaries
18.4 The Cut-Based GA
18.4.1 Principles of the Genetic Algorithm
18.4.2 GA Implementation
18.5 Computational Results
18.6 Conclusions
References
19 Memetic Algorithm and Evolutionary Operators for Multi-Objective Matrix Tri-Factorization Problem
19.1 Introduction
19.2 Multi-Objective Non-Negative Matrix Tri-Factorization Problem
19.3 Classical Evolutionary Operators
19.4 Naive Approach
19.5 Memetic Algorithm
19.5.1 Evolutionary Algorithm
19.6 Experiments
19.7 Results
19.8 Conclusion and Future Work
References
20 Quaternion Simulated Annealing
20.1 Introduction
20.2 Background
20.2.1 Neighborhood Structure
20.2.2 Quaternions
20.3 Simulated Annealing Based Quaternions
20.4 Experimental Results
20.4.1 Benchmark Functions
20.4.2 Comparison of the Convergence Speed
20.4.3 Performance Comparison with Other Optimization Algorithms
20.4.4 Statistical Test
20.5 Conclusion
References
21 A Cooperative Multi-swarm Particle Swarm Optimizer Based Hidden Markov Model
21.1 Introduction
21.2 Literature Review
21.3 Cooperative Multi-swarm Conception of PSO
21.3.1 Standard PSO
21.3.2 Sub-swarms Constitution
21.3.3 Sub-swarms Parameters Adaptation
21.3.4 Multi-swarms Cooperation
21.4 Experimentation
21.4.1 Parameters Setting
21.4.2 Performance Evaluation
21.4.3 Statistical Tests
21.5 Conclusion
References
22 Experimental Sensitivity Analysis of Grid-Based Parameter Adaptation Method
22.1 Introduction
22.2 Background Information
22.2.1 Differential Evolution
22.2.2 Grid-Based Parameter Adaptation Method
22.3 Experimental Analysis
22.4 Conclusion
References
23 Auto-Scaling System in Apache Spark Cluster Using Model-Based Deep Reinforcement Learning
23.1 Introduction
23.2 Background
23.2.1 Apache Spark on OpenStack
23.3 Methodology
23.3.1 Feature Selection
23.3.2 Applied DQN for Auto-Scaling Task
23.3.3 Auto-Scaling System Design
23.4 Evaluation
23.5 Conclusion
References
24 Innovation Networks from Inter-organizational Research Collaborations
24.1 Introduction
24.2 Related Work
24.3 Network Generation with Linkage Threshold
24.3.1 Description of the Algorithm
24.3.2 Complexity Analysis
24.4 Experiment Setup
24.4.1 Dataset
24.4.2 Network Metrics
24.5 Network Analysis
24.6 Discussion
24.7 Conclusion
References
25 Assessing Film Coefficients of Microchannel Heat Sinks via Cuckoo Search Algorithm
25.1 Introduction
25.2 Heat Transfer Problem
25.2.1 Design Heat Transfer Problem
25.2.2 Objective of the Direct Heat Transfer Problem
25.2.3 Objective of the Inverse Heat Transfer Problem
25.3 Cuckoo Search Algorithm
25.4 Methodology
25.5 Results and Discussion
25.6 Conclusions
References
26 One-Class Subject Authentication Using Feature Extraction by Grammatical Evolution on Accelerometer Data
26.1 Introduction
26.2 Related Work
26.3 Preliminaries
26.3.1 Data-Set Description
26.3.2 Data Preparation
26.3.3 Training, Validation and Test Sets
26.4 Evolutionary System
26.4.1 System Overview
26.4.2 Grammar
26.5 Experiment Design
26.5.1 Baselines
26.5.2 Run Parameters
26.6 Results and Discussion
26.6.1 Frequently Selected Sub-sequences and Functions
26.7 Conclusions
References
27 Semantic Composition of Word-Embeddings with Genetic Programming
27.1 Introduction
27.2 Learning Word Embeddings Using Neural Networks
27.2.1 Generation of the Embeddings
27.3 A Genetic Programming Approach for Word-Embedding Composition
27.4 Problem Benchmark: Word Analogy Task
27.5 Description of the GP Approach
27.5.1 GP Operators
27.5.2 Fitness Function
27.6 Experiments
27.6.1 Numerical Results
27.6.2 Evaluating Answers and Evolved Programs
27.6.3 Comparison of the Different Fitness Functions
27.7 Conclusions and Future Work
References
28 New Approach for Continuous and Discrete Optimization: Optimization by Morphological Filters
28.1 Introduction
28.2 A Brief Overview of Existing Metaheuristic Algorithm
28.3 Optimization by Morphological Filter
28.3.1 Inspiration
28.4 Results and Discussion
28.4.1 Real Programming Problems
28.4.2 Integer Programming Problem
28.5 Conclusion
References
Index

Studies in Computational Intelligence 906

Farouk Yalaoui, Lionel Amodeo, El-Ghazali Talbi
Editors

Heuristics for Optimization and Learning

Studies in Computational Intelligence Volume 906

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/7092

Editors
Farouk Yalaoui, Laboratory of Logistics and Optimization of Industrial Systems (LOSI), University of Technology of Troyes (UTT), Troyes, France

Lionel Amodeo, Industrial System Optimization Laboratory, University of Technology of Troyes (UTT), Troyes, France

El-Ghazali Talbi, CRISTAL UMR CNRS 9189 & INRIA Lille Nord Europe, Parc Scientifique de la Haute Borne, Polytech’Lille, University of Lille, Villeneuve d’Ascq, France

ISSN 1860-949X ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-58929-5 ISBN 978-3-030-58930-1 (eBook)
https://doi.org/10.1007/978-3-030-58930-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This book is a new contribution that presents some of the latest research findings in the field of optimization and computing. It targets the same field as our two previous books: L. Amodeo, E.-G. Talbi, F. Yalaoui, Recent Developments in Metaheuristics, Springer Operations Research/Computer Science Interfaces Series (ORCSCO); and E.-G. Talbi, F. Yalaoui, L. Amodeo, Metaheuristics for Production Systems, in the same series. The aim of this work is to gather the main contributions in three areas: optimization techniques for production decisions, general developments in optimization and computing methods, and widespread applications. The number of studies dealing with decision-making tools and optimization methods has grown very quickly in recent years, in a large number of fields, and valuable work has been developed in chemical, mechanical, computing, automotive, and many other fields. We focus on bringing to the reader contributions that were partially presented at workshops and conferences and that deserve to be shared more widely. The book consists of 28 chapters, which can be organized into three groups. The first group, Chaps. 1–6, is dedicated to production and logistics issues. The second group, Chaps. 7–12, presents new optimization and modeling techniques based on metaheuristics. The third group, Chaps. 13–28, develops advanced metaheuristic approaches to solve real-life and specific applications. All the results presented in this book were accepted and presented at META’18, the 7th International Conference on Metaheuristics and Nature Inspired Computing, held in Marrakech, Morocco, on October 27–31, 2018.

Chapter 1, entitled “Process Plan Generation for Reconfigurable Manufacturing Systems: Exact Versus Evolutionary-Based Multi-objective Approaches”, is proposed by Faycal A. Touzout, Hichem Haddou Benderbal, Amirhossein Khezri, and Lyes Benyoucef. The chapter addresses the multi-objective process plan generation problem in an RMS (Reconfigurable
Manufacturing System) environment. Three approaches are proposed and compared: an iterative multi-objective integer linear program (I-MOILP) and adapted versions of two well-known evolutionary algorithms, namely the archived multi-objective simulated annealing (AMOSA) and the non-dominated sorting genetic algorithm II (NSGA-II).

Chapter 2, entitled “On VNS-GRASP and Iterated Greedy Metaheuristics for Solving Hybrid Flow Shop Scheduling Problem with Uniform Parallel Machines and Sequence Independent Setup Time”, is proposed by Said Aqil and Karam Allali. The authors present effective metaheuristics for solving the hybrid flow-shop scheduling problem with uniform parallel machines and sequence-independent setup times. Three metaheuristics are implemented: variable neighborhood search (VNS), the greedy randomized adaptive search procedure (GRASP), and the iterated greedy algorithm. The objective is to minimize the total flow time while taking into account the availability constraints of the machines during scheduling.

Chapter 3, entitled “A Variable Block Insertion Heuristic for the Energy-Efficient Permutation Flowshop Scheduling with Makespan Criterion”, is proposed by M. Fatih Tasgetiren, Hande Oztop, Quan-Ke Pan, M. Arslan Ornek, and Talya Temizceri. The authors consider a bi-objective permutation flow-shop scheduling problem whose objectives are to minimize the total energy consumption and the makespan. They propose a bi-objective mixed-integer programming model of the problem based on a speed-scaling approach, and implement the augmented ε-constraint method to generate Pareto-optimal solution sets for small-sized instances.

Chapter 4, proposed by Ozgur Kabadurmus, M. Fatih Tasgetiren, Hande Oztop, and M. Serdar Erdogan, is entitled “Solving 0–1 Bi-Objective Multi-dimensional Knapsack Problems Using Binary Genetic Algorithm”.
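Chapters 1 and 3, as well as the knapsack study of Chapter 4, reason in terms of Pareto dominance (non-dominated sorting in NSGA-II, Pareto-optimal solution sets, an external archive of non-dominated solutions). As a minimal Python sketch, not code taken from any of these chapters, the hypothetical helpers `dominates` and `update_archive` below maintain an archive of mutually non-dominated objective vectors, assuming every objective is minimized, for example (makespan, total energy consumption) as in Chapter 3. A maximized objective, such as a knapsack profit, can be negated before being passed in.

```python
from typing import List, Sequence, Tuple

# An objective vector, e.g. (makespan, total energy consumption).
Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b`, assuming every objective is minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Offer `candidate` to an archive of mutually non-dominated vectors.

    The candidate is rejected if an archived vector dominates or equals it;
    otherwise it is added and every archived vector it dominates is dropped.
    """
    if any(dominates(p, candidate) or tuple(p) == tuple(candidate) for p in archive):
        return list(archive)
    return [p for p in archive if not dominates(candidate, p)] + [candidate]


if __name__ == "__main__":
    archive: List[Objectives] = []
    # Toy (makespan, energy) pairs standing in for solutions found by a search.
    for point in [(10.0, 50.0), (12.0, 40.0), (11.0, 55.0), (9.0, 45.0), (12.0, 40.0)]:
        archive = update_archive(archive, point)
    print(sorted(archive))  # [(9.0, 45.0), (12.0, 40.0)]
```

Rejecting exact duplicates keeps the archive from filling up with copies of the same trade-off, which matters when a population repeatedly rediscovers the same solution.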
In Chapter 4, the authors study the bi-objective multi-dimensional knapsack problem (BOMDKP) and develop a binary genetic algorithm (BGA) with an external archive for it.

Chapter 5, entitled “An Asynchronous Parallel Evolutionary Algorithm for Solving Large Instances of the Multi-objective QAP”, is proposed by Florian Mazière, Pierre Delisle, Caroline Gagné, and Michaël Krajecki. The authors propose APM-MOEA, a parallel model based on an evolutionary algorithm for solving the multi-objective quadratic assignment problem (MQAP). It relies on an island model with division of the objective space. According to four multi-objective metrics, APM-MOEA outperforms all the compared implementations in terms of convergence or diversity.

Chapter 6, entitled “Learning from Prior Designs for Facility Layout Optimization”, was written by H. Rummukainen, J. K. Nurminen, T. Syrjänen, and J.-P. Numminen. The facility layout problem involves optimizing the locations of process components on a factory floor; in real-world applications, however, there are numerous practical constraints and objectives that can be difficult to formulate comprehensively in an explicit optimization model. As an alternative to
explicit modeling, the authors present an optimization approach that learns structural properties from examples of expert-designed layouts of other, similar facilities, and uses similarity to these examples as one objective in a multi-objective facility layout optimization problem.

Chapter 7, entitled “Single-Objective Real-Parameter Optimization: Enhanced LSHADE-SPACMA Algorithm”, by Anas A. Hadi, Ali W. Mohamed, and Kamal M. Jambi, tackles real-parameter optimization, one of the most active research fields of the past decade. LSHADE-SPACMA performed competitively in the IEEE CEC’2017 competition on single-objective bound-constrained real-parameter optimization, where it was ranked fourth among the 12 papers presented and compared on these new benchmark problems. In this chapter, an improved version named ELSHADE-SPACMA is introduced and evaluated on the IEEE CEC’2017 benchmark.

Chapter 8, entitled “Operations Research at Bulk Terminal: A Parallel Column Generation Approach”, by Gustavo Campos Menezes, Lucas Teodoro de Lima Santos, João Fernando Machry Sarubbi, and Geraldo Robson Mateus, discusses various optimization problems arising in the storage and transportation of loads. It treats the integrated problem of production planning, product allocation, and product scheduling at the largest bulk port terminal in Brazil. The main contributions of the chapter relate to a parallel approach to solving this integrated problem, which combines heuristics, column generation, and an optimization package.

Chapter 9, entitled “Heuristic Solutions for the (α, β)-k Feature Set Problem”, is proposed by Leila M. Naeni and Amir Salehipour. Feature selection aims to choose a subset of features, out of a set of candidate features, such that the selected set best represents the whole with respect to a particular aspect.
The (α, β)-k feature set problem (FSP) is a combinatorial optimization-based approach to feature selection. On a dataset with two groups of data, the (α, β)-k FSP aims to select a set of features that maximizes the similarities between entities of the same group and the differences between entities of different groups. The authors develop a math-heuristic algorithm for the (α, β)-k FSP.

Chapter 10, entitled “Generic Support for Precomputation-Based Global Routing Constraints in Local Search Optimization”, is proposed by Renaud De Landtsheer, Fabian Germeau, Thomas Fayolle, Gustavo Ospina, and Christophe Ponsard. The objective of the chapter is to build a generic framework for constraint-based local search (CBLS), with application to routing optimization. The authors identify a generic stereotype of the differentiation mechanism used by global constraints for routing optimization and propose generic support for this stereotype, which makes global constraints easier to implement.

Chapter 11, entitled “Dynamic Simulated Annealing with Adaptive Neighborhood Using Hidden Markov Model”, is proposed by Mohamed Lalaoui, Abdellatif El Afia, and Raddouane Chiheb. Simulated annealing (SA) is a stochastic local search algorithm whose efficiency depends on the adaptation of its neighborhood structure. In this chapter, the authors integrate a hidden Markov model (HMM) into
SA to dynamically adapt its neighborhood structure at each iteration. The HMM proved able to predict the optimal behavior of the neighborhood function based on the search history.

Chapter 12, entitled “Hybridization of the Differential Evolution Algorithm for Continuous Multi-objective Optimization”, is proposed by Caroline Gagné, Aymen Sioud, Marc Gravel, and Mathieu Fournier. The authors recall that hybridizing metaheuristics can enhance algorithm performance by combining the advantages of several strategies and profiting from the resulting synergy, and multi-objective optimization is no exception to this trend. They develop a new hybrid algorithm, IWODEMO, which combines differential evolution and the invasive weed algorithm to solve multi-objective problems with continuous variables, thereby integrating the exploration and exploitation capacities of both algorithms.

Chapter 13, entitled “A Steganographic Embedding Scheme Using Improved-PSO Approach”, by Yamina Mohamed Ben Ali, treats the steganography task as an optimization problem solved by a bio-inspired approach. It proposes an embedding scheme based on least significant bit (LSB) substitution and an improved version of the particle swarm optimization (PSO) algorithm. The improved-PSO embedding scheme looks for the best pixel locations, and possibly the best bits within those pixels, to hide secret messages (both text and images) without degrading the quality of the original image.

Chapter 14, entitled “Algorithms Towards the Automated Customer Inquiry Classification”, is proposed by Gulshat Kessikbayeva, Nazerke Sultanova, Yerbolat Amangeldi, and Roman Yurchenko. Classification is an important field of research due to the growing amount of unstructured text, especially in the form of customer inquiries.
The problem has two phases: identifying a customer inquiry in general, and automatically assigning the product order requested by the customer to predefined categories based on its characteristics. The aim of this work is to present a classification model that works efficiently with Russian texts, since machine learning algorithms have mostly been shown to work well with English texts.

Chapter 15, entitled “An Heuristic Scheme for a Reaction Advection Diffusion Equation”, is proposed by M. R. Amattouch and H. Belhadj. The authors present an alternative meshless method to solve a reaction–advection–diffusion equation. To reduce the computational cost, they propose a new optimized domain decomposition method with a differential fractional derivative condition on the interface between sub-domains. Several analytical test cases illustrate this approach and show the efficiency of the new method.

Chapter 16, “Stock Market Speculation System Development Based on Technico Temporal Indicators and Data Mining Tools”, is proposed by Zineb Bousbaa, Omar Bencharef, and Abdellah Nabaji. Finding an efficient algorithm to predict the exchange rate of a currency is a global optimization problem, which can be solved using metaheuristics as the optimization technique. In this work, the authors suggest a gradient descent regression
algorithm optimized with the particle swarm optimization metaheuristic in order to build a robust learning model. The experimental results are compared with those obtained by a simple multi-linear regression implemented by the authors, and by other regression algorithms provided by the scikit-learn library in Python and by RStudio in R.

Chapter 17, entitled “A New Hidden Markov Model Approach for Pheromone Level Exponent Adaptation in Ant Colony System”, is proposed by Safae Bouzbita, Abdellatif El Afia, and Rdouan Faizi. They propose a hidden Markov model (HMM) approach to avoid the premature convergence of ants in the Ant Colony System (ACS) algorithm. The approach is modeled as a classifier that controls convergence through the dynamic adaptation of the parameter weighting the relative influence of the pheromone. It was tested on several Travelling Salesman Problem (TSP) instances with different numbers of cities.

Chapter 18, proposed by Menouar Boulif, is entitled “A New Cut-Based Genetic Algorithm for Graph Partitioning Applied to Cell Formation”. Cell formation is a critical step in the design of cellular manufacturing systems. The author tackles the problem with a cut-based graph-partitioning model. This model meets the requirements of real-life production systems: it uses the actual amounts of product flows, it looks for a suitable number of cells, and it takes into account natural constraints such as operation sequences, maximum cell size, and cohabitation and non-cohabitation constraints. The author proposes an original encoding representation to solve the problem with a genetic algorithm.

Chapter 19 is dedicated to the work entitled “Memetic Algorithm and Evolutionary Operators for Multi-Objective Matrix Tri-Factorization Problem” by Rok Hribar, Gašper Petelin, Jurij Šilc, Gregor Papa, and Vida Vukašinović.
In a memetic algorithm, a population-based global search technique is used to broadly locate good areas of the search space, while a local search heuristic is applied repeatedly to locate the optimum. Intuitively, evolutionary operators that generate individuals inheriting genetic material from their parents and having improved performance should be the right choice for improving the algorithm in terms of time and solution quality. Evolutionary operators with such properties were devised and used in a memetic algorithm for solving the multi-objective matrix tri-factorization problem. By comparing a deterministic naive approach with two variants of the memetic algorithm with different levels of inheritance, the authors show that these evolutionary operators do not improve performance in this case.

Chapter 20, entitled “Quaternion Simulated Annealing”, is proposed by Abdellatif El Afia, Mohamed Lalaoui, and El-Ghazali Talbi. Simulated annealing (SA) is a well-known stochastic local search algorithm for solving unconstrained optimization problems. It mimics the annealing process used in metallurgy to approximate the global optimum of an optimization problem, and uses a temperature parameter to control the search. Unfortunately, the effectiveness of simulated annealing drops drastically on large-scale optimization problems. This is generally due to
premature convergence or stagnation. Both phenomena can be avoided by a good balance between exploitation and exploration. This chapter addresses this problem of simulated annealing using quaternions, a number system that extends the complex numbers. The quaternion representation helps simulated annealing smooth the fitness landscape, and thus avoid getting stuck in local optima, by expanding the original search space. An empirical analysis was conducted on many numerical benchmark functions.

Chapter 21, entitled “A Cooperative Multi-Swarm Particle Swarm Optimizer Based Hidden Markov Model”, is proposed by Oussama Aoun, Abdellatif El Afia, and El-Ghazali Talbi. Particle swarm optimization (PSO) is a population-based stochastic metaheuristic that has been successful on a multitude of optimization problems, and many PSO variants have been created to boost its optimization capabilities, in particular to cope with more complex problems. In this chapter, the authors provide a new multi-population PSO approach with a cooperation strategy. The proposed algorithm splits the PSO population into four sub-swarms and assigns a main role to each one. At the individual level, a machine learning technique allows each particle to determine its suitable swarm membership at each iteration. At the collective level, cooperative rules between swarms, following a master/slave cooperation scheme, ensure more diversity and lead to better solutions. Several simulations on a set of benchmark functions compare the performance of this approach with a multitude of state-of-the-art PSO variants.

Chapter 22, dedicated to “Experimental Sensitivity Analysis of Grid-Based Parameter Adaptation Method”, is proposed by Vasileios A. Tatsis and Konstantinos E. Parsopoulos.
The grid-based parameter adaptation method was recently proposed as a general-purpose approach for online parameter adaptation in metaheuristics. The method is independent of the technicalities of any specific algorithm. It operates directly in the parameter domain, which is discretized to form multiple grids. Short runs of the algorithm are conducted to estimate its behavior under different parameter configurations. Thus, it differs from related methods, which usually incorporate ad hoc procedures designed for specific metaheuristics. The method has been demonstrated on two popular population-based metaheuristics with promising results. Like other parameter tuning and control methods, the grid-based approach has three decision parameters, which control the granularity of the grids and the length of the algorithm runs. The present study extends a preliminary analysis of the impact of each parameter through experimental statistical analysis. The differential evolution algorithm is used as the target metaheuristic, and the established CEC 2013 test suite serves as the experimental testbed. Chapter 23, dealing with “Auto-Scaling System in Apache Spark Cluster using Model-Based Deep Reinforcement Learning”, is proposed by Kundjanasith Thonglek, Kohei Ichikawa, Chatchawal Sangkeettrakarn, and Apivadee Piyatumrong. Real-time processing is a fast and prompt processing technology that must complete execution within a time limit almost equal to the input time. Such real-time processing requires an efficient auto-scaling system that provides sufficient resources to complete the computation within the time limit. The authors use the Apache Spark framework to build a cluster that supports real-time processing. The major challenge in automatically scaling an Apache Spark cluster for real-time processing is handling both the unpredictable input data size and the unpredictable resource availability of the underlying cloud infrastructure. If scaling out the cluster is too slow, the application cannot be executed within the time limit because of insufficient resources. If scaling in the cluster is too slow, resources are left idle, which lowers resource utilization. This research follows a real-world scenario in which the computing resources are bounded by a certain number of computing nodes because of a limited budget, and the computing time is limited by the nature of near-real-time applications. The authors design an auto-scaling system that applies a deep reinforcement learning technique, DQN (Deep Q-Network), to improve resource utilization. Chapter 24 is entitled “Innovation Networks from Inter-organizational Research Collaborations” and is proposed by Saharnaz Dilmaghani, Apivadee Piyatumrong, Grégoire Danoy, Pascal Bouvry, and Matthias R. Brust. The authors consider the problem of automating network generation from inter-organizational research collaboration data. The resulting networks promise crucial advanced insights. The authors propose a method to convert relational data into a set of networks using a single parameter, called the Linkage Threshold (LT). To analyze the impact of the LT value, standard network metrics such as network density and centrality measures are applied to each network produced.
The feasibility and impact of the approach are demonstrated on a real-world collaboration data set from an established research institution. Chapter 25, entitled “Assessing Film Coefficients of Microchannel Heat Sinks via Cuckoo Search Algorithm”, is proposed by Jorge M. Cruz-Duarte, Arturo García-Pérez, Iván M. Amaya-Contreras, and Rodrigo Correa. The chapter deals with the film transfer coefficient, one of the most challenging variables to measure in experimental heat transfer. The authors propose a strategy for estimating the film transfer coefficient by solving an inverse heat transfer problem with the Cuckoo Search global optimization algorithm. The designs were obtained through the entropy generation minimization criterion, also powered by Cuckoo Search, using several specifications (material, working fluid, and heat power). Chapter 26, entitled “One-Class Subject Authentication Using Feature Extraction by Grammatical Evolution on Accelerometer Data”, is proposed by Stefano Mauceri, James Sweeney, and James McDermott. The authors aim to develop a method that confirms that the accelerometer data produced by a source device correspond to the individual to whom the device is assigned.

xii

Preface

Chapter 27, entitled “Semantic Composition of Word-Embeddings with Genetic Programming”, is by R. Santana. In this chapter, R. Santana shows that methods for word composition in semantic spaces can be learned using genetic programming (GP). The author frames the creation of word embeddings with a target semantic content as an automatic program generation problem. Chapter 28, entitled “New Approach for Continuous and Discrete Optimization: Optimization by Morphological Filters”, is written by Khelifa Chahinez Nour El houda and Belmadani Abderrahim and proposes a new metaheuristic algorithm, called Optimization by Morphological Filters, inspired by image processing methods. The proposed approach, applied to several test functions, shows very competitive results.

Troyes, France
April 2020

Farouk Yalaoui Lionel Amodeo El-Ghazali Talbi

Contents

1 Process Plan Generation for Reconfigurable Manufacturing Systems: Exact Versus Evolutionary-Based Multi-objective Approaches
Faycal A. Touzout, Hichem Haddou Benderbal, Amirhossein Khezri, and Lyes Benyoucef

2 On VNS-GRASP and Iterated Greedy Metaheuristics for Solving Hybrid Flow Shop Scheduling Problem with Uniform Parallel Machines and Sequence Independent Setup Time
Said Aqil and Karam Allali

3 A Variable Block Insertion Heuristic for the Energy-Efficient Permutation Flowshop Scheduling with Makespan Criterion
M. Fatih Tasgetiren, Hande Oztop, Quan-Ke Pan, M. Arslan Ornek, and Talya Temizceri

4 Solving 0-1 Bi-Objective Multi-dimensional Knapsack Problems Using Binary Genetic Algorithm
Ozgur Kabadurmus, M. Fatih Tasgetiren, Hande Oztop, and M. Serdar Erdogan

5 An Asynchronous Parallel Evolutionary Algorithm for Solving Large Instances of the Multi-objective QAP
Florian Mazière, Pierre Delisle, Caroline Gagné, and Michaël Krajecki

6 Learning from Prior Designs for Facility Layout Optimization
Hannu Rummukainen, Jukka K. Nurminen, Timo Syrjänen, and Jukka-Pekka Numminen

7 Single-Objective Real-Parameter Optimization: Enhanced LSHADE-SPACMA Algorithm
Anas A. Hadi, Ali W. Mohamed, and Kamal M. Jambi

8 Operations Research at Bulk Terminal: A Parallel Column Generation Approach
Gustavo Campos Menezes, Lucas Teodoro de Lima Santos, João Fernando Machry Sarubbi, and Geraldo Robson Mateus

9 Heuristic Solutions for the (a, b)-k Feature Set Problem
Leila M. Naeni and Amir Salehipour

10 Generic Support for Precomputation-Based Global Routing Constraints in Local Search Optimization
Renaud De Landtsheer, Fabian Germeau, Thomas Fayolle, Gustavo Ospina, and Christophe Ponsard

11 Dynamic Simulated Annealing with Adaptive Neighborhood Using Hidden Markov Model
Mohamed Lalaoui, Abdellatif El Afia, and Raddouane Chiheb

12 Hybridization of the Differential Evolution Algorithm for Continuous Multi-objective Optimization
Caroline Gagné, Aymen Sioud, Marc Gravel, and Mathieu Fournier

13 A Steganographic Embedding Scheme Using Improved-PSO Approach
Yamina Mohamed Ben Ali

14 Algorithms Towards the Automated Customer Inquiry Classification
Gulshat Kessikbayeva, Nazerke Sultanova, Yerbolat Amangeldi, and Roman Yurchenko

15 An Heuristic Scheme for a Reaction Advection Diffusion Equation
M. R. Amattouch and H. Belhadj

16 Stock Market Speculation System Development Based on Technico Temporal Indicators and Data Mining Tools
Zineb Bousbaa, Omar Bencharef, and Abdellah Nabaji

17 A New Hidden Markov Model Approach for Pheromone Level Exponent Adaptation in Ant Colony System
Safae Bouzbita, Abdellatif El Afia, and Rdouan Faizi

18 A New Cut-Based Genetic Algorithm for Graph Partitioning Applied to Cell Formation
Menouar Boulif

19 Memetic Algorithm and Evolutionary Operators for Multi-Objective Matrix Tri-Factorization Problem
Rok Hribar, Gašper Petelin, Jurij Šilc, Gregor Papa, and Vida Vukašinović

20 Quaternion Simulated Annealing
Abdellatif El Afia, Mohamed Lalaoui, and El-ghazali Talbi

21 A Cooperative Multi-swarm Particle Swarm Optimizer Based Hidden Markov Model
Oussama Aoun, Abdellatif El Afia, and El-Ghazali Talbi

22 Experimental Sensitivity Analysis of Grid-Based Parameter Adaptation Method
Vasileios A. Tatsis and Konstantinos E. Parsopoulos

23 Auto-Scaling System in Apache Spark Cluster Using Model-Based Deep Reinforcement Learning
Kundjanasith Thonglek, Kohei Ichikawa, Chatchawal Sangkeettrakarn, and Apivadee Piyatumrong

24 Innovation Networks from Inter-organizational Research Collaborations
Saharnaz Dilmaghani, Apivadee Piyatumrong, Grégoire Danoy, Pascal Bouvry, and Matthias R. Brust

25 Assessing Film Coefficients of Microchannel Heat Sinks via Cuckoo Search Algorithm
Jorge M. Cruz-Duarte, Arturo García-Pérez, Iván M. Amaya-Contreras, and Rodrigo Correa

26 One-Class Subject Authentication Using Feature Extraction by Grammatical Evolution on Accelerometer Data
Stefano Mauceri, James Sweeney, and James McDermott

27 Semantic Composition of Word-Embeddings with Genetic Programming
R. Santana

28 New Approach for Continuous and Discrete Optimization: Optimization by Morphological Filters
Chahinez Nour El Houda Khelifa and Abderrahim Belmadani

Index

Chapter 1

Process Plan Generation for Reconfigurable Manufacturing Systems: Exact Versus Evolutionary-Based Multi-objective Approaches

Faycal A. Touzout, Hichem Haddou Benderbal, Amirhossein Khezri, and Lyes Benyoucef

Abstract Future production faces increasingly complex environments and must be customized, flexible and of high quality. Moreover, low costs, high reactivity and high-quality products are necessary for industries to remain competitive in today's market. In this context, reconfigurable manufacturing systems (RMSs) have emerged to fulfill these requirements. This chapter addresses the multi-objective process plan generation problem in an RMS environment. Three approaches are proposed and compared: an iterative multi-objective integer linear program (I-MOILP) and adapted versions of two well-known evolutionary algorithms, namely archived multi-objective simulated annealing (AMOSA) and the non-dominated sorting genetic algorithm (NSGA-II). In addition to minimizing the classical total production cost and total completion time, minimizing the maximum machine exploitation time is considered as a novel optimization criterion, in order to obtain high-quality products. To illustrate the applicability of the three approaches, an example is presented and the obtained numerical results are analysed.

Keywords Reconfigurable manufacturing system · Multi-objective optimization · Multi-objective integer linear programming · AMOSA · NSGA-II

F. A. Touzout
DISP, Université Lyon, INSA Lyon, EA 4570, 69621 Villeurbanne, France

H. Haddou Benderbal (corresponding author)
IMT Atlantique, LS2N-CNRS, Nantes, France

A. Khezri · L. Benyoucef
Aix Marseille Université, Université de Toulon, CNRS, LIS, Marseille, France

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_1


1.1 Introduction

Under the effect of globalization, technological development and rapid socioeconomic transformations over the last decades, customers have become increasingly demanding and expect tailored solutions and products. These changes have increased the complexity of manufacturing environments and have led to the emergence of Industry 4.0, which raises many new challenges for production systems. In this context, it is very important for companies and organizations to respond quickly and cost-effectively in order to stay in the market and take the lead among their competitors. Today, customer satisfaction is a challenge for most manufacturing companies. Mass customization, a product deployment concept that combines low prices with extensive variation and adaptation, has emerged because of its potential impact on the customer's perceived value of the product. The continuous demand for products incorporating new and complex functionalities has put a lot of pressure on manufacturing companies. This requires a changeable organizational structure that can cater to a wide product variety, which can be attained by adopting the concept of the reconfigurable manufacturing system (RMS), comprising reconfigurable machines, controllers and software support systems. RMS is one of the latest manufacturing paradigms. In this paradigm, machine components, machine software or material handling units can be added, removed, modified or interchanged as needed, whenever it is necessary to react rapidly and cost-effectively to changing requirements. It is regarded as a convenient manufacturing paradigm for variety production as well as a flexible enabler of this variety. Hence, it is a logical evolution of the two manufacturing systems already used in industry, namely dedicated manufacturing lines (DMLs) and flexible manufacturing systems (FMSs).
According to Koren [16], the father of RMS, DMLs are inexpensive, but their capacities are not fully utilized in several situations, especially under the pressure of global competition, which causes losses. On the other hand, FMSs respond to product changes, but they are not designed for structural changes. Hence, neither system can counter sudden market variations such as demand fluctuations or new regulatory requirements. RMS combines the high flexibility of FMS with the high production rate of DML. It includes the positive features of both systems thanks to its adjustable structure and design focus. Thus, in situations where both productivity and system responsiveness to uncertainties or unpredictable scenarios (e.g. machine failure, market change, …) are of vital importance, RMS ensures a high level of responsiveness to changes together with high performance. This is achieved through six main principles, namely customization, convertibility, scalability, integrability, modularity, and diagnosability. Moreover, Koren suggested that in manufacturing systems, the key to responsiveness, as well as to coping with changing market conditions that cause fluctuations in product demand and mix, is to adjust the production system capacity. He stressed that this adjustment is possible thanks to two types of reconfiguration capability in manufacturing systems: functionality adjustment and production capacity adjustment. These characteristics are achievable thanks to the reconfigurable machine tool (RMT), which is considered one of the major components of RMS. With its reconfigurable structure, an RMT provides customized flexibility and offers a variety of alternative features.

In this chapter, three approaches are proposed and compared for the process plan generation problem in a reconfigurable manufacturing environment: an iterative multi-objective integer linear program (I-MOILP) and adapted versions of two well-known evolutionary algorithms, namely archived multi-objective simulated annealing (AMOSA) and the non-dominated sorting genetic algorithm (NSGA-II). In addition to minimizing the classical total production cost and total completion time, minimizing the maximum machine exploitation time is considered as a novel optimization criterion, in order to obtain high-quality products. The rest of the chapter is organized as follows: Sect. 1.2 briefly reviews related work on RMS. Section 1.3 presents the problem under consideration and its mathematical formulation. Section 1.4 describes the three proposed approaches in more detail. Section 1.5 analyses the obtained numerical results. Section 1.6 concludes the chapter with some future research directions.

1.2 Literature Review

The literature related to RMS problems is very rich and covers many areas, such as design, layout optimization, reconfigurable control, process planning and production scheduling [3]. In this section, we summarize the research works most closely related to process plan generation in RMS, a topic whose state of the art is itself rich. ElMaraghy [10] claimed that “we need to associate the evolutions, reconfigurations and reconfigurable process plans to changes and evolutions of manufacturing systems and products”. Nallakumarasamy [9] defined the process plan as “the activity that decides the sequence, which the manufacturing process must follow”. This activity defines the order of the operations required to complete a single unit of product, and it assigns each operation to the appropriate machine under the adequate configuration. Azab and ElMaraghy [1] considered reconfigurable process plans, where an existing process plan is reconfigured when a new feature is added to an existing part, in order to avoid generating a wholly new process plan. Reconfiguring a process plan consists of making minor modifications to meet the requirements of the new part. Furthermore, Shabaka and ElMaraghy [5] developed a new genetic algorithm-based model to perform process planning in an RMS environment. The model simultaneously considers all process plan parameters, such as machine assignment and machine configurations. Musharavati and Hamouda [7] investigated the use of simulated annealing-based algorithms for solving the process planning problem in reconfigurable manufacturing. They developed several variants of the simulated annealing algorithm, namely a variant of the basic simulated annealing algorithm, a variant coupled with auxiliary knowledge and a variant implemented in a quasi-parallel architecture. The obtained experimental results showed the superiority of these variants over a basic simulated annealing algorithm. Maniraj et al. [14] proposed a two-phase ant colony optimisation approach to solve the process plan generation problem of a single-product flow line in a reconfigurable context. In the first phase, a priority-based encoding technique is applied to find feasible operation clusters. In the second phase, the ant colony technique is used to minimise the total cost of the RMS. A case study is presented to demonstrate the applicability of the developed approach. In a multi-objective context, Chaube et al. [4] and Bensmaine et al. [2] proposed evolutionary-based approaches to solve the problem. Chaube et al. [4] adapted the non-dominated sorting genetic algorithm (NSGA-II) with two objectives, namely the total completion time and the total manufacturing cost. Bensmaine et al. [2] integrated process plan generation with the design problem using the same approach. Haddou Benderbal et al. [11] proposed a new flexibility metric to generate efficient process plans by integrating unavailability constraints of the selected machines. The resulting multi-objective problem is solved using an adapted version of NSGA-II. More recently, Haddou Benderbal et al. [12] adapted AMOSA to solve the integrated design and process plan generation problem for RMS. In addition to the classical optimization criteria, cost and time, the authors added modularity as a criterion. Xia et al. [15] extended the concept of reconfigurable process planning to reconfigurable machining process planning, which targets process plan generation for a part family.
Touzout and Benyoucef [8] solved the sustainable process plan generation problem for an RMS, where the amount of greenhouse gases (GHG) emitted during the manufacturing process is minimized in addition to the total production cost and completion time. The authors developed an iterative multi-objective integer linear programming (I-MOILP) approach and compared it with adapted versions of the archived multi-objective simulated annealing approach (AMOSA) and the NSGA-II approach. They also studied the influence of the probabilities of the genetic operators on the convergence of the adapted NSGA-II and illustrated the applicability of the three approaches using numerical examples. In this chapter, given the lack of works using exact multi-objective approaches for process plan generation in RMS, we propose three a posteriori approaches to solve our problem, namely I-MOILP and adapted versions of the two well-known evolutionary algorithms AMOSA and NSGA-II. To illustrate the applicability of the approaches, a simple example is presented and the obtained numerical results are analysed.


Fig. 1.1 A simple example of a part

Fig. 1.2 An example of a precedence graph

1.3 Problem Description and Mathematical Formulation

The manufacturing of a product (e.g. Fig. 1.1) consists of processing a set of operations linked to each other by a precedence graph (e.g. Fig. 1.2). The process plan generation problem is to define a sequence of these operations for a given RMS design (i.e. a set of machines, configurations and tools).
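To make the precedence requirement concrete, the following Python sketch draws a random operation sequence that respects a precedence graph, using Kahn's topological-sorting idea with random tie-breaking. It is only an illustration, not the authors' implementation; the function name, the operation names and the `predecessors` dictionary (one set of required predecessors per operation) are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Optional, Set


def random_feasible_sequence(operations: Iterable[str],
                             predecessors: Dict[str, Set[str]],
                             rng: Optional[random.Random] = None) -> List[str]:
    """Return a random operation sequence that respects the precedence graph.

    `predecessors` maps an operation to the operations that must precede it.
    At each step, one operation is picked at random among those whose
    predecessors have all been sequenced already (Kahn's algorithm).
    """
    rng = rng or random.Random()
    remaining = {op: set(predecessors.get(op, ())) for op in operations}
    sequence: List[str] = []
    while remaining:
        ready = sorted(op for op, preds in remaining.items() if not preds)
        if not ready:
            # Also reached when a predecessor is not among `operations`.
            raise ValueError("precedence graph has a cycle or unknown predecessor")
        op = rng.choice(ready)
        sequence.append(op)
        del remaining[op]
        for preds in remaining.values():
            preds.discard(op)
    return sequence


# Hypothetical precedence graph with four operations.
preds = {"OP1": set(), "OP2": {"OP1"}, "OP3": {"OP1"}, "OP4": {"OP2", "OP3"}}
print(random_feasible_sequence(preds.keys(), preds, random.Random(0)))
```

Any sequence returned this way starts with OP1 and ends with OP4. Only the relative order of OP2 and OP3 depends on the random choices.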

1.3.1 Problem Description

A machine in an RMS is represented by a set of available configurations and a set of compatible tools. Each configuration offers a set of tool advance directions (TADs) (i.e. $x\pm$, $y\pm$ and $z\pm$). An operation is represented by the TADs it requires in order to be processed. Thus, for each operation $i$, a set $K_i$ of triplets able to perform it is defined. A triplet $k$ is defined by its machine index $M_k$, configuration index $C_k$ and tool index $T_k$. Table 1.1 shows an example of the TADs and tools required by the operations of an instance of the problem, together with the TADs, configurations and tools that each machine offers. Table 1.2 illustrates an example of a generated process plan. In this case, three optimization criteria are considered:
• The total production cost.
• The total completion time.
• The maximum exploitation time per machine.
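The construction of the sets $K_i$ can be sketched in Python as follows. The data layout (a dictionary of machines, each listing the TADs of its configurations and its tool set) and the example values are hypothetical. We also assume that a configuration can perform an operation when it offers at least one of the TADs the operation requires and the machine carries the operation's tool. If every required TAD had to be offered instead, the intersection test would become a subset test.

```python
from typing import Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (machine, configuration, tool)


def feasible_triplets(op_tads: Set[str], op_tool: str,
                      machines: Dict[str, dict]) -> List[Triplet]:
    """Return K_i, the triplets able to perform one operation.

    Assumption: configuration c of machine m qualifies if it offers at least
    one of the operation's TADs and machine m carries the required tool.
    """
    triplets: List[Triplet] = []
    for m, spec in sorted(machines.items()):
        if op_tool not in spec["tools"]:
            continue  # the machine cannot hold the required tool
        for c, tads in sorted(spec["configs"].items()):
            if op_tads & tads:  # at least one shared TAD
                triplets.append((m, c, op_tool))
    return triplets


# Hypothetical machine data (not the data of Table 1.1).
machines = {
    "M1": {"configs": {"C1": {"x+", "y+"}, "C2": {"z-"}}, "tools": {"T3", "T4"}},
    "M2": {"configs": {"C1": {"x-"}}, "tools": {"T2", "T3"}},
}
print(feasible_triplets({"x-", "z-"}, "T3", machines))
# → [('M1', 'C2', 'T3'), ('M2', 'C1', 'T3')]
```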


Table 1.1 An illustrative example of TADs and tools
[Table: rows OP1–OP8 mark the TADs (x+, y+, z+, x−, y−, z−) each operation requires and the tool it needs (OP1: 4, OP2: 3, OP3: 4, OP4: 4, OP5: 2, OP6: 4, OP7: 2, OP8: 2); rows M1 (C1, C2), M2 (C1, C2), M3 (C1, C2), M4 (C1–C4) and M5 (C1, C2) mark the TADs offered by each machine configuration and the tools each machine carries.]

Table 1.2 An illustrative example of a process plan
Operation  OP1  OP7  OP3  OP8  OP2  OP4  OP5  OP6
Machine    M3   M4   M5   M5   M5   M1   M2   M4
Config     C1   C4   C2   C2   C2   C1   C1   C2
Tool       T4   T2   T4   T2   T3   T4   T2   T4

1.3.2 Mathematical Formulation

Throughout this section, the following notations are used.

Parameters:
$O$: set of operations
$n$: number of operations
$i, i'$: index of operation
$P_i$: set of predecessor operations of operation $i$
$k, k'$: index of triplet
$j, j'$: index of position in the sequence
$M$: set of machines
$m, m'$: index of machine
$K_i$: set of available triplets for operation $i$
$K_m$: set of available triplets with machine $m$
$K$: set of triplets, where $K = \bigcup_{i \in O} K_i \cup \bigcup_{m \in M} K_m$
$h, h'$: index of configuration
$tl, tl'$: index of tool

Production costs:
$CCM_{m,m'}$: cost of changing machine per time unit
$CCC_{h,h'}$: cost of changing configuration per time unit
$CCT_{tl,tl'}$: cost of changing tool per time unit
$CP_{i,k}$: cost of processing per time unit

Times:
$TCM_{m,m'}$: time of changing machine
$TCC_{h,h'}$: time of changing configuration
$TCT_{tl,tl'}$: time of changing tool
$TP_{i,k}$: time of processing

To formulate our problem, the following decision variables are needed:
$x_{i,j}^{k} = 1$ if the $i$th operation is processed at the $j$th position using the $k$th triplet, 0 otherwise.
$y_{j,k}^{m} = 1$ if the $m$th machine is using the $k$th triplet at the $j$th position, 0 otherwise.
$cm_{j,m,m'} = 1$ if, between positions $j-1$ and $j$, there has been a change between machines $m$ and $m'$, 0 otherwise.
$cc_{j,k,k'}^{m} = 1$ if, between positions $j-1$ and $j$, there has been a change between triplets $k$ and $k'$ of machine $m$, 0 otherwise.
$f_e \in \mathbb{N}$ represents the maximal exploitation time of the machines.

$f_c$ and $f_t$ are, respectively, the total production cost and the completion time, where:

$$
\begin{aligned}
f_c ={}& \sum_{j=1}^{n} \sum_{i \in O} \sum_{k \in K_i} x_{i,j}^{k} \times CP_{i,k} \times TP_{i,k}
 + \sum_{j=1}^{n} \sum_{m \in M} \sum_{m' \in M} cm_{j,m,m'} \times CCM_{m,m'} \times TCM_{m,m'} \\
 &+ \sum_{j=1}^{n} \sum_{m \in M} \sum_{k \in K_m} \sum_{k' \in K_m} cc_{j,k,k'}^{m} \times \left( TCT_{T_k,T_{k'}} \times CCT_{T_k,T_{k'}} + TCC_{C_k,C_{k'}} \times CCC_{C_k,C_{k'}} \right)
\end{aligned}
$$

$$
\begin{aligned}
f_t ={}& \sum_{j=1}^{n} \sum_{i \in O} \sum_{k \in K_i} x_{i,j}^{k} \times TP_{i,k}
 + \sum_{j=1}^{n} \sum_{m \in M} \sum_{m' \in M} cm_{j,m,m'} \times TCM_{m,m'} \\
 &+ \sum_{j=1}^{n} \sum_{m \in M} \sum_{k \in K_m} \sum_{k' \in K_m} cc_{j,k,k'}^{m} \times \left( TCC_{C_k,C_{k'}} + TCT_{T_k,T_{k'}} \right)
\end{aligned}
$$
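For a given plan, the two objectives can be evaluated directly rather than through the decision variables. The Python sketch below computes $f_c$ and $f_t$ for a sequence of (operation, triplet) pairs. The dictionary-based cost and time tables are hypothetical. The sketch also assumes that change costs and times are charged only when the machine, configuration or tool actually differs (i.e. $CCM_{m,m} = 0$, and likewise for configurations and tools), and that an idle machine keeps its last configuration and tool, so a machine's first use incurs no change.

```python
from typing import Dict, List, Optional, Tuple

Triplet = Tuple[str, str, str]    # (machine, configuration, tool)
Plan = List[Tuple[str, Triplet]]  # (operation, triplet) at positions 1..n


def evaluate_plan(plan: Plan, cp, tp, ccm, tcm, ccc, tcc, cct, tct) -> Tuple[float, float]:
    """Return (f_c, f_t) for a process plan.

    cp, tp: dicts keyed by (operation, triplet); ccm, tcm by (machine, machine);
    ccc, tcc by (configuration, configuration); cct, tct by (tool, tool).
    """
    f_c = f_t = 0.0
    held: Dict[str, Triplet] = {}  # last triplet used on each machine
    prev_machine: Optional[str] = None
    for op, k in plan:
        m, c, t = k
        # Processing: cost per time unit multiplied by processing time.
        f_c += cp[op, k] * tp[op, k]
        f_t += tp[op, k]
        # Change of machine between positions j-1 and j.
        if prev_machine is not None and prev_machine != m:
            f_c += ccm[prev_machine, m] * tcm[prev_machine, m]
            f_t += tcm[prev_machine, m]
        # Change of configuration and/or tool on machine m since its last use.
        if m in held:
            _, c_old, t_old = held[m]
            if c_old != c:
                f_c += ccc[c_old, c] * tcc[c_old, c]
                f_t += tcc[c_old, c]
            if t_old != t:
                f_c += cct[t_old, t] * tct[t_old, t]
                f_t += tct[t_old, t]
        held[m] = k
        prev_machine = m
    return f_c, f_t


# Hypothetical data: three operations on two machines.
k1, k2, k3 = ("M1", "C1", "T4"), ("M1", "C2", "T4"), ("M2", "C1", "T2")
plan = [("OP1", k1), ("OP2", k2), ("OP3", k3)]
cp = {("OP1", k1): 2.0, ("OP2", k2): 2.0, ("OP3", k3): 3.0}
tp = {("OP1", k1): 5.0, ("OP2", k2): 4.0, ("OP3", k3): 2.0}
ccm, tcm = {("M1", "M2"): 4.0}, {("M1", "M2"): 3.0}
ccc, tcc = {("C1", "C2"): 1.5}, {("C1", "C2"): 2.0}
print(evaluate_plan(plan, cp, tp, ccm, tcm, ccc, tcc, {}, {}))  # → (39.0, 16.0)
```

Here, 24 cost units and 11 time units come from processing. The configuration change on M1 adds 3 and 2, and the move from M1 to M2 adds 12 and 3.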

Our problem can be formulated as a multi-objective integer linear program (MOILP). Constraint (1) states that exactly one operation is processed at each position of the process plan. Constraint (2) states that each operation is processed exactly once. Constraint (3) states that an operation can be processed only if all its predecessor operations have already been processed. Constraint (4) states that each machine uses exactly one configuration and one tool at a time. Constraint (5) states which configuration and tool are used at position $j$ by machine $m$. Constraints (6) and (7) state, respectively, whether there is a change of machine and a change of configuration and/or tool between positions $j-1$ and $j$. Constraint (8) states that there is only one change of configuration between positions $j-1$ and $j$. Finally, constraint (9) defines the maximal exploitation time.

$$
\begin{aligned}
\text{(MOILP)}\quad & \min f_c, \quad \min f_t, \quad \min f_e \\
\text{s.t.}\quad
& \sum_{i \in O} \sum_{k \in K_i} x_{i,j}^{k} = 1 && \forall j = 1,\dots,n && (1)\\
& \sum_{j=1}^{n} \sum_{k \in K_i} x_{i,j}^{k} = 1 && \forall i \in O && (2)\\
& |P_i| \times \sum_{k \in K_i} x_{i,j}^{k} \le \sum_{i' \in P_i} \sum_{j'=1}^{j-1} \sum_{k' \in K_{i'}} x_{i',j'}^{k'} && \forall i \in O,\ \forall j = 1,\dots,n && (3)\\
& \sum_{k \in K_m} y_{j,k}^{m} = 1 && \forall j = 1,\dots,n,\ \forall m \in M && (4)\\
& \sum_{i \in O} x_{i,j}^{k} \le y_{j,k}^{m} && \forall j = 1,\dots,n,\ \forall m \in M,\ \forall k \in K_m && (5)\\
& \sum_{i \in O} x_{i,j}^{k} + \sum_{i \in O} x_{i,j-1}^{k'} \le cm_{j,M_k,M_{k'}} + 1 && \forall j = 2,\dots,n,\ \forall k, k' \in K && (6)\\
& y_{j,k}^{m} + y_{j-1,k'}^{m} \le cc_{j,k,k'}^{m} + 1 && \forall j = 2,\dots,n,\ \forall m \in M,\ \forall k, k' \in K_m && (7)\\
& \sum_{k \in K_m} \sum_{k' \in K_m} cc_{j,k,k'}^{m} = 1 && \forall j = 1,\dots,n,\ \forall m \in M && (8)\\
& \sum_{k \in K_m} \sum_{i \in O} \sum_{j=1}^{n} x_{i,j}^{k} \times TP_{i,k} \le f_e && \forall m \in M && (9)
\end{aligned}
$$
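Conversely, a candidate plan can be checked against constraints (1) to (3), and its exploitation time can be computed as in constraint (9). In this hypothetical Python sketch, a plan is a list of (operation, triplet) pairs, so constraint (1) holds by construction. The returned value is the smallest $f_e$ that satisfies (9). The function name and data layout are our own assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (machine, configuration, tool)


def check_plan(plan: List[Tuple[str, Triplet]],
               predecessors: Dict[str, Set[str]],
               feasible: Dict[str, Set[Triplet]],
               tp: Dict[Tuple[str, Triplet], float]) -> float:
    """Check constraints (2)-(3) and k in K_i, then return the plan's f_e."""
    ops = [op for op, _ in plan]
    # Constraint (2): every operation of the instance appears exactly once.
    if len(set(ops)) != len(ops) or set(ops) != set(feasible):
        raise ValueError("each operation must be processed exactly once")
    position = {op: j for j, op in enumerate(ops)}
    for op, k in plan:
        if k not in feasible[op]:
            raise ValueError(f"triplet {k} cannot process {op}")
        # Constraint (3): every predecessor must occupy an earlier position
        # (an unknown predecessor is treated as a violation).
        if any(position.get(p, len(ops)) >= position[op]
               for p in predecessors.get(op, ())):
            raise ValueError(f"a predecessor of {op} is not processed before it")
    # Constraint (9): f_e is the largest total processing time on one machine.
    load: Dict[str, float] = defaultdict(float)
    for op, k in plan:
        load[k[0]] += tp[op, k]
    return max(load.values(), default=0.0)


# Hypothetical instance: OP1 must precede OP2 and OP3.
k1, k2, k3 = ("M1", "C1", "T4"), ("M1", "C2", "T4"), ("M2", "C1", "T2")
feasible = {"OP1": {k1}, "OP2": {k2}, "OP3": {k3}}
preds = {"OP2": {"OP1"}, "OP3": {"OP1"}}
tp = {("OP1", k1): 5.0, ("OP2", k2): 4.0, ("OP3", k3): 2.0}
print(check_plan([("OP1", k1), ("OP2", k2), ("OP3", k3)], preds, feasible, tp))  # → 9.0
```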


1.4 Proposed Approaches

In this section, we describe the three developed approaches in detail.

1.4.1 Iterative Multi-Objective Integer Linear Program (I-MOILP)

I-MOILP is a cutting-plane-based approach that enumerates the whole optimal Pareto front. The idea is to solve, at each iteration $iter$, an integer linear program (ILP) whose search space has been reduced by cuts (i.e. constraints). These cuts eliminate the efficient solution generated at iteration $iter-1$, together with all the solutions it dominates. Our approach is described in Algorithm 1. In it, $M_k$ is an upper bound of the $k$th objective, $z_k^{iter}$ is a Boolean variable used to select the objective to optimize at iteration $iter$, and $da$ is the minimum dispersion amount between two solutions for an objective. It is important to note that I-MOILP can be used to tackle other multi-objective optimization problems, provided that the problem is modelled as an integer linear program.

Algorithm 1 Iterative-MOILP
input data
iter = 0
set an empty archive
aggregate the objectives of MOILP
solve MOILP
add solution to the archive
while MOILP is still feasible do
    iter++
    expr = 0
    create variables z_k^iter
    for k = 1, ..., nbObjectives do
        expr = expr + z_k^iter
        add constraint: f_k ≤ (f_k^(iter−1) − da) · z_k^iter + M_k · (1 − z_k^iter)
    end for
    add constraint: expr ≥ 1
    solve MOILP
    add solution to the archive
end while
return archive

Although enumerating the whole optimal Pareto front with I-MOILP can be very time-consuming for large instances, the approach still has undeniable advantages:


1. It can provide the exact number of solutions requested by the decision maker (DM).
2. It can control the dispersion of the provided solutions through the dispersion amount da.
3. It provides the most appealing solutions to the DM from the beginning, when the weights of the objectives are properly defined.
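The cut mechanism of Algorithm 1 can be emulated on a small, explicitly enumerated set of objective vectors. In the Python sketch below, each call to the ILP solver is replaced by an exhaustive search for the candidate that minimizes a weighted sum of the objectives. The weighted sum is our assumption for the aggregation step. The disjunction that the $z_k^{iter}$ variables and the bounds $M_k$ model is checked directly. This illustrates the cutting idea only; it is not the authors' solver-based implementation, and the candidate vectors are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]


def iterative_moilp(candidates: Sequence[Vector], weights: Sequence[float],
                    da: int = 1) -> List[Vector]:
    """Emulate the cut loop of Algorithm 1 on an enumerated solution set.

    Each "solve" is an exhaustive search for the candidate with the smallest
    weighted sum that satisfies every cut added so far. The cut added for an
    archived vector s keeps only vectors f with f_k <= s_k - da for at least
    one objective k, i.e. the disjunction that z_k and M_k linearize in the ILP.
    """
    def aggregated(f: Vector) -> float:
        return sum(w * v for w, v in zip(weights, f))

    archive: List[Vector] = []
    while True:
        feasible = [f for f in candidates
                    if all(any(fk <= sk - da for fk, sk in zip(f, s))
                           for s in archive)]
        if not feasible:  # the MOILP has become infeasible
            return archive
        archive.append(min(feasible, key=aggregated))


# Hypothetical (cost, time) vectors of six feasible plans.
plans = [(3, 5), (4, 4), (5, 3), (4, 5), (6, 6), (3, 6)]
print(iterative_moilp(plans, weights=(1, 1)))        # → [(3, 5), (4, 4), (5, 3)]
print(iterative_moilp(plans, weights=(1, 1), da=2))  # → [(3, 5), (5, 3)]
```

With `da=1` and integer objectives, each cut removes exactly the vectors that are weakly dominated by an archived solution, so the whole Pareto front is enumerated. A larger `da` skips solutions that are too close to those already found, which is how the dispersion of the returned front is controlled.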

1.4.2 Adapted Archived Multi-Objective Simulated-Annealing (AMOSA)

AMOSA is a simulated annealing-based multi-objective optimization algorithm that provides a set of mutually non-dominated solutions for the considered problem. Starting from a random or given initial solution, a local search is performed to generate a new solution. An elaborate procedure, based on the dominance relations between the new solution, the current solution and the archive, then decides whether the new solution is accepted into the archive. Our approach is briefly described in Algorithm 2. Bandyopadhyay et al. [6] give a more detailed description, as well as a complexity study and a comparison with the well-known evolutionary algorithms NSGA-II and PAES on well-known problems from the literature.

Algorithm 2 Adapted AMOSA
input data
initialize Tmax, Tmin, iter, α, temp = Tmax, perturbationRatio
initialize archive
current = random(archive)
while temp > Tmin do
    for i = 0 : iter do
        new = perturb(current)
        depending on the dominance status of new with respect to current and the solutions in the archive: new replaces current and is added to the archive
        delete dominated process plans from the archive
    end for
    temp = α × temp
end while
return archive
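A simplified archived multi-objective simulated annealing loop is sketched below in Python. It follows the structure of Algorithm 2, but its acceptance rule is a plain dominance test. A move that the current solution dominates is accepted with the Metropolis-style probability $\exp(-\delta/temp)$, where $\delta$ is the summed objective deterioration. This is not AMOSA's amount-of-domination rule from [6]. The toy problem (an integer $x$ with objectives $x^2$ and $(x-2)^2$) and all parameter values are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def archived_sa(initial, perturb: Callable, objectives: Callable,
                t_max: float = 100.0, t_min: float = 0.01, alpha: float = 0.9,
                iters: int = 50, seed: int = 0) -> List[Tuple[object, tuple]]:
    """Simplified archived multi-objective SA following Algorithm 2's loop."""
    rng = random.Random(seed)
    current, f_cur = initial, objectives(initial)
    archive = [(current, f_cur)]
    temp = t_max
    while temp > t_min:
        for _ in range(iters):
            new = perturb(current, rng)
            f_new = objectives(new)
            if dominates(f_cur, f_new):
                # Worse move: accept with probability exp(-delta / temp).
                delta = sum(n - c for n, c in zip(f_new, f_cur))
                if rng.random() >= math.exp(-delta / temp):
                    continue
            current, f_cur = new, f_new
            # Archive update: keep only mutually non-dominated, distinct vectors.
            if not any(f == f_new or dominates(f, f_new) for _, f in archive):
                archive = [(s, f) for s, f in archive if not dominates(f_new, f)]
                archive.append((new, f_new))
        temp *= alpha
    return archive


# Hypothetical toy problem: integer x in [-5, 5], minimize x^2 and (x - 2)^2.
def toy_objectives(x: int) -> Tuple[float, float]:
    return float(x * x), float((x - 2) ** 2)


def toy_perturb(x: int, rng: random.Random) -> int:
    return max(-5, min(5, x + rng.choice((-1, 1))))


print(sorted(s for s, _ in archived_sa(5, toy_perturb, toy_objectives)))
```

For a process plan, `perturb` would instead modify the operation sequence or the triplet assignment. The `perturbationRatio` of Algorithm 2 would plausibly control how much of the plan changes, but that mapping is our assumption.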

1.4.3 Adapted Non Dominated Sorting Genetic Algorithm II (NSGA-II)
The Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is a population-based evolutionary algorithm proposed in [13]. It starts from a random initial population, called the parent population, of a given size. At each iteration of NSGA-II, a new population, called the child population, is generated by applying genetic operators (e.g. mutation and crossover) with specified probabilities. The parent population of iteration iter + 1 is the result of an elitist procedure applied to parentPopulation_iter ∪ childPopulation_iter. This elitist procedure relies on a fast non-dominated sorting algorithm combined with a crowding distance sorting. Our approach is described in Algorithm 3; more detailed descriptions of the fast non-dominated sorting and crowding distance sorting algorithms are given in [13].

1 Process Plan Generation for Reconfigurable Manufacturing Systems …

Algorithm 3 Adapted NSGA-II
1: input data
2: initialize populationSize, iteration, p_mutation, mutationRatio, p_crossover
3: randomize parentPopulation
4: for iter = 1 : iteration do
5:   generate childPopulation from parentPopulation
6:   population = parentPopulation ∪ childPopulation
7:   F = fastNonDominatedSorting(population)
8:   for l = 1 : size(F) do
9:     if size(newPopulation) + size(F_l) < populationSize then
10:      newPopulation += F_l
11:    else
12:      crowdingDistanceSorting(F_l)
13:      for k = 1 : size(F_l) do
14:        if size(newPopulation) < populationSize then
15:          newPopulation += F_l^k
16:        else
17:          break
18:        end if
19:      end for
20:    end if
21:  end for
22:  parentPopulation = newPopulation
23: end for
24: return parentPopulation
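The survivor selection of lines 7 to 21 can be sketched as follows, using the fast non-dominated sorting and crowding distance of [13]. This is our own illustration: population members are represented only by their objective vectors (all minimized), and `new_population` is reset at each call, which Algorithm 3 leaves implicit. The sketch tests `<=` in line 9; with the strict `<` of Algorithm 3, a front that exactly fills the population goes through the crowding branch and is still included in full, so the selected set is the same.

```python
from typing import Dict, List, Sequence

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: List[Objectives]) -> List[List[int]]:
    """Return the fronts F_1, F_2, ... as lists of indices into objs."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices dominated by p
    count = [0] * n                     # number of solutions dominating p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
        i += 1
    return fronts[:-1]  # the last front built is always empty


def crowding_distance(objs: List[Objectives], front: List[int]) -> Dict[int, float]:
    """Crowding distance of every index of one front."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep boundary points
        if hi == lo:
            continue
        for a in range(1, len(ordered) - 1):
            dist[ordered[a]] += (objs[ordered[a + 1]][k] - objs[ordered[a - 1]][k]) / (hi - lo)
    return dist


def environmental_selection(objs: List[Objectives], population_size: int) -> List[int]:
    """Lines 7 to 21 of Algorithm 3: fill the next parent population front by front."""
    new_population: List[int] = []
    for front in fast_non_dominated_sort(objs):
        if len(new_population) + len(front) <= population_size:
            new_population.extend(front)
        else:
            dist = crowding_distance(objs, front)
            by_crowding = sorted(front, key=lambda i: dist[i], reverse=True)
            new_population.extend(by_crowding[: population_size - len(new_population)])
            break
    return new_population
```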

1.5 Experimental Results and Analyses
Because the literature lacks benchmarks for process plan generation in a reconfigurable manufacturing environment, our experiments use randomly generated instances. An instance is identified by its number of operations and its number of machines and is denoted nbOperations_nbMachines. To study how the probabilities of the genetic operators influence the convergence of the adapted NSGA-II, we tested several versions. In our case, a version is identified by the probability of its mutation operator (e.g. NSGA-90 is an adapted version of NSGA-II in which 90% of the child population is generated by mutation and the rest by crossover). For readability, we limit the comparisons to 5 versions of the adapted NSGA-II. We distinguish two experimental schemes.

Table 1.3 Average percentages of the number of solutions in the optimal Pareto fronts: I-MOILP vs adapted AMOSA and NSGA-II

| Instance | I-MOILP (%) | AMOSA (%) | NSGA-100 (%) | NSGA-75 (%) | NSGA-50 (%) | NSGA-25 (%) | NSGA-0 (%) |
|---|---|---|---|---|---|---|---|
| 3_3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 4_3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 5_3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 6_4 | 100 | 54.8 | 100 | 100 | 100 | 100 | 52.8 |
| 7_5 | 100 | 35 | 100 | 100 | 100 | 100 | 25 |
| 8_5 | 100 | 55 | 100 | 100 | 100 | 98 | 61.2 |
| 9_5 | 100 | 100 | 100 | 100 | 100 | 100 | 90 |
| 10_5 | 100 | 0 | 95 | 85 | 92.5 | 70 | 2.5 |

1.5.1 Experimental Scheme 1
Adapted AMOSA and the adapted NSGA-100, NSGA-75, NSGA-50, NSGA-25 and NSGA-0 are each executed 10 times, and the average percentage of the number of solutions in the optimal Pareto front is compared with the Pareto front obtained by I-MOILP. Table 1.3 presents the results obtained for instances of the problem under consideration. For instance 10_5, I-MOILP requires 438 seconds of computation. As Table 1.3 shows, the adapted NSGA-II outperforms the adapted AMOSA, and the Pareto fronts obtained by the adapted NSGA-II are, in most cases, optimal. Moreover, for these small instances, the quality of the Pareto fronts deteriorates when the crossover operator is used exclusively (NSGA-0).

1.5.2 Experimental Scheme 2
For medium and large instances, enumerating the whole optimal Pareto front with I-MOILP can be very time-consuming or even impossible. In this case, Figs. 1.2, 1.3, 1.4 and 1.5 compare the Pareto fronts generated by the adapted AMOSA and by the adapted NSGA-100, NSGA-75, NSGA-50, NSGA-25 and NSGA-0. Our experiments confirm that the adapted NSGA-II still performs better than the adapted AMOSA on larger instances. Moreover, the larger the instance, the more efficient the crossover operator becomes, and vice versa. Finally, we attribute the inefficiency of the mutation operator on large instances to the large number of neighbours of a given solution.

Fig. 1.3 Pareto fronts of instance 20_10: AMOSA versus NSGA-II

Fig. 1.4 Pareto fronts of instance 50_20: AMOSA versus NSGA-II

Fig. 1.5 Pareto fronts of instance 100_20: AMOSA versus NSGA-II


1.6 Conclusion
In this paper, we presented and compared an exact approach and two multi-objective evolutionary-based approaches for the process plan generation problem in RMS. Three criteria were considered: the total production cost, the completion time and the maximum exploitation time of the machines. We presented a rich set of experimental results and analyses demonstrating the quality of the three developed approaches. In future work, we plan to consider other criteria, such as greenhouse gas (GHG) emissions, for sustainability purposes. We also plan to extend the problem to two variants: the multi-unit process plan generation problem and the integrated process planning and scheduling (IPPS) problem in a reconfigurable environment.

References
1. A. Azab, H. ElMaraghy, Mathematical modeling for reconfigurable process planning. CIRP Ann. Manuf. Technol. 56(1), 467–472 (2007)
2. A. Bensmaine, M. Dahane, L. Benyoucef, A non-dominated sorting genetic algorithm based approach for optimal machines selection in reconfigurable manufacturing environment. Comput. Ind. Eng. 66(3), 519–524 (2013)
3. A. Bensmaine, M. Dahane, L. Benyoucef, A new heuristic for integrated process planning and scheduling in reconfigurable manufacturing systems. Int. J. Prod. Res. 52(12), 3583–3594 (2014)
4. A. Chaube, L. Benyoucef, M.K. Tiwari, An adapted NSGA-2 algorithm based dynamic process plan generation for a reconfigurable manufacturing system. J. Intell. Manuf. 23(4), 1141–1155 (2012)
5. A.I. Shabaka, H. ElMaraghy, A model for generating optimal process plans in RMS. Int. J. Comput. Integr. Manuf. 21(2), 180–194 (2008)
6. S.S. Bandyopadhyay, U. Maulik, K. Deb, A simulated annealing-based multiobjective optimization algorithm: AMOSA. IEEE Trans. Evol. Comput. 12(3), 269–283 (2008)
7. F. Musharavati, A.S.M. Hamouda, Enhanced simulated-annealing-based algorithms and their applications to process planning in reconfigurable manufacturing systems. Adv. Eng. Softw. 45(1), 80–90 (2012)
8. F.A. Touzout, L. Benyoucef, Multi-objective sustainable process plan generation in a reconfigurable manufacturing environment: exact and adapted evolutionary approaches. Int. J. Prod. Res. 57(8), 2531–2547 (2019)
9. G. Nallakumarasamy, P.S.S. Raja, K.V. Srinivasanand, R. Malayalamurthi, Optimization of operation sequencing in CAPP using superhybrid genetic algorithms-simulated annealing technique. ISRN Mech. Eng. (2011)
10. H. ElMaraghy, Reconfigurable process plans for responsive manufacturing systems, in Digital Enterprise Technology (Springer, 2007), pp. 35–44
11. H. Haddou Benderbal, M. Dahane, L. Benyoucef, Flexibility-based multi-objective approach for machines selection in reconfigurable manufacturing system (RMS) design under unavailability constraints. Int. J. Prod. Res. 55(20), 6033–6051 (2017)
12. H. Haddou Benderbal, M. Dahane, L. Benyoucef, Modularity assessment in reconfigurable manufacturing system (RMS) design: an archived multi-objective simulated annealing-based approach. Int. J. Adv. Manuf. Technol. 94(1–4), 729–749 (2018)
13. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
14. M. Maniraj, V. Pakkirisamy, R. Jeyapaul, An ant colony optimization-based approach for a single-product flow-line reconfigurable manufacturing systems. Proc. Inst. Mech. Eng. Part B: J. Eng. Manuf. 231(7), 1229–1236 (2017)
15. Q. Xia, A. Etienne, J. Dantan, A. Siadat, Reconfigurable machining process planning for part variety in new manufacturing paradigms: Definitions, models and framework. Comput. Ind. Eng. 115, 206–219 (2018)
16. Y. Koren, General RMS characteristics. Comparison with dedicated and flexible systems, in Reconfigurable Manufacturing Systems and Transformable Factories (Springer, 2006), pp. 27–45

Chapter 2

On VNS-GRASP and Iterated Greedy Metaheuristics for Solving Hybrid Flow Shop Scheduling Problem with Uniform Parallel Machines and Sequence Independent Setup Time

Said Aqil and Karam Allali

Abstract In this paper, we present effective metaheuristics for solving the hybrid flow shop scheduling problem with uniform parallel machines and sequence-independent setup times. We implement three metaheuristics: the variable neighborhood search algorithm, the greedy randomized adaptive search procedure and the iterated greedy algorithm. The objective is to minimize the total flow time, taking into account the availability constraints of the machines during scheduling. For each metaheuristic, we choose appropriate parameters for exploring the neighborhood of the current solution in search of the optimal solution. We conducted a simulation study on a set of randomly generated instances to test the effectiveness of the different metaheuristics. The iterated greedy algorithm gives better results than the other two metaheuristics.

Keywords Hybrid flow shop · Sequence independent setup time · Variable neighborhood search · Greedy randomized adaptive search procedure · Iterative greedy algorithm · Total flow time

2.1 Introduction
Fierce competition between companies in the industrial market pushes decision makers to improve their production processes by adopting the tools of modern technology. Workshops are equipped with machines whose performance can degrade over time; to meet customer demand on time, production managers add new machines that perform better than the old ones. The resulting machine parks require careful organization that optimizes the use of all functional machines to meet customer requirements. In this context, we propose a simulation study of a hybrid flow shop (HF) with machine-dependent setup times, also called sequence-independent setup times. The HF problem is widely studied in the scheduling literature [1]. The first study of this type of problem, [2], models a two-stage HF with two identical parallel machines per stage. Exact methods can find optimal solutions only for small problems. In real industrial cases the number of jobs is often much larger, which pushes researchers to propose other solution approaches, essentially heuristics and metaheuristics, which give good results in reasonable time given the complexity of the problem. The first studies generally concern workshops with identical parallel machines. A three-field notation α|β|γ is proposed in [3], and a state of the art of the last decade of HF scheduling is presented in [4]. Few studies have addressed HF problems with uniform parallel machines; the first works dealing with this type of problem are cited in [5]. In contrast, flow shop problems with identical parallel machines [6] or unrelated parallel machines [7] have been studied extensively, with increasingly diverse constraints and optimization criteria.

Including setup times in simulation models is very important for optimizing flow management, and recent work [8] has shown the usefulness of taking this constraint into account in industrial decision-making. Setup time is unproductive and generally affects the scheduling of production batches; mastering and optimally managing it lets a company save time and costs during production. This type of problem has attracted growing attention from researchers in recent years. The proposed solution methods are essentially exact methods for small problems and metaheuristics for large ones; in most cases, researchers develop metaheuristic-based models for their efficiency and speed in finding good solutions. Many metaheuristics have been applied to HF problems, in particular simulated annealing [9, 10], tabu search [11, 12] and genetic algorithms [13, 14], as well as newer nature-inspired algorithms such as the bee colony algorithm [15, 16], the ant colony algorithm [17, 18] and the migratory bird algorithm [19, 20]. In this work, we study a three-stage HF (see Fig. 2.1) with uniform machines and sequence-independent setup times. We propose a set of heuristics based essentially on Johnson’s rule [21] and the NEH algorithm [22], both originally designed for the flow shop problem. The solutions are then improved by three metaheuristics: the variable neighborhood search (VNS) algorithm, the greedy randomized adaptive search procedure (GRASP) and the iterated greedy (IG) algorithm.

S. Aqil (B) · K. Allali
FST, Laboratory Mathematics and Applications, University Hassan II of Casablanca, PO Box 146, Mohammedia, Morocco
e-mail: [email protected]
K. Allali
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_2

S. Aqil and K. Allali

2 On VNS-GRASP and Iterated Greedy Metaheuristics for Solving …


Fig. 2.1 Three stages hybrid flow shop

2.2 Description of the Hybrid Flow Shop Problem
In the hybrid flow shop scheduling problem with uniform parallel machines and sequence-independent setup times, a set $J = \{J_1, \dots, J_n\}$ of jobs is released to the shop, and all jobs follow the same route through a set of $K$ stages. At each stage $k$, a set of machines $M^{(k)} = \{M_{1k}, \dots, M_{m_k k}\}$ is available to process the jobs. Let $v_{ik} \in \mathbb{N}$ be the speed of machine $i$ at stage $k$, so that the processing time of job $J_j$ on machine $i$ of stage $k$ is $p_{ijk} = p_{jk}/v_{ik}$. Let $s_{ik}$ be the sequence-independent setup time of machine $i$ at stage $k$, incurred before the processing of any job. A job sequence $\pi = (\pi_1, \dots, \pi_n)$ is generated in the permutation space, within the neighborhood of the current solution. The completion time of job $\pi_j$ on machine $i$ of stage $k$ is $C_{i\pi_j k}$; when the machine is not specified, the completion time of the job at stage $k$ is denoted $C_{\pi_j,k}$. At stage $k$, the jobs are sorted in ascending order of $C_{\pi_j,(k-1)}$ and processed according to the first-in-first-out (FIFO) rule. Our goal is to minimize the total flow time (TFT) of all jobs, i.e. $\sum_{j=1}^{n} C_{\pi_j 3}$ at the last stage. The objective function is determined by a set of equations giving the completion times of the jobs. At each stage $k$, a job $\pi_j$ is assigned to a single machine. When machine $i$ has not processed any job before, the completion time is

$$C_{i\pi_j k} = \max\{s_{ik},\; C_{\pi_j (k-1)}\} + \frac{p_{jk}}{v_{ik}} \tag{2.1}$$

If machine $i$ has already processed a job, the completion time is

$$C_{i\pi_j k} = \max\{C_{i\pi_\sigma k} + s_{ik},\; C_{\pi_j (k-1)}\} + \frac{p_{jk}}{v_{ik}} \tag{2.2}$$


where $\pi_\sigma$ is the job processed just before $\pi_j$ on machine $i$. Job $\pi_j$ is assigned to the machine $i^* = \arg\min_{1 \le i \le m_k} C_{i\pi_j k}$, which completes it as soon as possible, and its completion time at stage $k$ is

$$C_{\pi_j k} = \min_{1 \le i \le m_k} C_{i\pi_j k} \tag{2.3}$$

The goal is to build a schedule that minimizes the TFT of all jobs. We therefore look for an optimal permutation $\pi^*$ in the set of permutations $\Pi$ of the search space, whose objective function is

$$\text{minimize} \quad TFT = \sum_{j=1}^{n} C_{\pi_j 3} \tag{2.4}$$

Using the standard notation of the scheduling domain, our problem is denoted $HF3\big((QM)^{(k)}\big)_{k=1}^{3} \mid SIST \mid \sum_{j=1}^{n} C_{j3}$.

2.3 Resolution
Hybrid flow shop scheduling with setup times is NP-hard in the strong sense. When the problem is large, researchers therefore favor heuristics and metaheuristics. Here we present a set of heuristics based on priority rules that are widespread in scheduling problems.

2.3.1 Initialization Heuristics

2.3.1.1 Heuristics Based on Johnson’s Rule

To determine an initial solution, we apply Johnson’s rule, originally designed for the two-machine flow shop problem, which we adapt to the three-stage HF problem. Algorithm 1 gives the steps for determining the Johnson sequence.

Algorithm 1 Johnson’s rule
Input: two machines M^1, M^2 and the set J = {J_1, …, J_n}, with processing times p_j^1, p_j^2 of each job on the two machines
Step 1: Build the job set U = {J_j : p_j^1 < p_j^2} and sort its jobs in ascending order of p_j^1 on M^1.
Step 2: Build the job set V = {J_j : p_j^1 ≥ p_j^2} and sort its jobs in descending order of p_j^2 on M^2.
Step 3: Form the Johnson sequence by concatenation: π_J = (U, V).
Output: π_J
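Algorithm 1 translates directly into code. The sketch below is our own illustration; identifying jobs by integers and passing the two processing times as dictionaries are representation choices, not part of the chapter.

```python
from typing import Dict, List, Sequence


def johnson(jobs: Sequence[int], p1: Dict[int, float], p2: Dict[int, float]) -> List[int]:
    """Johnson's rule for the two-machine flow shop (Algorithm 1)."""
    # Step 1: U = jobs with p1 < p2, in ascending order of p1.
    u = sorted((j for j in jobs if p1[j] < p2[j]), key=lambda j: p1[j])
    # Step 2: V = jobs with p1 >= p2, in descending order of p2.
    v = sorted((j for j in jobs if p1[j] >= p2[j]), key=lambda j: p2[j], reverse=True)
    # Step 3: the Johnson sequence is U followed by V.
    return u + v
```

For example, with hypothetical times p^1 = (3, 5, 1, 6, 7) and p^2 = (6, 2, 2, 6, 5) for jobs 1 to 5, U = (3, 1) and V = (4, 5, 2), giving π_J = (3, 1, 4, 5, 2).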


We have implemented two heuristics based on Johnson’s rule, adapted to the three-stage HF problem.
• H1: Calculate the processing times of each job on two virtual machines M^1, M^2 by the following two expressions:

$$p_j^1 = \sum_{k=1}^{2} \min_{1 \le i \le m_k}\left(s_{ik} + \frac{p_{jk}}{v_{ik}}\right), \qquad p_j^2 = \sum_{k=2}^{3} \min_{1 \le i \le m_k}\left(s_{ik} + \frac{p_{jk}}{v_{ik}}\right) \tag{2.5}$$

Construct the sequence by Johnson’s rule using the durations p_j^1, p_j^2.
• H2: Calculate the processing times of each job on two virtual machines M^1, M^2 by the following two expressions:

$$\tilde p_j^1 = \sum_{k=1}^{2} \frac{1}{m_k}\sum_{i=1}^{m_k}\left(s_{ik} + \frac{p_{jk}}{v_{ik}}\right), \qquad \tilde p_j^2 = \sum_{k=2}^{3} \frac{1}{m_k}\sum_{i=1}^{m_k}\left(s_{ik} + \frac{p_{jk}}{v_{ik}}\right) \tag{2.6}$$

Construct the sequence by Johnson’s rule using the durations p̃_j^1, p̃_j^2.

2.3.1.2 Heuristics Based on the NEH Algorithm

We rely on the NEH algorithm to determine a good starting solution. This heuristic is considered among the most important heuristics for scheduling the K-machine flow shop. Its main steps are described in Algorithm 2.

Algorithm 2 NEH algorithm
Input: a K-machine flow shop and a set J = {J_1, …, J_n}, where each job j has processing times p_j1, …, p_jK on the K machines
Step 1: Calculate the total processing time of each job, T(j) = Σ_{k=1}^{K} p_jk, and construct the sequence π = (π_1, …, π_n) by sorting the jobs in descending order of T(j).
Step 2: Schedule the first two jobs π_1, π_2 and keep the better of the two possible partial sequences.
Step 3: Schedule the other jobs:
for j = 3 to n do
  test π_j in every position of the current partial sequence, keep the sequence with the minimum TFT and update π_NEH
end for
Output: π_NEH
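A sketch of Algorithm 2 with a pluggable objective. In this chapter the objective would be the TFT of the three-stage HF and T(j) would come from H3 or H4; to keep the example small and checkable, we swap in a toy single-machine total completion time, which is our assumption and not part of the chapter. Steps 2 and 3 are merged, since keeping the better order of the first two jobs is the same as inserting π_2 into (π_1).

```python
from typing import Callable, Dict, List, Sequence


def neh(jobs: Sequence[int], total_time: Dict[int, float],
        evaluate: Callable[[List[int]], float]) -> List[int]:
    """NEH insertion heuristic (Algorithm 2) with a pluggable objective."""
    # Step 1: sort the jobs by non-increasing total processing time T(j).
    order = sorted(jobs, key=lambda j: total_time[j], reverse=True)
    seq: List[int] = order[:1]
    # Steps 2 and 3: insert each remaining job at the best position.
    for job in order[1:]:
        candidates = [seq[:pos] + [job] + seq[pos:] for pos in range(len(seq) + 1)]
        seq = min(candidates, key=evaluate)  # first best position wins ties
    return seq


def single_machine_flow_time(p: Dict[int, float]) -> Callable[[List[int]], float]:
    """Toy objective (our stand-in): total completion time on one machine."""
    def evaluate(seq: List[int]) -> float:
        t, total = 0.0, 0.0
        for j in seq:
            t += p[j]
            total += t
        return total
    return evaluate
```

For processing times 3, 1 and 2 (jobs 1, 2, 3), the initial order is (1, 3, 2); NEH keeps (3, 1), then inserts job 2 in front, returning (2, 3, 1), the shortest-processing-time order.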

• H3: Determine the smallest total processing time of each job j in the three-stage HF by the following expression:

$$T(j) = \sum_{k=1}^{3} \min_{1 \le i \le m_k}\left(s_{ik} + \frac{p_{jk}}{v_{ik}}\right) \tag{2.7}$$

Table 2.1 Setup time and machine speed per stage

| | M11 | M21 | M12 | M22 | M13 | M23 |
|---|---|---|---|---|---|---|
| Stage | 1 | 1 | 2 | 2 | 3 | 3 |
| v_ik | 1 | 2 | 3 | 1 | 1 | 2 |
| s_ik | 2 | 3 | 3 | 2 | 3 | 4 |

• H4: Determine the total average processing time of each job j in the three-stage HF by the following expression:

$$\tilde T(j) = \sum_{k=1}^{3} \frac{1}{m_k}\sum_{i=1}^{m_k}\left(s_{ik} + \frac{p_{jk}}{v_{ik}}\right) \tag{2.8}$$

For the last two rules, rank the jobs in descending order of T(j) and T̃(j), respectively, and build an initial sequence with the NEH algorithm.

Illustration Example: We consider a problem with 5 jobs in a three-stage HF with two machines per stage. The processing times of the jobs are given by the matrix [p_kj] below, Table 2.1 gives the speed v_ik and setup time s_ik of every machine, and the resulting schedule is shown in Fig. 2.2.

$$[p_{kj}]_{3\times 5} = \begin{bmatrix} 8 & 9 & 8 & 8 & 8 \\ 9 & 12 & 6 & 9 & 6 \\ 10 & 6 & 8 & 4 & 4 \end{bmatrix}$$
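For this example, the quantities used by H1 to H4 can be computed directly from Eqs. (2.5) to (2.8). The sketch below is illustrative; its data layout is our own choice: `P[j]` holds the three processing times of job j, and `V[k]`, `S[k]` list the speeds and setup times of the two machines of stage k, with stages indexed from 0.

```python
from typing import Dict, List, Tuple

# Illustration example (our layout): P[j] = (p_j1, p_j2, p_j3);
# V[k], S[k] = speeds and setup times of the machines of stage k (k = 0, 1, 2).
P: Dict[int, Tuple[int, ...]] = {1: (8, 9, 10), 2: (9, 12, 6), 3: (8, 6, 8),
                                 4: (8, 9, 4), 5: (8, 6, 4)}
V = [(1, 2), (3, 1), (1, 2)]
S = [(2, 3), (3, 2), (3, 4)]


def stage_min(j: int, k: int) -> float:
    """Minimum over the machines i of stage k of s_ik + p_jk / v_ik."""
    return min(s + P[j][k] / v for v, s in zip(V[k], S[k]))


def stage_avg(j: int, k: int) -> float:
    """Average over the machines i of stage k of s_ik + p_jk / v_ik."""
    return sum(s + P[j][k] / v for v, s in zip(V[k], S[k])) / len(V[k])


def h1_times(j: int) -> Tuple[float, float]:
    """Eq. (2.5): virtual two-machine times used by H1."""
    return stage_min(j, 0) + stage_min(j, 1), stage_min(j, 1) + stage_min(j, 2)


def h2_times(j: int) -> Tuple[float, float]:
    """Eq. (2.6): virtual two-machine times used by H2."""
    return stage_avg(j, 0) + stage_avg(j, 1), stage_avg(j, 1) + stage_avg(j, 2)


def h3_order() -> List[int]:
    """Jobs by non-increasing T(j) of Eq. (2.7): the NEH starting order of H3."""
    return sorted(P, key=lambda j: sum(stage_min(j, k) for k in range(3)), reverse=True)


def h4_order() -> List[int]:
    """Jobs by non-increasing average time of Eq. (2.8): the NEH starting order of H4."""
    return sorted(P, key=lambda j: sum(stage_avg(j, k) for k in range(3)), reverse=True)
```

With these data, H1 gives the virtual times (13, 15), (14.5, 14), (12, 13), (13, 12) and (12, 11) for jobs 1 to 5; applying Johnson’s rule to them yields the sequence (3, 1, 2, 4, 5) used in the example. H3 and H4 both rank the jobs as (1, 2, 3, 4, 5) before the NEH insertion.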

Fig. 2.2 The Gantt chart for five jobs and two machines per stage


To visualize the schedule of all jobs in the three-stage HF, we process the Johnson sequence π_J = (3, 1, 2, 4, 5) given by heuristic H1 in our workshop. The completion times at the last stage are C_323 = 13, C_123 = 22, C_213 = 28, C_523 = 29.5 and C_413 = 35 (subscripts: job, machine, stage). Their sum is the TFT of the sequence π_J: TFT(π_J) = 13 + 22 + 28 + 29.5 + 35 = 127.5 time units.
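The completion-time recursion of Eqs. (2.1) to (2.3), with FIFO ordering between stages, can be checked against this example. The decoder below is our own sketch, not the authors' code. Ties are not discussed in the text, so the lowest machine index wins a tie in Eq. (2.3) and jobs with equal completion times keep their previous order; both are assumptions (no ties occur in this example).

```python
from typing import Dict, Sequence, Tuple

# Illustration example: P[j] = processing times of job j at stages 1 to 3;
# V[k][i], S[k][i] = speed and setup time of machine i at stage k (0-indexed).
P = {1: (8, 9, 10), 2: (9, 12, 6), 3: (8, 6, 8), 4: (8, 9, 4), 5: (8, 6, 4)}
V = [(1, 2), (3, 1), (1, 2)]
S = [(2, 3), (3, 2), (3, 4)]


def hf_total_flow_time(seq: Sequence[int],
                       p: Dict[int, Sequence[float]],
                       v: Sequence[Sequence[float]],
                       s: Sequence[Sequence[float]]) -> Tuple[float, Dict[int, float]]:
    """Decode a permutation into a schedule; return (TFT, last-stage completion times)."""
    ready = {j: 0.0 for j in seq}   # C_{pi_j, k-1}; zero before stage 1
    order = list(seq)               # stage-1 order is the permutation itself
    for k in range(len(v)):
        free = [0.0] * len(v[k])    # completion time of the last job on each machine
        done: Dict[int, float] = {}
        for j in order:
            # Eqs. (2.1)-(2.2): free[i] = 0 for an idle machine, so one formula covers both.
            finish = [max(free[i] + s[k][i], ready[j]) + p[j][k] / v[k][i]
                      for i in range(len(v[k]))]
            i_star = min(range(len(finish)), key=finish.__getitem__)  # Eq. (2.3)
            free[i_star] = finish[i_star]
            done[j] = finish[i_star]
        ready = done
        order = sorted(order, key=lambda j: ready[j])  # FIFO for the next stage
    return sum(ready.values()), ready
```

Decoding π_J = (3, 1, 2, 4, 5) with this sketch returns a TFT of 127.5 and last-stage completion times 13, 22, 28, 29.5 and 35 for jobs 3, 1, 2, 5 and 4, matching the values above.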

2.3.2 Metaheuristics
To improve the solutions obtained in the initialization phase by the heuristics based on Johnson’s rule and the NEH algorithm, we apply three metaheuristics: the VNS, GRASP and IG algorithms. Implementing these metaheuristics requires a set of neighborhoods based on swap, insertion and inversion moves, described as follows:
N1: Randomly select two jobs in the sequence and swap their positions.
N2: In the insertion neighborhood, a job at position p and a position q are drawn at random. If q is to the right of p, the job is inserted at position q and the jobs in between are shifted to the left; otherwise, it is inserted at position q and the jobs in between are shifted to the right.
N3: The inversion neighborhood reverses the order of a block of jobs in the current sequence.

2.3.2.1 Variable Neighborhood Search Algorithm

The VNS algorithm differs from iterated local search in that it searches several neighborhoods. The steps of the VNS metaheuristic are given in Algorithm 3.

Algorithm 3 VNS Algorithm
Input: π_0 = (π_1, …, π_n), the initial sequence
π* ← π_0
Define the neighborhoods N_h (h = 1, …, h_max)
while the stopping criterion is not satisfied do
  π ← π*
  h ← 1
  while h ≤ h_max do
    Perturb the current solution: draw π' in the neighborhood N_h(π)
    Apply a local search in N_h(π') and find π''
    if TFT(π'') ≤ TFT(π*) then
      π* ← π''
    else
      h ← h + 1
    end if
  end while
end while
Output: π*


Searching several neighborhoods allows a good exploration around the current solution, which helps reach good solutions. VNS is one of the most widely used metaheuristics in optimization and has been applied to the HF scheduling problem [23]; we apply it here to our problem.
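The three neighborhoods and the VNS loop can be sketched as follows. Several details are our assumptions rather than the chapter's: the local search samples `ls_steps` random neighbours and keeps strict improvements, an improvement resets h to 1 as in the basic VNS scheme, the stopping criterion is a fixed number of iterations, and `f` stands for any sequence evaluator (for instance a decoder of the three-stage HF returning the TFT).

```python
import random
from typing import Callable, List, Sequence

Perm = List[int]


def swap(seq: Sequence[int], a: int, b: int) -> Perm:
    """N1: exchange the jobs at positions a and b."""
    out = list(seq)
    out[a], out[b] = out[b], out[a]
    return out


def insert(seq: Sequence[int], p: int, q: int) -> Perm:
    """N2: move the job at position p to position q; the jobs in between shift."""
    out = list(seq)
    out.insert(q, out.pop(p))
    return out


def invert(seq: Sequence[int], a: int, b: int) -> Perm:
    """N3: reverse the block of jobs between positions a and b (inclusive)."""
    a, b = min(a, b), max(a, b)
    out = list(seq)
    out[a:b + 1] = out[a:b + 1][::-1]
    return out


def random_move(seq: Sequence[int], h: int, rng: random.Random) -> Perm:
    """Draw one random neighbour of seq in N_h, h in {1, 2, 3} (needs >= 2 jobs)."""
    a, b = rng.sample(range(len(seq)), 2)
    return (swap, insert, invert)[h - 1](seq, a, b)


def vns(start: Sequence[int], f: Callable[[Sequence[int]], float],
        h_max: int = 3, iterations: int = 100, ls_steps: int = 20,
        seed: int = 0) -> Perm:
    """Basic VNS: shake in N_h, sampled local search in N_h, then move or try N_{h+1}."""
    rng = random.Random(seed)
    best = list(start)
    for _ in range(iterations):          # stopping criterion: iteration budget
        h = 1
        while h <= h_max:
            shaken = random_move(best, h, rng)        # perturbation in N_h
            for _ in range(ls_steps):                 # sampled local search in N_h
                cand = random_move(shaken, h, rng)
                if f(cand) < f(shaken):
                    shaken = cand
            if f(shaken) < f(best):
                best, h = shaken, 1                   # improvement: back to N_1
            else:
                h += 1
    return best
```

Requiring a strict improvement before resetting h guarantees that the inner loop terminates, since the objective of `best` can only decrease a finite number of times.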

2.3.2.2 Greedy Randomized Adaptive Search Procedure

The second metaheuristic we implement for this problem is GRASP, which has been applied to the HF problem in [24, 25]. Algorithm 4 presents its steps.

Algorithm 4 GRASP with Local Search
Input: π_0 = (π_1, …, π_n), the initial sequence
π* ← π_0, π ← π_0
while the stopping criterion is not satisfied do
  Ω ← π, π ← ∅, and evaluate the completion time of every job π_j ∈ Ω
  while Ω ≠ ∅ do
    C_mini ← min{C_πj : π_j ∈ Ω}, C_maxi ← max{C_πj : π_j ∈ Ω}
    RCL = {π_j : C_πj ∈ [C_mini, C_mini + α × (C_maxi − C_mini)]}
    Select a job π_r from the RCL and choose the position giving the minimum TFT
    π ← π ∪ {π_r}, remove π_r from Ω
  end while
  Apply a local search perturbation giving π' in the neighborhood N_h(π)
  if TFT(π') < TFT(π) then
    π ← π'
    if TFT(π) < TFT(π*) then
      π* ← π
    end if
  else
    if random ≤ exp{−(TFT(π') − TFT(π))/T} then
      π ← π'
    end if
  end if
end while
Output: π*

GRASP is based on two main phases. The first is the construction phase, in which jobs are selected from a restricted candidate list (RCL) defined by the interval of Eq. (2.9):

$$C_{\pi_j,3} \in \left[C_{mini},\; C_{mini} + \alpha \times (C_{maxi} - C_{mini})\right] \tag{2.9}$$

where $C_{mini}$ and $C_{maxi}$ denote, respectively, the minimum and maximum completion times at the last stage in the sequence under construction. The second phase searches the neighborhood of the current solution, using the three neighborhood types $N_h(\pi)$, $h = 1, \dots, 3$, introduced for the VNS algorithm. We reinforce this phase with the acceptance rule of simulated annealing. The proposed model therefore has two parameters, α and T. The parameter α is the key parameter of the construction phase: it shrinks or enlarges the RCL, and we take α ∈ {0.2, …, 0.8}. The parameter T is also a determining factor in the local search phase; we compute it with Eq. (2.10):

$$T = \lambda \times \frac{\sum_{j=1}^{n}\sum_{k=1}^{K} \min_{1 \le i \le m_k}\left(s_{ik} + \frac{p_{jk}}{v_{ik}}\right)}{10 \times K \times n} \tag{2.10}$$

This formula is inspired by the model defined in [26] for the K-machine flow shop, which we adapt to our three-stage HF problem. We vary the coefficient λ such that λ ∈ {0.9, …, 0.98}.
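The two GRASP ingredients specific to this chapter, the RCL of Eq. (2.9) and the simulated-annealing acceptance with the temperature of Eq. (2.10), can be sketched as follows. The indexing (`p[j][k]`, `v[k][i]`, `s[k][i]`) and the function names are our own; this is an illustration, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence


def restricted_candidate_list(completion: Dict[int, float], alpha: float) -> List[int]:
    """Eq. (2.9): jobs whose last-stage completion time lies in
    [C_min, C_min + alpha * (C_max - C_min)]."""
    c_min, c_max = min(completion.values()), max(completion.values())
    threshold = c_min + alpha * (c_max - c_min)
    return [j for j, c in completion.items() if c <= threshold]


def temperature(p: Sequence[Sequence[float]], v: Sequence[Sequence[float]],
                s: Sequence[Sequence[float]], lam: float) -> float:
    """Eq. (2.10), with p[j][k] for jobs and v[k][i], s[k][i] for machines."""
    n, n_stages = len(p), len(v)
    total = sum(min(s[k][i] + p[j][k] / v[k][i] for i in range(len(v[k])))
                for j in range(n) for k in range(n_stages))
    return lam * total / (10 * n_stages * n)


def accept(tft_new: float, tft_cur: float, T: float, rng: random.Random) -> bool:
    """Keep improvements; otherwise accept with probability exp(-(TFT' - TFT) / T)."""
    if tft_new < tft_cur:
        return True
    return rng.random() <= math.exp(-(tft_new - tft_cur) / T)
```

On the illustration example above, the double sum in Eq. (2.10) equals 100.5 (the values T(j) of Eq. (2.7) are 22, 21.5, 20, 19 and 18), so λ = 0.9 gives T = 0.9 × 100.5 / 150 ≈ 0.603.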

2.3.2.3 Iterative Greedy Algorithm with Local Search

The IG algorithm combined with local search also includes two main phases. The first phase consists of a destruction subphase and a construction subphase. The second phase is a local search that explores the neighborhood of the current solution, for which we keep the same model as in the local search phase of the GRASP algorithm. Algorithm 5 details the steps of the IG. This metaheuristic is frequently used in scheduling problems [27]; we apply it to our problem and adapt its parameters to our case study. A starting solution is given by one of the proposed heuristics, and the objective is to improve it iteratively. The implementation requires two basic configuration parameters:
T: the simulated annealing parameter given by Eq. (2.10).
d: the number of jobs extracted in the destruction phase.
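The destruction and construction phases of one IG iteration can be sketched as follows; the local search and the acceptance test then proceed as in GRASP. Here `f` is any sequence evaluator (for this chapter, the TFT of the three-stage HF), and the explicit random generator is our assumption.

```python
import random
from typing import Callable, List, Sequence

Perm = List[int]


def destruct_construct(seq: Sequence[int], d: int,
                       f: Callable[[Sequence[int]], float],
                       rng: random.Random) -> Perm:
    """Destruction and construction phases of one IG iteration (requires d <= len(seq))."""
    partial = list(seq)
    removed: List[int] = []
    # Destruction: extract d jobs chosen at random.
    for _ in range(d):
        removed.append(partial.pop(rng.randrange(len(partial))))
    # Construction: reinsert each extracted job, in extraction order,
    # at the position that gives the smallest objective value.
    for job in removed:
        options = [partial[:pos] + [job] + partial[pos:] for pos in range(len(partial) + 1)]
        partial = min(options, key=f)
    return partial
```

When d equals the number of jobs, the construction rebuilds the sequence from scratch by greedy insertion; with a single-machine total completion time objective and distinct processing times, this always yields the shortest-processing-time order.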

Algorithm 5 IG algorithm with Local Search
Input: π, a sequence obtained by an initialization heuristic
π* ← π
while the stopping criterion is not satisfied do
  π' ← π
  for h = 1 to d do
    Extract a job π_h at random from the sequence π' and add it to the subset of removed jobs
  end for
  for h = 1 to d do
    Take job π_h from the subset, test it in every position of the current sequence π' and insert it at the position giving the smallest TFT
  end for
  Choose π'' from the neighborhood N_k(π')
  if TFT(π'') < TFT(π) then
    π ← π''
    if TFT(π) < TFT(π*) then
      π* ← π
    end if
  else
    if random ≤ exp{−(TFT(π'') − TFT(π))/T} then
      π ← π''
    end if
  end if
end while
Output: π*

2.4 Numerical Simulation

2.4.1 Simulation Instances
To validate the proposed metaheuristics, we carry out a simulation study on a set of test problems, varying the number of jobs, the number of machines per stage and the parameters of each metaheuristic. The test problems fall into two broad categories: the first contains small and medium problems, the second larger problems. The first category is characterized by n ∈ {10, …, 90}, m_k ∈ {2, …, 6} and, for the IG algorithm, a number of extracted jobs d ∈ {2, …, 10}; the processing times p_jk are drawn from a uniform distribution on [1, 49]. The second category uses n ∈ {100, …, 260}, m_k ∈ {6, …, 12} and d ∈ {10, …, 20}, with processing times drawn uniformly from [50, 99]. For both categories, the speeds of the uniform machines are integers v_ik ∈ {1, …, 5} and the setup times satisfy s_ik ∈ [10, 20].
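A sketch of an instance generator following these ranges. The chapter does not say whether times are integers or how n and m_k are chosen within their ranges, so n and the machine counts are passed in explicitly and all values are drawn as uniform integers; both choices are assumptions.

```python
import random
from typing import Dict, List, Sequence


def generate_instance(n: int, m: Sequence[int], category: int,
                      rng: random.Random) -> Dict[str, List[List[int]]]:
    """Random three-stage HF instance with the ranges of Sect. 2.4.1 (integer draws assumed)."""
    p_lo, p_hi = (1, 49) if category == 1 else (50, 99)
    return {
        # p[j][k]: processing time of job j at stage k
        "p": [[rng.randint(p_lo, p_hi) for _ in m] for _ in range(n)],
        # v[k][i]: integer speed of machine i at stage k, in {1, ..., 5}
        "v": [[rng.randint(1, 5) for _ in range(mk)] for mk in m],
        # s[k][i]: setup time of machine i at stage k, in [10, 20]
        "s": [[rng.randint(10, 20) for _ in range(mk)] for mk in m],
    }
```

For example, `generate_instance(40, [5, 4, 4], 1, random.Random(1))` draws a category-1 instance of type 40 × 5 × 4 × 4.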

2.4.2 Experimental Results
The simulated instances serve for a comparative study of the different algorithms. Since the parameters of each metaheuristic vary, we chose cases that highlight significant differences between them. The first analysis compares the three metaheuristics through the relative percentage deviation (RPD) given by Eq. (2.11):

$$RPD = \frac{TFT - TFT^*}{TFT^*} \times 100 \tag{2.11}$$


where TFT is the value found by a metaheuristic and TFT* the minimum value. Each RPD value is the average over 10 instances of each type of test problem, obtained with a computation time of t = ρ × n × max(m_1, m_2, m_3) milliseconds, where ρ ∈ {200, 300, 400}. The second analysis studies the speed of convergence towards the optimal solution. The search time in the neighborhood space is bounded by a time limit that depends on the category of the simulated instances: Time1 = n × max(m_1, m_2, m_3) for the first category and Time2 = (n/2) × max(m_1, m_2, m_3) for the second. Tables 2.2 and 2.3 summarize part of the results for both categories with ρ = 200. The first column gives the size of the instance, the second the value TFT* of the best solution, and the third the time limit used for the convergence curves of the second analysis. The remaining columns give the RPD values of the three metaheuristics (VNS, GRASP, IG) for each of the four initialization heuristics (H_i, i = 1, …, 4). For the first category (Table 2.2), among the 27 simulated problems, the highest RPD value, 6.28%, is recorded by VNS with H1, while the best value, 1.35%, is obtained by IG with H4. In addition, the IG algorithm achieves a success rate of 66.96% over the four initialization heuristics. Likewise, for the second category (Table 2.3), the results show the clear dominance of the IG algorithm: the highest RPD value, 4.34%, is recorded by VNS with H1, and the smallest, 1.05%, by IG initialized with H4. Here IG achieves a success rate of 77.77% over all initialization heuristics.
The convergence speed of the metaheuristics towards the optimal solution is a key factor in the analysis of the results. We report the results for two instances: 40 × 5 × 4 × 4 from category 1, with time limit Time1 = 200 s, and 160 × 12 × 10 × 12 from category 2, with time limit Time2 = 960 s. Figures 2.3, 2.4, 2.5 and 2.6 plot the evolution of the TFT during the search. For all four initialization heuristics, the IG algorithm gives the best results, both in the quality of the solution obtained and in the speed of convergence towards it.
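The RPD of Eq. (2.11) and the time budgets above translate into a few lines. The function names are ours; the Time2 formula uses n/2, which matches the 960 s reported for the 160 × 12 × 10 × 12 instance.

```python
from typing import Sequence


def rpd(tft: float, tft_best: float) -> float:
    """Eq. (2.11): relative percentage deviation from the minimum TFT."""
    return (tft - tft_best) / tft_best * 100.0


def rpd_time_limit_ms(rho: int, n: int, m: Sequence[int]) -> float:
    """CPU budget for the RPD runs: t = rho * n * max(m_1, m_2, m_3) milliseconds."""
    return rho * n * max(m)


def convergence_time_limit_s(n: int, m: Sequence[int], category: int) -> float:
    """Time1 = n * max(m_k) for category 1, Time2 = (n / 2) * max(m_k) for category 2."""
    return n * max(m) if category == 1 else n / 2 * max(m)
```

For instance 40 × 5 × 4 × 4, Time1 = 40 × 5 = 200 s, and with ρ = 200 the RPD runs use t = 200 × 40 × 5 = 40 000 ms.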

2.5 Conclusion
In this paper, we have presented three metaheuristics for the three-stage hybrid flow shop scheduling problem with uniform parallel machines and sequence-independent setup times. The goal is to minimize the total flow time of all jobs, taking machine availability into account during scheduling. The metaheuristics used are the VNS, GRASP and IG algorithms. The complexity of the problem led us to implement four heuristics that will be the basis

Table 2.2 Experimental results of the performance analysis for all algorithms of category 1 (RPD values in %)

Instance (n × m1 × m2 × m3) | TFT* | Time1 (s) | VNS H1 | VNS H2 | VNS H3 | VNS H4 | GRASP H1 | GRASP H2 | GRASP H3 | GRASP H4 | IG H1 | IG H2 | IG H3 | IG H4
10 × 4 × 2 × 3 | 632 | 40 | 1.56 | 4.56 | 3.23 | 3.76 | 4.11 | 3.86 | 2.57 | 3.64 | 2.53 | 1.74 | 1.54 | 2.29
10 × 3 × 3 × 3 | 745 | 40 | 3.88 | 4.88 | 3.12 | 2.11 | 3.55 | 4.22 | 2.88 | 2.78 | 3.44 | 2.35 | 2.71 | 1.35
10 × 2 × 4 × 4 | 625 | 40 | 1.67 | 3.55 | 4.25 | 2.34 | 2.56 | 1.82 | 3.23 | 2.45 | 1.62 | 1.89 | 2.23 | 1.52
20 × 2 × 4 × 4 | 2885 | 80 | 3.45 | 2.78 | 3.78 | 4.23 | 2.22 | 3.12 | 3.45 | 3.56 | 2.24 | 3.28 | 2.67 | 1.45
20 × 3 × 4 × 4 | 2634 | 80 | 3.88 | 5.33 | 3.44 | 2.22 | 3.77 | 3.41 | 3.11 | 2.67 | 3.71 | 5.22 | 2.77 | 1.88
20 × 3 × 5 × 4 | 2478 | 100 | 2.34 | 1.84 | 4.45 | 3.78 | 3.67 | 3.89 | 2.55 | 2.67 | 2.87 | 3.45 | 1.62 | 1.87
30 × 2 × 4 × 5 | 5986 | 150 | 2.23 | 3.34 | 2.56 | 4.42 | 3.45 | 3.56 | 2.45 | 3.31 | 3.66 | 2.12 | 2.23 | 1.63
30 × 3 × 4 × 5 | 5358 | 150 | 4.55 | 4.55 | 3.62 | 2.55 | 4.77 | 4.72 | 2.99 | 4.61 | 4.43 | 4.43 | 2.88 | 2.71
30 × 5 × 3 × 4 | 4358 | 150 | 5.34 | 3.45 | 5.34 | 2.98 | 3.45 | 5.34 | 2.56 | 3.24 | 2.34 | 3.22 | 3.31 | 2.23
40 × 5 × 4 × 3 | 8025 | 200 | 4.56 | 3.12 | 2.83 | 4.45 | 5.23 | 3.21 | 2.65 | 2.45 | 3.21 | 4.56 | 3.67 | 4.23
40 × 3 × 4 × 5 | 8557 | 200 | 5.25 | 5.45 | 3.66 | 3.11 | 5.71 | 5.33 | 3.64 | 4.44 | 4.34 | 4.22 | 4.22 | 3.71
40 × 5 × 4 × 4 | 7854 | 200 | 4.34 | 4.67 | 3.26 | 2.79 | 2.66 | 3.55 | 4.25 | 5.34 | 2.34 | 3.23 | 3.22 | 2.68
50 × 3 × 6 × 4 | 13886 | 300 | 2.11 | 3.36 | 4.25 | 3.56 | 4.11 | 3.11 | 4.31 | 2.12 | 3.21 | 4.55 | 3.45 | 4.23
50 × 6 × 5 × 5 | 12555 | 300 | 4.71 | 5.71 | 3.33 | 3.11 | 4.22 | 5.55 | 3.66 | 2.87 | 4.82 | 5.71 | 3.66 | 3.78
50 × 6 × 6 × 6 | 11567 | 300 | 4.11 | 3.44 | 2.45 | 6.14 | 4.78 | 3.54 | 2.21 | 3.55 | 5.22 | 3.56 | 2.78 | 3.45
60 × 6 × 5 × 5 | 15235 | 360 | 2.23 | 4.56 | 6.2 | 3.58 | 4.34 | 5.21 | 4.45 | 3.78 | 3.34 | 3.35 | 4.23 | 2.11
60 × 6 × 5 × 6 | 14567 | 360 | 4.65 | 4.33 | 4.81 | 4.25 | 4.22 | 4.55 | 5.21 | 5.56 | 4.68 | 4.41 | 4.67 | 4.44
60 × 6 × 5 × 4 | 13563 | 360 | 5.43 | 3.56 | 4.78 | 2.89 | 3.65 | 4.45 | 3.33 | 2.89 | 5.23 | 3.79 | 4.11 | 3.25
70 × 5 × 4 × 6 | 24567 | 420 | 6.11 | 5.21 | 3.33 | 4.25 | 3.25 | 2.88 | 2.68 | 3.65 | 4.52 | 3.57 | 2.34 | 4.45
70 × 6 × 6 × 5 | 22432 | 420 | 4.66 | 5.66 | 5.51 | 5.25 | 4.81 | 5.55 | 4.23 | 5.55 | 4.55 | 3.28 | 3.67 | 3.21
70 × 5 × 6 × 4 | 24589 | 420 | 4.52 | 3.31 | 4.79 | 5.56 | 3.27 | 4.41 | 5.26 | 6.12 | 3.34 | 2.52 | 2.42 | 2.21
80 × 4 × 5 × 6 | 23879 | 480 | 3.45 | 5.65 | 6.71 | 4.41 | 6.23 | 4.98 | 4.65 | 3.31 | 4.15 | 4.45 | 2.78 | 5.25
80 × 5 × 6 × 5 | 23789 | 480 | 3.45 | 6.22 | 3.45 | 4.55 | 3.48 | 6.11 | 3.14 | 4.78 | 3.25 | 6.25 | 3.11 | 4.34
80 × 6 × 6 × 6 | 20456 | 480 | 5.23 | 4.45 | 6.25 | 4.78 | 3.25 | 3.25 | 4.21 | 3.34 | 4.55 | 3.23 | 3.35 | 4.21
90 × 6 × 5 × 5 | 34789 | 540 | 6.28 | 4.55 | 2.33 | 4.66 | 5.24 | 4.55 | 5.21 | 4.53 | 4.88 | 2.72 | 2.57 | 3.56
90 × 5 × 5 × 6 | 33543 | 540 | 2.45 | 5.44 | 2.35 | 3.55 | 2.78 | 5.33 | 2.33 | 3.22 | 2.33 | 5.22 | 2.22 | 3.21
90 × 6 × 6 × 6 | 30564 | 540 | 5.34 | 4.25 | 2.33 | 4.66 | 4.24 | 3.55 | 3.21 | 3.53 | 4.88 | 4.72 | 3.57 | 2.56

28 S. Aqil and K. Allali

Table 2.3 Experimental results of the performance analysis for all algorithms of category 2 (RPD values in %)

Instance (n × m1 × m2 × m3) | TFT* | Time2 (s) | VNS H1 | VNS H2 | VNS H3 | VNS H4 | GRASP H1 | GRASP H2 | GRASP H3 | GRASP H4 | IG H1 | IG H2 | IG H3 | IG H4
100 × 10 × 7 × 10 | 48645 | 500 | 2.56 | 1.76 | 1.86 | 1.86 | 1.95 | 1.85 | 1.54 | 1.88 | 1.42 | 1.44 | 1.68 | 1.05
100 × 11 × 9 × 11 | 44568 | 550 | 1.76 | 1.66 | 1.66 | 2.11 | 1.88 | 1.55 | 2.12 | 1.81 | 1.66 | 1.35 | 1.15 | 1.51
100 × 12 × 8 × 12 | 40234 | 600 | 2.34 | 2.11 | 2.89 | 1.89 | 2.45 | 2.22 | 3.24 | 2.15 | 1.77 | 1.86 | 2.11 | 2.34
120 × 11 × 9 × 10 | 52347 | 660 | 3.12 | 2.78 | 3.45 | 2.76 | 1.87 | 2.22 | 1.86 | 1.69 | 1.83 | 2.45 | 2.35 | 1.67
120 × 9 × 10 × 11 | 51897 | 660 | 1.71 | 1.77 | 1.88 | 2.22 | 2.11 | 1.77 | 2.46 | 2.11 | 1.66 | 1.81 | 1.77 | 1.55
120 × 11 × 11 × 11 | 50675 | 660 | 2.45 | 2.89 | 2.15 | 2.35 | 2.21 | 1.87 | 1.76 | 2.11 | 2.25 | 2.76 | 2.68 | 2.56
140 × 8 × 11 × 10 | 66345 | 770 | 3.22 | 2.67 | 3.21 | 2.28 | 1.83 | 1.45 | 2.45 | 1.85 | 2.22 | 2.16 | 1.77 | 1.85
140 × 10 × 8 × 11 | 65789 | 770 | 2.11 | 2.11 | 2.34 | 2.55 | 2.53 | 2.51 | 2.34 | 2.78 | 1.81 | 1.88 | 1.89 | 1.65
140 × 11 × 11 × 11 | 60897 | 770 | 3.11 | 3.45 | 2.65 | 2.46 | 2.35 | 2.74 | 2.32 | 2.86 | 2.21 | 2.11 | 2.89 | 1.95
160 × 12 × 8 × 10 | 74678 | 960 | 3.45 | 2.25 | 2.45 | 2.22 | 2.11 | 3.25 | 3.45 | 2.66 | 2.98 | 3.15 | 2.56 | 2.88
160 × 12 × 8 × 11 | 71457 | 960 | 2.53 | 2.77 | 2.55 | 3.11 | 2.78 | 2.88 | 2.71 | 2.51 | 2.44 | 2.44 | 2.11 | 2.71
160 × 12 × 10 × 12 | 68674 | 960 | 3.23 | 3.56 | 2.86 | 2.87 | 2.65 | 2.75 | 2.54 | 2.53 | 1.88 | 2.31 | 2.54 | 2.11
180 × 9 × 12 × 10 | 113456 | 1080 | 2.23 | 2.45 | 3.11 | 3.21 | 2.25 | 1.56 | 2.43 | 2.62 | 3.12 | 3.25 | 3.24 | 2.22
180 × 8 × 12 × 11 | 117567 | 1080 | 2.88 | 2.88 | 2.88 | 3.11 | 2.56 | 2.98 | 3.35 | 2.71 | 2.55 | 2.45 | 2.55 | 2.71
180 × 11 × 7 × 12 | 123456 | 1080 | 2.45 | 2.33 | 1.67 | 2.87 | 2.45 | 2.55 | 3.45 | 2.68 | 1.85 | 1.87 | 1.66 | 2.11
200 × 10 × 12 × 10 | 153876 | 1200 | 3.12 | 2.24 | 1.77 | 2.26 | 2.89 | 3.15 | 3.15 | 3.21 | 2.26 | 2.66 | 2.23 | 1.85
200 × 10 × 12 × 9 | 151324 | 1200 | 2.66 | 2.88 | 3.83 | 3.11 | 2.88 | 3.11 | 2.91 | 2.91 | 3.11 | 3.11 | 3.11 | 2.81
200 × 10 × 8 × 12 | 157657 | 1200 | 2.89 | 1.89 | 2.12 | 3.11 | 2.12 | 2.22 | 1.88 | 2.45 | 1.89 | 2.22 | 1.67 | 1.89
220 × 11 × 11 × 12 | 186987 | 1320 | 2.56 | 3.34 | 2.65 | 2.63 | 2.12 | 2.34 | 2.45 | 2.54 | 1.95 | 2.68 | 2.86 | 1.68
220 × 12 × 10 × 9 | 196346 | 1320 | 3.11 | 3.44 | 3.11 | 3.12 | 2.88 | 3.22 | 2.81 | 3.11 | 2.99 | 3.22 | 3.61 | 2.61
220 × 12 × 12 × 11 | 192456 | 1320 | 2.78 | 3.2 | 32.78 | 3.56 | 2.89 | 2.68 | 3.21 | 3.56 | 3.45 | 1.98 | 2.22 | 1.88
240 × 11 × 10 × 8 | 211786 | 1440 | 3.27 | 2.23 | 3.34 | 2.89 | 3.43 | 2.78 | 2.13 | 3.67 | 1.88 | 2.64 | 1.64 | 1.55
240 × 9 × 12 × 12 | 205452 | 1440 | 2.87 | 2.99 | 2.88 | 2.21 | 2.44 | 2.66 | 2.55 | 2.81 | 2.21 | 2.88 | 2.76 | 1.86
240 × 12 × 12 × 12 | 200342 | 1440 | 2.56 | 3.21 | 1.62 | 2.56 | 3.64 | 2.98 | 1.88 | 3.23 | 2.22 | 1.66 | 1.88 | 2.11
260 × 11 × 8 × 12 | 261234 | 1560 | 3.34 | 2.65 | 2.63 | 3.66 | 2.24 | 1.55 | 2.21 | 3.53 | 2.88 | 3.72 | 2.77 | 2.56
260 × 10 × 12 × 12 | 250064 | 1560 | 2.11 | 2.22 | 2.65 | 2.11 | 2.12 | 2.13 | 2.22 | 2.31 | 1.88 | 1.89 | 2.11 | 1.71
260 × 12 × 12 × 12 | 245164 | 1560 | 4.34 | 3.55 | 2.33 | 5.66 | 3.24 | 3.44 | 3.11 | 2.53 | 3.88 | 2.12 | 2.57 | 1.56

2 On VNS-GRASP and Iterated Greedy Metaheuristics for Solving … 29

Fig. 2.3 Evolution of TFT value versus CPU time for heuristics H1 and H2 of category 1 (curves: VNS, GRASP and IG; axes: TFT (×10^4) versus CPU time)

Fig. 2.4 Evolution of TFT value versus CPU time for heuristics H3 and H4 of category 1 (curves: VNS, GRASP and IG; axes: TFT (×10^4) versus CPU time)

Fig. 2.5 Evolution of TFT value versus CPU time for heuristics H1 and H2 of category 2 (curves: VNS, GRASP and IG; axes: TFT (×10^4) versus CPU time)

Fig. 2.6 Evolution of TFT value versus CPU time for heuristics H3 and H4 of category 2 (curves: VNS, GRASP and IG; axes: TFT (×10^4) versus CPU time)

of the three metaheuristics chosen to solve this problem. The initial solutions produced by these heuristics are based on Johnson's algorithm and on the NEH algorithm, both adapted to the studied problem. We conducted a simulation study on a set of instances modeling cases encountered in industry, and both the heuristics and the metaheuristics performed well on them. The final analysis of the simulation shows that the iterated greedy algorithm outperforms the other two metaheuristics, in particular when it starts from initial solutions built with the adapted NEH algorithm.



Chapter 3

A Variable Block Insertion Heuristic for the Energy-Efficient Permutation Flowshop Scheduling with Makespan Criterion

M. Fatih Tasgetiren, Hande Oztop, Quan-Ke Pan, M. Arslan Ornek, and Talya Temizceri

Abstract The permutation flowshop scheduling problem is a well-known problem in the scheduling literature. Although various multi-objective permutation flowshop scheduling problems have been studied, energy consumption is still rarely considered in scheduling. In this paper, we consider a bi-objective permutation flowshop scheduling problem with the objectives of minimizing the total energy consumption and the makespan. We present a bi-objective mixed integer programming model for the problem using a speed-scaling approach. Then, we employ the augmented ε-constraint method to generate the Pareto-optimal solution sets for small-sized instances. For larger instances, we use the augmented ε-constraint method with a time limit on the CPLEX solver to approximate the Pareto frontiers. We also propose a heuristic approach based on a very recent variable block insertion heuristic algorithm. To evaluate the performance of the proposed algorithm, we carried out detailed computational experiments on well-known benchmarks from the literature. We first present the performance of the proposed algorithm on small-sized problems; then we show that it is very effective in solving larger problems compared with the time-limited CPLEX.

M. F. Tasgetiren (corresponding author), Qatar University, Doha, Qatar
H. Oztop · M. A. Ornek, Yasar University, Izmir, Turkey
Q.-K. Pan, Shanghai University, Shanghai, P. R. China
T. Temizceri, Bilgi University, Istanbul, Turkey

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_3


M. F. Tasgetiren et al.

3.1 Introduction

In the permutation flowshop scheduling problem (PFSP), a set of n jobs has to be processed sequentially on m machines, where each job i (i ∈ {1, ..., n}) has a fixed processing time $p_{i,k}$ on machine k (k ∈ {1, ..., m}). All jobs are assumed to be ready at time zero, and the same permutation is used on all machines. The aim is to find a permutation of jobs that minimizes the makespan (maximum completion time). More formally, given a job processing order, i.e., a permutation of jobs σ = {σ1, σ2, ..., σn}, let σi be the job at the ith position of σ, with processing time $p_{\sigma_i,k}$ on machine k. The completion time C(i, k) of job σi on machine k is computed as follows:

$$C(1,1) = p_{\sigma_1,1} \tag{3.1}$$

$$C(i,1) = C(i-1,1) + p_{\sigma_i,1} \qquad \forall i = 2,\dots,n \tag{3.2}$$

$$C(1,k) = C(1,k-1) + p_{\sigma_1,k} \qquad \forall k = 2,\dots,m \tag{3.3}$$

$$C(i,k) = \max\{C(i-1,k),\, C(i,k-1)\} + p_{\sigma_i,k} \qquad \forall i = 2,\dots,n;\ k = 2,\dots,m \tag{3.4}$$
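Because C(i, k) depends only on C(i − 1, k) and C(i, k − 1), the recurrences (3.1)–(3.4) can be evaluated as an O(nm) dynamic program that keeps a single row of completion times. The sketch below uses a hypothetical `makespan` helper and 0-based job and machine indices.

```python
from typing import Sequence


def makespan(perm: Sequence[int], p: Sequence[Sequence[float]]) -> float:
    """Return Cmax(sigma) = C(n, m) by evaluating recurrences (3.1)-(3.4).

    perm -- the permutation sigma, as 0-based job indices
    p    -- p[job][k] is the processing time of job on machine k (0-based)
    """
    m = len(p[0])
    prev = [0.0] * m          # C(i-1, k) for every machine; zeros before the first job
    for job in perm:
        cur = [0.0] * m
        for k in range(m):
            left = cur[k - 1] if k > 0 else 0.0   # C(i, k-1); zero on the first machine
            cur[k] = max(prev[k], left) + p[job][k]
        prev = cur
    return prev[-1]


p = [[3, 2], [1, 4], [2, 1]]       # 3 jobs, 2 machines
print(makespan([0, 1, 2], p))      # 10.0
print(makespan([1, 0, 2], p))      # 8.0
```

Using zeros for the "previous job" and "previous machine" values makes the boundary cases (3.1)–(3.3) fall out of the general rule (3.4).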

Then, the makespan of the solution, Cmax(σ), is the completion time of the last job on the last machine, C(n, m). Note that the PFSP with the makespan criterion is known to be NP-complete [16]. In the PFSP literature, makespan, tardiness and flow time have been widely studied as performance measures to improve production efficiency and customer satisfaction. The most commonly studied criterion is makespan minimization, as it is very important for increasing machine utilization. Consequently, various exact and heuristic solution approaches have been published for this problem; comprehensive reviews of the PFSP with the makespan criterion can be found in [10, 27]. Furthermore, some studies consider several of these production-efficiency objectives simultaneously; a recent review of the multi-objective PFSP is provided in [37]. However, energy efficiency has rarely been considered in production scheduling. Recently, energy consumption has become a key concern for the manufacturing sector because of its negative environmental effects, such as greenhouse gas (CO2) emissions and global warming. As manufacturing facilities consume large amounts of energy, they are under pressure to reduce their consumption by developing energy-efficient scheduling systems [7].

3 A Variable Block Insertion Heuristic …


A detailed energy-efficient scheduling framework is presented in [11]. In [21], it was concluded that a considerable amount of energy could be saved by turning machines off during idle times. Later, this turn-off approach was employed in a single-machine scheduling problem to minimize total energy consumption and total tardiness [20]. Similarly, it was used for the flexible flowshop problem in [3]. However, the major disadvantage of the turn-off approach is that it may shorten the service life of the machines; hence, this strategy may not be applicable in some production systems. Because of this drawback, a speed-scaling strategy was first developed for the energy-efficient FSP [7], in which the processing speed is treated as an independent variable that can be adjusted to increase energy efficiency. Later, a mixed integer programming (MIP) formulation was proposed for the PFSP in [8], with makespan as the objective and peak power consumption as a constraint. The speed-scaling approach was also used in various studies on the energy-efficient PFSP with the makespan objective [1, 5, 13, 17, 18, 39]. A multi-objective NEH and a multi-objective iterated greedy algorithm were presented for the PFSP minimizing total carbon emissions and makespan [5]. Speed scaling was also used for the two-machine PFSP with sequence-dependent setup times, for which lower bounds and heuristic algorithms were proposed in [1, 18]. Later, a mathematical model and a multi-objective evolutionary algorithm were proposed for the energy-efficient PFSP with sequence-dependent setup times in [13]. An energy-efficient backtracking algorithm [17] and a hybrid shuffled frog-leaping algorithm [39] were also presented for the energy-efficient PFSP with the makespan criterion.
Recently, a mixed integer programming model and an iterated greedy algorithm were proposed for the bi-objective PFSP with energy consumption and total flow time criteria [25]. The speed-scaling strategy was also used for other energy-efficient scheduling problems, such as the single machine scheduling problem [9, 32], the job shop scheduling problem [38] and the distributed PFSP [4, 14, 35]. In this study, a bi-objective mixed integer linear programming (MILP) model is developed for the energy-efficient PFSP using a speed-scaling strategy, in order to investigate the trade-off between Cmax and the total energy consumption (TEC). In addition, we propose a novel energy-efficient variable block insertion heuristic (VBIH) algorithm that employs a speed-scaling strategy similar to those proposed in [5, 18]. The authors of [5, 18] used a matrix representation for speed scaling, in which a different speed level can be employed for a job on each machine. In contrast, we employ a simple job-based speed-scaling approach, as in [25], in which the same speed level is used for a job on all machines. The remainder of this paper is organized as follows. Section 3.2 develops the MILP formulation for the bi-objective PFSP. Section 3.3 explains the bi-objective energy-efficient VBIH (EE_VBIH) algorithm. Section 3.4 provides the computational results. Finally, conclusions and future research directions are presented in Sect. 3.5.


3.2 Problem Formulation

In this study, we consider the PFSP with the objectives of Cmax and TEC. As mentioned before, we employ a speed-scaling approach; hence, it is assumed that the machines can process the jobs at different speed levels. As high (low) speed levels result in short (long) processing times and high (low) energy consumption, there is a trade-off between Cmax and TEC. The notation used in the mathematical formulation is given in Table 3.1, and the formulation of the bi-objective PFSP is provided below. Note that we use the energy-efficient PFSP model of [25] with a modified objective function.

Table 3.1 Notation
Sets and indices
$L$: set of speed levels ($l \in L$)
$j$: index for machines ($1 \le j \le M$)
$i, k$: index for jobs ($1 \le i, k \le N$)
Parameters
$p_{ij}$: processing time of job i on machine j
$v_l$: speed factor of speed l
$\lambda_l$: processing conversion factor for speed l
$\varphi_j$: conversion factor for idle time on machine j
$\tau_j$: power of machine j (kW)
$D$: a very large number
Decision variables
$y_{ijl}$: 1 if job i is processed at speed l on machine j; 0 otherwise
$x_{ik}$: 1 if job i precedes job k; 0 otherwise ($i < k$)
$C_{ij}$: completion time of job i on machine j
$\theta_j$: idle time on machine j
$C_{\max}$: maximum completion time (makespan)
$TEC$: total energy consumption (kWh)

$$\min\ \{C_{\max},\ TEC\} \tag{3.5}$$

$$C_{i1} \ge \sum_{l \in L} \frac{p_{i1}}{v_l}\, y_{i1l} \qquad (1 \le i \le N) \tag{3.6}$$

$$C_{ij} - C_{i,j-1} \ge \sum_{l \in L} \frac{p_{ij}}{v_l}\, y_{ijl} \qquad (2 \le j \le M;\ 1 \le i \le N) \tag{3.7}$$

$$C_{ij} - C_{kj} + D\,x_{ik} \ge \sum_{l \in L} \frac{p_{ij}}{v_l}\, y_{ijl} \qquad (1 \le j \le M;\ 1 \le i < k \le N) \tag{3.8}$$

$$C_{ij} - C_{kj} + D\,x_{ik} \le D - \sum_{l \in L} \frac{p_{kj}}{v_l}\, y_{kjl} \qquad (1 \le j \le M;\ 1 \le i < k \le N) \tag{3.9}$$

$$C_{\max} \ge C_{iM} \qquad (1 \le i \le N) \tag{3.10}$$

$$\sum_{l \in L} y_{ijl} = 1 \qquad (1 \le i \le N;\ 1 \le j \le M) \tag{3.11}$$

$$y_{ijl} = y_{i,j+1,l} \qquad (1 \le i \le N;\ 1 \le j < M;\ l \in L) \tag{3.12}$$

$$\theta_j = C_{\max} - \sum_{i=1}^{N} \sum_{l \in L} \frac{p_{ij}}{v_l}\, y_{ijl} \qquad (1 \le j \le M) \tag{3.13}$$

$$TEC = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{l \in L} \frac{p_{ij}\,\tau_j\,\lambda_l}{60\,v_l}\, y_{ijl} + \sum_{j=1}^{M} \frac{\varphi_j\,\theta_j\,\tau_j}{60} \tag{3.14}$$
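For a fixed permutation and a fixed job-based speed vector, the model's two objectives can be evaluated directly instead of solving the MILP. The sketch below is illustrative only: `evaluate`, `SPEED_FACTOR` and `CONVERSION` are hypothetical names, and the numeric speed factors and conversion factors are assumed values, not parameters given in this chapter. Processing times divided by $v_l$ drive the completion times, (3.13) gives each machine's idle time, and (3.14) sums processing and idle energy.

```python
from typing import Sequence, Tuple

# Hypothetical parameter values for illustration only (index 0, 1, 2 = speed level 1, 2, 3).
SPEED_FACTOR = (1.2, 1.0, 0.8)   # v_l: the actual processing time is p_ij / v_l
CONVERSION = (1.5, 1.0, 0.6)     # lambda_l: processing conversion factor


def evaluate(perm: Sequence[int], levels: Sequence[int],
             p: Sequence[Sequence[float]], power: Sequence[float],
             idle_factor: Sequence[float],
             v: Sequence[float] = SPEED_FACTOR,
             lam: Sequence[float] = CONVERSION) -> Tuple[float, float]:
    """Return (Cmax, TEC) for a permutation and a job-based speed vector.

    levels[pos] in {1, 2, 3} is the speed level of job perm[pos]; it is used on every
    machine, as required by (3.12). power[j] is tau_j in kW, idle_factor[j] is phi_j.
    """
    m = len(p[0])
    prev = [0.0] * m
    busy = [0.0] * m                      # total processing time on each machine
    tec = 0.0
    for job, level in zip(perm, levels):
        l = level - 1
        cur = [0.0] * m
        for j in range(m):
            t = p[job][j] / v[l]          # processing time at the chosen speed
            cur[j] = max(prev[j], cur[j - 1] if j > 0 else 0.0) + t
            busy[j] += t
            tec += p[job][j] * power[j] * lam[l] / (60 * v[l])   # first term of (3.14)
        prev = cur
    cmax = prev[-1]
    for j in range(m):
        theta = cmax - busy[j]            # idle time, constraint (3.13)
        tec += idle_factor[j] * theta * power[j] / 60            # second term of (3.14)
    return cmax, tec


p = [[3, 2], [1, 4]]
print(evaluate([0, 1], [2, 2], p, power=[60, 60], idle_factor=[0.5, 0.5]))  # (9.0, 14.0)
```

With these assumed factors, switching both jobs to the fast level lowers Cmax but raises TEC, which is the trade-off the bi-objective model captures.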

Objective function (3.5) minimizes Cmax and TEC. Constraint set (3.6) ensures that the completion time of each job on machine 1 is at least its processing time on machine 1. Constraint set (3.7) states that a job can start on a machine only after its preceding operation has been completed. Constraint sets (3.8) and (3.9) guarantee that either job i precedes job k or vice versa: if $x_{ik} = 1$, (3.9) forces job k to complete on machine j at least its processing time after job i, while (3.8) is relaxed by D; if $x_{ik} = 0$, the roles are exchanged. Constraint set (3.10) computes the makespan, which is the maximum completion time of all jobs on the last machine. Constraint sets (3.11) and (3.12) ensure that exactly one speed level is selected for each job and that the same speed level is used on every machine. Constraint set (3.13) calculates the idle time of each machine, and constraint (3.14) computes the total energy consumption as proposed in [18]. Since the problem is multi-objective, we recall the basic dominance concepts [15] used to solve the energy-efficient PFSP:

• Definition 1: Dominance relation.
– A solution $x_i$ dominates another solution $x_j$ (denoted $x_i \prec x_j$) if the following two conditions are satisfied, where Q is the number of objectives:

$$\forall q \in \{1,\dots,Q\}:\ f_q(x_i) \le f_q(x_j) \quad\text{and}\quad \exists q \in \{1,\dots,Q\}:\ f_q(x_i) < f_q(x_j) \tag{3.15}$$

– A solution $x_i$ weakly dominates another solution $x_j$ (denoted $x_i \preceq x_j$) if:

$$\forall q \in \{1,\dots,Q\}:\ f_q(x_i) \le f_q(x_j) \tag{3.16}$$
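Definition 1 maps directly onto two predicates over objective vectors. The following minimal sketch, with hypothetical function names, assumes both objectives, here (Cmax, TEC), are minimized.

```python
from typing import Sequence


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    """x_a dominates x_b (3.15): no worse in every objective, strictly better in one."""
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))


def weakly_dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    """x_a weakly dominates x_b (3.16): no worse in every objective."""
    return all(a <= b for a, b in zip(fa, fb))


# Objective vectors are (Cmax, TEC), both minimized.
print(dominates((100, 50.0), (100, 55.0)))         # True
print(dominates((100, 50.0), (100, 50.0)))         # False
print(weakly_dominates((100, 50.0), (100, 50.0)))  # True
print(dominates((90, 60.0), (100, 50.0)))          # False: a trade-off
```

Two solutions with identical objective values weakly dominate each other but neither dominates the other, which is why (3.15) needs the strict condition.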

• Definition 2: Non-dominated set.
– Among a set of solutions X, the non-dominated set $X^*$ contains the solutions that are not dominated by any element of X.
• Definition 3: Pareto-optimal set.
– The non-dominated set of the entire feasible search space S is called the Pareto-optimal set.

Common solution methods for multi-objective problems include sequential optimization, goal programming, the weighting method and the ε-constraint method. In this study, we use the augmented ε-constraint method, as it generates only Pareto-optimal solutions [19]. For a Pareto-optimal solution, no objective function can be improved without worsening another. In the augmented ε-constraint method, one objective function is optimized while the other objective functions are expressed as constraints. Unlike the standard ε-constraint method, these objective constraints are converted to equalities with slack/surplus variables, which are added as a second term of the objective function to ensure Pareto-optimality. Further details of the augmented ε-constraint method can be found in [19].
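For illustration only, one way to write the augmented ε-constraint subproblem for this model is sketched below. It assumes that $C_{\max}$ is kept as the optimized objective and TEC is moved into a constraint, since the text does not say which objective is constrained. The slack $s$, the TEC range $r$ taken from the payoff table, and the small constant $\delta > 0$ are symbols introduced here:

$$
\begin{aligned}
\min\quad & C_{\max} - \delta\,\frac{s}{r}\\
\text{s.t.}\quad & TEC + s = \varepsilon, \qquad s \ge 0,\\
& \text{constraints (3.6)–(3.14).}
\end{aligned}
$$

Solving this subproblem for a grid of ε values between the smallest and largest TEC in the payoff table yields one candidate point per ε. The $-\delta\,s/r$ term rewards leftover slack on TEC, which is what prevents the augmented variant from returning weakly Pareto-optimal solutions.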

3.3 Energy-Efficient VBIH Algorithm

Block move-based search algorithms have recently been presented for scheduling problems [12, 28, 31, 33, 34, 36]. Inspired by these studies, we present a bi-objective energy-efficient VBIH (EE_VBIH) algorithm. As explained in Sect. 3.3.1, we first run the variable block insertion heuristic (VBIH) algorithm with the makespan criterion only, to obtain a good initial solution for EE_VBIH. Hence, we briefly describe the single-objective VBIH algorithm. VBIH removes a block of b jobs from the current solution σ and then performs n − b + 1 block insertion moves on the partial solution; we denote this procedure by bMove(σ, b). The best move found by bMove(σ, b) then undergoes a local search. If the solution obtained after the local search is at least as good as the current solution, it replaces the current solution. Otherwise, a simple simulated annealing-type acceptance criterion with a constant temperature is used, as suggested in [24]:

$$T = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}}{10\,n\,m} \times \tau P$$

where τP is a parameter to be adjusted. Initially, the block size is set to b = 1. As long as the search does not worsen, the same block size is retained; otherwise, it is increased by one (b = b + 1). The bMove(σ, b) procedure is repeated until the block size exceeds the maximum block size bmax. The outline of the VBIH algorithm is given in Algorithm 3.1, in which r is a uniform random number in U(0, 1).


Algorithm 3.1 Variable block insertion heuristic
σ = NEH
σbest = σ
while NotTermination do
    b = 1
    repeat
        σ1 = bMove(σ, b)
        σ2 = LocalSearch(σ1)
        if f(σ2) ≤ f(σ) then
            σ = σ2
            if f(σ2) < f(σbest) then
                σbest = σ2
            end if
        else
            b = b + 1
            if r < exp{−(f(σ2) − f(σ))/T} then
                σ = σ2
            end if
        end if
    until b > bmax
end while
return σbest and f(σbest)
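A compact, single-objective Python sketch of Algorithm 3.1 follows. It is not the authors' implementation. Several details are assumptions: the insertion-based `local_search`, the move budget that stands in for NotTermination, and the default values b_max = 3 and τP = 0.4. The constant temperature T follows the formula above, and the start solution is the standard NEH construction.

```python
import math
import random
from typing import List, Sequence, Tuple


def makespan(perm: Sequence[int], p: Sequence[Sequence[float]]) -> float:
    """C(n, m) via recurrences (3.1)-(3.4)."""
    m = len(p[0])
    prev = [0.0] * m
    for job in perm:
        cur = [0.0] * m
        for k in range(m):
            cur[k] = max(prev[k], cur[k - 1] if k > 0 else 0.0) + p[job][k]
        prev = cur
    return prev[-1]


def best_insertion(partial: List[int], block: List[int], p) -> Tuple[List[int], float]:
    """Insert the block at all len(partial) + 1 positions; keep the lowest makespan."""
    best, best_f = partial + block, math.inf
    for pos in range(len(partial) + 1):
        cand = partial[:pos] + block + partial[pos:]
        f = makespan(cand, p)
        if f < best_f:
            best, best_f = cand, f
    return best, best_f


def neh(p) -> List[int]:
    """NEH: order jobs by decreasing total work, insert each at its best position."""
    seq: List[int] = []
    for job in sorted(range(len(p)), key=lambda j: -sum(p[j])):
        seq, _ = best_insertion(seq, [job], p)
    return seq


def b_move(sigma: List[int], b: int, p, rng: random.Random):
    """bMove(sigma, b): remove a random block of b jobs, try n - b + 1 insertions."""
    start = rng.randrange(len(sigma) - b + 1)
    rest = sigma[:start] + sigma[start + b:]
    return best_insertion(rest, sigma[start:start + b], p)


def local_search(sigma: List[int], p):
    """Assumed LocalSearch: best reinsertion of each job until no improvement."""
    cur, cur_f = list(sigma), makespan(sigma, p)
    improved = True
    while improved:
        improved = False
        for job in list(cur):
            cand, f = best_insertion([j for j in cur if j != job], [job], p)
            if f < cur_f:
                cur, cur_f, improved = cand, f, True
    return cur, cur_f


def vbih(p, b_max: int = 3, tau_p: float = 0.4, max_moves: int = 200, seed: int = 0):
    rng = random.Random(seed)
    n, m = len(p), len(p[0])
    b_max = min(b_max, n)
    temperature = sum(map(sum, p)) / (10 * n * m) * tau_p   # constant temperature T
    sigma = neh(p)
    f_sigma = makespan(sigma, p)
    best, f_best = list(sigma), f_sigma
    moves = 0
    while moves < max_moves:                      # NotTermination: a move budget here
        b = 1
        while b <= b_max and moves < max_moves:   # repeat ... until b > bmax
            moves += 1
            s1, _ = b_move(sigma, b, p, rng)
            s2, f2 = local_search(s1, p)
            if f2 <= f_sigma:                     # keep b while the search does not worsen
                sigma, f_sigma = s2, f2
                if f2 < f_best:
                    best, f_best = list(s2), f2
            else:
                b += 1
                if rng.random() < math.exp(-(f2 - f_sigma) / temperature):
                    sigma, f_sigma = s2, f2
    return best, f_best


p = [[5, 9, 8], [9, 3, 10], [9, 4, 5], [4, 8, 8], [3, 5, 6]]
print(vbih(p, seed=1))
```

The move counter is also checked inside the inner loop. Because Algorithm 3.1 accepts equal-valued solutions without increasing b, an unguarded inner loop could otherwise run indefinitely on a plateau.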

As mentioned before, a job-based speed-scaling strategy is used for the energy-efficient VBIH algorithm in this study. To handle it, EE_VBIH uses a multi-chromosome representation composed of a permutation of n jobs (σ) and a speed vector (v) whose entries take one of three levels: 1, 2 and 3, corresponding to the fast, normal and slow speed levels, respectively. The solution representation is given in Fig. 3.1. There, the solution (individual) $x = \begin{pmatrix}\sigma\\ v\end{pmatrix}$ indicates that job σ1 = 5 has a slow speed level (v1 = 3), job σ2 = 2 has a fast speed level (v2 = 1), and so on. The initial population generation, block insertion, local search, crossover, mutation and archive set update procedures of the proposed bi-objective EE_VBIH algorithm are explained in the following subsections.

3.3.1 Initial Population

For an initial population of size NP, the following procedure is used. A solution is constructed by the NEH heuristic [22] and used as the initial solution of the VBIH algorithm with makespan minimization only, as described in Algorithm 3.1.

Fig. 3.1 Solution representation: $x = \begin{pmatrix}\sigma\\ v\end{pmatrix} = \begin{pmatrix}5, 2, 1, 4, 3, \dots, n\\ 3, 1, 2, 1, 2, \dots, 3\end{pmatrix}$

Ten percent of the total CPU time budget is devoted to the VBIH algorithm to obtain a good starting point for EE_VBIH. Once VBIH has found its best solution σbest, the first three individuals of the population are obtained by assigning the fast, normal and slow speed level, respectively, to every job of σbest. The rest of the population is obtained by assigning a random speed level to each job of σbest. The archive set, which is initially empty, is then updated.
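A minimal sketch of this population initialization, assuming σbest has already been produced by VBIH. The helper `initial_population` and the pair layout (σ, v) are hypothetical, and the archive update is left out.

```python
import random
from typing import List, Tuple

# An individual is (sigma, v): v[pos] is the speed level of job sigma[pos],
# with 1 = fast, 2 = normal, 3 = slow (cf. Fig. 3.1).
Individual = Tuple[List[int], List[int]]
FAST, NORMAL, SLOW = 1, 2, 3


def initial_population(sigma_best: List[int], pop_size: int,
                       rng: random.Random) -> List[Individual]:
    """Build NP individuals that all share the VBIH permutation sigma_best."""
    n = len(sigma_best)
    # The first three individuals use a single speed level for every job.
    pop = [(list(sigma_best), [level] * n) for level in (FAST, NORMAL, SLOW)]
    # The remaining individuals draw a random level for each job.
    while len(pop) < pop_size:
        pop.append((list(sigma_best), [rng.randint(1, 3) for _ in range(n)]))
    return pop[:pop_size]


pop = initial_population([5, 2, 1, 4, 3], pop_size=6, rng=random.Random(0))
for sigma, v in pop:
    print(sigma, v)
```

Every individual starts from the same permutation. Diversity therefore comes only from the speed vectors until the block insertion and local search steps start moving jobs.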

3.3.2 Energy-Efficient Block Insertion Procedure

The bMove(x, b) procedure is the core function of the EE_VBIH algorithm. It randomly removes a block of b jobs, together with their speed levels, from the current solution x. The removed block is denoted by $x^b$ and the partial solution left after the removal by $x^p$. The speed levels of the jobs in $x^b$ are then randomly reassigned between 1 and 3. Afterwards, similarly to [6], EE_VBIH applies an additional local search to the partial solution $x^p$ before carrying out the block insertion. Then, bMove(x, b) carries out n − b + 1 block insertion moves, i.e., block $x^b$ is inserted at every possible position of $x^p$, and the best insertion is kept. Note that the dominance rule (≺) is used whenever two solutions and/or partial solutions are compared. The following example illustrates bMove(x, b). Suppose the current solution is $x = \begin{pmatrix}\sigma\\ v\end{pmatrix} = \begin{pmatrix}3, 1, 4, 2, 5\\ 2, 1, 3, 1, 2\end{pmatrix}$ and the block size is b = 2. A block is removed, and the two partial solutions are $x^b = \begin{pmatrix}1, 4\\ 1, 3\end{pmatrix}$ and $x^p = \begin{pmatrix}3, 2, 5\\ 2, 1, 2\end{pmatrix}$. First, the speed levels of $x^b$ are randomly changed, say to $x^b = \begin{pmatrix}1, 4\\ 3, 2\end{pmatrix}$. Then an insertion local search is applied to $x^p$: each job and speed pair is removed from $x^p$ and inserted into all positions except the one it was removed from, and the best non-dominated partial solution is retained. Suppose it is $x^p = \begin{pmatrix}5, 2, 3\\ 2, 1, 2\end{pmatrix}$. Finally, block $x^b$ is inserted into all positions of $x^p$: $\begin{pmatrix}1, 4, 5, 2, 3\\ 3, 2, 2, 1, 2\end{pmatrix}$, $\begin{pmatrix}5, 1, 4, 2, 3\\ 2, 3, 2, 1, 2\end{pmatrix}$, $\begin{pmatrix}5, 2, 1, 4, 3\\ 2, 1, 3, 2, 2\end{pmatrix}$ and $\begin{pmatrix}5, 2, 3, 1, 4\\ 2, 1, 2, 3, 2\end{pmatrix}$. Among these four solutions, the non-dominated one is selected and the archive set is updated.
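The procedure can be sketched in Python with a solution stored as a list of (job, speed level) pairs. This is a sketch under stated assumptions. `evaluate` is any callable returning (Cmax, TEC), and `toy_evaluate` below is a hypothetical stand-in used only to run the code. The optional `partial_search` hook represents the local search on $x^p$. Keeping the first non-dominated insertion when several exist is an assumed tie rule. The archive update is omitted.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]                 # (job, speed level)
Objectives = Tuple[float, float]       # (Cmax, TEC), both minimized


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))


def pick_non_dominated(cands: List[List[Pair]],
                       evaluate: Callable[[List[Pair]], Objectives]):
    """Return the first candidate that no other candidate dominates (assumed tie rule)."""
    scored = [(c, evaluate(c)) for c in cands]
    return next((c, f) for c, f in scored
                if not any(dominates(g, f) for _, g in scored))


def ee_b_move(x: List[Pair], b: int,
              evaluate: Callable[[List[Pair]], Objectives],
              rng: random.Random,
              partial_search: Optional[Callable[[List[Pair]], List[Pair]]] = None):
    """Energy-efficient bMove(x, b) on a solution stored as (job, level) pairs."""
    start = rng.randrange(len(x) - b + 1)
    # Block x^b: same jobs, speed levels redrawn uniformly from {1, 2, 3}.
    block = [(job, rng.randint(1, 3)) for job, _ in x[start:start + b]]
    partial = x[:start] + x[start + b:]            # partial solution x^p
    if partial_search is not None:                 # optional local search on x^p
        partial = partial_search(partial)
    # The n - b + 1 block insertions into x^p.
    cands = [partial[:pos] + block + partial[pos:] for pos in range(len(partial) + 1)]
    return pick_non_dominated(cands, evaluate)


def toy_evaluate(x: List[Pair]) -> Objectives:
    # Hypothetical stand-in for (Cmax, TEC): position-weighted job sum, and an
    # "energy" term that grows as levels get faster (level 1 = fast).
    return (float(sum((pos + 1) * job for pos, (job, _) in enumerate(x))),
            float(sum(4 - level for _, level in x)))


x = [(3, 2), (1, 1), (4, 3), (2, 1), (5, 2)]     # the example solution of the text
print(ee_b_move(x, 2, toy_evaluate, random.Random(7)))
```

A non-dominated candidate always exists in a finite candidate list, because dominance is a strict partial order, so the `next(...)` call cannot run out of items.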


3.3.3 Energy-Efficient Insertion Local Search

For the local search, we employ an insertion neighborhood structure for each individual i in the population. Similarly to the bMove(x, b) procedure, each job and speed pair is removed from the current solution, its speed level is randomly changed, and the pair is inserted into all positions of the current solution. The non-dominated solution is retained and the archive set is updated. Note that the insertion local search has a size of $(n-1)^2$. As an example, consider again the solution $x = \begin{pmatrix}3, 1, 4, 2, 5\\ 2, 1, 3, 1, 2\end{pmatrix}$. The first job and its speed level, $\begin{pmatrix}3\\ 2\end{pmatrix}$, are removed from the current solution x, and the speed level is randomly changed to another level, say $\begin{pmatrix}3\\ 1\end{pmatrix}$. The pair is then inserted into all possible positions of x: $\begin{pmatrix}3, 1, 4, 2, 5\\ 1, 1, 3, 1, 2\end{pmatrix}$, $\begin{pmatrix}1, 3, 4, 2, 5\\ 1, 1, 3, 1, 2\end{pmatrix}$, $\begin{pmatrix}1, 4, 3, 2, 5\\ 1, 3, 1, 1, 2\end{pmatrix}$, $\begin{pmatrix}1, 4, 2, 3, 5\\ 1, 3, 1, 1, 2\end{pmatrix}$ and $\begin{pmatrix}1, 4, 2, 5, 3\\ 1, 3, 1, 2, 1\end{pmatrix}$. Among these five solutions, the non-dominated one is selected and the archive set is updated. This is repeated for the next job and speed pair until the last pair has been inserted into all possible positions.
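One pass of this local search can be sketched as follows, mirroring the worked example: remove a pair, switch it to a different speed level, and try every insertion position. The text does not state exactly when the selected insertion replaces the current solution. This sketch assumes it does so only when it dominates the current solution. `toy_evaluate` is a hypothetical stand-in for (Cmax, TEC), and the archive update is omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[int, int]                 # (job, speed level)


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))


def ee_insertion_local_search(x: List[Pair], evaluate: Callable, rng: random.Random):
    """One pass of the energy-efficient insertion local search over all job/speed pairs.

    Assumption: the selected insertion replaces the current solution only if it
    dominates it; the archive update is omitted.
    """
    cur, f_cur = list(x), evaluate(x)
    for job, level in x:                                   # pairs in their original order
        partial = [pair for pair in cur if pair[0] != job]
        new_level = rng.choice([lv for lv in (1, 2, 3) if lv != level])  # another level
        cands = [partial[:pos] + [(job, new_level)] + partial[pos:]
                 for pos in range(len(partial) + 1)]
        scored = [(c, evaluate(c)) for c in cands]
        # First insertion that no other insertion of this pair dominates.
        cand, f = next((c, g) for c, g in scored
                       if not any(dominates(h, g) for _, h in scored))
        if dominates(f, f_cur):
            cur, f_cur = cand, f
    return cur, f_cur


def toy_evaluate(x: List[Pair]):
    # Hypothetical stand-in for (Cmax, TEC), used only to exercise the sketch.
    return (float(sum((pos + 1) * job for pos, (job, _) in enumerate(x))),
            float(sum(4 - level for _, level in x)))


x = [(3, 2), (1, 1), (4, 3), (2, 1), (5, 2)]
print(ee_insertion_local_search(x, toy_evaluate, random.Random(5)))
```

Because each accepted move dominates the previous solution, the returned solution either equals the input or dominates it.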

3.3.4 Energy-Efficient Uniform Crossover and Mutation

We also employ a local search based on a uniform crossover operator that acts only on the speed levels. Note that, for the same permutation, any change in the speed levels yields a different solution in terms of Cmax and TEC. For this reason, after the block insertion and local search procedures above have been applied to each individual, the permutation of each individual in the population is kept unchanged and a uniform crossover is carried out on its speed levels as follows. For each individual $x_i$ in the population, another individual $x_k$ is selected randomly from the population. The new solution $x^{new}$ is then obtained by taking each speed level either from $x_i$ or from $x_k$ according to a crossover probability pC[i]:


M. F. Tasgetiren et al.
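
$$x^{new} = \begin{pmatrix}\sigma\\ v\end{pmatrix}, \qquad x^{new}(v_j) = \begin{cases} x_i(v_{ij}) & \text{if } r_{ij} \le pC[i] \\ x_k(v_{kj}) & \text{otherwise} \end{cases} \qquad \forall j \in \{1, \ldots, n\} \tag{3.17}$$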

where $r_{ij}$ is a uniform random number in U(0, 1) and pC[i] is the crossover probability of individual $x_i$, drawn from the normal distribution N(0.5, 0.1). If $x^{new}$ dominates $x_i$ ($x^{new} \prec x_i$), $x_i$ is replaced by $x^{new}$ and the archive set is updated. This is repeated for all individuals in the population. After the uniform crossover has been carried out for all individuals, the population is mutated by lowering the speed levels with a small mutation probability as follows:

$$x_i = \begin{pmatrix}\sigma\\ v\end{pmatrix}, \qquad x_i(v_{ij}) = \begin{cases} 1 + (\mathrm{rand}() \bmod 2) & \text{if } r_{ij} \le pM[i] \\ x_i(v_{ij}) & \text{otherwise} \end{cases} \qquad \forall j \in \{1, \ldots, n\};\ i = 1, \ldots, NP \tag{3.18}$$

where $r_{ij}$ is a uniform random number in U(0, 1) and pM[i] is the mutation probability of individual $x_i$, drawn from the normal distribution N(0.05, 0.01).
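Equations (3.17) and (3.18) can be sketched in Python as below. This is a minimal illustration, not the authors' code: only the speed-level lists are handled because the permutation stays fixed, the dominance check and archive update are omitted, and two details are our assumptions: the second parameter of N(·, ·) is read as a standard deviation, and the sampled probabilities are clipped to [0, 1].

```python
import random
from typing import List


def draw_rate(rng: random.Random, mean: float, sd: float) -> float:
    """Per-individual probability drawn from N(mean, sd); clipping to [0, 1] is an assumption."""
    return min(1.0, max(0.0, rng.gauss(mean, sd)))


def uniform_crossover_speeds(v_i: List[int], v_k: List[int], pc: float,
                             rng: random.Random) -> List[int]:
    """Eq. (3.17): keep x_i's speed level where r_ij <= pC[i], otherwise take x_k's."""
    return [a if rng.random() <= pc else b for a, b in zip(v_i, v_k)]


def mutate_speeds(v_i: List[int], pm: float, rng: random.Random) -> List[int]:
    """Eq. (3.18): with probability pM[i], reset a speed level to 1 + rand() % 2 (level 1 or 2)."""
    return [1 + rng.randrange(2) if rng.random() <= pm else s for s in v_i]


rng = random.Random(42)
pc = draw_rate(rng, 0.5, 0.1)    # pC[i] ~ N(0.5, 0.1)
pm = draw_rate(rng, 0.05, 0.01)  # pM[i] ~ N(0.05, 0.01)
child = uniform_crossover_speeds([2, 1, 3, 1, 2], [1, 3, 2, 2, 3], pc, rng)
print(child, mutate_speeds(child, pm, rng))
```

Drawing pC[i] and pM[i] once per individual, as the text describes, gives each individual its own recombination and mutation intensity.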

3.3.5 Archive Set

An archive set is used to store the non-dominated solutions found during the optimization process. It is updated with new non-dominated solutions in order to approximate the Pareto-optimal solutions: when a new non-dominated solution is obtained, it is added to the archive set, and any member dominated by it is removed.

3.3.5.1 Archive Set Update

To update the archive set Ω, Pan et al. [26] proposed an effective method. The non-dominated solutions in Ω are stored in increasing order of their first objective function values; since no member of Ω dominates another, their second objective values are then in decreasing order. The procedure for updating the archive set is summarized in Algorithm 3.2.



Algorithm 3.2 Archive set update
Step 1. Let s = |Ω| be the archive size and Ω = {a_1, a_2, ..., a_s}. Initially, Ω is empty, and the first non-dominated solution x is added to the first position of Ω.
Step 2. Find the most appropriate position pos for the next individual x in the archive set Ω, using a binary search on f1 with search bounds k and h:
    k = 1, h = s
    repeat
        j = ⌊(k + h)/2⌋
        if f1(x) = f1(a_j) then
            break
        else if f1(x) < f1(a_j) then
            h = j − 1
        else
            k = j + 1
        end if
    until k > h
Step 3. When comparing f1(x) with f1(a_j), the following cases may occur:
    Case 1. if f1(x) = f1(a_j) and f2(x) < f2(a_j) then
                pos = j (x replaces a_j)
            end if
    Case 2. if f1(x) < f1(a_j) then
                if j = 1 then
                    pos = j, s = s + 1
                else if j > 1 and f2(x) < f2(a_{j−1}) then
                    pos = j, s = s + 1
                end if
            end if
    Case 3. if f1(x) > f1(a_j) and f2(x) < f2(a_j) then
                pos = j + 1, s = s + 1
            end if
If one of the cases above is satisfied, x is inserted at position pos, and all solutions in Ω dominated by x are eliminated as follows (otherwise, x is dominated and is discarded):
    Step 1. if pos = s then go to Step 4
    Step 2. Let pos = pos + 1. If f2(a_pos) ≥ f2(x) then remove a_pos, else go to Step 4
    Step 3. if pos < s then go to Step 2
    Step 4. Report Ω as the set of non-dominated solutions
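The same update can be sketched compactly in Python with the standard `bisect` module. This is our own sketch of the idea, not the code of [26]: it assumes both objectives are minimized (as Cmax and TEC are here), stores each archive member as an (f1, f2) tuple, and rejects a candidate that is weakly dominated by an existing member, so duplicate objective vectors are not stored.

```python
import bisect
from typing import List, Tuple

Point = Tuple[float, float]  # (f1, f2); both objectives are minimized


def update_archive(archive: List[Point], x: Point) -> bool:
    """Insert x into an archive kept in increasing f1 (hence decreasing f2) order.

    Returns True if x is added; archive members dominated by x are removed.
    """
    f1, f2 = x
    # First position whose f1 is >= f1(x) (binary search, as in Step 2).
    pos = bisect.bisect_left([a[0] for a in archive], f1)
    # The predecessor has a smaller f1, so x survives only if its f2 is strictly smaller.
    if pos > 0 and archive[pos - 1][1] <= f2:
        return False
    # A member with the same f1 must be strictly worse in f2 (Case 1).
    if pos < len(archive) and archive[pos][0] == f1 and archive[pos][1] <= f2:
        return False
    # Successors have f1 >= f1(x); those with f2 >= f2(x) are dominated by x.
    end = pos
    while end < len(archive) and archive[end][1] >= f2:
        end += 1
    archive[pos:end] = [x]  # insert x and drop the dominated run in one slice assignment
    return True


archive: List[Point] = []
for point in [(5, 5), (3, 8), (4, 6), (4, 7), (6, 6), (2, 5)]:
    print(point, update_archive(archive, point), archive)
```

Because f2 decreases along the archive, the dominated members always form a contiguous run right after the insertion point, which is why a single scan suffices.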


3.3.5.2 Crowding Distance

For a solution in Ω, the crowding distance is the sum, over both objective functions, of the normalized distances between its previous and next neighbors. The extreme solutions have their crowding distance set to infinity. The larger the crowding distance, the sparser the neighborhood of the solution. Given the storage structure of Ω (increasing f1, decreasing f2), the crowding distance of a non-dominated solution $a_j$ is computed as follows:

$$cD_j = \begin{cases} \infty & \text{if } j = 1 \text{ or } j = s \\ \dfrac{f_1(a_{j+1}) - f_1(a_{j-1})}{f_1(a_s) - f_1(a_1)} + \dfrac{f_2(a_{j-1}) - f_2(a_{j+1})}{f_2(a_1) - f_2(a_s)} & \text{otherwise} \end{cases} \tag{3.19}$$
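Equation (3.19) translates directly into a short function. The sketch below assumes, as in the archive sketch conventions, that the archive is a list of (f1, f2) tuples already sorted by increasing f1; with at least three mutually non-dominated members, both normalizing ranges are positive. The function name is our own.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (f1, f2) of an archive member


def crowding_distances(archive: List[Point]) -> List[float]:
    """Eq. (3.19) for an archive sorted by increasing f1 (and hence decreasing f2)."""
    s = len(archive)
    cd = [math.inf] * s  # the extreme solutions (j = 1 and j = s) keep an infinite distance
    if s < 3:
        return cd
    f1_range = archive[-1][0] - archive[0][0]  # f1(a_s) - f1(a_1)
    f2_range = archive[0][1] - archive[-1][1]  # f2(a_1) - f2(a_s)
    for j in range(1, s - 1):
        cd[j] = ((archive[j + 1][0] - archive[j - 1][0]) / f1_range
                 + (archive[j - 1][1] - archive[j + 1][1]) / f2_range)
    return cd


print(crowding_distances([(0, 10), (2, 6), (3, 5), (10, 0)]))
```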

3.4 Computational Results

To test the performance of the algorithms, extensive experimental evaluations are carried out on Taillard's benchmark suite [29]. In this paper, we employ only the first 60 instances, ranging from 20 jobs and 5 machines to 50 jobs and 20 machines (20 × 5, 20 × 10, 20 × 20, 50 × 5, 50 × 10 and 50 × 20). In addition, because of the computational difficulty of the studied bi-objective problem, we generate 30 small-sized instances with 5 jobs and 5, 10 and 20 machines by truncating the 20 × 5, 20 × 10 and 20 × 20 instances. The energy parameters are set to {1.2, 1, 0.8} for the speed factors and λ_l = {1.5, 1, 0.6} for the energy consumption factors of the fast, normal and slow speed levels, respectively [18]. All machines are assumed to have the same power (60 kW), and the conversion factor for idle time is taken as 0.05 [18]. All instances are solved with the augmented ε-constraint method using IBM ILOG CPLEX 12.6.3 on a Core i7 2.60 GHz computer with 8 GB RAM. We minimize Cmax while treating TEC as a constraint. First, the range of each objective function is obtained from the payoff table using lexicographic optimization. Then, the single-objective model is solved repeatedly, tightening the constraint on TEC by a specific ε level at each step. Choosing an ε level of $10^{-3}$, we attain very close approximations of the Pareto-optimal frontiers of the problems with 5 jobs (5 × 5, 5 × 10 and 5 × 20); we refer to these sets as the Pareto-optimal solution sets (P). Because solution times grow exponentially, for larger instances we find non-dominated solution sets using a larger ε level, obtained by partitioning the range of the TEC objective function into 20 equal grids, and we set a 3-minute time limit for each iteration. The EE_VBIH algorithm is coded in C++ in Microsoft Visual Studio 2013. The population size is NP = 100.
For the VBIH with makespan minimization only, the maximum block size is set to b_max = 5 and the temperature for the acceptance criterion to τP = 0.4 after preliminary trials. The same b_max = 5 setting is used for the EE_VBIH algorithm. Five replications are carried out for each instance. In each replication, the algorithm is run for 10 × n × m ms for small instances and 30 × n × m ms for larger instances, where n is the number of jobs and m is the number of machines. Note that the archive size is initially set to s = 5 × NP in each replication. After the five replications, only the non-dominated solutions in Ω are kept, since a solution found in one replication can dominate a solution found in another. Because the objective functions are real-valued, a large number of non-dominated solutions can be generated during the five replications. We therefore compute the crowding distances of all these solutions and report at most s = 100 solutions, selected according to their crowding distances.

As stated in [23], there are three key criteria to measure the quality of a non-dominated solution set: its cardinality, its proximity to the Pareto-optimal frontier, and the distribution of its solutions. Hence, in this study, we assess the performance of the proposed MILP model and the EE_VBIH algorithm with respect to these three criteria. Since close approximations of the Pareto-optimal frontiers are available for the instances with five jobs, we use the following performance measures to evaluate the solution quality of the EE_VBIH algorithm, where I denotes the non-dominated solution set of the EE_VBIH algorithm:

• Ratio of the Pareto-optimal solutions found: $R_p = |I \cap P| / |P|$.
• Inverted Generational Distance [2]: $IGD = \sum_{v \in P} d(v, I) / |P|$, where d(v, I) is the minimum Euclidean distance between v and the closest solution in I. A low IGD value means that the set I is very close to the set P.
• Distribution Spacing [30]: $DS_I = \left[\frac{1}{|I|}\sum_{i \in I}(d_i - \bar{d})^2\right]^{1/2} \Big/ \bar{d}$, where $\bar{d} = \sum_{i \in I} d_i / |I|$ and $d_i$ is the minimum Euclidean distance between solution i and its closest neighbor in I. A low spacing value shows that the solutions in I are uniformly distributed.

Table 3.2 reports the average results of the R_p, IGD and DS measures for each small-sized instance set, each containing 10 instances.

Table 3.2 Comparison of EE_VBIH and CPLEX on small-sized instances

Instance set (n × m)   R_p      IGD       DS_I
5 × 5                  0.9820   0.00003   0.7000
5 × 10                 0.8170   0.00023   0.8346
5 × 20                 0.6360   0.00055   0.8902
Average                0.8117   0.00027   0.8083
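The three small-instance measures can be sketched in Python as below. This is a minimal sketch under stated assumptions: objective vectors are tuples, R_p matches vectors exactly (in practice a tolerance may be needed for real-valued objectives), and distances are computed on raw objective values because the chapter does not state whether the objectives are normalized. DS_I is undefined when |I| < 2 or when all nearest-neighbor distances are zero.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # objective vector, e.g. (Cmax, TEC)


def ratio_pareto_found(I: List[Point], P: List[Point]) -> float:
    """R_p = |I ∩ P| / |P| with exact matching of objective vectors."""
    return len(set(I) & set(P)) / len(P)


def igd(P: List[Point], I: List[Point]) -> float:
    """Mean Euclidean distance from each v in P to its closest solution in I."""
    return sum(min(math.dist(v, a) for a in I) for v in P) / len(P)


def distribution_spacing(I: List[Point]) -> float:
    """DS_I: standard deviation of nearest-neighbor distances divided by their mean."""
    d = [min(math.dist(a, b) for k, b in enumerate(I) if k != i) for i, a in enumerate(I)]
    d_bar = sum(d) / len(d)
    return math.sqrt(sum((di - d_bar) ** 2 for di in d) / len(d)) / d_bar


P = [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)]  # toy "Pareto-optimal" set
I = [(0.0, 2.0), (1.0, 1.0)]              # toy heuristic set
print(ratio_pareto_found(I, P), igd(P, I), distribution_spacing(P))
```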
As shown in Table 3.2, the EE_VBIH algorithm finds approximately 82% of the Pareto-optimal solutions on average, and for eight instances it finds all of them. Furthermore, the average IGD value of the EE_VBIH algorithm is very low (0.00027), which indicates that EE_VBIH provides very close approximations of the Pareto-optimal solution sets. The low DS values also indicate that the solutions in I are evenly distributed. For large instances, the non-dominated solution sets of the EE_VBIH algorithm (I) and of time-limited CPLEX (M) are compared with each other in terms of the performance metrics below and the DS metric defined above (averaged over ten instances).



Table 3.3 Comparison of EE_VBIH and CPLEX on large instances

Instance set (n × m)   |M|      |I|       C_MI    C_IM    DS_M    DS_I
20 × 5                 18.300   100.000   0.057   0.651   0.181   2.019
20 × 10                16.300   93.800    0.013   0.786   0.212   2.481
20 × 20                16.900   83.700    0.020   0.808   0.301   2.289
50 × 5                 14.400   100.000   0.000   0.866   0.415   4.428
50 × 10                9.000    80.700    0.001   0.957   0.737   4.271
50 × 20                6.900    67.100    0.000   1.000   0.941   4.115
Average                13.633   87.550    0.015   0.845   0.464   3.267

• Cardinality: the number of non-dominated solutions found.
• Coverage of two sets (C) [40]: $C_{IM} = |\{m \in M : \exists\, i \in I,\ i \preceq m\}| / |M|$, where $i \preceq m$ means that i weakly dominates m; $C_{IM}$ equals 1 if every solution of M is weakly dominated by some solution of I.

Table 3.3 reports the average results for M and I for each large instance set, each containing 10 instances. As shown in the table, EE_VBIH generates more than six times as many non-dominated solutions as time-limited CPLEX on average (87.55 versus 13.63), within reasonable computation times. Furthermore, EE_VBIH performs much better than time-limited CPLEX in terms of the coverage metric, since about 85% of the solutions of M are weakly dominated by some solutions of I. In particular, in 17 out of 60 instances, every solution of M is weakly dominated by some solution of I. Conversely, solutions of M weakly dominate only about 1.5% of the solutions in I. In terms of distribution spacing, the solutions in M are distributed more uniformly than those in I, since a fixed ε level is employed by the augmented ε-constraint method in time-limited CPLEX.
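The coverage metric can be sketched as follows, assuming (as in this chapter) that both objectives are minimized and that each solution is represented by its objective vector; the function names are our own. With this convention, $C_{IM}$ is `coverage(I, M)` and $C_{MI}$ is `coverage(M, I)`; note that the metric is not symmetric, which is why Table 3.3 reports both directions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # objective vector, both objectives minimized


def weakly_dominates(a: Point, b: Point) -> bool:
    """a weakly dominates b when a is no worse than b in every objective."""
    return all(x <= y for x, y in zip(a, b))


def coverage(A: List[Point], B: List[Point]) -> float:
    """C(A, B): fraction of the members of B weakly dominated by at least one member of A."""
    return sum(1 for b in B if any(weakly_dominates(a, b) for a in A)) / len(B)


I = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]  # toy heuristic set
M = [(2.0, 3.0), (3.0, 3.0), (5.0, 0.5)]  # toy solver set
print(coverage(I, M), coverage(M, I))
```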

3.5 Conclusions

This paper presents an energy-efficient PFSP that minimizes makespan and total energy consumption using a simple job-based speed scaling strategy. We develop a bi-objective MILP model and an efficient bi-objective VBIH algorithm for the problem. To test the performance of the proposed approaches, we generate small-sized instances from Taillard's benchmarks and obtain Pareto-optimal solution sets using the MILP model. For larger instances, we employ a time-limited CPLEX approach to find non-dominated solutions for each instance. The results for small instances show that the EE_VBIH algorithm is able to find approximately 82% of the Pareto-optimal solutions, and that it finds all of them for eight instances. For larger instances, EE_VBIH generates more than six times as many non-dominated solutions as time-limited CPLEX within reasonable computation times. Furthermore, EE_VBIH performs much better than time-limited CPLEX in terms of the coverage metric, as the EE_VBIH algorithm weakly dominates 85% of the solutions of time-limited CPLEX. Conversely, only about 1.5% of the solutions of EE_VBIH are weakly dominated by some solutions of time-limited CPLEX. For further research, the matrix representation for the speed scaling strategy can easily be adapted to the problem by modifying the MILP model and the EE_VBIH algorithm. Other multi-objective metaheuristic algorithms can also be applied to the studied problem. Considering other criteria, such as total tardiness and total flow time, is another research direction.

References
1. S. Afshin Mansouri, E. Aktas, Minimizing energy consumption and makespan in a two-machine flowshop scheduling problem. J. Oper. Res. Soc. 67(11), 1382–1394 (2016)
2. C.A.C. Coello, G.B. Lamont, D.A. Van Veldhuizen et al., Evolutionary Algorithms for Solving Multi-Objective Problems, vol. 5 (Springer, Berlin, 2007)
3. M. Dai, D. Tang, A. Giret, M.A. Salido, W. Li, Energy-efficient scheduling for a flexible flow shop using an improved genetic-simulated annealing algorithm. Robot. Comput.-Integrat. Manufact. 29(5), 418–429 (2013)
4. J. Deng, L. Wang, C. Wu, J. Wang, X. Zheng, A competitive memetic algorithm for carbon-efficient scheduling of distributed flow-shop, in Intelligent Computing Theories and Application, ed. by D.S. Huang, V. Bevilacqua, P. Premaratne (Springer International Publishing, Cham, 2016), pp. 476–488
5. J.Y. Ding, S. Song, C. Wu, Carbon-efficient scheduling of flow shops by multi-objective optimization. Eur. J. Oper. Res. 248(3), 758–771 (2016)
6. J. Dubois-Lacoste, F. Pagnozzi, T. Stützle, An iterated greedy algorithm with optimization of partial solutions for the makespan permutation flowshop problem. Comput. Oper. Res. 81, 160–166 (2017)
7. K. Fang, N. Uhan, F. Zhao, J.W. Sutherland, A new approach to scheduling in manufacturing for power consumption and carbon footprint reduction. J. Manufact. Syst. 30(4), 234–240 (2011)
8. K. Fang, N.A. Uhan, F. Zhao, J.W. Sutherland, Flow shop scheduling with peak power consumption constraints. Ann. Oper. Res. 206(1), 115–145 (2013)
9. M. Fatih Tasgetiren, H. Öztop, U. Eliiyi, D.T. Eliiyi, Q.K. Pan, Energy-efficient single machine total weighted tardiness problem with sequence-dependent setup times, in Intelligent Computing Theories and Application, ed. by D.S. Huang, V. Bevilacqua, P. Premaratne, P. Gupta (Springer International Publishing, Cham, 2018), pp. 746–758
10. V. Fernandez-Viagas, R. Ruiz, J.M. Framinan, A new vision of approximate methods for the permutation flowshop to minimise makespan: state-of-the-art and computational evaluation. Eur. J. Oper. Res. 257(3), 707–721 (2017)
11. C. Gahm, F. Denz, M. Dirr, A. Tuma, Energy-efficient scheduling in manufacturing companies: a review and research framework. Eur. J. Oper. Res. 248(3), 744–757 (2016)
12. M.A. González, J.J. Palacios, C.R. Vela, A. Hernández-Arauzo, Scatter search for minimizing weighted tardiness in a single machine scheduling with setups. J. Heurist. 23(2), 81–110 (2017)
13. E. da Jiang, L. Wang, An improved multi-objective evolutionary algorithm based on decomposition for energy-efficient permutation flow shop scheduling problem with sequence-dependent setup time. Int. J. Product. Res. 57(6), 1756–1771 (2019)
14. E. Jiang, L. Wang, J. Lu, Modified multiobjective evolutionary algorithm based on decomposition for low-carbon scheduling of distributed permutation flow-shop, in 2017 IEEE Symposium Series on Computational Intelligence (SSCI) (2017), pp. 1–7
15. K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms (Wiley, New York, 2001)
16. A.R. Kan, Machine Scheduling Problems: Classification, Complexity and Computations (Springer, Netherlands, 1976)
17. C. Lu, L. Gao, X. Li, Q. Pan, Q. Wang, Energy-efficient permutation flow shop scheduling problem using a hybrid multi-objective backtracking search algorithm. J. Cleaner Product. 144, 228–238 (2017)
18. S.A. Mansouri, E. Aktas, U. Besikci, Green scheduling of a two-machine flowshop: trade-off between makespan and energy consumption. Eur. J. Oper. Res. 248(3), 772–788 (2016)
19. G. Mavrotas, Effective implementation of the ε-constraint method in multi-objective mathematical programming problems. Appl. Math. Comput. 213(2), 455–465 (2009)
20. G. Mouzon, M.B. Yildirim, A framework to minimise total energy consumption and total tardiness on a single machine. Int. J. Sustain. Eng. 1(2), 105–116 (2008)
21. G. Mouzon, M.B. Yildirim, J. Twomey, Operational methods for minimization of energy consumption of manufacturing equipment. Int. J. Product. Res. 45(18–19), 4247–4271 (2007)
22. M. Nawaz, E.E. Enscore, I. Ham, A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega 11(1), 91–95 (1983)
23. T. Okabe, Y. Jin, B. Sendhoff, A critical survey of performance indices for multi-objective optimisation, in The 2003 Congress on Evolutionary Computation, 2003. CEC '03, vol. 2 (2003), pp. 878–885
24. I. Osman, C. Potts, Simulated annealing for permutation flow-shop scheduling. Omega 17(6), 551–557 (1989)
25. H. Öztop, M. Fatih Tasgetiren, D. Türsel Eliiyi, Q.K. Pan, Green permutation flowshop scheduling: a trade-off between energy consumption and total flow time, in Intelligent Computing Methodologies, ed. by D.S. Huang, M.M. Gromiha, K. Han, A. Hussain (Springer International Publishing, Cham, 2018), pp. 753–759
26. Q.K. Pan, L. Wang, B. Qian, A novel differential evolution algorithm for bi-criteria no-wait flow shop scheduling problems. Comput. Oper. Res. 36(8), 2498–2511 (2009)
27. R. Ruiz, C. Maroto, A comprehensive review and evaluation of permutation flowshop heuristics. Eur. J. Oper. Res. 165(2), 479–494 (2005)
28. A. Subramanian, M. Battarra, C.N. Potts, An iterated local search heuristic for the single machine total weighted tardiness scheduling problem with sequence-dependent setup times. Int. J. Product. Res. 52(9), 2729–2742 (2014)
29. E. Taillard, Benchmarks for basic scheduling problems. Eur. J. Oper. Res. 64(2), 278–285 (1993)
30. K. Tan, C. Goh, Y. Yang, T. Lee, Evolving better population distribution and exploration in evolutionary multi-objective optimization. Eur. J. Oper. Res. 171(2), 463–495 (2006)
31. M. Tasgetiren, Q.K. Pan, D. Kizilay, K. Gao, A variable block insertion heuristic for the blocking flowshop scheduling problem with total flowtime criterion. Algorithms 9(4), 71 (2016)
32. M.F. Tasgetiren, U. Eliiyi, H. Öztop, D. Kizilay, Q.K. Pan, An energy-efficient single machine scheduling with release dates and sequence-dependent setup times, in Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO '18 (ACM, New York, 2018), pp. 145–146
33. M.F. Tasgetiren, Q. Pan, D. Kizilay, M.C. Velez-Gallego, A variable block insertion heuristic for permutation flowshops with makespan criterion, in 2017 IEEE Congress on Evolutionary Computation (CEC) (2017), pp. 726–733
34. M.F. Tasgetiren, Q. Pan, Y. Ozturkoglu, A.H.L. Chen, A memetic algorithm with a variable block insertion heuristic for single machine total weighted tardiness problem with sequence dependent setup times, in 2016 IEEE Congress on Evolutionary Computation (CEC) (2016), pp. 2911–2918
35. J. Wang, L. Wang, A knowledge-based cooperative algorithm for energy-efficient scheduling of distributed flow-shop. IEEE Trans. Syst. Man Cybern.: Syst. 1–15 (2018)
36. H. Xu, Z. Lü, T.C.E. Cheng, Iterated local search for single-machine scheduling with sequence-dependent setup times to minimize total weighted tardiness. J. Sched. 17(3), 271–287 (2014)
37. M.M. Yenisey, B. Yagmahan, Multi-objective permutation flow shop scheduling problem: literature review, classification and current trends. Omega 45, 119–135 (2014)
38. R. Zhang, R. Chiong, Solving the energy-efficient job shop scheduling problem: a multi-objective genetic algorithm with enhanced local search for minimizing the total weighted tardiness and total energy consumption. J. Cleaner Product. 112, 3361–3375 (2016)
39. L.C. Zhong, B. Qian, R. Hu, C.S. Zhang, The hybrid shuffle frog leaping algorithm based on cuckoo search for flow shop scheduling with the consideration of energy consumption, in Intelligent Computing Theories and Application, ed. by D.S. Huang, V. Bevilacqua, P. Premaratne, P. Gupta (Springer International Publishing, Cham, 2018), pp. 649–658
40. E. Zitzler, Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications, vol. 63 (Shaker, Ithaca, New York, 1999)

Chapter 4

Solving 0-1 Bi-Objective Multi-dimensional Knapsack Problems Using Binary Genetic Algorithm

Ozgur Kabadurmus, M. Fatih Tasgetiren, Hande Oztop, and M. Serdar Erdogan

Abstract The multi-dimensional knapsack problem (MDKP) is a well-known NP-hard problem in combinatorial optimization. As it has various real-life applications, the MDKP has been intensively studied in the literature. In contrast, far too little attention has been paid to the multi-objective version of the MDKP. In this chapter, we consider the bi-objective multi-dimensional knapsack problem (BOMDKP). We propose a Binary Genetic Algorithm (BGA) with an external archive for the problem; the proposed BGA also employs a binary local search. Non-dominated solution sets are obtained with the proposed BGA for various bi-objective benchmark instances with 100, 250, 500 and 750 items. The performance of the BGA is then compared with that of other multi-objective algorithms from the literature, namely MOEA/D and MOFPA. Furthermore, we observe that the Pareto-optimal solution set provided by Zitzler and Laumanns for 500 items and 2 knapsacks includes 30 dominated solutions, and that the Pareto-optimal solutions for the scenario with 750 items are not reported in Zitzler and Thiele [43]. Hence, the true Pareto-optimal solution sets are found for all benchmark problem instances using the Improved Augmented Epsilon Constraint (AUGMECON2) method, and the non-dominated solution sets of the BGA, MOEA/D and MOFPA are compared with the Pareto-optimal solution sets for all test instances. The computational results indicate that the proposed BGA is more effective at solving the BOMDKP than the best-performing algorithms from the literature.

O. Kabadurmus · M. S. Erdogan
Department of International Logistics Management, Yasar University, Izmir, Turkey

M. F. Tasgetiren (corresponding author)
Department of Mechanical and Industrial Engineering, Qatar University, Doha, Qatar

H. Oztop
Department of Industrial Engineering, Yasar University, Izmir, Turkey

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_4

O. Kabadurmus et al.

4.1 Introduction

The one-dimensional knapsack problem (KP) is a widely studied combinatorial optimization problem. In the KP, there is a set of n items, where each item i has a predefined profit p_i and weight w_i. The objective is to select a subset of items that maximizes the total profit without exceeding the weight capacity b. Unlike the one-dimensional KP, the multi-dimensional knapsack problem (MDKP) must satisfy more than one capacity constraint, which makes it harder to solve than the one-dimensional version. The MDKP arises in many real-life applications such as stock cutting, project selection, cargo loading [16], shelf space allocation in retail stores [40], and scheduling of computer programs [19]. In this study, we consider the bi-objective version of the MDKP (BOMDKP), which is rarely studied in the current literature. We propose a simple Binary Genetic Algorithm (BGA) with an external archive to solve the BOMDKP. The proposed methodology is tested on various benchmark problems, and the performance of our algorithm is compared with the two best-performing algorithms from the literature, namely the Multi-Objective Firefly Algorithm with Particle Swarm Optimization (MOFPA) of Zouache et al. [46] and the MOEA/D of Zhang and Li [41]. We also obtain Pareto-optimal solution sets for all benchmark instances by employing the Improved Augmented Epsilon Constraint (AUGMECON2) method proposed by Mavrotas and Florios [29]. To show the effectiveness of the proposed algorithm, we compare the non-dominated solution sets of BGA, MOEA/D and MOFPA with the Pareto-optimal solution sets. We observe that the Pareto-optimal solution set provided by Zitzler and Laumanns for the benchmark scenario of 2 knapsacks with 500 items contains 30 dominated solutions. We also find the Pareto-optimal solution set of the benchmark scenario of 2 knapsacks with 750 items, which has not previously been reported in the literature.¹
The rest of this chapter is organized as follows: Sect. 4.2 presents a detailed literature review, Sect. 4.3 formally defines the problem, Sect. 4.4 explains the proposed algorithm, and Sect. 4.5 presents the computational results. Finally, Sect. 4.6 concludes the chapter and outlines future research opportunities.

1 All data files, computational results and Pareto front graphs are given as supplementary materials and can be found at https://okabadurmus.yasar.edu.tr/research/meta2018-solutions/.

4 Solving 0-1 Bi-Objective Multi-dimensional Knapsack Problems …


4.2 Literature Review

The single-objective MDKP has been investigated in many previous studies. Liu et al. [23] applied a binary differential search algorithm, which uses a Brownian-motion-like random search for solution generation together with a feasible solution generation strategy. Meng and Pan [32] proposed an improved fruit fly optimization algorithm that balances exploitation and exploration for the MDKP. Moreover, Meng et al. [31] developed an improved migrating birds optimization algorithm. Lai et al. [21] developed a two-phase tabu-evolutionary algorithm that integrates two solution-based tabu searches for the MDKP. Chih [5] developed a self-adaptive check and repair operator-based particle swarm optimization in which the pseudo-utility ratio changes dynamically. Chih [6] proposed a three-ratio self-adaptive check and repair operator based on particle swarm optimization, in which the substitute pseudo-utility ratios are changed systematically. Furthermore, Gorski et al. [15] formulated a greedy algorithm to solve the MDKP.

The aim of multi-objective combinatorial optimization is to find a set of Pareto-optimal solutions for multiple conflicting objectives instead of a single optimal solution for one objective. Selection of transportation investment alternatives [36], capital budgeting [17], and planning the recovery of contaminated light station facilities [18] are some application areas of the multi-objective knapsack problem (MOKP). Various studies on the MOKP have been conducted. Visee et al. [38] applied a branch and bound approach, while Gandibleux and Freville [12] used a tabu search algorithm to solve the bi-objective KP. Vianna and Arroyo [37] developed a Greedy Randomized Adaptive Search Procedure (GRASP), which constructs solutions according to a predefined preference vector and applies local search at each iteration. Lust and Teghem [25] applied a two-phase Pareto local search. Kim et al.
[20] presented a quantum-inspired multi-objective evolutionary algorithm. Moreover, Lu and Yu [24] proposed an adaptive population multi-objective quantum-inspired evolutionary algorithm in which the population size changes adaptively. Gao et al. [13] presented a quantum-inspired artificial immune system (MOQAIS). Cerqueus et al. [4] combined a set of existing branching heuristics to solve the bi-objective KP; at each node, the most appropriate heuristic is chosen by an adaptive selection strategy. Mansour et al. [27] proposed a multi-population-based cooperative framework, W-CMOLS, which uses an indicator-based multi-objective local search to solve the MOKP. Moreover, Mansour and Alaya [26] introduced an indicator-based ant colony optimization. Figueira et al. [10] proposed four different exact solution methods to solve the MOKP with up to seven objectives. Rong and Figueira [34] presented a traditional and a hybrid dynamic programming (DP) algorithm for finding the true Pareto-optimal solutions of the MOKP. Laumanns et al. [22] proposed the adaptive ε-constraint method for the MOKP. Bazgan et al. [3] proposed a DP algorithm that uses a dominance relation to eliminate the solutions that cannot lead to a new non-dominated criterion vector. Figueira et al. [9] proposed an algorithmic improvement that makes use of lower and upper bounds in DP. Furthermore, to solve the MOKP, Zitzler and Thiele [44, 45] proposed the strength Pareto evolutionary algorithm (SPEA), which stores non-dominated solutions in an external set and uses clustering to reduce their number. Zitzler et al. [42] proposed SPEA2, which employs an improved fitness assignment technique together with new density-based selection and archive truncation strategies. In [42], both SPEA and SPEA2 are tested on knapsack problem instances, and SPEA2 outperforms SPEA. Furthermore, Zhang and Li [41] developed a multi-objective evolutionary algorithm based on decomposition (MOEA/D) for the MOKP.

The multi-objective multi-dimensional knapsack problem (MOMDKP) is a developing area in the knapsack literature, and very few studies have investigated it. To the best of our knowledge, only Zouache et al. [46], Zhang and Li [41], Mavrotas et al. [28], Mavrotas et al. [30] and Florios et al. [11] have worked on the MOMDKP. Zouache et al. [46] introduced a novel multi-objective algorithm (MOFPA), which combines Particle Swarm Optimization (PSO) and the Firefly Algorithm (FA). Zhang and Li [41] developed the MOEA/D algorithm for the multi-objective multi-dimensional knapsack problem. Similar to our study, both Zouache et al. [46] and Zhang and Li [41] tested the performance of their algorithms on benchmark instances taken from the library published by Zitzler and Thiele [45]. Florios et al. [11] developed an exact multi-criteria branch and bound algorithm for the MOMDKP with three objectives and three constraints. Both Mavrotas et al. [28] and Mavrotas et al. [30] proposed heuristic algorithms exploiting the "core concept", first introduced by Balas and Zemel [2]. The "core concept" considers only a core of items with medium efficiency (p_i/w_i): items with high efficiency are fixed as selected, and items with low efficiency are fixed as not selected. This reduces the search space of the problem and thereby its complexity. Mavrotas et al.
[28] combined a branch-and-bound algorithm with an extension of the "core concept" to solve the bi-objective MDKP. The algorithm of Mavrotas et al. [30] is not limited to two objectives and can handle problems with more objectives. In this chapter, we consider the bi-objective multi-dimensional knapsack problem and propose an efficient binary genetic algorithm for the BOMDKP.
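To make the core concept concrete, here is a minimal Python sketch for the single-constraint, single-objective case, where the efficiency of item j is p_j/w_j. The helper `core_partition` and the rule of centring a fixed-size core on the greedy break item are illustrative assumptions in the spirit of Balas and Zemel [2]; the core definitions used by Mavrotas et al. [28, 30] for several objectives and constraints are more elaborate and are not reproduced here.

```python
from typing import List, Tuple


def core_partition(profits: List[float], weights: List[float], capacity: float,
                   core_size: int) -> Tuple[List[int], List[int], List[int]]:
    """Split items into (fixed to 1, core, fixed to 0) around the greedy break item."""
    # Sort items by decreasing efficiency p_j / w_j.
    order = sorted(range(len(profits)), key=lambda j: profits[j] / weights[j], reverse=True)
    # The break item is the first item that no longer fits when packing greedily.
    load, brk = 0.0, len(order)
    for pos, j in enumerate(order):
        if load + weights[j] > capacity:
            brk = pos
            break
        load += weights[j]
    # Take a window of core_size items centred on the break item.
    lo = max(0, brk - core_size // 2)
    hi = min(len(order), lo + core_size)
    return order[:lo], order[lo:hi], order[hi:]


print(core_partition([10, 9, 8, 3, 1], [1, 1, 1, 1, 1], capacity=2, core_size=2))
# ([0], [1, 2], [3, 4])
```

Only the items in the core are then left free for the exact or heuristic search, which is where the reduction in problem size comes from.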

4.3 Problem Formulation

The 0-1 BOMDKP consists of selecting a subset of a given set of n items such that two objectives are maximized without exceeding a set of knapsack capacity constraints. The mathematical model of the general MOMDKP is defined below. Although some studies in the literature (such as [43–45]) consider the same number of constraints as objective functions, the model presented here can have different numbers of objectives and constraints, and thus provides a more general formulation of the problem. Note that the objective functions conflict with each other, as explained by Zitzler and Thiele [45]; in other words, as the first objective function value increases, the second one tends to decrease. The notation and parameters used in the mathematical formulation are given below:

4 Solving 0-1 Bi-Objective Multi-dimensional Knapsack Problems …


$K$: the number of objectives
$m$: the number of constraints
$n$: the number of items
$a_{ij}$: weight of item $j$ in constraint $i$
$b_i$: capacity (weight limit) of constraint $i$
$p_j^k$: profit of item $j$ in objective $k$

The decision variables are:

$$x_j = \begin{cases} 1 & \text{if item } j \text{ is included in the knapsack} \\ 0 & \text{otherwise} \end{cases}$$

The mathematical formulation is presented below:

$$\max\; z_k(x_1, x_2, \dots, x_n) = \sum_{j=1}^{n} p_j^k x_j, \qquad k = 1, 2, \dots, K \tag{4.1}$$

$$\text{s.t.} \quad \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad \forall i \in \{1, 2, \dots, m\} \tag{4.2}$$

$$x_j \in \{0, 1\}, \qquad \forall j \in \{1, 2, \dots, n\} \tag{4.3}$$
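Read operationally, Eqs. (4.1)–(4.3) define how a candidate bit vector is scored. The sketch below assumes plain Python lists for $p_j^k$, $a_{ij}$ and $b_i$; the function name `evaluate` is hypothetical, and measuring infeasibility as the summed excess over the capacities is an assumption (the chapter does not define the violation measure used later by the SF rule).

```python
from typing import List, Sequence, Tuple


def evaluate(x: Sequence[int],
             profits: Sequence[Sequence[float]],   # profits[k][j] = p_j^k
             weights: Sequence[Sequence[float]],   # weights[i][j] = a_ij
             capacities: Sequence[float]           # capacities[i] = b_i
             ) -> Tuple[List[float], float]:
    """Return the objective values z_k(x) of Eq. (4.1) and the total amount
    by which the capacity constraints (4.2) are exceeded (0 if feasible)."""
    objectives = [sum(p_kj * x_j for p_kj, x_j in zip(p_k, x)) for p_k in profits]
    violation = 0.0
    for a_i, b_i in zip(weights, capacities):
        load = sum(a_ij * x_j for a_ij, x_j in zip(a_i, x))
        violation += max(0.0, load - b_i)  # only the excess over b_i counts
    return objectives, violation
```

For example, with two objectives and two constraints, `evaluate([1, 0, 1], [[3, 5, 2], [1, 4, 6]], [[2, 3, 4], [1, 1, 1]], [5, 1])` gives objective values 5 and 7 and a violation of 2.0, so that vector is infeasible.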

In the model, Eq. (4.1) maximizes the total profit of each objective function; the model may contain any number of objective functions. Constraint (4.2) guarantees that, for every constraint i, the total weight of the selected items does not exceed the capacity $b_i$. Constraint (4.3) ensures that all decision variables are binary. Since our problem is multi-objective, we use the dominance relation concepts [8] given in Definition 4.1 to solve the BOMDKP.

Definition 4.1 Dominance Relations (assuming that all objectives are to be maximized).
• Solution $s_i$ dominates $s_j$ (denoted $s_i \succ s_j$) if the following two conditions are satisfied:
$$f_k(s_i) \ge f_k(s_j)\ \ \forall k \quad \text{and} \quad f_k(s_i) > f_k(s_j)\ \ \exists k \tag{4.4}$$


O. Kabadurmus et al.

• Solution $s_i$ weakly dominates solution $s_j$ (denoted $s_i \succeq s_j$) if the following condition is satisfied:
$$f_k(s_i) \ge f_k(s_j)\ \ \forall k \tag{4.5}$$
• Solution $s_i$ is indifferent to $s_j$ (denoted $s_i \sim s_j$) if the following two conditions are satisfied:
$$f_k(s_i) > f_k(s_j)\ \ \exists k \quad \text{and} \quad f_k(s_j) > f_k(s_i)\ \ \exists k \tag{4.6}$$

The definitions of the non-dominated set and the Pareto-optimal set are given in Definitions 4.2 and 4.3, respectively.

Definition 4.2 Non-Dominated Set. Given a set of solutions $X$, the non-dominated set $X^* \subseteq X$ consists of the elements of $X$ that are not dominated by any element of $X$.

Definition 4.3 Pareto-Optimal Set. The non-dominated set of the entire feasible search space $S$ is called the Pareto-optimal set.

Common solution methods for multi-objective problems include sequential optimization, goal programming, the weighting method and the ε-constraint method. In this study, we use the improved augmented ε-constraint method (AUGMECON2), since it generates the exact Pareto-optimal set of multi-objective integer programming problems [29]. For a Pareto-optimal solution, no objective function can be improved without deteriorating another objective function. In AUGMECON2, one of the objective functions is optimized, and the other objective functions are transformed into constraints. Slack variables are added to these objective-function constraints and are also included, as additional terms, in the objective function being optimized. The slack values obtained at each iteration are then used to skip redundant iterations, which helps AUGMECON2 enumerate the Pareto-optimal solutions more efficiently than other ε-constraint methods. More details on AUGMECON2 can be found in Mavrotas and Florios [29].
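The dominance test of Eq. (4.4), the non-dominated filter of Definition 4.2 and the augmented ε-constraint idea can be illustrated together on a tiny instance. This is a toy sketch, not the authors' implementation: it replaces the integer-programming solver that AUGMECON2 relies on with exhaustive enumeration of all $2^n$ bit vectors, assumes integer profits so that the ε grid step is 1, and uses hypothetical names (`augmented_eps_bi`, `delta`). Each iteration maximizes $z_1 + \delta\, s_2 / r_2$ subject to $z_2 - s_2 = e$, $s_2 \ge 0$, and then uses the slack $s_2$ to jump over grid points that would return the same solution.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    """Eq. (4.4) for maximization: fa >= fb in every objective, > in one."""
    return (all(a >= b for a, b in zip(fa, fb))
            and any(a > b for a, b in zip(fa, fb)))


def non_dominated(points: List[Point]) -> List[Point]:
    """Definition 4.2: distinct points not dominated by any other point."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


def augmented_eps_bi(p1: List[int], p2: List[int], a: List[List[int]],
                     b: List[int], delta: float = 1e-3) -> List[Point]:
    """Toy augmented epsilon-constraint loop for a bi-objective 0-1 MDKP,
    solved by brute force (only usable for very small n)."""
    n = len(p1)
    feasible = []  # (z1, z2) of every feasible bit vector
    for x in product((0, 1), repeat=n):
        if all(sum(row[j] * x[j] for j in range(n)) <= cap
               for row, cap in zip(a, b)):
            feasible.append((sum(p1[j] * x[j] for j in range(n)),
                             sum(p2[j] * x[j] for j in range(n))))
    z2_min = min(z2 for _, z2 in feasible)
    z2_max = max(z2 for _, z2 in feasible)
    r2 = max(1, z2_max - z2_min)  # range of objective 2 (avoids dividing by 0)
    front: List[Point] = []
    e = z2_min
    while e <= z2_max:
        # augmented objective: max z1, ties broken towards larger slack s2
        _, z1, z2 = max((z1 + delta * (z2 - e) / r2, z1, z2)
                        for z1, z2 in feasible if z2 >= e)
        front.append((z1, z2))
        s2 = z2 - e
        e += s2 + 1  # bypass: grid points e+1 .. e+s2 give the same solution
    return front
```

Because `delta * s2 / r2` stays below 1 and the profits are integers, the augmentation only breaks ties in $z_1$, so every returned point is Pareto-optimal; comparing `sorted(augmented_eps_bi(...))` with `non_dominated(...)` over all feasible points is a convenient consistency check on small cases.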

4.4 Bi-Objective BGA

Genetic algorithms (GAs) are search heuristics inspired by the biological processes of natural selection and evolution [14]. In this study, we propose a binary GA (BGA) for the BOMDKP. At the beginning of the proposed BGA, an initial population of size NP is randomly constructed. At each generation, for each individual in the population, another individual is randomly chosen and an offspring is obtained with the uniform crossover operator. This offspring is then evaluated. The Superiority of Feasibility (SF) rule of Deb [7] is employed to update both the parent population and the archive set. The archive set stores all non-dominated solutions found by the algorithm and is updated at each generation. This update procedure is repeated for each individual in the population. Afterwards, a local search (as proposed in [39]) is applied to each individual in the population. Additionally, each individual is mutated by randomly flipping each of its bits (from 0 to 1 or from 1 to 0) with a small probability.

Each individual is a solution of the BOMDKP and is represented by an n-bit binary string: if item j is selected into the knapsack, then $x_j = 1$; otherwise, $x_j = 0$. The initial population of NP individuals is generated randomly by setting each bit to 1 with probability 0.5; in other words, if a uniform random number $r \in (0, 1)$ is less than 0.5, the decision variable is set to 1, and otherwise it is set to 0. In addition, the crossover probability $pC[i]$ and the mutation probability $pM[i]$ are initialized to 0.5 and 0.05, respectively. The procedure to construct the initial population is outlined in Algorithm 4.1.

Algorithm 4.1 Initialization of the BGA
for i = 1 to NP do    // NP: population size
    pC[i] = 0.5
    pM[i] = 0.05
    for j = 1 to n do
        r_ij = UNIF(0, 1)
        if r_ij < 0.5 then
            x_ij = 1
        else
            x_ij = 0
        end if
    end for
end for
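Algorithm 4.1 translates almost line by line into the following Python sketch. The function name `init_population` and the use of a seeded `random.Random` instance are illustrative choices, not details of the authors' Visual C++ implementation.

```python
import random
from typing import List, Tuple


def init_population(n_pop: int, n_items: int, rng: random.Random
                    ) -> Tuple[List[List[int]], List[float], List[float]]:
    """Algorithm 4.1: every bit x_ij is 1 with probability 0.5, and every
    individual starts with pC[i] = 0.5 and pM[i] = 0.05."""
    population = [[1 if rng.random() < 0.5 else 0 for _ in range(n_items)]
                  for _ in range(n_pop)]
    p_c = [0.5] * n_pop
    p_m = [0.05] * n_pop
    return population, p_c, p_m


# usage: NP = 100 individuals for the 250-item instance
pop, p_c, p_m = init_population(100, 250, random.Random(42))
```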

The offspring population is obtained through a simple uniform crossover operator. For each individual $x_i$, we randomly select another individual $x_k$ from the population and generate an offspring $x_i^*$ by uniform crossover with crossover probability $pC[i]$, as in (4.7), where $r_{ij}$ is a uniform random number in $U(0, 1)$. The crossover probability $pC[i]$ is drawn from the normal distribution $N(0.5, 0.1)$ for each individual $x_i$ in the population; taking 0.1 as the standard deviation, $pC[i]$ is mostly close to 0.5 and lies between 0.2 and 0.8 with a probability of about 99.7%. All parameter values of the BGA were determined through an extensive experimental design process.

$$x_{ij}^* = \begin{cases} x_{ij} & \text{if } r_{ij} \le pC[i] \\ x_{kj} & \text{otherwise} \end{cases} \tag{4.7}$$

After the crossover operator is applied, the parent population and the archive set are updated according to the procedure given in Algorithm 4.2. The SF_Domination rule is explained in Algorithm 4.4.

Algorithm 4.2 Updating the parent population and archive set
for i = 1 to NP do
    x_i = x_i*   if x_i* SF_Dominates x_i
    x_i = x_i    otherwise
end for

After crossover has been carried out for all individuals in the population, we mutate the population with a small mutation probability as in (4.8), where $r_{ij}$ is a uniform random number in $U(0, 1)$. The mutation probability $pM[i]$ is drawn from the normal distribution $N(0.05, 0.01)$ for each individual $x_i$ in the population.

$$x_{ij} = \begin{cases} 1 - x_{ij} & \text{if } r_{ij} \le pM[i] \\ x_{ij} & \text{otherwise} \end{cases} \tag{4.8}$$
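The uniform crossover of Eq. (4.7) and the bit-flip mutation of Eq. (4.8) can be sketched as follows. Several details are assumptions rather than statements from the chapter: the partner $x_k$ is drawn uniformly from the other individuals, 0.1 and 0.01 are treated as standard deviations, and out-of-range draws are clipped to [0, 1] by a hypothetical helper `clip01`.

```python
import random
from typing import List


def clip01(p: float) -> float:
    """Clip a sampled probability into [0, 1] (assumed handling)."""
    return min(1.0, max(0.0, p))


def uniform_crossover(pop: List[List[int]], i: int,
                      rng: random.Random) -> List[int]:
    """Eq. (4.7): x*_ij = x_ij if r_ij <= pC[i], otherwise x_kj."""
    k = rng.choice([idx for idx in range(len(pop)) if idx != i])
    p_c = clip01(rng.gauss(0.5, 0.1))  # pC[i] ~ N(0.5, 0.1)
    return [x_ij if rng.random() <= p_c else x_kj
            for x_ij, x_kj in zip(pop[i], pop[k])]


def mutate(x: List[int], rng: random.Random) -> List[int]:
    """Eq. (4.8): flip bit j when r_ij <= pM[i]."""
    p_m = clip01(rng.gauss(0.05, 0.01))  # pM[i] ~ N(0.05, 0.01)
    return [1 - x_j if rng.random() <= p_m else x_j for x_j in x]


rng = random.Random(7)
parents = [[0] * 20, [1] * 20]
child = uniform_crossover(parents, 0, rng)
mutant = mutate([0] * 10_000, rng)  # roughly pM[i] * 10,000 bits are flipped
```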

We also apply a binary local search to the parent population. This binary local search was presented by Wang et al. [39], who showed its efficiency for the single-objective MDKP through a design of experiments. For each individual in the population, we randomly select three distinct items and flip their values (from 1 to 0 or from 0 to 1). The binary local search is presented in Algorithm 4.3.

Algorithm 4.3 Binary local search
for i = 1 to NP do
    x_i* = x_i
    repeat
        a = D_UNIF[1, n]    // discrete uniform random number between 1 and n
        b = D_UNIF[1, n]
        c = D_UNIF[1, n]
    until a ≠ b, b ≠ c and a ≠ c
    Flip x_i*[a], x_i*[b] and x_i*[c] (0 → 1, 1 → 0)
    Compute the objective functions of x_i*
    x_i = x_i*   if x_i* SF_Dominates x_i
    x_i = x_i    otherwise
end for
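A compact sketch of Algorithm 4.3 for one individual follows, assuming the SF comparison is supplied as a callable (a hypothetical `sf_dominates` argument). `random.Random.sample` draws three pairwise-distinct indices directly, which gives the same distribution over distinct triples as the repeat-until loop of the pseudocode.

```python
import random
from typing import Callable, List


def binary_local_search(x: List[int],
                        sf_dominates: Callable[[List[int], List[int]], bool],
                        rng: random.Random) -> List[int]:
    """Algorithm 4.3 for one individual: flip three distinct bits of a copy
    and keep the copy only if it SF-dominates the original."""
    candidate = list(x)
    for j in rng.sample(range(len(x)), 3):  # three pairwise-distinct items
        candidate[j] = 1 - candidate[j]
    return candidate if sf_dominates(candidate, x) else x
```

In the BGA this would be called once per individual, with `sf_dominates` evaluating both vectors and applying Definition 4.4.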

Since the search process can benefit from infeasible solutions, we do not repair them [7]. As the studied BOMDKP is a constrained optimization problem, we use the superiority of feasible solutions rule [7]. However, due to the multi-objective structure of the problem, two individuals are compared slightly differently [7]. When two individuals are compared, three cases can occur: (1) one of the solutions is feasible and the other is infeasible; (2) both solutions are infeasible; and (3) both solutions are feasible. Consequently, for the constrained BOMDKP, we modified the SF rule of NSGA-II [8] and define domination between two solutions as in Definition 4.4. If $x^*$ dominates $x$ in this sense, the external archive set is updated according to the SF_Domination rule.

Definition 4.4 An individual $x^*$ SF_Dominates an individual $x$ if one of the following conditions is met:
1. Individual $x^*$ is feasible and individual $x$ is infeasible.
2. Both individuals are infeasible, but individual $x^*$ has a smaller constraint violation.
3. Both individuals $x^*$ and $x$ are feasible, and $x^*$ dominates $x$ ($x^* \succ x$).
In each of these cases, $x^*$ is preferred to $x$ by the SF_Domination rule.

Based on Definition 4.4, we propose a selection procedure named the SF_Domination rule, which is presented in Algorithm 4.4.

Algorithm 4.4 SF domination rule
for i = 1 to NP do
    if x_i*.violation = 0 and x_i.violation > 0 then
        x_i = x_i*
    else if x_i*.violation > 0 and x_i.violation > 0 then
        if x_i*.violation < x_i.violation then
            x_i = x_i*
        end if
    else if x_i*.violation = 0 and x_i.violation = 0 then
        if x_i* ≻ x_i then
            x_i = x_i*
        end if
    end if
end for
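Definition 4.4 reduces to a small pairwise predicate. In this sketch each solution is described by its objective vector and a non-negative violation value, with 0 meaning feasible; how the violation is measured is an assumption, since the chapter only speaks of a "smaller constraint violation". The function names are hypothetical.

```python
from typing import Sequence


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    """Pareto dominance for maximization, Eq. (4.4)."""
    return (all(a >= b for a, b in zip(fa, fb))
            and any(a > b for a, b in zip(fa, fb)))


def sf_dominates(f_new: Sequence[float], viol_new: float,
                 f_old: Sequence[float], viol_old: float) -> bool:
    """Definition 4.4: does the candidate (f_new, viol_new) SF-dominate the
    incumbent (f_old, viol_old)? A violation of 0 means feasible."""
    if viol_new == 0 and viol_old > 0:     # case 1: feasible beats infeasible
        return True
    if viol_new > 0 and viol_old > 0:      # case 2: smaller violation wins
        return viol_new < viol_old
    if viol_new == 0 and viol_old == 0:    # case 3: ordinary Pareto dominance
        return dominates(f_new, f_old)
    return False                           # an infeasible candidate never wins
```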

During the search, the non-dominated solutions are stored in the archive set, which is updated with new non-dominated solutions to approximate the Pareto-optimal set. When a new non-dominated solution is found, it is inserted into the archive set and any archive member it dominates is removed. To update the archive set, we use the effective method proposed by Pan et al. [33]. The non-dominated solutions in the archive are stored in decreasing order of their first objective function values; because these solutions are mutually non-dominated, their second objective values are then in increasing order. The procedure for updating the archive set is summarized in Algorithm 4.5. If any of the cases stated in Algorithm 4.5 is satisfied, solution x is inserted at position pos, and all archive members dominated by x must then be removed. The procedure in Algorithm 4.6 removes these dominated solutions from the archive.


Algorithm 4.5 Updating external archive set
Step 1. Let the archive be A = {a_1, a_2, ..., a_m}, where m = |A|. Initially, A is empty, and the first non-dominated solution x is added to the first position in A. Let k = 1 and u = m.
Step 2. Find the most suitable position pos for the next individual x in the archive set A by the following binary search:
    repeat
        j = ⌊(k + u)/2⌋
        if f1(x) = f1(a_j) then
            exit the loop (keep j)
        else if f1(x) > f1(a_j) then
            u = j − 1
        else
            k = j + 1
        end if
    until k > u
Step 3. When comparing f1(x) with f1(a_j), the following cases may occur:
    Case 1.
    if f1(x) = f1(a_j) and f2(x) > f2(a_j) then
        pos = j
    end if
    Case 2.
    if f1(x) > f1(a_j) then
        if j = 1 then
            pos = j
            m = m + 1
        else if j > 1 and f2(x) > f2(a_{j−1}) then
            pos = j
            m = m + 1
        end if
    end if
    Case 3.
    if f1(x) < f1(a_j) and f2(x) > f2(a_j) then
        pos = j + 1
        m = m + 1
    end if

Algorithm 4.6 Removing the dominated solutions from external archive
Step 1. if pos = m then go to Step 4 end if
Step 2. Let pos = pos + 1.
    if f2(a_pos) ≤ f2(x) then remove a_pos else go to Step 4 end if
Step 3. if pos < m then go to Step 2 end if
Step 4. Report the non-dominated solutions in the archive.
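The ordered archive of Algorithms 4.5 and 4.6 can be sketched for two objectives as follows. This is not a line-by-line transcription: it keeps the archive as a Python list of $(f_1, f_2)$ tuples sorted by decreasing $f_1$ (hence increasing $f_2$), uses a standard binary search for the insertion point, and removes the contiguous run of newly dominated members with one slice assignment. The name `archive_insert` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def archive_insert(archive: List[Point], x: Point) -> bool:
    """Insert objective vector x = (f1, f2) into a bi-objective archive kept
    sorted by decreasing f1 (so f2 increases). Returns True if x was added."""
    f1, f2 = x
    lo, hi = 0, len(archive)
    while lo < hi:  # binary search: first index whose f1 is <= f1(x)
        mid = (lo + hi) // 2
        if archive[mid][0] > f1:
            lo = mid + 1
        else:
            hi = mid
    pos = lo
    # the predecessor has a larger f1; it dominates x if its f2 is not smaller
    if pos > 0 and archive[pos - 1][1] >= f2:
        return False
    # a member with the same f1 and an f2 at least as large dominates (or equals) x
    if pos < len(archive) and archive[pos][0] == f1 and archive[pos][1] >= f2:
        return False
    # members from pos on have f1 <= f1(x); those with f2 <= f2(x) are dominated,
    # and because f2 increases along the list they form one contiguous run
    end = pos
    while end < len(archive) and archive[end][1] <= f2:
        end += 1
    archive[pos:end] = [x]
    return True
```

Only the immediate predecessor needs to be checked, because every earlier member has an even smaller $f_2$; this is what makes the sorted layout of Pan et al. [33] cheap to maintain.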


4.5 Computational Results

To evaluate the performance of our algorithm, we used the benchmark problem instances of Zitzler and Thiele [43]. We tested our algorithm on four benchmark instances with 100, 250, 500 and 750 items, each with two knapsacks. We executed 30 replications with different random seeds for each instance. The non-dominated solution sets found in all replications were merged into a single non-dominated solution set by discarding the solutions that are dominated according to the dominance relation. We applied the same procedure to the results of 30 replications of MOFPA of Zouache et al. [46] and MOEA/D of Zhang and Li [41] and compared them with our BGA. In addition, we compared the non-dominated solutions of our algorithm with the Pareto-optimal solutions that we obtained with AUGMECON2. Note that the Pareto-optimal solutions reported by Zitzler and Thiele [43] for the scenario with 500 items include 30 dominated solutions, and that the Pareto-optimal solutions for the scenario with 750 items have not been reported in the literature. All data files, computational results and Pareto-optimal solutions are given as supplementary materials at https://okabadurmus.yasar.edu.tr/research/meta2018-solutions/. Our BGA was coded in Visual C++ 13 and run on a PC with an Intel(R) Core(TM) i7-2600 CPU at 3.40 GHz and 8.00 GB of memory. The population size is NP = 100. The MOFPA and MOEA/D results are taken from Zouache et al. [46], where the parameter values of these algorithms can also be found. Each replication of MOFPA and MOEA/D is run for 300,000 objective function evaluations. Each replication of our proposed BGA is terminated after reaching the maximum CPU time $T_{max} = 100n$ ms, where n is the number of items (e.g., 10 s for n = 100 and 75 s for n = 750).
Since we found the Pareto-optimal sets for the benchmark instances, we used the performance measures given in Definition 4.5 to compare the three algorithms (BGA, MOFPA and MOEA/D). Here, N denotes the non-dominated solution set of an algorithm and P denotes the Pareto-optimal set.

Definition 4.5 Performance measures for comparing BGA, MOFPA and MOEA/D.
• Ratio of the Pareto-optimal solutions found: $R_p = |N \cap P| / |P|$.
• Inverted Generational Distance [1]: $IGD = \sum_{v \in P} d(v, N) / |P|$, where $d(v, N)$ denotes the minimum Euclidean distance between $v$ and the solutions in $N$. A low IGD value means that the set N is very close to the set P.
• Distribution Spacing [35]: $DS_N = \sqrt{\frac{1}{|N|} \sum_{i \in N} (d_i - \bar{d})^2} \,\Big/\, \bar{d}$, where $\bar{d} = \sum_{i \in N} d_i / |N|$ and $d_i$ is the minimum Euclidean distance between solution i and its closest neighbour in N. A low spacing value indicates that the solutions in N are evenly distributed.
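The three measures of Definition 4.5 can be computed directly for two-objective point sets. This is a minimal sketch of the formulas as stated, using `math.dist` (Python 3.8+) for the Euclidean distance; the function names are hypothetical, and `distribution_spacing` assumes that N contains at least two distinct points so that $\bar d > 0$.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ratio_pareto_found(N: Sequence[Point], P: Sequence[Point]) -> float:
    """R_p = |N ∩ P| / |P|."""
    return len(set(N) & set(P)) / len(set(P))


def igd(N: Sequence[Point], P: Sequence[Point]) -> float:
    """Mean, over v in P, of the distance from v to its nearest point in N."""
    return sum(min(math.dist(v, u) for u in N) for v in P) / len(P)


def distribution_spacing(N: Sequence[Point]) -> float:
    """DS = sqrt(mean_i (d_i - d_bar)^2) / d_bar, where d_i is the distance
    from solution i to its nearest neighbour in N."""
    d = [min(math.dist(a, b) for j, b in enumerate(N) if j != i)
         for i, a in enumerate(N)]
    d_bar = sum(d) / len(d)
    return math.sqrt(sum((d_i - d_bar) ** 2 for d_i in d) / len(d)) / d_bar
```

A perfectly evenly spaced set such as `[(0, 0), (1, 1), (2, 2)]` has DS = 0, while `[(0, 0), (1, 0), (3, 0)]`, with nearest-neighbour distances 1, 1 and 2, has DS = √2/4 ≈ 0.354.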

Table 4.1 reports the R_p, IGD and DS values of the BGA, MOFPA and MOEA/D algorithms for the instances with 100, 250, 500 and 750 items. Since Zouache et al. [46] did not report the results of MOFPA and MOEA/D for the 100-item instance, for that instance we compared our BGA only with the true Pareto-optimal set.

62

O. Kabadurmus et al.

Table 4.1 Comparison of BGA, MOFPA, and MOEA/D with Pareto-optimal set for all instances

Instance    Algorithm   R_p    IGD      DS
100 items   BGA         0.97     0.43   0.94
250 items   BGA         0.67     2.98   1.44
            MOFPA       0.12     7.63   0.85
            MOEA/D      0.00    27.55   0.88
500 items   BGA         0.32    38.15   0.99
            MOFPA       0.01    15.95   1.04
            MOEA/D      0.00   129.76   0.68
750 items   BGA         0.03   184.98   1.4
            MOFPA       0.01   105.77   1.29
            MOEA/D      0.00   336.71   0.73

As shown in Table 4.1, our algorithm finds 97% of the solutions in the Pareto-optimal set for the instance with 100 items. MOEA/D does not find any Pareto-optimal solution in any instance. For the benchmark instance with 250 items, our algorithm finds 67% of the Pareto-optimal solutions, whereas MOFPA finds 12%. For the instance with 500 items, BGA finds 32% of the Pareto-optimal solutions, while MOFPA finds 1%. For the instance with 750 items, our algorithm finds 3% of the Pareto-optimal solutions, whereas MOFPA finds 1%. Therefore, according to the R_p metric, our algorithm outperforms MOFPA and MOEA/D. In terms of the IGD metric, MOFPA has the lowest average IGD value over the 250-, 500- and 750-item instances (43.12, compared with 75.37 for BGA and 164.67 for MOEA/D), whereas MOEA/D has the highest IGD value in every instance. However, BGA has the lowest IGD value for the instance with 250 items and a very low IGD value (0.43) for the instance with 100 items, which indicates that BGA provides very close approximations of the Pareto-optimal sets of these two instances. In terms of distribution spacing, the solutions of MOEA/D are the most evenly distributed, as MOEA/D has the lowest average DS value over the same three instances (0.76, compared with 1.06 for MOFPA and 1.28 for BGA). However, the DS values of the three algorithms are of the same order of magnitude, so BGA can also be said to have a good distribution. Figures 4.1 and 4.2 compare the Pareto fronts of BGA with the Pareto-optimal solutions for the 2-knapsack scenarios with 100 and 250 items and with 500 and 750 items, respectively. According to Fig. 4.1, the BGA solutions are very close to the Pareto-optimal solutions for the scenarios with 100 and 250 items. However, BGA has difficulty finding extreme solutions close to the upper and lower bounds of the two objectives. As the problem size increases (Fig. 4.2), the Pareto front of BGA moves farther from the two extremes of the solution space. Nevertheless, the BGA solutions lie close to the middle section of the Pareto-optimal fronts.


Fig. 4.1 Comparison of BGA with the Pareto-optimal solutions for the scenario of 2 knapsacks with 100 and 250 items

Fig. 4.2 Comparison of BGA with the Pareto-optimal solutions for the scenario of 2 knapsacks with 500 and 750 items

Figures 4.3 and 4.4 compare the Pareto fronts of BGA with those of MOFPA and MOEA/D for the 2-knapsack scenarios with 250 and 500 items and with 750 items, respectively. BGA performs better than the other two algorithms for all problem sizes with respect to closeness to the Pareto-optimal solutions, and the performance difference becomes more visible as the problem size increases. However, as mentioned above, the BGA solutions are concentrated around the middle of the Pareto-optimal fronts. MOFPA and MOEA/D do not have this issue: their solutions are distributed across the solution space, including the lower and upper bounds of the objective functions. On the other hand, BGA performs significantly better than MOFPA and MOEA/D on the middle sections of the Pareto fronts in all instances.


Fig. 4.3 Comparison of BGA with MOFPA and MOEA/D for the scenario of 2 knapsacks with 250 items and 500 items

Fig. 4.4 Comparison of BGA with MOFPA and MOEA/D for the scenario of 2 knapsacks with 750 items

4.6 Conclusion

The bi-objective multidimensional knapsack problem is an extension of the widely studied knapsack problem, an NP-hard combinatorial optimization problem. In the BOMDKP, the goal is to obtain the Pareto-optimal solution set of two conflicting objective functions by choosing a subset of items subject to the knapsack capacities. Even though the multi-objective single-dimensional and the single-objective multidimensional versions of the knapsack problem have been widely studied in the literature, little attention has been devoted to the multi-objective multidimensional knapsack problem. In this chapter, we presented a novel solution approach, a binary GA with an external archive, to solve the BOMDKP. We evaluated the performance of the proposed BGA on four benchmark problems of Zitzler and Thiele [43]. We observed that the Pareto-optimal solution set provided by Zitzler and Laumans for the scenario with 500 items includes 30 dominated solutions, and that the Pareto-optimal solutions for the scenario with 750 items are not reported in Zitzler and Thiele [43].


Therefore, we computed the Pareto-optimal solution sets for all benchmark instances using the improved augmented ε-constraint method (AUGMECON2). We then compared our results with those of MOFPA of Zouache et al. [46] and MOEA/D of Zhang and Li [41] in terms of the ratio of Pareto-optimal solutions found, the inverted generational distance and the distribution spacing. The results show that the proposed BGA outperforms MOFPA and MOEA/D in terms of the ratio of Pareto-optimal solutions found on all compared benchmark instances, although MOFPA achieves lower IGD values on the 500- and 750-item instances. The proposed BGA finds almost all Pareto-optimal solutions for the small instance and obtains solutions close to the Pareto-optimal sets of the medium and large instances, particularly in the middle sections of the fronts. All data files, computational results and Pareto front graphs are given as supplementary materials at https://okabadurmus.yasar.edu.tr/research/meta2018-solutions/. As future work, the proposed BGA could be improved by using the core concept. Other efficient local search procedures could also be embedded into the BGA to improve its performance. Furthermore, the proposed BGA could be modified to solve the three-objective multidimensional knapsack problem.

References
1. C.A.C. Coello, D.A. Van Veldhuizen, G.B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd edn. (Springer, Berlin, 2007)
2. E. Balas, E. Zemel, An algorithm for large zero-one knapsack problems. Oper. Res. 28, 1130–1154 (1980)
3. C. Bazgan, H. Hugot, D. Vanderpooten, Solving efficiently the 0–1 multi-objective knapsack problem. Comput. Oper. Res. 36(1), 260–279 (2009)
4. A. Cerqueus, X. Gandibleux, A. Przybylski, F. Saubion, On branching heuristics for the bi-objective 0/1 unidimensional knapsack problem. J. Heurist. 23(5), 285–319 (2017)
5. M. Chih, Self-adaptive check and repair operator-based particle swarm optimization for the multidimensional knapsack problem. Appl. Soft Comput. 26, 378–389 (2015)
6. M. Chih, Three pseudo-utility ratio-inspired particle swarm optimization with local search for multidimensional knapsack problem. Swarm Evolut. Comput. 39, 279–296 (2018)
7. K. Deb, An efficient constraint handling method for genetic algorithms. Comput. Methods Appl. Mech. Eng. 186(2), 311–338 (2000)
8. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolut. Comput. 6(2), 182–197 (2002)
9. J. Figueira, L. Paquete, M. Simoes, D. Vanderpooten, Algorithmic improvements on dynamic programming for the bi-objective 0,1 knapsack problem. Comput. Optim. Appl. 56, 97–111 (2013)
10. J.R. Figueira, G. Tavares, M.M. Wiecek, Labeling algorithms for multiple objective integer knapsack problems. Comput. Oper. Res. 37(4), 700–711 (2010)
11. K. Florios, G. Mavrotas, D. Diakoulaki, Solving multiobjective, multiconstraint knapsack problems using mathematical programming and evolutionary algorithms. Eur. J. Oper. Res. 203(1), 14–21 (2010)
12. X. Gandibleux, A. Freville, Tabu search based procedure for solving the 0–1 multiobjective knapsack problem: the two objectives case. J. Heurist. 6(3), 361–383 (2000)
13. J. Gao, G. He, R. Liang, Z. Feng, A quantum-inspired artificial immune system for the multiobjective 0–1 knapsack problem. Appl. Math. Comput. 230, 120–137 (2014)
14. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1st edn. (Addison-Wesley Longman Publishing Co., Inc, Boston, 1989)
15. J. Gorski, L. Paquete, F. Pedrosa, Greedy algorithms for a class of knapsack problems with binary weights. Comput. Oper. Res. 39(3), 498–511 (2012)
16. B. Haddar, M. Khemakhem, S. Hanafi, C. Wilbaut, A hybrid quantum particle swarm optimization for the multidimensional knapsack problem. Eng. Appl. Artif. Intell. 55, 1–13 (2016)
17. J. Rosenblatt, Z. Sinuany-Stern, Generating the discrete efficient frontier to the capital budgeting problem. Oper. Res. 37, 384–394 (1989)
18. L. Jenkins, A bicriteria knapsack program for planning remediation of contaminated lightstation sites. Eur. J. Oper. Res. 140(2), 427–433 (2002)
19. H. Kellerer, U. Pferschy, D. Pisinger, Knapsack Problems (Springer, Berlin, 2004)
20. Y.H. Kim, J.H. Kim, K.H. Han, Quantum-inspired multiobjective evolutionary algorithm for multiobjective 0/1 knapsack problems, in 2006 IEEE International Conference on Evolutionary Computation (2006), pp. 2601–2606
21. X. Lai, J.K. Hao, F. Glover, Z. Lü, A two-phase tabu-evolutionary algorithm for the 0-1 multidimensional knapsack problem. Inf. Sci. 436–437, 282–301 (2018)
22. M. Laumanns, L. Thiele, E. Zitzler, An efficient, adaptive parameter variation scheme for metaheuristics based on the epsilon-constraint method. Eur. J. Oper. Res. 169(3), 932–942 (2006)
23. J. Liu, C. Wu, J. Cao, X. Wang, K.L. Teo, A binary differential search algorithm for the 0–1 multidimensional knapsack problem. Appl. Math. Model. 40(23), 9788–9805 (2016)
24. T.C. Lu, G.R. Yu, An adaptive population multi-objective quantum-inspired evolutionary algorithm for multi-objective 0/1 knapsack problems. Inf. Sci. 243, 39–56 (2013)
25. T. Lust, J. Teghem, The multiobjective multidimensional knapsack problem: a survey and a new approach. Int. Trans. Oper. Res. 19(4), 495–520 (2010)
26. I.B. Mansour, I. Alaya, Indicator based ant colony optimization for multi-objective knapsack problem. Proc. Comput. Sci. 60, 448–457 (2015). Knowledge-Based and Intelligent Information and Engineering Systems 19th Annual Conference, KES-2015, Singapore, September 2015 Proceedings
27. I.B. Mansour, M. Basseur, F. Saubion, A multi-population algorithm for multi-objective knapsack problem. Appl. Soft Comput. 70, 814–825 (2018)
28. G. Mavrotas, J.R. Figueira, K. Florios, Solving the bi-objective multi-dimensional knapsack problem exploiting the concept of core. Appl. Math. Comput. 215(7), 2502–2514 (2009)
29. G. Mavrotas, K. Florios, An improved version of the augmented ε-constraint method (AUGMECON2) for finding the exact pareto set in multi-objective integer programming problems. Appl. Math. Comput. 219(18), 9652–9669 (2013)
30. G. Mavrotas, K. Florios, J.R. Figueira, An improved version of a core based algorithm for the multi-objective multi-dimensional knapsack problem: A computational study and comparison with meta-heuristics. Appl. Math. Comput. 270, 25–43 (2015)
31. T. Meng, J. Duan, Q. Pan, Q. Chen, An improved migrating birds optimization for solving the multidimensional knapsack problem, in 29th Chinese Control And Decision Conference (CCDC) (2017), pp. 4698–4703
32. T. Meng, Q.K. Pan, An improved fruit fly optimization algorithm for solving the multidimensional knapsack problem. Appl. Soft Comput. 50, 79–93 (2017)
33. Q.K. Pan, L. Wang, B. Qian, A novel differential evolution algorithm for bi-criteria no-wait flow shop scheduling problems. Comput. Oper. Res. 36(8), 2498–2511 (2009)
34. A. Rong, J.R. Figueira, Dynamic programming algorithms for the bi-objective integer knapsack problem. Eur. J. Oper. Res. 236(1), 85–99 (2014)
35. K. Tan, C. Goh, Y. Yang, T. Lee, Evolving better population distribution and exploration in evolutionary multi-objective optimization. Eur. J. Oper. Res. 171(2), 463–495 (2006)
36. J.Y. Teng, G.H. Tzeng, A multiobjective programming approach for selecting non-independent transportation investment alternatives. Transp. Res. Part B: Methodol. 30(4), 291–307 (1996)
37. D.S. Vianna, J.E.C. Arroyo, A grasp algorithm for the multi-objective knapsack problem, in XXIV International Conference of the Chilean Computer Science Society (2004), pp. 69–75
38. M. Visée, J. Teghem, M. Pirlot, E.L. Ulungu, Two-phases method and branch and bound procedures to solve the bi-objective knapsack problem. J. Glob. Optim. 12, 139–155 (1998)
39. L. Wang, X. Zheng, S. Wang, A novel binary fruit fly optimization algorithm for solving the multidimensional knapsack problem. Knowl.-Based Syst. 48, 17–23 (2013)
40. M.H. Yang, An efficient algorithm to allocate shelf space. Eur. J. Oper. Res. 131(1), 107–118 (2001)
41. Q. Zhang, H. Li, MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evolut. Comput. 11(6), 712–731 (2007)
42. E. Zitzler, M. Laumanns, L. Thiele, SPEA2: Improving the strength pareto evolutionary algorithm. TIK-Report 103 (2001)
43. E. Zitzler, L. Thiele, Multiobjective optimization using evolutionary algorithms - a comparative case study, in Parallel Problem Solving from Nature - PPSN V, ed. by A.E. Eiben, T. Bäck, M. Schoenauer, H.P. Schwefel (Springer, Berlin, 1998), pp. 292–301
44. E. Zitzler, L. Thiele, An evolutionary algorithm for multiobjective optimization: the strength pareto approach. Technical Report: TIK43 (1999)
45. E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans. Evolut. Comput. 3, 257–271 (1999)
46. D. Zouache, A. Moussaoui, F.B. Abdelaziz, A cooperative swarm intelligence algorithm for multi-objective discrete optimization with application to the knapsack problem. Eur. J. Oper. Res. 264(1), 74–88 (2018)

Chapter 5

An Asynchronous Parallel Evolutionary Algorithm for Solving Large Instances of the Multi-objective QAP

Florian Mazière, Pierre Delisle, Caroline Gagné, and Michaël Krajecki

Abstract In this work, we propose APM-MOEA, a parallel model to efficiently solve the Multi-Objective Quadratic Assignment Problem (MQAP) with an evolutionary algorithm. It is based on an island model with objective space division. Its main features are a global view from an organizer to achieve a better distribution of solutions, an asynchronous communication scheme to reduce parallel overhead, control islands to improve diversity, and a local search procedure to improve the quality of solutions. Extensive experiments have been conducted using the GISMOO algorithm to compare APM-MOEA with a recent specialized parallel algorithm and two state-of-the-art island-based models in the resolution of the MQAP. Results show that, according to four multi-objective metrics, APM-MOEA outperforms all the other implementations in terms of convergence or diversity.

F. Mazière · P. Delisle (corresponding author) · M. Krajecki
Université de Reims Champagne-Ardenne, CReSTIC EA 3804, 51097 Reims, France
C. Gagné
Université du Québec à Chicoutimi, 555 boul. de l’Université, G7H 2B1 Chicoutimi, Canada
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_5

5.1 Introduction

Many approaches have been designed in recent years to efficiently tackle multi-objective optimization problems. Among these are a posteriori methods, which aim to provide a set of compromise solutions to the decision maker. To be considered good, such a set must optimize each objective of the problem while offering diversified solutions. In order to approximate the Pareto front, which corresponds to the set of best trade-off solutions in the objective space, Multi-Objective Evolutionary Algorithms (MOEAs) have been extensively investigated in the fields of continuous and combinatorial optimization [1]. Unlike some other classical algorithms, MOEAs can provide many solutions in a single run and are less sensitive to the shape of the Pareto front [3]. As they often require a large amount of computing resources to explore large portions of the search space and to handle complex real-life constraints, MOEAs can greatly benefit from today’s high-performance computing architectures. Thus, parallel Multi-Objective Evolutionary Algorithms (pMOEAs) have been proposed in the literature to reduce computation times and improve the quality of the obtained Pareto fronts [4]. Among pMOEA paradigms, master-slave and distributed models are certainly the most studied [3]. Master-slave models aim to reduce the execution time of MOEAs by parallelizing their main operations, such as crossover, mutation and evaluation functions. Island models, also called distributed models, mainly aim to reach the Pareto front more effectively and to improve the quality of the obtained solutions. Some island models use a divide-and-conquer strategy to distribute the approximation of the Pareto front, and two of them are based on objective space division. The first is the cone separation model [5]. It uses a geometrical approach to subdivide the objective space, which is effective if the Pareto-optimal front is continuous and convex. The second model was proposed by Streichert et al. [6] and applies a subdivision scheme based on a k-means clustering algorithm: all subpopulations are periodically gathered, clustered and redistributed onto the available islands. The main drawback of this approach is that gathering all subpopulations produces a high communication overhead, in particular with a large number of processors [7].
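To illustrate the objective-space clustering step attributed above to the model of Streichert et al. [6], the sketch below applies plain Lloyd's k-means to the objective vectors of a gathered population and returns an island index for each solution. It is a generic k-means under simple assumptions (random initial centres, a fixed number of iterations, the hypothetical name `kmeans_islands`), not the exact scheme of [6] or of APM-MOEA, and it does not model the communication between islands.

```python
import random
from typing import List, Sequence, Tuple


def kmeans_islands(objs: List[Tuple[float, ...]], n_islands: int,
                   iters: int = 20, seed: int = 0) -> List[int]:
    """Cluster objective vectors with Lloyd's k-means; label = island index."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in rng.sample(objs, n_islands)]

    def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, q))

    labels = [0] * len(objs)
    for _ in range(iters):
        # assignment step: each solution goes to its nearest centre
        labels = [min(range(n_islands), key=lambda c: sq_dist(p, centers[c]))
                  for p in objs]
        # update step: each centre moves to the mean of its members
        for c in range(n_islands):
            members = [p for p, lab in zip(objs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# two well-separated regions of a bi-objective front -> two islands
objs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = kmeans_islands(objs, 2)
```

Because every solution must be available to the clustering step, the whole population has to be gathered in one place, which is exactly the communication overhead mentioned above.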
Although significant progress has been made in recent years in the design and improvement of parallel models for evolutionary algorithms, most of these models have limited scalability and are not easily applied to a variety of problems. In fact, solving multi-objective combinatorial optimization problems efficiently on a large number of processors remains a challenge today [2]. This paper proposes a parallel model, named APM-MOEA (Asynchronous Parallel Model for Multi-Objective Evolutionary Algorithms), which is based on an island model with objective space division and inspired by the work of Streichert et al. [6]. Its main features are the following:
• A specific clustering algorithm to divide the objective space;
• An organizer with a global view of the current search through a global archive;
• Asynchronous cooperation between islands to limit parallel overheads;
• Control islands to guide the exploration of the search space and improve diversity;
• A periodic use of a specific local search procedure to improve convergence.

This study focuses on the Multi-Objective Quadratic Assignment Problem (MQAP) [8], which has often been used to assess the performance of MOEAs. To handle this problem, APM-MOEA is used to parallelize GISMOO [9], an algorithm that has been shown to be efficient in solving classical multi-objective optimization problems. In order to evaluate the performance of the proposed model, the solution sets provided by APM-MOEA are compared with those obtained by PasMoQAP [10], the cone separation model [5] and the original clustering model of Streichert et al. [6] on small and large MQAP instances.

The remainder of this paper is organized as follows: the next section provides an overview of the main existing approaches to solve the MQAP. In Sect. 5.3, the main features of the APM-MOEA model are outlined. Experimental conditions, results and analysis are presented in Sect. 5.4. Finally, the last section offers some concluding remarks and research perspectives.

5.2 Related Works

The Quadratic Assignment Problem (QAP) is one of the most studied combinatorial optimization problems and was first formulated by Koopmans and Beckmann [13]. It can be described as follows: given a set of facilities and a set of locations, along with the distances between locations and the flows between facilities, the objective of the QAP is to assign each facility to a location so as to minimize the total flow cost. In the multi-objective formulation proposed by Knowles and Corne [8], named Multi-Objective QAP (MQAP), several kinds of flows are considered, and each objective function is associated with a specific kind of flow. The goal when solving the MQAP is to place the facilities on locations in a way that minimizes all the flow costs. More formally, the MQAP with n locations/facilities and M objectives can be stated as:

$$\min_{\pi} \; C(\pi) = \left(C^1(\pi), C^2(\pi), \ldots, C^M(\pi)\right), \qquad C^k(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b^k_{\pi_i \pi_j}, \quad k = 1, \ldots, M \qquad (5.1)$$

where $a_{ij}$ is the distance between location i and location j, $b^k_{\pi_i \pi_j}$ is the cost of the k-th flow from facility $\pi_i$ to facility $\pi_j$, and $\pi_i$ is the facility assigned to location i, i.e., the i-th element of the permutation $\pi$. During the last few years, several works have been carried out to efficiently solve the MQAP. In their respective studies, Garrett et al. [14] and López-Ibáñez [15] analyzed the performance of hybridizing multi-objective algorithms with a local search procedure. Different strategies for applying local search were empirically compared, revealing that their performance is heavily dependent on the instances being solved. These works also show the importance of balancing local search and genetic operations in a memetic algorithm. Drugan et al. [16] introduced a Pareto Local Search (PLS) method with a stochastic perturbation procedure to escape from local optima. Results on a set of bi-objective MQAP instances show that their approach outperforms a straightforward multi-start PLS method. A parallel memetic algorithm, named PasMoQAP, was proposed by Sanhueza et al. [10] to solve the MQAP. It follows the outline of the island model, with migrations of solutions at regular intervals, and applies a local search procedure after genetic operations to improve the quality of solutions. On a set of MQAP instances with 60 decision variables and different numbers of islands, it outperforms an island model using NSGA-II.

In addition to memetic algorithms, other types of approaches have been studied to solve the MQAP. Li et al. [17] proposed an elitist greedy algorithm, named mGRASP/MH, which uses a cooperation mechanism to improve solutions obtained by a local search procedure. Experiments on MQAP instances show that mGRASP/MH performs better than other state-of-the-art multi-objective metaheuristics such as MOEA/D and NSGA-II. Özkale et al. [18] adapted several multi-objective ant colony algorithms and compared them experimentally to identify the best strategies, particularly for updating the pheromone. Finally, Samanta et al. [19] recently used a bee colony algorithm to handle a variant of the MQAP with two objectives, in which each objective function corresponds to a single-objective QAP. Works dedicated to solving the MQAP are generally limited to instances with few decision variables. As this paper aims to solve this problem in a broader context, the next section proposes a new parallel model that takes advantage of multiple processor cores to efficiently solve larger problem instances.

5.3 The APM-MOEA Model

APM-MOEA is an island-based model which uses a clustering algorithm to divide the objective space and assign a specific part of it to each island. It defines three kinds of islands: one organizer (or master), a number of classic islands and two control islands. Figure 5.1 illustrates the APM-MOEA model using 6 islands and highlights the communications from the organizer to the other islands. The overall behavior of the proposed model is as follows. First, the populations and the archives used to store non-dominated solutions are initialized. Then, an iterative process is repeated until a stopping criterion is reached, typically a maximum number of generations or a maximum execution time. In the first step of this process, populations are evolved through genetic operations for a number of generations, with each island focused on a specific region. During genetic operations, local archives, one per island, are continually updated with newly generated solutions. A local search procedure is then applied to each population to improve the quality of solutions. The next steps are managed by the organizer and aim to redefine the region of the objective space assigned to each island. Although it follows the outline of the model of Streichert et al. [6], APM-MOEA implements significant changes, which are described in the following sections.
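Each island keeps a local archive of non-dominated solutions that is updated whenever a new solution is generated. The following Python sketch shows one way such an archive could be maintained, assuming minimization and representing solutions only by their objective vectors; the helper names `dominates` and `update_archive` are hypothetical and not taken from the APM-MOEA implementation.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: a is no worse than b on every
    objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Objectives], candidate: Objectives) -> bool:
    """Insert `candidate` into a non-dominated archive (modified in place).

    Returns True if the candidate was accepted. A candidate that is dominated
    by, or identical to, an archive member is rejected; archive members that
    the candidate dominates are removed.
    """
    for member in archive:
        if member == candidate or dominates(member, candidate):
            return False
    archive[:] = [m for m in archive if not dominates(candidate, m)]
    archive.append(candidate)
    return True


# Usage: feed newly generated objective vectors into an island's local archive.
local_archive: List[Objectives] = []
for vec in [(5, 5), (3, 6), (4, 4), (6, 1), (4, 4), (7, 7)]:
    update_archive(local_archive, vec)
print(sorted(local_archive))  # [(3, 6), (4, 4), (6, 1)]
```

Rejecting exact duplicates is a design choice of this sketch; it keeps the archive from filling up with identical objective vectors produced by different permutations.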


Fig. 5.1 Scheme of APM-MOEA using 6 islands

Fig. 5.2 Example of clustering for APM-MOEA using 6 islands. 4 clusters are defined: 3 for the classic islands and 1 for the organizer

5.3.1 Global Search View of the Organizer

The organizer island is responsible for the clustering and has a global view of the current search. It maintains a global archive that is updated with the local archives of all other islands. At each migration point, the global archive is partitioned by a clustering algorithm, then clusters and centroids are distributed to their respective islands. The other islands follow a similar outline, but they only send their local archives when a new migration point is reached, and check for a new cluster at regular intervals. It is important to note that the solutions of each cluster are not necessarily integrated into the population, since elitist replacement is performed to keep only the best Pareto solutions.

The original model of Streichert et al. uses a classic k-means algorithm and has the drawback of relying heavily on randomly selected initial centroids. Other methods have been proposed to improve these initial points and thus the efficiency of the algorithm. Among these, Yedla et al. [20] introduced a clustering algorithm which includes a method for finding better initial points and a more efficient approach to assign the data points to the corresponding clusters. Experiments have shown that this algorithm is more accurate and faster than the original k-means algorithm. Its main features are therefore integrated into APM-MOEA, and the origin point is replaced by the ideal vector to take into account the characteristics of multi-objective problems. An example of partitioning for this model is given in Fig. 5.2 with the corresponding clusters and centroids.

In order to limit each island to a specific region of the objective space during the evolution of populations, constraints are added to the problem. Thus, a solution S is feasible if and only if the centroid nearest to S is the centroid assigned to the considered island. Otherwise, S is marked as infeasible and is not favored in the evolutionary operations. For example, in the context of the MQAP, the constraint violation can be the Euclidean distance between the objective vector of S and the centroid assigned to the island. The constrained domination principle introduced by Deb et al. [21] modifies the definition of domination, and it is used to confine the islands to their own region. More precisely, it is used to compare two solutions in the binary tournament of the MOEA, to select the best offspring in the genetic phase and to sort the population in the replacement step. The original Pareto dominance is still used to update the local archives.
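The partitioning and the constrained comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: initial centroids are obtained by sorting the archive by distance to the ideal vector (instead of the origin), splitting it into k equal chunks and taking the middle point of each chunk, which is our reading of the seeding idea of Yedla et al. [20]; the constraint violation is the Euclidean distance to the island's centroid, as in the MQAP example above; and the comparison follows the constrained-domination rules of Deb et al. (feasible beats infeasible, the smaller violation wins between two infeasible solutions, Pareto dominance otherwise). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def seeded_kmeans(points: List[Vec], k: int, iters: int = 50) -> Tuple[List[Vec], List[int]]:
    """k-means on objective vectors with deterministic seeding (assumed rule):
    sort points by distance to the ideal vector, split them into k equal
    chunks and use the middle point of each chunk as an initial centroid."""
    m = len(points[0])
    ideal = tuple(min(p[d] for p in points) for d in range(m))
    ordered = sorted(points, key=lambda p: math.dist(p, ideal))
    size = len(ordered) / k  # requires len(points) >= k
    centroids = [tuple(ordered[int(c * size + size / 2)]) for c in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(p[d] for p in members) / len(members) for d in range(m)))
            else:  # keep an empty cluster's centroid unchanged
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def violation(obj: Vec, centroids: List[Vec], island: int) -> float:
    """0 if the island's centroid is the nearest one (feasible); otherwise the
    Euclidean distance between the objective vector and that centroid."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(obj, centroids[c]))
    return 0.0 if nearest == island else math.dist(obj, centroids[island])


def constrained_dominates(a: Vec, b: Vec, centroids: List[Vec], island: int) -> bool:
    """Constrained domination relative to one island's region."""
    va, vb = violation(a, centroids, island), violation(b, centroids, island)
    if va == 0.0 and vb == 0.0:
        return dominates(a, b)  # both feasible: ordinary Pareto dominance
    if va == 0.0 or vb == 0.0:
        return va == 0.0        # a feasible solution beats an infeasible one
    return va < vb              # both infeasible: smaller violation wins


# Usage: two groups of archive points, one cluster per island.
archive = [(0, 10), (1, 9), (0.5, 9.5), (10, 0), (9, 1), (9.5, 0.5)]
centroids, labels = seeded_kmeans(archive, k=2)
island = labels[0]  # the island whose region contains the first group
print(labels)
print(constrained_dominates((0.2, 9.8), (9.8, 0.2), centroids, island))  # True
```

In the usage example, (0.2, 9.8) does not Pareto-dominate (9.8, 0.2), yet it wins the constrained comparison because only it lies in the island's region; this is how the constraint steers each island toward its own part of the front.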

5.3.2 Asynchronous Communications

As pointed out by Jaimes et al. [7], the main drawback of the Streichert model is the communication overhead caused by the exchange of populations. To address this issue and to handle the irregular duration of local search computations, the APM-MOEA model uses asynchronous communications, which avoids a global synchronization barrier. As shown in Fig. 5.3, the organizer island does not have to wait for all the local archives before performing the clustering. At regular intervals, it simply checks whether new local archives have been sent; if so, the global archive is updated with the new solutions. All send operations are non-blocking and therefore do not need to wait for the corresponding receive operations. The communication overhead for the other islands is also limited, as they only have to periodically check for new clusters and send their local archive.
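The organizer's non-blocking check can be illustrated without an MPI installation by replacing message passing with a thread-safe queue. In the hypothetical Python sketch below, `queue.Queue.get_nowait` stands in for a non-blocking probe-and-receive and `put_nowait` for a non-blocking send; in an MPI code these roles would typically be played by calls such as `MPI_Iprobe` and `MPI_Isend`, but the text does not specify which MPI calls APM-MOEA uses.

```python
import queue
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def merge(archive: List[Objectives], candidate: Objectives) -> None:
    """Add a candidate to a non-dominated archive, removing what it dominates."""
    if any(m == candidate or dominates(m, candidate) for m in archive):
        return
    archive[:] = [m for m in archive if not dominates(candidate, m)]
    archive.append(candidate)


def poll_local_archives(inbox: "queue.Queue[List[Objectives]]",
                        global_archive: List[Objectives]) -> int:
    """Organizer-side check: merge every local archive received so far into the
    global archive and return how many were received. It never blocks: if no
    archive has arrived, it returns 0 immediately and the organizer goes on."""
    received = 0
    while True:
        try:
            local_archive = inbox.get_nowait()
        except queue.Empty:
            return received
        received += 1
        for vec in local_archive:
            merge(global_archive, vec)


# Usage: islands "send" without waiting for a matching receive (put_nowait),
# and the organizer polls between two units of its own work.
inbox: "queue.Queue[List[Objectives]]" = queue.Queue()
global_archive: List[Objectives] = []
print(poll_local_archives(inbox, global_archive))  # 0: nothing yet, no waiting
inbox.put_nowait([(3, 6), (5, 5)])  # local archive of island 1
inbox.put_nowait([(4, 4), (6, 1)])  # local archive of island 2
print(poll_local_archives(inbox, global_archive))  # 2
print(sorted(global_archive))  # [(3, 6), (4, 4), (6, 1)]
```

Because the organizer only drains what is already available, a slow island (for example one running a long local search) never stalls the others, which is the point of avoiding a global barrier.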

Fig. 5.3 Scheme of communications in the APM-MOEA model


5.3.3 Control Islands

Two additional islands, named control islands, are included in APM-MOEA in order to find more diversified solutions and explore the objective space more efficiently. Unlike the other islands, they have no constraint on their search area. The first control island includes the local search procedure in its process, while the second generates new solutions with evolutionary operations only. These islands explore the whole objective space with a global view of the current search. The control islands communicate exclusively and asynchronously with the organizer island, which provides them with new solutions from its global archive. Like the classic islands, they periodically send their local archive to the organizer.

5.3.4 Local Search

The performance of multi-objective algorithms can often be improved through the inclusion of some form of local search. In this sense, hybrid algorithms specifically tailored to combinatorial multi-objective problems have been developed [10, 22, 23]. In APM-MOEA, a non-iterative local search is applied at regular intervals to improve each solution of the population. Partial neighborhood exploration [24] is favored over exhaustive exploration to reduce computation time. For the MQAP, we adopted a first-improvement strategy: neighbors are generated one at a time, and the exploration stops as soon as a neighbor which dominates the original solution is found. A maximum number of tries can be set to limit the computational overhead. To define the neighborhood of an MQAP solution, the swap operator is used; it interchanges the locations of two given facilities in a permutation. We adapted the computation method proposed by Taillard [12] to the multi-objective problem in order to compute the cost of a swap. Thus, given an MQAP permutation $\pi$ and the swapped facilities $\pi_i$ and $\pi_j$, the difference $\Delta_f(\pi, i, j)$ in the value of objective function f is computed as defined in Eq. (5.2):

$$\Delta_f(\pi, i, j) = \sum_{\substack{k=1 \\ k \neq i, j}}^{n} \left[ (a_{ki} - a_{kj})\left(b^f_{\pi_k \pi_j} - b^f_{\pi_k \pi_i}\right) + (a_{ik} - a_{jk})\left(b^f_{\pi_j \pi_k} - b^f_{\pi_i \pi_k}\right) \right] \qquad (5.2)$$

where n is the number of facilities, $a_{ij}$ is the distance between location i and location j, and $b^f_{\pi_i \pi_j}$ is the cost of the f-th flow from facility $\pi_i$ to facility $\pi_j$. It is important to note that this computation is valid when the distance matrix is symmetric, which is the case for all MQAP instances handled in this work. It speeds up the evaluation of the objective functions when moving from a solution to one of its neighbors, since each $\Delta_f$ requires O(n) operations instead of the O(n²) needed for a full evaluation of Eq. (5.1).
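Equations (5.1) and (5.2) and the first-improvement strategy can be checked with a small Python sketch. It assumes a symmetric distance matrix with a constant (e.g. zero) diagonal, the condition under which the terms omitted from Eq. (5.2) cancel; the random order in which swaps are visited, the integer test data and all helper names are assumptions, not details given in the text.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def mqap_cost(a: Matrix, b: Matrix, perm: List[int]) -> float:
    """One objective of Eq. (5.1): sum over i, j of a[i][j] * b[perm[i]][perm[j]],
    where perm[i] is the facility placed at location i."""
    n = len(perm)
    return sum(a[i][j] * b[perm[i]][perm[j]] for i in range(n) for j in range(n))


def swap_delta(a: Matrix, b: Matrix, perm: List[int], i: int, j: int) -> float:
    """Eq. (5.2): change of one objective when perm[i] and perm[j] are swapped.
    O(n) instead of O(n^2); assumes `a` symmetric with a constant diagonal."""
    pi, pj = perm[i], perm[j]
    delta = 0.0
    for k, pk in enumerate(perm):
        if k == i or k == j:
            continue
        delta += (a[k][i] - a[k][j]) * (b[pk][pj] - b[pk][pi])
        delta += (a[i][k] - a[j][k]) * (b[pj][pk] - b[pi][pk])
    return delta


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def first_improvement(a: Matrix, flows: List[Matrix], perm: List[int],
                      max_tries: Optional[int] = None,
                      rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Non-iterative first-improvement local search: visit swap neighbours in a
    random order (assumption) and accept the first one whose objective vector
    dominates the current one; at most `max_tries` neighbours are examined."""
    rng = rng or random.Random()
    current = [mqap_cost(a, b, perm) for b in flows]
    moves = [(i, j) for i in range(len(perm)) for j in range(i + 1, len(perm))]
    rng.shuffle(moves)
    for i, j in moves[:max_tries]:
        neighbour = [c + swap_delta(a, b, perm, i, j) for c, b in zip(current, flows)]
        if dominates(neighbour, current):
            improved = perm.copy()
            improved[i], improved[j] = improved[j], improved[i]
            return improved, neighbour
    return perm, current


# Usage on a random bi-objective instance with n = 6 (integer data).
rng = random.Random(0)
n = 6
a = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        a[i][j] = a[j][i] = rng.randint(1, 9)  # symmetric distances, zero diagonal
flows = [[[rng.randint(0, 9) for _ in range(n)] for _ in range(n)] for _ in range(2)]
perm = list(range(n))
new_perm, new_obj = first_improvement(a, flows, perm, max_tries=10, rng=rng)
print(new_perm, new_obj)
```

Comparing `swap_delta` against a full recomputation of `mqap_cost` for every pair (i, j) is a cheap way to validate an implementation of Eq. (5.2) before relying on it inside the local search.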


5.4 Experimental Results

The first step of the experiments is to compare the performance of APM-MOEA with the recent parallel algorithm PasMoQAP [10]. In a second step, the solutions provided by APM-MOEA for large MQAP instances are compared with those obtained by implementations of two state-of-the-art island-based models which use a divide-and-conquer approach: the cone separation model [5] (Cones) and the original model of Streichert et al. [6] (Clustering). For a fair comparison, local search is incorporated in the same way in all models. The quality of the obtained solutions is analyzed through four metrics, which are presented in the next subsection.

5.4.1 Performance Metrics

In a multi-objective context, it is difficult to assess both the convergence toward the Pareto front and the diversity of the obtained solutions. However, many metrics have been proposed over the last decades to compare the performance of different MOEAs [25]. Four metrics are used in the proposed experiments:
• The inverted generational distance IGD [26] measures the average distance between the solutions of the reference set and their nearest obtained solutions. It evaluates the convergence of a solution set toward the Pareto front. Lower values of this metric are associated with better convergence;
• The hypervolume metric H [27] measures the size of the objective space that is dominated by a given solution set. It estimates both the convergence and the diversity of the solutions. Higher values of hypervolume are preferable;
• The coverage of two sets C [11] compares the convergence of two given solution sets using Pareto dominance. More specifically, C(A, B) measures the proportion of solutions in set B which are dominated by solutions in set A;
• The minimal spacing ms [28] evaluates the spread of the solutions contained in a set. It addresses the limitations of classical spacing metrics by computing the distance from a solution to its nearest neighbor that has not already been considered. The lower the value, the better the diversity.

Before presenting the conditions and results of the experiments, the reference algorithm and the MQAP instances are briefly described.
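Three of the metrics listed in Sect. 5.4.1 (IGD, hypervolume and set coverage), together with the min-max normalization applied before computing IGD, can be sketched compactly in Python. These are textbook formulations written for illustration, not the evaluation code used in the experiments: minimization is assumed, the hypervolume routine only handles two objectives, the coverage uses weak dominance (a common variant of C that also counts identical points), and minimal spacing is omitted because its exact procedure is not detailed here.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, ...]


def normalize(front: List[Vec], ideal: Vec, nadir: Vec) -> List[Vec]:
    """Min-max scale each objective to [0, 1] using the ideal and nadir points."""
    return [tuple((x - lo) / (hi - lo) if hi > lo else 0.0
                  for x, lo, hi in zip(p, ideal, nadir)) for p in front]


def igd(reference: List[Vec], approx: List[Vec]) -> float:
    """Inverted generational distance: mean distance from each reference point
    to its nearest obtained point (lower is better)."""
    return sum(min(math.dist(r, s) for s in approx) for r in reference) / len(reference)


def hypervolume_2d(front: List[Vec], ref_point: Vec) -> float:
    """Area dominated by a bi-objective minimization front, bounded by ref_point."""
    pts = sorted(p for p in front if p[0] < ref_point[0] and p[1] < ref_point[1])
    area, prev_f2 = 0.0, ref_point[1]
    for f1, f2 in pts:  # sweep by increasing f1
        if f2 < prev_f2:  # points with f2 >= prev_f2 are dominated: skip them
            area += (ref_point[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area


def coverage(A: List[Vec], B: List[Vec]) -> float:
    """C(A, B): fraction of B weakly dominated (covered) by some member of A."""
    covered = sum(1 for b in B if any(all(x <= y for x, y in zip(a, b)) for a in A))
    return covered / len(B)


print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], (4, 4)))  # 6.0
print(igd([(0, 1), (1, 0)], [(0, 2)]))
print(coverage([(1, 1)], [(2, 2), (0, 3)]), coverage([(2, 2), (0, 3)], [(1, 1)]))  # 0.5 0.0
```

Note that C is not symmetric, which is why both C(A, B) and C(B, A) are reported when two models are compared.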

5.4.2 The GISMOO Algorithm

The GISMOO algorithm [9] was chosen to evolve the subpopulations and to illustrate the performance of APM-MOEA and of state-of-the-art parallel models. In recent years, it has proven effective in solving a variety of continuous [9, 29] and combinatorial [30, 31] problems. More specifically, its immune phase yields well-diversified solutions, which appears to be a significant benefit for a divide-and-conquer model. Following the authors’ guidelines, all GISMOO experiments were carried out with the following parameters: population size, mutation probability and crossover probability are set to 100, 0.06 and 1.0, respectively.

5.4.3 MQAP Instances

In the first part of the experiments, the Gar60 MQAP instances with 60 facilities/locations, proposed by Garrett et al. [14], are used to compare APM-MOEA with PasMoQAP. In the second part, the MQAP instance generator proposed by Knowles and Corne [8] is used to evaluate the performance of the proposed model on larger instances. Thus, 16 test instances with 100 to 1000 decision variables have been created¹; their specifications are given in Table 5.1: instance number (#), instance name (Name), number of locations (Size), instance type (Type), number of objectives (Obj.) and correlation between flow matrices (Corr.).

Table 5.1 Large MQAP instances, from 100 to 1000 locations, created with the problem instance generator proposed by Knowles and Corne [8]

#    Name               Size   Type        Obj.   Corr.
1    Gar100-2fl-1uni    100    Uniform     2      −0,5
2    Gar100-2fl-2uni    100    Uniform     2      0
3    Gar100-2fl-1rl     100    Real-like   2      −0,5
4    Gar100-2fl-2rl     100    Real-like   2      0
5    Gar200-2fl-1uni    200    Uniform     2      0
6    Gar200-2fl-2uni    200    Uniform     2      0,5
7    Gar200-2fl-1rl     200    Real-like   2      0
8    Gar200-2fl-2rl     200    Real-like   2      0,5
9    Gar500-2fl-1uni    500    Uniform     2      0
10   Gar500-2fl-2uni    500    Uniform     2      0,5
11   Gar500-2fl-1rl     500    Real-like   2      0
12   Gar500-2fl-2rl     500    Real-like   2      0,5
13   Gar1000-2fl-1uni   1000   Uniform     2      0
14   Gar1000-2fl-2uni   1000   Uniform     2      0,5
15   Gar1000-2fl-1rl    1000   Real-like   2      0
16   Gar1000-2fl-2rl    1000   Real-like   2      0,5

1 https://sourceforge.net/projects/apm-moea-mqap-instances/.


5.4.4 Experimental Conditions

All parallel models are implemented in C++ using the MPI library and the mpiicpc compiler. They are executed on the ROMEO supercomputer [32], which allows the use of many processor cores for these experiments. For each parallel model, a single processor core is associated with each island. All test instances are solved five times with each parallel model, and the averages of the metric values are reported. As large MQAP instances are used, their Pareto fronts are not known; thus, the non-dominated solutions found by all parallel models are used as reference sets for the convergence metrics. Regarding the evolutionary phase, the order crossover (OX) is used to perform recombinations and the swap mutation is used as the mutation operator. The local search procedure presented in Sect. 5.3.4 is implemented in each model. The same maximum execution time is also set as the stopping criterion for each test instance and each model. In their original work, Sanhueza et al. [10] evaluated their approach on the Gar60 instances [14] using 5, 8, 11, 16 and 21 processor cores and a maximum execution time of 300 s. For our experiments on APM-MOEA, the same numbers of islands are used and the computation time is reduced to 200 s in order to take into account the difference between experimental environments. The solution sets made available by the authors are used to compare the parallel models. For large MQAP instances, as the evaluation time of an MQAP solution increases considerably with the number of decision variables, different times have been allocated for each instance size. Therefore, the maximum execution times for the instances with 100, 200, 300, 500 and 1000 locations are set to 300, 500, 1000, 1500 and 9000 s, respectively.
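The two variation operators named above can be sketched in Python as follows. The OX variant shown (copy a segment from the first parent, then fill the remaining positions from the second cut point onward with the second parent's genes in order) is one standard formulation and not necessarily the exact code used in the experiments; applying the swap mutation once per offspring with probability 0.06 is an assumption about how the mutation probability of Sect. 5.4.2 is interpreted. Function names are hypothetical.

```python
import random
from typing import List


def order_crossover(p1: List[int], p2: List[int], a: int, b: int) -> List[int]:
    """Order crossover (OX): the child inherits p1[a:b] in place; the other
    positions, from b onward (wrapping around), receive the missing genes in
    the order they appear in p2, also read from position b with wrap-around."""
    n = len(p1)
    child: List[int] = [-1] * n
    child[a:b] = p1[a:b]
    kept = set(p1[a:b])
    donors = [g for g in p2[b:] + p2[:b] if g not in kept]
    positions = [(b + t) % n for t in range(n - (b - a))]
    for pos, gene in zip(positions, donors):
        child[pos] = gene
    return child


def random_ox(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """OX with two random cut points 0 <= a < b <= n."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    return order_crossover(p1, p2, a, b)


def swap_mutation(perm: List[int], pm: float, rng: random.Random) -> List[int]:
    """With probability pm, exchange the facilities at two distinct random positions."""
    child = perm.copy()
    if rng.random() < pm:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child


p1, p2 = list(range(8)), list(range(7, -1, -1))
print(order_crossover(p1, p2, 2, 5))  # [6, 5, 2, 3, 4, 1, 0, 7]
rng = random.Random(42)
offspring = swap_mutation(random_ox(p1, p2, rng), pm=0.06, rng=rng)
print(offspring)
```

Both operators always return a valid permutation, so no repair step is needed before evaluating an offspring with Eq. (5.1).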

5.4.5 Resolution of Small MQAP Instances

The IGD metric estimates the convergence toward the approximate Pareto front (the reference set). Average IGD values obtained by PasMoQAP and APM-MOEA are presented in Table 5.2 for each test instance. The reader may note that the objective functions have been normalized using the ideal and nadir points to obtain representative average values across all instances. Results show that APM-MOEA achieves better convergence toward the reference set. In fact, all its IGD values are closer to 0 than those of PasMoQAP, showing that it minimizes the distances to the Pareto front in the objective space. Furthermore, its IGD values generally decrease as the number of islands grows, showing its ability to improve the quality of its solutions when more islands are used. However, the reader may notice that using 16 or 21 islands does not allow APM-MOEA to significantly improve the quality of the solutions.

Table 5.2 Mean inverted generational distance IGD for PasMoQAP (P) [10] and APM-MOEA (A) for Gar60 instances

Instances        5 islands     8 islands     11 islands    16 islands    21 islands
                 P      A      P      A      P      A      P      A      P      A
Gar60-2fl-1uni   0,209  0,007  0,220  0,007  0,211  0,006  0,226  0,004  0,212  0,003
Gar60-2fl-2uni   0,357  0,011  0,372  0,008  0,327  0,008  0,361  0,004  0,303  0,003
Gar60-2fl-3uni   0,489  0,025  0,521  0,021  0,457  0,019  0,495  0,009  0,480  0,009
Gar60-2fl-4uni   0,069  0,002  0,066  0,002  0,064  0,001  0,071  0,001  0,068  0,001
Gar60-2fl-5uni   1,123  0,166  1,151  0,067  1,027  0,067  1,084  0,067  1,114  0,070
Gar60-2fl-1rl    0,285  0,006  0,246  0,009  0,258  0,008  0,296  0,004  0,282  0,003
Gar60-2fl-2rl    0,302  0,005  0,258  0,003  0,265  0,003  0,300  0,004  0,302  0,004
Gar60-2fl-3rl    0,313  0,006  0,276  0,007  0,239  0,005  0,251  0,005  0,268  0,004
Gar60-2fl-4rl    0,270  0,006  0,236  0,004  0,253  0,005  0,252  0,004  0,266  0,003
Gar60-2fl-5rl    0,524  0,011  0,547  0,032  0,482  0,024  0,646  0,015  0,585  0,016
Gar60-3fl-1uni   0,433  0,027  0,459  0,011  0,453  0,010  0,462  0,008  0,464  0,009
Gar60-3fl-2uni   0,177  0,022  0,175  0,008  0,163  0,007  0,178  0,004  0,158  0,004
Gar60-3fl-3uni   0,753  0,030  0,674  0,025  0,637  0,034  0,719  0,012  0,738  0,009
Gar60-3fl-1rl    0,323  0,020  0,299  0,009  0,275  0,007  0,345  0,006  0,334  0,006
Gar60-3fl-2rl    0,347  0,023  0,352  0,013  0,369  0,009  0,363  0,004  0,390  0,004
Gar60-3fl-3rl    0,481  0,017  0,418  0,015  0,442  0,008  0,442  0,005  0,486  0,004

To confirm the results obtained with the IGD metric, the hypervolume metric is computed for both models; mean values for each configuration are presented in Fig. 5.4. The higher values obtained by APM-MOEA show its ability to find a set of solutions that covers a larger region of the objective space than PasMoQAP. This indicator confirms that APM-MOEA provides solutions with better convergence and diversity. Also, while the PasMoQAP local search procedure seems to remain stuck in Pareto local optima, the non-iterative procedure of APM-MOEA allows for a better exploration of the objective space and leaves more work to the core MOEA.

Fig. 5.4 Mean hypervolume H for Gar60 instances

In order to specifically analyze the diversity of the solution sets, the minimal spacing metric is also computed for each parallel model. Table 5.3 summarizes the average results grouped by configuration.


Table 5.3 Mean minimal spacing ms for Gar60 instances

              5 isl.   8 isl.   11 isl.   16 isl.   21 isl.
PasMoQAP      0,0715   0,0690   0,0876    0,0881    0,0624
APM-MOEA      0,0151   0,0134   0,0127    0,0124    0,0123

According to the ms indicator, the lower values obtained with any number of islands indicate that APM-MOEA provides a better spread of solutions than PasMoQAP. Moreover, its diversity improves as the number of islands increases. In particular, by using an enhanced clustering algorithm to distribute the computations, APM-MOEA provides well-distributed solutions in the objective space. This first part of the experiments showed the effectiveness of APM-MOEA in solving small MQAP instances, both in terms of convergence toward the Pareto front and of distribution in the objective space. The next section analyzes the performance of APM-MOEA in solving larger instances.

5.4.6 Resolution of Large MQAP Instances

The new MQAP instances introduced in Sect. 5.4.3 are used to compare the performance of APM-MOEA with the Cones and Clustering models when parallelizing the GISMOO algorithm. First, the convergence ability of APM-MOEA and of the other parallel models is analyzed through two metrics. The IGD metric estimates the average distance to the Pareto front. To allow its computation, the objective functions are normalized between 0 and 1 using the extreme solutions among all the non-dominated solutions found. Table 5.4 presents the average results obtained by each model in each configuration; the best results are marked in bold characters. Considering first the IGD metric, Cones and APM-MOEA reach the best performances using 2 islands: the former obtains the lowest average IGD values on 5 of the 16 MQAP instances, while the latter offers better convergence on 9 of the 16 instances. With 4 and 8 islands, APM-MOEA provides solutions closer to the Pareto front for most instances, especially the larger ones with 500 or 1000 locations. With 16 islands, APM-MOEA offers better solution quality than the other models, as it obtains the lowest IGD values on 14 of the 16 instances. Furthermore, the IGD values of APM-MOEA decrease significantly with the number of islands, showing its ability to improve the quality of its solutions when more islands are used. To confirm the results obtained with the IGD metric and to directly compare the solutions provided for each instance, the values of the set coverage metric C for the 16-island configuration are presented in Fig. 5.5.


Table 5.4 IGD values for Cones, Clustering and APM-MOEA for large MQAP instances 2 islands

4 islands APM

8 islands

CON

CLU

CON

CLU

APM

CLU

APM

CON

1

0,037

0,0436 0,0373 0,031

0,027

0,0388 0,0236 0,009

0,027

0,0152 0,005

2

0,0929 0,058

3

0,0681 0,0427 0,038

0,0263 0,0271 0,024

0,0313 0,0293 0,026

0,0219 0,0159 0,015

4

0,0763 0,029

0,0461 0,0383 0,02

0,0447 0,0246 0,008

0,024

5

0,1599 0,1657 0,147

0,1058 0,1059 0,089

0,0767 0,054

6

0,192

7

0,0607 0,035

0,0558 0,0597 0,035

0,0525 0,0541 0,0493 0,043

0,0463 0,0174 0,004

8

0,041

0,058

0,0431 0,0883 0,049

0,041

0,0392 0,0337 0,027

0,0356 0,0204 0,009

9

0,204

0,2479 0,2317 0,1704 0,1991 0,147

10

0,2706 0,2968 0,184

11

0,187

12

0,2567 0,2724 0,234

0,3054 0,4199 0,043

13

0,3206 0,3192 0,25

0,215

0,1434 0,0952 0,027

0,1072 0,0459 0,013

14

0,7176 0,7436 0,61

0,5198 0,649

0,3495 0,4777 0,27

0,4202 0,4162 0,08

15

0,3902 0,3899 0,293

0,3492 0,3913 0,134

0,3236 0,35

0,005

0,3214 0,3956 0,015

16

0,4149 0,4364 0,381

0,456

0,4171 0,4374 0,059

0,4434 0,4415 0,002

0,048 0,052

0,038

CON

16 islands

#

0,0528 0,0561 0,038

0,1957 0,2819 0,1942 0,0967 0,058

0,0276 0,019

CLU

0,0147

0,0337 0,0223 0,017 0,0227 0,012

0,0728 0,0339 0,023

0,1437 0,2164 0,106

APM

0,0607

0,1018 0,1029 0,058

0,1118 0,0892 0,062

0,0535 0,024

0,2171 0,2266 0,076

0,1948 0,1552 0,045

0,1704 0,0916 0,011

0,3224 0,2087 0,1818 0,3423 0,069

0,2342 0,3834 0,017

0,235

0,2324 0,4657 0,07

0,2639 0,5208 0,004

0,2322 0,138 0,329

0,4509 0,13

0,016

0,3745 0,003

Fig. 5.5 Average coverage C of APM-MOEA, Cones and Clustering using 16 islands for large MQAP instances

In most cases, the average values of C(APM-MOEA, Cones) and C(APM-MOEA, Clustering) range between 0.7 and 1.00, which indicates better convergence of APM-MOEA, as most of its solutions dominate those provided by Cones and Clustering. Conversely, the values of C(Cones, APM-MOEA) and C(Clustering, APM-MOEA) are close to 0, showing that APM-MOEA finds solutions that are not dominated by those of Cones and Clustering. In solving the largest test instances, that is, those with 500 or more decision variables, APM-MOEA is even more efficient, as it obtains C values close to 1.00 against the other models. The convergence ability of APM-MOEA is thus highlighted by the first two metrics, as they indicate that it manages to find solutions closer to the Pareto front than the other models, especially in configurations with a large number of islands.

In order to assess the distribution and diversity of the obtained solutions, the results of the hypervolume H and of the minimal spacing ms metrics are presented in Fig. 5.6 and Table 5.5, respectively. According to the first metric, the higher average H values indicate that, with any number of islands, APM-MOEA achieves a better coverage of the objective space than the other models, including Cones, which obtained better IGD values on some MQAP instances. The division of the objective space performed by the enhanced clustering algorithm of APM-MOEA allows it to cover a larger portion of the objective space, as indicated by the largest H values obtained with more than 4 islands. As the IGD results suggested, the best performances are achieved in the largest configurations, as the H metric increases significantly with the number of islands.

Fig. 5.6 Average hypervolume H of Cones, Clustering and APM-MOEA

Finally, considering the ms metric, APM-MOEA provides a better spread of solutions than the other models with any number of islands, as shown by the lower ms values it obtains. Moreover, its diversity improves as the number of islands increases. In particular, by using an enhanced clustering algorithm to distribute the computations, together with 2 control islands, APM-MOEA provides well-distributed solutions.

Table 5.5 Average minimal spacing ms of Cones, Clustering and APM-MOEA for large MQAP instances

              2 islands   4 islands   8 islands   16 islands
Cones         0,507       0,512       0,52        0,528
Clustering    0,519       0,516       0,503       0,497
APM-MOEA      0,497       0,484       0,467       0,442

Figures 5.7 and 5.8 illustrate the overall results with examples of solution sets obtained by the Cones, Clustering and APM-MOEA models on typical runs for two MQAP instances. The reader may note that APM-MOEA obtains solution sets that are better distributed and cover larger regions of the objective space, confirming the values obtained for the minimal spacing and hypervolume metrics. These examples also highlight the good convergence of APM-MOEA, which provides lower values for each objective function.

Fig. 5.7 Example of solution sets obtained by Cones, Clustering and APM-MOEA on the Gar200-2fl-1rl instance

Fig. 5.8 Example of solution sets obtained by Cones, Clustering and APM-MOEA on the Gar1000-2fl-1rl instance

5.5 Conclusion

In this work, a new parallel model is proposed to improve the efficiency and execution time of multi-objective evolutionary algorithms in solving the MQAP. Based on the parallelization of the GISMOO algorithm [9], APM-MOEA managed to find solutions of better quality than the recently proposed PasMoQAP parallel algorithm [10] on small MQAP instances. In order to show the performance of the proposed model in a wider context, large MQAP instances have also been generated and solved by APM-MOEA and two state-of-the-art models. According to various multi-objective metrics, APM-MOEA outperformed the other parallel models in terms of convergence toward the Pareto front and diversity of the obtained solution sets. Moreover, APM-MOEA has proven to be scalable from 4 to 16 islands, as the computed metrics indicated good convergence and coverage of the objective space with the largest number of processors. The control islands also allowed the model to obtain a better distribution of the solutions in the search space.

APM-MOEA provided very promising results on one specific multi-objective problem, but future work should be dedicated to studying its behavior in other contexts. For example, it would be interesting to evaluate the APM-MOEA model on continuous multi-objective or many-objective problems. Another promising avenue would be to adapt the model to other multi-objective evolutionary algorithms, such as NSGA-II, to further validate its efficiency. In this study, the main focus was on solution quality, but master-slave models have been proposed to speed up MOEAs. They often parallelize the evaluations of the objective functions, which usually require long computing times. Thus, it could be interesting to hybridize APM-MOEA with a master-slave model to both improve solution quality and reduce execution time.

References

1. S. Bandyopadhyay, S. Pal, B. Aruna, Multiobjective GAs, quantitative indices, and pattern classification, in IEEE Trans. Syst. Man Cybern. B Cybern. (2004)
2. J. Branke, H. Schmeck, K. Deb, M. Reddy, Parallelizing multi-objective evolutionary algorithms: cone separation, in Evolutionary Computation (CEC2004), vol. 2 (2004), pp. 1952–1957
3. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, in IEEE Transactions on Evolutionary Computation (2002), pp. 182–197
4. M.M. Drugan, D. Thierens, Stochastic Pareto local search: Pareto neighbourhood exploration and perturbation strategies, in J. Heurist. (2012), pp. 727–766
5. C. Gagné, A. Zinflou, Solving multi-objective quadratic assignment problems using hybrid genetic/immune strategy, in Proceedings of the 10th edition of the Metaheuristics International Conference (MIC 2013) (2013)
6. C. Gagné, A. Zinflou, An hybrid algorithm for the industrial car sequencing problem, in Proceedings of the IEEE World Congress on Computational Intelligence (WCCI 2012) (Brisbane, Australia, 2012)
7. D. Garrett, D. Dasgupta, An empirical comparison of memetic algorithm strategies on the multiobjective quadratic assignment problem, in 2009 IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making (2009), pp. 80–87
8. D. Garrett, D. Dasgupta, Analyzing the performance of hybrid evolutionary algorithms for the multiobjective quadratic assignment problem, in IEEE International Conference on Evolutionary Computation (2006), pp. 1710–1717
9. C. Grosan, M. Oltean, D. Dumitrescu, Performance metrics in multi-objective optimization, in Proceedings of the Conference on Applied and Industrial Mathematics (CAIM ’03) (2003), pp. 121–125
10. J. Knowles, D. Corne, Instance generators and test suites for the multiobjective quadratic assignment problem, in Evolutionary Multi-Criterion Optimization (2003), pp. 295–310
11. T.C. Koopmans, M. Beckmann, Assignment problems and the location of economic activities, in Econometrica (1957), pp. 53–76

5 An Asynchronous Parallel Evolutionary Algorithm for Solving Large …


12. H. Li, D. Landa-Silva, An Elitist GRASP metaheuristic for the multi-objective quadratic assignment problem, in Evolutionary Multi-Criterion Optimization (2009), pp. 481–494
13. A. Liefooghe, J. Humeau, S. Mesmoudi, L. Jourdan, E.G. Talbi, On dominance-based multiobjective local search: design, implementation and experimental analysis on scheduling and traveling salesman problems, in J. Heurist. (2012), pp. 317–352
14. A. Lopez Jaimes, C.A. Coello Coello, Hybrid population-based algorithms for the bi-objective quadratic assignment problem, in J. Math. Modell. Alg. (2006), pp. 111–137
15. A. Lopez Jaimes, C.A. Coello Coello, MRMOGA: parallel evolutionary multiobjective optimization using multiple resolutions, in IEEE Congress on Evolutionary Computation (2005), pp. 2294–2301
16. F. Luna, E. Alba, Parallel multiobjective evolutionary algorithms, in Springer Handbook of Computational Intelligence (2015), pp. 1017–1031
17. T. Lust, J. Teghem, The multiobjective traveling salesman problem: a survey and a new approach, in Advances in Multi-Objective Nature Inspired Computing (2010), pp. 119–141
18. B.S.P. Mishra, S. Dehuri, R. Mall, A. Ghosh, Parallel single and multiple objectives genetic algorithms: a survey, in International Journal of Applied Evolutionary Computation (2011), pp. 21–57
19. ROMEO HPC Center (2018), https://romeo.univ-reims.fr
20. S. Samanta, D. Philip, S. Chakraborty, Bi-objective dependent location quadratic assignment problem: Formulation and solution using a modified artificial bee colony algorithm, in Computers & Industrial Engineering (2008), pp. 8–7626
21. C. Sanhueza, F. Jimenez, R. Berretta, P. Moscato, PasMoQAP: a parallel asynchronous memetic algorithm for solving the multi-objective quadratic assignment problem, in Conference on Evolutionary Computation (2017)
22. F. Streichert, F. Ulmer, A. Zell, Parallelization of multi-objective evolutionary algorithms using clustering algorithms, in Evolutionary Multi-Criterion Optimization (2005), pp. 92–107
23. E.D. Taillard, Comparison of iterative searches for the quadratic assignment problem, in Location Sci. (1995), pp. 87–105
24. E.-G. Talbi, A unified view of parallel multi-objective evolutionary algorithms, in J. Parallel Distrib. Comput. (2018)
25. D.A. Van Veldhuizen, G.B. Lamont, Evolutionary Computation and Convergence to a Pareto Front (Stanford University, California, 1998), pp. 221–228
26. M. Yedla, S. Rao Pathakota, T.M. Srinivasa, Enhancing k-means clustering algorithm with improved initial centre, in International Journal of Computer Science and Information Technologies (2010), pp. 121–125
27. A. Zhou, B.-Y. Qu, H. Li, S.-Z. Zhao, P.N. Suganthan, Q. Zhang, Multiobjective evolutionary algorithms: A survey of the state of the art, in Swarm and Evolutionary Computation (2011), pp. 32–49
28. A. Zinflou, C. Gagné, M. Gravel, GISMOO vs Genetic and differential evolution algorithms in multiobjective optimization, in Proceedings of the 9th Metaheuristics International Conference (MIC 2011) (2011), pp. S1-52-10
29. A. Zinflou, C. Gagné, M. Gravel, GISMOO: A new hybrid genetic/immune strategy for multiple-objective optimization, in Computers & Operations Research (2012), pp. 1951–1968
30. E. Zitzler, Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications (1999)
31. E. Zitzler, L. Thiele, Multiobjective optimization using evolutionary algorithms - A comparative case study, in Parallel Problem Solving from Nature - PPSN V: 5th International Conference Amsterdam (1998), pp. 292–301
32. C. Özkale, A. Figlali, Evaluation of the multiobjective ant colony algorithm performances on biobjective quadratic assignment problems, in Applied Mathematical Modelling (2013), pp. 7822–7838

Chapter 6

Learning from Prior Designs for Facility Layout Optimization

Hannu Rummukainen, Jukka K. Nurminen, Timo Syrjänen, and Jukka-Pekka Numminen

Abstract

The facility layout problem involves more than optimizing the locations of process components on a factory floor: in real-world applications there are numerous practical constraints and objectives that can be difficult to formulate comprehensively in an explicit optimization model. As an alternative to explicit modelling, we present an optimization approach that learns structural properties from examples of expert-designed layouts of other similar facilities, and considers similarity to the examples as one objective in a multiobjective facility layout optimization problem. We have tested the approach on small-scale artificial test data, and the initial results indicate that a layout objective can be learned from example layouts, even if the process structure in the examples differs from the target case.

H. Rummukainen (B)
VTT, P.O. Box 1000, 02044 VTT, Finland

J. K. Nurminen
University of Helsinki, P.O. Box 68, 00014 Helsinki, Finland

T. Syrjänen · J.-P. Numminen
Pöyry Finland, P.O. Box 4, 01621 Vantaa, Finland

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_6

6.1 Introduction

We consider the problem of facility layout, that is, where to place the various process components, e.g. production machines, in a manufacturing facility. Whether in the process industry or in goods manufacturing, the design of a facility is a complex multi-disciplinary effort. The layout of components is directly or indirectly affected


H. Rummukainen et al.

by numerous factors, starting from the construction, operation and maintenance of relevant production technologies and infrastructure, and including considerations such as production economics, safety, security, environmental issues, and regulatory requirements. Because the number of influencing factors is large, and the factors are often difficult to specify completely, modelling the task explicitly as an optimization problem is difficult or impossible. The key idea behind our work is that earlier plans for different but similar facilities contain a lot of useful knowledge. The experts who created the earlier plans used their skills and expertise to consider and balance the multiple influencing factors. As a result, the plans contain a lot of explicit as well as hidden knowledge that can be useful for future planning tasks. In this work we investigate the use of machine learning to distill facility layout knowledge from old plans and apply it to the design of new facilities. Instead of attempting to include all essential factors explicitly in the facility layout optimization, we use probabilistic machine learning to generate new goals for the model automatically. In this way, we expect the optimization model to better reflect real-world complexities without expending modelling effort to find, formulate, and balance the different factors. At the time of initial layout design, the data available about the particular facility is typically limited, and it is not practical to require the user to input large amounts of site-specific parameters. An automated decision support tool should nevertheless be able to quickly provide plausible layouts that can be used as a starting point for more detailed planning and design. If the designer disagrees with automated layout decisions, it should then be easy to interactively improve the generated designs.
An established design consultant or manufacturing business may have detailed design data about dozens of facilities, which can then be used as source data for machine learning. Our research question is whether such data is sufficient to learn relevant design rules well enough to apply them to new designs. If so, machine learning has the advantage of both reducing the complexity of the layout model and reducing the need for explicit site-specific parameter data. The key contributions of this paper are:

1. We propose the idea of using knowledge automatically extracted from the plans of old facilities in the design of new ones (Sect. 6.1).
2. We formulate a probabilistic similarity model used to extract the essential planning knowledge from old plans (Sect. 6.4).
3. We combine a basic facility layout model (Sect. 6.3) and the similarity model into an optimization model which uses similarity as an additional goal (Sect. 6.5).
4. We evaluate the idea with small-scale artificial test data (Sects. 6.6–6.7).

Figure 6.1 illustrates how the proposed optimization approach would be used in practice in a decision support tool by a facility designer.

6 Learning from Prior Designs for Facility Layout Optimization

[Figure 6.1: diagram with the blocks Designer; Complete past layouts (component positions, connections); Learning evaluation measures; Case specification (components, connections, infrastructure, parameters); Multiobjective layout optimization; Layout proposals; Ad hoc adjustments]

Fig. 6.1 Concept of a decision support tool for facility layout, based on example layouts. Working on a new facility, the designer outlines the basic process and relevant constraints to the decision support tool, and gets a small, varied selection of plausible layouts as a starting point for more detailed design work

6.2 Related Work

In the operations research literature, facility layout has typically been addressed as a combinatorial optimization problem, where the most important layout rules are explicitly modelled [2, 3, 11]. Typically the primary objective is to minimize material transport costs derived from inter-component distances. Additional objectives and constraints may be used to model e.g. safety considerations [13]. However, it would be complicated to fully model all relevant design considerations. Due to the limitations of existing models, automatic facility layout models have so far made little headway as a practical design tool [18]: in practice, the initial design of a process plant is performed by an experienced designer, and detailed design is performed by a multidisciplinary team of experts.

The issue of how to include human expertise in automated facility layout has commonly been addressed by actively involving expert designers either in the model specification or in the solution process. In our work we do not require experts to directly or indirectly weight different objectives. Instead, we implicitly learn those weights from old plans, which we suggest contain the expert judgement used to create them. To the best of our knowledge, we are the first to attempt facility layout optimization with similarity to expert-designed layouts as an explicit optimization goal. Because of the difficulty of modelling and weighing all relevant factors, researchers have suggested using expert opinions to rate multiple plans generated by computational tools [6]. Similarly, García-Hernández et al. [8, 9] involve a human expert designer in facility layout via interactive multiobjective optimization: after each iteration of a genetic algorithm, a representative subset of solutions is scored by the


expert. An explicit quantitative objective (based on weighted transport distance) and the expert scores are treated as separate objectives, and Pareto-optimal solutions are then selected for the genetic algorithm to work on in the next iteration. In order to automate the processing of qualitative design rules, Grobelny and Michalski [10] apply fuzzy set theory to formalize layout rules expressed in semi-formal natural language, and then run simulated annealing to optimize the mean fuzzy truth value of the rules. Chung [7] presents a neural network based method to generate new facility layouts that are similar to given examples. The method does not consider any explicit objectives or constraints on the generated layouts. Ahmad et al. [1] train a neural network to reproduce expert evaluations of given layouts, and propose that the resulting model would be useful for optimization; however, the network took only four quantitative measures of the layout as inputs, and as many as 500 manually evaluated layouts were used as training data. Merrell et al. [16] generate building layouts by training a Bayesian network on example layouts. Their method generates new layouts in two stages: room shapes, sizes and adjacencies are sampled from the trained Bayesian network, and then the rooms are positioned by a separate optimization procedure with an explicitly modelled objective. Our approach does not need a separate sampling stage, but embeds a probabilistic similarity model directly in the final layout optimization model.

Machine learning has been applied to many planning tasks. Approaches that use machine learning for VLSI circuit design are reviewed in [25]. The problem of deriving a constraint programming model from feasible and infeasible example solutions is in general quite challenging, but has been addressed by several methods [20], including machine learning [5].
Mizoguchi and Ohwada [17] present an inductive logic programming method that derives constraints from examples of feasible solutions only, and apply the method to a floor layout problem. In contrast, we derive an evaluation function instead of hard constraints, and we avoid the complexity of the general constraint acquisition problem by focusing on the modelling of geometric similarity. In the optimization literature, the problem of deducing an objective function that makes the given solutions optimal is known as inverse optimization, and again the general problem is hard [4, 22]. We know of no applications of inverse optimization to facility layout problems. Lombardi et al. [14] present a methodology for embedding a machine learning model into a combinatorial optimization model. Our work can be seen as an application of their methodology, extending it by (1) embedding a probabilistic machine learning model in a constraint programming model, and (2) using a learned objective in addition to an explicit model objective.

6.3 Facility Layout Model

Here we define the basic layout model that will be combined with the similarity model in Sect. 6.5.


The production process is described by a directed graph $(V, E)$, called the process graph, where the nodes $V$ are identified with components, and the edges $E \subseteq V \times V$ indicate material flows between components. Note that directedness is not needed for the layout model, but we use the information later in the similarity model of Sect. 6.4. For each edge $(i, j) \in E$, the distance between components $i$ and $j$ contributes to the objective function with a cost coefficient $c_{ij} \ge 0$. Each component $i \in V$ can be realized as a number of different rectangular patterns, denoted by the set $P_i$: the choice of pattern $p \in P_i$ determines the width $w_p$ and height $h_p$ of the component. Different patterns may arise when a component is rotated (by 90 degrees), or is replaced by another interchangeable component of different dimensions. We use a discrete model with integer coordinates on a $W \times H$ grid. Distances between components are measured by the rectilinear (taxicab) distance between component centre points; note that we double the distances to avoid half-integral values. We use the following decision variables:

$x_i \in \{0, \dots, W\}$: lower left corner x-coordinate of component $i \in V$
$y_i \in \{0, \dots, H\}$: lower left corner y-coordinate of component $i \in V$
$p_i \in P_i$: choice of pattern for node $i \in V$
$d_{ij} \in \mathbb{N}$: distance from component $i$ to $j$, for edge $(i, j) \in E$

We start the natural numbers $\mathbb{N}$ from 0. The basic facility layout model is now:

$$\min z_C = \sum_{(i,j)\in E} c_{ij}\, d_{ij} \qquad (6.1)$$

$$d_{ij} = \left|(2x_i + w_{p_i}) - (2x_j + w_{p_j})\right| + \left|(2y_i + h_{p_i}) - (2y_j + h_{p_j})\right| \quad \forall (i,j) \in E \qquad (6.2)$$

$$x_i + w_{p_i} \le x_j \;\lor\; x_j + w_{p_j} \le x_i \;\lor\; y_i + h_{p_i} \le y_j \;\lor\; y_j + h_{p_j} \le y_i \quad \forall i, j \in V,\ i \neq j \qquad (6.3)$$

The objective (6.1) is the sum of edge distance costs. Equation (6.2) links the distance variables with the component coordinates and pattern choices. Equation (6.3) ensures that no two components overlap each other.
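To make the model (6.1)–(6.3) concrete, the following Python sketch evaluates a fixed candidate layout rather than optimizing it (the chapter's model is solved with a constraint solver, see Sect. 6.6). The `Placement` class, the function names, and the two example components are hypothetical; the pattern choice $p_i$ is represented simply by storing the chosen width and height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical container: lower-left corner (x, y) and chosen pattern size (w, h)."""
    x: int
    y: int
    w: int
    h: int


def doubled_distance(a: Placement, b: Placement) -> int:
    """Doubled rectilinear distance between centre points, as in (6.2)."""
    return (abs((2 * a.x + a.w) - (2 * b.x + b.w))
            + abs((2 * a.y + a.h) - (2 * b.y + b.h)))


def no_overlap(a: Placement, b: Placement) -> bool:
    """Disjunctive non-overlap condition (6.3) for one pair of components."""
    return (a.x + a.w <= b.x or b.x + b.w <= a.x
            or a.y + a.h <= b.y or b.y + b.h <= a.y)


def distance_cost(layout, edges, cost):
    """Objective (6.1): sum of c_ij * d_ij over the process-graph edges."""
    return sum(cost[(i, j)] * doubled_distance(layout[i], layout[j])
               for (i, j) in edges)


def is_feasible(layout, width, height):
    """Corner coordinates inside the W x H grid and no pairwise overlaps."""
    names = list(layout)
    if any(not (0 <= p.x <= width and 0 <= p.y <= height) for p in layout.values()):
        return False
    return all(no_overlap(layout[a], layout[b])
               for k, a in enumerate(names) for b in names[k + 1:])


# Usage with two hypothetical components (names and sizes are illustrative only).
layout = {"tank": Placement(0, 0, 4, 4), "pump": Placement(5, 0, 2, 2)}
edges = [("tank", "pump")]
cost = {("tank", "pump"): 1}
print(distance_cost(layout, edges, cost), is_feasible(layout, 10, 10))  # prints: 10 True
```

The centres here are (2, 2) and (6, 1), a rectilinear distance of 5, which doubles to the integer 10 used by (6.2).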

6.4 Similarity Model

We develop a probabilistic model that assigns a probability density to each potential layout of a given problem instance, based on examples of expert-designed layouts in different problem instances. We expect the number of observed example layouts to be on the order of dozens. Because this number is low, instead of considering the layout as a whole, we focus on pairwise geometric relationships between components. The production processes in the example cases may differ, but we assume a shared


classification of process components is available: for example, each component could be classified as either a large tank, a small tank, a pump, or “other component”. For best results, all components in the same class should be of approximately the same dimensions.

6.4.1 Probabilistic Layout Model

Each example layout is assumed to be drawn independently from a random distribution of “well-designed” process layouts. An example comprises both the process graph described in Sect. 6.3 and the positions and dimensions of the components. In addition, we assume that we can observe an orientation for each component, indicating which of the four cardinal directions the component is facing. Note that we assume no knowledge of any objective functions or constraints under which the example layouts could be considered optimal solutions.

In the following, $\mathcal{T}$ denotes the set of shared component types, and $\mathcal{O} = \{(0, 1), (1, 0), (0, -1), (-1, 0)\}$ is the set of orientation vectors. The random variable $L$ represents a process layout from the random distribution of well-designed layouts. We observe the following random variables, which are technically functions of $L$:

$N \in \mathbb{N}$: the number of nodes in the process graph
$V = \{1, \dots, N\}$: the index set of nodes in the process graph
$E \subseteq V \times V$: the edges in the process graph
$T_i \in \mathcal{T}$, $i \in V$: type of component $i$ in the shared classification
$C_i \in \mathbb{R}^2$, $i \in V$: position of the midpoint of component $i$
$O_i \in \mathcal{O}$, $i \in V$: orientation vector of component $i$

We omit the subscripts when convenient to refer to vectors or matrices formed by subscripted random variables. This is somewhat of an abuse of notation, considering that the ranges of the subscripts are not fixed. Commensurate with the distances (6.2) in the basic constraint model, we define the length of a vector $c \in \mathbb{R}^2$ as $d(c) = 2\|c\|_1$. The oriented angle between direction vectors $c, c' \in \mathbb{R}^2$ is denoted by $\alpha(c, c') \in (-Q, Q]$, where the constant $Q$ represents a half turn. For technical reasons we use a nonstandard angle measurement, described in Sect. 6.5; however, the difference from e.g. radians is immaterial for the similarity model.
To consider locality purely on the graph level, we define a distance measure in the undirected graph corresponding to the directed process graph $(V, E)$: let $\delta_{ij}(V, E)$, or simply $\delta_{ij}$ when clear from the context, be the number of edges on a shortest path between components $i$ and $j$ in the graph, ignoring edge direction. Further, we define the augmented edge set

$$\tilde{E} = \{(i, j) \in V \times V : 0 < \delta_{ij} \le \delta^{\max}\}, \qquad (6.4)$$


where the distance bound $\delta^{\max}$ is 3 in our tests. Other bounds are quite possible, considering the trade-off that $\tilde{E} = E$ requires the least computational effort, while $\tilde{E} = V \times V$ results in the most complete similarity model. The similarity model is built on the following auxiliary random variables:

$\tilde{E}$: the edge set $E$ augmented as above (6.4)
$H \in \tilde{E}$: uniform random edge from the augmented random graph $(V, \tilde{E})$
$I \in V$: the source node of edge $H$
$J \in V$: the target node of edge $H$
$A_{IJ} = \alpha(O_I, C_J - C_I)$: angle between the orientation of component $I$ and the direction to component $J$
$D_{IJ} = d(C_J - C_I)$: distance between the midpoints of components $I$ and $J$
$\Delta_{IJ} = \delta_{IJ}(V, E) \in \mathbb{N}$: number of edges between nodes $I$ and $J$

Given a process graph $(V, E)$ and the component types $T$, we model the conditional density of the layout distribution as

$$f_{C,O \mid V,E,T}(C, O \mid V, E, T) = \prod_{(i,j)\in\tilde{E}} f_{A_{IJ}\mid T_I,T_J,\Delta_{IJ}}\big(\alpha(O_i, C_j - C_i) \mid T_i, T_j, \delta_{ij}\big)\; f_{D_{IJ}\mid T_I,T_J,\Delta_{IJ}}\big(d(C_j - C_i) \mid T_i, T_j, \delta_{ij}\big), \qquad (6.5)$$

where we assume that the pairwise geometrical relationships between components are not only independent but adequately modelled by the two conditional density functions $f_{A_{IJ}\mid T_I,T_J,\Delta_{IJ}}$ and $f_{D_{IJ}\mid T_I,T_J,\Delta_{IJ}}$, which describe the angles and distances between random pairs of components. The model is independent of the overall orientation of the example layout, and the shape of the floor plan may also vary. By using a simplified model, we avoid the need for large amounts of example data for parameter estimation. Moreover, the simplified density can be relatively easily incorporated in the objective function of a constraint programming model of the layout problem, as described below.
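As an illustration of the augmented edge set (6.4), the sketch below computes $\delta_{ij}$ by breadth-first search on the undirected version of the process graph and keeps all ordered pairs within $\delta^{\max}$; it also gives the length function $d(c) = 2\|c\|_1$. The function names are hypothetical, and BFS is simply one standard way to obtain unweighted shortest-path lengths, not necessarily what the authors used.

```python
from collections import deque


def augmented_edges(nodes, edges, delta_max=3):
    """Return {(i, j): delta_ij} for all ordered pairs with 0 < delta_ij <= delta_max, cf. (6.4).

    delta_ij is the shortest-path length in the undirected version of (V, E),
    computed here by breadth-first search from every node.
    """
    adjacency = {v: set() for v in nodes}
    for i, j in edges:  # ignore edge direction
        adjacency[i].add(j)
        adjacency[j].add(i)

    augmented = {}
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == delta_max:
                continue  # no need to explore beyond the bound
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if d > 0:
                augmented[(source, target)] = d
    return augmented


def length(c):
    """d(c) = 2 * ||c||_1, commensurate with the doubled distances of (6.2)."""
    return 2 * (abs(c[0]) + abs(c[1]))


# A path graph 1 -> 2 -> 3 -> 4 -> 5 (hypothetical example).
E_tilde = augmented_edges([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 5)])
print(E_tilde[(1, 4)], (1, 5) in E_tilde, len(E_tilde))  # prints: 3 False 18
```

Ordered pairs are kept because the angle $A_{IJ}$ is measured from the orientation of the source component, so $(i, j)$ and $(j, i)$ carry different information.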

6.4.2 Estimation

To evaluate the conditional density (6.5) of “well-designed” layouts, we estimate the conditional density functions $f_{A_{IJ}\mid T_I,T_J,\Delta_{IJ}}$ and $f_{D_{IJ}\mid T_I,T_J,\Delta_{IJ}}$ from example layout data by kernel density estimation. Specifically, we apply kernel density estimation of mixed categorical and continuous data as described by Racine and Li [21] and Ju et al. [12]. Both conditional density functions are estimated in the same way, so we describe the method for $f_{A_{IJ}\mid T_I,T_J,\Delta_{IJ}}$ only. We construct a sample of the random variables $(A_{IJ}, D_{IJ}, T_I, T_J, \Delta_{IJ})$ by observing them on all (augmented) edges of the available example layouts. The sample is assumed to be a uniform random sample, following the simplifying assumption we made for


the layout density model (6.5) that the angles $A_{IJ}$ and distances $D_{IJ}$ are independent on different edges of the same graph. Let $\mathcal{L}$ be the set of observed example layouts, and let us indicate observations on layout $L \in \mathcal{L}$ by superscript $L$. For each edge $(i, j) \in \tilde{E}^L$ in each layout example $L \in \mathcal{L}$, we denote the corresponding multivariate sample point by $(A^L_{ij}, D^L_{ij}, T^L_i, T^L_j, \Delta^L_{ij})$. We get the total sample size $n = \sum_{L\in\mathcal{L}} |\tilde{E}^L|$. Note that we have both continuous observations of angle $A^L_{ij}$ and distance $D^L_{ij}$, and discrete observations of component types $T^L_i$, $T^L_j$ and path distance $\Delta^L_{ij}$. Following Racine and Li [21], we use an unnormalized joint density estimate of the form

$$\hat{f}_{A_{IJ},D_{IJ},T_I,T_J,\Delta_{IJ}}(a, d, t, t', \delta) = \frac{1}{n} \sum_{L\in\mathcal{L}} \sum_{(i,j)\in\tilde{E}^L} k\!\left(\frac{A^L_{ij} - a}{h_A}\right) k\!\left(\frac{D^L_{ij} - d}{h_D}\right) l(T^L_i, t, \lambda_T)\, l(T^L_j, t', \lambda_{T'})\, l(\Delta^L_{ij}, \delta, \lambda_\Delta), \qquad (6.6)$$

where $k(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the Gaussian function, and $l$ is the unnormalized Aitchison–Aitken kernel for unordered discrete variables, defined as

$$l(X, x, \lambda) = \begin{cases} 1 & \text{if } X = x, \\ \lambda & \text{if } X \neq x. \end{cases} \qquad (6.7)$$

The smoothing parameters $h_A$, $h_D$, $\lambda_T$, $\lambda_{T'}$ and $\lambda_\Delta$ are estimated by the least-squares cross-validation method. The unnormalized estimate of the conditional density is derived from estimates of joint densities as

$$\hat{f}_{A_{IJ}\mid T_I,T_J,\Delta_{IJ}}(a \mid t, t', \delta) = \frac{\hat{f}_{A_{IJ},T_I,T_J,\Delta_{IJ}}(a, t, t', \delta)}{\hat{f}_{T_I,T_J,\Delta_{IJ}}(t, t', \delta)}, \qquad (6.8)$$

where the unnormalized density in the numerator is derived by eliminating a $k$ factor from (6.6):

$$\hat{f}_{A_{IJ},T_I,T_J,\Delta_{IJ}}(a, t, t', \delta) = \frac{1}{n} \sum_{L\in\mathcal{L}} \sum_{(i,j)\in\tilde{E}^L} k\!\left(\frac{A^L_{ij} - a}{h_A}\right) l(T^L_i, t, \lambda_T)\, l(T^L_j, t', \lambda_{T'})\, l(\Delta^L_{ij}, \delta, \lambda_\Delta), \qquad (6.9)$$

and the denominator is derived analogously by eliminating both $k$ factors from (6.6).
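A minimal pure-Python sketch of the estimator (6.6)–(6.9), assuming the smoothing parameters are already given (the chapter estimates them by least-squares cross-validation with statsmodels, which this sketch does not reproduce). It follows the unnormalized form in the text, so the result is not scaled by $1/h_A$. The function names and sample values are hypothetical. As in (6.6), the Gaussian kernel treats the angle as an ordinary real number and does not wrap around at the half turn.

```python
import math


def gaussian(x):
    """k(x) = exp(-x^2 / 2) / sqrt(2 pi), the continuous kernel in (6.6)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def aitchison_aitken(observed, value, lam):
    """Unnormalized Aitchison-Aitken kernel l(X, x, lambda) of (6.7)."""
    return 1.0 if observed == value else lam


def conditional_angle_density(sample, a, t, t_prime, delta,
                              h_a, lam_t, lam_t_prime, lam_delta):
    """Estimate f^_{A_IJ | T_I, T_J, Delta_IJ}(a | t, t', delta) as the ratio (6.8).

    `sample` is a list of tuples (A_ij, D_ij, T_i, T_j, Delta_ij) collected over the
    augmented edges of all example layouts. The numerator follows (6.9); the
    denominator drops both continuous kernels from (6.6).
    """
    n = len(sample)
    numerator = denominator = 0.0
    for angle, _distance, type_i, type_j, path_len in sample:
        discrete = (aitchison_aitken(type_i, t, lam_t)
                    * aitchison_aitken(type_j, t_prime, lam_t_prime)
                    * aitchison_aitken(path_len, delta, lam_delta))
        numerator += gaussian((angle - a) / h_a) * discrete
        denominator += discrete
    return (numerator / n) / (denominator / n)


# Hypothetical sample: two "tank -> pump" edges and one "pump -> pump" edge.
sample = [(1.0, 8.0, "tank", "pump", 1),
          (2.0, 6.0, "tank", "pump", 1),
          (9.0, 20.0, "pump", "pump", 2)]
bandwidths = dict(h_a=1.0, lam_t=0.1, lam_t_prime=0.1, lam_delta=0.1)
near = conditional_angle_density(sample, 1.5, "tank", "pump", 1, **bandwidths)
far = conditional_angle_density(sample, 9.0, "tank", "pump", 1, **bandwidths)
print(near > far)  # prints: True
```

The discrete kernels let observations from other component types or path distances contribute with a weight of $\lambda$ per mismatch, which is what allows examples with different process graphs to inform the layout case.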


6.5 Similarity in Layout Optimization

We extend the layout optimization model of Sect. 6.3 to address layout similarity. The constraint model is applied to a specific non-random instance of the layout problem, where the process graph $(V, E)$ and the other parameters of Sect. 6.3 are known. In addition, each component $i \in V$ is assumed to have a type $t_i \in \mathcal{T}$ in the shared classification. Rotations of components were already represented as alternative patterns. We extend the notion so that each pattern choice $p_i \in P_i$ also determines the orientation of component $i$, denoting the orientation vector by $(x^O(p_i), y^O(p_i)) \in \mathcal{O}$. To be able to use integer-only constraint solvers, we use a nonstandard discrete angle measure in the range $\{-Q+1, \dots, Q\}$, closely related to the one described by Todd [24]. With a solver supporting floating point and trigonometric functions, we could use angles measured in radians instead. In our tests we set $Q = 11$. The following auxiliary decision variables are used:

$(x^R_{ij}, y^R_{ij}) \in \mathbb{Z}^2$: direction vector from component $i$ to $j$, relative to the orientation of component $i$
$a^R_{ij} \in \{-Q+1, \dots, Q\}$: angle of the direction vector $(x^R_{ij}, y^R_{ij})$, i.e. the direction from component $i$ to component $j$, relative to the orientation of component $i$

The auxiliary variables are linked to the variables of the basic model by

$$x^R_{ij} = x^O(p_i)\,(x_j - x_i) + y^O(p_i)\,(y_j - y_i) \quad \forall (i,j) \in \tilde{E} \qquad (6.10)$$

$$y^R_{ij} = -y^O(p_i)\,(x_j - x_i) + x^O(p_i)\,(y_j - y_i) \quad \forall (i,j) \in \tilde{E} \qquad (6.11)$$

$$a^R_{ij} = \begin{cases} \dfrac{Q+1}{2} - \operatorname{trunc}\!\left(\dfrac{Q+1}{2} \cdot \dfrac{x^R_{ij}}{1 + |x^R_{ij}| + |y^R_{ij}|}\right) & \text{if } y^R_{ij} \ge 0, \\[2ex] -\dfrac{Q-1}{2} + \operatorname{trunc}\!\left(\dfrac{Q+1}{2} \cdot \dfrac{x^R_{ij}}{1 + |x^R_{ij}| + |y^R_{ij}|}\right) & \text{if } y^R_{ij} < 0, \end{cases} \quad \forall (i,j) \in \tilde{E} \qquad (6.12)$$

where $\operatorname{trunc} : \mathbb{R} \to \mathbb{Z}$ maps a real number to its integer part, i.e. rounds towards zero. We note that the angle measurement function $\alpha$ in Sect. 6.4.1 can now be determined as the mapping that Eqs. (6.10)–(6.12) define from the orientation $(x^O(p_i), y^O(p_i))$ and the offset $(x_j - x_i, y_j - y_i)$ to the angle $a^R_{ij}$. In addition to the distance cost objective (6.1), we define two similarity objective functions

$$z_A = \sum_{(i,j)\in\tilde{E}} -\log \hat{f}_{A_{IJ}\mid T_I,T_J,\Delta_{IJ}}(a^R_{ij} \mid t_i, t_j, \delta_{ij}), \qquad (6.13)$$

$$z_D = \sum_{(i,j)\in\tilde{E}} -\log \hat{f}_{D_{IJ}\mid T_I,T_J,\Delta_{IJ}}(d_{ij} \mid t_i, t_j, \delta_{ij}), \qquad (6.14)$$

representing the negative log-densities of the angles and the distances in the solution, respectively. Both are minimization objectives. By the model (6.5), the sum $-(z_A + z_D)$ is


the unnormalized log-density of the solution in the distribution of “well-designed” layouts, conditioned on the parameters of the known problem instance. In a discrete constraint model, the objectives $z_A$ and $z_D$ are naturally implemented by tabulating the values of the log terms for feasible values of $a^R_{ij}$ and $d_{ij}$ on each edge $(i, j) \in \tilde{E}$. Storage requirements are quite low, as we only need two one-dimensional arrays for each edge.
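The integer angle measure (6.10)–(6.12) and the tabulated objectives can be checked outside the constraint model with a short Python sketch. It assumes $Q$ is odd, as with $Q = 11$ in the tests, so that $(Q+1)/2$ and $(Q-1)/2$ are integers. The per-edge tables of $-\log \hat{f}$ values are assumed to have been precomputed from the estimator of Sect. 6.4.2, and all names are hypothetical.

```python
Q = 11  # half turn, as in the tests; assumed odd so (Q + 1) / 2 and (Q - 1) / 2 are integers


def trunc_div(num, den):
    """trunc(num / den) for integers with den > 0, rounding towards zero."""
    q = abs(num) // den
    return q if num >= 0 else -q


def relative_direction(orientation, offset):
    """(6.10)-(6.11): express offset (x_j - x_i, y_j - y_i) in the frame of (x^O, y^O)."""
    ox, oy = orientation
    dx, dy = offset
    return ox * dx + oy * dy, -oy * dx + ox * dy


def discrete_angle(xr, yr, q=Q):
    """(6.12): integer angle in {-q + 1, ..., q}; y^R >= 0 maps to 1..q, y^R < 0 to -q+1..0."""
    half_up = (q + 1) // 2
    t = trunc_div(half_up * xr, 1 + abs(xr) + abs(yr))
    if yr >= 0:
        return half_up - t
    return -((q - 1) // 2) + t


def similarity_objective(angle_tables, distance_tables, angles, distances):
    """z_A + z_D from (6.13)-(6.14): look up the precomputed -log f values on each edge."""
    return sum(angle_tables[e][angles[e]] + distance_tables[e][distances[e]]
               for e in angle_tables)


# A component facing north (0, 1), with a neighbour straight ahead and one to its left.
print(discrete_angle(*relative_direction((0, 1), (0, 100))))   # prints: 1
print(discrete_angle(*relative_direction((0, 1), (-100, 0))))  # prints: 6
```

With $Q = 11$, directions with $y^R_{ij} \ge 0$ (straight ahead or to the left of component $i$) map to $1, \dots, 11$ and those with $y^R_{ij} < 0$ map to $-10, \dots, 0$, so "straight ahead" sits on the boundary between the bins 0 and 1, and a neighbour directly to the left gets roughly a quarter turn, $(Q+1)/2 = 6$.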

6.6 Experiments

We have implemented a prototype of the proposed similarity model and facility layout model in order to test the viability of the proposed decision support approach. We implemented the similarity model estimation in Python, using multivariate conditional kernel density estimation from the Python statsmodels library [23]. We implemented the facility layout model in the MiniZinc constraint modelling language [19], and use the local-search-based constraint solver Yuck [15]. We report here initial tests on artificial test data. There are components of three types: large cylinders, medium-size rectangles and small squares. In addition, small round nodes are used to guide the example layouts, but are not included in the process graph for the similarity model or the layout case. We use only four different process graphs, shown in Fig. 6.2, and vary the layouts of the graphs. The goal of all our experiments is to optimize the layout of graph (d) of Fig. 6.2, called the layout case. The similarity model is built from example layouts, in which the graph structure may differ from the layout case. We report results for two experiments:

U. Learning similarity measure from uniform example data: all layout examples as well as the layout case have the same graph structure (d), and only the layout varies. There are nine example layouts of graph (d).
V. Learning similarity measure from varied example data: the layout examples use three different graphs (a)–(c), all of which differ from the structure (d) of the layout case. There are three example layouts of each of the graphs (a)–(c), for a total of nine examples.

We generated the example layouts by running the basic layout optimization model of Sect. 6.3. We used unit cost on the bright yellow edges and zero cost on the dark green edges: in other words, our example layouts have short bright edges, and the length of dark edges is ignored.
We collected all solutions produced by a local search solver, and then picked a diverse subset of solutions within 10% of the lowest-cost solution found. For component orientations, we picked the major direction closest to the mean direction of the edges connected to each component. When optimizing the layout case after constructing the similarity model, we apply two objective functions: (1) the similarity objective $z_A + z_D$ measures similarity to the example layouts, and (2) the case objective is the distance cost objective $z_C$ with unit cost on the dark edges and zero cost on the bright edges. In other


[Figure 6.2: four panels (a)–(d), one per artificial process graph]

Fig. 6.2 Artificial process graphs used in the tests. Only the graph structure is of importance; the specific layouts are not. There are three types of components: large cylinders with a rectangular bounding box, medium-size rectangles and small squares. The small round nodes at edges are layout guides which can be ignored. Black triangles point in the direction of component orientation. Edge shades indicate which edges we apply different edge weights on. Some edges are curved solely for clarity of visualization

words, here the similarity objective implicitly minimizes the length of bright edges and the case objective explicitly minimizes the length of dark edges. We use a different explicit objective in the layout case and in the examples, so that machine learning is necessary to consider the example objective in the layout case. The intention is that by applying different weights on the similarity and distance cost objectives, we can trade off similarity to the examples and optimality with respect to the case objective.
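The orientation rule used when generating the example layouts can be sketched as follows. The text does not specify how the "mean direction" is formed; this sketch assumes it is the sum of unit vectors from the component towards each neighbour on an incident edge, regardless of edge direction, and picks the vector of $\mathcal{O}$ with the largest dot product (ties go to the first entry). The function name is hypothetical.

```python
import math

ORIENTATIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # the set O of Sect. 6.4.1


def pick_orientation(centre, neighbour_centres):
    """Major direction closest to the mean direction of the incident edges.

    Assumption: each edge contributes the unit vector from `centre` towards the
    neighbour's centre, regardless of the edge's direction in the process graph.
    """
    sx = sy = 0.0
    for nx, ny in neighbour_centres:
        dx, dy = nx - centre[0], ny - centre[1]
        norm = math.hypot(dx, dy)
        if norm > 0:
            sx += dx / norm
            sy += dy / norm
    # The largest dot product with the mean vector means the smallest angle to it.
    return max(ORIENTATIONS, key=lambda o: o[0] * sx + o[1] * sy)


print(pick_orientation((0, 0), [(5, 1), (4, -1)]))  # prints: (1, 0)
```

Summing unit vectors rather than raw offsets keeps a single distant neighbour from dominating the choice; whether the authors weighted edges this way is not stated.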


H. Rummukainen et al.

It is important to note that the edge colour is not a factor in the similarity model: the effect of the different distance cost objectives in the examples and in the layout case must be learned indirectly via the types of the edge endpoints. The optimization of the layout case was run with a range of weights on the similarity objective $z_A + z_D$ and the case objective $z_C$, i.e. dark edge length. The ratio of similarity weight to distance weight ranged from 1/64 to 64/1, with an additional run optimizing pure similarity with zero weight on the case objective. From each optimization run with specific objective weights, we collected three solutions within 10% of the best weighted objective value.
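The weighted optimization runs and the 10% filter described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The weight ratios are assumed to be powers of two between 1/64 and 64, which is consistent with the 8/1 and 32/1 weights shown in Fig. 6.3. Edge length is assumed to be Euclidean. All function names are hypothetical.

```python
import math


def edge_length_cost(positions, edges, weights):
    """Weighted sum of edge lengths; Euclidean distance is an assumption.

    With weight 1 on bright edges and 0 on dark edges this mimics the
    example objective; swapping the weights mimics the case objective z_C.
    """
    total = 0.0
    for (a, b), w in zip(edges, weights):
        (xa, ya), (xb, yb) = positions[a], positions[b]
        total += w * math.hypot(xa - xb, ya - yb)
    return total


def weight_ratios(max_exp=6):
    """Similarity-to-case weight ratios 1/64, 1/32, ..., 64 (powers of two assumed)."""
    return [2.0 ** k for k in range(-max_exp, max_exp + 1)]


def near_best(solutions, score, tolerance=0.10):
    """Solutions whose score is within `tolerance` of the lowest score.

    Assumes non-negative scores, so "within 10%" means <= 1.1 * best.
    """
    best = min(score(s) for s in solutions)
    return [s for s in solutions if score(s) <= best * (1.0 + tolerance)]
```

For each ratio r, a run would minimize r · (z_A + z_D) + z_C. Treating each candidate as a hypothetical (similarity, case) pair, `near_best(candidates, lambda s: r * s[0] + s[1])` keeps the solutions within 10% of the best weighted value.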

Fig. 6.3 Case layouts after learning in experiment V, minimizing (a) similarity to example layouts with short bright yellow edges, (b) the case objective, i.e. dark green edge length, (c) the sum of similarity and case objective with weights 32/1, and (d) the sum of similarity and case objective with weights 8/1. The thin translucent lines show the augmented edges of the similarity model


Fig. 6.4 Objective values of case layout results with different weights on the similarity and case objectives, also showing the objective values of the example layouts. The ratio of similarity weight to case objective weight is on the x-axis, with examples and results of pure similarity optimization at "infinity". The y-axis represents either (a) the explicit case objective, i.e. dark green edge length, or (b) the example layout objective, i.e. bright yellow edge length

Four specific case layouts for experiment V are shown in Fig. 6.3: these are the solutions with the best similarity value (a) and the best distance cost value (b), and solutions minimizing a combination of both objectives with different weights (c–d). An overview of the distance cost objective values reached with different similarity weights in experiments U and V is shown in Fig. 6.4.

6.7 Discussion

The similarity model reproduces geometrical relationships between components reasonably well, even when the examples are not identical in structure to the current case. In particular, Fig. 6.3a clearly shows that optimizing similarity has resulted in short bright edges at the expense of long dark edges, as desired.


In Fig. 6.4 we see that as the similarity weight increases, the case objective (dark edge length) tends to increase, and in exchange the example objective (bright edge length) decreases. Nevertheless, the case results (learned-U and learned-V in Fig. 6.4) do not fully reach the level of the examples (example-U and example-V), even when optimizing only the similarity objective without regard to the case objective. This indicates that our simplified similarity model does not fully reproduce the example layouts, since the model is limited to pairwise geometrical relationships between components. A more accurate model would likely require far more training data than is typically available in facility layout planning.

The right half of Fig. 6.4 shows, as one would expect, that increasing the weight of the similarity objective gives more consistent reductions in the example objective when the similarity model is derived from examples of the same process graph as the case (experiment U) than when it is derived from examples of different process graphs (experiment V). In general, there appears to be more variance in the results with larger weights on the similarity objective; this may be related to the fact that similarity to examples with long dark edges conflicts with the minimization of dark edge length in the case objective.

The presented similarity model is limited to geometrical features, and more abstract relationships that involve, e.g., numerical properties of the process components are not considered: for example, the minimum safe distance from a tank to a furnace might depend both on the tank volume and on the stored material. Another limitation of our model is that it has difficulty reproducing continuous physical spaces between components, e.g. walkways for machine operators. We believe this issue can be solved by alternative similarity models with a more global view of the layout, not just pairwise geometrical relationships between components.
It can also be useful to learn hard constraints in addition to the purely objective-function-based approach employed here. The general approach shows promise, and we aim to evaluate it further with real-world data and to verify its usefulness by soliciting feedback from expert facility designers. Although the presented model is limited to basic two-dimensional layouts, it would be relatively straightforward to extend the layout similarity model to three-dimensional layouts on multiple interconnected floors.

Acknowledgements This work was supported by Business Finland, through project Engineering Rulez.

References

1. A.-R. Ahmad, I.A. Tasadduq, M.H. Imam, K.B. Shaban, Automated discovery and utilization of tacit knowledge in facility layout planning and optimization. J. Softw. Syst. Develop. (2015). Article ID 369029
2. M.F. Anjos, M.V.C. Vieira, Mathematical optimization approaches for facility layout problems: the state-of-the-art and future research directions. Eur. J. Oper. Res. 261, 1–16 (2017)
3. G.C. Armour, E.S. Buffa, A heuristic algorithm and simulation approach to relative location of facilities. Manag. Sci. 9(2), 294–309 (1963)
4. A. Aswani, Z.-J. Shen, A. Siddiq, Inverse optimization with noisy data. Oper. Res. 66(3), 870–892 (2018)
5. C. Bessière, R. Coletta, E.C. Freuder, B. O’Sullivan, Leveraging the learning power of examples in automated constraint acquisition, in CP’04: 10th International Conference on Principles and Practice of Constraint Programming, ed. by M. Wallace. Lecture Notes in Computer Science (Springer, 2004), pp. 123–137
6. K.E. Cambron, G.W. Evans, Layout design using the analytic hierarchy process. Comput. Ind. Eng. 20(2), 211–229 (1991)
7. Y.-K. Chung, A neuro-based expert system for facility layout construction. J. Intell. Manuf. 10, 359–385 (1999)
8. L. García-Hernández, H. Pierreval, L. Salas-Morera, A. Arauzo-Azofra, Handling qualitative aspects in unequal area facility layout problem: an interactive genetic algorithm. Appl. Soft Comput. 13, 1718–1727 (2013)
9. L. García-Hernández, A. Arauzo-Azofra, L. Salas-Morera, H. Pierreval, E. Corchado, Facility layout design using a multi-objective interactive genetic algorithm to support the DM. Expert Syst. 32(1), 94–107 (2015)
10. J. Grobelny, R. Michalski, A novel version of simulated annealing based on linguistic patterns for solving facility layout problems. Knowl.-Based Syst. 124, 55–69 (2017)
11. H. Hosseini-Nasab, S. Freidouni, S.M.T. Fatemi Ghomi, M.B. Fakhrzad, Classification of facility layout problems: a review study. Int. J. Adv. Manuf. Technol. 94, 957–977 (2018)
12. G. Ju, R. Li, Z. Liang, Nonparametric estimation of multivariate CDF with categorical and continuous data. Adv. Econom. 25, 291–318 (2009)
13. S. Jung, Facility siting and plant layout optimization for chemical process safety. Korean J. Chem. Eng. 33(1), 1–7 (2016)
14. M. Lombardi, M. Milano, A. Bartolini, Empirical decision model learning. Artif. Intell. 244, 343–367 (2017)
15. M. Marte, Yuck, https://github.com/informarte/yuck (2018). Accessed 04 Oct 2018
16. P. Merrell, E. Schkufza, V. Koltun, Computer-generated residential building layouts. ACM Trans. Graph. 29(6) (2010), Article 181
17. F. Mizoguchi, H. Ohwada, Constrained relative least general generalization for inducing constraint logic programs. New Gen. Comput. 13, 335–368 (1995)
18. S. Moran, An Applied Guide to Process and Plant Design (Elsevier, Amsterdam, 2015)
19. N. Nethercote, P.J. Stuckey, R. Becket, S. Brand, G.J. Duck, G. Tack, MiniZinc: Towards a standard CP modelling language, in Proceedings of the 13th International Conference on Principles and Practice of Constraint Programming, ed. by C. Bessiere. Lecture Notes in Computer Science, vol. 4741 (Springer, 2007), pp. 529–543
20. B. O’Sullivan, Automated modelling and solving in constraint programming, in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), ed. by M. Fox, D. Poole (AAAI Press, 2010), pp. 1493–1497
21. J. Racine, Q. Li, Nonparametric estimation of regression functions with both categorical and continuous data. J. Econ. 119, 99–130 (2004)
22. A.J. Schaefer, Inverse integer programming. Opt. Lett. 3, 483–489 (2009)
23. S. Seabold, J. Perktold, Statsmodels: econometric and statistical modeling with Python, in 9th Python in Science Conference (2010)
24. J. Todd, Encoding 2 angles without trigonometry (2009), https://www.freesteel.co.uk/wpblog/2009/06/05/. Accessed 20 Aug 2018
25. B. Yu, D.Z. Pan, T. Matsunawa, X. Zeng, Machine learning and pattern matching in physical design, in The 20th Asia and South Pacific Design Automation Conference, pp. 286–293 (2015)

Chapter 7

Single-Objective Real-Parameter Optimization: Enhanced LSHADE-SPACMA Algorithm

Anas A. Hadi, Ali W. Mohamed, and Kamal M. Jambi

Abstract Real-parameter optimization has been one of the most active research fields during the last decade. LSHADE-SPACMA performed competitively in the IEEE CEC’2017 competition on single-objective bound-constrained real-parameter optimization, ranking fourth among the twelve papers presented and compared on this new benchmark. In this work, an improved version named ELSHADE-SPACMA is introduced. In LSHADE-SPACMA, the value of p, which controls the greediness of the mutation strategy, is constant, whereas in ELSHADE-SPACMA it is dynamic: larger values of p enhance exploration, while smaller values enhance exploitation. We further enhance the performance of ELSHADE-SPACMA by integrating another directed mutation strategy within the hybridization framework. The proposed algorithm has been evaluated on the IEEE CEC’2017 benchmark. According to the comparison results, ELSHADE-SPACMA performs better than LSHADE and LSHADE-SPACMA. Moreover, the comparison between ELSHADE-SPACMA and the best three algorithms from the IEEE CEC’2017 competition indicates that ELSHADE-SPACMA shows better overall performance and is highly competitive for solving global optimization problems.

A. A. Hadi · K. M. Jambi
Faculty of Computing and Information Technology, King Abdulaziz University, P. O. Box 80200, Jeddah 21589, Saudi Arabia
e-mail: [email protected]

K. M. Jambi
e-mail: [email protected]

A. W. Mohamed (B)
Operations Research Department, Faculty of Graduate Studies for Statistical Research, Giza 12613, Egypt
Wireless Intelligent Networks Center (WINC), School of Engineering and Applied Sciences, Nile University, Giza 12588, Egypt
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_7


A. A. Hadi et al.

7.1 Enhanced LSHADE with Semi-parameter Adaptation Hybrid with CMA-ES (ELSHADE-SPACMA)

In this section, we describe the details of ELSHADE-SPACMA, a new improvement of LSHADE-SPACMA [7]. A brief background on each component of ELSHADE-SPACMA is given first, and then the new improved version is discussed.

7.1.1 LSHADE Algorithm

The LSHADE algorithm was proposed by Tanabe and Fukunaga [9]. To establish a starting point for the optimization process, an initial population $P^0$ must be created. Typically, the jth component of the ith individual in $P^0$ is obtained as follows:

$$x_{j,i}^{0} = x_{j,L} + \mathrm{rand}(0,1)\,\bigl(x_{j,U} - x_{j,L}\bigr) \qquad (7.1)$$

where $x_{j,L}$ and $x_{j,U}$ are the lower and upper bounds of the jth decision variable, and rand(0,1) returns a uniformly distributed random number in [0, 1]. At generation G, for each target vector $x_i^G$, a mutant vector $v_i^G$ is generated according to the current-to-pbest/1 mutation strategy, which was proposed by Zhang and Sanderson in the framework of JADE [10]:

$$v_i^G = x_i^G + F_i^G\,\bigl(x_{pbest}^G - x_i^G\bigr) + F_i^G\,\bigl(x_{r_1}^G - x_{r_2}^G\bigr) \qquad (7.2)$$

The value p is a control parameter for the greediness of the mutation strategy and is used to balance exploitation and exploration. $r_1$ is a random index selected from the population, and $r_2$ is another random index selected from the union of the population and an external archive. This external archive holds parent vectors that were replaced by better trial vectors. $x_{pbest}^G$ is an individual chosen randomly from the best 100p% individuals of the population at generation G. The scale factor $F_i^G$ is a positive control parameter for scaling the difference vectors. In the crossover, the target vector is mixed with the mutant vector, using the following scheme, to yield the trial vector $u_i^G$:

$$u_{i,j}^G = \begin{cases} v_{i,j}^G, & \text{if } rand_{i,j} \le Cr \ \text{or} \ j = j_{rand} \\ x_{i,j}^G, & \text{otherwise} \end{cases} \qquad (7.3)$$

where $rand_{i,j}$ is a uniformly distributed random number in [0, 1], $i \in \{1, \dots, N\}$, $j \in \{1, \dots, D\}$, $Cr \in [0, 1]$ is the crossover rate that controls how many components are inherited from the mutant vector, and $j_{rand}$ is a uniformly distributed random integer in [1, D] that ensures that at least one component of the trial vector is inherited from the mutant vector. DE adopts a greedy selection strategy. If and only if the trial vector $u_i^G$ yields a fitness value as good as or better than that of $x_i^G$, then $u_i^G$ becomes $x_i^{G+1}$. Otherwise, the old vector $x_i^G$ is retained. The selection scheme is as follows (for a minimization problem):

$$x_i^{G+1} = \begin{cases} u_i^G, & \text{if } f(u_i^G) \le f(x_i^G) \\ x_i^G, & \text{otherwise} \end{cases} \qquad (7.4)$$

7 Single-Objective Real-Parameter Optimization: Enhanced …

To improve the performance of LSHADE-SPA, Linear Population Size Reduction (LPSR) was used. In LPSR, the population size is decreased according to a linear function. The linear function in LSHADE-SPA was:

$$N_{G+1} = \mathrm{round}\left[\left(\frac{N^{min} - N^{init}}{\mathrm{MAX\_NFE}}\right) \cdot \mathrm{NFE} + N^{init}\right] \qquad (7.5)$$

where NFE is the current number of fitness evaluations, MAX_NFE is the maximum number of fitness evaluations, $N^{init}$ is the initial population size, and $N^{min} = 4$ is the minimum number of individuals that DE can work with.
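The DE building blocks of Eqs. (7.1)–(7.5) can be sketched in plain Python as follows, for a minimization problem. This is a minimal illustration, not the authors' MATLAB code. The function names are hypothetical, and parameter adaptation and archive maintenance are omitted. The sketch also assumes the usual JADE/LSHADE convention that $r_1$ and $r_2$ differ from each other and from i.

```python
import random


def init_population(n, lower, upper, rng: random.Random):
    """Eq. (7.1): x_{j,i}^0 = x_{j,L} + rand(0,1) * (x_{j,U} - x_{j,L})."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(n)]


def current_to_pbest_1(pop, fitness, archive, i, F, p, rng: random.Random):
    """Eq. (7.2) for target index i; smaller fitness is better."""
    n = len(pop)
    n_top = max(1, round(p * n))  # size of the top-100p% group
    ranked = sorted(range(n), key=lambda k: fitness[k])
    pbest = pop[rng.choice(ranked[:n_top])]
    r1 = rng.choice([k for k in range(n) if k != i])
    union = pop + archive  # population joined with the external archive
    r2 = rng.choice([k for k in range(len(union)) if k not in (i, r1)])
    x, a, b = pop[i], pop[r1], union[r2]
    return [xj + F * (pj - xj) + F * (aj - bj)
            for xj, pj, aj, bj in zip(x, pbest, a, b)]


def binomial_crossover(target, mutant, cr, rng: random.Random):
    """Eq. (7.3): take the mutant component if rand <= Cr or j == j_rand."""
    d = len(target)
    j_rand = rng.randrange(d)
    return [mutant[j] if (rng.random() <= cr or j == j_rand) else target[j]
            for j in range(d)]


def select(target, trial, f):
    """Eq. (7.4): keep the trial vector if it is at least as good."""
    return trial if f(trial) <= f(target) else target


def lpsr_size(nfe, max_nfe, n_init, n_min=4):
    """Eq. (7.5): population size after nfe of max_nfe evaluations."""
    return round((n_min - n_init) / max_nfe * nfe + n_init)
```

In a full generation, each target would pass through mutation, crossover and selection in turn. After the generation, the population would be truncated to `lpsr_size(...)` individuals by removing the worst ones.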

7.1.2 CMA-ES Algorithm

Among the many variants of Evolution Strategies, CMA-ES has been able to efficiently solve diverse types of optimization problems [5]. In CMA-ES, the search space is modeled using a multivariate normal distribution. New individuals are generated from a Gaussian distribution that takes into account the path the population has taken over the generations. CMA-ES automatically adapts the mean vector m, the covariance matrix C, and the step size σ. The steps of CMA-ES are as follows:

1. Create an initial population and evaluate the fitness function.
2. Generate new individuals using a Gaussian distribution:
$$x_i \sim \mathcal{N}(m, \sigma^2 C), \quad \forall i = 1, \dots, n \qquad (7.6)$$
3. Update the mean vector m using the best μ individuals:
$$m = \sum_{i=1}^{\mu} w_i x_i, \quad \text{where } \sum_{i=1}^{\mu} w_i = 1 \text{ and } w_1 \ge w_2 \ge \dots \ge w_\mu$$
4. Update the step size σ and the covariance matrix C.
5. Repeat steps 2 to 4 until a stopping criterion is met.

To improve the exploration capability of LSHADE-SPACMA, a crossover operation according to Eq. 7.3 was applied after the CMA-ES offspring generation step.
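Steps 2 and 3 can be sketched as follows. The sketch samples using a Cholesky factor of C and then performs weighted recombination of the μ best points. The log-rank weights are a common default in the CMA-ES literature and are an assumption here. The step-size and covariance updates of step 4, which rely on evolution paths, are omitted. The subsequent crossover would reuse Eq. (7.3). All names are hypothetical.

```python
import math
import random


def cholesky(c):
    """Lower-triangular L with L L^T = C (C symmetric positive definite)."""
    n = len(c)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(c[i][i] - s)
            else:
                L[i][j] = (c[i][j] - s) / L[j][j]
    return L


def sample(m, sigma, c, lam, rng: random.Random):
    """Eq. (7.6): draw lam points x = m + sigma * L z with z ~ N(0, I)."""
    L = cholesky(c)
    n = len(m)
    out = []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        out.append([m[i] + sigma * sum(L[i][k] * z[k] for k in range(i + 1))
                    for i in range(n)])
    return out


def recombination_weights(mu):
    """Positive, non-increasing weights that sum to 1 (log-rank weights)."""
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def update_mean(points, f, mu):
    """Step 3: weighted mean of the mu best points (minimization)."""
    best = sorted(points, key=f)[:mu]
    w = recombination_weights(mu)
    n = len(points[0])
    return [sum(wi * x[j] for wi, x in zip(w, best)) for j in range(n)]
```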


7.1.3 Semi-parameter Adaptation of Scaling Factor (F) and Crossover Rate (Cr)

Parameter settings have a significant impact on the performance of DE. Practice in the field of parameter adaptation demonstrates the relationship between the problem itself and the parameter values [4]: each problem has its own appropriate parameter values. Semi-Parameter Adaptation (SPA) of F and Cr is composed of two parts. The first part is activated during the first half of the search, while the second part is activated during the second half.

7.1.3.1 First Part of SPA

The idea is to apply a policy of changing one parameter at a time. Thus, during the first part of SPA, the adaptation concentrates on the parameter Cr using LSHADE adaptation, while F is generated randomly from a uniform distribution within a specific range. The first part of SPA is active while nfes < maxnfes/2, where nfes is the current number of function evaluations and maxnfes is the maximum number of function evaluations. During SPA, each individual has its own $F_i$ and $Cr_i$ values. $F_i$ is generated from a uniform distribution within the range (0.45, 0.55):

$$F_i = 0.45 + 0.1 \cdot rand \qquad (7.7)$$

On the other hand, $Cr_i$ is adapted according to the following equation:

$$Cr_i = randn(M_{Cr,i}, 0.1) \qquad (7.8)$$

where randn(μ, 0.1) denotes a normally distributed random number with mean μ and standard deviation 0.1, and $M_{Cr,i}$ is a randomly selected slot of a memory that stores the successful means of previous generations. The memory index i is selected randomly from the range [1, h], where h is the memory size. Initially, all $M_{Cr}$ values are set to 0.5, and at the end of each generation, one memory slot $M_{Cr}$ is updated using the arithmetic mean of the $Cr_i$ values that successfully generated new individuals.
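The following is a minimal sketch of the first SPA part, covering Eqs. (7.7) and (7.8) and the arithmetic-mean memory update. Two details follow common LSHADE practice and are assumptions here: $Cr_i$ is clipped to [0, 1], and the memory slot advances in round-robin order after each generation that has successful values. The function names are hypothetical.

```python
import random


def sample_f_first_part(rng: random.Random):
    """Eq. (7.7): F_i uniform between 0.45 and 0.55."""
    return 0.45 + 0.1 * rng.random()


def sample_cr(memory_cr, rng: random.Random, sd=0.1):
    """Eq. (7.8): Cr_i ~ N(M_Cr[r], 0.1) with slot r drawn uniformly.

    Clipping to [0, 1] is an assumption borrowed from common LSHADE practice.
    """
    m = rng.choice(memory_cr)
    return min(1.0, max(0.0, rng.gauss(m, sd)))


def update_cr_memory(memory_cr, k, successful_cr):
    """Overwrite slot k with the arithmetic mean of the successful Cr values
    and return the next slot index (round-robin order is an assumption)."""
    if successful_cr:
        memory_cr[k] = sum(successful_cr) / len(successful_cr)
        k = (k + 1) % len(memory_cr)
    return k
```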

7.1.3.2 Second Part of SPA

During the second part, LSHADE adaptation is used to adapt the F parameter, so the adaptation concentrates on F. Each individual has its own $F_i$ value, generated from a Cauchy distribution:

$$F_i = randc(M_{F,i}, \sigma) \qquad (7.9)$$

where σ is the scale parameter of the Cauchy distribution, set to 0.1, and $M_{F,i}$ is a randomly selected slot of a memory that stores the successful means of previous generations. At the end of each generation, one memory slot $M_F$ is updated using the Lehmer mean of the $F_i$ values that successfully generated new individuals. The $F_i$ values of the last 5 generations of the first part of SPA are used to initialize the memory slots $M_F$ for the second part of SPA. The Cr adaptation process remains unchanged during this second part. Due to the nature of LSHADE parameter adaptation, Cr is gradually frozen to the adapted values: according to LSHADE parameter adaptation, when none of the Cr values in a generation produce successful individuals, the corresponding memory slot is set to a terminal value. Thus, $M_{Cr}$ remains frozen until the end of the search.
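The second SPA part, consisting of Eq. (7.9) and the Lehmer-mean memory update, can be sketched as follows. Non-positive draws are resampled and values above 1 are truncated to 1. This follows the SHADE/LSHADE convention and is an assumption here. The optional weights in the Lehmer mean are also an assumption; LSHADE weights them by fitness improvement. The function names are hypothetical.

```python
import math
import random


def sample_f_cauchy(memory_f, rng: random.Random, scale=0.1):
    """Eq. (7.9): F_i ~ Cauchy(M_F[r], 0.1) with slot r drawn uniformly.

    Non-positive draws are resampled and values above 1 are truncated to 1.
    """
    loc = rng.choice(memory_f)
    while True:
        # Inverse-CDF sampling of a Cauchy variate.
        f = loc + scale * math.tan(math.pi * (rng.random() - 0.5))
        if f > 0.0:
            return min(f, 1.0)


def lehmer_mean(values, weights=None):
    """(Weighted) Lehmer mean sum(w * F^2) / sum(w * F); equal weights by default."""
    if weights is None:
        weights = [1.0] * len(values)
    num = sum(w * v * v for w, v in zip(weights, values))
    den = sum(w * v for w, v in zip(weights, values))
    return num / den
```

Compared with the arithmetic mean used for Cr, the Lehmer mean is biased towards larger successful F values.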

7.1.4 LSHADE-SPACMA Algorithm

The LSHADE-SPACMA framework starts with a mutual population P. Each individual x in P generates an offspring individual u using either LSHADE or CMA-ES. This assignment is made according to a class probability variable (FCP), whose values are randomly selected from the memory slots $M_{FCP}$. At the end of each generation, one memory slot $M_{FCP}$ is updated according to the performance of each algorithm, so that a larger share of the population is gradually assigned to the better-performing algorithm. The update uses only individuals that successfully generated new individuals. The memory slot $M_{FCP}$ is updated according to:

$$M_{FCP,g+1} = (1 - c)\, M_{FCP,g} + c\, \Delta_{Alg1} \qquad (7.10)$$

where c is the learning rate and $\Delta_{Alg1}$ is the improvement rate of each algorithm:

$$\Delta_{Alg1} = \min\left(0.8, \max\left(0.2, \frac{\omega_{Alg1}}{\omega_{Alg1} + \omega_{Alg2}}\right)\right) \qquad (7.11)$$

where 0.2 and 0.8 are the minimum and maximum probabilities assigned to each algorithm. Thus, to keep both algorithms running simultaneously, FCP values are always kept in the range [0.2, 0.8]. $\omega_{Alg1}$ is the sum of the differences between the old and new fitness values of the individuals belonging to algorithm Alg1 (Fig. 7.1):

$$\omega_{Alg1} = \sum_{i=1}^{n} \bigl(f(x_i) - f(u_i)\bigr) \qquad (7.12)$$

where f is the fitness function, x is the old individual, u is the offspring individual, and n is the number of individuals belonging to algorithm Alg1 (Fig. 7.1).
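Eqs. (7.10)–(7.12) translate directly into the following sketch. The text does not specify what happens when neither algorithm improves any individual. Returning `None`, which leaves the memory unchanged, is an assumption here, as are the function names.

```python
def omega(pairs):
    """Eq. (7.12): sum of f(x) - f(u) over the successful (f(x), f(u)) pairs
    of one algorithm; for successful offspring each term is >= 0."""
    return sum(fx - fu for fx, fu in pairs)


def improvement_rate(omega_1, omega_2, lo=0.2, hi=0.8):
    """Eq. (7.11): share of the total improvement credited to algorithm 1,
    clamped to [lo, hi] so that both algorithms keep receiving individuals."""
    total = omega_1 + omega_2
    if total <= 0.0:
        return None  # assumption: no memory update when nothing improved
    return min(hi, max(lo, omega_1 / total))


def update_fcp_memory(m_fcp, delta, c=0.8):
    """Eq. (7.10): exponential smoothing of one FCP memory slot."""
    return (1.0 - c) * m_fcp + c * delta
```

With c = 0.8, as in Sect. 7.2.2, the memory reacts quickly. For example, a slot at 0.5 moves to about 0.7 after a generation in which algorithm 1 contributed 75% of the improvement.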


Fig. 7.1 Pseudo-code of LSHADE-SPACMA algorithm

7.1.5 AGDE Mutation Strategy

The AGDE mutation strategy was proposed in [8]. To exploit the information carried by both good and bad vectors in the DE population, AGDE integrates information from the best and worst groups of the population. AGDE uses two vectors chosen randomly from the top and the bottom 100p% individuals of the current population of size NP, while the third vector is selected randomly from the middle [NP − 2(100p%)] individuals. This mutation scheme helps maintain an effective balance between the global exploration and local exploitation abilities of the DE search process. In each generation, the population is divided into three clusters (best, better and worst) of sizes 100p%, NP − 2(100p%) and 100p%, respectively. Three vectors are selected randomly, one from each partition, to generate the mutant vector according to the following equation:

$$v_i^{g+1} = x_r^g + F \cdot \bigl(x_{p\text{-}best}^g - x_{p\text{-}worst}^g\bigr) \qquad (7.13)$$

where $x_r^g$ is chosen randomly from the middle NP − 2(100p%) individuals, $x_{p\text{-}best}^g$ and $x_{p\text{-}worst}^g$ are chosen randomly from the top and bottom 100p% individuals, respectively, and F is the mutation factor, generated independently from a uniform distribution in [0.1, 1]. Figure 7.2 shows the AGDE pseudo-code.

Fig. 7.2 Pseudo-code of AGDE algorithm
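The following is a minimal sketch of the AGDE mutation of Eq. (7.13) for a minimization problem. Rounding the group size (100p% of NP) to an integer with a minimum of one individual is an assumption. The population must also be large enough for the middle group to be non-empty. The function name is hypothetical.

```python
import random


def agde_mutant(pop, fitness, p, rng: random.Random):
    """Eq. (7.13): v = x_r + F * (x_pbest - x_pworst), smaller fitness is better.

    The top and bottom groups hold round(p * NP) individuals each (at least
    one); the middle group holds the rest and must not be empty.
    """
    n = len(pop)
    k = max(1, round(p * n))
    order = sorted(range(n), key=lambda i: fitness[i])  # best first
    top, middle, bottom = order[:k], order[k:n - k], order[n - k:]
    x_best = pop[rng.choice(top)]
    x_worst = pop[rng.choice(bottom)]
    x_r = pop[rng.choice(middle)]
    F = rng.uniform(0.1, 1.0)  # drawn independently for each mutant
    return [r + F * (b - w) for r, b, w in zip(x_r, x_best, x_worst)]
```

Because the difference vector always points from a poor individual towards a good one, the mutant is pushed in a promising direction. The middle-group base vector keeps the scheme from being purely greedy.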

7.1.6 ELSHADE-SPACMA Hybridization Framework

Two improvements were made to enhance the performance of LSHADE-SPACMA. The first is a hybridization framework between LSHADE-SPACMA and AGDE, in which each algorithm is used 50% of the time: the framework assigns the whole population to LSHADE-SPACMA for one generation, and then the whole population to AGDE for the next generation. The ELSHADE-SPACMA framework is illustrated in Fig. 7.3. The second improvement balances the exploration and exploitation behavior of ELSHADE-SPACMA by adjusting the greediness parameter p of the mutation. The value of p starts large in order to enhance the exploration capability of ELSHADE-SPACMA, and is then reduced linearly in order to concentrate the search and enhance the exploitation capability of ELSHADE-SPACMA during the late stages of the search. The value of p is reduced according to:

$$p_{G+1} = \mathrm{round}\left[\left(\frac{p^{min} - p^{init}}{\mathrm{MAX\_NFE}}\right) \cdot \mathrm{NFE} + p^{init}\right] \qquad (7.14)$$

where NFE is the current number of fitness evaluations, MAX_NFE is the maximum number of fitness evaluations, $p^{init}$ is the initial value of p, and $p^{min}$ is the minimum value of p.

Fig. 7.3 Pseudo-code of ELSHADE-SPACMA algorithm
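The alternation between the two algorithms and the linear p schedule can be sketched as follows. Taken literally, applying round(·) to a value between 0.15 and 0.3 gives 0. This sketch therefore assumes that the linear schedule yields a real-valued p, and that rounding is applied only when converting 100p% of NP into a number of individuals. The function names are hypothetical.

```python
def p_schedule(nfe, max_nfe, p_init=0.3, p_min=0.15):
    """Linear decrease of p from p_init to p_min (Eq. 7.14 without the outer round)."""
    return (p_min - p_init) / max_nfe * nfe + p_init


def top_count(p, np_size):
    """Number of individuals in the top-100p% group; rounding here is an assumption."""
    return max(1, round(p * np_size))


def algorithm_for_generation(g):
    """Whole population to LSHADE-SPACMA in one generation, to AGDE in the next."""
    return "LSHADE-SPACMA" if g % 2 == 0 else "AGDE"
```

For example, with NP = 180 at the start of a 10-dimensional run, p = 0.3 corresponds to a top group of 54 individuals. This group shrinks as both p and the population size decrease.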

7.2 Experimental Study

7.2.1 Numerical Benchmarks

The performance of the proposed ELSHADE-SPACMA algorithm is evaluated on the set of problems presented in the CEC2017 competition on real-parameter single-objective optimization. This benchmark contains 30 test functions with diverse characteristics. The functions are tested with dimensionality D = 10, 30, 50 and 100. In summary, functions 1–3 are unimodal, functions 4–10 are multimodal, functions 11–20 are hybrid functions and functions 21–30 are composition functions. More details can be found in [6]. Note that f2 has been excluded because it shows unstable behavior, especially in higher dimensions.

7.2.2 Parameter Settings and Involved Algorithms

The parameters of ELSHADE-SPACMA are set as follows. The initial population size (NP) was set to 18·D, and the memory size (H) and archive rate (Arc-rate) were set to 5 and 1.4, respectively, as described in LSHADE. The class probability variable (FCP) was set to 0.5, and the learning rate (c) was set to 0.8. The threshold at which the second part of SPA is activated was set to maxnfes/2. The AGDE p value was set to 0.1. Finally, $p^{init}$ was set to 0.3 and $p^{min}$ was set to 0.15. ELSHADE-SPACMA is compared with the best three algorithms from the IEEE CEC’2017 competition, which were, in this order, EBOwithCMAR [3], jSO [2] and LSHADE-cnEpSin [1].
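The settings above can be collected in a small configuration object. This is a hypothetical sketch. The maximum budget of 10000·D evaluations comes from Sect. 7.3.1. Deriving the archive capacity as round(Arc-rate · NP) follows the LSHADE convention and is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElshadeSpacmaSettings:
    """Hypothetical container for the settings listed in Sect. 7.2.2."""
    dim: int                     # problem dimension D
    np_factor: int = 18          # NP_init = 18 * D
    memory_size: int = 5         # H
    arc_rate: float = 1.4        # archive rate
    fcp: float = 0.5             # class probability variable FCP
    learning_rate: float = 0.8   # c in Eq. (7.10)
    agde_p: float = 0.1          # p used by AGDE
    p_init: float = 0.3
    p_min: float = 0.15
    n_min: int = 4               # N_min in Eq. (7.5)

    @property
    def np_init(self) -> int:
        return self.np_factor * self.dim

    @property
    def max_nfe(self) -> int:
        return 10000 * self.dim  # budget from Sect. 7.3.1

    @property
    def spa_switch_nfe(self) -> int:
        return self.max_nfe // 2  # second SPA part starts at maxnfes / 2

    def archive_size(self, np_current: int) -> int:
        # LSHADE-style capacity round(Arc-rate * NP); an assumption here.
        return round(self.arc_rate * np_current)
```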

7.3 Experimental Results and Discussions

7.3.1 Results of ELSHADE-SPACMA Algorithm

To evaluate the performance of the algorithms, experiments were conducted on the test suite. We adopt the solution error measure f(x) − f(x*), where x is the best solution obtained by an algorithm in one run and x* is the known global optimum of each benchmark function. Error values and standard deviations smaller than $10^{-8}$ are taken as zero. The maximum number of function evaluations (FEs), which serves as the termination criterion, is set to 10000·D, and each algorithm is run 51 times independently on each function. The statistical results of ELSHADE-SPACMA on the benchmarks with 10, 30, 50 and 100 dimensions are summarized in Tables 7.2, 7.3, 7.4 and 7.5. They include the best, worst, median and mean values and the standard deviations of the error from the optimum over 51 runs for all 29 benchmark functions. The statistical results, statistical analysis and score metric comparing ELSHADE-SPACMA with the other algorithms are summarized in Tables 7.5, 7.6, 7.7, 7.8, 7.9, 7.10, 7.11 and 7.12. From these results and comparisons, the proposed ELSHADE-SPACMA algorithm shows good search quality, efficiency and robustness for solving unconstrained global optimization problems. It performs well on separable, non-separable, unimodal and multimodal functions, and under shifts, rotation, increased dimensionality, multiplicative noise in the fitness and composition of functions; its performance is not noticeably degraded by these difficulties. It maintains a good balance between local optimization speed and global optimization diversity in challenging optimization environments, with consistent performance. Moreover, its performance is superior to, or competitive with, that of the best three algorithms from the IEEE CEC’2017 competition. Finally, it can be concluded that the new directed mutation scheme of AGDE and the dynamic p value help maintain an effective balance between the global exploration and local exploitation abilities of the LSHADE-SPACMA search process, which significantly enhances its performance.

Table 7.1 Score metric comparison between LSHADE-cnEpSin, jSO, EBOwithCMAR and ELSHADE-SPACMA algorithms

Algorithms       Score1  Score2  Score   Rank
ELSHADE-SPACMA   50      50      100     1
EBOwithCMAR      48.93   49.72   98.64   2
jSO              48.62   46.63   95.24   3
LSHADE-cnEpSin   45.81   47.89   93.69   4

Table 7.2 Results of the 10D benchmark functions over 51 independent runs

Func.  Best      Worst     Median    Mean      Std.
f1     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f3     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f4     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f5     9.95E-01  1.09E+01  3.00E+00  3.87E+00  2.02E+00
f6     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f7     1.11E+01  1.97E+01  1.25E+01  1.33E+01  1.75E+00
f8     9.95E-01  1.29E+01  3.98E+00  4.10E+00  2.51E+00
f9     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f10    1.14E+00  2.27E+02  6.53E+00  2.27E+01  4.84E+01
f11    0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f12    0.00E+00  1.31E+02  4.16E-01  2.85E+01  5.14E+01
f13    0.00E+00  5.39E+00  4.84E+00  3.57E+00  2.21E+00
f14    0.00E+00  9.95E-01  0.00E+00  7.80E-02  2.70E-01
f15    2.63E-07  5.00E-01  1.44E-01  2.51E-01  2.17E-01
f16    2.04E-02  1.14E+00  5.19E-01  5.62E-01  2.55E-01
f17    1.42E-02  4.35E-01  6.05E-02  1.39E-01  1.44E-01
f18    2.29E-04  2.00E+01  3.27E-01  7.10E-01  2.80E+00
f19    0.00E+00  3.92E-02  1.94E-02  1.55E-02  1.14E-02
f20    0.00E+00  3.12E-01  0.00E+00  1.41E-01  1.57E-01
f21    1.00E+02  2.06E+02  1.00E+02  1.02E+02  1.48E+01
f22    1.00E+02  1.00E+02  1.00E+02  1.00E+02  1.21E-01
f23    3.00E+02  3.10E+02  3.05E+02  3.04E+02  2.30E+00
f24    0.00E+00  3.40E+02  3.34E+02  2.91E+02  9.54E+01
f25    3.98E+02  4.46E+02  3.98E+02  4.13E+02  2.18E+01
f26    3.00E+02  3.00E+02  3.00E+02  3.00E+02  0.00E+00
f27    3.89E+02  3.90E+02  3.90E+02  3.89E+02  1.67E-01
f28    0.00E+00  6.12E+02  3.00E+02  3.25E+02  1.04E+02
f29    2.27E+02  2.36E+02  2.30E+02  2.30E+02  2.26E+00
f30    3.95E+02  4.43E+02  3.95E+02  4.02E+02  1.77E+01
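The per-function statistics in Tables 7.2–7.5 can be computed from the final objective values of the 51 runs with a short script such as the following. It is a sketch, not the authors' evaluation code. The text does not state whether the reported standard deviation is the sample (N − 1) or the population version; the sample version is assumed here.

```python
import statistics


def error_stats(final_values, f_opt, tol=1e-8):
    """Best/worst/median/mean/std of the error f(x) - f(x*) over independent runs."""
    errors = [v - f_opt for v in final_values]
    errors = [0.0 if e < tol else e for e in errors]  # errors below 1e-8 count as zero
    std = statistics.stdev(errors)  # sample standard deviation (N - 1); an assumption
    return {
        "best": min(errors),
        "worst": max(errors),
        "median": statistics.median(errors),
        "mean": statistics.fmean(errors),
        "std": 0.0 if std < tol else std,
    }
```

As a usage example, passing the 51 final values of one function together with its known optimum produces one row of these tables.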

Table 7.3 Results of the 30D benchmark functions over 51 independent runs

Func.  Best      Worst     Median    Mean      Std.
f1     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f3     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f4     5.86E+01  5.86E+01  5.86E+01  5.86E+01  0.00E+00
f5     4.97E+00  3.78E+01  1.79E+01  1.86E+01  8.04E+00
f6     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f7     3.31E+01  4.79E+01  3.81E+01  3.89E+01  3.43E+00
f8     6.12E+00  3.58E+01  1.39E+01  1.61E+01  7.46E+00
f9     0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f10    1.07E+03  2.81E+03  1.65E+03  1.70E+03  4.06E+02
f11    0.00E+00  6.40E+01  3.99E+00  7.80E+00  1.42E+01
f12    3.93E+00  5.53E+02  2.48E+02  2.47E+02  1.28E+02
f13    2.98E+00  2.16E+01  1.70E+01  1.57E+01  5.03E+00
f14    2.13E+00  3.10E+01  2.50E+01  2.37E+01  5.25E+00
f15    2.99E-01  4.56E+00  1.46E+00  1.86E+00  1.29E+00
f16    3.57E+00  2.89E+02  2.56E+01  6.68E+01  8.35E+01
f17    1.05E+01  4.13E+01  2.92E+01  2.97E+01  6.76E+00
f18    2.94E-01  2.28E+01  2.14E+01  2.09E+01  3.03E+00
f19    2.67E+00  8.18E+00  4.62E+00  4.61E+00  1.35E+00
f20    1.67E+01  4.25E+01  2.69E+01  2.73E+01  4.56E+00
f21    2.10E+02  2.36E+02  2.22E+02  2.22E+02  6.64E+00
f22    1.00E+02  1.00E+02  1.00E+02  1.00E+02  0.00E+00
f23    3.49E+02  4.00E+02  3.69E+02  3.69E+02  1.05E+01
f24    4.21E+02  4.57E+02  4.42E+02  4.41E+02  7.84E+00
f25    3.87E+02  3.87E+02  3.87E+02  3.87E+02  9.60E-03
f26    8.95E+02  1.27E+03  1.07E+03  1.08E+03  8.68E+01
f27    4.86E+02  5.11E+02  4.98E+02  4.99E+02  6.15E+00
f28    3.00E+02  4.14E+02  3.00E+02  3.02E+02  1.60E+01
f29    3.63E+02  4.58E+02  4.35E+02  4.33E+02  1.56E+01
f30    1.94E+03  2.12E+03  1.97E+03  1.98E+03  3.34E+01

7.4 Conclusion In order to enhance the overall performance of LSHADESPACMA algorithm, an improved version named ELSHADE-SPACMA is introduced. In LSHADESPACMA, p value that controls the greediness of the mutation strategy is constant. In ELSHADE-SPACMA, p value is dynamic. Larger value of p will enhance the exploration, while smaller values will enhance the exploitation. We further enhanced the performance of ELSHADE-SPACMA by integrating another directed mutation strat-

114

A. A. Hadi et al.

Table 7.4 Results of the 50D benchmark functions, averaged over 51 independent runs Func. Best Worst Median Mean Std. f1 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 f23 f24 f25 f26 f27 f28 f29 f30

0.00E+00 0.00E+00 8.02E+00 9.95E-01 0.00E+00 5.54E+01 6.96E+00 0.00E+00 2.69E+03 2.03E+01 4.03E+02 1.07E+01 2.52E+01 1.94E+01 1.29E+02 3.48E+01 2.16E+01 8.46E+00 3.66E+01 2.23E+02 1.00E+02 4.35E+02 5.14E+02 4.77E+02 1.08E+03 4.80E+02 4.59E+02 3.27E+02 5.79E+05

0.00E+00 0.00E+00 1.42E+02 2.98E+01 0.00E+00 7.35E+01 4.28E+01 0.00E+00 5.40E+03 3.31E+01 2.28E+03 6.71E+01 4.25E+01 2.91E+01 8.66E+02 5.09E+02 3.43E+01 1.92E+01 3.01E+02 2.62E+02 4.85E+03 4.89E+02 5.62E+02 4.92E+02 1.64E+03 5.28E+02 5.08E+02 4.10E+02 6.63E+05

0.00E+00 0.00E+00 2.85E+01 1.39E+01 0.00E+00 6.12E+01 1.69E+01 0.00E+00 3.59E+03 2.52E+01 1.39E+03 4.51E+01 2.94E+01 2.22E+01 3.57E+02 2.26E+02 2.47E+01 1.44E+01 8.59E+01 2.41E+02 1.00E+02 4.61E+02 5.33E+02 4.80E+02 1.33E+03 5.09E+02 4.59E+02 3.56E+02 5.90E+05

0.00E+00 0.00E+00 4.36E+01 1.39E+01 0.00E+00 6.15E+01 1.79E+01 0.00E+00 3.69E+03 2.62E+01 1.36E+03 3.68E+01 3.07E+01 2.28E+01 4.15E+02 2.30E+02 2.51E+01 1.44E+01 1.08E+02 2.42E+02 7.16E+02 4.62E+02 5.34E+02 4.81E+02 1.34E+03 5.10E+02 4.60E+02 3.58E+02 5.97E+05

0.00E+00 0.00E+00 3.62E+01 5.55E+00 0.00E+00 3.86E+00 7.47E+00 0.00E+00 6.07E+02 3.76E+00 3.42E+02 1.72E+01 3.95E+00 2.20E+00 1.77E+02 9.68E+01 2.56E+00 2.31E+00 7.31E+01 9.52E+00 1.44E+03 1.39E+01 9.14E+00 2.80E+00 1.38E+02 9.52E+00 6.84E+00 1.78E+01 2.38E+04

egy within the hybridization framework. The proposed algorithms were tested on the CEC2017 benchmark suite, which was used in the Special Session and Competition on Real-Parameter Single Objective Optimization at IEEE CEC 2017. In summary, ELSHADE-SPACMA performed better than, or competitively with, LSHADE-SPACMA, LSHADE-SPA, and LSHADE on the majority of functions and across dimensions, especially on high-dimensional functions of different types. Compared with the three best algorithms of the IEEE CEC 2017 competition, it was highly competitive and ranked first. Future research will investigate the performance of ELSHADE-SPACMA on constrained and multi-objective optimization problems, as well as on real-world applications such as big data, data mining, and clustering. The MATLAB source code of ELSHADE-SPACMA is available upon request and can be downloaded from https://sites.google.com/view/optimization-project/files.

7 Single-Objective Real-Parameter Optimization: Enhanced …

Table 7.5 Results of the 100D benchmark functions, averaged over 51 independent runs

Func.   Best      Worst     Median    Mean      Std.
f1      0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f3      0.00E+00  2.81E-06  4.59E-08  1.60E-07  4.00E-07
f4      1.92E+02  2.12E+02  1.97E+02  2.01E+02  8.67E+00
f5      1.09E+01  2.89E+01  1.79E+01  1.78E+01  3.85E+00
f6      0.00E+00  9.37E-08  0.00E+00  0.00E+00  1.34E-08
f7      1.08E+02  1.15E+02  1.11E+02  1.11E+02  1.48E+00
f8      1.19E+01  2.59E+01  1.79E+01  1.82E+01  3.02E+00
f9      0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
f10     7.74E+03  1.23E+04  1.08E+04  1.08E+04  9.53E+02
f11     3.46E+01  2.45E+02  5.22E+01  7.34E+01  4.30E+01
f12     3.44E+03  1.87E+04  6.76E+03  7.79E+03  2.92E+03
f13     6.97E+01  2.88E+02  1.49E+02  1.49E+02  3.83E+01
f14     3.69E+01  6.18E+01  4.69E+01  4.75E+01  5.69E+00
f15     5.23E+01  2.22E+02  9.09E+01  1.08E+02  4.34E+01
f16     7.10E+02  3.00E+03  1.79E+03  1.76E+03  4.88E+02
f17     4.81E+02  1.92E+03  1.27E+03  1.27E+03  3.45E+02
f18     5.62E+01  2.02E+02  1.02E+02  1.05E+02  2.56E+01
f19     4.25E+01  7.45E+01  6.09E+01  6.05E+01  7.55E+00
f20     4.88E+02  1.46E+03  8.89E+02  9.28E+02  2.54E+02
f21     2.64E+02  3.33E+02  2.95E+02  2.96E+02  1.65E+01
f22     7.95E+03  1.20E+04  9.29E+03  9.70E+03  1.20E+03
f23     5.53E+02  6.57E+02  6.01E+02  6.03E+02  2.19E+01
f24     8.91E+02  9.78E+02  9.32E+02  9.32E+02  1.90E+01
f25     6.37E+02  7.74E+02  6.98E+02  7.00E+02  3.99E+01
f26     2.98E+03  4.12E+03  3.17E+03  3.24E+03  2.19E+02
f27     5.30E+02  6.08E+02  5.62E+02  5.62E+02  1.75E+01
f28     4.78E+02  5.77E+02  5.22E+02  5.21E+02  2.38E+01
f29     8.04E+02  1.77E+03  1.21E+03  1.21E+03  1.98E+02
f30     2.09E+03  2.61E+03  2.22E+03  2.25E+03  1.11E+02
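Each row of Table 7.5 condenses the final errors of 51 independent runs on one function into five statistics. The following minimal sketch computes that summary for a single function. It assumes that a run's error is f(x) − f(x*) and that errors below 10^-8 are reported as zero, which is our reading of the CEC 2017 rules; it also assumes the sample standard deviation, since the chapter does not state which form it uses. The function name `summarize_runs` and the sample data are illustrative only.

```python
import statistics


def summarize_runs(errors, zero_threshold=1e-8):
    """Return (best, worst, median, mean, std) of the final errors of independent runs.

    Lower error is better, so "best" is the minimum. Errors below zero_threshold
    are reported as 0 (assumed CEC 2017 convention). The standard deviation is the
    sample form with an n - 1 denominator (also an assumption).
    """
    errors = [0.0 if e < zero_threshold else e for e in errors]
    return (
        min(errors),
        max(errors),
        statistics.median(errors),
        statistics.fmean(errors),
        statistics.stdev(errors),
    )


# Hypothetical final errors of five runs (the chapter uses 51 runs per function).
best, worst, median, mean, std = summarize_runs([3.0, 1.0, 4.0, 1.0, 5.0])
print(f"{best:.2E} {worst:.2E} {median:.2E} {mean:.2E} {std:.2E}")
```

The print statement uses the same two-digit scientific notation as the tables.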


A. A. Hadi et al.

Table 7.6 Comparison between EBOwithCMAR (EBO), jSO, LSHADE-cnEpSin (EpSin), and ELSHADE-SPACMA (ESPACMA) on the benchmark with 10 and 30 dimensions

        D = 10                                     D = 30
Func.   EBO       jSO       EpSin     ESPACMA      EBO       jSO       EpSin     ESPACMA
f1      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f3      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f4      0.00E+00  0.00E+00  0.00E+00  0.00E+00     5.65E+01  5.87E+01  4.23E+01  5.86E+01
f5      0.00E+00  1.76E+00  1.69E+00  3.87E+00     2.78E+00  8.56E+00  1.23E+01  1.86E+01
f6      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f7      1.06E+01  1.18E+01  1.20E+01  1.33E+01     3.35E+01  3.89E+01  4.33E+01  3.89E+01
f8      0.00E+00  1.95E+00  1.80E+00  4.10E+00     2.02E+00  9.09E+00  1.29E+01  1.61E+01
f9      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f10     3.72E+01  3.59E+01  4.30E+01  2.27E+01     1.41E+03  1.53E+03  1.39E+03  1.70E+03
f11     0.00E+00  0.00E+00  0.00E+00  0.00E+00     4.49E+00  3.04E+00  1.35E+01  7.80E+00
f12     9.02E+01  2.66E+00  1.01E+02  2.85E+01     4.63E+02  1.70E+02  3.72E+02  2.47E+02
f13     2.17E+00  2.96E+00  3.66E+00  3.57E+00     1.49E+01  1.48E+01  1.73E+01  1.57E+01
f14     6.05E-02  5.85E-02  7.80E-02  7.80E-02     2.19E+01  2.18E+01  2.16E+01  2.37E+01
f15     1.09E-01  2.21E-01  3.24E-01  2.51E-01     3.69E+00  1.09E+00  3.24E+00  1.86E+00
f16     4.17E-01  5.69E-01  5.37E-01  5.62E-01     4.26E+01  7.89E+01  2.29E+01  6.68E+01
f17     1.47E-01  5.02E-01  3.07E-01  1.39E-01     2.98E+01  3.29E+01  2.86E+01  2.97E+01
f18     7.00E-01  3.08E-01  3.86E+00  7.10E-01     2.21E+01  2.04E+01  2.11E+01  2.09E+01
f19     1.50E-02  1.07E-02  4.47E-02  1.55E-02     8.04E+00  4.50E+00  5.83E+00  4.61E+00
f20     1.47E-01  3.43E-01  2.57E-01  1.41E-01     3.57E+01  2.94E+01  3.03E+01  2.73E+01
f21     1.14E+02  1.32E+02  1.46E+02  1.02E+02     1.99E+02  2.09E+02  2.12E+02  2.22E+02
f22     9.85E+01  1.00E+02  1.00E+02  1.00E+02     1.00E+02  1.00E+02  1.00E+02  1.00E+02
f23     3.00E+02  3.01E+02  3.02E+02  3.04E+02     3.51E+02  3.51E+02  3.56E+02  3.69E+02
f24     1.66E+02  2.97E+02  3.16E+02  2.91E+02     4.18E+02  4.26E+02  4.28E+02  4.41E+02
f25     4.12E+02  4.06E+02  4.26E+02  4.13E+02     3.87E+02  3.87E+02  3.87E+02  3.87E+02
f26     2.65E+02  3.00E+02  3.00E+02  3.00E+02     5.37E+02  9.20E+02  9.49E+02  1.08E+03
f27     3.92E+02  3.89E+02  3.89E+02  3.89E+02     5.02E+02  4.97E+02  5.04E+02  4.99E+02
f28     3.07E+02  3.39E+02  3.85E+02  3.25E+02     3.08E+02  3.09E+02  3.15E+02  3.02E+02
f29     2.31E+02  2.34E+02  2.28E+02  2.30E+02     4.33E+02  4.34E+02  4.35E+02  4.33E+02
f30     4.07E+02  3.95E+02  1.76E+04  4.02E+02     1.99E+03  1.97E+03  1.98E+03  1.98E+03


Table 7.7 Comparison between EBOwithCMAR (EBO), jSO, LSHADE-cnEpSin (EpSin), and ELSHADE-SPACMA (ESPACMA) on the benchmark with 50 and 100 dimensions

        D = 50                                     D = 100
Func.   EBO       jSO       EpSin     ESPACMA      EBO       jSO       EpSin     ESPACMA
f1      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f3      0.00E+00  0.00E+00  0.00E+00  0.00E+00     2.99E-07  2.39E-06  0.00E+00  1.60E-07
f4      4.29E+01  5.62E+01  5.14E+01  4.36E+01     1.93E+02  1.90E+02  1.98E+02  2.01E+02
f5      7.58E+00  1.64E+01  2.52E+01  1.39E+01     2.87E+01  4.39E+01  5.59E+01  1.78E+01
f6      8.54E-08  1.09E-06  9.16E-07  0.00E+00     1.63E-05  2.02E-04  6.02E-05  0.00E+00
f7      5.79E+01  6.65E+01  7.66E+01  6.15E+01     1.22E+02  1.45E+02  1.62E+02  1.11E+02
f8      7.91E+00  1.70E+01  2.63E+01  1.79E+01     2.97E+01  4.22E+01  5.35E+01  1.82E+01
f9      0.00E+00  0.00E+00  0.00E+00  0.00E+00     1.76E-03  4.59E-02  0.00E+00  0.00E+00
f10     3.11E+03  3.14E+03  3.20E+03  3.69E+03     9.91E+03  9.70E+03  1.03E+04  1.08E+04
f11     2.64E+01  2.79E+01  2.14E+01  2.62E+01     6.56E+01  1.13E+02  4.92E+01  7.34E+01
f12     1.94E+03  1.68E+03  1.48E+03  1.36E+03     4.19E+03  1.84E+04  4.62E+03  7.79E+03
f13     4.14E+01  3.06E+01  6.94E+01  3.68E+01     2.45E+02  1.45E+02  1.25E+02  1.49E+02
f14     3.12E+01  2.50E+01  2.65E+01  3.07E+01     1.38E+02  6.43E+01  4.97E+01  4.75E+01
f15     2.94E+01  2.39E+01  2.56E+01  2.28E+01     1.65E+02  1.62E+02  8.99E+01  1.08E+02
f16     3.46E+02  4.51E+02  2.75E+02  4.15E+02     1.41E+03  1.86E+03  1.22E+03  1.76E+03
f17     2.75E+02  2.83E+02  2.07E+02  2.30E+02     1.21E+03  1.28E+03  9.32E+02  1.27E+03
f18     3.20E+01  2.43E+01  2.43E+01  2.51E+01     2.37E+02  1.67E+02  7.79E+01  1.05E+02
f19     2.45E+01  1.41E+01  1.74E+01  1.44E+01     1.15E+02  1.05E+02  5.55E+01  6.05E+01
f20     1.47E+02  1.40E+02  1.14E+02  1.08E+02     1.36E+03  1.38E+03  1.08E+03  9.28E+02
f21     2.11E+02  2.19E+02  2.27E+02  2.42E+02     2.60E+02  2.64E+02  2.77E+02  2.96E+02
f22     3.65E+02  1.49E+03  1.59E+03  7.16E+02     1.02E+04  1.02E+04  1.04E+04  9.70E+03
f23     4.34E+02  4.30E+02  4.39E+02  4.62E+02     5.77E+02  5.71E+02  5.98E+02  6.03E+02
f24     5.06E+02  5.07E+02  5.13E+02  5.34E+02     9.19E+02  9.02E+02  9.17E+02  9.32E+02
f25     4.89E+02  4.81E+02  4.80E+02  4.81E+02     7.16E+02  7.36E+02  6.84E+02  7.00E+02
f26     7.06E+02  1.13E+03  1.20E+03  1.34E+03     2.77E+03  3.27E+03  3.11E+03  3.24E+03
f27     5.22E+02  5.11E+02  5.25E+02  5.10E+02     5.88E+02  5.85E+02  5.89E+02  5.62E+02
f28     4.67E+02  4.60E+02  4.59E+02  4.60E+02     5.10E+02  5.27E+02  5.15E+02  5.21E+02
f29     3.47E+02  3.63E+02  3.53E+02  3.58E+02     1.28E+03  1.26E+03  1.12E+03  1.21E+03
f30     6.18E+05  6.01E+05  6.58E+05  5.97E+05     2.40E+03  2.33E+03  2.36E+03  2.25E+03


Table 7.8 Wilcoxon’s test against ELSHADE-SPACMA for D = 10, 30, 50 and 100

D     Algorithms        R+      R−      p-value   +    ≈    −    Dec.
10    EBOwithCMAR       102.5   173.5   0.280     8    6    15   ≈
      jSO               95      115     0.709     8    9    12   ≈
      LSHADE-cnEpSin    144     46      0.049     13   10   6    +
30    EBOwithCMAR       81      172     0.140     9    7    13   ≈
      jSO               61      215     0.019     7    6    16   +
      LSHADE-cnEpSin    96      157     0.322     11   7    11   ≈
50    EBOwithCMAR       166.5   184.5   0.819     14   3    12   ≈
      jSO               180.5   119.5   0.383     14   5    10   ≈
      LSHADE-cnEpSin    180.5   170.5   0.899     13   3    13   ≈
100   EBOwithCMAR       251     184     0.469     18   0    11   ≈
      jSO               326.5   79.5    0.005     22   1    6    +
      LSHADE-cnEpSin    126     225     0.209     8    3    18   ≈
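The R+, R−, and p-value columns come from a Wilcoxon signed-rank test on paired per-function results. The sketch below computes them from two lists of mean errors. It drops zero differences (the ≈ column), gives tied absolute differences their average rank, and approximates the p-value with the normal distribution without continuity correction. These choices are assumptions, but they are consistent with the table: R+ + R− always equals n(n + 1)/2 for the n functions with a nonzero difference (for EBOwithCMAR at D = 10, 102.5 + 173.5 = 276 = 23 · 24/2, with 8 + 15 = 23), and the normal approximation gives p ≈ 0.280 for that row. Which algorithm's wins are counted as "+" is also an assumption of this sketch, and the function names are illustrative.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Wilcoxon signed-rank test for paired per-function results.

    errors_a, errors_b: mean errors of algorithms A and B on the same functions.
    Returns (r_plus, r_minus, wins_a, ties, wins_b, p_value). A "win" for A means
    a lower error than B on that function (an assumed sign convention).
    """
    diffs = [b - a for a, b in zip(errors_a, errors_b)]
    nonzero = [d for d in diffs if d != 0]  # ties are excluded from the ranking
    ties = len(diffs) - len(nonzero)

    # Rank absolute differences; equal values receive the average of their ranks.
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 1-based average rank
        start = end + 1

    r_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    wins_a = sum(1 for d in nonzero if d > 0)
    wins_b = len(nonzero) - wins_a
    return r_plus, r_minus, wins_a, ties, wins_b, normal_p_value(r_plus, r_minus, len(nonzero))


def normal_p_value(r_plus, r_minus, n):
    """Two-sided p-value from the normal approximation (no continuity correction)."""
    if n == 0:
        return 1.0
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(r_plus, r_minus) - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Reproduce the p-value of the EBOwithCMAR row for D = 10 from its rank sums.
print(round(normal_p_value(102.5, 173.5, 23), 3))
```

With a significance level of 0.05, which matches the Dec. column, a row is marked as significant when the p-value is below 0.05. For production use, `scipy.stats.wilcoxon` implements the test directly; its defaults may use an exact distribution for small samples, so its p-values can differ slightly from this approximation.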

Table 7.9 Results of the score metric for the LSHADE, LSHADE-SPA, LSHADE-SPACMA, and ELSHADE-SPACMA algorithms

Algorithms        Score1   Score2   Score    Rank
ELSHADE-SPACMA    50       46.23    96.23    1
LSHADE-SPACMA     45.44    50       95.44    2
LSHADE-SPA        44.69    39.23    83.92    3
LSHADE            42.53    35.73    78.26    4
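In Table 7.9, Score is the sum of Score1 and Score2 (for example, 50 + 46.23 = 96.23 for ELSHADE-SPACMA), and the best algorithm on each component receives exactly 50. The sketch below follows the CEC 2017 scoring scheme as we understand it: Score1 = 50 × (1 − (SE − SE_min)/SE), where SE is a weighted sum of mean errors over the four dimensions, and Score2 is defined in the same way from a weighted sum of per-function ranks SR. The dimension weights 0.1, 0.2, 0.3, and 0.4 for D = 10, 30, 50, and 100 are an assumption and should be checked against the competition's technical report [1]; the function names are illustrative.

```python
# Dimension weights used to combine results (assumed CEC 2017 scoring weights).
WEIGHTS = {10: 0.1, 30: 0.2, 50: 0.3, 100: 0.4}


def average_ranks(values):
    """Rank values in ascending order (1 = lowest error); ties share the mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def cec2017_scores(mean_errors):
    """mean_errors[alg][dim] lists the mean error of `alg` on every function at `dim`.

    Returns alg -> (score1, score2, score). Score1 rewards a small weighted error
    sum (SE), Score2 a small weighted rank sum (SR); each equals 50 for the best.
    """
    algs = list(mean_errors)
    se = dict.fromkeys(algs, 0.0)
    sr = dict.fromkeys(algs, 0.0)
    for dim, w in WEIGHTS.items():
        for a in algs:
            se[a] += w * sum(mean_errors[a][dim])
        n_funcs = len(mean_errors[algs[0]][dim])
        for f in range(n_funcs):
            ranks = average_ranks([mean_errors[a][dim][f] for a in algs])
            for a, r in zip(algs, ranks):
                sr[a] += w * r
    se_min, sr_min = min(se.values()), min(sr.values())
    result = {}
    for a in algs:
        score1 = (1 - (se[a] - se_min) / se[a]) * 50  # assumes se[a] > 0
        score2 = (1 - (sr[a] - sr_min) / sr[a]) * 50
        result[a] = (score1, score2, score1 + score2)
    return result


# Toy data: one function per dimension, algorithm A always has half the error of B.
toy = {"A": {d: [1.0] for d in WEIGHTS}, "B": {d: [2.0] for d in WEIGHTS}}
for alg, (s1, s2, s) in cec2017_scores(toy).items():
    print(alg, round(s1, 2), round(s2, 2), round(s, 2))
```

Because each component is normalized by the best value, the leader on that component always scores 50, which is why ELSHADE-SPACMA has Score1 = 50 and LSHADE-SPACMA has Score2 = 50 in the table.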

Table 7.10 Wilcoxon’s test against ELSHADE-SPACMA for D = 10, 30, 50 and 100

D     Algorithms        R+      R−      p-value   +    ≈    −    Dec.
10    LSHADE-SPACMA     124     107     0.767     13   8    8    ≈
      LSHADE-SPA        159     94      0.767     12   7    10   ≈
      LSHADE            120     90      0.575     9    9    11   ≈
30    LSHADE-SPACMA     114.5   138.5   0.697     11   7    11   ≈
      LSHADE-SPA        119     134     0.808     10   7    12   ≈
      LSHADE            134.5   141.5   0.915     12   6    11   ≈
50    LSHADE-SPACMA     149     127     0.738     12   6    11   ≈
      LSHADE-SPA        226.5   124.5   0.195     19   3    7    ≈
      LSHADE            199.5   125.5   0.319     16   4    9    ≈
100   LSHADE-SPACMA     145     180     0.638     10   4    15   ≈
      LSHADE-SPA        258     93      0.036     17   3    9    +
      LSHADE            307.5   98.5    0.017     21   1    7    +


Table 7.11 Comparison between LSHADE, LSHADE-SPA (SPA), LSHADE-SPACMA (SPACMA), and ELSHADE-SPACMA (ESPACMA) on the benchmark with 10 and 30 dimensions

        D = 10                                     D = 30
Func.   LSHADE    SPA       SPACMA    ESPACMA      LSHADE    SPA       SPACMA    ESPACMA
f1      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f3      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f4      0.00E+00  0.00E+00  0.00E+00  0.00E+00     5.86E+01  5.86E+01  5.86E+01  5.86E+01
f5      2.99E+00  1.76E+00  1.76E+00  3.87E+00     6.70E+00  1.25E+01  3.45E+00  1.86E+01
f6      0.00E+00  0.00E+00  0.00E+00  0.00E+00     1.47E-08  0.00E+00  0.00E+00  0.00E+00
f7      1.22E+01  1.19E+01  1.09E+01  1.33E+01     3.74E+01  4.30E+01  3.38E+01  3.89E+01
f8      2.42E+00  1.87E+00  8.42E-01  4.10E+00     7.97E+00  1.27E+01  3.20E+00  1.61E+01
f9      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f10     2.23E+01  2.17E+01  2.18E+01  2.27E+01     1.40E+03  1.33E+03  1.44E+03  1.70E+03
f11     4.15E-01  0.00E+00  0.00E+00  0.00E+00     3.41E+01  1.55E+01  1.78E+01  7.80E+00
f12     7.69E+01  1.20E+02  1.19E+02  2.85E+01     1.00E+03  4.08E+02  6.15E+02  2.47E+02
f13     3.16E+00  3.62E+00  4.37E+00  3.57E+00     1.56E+01  1.52E+01  1.46E+01  1.57E+01
f14     1.74E-01  2.04E-02  1.56E-01  7.80E-02     2.17E+01  2.25E+01  2.34E+01  2.37E+01
f15     1.70E-01  2.68E-01  4.08E-01  2.51E-01     3.80E+00  2.17E+00  4.46E+00  1.86E+00
f16     4.09E-01  5.25E-01  7.42E-01  5.62E-01     4.18E+01  3.05E+01  2.52E+01  6.68E+01
f17     1.72E-01  1.19E-01  1.56E-01  1.39E-01     3.27E+01  2.85E+01  3.04E+01  2.97E+01
f18     2.78E-01  2.43E+00  4.35E+00  7.10E-01     2.32E+01  2.11E+01  2.34E+01  2.09E+01
f19     1.11E-02  5.51E-02  2.34E-01  1.55E-02     6.14E+00  4.91E+00  1.03E+01  4.61E+00
f20     1.49E-02  1.78E-01  3.12E-01  1.41E-01     3.06E+01  2.77E+01  8.38E+01  2.73E+01
f21     1.61E+02  1.56E+02  1.01E+02  1.02E+02     2.08E+02  2.13E+02  2.07E+02  2.22E+02
f22     1.00E+02  1.00E+02  1.00E+02  1.00E+02     1.00E+02  1.00E+02  1.00E+02  1.00E+02
f23     3.04E+02  3.02E+02  3.03E+02  3.04E+02     3.55E+02  3.55E+02  3.55E+02  3.69E+02
f24     3.21E+02  2.90E+02  2.75E+02  2.91E+02     4.28E+02  4.29E+02  4.29E+02  4.41E+02
f25     4.11E+02  4.25E+02  4.28E+02  4.13E+02     3.87E+02  3.87E+02  3.87E+02  3.87E+02
f26     3.00E+02  3.00E+02  3.00E+02  3.00E+02     9.84E+02  9.62E+02  9.53E+02  1.08E+03
f27     3.89E+02  3.90E+02  3.90E+02  3.89E+02     5.08E+02  5.05E+02  5.05E+02  4.99E+02
f28     3.59E+02  4.02E+02  3.17E+02  3.25E+02     3.42E+02  3.21E+02  3.11E+02  3.02E+02
f29     2.34E+02  2.31E+02  2.31E+02  2.30E+02     4.35E+02  4.30E+02  4.45E+02  4.33E+02
f30     7.82E+04  4.09E+04  4.30E+02  4.02E+02     2.00E+03  2.00E+03  2.01E+03  1.98E+03


Table 7.12 Comparison between LSHADE, LSHADE-SPA (SPA), LSHADE-SPACMA (SPACMA), and ELSHADE-SPACMA (ESPACMA) on the benchmark with 50 and 100 dimensions

        D = 50                                     D = 100
Func.   LSHADE    SPA       SPACMA    ESPACMA      LSHADE    SPA       SPACMA    ESPACMA
f1      0.00E+00  0.00E+00  0.00E+00  0.00E+00     0.00E+00  0.00E+00  0.00E+00  0.00E+00
f3      0.00E+00  0.00E+00  0.00E+00  0.00E+00     2.88E-06  0.00E+00  0.00E+00  1.60E-07
f4      6.30E+01  4.96E+01  2.94E+01  4.36E+01     1.96E+02  2.03E+02  2.01E+02  2.01E+02
f5      1.18E+01  2.88E+01  5.99E+00  1.39E+01     2.83E+01  5.19E+01  1.22E+01  1.78E+01
f6      7.61E-08  2.65E-07  0.00E+00  0.00E+00     1.66E-03  2.60E-05  0.00E+00  0.00E+00
f7      6.43E+01  8.01E+01  5.70E+01  6.15E+01     1.36E+02  1.57E+02  1.12E+02  1.11E+02
f8      1.15E+01  2.89E+01  5.81E+00  1.79E+01     2.72E+01  5.02E+01  1.02E+01  1.82E+01
f9      0.00E+00  0.00E+00  0.00E+00  0.00E+00     1.07E-01  0.00E+00  0.00E+00  0.00E+00
f10     3.09E+03  2.96E+03  3.49E+03  3.69E+03     1.02E+04  9.88E+03  1.00E+04  1.08E+04
f11     5.01E+01  2.74E+01  3.22E+01  2.62E+01     3.78E+02  4.38E+01  5.20E+01  7.34E+01
f12     2.23E+03  1.42E+03  1.56E+03  1.36E+03     2.35E+04  5.74E+03  4.80E+03  7.79E+03
f13     6.37E+01  4.90E+01  3.71E+01  3.68E+01     1.22E+03  9.42E+01  1.48E+02  1.49E+02
f14     3.20E+01  2.74E+01  2.94E+01  3.07E+01     2.68E+02  5.44E+01  7.14E+01  4.75E+01
f15     4.46E+01  2.44E+01  3.04E+01  2.28E+01     2.58E+02  9.71E+01  1.08E+02  1.08E+02
f16     3.80E+02  3.00E+02  3.35E+02  4.15E+02     1.56E+03  1.37E+03  1.25E+03  1.76E+03
f17     2.45E+02  2.33E+02  2.77E+02  2.30E+02     1.10E+03  1.10E+03  1.03E+03  1.27E+03
f18     4.79E+01  2.49E+01  3.24E+01  2.51E+01     2.07E+02  9.75E+01  1.35E+02  1.05E+02
f19     3.44E+01  1.69E+01  2.17E+01  1.44E+01     1.80E+02  5.85E+01  7.35E+01  6.05E+01
f20     1.67E+02  1.30E+02  1.68E+02  1.08E+02     1.40E+03  1.37E+03  1.47E+03  9.28E+02
f21     2.15E+02  2.30E+02  2.15E+02  2.42E+02     2.57E+02  2.74E+02  2.44E+02  2.96E+02
f22     2.80E+03  1.53E+03  1.38E+03  7.16E+02     1.09E+04  1.00E+04  9.99E+03  9.70E+03
f23     4.31E+02  4.41E+02  4.41E+02  4.62E+02     5.61E+02  5.90E+02  5.81E+02  6.03E+02
f24     5.10E+02  5.14E+02  5.13E+02  5.34E+02     9.18E+02  9.32E+02  9.19E+02  9.32E+02
f25     4.81E+02  4.82E+02  4.81E+02  4.81E+02     7.48E+02  6.91E+02  7.09E+02  7.00E+02
f26     1.19E+03  1.26E+03  1.14E+03  1.34E+03     3.41E+03  3.22E+03  3.14E+03  3.24E+03
f27     5.41E+02  5.27E+02  5.32E+02  5.10E+02     6.62E+02  5.89E+02  5.93E+02  5.62E+02
f28     4.63E+02  4.61E+02  4.60E+02  4.60E+02     5.28E+02  5.13E+02  5.17E+02  5.21E+02
f29     3.51E+02  3.47E+02  3.92E+02  3.58E+02     1.37E+03  1.18E+03  1.59E+03  1.21E+03
f30     6.61E+05  6.24E+05  6.68E+05  5.97E+05     2.41E+03  2.39E+03  2.38E+03  2.25E+03


References

1. N. Awad, M. Ali, J. Liang, B. Qu, P. Suganthan, Problem definitions and evaluation criteria for the CEC special session and competition on single objective real-parameter numerical optimization, Technical Report (2017), p. 2016
2. N.H. Awad, M.Z. Ali, P.N. Suganthan, Ensemble sinusoidal differential covariance matrix adaptation with Euclidean neighborhood for solving CEC2017 benchmark problems, in 2017 IEEE Congress on Evolutionary Computation (CEC) (IEEE, 2017)
3. J. Brest, M.S. Maučec, B. Bošković, Single objective real-parameter optimization: algorithm jSO, in 2017 IEEE Congress on Evolutionary Computation (CEC) (IEEE, 2017), pp. 1311–1318
4. S. Das, S.S. Mullick, P.N. Suganthan, Recent advances in differential evolution: an updated survey. Swarm Evol. Comput. 27, 1–30 (2016)
5. N. Hansen, The CMA evolution strategy: a comparing review, in Towards a New Evolutionary Computation (Springer, Berlin, 2006), pp. 75–102
6. A. Kumar, R.K. Misra, D. Singh, Improving the local search capability of effective butterfly optimizer using covariance matrix adapted retreat phase, in 2017 IEEE Congress on Evolutionary Computation (CEC) (IEEE, 2017), pp. 1835–1842
7. A.W. Mohamed, A.A. Hadi, A.M. Fattouh, K.M. Jambi, LSHADE with semi-parameter adaptation hybrid with CMA-ES for solving CEC 2017 benchmark problems, in 2017 IEEE Congress on Evolutionary Computation (CEC) (IEEE, 2017), pp. 145–152
8. A.W. Mohamed, A.K. Mohamed, Adaptive guided differential evolution algorithm with novel mutation for numerical optimization. Int. J. Mach. Learn. Cybern. 10(2), 253–277 (2019)
9. R. Tanabe, A.S. Fukunaga, Improving the search performance of SHADE using linear population size reduction, in 2014 IEEE Congress on Evolutionary Computation (CEC) (IEEE, 2014), pp. 1658–1665
10. J. Zhang, A.C. Sanderson, JADE: adaptive differential evolution with optional external archive. IEEE Trans. Evol. Comput. 13(5), 945–958 (2009)

Chapter 8

Operations Research at Bulk Terminal: A Parallel Column Generation Approach

Gustavo Campos Menezes, Lucas Teodoro de Lima Santos, João Fernando Machry Sarubbi, and Geraldo Robson Mateus

Abstract This chapter discusses various optimization problems that arise in storing and transporting loads. In particular, we investigate an optimization problem involving the production planning, product allocation, and scheduling of products in the largest bulk port terminal in Brazil. The main contribution of this chapter is a parallel approach to solving the integrated problem. The methodology combines heuristics, column generation, and an optimization package. Computational experiments based on real cases showed that the parallel solution was faster on all tested instances, with gains close to 90%.

G. C. Menezes
Departamento de Eletroeletrônica e Computação, Centro Federal de Educação Tecnológica de Minas Gerais, Contagem, MG, Brazil

G. R. Mateus
Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil

L. T. de Lima Santos · J. F. Machry Sarubbi
Departamento de Computação, Centro Federal de Educação Tecnológica de Minas Gerais, Belo Horizonte, MG, Brazil

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_8

8.1 Introduction

Approximately 80% of global trade by volume and more than 70% of its value are transported by ships and handled at port terminals around the world. In 2016, maritime trade reached a total of 10.3 billion tons (a growth of 2.6%), a volume consisting mostly of containers and dry and liquid bulk cargo. Additionally, according to the UNCTAD report, world demand for dry bulk commodities grew 1.3%, taking total shipments to 4.9 billion tons, and China remained the primary consumer. Table 8.1 shows the volumes handled in 2015 and 2016.

Although bulk commodities are relevant export products in the trade balance of several countries, the UNCTAD report estimates that countries spent 15% of the value of their imports on international transport and insurance, and developing countries pay more (approximately 21%). Lower efficiency, inadequate infrastructure, less competitive transport, and low technology considerably increase costs. These factors push producers, customers, port operators, shipowners, and other actors to improve performance and to adopt strategies that strive for efficiency and efficacy in all port operations. In addition, the flow of products through a terminal is subject to fluctuations in demand caused by financial crises, natural disasters, and wars. The challenge, therefore, is to improve terminal productivity and efficiency by using the available resources intelligently while minimizing large investments in infrastructure. In this scenario, models, methods, and techniques from operations research are increasingly important, and their combination with big data and machine learning can bring logistics and transportation operations to a new level of efficiency.

According to the UNCTAD report, the major bulk commodities (coal, iron ore, grain, and bauxite/alumina/phosphate rock) accounted for about 43.9% of total dry cargo volumes in 2016, followed by containerized trade (23.8%) and minor bulk (23.7%). Although bulk port terminals play an essential role in international trade, they have received little attention in the literature: literature reviews and surveys of research opportunities concentrate on container terminals.

G. C. Menezes et al.

Table 8.1 Dry bulk (2015 and 2016). Source: UNCTAD report

Commodity             2015    2016    Growth 2015–2016 (%)
Major bulks
  Iron ore            1364    1410     3.4
  Coal                1142    1140    −0.2
  Grain                459     476     3.7
  Bauxite/alumina      126     116    −7.9
  Phosphate rock        30      30     1.0
Minor bulks
  Steel                406     404    −0.5
  Forest               346     354     2.3
Total                 4827    4888     1.3

Volumes are in millions of tons.
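The growth column of Table 8.1 is the relative change between the two years, 100 × (V2016 − V2015)/V2015. The short check below recomputes it from the tabulated volumes. Phosphate rock is left out because its rounded volumes (30 and 30) cannot reproduce the reported 1.0%, presumably because the underlying figures were rounded for publication. The dictionary name is illustrative.

```python
# Volumes from Table 8.1 in millions of tons: item -> (2015, 2016, reported growth %).
# Phosphate rock is omitted: its rounded volumes cannot reproduce the reported +1.0%.
TABLE_8_1 = {
    "Iron ore": (1364, 1410, 3.4),
    "Coal": (1142, 1140, -0.2),
    "Grain": (459, 476, 3.7),
    "Bauxite/alumina": (126, 116, -7.9),
    "Steel": (406, 404, -0.5),
    "Forest": (346, 354, 2.3),
    "Total": (4827, 4888, 1.3),
}


def growth_percent(v2015, v2016):
    """Relative change from 2015 to 2016, in percent."""
    return 100.0 * (v2016 - v2015) / v2015


for item, (v2015, v2016, reported) in TABLE_8_1.items():
    print(f"{item:<16} {growth_percent(v2015, v2016):5.1f}  (reported {reported:+.1f})")
```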
Port operations management, whether in container terminals, bulk terminals, or both, covers the following main topics:

8 Operations Research at Bulk Terminal: A Parallel Column Generation Approach


• Determine how to serve a queue of ships that need to dock to load and/or unload cargo;
• Establish, for a set of products arriving at the port, which ones should be stocked and where, and which ones should be loaded onto the ships;
• Choose the best routes along which products stored in the port should be transported, from arrival to the yards and from the yards to the ships;
• Define which equipment (conveyor belts, reclaimers, stackers, trucks, cranes) should be used to handle products in the yard and transport them to the ships;
• Establish the order in which the machines will be used, so as to reduce machine preparation costs and delays while respecting capacity limits.

8.2 Overview of Bulk Optimization

The infrastructure of a port terminal is directly associated with the type of cargo it handles. Port terminals can be categorized into container and bulk terminals. A bulk cargo terminal basically consists of the following components: piers and berths (where ships dock to load and unload cargo) and one or several storage areas (product stockyards). The port is connected to the mainland by railways, which deliver products to consumers or bring products from suppliers (mines, agricultural producers, power plants) to be loaded onto ships. The flow of products between ships, stockyards, and railways is performed by specific equipment: conveyor belts, trucks, ore stackers, and reclaimers, among others. The availability and capacity of the equipment and the storage limits of the stockyards determine the ability of the terminal to fulfill its shipping contracts. Delays in unloading a ship at the terminal (demurrage), or even unloading it ahead of schedule (dispatch), result in a fine: for the terminal operator in the event of demurrage, or for the shipowner in the event of dispatch. The main problems of port operations management, whether in container terminals, bulk terminals, or both, are summarized in Table 8.2. These problems can be modeled as deterministic, stochastic, or integrated problems and solved with exact or heuristic approaches. Recently, most of them have been studied in an integrated way. Figure 8.1 shows the main problems related to bulk port optimization.

Table 8.2 General optimization problems

Problem                                          Definition
Berth Allocation Problem                         Determine when and where the ships should moor along a quay. The objective is to minimize the total time spent by the ships at the port.
Stacker/Reclaimer Scheduling Problem             Given a set of handling operations in a yard, find an optimal operation schedule for each stacker-reclaimer.
Yard Truck Scheduling                            Assign a fleet of trucks to transport containers so as to minimize the makespan.
Stockyard Allocation                             Decide how to arrange and hold cargo at a shipping point for future transportation.
Production Planning, Allocation and Scheduling   Define the amount and destination of each input or output order in a bulk cargo terminal and establish a set of feasible routes to transport the cargo.

Fig. 8.1 Bulk Port Optimization Problems

8.3 Integrated Production Planning and Scheduling

In this work, we solve a real problem involving the production planning, product allocation, and scheduling of products in the largest bulk port terminal in Brazil. This problem can be modeled as the Integrated Product Flow Planning and Scheduling Problem (PFPSP), which combines two decisions (planning and scheduling) and is critical to the efficient operation of port terminal facilities. In general, the PFPSP can be defined as follows: there is a set of supply nodes, where products are available for transportation; storage nodes, where the products are stocked; and demand nodes, or delivery subsystems, for shipping products. Specialized equipment with predefined capacities is used to transport the products within the network. An equipment route between nodes has a given capacity and handles one product at a time. The problem arises in different scenarios, such as the mining industry, bulk ports, and agroindustry, among others. In all these cases, the supply nodes are the arrival points of products (iron ore, grains, coal), the storage yard is where the products are stored, and the demand node is where the products are delivered to their final destinations.

Fig. 8.2 Offer, Stock and Demand nodes

Storage yard management involves several challenges. Moreover, the solutions obtained for this problem apply not only to bulk ports but also to other operations that involve the flow of cargo between supply, stock, and demand points. Figure 8.2 highlights the three major nodes of this problem. The offer node represents the arrival, or supply, of products to the system. The stock node is responsible for temporary storage, which can be done in sheds, outdoors (in the case of iron ore), or in silos (in the case of grains). Finally, the demand node represents the destination of the products, which can be the end customer, industries, or another cargo terminal. Trucks, conveyor belts, ships, and trains can be used to transport products between these nodes.

In 2017, Menezes et al. [10] proposed an algorithm to solve the PFPSP. Their algorithm combines heuristics and column generation and provides initial solutions to a Branch-and-Price method. Despite its good results, the method had a high computational cost: on some instances, it spent more than four hours to find a solution. In this research, we introduce a new approach to solving the problem. Our algorithm also combines heuristics and column generation but, instead of running sequentially as in Menezes et al. [10], it uses a parallel framework. Our results show that the parallel algorithm was faster on all tested instances, with gains close to 90%.

8.4 Literature Review

The integrated approach to planning and scheduling is commonly adopted in the literature. Grossmann et al. [16] investigate an integrated production problem whose goal is to determine which products to manufacture in each period and to establish an optimal capacity modification plan such that future demand is satisfied. Bruno et al. [5] investigate the integration of planning and scheduling in a network of batch plants; the problem is to define the amounts of products to be produced in each time period, the allocation of products to batch units, the detailed timing of operations, and the sequencing of products. Other works in this direction include [6, 8, 9, 15] and, more recently, [11] and [14]. The central problem studied in this work involves the flow of products between supply nodes, storage areas, and demand nodes. The references below concern mathematical models and algorithms for problems in bulk cargo terminals. Bilgen et al. [2] study the problem of blending and allocating ships for grain transportation. Byung et al. [7] study the allocation of products in the stockyard, which they solve with a mixed-integer programming model. Barros et al. [1] develop an integer linear programming model for allocating berths in conjunction with the storage conditions of the stockyard. Boland [3] addresses the management of coal stockpiles in Australia. Singh et al. [13] present a mixed-integer programming model for planning the capacity expansion of the coal production chain in Australia. Finally, Robenek et al. [12] propose a model for the integrated berth allocation and yard assignment problem in bulk ports.

8.5 Integrated Production Planning and Scheduling

The mathematical model for the Integrated Production Planning and Scheduling Problem was originally proposed in [10]. All production is planned over a given time horizon divided into T periods. Routes are classified into three types: routes x, which transport products from the supply node to the storage yard; routes y, from the supply node to the demand node; and routes z, from the storage yard to the demand node. The number of routes is limited, and routes may share equipment. Thus, if two different products are assigned to routes that share equipment, these routes must be active in non-overlapping intervals. Figure 8.3 shows a case where two routes (routes 1 and 2) share the same equipment.

Fig. 8.3 Routes with shared equipment

Fig. 8.4 PFPSP Model


Figure 8.4 describes, at a high level, the objective function and the main constraints of the model. The formulation represents the production planning; that is, it defines which product is transported, on which route, in what quantity, and in which period. It also represents the scheduling problem, which sets the start and end times of use of each route, guaranteeing that routes sharing equipment are not used simultaneously.
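The scheduling requirement just described, that routes sharing equipment must not be active at the same time, can be checked with a few lines of code. This is a minimal sketch under assumed data structures (a route-to-equipment map and one time interval per route); the function name and equipment labels are hypothetical, and the model's actual constraints in Fig. 8.4 are not reproduced.

```python
from itertools import combinations


def overlapping_shared_routes(equipment, schedule):
    """Return pairs of routes that share equipment and are active at overlapping times.

    equipment: route id -> set of equipment ids the route uses
    schedule:  route id -> (start, end) of the route's activity within one period
    Intervals are half-open [start, end), so a route may start exactly when
    another route that shares its equipment finishes.
    """
    clashes = []
    for r1, r2 in combinations(sorted(schedule), 2):
        if equipment[r1] & equipment[r2]:  # the routes share at least one machine
            s1, e1 = schedule[r1]
            s2, e2 = schedule[r2]
            if s1 < e2 and s2 < e1:  # the two intervals intersect
                clashes.append((r1, r2))
    return clashes


# Hypothetical data: routes 1 and 2 both use conveyor C1; route 3 uses other equipment.
equipment = {1: {"stacker S1", "conveyor C1"}, 2: {"reclaimer R2", "conveyor C1"}, 3: {"conveyor C3"}}
print(overlapping_shared_routes(equipment, {1: (0, 4), 2: (4, 10), 3: (0, 6)}))  # []
print(overlapping_shared_routes(equipment, {1: (0, 4), 2: (3, 9), 3: (0, 6)}))  # [(1, 2)]
```

In the first schedule, route 2 starts exactly when route 1 ends, so the shared conveyor is never used twice at once; in the second, their intervals overlap between hours 3 and 4.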

8.6 Column Generation Method

The PFPSP formulation can be decomposed into a restricted linear master problem (RMLP), which represents the production planning, and a pricing subproblem (scheduling), which provides columns for the RMLP. In the PFPSP formulation, the time required to transport the products is limited by the duration of one period. The scheduling constraints therefore determine the start time of each task within a period and enforce the disjunction between conflicting tasks, and they make the PFPSP very difficult to solve. Because the time needed to transport products between supply, stock, and demand nodes is assumed to be shorter than one period, one way to overcome this obstacle is to divide each period t into microperiods. To show the entire procedure for solving the PFPSP, we present a small illustration of the method; the complete column generation methodology can be found in [10]. Suppose that the relaxed PFPSP (obtained by disregarding the scheduling constraints) has been solved and that the variables (tasks) for period 3 listed in Table 8.3 were extracted from its solution. In the first row of Table 8.3, the variable $x^{1}_{23}$ (column Variables) indicates that product 2 should be carried on Route 1 (column Routes) in period 3, and that the total time to transport this product on Route 1 is 4 h (column Values). This variable corresponds to task, or vertex, A in the conflict graph. The remaining rows of Table 8.3 are read in the same way. Figure 8.5 illustrates the conflicts between routes: routes 1 and 2 share equipment and therefore cannot operate simultaneously. The same holds for routes 6 and 9, and for routes 6, 8, and 10, which share equipment among themselves.

Table 8.3 Solution of the relaxed PFPSP (disregarding scheduling constraints)
Vertices | Variables | Values (h) | Routes
A | $x^{1}_{23}$ | 4 | R1
B | $x^{2}_{53}$ | 6 | R2
C | $y^{8}_{553}$ | 3 | R8
D | $y^{9}_{333}$ | 5 | R9
E | $z^{10}_{663}$ | 6 | R10
F | $z^{6}_{443}$ | 3 | R6

G. C. Menezes et al.

Fig. 8.5 Routes with conflicts

Fig. 8.6 Weighted conflict graph

Fig. 8.7 Heuristic solution

To build the weighted conflict graph G (Fig. 8.6), its vertices are taken from column Vertices of Table 8.3, the weight of each vertex is initially set to the task duration (column Values), and edges are created from the route conflicts of Fig. 8.5. Once the graph is built, the next step is to color it with a greedy algorithm. Figure 8.7 presents a possible solution. In Fig. 8.7, the number of microperiods obtained (equal to the number of colors used) is three. Microperiod 1 contains vertices A, C, and D, which correspond to variables $x^{1}_{23}$, $y^{8}_{553}$, and $y^{9}_{333}$, respectively. After the initial column generation, constraints (32), (33), and (34) are as follows:

8 Operations Research at Bulk Terminal: A Parallel Column Generation Approach


Fig. 8.8 Weighted graph

Fig. 8.9 DSATUR solution

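$x^{1}_{23} \le \mu_{13}$ (dual variable $\pi^{1}_{223}$)
$x^{2}_{53} \le \mu_{23}$ (dual variable $\pi^{2}_{553}$)
$y^{8}_{553} \le \mu_{13}$ (dual variable $\pi^{8}_{553}$)
$y^{9}_{333} \le \mu_{13}$ (dual variable $\pi^{9}_{333}$)
$z^{10}_{663} \le \mu_{33}$ (dual variable $\pi^{10}_{663}$)
$z^{6}_{443} \le \mu_{23}$ (dual variable $\pi^{6}_{443}$)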
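The greedy coloring step that turns the conflict graph into microperiods can be sketched in Python. This is a minimal illustration, assuming the six tasks of Table 8.3 (A uses route 1, ..., F uses route 6) and the route conflicts described for Fig. 8.5 (routes 1 and 2, routes 6 and 9, and the mutually conflicting routes 6, 8, and 10). The helper names are hypothetical, and visiting the vertices in alphabetical order is an assumption, since the chapter does not state the greedy order. Each color class is one microperiod.

```python
# Tasks from Table 8.3: vertex -> route used by the task.
TASK_ROUTE = {"A": 1, "B": 2, "C": 8, "D": 9, "E": 10, "F": 6}

# Route pairs that share equipment (sorted tuples), as described for Fig. 8.5.
ROUTE_CONFLICTS = {(1, 2), (6, 9), (6, 8), (6, 10), (8, 10)}


def build_conflict_graph(task_route, route_conflicts):
    """Adjacency dict: two tasks are adjacent if their routes share equipment."""
    adj = {v: set() for v in task_route}
    tasks = sorted(task_route)
    for i, u in enumerate(tasks):
        for v in tasks[i + 1:]:
            pair = tuple(sorted((task_route[u], task_route[v])))
            if pair in route_conflicts:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def greedy_coloring(adj, order):
    """Give each vertex, in the given order, the smallest color unused by its neighbors."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


adj = build_conflict_graph(TASK_ROUTE, ROUTE_CONFLICTS)
color = greedy_coloring(adj, sorted(adj))

# Group tasks by color: each group can run in the same microperiod.
microperiods = {}
for v, c in color.items():
    microperiods.setdefault(c, []).append(v)
print(microperiods)
```

With this order the first microperiod is {A, C, D}, as in the text; the other two classes ({B, E} and {F}) can differ from Fig. 8.7, because any proper coloring yields valid microperiods. Three colors are unavoidable here, since tasks C, E, and F pairwise conflict.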

The values of the dual variables $\pi^{1}_{223}$ to $\pi^{6}_{443}$ are obtained after solving the RMLP. The column generation process then begins: the tasks and the values of their dual variables are used to build the weighted conflict graph (Fig. 8.8). Coloring a graph G with k colors is equivalent to partitioning its vertices into k independent sets. Therefore, one way to return more columns to the RMLP in each period is to produce more than one independent set. For this purpose, a heuristic based on the Weighted Vertex Coloring Problem (WVCP) is adopted. Among the greedy strategies implemented, DSATUR gave the best results. DSATUR was originally proposed for the Vertex Coloring Problem (VCP) by [4], building on the classic greedy strategy for that problem. The DSATUR heuristic receives the weighted graph (Fig. 8.8) as input and returns a color for each vertex (Fig. 8.9). For each color in Fig. 8.9, if the corresponding microperiod has a negative reduced cost, the tasks with that color form a new microperiod available to the RMLP. The new columns for the RMLP are as follows:


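$x^{1}_{23} \le \mu_{13} + \mu_{63}$ (dual variable $\pi^{1}_{223}$)
$x^{2}_{53} \le \mu_{23} + \mu_{43}$ (dual variable $\pi^{2}_{553}$)
$y^{8}_{553} \le \mu_{13} + \mu_{43}$ (dual variable $\pi^{8}_{553}$)
$y^{9}_{333} \le \mu_{13} + \mu_{63}$ (dual variable $\pi^{9}_{333}$)
$z^{10}_{663} \le \mu_{33} + \mu_{63}$ (dual variable $\pi^{10}_{663}$)
$z^{6}_{443} \le \mu_{23} + \mu_{53}$ (dual variable $\pi^{6}_{443}$)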
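DSATUR itself can be sketched in a few lines. This sketch implements the classic, unweighted rule of Brélaz [4]: repeatedly color the uncolored vertex with the largest saturation degree (the number of distinct colors among its neighbors) using the smallest color not used by a neighbor. Breaking ties by degree in the full graph and then by vertex name is an assumption of the sketch, as is the function name `dsatur`. The chapter applies a weighted variant driven by dual values, which is not modeled here. The graph is the six-task conflict graph of Table 8.3 (edges A-B, C-E, C-F, D-F, E-F).

```python
def dsatur(adj):
    """Classic DSATUR coloring of an undirected graph given as an adjacency dict."""
    color = {}
    while len(color) < len(adj):
        def priority(v):
            saturation = len({color[u] for u in adj[v] if u in color})
            # Highest saturation first, then highest degree, then vertex name.
            return (-saturation, -len(adj[v]), v)

        v = min((u for u in adj if u not in color), key=priority)
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:  # smallest color not used by a neighbor
            c += 1
        color[v] = c
    return color


# Conflict graph of the six tasks in Table 8.3.
EDGES = [("A", "B"), ("C", "E"), ("C", "F"), ("D", "F"), ("E", "F")]
adj = {v: set() for v in "ABCDEF"}
for u, v in EDGES:
    adj[u].add(v)
    adj[v].add(u)

color = dsatur(adj)
print(color)
```

On this graph DSATUR starts with F (the highest-degree vertex) and ends with three color classes, {A, F}, {B, C, D}, and {E}; each class is a candidate microperiod whose reduced cost would then be checked.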

The iterative process continues until the DSATUR heuristic no longer yields attractive columns (columns with negative reduced cost) for the RMLP. At that point, the pricing subproblem is solved to optimality with an optimization package. If this solution contains a column with negative reduced cost, the iterative process restarts; otherwise, the column generation procedure terminates.
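The exact pricing step, which the chapter describes as based on the maximum weight independent set and solves with CPLEX, can be illustrated by brute force on a graph this small. The sketch below enumerates every independent set of the six-task conflict graph and keeps the heaviest one. The weights are the task durations of Table 8.3, used only for illustration (in the method they come from the dual values), and the function name is hypothetical. Enumeration grows exponentially with the number of vertices, so this is a check for tiny graphs, not a replacement for a solver.

```python
from itertools import combinations


def max_weight_independent_set(weights, edges):
    """Exhaustively find a maximum-weight set of pairwise non-adjacent vertices."""
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted(weights)
    best_set, best_weight = set(), 0
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            # Skip subsets that contain a conflicting pair of tasks.
            if any(frozenset(pair) in edge_set for pair in combinations(subset, 2)):
                continue
            weight = sum(weights[v] for v in subset)
            if weight > best_weight:
                best_set, best_weight = set(subset), weight
    return best_set, best_weight


# Illustrative weights: task durations (h) from Table 8.3.
WEIGHTS = {"A": 4, "B": 6, "C": 3, "D": 5, "E": 6, "F": 3}
EDGES = [("A", "B"), ("C", "E"), ("C", "F"), ("D", "F"), ("E", "F")]
print(max_weight_independent_set(WEIGHTS, EDGES))
```

With these weights the heaviest independent set is {B, D, E}, with total weight 17.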

8.6.1 Parallel Approach

Each time the pricing subproblem is solved, T subproblems are solved, one for each period. Because these subproblems are independent, they can be solved in parallel, both with the exact approach (based on the maximum weight independent set) and with the heuristic (based on the weighted vertex coloring problem). The following pseudocode (ParallelCG) highlights this procedure; its sequential version was initially published in [10]. Lines 8 through 19 of ParallelCG were parallelized using the OpenMP parallel programming API (OpenMP (2008)), which provides a simple and intuitive interface for multithreaded environments. Its main characteristics and functionality are described in Chapman et al. (2007).
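Because the per-period subproblems are independent, the parallel loop of ParallelCG maps naturally onto a worker pool. The Python sketch below uses `concurrent.futures.ThreadPoolExecutor` as a stand-in for the OpenMP parallel region; `price_period` and the candidate columns with their reduced costs are hypothetical placeholders for the DSATUR or exact pricing of one period. The per-period results are merged only after all workers finish, which sidesteps the shared update of the column set inside the parallel region. In CPython, threads only speed up pricing code that releases the GIL (for example, native solver calls); CPU-bound pure-Python pricing would need a `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical candidate columns per period: (column id, reduced cost).
CANDIDATES = {
    1: [("p1-c1", -2.0), ("p1-c2", 0.5)],
    2: [("p2-c1", 1.0)],
    3: [("p3-c1", -0.3), ("p3-c2", -1.1)],
}


def price_period(period):
    """Placeholder pricing for one period: keep columns with negative reduced cost."""
    return [cid for cid, reduced_cost in CANDIDATES.get(period, []) if reduced_cost < 0]


def parallel_pricing(periods, max_workers=4):
    """Price all periods concurrently; return the new columns and a convergence flag."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_period = list(pool.map(price_period, periods))  # results keep period order
    new_columns = [cid for cols in per_period for cid in cols]
    converged = not new_columns  # no attractive column in any period: stop
    return new_columns, converged


print(parallel_pricing([1, 2, 3]))
```

The convergence flag plays the role of the End variable in ParallelCG: column generation stops only when no period returns a column.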

8.6.2 Primal Heuristic

Converting the relaxed RMLP variables to integer variables after the column generation procedure has converged, and solving the resulting model with a MIP solver, does not by itself provide a reasonable upper bound. To obtain a higher-quality upper bound, a heuristic based on variable fixing and depth-first search was developed for the PFPSP: the fixing and search heuristic (FSH). The following algorithms illustrate how this heuristic operates.

The FSH heuristic has two phases. Phase 1 (lines 4 to 10 of FSHHeuristic) starts from the optimal solution of the RMLP, obtained when no more columns with negative reduced cost exist (line 2 of FSHHeuristic). From this initial node, for each subarea and period, the most fractional yard allocation variable $f_{pts}$ whose stored quantity (variable $s_{pts}$) exceeds 80% of the subarea capacity is set to 1. The parallel column generation process is then restarted, solving the RMLP and the pricing subproblem at each iteration until no more columns with negative reduced cost can be inserted. If the resulting solution is integer or infeasible, the heuristic stops. Otherwise, another node is created, this time fixing the allocation variables whose inventory exceeds 60% of capacity, and the whole process is repeated. If the solution is still not integer, the process is repeated for 40% of capacity. The Value parameter passed to FSHFase1 (line 6), initialized to 0.2 and incremented by 0.2 in line 8, selects the 80%, 60%, and 40% thresholds in turn. Finally, if an integer solution has still not been found, Phase 2 starts (lines 15 to 18): the most fractional allocation variable is selected and set to 1, and the column generation process is repeated (line 16). Phase 2 is repeated until an integer or an infeasible solution is found.

1  Procedure ParallelCG
2    Columns ← GetInitialColumns()   // generates initial columns for the RMLP
3    dual ← ∅
4    Solution ← ∅
5    repeat
6      dual ← SolveRMLP()   // exact solution by the solver
7      End ← true
8      Start Parallel (OpenMP)
9      for i = 1 to T do   // T = number of periods
10       Colors ← DSATUR(i, dual)   // returns only colors with negative reduced cost (WVCP)
11       if Colors = ∅ then   // the heuristic failed to provide new columns
12         Colors ← ExactSolutionSubproblem(i, dual)   // exact pricing, CPLEX in parallel mode
13       end if
14       if Colors ≠ ∅ then   // shared update: must be synchronized across threads
15         Columns ← Columns ∪ Colors
16         End ← false
17       end if
18     end for
19     End Parallel
20   until End = true
21   Solution ← SolveRMLP()   // CPLEX in parallel mode
22   LowerBound ← Solution
23   if Fractional(Solution) = true then   // checks whether the solution is fractional
24     Solution ← GetIntegerSolution(Solution)
25   end if
26   UpperBound ← Solution

1  Procedure FSHHeuristic
2    Solution ← ParallelCG()
3    Value ← 0.2
4    for i = 1 to 3 do
5      if LinearSolution(Solution) then
6        FSHFase1(Value)
7        Solution ← ParallelCG()
8        Value ← Value + 0.2
9      end if
10   end for
11   if IntegerSolution(Solution) or Solution = infeasible then
12     return Solution
13   end if
14   repeat
15     FSHFase2()
16     Solution ← ParallelCG()
17   until Solution = integer or Solution = infeasible
18   return Solution
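The Phase 1 fixing rule of FSH can be sketched as follows. This is a minimal illustration under stated assumptions: the RMLP values of the allocation variables $f_{pts}$ and the stored quantities $s_{pts}$ are given as dictionaries keyed by (product, period, subarea); "most fractional" is taken to mean closest to 0.5, with ties broken by product name; and the function name, tolerance, and data are hypothetical. For each (subarea, period) pair, the sketch returns the variable to fix at 1, considering only variables whose stored quantity exceeds the current fraction of the subarea capacity.

```python
EPS = 1e-6  # tolerance for treating a value as already integral


def fsh_phase1_fixings(f, stock, capacity, threshold):
    """Per (subarea, period), pick the most fractional f[p, t, s] with stock > threshold * capacity."""
    best = {}  # (s, t) -> (distance from 0.5, product)
    for (p, t, s), value in f.items():
        if not EPS < value < 1 - EPS:
            continue  # already integral, nothing to fix
        if stock[(p, t, s)] <= threshold * capacity[s]:
            continue  # inventory below the current threshold
        candidate = (abs(value - 0.5), p)
        if (s, t) not in best or candidate < best[(s, t)]:
            best[(s, t)] = candidate
    return {(p, t, s): 1 for (s, t), (_, p) in best.items()}


# Hypothetical RMLP values for two subareas.
f = {("P1", 1, "S1"): 0.45, ("P2", 1, "S1"): 0.9, ("P1", 2, "S1"): 0.5, ("P3", 1, "S2"): 0.6}
stock = {("P1", 1, "S1"): 85, ("P2", 1, "S1"): 95, ("P1", 2, "S1"): 50, ("P3", 1, "S2"): 70}
capacity = {"S1": 100, "S2": 80}

for threshold in (0.8, 0.6, 0.4):  # the 80%, 60% and 40% passes of Phase 1
    print(threshold, fsh_phase1_fixings(f, stock, capacity, threshold))
```

In the full heuristic, each pass would fix the returned variables and rerun ParallelCG before the next, lower threshold is tried.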

8.7 Computational Experiments

The experiments are based on a real product flow problem at an iron ore port terminal in Brazil, recognized as one of the largest in the world. The basic parameters are the number of periods, products, and routes. In general, the terminal plans with seven periods of one day or fourteen periods of twelve hours. The experiments were run on a computer cluster of Xeon machines, each with 8 physical threads, Hyper-Threading, and 32 GB of RAM, connected by a Gigabit network and running CPLEX version 12.5. A time limit of 5 h was set for all instances. Table 8.4 lists the main parameters used to create the instances.

The first and second columns of Table 8.5 contain the type and name of each instance. By type, the instances are divided into three levels of difficulty. In the first set (type 1), any of the P available products can be allocated to any subarea. In the second set (type 2), the products are split into two subsets: the first subset can be stored in any subarea of stockyards 1 and 2, and the second in the subareas of stockyards 3 and 4. Finally, in the third set (type 3), the products are split into four subsets, each allocated to one of the four stockyards. The instance name encodes its size; for example, 8P5Prod corresponds to a planning horizon of 8 periods with 5 different products. Columns $Z_{LB}$ and $Z_{UB}$ give the best lower and upper bounds, respectively. The Gap column gives the relative optimality gap, computed as $\text{Gap} = 100\,(Z_{UB} - Z_{LB})/Z_{UB}$, and the t(s) columns give elapsed computational times in seconds. A dash (–) marks instances for which the solver could not obtain a solution for the PFPSP due to insufficient memory.

Table 8.4 Data used to generate the instances
Parameter | Description
Stockyard | The product storage area is divided into four stockyards
Delivery | Two berths; two ships can be loaded simultaneously at berth 1
Equipment | Five car dumpers, four ore reclaimers, three stackers/reclaimers (equipment that performs both tasks), and eight stackers
$\alpha_{pt}$ | 2 (two monetary units)
$\beta_{npts}$ | 10 (ten monetary units) for berth number one; 50 (fifty monetary units) for berth number two
$\gamma_{p,p',t}$ | 10 (ten monetary units)
$\lambda_{pp'}$ | $0.01 \cdot \lvert p - p' \rvert$ (monetary units), where $\lvert p - p' \rvert$ is the quality deviation between products p and p'
$\sigma_r$ | $0.01 \cdot$ length of route r (monetary units)

Table 8.5 Computational experiments
Type | Instance | Sequential heuristic: $Z_{LB}$ | $Z_{UB}$ | t(s) | Gap(%) | Parallel heuristic: t(s) | Number of nodes | CPLEX: $Z_{LB}$ | $Z_{UB}$ | t(s) | Gap(%)
1 | 4P5Prod | 67.55 | 68.40 | 57 | 1.24 | 20 | 9 | 68.27 | 68.27 | 219 | 0.00
1 | 4P10Prod | 77.32 | 91.21 | 518 | 15.22 | 175 | 15 | 82.23 | 82.23 | 3354 | 0.00
1 | 8P5Prod | 125.69 | 150.03 | 241 | 16.22 | 45 | 21 | 128.61 | 128.61 | 1241 | 0.00
1 | 8P10Prod | 141.18 | 259.18 | 1410 | 45.53 | 336 | 24 | – | – | – | –
1 | 15P5Prod | 234.13 | 292.38 | 375 | 19.92 | 52 | 13 | 243.62 | 243.64 | 16664 | 0.00
1 | 15P10Prod | 298.25 | 547.85 | 4770 | 45.56 | 1069 | 39 | 301.42 | 922432.00 | 18007 | 0.99
1 | 15P15Prod | 171266.00 | 171392.00 | 6820 | 0.07 | 1824 | 38 | – | – | – | –
1 | 15P20Prod | 433229.00 | 433251.00 | 2291 | 0.01 | 838 | 17 | – | – | – | –
1 | 20P5Prod | 331.74 | 391.16 | 1330 | 15.19 | 155 | 43 | 341.81 | 465.94 | 1330 | 0.26
1 | 20P10Prod | 480.14 | 644.34 | 5617 | 25.48 | 1275 | 40 | – | – | – | –
1 | 20P15Prod | 211939.00 | 212026.00 | 7339 | 0.04 | 2104 | 32 | – | – | – | –
1 | 20P20Prod | 664364.00 | 664416.00 | 4026 | 0.01 | 1363 | 19 | – | – | – | –
2 | 4P5Prod | 77.94 | 78.08 | 13 | 0.18 | 5 | 9 | 78.01 | 78.01 | 69 | 0.00
2 | 4P10Prod | 100.59 | 111.44 | 96 | 9.74 | 34 | 16 | 103.29 | 103.29 | 255 | 0.00
2 | 8P5Prod | 146.98 | 147.11 | 37 | 0.09 | 8 | 14 | 147.11 | 147.11 | 86 | 0.00
2 | 8P10Prod | 170.41 | 244.52 | 236 | 30.31 | 47 | 19 | 175.84 | 175.84 | 433 | 0.00
2 | 15P5Prod | 274.54 | 275.13 | 68 | 0.21 | 12 | 14 | 275.13 | 275.13 | 622 | 0.00
2 | 15P10Prod | 350.31 | 394.14 | 358 | 11.12 | 49 | 17 | 362.38 | 362.39 | 3950 | 0.00
2 | 15P15Prod | 509.26 | 629.03 | 2024 | 19.04 | 415 | 42 | – | – | – | –
2 | 15P20Prod | 587.68 | 731.79 | 6026 | 19.69 | 2410 | 42 | – | – | – | –
2 | 20P5Prod | 382.62 | 383.83 | 96 | 0.32 | 16 | 14 | 383.83 | 383.83 | 858 | 0.00
2 | 20P10Prod | 553.18 | 633.40 | 700 | 12.66 | 82 | 23 | 570.50 | 570.50 | 6203 | 0.00
2 | 20P15Prod | 703.22 | 932.54 | 2884 | 24.59 | 584 | 41 | 738.50 | 42106.40 | 18006 | 0.98
2 | 20P20Prod | 761.28 | 1022.87 | 11922 | 25.57 | 5241 | 68 | – | – | – | –
3 | 4P5Prod | 73.45 | 73.72 | 20 | 0.36 | 7 | 10 | 73.72 | 73.72 | 32 | 0.00
3 | 8P5Prod | 137.03 | 149.06 | 87 | 8.08 | 20 | 15 | 137.75 | 137.75 | 230 | 0.00
3 | 8P10Prod | 155.76 | 220.13 | 414 | 29.24 | 99 | 19 | 160675.00 | 160.68 | 1856 | 0.00
3 | 15P5Prod | 254.53 | 280.16 | 212 | 9.15 | 32 | 19 | 256819.00 | 256.82 | 1157 | 0.00
3 | 15P15Prod | 461.89 | 605.08 | 4715 | 23.66 | 1154 | 57 | – | – | – | –
3 | 15P20Prod | 555.41 | 914.35 | 11847 | 39.26 | 3228 | 76 | – | – | – | –
3 | 20P5Prod | 357.78 | 384.69 | 299 | 7.00 | 43 | 26 | 361281.00 | 361.28 | 2101 | 0.00
3 | 20P10Prod | 507.79 | 617.94 | 2145 | 17.83 | 452 | 39 | – | – | – | –
3 | 20P15Prod | 641.92 | 860.85 | 7376 | 25.43 | 1820 | 63 | – | – | – | –
3 | 20P20Prod | 714.18 | 1159.96 | 17058 | 38.43 | 4867 | 76 | – | – | – | –
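As a quick arithmetic check of the Gap definition, the Python sketch below recomputes it from the sequential-heuristic bounds of three type 1 rows of Table 8.5 (4P5Prod, 8P5Prod, and 8P10Prod); the function name `gap_percent` is illustrative. Rounded to two decimals, it reproduces the reported values of 1.24%, 16.22%, and 45.53%.

```python
def gap_percent(z_lb: float, z_ub: float) -> float:
    """Relative gap used in Table 8.5: 100 * (Z_UB - Z_LB) / Z_UB."""
    if z_ub == 0:
        raise ValueError("Z_UB must be non-zero")
    return 100.0 * (z_ub - z_lb) / z_ub


# Sequential-heuristic bounds (Z_LB, Z_UB) of three type 1 instances in Table 8.5.
BOUNDS = {"4P5Prod": (67.55, 68.40), "8P5Prod": (125.69, 150.03), "8P10Prod": (141.18, 259.18)}

for name, (z_lb, z_ub) in BOUNDS.items():
    print(f"{name}: {gap_percent(z_lb, z_ub):.2f}%")
```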


Fig. 8.10 Relative gain between parallel and sequential solution

Table 8.5 presents three solution approaches for the PFPSP: the sequential heuristic, the parallel heuristic, and an exact solution obtained with the CPLEX solver. The results in Table 8.5 indicate that solving the PFPSP directly with optimization packages is not viable: the solver produced solutions for only 20 instances, and for the rest, because of insufficient memory, it could not obtain even an upper bound. With the heuristic, solutions were obtained for all instances, and all supplies and demands were met. As expected, the parallel heuristic solved the instances in less time; in terms of computational time, it outperformed both the sequential heuristic and the CPLEX solver in all cases. Figure 8.10 shows the relative gain in time between the parallel and sequential solutions. The x-axis represents the instances, and the y-axis the relative gain in percent; a gain of 60% means that the parallel algorithm was 60% faster than the sequential approach.
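The chapter does not give a formula for the relative gain plotted in Fig. 8.10. A natural reading of "60% faster" is the share of sequential time saved, $100\,(t_{seq} - t_{par})/t_{seq}$; the sketch below uses that definition as an explicit assumption and applies it to two type 1 rows of Table 8.5. The function name is hypothetical, and the values it prints are not taken from the figure.

```python
def relative_gain(t_sequential, t_parallel):
    """ASSUMED definition: percentage of the sequential run time saved by the parallel run."""
    return 100.0 * (t_sequential - t_parallel) / t_sequential


# (sequential t(s), parallel t(s)) for two type 1 instances of Table 8.5.
TIMES = {"4P5Prod": (57, 20), "4P10Prod": (518, 175)}

for name, (t_seq, t_par) in TIMES.items():
    print(f"{name}: {relative_gain(t_seq, t_par):.1f}%")
```

Under this assumption, both instances fall in the 60% to 70% range (64.9% and 66.2%).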

8.8 Final Remarks

In this work, we considered an integrated planning and scheduling problem. The problem is general and can represent various scenarios related to the flow of bulk cargo (iron ore, coal, and grains). We proposed a parallel approach to solve the PFPSP. Computational experiments were conducted with sequential and parallel versions of the method, and the results were compared with an exact approach based on the CPLEX solver. As future work, a parallel branch-and-price algorithm is being developed. Another research topic is the development of a stochastic optimization model capable of handling the uncertainty surrounding supply, demand, and equipment failure.


Acknowledgements This research is supported by the following institutions: Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

References
1. V.H. Barros, T.S. Costa, A.C.M. Oliveira, L.A.N. Lorena, Model and heuristic for berth allocation in tidal bulk ports with stock level constraints. Comput. Ind. Eng. 60, 606–613 (2011)
2. B. Bilgen, I. Ozkarahan, A mixed-integer linear programming model for bulk grain blending and shipping. Int. J. Prod. Econ. 107(2), 555–571 (2007)
3. N. Boland, D. Gulczynski, M. Savelsbergh, A stockyard planning problem. EURO J. Transp. Logist. 1(3), 197–236 (2012)
4. D. Brélaz, New methods to color the vertices of a graph. Commun. ACM 22(4), 251–256 (1979)
5. B.A. Calfa, A. Agarwal, I.E. Grossmann, J.M. Wassick, Hybrid bilevel-Lagrangean decomposition scheme for the integration of planning and scheduling of a network of batch plants. Ind. Eng. Chem. Res. 52(5), 2152–2167 (2013)
6. F. Gaglioppa, L.A. Miller, S. Benjaafar, Multitask and multistage production planning and scheduling for process industries. Oper. Res. 56(4), 1010–1025 (2008)
7. B. Kim, B. Koo, B.S. Park, A raw material storage yard allocation problem for a large-scale steelworks. Int. J. Adv. Manufact. Technol. 41(9–10), 880–884 (2009)
8. T. Kis, A. Kovács, A cutting plane approach for integrated planning and scheduling. Comput. Oper. Res. 39(2), 320–327 (2012)
9. G.R. Mateus, M.G. Ravetti, M.C. Souza, T.M. Valeriano, Capacitated lot sizing and sequence dependent setup scheduling: an iterative approach for integration. J. Sched. 13(3), 245–259 (2010)
10. G.C. Menezes, G.R. Mateus, M.G. Ravetti, A branch and price algorithm to solve the integrated production planning and scheduling in bulk ports. Eur. J. Oper. Res. 258(3), 926–937 (2017)
11. H. Meyr, M. Mann, A decomposition approach for the general lotsizing and scheduling problem for parallel production lines. Eur. J. Oper. Res. 229(3), 718–731 (2013)
12. T. Robenek, N. Umang, M. Bierlaire, A branch-and-price algorithm to solve the integrated berth allocation and yard assignment problem in bulk ports. Eur. J. Oper. Res. 235(2), 399–411 (2014)
13. G. Singh, D. Sier, A.T. Ernst, O. Gavriliouk, R. Oyston, T. Giles, P. Welgama, A mixed integer programming model for long term capacity expansion planning: a case study from the Hunter Valley coal chain. Eur. J. Oper. Res. 220(1), 210–224 (2012)
14. C. Wolosewicz, S. Dauzère-Pérès, R. Aggoune, A Lagrangian heuristic for an integrated lotsizing and fixed scheduling problem. Eur. J. Oper. Res. 244(1), 3–12 (2015)
15. D. Wu, M. Ierapetritou, Hierarchical approach for production planning and scheduling under uncertainty. Chem. Eng. Process.: Process Intens. 46(11), 1129–1140 (2007)
16. F. You, I.E. Grossmann, J.M. Wassick, Multisite capacity, production, and distribution planning with reactor modifications: MILP model, bilevel decomposition algorithm versus Lagrangean decomposition scheme. Ind. Eng. Chem. Res. 50(9), 4831–4849 (2011)

Chapter 9

Heuristic Solutions for the (α, β)-k Feature Set Problem

Leila M. Naeni and Amir Salehipour

Abstract Feature selection aims to choose a subset of features, out of a set of candidate features, such that the selected set best represents the whole in a particular aspect. The (α, β)-k feature set problem (FSP) is a combinatorial optimization-based approach for selecting features. On a dataset with two groups of data, the (α, β)-k FSP aims to select a set of features such that the set maximizes the similarities between entities of the same group and the differences between entities of different groups. This study develops a matheuristic algorithm for the (α, β)-k FSP. We test the algorithm on 11 real-world instances ranging from medium to large. The computational results demonstrate that the proposed matheuristic competes well with the standard solver CPLEX.

9.1 Introduction

Feature selection, also known as variable selection, attribute selection, or variable subset selection, aims to choose a subset of features, out of a set of candidate features, such that the selected set best represents the whole in a particular aspect. Feature selection is an important technique for refining data, because datasets often contain many redundant, irrelevant, or unimportant features that may not be of interest when building a model or analyzing the data. In practice, different criteria and measures may be of interest when selecting a set of features, such as "selection cost", "better classifier", and "new or independent features in the set", among others. For example, in bioinformatics, studying the whole set of probes or genes (features) in a dataset is highly resource-demanding. In addition, obtaining a set of probes to act as a biomarker is often one of the primary goals of the analysis, because such a set can be used to distinguish a certain group of samples or to diagnose a disease. Feature selection has a broad range of applications, including data mining and analysis and machine learning and prediction, in a variety of domains such as urban transport network planning [7], stock price prediction [12, 21], and computational biology and bioinformatics [1, 6, 8, 17, 18]. One interesting example in data analysis, related to the present study, is as follows. Assume that we are given a set of healthy and disease samples and a set of features, e.g., genes or probes, with their expression-level values for every sample. We aim to select the minimum number of features that still allows healthy and disease samples to be classified with good accuracy. In other words, we seek a set of features to act as a biomarker. Two such applications are the studies of [14, 16], which presented novel biomarkers for predicting Alzheimer's disease and prostate cancer. From the classification point of view, these studies argue that the reported biomarkers are superior to earlier ones because they yield higher classification accuracy when predicting healthy and disease groups. As acknowledged by [11], removing irrelevant or redundant features and reducing the dimensionality of the dataset are two reasons to perform feature selection. Both are important because, given the size of the datasets encountered in many applications, they ease the analysis, use, and interpretation of high-dimensional data. The (α, β)-k feature set problem (FSP) is a combinatorial optimization-based approach for feature selection [5].

L. M. Naeni
School of Built Environment, University of Technology, Sydney, Australia

A. Salehipour
School of Mathematical and Physical Sciences, University of Technology, Sydney, Australia

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_9

L. M. Naeni and A. Salehipour
Given a dataset with two groups of data, the (α, β)-k FSP selects the minimum number of features needed to distinguish the two groups (classes) such that the selected features maximize the similarities between entities of the same group and the differences between entities of different groups. The problem generalizes the k-feature set problem, which is proven NP-hard [5]; hence, the (α, β)-k FSP is also NP-hard. Only a few studies exist on the (α, β)-k FSP. Initial works appeared in [2, 5], which developed integer programs for the (α, β)-k FSP and used the standard solver CPLEX to solve them. The studies of [14, 15, 18] also relied on standard solvers as the only solution technique; for example, [14, 15] used CPLEX to solve the (α, β)-k FSP on datasets associated with Alzheimer's disease and prostate cancer. Because the problem is NP-hard, exact solvers (for example, CPLEX) may be unable to solve large instances in a reasonable amount of time. This is a major limitation, particularly because most applications of the (α, β)-k FSP involve large datasets. To the best of our knowledge, the algorithm of [11], which combines variable neighborhood search (VNS) and tabu search (TS) and uses general-purpose and randomized local searches, is the only non-exact solution method available for the (α, β)-k FSP. That VNS+TS algorithm can solve large instances; however, as investigated in [11], its performance may not be stable across all benchmark instances, owing to its randomized components.

9 Heuristic Solutions for the (α, β)-k Feature Set Problem


The present study, motivated by the computational difficulty of the (α, β)-k FSP, proposes new solution techniques for the problem. Its major contributions can be summarized as follows: (1) we propose an efficient solution method for the (α, β)-k FSP; (2) we produce quality solutions for medium and large instances of the problem; (3) we show that the proposed method can solve very large instances, significantly faster than the available exact solvers; and (4) we show that the developed method has a very stable performance. The remainder of this paper is organized as follows. Section 9.2 defines the (α, β)-k FSP. Section 9.3 shows how the problem of feature selection on a dataset with two groups of data is modeled as an instance of the (α, β)-k FSP. We propose a matheuristic algorithm for the (α, β)-k FSP in Sect. 9.4, where several mathematical properties of the problem are also discussed. We report the computational results in Sect. 9.5. Finally, the paper concludes with the outcomes of the study and a few future research directions.

9.2 Problem Statement

Assume that two groups (classes) of data exist, say group 1 and group 2, and that a set $J = \{1, \ldots, n\}$, $|J| = n$, of features is given, each feature with a profile $P_j$, $\forall j \in J$, i.e., $P = \{P_j, \forall j \in J\}$. A feature profile $P_j$ contains a set of discrete values, each either 0 or 1, for feature $j$. Furthermore, let $S_1$ and $S_2$ denote the sets of all entities in group 1 and group 2, where $S_1 = \{s_{11}, \ldots, s_{1n_1}\}$, $|S_1| = n_1$, and $S_2 = \{s_{21}, \ldots, s_{2n_2}\}$, $|S_2| = n_2$. Let $I_1$ and $I_2$ represent the sets of pairs of entities of different groups and of the same group, respectively. That is, $I_1 = \{(s_{11}, s_{21}), \ldots, (s_{11}, s_{2n_2}), \ldots, (s_{1n_1}, s_{2n_2})\}$ includes every combination of size two of entities belonging to different groups, and $I_2 = \{(s_{11}, s_{12}), \ldots, (s_{11}, s_{1n_1}), \ldots, (s_{21}, s_{22}), \ldots, (s_{21}, s_{2n_2})\}$ includes every combination of size two of entities belonging to the same group. The profile of feature $j$ can be modeled by a set of binary values; more precisely, $P_j = \{a_{ij} \in \{0, 1\}, \forall i \in I_1 \cup I_2\}$ for each $j \in J$. For an element $i \in I_1$, $a_{ij} = 1$ if and only if feature $j$ has different expression-level values for the pair (for example, one entity has the value 1 and the other the value 0); otherwise, $a_{ij} = 0$. For an element $i \in I_2$, $a_{ij} = 1$ if and only if feature $j$ has the same expression-level value for the pair (for example, both have the value 1); otherwise, $a_{ij} = 0$. The (α, β)-k FSP is defined by three positive integer parameters α, β, and k. The value of α is the minimum number of features that must "explain" the differences between any pair of entities of different groups. The value of β is the minimum number of features that must explain the similarities between any pair of entities of the same group. Finally, k is the number of features to be selected.

Accordingly, the (α, β)-k FSP aims to ensure that (1) a set $J^* \subseteq J$ of k features is selected among all alternative sets; (2) every element of $I_1$ is explained by at least α features, where $1 \le \alpha \le \alpha^* \le k$ and $\alpha^*$ is the maximum value of α; and (3) every element of $I_2$ is explained by at least β features, where $1 \le \beta \le \beta^* \le k$ and $\beta^*$ is the maximum value of β.
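The three conditions translate directly into a feasibility check for a candidate feature set. The Python sketch below is a minimal illustration, assuming the coefficients are stored as a nested dictionary `a[i][j]`; the function name and the three-feature toy instance (elements x and y in $I_1$, z in $I_2$) are hypothetical.

```python
def is_feasible(selected, a, I1, I2, alpha, beta, k):
    """Check conditions (1)-(3) of the (alpha, beta)-k FSP for a candidate feature set."""
    if len(selected) != k:
        return False  # condition (1): exactly k features
    # Condition (2): every pair from different groups explained by >= alpha features.
    if any(sum(a[i][j] for j in selected) < alpha for i in I1):
        return False
    # Condition (3): every pair from the same group explained by >= beta features.
    return all(sum(a[i][j] for j in selected) >= beta for i in I2)


# Hypothetical toy instance: a[i][j] for elements x, y (in I1) and z (in I2).
a = {
    "x": {"f1": 1, "f2": 1, "f3": 0},
    "y": {"f1": 0, "f2": 1, "f3": 1},
    "z": {"f1": 1, "f2": 0, "f3": 1},
}
I1, I2 = ["x", "y"], ["z"]
print(is_feasible({"f2", "f3"}, a, I1, I2, alpha=1, beta=1, k=2))  # True
print(is_feasible({"f2"}, a, I1, I2, alpha=1, beta=1, k=1))        # False: z is unexplained
```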


We follow a three-phase decomposition-based approach for solving the (α, β)-k FSP, which was also used in [11]. This approach decomposes the (α, β)-k FSP into three combinatorial optimization problems:
• Phase 1. Obtain $\alpha^*$, i.e., the maximum value of α for which an instance of the (α, β)-k FSP has a feasible solution. $\alpha^*$ is an instance-dependent parameter, and its value can be derived in polynomial time via $\alpha^* = \min_{i \in I_1} \alpha_i$, where $\alpha_i$ is the number of features that explain element $i$ (see Sect. 9.3).
• Phase 2. Obtain the minimum number of features k necessary to explain the dichotomy between the groups, such that at least α features do so for each pair of entities of different groups. This problem is known as the Min k (α, β)-k FSP [11] and can be modeled as a set multi-cover problem. Different values of α ≤ α* may lead to different values of k.
• Phase 3. Obtain $\beta^*$, i.e., the maximum value of β such that a set of k features is selected to explain the dichotomy between the groups and at least α features do so for each pair of entities of different groups. This problem is known as the Max β (α, β)-k FSP and can be modeled as an integer program in which α and k are given parameters. Phase 3 maximizes the internal consistency of the entities in the same group, which yields a more robust feature set.
Next, we show how the problem of selecting features from a dataset with two groups of data can be modeled as an instance of the (α, β)-k FSP.

9.3 Building an Instance of (α, β)-k FSP

In this section we explain how an instance of the (α, β)-k FSP is built from a dataset with two groups of data. Table 9.1 shows a dataset with five features, which may represent genes, probes, etc., and two groups of data. Group 1 consists of three healthy samples (entities), and group 2 consists of three disease samples (the groups do not need to contain the same number of entities). The last row of Table 9.1 gives the group label of each sample. The entries of Table 9.1 are discretized gene expression levels: a feature may be "up-regulated" (value 1) or "down-regulated" (value 0) in a sample. We start by forming the sets $I_1$ and $I_2$. All combinations of size two of entities belonging to different groups give $I_1 = \{(1,4), (1,5), (1,6), (2,4), (2,5), (2,6), (3,4), (3,5), (3,6)\}$, and all combinations of size two of entities belonging to the same group give $I_2 = \{(1,2), (1,3), (2,3), (4,5), (4,6), (5,6)\}$. The feature profiles are then extracted, as shown in Table 9.2. The column "$\alpha_i$" gives the number of features that explain the differences within each pair of $I_1$, so that $\alpha^* = \min_{i \in I_1} \alpha_i$ (here, $\alpha^* = 1$); values of $\alpha_i$ are derived only for the set $I_1$. Next, we explain our proposed method for solving the (α, β)-k FSP.
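The construction of $I_1$, $I_2$, and the profiles can be sketched in Python. This is a minimal sketch assuming the six samples and five features of Table 9.1 as read from the table (samples 1 to 3 in group 1, samples 4 to 6 in group 2); the function name `build_instance` is hypothetical. It applies the rules of Sect. 9.2 to compute $a_{ij}$, then $\alpha_i$ for each pair of $I_1$ and $\alpha^*$. For features A, B, C, and E the computed profiles coincide with Table 9.2; for feature D, the computed $I_1$ entries (2, 5), (2, 6), and (3, 4) differ from the printed profile, which should be kept in mind when comparing.

```python
from itertools import combinations, product

# Table 9.1: discretized expression level of each feature in samples 1..6.
DATA = {
    "A": [0, 0, 0, 0, 1, 0],
    "B": [1, 1, 1, 1, 0, 1],
    "C": [1, 1, 1, 0, 1, 0],
    "D": [1, 1, 1, 1, 0, 0],
    "E": [0, 0, 1, 0, 0, 0],
}
GROUP = [1, 1, 1, 2, 2, 2]  # group label of samples 1..6


def build_instance(data, group):
    """Return I1, I2 and the binary coefficients a[(pair, feature)] of Sect. 9.2."""
    samples = range(1, len(group) + 1)
    g1 = [s for s in samples if group[s - 1] == 1]
    g2 = [s for s in samples if group[s - 1] == 2]
    I1 = list(product(g1, g2))                                   # pairs from different groups
    I2 = list(combinations(g1, 2)) + list(combinations(g2, 2))   # pairs from the same group
    a = {}
    for j, level in data.items():
        for u, v in I1:
            a[(u, v), j] = int(level[u - 1] != level[v - 1])  # feature explains a difference
        for u, v in I2:
            a[(u, v), j] = int(level[u - 1] == level[v - 1])  # feature explains a similarity
    return I1, I2, a


I1, I2, a = build_instance(DATA, GROUP)
alpha = {i: sum(a[i, j] for j in DATA) for i in I1}  # alpha_i for each pair in I1
alpha_star = min(alpha.values())
print("alpha* =", alpha_star)
```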


9.4 The Proposed Solution Method

We propose an efficient solution method for the (α, β)-k FSP that operates by (1) determining $\alpha^*$; (2) obtaining a set of k features; and (3) determining $\beta^*$. As shown earlier, $\alpha^*$ can be determined in polynomial time. Therefore, the following subsections explain how k and $\beta^*$ are determined.

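Table 9.1 The dataset with two groups (classes) of data
Feature | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6
A | 0 | 0 | 0 | 0 | 1 | 0
B | 1 | 1 | 1 | 1 | 0 | 1
C | 1 | 1 | 1 | 0 | 1 | 0
D | 1 | 1 | 1 | 1 | 0 | 0
E | 0 | 0 | 1 | 0 | 0 | 0
Group | 1 | 1 | 1 | 2 | 2 | 2

Table 9.2 Building an instance of (α, β)-k FSP
Element | $P_A$ | $P_B$ | $P_C$ | $P_D$ | $P_E$ | $\alpha_i$
(1, 4) | 0 | 0 | 1 | 0 | 0 | 1
(1, 5) | 1 | 1 | 0 | 1 | 0 | 3
(1, 6) | 0 | 0 | 1 | 1 | 0 | 2
(2, 4) | 0 | 0 | 1 | 0 | 0 | 1
(2, 5) | 1 | 1 | 0 | 0 | 0 | 2
(2, 6) | 0 | 0 | 1 | 0 | 0 | 1
(3, 4) | 0 | 0 | 1 | 1 | 1 | 3
(3, 5) | 1 | 1 | 0 | 1 | 1 | 4
(3, 6) | 0 | 0 | 1 | 1 | 1 | 3
(1, 2) | 1 | 1 | 1 | 1 | 1 | –
(1, 3) | 1 | 1 | 1 | 1 | 0 | –
(2, 3) | 1 | 1 | 1 | 1 | 0 | –
(4, 5) | 0 | 0 | 0 | 0 | 1 | –
(4, 6) | 1 | 1 | 1 | 0 | 1 | –
(5, 6) | 0 | 0 | 0 | 1 | 1 | –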


9.4.1 Obtaining the Minimum Number of Features

The Min k (α, β)-k FSP, which obtains the minimum number of features necessary to explain the dichotomy between the groups such that at least α ≤ α* features do so for each pair of entities of different groups, can be modeled as a set multi-cover problem in which the features are the "columns" and the elements are the "rows" (see [19]). Although the set multi-cover problem is NP-hard, very efficient exact and heuristic algorithms exist for it, capable of solving very large instances; two recent heuristics are due to [22] and [19]. Therefore, the minimum set of features (and hence k) can be determined efficiently, though not necessarily optimally. We let $J^*$ denote such a set of features.
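To illustrate the set multi-cover view of Phase 2, the sketch below applies the textbook greedy rule (repeatedly add the feature that covers the most rows of $I_1$ that are still short of α) to the $I_1$ rows of Table 9.2. It is a simple heuristic chosen for illustration, not the algorithms of [19] or [22], and in general it does not guarantee the minimum k. Features are scanned alphabetically, so ties go to the earlier letter; the function and variable names are hypothetical.

```python
FEATURES = ["A", "B", "C", "D", "E"]
# I1 rows of Table 9.2; character n of each string is a_ij for the n-th feature (A..E).
I1_ROWS = {
    (1, 4): "00100", (1, 5): "11010", (1, 6): "00110",
    (2, 4): "00100", (2, 5): "11000", (2, 6): "00100",
    (3, 4): "00111", (3, 5): "11011", (3, 6): "00111",
}


def greedy_multicover(rows, features, alpha):
    """Greedy set multi-cover: add features until every row is covered alpha times."""
    a = {(i, j): int(bits[n]) for i, bits in rows.items() for n, j in enumerate(features)}
    residual = {i: alpha for i in rows}  # coverage still missing for each row
    selected, remaining = [], list(features)
    while any(r > 0 for r in residual.values()):
        # Gain of a feature = number of still-deficient rows it covers.
        gains = {j: sum(1 for i in rows if residual[i] > 0 and a[i, j]) for j in remaining}
        best = max(remaining, key=gains.get) if remaining else None
        if best is None or gains[best] == 0:
            raise ValueError("the features cannot cover every row alpha times")
        selected.append(best)
        remaining.remove(best)
        for i in rows:
            if residual[i] > 0 and a[i, best]:
                residual[i] -= 1
    return selected


print(greedy_multicover(I1_ROWS, FEATURES, alpha=1))
```

For α = 1 the greedy rule first takes C (it covers six of the nine rows) and then A, giving k = 2; on this instance that is also the minimum, because pair (1, 4) is explained only by C and pair (2, 5) only by A or B.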

9.4.2 Obtaining the Maximum Value of β

The Max β (α, β)-k FSP aims to obtain a set of k features with the maximum value of β, i.e., $\beta^*$, that explains the dichotomy between the groups, with at least α ≤ α* features doing so for each pair of entities of different groups. We propose a matheuristic algorithm for obtaining $\beta^*$. The algorithm exploits certain properties of the Min k (α, β)-k FSP and the Max β (α, β)-k FSP, which we explain below.

Proposition 9.1 An alternative optimal (best) solution for the Min k (α, β)-k FSP is obtained by including $\sum_{j \in J^*} x_j \neq k$ in the integer program of the problem and re-optimizing.

Proof Observe that adding $\sum_{j \in J^*} x_j \neq k$ to the integer program of the Min k (α, β)-k FSP ensures that the optimal (best) solution represented by $J^*$ will not be explored again, and new optimal (best) solutions will therefore be sought. Since $|J^*| = k$, this constraint is equivalent to the linear cut $\sum_{j \in J^*} x_j \le k - 1$. □

Iterative application of Proposition 9.1 leads to two possible outcomes: (1) no solution, implying that all optimal (best) solutions have been explored; or (2) a new optimal (best) solution, showing that we may be able to build a pool of optimal (best) solutions. Because the Min k (α, β)-k FSP is an integer program, obtaining all of its optimal solutions is generally NP-hard. Nevertheless, we later show that even a fair-sized pool of optimal solutions for the Min k (α, β)-k FSP may lead to a good-quality feasible solution for the Max β (α, β)-k FSP.

Proposition 9.2 Given an optimal solution for the Min k (α, β)-k FSP, a feasible solution for the Max β (α, β)-k FSP may be obtained in polynomial time.

Proof Following the three-phase decomposition approach for solving the (α, β)-k FSP, an optimal solution for the Min k (α, β)-k FSP is a feasible solution for the Max β (α, β)-k FSP. The value of β for this solution can be derived in polynomial time via $\beta = \min_{i \in I_2} \sum_{j \in J^*} a_{ij} x_j$. □
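The formula in Proposition 9.2 is a single pass over $I_2$. The sketch below evaluates $\beta = \min_{i \in I_2} \sum_{j \in J^*} a_{ij}$ on the $I_2$ rows of Table 9.2, with the selected set given directly as a set of feature names (so $x_j = 1$ exactly for $j \in J^*$); `beta_of` is a hypothetical name. For $J^* = \{A, C\}$, pairs (4, 5) and (5, 6) are explained by no selected feature, so β = 0.

```python
FEATURES = ["A", "B", "C", "D", "E"]
# I2 rows of Table 9.2 (pairs from the same group); character n is a_ij for feature n.
I2_ROWS = {
    (1, 2): "11111", (1, 3): "11110", (2, 3): "11110",
    (4, 5): "00001", (4, 6): "11101", (5, 6): "00011",
}


def beta_of(selected, rows=I2_ROWS, features=FEATURES):
    """beta = min over i in I2 of sum_{j in J*} a_ij (Proposition 9.2)."""
    return min(
        sum(int(bits[features.index(j)]) for j in selected)
        for bits in rows.values()
    )


print(beta_of({"A", "C"}))     # 0: pairs (4, 5) and (5, 6) are not explained
print(beta_of(set(FEATURES)))  # 1: pair (4, 5) is explained only by E
```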

9 Heuristic Solutions for the (α, β)-k Feature Set Problem


Proposition 9.3 Among all optimal solutions for Min k (α, β)-k FSP, an optimal solution for Max β (α, β)-k FSP is one with the largest value of β.

Proof By Proposition 9.2, an optimal solution for Min k (α, β)-k FSP is a feasible solution for Max β (α, β)-k FSP. Conversely, since k is the minimum number of features, every feasible solution for Max β (α, β)-k FSP is itself an optimal solution for Min k (α, β)-k FSP. Given all optimal solutions for Min k (α, β)-k FSP, solving Max β (α, β)-k FSP therefore amounts to finding the one with the largest value of β. □

Proposition 9.3 is important because it implies that an optimal solution for Max β (α, β)-k FSP lies in the pool of all optimal solutions of Min k (α, β)-k FSP. We later show that good-quality solutions for Max β (α, β)-k FSP can be produced by building only a reasonably sized pool of optimal solutions for Min k (α, β)-k FSP.

We propose a matheuristic algorithm for Max β (α, β)-k FSP that uses these properties to generate good-quality solutions. The algorithm has two major steps: generating a feasible solution and improving it. Algorithm 1 summarizes the proposed matheuristic.

Algorithm 1 The matheuristic algorithm for solving Max β (α, β)-k FSP.
Input: integer programs for Min k (α, β)-k FSP and Max β (α, β)-k FSP; set J of features; set J* = ∅ (J* ⊆ J) of selected features; sets I1 and I2 of elements; parameters α, k and p (the size of the pool).
Output: a solution (set J* ⊆ J of features) for Max β (α, β)-k FSP.
Step 1. Constructing a feasible solution.
  Obtain a pool P = {P1, ..., Pp} of optimal (best) solutions for Min k (α, β)-k FSP;
  Let J̃ ⊂ J be the set of features common to all solutions of P; build a partial solution for Max β (α, β)-k FSP via J* = J* ∪ J̃;
  IF J* is not feasible THEN
    Solve a sub-problem (of the original problem) over the sets of available features and “yet to be satisfied” elements;
    let J̃ be the set of features in the optimal solution of the sub-problem;
    J* = J* ∪ J̃;
Step 2. Improving the solution.
  Starting from the feasible solution J*, solve the original Max β (α, β)-k FSP with an exact solver/algorithm;
  Return the best solution obtained.
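Step 1 of Algorithm 1 starts from the features shared by the whole pool. A minimal Python sketch of that bookkeeping follows. It assumes (our assumption, not stated as such in the text) that a solution of Max β (α, β)-k FSP is feasible when it has exactly k features and covers every pair in I1 at least α times; the helper names are hypothetical.

```python
from functools import reduce
from typing import Dict, FrozenSet, List, Set


def common_features(pool: List[FrozenSet[int]]) -> FrozenSet[int]:
    """J~: the features shared by every solution of the pool P (P must be non-empty)."""
    return reduce(lambda a, b: a & b, pool)


def yet_to_be_satisfied(cover: Dict[int, Set[int]], i1: Set[int], alpha: int,
                        chosen: FrozenSet[int]) -> Dict[int, int]:
    """Elements of I1 covered fewer than alpha times, mapped to their missing coverage."""
    missing = {}
    for i in i1:
        covered = sum(i in cover[j] for j in chosen)
        if covered < alpha:
            missing[i] = alpha - covered
    return missing


def is_feasible(cover: Dict[int, Set[int]], i1: Set[int], alpha: int, k: int,
                chosen: FrozenSet[int]) -> bool:
    """Assumed feasibility test: exactly k features and every I1 pair alpha-covered."""
    return len(chosen) == k and not yet_to_be_satisfied(cover, i1, alpha, chosen)


if __name__ == "__main__":
    pool = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({1, 2, 5})]
    cover = {1: {0}, 2: {1}, 3: {0, 1}, 4: {0, 1}, 5: {0, 1}}
    j_tilde = common_features(pool)
    print(sorted(j_tilde))                                # [1, 2]
    print(yet_to_be_satisfied(cover, {0, 1}, 2, j_tilde))  # {0: 1, 1: 1}
```

If the returned dictionary is non-empty (or J̃ has fewer than k features), the algorithm moves on to the sub-problem over the remaining features.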

9.4.2.1 Constructing a Feasible Solution

From the pool of p optimal (best) solutions of Min k (α, β)-k FSP, the matheuristic algorithm extracts the features that are common to all solutions in the pool. Because these features appear in every solution of the pool (and each of these solutions is, by Proposition 9.2, feasible for Max β (α, β)-k FSP), we expect them to form


L. M. Naeni and A. Salehipour

good-quality solutions. Let J̃ ⊂ J denote the set of common features. J̃ alone may not be a feasible solution, in which case additional features must be added. They are obtained by solving a sub-problem of the original Max β (α, β)-k FSP with a reduced number of features and elements: it includes only the available (not yet selected) features and the “yet to be satisfied” elements. The union of the features selected in this sub-problem and J̃ forms a feasible set of features for Max β (α, β)-k FSP. If the sub-problem is too large to be solved in a short time, the feasible-solution construction can be applied to it recursively.
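A sketch of the completion sub-problem follows, with exhaustive enumeration standing in for the exact solver (our simplification, viable only for a handful of features). It assumes the Max β model selects exactly k features, requires at least α of them to differentiate each I1 pair, and maximizes β, the minimum over I2 pairs of the number of selected features that keep the pair similar. The function name and the data encoding are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set

Cover = Dict[int, Set[int]]  # feature j -> pairs it counts for


def complete_partial(partial: FrozenSet[int], k: int, alpha: int,
                     diff_cover: Cover, i1: Set[int],
                     same_cover: Cover, i2: Set[int]) -> Optional[FrozenSet[int]]:
    """Exact sub-problem by enumeration (a stand-in for the solver; tiny sizes only).

    Adds k - |partial| of the available features so that every I1 pair still
    short of alpha is satisfied, maximizing beta over I2. Returns None if no
    completion exists.
    """
    if len(partial) > k:
        return None
    # "yet to be satisfied" elements and their missing coverage
    todo = {}
    for i in i1:
        missing = alpha - sum(i in diff_cover[j] for j in partial)
        if missing > 0:
            todo[i] = missing
    available = sorted(set(diff_cover) - partial)
    best: Optional[FrozenSet[int]] = None
    best_beta = -1
    for extra in combinations(available, k - len(partial)):
        if any(sum(i in diff_cover[j] for j in extra) < need for i, need in todo.items()):
            continue  # some element would remain unsatisfied
        chosen = partial | frozenset(extra)
        beta = min(sum(i in same_cover[j] for j in chosen) for i in i2)
        if beta > best_beta:
            best, best_beta = chosen, beta
    return best


if __name__ == "__main__":
    diff = {0: {0}, 1: {1}, 2: {0, 1}, 3: {0, 1}, 4: {1}}
    same = {0: set(), 1: set(), 2: {10}, 3: {10, 11}, 4: {11}}
    print(sorted(complete_partial(frozenset({2}), 3, 2, diff, {0, 1}, same, {10, 11})))  # [2, 3, 4]
```

Only k − |J̃| features are chosen here, which is what makes the sub-problem much smaller than the original problem.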

9.4.2.2 Improving the Feasible Solution

We propose a simple, yet effective, improvement procedure for Max β (α, β)-k FSP. Once a feasible solution has been generated, we use the solver CPLEX to produce an improved solution, starting from that feasible solution. Solvers have improved greatly in recent years and can solve a wide variety of optimization problems, which makes solver-based algorithms (such as the one in this study) fast and effective. In Sect. 9.5 we show that this strategy is very effective in solving large instances of Max β (α, β)-k FSP.

9.5 Computational Results

We implement the proposed solution method in Python 2.7, and all mathematical models in Python using the CPLEX 12.5.0 Python API. The computing resource runs Linux Ubuntu 14.04 LTS and has 32 GB of memory and 12 cores of an Intel® Xeon CPU E5-1650 clocked at 3.50 GHz. For all computational experiments we use only one processor.

We evaluate the performance of the algorithm and models on 11 real-world instances, divided into two sets. The first set includes six biological datasets ranging from small to large; we choose them because [11] used the same instances to evaluate the performance of their VNS+TS algorithm. The second set includes five large face recognition instances, which better represent the computational challenges that arise in applications of (α, β)-k FSP. As we discuss below, obtaining optimal solutions for the instances of the second set, or even good-quality solutions, is a computational challenge for CPLEX. Table 9.3 shows basic information on these 11 instances. The first three columns show the instance name, the number of features and the total number of entities/samples (of both group 1 and group 2) in the dataset. The next four columns give the size of the corresponding instance of (α, β)-k FSP: column “|J|” gives the total number of features, and columns “|I1|” and “|I2|” give the total number of pairs of

Table 9.3 Basic information of 11 real-world datasets
Instance  Features Entities     |J|     |I1|     |I2|    α*  Reference
ADMF           686       83     686     1720     1683    86  [18]
DS              73       15      73       56       49    50  [10]
PD1          17099      105   17097     2750     2710  3970  [20]
PD2           1674       25    1674      144      156   760  [9]
PC            3556      171    3556     7290     7245   229  [3]
SM             525     1219     525   273834   468537    22  [4]
0-vs-all      1969      450    1969    32400    68625   354  [8]
1-vs-all      3304      450    3304    32400    68625   683  [8]
2-vs-all      4243      450    4243    32400    68625  1016  [8]
3-vs-all      5436      450    5436    32400    68625  1394  [8]
4-vs-all      2005      450    2005    32400    68625   387  [8]

entities of different groups and of the same group, respectively, and column “α*” shows the optimal value of α. The ADMF dataset contains 43 cases of Alzheimer’s disease and 40 controls, where pairs of proteins denote features. DS contains 7 cases of Down syndrome and 8 controls, with 73 genes (features). The PD1 and PD2 datasets are associated with Parkinson’s disease and contain 50 cases and 55 controls, and 16 cases and 9 controls, respectively. The PC dataset concerns prostate cancer and includes 90 cases and 81 controls. Finally, the SM dataset includes 922 individuals who smoke and 297 who do not; SM involves more samples than features. The five datasets of the second set were extracted from a very large face recognition dataset [13] that contains more than two groups of data. Following the procedure explained in Sect. 9.3, [8] applied a “one-vs-all” approach and obtained five datasets, each containing two groups of data.

Table 9.4 shows the outcomes of both CPLEX (solving the whole problem) and the proposed matheuristic algorithm on the 11 instances. The first three columns show the name of the instance and the values of α* and k. Under the CPLEX heading, column β reports the largest value of β obtained by CPLEX, “Time (s)” shows the computation time in seconds and “Gap (%)” reports the optimality gap reported by CPLEX. Under the proposed matheuristic, column |J̃| shows the number of features common to all solutions of the pool, β0 denotes the initial value of β, calculated for the feasible solution obtained in Step 1 of Algorithm 1, and β shows the largest value of β obtained in Step 2 of Algorithm 1 (the highlighted values show optimal or superior solutions). Columns “Time (s)” and “Impr (%)” give the computation time in seconds and the improvement over CPLEX in percent, calculated as $\frac{\beta - \beta_{\mathrm{CPLEX}}}{\beta_{\mathrm{CPLEX}}} \times 100$, where β is the value obtained by the matheuristic algorithm. As the table shows, the matheuristic algorithm obtains solutions at least as good as those of CPLEX for all instances, matching the optimal solutions wherever CPLEX proved optimality. The algorithm also

Table 9.4 Computational results of solving 11 instances, where |P| = 20 (20 optimal solutions for each instance of Min k (α, β)-k FSP were obtained)
                      CPLEX                       Proposed matheuristic
Instance     α*      k       β   Time (s)  Gap (%)   |J̃|     β0      β   Time (s)  Impr (%)
ADMF         86    292     118       5.84     0.00    101    114    118      13.11      0.00
DS           50     65      51       0.02     0.00     51     51     51       0.16      0.00
PD1        3970   9807    4325    2013.78     0.00   8858   4324   4325    3489.94      0.00
PD2         760   1265     645        0.4     0.00   1265    645    645      14.56      0.00
PC          229    725     233   18539.13     0.00    225    233    233    3657.76      0.00
SM           22    128      40   10577.19     0.00     37     39     40    6579.08      0.00
0-vs-all    354   1116       -      36000        -    998    471    471    2466.08    100.00
1-vs-all    683   2220     987      36000     0.41   2120    982    989    2428.11      0.20
2-vs-all   1016   3154       -      36000        -   3005   1394   1394    3801.78    100.00
3-vs-all   1394   4395    1964   11044.41     0.33   4215   1962   1965    7642.15      0.05
4-vs-all    387   1324     549    1877.36     0.00    501    536    549    3935.98      0.00
Average                       13823.47     0.08                         3093.52     18.20

leads to an average improvement of about 18% over CPLEX. This average is driven by the two instances (0-vs-all and 2-vs-all) for which CPLEX found no feasible solution and the improvement is reported as 100%; on the remaining instances the improvement is at most 0.20%. On average, the proposed algorithm is about four and a half times faster than CPLEX (3093.52 s versus 13823.47 s). For the small and medium-sized instances, however, CPLEX is faster than the matheuristic algorithm, whereas for larger instances the proposed matheuristic significantly outperforms CPLEX. In particular, CPLEX does not report even a feasible solution for some instances within 10 h of running (“-” denotes this and means that CPLEX is unable to solve the instance), whereas the matheuristic algorithm produces good-quality solutions for all instances. This shows the effectiveness and efficiency of the proposed algorithm, particularly for large instances of (α, β)-k FSP. The outcomes in Table 9.4 show that using CPLEX alone to solve (α, β)-k FSP neither yields good-quality solutions for all 11 instances nor guarantees feasible solutions. The proposed matheuristic algorithm, however, obtains feasible solutions for all instances, delivers solutions at least as good as those of CPLEX, and is significantly faster than CPLEX on large instances.
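The averages quoted above can be re-derived from the per-instance values in Table 9.4. The short Python check below does so; treating a missing CPLEX solution as a 100% improvement is our reading of the 100.00 entries in the table, not a rule stated in the text.

```python
# Per instance: (beta_CPLEX, beta_matheuristic, CPLEX time (s), matheuristic time (s)),
# as reported in Table 9.4; None means CPLEX found no feasible solution within 10 h.
RESULTS = {
    "ADMF": (118, 118, 5.84, 13.11),
    "DS": (51, 51, 0.02, 0.16),
    "PD1": (4325, 4325, 2013.78, 3489.94),
    "PD2": (645, 645, 0.4, 14.56),
    "PC": (233, 233, 18539.13, 3657.76),
    "SM": (40, 40, 10577.19, 6579.08),
    "0-vs-all": (None, 471, 36000, 2466.08),
    "1-vs-all": (987, 989, 36000, 2428.11),
    "2-vs-all": (None, 1394, 36000, 3801.78),
    "3-vs-all": (1964, 1965, 11044.41, 7642.15),
    "4-vs-all": (549, 549, 1877.36, 3935.98),
}


def improvement(beta_cplex, beta):
    """Impr (%) = (beta - beta_CPLEX) / beta_CPLEX * 100.

    Assumption: a missing CPLEX solution counts as a 100% improvement.
    """
    if beta_cplex is None:
        return 100.0
    return (beta - beta_cplex) / beta_cplex * 100


n = len(RESULTS)
avg_impr = sum(improvement(bc, b) for bc, b, _, _ in RESULTS.values()) / n
avg_cplex = sum(tc for _, _, tc, _ in RESULTS.values()) / n
avg_math = sum(tm for _, _, _, tm in RESULTS.values()) / n
print(f"{avg_impr:.2f} {avg_cplex:.2f} {avg_math:.2f} {avg_cplex / avg_math:.1f}")
# 18.20 13823.47 3093.52 4.5
```

Removing the two 100% entries would bring the average improvement down to a few hundredths of a percent, which is why the first sentence of the paragraph qualifies the 18% figure.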

9.6 Conclusion

This study proposes an effective and efficient matheuristic algorithm for the (α, β)-k FSP, an optimization-based approach to feature selection. We follow an existing three-phase decomposition approach to solve (α, β)-k FSP. We show that the proposed algorithm outperforms CPLEX in terms of both solution quality and computation time, and obtains promising solutions for real-world instances.


Additionally, we show that while CPLEX is not able to generate a feasible solution for some instances, the proposed matheuristic delivers a feasible solution for all instances. Further improvements to the performance of the matheuristic algorithm may be investigated; in particular, although it is much faster than CPLEX, the algorithm still requires significant computation time to solve large instances of (α, β)-k FSP.

Acknowledgements Amir Salehipour was supported by the University of Newcastle (Australia). He is also the recipient of an Australian Research Council Discovery Early Career Researcher Award (project number DE170100234) funded by the Australian Government.

References
1. A.A. Albrecht, Stochastic local search for the feature set problem, with applications to microarray data. Appl. Math. Comput. 183(2), 1148–1164 (2006)
2. R. Berretta, W. Costa, P. Moscato, Combinatorial optimization models for finding genetic signatures from gene expression datasets, in Bioinformatics: Structure, Function and Applications. Series: Methods in Molecular Biology, vol. 453, no. 01 (2008), pp. 363–377
3. U.R. Chandran, C. Ma, R. Dhir, M. Bisceglia, M. Lyons-Weiler, W. Liang, G. Michalopoulos, M. Becich, F.A. Monzon, Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process. BMC Cancer 7(1), 64 (2007)
4. J.C. Charlesworth, J.E. Curran, M.P. Johnson, H.H. Goring, T.D. Dyer, V.P. Diego, J.W. Kent, M.C. Mahaney, L. Almasy, J.W. MacCluer, Transcriptomic epidemiology of smoking: the effect of smoking on gene expression in lymphocytes. BMC Med. Genom. 3(1), 29 (2010)
5. C. Cotta, C. Sloper, P. Moscato, Evolutionary search of thresholds for robust feature set selection: application to the analysis of microarray data, in Applications of Evolutionary Computing, vol. 3005, ed. by G. Raidl. Lecture Notes in Computer Science (EvoWorkshops Conference, Coimbra, 2004), pp. 21–30
6. Y.-J. Fan, W.A. Chaovalitwongse, Optimizing feature selection to improve medical diagnosis. Ann. Oper. Res. 174(1), 169–183 (2010)
7. S.E. Ferchichi, K. Laabidi, S. Zidi, Genetic algorithm and tabu search for feature selection. Stud. Inf. Control 18(2), 181–187 (2009)
8. M.N. Haque, N. Noman, R. Berretta, P. Moscato, Heterogeneous ensemble combination search using genetic algorithm for class imbalanced data classification. PLoS ONE 11(1), e0146116 (2016)
9. T.G. Lesnick, S. Papapetropoulos, D.C. Mash, J. Ffrench-Mullen, L. Shehadeh, M. de Andrade, J.R. Henley, W.A. Rocca, J.E. Ahlskog, D.M. Maraganore, A Genomic pathway approach to a complex disease: axon guidance and Parkinson disease. PLoS Genet 3(6), e98 (2007)
10. H. Lockstone, L. Harris, J. Swatton, M. Wayland, A. Holland, S. Bahn, Gene expression profiling in the adult Down syndrome brain. Genomics 90(6), 647–660 (2007)
11. M.R. de Paula, Efficient methods of feature selection based on combinatorial optimization motivated by the analysis of large biological datasets. Ph.D. Thesis, School of Electrical Engineering and Computer Science, The University of Newcastle, Australia (2012)
12. R. Meiri, J. Zahavi, Using simulated annealing to optimize the feature selection problem in marketing applications. Eur. J. Oper. Res. 171(3), 842–858 (2006)
13. N. Pinto, Z. Stone, T. Zickler, D. Cox, Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook, in 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2011), pp. 35–52
14. M.G. Ravetti, P. Moscato, Identification of a 5-protein biomarker molecular signature for predicting Alzheimer’s disease. PLoS One 3(9), e3111 (2008)
15. M.G. Ravetti, R. Berretta, P. Moscato, Novel biomarkers for prostate cancer revealed by (α, β)-k-feature sets, in Foundations of Computational Intelligence, Chap. 7, vol. 5 (Springer, Berlin, 2009), pp. 149–175
16. M.G. Ravetti, P. Moscato, Identification of a 5-protein biomarker molecular signature for predicting Alzheimer’s disease. PLoS One 3(9), e3111 (2009)
17. M.G. Ravetti, O.A. Rosso, R. Berretta, P. Moscato, Uncovering molecular biomarkers that correlate cognitive decline with the changes of Hippocampus’ gene expression profiles in Alzheimer’s Disease. PLoS ONE 5(4), e10153 (2010)
18. M. Rocha de Paula, M. Gómez Ravetti, R. Berretta, P. Moscato, Differences in abundances of cell-signalling proteins in blood reveal novel biomarkers for early detection of clinical Alzheimer’s Disease. PLoS ONE 6(3), e17481 (2011)
19. A. Salehipour, Combinatorial optimization methods for the (α, β)-k Feature Set Problem. Ph.D. Thesis, School of Electrical Engineering and Computing, The University of Newcastle, Australia (2019)
20. C.R. Scherzer, A.C. Eklund, L.J. Morse, Z. Liao, J.J. Locascio, D. Fefer, M.A. Schwarzschild, M.G. Schlossmacher, M.A. Hauser, J.M. Vance, L.R. Sudarsky, D.G. Standaert, J.H. Growdon, R.V. Jensen, S.R. Gullans, Molecular markers of early Parkinson’s disease based on gene expression in blood. Proc. Natl. Acad. Sci. 104(3), 955–960 (2007)
21. C.-F. Tsai, Y.-C. Hsiao, Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches. Decis. Support Syst. 50(1), 258–269 (2010)
22. Y. Wang, Y. Minghao, O. Dantong, Z. Liming, A novel local search algorithm with configuration checking and scoring mechanism for the set k-covering problem. Int. Trans. Oper. Res. 6, 1463–1485 (2017)

Chapter 10

Generic Support for Precomputation-Based Global Routing Constraints in Local Search Optimization

Renaud De Landtsheer, Fabian Germeau, Thomas Fayolle, Gustavo Ospina, and Christophe Ponsard

Abstract Global constraints are very popular in routing optimization because they offer very good time complexity. Yet, they can be hard to implement because they impose many technical requirements on the way information is transmitted to them. Our goal is to build a generic constraint-based local search (CBLS) framework, with application to routing optimization. In this chapter, we identify a generic stereotype of the differentiation mechanism used by global constraints for routing optimization, and we propose generic support for this stereotype so that global constraints are easier to implement. Our generic support is implemented as an abstract class with placeholders for the differentiation algorithm. Once this class is properly extended, it yields a fully usable global constraint that efficiently supports virtually any move and neighbourhood exploration. Our framework is illustrated on the time window and route length constraints, as well as on a constraint representing the energy consumption of quadcopters.

R. De Landtsheer · F. Germeau · T. Fayolle · G. Ospina · C. Ponsard
CETIC Research Centre, Charleroi, Belgium
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_10

10.1 Introduction

Efficiently evaluating constraints in vehicle routing optimization by local search relies on so-called global constraint algorithms [1–3]. A classic example


R. D. Landtsheer et al.

of a global constraint is one that maintains the length of the route in routing optimization through an incremental computation [1]. For example, when considering the flip of a portion of a route (e.g. a, b, c, d becomes d, c, b, a, or a, b, c, d, e, f, g, h, i becomes a, b, c, g, f, e, d, h, i) with a symmetric distance, a smart global constraint can update the route length in O(1) time, because the length of the flipped segment is the same in both directions. More generally, global constraints can achieve good complexity through two generic approaches: pure delta and precomputation.

Pure delta: from a start state, given the necessary information about changes to the input, some global constraints update their output incrementally. A typical example is the route length constraint with a single vehicle and a symmetric distance matrix, as presented above.

Precomputation: prior to a neighbourhood exploration, the global constraint performs some precomputation based on the current values in the optimization model; it decorates each node in the route with some value. These values are then exploited when the constraint must evaluate a neighbour explored by a neighbourhood. Typically, a neighbourhood breaks the starting route into segments, which are reassembled in a different order as defined by the neighbour. For instance, a 2-opt neighbourhood breaks a route into three segments: the segment before, the flipped segment and the segment after. Such a constraint then computes the value of each segment and combines these values to quickly produce the output value of the constraint. The value of a segment is computed by applying some difference operator to the precomputed values associated with the last and the first node of the segment. With an asymmetric distance matrix, the precomputation typically associates a forward and a backward distance with each node. The forward (resp. backward) distance associated with a node is the distance travelled by the vehicle from the start point to the node (resp. the distance that would be travelled from the node to the start point if the whole route were flipped). When a subsequence is flipped, the old (resp. new) length of the flipped segment is the difference between the forward (resp. backward) distances associated with the nodes at both ends of the flipped segment. Both differences can be computed in O(1) time thanks to the precomputation. In this chapter we focus on precomputation-based global constraints.

Global constraints are often implemented in custom-made solvers, usually for run-time efficiency reasons. This raises two concerns: first, their implementation is often not reusable; second, they are costly to implement, since these algorithms can be complex. Our research line is to provide a modular framework that makes it possible to quickly and easily develop, benchmark and evolve search procedures by efficiently assembling more generic building blocks. This framework is OscaR.CBLS, which contains a constraint-based modelling language that notably features variables of type “sequence of integers” (SeqVar), making it possible to implement global constraints for routing within a CBLS engine [4–6]. Yet this remains a rather tedious task, because one must implement the algorithms of the constraints and interface them with the engine. This chapter proposes an additional mechanism that makes the implementation of global constraints easier.
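Both approaches can be sketched in a few lines of Python on plain integer matrices (our encoding; the names are illustrative, not the OscaR.CBLS API): a pure-delta 2-opt evaluation, valid only for a symmetric matrix, and the forward/backward precomputation for an asymmetric one, where any segment length, flipped or not, becomes a single subtraction.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]


def naive_length(nodes: Sequence[int], m: Matrix) -> int:
    """O(n) reference: sum of the arcs along `nodes`."""
    return sum(m[a][b] for a, b in zip(nodes, nodes[1:]))


def two_opt_delta_symmetric(route: Sequence[int], m: Matrix, i: int, j: int) -> int:
    """Pure delta, symmetric m: flipping route[i..j] only changes the two boundary arcs.

    Requires 0 < i <= j < len(route) - 1.
    """
    p, q = route[i - 1], route[j + 1]
    return m[p][route[j]] + m[route[i]][q] - m[p][route[i]] - m[route[j]][q]


def precompute(route: Sequence[int], m: Matrix) -> Tuple[List[int], List[int]]:
    """fwd[p]: distance from route[0] to route[p] along the route.
    bwd[p]: distance from route[p] back to route[0] travelling the route reversed."""
    fwd, bwd = [0], [0]
    for p in range(1, len(route)):
        fwd.append(fwd[-1] + m[route[p - 1]][route[p]])
        bwd.append(bwd[-1] + m[route[p]][route[p - 1]])
    return fwd, bwd


def segment_length(fwd: List[int], bwd: List[int], i: int, j: int, flipped: bool) -> int:
    """O(1) length of the segment between positions i <= j, traversed as is or flipped."""
    return bwd[j] - bwd[i] if flipped else fwd[j] - fwd[i]


if __name__ == "__main__":
    m = [[0, 1, 2, 3], [5, 0, 4, 6], [7, 8, 0, 9], [2, 3, 4, 0]]  # asymmetric
    route = [0, 1, 2, 3]
    fwd, bwd = precompute(route, m)
    print(segment_length(fwd, bwd, 1, 3, flipped=False))  # 13 = m[1][2] + m[2][3]
    print(segment_length(fwd, bwd, 1, 3, flipped=True))   # 12 = m[3][2] + m[2][1]
```

The precomputation costs O(n) once per neighbourhood exploration, after which each neighbour is evaluated from a constant number of lookups.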

10 Generic Support for Precomputation-Based Global Routing …


The structure of this chapter is as follows: Sect. 10.2 provides some background about OscaR.CBLS; Sect. 10.3 presents our generic support for global constraints; Sect. 10.4 illustrates it on a constraint modelling the energy consumption of quadcopters; Sect. 10.5 illustrates it on a time window constraint and presents some benchmarks; Sect. 10.6 presents some related work.

10.2 Background

10.2.1 Constraint-Based Local Search, the OscaR Way

Like other local search approaches, CBLS relies on a model and a search procedure, and CBLS frameworks may offer support for both aspects. In CBLS frameworks, the model is composed of variables (integers and sets of integers at this point) and invariants. Invariants are directed constraints that maintain the value of one or more output variables as a function of the values of one or more input variables, according to the specification they implement. A classical invariant is Sum. It has an array of integer variables as input and a single integer variable as output, whose value the invariant maintains as the sum of the values of all the input variables. This means that the invariant sets the value of the output variable and adjusts it according to the changes of the input variables. This is generally implemented through incremental algorithms. For instance, when the Sum invariant is notified that one of its input variables has changed its value, it computes the delta between the new and the old value of this variable and increases the value of its output variable by this delta. OscaR.CBLS has a library of roughly 80 invariants.

The model is declared at start-up by some application-specific code that instantiates the variables and invariants. During the search, the model is active: if the value of some input variable changes, this change is propagated to the other variables of the model through a process called propagation, which is managed by the underlying CBLS framework. Propagation is performed in such a way that a variable is updated at most once, and only if it needs to be updated.
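A toy Python rendering of the incremental Sum invariant described above follows. It sketches the delta idea only; the class and method names are hypothetical, and OscaR.CBLS's actual invariant is driven by its propagation engine, which this sketch does not model.

```python
from typing import List


class SumInvariant:
    """Toy directed constraint maintaining output == sum(inputs) incrementally.

    Hypothetical names; not the OscaR.CBLS Sum invariant or its propagation.
    """

    def __init__(self, inputs: List[int]) -> None:
        self._inputs = list(inputs)
        self.output = sum(self._inputs)  # full evaluation once, when the model is declared

    def notify_changed(self, index: int, new_value: int) -> None:
        """Input `index` changed: apply delta = new - old in O(1)."""
        delta = new_value - self._inputs[index]
        self._inputs[index] = new_value
        self.output += delta


if __name__ == "__main__":
    s = SumInvariant([1, 2, 3])
    s.notify_changed(1, 10)
    print(s.output)  # 14
```

The cost of each update is independent of the number of inputs, which is what makes invariants cheap to maintain during a search that changes one or two variables per move.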

10.2.2 Sequence Variables

The OscaR.CBLS framework supports variables of type “sequence of integers” [6]. The purpose of this type of variable is threefold: (1) to be quickly updated to reflect typical moves used in routing optimization, such as flipping a portion of a route or moving a portion of a route somewhere else in the sequence, which is why our sequence variable was developed with tailored, efficient data structures; (2) to communicate these updates in a symbolic way to listening constraints, so that global constraint


Fig. 10.1 Search tree generated by a cross-product in a PDP solver

algorithms can be implemented; and (3) to support a checkpoint mechanism that signals to constraints when precomputations should be performed, if needed, and that also enables a Rollback operation restoring the variable to the value it had at the checkpoint. Sequence variables support three incremental update operations: inserting a value in the sequence, removing the value at some position, and moving a subsequence within the sequence, optionally flipping it. Sequence variables communicate their updates to listening constraints through a dedicated language called SeqUpdates. The language includes the three incremental operations, a non-incremental assignment, and checkpoint definition and rollback. Upon propagation, sequence variables notify their value updates to the global constraints listening to them. SeqUpdates is an adequate language for implementing pure delta global constraints as defined in Sect. 10.1, but it offers no dedicated support for precomputation-based constraints, which must rely on the notion of checkpoints instead [6].
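To make the three incremental updates and the checkpoint/rollback mechanism concrete, the following Python sketch keeps the sequence in a plain list (so each update costs O(n), unlike OscaR's tailored data structures) and logs every update as a tuple, loosely imitating what SeqUpdates communicates to listeners. The class, the method names and the `after` convention for moves are hypothetical.

```python
from typing import List, Optional, Tuple


class ToySeqVar:
    """List-based sketch of a sequence variable (hypothetical, not OscaR's SeqVar).

    Every update is also logged symbolically, the way a listening constraint
    would receive it; define_checkpoint/rollback mimic the checkpoint mechanism.
    """

    def __init__(self, values: List[int]) -> None:
        self.values = list(values)
        self.updates: List[Tuple] = []            # symbolic, SeqUpdates-like log
        self._checkpoint: Optional[List[int]] = None

    def insert(self, value: int, pos: int) -> None:
        self.values.insert(pos, value)
        self.updates.append(("insert", value, pos))

    def remove(self, pos: int) -> None:
        value = self.values.pop(pos)
        self.updates.append(("remove", value, pos))

    def move(self, start: int, end: int, after: int, flip: bool) -> None:
        """Move values[start..end] to just after position `after` (-1 = front),
        optionally flipping it; `after` must lie outside [start, end]."""
        if start <= after <= end:
            raise ValueError("cannot move a block after one of its own positions")
        block = self.values[start:end + 1]
        if flip:
            block.reverse()
        rest = self.values[:start] + self.values[end + 1:]
        idx = after + 1 if after < start else after + 1 - len(block)
        self.values = rest[:idx] + block + rest[idx:]
        self.updates.append(("move", start, end, after, flip))

    def define_checkpoint(self) -> None:
        """Constraints would run their precomputation on this value."""
        self._checkpoint = list(self.values)
        self.updates = [("checkpoint",)]

    def rollback(self) -> None:
        if self._checkpoint is None:
            raise RuntimeError("no checkpoint defined")
        self.values = list(self._checkpoint)
        self.updates.append(("rollback",))


if __name__ == "__main__":
    s = ToySeqVar([0, 1, 2, 3, 4, 5])
    s.define_checkpoint()
    s.move(1, 3, 0, flip=True)  # a 2-opt: flip positions 1..3 in place
    print(s.values)             # [0, 3, 2, 1, 4, 5]
    s.rollback()
    print(s.values)             # [0, 1, 2, 3, 4, 5]
```

A 2-opt is just a move with flip whose target is the position immediately before the block.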

10.2.3 Cross-Product of Neighbourhoods

Local search neighbourhoods can also be combined by a cross-product operator [7, 8]. The cross-product of two neighbourhoods is a new neighbourhood that resembles a two-level tree. The first level consists of the neighbours generated by the first neighbourhood. The second level consists of the neighbours generated by the second neighbourhood, starting from each neighbour of the first neighbourhood. A typical search tree is shown in Fig. 10.1. It comes from a pick-up & delivery (PDP) routing optimization solver, and is the cross-product of (1) inserting a pick-up into a route and (2) inserting the corresponding delivery into the same route, after the insertion position of the pick-up.
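The cross-product of the two PDP neighbourhoods can be sketched with nested Python generators. The sketch works on a single route whose first element is the fixed start node, and all names are illustrative; it enumerates the same two-level tree as described, but performs no evaluation or pruning.

```python
from typing import Iterator, List, Tuple

Route = List[int]


def insert_pickup(route: Route, pickup: int) -> Iterator[Tuple[int, Route]]:
    """First neighbourhood: the pick-up at every position after the start node."""
    for pos in range(1, len(route) + 1):
        yield pos, route[:pos] + [pickup] + route[pos:]


def insert_delivery(route: Route, delivery: int, pickup_pos: int) -> Iterator[Route]:
    """Second neighbourhood: the delivery at every position after the pick-up."""
    for pos in range(pickup_pos + 1, len(route) + 1):
        yield route[:pos] + [delivery] + route[pos:]


def cross_product(route: Route, pickup: int, delivery: int) -> Iterator[Route]:
    """Two-level tree: every first-level neighbour roots a second-level exploration."""
    for pickup_pos, with_pickup in insert_pickup(route, pickup):
        yield from insert_delivery(with_pickup, delivery, pickup_pos)


if __name__ == "__main__":
    for neighbour in cross_product([0, 5], pickup=7, delivery=8):
        print(neighbour)
    # [0, 7, 8, 5]
    # [0, 7, 5, 8]
    # [0, 5, 7, 8]
```

Because the second level depends on the first-level move, a constraint evaluating these leaves sees two stacked modifications of the sequence, which is what makes tracking segments harder than for a single 2-opt.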

10.2.4 Conventions for Modeling VRP

To represent vehicle routing problems (VRP) with sequence variables, we use the following conventions, which neighbourhoods must respect:
• there are n nodes, numbered [0; n − 1];
• there are v vehicles, numbered [0; v − 1], with v < n;
• the routes of all vehicles are represented in the same sequence variable, so that a sub-route can be moved from one vehicle to another within the same data structure;
• nodes [0; v − 1] represent the start and end points of vehicles [0; v − 1], respectively;
• nodes [0; v − 1] are always in the sequence, in this order;
• nodes [0; v − 1] cannot be involved in a move, such as a flip;
• the nodes in the sequence between two vehicles, say i and i + 1, represent the route of vehicle i; after the last node before i + 1, vehicle i implicitly returns to its starting point, as shown in Fig. 10.2;
• each node can be present at most once in the sequence.

Fig. 10.2 Representing a VRP with a single SeqVar (v = 3)
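Under these conventions, the individual routes can be decoded from the single sequence in one pass. The following Python sketch does so and adds each vehicle's implicit return arc when computing a total length; the function names and the small distance matrix in the usage are illustrative.

```python
from typing import List, Sequence


def routes_from_sequence(seq: Sequence[int], v: int) -> List[List[int]]:
    """Split one sequence into the v vehicle routes.

    Nodes 0..v-1 are the vehicles' start points and appear in this order; the
    nodes following vehicle i, up to the next vehicle node, form its route.
    """
    routes: List[List[int]] = []
    for node in seq:
        if node < v:
            if node != len(routes):
                raise ValueError("vehicle nodes must appear in order 0..v-1")
            routes.append([node])
        elif not routes:
            raise ValueError("sequence must start with vehicle node 0")
        else:
            routes[-1].append(node)
    if len(routes) != v:
        raise ValueError("every vehicle node must be in the sequence")
    return routes


def total_length(seq: Sequence[int], v: int, m: Sequence[Sequence[int]]) -> int:
    """Sum of route lengths, including each vehicle's implicit return to its start."""
    total = 0
    for route in routes_from_sequence(seq, v):
        closed = route + [route[0]]  # implicit return arc
        total += sum(m[a][b] for a, b in zip(closed, closed[1:]))
    return total


if __name__ == "__main__":
    m = [[abs(a - b) for b in range(6)] for a in range(6)]
    seq = [0, 3, 4, 1, 5, 2]
    print(routes_from_sequence(seq, 3))  # [[0, 3, 4], [1, 5], [2]]
    print(total_length(seq, 3, m))       # 16
```

An empty route (vehicle 2 here) contributes only the arc from its start node to itself.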

10.3 A Generic Routing Global Constraint

10.3.1 Template for Precomputation-Based Global Constraints

The general scheme of a routing global constraint is the following. An invariant is defined whose naive update requires traversing the entire route. A smart precomputation makes it possible to update the output of the invariant in constant time for each segment that was contiguous in the route when the precomputation was performed. Since exploring classical routing neighbourhoods consists either of inserting a non-routed point or of moving segments of a route, the output of the invariant for a neighbour can be computed in a time that depends on the number of segments of the route rather than on the number of its nodes.

Using this general scheme, a generic interface can be developed to simplify the definition of a global constraint. The generic interface contains a function precompute, which associates a value of type T1 with each node of the route of a vehicle, and a function computeVehicleValue, which takes the list of segments that compose the route of a vehicle and returns a value of type T2, the type of the output of the invariant. The implementations of these two functions are used by the internal mechanism to compute the output of the invariant. The template for the generic global constraint is given in Table 10.1.

With the classical routing neighbourhoods (insert point, one-point move, two-opt and three-opt), a segment of a route can be of three types: (1) a newly inserted node, (2) a flipped segment or (3) a segment that is not flipped. When only a point is moved, it is considered as a segment composed of one point. A newly inserted node is described by the identifier of the inserted node, while a segment is described by the identifiers of the nodes at its extremities and the precomputed values associated with these nodes.

Table 10.1 Generic template for global constraint in routing
Template              Definition
T1                    Type of the precomputed values
T2                    Type of the output of the invariant
precompute            Precompute function: it associates a value of type T1 to each node of a route
computeVehicleValue   Function that computes a value of type T2 that is the output of the invariant for the route of a vehicle; it takes as a parameter the segments that compose the route
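A hypothetical Python rendering of the template in Table 10.1 is sketched below. The three segment kinds are modeled as small dataclasses, and a toy subclass that counts the nodes of a route shows how precompute and computeVehicleValue fit together. The class names (`Inserted`, `Kept`, `Flipped`, `RoutingTemplate`, `NodeCount`) are ours, not the OscaR.CBLS API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar, Union

T1 = TypeVar("T1")  # type of the precomputed values
T2 = TypeVar("T2")  # type of the output of the invariant


@dataclass(frozen=True)
class Inserted:
    """(1) A newly inserted node: only its identifier is known."""
    node: int


@dataclass(frozen=True)
class Kept(Generic[T1]):
    """(3) A non-flipped segment: its end nodes and their precomputed values."""
    first_node: int
    first_value: T1
    last_node: int
    last_value: T1


@dataclass(frozen=True)
class Flipped(Generic[T1]):
    """(2) A flipped segment; first/last follow the order of the new route."""
    first_node: int
    first_value: T1
    last_node: int
    last_value: T1


Segment = Union[Inserted, Kept, Flipped]


class RoutingTemplate(ABC, Generic[T1, T2]):
    """Sketch of Table 10.1 (hypothetical; not the OscaR.CBLS abstract class)."""

    @abstractmethod
    def precompute(self, route: Sequence[int]) -> List[T1]:
        """Associate a T1 value with each node of a vehicle's route."""

    @abstractmethod
    def compute_vehicle_value(self, segments: List[Segment]) -> T2:
        """Combine the segments of a vehicle's route into the invariant output."""


class NodeCount(RoutingTemplate[int, int]):
    """Toy instance: number of nodes on a route (T1 = position, T2 = count)."""

    def precompute(self, route: Sequence[int]) -> List[int]:
        return list(range(len(route)))

    def compute_vehicle_value(self, segments: List[Segment]) -> int:
        total = 0
        for s in segments:
            if isinstance(s, Inserted):
                total += 1
            else:  # O(1) per segment, from the values at its two ends
                total += abs(s.last_value - s.first_value) + 1
        return total


if __name__ == "__main__":
    nc = NodeCount()
    print(nc.precompute([0, 10, 11, 12, 13]))  # [0, 1, 2, 3, 4]
    # Neighbour: insert 99 after node 10 and flip 11..12 -> 0, 10, 99, 12, 11, 13
    segs = [Kept(0, 0, 10, 1), Inserted(99), Flipped(12, 3, 11, 2), Kept(13, 4, 13, 4)]
    print(nc.compute_vehicle_value(segs))      # 6
```

Only the two abstract methods are specific to a constraint; building the segment list from the current move is the generic part that the framework would provide.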

10.3.2 Route Length with Asymmetric Distance Matrix

A classic example of a global constraint is the one that maintains the length of the route in routing optimization. It takes as input a distance matrix specifying the distance between each pair of nodes of the routing problem, and the current route. Assuming the most general case of an asymmetric distance matrix, we can perform the precomputation described in the introduction to ensure that the overall route length can be updated in O(1) time when a portion of the route is flipped. To do this, we associate a pair of integers with each node:
• the length of the route from the starting node of the vehicle to the current node (forward distance);
• the length of the route from the current node back to the starting node of the vehicle, travelling the route in reverse (backward distance).
This constraint can be implemented on top of the generic global constraint defined in Sect. 10.3.1. The type T1 of the precomputed values is a pair of integers representing the forward and the backward distance of a node. The type T2 of the output is an integer representing the total length of the route.

Figure 10.3 illustrates how the generic global constraint mechanism works on a two-opt example. In this example, a route goes from a to l, and the part c, ..., f of this route is flipped. The new route is represented by three segments: the segment from a to b, the segment from f to c, which is flipped, and the segment from g to l. Each node n is decorated with its forward ($f_n$) and backward ($b_n$) distance. Using the precomputed values, the length of each segment can be calculated in constant time. The length of the segment from a to b is $f_b - f_a$. Since the segment from f to c is flipped, the backward distances are used: travelling f, e, d, c costs $m(f, e) + m(e, d) + m(d, c) = b_f - b_c$. Hence the length of the route from a to c is the length of the segment from a to b, plus $m(b, f)$, the distance from b to f, where m is the distance matrix, plus the length of the segment from f to c. The complete length of the new route is:

$$(f_b - f_a) + m(b, f) + (b_f - b_c) + m(c, g) + (f_l - f_g)$$

Fig. 10.3 A two-opt move. Top: initial route a, b, ..., l, each node n decorated with ($f_n$, $b_n$). Bottom: route after the two-opt move, a, b, f, e, d, c, g, h, i, j, k, l.
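The two-opt formula can be checked numerically with a short Python sketch. It assumes a random asymmetric matrix over the nodes a, ..., l (seeded for reproducibility) and compares the O(1) evaluation from the precomputed ($f_n$, $b_n$) pairs against a naive recomputation; the function names are illustrative.

```python
import random
from typing import Dict, List, Tuple

Matrix = Dict[str, Dict[str, int]]
Pre = Tuple[int, int]  # T1: (forward distance, backward distance)


def precompute(route: List[str], m: Matrix) -> Dict[str, Pre]:
    """Decorate each node n with (f_n, b_n); both are 0 at the route's first node."""
    pre = {route[0]: (0, 0)}
    for prev, node in zip(route, route[1:]):
        f, b = pre[prev]
        pre[node] = (f + m[prev][node], b + m[node][prev])
    return pre


def length_after_two_opt(route: List[str], pre: Dict[str, Pre], m: Matrix,
                         i: int, j: int) -> int:
    """Route length after flipping route[i..j], 0 < i <= j < len(route) - 1, in O(1).

    Segments: route[0..i-1] kept, route[j..i] flipped, route[j+1..] kept.
    """
    start, before, first = route[0], route[i - 1], route[i]
    last, after, end = route[j], route[j + 1], route[-1]
    (f_start, _), (f_before, _) = pre[start], pre[before]
    (_, b_first), (_, b_last) = pre[first], pre[last]
    (f_after, _), (f_end, _) = pre[after], pre[end]
    return ((f_before - f_start) + m[before][last] + (b_last - b_first)
            + m[first][after] + (f_end - f_after))


def naive_length(route: List[str], m: Matrix) -> int:
    return sum(m[u][w] for u, w in zip(route, route[1:]))


if __name__ == "__main__":
    nodes = list("abcdefghijkl")
    rng = random.Random(0)
    m = {u: {w: rng.randint(1, 20) for w in nodes} for u in nodes}  # asymmetric
    pre = precompute(nodes, m)
    i, j = nodes.index("c"), nodes.index("f")
    flipped = nodes[:i] + nodes[i:j + 1][::-1] + nodes[j + 1:]
    print("".join(flipped))  # abfedcghijkl
    print(length_after_two_opt(nodes, pre, m, i, j), naive_length(flipped, m))
```

With i and j pointing at c and f, the returned expression is exactly $(f_b - f_a) + m(b, f) + (b_f - b_c) + m(c, g) + (f_l - f_g)$, and the two printed lengths coincide.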

10.3.3 Keeping Track of Segments and Precomputations

The global strategy is to ensure that precomputation is done before exploring a neighbourhood. When a neighbour is evaluated, the relevant subsequences are identified and the differential formula based on the precomputation is applied to them. With simple neighbourhoods (like 2-opt or 3-opt), identifying the involved subsequences is trivial. It becomes trickier when the exploration is more complex and uses cross-products of neighbourhoods.

In order to reuse a precomputation when updating the sequence or when exploring search trees, we maintain a bijection. It maps the position of a node in the current sequence to its position in the sequence at the time the precomputation was performed, similarly to the approach presented in [6]. Such a bijection is made of segments of slope +1 or −1. When precomputation is performed, the bijection is initialized to the identity function. As the sequence is updated, the bijection is updated to reflect the moves performed on it. For instance, flipping a subsequence, as done by a 2-opt, introduces a segment of slope −1 over the flipped part; see the result of such a flip in Fig. 10.4d.

We also keep track of the last position in the sequence. It is named ω and is represented in the figures by the vertical line. Newly inserted nodes are mapped to a position beyond the end of the sequence as it was when precomputation was performed. This value is named α and is represented in the figures by the horizontal line. Through this mechanism,

R. D. Landtsheer et al.

Fig. 10.4 Illustrating how bijections can capture the evolution of the segment at precomputation time. Panels: (a) Insert at position 5 on the identity function; (b) Remove at position 8 on (10.4a); (c) Move of [2,4] after position 5 on (10.4b); (d) Flip of [5,7] on (10.4c).

we are able to keep track of any change, and any composition of changes, performed on the sequence, as illustrated in Fig. 10.4. The bijections are represented using immutable data structures. Rolling back to a checkpoint therefore amounts to restoring the bijection as it was when the checkpoint was defined.

From the bijection, we directly identify the subsequences that were present in the sequence when precomputation was performed. These are called segments. Segments present in the bijection fall into three categories:
(1) segments with precomputations and a positive slope;
(2) segments with precomputations and a negative slope (i.e. flipped);
(3) segments without precomputations.
Since the image of inserted nodes is always strictly greater than α, nodes with precomputations are easily distinguished from those without. The segments found in these bijections were present in the sequence when precomputation was performed (proofs are in [9]).
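The bookkeeping above can be illustrated with a small Python sketch. It is not the data structure of OscaR.CBLS or of [6]. Instead of storing the piecewise-linear function itself, it stores the equivalent ordered tuple of runs of slope +1 or −1. Inserted nodes are kept explicitly rather than mapped beyond α (our simplification). The names `identity`, `flip`, `insert`, `remove`, `classify` and `materialize` are hypothetical. Tuples are immutable, so every earlier bijection stays valid as a rollback checkpoint. Adjacent runs are not merged, which keeps the sketch short.

```python
from typing import List, Sequence, Tuple

# A run is ("pre", a, b): the nodes found at precomputation-time positions a..b,
# walked from a towards b (b < a means slope -1, i.e. a flipped run), or
# ("new", node): a node inserted after the precomputation was performed.
Run = Tuple
Bijection = Tuple[Run, ...]  # immutable, so an old value is a free checkpoint


def identity(n: int) -> Bijection:
    return (("pre", 0, n - 1),)


def _length(run: Run) -> int:
    return 1 if run[0] == "new" else abs(run[2] - run[1]) + 1


def _split(bij: Bijection, pos: int) -> Tuple[Bijection, Bijection]:
    """Cut the bijection so that the left part covers current positions < pos."""
    left: List[Run] = []
    right: List[Run] = []
    seen = 0
    for run in bij:
        n = _length(run)
        if seen + n <= pos:
            left.append(run)
        elif seen >= pos:
            right.append(run)
        else:  # pos falls strictly inside a precomputed run: cut it in two
            _, a, b = run
            step = 1 if b >= a else -1
            k = pos - seen
            left.append(("pre", a, a + step * (k - 1)))
            right.append(("pre", a + step * k, b))
        seen += n
    return tuple(left), tuple(right)


def flip(bij: Bijection, i: int, j: int) -> Bijection:
    """Reverse current positions i..j, as a 2-opt move does."""
    left, rest = _split(bij, i)
    middle, right = _split(rest, j - i + 1)
    flipped = tuple(r if r[0] == "new" else ("pre", r[2], r[1]) for r in reversed(middle))
    return left + flipped + right


def insert(bij: Bijection, pos: int, node) -> Bijection:
    left, right = _split(bij, pos)
    return left + (("new", node),) + right


def remove(bij: Bijection, pos: int) -> Bijection:
    left, rest = _split(bij, pos)
    return left + _split(rest, 1)[1]


def classify(bij: Bijection) -> List[Tuple]:
    """Label each run with the segment category it belongs to (positions, not nodes)."""
    out = []
    for r in bij:
        if r[0] == "new":
            out.append(("NewNode", r[1]))
        elif r[2] >= r[1]:
            out.append(("PreComputedSubSequence", r[1], r[2]))
        else:
            out.append(("FlippedPreComputedSubSequence", r[1], r[2]))
    return out


def materialize(bij: Bijection, original: Sequence) -> list:
    """Rebuild the current sequence from the runs (for testing only; O(n))."""
    seq = []
    for r in bij:
        if r[0] == "new":
            seq.append(r[1])
        else:
            step = 1 if r[2] >= r[1] else -1
            seq.extend(original[k] for k in range(r[1], r[2] + step, step))
    return seq
```

For example, `classify(flip(identity(10), 2, 6))` yields one positive run, one flipped run and one positive run. These correspond to categories (1), (2) and (1) above.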

10.3.4 Segment-Based API

The OscaR.CBLS framework provides an implementation of generic routing global constraints. It comes as an abstract class in which two main methods must be implemented:
• performPrecompute, which performs the precomputations;
• computeVehicleValue, which computes a value related to a given vehicle of the route, relying on the precomputed values.

An excerpt of the Scala code for the global constraint template is shown in Fig. 10.5. The implementation makes use of segments. A segment represents either a section of the route that was present when precomputation was performed, or a newly inserted node. There are three classes of segments, documented in Table 10.2.

Fig. 10.5 Template of a routing global constraint in OscaR.CBLS

Table 10.2 Segment classes for routing global constraints in OscaR.CBLS
• NewNode(node): a node that was not present in the initial sequence when the precomputation was performed.
• PreComputedSubSequence(startNode, startNodeValue:T, endNode, endNodeValue:T): a subsequence starting at startNode and ending at endNode that was present in the initial sequence when precomputation was performed.
• FlippedPreComputedSubSequence(startNode, startNodeValue:T, endNode, endNodeValue:T): a subsequence starting at startNode and ending at endNode such that the result of flipping it was present in the initial sequence when precomputation was performed.
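To make the division of labour concrete, here is a Python analogue of the template, instantiated for the route length of Sect. 10.3.2. It is only a sketch. The segment classes mirror Table 10.2, and `perform_precompute` and `compute_vehicle_value` mirror performPrecompute and computeVehicleValue. The signatures, the single-vehicle simplification and the `Values` pair type are our assumptions, not the OscaR.CBLS Scala API shown in Fig. 10.5.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union

Values = Tuple[int, int]  # T for the route length: (forward, backward) distance


@dataclass(frozen=True)
class NewNode:
    node: int


@dataclass(frozen=True)
class PreComputedSubSequence:
    start_node: int
    start_value: Values
    end_node: int
    end_value: Values


@dataclass(frozen=True)
class FlippedPreComputedSubSequence:
    start_node: int
    start_value: Values
    end_node: int
    end_value: Values


Segment = Union[NewNode, PreComputedSubSequence, FlippedPreComputedSubSequence]


class RoutingGlobalConstraint(ABC):
    """Hypothetical Python counterpart of the template: subclasses fill in two methods."""

    @abstractmethod
    def perform_precompute(self, route: Sequence[int]) -> Dict[int, Values]:
        """Attach a precomputed value to every node of the route."""

    @abstractmethod
    def compute_vehicle_value(self, segments: List[Segment]) -> int:
        """Evaluate the vehicle's route, given as a list of segments."""


class RouteLength(RoutingGlobalConstraint):
    def __init__(self, m: Sequence[Sequence[int]]):
        self.m = m

    def perform_precompute(self, route):
        fwd = bwd = 0
        values = {route[0]: (0, 0)}
        for prev, node in zip(route, route[1:]):
            fwd += self.m[prev][node]
            bwd += self.m[node][prev]
            values[node] = (fwd, bwd)
        return values

    def compute_vehicle_value(self, segments):
        total, last = 0, None
        for seg in segments:
            if isinstance(seg, NewNode):
                first, end, inner = seg.node, seg.node, 0
            elif isinstance(seg, FlippedPreComputedSubSequence):
                # travelled against the precomputed order: use backward distances
                first, end = seg.start_node, seg.end_node
                inner = seg.start_value[1] - seg.end_value[1]
            else:
                first, end = seg.start_node, seg.end_node
                inner = seg.end_value[0] - seg.start_value[0]
            if last is not None:
                total += self.m[last][first]  # arc joining two consecutive segments
            total += inner
            last = end
        return total
```

The evaluation loop touches each segment once, so its cost depends on the number of segments rather than on the route length.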

10.3.5 Logarithmic Reduction of Quadratic Precomputations

Some global constraints have no "−" operator, because the precomputed values lose information from one node to the next. The time window constraint of Sect. 10.5 is an example of such a constraint. Such constraints cannot derive the value of a segment from the difference between the values at its end and at its start. A possible fallback approach is to precompute the value of every possible segment of the route. This takes O(n^2)-time. In such cases, precomputation is more time consuming, and the global constraint only outperforms the naive one on neighbourhoods with more than O(n^2) neighbours. To cope with such cases, we propose a generic log-reduced global constraint. It works on steps of type S from one node to the next, and requires a composition operator that composes two such steps into a new one.

Fig. 10.6 Precomputations in log-reduced constraints (nodes n0 to n6)

It precomputes the step from each node to the next one. Through composition, it then computes the step from each even node to the next even node, from each 4th node to the next 4th node, and so on. This is illustrated in Fig. 10.6. This precomputation requires O(n) time, since the number of steps halves at each level. Based on this precomputation, the step that spans any segment of the route can be constructed out of O(log n) steps fetched from the precomputed values. In practice, these precomputed values are sent to the global constraint through an API similar to the one presented in Sect. 10.3.4.
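The following Python sketch shows one way to organize such a precomputation. It assumes that the composition operator is associative. Level k stores the step from node p·2^k to node (p+1)·2^k, and a query greedily takes the largest aligned block that fits. The class name `LogReducedSteps` and its methods are hypothetical. This is not the OscaR.CBLS implementation.

```python
from typing import Callable, Generic, List, Optional, TypeVar

S = TypeVar("S")


class LogReducedSteps(Generic[S]):
    """levels[k][p] is the step from node p * 2**k to node (p + 1) * 2**k."""

    def __init__(self, steps: List[S], compose: Callable[[S, S], S]):
        self.compose = compose
        self.levels: List[List[S]] = [list(steps)]  # steps[i]: node i -> node i + 1
        while len(self.levels[-1]) > 1:  # n + n/2 + n/4 + ... = O(n) compositions
            prev = self.levels[-1]
            self.levels.append([compose(prev[2 * p], prev[2 * p + 1])
                                for p in range(len(prev) // 2)])

    def query(self, i: int, j: int) -> S:
        """Composed step from node i to node j (i < j), using O(log n) stored blocks."""
        result: Optional[S] = None
        pos = i
        while pos < j:
            k = 0
            # take the largest block that starts at pos, is aligned and stays within j
            while (k + 1 < len(self.levels) and pos % (2 ** (k + 1)) == 0
                   and pos + 2 ** (k + 1) <= j):
                k += 1
            block = self.levels[k][pos >> k]
            result = block if result is None else self.compose(result, block)
            pos += 2 ** k
        return result
```

With tuple concatenation as the composition, `query(i, j)` must return the indices of the elementary steps i, …, j − 1 in order. That makes ordering and coverage mistakes easy to spot. For transfer functions, `compose` would be the chain operator of Sect. 10.5.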

10.4 A Global Constraint for Drone Autonomy

To illustrate our framework, we present a global constraint that computes and maintains the energy a quadcopter needs to fly over a given route. The route is a succession of points where the drone is expected to drop weights. The application scenario is selective spraying of phytosanitary products on corn fields. A model of the energy consumption of a drone in straight-line flight can be developed from various parameters, such as air friction, cargo weight and the maximal tilt of the drone. Through interpolation, this model was simplified into a quadratic formula:

$$E_{i,j} = d_{i,j}\,(a m_j^2 + b m_j + c)$$

where:
• E is the energy needed from the battery to perform the straight-line flight;
• d is the length of the straight-line flight;
• a, b, c are factors deduced from the interpolation, which represent the flight and energy capabilities of the drone;
• m is the overall mass of the drone, including the cargo weight.

The total energy consumption of a drone on a given route is therefore the sum of the energy consumption of each hop, knowing that the weight of the drone decreases by w_i at each node i:

$$E_{route} = \sum_{(i,j) \in route} d_{i,j}\,(a m_i^2 + b m_i + c) \qquad (10.1)$$

where

$$m_i = \sum_{k \in \text{route.suffixFrom}(i)} w_k \qquad (10.2)$$

Evaluating this formula naively on a route requires O(n)-time. We have engineered a factorization of (10.1) and (10.2) that achieves faster evaluation at the cost of some precomputation. The values to precompute on a route and associate with each node q are as follows:

$$E_q = \sum_{(i,j) \in \text{route.prefixTo}(q)} d_{i,j}\,(a m_i^2 + b m_i + c)$$
$$m_q = \sum_{k \in \text{route.suffixFrom}(q)} w_k$$
$$dm_q = \sum_{(i,j) \in \text{route.prefixTo}(q)} d_{i,j}\, m_i$$
$$l_q = \sum_{(i,j) \in \text{route.prefixTo}(q)} d_{i,j}$$

These values are computed on the routes and on the reversed routes, so each node has eight precomputed numerical values associated with it. They are computed in a daisy-chain fashion, so that precomputation is performed in O(n) time.

When evaluating a route over a list of segments, we proceed backwards, iterating over the segments from the end of the route to its start. The iteration maintains a state (m, i, E) with three components:
• m, the current mass of the drone;
• i, the current node;
• E, the energy needed to fly from the current node to the end of the route.

A backward step over a PreComputedSubSequence(startNode, startNodeValue, endNode, endNodeValue), taken from some state (m, i, E), yields the new state (m', startNode, E') as follows:

$$m_{seg} = m + w_i - \text{endNodeValue}.m$$
$$E_{seg} = (\text{endNodeValue}.E - \text{startNodeValue}.E) + 2a\,m_{seg}\,(\text{endNodeValue}.dm - \text{startNodeValue}.dm) + (a\,m_{seg} + b)\,m_{seg}\,(\text{endNodeValue}.l - \text{startNodeValue}.l)$$
$$m_{hop} = m + w_i$$
$$E_{hop} = d_{endNode,i}\,(a\,m_{hop}^2 + b\,m_{hop} + c)$$
$$m' = \text{startNodeValue}.m + m_{seg}$$
$$E' = E + E_{seg} + E_{hop}$$

The seg values represent the flight over the PreComputedSubSequence, and the hop values represent the flight from endNode to i. Stepping over flipped segments requires the values precomputed over the flipped route. Stepping over a new node requires applying the hop formulas twice. The correctness proof proceeds by expanding and simplifying the polynomial formulas. Evaluating a route is therefore performed in O(s)-time, where s is the number of segments.
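The E_seg formula rests on an algebraic identity. Suppose every hop of a segment is flown with its mass shifted by the same offset; in our reading, m_seg plays that role. The new energy can then be recovered from three sums taken over the segment: the energy, the distance-weighted mass and the distance. Differences of the prefix values E_q, dm_q and l_q give exactly these sums. The small Python check below uses integers, so equality is exact. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def hop_energy(d: int, m: int, a: int, b: int, c: int) -> int:
    """E = d * (a m^2 + b m + c) for one straight-line hop."""
    return d * (a * m * m + b * m + c)


def segment_sums(dists: Sequence[int], masses: Sequence[int],
                 a: int, b: int, c: int) -> Tuple[int, int, int]:
    """The three quantities summed over a run of hops: (E, dm, l)."""
    E = sum(hop_energy(d, m, a, b, c) for d, m in zip(dists, masses))
    dm = sum(d * m for d, m in zip(dists, masses))
    l = sum(dists)
    return E, dm, l


def shifted_energy(E: int, dm: int, l: int, delta: int, a: int, b: int) -> int:
    """Energy of the same hops when every mass is shifted by delta (the E_seg formula)."""
    return E + 2 * a * delta * dm + (a * delta + b) * delta * l
```

Expanding d(a(m + δ)² + b(m + δ) + c) hop by hop gives the original energy plus 2aδ·dm + (aδ + b)δ·l. This is why no per-hop work is needed once the three sums are known.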

10.5 Global Time Window Constraint

The time window (TW) constraint is another classical constraint that fits neatly into our framework. In a time window constraint, each node i has three constant values associated with it:
• e_i, the early line: if the vehicle arrives before it, it has to wait; otherwise it can enter the node straight away;
• d_i, the deadline: if the vehicle arrives after it, it is an error;
• q_i, the duration to be spent inside the node.

Furthermore, a matrix m_{i,j} specifies the travel duration from node i to node j.

Fig. 10.7 Transfer function; definition and numerical representation. Definition: f_{e,d,l}(t) = l if t ≤ e; t − e + l if e < t ≤ d; ERROR otherwise. The plot shows leave time against arrival time: constant at l up to e, then slope 1 up to d.

To fit such a constraint into our framework, we introduce the concept of a transfer function [10]. It maps the arrival time at a node to the leave time from that node, or to ERROR if the arrival time is after the deadline. We also use such functions to represent the temporal impact of route segments. A transfer function is defined by a triplet (e, d, l):
• e is the opening of the time window (as for a node);
• d is the latest arrival time (for a node, the deadline);
• l is the earliest leave time of the considered structure (for a node, it is equal to e).

The transfer function is defined in Fig. 10.7.

We define a chaining operator, chain. It takes two transfer functions, say f and g, plus an additional travel time, say m. It returns a new transfer function h = chain(g, m, f) such that h(t) = g(m + f(t)) for any t. The function h represents travelling first through f, then from f to g, and then through g. h has the same structure, hence it can be defined by a triplet computed from m and the e, d, l values of f and g.

We have implemented two versions of global time window constraints based on our generic framework and on transfer functions:
• a quadratic TW, where a precomputation assigns a transfer function to each subsequence of the current route; this takes O(n^2)-time, and the transfer function of each segment can be queried from this precomputation in O(1)-time;
• a log-reduced TW, which uses the logarithmic reduction presented in Sect. 10.3.5; precomputation is then performed in O(n)-time, and the transfer function of a segment is composed of O(log n) transfer functions, so the global evaluation time of the constraint is O(s log n).

Figure 10.8 presents a comparative benchmark of several versions of time window constraints on a VRPTW. It includes a naive constraint (no precomputation), the quadratic TW and the log-reduced TW.
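The triplet of chain(g, m, f) is not spelled out in this section. The Python sketch below uses our own derivation from h(t) = g(m + f(t)) and the definition of Fig. 10.7, and checks it by brute force. Two choices are assumptions:
• ERROR is encoded as `None`;
• a chain that always fails is encoded with d = −∞.

The evaluation tests the deadline first, so the triplet remains valid even when the computed d ends up below e. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferFunction:
    e: float  # opening of the time window
    d: float  # latest admissible arrival time
    l: float  # earliest leave time

    def __call__(self, t: float) -> Optional[float]:
        """Leave time for arrival time t, or None for ERROR (arrival after d)."""
        if t > self.d:
            return None
        return self.l + max(0, t - self.e)  # arrivals after e leave t - e later than l


def chain(g: TransferFunction, m: float, f: TransferFunction) -> TransferFunction:
    """Triplet of h such that h(t) = g(m + f(t)): go through f, travel m, go through g."""
    arrival = m + f.l  # earliest possible arrival at g
    if arrival > g.d:
        return TransferFunction(f.e, -math.inf, f.l)  # every arrival time leads to ERROR
    return TransferFunction(
        e=f.e + max(0, g.e - arrival),    # arrivals at f up to this time still wait at g
        d=min(f.d, f.e + g.d - arrival),  # latest arrival at f that still reaches g by g.d
        l=g.l + max(0, arrival - g.e),
    )
```

The chained function is again a triplet. Segment transfer functions can therefore be precomputed and stored either for every subsequence or in the log-reduced levels, and chained at query time.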
We used two search procedures:
(1) a realistic one, which starts from an empty route, inserts nodes, and then performs 3-opt moves;
(2) a pathological one, which only performs inserts and selects the first possible insert. It explores very few neighbours, so it barely uses the precomputations.

The explorations and the selected moves of the two search procedures do not depend on the constraint in use, as long as the constraints compute identical values, which is the case. The two search procedures perform identical pruning at start-up, based on a time criterion. All neighbourhoods select the first improving move on each dimension they explore. Their inner loop is restricted to the 20 most promising neighbours, based on a k-nearest heuristic.

Fig. 10.8 Average run time (in seconds) on 100 random VRPTW instances for the three time window constraints (Naive, Quadratic and Log-reduced) and various values of n (100, 200, …, 1k).
• Left panel: Insert exhaust 3-opt (y-axis from 0 to 10 s).
• Right panel: Insert first (pathological) (y-axis from 0 to 0.3 s).
• Both panels: x-axis n from 0 to 1000; curves N, Q, L.

The benchmarks show three things:
(1) on the realistic search procedure, the quadratic and log-reduced constraints achieve very similar efficiency, and they are up to 3.5 times faster than the naive constraint;
(2) on the pathological search procedure, the quadratic constraint is slower than the naive constraint, because the precomputations are barely used;
(3) for both procedures, the log-reduced constraint exhibits the best efficiency.

10.6 Related Work

One of the earliest works on the integration of global constraints with local search is [11]. It treats global constraints as generalizations of lower-level constraints that embed sets of heuristics, which can be used when updating local search variables. Similarly, [12] describes an adaptive search algorithm, which is roughly a way to explore local search neighbourhoods using constraint satisfaction problems. Another approach uses local search techniques to enrich the propagation of global constraints in constraint programming. An example is the branch and move algorithm described in [13] to solve the TV-Break packing problem.

The VRP is one of the most well-known problems in combinatorial optimization [14, 15]. Several VRP variants have been tackled with constraint programming and local search approaches. An architecture for VRP decision systems that uses constraint propagation is proposed in [16]. It makes it possible to consider and process, during problem resolution, the constraints identified by the work domain analysis. In [17], a model combining route capacity and length constraints is proposed. A kind of global constraint for the VRP, called inter-tour constraints, is defined in [18]. These constraints relate to resources shared by all the routes. By extending the route graph with Hamiltonian cycles, the Large Neighbourhood Search metaheuristic can be applied.


10.7 Conclusion and Perspectives

In this chapter, we extended the basic mechanisms that our sequence variable provides for managing global constraints. We did so by defining an abstract invariant template that makes their implementation much easier. We also contributed efficient mechanisms to manage the execution of the precomputation and the queries to it. Precomputation is an expensive operation. We therefore designed our framework so that, through a specific bijective function, it can exploit a precomputation even if the sequence has been modified since the precomputation was performed. We illustrated our template on a few global constraints.

As future work, we plan to capture more constraints in our framework, and to reduce the technical overhead of our global constraint framework as much as possible. We will also carry out a wider benchmarking campaign based on a larger set of examples. We will also compare our work with global constraints implemented in other engines, in terms of both performance and ease of development and evolution.

Acknowledgements This work is partly funded by the ePick Innovation Partnership Project from the Walloon Region of Belgium (grant nr. 7570). The global constraint for drone autonomy and the bijection mechanism were developed by Yann Ferezou and Quentin Meurisse, respectively, during their internships at CETIC.

References
1. F.W. Glover, G.A. Kochenberger, Handbook of Metaheuristics. International Series in Operations Research & Management Science (Springer, New York, US, 2003)
2. N. Mladenović, D. Urošević, S. Hanafi, Variable neighborhood search for the travelling deliveryman problem. 4OR 11, 57–73 (2013)
3. M.W.P. Savelsbergh, The vehicle routing problem with time windows: minimizing route duration. ORSA J. Comput. 4, 146–154 (1992)
4. OscaR Team, OscaR: operational research in Scala (2012). Available under the LGPL licence from https://bitbucket.org/oscarlib/oscar
5. R. De Landtsheer, C. Ponsard, Oscar.cbls: an open source framework for constraint-based local search, in Proceedings of ORBEL’27 (2013)
6. R. De Landtsheer, Y. Guyot, G. Ospina, F. Germeau, C. Ponsard, Reasoning on sequences in constraint-based local search frameworks, in Proceedings of the 15th International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, CPAIOR (2018)
7. R. De Landtsheer, Y. Guyot, G. Ospina, C. Ponsard, Combining neighborhoods into local search strategies, in Recent Developments of Metaheuristics (Springer, Berlin, 2018), pp. 43–57
8. R. De Landtsheer, F. Germeau, Y. Guyot, G. Ospina, C. Ponsard, Easily building complex neighbourhoods with the cross-product combinator, in Proceedings of the 32th ORBEL Annual Meeting, pp. 127–128 (2018)
9. Q. Meurisse, Development of a Capacity Global Constraint for Vehicle Routing. Internship Report (in French), University of Mons (2017)
10. F. Germeau, Optimisation de routage de véhicules : conception et implémentation de contrainte globale de temps. Master thesis (in French), University of Namur (2018)


11. A. Nareyek, Constraint-Based Agents. Volume 2062 of Lecture Notes in Computer Science (Springer, Berlin, 2001)
12. P. Codognet, D. Diaz, Yet another local search method for constraint solving, in Stochastic Algorithms: Foundations and Applications, International Symposium, SAGA 2001 Berlin, Germany, December 13–14, 2001, Proceedings, pp. 73–90 (2001)
13. T. Benoist, E. Bourreau, Improving global constraint support by local search, in CP01, Seventh International Conference on Principles and Practice of Constraint Programming, Paphos, Cyprus, LNCS 2239, pp. 61–76 (2001)
14. N. Labadie, C. Prins, C. Prodhon, N. Monmarché, P. Siarry, Metaheuristics for Vehicle Routing Problems. Wiley Online Library (2016)
15. P. Toth, D. Vigo, Vehicle Routing: Problems, Methods, and Applications (SIAM, Bangkok, 2014)
16. B. Gacias, J. Cegarra, P. Lopez, An interdisciplinary method for a generic vehicle routing problem decision support system, in International Conference on Industrial Engineering and Systems Management (IESM 2009) (2009)
17. M.T. Godinho, L. Gouveia, T.L. Magnanti, Combined route capacity and route length models for unit demand vehicle routing problems. Discret. Optim. 5, 350–372 (2008)
18. C. Hempsch, S. Irnich, Vehicle routing problems with inter-tour resource constraints, in The Vehicle Routing Problem: Latest Advances and New Challenges (Springer, Berlin, 2008), pp. 421–444

Chapter 11

Dynamic Simulated Annealing with Adaptive Neighborhood Using Hidden Markov Model

Mohamed Lalaoui, Abdellatif El Afia, and Raddouane Chiheb

Abstract Simulated Annealing (SA) is a stochastic local search algorithm whose efficiency depends on the adaptation of its neighbourhood structure. In this paper, we integrate a Hidden Markov Model (HMM) into SA to dynamically adapt the neighbourhood structure of simulated annealing at each iteration. HMMs have proven able to predict the optimal behavior of the neighbourhood function based on the search history. Experiments were performed on several benchmark functions, and the results were compared with other SA variants.

M. Lalaoui (B) · A. El Afia · R. Chiheb
National School of Computer Science and Systems Analysis, Mohammed V University, Rabat, Morocco
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_11

11.1 Introduction

Since the publication of the first article on the SA algorithm [1], researchers have tried to increase the convergence rate and to reduce the execution time of simulated annealing. Since that first version [1], research has focused on two strategies to improve the performance of SA:
• the implementation of parallel simulated annealing [2–4];
• the optimization of the cooling schedule and the adaptation of parameters.

In general, SA can be seen as an iterative improvement process composed of three functions: generation, acceptance and cooling. Extensive studies have been devoted to the update and acceptance functions. Only limited attention has been paid to the generation function, which governs the convergence of SA. When the


M. Lalaoui et al.

SA is applied to continuous optimization problems, adjusting the neighborhood range to the landscape of the given problem is very important, because the range significantly affects the accuracy of the solutions. In general, SA tends to get stuck in local minima when the neighborhood range is too small. In the special case of fast simulated annealing (FSA) [5], the state variation x is generated by a Cauchy distribution whose probability density is given by:

$$p(x, T_n) = \frac{1}{\pi}\,\frac{T_n}{x^2 + T_n^2} \qquad (11.1)$$

The Cauchy distribution allows SA to escape from local minima more easily than the normal distribution, because of its heavier tails [6]. The new neighborhood solution is generated using the formula:

$$x_{new} = x_{old} + \mu \times T_n \times \tan(u) \qquad (11.2)$$

The terms of (11.2) are:
• u, a random vector drawn from the uniform distribution U(−π/2, π/2);
• μ, the scale constant for tuning the probabilistic acceptance criterion, also called the learning rate;
• T_n, the temperature at stage n.

The temperature schedule is given by:

$$T_n = \frac{T_0}{1 + n} \qquad (11.3)$$

where T_0 is the initial temperature.

This paper explores the use of a Hidden Markov Model for the dynamic tuning of FSA during the run. The main idea is to adapt the convergence speed through the dynamic adjustment of the μ parameter. The rest of the paper is organized as follows. Section 11.2 is devoted to the literature review. Section 11.3 describes our proposed method for FSA parameter tuning through a Hidden Markov Model. Section 11.4 presents a performance analysis on several benchmark functions. Finally, Sect. 11.5 presents some conclusions.
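The three FSA ingredients (11.1) to (11.3) are small enough to write out directly. In the Python sketch below, the function names are hypothetical, and u is drawn independently for each coordinate. If u is uniform on (−π/2, π/2), then tan(u) is standard Cauchy. The step μ·T_n·tan(u) is therefore Cauchy with scale μ·T_n, and about half of all steps fall within ±μ·T_n. Because of the heavy tails, the occasional very long step lets the search leave a local minimum.

```python
import math
import random
from typing import List, Sequence


def temperature(T0: float, n: int) -> float:
    """Cooling schedule (11.3): T_n = T_0 / (1 + n)."""
    return T0 / (1 + n)


def cauchy_density(x: float, Tn: float) -> float:
    """Cauchy density (11.1) with scale T_n."""
    return Tn / (math.pi * (x * x + Tn * Tn))


def cauchy_neighbor(x: Sequence[float], mu: float, Tn: float,
                    rng: random.Random) -> List[float]:
    """Neighbour (11.2): x + mu * T_n * tan(u), with u ~ U(-pi/2, pi/2) per coordinate."""
    return [xi + mu * Tn * math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for xi in x]
```

In these terms, the learning rate μ acts as a multiplier on the scale of the Cauchy steps at a given temperature. This is the knob the HMM adjusts.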

11.2 Literature Review

Fast simulated annealing is the most widely used variant in the literature. The major drawback of its cooling policy is that it does not take into account the state of the system during the search. It is therefore difficult to adaptively modify the intensity of the search according to the difficulty of the problem. According to Battiti and Brunato [7], better performance can be achieved through self-analysis during the search, and researchers have accordingly integrated learning techniques into the neighborhood function. Corana [8] proposed an adaptive approach to adjust the neighborhood range of SA for continuous optimization problems. However, [9] showed that this method is not better than SA with a well-chosen neighborhood range. Miki et al. [9]

11 Dynamic Simulated Annealing with Adaptive Neighborhood …


tried to enhance the performance of SA by adjusting the neighborhood range to the landscape of the given problem, using opposition-based learning. In fact, opposition-based learning only increases the diversity of the candidate solutions, by selecting not only a random guess but also its opposite.

Furthermore, [10] proposed an SA based on an adaptive search vector, which guides the search direction according to the landscape of the problem. This adaptive search vector is based on Powell's method [11], a direct search algorithm. It was reported in [12] that Powell's method may fail to converge when the problem dimension is greater than 30. In addition, the experiments performed in [10] do not exceed ten dimensions.

A simulated annealing algorithm whose neighborhood size is adjusted according to the temperature parameter was proposed in [5]. The efficiency of this method depends on a good adjustment of the temperature parameter, so problem-specific tuning is still necessary.

Machine learning methods have proven effective at making accurate predictions. They have been applied to adapt metaheuristics, either offline or online. The offline approach configures the algorithm's features before executing it, while the online approach adjusts them during the run. In this paper, we apply a Hidden Markov Model to predict the best cooling law parameter.

The Hidden Markov Model (HMM) [13] is a probabilistic model. It has been applied in many fields where information is not immediately observable but depends on other observable data. HMMs are the most successful approach in speech recognition, and they have been successfully applied to the whole spectrum of recognition tasks. Their success is due to their ability to deal with variability by means of stochastic modeling.
In addition, HMMs have been used successfully to enhance the behavior of metaheuristics by estimating their best configuration [14–34]. The main idea behind our approach is to control the behavior of simulated annealing at each temperature stage. We do so by predicting the optimal neighborhood search strategy from the history of the run.

11.3 The Proposed Approach

We hybridize FSA with an HMM to enhance its performance. During the run, the hidden Markov model performs a classification based on an observable sequence generated from a set of rules. This sequence allows the model to infer the hidden state, which can be exploration, exploitation, or re-annealing to escape from a local minimum (see Fig. 11.1). The Hidden Markov Model can be defined as a 5-tuple (S, O, A, B, p^0) where:
– S = {S1, S2, S3} is the set of hidden states, corresponding respectively to a low learning rate (μ = 0.2), a medium learning rate (μ = 0.5) and a high learning rate (μ = 0.8).
– O = (O1, O2, ..., O5) is the set of observations per state.


Fig. 11.1 Markov chain for simulated annealing algorithm

– A = (a_ij) is the transition probability matrix, where a_ij is the probability that the state at time t + 1 is S_j, given that the state at time t is S_i.
– p^0 = (p_1^0, p_2^0, p_3^0) is the initial probability distribution, where p_i^0 is the probability of starting in state S_i.
– B = (b_it) are the observation probabilities, where b_it is the probability of observing O_t in state S_i.

The observation matrix B of the hidden Markov model is estimated at an early stage by Maximum Likelihood Estimation (MLE). To estimate it, we generate a sequence of states. For each temperature stage, this sequence contains the cooling schedule with the minimum variance in the cost V(T) at equilibrium. This variance in cost is defined as:

$$V(T, f) = \langle f^2(T) \rangle - \langle f(T) \rangle^2 \qquad (11.4)$$

Here f is the cost function of the stationary distribution. ⟨f(T)⟩ is the expected cost at equilibrium over the N last best cost values, defined as:

$$\langle f(T) \rangle = \frac{\sum_{x_{F-N} \le x \le x_F} f(x)\, \exp\left(-\frac{f(x)}{T}\right)}{\sum_{x_{F-N} \le x \le x_F} \exp\left(-\frac{f(x)}{T}\right)} \qquad (11.5)$$

The next step consists of generating the observations, as shown in Algorithm 1. The main purpose of our model is to estimate the states S that best explain the observation sequence O = O1 O2 ... OT, given a model λ = (A, B, π). First, we estimate the transition and emission probabilities from the first observation sequence using supervised training. In this training, we count the frequencies of the transitions and emissions of the model (see Algorithm 2).


Algorithm 1 Generate_Observation
Input T, S1, S2, S3, {f(x), x_{F−N} ≤ x ≤ x_F}
Output O, the current observation
Begin
  for i = 1 to 3 do
    V_i = V(S_i(T), f)
  end for
  temp ← V_1; O ← 1
  for i = 1 to 2 do
    if V_{i+1} < temp then
      temp ← V_{i+1}; O ← i + 1
    end if
  end for
  return O
End
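Equations (11.4) and (11.5), together with the selection step of Algorithm 1, can be sketched in Python as follows. The section does not specify how the three variances V(S_i(T), f) are produced from the states. The sketch therefore takes them as given input. Its function names are hypothetical. Subtracting the minimum cost before exponentiating is our addition: it leaves the Boltzmann weights unchanged up to a common factor and avoids underflow at low temperature.

```python
import math
from typing import Sequence, Tuple


def boltzmann_mean_var(costs: Sequence[float], T: float) -> Tuple[float, float]:
    """<f(T)> as in (11.5) and V = <f^2> - <f>^2 as in (11.4), over the last N best costs."""
    fmin = min(costs)  # common shift: weights change only by a constant factor
    weights = [math.exp(-(c - fmin) / T) for c in costs]
    z = sum(weights)
    mean = sum(w * c for w, c in zip(weights, costs)) / z
    mean_sq = sum(w * c * c for w, c in zip(weights, costs)) / z
    return mean, mean_sq - mean * mean


def observation_from_variances(variances: Sequence[float]) -> int:
    """1-based index of the smallest variance: the observation returned by Algorithm 1."""
    best = 0
    for i in range(1, len(variances)):
        if variances[i] < variances[best]:
            best = i
    return best + 1
```

With variances [1, 3, 2], the selection returns observation 1, the state with the smallest variance.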

Algorithm 2 MLE
Input O = (O1 O2 ... OT)
Output A = (a_ij), B = (b_it)
Begin
  for i = 1 to T − 1 do
    a_{O_i O_{i+1}} ← a_{O_i O_{i+1}} + 1
  end for
  for i = 1 to T do
    b_{O_i O_i} ← b_{O_i O_i} + 1
  end for
  for i = 1 to 3 do
    A_i = Σ_{j=1}^{3} a_ij and B_i = Σ_{j=1}^{T} b_ij
  end for
  for i = 1 to 3 do
    for j = 1 to 3 do
      a_ij ← a_ij / A_i
    end for
    for t = 1 to T do
      b_it ← b_it / B_i
    end for
  end for
  return A, B
End
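The counting scheme of Algorithm 2 translates directly into Python. In this sketch, observations are 0-based labels and each step is labelled with the state of the same index, as in the supervised counting above. Two details are our assumptions:
• the emission matrix is square (one symbol per state);
• rows never visited fall back to a uniform distribution instead of dividing by zero.

The function name `mle` is hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mle(observations: Sequence[int], n_states: int = 3) -> Tuple[Matrix, Matrix]:
    """Supervised counting as in Algorithm 2; returns row-normalized A and B."""
    A = [[0.0] * n_states for _ in range(n_states)]
    B = [[0.0] * n_states for _ in range(n_states)]
    for o, o_next in zip(observations, observations[1:]):
        A[o][o_next] += 1  # transition count
    for o in observations:
        B[o][o] += 1  # emission count: the state is labelled by its own observation
    for M in (A, B):
        for row in M:
            total = sum(row)
            for k in range(n_states):
                # unvisited rows become uniform (our assumption, avoids 0 / 0)
                row[k] = row[k] / total if total else 1.0 / n_states
    return A, B
```

For the sequence [0, 0, 1, 2, 1], the first row of A is [0.5, 0.5, 0.0]: state 0 is followed once by itself and once by state 1.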

11.3.1 Viterbi Algorithm

We then use the Viterbi algorithm (Algorithm 3) to select the state sequence Q = q1 q2 ... qT that best explains the observations. Second, the Baum–Welch algorithm adjusts the model parameters λ = (A, B, π) to maximize P(O | λ), i.e., the probability of the observation sequence given the model.

Algorithm 3 Viterbi
Input S = (s1, s2, s3), O = (O1 O2 ... OT), A = (a_ij), B = (b_it), p^0 = (π_1^0, π_2^0, π_3^0)
Output s* = (s*_1, s*_2, ..., s*_T), the most probable sequence of states
Begin
  for i = 1 to 3 do   (initialization)
    δ_1(i) = π_i b_i(o_1) and ψ_1(i) = 0
  end for
  for t = 2 to T do
    for j = 1 to 3 do
      δ_t(j) = max_{i=1..3} [δ_{t−1}(i) a_ij] b_j(o_t)
      ψ_t(j) = argmax_{i=1..3} [δ_{t−1}(i) a_ij]
    end for
  end for
  P_max = max_{i=1..3} [δ_T(i)]
  s*_T = argmax_{i=1..3} [δ_T(i)]
  for t = T − 1 down to 1 do
    s*_t = ψ_{t+1}(s*_{t+1})
  end for
  return s*
End
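Algorithm 3 can be sketched in Python as shown below. States are 0-based and probabilities are plain products. This is fine for the short windows of about ten observations used here. Longer sequences would normally use log-probabilities to avoid underflow. The function name `viterbi` is hypothetical.

```python
from typing import List, Sequence, Tuple


def viterbi(obs: Sequence[int], A, B, pi) -> Tuple[List[int], float]:
    """Most probable state sequence (0-based) and its probability, as in Algorithm 3."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]  # initialization
    psi: List[List[int]] = []
    for o in obs[1:]:
        back, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            back.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        psi.append(back)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for back in reversed(psi):  # backtracking, done once after the forward pass
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[last]
```

The backtracking runs once, after all δ values are known. This is why it sits outside the main loop over t.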

11.3.2 Baum–Welch Algorithm

The Baum–Welch algorithm is used to adjust the parameters of the HMM. This training step is based on the Forward–Backward algorithm.

11.3.2.1 Forward Algorithm

The first algorithm used by Baum–Welch is the Forward algorithm (Algorithm 4). It returns the forward variable α_j(t), defined as the probability of the partial observation sequence up to time t, with state S_j at time t:

$$\alpha_j(t) = P(O_1 O_2 \ldots O_t,\ q_t = S_j \mid \lambda) \qquad (11.6)$$

We define P(O | λ) as the probability of the observation sequence given the model λ.


Algorithm 4 Forward
Input S = (s1, s2, s3), O = (O1 O2 ... OT), A = (a_ij), B = (b_it), π^0 = (π_1^0, π_2^0, π_3^0)
Output α = (α_t(j)), P(O | λ)
Begin
  for i = 1 to 3 do
    α_1(i) = π_i b_i(o_1)
  end for
  for t = 1 to T − 1 do
    for j = 1 to 3 do
      α_{t+1}(j) = [Σ_{i=1}^{3} α_t(i) a_ij] b_j(o_{t+1})
    end for
  end for
  P(O | λ) = Σ_{i=1}^{3} α_T(i)
  return α, P(O | λ)
End

11.3.2.2 Backward Algorithm

The second algorithm used by Baum–Welch is the Backward algorithm (Algorithm 5). It computes the backward variable β_i(t), defined as the probability of the partial observation sequence after time t, given state S_i at time t:

$$\beta_i(t) = P(O_{t+1} O_{t+2} \ldots O_T \mid q_t = S_i, \lambda) \qquad (11.7)$$

The Baum–Welch algorithm then uses the forward and backward variables to re-estimate the parameters of the model so as to maximize the probability of the observation sequence.

Algorithm 5 Backward

Input: $S = (s_1, s_2, s_3)$, $O = (O_1 O_2 \ldots O_T)$, $A = (a_{ij})$, $B = (b_i(O_t))$, $\pi^0 = (\pi_1^0, \pi_2^0, \pi_3^0)$
Output: $\beta = (\beta_t(i))$, the probabilities of the partial observation sequences
Begin
    for $i = 1$ to 3 do
        $\beta_T(i) = 1$
    end for
    for $t = T - 1$ down to 1 do
        for $i = 1$ to 3 do
            $\beta_t(i) = \sum_{j=1}^{3} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$
        end for
    end for
    return $\beta$
End
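The sketch below adds the backward pass of Algorithm 5 and one Baum–Welch re-estimation step, using the standard re-estimation formulas for $\pi$, $A$ and $B$ described in Rabiner's tutorial [13]. It is illustrative only: the model values and the 10-symbol observation window are hypothetical, no scaling is applied (so it suits only short windows such as the 10 observations used in Algorithm 6), and the forward pass is repeated so that the block runs on its own.

```python
def forward(obs, pi, A, B):
    """Unscaled forward variables alpha[t][i] (0-based time index)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(len(obs) - 1):
        alpha.append([sum(alpha[t][i] * A[i][j] for i in range(n)) * B[j][obs[t + 1]]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """Algorithm 5: beta_T(i) = 1, beta_t(i) = sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    return beta


def baum_welch_step(obs, pi, A, B):
    """One EM re-estimation of (pi, A, B); also returns P(O | lambda) before the update."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = sum(alpha[T - 1])
    # gamma[t][i]: probability of being in state i at time t, given O
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)] for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j between t and t + 1, given O
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, p_obs


# Hypothetical model and a hypothetical window of 10 observations
pi = [0.6, 0.3, 0.1]
A = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.3, 0.3, 0.4]]
obs = [0, 2, 2, 1, 0, 0, 1, 2, 2, 2]
new_pi, new_A, new_B, p_before = baum_welch_step(obs, pi, A, B)
p_after = sum(forward(obs, new_pi, new_A, new_B)[-1])
print(p_before, p_after)  # an EM step should not decrease the likelihood
```

The backward variables give an independent route to the same likelihood, $P(O \mid \lambda) = \sum_i \pi_i \, b_i(O_1) \, \beta_1(i)$, which is a convenient check on both passes.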

In the following, we present a variant of simulated annealing based on hidden Markov models. The motivation for hybridizing simulated annealing with an HMM is to improve SA performance: by changing the cooling schedule, the approach allows SA to switch between slow and rapid cooling.


The hybridization of simulated annealing with an HMM, using the Baum–Welch and Viterbi algorithms, is presented in Algorithm 6.

Algorithm 6 HMM-FSA algorithm

Input: $f$, the cost function; $T_0$, the initial temperature; $T_f$, the final temperature; $T_n$, the temperature at the $n$-th stage; $L_{\max}$, the length of a temperature stage; $x_0$, the initial solution; $O$, the (initially empty) sequence of observations
Output: $x_{best}$, the best solution
Begin
    Initialization: $x \leftarrow x_0$; $cmp \leftarrow 1$; $u \leftarrow$ random vector from the uniform distribution
    while $T_f \le T_n$ do
        for $k = 1$ to $L_{\max}$ do
            $x_{new} \leftarrow x + \mu \times T_n \times \tan(u)$
            if $f(x_{new}) - f(x) \le 0$ then
                $x \leftarrow x_{new}$
            else
                generate a pseudorandom number $r$ from the uniform distribution
                if $r < \exp\left(-\frac{f(x_{new}) - f(x)}{T_n}\right)$ then
                    $x \leftarrow x_{new}$
                end if
            end if
        end for
        $O_n \leftarrow$ Generate-Observation($T_n$, $f(x)$, $x_{best}^{F-9} \le x \le x_{best}^{F}$)
        if $cmp \le 10$ then
            $O \leftarrow [O, O_n]$
            $[A, B] \leftarrow$ MLE($O$)
            $state \leftarrow$ Viterbi($O$, $A$, $B$)
            $cmp \leftarrow cmp + 1$
        else
            $O \leftarrow [O_{n-9}, \ldots, O_n]$
            $[A, B] \leftarrow$ Baum–Welch($O$, $A$, $B$)
            $state \leftarrow$ Viterbi($O$, $A$, $B$)
        end if
        $\mu \leftarrow state$
        $T_n \leftarrow T_0 / (1 + n)$
        $n \leftarrow n + 1$
    end while
    $x_{best} \leftarrow x$
End
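To make the control flow of Algorithm 6 concrete, the following Python sketch implements its outer and inner loops: moves scaled by $\mu \times T_n$, the Metropolis acceptance test, and the cooling $T_n = T_0/(1+n)$. It is a simplified sketch, not the authors' Scilab implementation, and several parts are assumptions: the whole HMM part (Generate-Observation, MLE, Baum–Welch, Viterbi and the mapping from the Viterbi state to $\mu$) is replaced by a hypothetical `choose_mu` callback that receives the per-stage history; the move uses the standard Cauchy draw $\tan(\pi(u - 1/2))$ of fast simulated annealing [5] with a fresh uniform $u$ per coordinate and per move, rather than $\tan(u)$ as written in Algorithm 6; the sketch returns the best stage-end solution, whereas the pseudocode returns the final $x$; and the final temperature in the usage line is raised to $10^{-2}$ so the demo runs quickly.

```python
import math
import random


def constant_mu(history):
    """Hypothetical stand-in for the HMM: always returns mu = 1 (plain FSA)."""
    return 1.0


def fsa_stage(f, x, fx, T_n, mu, L_max, rng):
    """Inner loop of Algorithm 6 at temperature T_n with step multiplier mu."""
    for _ in range(L_max):
        # Cauchy-distributed step scaled by mu * T_n (assumed form, see lead-in)
        x_new = [xi + mu * T_n * math.tan(math.pi * (rng.random() - 0.5)) for xi in x]
        f_new = f(x_new)
        delta = f_new - fx
        # Metropolis rule: accept improvements, accept worse moves with prob. exp(-delta/T_n)
        if delta <= 0 or rng.random() < math.exp(-delta / T_n):
            x, fx = x_new, f_new
    return x, fx


def hmm_fsa_sketch(f, x0, T0, T_final, L_max, choose_mu=constant_mu, seed=0):
    """Outer loop of Algorithm 6 with the HMM replaced by a hypothetical hook."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    history = []  # (T_n, f(x)) per stage: the raw material an HMM would classify
    n, T_n = 0, T0
    while T_final <= T_n:
        mu = choose_mu(history)
        x, fx = fsa_stage(f, x, fx, T_n, mu, L_max, rng)
        if fx < best_f:  # best solution seen at the end of a stage
            best_x, best_f = list(x), fx
        history.append((T_n, fx))
        n += 1
        T_n = T0 / (1 + n)  # cooling schedule of Algorithm 6
    return best_x, best_f


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f = hmm_fsa_sketch(sphere, [3.0, -2.0], T0=10.0, T_final=1e-2, L_max=20)
print(best_x, best_f)
```

With the schedule $T_0/(1+n)$, reaching $T_f = 10^{-5}$ from $T_0 = 10$ takes about $10^6$ stages, which is why the demo stops earlier; replacing `constant_mu` with a function that classifies `history` is where the HMM-based adaptation would plug in.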


11.4 Experiments

Our experiments were designed to measure the effects of hybridizing HMM and SA and to show how our approach can improve solution quality. We selected ten benchmark functions from the literature, presented in Table 11.1. They are divided into two groups: unimodal functions, which have no local minimum other than the global one, and multimodal functions, which have many local minima.
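For readers who want to reproduce the setup, four of the benchmark functions in Table 11.1 can be written in Python as below. The formulas are the common textbook definitions and are an assumption: the chapter does not state the exact formulas or search domains used. The usage lines evaluate each function at a known global minimizer.

```python
import math


def six_hump_camel(x, y):
    return (4 - 2.1 * x**2 + x**4 / 3) * x**2 + x * y + (-4 + 4 * y**2) * y**2


def rastrigin(xs):
    return 10 * len(xs) + sum(x * x - 10 * math.cos(2 * math.pi * x) for x in xs)


def de_jong(xs):
    """De Jong's first function (sphere)."""
    return sum(x * x for x in xs)


def branin(x, y):
    b, c, t = 5.1 / (4 * math.pi**2), 5 / math.pi, 1 / (8 * math.pi)
    return (y - b * x**2 + c * x - 6) ** 2 + 10 * (1 - t) * math.cos(x) + 10


print(six_hump_camel(0.0898, -0.7126))  # close to -1.0316
print(rastrigin([0.0, 0.0]), de_jong([0.0, 0.0]))
print(branin(math.pi, 2.275))  # close to 0.397887
```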

11.4.1 Experimental Setup

The proposed hybridization of the SA algorithm and the HMM was coded in the Scilab programming language, and the experiments were conducted on a PC with an Intel Core i7-5500U at 2.40 GHz (4 CPUs) and 8 GB of RAM. For convenience, the vanilla version of fast simulated annealing [5] is denoted FSA, the fuzzy fast simulated annealing [19] is denoted Fuzzy FSA, and the hybridization of SA and HMM is denoted HMM-FSA. These variants were tested on the benchmark functions presented above, with 50 trials per function. To eliminate the effects of other factors that play an important role in the performance of the algorithm, we used the same starting point for all methods in each run, located far from the basins of attraction of the global minima. We also used the same initial acceptance probability and identical lengths for the inner and outer loops. The final temperature of the cooling process, $T_{final}$, was taken close to zero: $T_{final} = 10^{-5}$. The initial temperature $T_0$ was calculated from the mean cost increase $\overline{\Delta f}$ observed during initialization: before SA starts, $\overline{\Delta f}$ is estimated over a constant number of 100 moves, and $T_0$ is then given by [35]

$$T_0 = -\frac{\overline{\Delta f}}{\ln P_0}$$

where $P_0$ is the initial average acceptance probability, taken equal to 0.95. The length $N$ of the observed sequence was set to 10.

Table 11.1 The benchmark functions

No.   Name              Global minimum
1     Six-Hump Camel    −1.0316
2     Rastrigin         0
3     De Jong           0
4     Schaffer N°4      0.292579
5     Colville          0
6     Levy N°13         0
7     Branin            0.397887
8     Quadric           0
9     Shubert           −186.7309
10    Bukin             0
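The rule $T_0 = -\overline{\Delta f}/\ln P_0$ can be sketched as follows. The chapter does not specify how the 100 trial moves are generated, so the uniform perturbations of size `step` around $x_0$, and counting only cost increases ($\Delta f > 0$) in the mean, are assumptions made for illustration. With $P_0 = 0.95$, $-1/\ln P_0 \approx 19.5$, so $T_0$ is roughly 19.5 times the mean cost increase.

```python
import math
import random


def initial_temperature(f, x0, moves=100, step=1.0, p0=0.95, seed=0):
    """Estimate T0 = -mean(delta_f) / ln(P0) from random trial moves around x0."""
    rng = random.Random(seed)
    f0 = f(x0)
    rises = []
    for _ in range(moves):
        x = [xi + rng.uniform(-step, step) for xi in x0]
        delta = f(x) - f0
        if delta > 0:  # only cost increases enter the mean (assumption)
            rises.append(delta)
    if not rises:
        raise ValueError("no uphill move observed; increase moves or step")
    mean_rise = sum(rises) / len(rises)
    return -mean_rise / math.log(p0)


# A cost that rises by exactly 1 for any move away from x0 gives T0 = -1/ln(0.95)
print(initial_temperature(lambda x: 0.0 if x == [0.0, 0.0] else 1.0, [0.0, 0.0]))
```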

11.4.2 Numerical Results

The computational results and statistical analyses are summarized in Table 11.2, which details the results for each test function. The overall best solution over the 50 replications is shown in bold. HMM-FSA provides the best solution for the test functions $f_1, \ldots, f_{10}$ except $f_3$, and in general it outperforms the other variants on most benchmark functions. Although this hybridization may significantly increase the complexity of the algorithm, our main objective is to demonstrate how a machine learning technique such as an HMM can improve solution quality.

Table 11.2 Comparison of results between HMM-FSA, Fuzzy FSA and FSA

Function   Statistic   HMM-FSA     Fuzzy FSA   FSA
1          Best        −1.03E+00   1.23E−03    4.67E+01
           Mean        −1.03E+00   7.32E+02    1.35E+04
2          Best        5.99E−07    2.61E−06    3.84E−01
           Mean        1.27E−03    2.75E−06    5.39E+00
3          Best        8.77E−12    1.26E−12    1.21E−04
           Mean        2.48E−06    1.34E−06    4.73E−03
4          Best        5.00E−01    5.56E−01    5.00E−01
           Mean        5.00E−01    6.92E−01    5.00E−01
5          Best        1.22E−03    6.23E+00    1.43E+02
           Mean        2.43E+02    3.33E+05    3.92E+03
6          Best        3.60E−09    1.18E−06    8.62E−06
           Mean        4.32E−05    1.21E−02    4.91E−02
7          Best        3.61E−01    8.45E−01    3.67E−01
           Mean        3.61E−01    2.88E+00    3.67E−01
8          Best        1.08E−07    1.17E−06    6.86E−05
           Mean        6.92E−04    1.14E−03    5.72E−03
9          Best        −1.87E+02   1.37E−09    −1.87E+02
           Mean        −1.87E+02   3.72E−06    −1.87E+02
10         Best        1.45E−02    2.03E−01    8.23E−02
           Mean        3.73E−01    7.97E−01    6.48E−01


11.4.3 Comparison of Convergence Performance

For further insight into the convergence behavior of our approach, HMM-FSA was compared with the other variants. The experiments were designed to measure the effects of the hybridization of FSA and HMM presented in the previous section. We observed that HMM-FSA can converge rapidly to the global minimum, and the time gained in the early stages can then be used to converge to a better solution. This behavior is depicted in Figs. 11.2, 11.3, 11.4 and 11.5. In most cases, HMM-FSA gives better results than the other variants, except for function $f_3$, for which the fuzzy FSA gives a better solution over the 50 runs. Figures 11.2–11.5 also show that the mean cost of HMM-FSA converges to the best solution. There are, however, certain stages at which the other SA variants outperform HMM-FSA; for example, fuzzy simulated annealing converges faster than HMM-FSA on the Schaffer N°4 function in the early stages. Nevertheless, HMM-FSA converges faster than the other variants on functions $f_3$, $f_7$ and $f_8$.

Fig. 11.2 De Jong function

Fig. 11.3 Quadric function


Fig. 11.4 Schaffer N°4 function

Fig. 11.5 Branin function

11.4.4 Statistical Analysis

The significance of the results was evaluated using Friedman's test [36, 37], a non-parametric counterpart of the parametric repeated-measures ANOVA. The hypotheses of Friedman's test are formulated as follows. $H_0$: the rankings of the metaheuristics within each problem are equal (i.e., there is no difference between them), so that, for instance, the population medians are equal:

$$H_0: \mu_1 = \cdots = \mu_k \qquad (11.8)$$

$H_1$: at least one of the metaheuristics performs differently from at least one of the others:

$$H_1: \mu_1, \ldots, \mu_k \text{ not all equal} \qquad (11.9)$$

The Friedman statistic is calculated from the average ranks $R_j$ as

$$\chi_F^2 = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right] \qquad (11.10)$$

With $N = 10$, $k = 3$ and the average ranks of Table 11.3, $\chi_F^2 = 10 \times (12.75 - 12) = 7.5$. Under $H_0$, $\chi_F^2$ follows a $\chi^2$ distribution with $k - 1 = 2$ degrees of freedom, and the Iman–Davenport statistic $F_F = (N-1)\chi_F^2 / (N(k-1) - \chi_F^2) = 5.4$ follows the F-distribution with $k - 1 = 2$ and $(k-1)(N-1) = 18$ degrees of freedom. Since $\chi_F^2 = 7.5$ exceeds the $\chi^2$ critical value of 5.99 and $F_F = 5.4$ exceeds the critical value $F(2, 18) = 3.55$ [38], we reject the null hypothesis at the significance level $\alpha = 0.05$ and conclude that the performance of the algorithms differs significantly. A post hoc test can then be applied to determine whether algorithms $i$ and $j$ differ.

To obtain the ranks, we rank the results of the metaheuristics on each benchmark function, giving 1 to the best algorithm and 3 to the worst. Let $r(p_{ij})$ be the rank of the $j$th of $k$ algorithms on the $i$th of $N$ benchmark functions, where $k = 3$ and $N = 10$ in our experiment. The average ranks of the algorithms, computed by Eq. (11.11), are shown in Table 11.3:

$$R_j = \frac{1}{N} \sum_{i=1}^{N} r(p_{ij}), \quad j \in \{1, 2, 3\} \qquad (11.11)$$

Table 11.3 The rank of each algorithm on each benchmark function and their average rank

Function             HMM-FSA   Fuzzy FSA   FSA
f1                   1         2           3
f2                   1         2           3
f3                   2         1           3
f4                   1         3           2
f5                   1         2           3
f6                   1         2           3
f7                   1         3           2
f8                   1         2           3
f9                   1         3           1
f10                  1         3           2
Average rank (R_j)   1.1       2.3         2.5

The average ranks by themselves provide a useful performance comparison. As Table 11.3 shows, HMM-FSA ranks first with an average rank of 1.1, followed by Fuzzy FSA with 2.3, while FSA ranks third with 2.5.
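The Friedman statistic can be recomputed from the ranks of Table 11.3 with a few lines of Python. The sketch uses the standard formula $\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ and the Iman–Davenport statistic $F_F = \frac{(N-1)\chi_F^2}{N(k-1)-\chi_F^2}$, which is the quantity compared against $F(k-1, (k-1)(N-1))$. The ranks are copied exactly as reported, including the tie on $f_9$ (ranks 1 and 1 rather than averaged ranks of 1.5).

```python
# Ranks from Table 11.3, one list per algorithm over f1..f10
ranks = {
    "HMM-FSA":   [1, 1, 2, 1, 1, 1, 1, 1, 1, 1],
    "Fuzzy FSA": [2, 2, 1, 3, 2, 2, 3, 2, 3, 3],
    "FSA":       [3, 3, 3, 2, 3, 3, 2, 3, 1, 2],
}


def friedman(rank_lists):
    """Return (average ranks R_j, chi2_F, Iman-Davenport F_F)."""
    k = len(rank_lists)
    N = len(rank_lists[0])
    R = [sum(r) / N for r in rank_lists]  # Eq. (11.11)
    chi2 = 12 * N / (k * (k + 1)) * (sum(x * x for x in R) - k * (k + 1) ** 2 / 4)
    f_f = (N - 1) * chi2 / (N * (k - 1) - chi2)
    return R, chi2, f_f


R, chi2, f_f = friedman(list(ranks.values()))
print(R, round(chi2, 3), round(f_f, 3))  # [1.1, 2.3, 2.5] 7.5 5.4
```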

11.5 Conclusion and Future Research

In this study, we proposed a dynamic simulated annealing with an adaptive neighborhood based on a hidden Markov model, and we tested its performance on a number of benchmark functions. The approach controls the neighborhood steps based on the history of the search: the HMM parameters are calculated and updated at each cooling step, and the Viterbi algorithm is then used to classify the observed sequence and select the appropriate learning value. Comparisons of the proposed approach with simulated annealing based on fuzzy control and with the vanilla version of FSA demonstrate that simulated annealing based on an HMM is able to find better solutions in reasonable time. Our approach manages time by rapidly decreasing the temperature and thus anticipating the exploitation state, which leads to better convergence. Future research may focus on applying our method to real optimization problems.

References

1. S. Kirkpatrick, Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
2. E. Aarts, F. Bont, E. Habers, P. van Laarhoven, Parallel implementations of the statistical cooling algorithm. Integr. VLSI J. 4, 209–238 (1986)
3. P. Banerjee, M. Jones, A parallel simulated annealing algorithm for standard cell placement on a hypercube computer, in The Proceedings of the IEEE International Conference on Computer-Aided Design (1986), pp. 34–37
4. A. Casotto, F. Romeo, A. Sangiovanni-Vincentelli, A parallel simulated annealing algorithm for the placement of macro-cells, in The Proceedings of the IEEE International Conference on Computer-Aided Design (1986), pp. 30–33
5. H.H. Szu, R.L. Hartley, Fast simulated annealing. Phys. Lett. A 122, 157–162 (1987)
6. P. Melin, F. Olivas, O. Castillo, F. Valdez, J. Soria, M. Valdez, Optimal design of fuzzy classification systems using PSO with dynamic parameter adaptation through fuzzy logic. Expert Syst. Appl. 40(8), 3196–3206 (2013)
7. R. Battiti, M. Brunato, Reactive Search and Intelligent Optimization. Computer Science Interfaces (Springer, Berlin, 2008)
8. A. Corana, M. Marchesi, C. Martini, S. Ridella, Minimizing multimodal functions of continuous variables with the simulated annealing algorithm. ACM Trans. Math. Softw. 13(3), 262–280 (1987)
9. M. Miki, T. Hiroyasu, K. Ono, Simulated annealing with advanced adaptive neighborhood, in The 2nd International Workshop on Intelligent Systems Design and Application (2002), pp. 113–118
10. M. Miki, S. Hiwa, Simulated annealing using an adaptive search vector, in Cybernetics and Intelligent Systems (2006)
11. M.J.D. Powell, An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. 7(2), 155–162 (1964)
12. R.T. Haftka, Z. Gurdal, Elements of Structural Optimization. Solid Mechanics and Its Applications, vol. 11, Chap. 4 (1992), p. 124
13. L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)
14. M. Lalaoui, A. El Afia, A versatile generalized simulated annealing using type-2 fuzzy controller for the mixed-model assembly line balancing problem. IFAC-PapersOnLine 52(13), 2804–2809 (2019). https://doi.org/10.1016/j.ifacol.2019.11.633
15. A. El Afia, M. Lalaoui, R. Chiheb, A self-controlled simulated annealing algorithm using hidden Markov model state classification. Procedia Comput. Sci. 148, 512–521 (2019). https://doi.org/10.1016/j.procs.2019.01.024
16. M. Lalaoui, A. El Afia, A fuzzy generalized simulated annealing for a simple assembly line balancing problem. IFAC-PapersOnLine 51(32), 600–605 (2018). https://doi.org/10.1016/j.ifacol.2018.11.489
17. M. Lalaoui, A. El Afia, R. Chiheb, A self-tuned simulated annealing algorithm using hidden Markov model. Int. J. Electr. Comput. Eng. (IJECE) 8(1), 291–298 (2017). https://doi.org/10.11591/ijece.v8i1.pp291-298
18. M. Lalaoui, A. El Afia, R. Chiheb, A self-tuned simulated annealing algorithm using hidden Markov model, in The International Conference on Learning and Optimization Algorithms: Theory and Application (LOPAL’2018) (2018). https://doi.org/10.1145/3230905.3230963
19. A. El Afia, M. Lalaoui, R. Chiheb, Fuzzy logic controller for an adaptive Huang cooling of simulated annealing, in The 2nd International Conference on Big Data, Cloud and Applications (CloudTech’17) IEEE Conference (2017). https://doi.org/10.1145/3090354.3090420
20. M. Lalaoui, A. El Afia, R. Chiheb, A self-adaptive very fast simulated annealing based on hidden Markov model, in The 3rd International Conference on Cloud Computing Technologies and Applications, ACM Conference (2017). https://doi.org/10.1109/CloudTech.2017.8284698
21. M. Lalaoui, A. El Afia, R. Chiheb, Hidden Markov model for a self-learning of simulated annealing cooling law, in The 5th International Conference on Multimedia Computing and Systems IEEE Conference, ICMCS’16 (2016). https://doi.org/10.1109/ICMCS.2016.7905557
22. S. Bouzbita, A. El Afia, R. Faizi, A novel based hidden Markov model approach for controlling the ACS-TSP evaporation parameter, in The 5th International Conference on Multimedia Computing and Systems (ICMCS) (2016), pp. 633–638. https://doi.org/10.1109/ICMCS.2016.7905544
23. S. Bouzbita, A. El Afia, R. Faizi, M. Zbakh, Dynamic adaptation of the ACS-TSP local pheromone decay parameter based on the hidden Markov model, in The 2nd International Conference on Cloud Computing Technologies and Applications (CloudTech) (2016), pp. 344–349. https://doi.org/10.1109/CloudTech.2016.7847719
24. A. El Afia, S. Bouzbita, R. Faizi, The effect of updating the local pheromone on ACS performance using fuzzy logic. Int. J. Electr. Comput. Eng. 7(4), 2161–2168 (2017). https://doi.org/10.11591/ijece.v7i3.pp2161-2168
25. S. Bouzbita, A. El Afia, R. Faizi, Hidden Markov model classifier for the adaptive ACS-TSP pheromone parameters, in Bioinspired Heuristics for Optimization, vol. 774 (Springer, Berlin, 2018), p. 153. https://doi.org/10.1007/978-3-319-95104-1_10
26. S. Bouzbita, A. El Afia, R. Faizi, Parameter adaptation for ant colony system algorithm using hidden Markov model for TSP problems, in The Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (ACM, 2018), p. 6. https://doi.org/10.1145/3230905.3230962
27. S. Bouzbita, A. El Afia, R. Faizi, Adjusting population size of ant colony system using fuzzy logic controller, in The International Conference on Computational Collective Intelligence, vol. 11684 (Springer, Berlin, 2019), pp. 309–320. https://doi.org/10.1007/978-3-030-28374-2_27
28. A. El Afia, M. Sarhani, O. Aoun, A probabilistic finite state machine design of particle swarm optimization, in Bioinspired Heuristics for Optimization (Springer, Cham, 2019), pp. 185–201. https://doi.org/10.1007/978-3-319-95104-1_12
29. A. El Afia, O. Aoun, S. Garcia, Adaptive cooperation of multi-swarm particle swarm optimizer-based hidden Markov model. Prog. Artif. Intell. 8, 441–452 (2019). https://doi.org/10.1007/s13748-019-00183-1
30. O. Aoun, M. Sarhani, A. El Afia, Hidden Markov model classifier for the adaptive particle swarm optimization, in Recent Developments in Metaheuristics (Springer International Publishing, Cham, 2018), pp. 1–15. https://doi.org/10.1007/978-3-319-58253-5_1
31. O. Aoun, M. Sarhani, A. El Afia, Particle swarm optimisation with population size and acceleration coefficients adaptation using hidden Markov model state classification. Int. J. Metaheuristics 7(1), 1–29 (2018). Inderscience Publishers (IEL). https://doi.org/10.1504/IJMHEUR.2018.091867
32. O. Aoun, A. El Afia, S. Garcia, Self inertia weight adaptation for the particle swarm optimization, in Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (ACM, 2018), pp. 8:1–8:6. https://doi.org/10.1145/3230905.3230964
33. A. El Afia, M. Sarhani, O. Aoun, Hidden Markov model control of inertia weight adaptation for particle swarm optimization. IFAC-PapersOnLine 50(1), 9997–10002 (2017). https://doi.org/10.1016/j.ifacol.2017.08.2030
34. O. Aoun, M. Sarhani, A. El Afia, Investigation of hidden Markov model for the tuning of metaheuristics in airline scheduling problems. IFAC-PapersOnLine 49(3), 347–352 (2016). https://doi.org/10.1016/j.ifacol.2016.07.058
35. P. Kouvelis, W.-C. Chiang, J.A. Fitzsimmons, Simulated annealing procedures for machine layout problems in the presence of zoning constraints. Eur. J. Oper. Res. 57(22), 203–223 (1992)
36. M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
37. A. LaTorre, S. Muelas, J.M. Pena, A comprehensive comparison of large-scale global optimizers. Inf. Sci. 316, 517–549 (2015)
38. D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, 3rd edn. (Chapman & Hall/CRC, Boca Raton, 2004)

Chapter 12

Hybridization of the Differential Evolution Algorithm for Continuous Multi-objective Optimization

Caroline Gagné, Aymen Sioud, Marc Gravel, and Mathieu Fournier

Abstract Hybrid metaheuristics can enhance algorithm performance by combining the advantages of several strategies in order to profit from the resulting synergy. Multi-objective optimization is no exception to this trend. In this paper, we develop a new hybrid algorithm, IWODEMO, which combines differential evolution and the invasive weed optimization algorithm to solve multi-objective problems with continuous variables, thereby integrating the exploration and exploitation capacities of both algorithms. Experimental results show that IWODEMO generally performs better than other algorithms from the literature (NSDE, DEMO and ADE-MOIA) in terms of solution convergence and dispersion along the Pareto-optimal front. These other algorithms were reproduced and tested under the same experimental conditions.

12.1 Introduction

Many industrial optimization problems, discrete as well as continuous, have simultaneous conflicting objectives. The “optimum” is then not a unique point but a set of “compromise” solutions, called the Pareto-optimal (PO) front. Consequently, multi-objective (MO) optimization has two main goals: the solution set must be (i) as close as possible, in quality, to the PO front and (ii) as diversified as possible. Meeting these two goals in a reasonable time is an important challenge for any MO algorithm [5]. In the last few years, several metaheuristics have been successfully adapted for MO optimization; further information on the variety of MO metaheuristics can be found in recent state-of-the-art surveys [2, 7].

Optimization problems with continuous variables and several conflicting objectives are frequent in industry, and continuous problems require particularly good exploration of the solution space. Differential Evolution (DE) is known to have a good global search capability [11]. Invasive Weed Optimization (IWO) [12], for its part, is inspired by the dispersal of seeds in a natural environment. This analogy with nature represents r-selection and K-selection, allowing more exploration at the beginning of the algorithm and increasing exploitation afterwards.

A consequence of the No Free Lunch Theorem [17] is that no general optimization strategy can be the “best” for all problems. Hybridization is therefore an attractive way to obtain good performance on a wider range of problems: its main motivation is to improve efficiency by combining the advantages of several methods so as to profit from the resulting synergy. This idea has been applied to MO optimization. For example, NSDE [9] is a hybrid between DE and a genetic algorithm that has given better results than NSGAII on rotated problems. ADE-MOIA [11] is a hybrid between DE and an artificial immune system. GISMOO [18] uses DE to handle continuous variables, giving good results compared with other algorithms in the literature. Regarding hybridization between DE and IWO, there seems to be only one algorithm, which optimizes single-objective problems [1].

The main contribution of this chapter is to propose IWODEMO, a new efficient DE-based hybrid algorithm for continuous MO problems. Its other “base” is the IWO algorithm, which helps balance the exploration and exploitation capabilities.

C. Gagné · A. Sioud · M. Gravel · M. Fournier
Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_12

C. Gagné et al.
This new algorithm must meet the two goals of MO optimization: convergence toward the PO front and diversity of the solutions. As far as we know, the literature does not present a comparison between DE and IWO for the optimization of MO problems [10, 13]. We experimentally compare IWODEMO with NSDE, DEMO and ADE-MOIA on the same instances, with the same computing power, and with as many common components as possible. Section 12.2 presents the main DE-based MO hybrid algorithms. Section 12.3 describes IWODEMO, which is compared numerically with NSDE, DEMO and ADE-MOIA in Sect. 12.4; the results are analyzed in Sect. 12.5. The conclusion and future research directions are given in Sect. 12.6.

12.2 Literature: DE Algorithms for MO Problems

12.2.1 The DE Algorithm: Basic Notions

Originally, differential evolution (DE) was proposed for single-objective optimization [14, 16]. It is known to have a good global search capability [11]. The DE algorithm is summarized in Algorithm 1.

12 Hybridization of the Differential Evolution Algorithm …


Algorithm 1 Outline of DE algorithm with mutation DE/rand/1/bin

Randomly create an initial population $POP_0$ of size $N$ and evaluate each solution $\vec{X}_i$
while no stopping rule is invoked do
    for each target vector $\vec{X}_i$ ($i = 1, \ldots, N$) of $POP_g$ do
        MUTATION: create a mutant vector $\vec{V}_i$
            Select three random vectors $\vec{X}_{r1}$, $\vec{X}_{r2}$ and $\vec{X}_{r3}$
            Calculate the mutant vector using the weighted difference vector:
                $\vec{V}_i = \vec{X}_{r1} + F \cdot (\vec{X}_{r2} - \vec{X}_{r3})$    (12.1)
        CROSSOVER: create an offspring $\vec{U}_i$, named the trial vector:
                $u_{i,j} = \begin{cases} v_{i,j} & \text{if } (\mathrm{rand}[0,1] \le CR) \text{ or } (j = j_{rand}) \\ x_{i,j} & \text{otherwise} \end{cases}$ with $j = 1, \ldots, D$    (12.2)
        Evaluate the trial vector $\vec{U}_i$
        SELECTION: keep the best solution between $\vec{U}_i$ and $\vec{X}_i$ for the next generation $POP_{g+1}$
    end for
end while
Return the best solutions found

This evolutionary algorithm begins by randomly generating an initial population $POP_0$ of $N$ solutions. A solution $\vec{X}_i = [x_{i,1}, \ldots, x_{i,j}, \ldots, x_{i,D}]$ is a so-called target vector represented by $D$ decision variables, which are real numbers bounded by $x_{i,j}^{\min}$ and $x_{i,j}^{\max}$. In each iteration, the algorithm creates a single descendant for each $\vec{X}_i$ by mutation, crossover and selection.

In the mutation phase, for each target vector $\vec{X}_i$, a so-called mutant (or donor) vector $\vec{V}_i$ is created from three vectors $\vec{X}_{r1}$, $\vec{X}_{r2}$ and $\vec{X}_{r3}$ randomly chosen from the population, such that they and the target vector represent four different solutions. Equation (12.1) in Algorithm 1 presents the DE/rand/1/bin mutation, which creates $\vec{V}_i$ by adding the weighted difference of vectors $\vec{X}_{r2}$ and $\vec{X}_{r3}$ to the vector $\vec{X}_{r1}$. The weighting factor $F$ is a real number, preferably between 0.4 and 1. This study uses the DE/rand/1/bin mutation because it is known to converge to the PO front while maintaining diversity [3].

After the mutation, a binomial crossover between the target vector $\vec{X}_i$ and the mutant $\vec{V}_i$ produces the so-called trial vector $\vec{U}_i$. For each index $j = 1, \ldots, D$, the trial vector component “chooses” between that of the target vector and that of the mutant, using the crossover probability $CR$, which is fixed (between 0 and 1) at the beginning of the algorithm. This is described by Eq. (12.2) in Algorithm 1, where $j_{rand}$ is an integer chosen randomly in the interval $[1, D]$ so that at least one “gene” comes from the mutant.

Finally, the selection phase creates the next population $POP_{g+1}$ by choosing, for each index $i = 1, \ldots, N$, the “best” solution between $\vec{X}_i$ and its child $\vec{U}_i$ according to the objective function. Note that the population size $N$ remains unchanged at each iteration. Three known algorithms using DE for MO optimization are now presented: NSDE, DEMO and ADE-MOIA.
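A compact single-objective version of Algorithm 1 (DE/rand/1/bin) is sketched below. It is illustrative only: the parameter values, the sphere test function and clipping trial vectors to the bounds $[x_j^{\min}, x_j^{\max}]$ are assumptions not specified in the text, and selection is synchronous, building $POP_{g+1}$ entirely from $POP_g$.

```python
import random


def de_rand_1_bin(f, bounds, N=20, F=0.5, CR=0.9, generations=200, seed=0):
    """Minimize f over box bounds [(lo, hi), ...] with DE/rand/1/bin."""
    rng = random.Random(seed)
    D = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(N)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        next_pop, next_fit = [], []
        for i in range(N):
            # Mutation (12.1): three distinct vectors, all different from the target
            r1, r2, r3 = rng.sample([k for k in range(N) if k != i], 3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
            # Binomial crossover (12.2): j_rand guarantees one gene from the mutant
            j_rand = rng.randrange(D)
            u = [v[j] if (rng.random() <= CR or j == j_rand) else pop[i][j]
                 for j in range(D)]
            # Assumed bound handling: clip each component to its interval
            u = [min(max(uj, lo), hi) for uj, (lo, hi) in zip(u, bounds)]
            # Selection: keep the better of target and trial for POP_{g+1}
            fu = f(u)
            keep_trial = fu <= fit[i]
            next_pop.append(u if keep_trial else pop[i])
            next_fit.append(fu if keep_trial else fit[i])
        pop, fit = next_pop, next_fit
    best = min(range(N), key=lambda i: fit[i])
    return pop[best], fit[best]


x_best, f_best = de_rand_1_bin(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)
print(x_best, f_best)
```

In the MO algorithms discussed next, only the selection step changes: the scalar comparison `fu <= fit[i]` is replaced by Pareto-based rules.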


Fig. 12.1 Outline of the algorithms: (a) NSDE, (b) DEMO, (c) ADE-MOIA

12.2.2 NSDE [9]

NSDE has the same structure as NSGAII [5], except that the genetic crossover is replaced by the mutation DE/current-to-rand/1/bin [9]. In this study, the mutation DE/rand/1/bin is used in NSDE so that it can be compared fairly with the other algorithms. Figure 12.1a describes the main components of NSDE. Initially, a population $POP_0$ of size $N$ is randomly created and then sorted by fronts. At each iteration, the classical steps of the genetic algorithm (selection, crossover, mutation and replacement) are applied. The hybridization with DE occurs in parent selection and crossover. A single parent $\vec{X}_i$ is selected by binary tournament as the target vector. It produces a trial vector $\vec{U}_i$ by the mutation DE/rand/1/bin, which adds a perturbation to $\vec{X}_i$. The trial vector then undergoes a polynomial mutation [4]. A population $Child_g$ of $N$ new solutions is thereby created, combined with $POP_g$ and sorted by fronts for the replacement step. Within each front, the solutions are sorted by the isolation factor, and the $N$ best solutions are kept for the next iteration.
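The “sort by fronts, then by isolation factor” replacement used by NSDE and inherited from NSGAII can be sketched as below. This plain-Python illustration assumes minimization of all objectives and is not code from any of the cited papers: fronts are computed with the usual fast non-dominated sorting, the isolation factor is taken to be the crowding distance, and within the last front that does not fit entirely, the most isolated solutions are kept first.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sort_by_fronts(objs):
    """Fast non-dominated sorting; returns fronts as lists of indices."""
    n = len(objs)
    dominated_by = [[] for _ in range(n)]  # solutions that p dominates
    counts = [0] * n                       # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_by[p].append(q)
            elif dominates(objs[q], objs[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    while fronts[-1]:
        nxt = []
        for p in fronts[-1]:
            for q in dominated_by[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
    return fronts[:-1]


def crowding_distance(objs, front):
    """Isolation factor of each solution in one front (boundary points = inf)."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for a, b, c in zip(ordered, ordered[1:], ordered[2:]):
            dist[b] += (objs[c][k] - objs[a][k]) / (hi - lo)
    return dist


def select_best(objs, N):
    """Keep N solutions: whole fronts first, then the most isolated ones."""
    chosen = []
    for front in sort_by_fronts(objs):
        if len(chosen) + len(front) <= N:
            chosen.extend(front)
        else:
            d = crowding_distance(objs, front)
            chosen.extend(sorted(front, key=lambda i: -d[i])[: N - len(chosen)])
            break
    return chosen


objs = [(1, 5), (2, 3), (3, 1), (2, 4), (4, 4)]
print(sort_by_fronts(objs))  # [[0, 1, 2], [3], [4]]
print(select_best(objs, 2))  # [0, 2]
```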

12.2.3 DEMO [15]

Differential Evolution for Multiobjective Optimization (DEMO) is a pure DE algorithm that introduces MO resolution strategies; it is described in Fig. 12.1b. At each iteration, for each target vector $\vec{X}_i$ of a population of size $N$, a DE/rand/1/bin crossover is used to produce a trial vector $\vec{U}_i$. In the replacement phase, DEMO uses dominance in the Pareto sense to determine which solution is kept in the population: the target vector, the trial vector or both. If either of $\vec{X}_i$ or $\vec{U}_i$ dominates the other, then the other is eliminated; but if there is no dominance relation, $\vec{U}_i$ is added to the population. The augmented population (which may contain as many as $2N$ solutions) is sorted by fronts, the crowding distance is calculated for each solution, and the $N$ “best” solutions are retained.
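DEMO's comparison between a target vector and its trial vector reduces to a three-way Pareto test. The minimal sketch below assumes minimization and uses objective vectors as stand-ins for solutions; it returns which candidates enter the augmented population, which is then truncated back to $N$ solutions by front sorting and crowding distance as described above.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def demo_replacement(target, trial):
    """Apply DEMO's rule to one (target, trial) pair of objective vectors.

    Returns the list of vectors kept in the augmented population.
    """
    if dominates(trial, target):
        return [trial]          # the trial replaces the target
    if dominates(target, trial):
        return [target]         # the trial is discarded
    return [target, trial]      # mutually non-dominated: keep both


print(demo_replacement((2.0, 3.0), (1.0, 2.0)))  # [(1.0, 2.0)]
print(demo_replacement((1.0, 2.0), (2.0, 3.0)))  # [(1.0, 2.0)]
print(demo_replacement((1.0, 3.0), (2.0, 1.0)))  # [(1.0, 3.0), (2.0, 1.0)]
```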

12.2.4 ADE-MOIA [11]

ADE-MOIA, summarized in Fig. 12.1c, is a hybridization between an adaptive version of the DE algorithm and an artificial immune system. The initial population of $N$ solutions is randomly generated and then divided into two subsets: $EXA$ contains the non-dominated solutions and $DA$ the dominated ones. A number $NC < N$ is also fixed at the beginning, such that $NC$ non-dominated solutions are selected to produce $N$ clones. In particular, the number of clones $nc_{i,g}$ produced by a solution $\vec{X}_i$ at iteration $g$ is proportional to its crowding distance $cd_i$ in the set $EXA$, according to Eq. (12.3):

$$nc_{i,g} = \left\lceil N \cdot \frac{cd_i}{\sum_{j=1}^{NC} cd_j} \right\rceil \qquad (12.3)$$

To generate new solutions, the clones are first modified by the DE/rand/1/bin crossover, in which the parameters $CR$ and $F$ adapt over the iterations. $CR$, used in the crossover that creates the trial vector (Eq. (12.2) in Algorithm 1), decreases at each iteration, going from 0.9 to 0.1 as the problem instance is optimized. The idea is to allow more exploration at the beginning of the process and more intensive search at the end. The weighting factor $F$ is used to produce the mutant vector (Eq. (12.1) in Algorithm 1), so it determines the amplitude of the perturbations. This parameter is also adaptive: it is modified (by a random draw from a Cauchy distribution, between 0.1 and 0.9) each time a new solution is produced. To add diversity, trial vector generation in the DE/rand/1/bin mutation involves both sets $EXA$ and $DA$: the random vectors $\vec{X}_{r1}$ and $\vec{X}_{r2}$ are chosen from $EXA$, whereas $\vec{X}_{r3}$ is chosen from $DA$ [11]. After a crossover modifies a clone, the clone undergoes a polynomial mutation, receiving a perturbation on some of its variables. When the child population has $N$ new solutions, it is combined with the set $EXA$, and identical solutions are removed. The solutions are then redistributed into the sets $EXA$ and $DA$.
If the size of $EXA$ exceeds $N$, the solutions are sorted by crowding distance and the least isolated solution is removed. This elimination process is repeated until only $N$ non-dominated solutions remain in $EXA$. This progressive replacement strategy is good at maintaining solution diversity in the population, but it is computationally costly.
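Eq. (12.3) can be sketched as follows, assuming the ceiling reading of the equation. Boundary solutions of a front usually have an infinite crowding distance; replacing it by twice the largest finite distance, and splitting clones evenly when all distances are zero, are assumptions made here so the proportions stay finite (the chapter does not say how ADE-MOIA handles these cases). Because of the ceiling, the counts can add up to slightly more than $N$.

```python
import math


def clone_counts(crowding, N):
    """nc_i = ceil(N * cd_i / sum_j cd_j) for the NC selected solutions."""
    finite = [c for c in crowding if math.isfinite(c)]
    cap = 2 * max(finite) if finite else 1.0  # assumed value for infinite distances
    cd = [c if math.isfinite(c) else cap for c in crowding]
    total = sum(cd)
    if total == 0:  # all distances zero: share clones evenly (assumption)
        return [math.ceil(N / len(cd))] * len(cd)
    return [math.ceil(N * c / total) for c in cd]


print(clone_counts([1.0, 2.0, 1.0], 8))           # [2, 4, 2]
print(clone_counts([float("inf"), 1.0, 1.0], 8))  # [4, 2, 2]
```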


12.3 Hybridization Between DE and IWO: IWODEMO

12.3.1 The Weed Analogy

The IWO algorithm [12] is an evolutionary algorithm inspired by the way weeds (unwanted plants) proliferate in the environment. In the natural competition among plant colonies, some plants disperse their seeds over long distances into unpredictable regions, whereas others proliferate nearby, exploiting their environment and reinforcing their colony. The former follow what is called r-selection, and the latter K-selection. In IWO, the solutions are called plants and their descendants are called seeds. The population is modified when the seeds $\vec{W}_i$ from the plants $\vec{X}_i$ are dispersed in the solution space using r-selection and K-selection. Initially, the seeds are dispersed over long distances to establish diversified colonies; as the iterations continue, they are dispersed less far. A seed is a clone of the parent plant in which each decision variable is modified according to a standard deviation $\sigma_g$ that decreases as the number of iterations increases, so that exploration gives way to exploitation of the solution space. IWO uses another natural concept to enhance convergence: a stronger plant has more seeds to disperse. At each iteration, a plant produces a number of seeds proportional to its objective function value. The natural competition among plant colonies (some die out; others expand and adapt to their environment) is represented in the algorithm by a classical elitist replacement at the end of each iteration, in order to keep the best solutions. In the literature there are several MO IWO methods, such as NSIWO [13], which follows the same procedure as NSGAII but uses seed proliferation as the crossover method. There is also IWO-MO [10], which uses the weed optimization mechanism.
To determine solution quality, IWO-MO uses fuzzy dominance instead of Pareto dominance, thereby enabling a population sort that determines which solutions are kept for the next iteration. The solution rank also helps to determine the number of seeds dispersed.

12.3.2 IWODEMO

The algorithm IWODEMO is a hybridization of DE and IWO; Algorithm 2 outlines its pseudo-code. The initial population is chosen at random, and the population size at each iteration is $N$. As in IWO, a solution produces a number of descendants according to its rank in the sorted population. The population is sorted first by a dominance factor (i.e., by fronts, as in NSGA-II [5]) and then, within each front, by an isolation factor (the crowding distance). Once the sorting is completed, the most isolated solution produces children, then the next most isolated, and so on, until $N$ children

12 Hybridization of the Differential Evolution Algorithm …


are produced. At each iteration $g$, a solution $X_i$ produces a number $ND_{i,g}$ of children (see Eq. (12.4) in Algorithm 2). This number is proportional to its objective function value $f(X_i)$. It is parametrized by the best ($MaxFit_g$) and worst ($MinFit_g$) solutions in the current population, and by the global maximum ($MaxG$) and minimum ($MinG$) number of children allowed for any solution.

Algorithm 2 Outline of IWODEMO algorithm

1: Create randomly an initial population $POP_0$ of size $N$ and evaluate each solution $X_i$ on each objective
2: Sort $POP_0$ according to dominance and isolation factors
3: while no stopping rule is invoked do
4:   while $NbOffspring < N$ do
5:     Take in order the next vector $X_i$ of $POP_g$
6:     Compute the number of children $ND_{i,g}$ for solution $X_i$:
       $$ND_{i,g} = \left\lfloor \frac{f(X_i) - MinFit_g}{MaxFit_g - MinFit_g} \cdot (MaxG - MinG) \right\rfloor + MinG \qquad (12.4)$$
7:     for each of the $ND_{i,g}$ children of $X_i$ do
8:       REPRODUCTION - Create a trial vector $U_i$ with adaptive mutation DE/rand/1/bin
9:       SELECTION - 3 possible cases:
10:        (a) if $U_i$ dominates $X_i$, replace $X_i$ by $U_i$
11:        (b) if $U_i$ is dominated by $X_i$, delete $U_i$ and do WEED COLONIZATION
12:        (c) if there is no dominance relationship, add $U_i$ to $POP_g$
13:      WEED COLONIZATION - (only if $U_i$ is dominated) Plant a seed $W_i$ according to IWO weed colonization:
14:        $$w_{i,j} = x_{i,j} + \mathcal{N}(0, \sigma_g^2), \quad j = 1, \ldots, D \qquad (12.5)$$
15:      Do SELECTION with $W_i$ (instead of $U_i$)
16:    end for
17:    Sort $POP_g$ according to dominance and isolation factors
18:    Delete identical solutions in $POP_g$
19:    Progressive replacement of $POP_g$ to $N$ solutions
20:  end while
21: end while
22: Return non-dominated solutions
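Eq. (12.4) reduces to a few lines of code. The sketch below is illustrative only. `number_of_children` is a hypothetical name, and the defaults `max_g=2` and `min_g=1` are taken from Table 12.1. The text does not specify how the scalar value $f(X_i)$ is derived from several objectives, so it is simply passed in. The guard for a population in which all values are equal is an assumption added to avoid a division by zero.

```python
import math


def number_of_children(f_xi: float, min_fit: float, max_fit: float,
                       max_g: int = 2, min_g: int = 1) -> int:
    """Number of children ND_{i,g} of solution X_i, following Eq. (12.4)."""
    if max_fit == min_fit:
        # Assumption: all solutions are equally fit, so each gets MinG children.
        return min_g
    ratio = (f_xi - min_fit) / (max_fit - min_fit)  # scaled to [0, 1]
    return math.floor(ratio * (max_g - min_g)) + min_g
```

With $MaxG = 2$ and $MinG = 1$, only the solution at the top of the fitness range gets a second child; every other solution gets one.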

A solution $X_i$ can have two types of descendants: a trial vector $U_i$ and a seed $W_i$. In the first case, $U_i$ is produced from the target vector $X_i$ by the adaptive mutation DE/rand/1/bin, as in the DE algorithm. The parameter $CR$ adapts as the iterations proceed, decreasing from 0.8 to 0.1 in line with r-selection and K-selection. A higher $CR$ at the beginning means that the trial vector inherits more variables from the mutant vector, which enhances exploration. A lower $CR$ later on enhances exploitation, as in IWO and ADE-MOIA. In the selection phase, $U_i$ and $X_i$ are compared. If one vector dominates the other, the dominating vector is kept in the population; otherwise both vectors are kept, as in DEMO.
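The reproduction step can be sketched as follows, assuming box-constrained real vectors and $F = 0.5$ (Table 12.1). The linear decrease of $CR$ from 0.8 to 0.1 is an assumption: the text only states that $CR$ decreases over the iterations. Function names are hypothetical, and bound handling for the mutant vector is omitted.

```python
import random
from typing import List, Sequence


def adaptive_cr(g: int, g_max: int, cr_start: float = 0.8, cr_end: float = 0.1) -> float:
    """CR decreasing from cr_start to cr_end; the linear shape is an assumption."""
    if g_max <= 0:
        return cr_end
    return cr_start - (cr_start - cr_end) * min(g, g_max) / g_max


def de_rand_1_bin(pop: Sequence[Sequence[float]], i: int, f: float, cr: float,
                  rng: random.Random) -> List[float]:
    """Trial vector U_i for target X_i = pop[i] using DE/rand/1/bin."""
    d = len(pop[i])
    # Three mutually distinct donors, all different from the target (needs len(pop) >= 4).
    r1, r2, r3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    mutant = [pop[r1][j] + f * (pop[r2][j] - pop[r3][j]) for j in range(d)]
    j_rand = rng.randrange(d)  # at least one component is taken from the mutant
    return [mutant[j] if (rng.random() < cr or j == j_rand) else pop[i][j]
            for j in range(d)]
```

The index `j_rand` is the usual DE safeguard. It ensures that at least one component comes from the mutant, even when $CR$ is close to 0.1 late in the run.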


In the second case, a seed $W_i$ is produced from the plant $X_i$ using Eq. (12.5) in Algorithm 2. Here $j$ indexes the decision variables and $\mathcal{N}(0, \sigma_g^2)$ is the normal distribution with mean 0 and standard deviation $\sigma_g$. This "seeding" corresponds to the plant proliferation of IWO, except that IWODEMO adds a mutation probability: only the decision variables that pass the mutation test are perturbed. In the selection phase, a seed dominated by its target vector $X_i$ is eliminated, and the iteration proceeds without further proliferation. The solution $X_i$ continues to generate its $ND_{i,g}$ children, and any child replacing $X_i$ will produce its own descendants. At most $N$ children, counting both trial vectors and seeds, are produced in an iteration. The ADE-MOIA algorithm also produces several descendants from a single solution, but it places them in a separate child population. In contrast, IWODEMO immediately replaces a solution, which yields more rapid convergence. After the descendants are created, the population is again sorted by fronts and, within each front, by the isolation factor. Identical copies of solutions are eliminated. If the population size is still greater than $N$, solutions are eliminated one at a time, starting with the least isolated, until only $N$ are left. As for ADE-MOIA, this progressive replacement strategy favors solution diversity, but at a computational cost.
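The weed-colonization step (Eq. (12.5) with the IWO mutation probability) and the three-case selection of Algorithm 2 can be sketched as below. This is a minimal sketch under stated assumptions:

- minimization;
- decision variables clipped to $[0, 1]$, as for the DTLZ instances (the clipping rule itself is an assumption);
- a mutation probability of 0.3 (Table 12.1);
- a nonlinear decrease of $\sigma_g$ in the style of the original IWO [12], with a hypothetical exponent `n=3`.

Table 12.1 only fixes the initial value $\sigma_{ini}$ and the final value $\sigma_{ini} \cdot 0.01$, not the shape of the schedule. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def select(target_obj: Sequence[float], child_obj: Sequence[float]) -> str:
    """Three-case selection of Algorithm 2: 'replace', 'discard' or 'add'."""
    if dominates(child_obj, target_obj):
        return "replace"   # case (a)
    if dominates(target_obj, child_obj):
        return "discard"   # case (b): then try a seed instead
    return "add"           # case (c)


def plant_seed(plant: Sequence[float], sigma_g: float, p_mut: float,
               rng: random.Random, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Seed W_i around plant X_i, Eq. (12.5): only genes passing the mutation test move."""
    seed = []
    for x in plant:
        if rng.random() < p_mut:
            x = x + rng.gauss(0.0, sigma_g)  # gauss takes the standard deviation sigma_g
            x = min(max(x, lo), hi)          # assumption: clip back into the bounds
        seed.append(x)
    return seed


def sigma_schedule(g: int, g_max: int, sigma_ini: float, sigma_fin: float,
                   n: int = 3) -> float:
    """Assumed IWO-style decrease of sigma_g from sigma_ini (g = 0) to sigma_fin (g = g_max)."""
    return ((g_max - g) / g_max) ** n * (sigma_ini - sigma_fin) + sigma_fin
```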

12.4 Numerical Experiments

12.4.1 Instances: DTLZ and ZDT

The first test instances used in the numerical experiments come from the DTLZ suite created by Deb et al. [6]. These instances pose a number of challenges for MO algorithms, such as numerous local PO fronts and a non-uniform distribution of points on the PO front. A feature of the DTLZ instances is that the number $M$ of objective functions to minimize and the number $D$ of decision variables can be modified to obtain the desired complexity. The parameters of the DTLZ instances are set as recommended in the original paper: $D = M + k - 1$, with $k = 5$ for DTLZ1, $k = 10$ for DTLZ2 to DTLZ6, and $k = 20$ for DTLZ7. All decision variables are bounded in the unit interval $[0, 1]$. The reader can refer to the original article for more details on the DTLZ instances.

The second test set used here consists of the instances proposed by Zitzler et al. [19], one of the most widely used MO benchmarks in the literature. Each ZDT instance has two objective functions to minimize ($f_1$ and $f_2$). Solutions are real-valued vectors with 30 (ZDT1, ZDT2, ZDT3) or 10 (ZDT4, ZDT6) decision variables. The PO fronts contain an infinite (but bounded) set of solutions. In the decision space, the PO solutions of all these instances share the following characteristics: $0 \le x_1 \le 1$ and $x_i = 0$ for $i = 2, \ldots, D$, where $D$ is the number of decision variables. The ZDT instances also differ in the shape of their PO fronts, which adds to the difficulty of this test set. ZDT1 has a convex front, ZDT2 a non-convex one, and ZDT3 a disconnected one. ZDT4 has many local fronts, and ZDT6 has a non-uniformly distributed front.
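To make the ZDT structure concrete, the sketch below evaluates ZDT1 as defined by Zitzler et al. [19]. It samples the PO front by setting $x_1 \in [0, 1]$ and $x_i = 0$ for $i \ge 2$, as stated above. Uniform spacing in $x_1$ is an assumption. It is one simple way to obtain reference fronts of any size, such as the 500- and 25,000-point fronts used in Sect. 12.5.2. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def zdt1(x: Sequence[float]) -> Tuple[float, float]:
    """ZDT1 objectives (f1, f2), both minimized; x holds D = 30 variables in [0, 1]."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2


def zdt1_front(n_points: int) -> List[Tuple[float, float]]:
    """n_points (>= 2) PO points: x1 evenly spaced in [0, 1], x2..x30 = 0 (so g = 1)."""
    return [zdt1([k / (n_points - 1)] + [0.0] * 29) for k in range(n_points)]
```

On the PO set $g = 1$, so every sampled point satisfies $f_2 = 1 - \sqrt{f_1}$.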

12.4.2 Metrics

A survey of the literature shows that evaluating the performance of an MO algorithm is often difficult. On the one hand, the metrics used by different authors vary considerably. On the other hand, MO algorithms seek both quality (convergence) and diversity, and a single metric can hardly measure both aspects. In this chapter, we therefore use two metrics to evaluate algorithm performance: the convergence metric $\gamma$ and the diversity metric $\Delta$. The PO front of each instance must be known in order to calculate these metrics. The convergence metric $\gamma$ measures the distance between the set of non-dominated solutions $P$ and the PO front $P^*$. In Eq. (12.6), $d_i$ is the Euclidean distance between the solution $i \in P$ and the nearest solution in $P^*$. A smaller value of $\gamma$ indicates better convergence to the PO front. When all the solutions obtained are in $P^*$, $\gamma$ is zero.

$$\gamma = \frac{\sum_{i=1}^{|P|} d_i}{|P|} \qquad (12.6)$$
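Eq. (12.6) translates directly into code. This brute-force sketch (hypothetical function name, points given as lists of objective values) compares every solution of $P$ with every point of $P^*$. With $|P| = 100$ and a 25,000-point reference front, that is 2.5 million distance evaluations, which is still cheap.

```python
import math
from typing import Sequence


def convergence_gamma(approx: Sequence[Sequence[float]],
                      ref_front: Sequence[Sequence[float]]) -> float:
    """Eq. (12.6): mean distance from each point of P to its nearest point of P*."""
    total = sum(min(math.dist(p, q) for q in ref_front) for p in approx)
    return total / len(approx)
```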

The second performance measure used here is the diversity metric $\Delta$, which measures the spread of the non-dominated solutions. It is calculated using Eq. (12.7). Here $d_i$ is the Euclidean distance between consecutive solutions $i$ and $i+1$ of $P$ (sorted along the front), and $\bar{d}$ is the average of these distances. The parameters $d_f$ and $d_l$ are the Euclidean distances between the extreme solutions of $P^*$ and the corresponding boundary solutions of $P$. A smaller value of $\Delta$ indicates a better (more uniform and more extensive) distribution of solutions in the approximation.

$$\Delta = \frac{d_f + d_l + \sum_{i=1}^{|P|-1} \left| d_i - \bar{d} \right|}{d_f + d_l + (|P| - 1)\,\bar{d}} \qquad (12.7)$$
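A sketch of Eq. (12.7) for the bi-objective (ZDT) case follows. It assumes $|P| \ge 2$ and that sorting by $f_1$ orders the points along the front, which holds for a two-objective non-dominated set. For three objectives (DTLZ), the notion of consecutive solutions needs a different ordering, which this sketch does not cover. The function name is hypothetical.

```python
import math
from typing import Sequence


def spread_delta(approx: Sequence[Sequence[float]],
                 ref_front: Sequence[Sequence[float]]) -> float:
    """Eq. (12.7) for a bi-objective non-dominated set P (|P| >= 2)."""
    pts = sorted(approx)      # ordered by f1, hence consecutive along the front
    ext = sorted(ref_front)   # first and last entries are the extremes of P*
    d_f = math.dist(ext[0], pts[0])
    d_l = math.dist(ext[-1], pts[-1])
    gaps = [math.dist(pts[k], pts[k + 1]) for k in range(len(pts) - 1)]
    d_bar = sum(gaps) / len(gaps)
    numerator = d_f + d_l + sum(abs(d - d_bar) for d in gaps)
    denominator = d_f + d_l + len(gaps) * d_bar
    return numerator / denominator
```

As a check, take $P = \{(0, 1), (0.1, 0.9), (1, 0)\}$ against a front whose extremes are $(0, 1)$ and $(1, 0)$. Both extremes are reached, so $d_f = d_l = 0$. The gaps are $0.1\sqrt{2}$ and $0.9\sqrt{2}$, so $\bar{d} = 0.5\sqrt{2}$ and $\Delta = 0.8\sqrt{2}/\sqrt{2} = 0.8$: the metric penalizes the uneven spacing.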

12.4.3 Experimental Conditions and Parameters

The main hybrid algorithms using DE to optimize MO problems efficiently include NSDE [9], DEMO [15] and ADE-MOIA [11]. Comparing these algorithms is not easy, because the original papers generally differ in their implementations and performance evaluations. To achieve a reliable and fair comparison, NSDE, DEMO and ADE-MOIA were reprogrammed and tested under the same experimental conditions. Each reimplementation followed the indications in the original paper and was checked for exactness using the metric of that paper. Pertinent details can be found in [8]. In this comparison, the experimental conditions are those described in the corresponding papers [5, 15, 18]. The convergence and diversity metrics presented in the next section are averages over ten runs. A maximum of 25,000 solution evaluations is allowed for the approximation of a PO front. With $N = 100$, this corresponds to 250 iterations for IWODEMO and ADE-MOIA, whose stopping criterion is a number of iterations. The algorithms are implemented in C++ with Visual Studio 2010. The numerical experiments were run on an ASUS laptop with an Intel 2.4 GHz processor, 8 GB of RAM and a Windows operating system. The algorithm parameters, listed in Table 12.1 below, are generally those found in the original papers.

Table 12.1 Parameters of NSDE, DEMO, ADE-MOIA and IWODEMO algorithms

| Parameter | NSDE | DEMO | ADE-MOIA | IWODEMO |
|---|---|---|---|---|
| $N$ | 100 | 100 | 100 | 100 |
| Stopping criterion | 25,000 evaluations | 25,000 evaluations | 250 iterations | 250 iterations |
| $CR$ | 0.3 (except ZDT4/6: 0.1) | 0.3 (except ZDT4/6: 0.1) | Adaptive in [0.1, 0.9] | Adaptive in [0.1, 0.8] |
| $F$ | 0.5 | 0.5 | Adaptive in [0.1, 0.9] | 0.5 |
| Prob. mutation | 1/D | - | 1/D | - |
| Prob. mutation IWO | - | - | - | 0.3 |
| $MaxG$, $MinG$ | - | - | - | 2, 1 |
| Initial deviation $\sigma_{ini}$ | - | - | - | $(MaxFit_0 - MinFit_0) \cdot 0.2$ |
| Final deviation | - | - | - | $\sigma_{ini} \cdot 0.01$ |

12.5 Results and Analysis

12.5.1 Results for the DTLZ Instances

A series of tests was performed on the seven DTLZ instances with three objective functions. The PO fronts for each instance contain 600 to 28,000 solutions. They are available at http://jmetal.sourceforge.net/problems.html and are the PO fronts used in the original paper on ADE-MOIA [11]. Table 12.2 gives the average values of the convergence metric $\gamma$; the best value in each row is marked with an asterisk (*). The best convergence is achieved by IWODEMO on four instances (DTLZ1, 2, 3, 5), by ADE-MOIA on two (DTLZ4, 7), and by DEMO on one (DTLZ6). Overall, the algorithms converge towards the PO front. The main exception is DTLZ3, on which NSDE ($\gamma = 0.509$) and especially ADE-MOIA ($\gamma = 3.61$) remain far from it. IWODEMO has the most "convergence stability", with a mean of 0.00883 over all the instances, whereas ADE-MOIA has the least, with a mean of 0.523.

Table 12.2 Average convergence values ($\gamma$) for the DTLZ instances

| Instance | NSDE | DEMO | ADE-MOIA | IWODEMO |
|---|---|---|---|---|
| DTLZ1 | 2.24e-03 | 1.95e-03 | 2.01e-03 | 1.86e-03* |
| DTLZ2 | 6.10e-03 | 4.43e-03 | 4.57e-03 | 4.42e-03* |
| DTLZ3 | 5.09e-01 | 7.73e-02 | 3.61e+00 | 6.94e-03* |
| DTLZ4 | 3.06e-02 | 3.06e-02 | 2.77e-02* | 3.12e-02 |
| DTLZ5 | 1.35e-03 | 9.90e-04 | 9.94e-04 | 9.36e-04* |
| DTLZ6 | 2.30e-03 | 2.20e-03* | 2.25e-03 | 2.23e-03 |
| DTLZ7 | 1.59e-02 | 1.41e-02 | 1.21e-02* | 1.42e-02 |
| Mean | 8.11e-02 | 1.88e-02 | 5.23e-01 | 8.83e-03* |

Table 12.3 gives the average values of the diversity metric $\Delta$. The results do not point to any clear "winner" in this category: each algorithm has the best diversity for at least one instance. ADE-MOIA has the best diversity for four instances, but the worst for DTLZ3. With a mean of 0.46 over the seven instances, IWODEMO is slightly "more diverse" than the three other algorithms.

Table 12.3 Average diversity values ($\Delta$) for the DTLZ instances

| Instance | NSDE | DEMO | ADE-MOIA | IWODEMO |
|---|---|---|---|---|
| DTLZ1 | 0.48 | 0.49 | 0.54 | 0.50 |
| DTLZ2 | 0.49 | 0.49 | 0.47 | 0.51 |
| DTLZ3 | 0.58 | 0.48 | 0.78 | 0.51 |
| DTLZ4 | 0.49 | 0.49 | 0.57 | 0.55 |
| DTLZ5 | 0.61 | 0.65 | 0.34 | 0.34 |
| DTLZ6 | 0.96 | 0.79 | 0.35 | 0.35 |
| DTLZ7 | 0.48 | 0.46 | 0.44 | 0.47 |
| Mean | 0.58 | 0.55 | 0.50 | 0.46 |

Table 12.4 gives the average execution times (in seconds) for the optimization of the DTLZ instances. DEMO is generally the fastest of the four algorithms. ADE-MOIA and IWODEMO take somewhat longer because of their progressive replacement, which keeps $N$ solutions in the population. This helps preserve solution diversity, at the cost of more calculations.


Table 12.4 Average execution times (in seconds) for the DTLZ instances

| Instance | NSDE | DEMO | ADE-MOIA | IWODEMO |
|---|---|---|---|---|
| DTLZ1 | 0.80 | 0.20 | 1.40 | 0.80 |
| DTLZ2 | 0.30 | 0.30 | 1.90 | 1.10 |
| DTLZ3 | 0.90 | 0.10 | 1.30 | 0.90 |
| DTLZ4 | 0.30 | 0.40 | 1.50 | 1.10 |
| DTLZ5 | 0.30 | 0.50 | 1.20 | 0.90 |
| DTLZ6 | 0.90 | 0.40 | 1.70 | 0.90 |
| DTLZ7 | 0.60 | 0.40 | 1.70 | 0.70 |
| Mean | 0.59 | 0.33 | 1.53 | 0.91 |

12.5.2 Results for the ZDT Instances

Table 12.5 gives the average values of the convergence metric $\gamma$ for the five ZDT instances, using PO fronts containing 500 solutions as proposed by the creators of DEMO [15]. IWODEMO has the best value for each instance, with two exceptions. On ZDT4, DEMO has a better value, and on ZDT6, DEMO and IWODEMO have the same value. Note that each algorithm converges "well" towards the PO fronts. As for the DTLZ instances, IWODEMO has the best mean value (0.000924) over the ZDT instances. The analysis of solution quality can be extended by examining convergence more closely against PO fronts with more than 500 solutions. This is possible because the fronts follow directly from the mathematical definition of the instances. Table 12.6 gives the average convergence values for ZDT instances with PO fronts containing 25,000 solutions. IWODEMO now has the best convergence for each ZDT instance, including ZDT4. Note also that the values of the convergence metric are smaller than in Table 12.5. A denser reference front brings the nearest PO point closer to each approximated solution.

Table 12.5 Average convergence values ($\gamma$) for the ZDT instances (Pareto fronts with 500 solutions)

| Instance | NSDE | DEMO | ADE-MOIA | IWODEMO |
|---|---|---|---|---|
| ZDT1 | 1.25e-03 | 1.06e-03 | 1.10e-03 | 1.02e-03 |
| ZDT2 | 8.84e-04 | 7.86e-04 | 8.15e-04 | 7.56e-04 |
| ZDT3 | 1.25e-03 | 1.23e-03 | 1.25e-03 | 1.20e-03 |
| ZDT4 | 1.15e-03 | 9.76e-04 | 1.64e-03 | 1.04e-03 |
| ZDT6 | 6.22e-04 | 6.02e-04 | 5.54e-03 | 6.02e-04 |
| Mean | 1.03e-03 | 9.31e-04 | 2.07e-03 | 9.24e-04 |


Table 12.6 Average convergence values ($\gamma$) for the ZDT instances (Pareto fronts with 25,000 solutions)

| Instance | NSDE | DEMO | ADE-MOIA | IWODEMO |
|---|---|---|---|---|
| ZDT1 | 2.28e-04 | 1.23e-05 | 1.09e-04 | 4.80e-06 |
| ZDT2 | 1.50e-04 | 1.45e-05 | 6.33e-05 | 3.91e-06 |
| ZDT3 | 9.05e-05 | 2.71e-05 | 1.40e-04 | 2.66e-05 |
| ZDT4 | 6.72e-04 | 1.39e-05 | 1.01e-03 | 4.89e-06 |
| ZDT6 | 2.40e-05 | 2.39e-05 | 4.79e-03 | 2.31e-05 |
| Mean | 2.33e-04 | 1.83e-05 | 1.22e-03 | 1.27e-05 |

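Table 12.7 Average diversity values ($\Delta$) for the ZDT instances

| Instance | NSDE | DEMO | ADE-MOIA | IWODEMO |
|---|---|---|---|---|
| ZDT1 | 0.32 | 0.36 | 0.11 | 0.12 |
| ZDT2 | 0.33 | 0.36 | 0.10 | 0.12 |
| ZDT3 | 0.34 | 0.37 | 0.19 | 0.17 |
| ZDT4 | 0.30 | 0.33 | 0.20 | 0.13 |
| ZDT6 | 1.31 | 0.34 | 0.32 | 0.12 |
| Mean | 0.52 | 0.35 | 0.18 | 0.13 |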

Table 12.7 gives the average of the ten diversity values $\Delta$ for each ZDT instance and each algorithm. Note that the diversity metric does not depend on the number of solutions in the PO fronts: the two extreme PO solutions are enough to determine $\Delta$. ADE-MOIA has (slightly) the best diversity for ZDT1 and ZDT2, while IWODEMO has the best for ZDT3, ZDT4 and ZDT6. IWODEMO and ADE-MOIA are the only two algorithms that use a progressive elitist replacement. To obtain the next population, they eliminate the "worst" solutions one by one, recalculating the isolation factor each time. The process is more time-consuming, but the results confirm that it gives greater solution diversity on the ZDT instances. To complete this comparative study, Table 12.8 gives the average execution times for the ZDT instances. These values show that DEMO and IWODEMO are somewhat faster than NSDE and ADE-MOIA on the ZDT instances.

Table 12.8 Average execution times (in seconds) for the ZDT instances

| Instance | NSDE | DEMO | ADE-MOIA | IWODEMO |
|---|---|---|---|---|
| ZDT1 | 0.40 | 0.20 | 0.40 | 0.20 |
| ZDT2 | 0.40 | 0.20 | 0.40 | 0.20 |
| ZDT3 | 0.50 | 0.20 | 0.40 | 0.30 |
| ZDT4 | 0.60 | 0.20 | 0.70 | 0.20 |
| ZDT6 | 0.50 | 0.20 | 0.70 | 0.30 |
| Mean | 0.48 | 0.20 | 0.52 | 0.24 |

12.5.3 Analysis

Now that the four algorithms have been compared with respect to convergence, diversity and execution time, some further remarks can be made about each of them.

The above results for the ZDT and DTLZ instances show that the NSDE algorithm rapidly obtains a good approximation of the PO fronts. Even if NSDE seems less efficient than the other three algorithms, it is easy to implement.

The DEMO algorithm is also quite easy to implement, and the results show that it gives approximations of good quality. Its authors introduced concepts that were later used in ADE-MOIA and IWODEMO. The results also show that DEMO is the fastest of the four algorithms and generally performs better than NSDE. The main difference between the two algorithms lies in how parents are selected to produce children. NSDE uses tournament selection, while DEMO produces a child for each solution in the current population. Because the DE/rand/1/bin mutation already converges strongly, a selection scheme that diversifies the solutions, like that of DEMO, seemed preferable.

ADE-MOIA is the slowest of the four algorithms tested here. This is mainly due to its replacement strategy, whose goal is to maximally diversify the solutions in the approximation. The results for the DTLZ and ZDT instances show that progressive replacement attains this goal, since ADE-MOIA has more solution diversity than DEMO and NSDE. However, ADE-MOIA has the poorest convergence towards the PO fronts among the four algorithms. It is also slightly more complicated to implement, because of the sets EXA and DA needed by its DE/rand/1/bin mutation.

The results above show that the new algorithm IWODEMO meets the goals of MO optimization. On the DTLZ and ZDT instances, IWODEMO obtains approximations with very stable convergence and diversity metrics, whose mean values are the best of the four algorithms. We can therefore say that IWODEMO keeps a good balance between convergence toward the PO front and diversity of solutions. Indeed, the IWO part of IWODEMO applies a mutation that efficiently diversifies the PO solutions.
Moreover, convergence is enhanced by allowing the best solutions of the current population to produce several descendants.


12.6 Conclusion

This chapter has presented IWODEMO, a new Pareto-based MO algorithm that hybridizes the concepts of differential evolution and invasive weed optimization. IWODEMO's performance was tested on the well-known DTLZ and ZDT MO instances. The results show that this hybrid approach efficiently exploits the advantages of both concepts. A natural conclusion from these experimental results is that IWODEMO is an efficient alternative for optimizing MO problems. In particular, the results highlight the importance of diversification and intensification mechanisms for the overall performance of an MO algorithm. The proliferation phase from IWO and the adaptive parameters of IWODEMO seem to complement DE well, which is already known to have a good global search capacity. We plan to continue testing IWODEMO in order to further validate the results obtained. This can be done in three ways: by increasing the number of objective functions, by varying parameters such as $CR$ and $F$, and by tuning the progressive replacement in IWODEMO to reduce its computational cost while retaining its benefits for solution diversity. Finally, it would be interesting to verify the efficiency of the algorithm on real industrial problems.

Acknowledgements The financial support of the NSERC (Canada) has made this research and its publication possible.

References

1. A. Basak, D. Maity, S. Das, A differential invasive weed optimization algorithm for improved global numerical optimization. Appl. Math. Comput. 219(12), 6645–6668 (2013)
2. I. Boussaïd, J. Lepagnot, P. Siarry, A survey on optimization metaheuristics. Inf. Sci. 237, 82–117 (2013)
3. S. Das, P.N. Suganthan, Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15(1), 4–31 (2011)
4. K. Deb, R.B. Agrawal, Simulated binary crossover for continuous search space. Complex Syst. 9(3), 1–15 (1994)
5. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
6. K. Deb, L. Thiele, M. Laumanns, E. Zitzler, Scalable test problems for evolutionary multiobjective optimization. TIK Report 112, Computer Engineering and Networks Laboratory (TIK), ETH Zurich, Jul 2001
7. I. Fister Jr., X.S. Yang, I. Fister, J. Brest, D. Fister, A brief review of nature-inspired algorithms for optimization. Electrotech. Rev. 80(3), 1–7 (2013)
8. M. Fournier, Hybridation entre l'évolution différentielle et l'algorithme de la mauvaise herbe. Master's thesis, Université du Québec à Chicoutimi, 2016
9. A. Iorio, X. Li, Solving rotated multi-objective optimization problems using differential evolution, in Proceedings of the 17th Australian Joint Conference on Advances in Artificial Intelligence (AI04), ed. by G. Webb, X. Yu, vol. 3339, Cairns, Australia (Springer, Berlin, 2004), pp. 861–872
10. D. Kundu, K. Suresh, S. Ghosh, S. Das, B.K. Panigrahi, S. Das, Multi-objective optimization with artificial weed colonies. Inf. Sci. 181(12), 2441–2454 (2011)
11. Q. Lin, Q. Zhu, P. Huang, J. Chen, Z. Ming, J. Yu, A novel hybrid multi-objective immune algorithm with adaptive differential evolution. Comput. Oper. Res. 62(C), 95–111 (2015)
12. A.R. Mehrabian, C. Lucas, A novel numerical optimization algorithm inspired from weed colonization. Ecol. Inform. 1(4), 355–366 (2006)
13. A.H. Nikoofard, H. Hajimirsadeghi, A. Rahimi-Kian, C. Lucas, Multiobjective invasive weed optimization: application to analysis of Pareto improvement models in electricity markets. Appl. Soft Comput. 12(1), 100–122 (2012)
14. K. Price, R.M. Storn, J.A. Lampinen, Differential Evolution: A Practical Approach to Global Optimization. Natural Computing Series (Springer, Germany, 2005)
15. T. Robic, B. Filipic, DEMO: differential evolution for multiobjective optimization, in International Conference on Evolutionary Multi-Criterion Optimization (EMO 2005), ed. by C.A.C. Coello, A.H. Aguirre, E. Zitzler. Lecture Notes in Computer Science Book Series (LNCS), vol. 3410 (Springer, Berlin, 2005), pp. 520–533
16. R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997)
17. D. Wolpert, W. Macready, No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–82 (1997)
18. A. Zinflou, C. Gagné, M. Gravel, GISMOO: a new hybrid genetic/immune strategy for multiple objective optimization. Comput. Oper. Res. 39(9), 1951–1968 (2012)
19. E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results. Evol. Comput. 8(2), 173–195 (2000)

Chapter 13

A Steganographic Embedding Scheme Using Improved-PSO Approach

Yamina Mohamed Ben Ali

Abstract In this paper, we tackle steganography as an optimization problem solved by a bio-inspired approach. We propose a novel embedding scheme based on the LSB substitution principle and on an improved version of the particle swarm optimization algorithm. To fulfill the requirements of a good steganographic tool, the evolution rules of the particles were modified. The improved-PSO embedding scheme searches for the best pixel locations, and the best bits within those pixels, in which to hide secret messages (both text and images). It does so without degrading the quality of the original image. The robustness of the proposed steganographic scheme therefore relies strongly on the ability of the improved-PSO algorithm to achieve high imperceptibility. Three test cases were carried out to highlight the performance of the improved-PSO embedding scheme.

13.1 Introduction

Steganography can be considered a branch of cryptography that tries to hide messages within other messages [1, 2], so that the very existence of a message goes unnoticed. There are two trends in implementing steganographic algorithms: methods that work in the spatial domain and methods that work in the transform domain. Algorithms that work in the transform domain are more robust, that is, more resistant to attacks, while algorithms that work in the spatial domain are simpler and faster. The most common spatial-domain technique is the simple Least Significant Bit (LSB) insertion method [3, 4]. In this method, the hidden message is converted to a stream of bits that replace the least significant bits of the pixel values in the cover image. This kind of steganography is only suitable for images stored in bitmap form or with lossless compression. Unfortunately, it is vulnerable to even slight image manipulations, such as converting an image from a format like GIF or BMP to a lossy format [5]. Therefore, many works have been based on both conventional and adaptive LSB schemes [6–8].

Y. M. B. Ali
Computer Science Department, University of Badji Mokhtar, BP 12, 2300 Annaba, Algeria

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_13


Y. M. B. Ali

However, other researchers explored other approaches, such as the discrete cosine transform (DCT) and related techniques, as more powerful steganographic tools [9–14]. Steganography can also be improved by using bio-inspired algorithms as optimization tools. Jackson et al. [15, 16] proposed a computational immune system approach to blind steganography detection. In [17], the authors used genetic algorithms to achieve optimal data imperceptibility. Wu et al. [18] applied LSB substitution and a genetic algorithm to develop two different optimal substitution strategies. A novel steganographic method based on JPEG and the particle swarm optimization (PSO) algorithm is proposed in [19]. Also aiming at greater imperceptibility when communicating images, Fazli et al. [20] presented a novel method to embed a secret message in the cover image so that interceptors do not notice the existence of the hidden data. In [21], the authors presented a genetic-algorithm-based method for breaking steganalytic systems. In [22], Wang et al. proposed combining a genetic algorithm with dynamic programming to improve performance. In [23], the authors applied the ant colony optimization algorithm to construct an optimal LSB substitution matrix. Brazil et al. [24] proposed a hybrid heuristic combining a genetic algorithm and the path relinking metaheuristic. Other works have also been reported in [25–27]. Among recent works, Veena et al. [28] introduced a hybrid technique combining the greedy randomized adaptive search procedure with binary grey wolf optimization. Mohamed Ben Ali introduced a novel metaheuristic named smell bees optimization for a new steganographic embedding scheme in the spatial domain [29].
In this paper, an improved-PSO embedding scheme is introduced to hide a secret message (text or image) inside a cover image with the aim of achieving higher imperceptibility. The proposed approach uses an improved version of the particle swarm optimization algorithm to treat steganography as an optimization problem. The paper is organized as follows. Section 13.2 describes the standard PSO approach, and Sect. 13.3 describes the proposed improved approach. Section 13.4 presents experimental tests, followed by a conclusion.

13.2 Particle Swarm Optimization

Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique developed in 1995 [30, 31], inspired by the social behavior of bird flocking and fish schooling. A population of particles, randomly initialized in a search space, moves through that space in search of the optimum. Each particle $i$ is assigned three flying parameters: its current position in the search space $\vec{x}_i$, its current velocity $\vec{v}_i$, and its memorized best past position $\vec{x}_i^{\,b}$. To determine the next flying direction at $t+1$, the particles coordinate with each other. They update their positions and velocities relative to the global best particle position, denoted $\vec{x}^{\,gb}$. The movement of PSO individuals is described by Eqs. (13.1) and (13.2):

$$\vec{v}_i(t+1) = \omega\,\vec{v}_i(t) + c_1 r_1 \left(\vec{x}_i^{\,b}(t) - \vec{x}_i(t)\right) + c_2 r_2 \left(\vec{x}^{\,gb}(t) - \vec{x}_i(t)\right) \qquad (13.1)$$

$$\vec{x}_i(t+1) = \vec{x}_i(t) + \vec{v}_i(t+1) \qquad (13.2)$$

The parameter $\omega$ is called the inertia factor, $c_1$ and $c_2$ are acceleration factors, and $r_1$, $r_2$ are random values drawn from $[0, 1]$.
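Eqs. (13.1) and (13.2) can be sketched as a single update function. The sketch draws $r_1$ and $r_2$ independently for each dimension, a common convention that the text does not specify. Names are hypothetical. The caller must supply the acceleration factors, since their values are not given here.

```python
import random
from typing import List, Sequence, Tuple


def pso_step(x: Sequence[float], v: Sequence[float], x_best: Sequence[float],
             x_gbest: Sequence[float], w: float, c1: float, c2: float,
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """One standard PSO move of particle i: Eq. (13.1) then Eq. (13.2)."""
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # uniform in [0, 1)
        vj = w * v[j] + c1 * r1 * (x_best[j] - x[j]) + c2 * r2 * (x_gbest[j] - x[j])
        new_v.append(vj)
        new_x.append(x[j] + vj)  # Eq. (13.2): move by the new velocity
    return new_x, new_v
```

When a particle already sits on both its personal best and the global best, only the inertia term remains, so the particle keeps drifting with velocity $\omega\,\vec{v}_i(t)$.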

13.3 Steganographic Scheme Based on Improved-PSO

13.3.1 General Description

The proposed approach introduces an adaptive embedding scheme based on the LSB principle, i.e., each bit of the secret message is hidden in one bit of a pixel. The objective is not only to avoid detection of the hidden message but also to prevent a third party from suspecting that anything is hidden at all. Figure 13.1 outlines the principle of the embedding scheme based on the improved PSO algorithm. Each particle at time $t$ represents a stego image (the cover image with the message embedded). It evolves over the generations in search of the best stego image, i.e., the one with the highest imperceptibility. The encoded parameters of a particle are two $m$-dimensional vectors, where $m$ is the length of the hidden message. They represent, respectively, the selected image pixels and the bit positions within those pixels. The main steps of the algorithm are as follows.

Fig. 13.1 Particle swarm schema for stego image optimization


Step 1: Input the secret message in text form.
Step 2: Convert the text into a binary message.
Step 3: Determine the number of pixels needed to hide the binary message.
Step 4: Initialize a population of particles with random pixel positions and random bit positions.
Step 5: Evaluate each particle with the fitness function (a distance between the cover and stego images).
Step 6: Select the best particle, i.e., the one whose pixels give the best solution.
Step 7: Repeat the adaptive particle swarm optimization process, updating the particles' positions and velocities, until the stopping criterion is satisfied, i.e., the best stego image is reached.
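Steps 2 to 4 rest on the LSB substitution that a particle encodes. The sketch below is a hypothetical illustration. It converts text to bits, assuming UTF-8 with the most significant bit first. It then writes bit `bits[j]` into bit position `bit_pos[j]` of pixel `pixels[j]` of a flattened 8-bit grey-level image, mirroring the two $m$-dimensional vectors of a particle. It assumes the pixel positions are distinct; otherwise a later bit could overwrite an earlier one. Under the one-bit-per-pixel scheme, Step 3 simply gives $m$ pixels for $m$ message bits.

```python
from typing import List, Sequence


def text_to_bits(message: str) -> List[int]:
    """Steps 1-2: encode the text as UTF-8 and expand each byte, most significant bit first."""
    return [(byte >> k) & 1 for byte in message.encode("utf-8") for k in range(7, -1, -1)]


def embed(cover: Sequence[int], bits: Sequence[int], pixels: Sequence[int],
          bit_pos: Sequence[int]) -> List[int]:
    """Write bits[j] into bit bit_pos[j] of pixel pixels[j] of a flattened 8-bit image."""
    stego = list(cover)
    for b, p, k in zip(bits, pixels, bit_pos):
        stego[p] = (stego[p] & ~(1 << k)) | (b << k)  # clear bit k, then set it to b
    return stego


def extract(stego: Sequence[int], pixels: Sequence[int], bit_pos: Sequence[int]) -> List[int]:
    """Read the hidden bits back from the same pixel and bit positions."""
    return [(stego[p] >> k) & 1 for p, k in zip(pixels, bit_pos)]
```

For example, `text_to_bits("A")` yields `[0, 1, 0, 0, 0, 0, 0, 1]` (65 in binary), so hiding one character requires eight pixels.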

13.3.2 Improved-PSO Embedding Scheme

The main modifications to the conventional PSO algorithm concern the position and velocity update rules. In addition, an adaptive inertia weight was introduced to accelerate the particles and enable them to explore distant regions of the search space. It is defined by Eq. (13.3). The intent is accelerated moves in the first generations, followed by an exploitation strategy that leads to a local search.

$$\omega(t+1) = e^{-\omega(t)} \qquad (13.3)$$

Initially, ω(0) is set to the empirical value 0.4. However, to avoid stagnation in a local optimum, especially after several evolution steps, a second acceleration parameter η(t), defined in Eq. (13.4), was introduced to drive exploration with large step jumps:

$$\eta(t) = \beta \times \mathcal{N}(0, 1) \qquad (13.4)$$
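Eqs. (13.3) and (13.4) are easy to check numerically. The Python sketch below (function names hypothetical) generates the inertia-weight sequence from ω(0) = 0.4 and draws the jump factor η(t). Mathematically, iterating ω ← e^(−ω) converges to the fixed point of ω = e^(−ω), the omega constant Ω ≈ 0.5671, so after a few generations the inertia weight is nearly constant and η(t) is what keeps injecting large random steps.

```python
import math
import random
from typing import List, Optional


def inertia_schedule(omega0: float = 0.4, steps: int = 10) -> List[float]:
    """Return [omega(0), ..., omega(steps)] with omega(t+1) = exp(-omega(t)), Eq. (13.3)."""
    omegas = [omega0]
    for _ in range(steps):
        omegas.append(math.exp(-omegas[-1]))
    return omegas


def jump_factor(beta: float, rng: Optional[random.Random] = None) -> float:
    """Draw eta(t) = beta * N(0, 1), Eq. (13.4), with beta restricted to [1, 10]."""
    if not 1.0 <= beta <= 10.0:
        raise ValueError("beta must lie in [1, 10]")
    rng = rng or random.Random()
    return beta * rng.gauss(0.0, 1.0)
```

The sequence starts 0.4, 0.670, 0.512, ... and oscillates around Ω with shrinking amplitude.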

where β ∈ [1, 10] and N(0, 1) denotes a standard normal random variable. In light of these rules, Eqs. (13.1) and (13.2) were rewritten: the updated velocity rule is given by Eq. (13.5) and the updated position rule by Eq. (13.6).

$$\vec{v}_i(t+1) = \omega(t)\,\vec{v}_i(t) + c_1 r_1 \left(\vec{x}_{ib}(t) - \vec{x}_i(t)\right) + c_2 r_2 \left(\vec{x}_{gb}(t) - \vec{x}_i(t)\right) \qquad (13.5)$$

$$\vec{x}_i(t+1) = \vec{x}_i(t) + \eta(t) \times \vec{v}_i(t+1) \qquad (13.6)$$
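A single improved-PSO move following Eqs. (13.5) and (13.6) can be sketched as below. Several details are assumptions not stated in the chapter: positions are treated as continuous vectors (mapping them back to valid pixel and bit indices, e.g. by rounding and clamping, is left out), r1 and r2 are drawn independently per dimension from U(0, 1), and c1 = c2 = 2.0 is only a common PSO default, not the author's setting. The function name `improved_pso_step` is hypothetical; `x_ib` and `x_gb` are the personal-best and global-best positions.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def improved_pso_step(x: Vector, v: Vector, x_ib: Vector, x_gb: Vector,
                      omega: float, eta: float,
                      c1: float = 2.0, c2: float = 2.0,
                      rng: Optional[random.Random] = None) -> Tuple[Vector, Vector]:
    """Return (x(t+1), v(t+1)) for one particle."""
    rng = rng or random.Random()
    new_x, new_v = [], []
    for xd, vd, pd, gd in zip(x, v, x_ib, x_gb):
        r1, r2 = rng.random(), rng.random()
        # Eq. (13.5): inertia term + pull toward x_ib + pull toward x_gb
        vd_next = omega * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
        # Eq. (13.6): the new velocity is scaled by the random jump factor eta(t)
        new_x.append(xd + eta * vd_next)
        new_v.append(vd_next)
    return new_x, new_v
```

At generation t, `omega` would be ω(t) from Eq. (13.3) and `eta` a fresh draw of η(t) from Eq. (13.4). Because η(t) can be negative, the jump may also reverse the direction of the velocity.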

13 A Steganographic Embedding Scheme Using Improved-PSO …


Like any optimization problem (here a minimization), the improved embedding scheme requires an objective function that captures the requirements stated above, in particular high imperceptibility. For this purpose, the mean square error (MSE) between the original and stego images is used as the fitness function f, as given in Eq. (13.7):

$$\mathrm{MSE} = \frac{1}{M \times N} \sum_{x} \sum_{y} \left(I(x, y) - I'(x, y)\right)^2 \qquad (13.7)$$

where M × N is the size of the image, and I(x, y) and I'(x, y) are the intensities of the pixel at position (x, y) before and after hiding the secret message, respectively. Figure 13.2 details the proposed algorithm.
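Eq. (13.7) translates directly into code. This Python sketch assumes both images are given as equally sized 2-D lists of gray levels (rows of pixels); the function name `mse` is our own.

```python
from typing import Sequence


def mse(cover: Sequence[Sequence[float]], stego: Sequence[Sequence[float]]) -> float:
    """Mean square error between the cover I and the stego image I', Eq. (13.7)."""
    m = len(cover)
    n = len(cover[0]) if m else 0
    if m == 0 or n == 0 or len(stego) != m or any(
            len(rc) != n or len(rs) != n for rc, rs in zip(cover, stego)):
        raise ValueError("images must be non-empty and of the same M x N size")
    total = sum((a - b) ** 2
                for row_c, row_s in zip(cover, stego)
                for a, b in zip(row_c, row_s))
    return total / (m * n)
```

Flipping the least significant bit of a single pixel in a 256 × 256 image gives an MSE of 1/65536; changing bit k instead contributes (2^k)^2 to the sum, which is why the choice of bit positions weighs so heavily in the fitness.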

13.4 Experimental Setup

To evaluate the performance of the proposed approach, the algorithm was implemented in Java in the Eclipse environment and run on a Pentium® Dual-Core CPU with 1.93 GB of RAM. The experiments are divided into three cases, each showing one facet of the performance of the proposed embedding scheme.

13.4.1 Experimental Case 1

This experiment compares the performance of the improved-PSO algorithm with that of the conventional LSB technique. For this purpose, three messages of different sizes, M1 = 303 bytes, M2 = 342 bytes, and M3 = 1078 bytes, are embedded in the three cover images shown in Fig. 13.3. The performance measures are the MSE (described above), the peak signal-to-noise ratio (PSNR) given by Eq. (13.8), and the signal-to-noise ratio (SNR) given by Eq. (13.9).

$$\mathrm{PSNR} = 10 \times \log_{10}\left(\frac{I_{max}^2}{\mathrm{MSE}}\right) \qquad (13.8)$$

where I_max is the maximal intensity value of the image; for gray-level images, I_max = 255. The PSNR tends to infinity as the MSE approaches zero. Thus, a higher PSNR means higher image quality, and a lower PSNR means a larger difference between the original image and the best retrieved stego image.

$$\mathrm{SNR} = 10 \times \log_{10}\left(\frac{I_{av}^2}{\mathrm{MSE}}\right) \qquad (13.9)$$
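Eqs. (13.8) and (13.9) can be computed from the MSE as in the Python sketch below (function names hypothetical). `i_max` defaults to 255 for 8-bit gray-level images, and `i_av` is the average intensity of the target image.

```python
import math


def psnr(mse_value: float, i_max: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, Eq. (13.8); infinite when the MSE is 0."""
    if mse_value < 0:
        raise ValueError("MSE cannot be negative")
    if mse_value == 0:
        return math.inf
    return 10.0 * math.log10(i_max ** 2 / mse_value)


def snr(mse_value: float, i_av: float) -> float:
    """Signal-to-noise ratio in dB, Eq. (13.9), using the average intensity i_av."""
    if mse_value < 0:
        raise ValueError("MSE cannot be negative")
    if mse_value == 0:
        return math.inf
    return 10.0 * math.log10(i_av ** 2 / mse_value)
```

For instance, `psnr(9.46e-3)` is about 68.37 dB, consistent (up to the rounding of the MSE) with the first improved-PSO entry of Table 13.1. Note also that PSNR − SNR = 20 log10(I_max / I_av) does not depend on the MSE, so the gap between the two measures is constant for a given image.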


Fig. 13.2 Improved-PSO embedding scheme

Fig. 13.3 Three test images: cameraman, clown, and medical


Table 13.1 Comparison of performance between the Improved-PSO scheme and LSB

                          M1                         M2                         M3
Image      Method         MSE      PSNR     SNR      MSE      PSNR     SNR      MSE      PSNR     SNR
Cameraman  Improved-PSO   9.46E−3  68.3684  62.6701  1.02E−2  68.0191  62.3208  3.37E−2  62.8435  57.1452
           LSB            2.42E−2  64.2801  58.5818  2.73E−2  63.7658  58.0674  8.11E−2  59.0368  53.3385
Clown      Improved-PSO   9.05E−3  68.5599  63.0026  1.03E−2  67.9887  62.4314  3.32E−2  62.9151  57.3578
           LSB            2.43E−2  64.2630  58.7057  2.74E−2  63.7467  58.1894  8.09E−2  59.0471  53.4897
Medical    Improved-PSO   8.12E−3  69.0342  62.9933  9.37E−3  68.4127  62.3719  3.03E−2  63.3093  57.2685
           LSB            2.32E−2  64.4736  58.4327  2.30E−2  64.4961  58.4553  5.11E−2  61.0436  55.0028

In Eq. (13.9), I_av represents the average intensity of the target image. Table 13.1 shows the comparison results. In all tests, the improved-PSO algorithm outperforms LSB. Its higher PSNR values (in decibels) indicate that the proposed algorithm ensures higher imperceptibility and avoids degrading the cover image. In general, the quality of the stego image is sensitive to the size of the embedded secret message: as the message size increases, the PSNR and SNR values decrease, but they remain above an acceptable quality level.

13.4.2 Experimental Case 2

In this experiment, the PSNR performance of the improved-PSO scheme is compared with two efficient steganography methods: the algorithm described in [19] and JQTM, described in [14]. For this purpose, six 256 × 256 grayscale cover images are considered: Lena, Mandrill, Woman, Boat, Goldhill, and Girl. The secret image is 104 × 64 in [14] and 128 × 72 in [19]; for testing our algorithm, a message of size 128 × 72 is used. Figure 13.4 shows the results of the proposed method. Although the secret image is embedded in the original image, the best stego image (middle) looks very similar to the original one. To better show the efficiency of the method, the worst stego image (right) is also displayed: it is strongly distorted, which shows that a poor choice of pixels visibly degrades the image and hence that the algorithm has indeed found pixels well suited to hiding the message. Table 13.2 lists the results of the improved-PSO scheme alongside those reported by the authors of [14] and [19]. The improved-PSO scheme clearly outperforms both competitors, achieving the highest PSNR on all six images.

Fig. 13.4 From left to right: Original image, best stego-image, and worst stego-image of respectively Lena, Mandrill, Woman, Boat, Goldhill and Girl images

Table 13.2 Image quality PSNR (dB) of respectively JQTM, [19], and the proposed scheme

Methods        Lena    Mandrill  Woman   Boat    Goldhill  Girl
JQTM [14]      36.81   31.17     35.42   36.05   36.43     37.64
[19]           37.06   31.28     35.71   36.29   36.78     38.02
Improved-PSO   41.81   39.15     41.41   41.05   41.44     42.56

13.4.3 Experimental Case 3

This experiment emphasizes that imperceptibility is not the only crucial criterion: to confirm the efficiency of a steganographic tool, the stego image must also preserve its quality against attacks during transmission. To evaluate the behavior of the improved-PSO algorithm under attack, three JPEG cover images of resolution 512 × 512 and a secret image of size 64 × 64 were used. The robustness measure is the normalized correlation (NC) coefficient given by Eq. (13.10):

$$NC(X, Y) = \frac{\sum_{i=1}^{L} \sum_{j=1}^{L} X(i, j) \times Y(i, j)}{\sum_{i=1}^{L} \sum_{j=1}^{L} X(i, j)^2} \qquad (13.10)$$

Table 13.3 Robustness performances against conventional image processing attacks (NC values)

                         Noise addition
Image     Algorithm      Gaussian  Salt & Pepper  Poisson  Median 3 × 3  Median 5 × 5  Mean 3 × 3  Mean 5 × 5
Lena      Improved-PSO   1.0019    0.9964         1.0003   0.9991        0.9982        0.9986      0.9968
          [25]           0.9204    0.9149         0.9562   0.9785        0.9029        0.6577      0.6577
Mandrill  Improved-PSO   0.9905    0.9980         0.9990   0.9826        0.9754        0.9798      0.9714
          [25]           0.8511    0.8975         0.9387   0.8806        0.8867        0.6730      0.6722
Peppers   Improved-PSO   0.9764    0.9888         0.9963   0.9991        0.9977        1.0051      1.0075
          [25]           0.9196    0.9010         0.9493   0.9599        0.9053        0.6419      0.6419


Table 13.4 Robustness performances against conventional geometric attacks (NC values)

Image     Algorithm      Crop 10%  Crop 20%  Crop 40%  Hist. equalization  Scale × 0.8  Scale × 1.2  Scale × 1.5  Rotation 1-degree
Lena      Improved-PSO   0.9056    0.8450    0.6193    1.0227              0.7586       0.9575       0.9483       0.9825
          [25]           0.9378    0.8861    0.7803    0.9503              0.9478       0.9388       0.9046       0.8757
Mandrill  Improved-PSO   0.8461    0.7102    0.4781    1.0495              0.7490       0.9364       0.9060       0.9592
          [25]           0.9243    0.8757    0.8204    0.7936              0.8608       0.8586       0.8144       0.7881
Peppers   Improved-PSO   0.8898    0.7890    0.6060    0.9864              0.7827       0.9715       0.9519       0.9842
          [25]           0.9283    0.8722    0.7654    0.9698              0.9599       0.9633       0.8933       0.8593

where L is the size of the embedded image, and X and Y are, respectively, the embedded image (after hiding) and the image extracted after the attacks. As is common in the literature, both conventional image processing attacks and geometric attacks are applied. The first category comprises Gaussian noise with zero mean and 0.01 variance, salt and pepper noise, and Poisson noise, all added to the stego image, as well as mean and median filters of size 3 × 3 and 5 × 5. The second category comprises cropping attacks at rates of 10%, 20%, and 40%, scaling attacks with ratios 0.8, 1.2, and 1.5, a 1-degree rotation, and histogram equalization. To compare the performance of the proposed algorithm, the results reported in [25] are used as reference. As Table 13.3 shows, the proposed algorithm outperforms [25] and shows a clear improvement, especially under the mean filters of size 3 × 3 and 5 × 5. This is attributed to the fact that the noise is dispersed over the entire image without distinguishing between specific zones. Against geometric attacks, on the other hand, the proposed algorithm is less successful than [25], as Table 13.4 shows. The cropping attack reduces the performance of the proposed method: as the attacked zone grows, the normalized correlation decreases, down to an NC value of 0.4781 on the Mandrill image. The same behavior is observed for the scale × 0.8 attack. For these two kinds of attacks, the results reported in [25] are therefore better than ours, whereas for all the remaining attacks the proposed algorithm outperforms that of [25].
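Eq. (13.10) and the Gaussian-noise attack can be sketched as below (Python; names hypothetical). Intensities are assumed to be scaled to [0, 1] so that a variance of 0.01 is meaningful, and noisy values are clipped back to that range; both are assumptions, since the chapter does not state the scaling. Because the denominator of Eq. (13.10) involves only X, NC is not bounded by 1: an extracted image slightly brighter than the embedded one gives NC > 1, which is consistent with values such as 1.0227 in Table 13.4.

```python
import random
from typing import List, Optional

Image = List[List[float]]


def normalized_correlation(x: Image, y: Image) -> float:
    """NC(X, Y) of Eq. (13.10): sum of X*Y divided by sum of X^2."""
    num = sum(a * b for row_x, row_y in zip(x, y) for a, b in zip(row_x, row_y))
    den = sum(a * a for row_x in x for a in row_x)
    if den == 0:
        raise ValueError("X must not be all zeros")
    return num / den


def gaussian_attack(img: Image, variance: float = 0.01,
                    rng: Optional[random.Random] = None) -> Image:
    """Add zero-mean Gaussian noise with the given variance, clipping to [0, 1]."""
    rng = rng or random.Random()
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]
```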

13.5 Conclusion

This paper presents an improved-PSO scheme as a steganographic tool that intelligently hides secret messages inside a cover image using the substitution principle. The approach searches for the best pixels and the best bits so as to maximize imperceptibility, i.e. to minimize the error between the cover and stego images so that they cannot be told apart. The proposed approach is easy to implement and has shown good results with respect to the size of the embedded secret message. Its robustness was also evaluated against a set of attacks. In view of the achieved results, the proposed improved-PSO algorithm is a serious competitor, at least in the steganography context.

References

1. R. Chandramouli, M. Kharrazi, N. Memon, Image steganography and steganalysis: concepts and practice, in LNCS Proceedings, vol. 2939 (Springer, Berlin, 2004), pp. 35–49
2. A. Donovan, Digital steganography: hiding data within data. IEEE Internet Comput. 5(3), 75–80 (2001)
3. J. Fridrich, M. Goljan, D. Rui, Reliable detection of LSB steganography in color and grayscale images. Mag. IEEE Multimed. Spec. Issue Multimed. Secur. 8, 22–28 (2001)
4. A.D. Ker, Improved detection of LSB steganography in grayscale images, in Proceedings of 6th Information Hiding Workshop, LNCS, vol. 3200 (2004), pp. 97–115
5. R. Chandramouli, N. Memon, Analysis of LSB based image steganography techniques, in Proceedings of International Conference on Image Processing, vol. 3 (2001), pp. 1019–1022
6. S. Manoharan, Towards robust steganography using T-codes, in Proceedings of 4th EURASIP Conference Focused on Video/Image Processing and Multimedia Communications (2003), pp. 707–711
7. C.-K. Chan, L.M. Cheng, Hiding data in images by simple LSB substitution. Pattern Recognit. 37(3), 469–474 (2004)
8. T. Sakakura, A. Hayashi, A simple detection scheme of LSB steganography based on statistics of image difference signal, in Proceedings of the International Symposium on Information Theory and Its Applications (2010), pp. 320–325
9. H.-W. Tseng, C.-C. Chang, High capacity data hiding in JPEG-compressed images. Informatica 15, 127–142 (2004)
10. S.K. Muttoo, S. Kumar, Secure image steganography based on Slantlet transform, in Proceedings of International Conference on Methods and Models in Computer Science (2009), pp. 102–108
11. C.-C. Chang, H.-W. Tseng, Data hiding in images by hybrid LSB substitution, in Proceedings of the 3rd International Conference on Multimedia Ubiquitous Engineering (2009), pp. 360–363
12. D.-C. Huang, Y.-K. Chan, W. Jhin-Han, An agent-based LSB substitution image hiding method. Int. J. Innov. Comput. Inf. Control 6(3), 1023–1038 (2010)
13. F.H. Rabevohitra, J. Sang, Using PSO algorithm for simple LSB substitution based steganography scheme in DCT transformation domain, in Advances in Swarm Intelligence, LNCS, vol. 6728 (2011), pp. 212–220
14. C.C. Chang, T.S. Chen, L.Z. Chung, A steganographic method based upon JPEG and quantization table modification. Inf. Sci. 141, 123–138 (2002)
15. J.T. Jackson, G.H. Gunsch, R.L. Claypoole, G.B. Lamont, Blind steganography detection using a computational immune system: a work in progress. Int. J. Digit. Evid. 4(1), 1–19 (2003)
16. J.T. Jackson, G.H. Gunsch, R.L. Claypoole, G.B. Lamont, Novel steganography detection using an artificial immune system approach, in Proceedings of the Congress on Evolutionary Computation (2003), pp. 139–145
17. S.P. Maity, M.K. Kundu, P.K. Nandi, Genetic algorithm for optimal imperceptibility in image communication through noisy channel, in Proceedings of the International Conference on Neural Information Processing, LNCS, vol. 3316 (2004), pp. 700–705
18. M.-N. Wu, M.H. Li, C.C. Chang, A LSB substitution oriented image hiding strategy using genetic algorithms, in Proceedings of the Advanced Workshop On Content Computing, LNCS, vol. 3309 (2004), pp. 219–229
19. X. Li, J. Wang, A steganographic method based upon JPEG and particle swarm optimization algorithm. Inf. Sci. 177, 3099–3109 (2007)
20. S. Fazli, M. Kiamini, A high performance steganographic method using JPEG and PSO algorithm, in Proceedings of the IEEE International Multitopic Conference (2008), pp. 100–105
21. F.Y. Shih, Y.-T. Wu, Digital steganography based on genetic algorithm, in Handbook of Research on Secure Multimedia Distribution (2009), pp. 439–453
22. C.-H. Yang, S.-J. Wang, Transforming LSB substitution for image-based steganography in matching algorithms. J. Inf. Sci. Eng. 26(4), 1199–1212 (2010)
23. C.-S. Hsu, S.-F. Tu, Finding optimal LSB substitution using ant colony optimization algorithm, in Proceedings of the Second International Conference on Communication Software and Networks (2010), pp. 293–297
24. A.L. Brazil, A. Sanchez, A. Conci, N. Behlilovic, Hybridizing genetic algorithms and path relinking for steganography, in Proceedings of ELMAR (2011), pp. 285–288
25. E. Vahedi, R.A. Zoroofi, M. Shiva, Toward a new wavelet-based watermarking approach for color images using bio-inspired optimization principles. Digit. Signal Process. 22, 153–162 (2012)
26. P. Fakhari, E. Vahedi, C. Lucas, Protecting patient privacy from unauthorized release of medical images using a bio-inspired wavelet-based watermarking approach. Digit. Signal Process. 21, 433–446 (2011)
27. H. Al-Qaheri, Digital watermarking using ant colony optimization in fractional Fourier domain. J. Inf. Hiding Multimed. Signal Process. 1(3), 179–189 (2010)
28. S.T. Veena, S. Arivazhagan, W.S.L. Jebar, Improved detection of steganographic algorithms in spatial LSB stego images using hybrid GRASP-BGWO optimisation, in Proceedings of Nature Inspired Optimization Techniques for Image Processing Applications (2019), pp. 89–112
29. Y.M.B. Ali, Smell Bees optimization for new embedding steganographic scheme in spatial domain. Swarm Evol. Comput. 44, 584–596 (2019)
30. J. Kennedy, R.C. Eberhart, Particle swarm optimization, in Proceedings of the IEEE International Conference on Neural Networks (1995), pp. 1942–1948
31. J. Kennedy, R.C. Eberhart, Swarm Intelligence (Morgan Kaufmann, Burlington, 2001)

Chapter 14

Algorithms Towards the Automated Customer Inquiry Classification

Gulshat Kessikbayeva, Nazerke Sultanova, Yerbolat Amangeldi, and Roman Yurchenko

Abstract Classification is an important field of research because of the growing volume of unstructured text, especially in the form of customer inquiries. The problem has two phases: identifying a customer inquiry in general, and automatically assigning the product order requested by the customer to predefined categories based on its characteristics. Each phase can be accomplished with various techniques at each of its internal steps, and choosing the proper technique affects the efficiency of text classification, saving time and physical effort. The aim of this paper is to present a classification model that works efficiently on Russian texts, since machine learning algorithms are known to work well mainly on English texts. After the most challenging task, preprocessing the unstructured text by stemming, parsing, and indexing, a logical sequence of steps and analyses combining compatible techniques allows us to compare the algorithms and their behavior on different types of normalized and unnormalized data. Experiments on a dataset of 33000 records using bi-grams and TF-IDF scores along with their parsed frequencies show that the SVM and Naive Bayes algorithms outperform the others on normalized data. Moreover, a neural network optimized with stochastic gradient descent was applied, and its results were compared with those of the traditional machine learning algorithms. The results show that the performance of the proposed model can be improved by identifying outliers and their patterns.

G. Kessikbayeva · N. Sultanova (corresponding author) · Y. Amangeldi · R. Yurchenko
Suleyman Demirel University, Kaskelen, Kazakhstan

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_14

14.1 Introduction

In a large number of industries, each customer inquiry is classified manually according to the class and type to which it belongs. In such cases, classifying these inquiries more efficiently would be very useful. This paper proposes methods to classify such inquiries automatically using machine learning algorithms such as SVM, Naive Bayes, and Logistic Regression, with the texts vectorized by TF-IDF. These methods can dramatically reduce the time spent on manual classification. Since all texts are in Russian, the default text preprocessing methods have to be analyzed to make sure they are suitable for Russian. This analysis should be helpful for industries involved in any kind of text classification and categorization. The paper is organized as follows: Sect. 14.2 reviews related work on similar topics and the algorithms used in this research. Section 14.3 explains the methods and techniques used. Sections 14.4 and 14.5 describe the data and its properties and present the empirical analysis of our methods and their results. Section 14.6 presents and compares the results of optimization with neural networks. Section 14.7 concludes the work, and Sect. 14.8 discusses how the results can be improved and outlines future work.

14.2 Related Works

Text classification has become a widely studied field in recent years. Many researchers have found efficient algorithms for text classification and proved their effectiveness on unstructured text documents in English. Borodkin et al. [2] presented a way of preprocessing unstructured text documents; they also described many problems that can occur during preprocessing and provided solutions for them. Additionally, they compared several machine learning algorithms by evaluating them after training on the preprocessed data. In that work, the authors used k-Nearest Neighbours, Naive Bayes, and k-means within a software package that acts as a black box. Their comparison shows that k-means outperforms the others, with a precision of 78.33%. Mowafy et al. [11] proposed their own sequence of preprocessing methods. Besides that, they discussed the tf-idf algorithm and its application to their work in depth. Furthermore, they trained various machine learning algorithms and determined the ones most suitable for the format obtained after tf-idf vectorization. The authors concluded that Multinomial Naive Bayes with TF-IDF is superior. Moreover, their model applies chi-square, which increased the efficiency from 71 to 76%. Ashwin et al. [5] discussed text analytics. They reviewed many text-related problems that companies have tried to solve with text analytics and, for each problem, provided a solution method and its evaluation. They found that most text classification problems in industry require robust algorithms, since the data is collected from different sources. Additionally, k-means was proposed as a convenient algorithm for high clustering quality. The authors suggest using minimal supervision and applying TF-IDF or GloVe instead. Tilve and Jain [1] compared three text classification algorithms (VSM, Naive Bayes, and the Stanford Tagger) trained on two dissimilar datasets (20 Newsgroups and New News). According to their results, Naive Bayes is the better choice due to its performance and simplicity. Overall, the known algorithms were analyzed and their results compared. Which algorithm outperforms the others depends on the given data and its type, and application-specific characteristics may cause further variation in the results. Therefore, analyzing the best-known algorithms on our own data is indispensable for comparing their efficiency:

• Random Forest Classifier: a bagging algorithm that fits a number of decision trees on various subsets of the dataset and averages their outputs to decrease the error and prevent over-fitting [3].
• Support Vector Machines: universal learners used for classification, regression, and outlier detection. SVMs are powerful because they are very effective in high-dimensional spaces and support different kernel functions [6].
• Multinomial Naive Bayes: a supervised learning method suitable for classifying discrete counts or fractional values such as tf-idf. Naive Bayes algorithms are based on Bayes' theorem, with every pair of features assumed to be conditionally independent given the class [13].
• Logistic Regression: a supervised learning algorithm built on the logistic (sigmoid) function, the inverse of the logit. Its goal is to find the best-fitting hypothesis function [7].
• Decision Tree Classifier: a non-parametric supervised learning method that uses a tree-like graph model of decisions [12].

14.3 Methods

All algorithms are compared using TF-IDF as the vector space model for feature extraction. Building a classification model involves the following general steps:


Fig. 14.1 The scheme that shows the work process of data

• Preprocessing phase
• Training/Testing phase
• Usage phase

In this section, these steps are explained in detail with examples (Fig. 14.1).

14.3.1 Preprocessing Phase

The preprocessing phase includes tokenization, stop-word removal, and stemming.

• Tokenization: partitioning a string into a list of tokens.
• Stop-word removal: discarding stop words (e.g. 'or', 'and', 'etc.') so that only meaningful words remain.
• Stemming: converting different word forms into a common canonical form.

First, the 4 input fields were concatenated into a single input, and each such input is treated as a separate document. The NLTK tokenizer [10] is used for tokenization (Fig. 14.2). The resulting list of tokens then goes through stop-word removal, which excludes all stop words and numbers; this step uses NLTK's default stop-word corpus for Russian [10]. In addition to stop words and numbers, named entities were also removed with the same library [10], because they have no impact on the classifier. The final preprocessing step is stemming, performed with pymorphy2 [9]. The result is a preprocessed input that keeps the text in a clean format, ready for the next phase of text classification: the training and testing phase.

Fig. 14.2 The scheme that shows the preprocessing process
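The preprocessing pipeline can be sketched with the Python standard library alone. This is an illustrative stand-in, not the authors' code: a regular expression replaces the NLTK tokenizer, `STOP_WORDS` is a tiny hypothetical sample rather than NLTK's Russian corpus, named-entity removal is omitted, and `normalize` is an identity placeholder where pymorphy2 would be plugged in (for example via `pymorphy2.MorphAnalyzer().parse(token)[0].normal_form`).

```python
import re
from typing import Iterable, List

# Hypothetical sample; the chapter uses NLTK's full Russian stop-word list.
STOP_WORDS = {"и", "или", "в", "на", "для", "с", "по", "не"}


def normalize(token: str) -> str:
    """Placeholder for pymorphy2 normalization (identity here)."""
    return token


def preprocess(fields: Iterable[str]) -> List[str]:
    """Concatenate the input fields, tokenize, drop stop words and numbers, normalize."""
    text = " ".join(f for f in fields if f).lower()   # the 4 fields become one document
    tokens = re.findall(r"\w+", text)                 # \w is Unicode-aware, so Cyrillic matches
    kept = [t for t in tokens if t not in STOP_WORDS and not t.isdigit()]
    return [normalize(t) for t in kept]
```

For instance, the fields "Труба стальная", "для газа", "ГОСТ 10704" and an empty fourth field reduce to the tokens труба, стальная, газа, гост.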

14.3.2 Training/Testing Phase

Before training, the preprocessed input must be translated into a more convenient form. One of the best techniques is to convert the data into vectors that machine learning algorithms can work with; here, the data was converted into TF-IDF vectors. TF-IDF (Term Frequency-Inverse Document Frequency) is a numerical statistic that reflects how important a word is to a document in a collection or corpus. It is the product of two statistics:

• Term frequency: the number of times the word occurs in a document.
• Inverse document frequency: a measure of how much information the word provides, i.e. whether the term is common or rare across all documents.

$$W_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right) \qquad (14.1)$$

where tf_{i,j} is the number of occurrences of word i in document j, df_i is the number of documents that contain word i, and N is the total number of documents.
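Eq. (14.1) can be computed directly, as in the Python sketch below (function name hypothetical; the natural logarithm is assumed because the base is not specified). Note that scikit-learn's `TfidfVectorizer`, used in this chapter, does not apply Eq. (14.1) verbatim: by default it uses a smoothed IDF, ln((1 + N) / (1 + df_i)) + 1 (`smooth_idf=True`), and L2-normalizes each document vector (`norm='l2'`), so its weights differ from this raw form.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """W[i, j] = tf[i, j] * log(N / df[i]) for every word i of every document j, Eq. (14.1)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))            # count each word at most once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights
```

With the two documents ["pipe", "steel", "pipe"] and ["steel", "gas"], "pipe" in the first document gets 2 ln 2, while "steel", which occurs in every document, gets weight 0.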


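These weights W are used in the algorithms as follows.

• Multinomial Naive Bayes [13]:

$$p(c_i) = \frac{N_{c_i}}{N} \qquad (14.2)$$

$$p(w \mid c_i) = \frac{count(w, c_i) + 1}{count(c_i) + |V|} \qquad (14.3)$$

Here N_{c_i} is the number of training documents in class c_i, N is the total number of training documents, count(w, c_i) is the number of occurrences of word w in class c_i, count(c_i) is the total number of word occurrences in class c_i, and |V| is the vocabulary size. Eq. (14.3) is thus the Laplace-smoothed probability of word w given class c_i.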
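Eqs. (14.2) and (14.3) are enough for a complete, if minimal, classifier. The Python sketch below trains on raw word counts and predicts the class maximizing log p(c_i) + Σ log p(w | c_i); working in log space avoids numerical underflow. The function names and toy data are hypothetical. For reference, scikit-learn's `MultinomialNB` applies the same add-one smoothing by default (`alpha=1.0`) to summed feature values, which is how it also accepts TF-IDF weights instead of integer counts.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]], Set[str]]


def train_mnb(docs: List[List[str]], labels: List[str]) -> Model:
    classes = sorted(set(labels))
    vocab = {w for doc in docs for w in doc}
    prior = {c: labels.count(c) / len(docs) for c in classes}          # Eq. (14.2)
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    likelihood = {}
    for c in classes:
        total = sum(counts[c].values())                                # count(c_i)
        likelihood[c] = {w: (counts[c][w] + 1) / (total + len(vocab))  # Eq. (14.3)
                         for w in vocab}
    return prior, likelihood, vocab


def predict_mnb(model: Model, doc: List[str]) -> str:
    prior, likelihood, vocab = model

    def log_posterior(c: str) -> float:
        # unseen words are ignored; known words add their log-likelihood
        return math.log(prior[c]) + sum(math.log(likelihood[c][w]) for w in doc if w in vocab)

    return max(prior, key=log_posterior)
```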

• Logistic Regression [7]:

$$s(w) = \frac{1}{1 + e^{-w^{T} x}} \qquad (14.4)$$

where s(w) is the sigmoid activation function, w is the weight vector, and x is the feature vector.

• Linear SVM [6]: the support vector classifier predicts the class sgn(w^T x + w_0), where sgn is the sign function, subject to the margin constraints

$$w^{T} x + w_0 \geq 1 \ \text{if } y = 1, \qquad w^{T} x + w_0 \leq -1 \ \text{if } y = -1.$$

The TF-IDF vectors were obtained with scikit-learn's TfidfVectorizer, configured to use both uni-grams and bi-grams. It outputs sparse vectors with a feature size of 10459, which form the document-term matrix

$$
\begin{array}{c|cccc}
 & T_1 & T_2 & \cdots & T_i \\ \hline
D_1 & W_{1,1} & W_{2,1} & \cdots & W_{i,1} \\
D_2 & W_{1,2} & W_{2,2} & \cdots & W_{i,2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
D_j & W_{1,j} & W_{2,j} & \cdots & W_{i,j}
\end{array}
$$

where T_i represents the words (terms) and D_j represents the documents. This phase is responsible for learning the classifier model; its output is a trained classifier ready for the testing phase.

14.4 Data Description

The data for this research was retrieved from the website www.kmggs.kz. It consists of 2 separate databases with slightly different structures. The first database contains 32000 records of customer inquiries, whereas the second contains 2000 detailed records of inquiries. All the data is labeled into 20 classes (Fig. 14.5) that differ in their fields. Each inquiry has 4 main features provided by the customer, and it is labeled with 1 main class and 5 consecutive classes that define the entity of the inquiry. The 4 main features are:

1. Main description of an inquiry
2. Short description of an inquiry
3. Technical details of an inquiry
4. Additional information about the inquiry

The classes are as follows:

1. Main class: type of a material
2. Group of a material
3. Class of a material
4. Group of a purchase
5. Class of an assessment
6. Quantity of a material

This research focuses mainly on classifying the main class, because the other classes can be identified from it. Figure 14.3 shows that the records are not equally distributed across classes, and that only 13 classes appear instead of 20: the remaining classes are almost nonexistent or can be regarded as outliers. This data set is more detailed and contains many classes, so the analysis mainly focuses on it. The other data set contains a larger number of instances, but it is less detailed and more generalized. Figure 14.4 shows that its class proportions are very similar to those of the small data set.

Fig. 14.3 Histogram of small data-set


Fig. 14.4 Histogram of large data-set

This data contains many lexical errors and is unstructured. Each instance is written in heavily technical language mixed with random symbols, and the additional information and technical details fields have many empty cells.

14.5 Empirical Analysis

Since the prime goal of this paper is to analyze and compare the text classification performance of the algorithms, the main focus is on their accuracy and effectiveness on real data, followed by a comparison of their productivity. The first step is TF-IDF vectorization: scikit-learn's TfidfVectorizer was used to convert the raw text into TF-IDF vectors, producing a 10459-dimensional vector for each document, and bi-grams were also used to improve performance. Cross-validation scores were then computed with scikit-learn's cross_val_score function. Figure 14.5 shows the cross-validation scores of 5 algorithms: LinearSVC and Naive Bayes outperformed the others, while, as expected, the Random Forest and Decision Tree classifiers obtained worse cross-validation scores on this text classification task. The best algorithm is therefore LinearSVC. With the dataset split into 70% training and 30% test data, this model achieved an accuracy of 86.33%. The heatmap of actual versus predicted classes is shown in Fig. 14.6. Most misclassifications occur when classes such as "STRM", "ERSA", and "ETSM" are classified as class "ROH". This is likely because class "ROH" greatly outnumbers the other classes. For the same reason, the classification report of this model is poor for some classes with few test cases (Table 14.1).

Fig. 14.5 Cross validation score

Fig. 14.6 Heatmap of actual versus predicted

Table 14.1 F1-scores

Class       Precision  Recall  F1-score  Support
AUTO        1.00       0.88    0.93      8
PROS        0.75       0.62    0.68      39
MOPS        0.83       0.85    0.84      88
ROH         0.81       0.91    0.85      290
ERSA        0.94       0.88    0.91      203
ETSM        0.89       0.92    0.91      224
LEER        0.85       0.81    0.83      100
INVT        1.00       0.89    0.94      18
TOPL        1.00       0.25    0.40      4
BILD        0.00       0.00    0.00      3
Avg/total   0.86       0.86    0.86      1032
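The per-class figures in Table 14.1 follow the usual definitions: precision P = TP / (TP + FP), recall R = TP / (TP + FN), and F1 = 2PR / (P + R). A minimal Python sketch (function name hypothetical) that computes them from lists of actual and predicted labels is shown below. Reporting 0 when a ratio is undefined, e.g. for a class that is never predicted, is a common convention and is assumed here; scikit-learn's default also reports 0 in that case, with a warning.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_class_report(actual: List[str],
                     predicted: List[str]) -> Dict[str, Tuple[float, float, float, int]]:
    """Return {class: (precision, recall, f1, support)}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            tp[a] += 1
        else:
            fp[p] += 1   # predicted p, but the true class was different
            fn[a] += 1   # the true class a was missed
    support = Counter(actual)
    report = {}
    for c in sorted(set(actual) | set(predicted)):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = (prec, rec, f1, support[c])
    return report
```

As a check against the table, class TOPL has P = 1.00 and R = 0.25, giving F1 = 2(1.00)(0.25)/1.25 = 0.40.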

14.6 Optimization with Neural Networks

This section evaluates whether artificial neural networks can be successfully applied to the customer inquiry classification task, especially given the limited amount of data available in Russian. They appear to be a valuable solution for this kind of problem, and the literature on applications of neural networks is vast and growing. Hassan and Mahmood [4] compared linear classifiers and deep neural networks and proposed improving the classification model with deep learning techniques. Kowsari et al. [8] proposed a hierarchical deep learning model for text classification that combines several neural network architectures and achieved 86% accuracy. In this work, a feed-forward, fully connected neural network with five hidden layers of 512 nodes each was implemented. Choosing the optimizer is crucial, since a well-trained network also requires a properly defined loss function. The first choice in this work was the "adam" optimizer from the sklearn library, but it reached an accuracy of only about 30–40%. Stochastic gradient descent was therefore used to optimize the network. Stochastic gradient descent (SGD) replaces the expected gradient over the whole training set with an estimate computed from a single training example or a few of them. The update is:

$$\theta = \theta - \alpha \nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})$$
14 Algorithms Towards the Automated Customer Inquiry Classification

Table 14.2 F1-scores for neural network

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| SRTM | 0.96 | 0.50 | 0.66 | 52 |
| AUTO | 0.22 | 0.50 | 0.31 | 8 |
| PROS | 0.54 | 0.38 | 0.45 | 39 |
| MOPS | 0.61 | 0.70 | 0.65 | 88 |
| ROH | 0.79 | 0.83 | 0.81 | 290 |
| ERSA | 0.72 | 0.85 | 0.78 | 203 |
| ETSM | 0.82 | 0.75 | 0.78 | 224 |
| INVT | 0.92 | 0.56 | 0.70 | 100 |
| TOPL | 0.71 | 0.56 | 0.63 | 18 |
| BILD | 0.00 | 0.00 | 0.00 | 4 |
| FERT | 0.00 | 0.00 | 0.00 | 3 |
| OFIS | 0.00 | 0.00 | 0.00 | 3 |
| Avg/total | 0.76 | 0.73 | 0.74 | 1032 |

The results are given in Table 14.2. They show that the F1-score is low for the classes with little data. This happens because the data are imbalanced and sparse, so the neural network cannot be trained properly on those classes.
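The Avg/total row of such a classification report is a support-weighted average, so it is dominated by large classes such as ROH. Comparing it with the unweighted (macro) average over classes makes the effect of the small classes visible. The sketch below uses the F1-score and Support columns of Table 14.2 as printed; the helper name is ours.

```python
def average_f1(f1_scores, supports):
    """Return (support-weighted average, unweighted macro average) of per-class F1."""
    total = sum(supports)
    weighted = sum(f * n for f, n in zip(f1_scores, supports)) / total
    macro = sum(f1_scores) / len(f1_scores)
    return weighted, macro


# F1-score and Support columns of Table 14.2, rows SRTM ... OFIS.
f1 = [0.66, 0.31, 0.45, 0.65, 0.81, 0.78, 0.78, 0.70, 0.63, 0.00, 0.00, 0.00]
support = [52, 8, 39, 88, 290, 203, 224, 100, 18, 4, 3, 3]
weighted, macro = average_f1(f1, support)
print(f"weighted {weighted:.2f}, macro {macro:.2f}")  # prints: weighted 0.74, macro 0.48
```

The weighted value reproduces the 0.74 in the table, while the macro average of about 0.48 exposes how poorly the minority classes are handled.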

14.7 Conclusion
In conclusion, inquiry classification in Russian can be performed quite well with default TF-IDF features and minimal preprocessing when the dataset is imbalanced and limited in size. LinearSVC and Naive Bayes perform very well with TF-IDF, whereas algorithms such as Decision Tree and Random Forest perform poorly. Moreover, neural networks are less efficient on sparse data. During preprocessing we also observed that libraries such as pymorphy2 are very good for lemmatizing Russian words; however, they do not handle words with spelling errors. This work is relevant because it analyzes real data, and its results agree with the expected behavior of the algorithms.
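For readers unfamiliar with the TF-IDF representation used above, the sketch below computes it from scratch. It assumes the smoothed idf and L2 row normalization that scikit-learn's TfidfVectorizer uses by default, idf(t) = ln((1 + n)/(1 + df(t))) + 1, but tokenizes by a plain whitespace split (an assumption); the example inquiries are invented.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (list of {term: weight} vectors, idf table) for whitespace-tokenised docs.

    tf = raw term count, idf(t) = ln((1 + n) / (1 + df(t))) + 1 (smoothed),
    and each document vector is scaled to unit L2 norm.
    """
    n = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for tokens in tokenised:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        vectors.append({t: v / norm for t, v in weights.items()} if norm else weights)
    return vectors, idf


inquiries = ["invoice payment", "payment delivery", "payment"]
vectors, idf = tfidf(inquiries)
```

A term that occurs in every inquiry (here “payment”) gets the minimum idf of 1, so rarer, more class-specific words carry more weight in each vector.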

14.8 Future Work
This model can also be improved by handling outliers. Most outliers result from the largest class being confused with its closest classes. This could be improved by building a Doc2Vec model and checking whether these outliers form clusters, so that they can be detected before classification. Doc2Vec is needed to capture the content of those documents and to see whether they can be identified accurately. This embedding is chosen because Doc2Vec captures something of the meaning of a document, whereas TF-IDF only reflects the frequencies of words in it. Additionally, better named-entity removal could be built into this model, which may improve accuracy for Kazakhstan-specific locations and names. Moreover, χ² (chi-square) can be combined with TF-IDF to improve results, as proposed in [11].
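The χ² score mentioned above is usually computed per term and per class from a 2×2 table of document counts; terms with the highest scores for some class are kept as features. The sketch shows only this standard statistic; how [11] combines it with TF-IDF is not reproduced here, and the function and argument names are ours.

```python
def chi2_term_class(n11, n10, n01, n00):
    """Chi-square score of a term for a class from a 2x2 document-count table.

    n11: class docs containing the term    n10: other docs containing the term
    n01: class docs without the term       n00: other docs without the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return numerator / denominator if denominator else 0.0
```

A term that appears only in documents of one class gets a high score, while a term spread evenly over all classes scores zero.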

References
1. K. Shet Tilve Amey, N. Jain Surabhi, Text classification using naive bayes, svm and pos tagger. Int. J. Ethics Eng. Manag. Educ. (2017)
2. A. Borodkin, E. Lisin, W. Strielkowski, Data algorithms for processing and analysis of unstructured text documents. Appl. Math. Sci. 8(25), 1213–1222 (2014)
3. L. Breiman, Random forests. Mach. Learn. 45(1), 5–32 (2001)
4. A. Hassan, A. Mahmood, Deep learning for sentence classification, pp. 1–5 (2017). https://doi.org/10.1109/LISAT.2017.8001979
5. A. Ittoo, L.M. Nguyen, A. Bosch, Text analytics in industry: challenges, desiderata and trends. Comput. Ind. 78, 96–107 (2016)
6. T. Joachims, Text categorization with support vector machines: learning with many relevant features, in European Conference on Machine Learning (Springer, 1998), pp. 137–142
7. D.G. Kleinbaum, Introduction to logistic regression. Logistic Regression. Statistics in the Health Sciences (Springer, Berlin, 1994)
8. K. Kowsari, D. Brown, M. Heidarysafa, K.J. Meimandi, M. Gerber, L. Barnes, HDLTex: hierarchical deep learning for text classification (2017). https://doi.org/10.1109/ICMLA.2017.0134
9. M. Korobov, Morphological analyzer and generator for Russian and Ukrainian languages, in Analysis of Images, Social Networks and Texts, eds. by M.Y. Khachay, N. Konstantinova, A. Panchenko, D.I. Ignatov, V.G. Labunets. Communications in Computer and Information Science, vol. 542 (Springer International Publishing, 2015), pp. 320–332
10. E. Loper, S. Bird, NLTK: the natural language toolkit, in Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1, ETMTNLP’02 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2002), pp. 63–70
11. M. Mowafy, A. Rezk, H.M. El-bakry, An efficient classification model for unstructured text document. Am. J. Comput. Sci. Inf. Technol. 6 (2018). https://doi.org/10.21767/2349-3917.100016
12. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning (Springer, New York, 2009)
13. G.I. Webb, Naive bayes, Encyclopedia of Machine Learning (Springer, Berlin, 2015), pp. 713–714

Chapter 15

An Heuristic Scheme for a Reaction Advection Diffusion Equation
M. R. Amattouch and H. Belhadj

Abstract In this paper we present an alternative meshless method for solving a reaction advection diffusion equation. To reduce the computational cost, we first use a new optimized domain decomposition with a fractional-derivative differential condition on the interface between sub-domains. We prove that solving our boundary value problem is equivalent to solving, in parallel and separately, newly defined boundary value problems on each sub-domain. This is relevant because the method operates in just one iteration on each sub-domain, requires no transmission condition to be exchanged across the interface, and runs fast without a preconditioner. Next, on each sub-problem we convert the equations into a global optimization problem with equality constraints that take the boundary conditions into account. An evolutionary algorithm (EA) is used to search for the global solution. Several analytical test cases illustrate this approach and show the efficiency of the proposed method.

M. R. Amattouch (corresponding author) Department of Mathematics, Mechanics, Cryptography and Numerical Analysis, Faculty of Sciences and Techniques, University of Hassan II, BP.146 Mohammedia, Morocco e-mail: [email protected]
H. Belhadj Department of Mathematics, Faculty of Sciences and Techniques, University of Abdelmalek Essaadi, BP.416 Tangier, Morocco e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_15

15.1 Introduction
Many equations in fluid mechanics, biochemistry and engineering can be reduced to a linear reaction advection diffusion equation. For instance, the implicit scheme for the Navier-Stokes equations, or linear and nonlinear turbulence models, can be linearized into equations of this type (in [1–3] we proposed a modified fixed point method for this linearization). The purpose of our work is therefore to propose an accurate, lower-cost method for these equations. A general reaction advection diffusion equation can be written as:

$$\begin{cases} cu + a\,\dfrac{\partial u}{\partial x} + b\,\dfrac{\partial u}{\partial y} - \mu\,\Delta u = f(x,y) & \text{in } \Omega,\\ u = g & \text{on } \partial\Omega. \end{cases} \tag{15.1}$$

Ω is a continuum domain, μ is the viscosity, c is a reaction term that may be an arbitrary variable coefficient (this covers the general case of the implicit Euler scheme for the advection diffusion equation), and a and b are the normal and tangential advection coefficients, respectively. Equations of this type can be solved analytically; however, in most practical cases it is difficult to find analytical solutions on domains with complex geometries. An alternative to the analytical approach is the use of mesh-based approximations such as finite element methods [4–6], finite difference methods and finite volume methods [7, 8]. Although these methods are widely used in practice, they still have drawbacks, for example difficulties in mesh generation and mesh distortion for complex geometries. This has led to the development of meshless methods [9–11]. Unlike finite element and finite volume methods, these methods use only a distribution of nodes to obtain an approximate solution. Their drawback is the difficulty of handling boundary conditions. An alternative approach [12–14], which we use in this article, is to convert the boundary value problem into an optimization problem and to employ a metaheuristic or an evolutionary algorithm to search for the optimum. In this paper we use a new type of function adapted to the boundary conditions we treat, and we show that this approach is more efficient than using polynomial or sig-log functions. To reduce the computational time and make our GA feasible, we first use a domain decomposition method (DDM) to split our boundary value problem into separate, parallel sub-problems that are cheaper to solve. The principle of DDMs is to replace a large problem by decoupled sub-problems of small size that can be solved in parallel. The time savings are significant.
Since parallel software appeared in the seventies, researchers have developed many domain decomposition methods [15–23] adapted to parallel computing. In our work we consider the well-known second-order optimized domain decomposition method (OO2) introduced in [23], regarded as the fastest known DDM. The OO2 method uses a differential interface condition between sub-domains that is of second order in the tangential direction and of first order in the normal direction, and it is accelerated by solving the interface problem with a preconditioned Krylov-type solver (GMRES, BiCGStab, ...). In this work we propose new differential conditions on the interface such that our DDM converges in one iteration, avoiding the use of a preconditioner. The transmission conditions on the interface between two sub-domains are fractional derivative conditions. In addition, the OO2 method needs a transmission condition to iterate, which causes significant time loss due to data transfer in the von Neumann architecture of computers. In the first part of this work we present the new optimized domain decomposition and prove the convergence of our method; we then present the GA method for solving partial differential equations, and finally we give numerical results that show the efficiency of the proposed methods.

15.2 New Optimized Domain Decomposition Methods
15.2.1 Optimized Domain Decomposition Methods
In what follows we present the principle of OO2 and of our proposed method, taking as an example a decomposition into two sub-domains. Consider the two sub-domains Ω1 and Ω2 with an interface Γ (see Fig. 15.1).

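15.2.2 The OO2 Method
We build two sequences (u_1^p) and (u_2^p) as follows. Starting from two initial functions u_1^0 and u_2^0 defined on Ω1 (respectively Ω2), we compute u_1^{p+1} and u_2^{p+1} by solving the problems:

$$\begin{cases} L(u_1^{p+1}) = f(x,y) & \text{in } \Omega_1,\\ u_1^{p+1} = g & \text{on } \partial\Omega,\\ B_1(u_1^{p+1}) = B_1(u_2^{p}) & \text{on } \Gamma, \end{cases} \tag{15.2}$$

and

$$\begin{cases} L(u_2^{p+1}) = f(x,y) & \text{in } \Omega_2,\\ u_2^{p+1} = g & \text{on } \partial\Omega,\\ B_2(u_2^{p+1}) = B_2(u_1^{p}) & \text{on } \Gamma, \end{cases} \tag{15.3}$$

Fig. 15.1 Splitting of the domain into two sub-domains

where

$$L(u) = cu + a\,\frac{\partial u}{\partial x} + b\,\frac{\partial u}{\partial y} - \mu\,\Delta u$$

and

$$B_1(u) = \frac{\partial u}{\partial n} - C_1 u + C_2\,\frac{\partial u}{\partial \tau} - C_3\,\frac{\partial^2 u}{\partial \tau^2} \tag{15.4}$$

$$B_2(u) = -\frac{\partial u}{\partial n} - \Big(C_1 - \frac{a}{\mu}\Big)u + C_2\,\frac{\partial u}{\partial \tau} - C_3\,\frac{\partial^2 u}{\partial \tau^2} \tag{15.5}$$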

Here n and τ are the normal and the tangent to Γ on Ω1, B1 and B2 are the artificial transmission conditions on the interface Γ, and L is the reaction advection diffusion operator of Eq. (15.1). By Fourier analysis, the convergence rate of the algorithm in Fourier space is (see [23] for more details)

$$\rho(C_1, C_2, C_3, k) = \left(\frac{\lambda^-(k) - C_1 + ikC_2 + C_3k^2}{\lambda^+(k) - C_1 + ikC_2 + C_3k^2}\right)^2 \tag{15.6}$$

where

$$\lambda^{\pm}(k) = \frac{a \pm \sqrt{a^2 + 4c\mu - 4i\mu bk + 4k^2\mu^2}}{2\mu} \tag{15.7}$$

are the eigenvalues of the Steklov operator [12]. The OO2 method does not converge for arbitrary coefficients C1, C2 and C3, but convergence is ensured if $\max_{|k|<\pi/h} |\rho(C_1, C_2, C_3, k)| < 1$ [12], and the method is optimized by choosing C1, C2 and C3 so that this maximum is as small as possible. However, many numerical test cases show that the method is not very fast for high viscosity μ (see [1] for an explanation).
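As a numerical companion to Eqs. (15.6) and (15.7), the sketch below evaluates λ±(k) and |ρ|, and approximates the maximum over |k| < π/h by sampling k on a uniform grid. The function names, the grid size and the use of the modulus |ρ| (ρ is complex) are our assumptions; the sketch only evaluates the formulas and does not optimize C1, C2, C3.

```python
import cmath
import math


def eigenvalues(k, a, b, c, mu):
    """lambda^+(k), lambda^-(k) of Eq. (15.7): the two roots of
    c + a*lam - i*b*k - mu*lam**2 + mu*k**2 = 0 (principal square root)."""
    s = cmath.sqrt(a * a + 4 * c * mu - 4j * mu * b * k + 4 * k * k * mu * mu)
    return (a + s) / (2 * mu), (a - s) / (2 * mu)


def oo2_rate(C1, C2, C3, k, a, b, c, mu):
    """|rho(C1, C2, C3, k)| from Eq. (15.6)."""
    lam_p, lam_m = eigenvalues(k, a, b, c, mu)
    shift = -C1 + 1j * k * C2 + C3 * k * k     # Fourier symbol of the tangential part
    return abs(((lam_m + shift) / (lam_p + shift)) ** 2)


def max_rate(C1, C2, C3, a, b, c, mu, h, samples=2000):
    """Approximate max over |k| < pi/h of |rho| by sampling a uniform k grid."""
    k_max = math.pi / h
    ks = [k_max * j / samples for j in range(-samples + 1, samples)]
    return max(oo2_rate(C1, C2, C3, k, a, b, c, mu) for k in ks)


# Robin-type choice C1 = lambda^-(0) = -1 for c = mu = 1, a = b = 0 (illustrative values).
print(max_rate(-1.0, 0.0, 0.0, a=0.0, b=0.0, c=1.0, mu=1.0, h=0.01))
```

With only C1 tuned, |ρ| vanishes at k = 0 but stays just below 1 at the highest frequencies; the tangential coefficients C2 and C3 of OO2 exist precisely to pull that maximum down.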

15.2.3 New Domain Decomposition Method with Two Iterations (AlgDF)
The aim of this section is to construct a domain decomposition with a fractional-derivative transmission condition (AlgDF) on the interface between sub-domains. The goal is a method for the reaction advection diffusion equation whose convergence rate equals zero, which means that our domain decomposition method converges to the solution of problem (15.1) in only two iterations. This is an advantage over the OO2 method, which may need more than two iterations to converge to the solution of our problem.
In what follows we again consider the splitting of our domain into two sub-domains (the general case of several sub-domains is constructed and treated in the same way), and we use the notation of Sect. 15.2.2.
15.2.3.1 Case Where a = b = 0

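In the case a = b = 0, L is an operator that can represent the heat equation, and its eigenvalues λ⁺ and λ⁻ are:

$$\lambda^+ = \sqrt{\frac{c}{\mu} + k^2} \quad\text{and}\quad \lambda^- = -\sqrt{\frac{c}{\mu} + k^2} \tag{15.8}$$

Notice that:

$$\sqrt{\frac{c}{\mu} + k^2} = \Big(k + i\sqrt{\tfrac{c}{\mu}}\Big)^{1/2} \times \Big(k - i\sqrt{\tfrac{c}{\mu}}\Big)^{1/2} \tag{15.9}$$

This term is none other than the Fourier transform of $e^{-\pi\sqrt{c/\mu}\,y}\,\frac{\partial^{1/2}}{\partial y^{1/2}}\,e^{\pi\sqrt{c/\mu}\,y}\,\frac{\partial^{1/2}}{\partial y^{1/2}}$. This leads us to consider a method with two iterations: given an initial function u_0 defined on Ω, we build the solution of problem (15.1) on Ω1 and Ω2 with the following two iterations. We first solve

$$\begin{cases} L(u_1) = f & \text{in } \Omega_1,\\ u_1 = g & \text{on } \partial\Omega,\\ \dfrac{\partial u_0}{\partial n} + e^{-\pi\sqrt{c/\mu}\,\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\sqrt{c/\mu}\,y}\,\dfrac{\partial^{1/2}u_0}{\partial\tau^{1/2}}\Big) = \dfrac{\partial u_1}{\partial n} + e^{-\pi\sqrt{c/\mu}\,\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\sqrt{c/\mu}\,y}\,\dfrac{\partial^{1/2}u_1}{\partial\tau^{1/2}}\Big) & \text{on } \Gamma. \end{cases} \tag{15.10}$$

Then we solve the next iteration:

$$\begin{cases} L(u_2) = f & \text{in } \Omega_2,\\ u_2 = g & \text{on } \partial\Omega,\\ -\dfrac{\partial u_0}{\partial n} + e^{-\pi\sqrt{c/\mu}\,\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\sqrt{c/\mu}\,y}\,\dfrac{\partial^{1/2}u_0}{\partial\tau^{1/2}}\Big) = -\dfrac{\partial u_2}{\partial n} + e^{-\pi\sqrt{c/\mu}\,\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\sqrt{c/\mu}\,y}\,\dfrac{\partial^{1/2}u_2}{\partial\tau^{1/2}}\Big) & \text{on } \Gamma. \end{cases} \tag{15.11}$$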

Here ∂^{1/2}u/∂x^{1/2} denotes the Caputo fractional derivative of order 1/2 of the function u (for more detailed definitions of this operator, see for example [23]):

$$\frac{\partial^{1/2} u}{\partial x^{1/2}}(x) = \frac{1}{\Gamma(\frac{1}{2})}\int_0^x (x - t)^{-1/2}\,u'(t)\,dt \tag{15.12}$$

where Γ is the gamma function, Γ(1/2) = √π. Just as we derived the Fourier convergence rate of the OO2 method, we prove by means of the Fourier transform that the convergence rate here is zero, which means that this domain decomposition method needs only two iterations to build the solution of our heat equation. To show that the convergence rate is zero we use the following results:

$$\mathcal{F}\Big(\frac{\partial^{1/2} f}{\partial \tau^{1/2}}\Big) = \sqrt{ik}\,\mathcal{F}(f), \qquad \mathcal{F}\big(e^{a\pi y} f\big)(k) = \mathcal{F}(f)(k - ia) \tag{15.13}$$
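Definition (15.12) can be evaluated numerically with the standard L1 scheme, which replaces u' by the slope of u on each cell of a uniform grid and integrates the kernel (x − t)^{−1/2} exactly on that cell. This discretization is our choice for illustration, not necessarily the authors'. For u(t) = t it reproduces the exact value 2√(x/π), and for u(t) = t² it approaches the exact value 8x^{3/2}/(3√π).

```python
import math


def caputo_half(u, x, n=1000):
    """L1-scheme approximation of the order-1/2 Caputo derivative of u at x > 0.

    D^{1/2}u(x) = 1/Gamma(1/2) * int_0^x (x - t)^(-1/2) u'(t) dt, as in Eq. (15.12):
    u' is replaced by the slope of u on each cell [t_j, t_{j+1}], and the kernel
    is integrated exactly on that cell.
    """
    alpha = 0.5
    h = x / n
    total = 0.0
    for j in range(n):
        t0, t1 = j * h, (j + 1) * h
        slope = (u(t1) - u(t0)) / h
        # int_{t0}^{t1} (x - t)^(-alpha) dt; max() guards against tiny negative round-off
        weight = (max(x - t0, 0.0) ** (1 - alpha) - max(x - t1, 0.0) ** (1 - alpha)) / (1 - alpha)
        total += slope * weight
    return total / math.gamma(1 - alpha)  # Gamma(1/2) = sqrt(pi)
```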

15.2.3.2 General Case

In this case, a and b are arbitrary coefficients. The eigenvalues of the reaction advection diffusion operator L computed in [23] are:

$$\lambda^{\pm}(k) = \frac{a \pm \sqrt{a^2 + 4\mu(\mu k^2 - ibk + c)}}{2\mu} = \frac{a}{2\mu} \pm \sqrt{k^2 - \frac{ib}{\mu}\,k + \frac{c}{\mu} + \Big(\frac{a}{2\mu}\Big)^2} \tag{15.14}$$

As in the case a = b = 0, we build a domain decomposition method with a fractional-derivative transmission condition on the interface between the two sub-domains (AlgDF) that provides the solution of problem (15.1) in only two iterations. There exist two real numbers α and β such that:

$$\sqrt{k^2 - \frac{ib}{\mu}\,k + \frac{c}{\mu} + \Big(\frac{a}{2\mu}\Big)^2} = \sqrt{k + i\alpha} \times \sqrt{k + i\beta} \tag{15.15}$$
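The numbers α and β in (15.15) can be computed explicitly. Write the radicand as k² + ipk + q with p = −b/μ and q = c/μ + (a/(2μ))². Expanding (k + iα)(k + iβ) = k² + i(α + β)k − αβ gives α + β = p and αβ = −q, so α and β are the roots of t² − pt − q = 0. They are real when p² + 4q ≥ 0, which holds for instance whenever c ≥ 0 and μ > 0. The sketch below computes them and checks the square-root factorization numerically. The values of p and q follow the reconstruction of (15.14)-(15.15) given here, so treat the sign convention on b as an assumption. With a = b = 0 it recovers α, β = ±√(c/μ), the factorization of (15.9).

```python
import cmath
import math


def interface_exponents(p, q):
    """Real alpha, beta with (k + i*alpha)(k + i*beta) = k**2 + i*p*k + q.

    Expanding gives alpha + beta = p and alpha*beta = -q, so alpha and beta are the
    roots of t**2 - p*t - q = 0; they are real iff p**2 + 4q >= 0.
    """
    disc = p * p + 4 * q
    if disc < 0:
        raise ValueError("no real pair (alpha, beta) for these coefficients")
    root = math.sqrt(disc)
    return (p + root) / 2, (p - root) / 2


def algdf_exponents(a, b, c, mu):
    """alpha, beta for the radicand of (15.15) as reconstructed here (sign convention assumed)."""
    return interface_exponents(-b / mu, c / mu + (a / (2 * mu)) ** 2)


# Check the square-root factorisation at one frequency k (principal roots; alpha > 0 > beta).
a, b, c, mu, k = 1.0, 2.0, 10.0, 0.5, 1.7
alpha, beta = algdf_exponents(a, b, c, mu)
lhs = cmath.sqrt(k + 1j * alpha) * cmath.sqrt(k + 1j * beta)
rhs = cmath.sqrt(k * k - 1j * (b / mu) * k + c / mu + (a / (2 * mu)) ** 2)
```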

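The two iterations of the proposed domain decomposition method are then:

$$\begin{cases} L(u_1) = f & \text{in } \Omega_1,\\ u_1 = g & \text{on } \partial\Omega,\\ \dfrac{\partial u_0}{\partial n} + e^{-\pi\alpha\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}u_0\big)\Big) = \dfrac{\partial u_1}{\partial n} + e^{-\pi\alpha\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}u_1\big)\Big) & \text{on } \Gamma, \end{cases} \tag{15.16}$$

then we solve the next iteration problem:

$$\begin{cases} L(u_2) = f & \text{in } \Omega_2,\\ u_2 = g & \text{on } \partial\Omega,\\ -\dfrac{\partial u_0}{\partial n} + e^{-\pi\alpha\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}u_0\big)\Big) = -\dfrac{\partial u_2}{\partial n} + e^{-\pi\alpha\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}u_2\big)\Big) & \text{on } \Gamma. \end{cases} \tag{15.17}$$

15.2.3.3 Convergence of the AlgDF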

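AlgDF consists of solving the two problems in parallel:

$$\begin{cases} L(u_1) = f & \text{in } \Omega_1,\\ u_1 = g & \text{on } \partial\Omega,\\ \dfrac{\partial u_0}{\partial n} + e^{-\pi\alpha\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}u_0\big)\Big) = \dfrac{\partial u_1}{\partial n} + e^{-\pi\alpha\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}u_1\big)\Big) & \text{on } \Gamma \end{cases} \tag{15.18}$$

and

$$\begin{cases} L(u_2) = f & \text{in } \Omega_2,\\ u_2 = g & \text{on } \partial\Omega,\\ -\dfrac{\partial u_0}{\partial n} + e^{-\pi\alpha\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}u_0\big)\Big) = -\dfrac{\partial u_2}{\partial n} + e^{-\pi\alpha\tau}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\dfrac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}u_2\big)\Big) & \text{on } \Gamma \end{cases} \tag{15.19}$$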

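where u_0 is an initial function.
Theorem. The solutions of Eqs. (15.18) and (15.19) are the restrictions of the solution of Eq. (15.1) to Ω1 and Ω2, respectively.
Proof. We give the proof in the case where the interface is the line Γ: x = 0 (n = i⃗ and τ = j⃗). By subtraction we get

$$L(u_1 - u) = 0,\quad L(u_2 - u) = 0,\quad B_1(u_1 - u) = B_1(u_0 - u),\quad B_2(u_2 - u) = B_2(u_0 - u),$$

where u is the exact solution of problem (15.1) and

$$B_1(\cdot) = \frac{\partial\,\cdot}{\partial n} + e^{-\pi\alpha\tau}\,\frac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\frac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}\,\cdot\,\big)\Big) \tag{15.20}$$

$$B_2(\cdot) = -\frac{\partial\,\cdot}{\partial n} + e^{-\pi\alpha\tau}\,\frac{\partial^{1/2}}{\partial\tau^{1/2}}\Big(e^{\pi\beta y}\,\frac{\partial^{1/2}}{\partial\tau^{1/2}}\big(e^{\pi(\beta-\alpha)}\,\cdot\,\big)\Big) \tag{15.21}$$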

If ω1 and ω1⁰ denote the Fourier transforms of u1 − u and u0 − u on Ω1, respectively, we deduce

$$c\,\omega_1 + a\,\frac{\partial\omega_1}{\partial x} - ibk\,\omega_1 - \mu\,\frac{\partial^2\omega_1}{\partial x^2} + \mu k^2\,\omega_1 = 0,$$

where k is the Fourier frequency. Seeking a solution of the form ω = ω(0, k)e^{λx}, we obtain

$$c + a\lambda - ibk - \mu\lambda^2 + \mu k^2 = 0.$$

This second-order equation has two solutions:

$$\lambda^{\pm}(k) = \frac{a \pm \sqrt{a^2 + 4c\mu - 4i\mu bk + 4k^2\mu^2}}{2\mu}.$$

Since $\lim_{x\to-\infty}\omega_1(x,k) = 0$, we take $\omega_1(x,k) = \omega_1(0,k)\,e^{\lambda^+ x}$ (note that Re λ⁺ > 0 and Re λ⁻ < 0). In the same way, if ω2 and ω2⁰ denote the Fourier transforms of u2 − u and u0 − u on Ω2, we have $\omega_2(x,k) = \omega_2(0,k)\,e^{\lambda^- x}$.
The two conditions B1(ω1(0, k)) = B1(ω1⁰(0, k)) and B2(ω2(0, k)) = B2(ω2⁰(0, k)) lead to

$$0 = (\lambda^- - \lambda^-)\,\omega_1^0(0,k) = (\lambda^- - \lambda^+)\,\omega_1(0,k) \quad\text{and}\quad 0 = (\lambda^+ - \lambda^+)\,\omega_2^0(0,k) = (\lambda^+ - \lambda^-)\,\omega_2(0,k).$$

Since λ⁺ ≠ λ⁻, we get ω1 = 0 and ω2 = 0; hence u1 = u on Ω1 and u2 = u on Ω2 in Fourier space, which proves the theorem.
Remark. Problems (15.18) and (15.19) each have a unique solution (this can be proved with the Lax-Milgram theorem [24–27]), which makes our method consistent, even when the coefficient c is zero or when the boundary condition of problem (15.1) is a Neumann condition instead of a Dirichlet condition.

15.3 Evolutionary Algorithm for PDE
After decomposing problem (15.1) into sub-problems, we present in this section our heuristic method for solving the new boundary value problems. A sub-problem can be written as:

$$\begin{cases} L(u) = 0 & \text{in } \Omega_i,\\ B_j(u) = 0 & \text{on } \partial\Omega_i, \quad j = 1, \dots, 4 \end{cases} \tag{15.22}$$

where L is the reaction advection diffusion operator and the B_j are the new differential conditions on the interface and on the boundary of the sub-domain Ω_i, i = 1, ..., (number of sub-domains). We consider a set of nodes (M_l)_{l=1,...,NPT}, where the nodes (N_ll)_{ll=1,...,MPT} lie on the boundary of our domain (Fig. 15.2). We look for an approximate solution u(M) of problem (15.22) in the form:

$$u_{\text{app}}(M) = P(M) + Q(M)\,(x - x_m)^{\alpha_i} + R(M)\,(x - x_m)^{\beta_i} \tag{15.23}$$

P, Q and R are polynomials of degree NPT, NPT − 1 and NPT − 1, respectively, and Q and R have no coefficient of order 0. The coefficients of these polynomials are determined by an optimization procedure, and the global optimum is an approximation of the solution of the problem. α_i and β_i are the fractional exponents determined by the differential boundary condition of sub-domain Ω_i on the interface between sub-domains; we computed these coefficients in Sect. 15.2.3.2.


Fig. 15.2 Nodes in the computational domain

x_m is the lowest point of the domain Ω. We can easily compute L(u_app) and B_j(u_app), and then the objective function:

$$\varepsilon = \frac{1}{NPT + 1}\sum_{l=1}^{NPT}\big(L(u_{\text{app}})(M_l)\big)^2 + \frac{1}{MPT + 1}\sum_{ll=1}^{MPT}\sum_{j=1}^{4}\big(B_j(u_{\text{app}})(N_{ll})\big)^2 \tag{15.24}$$

This objective function is continuous, differentiable and high-dimensional, so a differential evolution algorithm is well suited to optimizing it. We use the well-known differential evolution algorithm (DEA) described in [13, 14] to find the global optimum. Compared with particle swarm optimization (PSO), this method has the advantage of converging quickly to the solution and using fewer parameters (which means less time spent on tuning). To minimize our objective function (i.e., to determine suitable coefficients of the polynomials P, Q and R), we add new parameters to ensure the accuracy and fast convergence of the algorithm (we used some strategies from PSO and Lipschitz algorithms to modify the DE algorithm). Our evolutionary algorithm consists of the following steps:

1. Mutation: For a population vector x_{i,G}, i = 1, 2, ..., NP, where G is the generation number and NP is the population size, a mutant vector for the rand scheme is produced using
$$v_{i,G+1} = x_{r_1,G} + F \times (x_{r_2,G} - x_{r_3,G}) \tag{15.25}$$
where r1, r2 and r3 are random integer indices of population vectors (r1, r2, r3 ∈ {1, 2, ..., NP}; in standard DE they are mutually distinct and different from i). The current-to-best scheme is produced using
$$v_{i,G+1} = x_{i,G} + \lambda \times (x_{\text{best}} - x_{i,G}) + F \times (x_{r_1,G} - x_{r_2,G}) \tag{15.26}$$
where x_{i,G} is the current population vector of the last generation, x_best is the population vector with the best (minimum) fitness in the previous generation, F is a scaling factor (F ∈ [0, 2]) that controls the amplification of the difference vector, and λ is an additional control variable (λ ∈ [0, 1]) that controls the influence of the best population vector.
2. Crossover: In the crossover operation, a target vector x_{i,G} is bred with a mutant vector v_{i,G+1} to produce a trial vector u_{i,G+1}. The crossover probability CR ∈ [0, 1] is predefined. When the operator is invoked, an exponential crossover (Algorithm 26) can be used.

Algorithm 26 Exponential crossover algorithm
Input: x, v, CR
Output: u
Set u = x
Randomly choose k ∈ {1, 2, ..., d}; set L = 0
repeat
    u_k = v_k, L = L + 1
    k = k + 1; if k > d then k = 1
    generate a uniform random number rand ∈ [0, 1]
until rand > CR or L ≥ d

3. Selection: Before trial vectors are added to the new population, the selection operator compares the fitness value of each trial vector with that of its target vector; the vector with the lower fitness value is selected for the next generation.
4. Termination criteria: The computation terminates when the maximum number of function evaluations is reached or when allowable minimum fitness and constraint values are reached.

The coefficients CR (the crossover rate), F, λ (which ensures communication between chromosomes and controls the new mutant vector v) and the maximum number of function evaluations are user-specified. In our simulations we typically took F = 0.6, CR = 0.7, λ = 0.9 and a termination criterion of 500. Note that the performance of the DEA is sensitive to the values of these coefficients, and the optimal choice of parameters remains an open research question.
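The four steps above can be combined into a short sketch. It is not the authors' modified DE (their PSO- and Lipschitz-inspired additions are not described in enough detail to reproduce). It is a plain DE/current-to-best/1 scheme following Eq. (15.26), with the exponential crossover of Algorithm 26 and greedy one-to-one selection, using F = 0.6, CR = 0.7 and λ = 0.9 as quoted in the text. The test function (a sphere function), population size, bounds and seed are assumptions for illustration; in the chapter, the function being minimized would be ε from Eq. (15.24).

```python
import random


def exponential_crossover(x, v, cr, rng):
    """Algorithm 26: copy a cyclic run of components of mutant v into a copy of target x."""
    d = len(x)
    u = list(x)
    k = rng.randrange(d)              # random starting component
    copied = 0
    while True:
        u[k] = v[k]                   # take this component from the mutant
        copied += 1
        k = (k + 1) % d
        if copied == d or rng.random() > cr:
            break                     # stop after d copies, or with probability 1 - cr
    return u


def de_current_to_best(f, dim, bounds, pop_size=30, F=0.6, CR=0.7, lam=0.9,
                       generations=500, seed=0):
    """Minimise f with current-to-best mutation (15.26), exponential crossover
    and greedy selection; returns (best vector, best fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        best = pop[min(range(pop_size), key=fit.__getitem__)]  # best of previous generation
        for i in range(pop_size):
            r1, r2 = rng.sample([j for j in range(pop_size) if j != i], 2)
            x = pop[i]
            v = [x[d] + lam * (best[d] - x[d]) + F * (pop[r1][d] - pop[r2][d])
                 for d in range(dim)]
            u = exponential_crossover(x, v, CR, rng)
            fu = f(u)
            if fu <= fit[i]:          # selection: keep the vector with the lower fitness
                pop[i], fit[i] = u, fu
    i_best = min(range(pop_size), key=fit.__getitem__)
    return pop[i_best], fit[i_best]


best_x, best_val = de_current_to_best(lambda z: sum(t * t for t in z), dim=3, bounds=(-5.0, 5.0))
```

The crossover loop is a do-while, so at least one component always comes from the mutant. With the original "or" condition, the loop would keep running even after the random draw exceeded CR.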

15.4 Numerical Results
To show numerically the efficiency of our new DDM compared with the OO2 method, we consider Eq. (15.1) on the square domain Ω = [0, 1] × [0, 1]. An exact solution u_exact is chosen, and the associated terms g and f of Eq. (15.1) are determined so that u_exact solves problem (15.1) for given coefficients c, a, b and μ, which we vary (they are fixed together with the exact solution to define an equation of type (15.1)). For example, we take u_exact = 2 sin(2πx) cos(2πy).
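For the test solution u_exact = 2 sin(2πx) cos(2πy), the right-hand side follows by applying the operator of Eq. (15.1), f = cu + a u_x + b u_y − μΔu, and g is simply the value of u_exact on ∂Ω. The sketch computes f in closed form (using Δu = −8π²u for this u) and checks it against central finite differences. It assumes constant coefficients c, a, b, μ, whereas some rows of Table 15.1 use variable a and b; the function names are ours.

```python
import math

TWO_PI = 2 * math.pi


def u_exact(x, y):
    """Test solution u = 2 sin(2*pi*x) cos(2*pi*y); g is its trace on the boundary."""
    return 2 * math.sin(TWO_PI * x) * math.cos(TWO_PI * y)


def source_term(x, y, c, a, b, mu):
    """Closed-form f = c*u + a*u_x + b*u_y - mu*Laplacian(u) for constant c, a, b, mu."""
    sx, cx = math.sin(TWO_PI * x), math.cos(TWO_PI * x)
    sy, cy = math.sin(TWO_PI * y), math.cos(TWO_PI * y)
    u = 2 * sx * cy
    u_x = 2 * TWO_PI * cx * cy        # d/dx of 2 sin(2 pi x) cos(2 pi y)
    u_y = -2 * TWO_PI * sx * sy       # d/dy
    lap = -2 * TWO_PI ** 2 * u        # u_xx + u_yy = -8 pi^2 u
    return c * u + a * u_x + b * u_y - mu * lap


def source_term_fd(x, y, c, a, b, mu, h=1e-4):
    """The same f from central finite differences of u_exact (independent check)."""
    u = u_exact
    u_x = (u(x + h, y) - u(x - h, y)) / (2 * h)
    u_y = (u(x, y + h) - u(x, y - h)) / (2 * h)
    lap = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4 * u(x, y)) / h ** 2
    return c * u(x, y) + a * u_x + b * u_y - mu * lap
```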

Table 15.1 CPU time for the OO2 method (six iterations) and AlgDF for different coefficients of Eq. (15.1)

| c | a | b | μ | h | OO2 method | AlgDF |
|---|---|---|---|---|---|---|
| 10 | 1 | 0 | 1 | 0.001 | 2437.621 | 56.2 |
| 10 | −10 | 2 | 2 | 0.001 | 3342.681 | 91.0 |
| 10 | sin(xy) | −cos(xy) | 0.1 | 0.1 | 9.713 | 49.6 |

We split our domain into four square sub-domains and apply the domain decomposition methods OO2 and AlgDF described above (the number of sub-domains can be matched to the number of execution units of the computer). We solve Eqs. (15.2), (15.3), (15.18) and (15.19) with the same heuristic method. Our codes are implemented in Matlab, FreeFem++ and C++ and run on an Intel(R) Core 2 Duo processor. We use the parallel tools of Matlab (parfor, gather, multithreading, ...) and OpenMP for C++, and we write the discretized (algebraic) equations of problems (15.1), (15.18) and (15.19) in vectorized, optimized form. Table 15.1 shows the time that the OO2 method and AlgDF take to solve problem (15.1) for some given coefficients c, a, b, μ; h is the maximum distance between nodes. The table shows that the proposed AlgDF is much faster than the classical OO2 method for our academic tests on fine meshes (small h). The same results were found for several other academic test cases, and all tests performed show that AlgDF reaches an accurate solution faster than the OO2 method. Notice that for a coarse mesh (h = 0.1) the OO2 method is faster than AlgDF. This is expected, because little data is exchanged in the OO2 method, so transmitting it during the algorithm takes little time. However, when the mesh is fine, the transmission condition becomes heavy and consumes time during execution. AlgDF does not use transmission conditions, which is an asset. Furthermore, a comparison of the two methods on complex geometries is not feasible, because the OO2 method tends to diverge or to produce low-accuracy approximations there. One complex geometry Ω that illustrates the accuracy problems of OO2 is the one-sided lemniscate domain. Another complex geometry to consider is the domain Ω defined by the polar equation r(θ) ≤ cos(2θ) + 1.5 − sin²(2θ).
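A point-in-domain test for the polar domain r(θ) ≤ cos(2θ) + 1.5 − sin²(2θ) is all that is needed to scatter nodes inside it, in the spirit of the node distribution of Fig. 15.2. The rejection-sampling helper below is a hypothetical illustration, not the authors' node generator. The bounding box [−2.5, 2.5]² is valid because the right-hand side of the polar inequality never exceeds 2.5, and its minimum (0.25) is positive, so the domain is star-shaped around the origin.

```python
import math
import random


def in_polar_domain(x, y):
    """True if (x, y) satisfies r(theta) <= cos(2*theta) + 1.5 - sin(2*theta)**2."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    return r <= math.cos(2 * theta) + 1.5 - math.sin(2 * theta) ** 2


def sample_nodes(n, seed=0):
    """Rejection-sample n nodes uniformly inside the domain (hypothetical helper)."""
    rng = random.Random(seed)
    nodes = []
    while len(nodes) < n:
        x, y = rng.uniform(-2.5, 2.5), rng.uniform(-2.5, 2.5)
        if in_polar_domain(x, y):
            nodes.append((x, y))
    return nodes
```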
Table 15.2 shows the infinity-norm error between the exact solution and the approximate solution obtained with the OO2 and AlgDF methods. The results in Table 15.2 show that the proposed method is accurate even with one iteration, whereas the OO2 method needs at least 20 iterations to reach good accuracy. We notice that the OO2 method takes a long time to converge for high viscosity μ (we studied this case in [1]), whereas AlgDF does not have this issue; Figs. 15.3 and 15.4 illustrate this point. Indeed, the numerical convergence rate of the OO2 method increases and approaches one as the viscosity increases. The advection velocity has no influence on the convergence of the OO2 method for nonzero c (Figs. 15.5 and 15.6), but the rate tends to one when c = 0 (Figs. 15.7 and 15.8). In addition, the OO2 method diverges when cμ < 0. By contrast, the numerical error of AlgDF remains nearly constant when the viscosity, the advection or the reaction coefficient c vary, as shown in the figures; it depends only on the mesh size h.

Table 15.2 Infinity norm of the error between the exact solution and the approximate solution for the OO2 method (six iterations) and AlgDF, for different coefficients of Eq. (15.1)

| c | a | b | μ | h | OO2 method | AlgDF |
|---|---|---|---|---|---|---|
| 10 | 1 | 0 | 1 | 0.0001 | 3.4 × 10⁻⁴ | 7.8 × 10⁻⁸ |
| 10 | −10 | 2 | 2 | 0.0001 | 4.5 × 10⁻⁵ | 3.52 × 10⁻¹⁰ |
| 10 | sin(xy) | −cos(xy) | 0.1 | 0.0001 | 1.72 × 10⁻⁸ | 6.52 × 10⁻¹⁴ |

Fig. 15.3 Logarithm of the approximate error of OO2 and AlgDF for fixed c = 1, a = −1, b = −2 and varying μ
Fig. 15.4 Logarithm of the approximate error of OO2 and AlgDF for fixed c = 0.1, a = 1, b = −2 and varying μ
Fig. 15.5 Logarithm of the approximate error of OO2 and AlgDF for fixed c = 1, b = −2, μ = 1 and varying a
Fig. 15.6 Logarithm of the approximate error of OO2 and AlgDF for fixed c = 10, a = cos(θ), b = sin(θ), μ = 0.1 and varying θ
Fig. 15.7 Logarithm of the approximate error of OO2 and AlgDF for fixed c = 0.01, a = cos(θ), b = sin(θ), μ = 1 and varying θ
Fig. 15.8 Logarithm of the approximate error of OO2 and AlgDF for fixed c = 0, a = 2cos(θ), b = 2sin(θ), μ = 3 and varying θ

For the following simulations, the evolutionary algorithm uses a chromosome size of 30 (between 30 and 50 elements across all tests), a stochastic uniform selection function, and 50 generations with a tolerance of 10⁻¹³. Next we give some examples of results of the proposed algorithm for test cases with analytical solutions, together with the error between the exact and approximate solutions. In each figure, the exact solution is on the left, the approximate solution in the center and the error between the two on the right. These simulations were made with the domain split into 4 sub-domains, the coefficients c, a, b, μ given in the figures, and h = 0.01.

Fig. 15.9 Case 1: c = 1, a = 2, b = −1, μ = 2 and h = 0.01. Left: u_exact = (x − 0.5)²(y − 0.5)²; center: the approximate solution u_approximate; right: the error between u_approximate and u_exact
Fig. 15.10 Case 2: c = 1, a = 2, b = −1, μ = 2 and h = 0.01. Left: u_exact = exp(xy(1 − x)(1 − y) − 1); center: the approximate solution u_approximate; right: the error between u_approximate and u_exact
Fig. 15.11 Case 3: c = xy/(x + y + 1), a = 0.1, b = 3, μ = 5 and h = 0.01. Left: u_exact = ln(1 − xy(1 − x)(1 − y)); center: the approximate solution u_approximate; right: the error between u_approximate and u_exact

Case 1 (Fig. 15.9): u = (x − 0.5)²(y − 0.5)²; the computed infinity norm of the error between the exact and approximate solutions is of order 10⁻⁹.
Case 2 (Fig. 15.10): u = exp(xy(1 − x)(1 − y) − 1); the computed infinity norm of the error between the exact and approximate solutions is of order 10⁻⁶.
Case 3 (Fig. 15.11): u = ln(1 − xy(1 − x)(1 − y)); the computed infinity norm of the error between the exact and approximate solutions is of order 10⁻¹⁰.
Case 4 (Fig. 15.12): u = exp(xy) − xy; the computed infinity norm of the error between the exact and approximate solutions is of order 10⁻⁶.
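The error orders quoted for Cases 1-4 are infinity norms, i.e. the maximum pointwise difference between u_approximate and u_exact. A minimal way to measure this on a uniform grid of [0, 1]² with h = 0.01 is sketched below. The perturbed "approximation" in the usage example is synthetic and only exercises the function; the grid evaluation is our assumption about how such a norm would be measured.

```python
import math


def inf_norm_error(u_exact, u_approx, h=0.01):
    """max |u_approx - u_exact| over the uniform grid of [0, 1]^2 with spacing h."""
    n = round(1 / h)
    err = 0.0
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = i * h, j * h
            err = max(err, abs(u_approx(x, y) - u_exact(x, y)))
    return err


def case1(x, y):
    """Case 1 exact solution: (x - 0.5)^2 (y - 0.5)^2."""
    return (x - 0.5) ** 2 * (y - 0.5) ** 2


def synthetic_approx(x, y):
    """Case 1 plus a small smooth perturbation peaking at 1e-9 in the centre."""
    return case1(x, y) + 1e-9 * math.sin(math.pi * x) * math.sin(math.pi * y)


err = inf_norm_error(case1, synthetic_approx)
```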

Fig. 15.12 Case 4: c = −0.5, a = 0.5, b = −0.5, μ = 100 and h = 0.01. Left: u_exact = exp(xy) − xy; center: the approximate solution u_approximate; right: the error between u_approximate and u_exact

15.5 Conclusion
In this work we have developed a new optimized domain decomposition algorithm for a reaction advection diffusion equation. We first computed the convergence rate of this method using the Fourier transform. The fundamental result is that our method needs only one iteration; compared with global computation using classical solvers (OO2, Robin, ...), it requires neither transmission conditions nor optimization time for computing the coefficients of the transmission conditions (such as C1, C2, C3). We converted the equations into a constrained optimization problem and solved it with a heuristic method, which is better adapted to the fractional differential boundary equations than classical methods (finite element or finite volume methods, ...). One difficulty of our method is computing the Fourier transform for non-constant and discontinuous coefficients, although Matlab provides a toolbox for this task. Finally, we presented several test cases that show the efficiency of this approach. As perspectives for this work, we can study the following ideas:
• Extend the method to obtain a one-iteration domain decomposition for nonlinear partial differential equations; we are considering operator splitting or a preconditioner.
• Apply this method to the equations of turbulence (the method could be applied in the same way to boundary value problems with vector equations).

References 1. M.R. Amattouch, H. Belhadj, Combined optimized domain decomposition method and a modified fixed point method for non linear diffusion equation. Appl. Math. Inf. Sci. 11(1), 201–207 (2017) 2. M.R. Amattouch, N. Nagid, H. Belhadj, A modified fixed point method for The Perona Malik equation. J. Math. Syst. Sci. 7, 175–185 (2017) 3. M.R. Amattouch, N. Nagid, H. Belhadj, Optimized domain decomposition method for non linear reaction advection diffusion equation. Eur. Sci. J. 12(26) (2016)


M. R. Amattouch and H. Belhadj

4. E. Onate, J. Rojek, M. Chiumenti, S.R. Idelsohn, F.D. Pin, R. Aubry, Advances in stabilized finite element and particle methods for bulk forming processes. Comput. Methods Appl. Mech. Eng. 195(48–49), 6750–6777 (2006)
5. J. Nam, M. Behr, M. Pasquali, Space-time least-squares finite element method for convection-reaction system with transformed variables. Comput. Methods Appl. Mech. Eng. 200(33–36), 2562–2576 (2011)
6. S.R. Idelsohn, E. Onate, F.D. Pin, N. Calvo, Fluid-structure interaction using the particle finite element method. Comput. Methods Appl. Mech. Eng. 195(17–18), 2100–2123 (2006)
7. X. Nogueira, L.C. Felgueroso, I. Colominas, H. Gomez, Implicit large eddy simulation of non-wall-bounded turbulent flows based on the multiscale properties of a high-order finite volume method. Comput. Methods Appl. Mech. Eng. 199(9–12), 615–624 (2010)
8. A.G. Malan, O.F. Oxtoby, An accelerated, fully-coupled, parallel 3D hybrid finite-volume fluid-structure interaction scheme. Comput. Methods Appl. Mech. Eng. 253, 426–4
9. S.N. Atluri, T. Zhu, The Meshless Local Petrov-Galerkin (MLPG) approach for solving problems in elasto-statics. Comput. Mech. 25(2), 169–179 (2000)
10. K.M. Liew, X. Zhao, A.J.M. Ferreira, A review of meshless methods for laminated and functionally graded plates and shells. Compos. Struct. 93(8), 2031–2041 (2011)
11. V.P. Nguyen, T. Rabczuk, S. Bordas, M. Duflot, Meshless methods: a review and computer implementation aspects. Math. Comput. Simul. 79(3), 763–813 (2008)
12. J.M. Chaquet, E.J. Carmona, Solving differential equations with Fourier series and evolution strategies. Appl. Soft Comput. 12(9), 3051–3062 (2012)
13. R. Storn, K. Price, Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. Technical Report TR-95-012, International Computer Science Institute (1995)
14. X. Wang, S. Zhao, Differential evolution algorithm with self-adaptive population resizing mechanism. Math. Probl. Eng. 2013 (Article ID 419372), 14 (2013)
15. A. Schwertner Charao, Multiprogrammation parallèle générique des méthodes de décomposition de domaine. Thèse de doctorat, Institut National Polytechnique de Grenoble, France (2001)
16. S.-L. Sobolev, L'Algorithme de SCHWARZ dans la Théorie de l'élasticité. Comptes Rendus (Doklady) de l'Académie des Sciences de l'URSS, IV(XIII)(6), 243–246 (1936)
17. I. Babuška, Über Schwarzsche Algorithmen in partiellen Differentialgleichungen der mathematischen Physik. ZAMM 37(7/8), 243–245 (1957)
18. P.-L. Lions, On the Schwarz alternating method, I, in First International Symposium on Domain Decomposition Methods for Partial Differential Equations, eds. R. Glowinski, G.-H. Golub, G.-A. Meurant, J. Periaux (SIAM, Paris, France, 1987), pp. 1–42
19. P.-L. Lions, On the Schwarz alternating method, I, in First International Symposium on Domain Decomposition Methods for Partial Differential Equations, eds. R. Glowinski, G.-H. Golub, G.-A. Meurant, J. Periaux (SIAM, Paris, France, 1987), pp. 1–42
20. A. Toselli, O. Widlund, Domain Decomposition Methods - Algorithms and Theory. Springer Series in Computational Mathematics, vol. 34 (Springer, Berlin, 2005). ISBN 3-540-20696-5
21. A. Quarteroni, A. Valli, Domain decomposition methods for partial differential equations. Oxford Science Publications, Oxford (1999)
22. M.J. Gander, Optimized Schwarz methods. SIAM J. Numer. Anal. 44(2), 699–731 (2006)
23. M.J. Gander, Optimized Schwarz methods. SIAM J. Numer. Anal. 44(2), 699–731 (2006)
24. L. Boccardo, F. Murat, J.P. Puel, Résultats d'existence pour certains problèmes elliptiques quasilinéaires. Ann. Scuola Norm. Sup. Pisa Cl. Sci., Serie IV 11(2), 213–235 (1984)
25. V. Lakshmikantham, S. Leela, J. Vasundhara, Theory of Fractional Dynamic Systems (Cambridge Academic Publishers, Cambridge, 2009)
26. V. Lakshmikantham, S. Leela, J. Vasundhara, Theory of Fractional Dynamic Systems (Cambridge Academic Publishers, Cambridge, 2009)
27. M. Benchohra, J. Henderson, S.K. Ntouyas, A. Ouahab, Existence results for fractional order functional differential equations with infinite delay. J. Math. Anal. Appl. 338, 1340–1350 (2008)

Chapter 16

Stock Market Speculation System Development Based on Technico Temporal Indicators and Data Mining Tools

Zineb Bousbaa, Omar Bencharef, and Abdellah Nabaji

Abstract Artificial Intelligence has been widely used in recent years to forecast currency exchange rates, as well as the prices of the other assets available in financial markets. Many companies apply the scientific method to manage investment strategies in financial markets. This method combines massive amounts of data, computing power and financial expertise. The search for an efficient algorithm dedicated to predicting the exchange rate of a currency is a search for a global optimum, and it can be addressed using metaheuristics as an optimization technique. In this work, we propose a gradient descent regression algorithm optimized with the Particle Swarm Optimization metaheuristic in order to build a robust learning model. The experiments are carried out on about 17 years of historical data, ranging from 30/05/2000 to 28/02/2017 and collected from different sources. Several processing steps were applied to the dataset before fitting the model. The experimental results of our model are compared with those obtained by a Simple Multi Linear regression that we implemented, by other regression algorithms provided by the Scikit-Learn library in Python, and by RStudio in R. The results of our model are competitive.

Keywords Gradient Algorithm · Particle Swarm Optimization (PSO) · Forex Market · Exchange Rate Speculation

Z. Bousbaa (corresponding author)
Superior School of Technology, Cadi Ayyad University, Essaouira, Morocco

O. Bencharef · A. Nabaji
Faculty of Science and Technology, Cadi Ayyad University, Marrakesh, Morocco

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_16


16.1 Introduction

To achieve the best profit in the foreign exchange market, investors use different approaches and strategies. These depend on their profiles (computer scientists, mathematicians and especially statisticians, financial experts, etc.), their trading experience and their starting capital. For example, a finance expert adopts deeper and more complex strategies than ordinary investors, who have less knowledge of finance. Many indicators in the market help investors decide whether to buy or sell currencies, or to wait for the right moment to act. These indicators fall into two types. Technical indicators are based on the technical and graphical analysis of financial markets using the price history of currencies (for example MACD, RSI, moving averages, etc.). Fundamental indicators describe the macroeconomic situation of a country or, more globally, of an economic zone (for example the unemployment rate, Gross Domestic Product, etc.). The strongest strategies combine both categories of indicators.

This paper reviews different Data Mining algorithms used in the literature for exchange rate speculation. These algorithms fall into four major categories:

• Black box algorithms include random inputs and arbitrarily chosen treatments, for example Neural Networks and Support Vector Machines.
• Rule discovery algorithms, such as decision trees and genetic algorithms, are often used for classification problems with the well-known three classes: buy, sell and wait.
• Regression algorithms come in several types (linear, nonlinear and semi-linear regression), and all of them have shown good performance on time series datasets.
• Hybrid methods combine the results of multiple models for better performance. Examples include ensembling techniques, autoregressive algorithms and hybrid neural network algorithms.

We collected our dataset to fit an efficient regression model that predicts the price of the Euro against the American Dollar. The dataset has three components:

• the historical prices of three major currency pairs (EUR/USD, GBP/USD and JPY/USD);
• fundamental factors that have an impact on the market, for which we chose gold and petrol historical data;
• 12 technical indicators calculated from the historical data of the three major pairs: 14-day RSI, 14-day Stochastic Oscillator, 14-day StochRSI, MACD, ADX, 14-day Williams %R, 20-day CCI, ATR, High-Low Index, Ultimate Oscillator, 12-day ROC, Bull Power and Bear Power.

We applied several treatments and calculations to unify the granularity of the dataset. In the final dataset, which is the input of the algorithm, each line represents one day.

After the literature review, data collection and preparation, we carried out an experimental study. We implemented and tested many algorithms on our dataset in order to compare them and identify the best-performing algorithms for our case study. We concluded that regression methods seem to be the most adequate, especially for short-term datasets. However, they raised a few problems that we tried to solve using metaheuristics. We chose metaheuristic optimization techniques, which are inspired by natural systems, because they have been successful in the literature.

We propose a Simple Multi-Linear Regression model optimized with the Particle Swarm Optimization (PSO) metaheuristic, and its implementation was inspired by [6]. This technique helped with four problems:

• minimizing the regression error;
• avoiding local optima;
• solving the weight initialization problem of the Multi-Linear Regression, since before using PSO the random values used to initialize the weights affected the final results;
• managing the weight updates so that the weights stay within the search space of the regression function.

We then study the obtained results experimentally.

The remainder of this paper is divided into four main sections. Section 16.2 gives the state of the art of algorithms that have proven their performance for exchange rate speculation. Section 16.3 presents our proposed algorithm and explains how it works. Section 16.4 reports the experimental results of several algorithms versus our proposed algorithm on three datasets. Finally, Section 16.5 summarizes the main conclusions and perspectives of this study.
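The 14-day RSI is one of the technical indicators listed above. The chapter computed its indicators in Excel with formulas from [31], which are not reproduced here. The following Python sketch therefore assumes the common Wilder definition: a simple average over the first window, then recursive smoothing with factor 1/period. The function name `rsi` is illustrative.

```python
def rsi(closes, period=14):
    """Relative Strength Index with Wilder smoothing (assumed definition).

    Returns a list aligned with `closes`. The first `period` entries are None
    because `period` price changes are needed before the first value.
    """
    if len(closes) <= period:
        raise ValueError("need more than `period` closing prices")

    # Split each day-to-day change into a gain part and a loss part.
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0:
            return 100.0  # no losses in the window: maximal RSI
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    # First value: simple averages over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    values = [None] * period + [to_rsi(avg_gain, avg_loss)]

    # Later values: Wilder's recursive smoothing of the averages.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        values.append(to_rsi(avg_gain, avg_loss))
    return values
```

In a setup like the chapter's, the same function would be applied separately to the daily closing prices of each of the three pairs.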

16.2 Related Works

The first category of algorithms used for financial market speculation is black box algorithms, which include neural networks. Neural networks aim to replicate the functioning of the biological neuron as closely as possible. A neuron is a tiny structure that processes incoming nerve impulses (inputs), each according to its relative importance, and emits one (and only one) output signal. One of the papers that simulated this behavior is [14]. It tested the prediction capacity of neural networks on the daily exchange rate data of EUR/USD, GBP/USD and USD/JPY, in order to use them as technical analysis methods for the foreign exchange market. Reference [15] also applied artificial neural networks to forecasting and investing in FOREX. It explains how to construct an artificial neural network, with particular attention to its parameters, in order to obtain the best possible prediction capacity. The second black box algorithm is the Support Vector Machine (SVM), a set of supervised learning techniques for discrimination and regression problems. SVMs generalize linear classifiers. They nonlinearly transform the input data into a high-dimensional space of variables and then perform a linear regression in the transformed space. The result is a nonlinear regression in the original, lower-dimensional input space. Reference [10] introduced SVM to the Forex market and studied the effect of kernel functions and regularization parameters on currency transactions.

The second category includes rule discovery methods. Decision trees, for example, are easily interpretable. Given a wide set of indicators and historical price movements, they can find the rules (the indicators and their values) that best describe the price movement. For example, [11] presents a new approach for generating a set of real-world Forex market data and transforming it into a decision table. Each object in this table consists of conditional attributes (the indicator values) and a decision that is either buy, sell or wait. The second goal of [11] was to test the classification quality of well-known decision tree algorithms such as CART and C4.5. Genetic algorithms are also used, in addition to decision trees, to optimize decision tree results. They imitate the natural selection process, creating unique daughter strategies that combine the best parent strategies, with a chance of random mutations. For example, [17] presents the design of an overall system based on a genetic algorithm that predicts the trend of weekly prices in the São Paulo stock index, and it evaluates the performance of the proposed methods. Reference [18] presents a heuristic trading system for Forex data, developed using popular technical indicators and genetic algorithms.

Regression methods are statistical methods widely used to analyze the relationship between variables. For example, the gradient algorithm is often used in finance research, but most research in this field is private. One public study is [13], which trained a neural network with conjugate gradient learning. It found that multiple linear regression weight initialization provides a good starting point and improves trend forecasting results. Partial least squares regression is also often applied to Forex data in combination with other algorithms. For example, [12] fed the results of Partial Least Squares Regression, Support Vector Machine and Decision Tree into the Cuckoo search metaheuristic. Their system successfully forecast the exchange rate of the American dollar against the European euro between January 2014 and January 2016. Regression on complex data has also been successful. Nonparametric, semi-parametric and nonlinear models are the three categories of models that offer flexible approaches for modeling complex and longitudinal processes, and many papers have processed this type of data. For example, [5] proposed a nonlinear model for analysis and learning on time series data. This model uses a wavelet decomposition, and the data come from the medical field and have a sinusoidal characteristic. Reference [23], published in 1990, predicted exchange rates with nonparametric regression and compared its results to the simple random walk method. Finally, [24] classified large, complex and temporal medical and financial data using an efficient algorithm in the frequency domain.

Metaheuristics have also been successful in our field of study. Examples are [12], mentioned above, and [19], which developed a new predictive model based on multi-layered neural networks and compared it with existing models to confirm its efficiency. The last category is hybrid methods. For example, [20] compared four models based on autoregressive and moving average models combined with separate optimization algorithms. Reference [21] compared autoregressive integrated moving average (ARIMA) models with complex nonlinear models such as neural networks and fuzzy neurons. Another example is [22], which studied the problems of predicting the direction and movement of currency pairs in the foreign exchange market.

16.3 Proposed Search Algorithm

The gradient descent algorithm optimizes the weights of the regression function by minimizing the error function until it converges to a local minimum. The regression function f is a linear combination of the inputs:

$$f(x) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_d x_d = w_0 + \sum_{i=1}^{d} w_i x_i \qquad (16.1)$$

The gradient algorithm raised problems with local minima, weight initialization and weight updates. We considered several options to solve them: nonparametric regression, metaheuristic optimization methods, autoregressive models and genetic algorithms. In this paper, we propose a forecasting model based on multiple linear regression improved with the particle swarm optimization metaheuristic, and its implementation was inspired by [6]. The function that we optimize is the following error function, where $x_i$ denotes the input vector of the $i$-th of the $n$ samples:

$$J_n(w) = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^2 \qquad (16.2)$$
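The following minimal Python sketch makes Eqs. (16.1) and (16.2) concrete with plain batch gradient descent on the mean squared error. This is the baseline that the chapter then improves with PSO. The zero initialization, learning rate `lr`, tolerance and iteration cap are illustrative assumptions, not values from the chapter.

```python
def predict(w, x):
    """Linear model of Eq. (16.1): w[0] + sum_j w[j+1] * x[j]."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def mse(w, X, y):
    """Mean squared error J_n(w) of Eq. (16.2)."""
    return sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)


def gradient(w, X, y):
    """Partial derivatives of J_n(w): -(2/n) * sum_i residual_i * x_{i,j}."""
    n = len(y)
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        residual = yi - predict(w, xi)
        grad[0] += -2.0 * residual / n          # intercept, x_{i,0} = 1
        for j, xij in enumerate(xi):
            grad[j + 1] += -2.0 * residual * xij / n
    return grad


def gradient_descent(X, y, lr=0.1, tol=1e-10, max_iter=10_000):
    """Start from zero weights and step against the gradient until J_n < tol."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(max_iter):
        g = gradient(w, X, y)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
        if mse(w, X, y) < tol:
            break
    return w
```

For example, with data from y = 1 + 2x sampled at x = 0, 1, 2, 3, `gradient_descent([[0.0], [1.0], [2.0], [3.0]], [1, 3, 5, 7])` converges to weights close to [1, 2]. On a convex quadratic like this one the local minimum is global. The local-minimum and initialization issues mentioned above arise with less favorable error surfaces and starting points.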

Our problem is a continuous one: we look for values to assign to the parameters of the regression model so that the model reproduces the observed behavior as well as possible. No algorithm is guaranteed to reach the global optimum (the best possible solution) in a finite number of iterations. Ant colony and particle swarm optimization algorithms therefore seemed the best options for us, thanks to their dynamic movement over the search space. Classic gradient descent gets stuck in the nearest local solution (see Fig. 16.1 from [41]). A swarm, by contrast, keeps moving until it hopefully reaches the region where it can converge to the best global solution, even though only local information is available. The distributed structure and self-organized character of PSO give it the flexibility to evolve in a changing environment and to explore new regions of the search space. This helps to avoid getting trapped in a local optimum, as shown in Fig. 16.2 [6] (see also Fig. 16.3).

The Particle Swarm Optimization concept is easily described from the point of view of a particle. At the beginning of the algorithm, a swarm is distributed randomly over the search space, and each particle has a random velocity. Then, at each step:

• each particle assesses the quality of its position and keeps in memory its best performance, i.e. the best position it has reached so far and the quality of that position;
• each particle queries other particles (in addition to itself) to obtain their best performances;
• each particle chooses the best performance it knows of and modifies its velocity based on this information.

This model has some interesting properties that make it a good tool for optimization problems, particularly strongly nonlinear, continuous or mixed (combining real and integer variables) problems.
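The three steps above can be sketched in Python for a single particle. This is an illustrative sketch, not the authors' Java code. The coefficients reuse the values c1 = 0.738 and cmax = (2/0.97725)·c1 given in Algorithm 27. The function names `update_particle` and `remember_best` are hypothetical.

```python
import random

C1 = 0.738                  # inertia weight (value from Algorithm 27)
C_MAX = (2 / 0.97725) * C1  # upper bound of the random acceleration factors, about 1.5104


def update_particle(position, velocity, personal_best, global_best, rng=random):
    """Velocity step: keep part of the old velocity, then pull the particle
    toward the best position it remembers and the best position known to the swarm."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        v = C1 * v + rng.uniform(0, C_MAX) * (p - x) + rng.uniform(0, C_MAX) * (g - x)
        new_velocity.append(v)
        new_position.append(x + v)
    return new_position, new_velocity


def remember_best(position, error, best_position, best_error):
    """Memory step: keep whichever of the two positions has the lower error."""
    if error < best_error:
        return list(position), error
    return best_position, best_error
```

A particle that already sits at both its own best and the swarm's best position only keeps a fraction C1 of its previous velocity. The two random attraction terms vanish in that case, which is how the swarm gradually settles.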
The multiple linear regression algorithm using gradient descent optimized by particle swarm optimization is as follows:


Fig. 16.1 Gradient descent converging over the search space

Fig. 16.2 Particle swarm optimization converging over the search space

Fig. 16.3 Movement principle schema of a PSO particle


Algorithm 27 Gradient Descent Optimized with Particle Swarm Optimization.
1: Input: c1 and cmax are used when updating the weights; xmin and xmax bound the search space of the weights. Their values were determined through numerous tests:
2: tolerance = 0.001, c1 = 0.738, cmax = (2/0.97725) · c1, xmin = −0.3, xmax = 0.3, numberOfParticles = 3
3: for all particles p_i ∈ P do
4:   initialization:
5:   initialize the regression weights with random values w_i = random(xmin, xmax)
6:   initialize the velocity of each particle with a random real value between vmin = (xmin − xmax)/2 and vmax = (xmax − xmin)/2
7: end for
8: while MeanSquaredError > tolerance do
9:   for all particles p_i ∈ P do
10:    compute the values predicted by the model with the multiple linear regression function $\hat{y}_i = w_0 + w_1 x_{i,1} + w_2 x_{i,2} + \cdots + w_d x_{i,d} = w_0 + \sum_{j=1}^{d} w_j x_{i,j}$
11:    compute the value of the function being optimized, i.e. the derivative of the mean squared difference between the predicted and actual values: $\frac{\partial J_n(w)}{\partial w_j} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - w_0 - w_1 x_{i,1} - w_2 x_{i,2} - \cdots - w_d x_{i,d}\right) x_{i,j}$, with $x_{i,0} = 1$
12:    update the velocities v_d and weights w_d, where d indexes the weights:
         v_d ← c1 v_d + random(0, cmax)(p_d − w_d) + random(0, cmax)(g_d − w_d)
         w_d ← w_d + v_d
       with:
         • g_d: the weights that achieved the best results in the whole swarm;
         • p_d: the weights that achieved the best results for the current particle.
       Compute the value of the error function derivative and store it in a variable called error
13:    if the error of the current particle is higher than the value stored in error then
14:      assign the error of the current particle to error
15:    else
16:      go to the next particle without doing anything
17:    end if
18:  end for
19: end while
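The Python sketch below turns Algorithm 27 into runnable code. It is not the authors' Java implementation, and it fills in several points the listing leaves open, all of which are assumptions:

• each particle is scored by the mean squared error of Eq. (16.2) rather than by the value of the derivative;
• personal and global bests are tracked explicitly;
• velocities and weights are clamped to [vmin, vmax] and [xmin, xmax] so that the weights stay in the search space;
• a `max_iter` cap, absent from the listing, guarantees termination.

The function name `pso_regression` is hypothetical.

```python
import random


def predict(w, x):
    """Multiple linear regression (Eq. 16.1): w[0] + sum_j w[j+1] * x[j]."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def mse(w, X, y):
    """Mean squared error (Eq. 16.2), used here as the fitness of a particle."""
    return sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)


def pso_regression(X, y, n_particles=3, tolerance=0.001, x_min=-0.3, x_max=0.3,
                   c1=0.738, max_iter=10_000, seed=None):
    """Search the regression weights with PSO; returns (weights, mse, iterations)."""
    rng = random.Random(seed)
    c_max = (2 / 0.97725) * c1
    v_max = (x_max - x_min) / 2          # v_min = -v_max, as in Algorithm 27
    dim = len(X[0]) + 1                  # one weight per feature plus the intercept

    # Steps 3-7: random weights and velocities for every particle.
    pos = [[rng.uniform(x_min, x_max) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_err = [mse(p, X, y) for p in pos]
    best = min(range(n_particles), key=lambda i: p_err[i])
    g_best, g_err = p_best[best][:], p_err[best]

    iterations = 0
    while g_err > tolerance and iterations < max_iter:  # step 8, plus a safety cap
        iterations += 1
        for i in range(n_particles):
            for d in range(dim):                          # step 12
                v = (c1 * vel[i][d]
                     + rng.uniform(0, c_max) * (p_best[i][d] - pos[i][d])
                     + rng.uniform(0, c_max) * (g_best[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(x_min, min(x_max, pos[i][d] + vel[i][d]))
            err = mse(pos[i], X, y)
            if err < p_err[i]:                            # update the particle's memory
                p_best[i], p_err[i] = pos[i][:], err
                if err < g_err:                           # and the swarm's best
                    g_best, g_err = pos[i][:], err
    return g_best, g_err, iterations
```

Calling `pso_regression(X, y)` uses the chapter's settings (3 particles, tolerance 0.001, weights in [−0.3, 0.3]). With only three particles, the result can depend noticeably on the random seed, so a larger swarm is a reasonable choice when experimenting.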

16.4 Experimental Results

Our data include the historical prices of the major pairs EUR/USD, GBP/USD and JPY/USD, which we found in [34]. For each pair we then calculated the lagged prices j-1, j-2, j-3, j-4, j-5 and j-6, where j-n is the price n days before the current day. This added 18 columns to our dataset, and our target value is the EUR/USD price. Thanks to [30], we were able to choose technical indicators that investors in the market consider important. For each pair we calculated 12 indicators in Excel using formulas from [31]: 14-day RSI, 14-day Stochastic Oscillator, 14-day StochRSI, MACD, ADX, 14-day Williams %R, 20-day CCI, ATR, High-Low Index, Ultimate Oscillator, 12-day ROC, Bull Power and Bear Power.

Investors also consider fundamental factors. Fundamental analysis examines the economic conditions that affect the valuation of a currency, and fundamental traders wait for a favorable state to occur. These factors can be categorized into economic factors, financial factors, political and social events, and crises. After some processing, we added historical data on gold, petrol and its derivatives from different sources [35–38]. This dataset also had to be unified into daily data and cleaned, using Python programs that we wrote for that purpose. Figure 16.4 shows the EUR/USD price movement over 2016.

Before programming our algorithm in Java, we tested simple linear regression in RStudio to evaluate how well it performs and whether it suits our dataset. Figure 16.5a checks whether the residuals have nonlinear patterns. The residuals are equally spread around a horizontal line without distinct patterns. This is a good indication that there are no nonlinear relationships, so linear regression is a good choice for our case study. In Fig. 16.5b, the residuals in the middle follow a straight line well and do not deviate severely, which suggests that they come from a normal distribution.
However, the residuals curve off at the extremities, which usually means that the data have more extreme values than would be expected from a normal distribution. Figure 16.5c shows the scale-location plot. The residuals are spread equally along the range of predictors, and the horizontal line with equally spread points supports the assumption of equal variance. Figure 16.5d (residuals vs leverage) shows the extreme values that might be influential in determining the regression line. Only very few points lie in the lower right corner, outside the dashed Cook's distance line. These cases are influential, and the regression results would change if we excluded them.

Fig. 16.4 EUR/USD pair price movement within 2016

Fig. 16.5 Data distribution analysis: (a) Residuals vs fitted values. (b) Normal Q-Q. (c) Scale-Location. (d) Residuals vs leverage.

We programmed our proposed algorithm in Java, an object-oriented language. We compared it with several regression algorithms available in the Python scikit-learn library: Neural Networks, Support Vector Machine, Random Forests, Partial Least Squares regression, Multi Linear Regression and ensembling techniques. We also compared it with Neural Networks and Multi Linear Regression in RStudio. Feature selection using the PCA algorithm of the FactoMineR package in RStudio reduced the number of columns from 147 to 30 variables while keeping 99% of the information (cumulative explained variance). Fig. 16.6 shows the number of columns against the percentage of information they hold.

Fig. 16.6 The percent of information in n columns

Fig. 16.7 Representation of real values versus predicted values for 2015

Overfitting is a very common problem in machine learning, and each paper in the literature deals with it differently. Reference [10] uses a regularization parameter and an epsilon-insensitive loss function in order to investigate how performance varies with the regularization parameter C. In [12], the risk minimization problem is solved by balancing the empirical error and a regularization term, where the risk is measured by Vapnik's ε-insensitive loss function. In [22], Conjugate Gradient and Bayesian Regularization are used. Reference [25] used Ridge and Lasso regularization for three purposes: to ensure the existence of a solution even with highly correlated features, to improve prediction accuracy, and to improve interpretability by determining which features contribute most to the output. References [26–29] used particle swarm optimization to tune the regularization parameter of other regression algorithms. We adopted the same approach as [12] because it is the least complex one and does not slow down the iterations of our program. The regression function can be estimated by minimizing a regularized risk function, where l is the number of training samples, C is the regularization constant and L is the ε-insensitive loss:

$$\frac{1}{2}\lVert w\rVert^2 + C\,\frac{1}{l}\sum_{i=1}^{l} L\big(y_i, f(x_i)\big) \qquad (16.3)$$
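Here is a small Python sketch of the regularized risk in Eq. (16.3). It assumes the ε-insensitive loss L(y, f(x)) = max(0, |y − f(x)| − ε), as in the approach of [12]. It also assumes the intercept w0 is not penalized, which is common in support vector regression but not stated in the chapter. ε = 0.1 and C = 1 are placeholder values, not the chapter's settings.

```python
def predict(w, x):
    """Linear model of Eq. (16.1): w[0] + sum_j w[j+1] * x[j]."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def eps_insensitive(y_true, y_pred, eps=0.1):
    """Vapnik's epsilon-insensitive loss: errors smaller than eps cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)


def regularized_risk(w, X, y, C=1.0, eps=0.1):
    """(1/2)||w||^2 + C * (1/l) * sum of epsilon-insensitive losses (Eq. 16.3).

    The intercept w[0] is left out of the norm (an assumption)."""
    l = len(y)
    empirical = sum(eps_insensitive(yi, predict(w, xi), eps) for xi, yi in zip(X, y)) / l
    return 0.5 * sum(wj * wj for wj in w[1:]) + C * empirical
```

For example, with w = [0, 1], inputs 1 and 2 and targets 1.5 and 2, only the first sample exceeds the ε tube (loss 0.4). The risk is then 0.5 · 1 + 0.4/2 = 0.7. A larger C weights the data fit more heavily than the smoothness term.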

Figures 16.7 and 16.8 show the results obtained with the multiple linear regression approach using the gradient technique optimized by the particle swarm metaheuristic. This algorithm requires a good definition of the search space and of the weight initialization, which affects the direction in which the model's learning evolves. Managing the weight values is also important so that they do not leave the search space. The algorithm did not converge to the optimum solution on the first execution. In our case, parameter optimization required from 1 to 3 months. This period depends on the available computational power and on how volatile the target currency pair is. The best model we obtained with particle swarm optimization was the one with the smallest error, and we tested it on three datasets.

After some calculations, we found that, despite the error margin, the predicted and the real exchange rates vary in the same direction in 71% of the cases for the first dataset, 80% for the second and 67% for the third. This indicates that the model performs well (see Figs. 16.7 and 16.8) and that its results are far from random, which gives the model credibility for forecasting the exchange rate. Table 16.1 compares our model with other models that we built using the Python scikit-learn library. Our model has the lowest error on the second and third datasets (December 2015 and 24–31 December 2015). On the first dataset (the whole of 2015), neural networks, SVR, random forests and the ensembling model have lower errors than our model. However, their errors are higher than ours on the second and third datasets.

Fig. 16.8 Representation of real values versus predicted values for December 2015

Table 16.1 Mean squared errors of different algorithms on three 2015 datasets

Algorithm | MSE over 2015 | MSE over December 2015 | MSE over 7 days (24–31/12/2015)
NN | 0.04 | 0.96 | 0.9
SVR | 0.00403 | 0.97 | 1.06
RF | 0.0599 | 0.83 | 0.75
PLS | 82 | 12.22 | 16.76
MLR | 0.53 | 0.94 | 0.87
IG | 0.8034 | 0.8853 | 0.8457
Ensembling | 0.0013 | 0.9065 | 0.8542
GradPSO | 0.4505 | 0.2529 | 0.4079
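The 71%, 80% and 67% figures measure how often the predicted and real exchange rates move in the same direction. The chapter does not give the exact formula. One plausible reading, sketched here in Python as an assumption, compares the signs of consecutive day-to-day changes in the two series. The name `direction_agreement` is hypothetical.

```python
def direction_agreement(actual, predicted):
    """Fraction of consecutive steps where both series move the same way
    (both up, both down, or both unchanged)."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two aligned series with at least two points")

    def sign(v):
        return (v > 0) - (v < 0)

    same = sum(
        sign(actual[t] - actual[t - 1]) == sign(predicted[t] - predicted[t - 1])
        for t in range(1, len(actual))
    )
    return same / (len(actual) - 1)
```

Under this reading, a value of 0.71 computed on the real and predicted daily EUR/USD series of a test set would correspond to the 71% reported for the first dataset. A purely random predictor would be expected to land near 0.5 when up and down moves are balanced.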

16.5 Conclusion and Perspectives

In this paper, we have presented results and work related to forecasting the exchange rates of currencies in the FOREX market. Our model predicted the exchange rate better than the simple regression model. It also performed better than other regression models such as Neural Networks, SVR, Random Forests and PLS in predicting the values of short-term test datasets (for example, the 7-day test). The goal of this study was to propose an optimized predictive regression model suited to sinusoidal-type data such as time series. We therefore aim to propose a solution applicable to data on the evolution of the Moroccan dirham. This currency is currently moving from an exchange rate pegged to the European euro and the American dollar to a flexible exchange rate regime.

Acknowledgements Special thanks to Professor Loqman Chakir from Sidi Mohammed Ben Abdellah University, who helped me on this topic while supervising my Master's thesis. Thanks also to Mr Mehdi ERRAJI, PhD student in mathematics at Cadi Ayyad University, for his substantial help in understanding and implementing the gradient descent algorithm. Finally, thanks to Mrs Hanaa JAMALI, PhD student in computer science at Cadi Ayyad University, for helping me become more familiar with the stock market speculation case study.

References
1. E. Roudgar, Forecasting foreign exchange market trends: is technical analysis perspective successful? Thesis for the degree of Master of Science in Banking and Finance, Eastern Mediterranean University, Gazimagusa, North Cyprus, Turkey (2012)
2. Y. Yong, D.C.L. Ngo, Y. Lee, Technical indicators for forex forecasting: a preliminary study (Springer International Publishing Switzerland, 2015)
3. G. Sermpinis, K. Theofilatos, A. Karathanasopoulos, C. Dunis, Forecasting foreign exchange rates with adaptive neural networks using radial basis functions and particle swarm optimization. Eur. J. Oper. Res. (2013)
4. L. Wu, Mixed effects models for complex data
5. A. Siebes, Z. Struzik, Complex data: mining using patterns, in Pattern Detection and Discovery, eds. by D.J. Hand, N.M. Adams, R.J. Bolton. Lecture Notes in Computer Science, vol. 447 (2002)
6. M. Clerc, P. Siarry, Une nouvelle métaheuristique pour l’optimisation difficile: la méthode des essaims particulaires, J3eA, 3–7 (2004)
7. A. Lemouari, Introduction aux métaheuristiques. Support de cours spécialité système d’information et aide à la prise de décision, Master thesis, Faculté des Sciences Exactes et Informatique, Département Informatique, Université de Jijel, Jijel, Algeria (2014)
8. H. Jamali, A. Nabaji, O. Bencharef, S. Raghay, New trends in FOREX speculation: literature review (2017)
9. B. Amar, Classification des signaux EGC avec un système-multi-agent neuronale, Mémoire d’un Magister en informatique, Université Abou Bakr Belkaid-Tlemcen, Faculté des Sciences, Département d’Informatique, June 2012
10. J. Kamruzzaman, R.A. Sarker, I. Ahmad, SVM based models for predicting foreign currency exchange rates, in Third IEEE International Conference on Data Mining (ICDM) (2003)
11. J. Przemyslaw, K. Jan, T. Katarzyna, Decision trees on the foreign exchange market. Intell. Decis. Technol., 127–138 (Springer International Publishing, 2016)
12. A. Said, O. Bencharef, A. Ouaarab, A combination of regression techniques and cuckoo search algorithm for FOREX speculation, in World Conference on Information Systems and Technologies (Springer, Cham, 2017)
13. C. Man-Chung, C.-C. Wong, C.-C. Lam, Financial time series forecasting by neural network using conjugate gradient learning algorithm and multiple linear regression weight initialization. Comput. Econ. Financ. 61 (2000)

16 Stock Market Speculation System Development Based on Technico Temporal …


14. S. Galeshchuk, Neural networks performance in exchange rate prediction. Neurocomputing 172, 446–452 (2016)
15. P. Czekalski, M. Niezabitowski, R. Styblinski, ANN for FOREX forecasting and trading. Control Syst. Comput. Sci. (CSCS) (2015)
16. Y.L. Yong, D.C.L. Ngo, Y. Lee, Technical indicators for forex forecasting: a preliminary study, in International Conference in Swarm Intelligence (Springer International Publishing, 2015)
17. R.T. Gonzalez, C.A. Padilha, D.A.C. Barone, Ensemble system based on genetic algorithm for stock market forecasting. Evol. Comput. (CEC) (2015)
18. M. Ozturk, I.H. Toroslu, G. Fidan, Heuristic based trading system on Forex data using technical indicator rules. Appl. Soft Comput. 43, 170–186 (2016)
19. G. Sermpinis et al., Forecasting foreign exchange rates with adaptive neural networks using radial-basis functions and particle swarm optimization. Eur. J. Oper. Res. 225(3), 528–540 (2013)
20. M. Rout et al., Forecasting of currency exchange rates using an adaptive ARMA model with differential evolution based training. J. King Saud Univ.-Comput. Inf. Sci. 26(1), 7–18 (2014)
21. A.S. Babu, S.K. Reddy, Exchange rate forecasting using ARIMA, neural network and fuzzy neuron. J. Stock Forex Trad. 4(155), 2 (2015)
22. M.O. Ozorhan, E.H. Toroslu, O.T. Sehitoglu, A strength-biased prediction model for forecasting exchange rates using support vector machines and genetic algorithms. Soft Comput., 1–19 (2016)
23. F.X. Diebold, J.A. Nason, Nonparametric exchange rate prediction. J. Int. Econ. 28(3–4), 315–332 (1990)
24. F. Giordano, M.L. Rocca, M.L. Parrella, Clustering complex time series databases, in Classification and Multivariate Analysis for Complex Data Structures (Springer, Berlin, 2011), pp. 417–425
25. A. Nigham, V. Aggarwal, The LPASSO method for regression regularization. Technical Report, MIT (2005)
26. M. Subathra, R. Nedunchezhian, An improved alias classification using logistic regression with particle swarm optimization. Indian J. Sci. Technol. 8(28) (2015)
27. R. Goebel, J. Siekmann, W. Wahlster, Lecture notes in artificial intelligence, in Proceedings of 12th Conference on Artificial Intelligence in Medicine (2009)
28. D. Ma et al., Parameter identification for continuous point emission source based on Tikhonov regularization method coupled with particle swarm optimization algorithm. J. Hazard. Mater. 325, 239–250 (2017)
29. A. Ozbeyaz, M.I. Gursoy, R. Coban, Regularization and kernel parameters optimization based on PSO algorithm in EEG signals classification with SVM. Signal Process. Commun. Appl. (SIU) (2011)
30. https://www.investing.com/
31. http://stockcharts.com/
32. http://ieeexplore.ieee.org/Xplore/home.jsp
33. https://www.dailyfx.com/
34. http://www.histdata.com/
35. https://fred.stlouisfed.org/
36. https://stats.oecd.org/
37. https://www.census.gov/
38. https://www.bea.gov/
39. http://www.investopedia.com/
40. https://www.zonebourse.com/
41. https://en.wikipedia.org/wiki/Gradient_descent#/media/File:Gradient_ascent_(surface).png
42. http://www.onestepremoved.com/tag/decision-tree/
43. https://people.cs.pitt.edu/~milos/courses/cs2750-Spring04
44. http://eric.ish-lyon.cnrs.fr/51-EN-complex_data_a_definition
45. https://www.andlil.com/definition-de-divergence-125688.html

Chapter 17

A New Hidden Markov Model Approach for Pheromone Level Exponent Adaptation in Ant Colony System

Safae Bouzbita, Abdellatif El Afia, and Rdouan Faizi

Abstract We propose in this paper a Hidden Markov Model (HMM) approach to avoid the premature convergence of ants in the Ant Colony System (ACS) algorithm. The proposed approach is modelled as a classifier that controls convergence through the dynamic adaptation of the α parameter, which weighs the relative influence of the pheromone. The implementation was tested on several Travelling Salesman Problem (TSP) instances with different numbers of cities, and compared with the standard ACS and with an existing fuzzy-logic approach from the literature. The experimental results show that the proposed method performs better.

S. Bouzbita (B) · A. E. Afia · R. Faizi
ENSIAS - Mohammed V University, Rabat, Morocco
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_17

17.1 Introduction

The Travelling Salesman Problem (TSP) is one of the hardest combinatorial optimization problems studied in computer science, logistics, and the transportation industry [1–3]. The task is to find the shortest tour that visits every city in a given list exactly once, starting from one city and returning to it [4]. The TSP is thus the problem of finding a Hamiltonian cycle of minimum length. An exhaustive search would have to compare (n − 1)! candidate tours, which makes the problem very hard. The TSP is NP-hard, and no algorithm is known that solves it optimally in polynomial time. Many heuristics and meta-heuristics


S. Bouzbita et al.

have been proposed to find near-optimal solutions to it. The Ant Colony Optimization (ACO) meta-heuristic is one of the most powerful algorithms for solving the TSP [5]. Since Dorigo developed the first ACO algorithm in 1991 [6], many other variants have been proposed. They differ from one another mainly in the pheromone update procedure and in some characteristics of solution construction. One of the most interesting variants is the Ant Colony System (ACS), which differs in several ways from the other ACO techniques.

Algorithm 1 Standard ACS
Initialization phase:
    Initialize the pheromone trails with τ0 = 1/(n · L_nn)
Construction phase:
repeat
    for all ants do
        Build a feasible solution according to (17.1) and (17.2)
        Apply the local pheromone update according to (17.3)
    end for
    Apply the global pheromone update according to (17.4)
until the stop criterion is reached
return the best found tour L_best

The main steps of the ACS algorithm are:
• Initialization: the parameters are set and the pheromone matrix is initialized with small values.
• Solution construction: each ant uses the so-called pseudo-random proportional rule to construct a feasible solution, which for the TSP is a complete tour. With probability q0, the next city s is chosen as

$$ s = \arg\max_{u \in J_k(r)} \left\{ [\tau(r,u)]^{\alpha} \, [\eta(r,u)]^{\beta} \right\} \quad \text{if } q \le q_0 \tag{17.1} $$

With probability (1 − q0), the random proportional rule is used instead:

$$ p_k(r,s) = \begin{cases} \dfrac{[\tau(r,s)]^{\alpha} \, [\eta(r,s)]^{\beta}}{\sum_{u \in J_k(r)} [\tau(r,u)]^{\alpha} \, [\eta(r,u)]^{\beta}} & \text{if } s \in J_k(r) \\[2ex] 0 & \text{otherwise} \end{cases} \tag{17.2} $$

The symbols are defined as follows:
- q is a random number drawn uniformly in [0, 1].
- τ(r, s) is the pheromone level of edge (r, s).
- η(r, s) is the heuristic value of edge (r, s), typically the inverse of the distance between r and s in the TSP.
- J_k(r) is the set of cities not yet visited by ant k positioned on city r.

The state transition rule that results from (17.1) and (17.2) is called the pseudo-random proportional rule. A local pheromone update rule is applied during solution construction. Each time an ant moves from city r to the next city s, the amount of pheromone on edge (r, s) is modified according to:

17 A New Hidden Markov Model Approach for Pheromone Level …


$$ \tau(r_k, s_k) := (1 - \xi)\,\tau(r_k, s_k) + \xi\,\tau_0 \tag{17.3} $$

where ξ ∈ (0, 1) is the local pheromone decay parameter and τ0 is the initial pheromone value. τ0 is a very small constant equal to τ0 = 1/(n · L_nn), where L_nn is the length of a nearest-neighbour tour and n is the number of cities in the instance. The local pheromone update aims to avoid stagnation: it decreases the pheromone on the edges already used, which makes them less attractive to the following ants.
• Global pheromone update: at the end of each iteration, after all ants have constructed a complete solution, the pheromone is updated again. The update is applied to the edges of either the iteration-best or the best-so-far tour, using

$$ \tau(r_k, s_k) := (1 - \rho)\,\tau(r_k, s_k) + \frac{\rho}{L_{best}} \tag{17.4} $$

where ρ ∈ (0, 1) is the pheromone evaporation rate and L_best is the length of that tour.
• Stop criterion: the algorithm stops when the stop criterion is reached, here a maximum number of iterations without improvement, and returns L_best.

The performance of ACO algorithms depends strongly on the values given to their parameters. In the earliest ACO applications, parameter values were kept constant during the run of the algorithm. However, modifying parameter values during the run can improve the performance of the algorithm. Parameter adaptation has become an important task in the field of meta-heuristics, and adapting parameters at runtime with machine learning techniques is an influential research theme. Several strategies have been proposed in the literature for adapting parameters while solving a problem. For example, in [7–13] the authors used HMM-based machine learning to adapt the parameters of several meta-heuristics at runtime. In [14, 15], other methods were proposed to control the parameters of the PSO meta-heuristic.
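The standard ACS loop of Algorithm 1 can be sketched in Python as below. It uses the pseudo-random proportional rule (17.1)–(17.2), the local update (17.3) and the global update (17.4). This is a minimal illustration, not the authors' MATLAB implementation, and it rests on several assumptions:
- coordinates are Euclidean and η(r, s) = 1/d(r, s);
- pheromone is kept symmetric;
- a fixed iteration budget replaces the stop criterion;
- the global update is applied to the best-so-far tour;
- ξ = 0.1, a value the chapter does not specify.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour (the ant returns to its start city)."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbour_length(dist):
    """L_nn: length of a greedy nearest-neighbour tour starting at city 0."""
    tour, unvisited = [0], set(range(1, len(dist)))
    while unvisited:
        nxt = min(unvisited, key=lambda u: dist[tour[-1]][u])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour_length(tour, dist)


def choose_next(r, unvisited, tau, eta, alpha, beta, q0, rng):
    """Pseudo-random proportional rule, Eqs. (17.1) and (17.2)."""
    cand = sorted(unvisited)
    weights = [tau[r][u] ** alpha * eta[r][u] ** beta for u in cand]
    if rng.random() <= q0:  # exploitation: argmax of tau^alpha * eta^beta
        return cand[max(range(len(cand)), key=weights.__getitem__)]
    return rng.choices(cand, weights=weights, k=1)[0]  # biased exploration


def acs(coords, n_ants=10, alpha=1.0, beta=2.0, rho=0.1, xi=0.1, q0=0.9,
        iterations=100, seed=0):
    """Minimal ACS for the symmetric TSP; xi=0.1 is an assumed default."""
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    eta = [[1.0 / d if d > 0 else 0.0 for d in row] for row in dist]
    tau0 = 1.0 / (n * nearest_neighbour_length(dist))
    tau = [[tau0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(iterations):  # fixed budget instead of the stop criterion
        for _ in range(n_ants):
            tour = [rng.randrange(n)]  # random start city
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                r = tour[-1]
                s = choose_next(r, unvisited, tau, eta, alpha, beta, q0, rng)
                # Local update, Eq. (17.3), on the edge just traversed
                tau[r][s] = tau[s][r] = (1 - xi) * tau[r][s] + xi * tau0
                tour.append(s)
                unvisited.remove(s)
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
        # Global update, Eq. (17.4), on the edges of the best-so-far tour
        for i in range(n):
            r, s = best_tour[i], best_tour[(i + 1) % n]
            tau[r][s] = tau[s][r] = (1 - rho) * tau[r][s] + rho / best_len
    return best_tour, best_len
```

Here α is passed as a fixed argument, as in the standard ACS. The adaptive variant discussed in this chapter would instead change α between iterations.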
In this paper, we use the Hidden Markov Model (HMM) machine learning technique to adapt the exponent of the pheromone level, α, dynamically according to some performance measures while solving TSP instances. The rest of the paper is organized as follows:
- Section 17.2 presents related work.
- Section 17.3 describes the new parameter adaptation method based on a Hidden Markov Model.
- Section 17.4 presents the experimental results and comparisons.
- Section 17.5 concludes the paper.

17.2 Related Work

The exponent on the pheromone level, α, of ACO algorithms has not received enough attention in the literature. Nevertheless, Meyer [16] studied the influence of this parameter on ACO performance by proposing a critical cycle Ant System


(ccAS) algorithm, in which each subsequent run starts from an already developed pheromone matrix. He showed that α determines the quality of ACO convergence and can make ant algorithms more efficient. Other authors preferred to analyse the influence of this parameter by adapting it during the search process.

In [17], Chusanapiputt et al. proposed a Selective Self-Adaptive Ant System (SSAS) to solve the constrained unit commitment problem. SSAS adapts the transition probability parameters and the population size by increasing and decreasing the values of α and β. It works in cooperation with an Effective Repairing Heuristic Module (ERHM) and a Candidate Path Management Module (CPMM).

In the same context, Martens et al. [18] suggested an implementation of the Max-Min Ant System (MMAS) for extracting rule-based classifiers in data mining. Their classification technique, called AntMiner+, finds suitable values for α and β by creating a new vertex group in the construction graph for each parameter. The values of α and β are limited to the integers 1, 2 and 3.

Khichane et al. [19] proposed a self-adaptive approach that modifies the parameters α and β of MMAS and applied it to constraint satisfaction problems. The approach uses two reactive frameworks that differ in the granularity of parameter learning, and the updated parameters are considered independent of each other:
- The first mechanism, called global parameter learning ant-solver (GPL-Ant-solver), defines one common parameter setting for the whole colony during solution construction.
- The second mechanism, called distributed parameter learning ant-solver (DPL-Ant-solver), updates the values of α and β at each step of solution construction.

Besides self-adaptive strategies, search-based methods have been proposed for adapting α. For example, Li et al.
[20] proposed an information-entropy-based approach to address the premature convergence problem of ACO, tuning α and β on the Travelling Salesman Problem. Neyoy et al. [21] proposed controlling diversity in ACO by dynamically varying α with a convergence fuzzy logic controller that aims to avoid or slow down full convergence. Olivas et al. [22] proposed a dynamic parameter adaptation approach for ACO based on fuzzy logic systems, in order to control the balance between exploration and exploitation. Their method is intended to apply to a wide range of problems without having to find optimal parameter values for each problem. The parameters they adapt are the exponent on the pheromone level, α, and the pheromone evaporation rate, ρ. Ling et al. [23] used the Artificial Fish Swarm Algorithm (AFSA) to adapt the parameters of ACO when applied to the TSP. Three parameters were updated (α, ρ and Q), with the same parameter setting for all ants. Melo et al. [24] proposed a multi-colony ACS algorithm in which several ant colonies solve the same problem simultaneously, each with its own values of α, β, ρ and q0. The main idea of their work is to replace the parameter values of the worst colony with those of the best colony.


17.3 Proposed Method

In this section, an improved ACS algorithm based on an HMM is proposed to adapt the exponent on the pheromone level, α, according to some performance measures. We chose to tune this parameter because of its strong influence on the convergence quality and performance of ACS.

The main idea is that α should be modified according to two performance measures: the diversity of the population and the closeness to the best known solution.
- If the diversity of the population is high, the ants are exploring several different paths. No further exploration is required, so the accumulated information should be exploited by setting α to a high value.
- If the error with respect to the best known solution is high, the search is still far from the best solution. More solutions should then be explored by setting α to a low value.

To describe the diversity of the population, we chose the variance, which indicates how widely the ants' solutions are spread out:

$$ \mathrm{Variance} = \frac{1}{m} \sum_{i=1}^{m} (L_i - \mu)^2 \tag{17.5} $$

where L_i is the length of the solution found by ant i, μ is the mean length of all solutions found by the population, and m is the number of ants. For the closeness to the best known solution, the error was chosen as the performance measure:

$$ \mathrm{Error} = L_{best} - L_{best\text{-}known} \tag{17.6} $$

where L_best is the best solution found by the population and L_best-known is the best known solution reported in TSPLIB. These two measures are the observations of the HMM, while the values of α are its hidden states.
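Both performance measures are straightforward to compute from the tour lengths of one iteration. The minimal sketch below assumes those lengths are available as a list. Note that (17.5) is the population variance, which divides by m rather than m − 1. Whether L_best is the iteration-best or the best-so-far length is left to the caller, since the text only says "best found solution by the population".

```python
def population_variance(lengths):
    """Eq. (17.5): mean squared deviation of the m tour lengths from their mean."""
    mu = sum(lengths) / len(lengths)
    return sum((length - mu) ** 2 for length in lengths) / len(lengths)


def error(best_length, best_known_length):
    """Eq. (17.6): gap between the best tour found and the best known tour.

    best_length may be the iteration-best or the best-so-far length
    (assumption: the caller decides which).
    """
    return best_length - best_known_length
```

For example, tour lengths 10, 12 and 14 have mean 12 and variance (4 + 0 + 4)/3 = 8/3.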

17.3.1 Hidden Markov Model

A Hidden Markov Model is a learnable stochastic automaton made of two stochastic processes:
- The first is an underlying Markov chain that cannot be observed directly (it is hidden). It is characterized by its states and transition probabilities.
- The second is an observable process that emits observation symbols according to probability distributions that depend on the current hidden state.

An HMM is usually denoted by λ = (A, B, π).


We now define the five elements of the proposed model:
• S = {L, ML, M, MH, H} is the set of hidden states.
• V = {LL, LM, LH, ML, MM, MH, HL, HM, HH} is the set of observation symbols.
• Π = [π1, π2, π3, π4, π5] = [1, 0, 0, 0, 0] is the initial state distribution, where πi is the probability of being in state Si at time t = 0. The model therefore starts in the state "Low" for α. As mentioned above, at the beginning of the algorithm the ants need to explore as much of the search space as possible, so α must be set to a low value.
• A = [a_ij] is the matrix of transition probabilities from state Si at time t to state Sj at time t + 1. We use a five-state left-to-right HMM to describe the states of the ants. In a left-to-right HMM, the initial probability of the first state equals one, so all the ants start in the same state.
• B = [b_jk] is the emission matrix, where b_jk is the probability of observing symbol Vk in state Sj. The emission probabilities were set from the following knowledge. A low variance means that the ants may be stuck around a local optimum, so more solutions should be explored by decreasing α. A low error means that the ants are close to the best known solution, so the accumulated information should be exploited by increasing α.

$$ A = (a_{ij}) = \begin{pmatrix} 0.5 & 0.5 & 0 & 0 & 0 \\ 0 & 0.5 & 0.5 & 0 & 0 \\ 0 & 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & 0.5 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} $$

$$ B = (b_{jk}) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.5 & 0 & 0 & 0 & 0.5 & 0 \\ \frac{1}{3} & 0 & 0 & 0 & \frac{1}{3} & 0 & 0 & 0 & \frac{1}{3} \\ 0 & 0.5 & 0 & 0 & 0 & 0.5 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} $$

In both matrices, the rows follow the order of S (L, ML, M, MH, H). The columns of B follow the order of V (LL, LM, LH, ML, MM, MH, HL, HM, HH).

We have defined five states corresponding to the values of α: Low (L), Medium Low (ML), Medium (M), Medium High (MH) and High (H). Each observation symbol concatenates the two performance measures, Variance and Error, each discretized into three levels: L (Low), M (Medium) and H (High). This gives nine possible combinations.

We adopt the Viterbi algorithm to compute the most likely sequence of states that produced the current observation sequence. The α parameter is then adapted according to the last state of this sequence. For example, if the most likely state sequence for α is H, L, M, MH, L, ML, M, its value is updated according to the last state, "Medium". In addition to the Viterbi algorithm, we use the well-known Baum-Welch training method to adjust the HMM parameters λ = (A, B, π) at runtime.
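The decoding step can be illustrated with a log-space Viterbi decoder over the five states and nine symbols. This sketch rests on three assumptions. First, states and symbols are indexed in the order of S and V. Second, A, B and Π take the values of the matrices given above for this model. Third, as in the text, the new α is read from the last state of the decoded sequence. Working in log space avoids numerical underflow on long observation sequences.

```python
import math

STATES = ["L", "ML", "M", "MH", "H"]
SYMBOLS = ["LL", "LM", "LH", "ML", "MM", "MH", "HL", "HM", "HH"]

# Left-to-right transition matrix A (rows and columns in STATES order)
A = [[0.5, 0.5, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
# Emission matrix B as given in the text (rows: STATES, columns: SYMBOLS)
B = [[0, 0, 0, 0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0.5, 0, 0, 0, 0.5, 0],
     [1 / 3, 0, 0, 0, 1 / 3, 0, 0, 0, 1 / 3],
     [0, 0.5, 0, 0, 0, 0.5, 0, 0, 0],
     [1, 0, 0, 0, 0, 0, 0, 0, 0]]
PI = [1.0, 0.0, 0.0, 0.0, 0.0]  # the model always starts in state L


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(observations, A, B, pi):
    """Most likely hidden-state sequence for a list of observation symbols."""
    n = len(pi)
    obs = [SYMBOLS.index(o) for o in observations]
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for k in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            # best predecessor of state j at the previous time step
            best_i = max(range(n), key=lambda i: delta[i] + _log(A[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][k]))
        backpointers.append(ptr)
        delta = new_delta
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backpointers):  # backtrack from the last time step
        state = ptr[state]
        path.append(state)
    return [STATES[i] for i in reversed(path)]
```

For instance, `viterbi(["HH", "ML", "MM", "LM", "LL"], A, B, PI)` walks the left-to-right chain to `["L", "ML", "M", "MH", "H"]`. The last element, `"H"`, is the state that would set α.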


17.3.1.1 Proposed Algorithm

Algorithm 2 Pseudocode of the proposed algorithm
Initialization
Solution construction:
while the termination condition is not met do
    for all ants k in the population do
        Choose the next city according to (17.1) and (17.2)
        Apply the local pheromone update according to (17.3)
    end for
    Update the global best solution
    Apply the global pheromone update according to (17.4)
    Compute Variance and Error according to (17.5) and (17.6)
    if 0 < Variance ≤ MaxVariance/3 then
        Variance = L
    else if MaxVariance/3 < Variance ≤ 2·MaxVariance/3 then
        Variance = M
    else
        Variance = H
    end if
    if 0 < Error ≤ MaxError/3 then
        Error = L
    else if MaxError/3 < Error ≤ 2·MaxError/3 then
        Error = M
    else
        Error = H
    end if
    O = Variance Error (the concatenated observation symbol)
    Apply Algorithm 3 to find the most likely current state
    Update α according to the found state:
    if state = L then α = 0.6
    else if state = ML then α = 0.7
    else if state = M then α = 0.8
    else if state = MH then α = 0.9
    else α = 1
    end if
end while
return L_best
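The symbol mapping and the state-to-α table of Algorithm 2 can be sketched in Python as follows. Two points are assumptions rather than details taken from the chapter. First, MaxVariance and MaxError are supplied by the caller, for example as the largest values observed so far. Second, a measure equal to 0 is mapped to L. This covers an error of 0 once the best known tour is reached, whereas the intervals of Algorithm 2 start strictly above 0.

```python
# alpha value associated with each hidden state (from Algorithm 2)
ALPHA_BY_STATE = {"L": 0.6, "ML": 0.7, "M": 0.8, "MH": 0.9, "H": 1.0}


def to_symbol(value, max_value):
    """Map a measure to L, M or H using the thirds of [0, max_value].

    max_value is supplied by the caller (assumption); a value of 0 maps to L.
    """
    if value <= max_value / 3:
        return "L"
    if value <= 2 * max_value / 3:
        return "M"
    return "H"


def observation(variance, max_variance, err, max_error):
    """Concatenate the Variance symbol and the Error symbol, e.g. 'LH'."""
    return to_symbol(variance, max_variance) + to_symbol(err, max_error)


def alpha_for_state(state):
    """New value of the pheromone exponent for the decoded state."""
    return ALPHA_BY_STATE[state]
```

For example, a variance of 1 with MaxVariance = 9 and an error of 7 with MaxError = 9 give the observation "LH".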


In the proposed algorithm, each ant builds a tour by choosing the next city with the pseudo-random proportional rule. After the tours are built, the variance and the error are calculated and converted into symbols according to the intervals defined above. The two symbols are combined into one observation, which is passed to the Viterbi algorithm to determine the state most likely to have produced it. The observation sequence grows by one symbol per iteration, so its length equals the number of iterations performed so far.

17.3.1.2 Adjusting HMM Parameters and State Determination

The HMM parameters (the transition and emission matrices) are initialized to best-guess values. To adjust them, online learning is performed at the end of each iteration with the Baum-Welch algorithm. This algorithm uses expectation-maximization to find a maximum-likelihood estimate of the HMM parameters given a sequence of observed data.

The Viterbi algorithm is then used to find the most likely explanation, i.e. the most likely sequence of states that generated the observation sequence, by maximizing over all possible state sequences. Because it considers the entire sequence of hidden states from the beginning of the run up to the current iteration, its decision is based on the whole history. This is an advantage over methods that rely only on the information of the current iteration.

Algorithm 3 Parameter estimation and state determination
Input: O = o1, o2, ..., oT, S, λ = (A, B, π)
repeat
    Re-estimate λ using Baum-Welch
    Find the most likely sequence of states S_T using Viterbi
until P(O | λ) no longer increases or the maximum number of iterations is reached
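Algorithm 3 alternates Baum-Welch re-estimation with Viterbi decoding. The sketch below shows one Baum-Welch step using the usual scaled forward-backward recursions, with observations given as symbol indices. It is a generic textbook formulation, not the authors' MATLAB code. As design choices of this sketch, a state that is never visited keeps its previous row. Zero entries of A, such as the left-to-right structure, stay zero, which is a standard property of Baum-Welch re-estimation.

```python
import math


def forward_backward(obs, A, B, pi):
    """Scaled forward-backward pass; returns (alpha_hat, beta_hat, scales)."""
    n, T = len(pi), len(obs)
    alpha, scales = [], []
    for t, o in enumerate(obs):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            prev = alpha[-1]
            a = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
        c = sum(a)
        if c == 0:
            raise ValueError("observation sequence has zero probability under lambda")
        alpha.append([x / c for x in a])  # normalized forward variables
        scales.append(c)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o_next = obs[t + 1]
        beta[t] = [sum(A[i][j] * B[j][o_next] * beta[t + 1][j] for j in range(n))
                   / scales[t + 1] for i in range(n)]
    return alpha, beta, scales


def log_likelihood(obs, A, B, pi):
    """log P(O | lambda), the sum of the log scaling factors."""
    return sum(math.log(c) for c in forward_backward(obs, A, B, pi)[2])


def baum_welch_step(obs, A, B, pi):
    """One EM re-estimation of lambda = (A, B, pi); generic, not the authors' code."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales = forward_backward(obs, A, B, pi)
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi_sum = [[0.0] * n for _ in range(n)]  # expected transition counts
    for t in range(T - 1):
        o_next = obs[t + 1]
        for i in range(n):
            for j in range(n):
                xi_sum[i][j] += (alpha[t][i] * A[i][j] * B[j][o_next]
                                 * beta[t + 1][j] / scales[t + 1])
    new_A, new_B = [], []
    for i in range(n):
        from_i = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([x / from_i for x in xi_sum[i]] if from_i > 0 else list(A[i]))
        in_i = sum(gamma[t][i] for t in range(T))
        counts = [sum(gamma[t][i] for t in range(T) if obs[t] == k) for k in range(m)]
        new_B.append([x / in_i for x in counts] if in_i > 0 else list(B[i]))
    return new_A, new_B, list(gamma[0])
```

Repeating `baum_welch_step` until `log_likelihood` stops increasing mirrors the stopping rule of Algorithm 3. Because Baum-Welch is an EM procedure, each step cannot decrease the likelihood.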

17.4 Experimental Results and Comparison

To test the efficiency of the proposed algorithm, we compared it with the standard ACS algorithm and with the Fuzzy Logic results. The algorithm was run with the best known values of the ACS parameters [25]: β = 2, ρ = 0.1, q0 = 0.9 and m = 10. The initial positions of the ants are set randomly in all experiments. The TSP benchmark instances were chosen from TSPLIB [26] among the instances most commonly used in the literature. The algorithm was implemented in MATLAB. Each instance was run 30 times with at most 1000 iterations per run. A run also stops after 200 consecutive iterations without improvement.
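The run budget described above can be expressed as a small driver loop: at most 1000 iterations, stopping early after 200 consecutive iterations without improvement. In this sketch, `step` is a hypothetical callable that performs one ACS or ACSHMM iteration and returns the length of the best tour found in that iteration. The function also returns the iteration at which the best tour was found, the quantity reported in brackets in Table 17.1.

```python
import math
from typing import Callable, Tuple


def run(step: Callable[[], float], max_iterations: int = 1000,
        patience: int = 200) -> Tuple[float, int]:
    """Return (best length, 1-based iteration at which it was found).

    `step` is a hypothetical callable performing one algorithm iteration.
    """
    best, best_iteration = math.inf, 0
    for iteration in range(1, max_iterations + 1):
        length = step()
        if length < best:
            best, best_iteration = length, iteration
        elif iteration - best_iteration >= patience:
            break  # no improvement for `patience` consecutive iterations
    return best, best_iteration
```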


17.4.1 Comparison on the Solution Accuracy

Table 17.1 shows the results of running the proposed and the standard ACS algorithms on the chosen TSP instances. The columns of the table are as follows:
• The first column gives the names of the chosen TSP instances.
• The second column gives the best solution found by the standard ACS algorithm.
• The third column gives the CPU time used by the standard ACS.
• The fourth column gives the best solution found by the proposed method, ACSHMM.
• The fifth column gives the CPU time used by the proposed method.
• The sixth column gives the best solution found by Fuzzy Logic [28].
• The seventh column gives the best known solution.

Table 17.1 Summary of results of the standard ACS and ACSHMM algorithms on some TSP instances

Problem | Standard ACS solution [iter.] | Standard ACS CPU time | ACSHMM solution [iter.] | ACSHMM CPU time | Fuzzy Logic solution [iter.] | Best known solution
Eil51 | 435.4 [104] | 143 | 428.8 [325] | 201 | 431.9 [199] | 426
St70 | 690 [224] | 252 | 677 [246] | 292.6 | 691 [66] | 675
Eil76 | 566 [12] | 167.7 | 555.79 [286] | 275.79 | 556 [479] | 538
Rat99 | 1228.5 [614] | 1158.4 | 1221 [325] | 711.8 | 1234.2 [315] | 1211
Eil101 | 675.7 [7] | 316.78 | 656.4 [75] | 363.3 | 655.6 [218] | 629
Lin105 | 14551 [165] | 698.8 | 14474 [148] | 436.79 | 14543.1 [27] | 14379
Pr107 | 44620 [82] | 403.6 | 44481 [127] | 510.5 | 45757.9 [329] | 44303
Pr124 | 59632.9 [25] | 472.2 | 59159 [309] | 1101.5 | 60203.3 [322] | 59030
Rat195 | 2458.8 [114] | 1833.3 | 2434 [28] | 3013.6 | 2401.9 [65] | 2323
pr264 | 53118.4 [101] | 1400 | 52514.5 [31] | 6471.7 | 53429.6 [78] | 49135
Lin318 | 46319.21 [44] | 5590 | 44300.71 [430] | 8039.8 | – | 42029
Pr439 | 121082.91 [30] | 2790 | 117757.8 [179] | 9842.6 | – | 107217


The numbers in brackets are the iterations at which the algorithms found their best solution. The CPU time is the time used to find this best solution.

Table 17.1 shows that, in most instances, the proposed ACSHMM algorithm gives better solution accuracy and convergence speed than both the standard ACS and the fuzzy logic approach from the literature. The solutions found by ACSHMM are very close to the best known solutions, especially for the first three instances, and the differences between the solutions grow with the instance size.

Another important aspect of the proposed algorithm is its behaviour under a poor parameter setting. The commonly recommended number of ants equals the number of cities in the TSP instance, yet with only 10 ants the proposed algorithm still outperforms the standard and Fuzzy Logic algorithms.

Regarding the iteration at which the best solution was found, ACSHMM generally needs more iterations than the standard ACS. This can be explained by the early stagnation of the standard ACS. ACSHMM, with its dynamic exponent on the pheromone level, avoids stagnation and keeps finding better solutions.

The CPU time of the proposed method is somewhat higher than that of the standard ACS. This can be explained by the hybridization of two computationally heavy mechanisms (HMM and ACS). Moreover, avoiding stagnation requires more iterations and hence more time.
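To quantify how close ACSHMM comes to the best known solutions on the first three instances, the relative gap 100 · (L − L_best-known)/L_best-known can be computed from the Table 17.1 values. The snippet only restates that arithmetic.

```python
# ACSHMM best solutions and best known solutions, taken from Table 17.1
results = {
    "Eil51": (428.8, 426),
    "St70": (677, 675),
    "Eil76": (555.79, 538),
}


def relative_gap(found, best_known):
    """Percentage gap between a found tour length and the best known one."""
    return 100.0 * (found - best_known) / best_known


for name, (found, best_known) in results.items():
    print(f"{name}: {relative_gap(found, best_known):.2f}%")
```

This prints gaps of 0.66% for Eil51, 0.30% for St70 and 3.31% for Eil76.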

17.4.2 Comparison on the Convergence Speed

To show the convergence speed of the proposed method, we report its CPU time (Table 17.1) and plot the best tour length against the iteration number.
- Figure 17.1 (pr107.tsp) shows that the proposed method converges to a better solution. As Table 17.1 shows, the standard ACS found its best solution at the 82nd iteration, but the proposed method found a better one.
- Figure 17.2 plots the best solutions found by the proposed algorithm and the standard ACS on pr264.tsp. ACSHMM converges to a better solution in fewer iterations than the standard ACS.
- Figure 17.3 plots the best solutions found by ACSHMM and ACS on rat99.tsp. ACSHMM is better in both solution quality and number of iterations.


Fig. 17.1 Sample run on pr107.tsp

Fig. 17.2 Sample run on pr264.tsp

Fig. 17.3 Sample run on rat99.tsp

- Figure 17.4 (rat195.tsp) shows that the proposed ACSHMM approach achieved better solution accuracy and faster convergence.
- For lin318.tsp (Fig. 17.5), the difference between the solutions of the proposed approach and the standard ACS is large, in favour of the proposed approach. The standard ACS stagnated on an early solution, whereas the proposed approach kept searching for new solutions.
- For pr439.tsp (Fig. 17.6), which is considered a large problem, the proposed approach achieved a better solution than the standard ACS but required more iterations.


Fig. 17.4 Sample run on rat195.tsp

Fig. 17.5 Sample run on lin318.tsp

Fig. 17.6 Sample run on pr439.tsp

From these charts and Table 17.1, we conclude that the results of the proposed method are very encouraging.

17.4.3 Statistical Test

We used the Wilcoxon rank test in a pairwise comparison procedure, at a significance level of 0.05, to compare the methods. We chose this test because it takes into account the quantitative differences in the algorithms' performance, and because it is recommended in many other studies [29, 30].

Table 17.2 Statistical validation (p-values) for the TSP benchmark instances with ACSHMM as control algorithm

TSP | Eil51 | St70 | Eil76 | Rat99 | Eil101
Standard ACS | 2.57E-02 | 3.445E-01 | 1.04E-02 | 6.5E-02 | 9.09E-01

TSP | Lin105 | Pr107 | Pr124 | Rat195 | pr264
Standard ACS | 2E-02 | 2.39E-02 | 4.69E-02 | 4E-02 | 5.8E-03

The null hypothesis states that the solutions found by ACSHMM are not better than those of the standard ACS. The alternative hypothesis states that ACSHMM finds better solutions. Table 17.2 gives the p-values of the test.
- In seven of the ten instances (Eil51, Eil76, Lin105, Pr107, Pr124, Rat195 and pr264), the p-value is below the 5% significance level, so ACSHMM significantly outperforms the standard ACS.
- For St70, Rat99 and Eil101, the test fails to reject the null hypothesis. On these instances, the best solutions found by ACSHMM are nevertheless still better than those of the standard ACS (Table 17.1).
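The chapter does not state which Wilcoxon variant was used, so the following is an illustration only. It implements a one-sided Wilcoxon rank-sum test (equivalently, the Mann-Whitney U test) with the normal approximation, tie correction and continuity correction. It assumes that the 30 final tour lengths of each algorithm are independent samples. Since the TSP is a minimization problem, the alternative hypothesis is that the ACSHMM lengths tend to be smaller. For real analyses, `scipy.stats.mannwhitneyu` (rank-sum) or `scipy.stats.wilcoxon` (signed-rank, for paired data) would be the usual choice.

```python
import math


def midranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_p_value(x, y):
    """One-sided p-value for H1: values in x tend to be smaller than values in y.

    Assumption: rank-sum (Mann-Whitney) variant with the normal approximation.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U of sample x
    mean = n1 * n2 / 2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values()) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var == 0:
        return 1.0  # all observations equal: no evidence either way
    z = (u1 - mean + 0.5) / math.sqrt(var)  # continuity correction toward the null
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
```

With `x` holding the ACSHMM lengths and `y` the standard ACS lengths for one instance, a returned value below 0.05 would reject the null hypothesis at the 5% level.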

17.5 Conclusion

We proposed a Hidden Markov Model (HMM) controller for the Ant Colony System (ACS) algorithm that dynamically adapts the exponent of the pheromone level, α, and applied it to several Travelling Salesman Problem (TSP) instances. On the one hand, the proposed algorithm improves solution quality compared with both the standard ACS and the Fuzzy Logic approach. On the other hand, it converges faster to better solutions in most instances. The statistical tests found enough evidence to reject the null hypothesis at the 5% significance level in most instances. Thus, ACSHMM outperforms the other methods in both solution accuracy and convergence speed.

References
1. H. Erol, M.Er. Kara, S. Bulkan, Optimizing the ant colony optimization algorithm using neural network for the travelling salesman problem, in Actas de la Conferencia Internacional de, pp. 83–89 (2012)
2. K. Helsgaun, J.L. Ngassa, ACO and TSP (2007)


3. R. Matai, S.P. Singh, M.L. Mittal, Travelling salesman problem: an overview of applications, formulations, and solution approaches, in Traveling Salesman Problem, Theory and Applications, vol. 1 (2010)
4. M. Diaby, The travelling salesman problem: a linear programming formulation (2006). arXiv:cs/0609005
5. E. Filip, M. Otakar, The travelling salesman problem and its application in logistic practice. WSEAS Trans. Bus. Econ. 8(4), 163–173 (2011)
6. M. Dorigo, V. Maniezzo, A. Colorni, The ant system: an autocatalytic optimizing process (1991)
7. S. Bouzbita, A. El Afia, R. Faizi, M. Zbakh, Dynamic adaptation of the ACS-TSP local pheromone decay parameter based on the Hidden Markov Model, in 2016 2nd International Conference on Cloud Computing Technologies and Applications (CloudTech) (IEEE, Marrakesh, 2016), pp. 344–349. https://doi.org/10.1109/CloudTech.2016.7847719
8. S. Bouzbita, A. El Afia, R. Faizi, A novel based Hidden Markov Model approach for controlling the ACS-TSP evaporation parameter, in 2016 5th International Conference on Multimedia Computing and Systems (ICMCS) (IEEE, Marrakech, 2016), pp. 633–638. https://doi.org/10.1109/ICMCS.2016.7905544
9. O. Aoun, M. Sarhani, A. El Afia, Investigation of hidden markov model for the tuning of metaheuristics in airline scheduling problems. IFAC-PapersOnLine 49(4), 347–352 (2016). https://doi.org/10.1016/j.ifacol.2016.07.058
10. S. Bouzbita, A. El Afia, R. Faizi, Hidden Markov model classifier for the adaptive ACS-TSP Pheromone parameters, in Bioinspired Heuristics for Optimization, vol. 774 (Springer, Cham, 2019), pp. 153–169. https://doi.org/10.1007/978-3-319-95104-1_10
11. O. Aoun, M. Sarhani, A. El Afia, Hidden markov model classifier for the adaptive particle swarm optimization, in Recent Developments in Metaheuristics, vol. 62 (Springer, Cham, 2018), pp. 1–15. https://doi.org/10.1007/978-3-319-58253-5_1
12. A. El Afia, M. Sarhani, O. Aoun, Hidden markov model control of inertia weight adaptation for Particle swarm optimization. IFAC-PapersOnLine 50(1), 9997–10002 (2017). https://doi.org/10.1016/j.ifacol.2017.08.2030
13. M. Lalaoui, A. El Afia, R. Chiheb, Hidden Markov model for a self-learning of simulated annealing cooling law, in 2016 5th International Conference on Multimedia Computing and Systems (ICMCS) (IEEE, Marrakech, 2016), pp. 558–563. https://doi.org/10.1109/ICMCS.2016.7905557
14. A. El Afia, M. Sarhani, O. Aoun, A probabilistic finite state machine design of particle swarm optimization, in Bioinspired Heuristics for Optimization, vol. 774 (Springer, Cham, 2019), pp. 185–201. https://doi.org/10.1007/978-3-319-95104-1_12
15. M. Sarhani, A. El Afia, R. Faizi, Facing the feature selection problem with a binary PSOGSA approach, in Recent Developments in Metaheuristics, vol. 62 (Springer, Cham, 2018), pp. 447–462. https://doi.org/10.1007/978-3-319-58253-5_26
16. B. Meyer, Convergence control in ACO, in Genetic and Evolutionary Computation Conference (GECCO): Late-Breaking Paper Available on CD, Seattle, WA (2004)
17. S. Chusanapiputt, D. Nualhong, S. Jantarang, S. Phoomvuthisarn, Selective self-adaptive approach to ant system for solving unit commitment problem, in Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (ACM, New York, 2006), pp. 1729–1736
18. D. Martens, M. De Backer, R. Haesen, J. Vanthienen, M. Snoeck, B. Baesens, Classification with ant colony optimization. IEEE Trans. Evol. Comput. 11(5), 651–665 (2007)
19. M. Khichane, P. Albert, C. Solnon, An ACO-based reactive framework for ant colony optimization: first experiments on constraint satisfaction problems, in Learning and Intelligent Optimization, Third International Conference, LION 3, Lecture Notes in Computer Science, vol. 5851, ed. by T. Stützle (Springer, Heidelberg, Germany, 2009), pp. 119–133
20. Y. Li, W. Li, Adaptive ant colony optimization algorithm based on information entropy: foundation and application. Fund. Inf. 77(3), 229–242 (2007)

17 A New Hidden Markov Model Approach for Pheromone Level …


21. H. Neyoy, O. Castillo, J. Soria, Fuzzy logic for dynamic parameter tuning in ACO and its application in optimal fuzzy logic controller design, in Fuzzy Logic Augmentation of Nature-Inspired Optimization Metaheuristics, vol. 574 (Springer, Cham, 2015), pp. 3–28
22. F. Olivas, F. Valdez, O. Castillo, C.I. Gonzalez, G. Martinez, P. Melin, Ant colony optimization with dynamic parameter adaptation based on interval type-2 fuzzy logic systems. Appl. Soft Comput. 53, 74–87 (2017)
23. W. Ling, H. Luo, An adaptive parameter control strategy for ant colony optimization, in CIS’07: Proceedings of the 2007 International Conference on Computational Intelligence and Security (CIS 2007) (IEEE, Washington, D.C., 2007), pp. 142–146
24. L. Melo, F. Pereira, E. Costa, MC-ANT: a multi-colony ant algorithm, in International Conference on Artificial Evolution (Evolution Artificielle) (Springer, Berlin, Heidelberg, 2009), pp. 25–36
25. T. Stützle, M. López-Ibáñez, P. Pellegrini, M. Maur, M.M. De Oca, M. Birattari, M. Dorigo, Parameter adaptation in ant colony optimization, in Autonomous Search (Springer, Berlin, Heidelberg, 2011), pp. 191–215
26. G. Reinelt, TSPLIB discrete and combinatorial optimization (1995)
27. D. Gomez-Cabrero, D.N. Ranasinghe, Fine-tuning the ant colony system algorithm through particle swarm optimization (2018). arXiv:1803.08353
28. C. Amir, A. Badr, I. Farag, A fuzzy logic controller for ant algorithms. Comput. Inf. Syst. 11(2), 26–34 (2007)
29. A. LaTorre, S. Muelas, J.M. Pena, A comprehensive comparison of large scale global optimizers. Inf. Sci. 316, 517–549 (2015)
30. N. Veček, M. Črepinšek, M. Mernik, On the influence of the number of algorithms, problems, and independent runs in the comparison of evolutionary algorithms. Appl. Soft Comput. 54, 23–45 (2017)

Chapter 18

A New Cut-Based Genetic Algorithm for Graph Partitioning Applied to Cell Formation

Menouar Boulif

Abstract Cell formation is a critical step in the design of cellular manufacturing systems. Recently, it was tackled by using a cut-based graph partitioning model. This model meets the requirements of real-life production systems, as it uses the actual amount of product flows, it looks for the suitable number of cells, and it takes into account natural constraints such as operation sequences, maximum cell size, and cohabitation and non-cohabitation constraints. Based on this model, we propose an original encoding representation to solve the problem with a genetic algorithm. We discuss the performance of this new GA in comparison with some approaches from the literature on a set of medium-sized instances. Given the results we obtained, it is reasonable to expect that the new GA will provide similar results on large real-life instances.

M. Boulif
Department of Computer Science, Faculty of Sciences, M’Hamed Bougara University of Boumerdes, 35000 Boumerdes, Algeria
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_18

18.1 Introduction

Cellular Manufacturing Systems (CMS) are an industrial implementation of the Group Technology (GT) philosophy. CMS divide the manufacturing system into cells so that similar parts are processed in the same cell. Such systems are specifically designed for job shops with medium production volumes [1]. CMS have a proven ability to reduce set-up times, in-process inventories, lot sizes and production equipment, while improving productivity and control over the production system [16]. CMS design involves four important steps: (1) process planning, (2) cell formation (CF), (3) machine layout and (4) cell layout. Our paper deals with CF, which is a key step in CMS design. Over the last decades, researchers' interest in CF has produced a large body of work that can be broadly divided into the following three non-exclusive categories [1]:

1. Methods based on the part-machine incidence matrix: The part-machine incidence matrix (PMIM) is a binary matrix that indicates the set of machines used to process each part. A large number of studies concentrate on this matrix, considering it the most important, if not the sole, input of the problem (e.g. [13]). Such matrix-based methods generally swap rows and/or columns of the PMIM to obtain a block-diagonal structure from which part families and machine cells are derived. The main limitation of this category is that it takes into account neither the operation sequences nor the production volumes.
2. Methods based on similarity coefficients: McAuley [11] was the first to use a measure of similarity between machines to identify cells. He developed a mathematical coefficient that uses only PMIM information. Since his article was published, numerous papers have tried to enhance this measure by adding further inputs, including production volumes [10, 15], part operation times and operation sequences [4]. The efforts in this category tend to combine data from several criteria, defining the similarity coefficient as a weighted combination of all of them (see [18] for a comprehensive study). However, little justification is given for the weighting procedure, even though it strongly influences the resulting solutions.
3. Methods based on meta-heuristics: The NP-completeness of the CF problem has led research to focus on heuristic methods. Meta-heuristics have attracted the most attention, leading to tabu search [9], simulated annealing [17], neural network approaches [7] and genetic algorithm (GA) approaches [1, 2, 5, 17]. Published findings indicate that GA-based methods are a very promising research path compared with other heuristics [10]. In GA-based approaches, the encoding representation is the only means by which the search space is explored. We therefore believe that it deserves more attention in research efforts. In fact, most of the published works that use an evolutionary approach (e.g. [2, 5, 17]) adopt the machine-to-cell integer encoding, whose limitations have been demonstrated [1].

To contribute to these efforts, this paper proposes a new cut-based GA encoding representation derived from the cut-based graph partitioning model [12]. The proposed cut-based approach does not assume that the number of cells is known a priori; instead, it searches for the appropriate number of cells. Furthermore, this approach is better suited to the requirements of real-life production systems: it uses the actual amount of product flow, which binary-PMIM-based methods estimate incorrectly, and it considers natural constraints such as operation sequences, maximum cell size, and cohabitation and non-cohabitation constraints.

The remainder of this paper is organized as follows. In Sect. 18.2, a graph partitioning formulation of the manufacturing cell formation (MCF) problem is presented. Section 18.3 discusses the theoretical aspects from which the cut-based encoding is derived. Section 18.4 describes the proposed genetic algorithm. Section 18.5 presents the results obtained by applying the proposed methods to selected data sets, and Sect. 18.6 presents our conclusions as well as our recommendations for further research.

18 A New Cut-Based Genetic Algorithm for Graph Partitioning …


18.2 Formulation

To keep the paper self-contained, we present below the formulation of the MCF problem as a graph partitioning problem with cohabitation and non-cohabitation constraints [12].

18.2.1 Input Data

1. $M = \{M_1, M_2, \ldots, M_m\}$ is a set of $m$ machines and $P = \{P_1, P_2, \ldots, P_p\}$ is a set of $p$ part types.
2. For each part type $P_k$ ($k = 1, 2, \ldots, p$), we assume given:
 • a single sequence of machines to be visited by the part, $R_k = (\langle M_{k,1}\rangle, \langle M_{k,2}\rangle, \ldots, \langle M_{k,s_k}\rangle)$, where $\langle M_{k,t}\rangle \in M$ ($t = 1, 2, \ldots, s_k$) and $s_k$ is the number of machines in the sequence $R_k$;
 • $r_k$: the mean production volume of part type $P_k$ per time unit.
3. Constraint data:
 • a set $S_C$ of machine couples (cohabitation constraints, item 13);
 • another set $S_N$ of machine couples such that $S_N \cap S_C = \emptyset$ (non-cohabitation constraints, item 14);
 • an integer $N$, $0 < N < m$ (the maximum cell size, item 12).

18.2.2 Flow Graph Construction

4. For each $(P_k, M_i, M_q) \in P \times M \times M$, we denote by $v_{kiq}$ the number of times $M_i$ follows $M_q$, or vice versa, in $R_k$, where $(i, q) \in \{1, \ldots, m-1\} \times \{i+1, \ldots, m\}$.
5. For each $(M_i, M_q) \in M \times M$, we denote by $t_{iq}$ the inter-machine traffic between $M_i$ and $M_q$, where $(i, q) \in \{1, \ldots, m-1\} \times \{i+1, \ldots, m\}$. That is:

$$t_{iq} = \sum_{k=1}^{p} r_k \, v_{kiq}$$

In addition, we define:

6. The undirected flow graph $G = (M, E)$, where the edge set $E$ is the set of unordered machine couples that are connected by a positive traffic or that belong to $S_C$ or $S_N$:

$$E = \{e_{iq} \mid (M_i, M_q) \in M \times M,\ (i, q) \in \{1, \ldots, m-1\} \times \{i+1, \ldots, m\},\ i \neq q,\ t_{iq} \neq 0\} \cup S_C \cup S_N$$


7. The edge weight function $W$: $W(e_{iq}) = t_{iq}$, where $(i, q) \in \{1, \ldots, m-1\} \times \{i+1, \ldots, m\}$.

Remark 18.1 If the flow graph is not connected, it must be made connected by adding fictitious edges with zero weight. Hence, from here on, the flow graph is assumed to be connected.
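Items 4 to 7 amount to a small computation over the routings. The Python sketch below is illustrative only: it assumes that "M_i follows M_q" means immediate succession in the routing R_k (so each consecutive pair of machines counts as one move), that machines are numbered 1..m, and that the function names are hypothetical. The fictitious zero-weight edges of Remark 18.1 are not added here.

```python
from collections import defaultdict


def inter_machine_traffic(routings, volumes):
    """Return {(i, q): t_iq} with i < q, where t_iq = sum_k r_k * v_kiq.

    routings[k] is the machine sequence R_k of part type P_k (machines 1..m),
    volumes[k] is its mean production volume r_k.
    """
    traffic = defaultdict(float)
    for route, r_k in zip(routings, volumes):
        # Each consecutive pair is one move; the direction of the move is ignored.
        for a, b in zip(route, route[1:]):
            if a != b:
                traffic[(min(a, b), max(a, b))] += r_k
    return dict(traffic)


def flow_graph(traffic, s_c=(), s_n=()):
    """Edge weights W of G: pairs with positive traffic plus the S_C and S_N couples."""
    edges = {pair: w for pair, w in traffic.items() if w != 0}
    for i, q in list(s_c) + list(s_n):
        # A constrained couple without flow still becomes an edge, with weight 0.
        edges.setdefault((min(i, q), max(i, q)), 0.0)
    return edges


traffic = inter_machine_traffic([[1, 3, 2], [2, 4, 5, 4]], [10, 5])
print(traffic)  # {(1, 3): 10.0, (2, 3): 10.0, (2, 4): 5.0, (4, 5): 10.0}
```

In the example, the second part type visits M4 and M5 twice in succession (4 → 5 → 4), so its volume is counted twice on that edge.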

18.2.3 Decision Variables

8. Let $CS = \{w_1, w_2, \ldots, w_{|CS|}\}$ be a subset of cuts of $G$ such that $[m/N] \le |CS| < m$, where $[x]$ is the integer part of the number $x$, $|X|$ is the cardinality of the set $X$ and $N$ is the maximum cell size. In the next section, we explain in more detail how a set of cuts defines a partition.

18.2.4 Intermediate Processing

9. Let $C = \{C_1, C_2, \ldots, C_J\}$ be the set of connected components of the graph $G$ after removing all the edges of the cuts in $CS$, that is, the connected components of the graph $G(C) = \left(M, E - \bigcup_{i=1}^{|CS|} w_i\right)$. $C$ is a partition of $M$ into $J$ cells. That is, $C_j \neq \emptyset$ for all $j \in \{1, 2, \ldots, J\}$, $\bigcup_{j=1}^{J} C_j = M$, and $C_j \cap C_g = \emptyset$ for all $j, g \in \{1, 2, \ldots, J\}$, $j \neq g$.
10. From $CS$ we can define the subset of intercellular edges:

$$E(CS) = \bigcup_{i=1}^{|CS|} w_i,$$

11. and the total intercellular traffic:

$$T(CS) = \sum_{e_{iq} \in E(CS)} W(e_{iq}).$$
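Items 9 to 11 can be illustrated with a short Python sketch. It assumes the flow graph is stored as a dictionary of weighted edges (i, q) with i < q and that E(CS) is already available as a set of edges; the cells are found with a union-find structure, which is one standard way to compute connected components, not necessarily the author's. The example graph reproduces the edges e1 to e8 of Fig. 18.1 as implied by the singleton cuts listed in Sect. 18.3, and its weights are made up.

```python
def cells_and_traffic(m, weights, intercell_edges):
    """Cells C_1..C_J of G after removing E(CS), and the traffic T(CS).

    m               -- number of machines, labelled 1..m
    weights         -- {(i, q): W(e_iq)} for every edge of G (i < q)
    intercell_edges -- E(CS), the union of the cuts in CS
    """
    parent = list(range(m + 1))  # union-find forest (index 0 unused)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, q in weights:
        if (i, q) not in intercell_edges:  # intracellular edge: same cell
            parent[find(i)] = find(q)

    groups = {}
    for v in range(1, m + 1):
        groups.setdefault(find(v), []).append(v)
    total = sum(w for e, w in weights.items() if e in intercell_edges)
    return sorted(groups.values()), total


# Graph of Fig. 18.1 (edges e1..e8) with hypothetical weights.
W = {(1, 3): 4, (1, 4): 1, (1, 5): 2, (2, 3): 1,
     (2, 4): 3, (2, 5): 1, (3, 5): 2, (4, 5): 5}
E_CS = set(W) - {(1, 3), (4, 5)}  # every edge except e1 and e8 is intercellular
print(cells_and_traffic(5, W, E_CS))  # ([[1, 3], [2], [4, 5]], 10)
```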


18.2.5 Constraints

To be feasible, a partition $C$ associated with $CS$ must satisfy the following constraints:

12. Maximum number of machines allowed in each cell, $N$: $\forall C_j \in C$ ($j = 1, 2, \ldots, J$): $|C_j| \le N$.
13. Cohabitation constraints: $\forall (M_i, M_q) \in S_C$, $\forall w \in CS$: $e_{iq} \notin w$, where $(i, q) \in \{1, \ldots, m-1\} \times \{i+1, \ldots, m\}$.

14. Non-cohabitation constraints: $\forall (M_i, M_q) \in S_N$, $\exists w \in CS$: $e_{iq} \in w$, where $(i, q) \in \{1, \ldots, m-1\} \times \{i+1, \ldots, m\}$.
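Constraints 12 to 14 translate directly into a feasibility test. The Python sketch below (hypothetical function name) takes the cells and the set E(CS) produced from a cut subset; since E(CS) is the union of the cuts, "e_iq belongs to no w in CS" and "e_iq belongs to some w in CS" become plain membership tests on that set. How infeasible partitions are scored is left to the fitness function of Sect. 18.4.

```python
def is_feasible(cells, intercell_edges, n_max, s_c=(), s_n=()):
    """Check constraints 12-14 for the partition induced by a cut subset CS."""

    def edge(pair):
        i, q = pair
        return (min(i, q), max(i, q))

    # (12) every cell holds at most N machines
    if any(len(cell) > n_max for cell in cells):
        return False
    # (13) cohabitation: no S_C couple may be cut by any w in CS
    if any(edge(p) in intercell_edges for p in s_c):
        return False
    # (14) non-cohabitation: every S_N couple must be cut by some w in CS
    return all(edge(p) in intercell_edges for p in s_n)


cells = [[1, 3], [2], [4, 5]]
E_CS = {(1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 5)}
print(is_feasible(cells, E_CS, n_max=2, s_c=[(3, 1)], s_n=[(5, 2)]))  # True
print(is_feasible(cells, E_CS, n_max=1))                               # False
```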

18.2.6 Objective Function

15. Let $S$ be the set of cut subsets that satisfy the previous constraints. The problem is to find a solution $CS^* \in S$ such that:

$$Z(CS^*) = \min_{CS \in S} T(CS)$$

That is, we seek a cut subset that satisfies all the constraints and has the minimum amount of intercellular traffic.

18.3 Theoretic Preliminaries

Since a solution is a graph partition, it can be represented as a sum of cuts using the Boolean operator OR, denoted +. (A cut is the subset of edges associated with a subset of vertices A: it contains exactly those edges that have one and only one endpoint in A.) The solution of Fig. 18.1, for example, can be defined by the sum of two of the three depicted cuts $w_1$, $w_2$ and $w_3$. For instance, the sum of $w_1 = (0, 1, 1, 1, 0, 0, 1, 0)$ and $w_2 = (0, 1, 1, 0, 1, 1, 1, 0)$ yields $w_1 + w_2 = (0, 1, 1, 1, 1, 1, 1, 0)$, which is sufficient to determine the associated solution. Here, cuts are represented by binary vectors in which the ones indicate the edges of the cut. For example, $w_1 = (0, 1, 1, 1, 0, 0, 1, 0)$ consists of $e_2$, $e_3$, $e_4$ and $e_7$, because their corresponding values equal one. The resulting solution vector is an edge encoding representation [1]; however, the allele values used to identify intercellular and intracellular edges are inverted. According to this interpretation, the sum $(0, 1, 1, 1, 1, 1, 1, 0)$ makes $e_1$ and $e_8$ intracellular and the remaining edges intercellular, yielding the solution $C = \{\{M_1, M_3\}, \{M_2\}, \{M_4, M_5\}\}$ of Fig. 18.1.

Fig. 18.1 A graph partition with its constructor cuts

Since every partition is a sum of cuts, the search space can be covered using these "graph creatures". However, the edge-based cut codification is not suitable as a genetic encoding representation, because a random binary vector is not necessarily a cut; a repair function would therefore be required when generating the initial population and whenever a GA operator is applied. Fortunately, we can overcome this first hurdle by using cut properties from graph theory. The cuts of a connected graph form a vector space under the XOR operator, and this space is spanned by a basis of only m − 1 special cuts. There are several ways to obtain a cut basis. Among the simplest, the direct method consists of choosing any m − 1 vertices and taking the cuts associated with the singletons of the chosen vertices. For instance, for the graph of Fig. 18.1, if we choose the singletons $\{M_1\}$, $\{M_2\}$, $\{M_3\}$ and $\{M_4\}$, their associated cuts, i.e. $w(\{M_1\}) = (1, 1, 1, 0, 0, 0, 0, 0)$, $w(\{M_2\}) = (0, 0, 0, 1, 1, 1, 0, 0)$, $w(\{M_3\}) = (1, 0, 0, 1, 0, 0, 1, 0)$ and $w(\{M_4\}) = (0, 1, 0, 0, 1, 0, 0, 1)$, define a cut basis. Any cut has a unique representation as an XOR sum of basic cuts. For example, $w_1 = w(\{M_1\})$ XOR $w(\{M_3\})$ and $w_2 = w(\{M_1\})$ XOR $w(\{M_2\})$ XOR $w(\{M_3\})$. For further theoretical details on cut properties, see [14].
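The cut-space arithmetic above can be checked with a few lines of Python. The edge list is an assumption inferred from the singleton cut vectors given in the text (for instance, e1 is the only edge shared by w({M1}) and w({M3}), so it joins M1 and M3); the helper names are hypothetical.

```python
# Edges e1..e8 of the graph of Fig. 18.1 (endpoints inferred from the singleton cuts).
EDGES = [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5)]


def cut_of(vertex_set, edges=EDGES):
    """w(A): 1 for every edge with exactly one endpoint in A."""
    return tuple(int((i in vertex_set) != (q in vertex_set)) for i, q in edges)


def xor_sum(*cuts):
    """XOR (mod-2) sum of cut vectors: the vector-space addition of cuts."""
    return tuple(sum(bits) % 2 for bits in zip(*cuts))


def or_sum(*cuts):
    """Boolean OR of cut vectors: the '+' used to build a partition."""
    return tuple(int(any(bits)) for bits in zip(*cuts))


basis = {v: cut_of({v}) for v in (1, 2, 3, 4)}  # singleton cuts of M1..M4
w1 = xor_sum(basis[1], basis[3])
w2 = xor_sum(basis[1], basis[2], basis[3])
print(w1, w2)          # (0, 1, 1, 1, 0, 0, 1, 0) (0, 1, 1, 0, 1, 1, 1, 0)
print(or_sum(w1, w2))  # (0, 1, 1, 1, 1, 1, 1, 0)
```

Note that w1 is also the cut of the vertex set {M1, M3} and w2 the cut of {M1, M2, M3}. XOR-ing all five singleton cuts gives the zero vector, because every edge is counted twice, which is why only m − 1 of them are needed in the basis.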

18.4 The Cut-Based GA

Genetic algorithms are well-known optimization approaches that imitate natural evolutionary processes. In this section, the general principles of GAs are first presented [1], followed by a description of the GA applied to the MCF problem.


18.4.1 Principles of the Genetic Algorithm

Holland [6] is considered the founder of modern genetic algorithms. These algorithms are based on an analogy with natural selection. First, a chromosome structure is defined to represent the solutions of the problem. Then, an initial population of chromosomes is generated, either randomly or by using a given heuristic. Next, members of the population are selected based on an evaluation function called fitness, which associates a value with each member according to its objective function value. Genetic operators are then applied to the selected members to produce a new generation. This process is repeated until a stopping criterion is met. Implementing a genetic algorithm requires defining the following aspects:

1. the structure of the genetic code used to represent solutions;
2. the method for generating the initial population of solutions;
3. an adaptation function to evaluate the fitness of each solution;
4. the genetic operators used to produce a new generation; and
5. certain control parameter values (e.g. population size, number of iterations, genetic operator probabilities).

18.4.2 GA Implementation

In what follows, we present some implementation details of the cut-based GA.

18.4.2.1 Cut-Based Encoding

The graph theory model allows each solution to be encoded by a chain of K × (m − 1) binary alleles, where K = [m/N] ([x] denotes the integer part of x). In other words, this chain is composed of K parts of length m − 1. Each part defines a cut by specifying which basic cuts are combined with the XOR sum to construct it. Therefore, each of the K parts yields a cut. Then, by combining these cuts with an OR sum, we get the associated partition solution. This choice of K reflects the fact that a good solution probably has a moderate number of cells, which cannot be less than m/N; thus, K cuts are generally sufficient to construct a good solution. For example, assuming K = 3, the three-cell solution of Fig. 18.1 would be coded by the following chain:

          part 1     part 2     part 3
          1 0 1 0    1 1 1 0    0 0 0 0

where, within each part, the four alleles correspond to w({M1}), w({M2}), w({M3}) and w({M4}), in this order.
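Decoding such a chain can be sketched in Python as follows, assuming the basis of singleton cuts w({M1}), ..., w({M4}) in that allele order and the Fig. 18.1 edge list inferred from those cuts; `decode` is a hypothetical helper, not the author's code. It XORs the basic cuts selected in each part, ORs the resulting cuts, and reads the cells off as the connected components formed by the remaining intracellular edges.

```python
EDGES = [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5)]
BASIS_VERTICES = [1, 2, 3, 4]  # the m - 1 singletons, in allele order


def decode(chain, m, edges=EDGES, basis_vertices=BASIS_VERTICES):
    """Turn a K x (m - 1) binary chain into (intercellular edge vector, cells)."""
    part_len = m - 1
    intercell = [0] * len(edges)
    for start in range(0, len(chain), part_len):
        cut = [0] * len(edges)
        for bit, v in zip(chain[start:start + part_len], basis_vertices):
            if bit:  # XOR in the basic cut w({v})
                cut = [c ^ int(v in e) for c, e in zip(cut, edges)]
        intercell = [a | b for a, b in zip(intercell, cut)]  # OR sum of cuts
    # Cells: merge the endpoints of every remaining (intracellular) edge.
    cells = [{v} for v in range(1, m + 1)]
    for flag, (i, q) in zip(intercell, edges):
        if not flag:
            ci = next(c for c in cells if i in c)
            cq = next(c for c in cells if q in c)
            if ci is not cq:
                ci |= cq
                cells.remove(cq)
    return intercell, sorted(sorted(c) for c in cells)


chain = [1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0]  # parts 1, 2 and 3 of the example
print(decode(chain, m=5))
# ([0, 1, 1, 1, 1, 1, 1, 0], [[1, 3], [2], [4, 5]])
```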

The interpretation of this chromosome is straightforward: part 1 uses w({M1}) and w({M3}) to define the first cut w1. Part 2 uses w({M1}), w({M2}) and w({M3}), yielding w2. In the third part, all the alleles are equal to zero, and thus no cut is generated. The two cuts w1 and w2 yield the associated partition when combined with an OR sum.

The most important advantage of binary coding is that GAs tend to perform better with small alphabets: with a binary alphabet, it is easier for the GA to detect the good building blocks of the individuals' codes. However, the cut-based GA suffers from a high level of redundancy. Indeed, a solution is not affected by swapping its parts, and, furthermore, two parts can be equal. To overcome this second hurdle, we sort the parts of every solution and remove repetitions. This sorting can be applied directly to the binary string using a lexicographical order, or it can use, for each part, the decimal number obtained by reading the part's sub-string as a binary number (see [20]). For example, the previous chain is coded in decimal as follows:

          part 1     part 2     part 3
            10         14          0

Thus, the part 2 sub-string must be put first, followed by part 1. The sorting ignores part 3 because it is not associated with any cut (all its alleles are zero). Hence, after this sorting procedure, no two parts are equal, except for a possible tail of all-zero parts.
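The decimal variant of the sorting procedure can be sketched in Python as below. Reading each part with its first allele as the most significant bit reproduces the values 10, 14 and 0 given above, and sorting in decreasing order follows the example (part 2 before part 1). Replacing a repeated part with an all-zero part is our reading of "without repetition": since OR-ing the same cut twice changes nothing, the decoded partition is unaffected. The function names are hypothetical.

```python
def part_values(chain, part_len):
    """Decimal value of each part, first allele = most significant bit."""
    return [int("".join(map(str, chain[s:s + part_len])), 2)
            for s in range(0, len(chain), part_len)]


def sort_parts(chain, part_len):
    """Canonical chain: distinct non-zero parts in decreasing order, then a zero tail."""
    k = len(chain) // part_len
    values = sorted({v for v in part_values(chain, part_len) if v}, reverse=True)
    values += [0] * (k - len(values))
    return [int(b) for v in values for b in format(v, f"0{part_len}b")]


chain = [1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(part_values(chain, 4))  # [10, 14, 0]
print(sort_parts(chain, 4))   # [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
```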

18.4.2.2 Initial Population

The initial population is randomly generated without repetition. To get a solution, if m is moderate, we can generate K integers from the interval $[0, 2^{m-1} - 1]$ (see [20]). Each integer, when binary coded, gives one cut part of the solution chain. After the sorting procedure is applied, the solution is accepted if no equivalent member is already in the population.
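A Python sketch of this generation step follows. For compactness, an individual is stored as the tuple of its K part values, i.e. the integers drawn from [0, 2^(m−1) − 1], which is equivalent to the binary chain; the canonical form (distinct non-zero values in decreasing order, zero-padded) mirrors the sorting procedure of Sect. 18.4.2.1, and the helper names are hypothetical. The guard counts how many distinct canonical individuals exist, so the loop cannot run forever on a tiny instance.

```python
import math
import random


def canonical(values):
    """Distinct non-zero part values in decreasing order, padded with zeros."""
    distinct = sorted({v for v in values if v}, reverse=True)
    return tuple(distinct + [0] * (len(values) - len(distinct)))


def initial_population(size, m, k, rng=random):
    """Random population of `size` non-equivalent individuals of K part values."""
    max_value = 2 ** (m - 1) - 1
    # Number of distinct canonical individuals: subsets of at most K non-zero values.
    available = sum(math.comb(max_value, j) for j in range(k + 1))
    if size > available:
        raise ValueError("population size exceeds the number of distinct individuals")
    population, seen = [], set()
    while len(population) < size:
        individual = canonical([rng.randint(0, max_value) for _ in range(k)])
        if individual not in seen:  # reject members equivalent to existing ones
            seen.add(individual)
            population.append(individual)
    return population


pop = initial_population(10, m=5, k=3, rng=random.Random(42))
```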

18.4.2.3 Fitness and Selection

To allow the GA to take advantage of the useful information that infeasible solutions may hold, the fitness is calculated using a transformation function proposed in [1]. This method enables the GA to distinguish between feasible and infeasible solutions, as well as between good and less good feasible solutions. Using this fitness, the "roulette wheel" random procedure [3] selects the individuals eligible for crossover.
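Roulette-wheel selection itself can be sketched in Python as follows. The fitness transformation of [1] is not reproduced here, so the code simply assumes non-negative fitness values where larger is better; each individual is picked with probability proportional to its fitness. The function name is hypothetical; Python's `random.choices(population, weights=fitness, k=n)` offers the same behavior.

```python
import bisect
import itertools
import random


def roulette_wheel(population, fitness, n, rng=random):
    """Pick n individuals, each with probability fitness[i] / sum(fitness)."""
    cumulative = list(itertools.accumulate(fitness))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("roulette wheel needs a positive total fitness")
    picks = []
    for _ in range(n):
        r = rng.random() * total  # a point on the wheel, in [0, total)
        # The first slot whose cumulative fitness exceeds r owns that point.
        picks.append(population[bisect.bisect_right(cumulative, r)])
    return picks


picks = roulette_wheel(["a", "b", "c"], [0.0, 1.0, 3.0], 1000, random.Random(1))
print(picks.count("a"))  # 0: a zero-fitness individual is never selected
```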

18.4.2.4 Crossover and Mutation

For simplicity, we have opted for one-point crossovers. The first is the classical one and places the cutting point at any random position of the chain. The second crossover places the cutting point only between consecutive cut parts (see Fig. 18.2). The proportion of individuals that undergo crossover is defined by the parameter Pc; the rest give up their places to other, randomly generated members. The mutation operator randomly chooses a proportion Pm of the members; in each chosen member, one cut part is replaced by another randomly generated one. The parameter values were chosen through empirical experimentation so that the GA performs at its best within a small amount of time (less than one minute).

Fig. 18.2 Crossovers
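The two crossovers and the mutation can be sketched on binary chains (Python lists of 0/1) as follows. The part-boundary crossover assumes K ≥ 2, and the mutation draws the replacement part uniformly among all (m − 1)-bit strings, which is an assumption since the text does not detail how the random part is generated. Function names are hypothetical.

```python
import random


def classical_crossover(p1, p2, rng=random):
    """One-point crossover with the cutting point anywhere inside the chain."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def part_crossover(p1, p2, part_len, rng=random):
    """One-point crossover with the cutting point only between two cut parts."""
    k = len(p1) // part_len  # number of parts K (assumed >= 2)
    point = part_len * rng.randint(1, k - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(chain, part_len, rng=random):
    """Replace one randomly chosen cut part with a randomly generated one."""
    j = rng.randrange(len(chain) // part_len)
    new_part = [rng.randint(0, 1) for _ in range(part_len)]
    return chain[:j * part_len] + new_part + chain[(j + 1) * part_len:]


c1, c2 = part_crossover([1] * 12, [0] * 12, part_len=4, rng=random.Random(7))
print(c1.index(0) % 4)  # 0: the swap always starts at a part boundary
```

The part-boundary crossover exchanges whole cuts between parents, whereas the classical one may split a part and thus create a new cut.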

18.4.2.5 Stopping Criterion

After the selection-crossover-mutation process, the sorting procedure is applied to each individual of the population. The best individual, saved before this three-step process, is then reinserted into the population (elitism). This process is repeated until a given number of iterations $i_{max}$ is reached.

18.4.2.6 Cut-Based GA Algorithm

The pseudocode of the GA we implemented is as follows (the bracketed sorting steps are performed only by the variant with the sorting procedure, see Sect. 18.5):

1. Get a random population (with feasible or infeasible individuals, without repetition); [apply the sorting procedure to each individual of the population;]
2. Evaluate the population using the fine-tuning procedure (see [1]);
3. Repeat
 • Save the best-fitted individual;
 • Get the mating population: select a proportion Pc of individuals from the population using the roulette wheel selection procedure;
 • Apply crossover (randomly pick one of the two crossovers) to the mating population and replace the parents by their offspring; randomly generate the remaining (1 − Pc) of the population;
 • Apply mutation to the resulting population with ratio Pm;
 • Reinsert the best-fitted individual;
 • [Apply the sorting procedure to each individual of the population;]
 • Re-evaluate the population;
 Until $i_{max}$ iterations.
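The loop above can be sketched in Python as a generic skeleton in which the problem-specific pieces (the fitness transformation of [1], random individual generation, the two crossovers, the mutation and the optional sorting) are passed in as functions. This is an illustrative reading of the pseudocode, not the author's C++ implementation: reinserting the saved best individual by overwriting the first offspring is an assumption, every crossover is assumed to be callable as f(parent1, parent2, rng), and fitness values must be positive because roulette-wheel selection is delegated to `random.choices`. The usage example optimizes a toy fitness on 4-bit chains, not the MCF objective.

```python
import random


def cut_based_ga(fitness, random_individual, crossovers, mutate, sort=None,
                 pop_size=100, pc=0.7, pm=0.03, i_max=100, rng=random):
    """Skeleton of the loop: sort=None behaves like CGA, a sorting function like SCGA."""
    normalize = sort if sort is not None else (lambda ind: ind)
    population = []  # 1. random population without repetition
    while len(population) < pop_size:
        ind = normalize(random_individual(rng))
        if ind not in population:
            population.append(ind)
    for _ in range(i_max):
        scores = [fitness(ind) for ind in population]  # 2. / re-evaluate
        best = population[scores.index(max(scores))]   # save the best individual
        n_mating = int(pc * pop_size) // 2 * 2         # Pc proportion, even count
        mating = rng.choices(population, weights=scores, k=n_mating)  # roulette wheel
        offspring = []
        for p1, p2 in zip(mating[0::2], mating[1::2]):
            offspring.extend(rng.choice(crossovers)(p1, p2, rng))  # one crossover at random
        # The remaining (1 - Pc) of the population is generated at random.
        offspring += [random_individual(rng) for _ in range(pop_size - len(offspring))]
        for idx in rng.sample(range(pop_size), int(pm * pop_size)):
            offspring[idx] = mutate(offspring[idx], rng)
        offspring[0] = best                             # elitism (overwrites offspring 0)
        population = [normalize(ind) for ind in offspring]
    scores = [fitness(ind) for ind in population]
    return population[scores.index(max(scores))]


def one_point(p1, p2, rng):
    """Toy one-point crossover used only for this example."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


best = cut_based_ga(
    fitness=lambda ind: 1 + sum(ind),  # toy fitness, NOT the transformation of [1]
    random_individual=lambda rng: [rng.randint(0, 1) for _ in range(4)],
    crossovers=[one_point],
    mutate=lambda ind, rng: [1 - b for b in ind],
    pop_size=8, i_max=200, rng=random.Random(0))
print(best)
```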

18.5 Computational Results

Two variants of the cut-based genetic algorithm were implemented: the first does not implement the sorting procedure, whereas the second does. The two cut-based GAs were also compared with the edge-based GA [1] and with our implementation of the well-known Kmeans clustering [8]. Our implementation of Kmeans initializes the centroids with random rows of the machine-to-machine product flow matrix. In addition, since Kmeans requires a known number of clusters (cells), we call Kmeans for every integer value in the interval [m/N, m − 1]; hence, we call it multiKmeans. The corresponding pseudocode is as follows:

• Loop for k = [m/N] to m − 1 do
 – Call kMeans (input: number of cells (clusters) k, machine-to-machine product flow adjacency matrix; output: machine partition solution);
 – Update the best solution if it is outperformed;
• End loop.

The four applications were run on a Core i3 microcomputer with a clock speed of 2.1 GHz and 3.8 GB of RAM, managed by a 32-bit Linux operating system. We implemented them in C++. In the following paragraphs, the four methods are referred to as CGA for the cut-based genetic algorithm without the sorting procedure, SCGA for the cut-based genetic algorithm with the sorting procedure, EGA for the edge-based genetic algorithm, and multiKmeans. Four examples are taken from the literature [1], and a fifth example was randomly generated [20]. They are sorted according to their size, taken to be the product p × m (number of products × number of machines). The five examples have sizes of 20 × 8, 20 × 20, 40 × 20, 51 × 20 and 100 × 50, respectively. The maximum number of machines per cell is set to 5 for all examples except the third, where it is set to 4, and the fifth. For this largest example, we used two values for the maximum cell size: N = 7 (example 5a) and N = 15 (example 5b).
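The multiKmeans baseline can be sketched in Python as below, assuming a symmetric machine-to-machine flow matrix with zero-based machine indices, plain Lloyd iterations [8] with a fixed iteration count, and intercellular traffic as the score used to keep the best run; these choices and all names are assumptions. As in the text, no constraint handling is performed, so the returned partition may violate the maximum cell size.

```python
import random


def kmeans(rows, k, rng, n_iter=50):
    """Lloyd's k-means on the rows of the flow matrix; centroids start at k random rows."""
    centroids = [list(rows[i]) for i in rng.sample(range(len(rows)), k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((x - y) ** 2
                                                  for x, y in zip(row, centroids[c])))
                  for row in rows]
        # Update step: mean of the members (an empty cluster keeps its centroid).
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def intercell_traffic(flow, labels):
    """Sum of the flows between machines that end up in different clusters."""
    m = len(flow)
    return sum(flow[i][q] for i in range(m) for q in range(i + 1, m)
               if labels[i] != labels[q])


def multi_kmeans(flow, n_max, rng=random):
    """Run k-means for k = [m/N], ..., m - 1 and keep the least-traffic labelling."""
    m = len(flow)
    best = None
    for k in range(max(1, m // n_max), m):
        labels = kmeans(flow, k, rng)
        traffic = intercell_traffic(flow, labels)
        if best is None or traffic < best[0]:
            best = (traffic, labels)
    return best


flow = [[0, 9, 1, 0], [9, 0, 0, 0], [1, 0, 0, 8], [0, 0, 8, 0]]
traffic, labels = multi_kmeans(flow, n_max=2, rng=random.Random(0))
```

Because rows of strongly connected machines are not necessarily close in Euclidean distance, this baseline can miss good groupings, which is consistent with the weaker results reported for multiKmeans.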

18 A New Cut-Based Genetic Algorithm for Graph Partitioning …

279

Table 18.1 Computational results (UF: unfeasible solution; Avrg.: average traffic; Best: best traffic; Cpu (s): running time of the best solution)

Example 1
           |         EGA         |         CGA         |        SCGA
Pop. Gen.  |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s)
100  100   |   16.0    13   0.07 |   26.0    20   0.13 |   27.0    22   0.15
100  200   |   19.8    17   0.13 |   27.6    22   0.25 |   24.5    21   0.29
100  300   |   25.2    19   0.19 |   27.4    24   0.40 |   25.0    20   0.44
300  100   |   18.6    15   0.22 |   28.6    21   0.68 |   24.1    17   0.77
300  200   |   20.0    13   0.42 |   27.2    20   1.59 |   26.0    20   1.83
300  300   |   19.0    13   0.65 |   26.0    21   2.58 |   27.7    22   2.56
500  100   |   17.3    13   0.40 |   27.4    19   1.67 |   25.9    22   2.57
500  200   |   16.7    13   0.82 |   25.4    19   4.34 |   26.3    19   3.99
500  300   |  18.45    13   1.21 |   26.0    19   6.10 |   26.6    19   6.56

Example 2
           |         EGA         |         CGA         |        SCGA
Pop. Gen.  |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s)
100  100   |   30.1    27   0.06 |   45.9    43   0.13 |   38.9    38   0.11
100  200   |   30.4    25   0.10 |   44.0    41   0.29 |   38.4    33   0.24
100  300   |   27.5    22   0.15 |   41.8    38   0.49 |   37.0    30   0.34
300  100   |   27.7    24   0.17 |   40.9    38   0.53 |   36.2    32   0.52
300  200   |   28.4    25   0.34 |   41.6    39   1.35 |   35.5    26   1.08
300  300   |   26.0    22   0.54 |   40.8    36   1.62 |   34.8    31   1.55
500  100   |   26.9    22   0.30 |   39.5    37   1.07 |   34.8    31   1.04
500  200   |   26.0    20   0.60 |   39.7    37   2.33 |   31.6    25   2.18
500  300   |   24.1    21   0.90 |   38.7    34   3.40 |   33.2    25   3.19

Example 3
           |         EGA         |         CGA         |        SCGA
Pop. Gen.  |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s)
100  100   |      –    UF      – |   85.0    80   0.25 |   83.7    75   0.24
100  200   |      –    UF      – |   85.0    81   0.48 |   80.5    70   0.47
100  300   |   78.1    66   0.16 |   84.9    81   0.73 |   80.1    72   0.72
300  100   |   60.0    51   0.19 |   82.7    79   0.78 |   74.8    68   0.82
300  200   |   71.8    56   0.35 |   82.3    80   1.55 |   73.6    64   1.61
300  300   |   67.2    47   0.52 |   82.3    80   2.36 |   73.4    61   2.68
500  100   |   62.1    56   0.40 |   80.5    62   1.51 |   75.6    68   1.60
500  200   |   64.9    50   0.62 |   80.3    72   2.98 |   73.1    57   3.37
500  300   |   62.0    49   0.93 |   76.7    65   4.54 |   70.2    61   4.98

Example 4
           |         EGA         |         CGA         |        SCGA
Pop. Gen.  |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s)
100  100   |  211.2   184   0.09 |  192.3   166   0.44 |  183.7   167   0.38
100  200   |  185.7   165   0.18 |  187.7   153   0.93 |  183.5   139   0.85
100  300   |  199.5   168   0.26 |  190.6   167   1.32 |  180.8   145   1.25
300  100   |  202.7   162   0.29 |  173.0   146   2.17 |  169.3   139   1.81
300  200   |  199.2   178   0.57 |  163.7   144   4.04 |  165.1   132   3.18
300  300   |  183.5   102   0.86 |  164.1   145    7.3 |  165.3   143   5.00
500  100   |  183.8   118   0.53 |  170.5   152   5.36 |  161.7   140   3.78
500  200   |  196.4   146   1.06 |  161.2   140  10.53 |  155.5   136   9.04
500  300   |  192.7   155   1.74 |  160.0   147  16.11 |  150.3   110  11.52

Example 5a
           |         EGA         |         CGA         |        SCGA
Pop. Gen.  |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s)
100  100   |      –    UF      – |  515.1   511   3.00 |  487.8   453   3.42
100  200   |      –    UF      – |  516.1   513   6.02 |  486.9   435   6.89
100  300   |      –    UF      – |  513.7   503   8.84 |  477.7   429   9.93
300  100   |      –    UF      – |  503.0   436   9.94 |  477.5   447  10.34
300  200   |  529.6   465   1.49 |  502.2   453  18.48 |  463.4   435  20.37
300  300   |      –    UF      – |  484.7   422  25.19 |  465.1   434  31.61
500  100   |  530.4   482   1.36 |  492.2   447  15.50 |  466.6   432  18.12
500  200   |  528.9   492   2.95 |  483.0   433  31.94 |  459.4   435  36.26
500  300   |  520.3   455   3.19 |  464.9   422  45.02 |  443.1   418  54.56

Example 5b
           |         EGA         |         CGA         |        SCGA
Pop. Gen.  |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s) |  Avrg.  Best Cpu(s)
100  100   |  506.7   398   0.18 |  461.5   449   0.71 |  390.0   350   0.72
100  200   |      –    UF      – |  457.3   446   1.41 |  378.5   342   1.48
100  300   |      –    UF      – |  449.0   392   2.27 |  369.9   339   2.21
300  100   |      –    UF      – |  448.5   439   2.24 |  368.9   350   2.41
300  200   |  525.8   390   1.77 |  434.7   363   4.51 |  367.9   341   5.09
300  300   |  522.2   415   2.40 |  432.0   390   7.91 |  363.4   326   7.79
500  100   |  526.1   460   1.03 |  435.4   336   4.14 |  364.6   338   4.40
500  200   |  523.8   349   2.15 |  428.6   369   8.18 |  366.0   341   9.61
500  300   |  486.2   394   3.78 |  405.0   345  16.16 |  358.5   331  14.83

To keep resource usage moderate, we ran the evolutionary methods with the following parameter values:

• Population size: 100, 200, 300, 400 and 500;
• Number of generations: 100, 200 and 300.

Furthermore, values in the interval [0.6, 0.8] for the crossover rate and in [0.01, 0.05] for the mutation rate gave comparable performances. We therefore set them to 0.7 and 0.03, respectively. We ran each of the three methods twenty times and report the average traffic, as well as the best solution with its computational running time. The results obtained by the three evolutionary methods are reported in Table 18.1 (the best performances for each method are boldfaced and the best scores for each example are underlined). In comparison, multiKmeans gave 32, 50, 194, UF, UF and 416 in less than 0.1 s for the six instances (examples 1, 2, 3, 4, 5a and 5b), respectively.

Table 18.1 shows that, in the first example, EGA outperformed the other methods in two respects: it reached a better value of the objective function in less running time. In the second example, EGA was still able to reach a better traffic value. However, the average traffic of EGA reveals a great difficulty in reaching these best values. Indeed, CGA and SCGA were clearly more responsive to increasing the population size, reaching far better solutions on average, as depicted in Fig. 18.3 for the fourth example. Figures 18.4 and 18.5 further support this analysis by presenting the results as bar charts. They show that the three evolutionary methods are far better than the Kmeans-based approach, which implements neither a constraint-handling routine nor a mechanism to avoid being trapped in local optima. Indeed, in these figures, the multiKmeans values for examples 4 and 5a are unfeasible. We can also observe that, compared with its counterparts, EGA struggles to reach feasible solutions with small population sizes and a limited number of generations. However, SCGA, which gave the best average performances, requires more time, especially as the expected number of cells (clusters) in good solutions grows. Indeed, when the maximum cell size decreases, the cut-based methods need more graph cuts to construct feasible solutions, and, with the overhead of the sorting procedure, SCGA needs more time to complete its optimisation process.

Fig. 18.3 Best traffic

Fig. 18.4 Best traffic

Fig. 18.5 Best average traffic

18.6 Conclusions

This paper proposes a new graph-cut-based encoding representation for solving the cell formation problem with a genetic algorithm. The obtained performance shows that it requires less effort to reach promising areas, especially when the expected number of cells in good solutions is moderate. We suggest continuing this work in the following directions. First, we are interested in adopting other ways of constructing the cut basis; investigating their influence on the performance of the cut-based GA would then be a good line of research. Second, the compared evolutionary methods are from the same family, as they belong to the edge-based approach; this suggests that a co-evolutionary solving approach could be very promising. Finally, since the branch-and-bound enhancement [1] is closer to the cut-based GA than to the edge-based one, hybridizing the two methods appears to be another promising research path.

References
1. M. Boulif, K. Atif, A new branch-&-bound-enhanced GA for the manufacturing cell formation problem. Comput. Oper. Res. 33, 2219–2245 (2006)
2. S.P. Darla, C.D. Naiju, P.V. Sagar, B.V. Likhit, Optimization of inter cellular movement of parts in cellular manufacturing system using genetic algorithm. Res. J. Appl. Sci. Eng. Technol. 7(1), 165–168 (2014)
3. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison Wesley, Reading, MA, 1989)
4. T. Gupta, H. Seifoddini, Production data based similarity coefficient for machine-component grouping decision in the design of a cellular manufacturing system. 28, 1247–1269 (1990)
5. Y.P. Gupta, M.C. Gupta, A. Kumar, C. Sundram, Minimizing total intercell and intracell moves in cellular manufacturing: a genetic algorithm approach. Int. J. Integ. Manuf. 8(2), 92–101 (1995)
6. J.H. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, 1975)
7. S. Kaparthi, N.C. Suresh, Machine-component cell formation in group technology: a neural network approach. Int. J. Prod. Res. 30(6), 1353–1367 (1992)
8. S.P. Lloyd, Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
9. R. Logendran, P. Ramakrishna, Manufacturing cell formation in the presence of Lot splitting and multiple units of same machine. Int. J. Prod. Res. 33, 675–693 (1995)
10. S.S. Mahapatra, R.S. Pandian, Genetic cell formation using ratio level data in cellular manufacturing systems. Int. J. Adv. Manuf. Technol. 38, 630–640 (2008)
11. J. McAuley, Machine grouping for efficient production. Prod. Eng. 52(2), 53–57 (1972)
12. S. Merchichi, M. Boulif, Constraint-driven exact algorithm for the manufacturing cell formation problem. Eur. J. Ind. Eng. 9(6), 717–743 (2015)
13. W. Nunkaew, B. Phruksaphanrat, Effective fuzzy multi-objective model based on perfect grouping for manufacturing cell formation with setup cost constrained of machine duplication. Songklanakarin J. Sci. Technol. 35(6), 715–726 (2013)
14. R. Diestel, Graph theory, in Springer’s Graduate Texts in Mathematics Series, 4th edn. (Springer, Berlin, 2010), pp. 23–27
15. H. Seifoddini, Single linkage versus average linkage clustering in machine cells formation applications. Comput. Ind. Eng. 16, 419–426 (1989)
16. A. Souilah, Les systèmes cellulaires de production : agencement intracellulaire, Ph.D. Dissertation, University of Metz (1994)
17. V. Venugopal, T.T. Narendran, Cell formation in manufacturing systems through simulated annealing: an experimental evaluation. Eur. J. Oper. Res. 63, 409–422 (1992)
18. Y. Yin, K. Yasuda, Similarity coefficient methods applied to the cell formation problem: a taxonomy and review. Int. J. Prod. Econ. 101(2), 329–352 (2006)

284

M. Boulif

19. M. Boulif, Genetic algorithm encoding representations for graph partitioning problems, in Machine and Web Intelligence (ICMWI), International Conference on (IEEE, Algiers, 2010), pp. 288–291 20. M. Boulif, manuscript2018MetaFinalSupMaterial_HOL.pdf. figshare. Dataset (2019). https:// doi.org/10.6084/m9.figshare.11416998.v1

Chapter 19

Memetic Algorithm and Evolutionary Operators for Multi-Objective Matrix Tri-Factorization Problem

Rok Hribar, Gašper Petelin, Jurij Šilc, Gregor Papa, and Vida Vukašinović

Abstract In a memetic algorithm, a population-based global search technique is used to broadly locate good areas of the search space, while repeated use of a local search heuristic is employed to locate the optimum. Intuitively, evolutionary operators that generate individuals with genetic material inherited from the parents and with improved fitness should be the right choice for improving the performance of the algorithm in terms of time and solution quality. Evolutionary operators with such properties were devised and used in a memetic algorithm for solving the multi-objective matrix tri-factorization problem. By comparing a deterministic naive approach with two variants of the memetic algorithm with different levels of inheritance, it was shown that the evolutionary operators do not improve performance in this case. Further analysis showed that even though the proposed evolutionary operators let offspring inherit high fitness from their parents, local search does not perform well on such offspring, which results in poor performance.

R. Hribar (B) · J. Šilc · G. Papa · V. Vukašinović
Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia

G. Petelin
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906,
https://doi.org/10.1007/978-3-030-58930-1_19


R. Hribar et al.

19.1 Introduction

The amount of data generated in different areas of our lives has increased drastically with the expansion of information technology. As a consequence, interest in extracting meaningful information from the collected data has grown significantly, and knowledge discovery has become a widely studied research area. Within this area, we try to understand data by forming groups of instances, i.e., clusters, where instances in the same cluster are in some sense more similar to each other than to instances in other clusters. The problem studied in this paper is the non-negative matrix factorization (NMF) problem, which generalizes kernel k-means clustering, bipartite graph k-means clustering and the spectral clustering problem [1]. The original NMF factorizes an input non-negative matrix R into two non-negative matrices so that $R \approx GQ^T$, where $R \in \mathbb{R}_+^{n \times m}$, $G \in \mathbb{R}_+^{n \times k}$, and $Q \in \mathbb{R}_+^{m \times k}$. The main objective of NMF is clustering the columns of R. While NMF can capture two types of relations, non-negative matrix tri-factorization (NMTF), $R \approx GSQ^T$, can capture more types of information [2]. Both approaches are used for revealing hidden patterns in large real-world datasets and provide a good framework for simultaneously clustering the rows and columns of R. The NMTF problem is encountered in image processing, text mining, hyperspectral unmixing and bioinformatics [3, 7, 8]. Additionally, Del Buono and Pio showed that NMTF has several advantages over the original NMF approach [4]. This paper is concerned with the behavior of evolutionary operators used in solving the NMTF problem in the presence of additional local search strategies. In an evolutionary algorithm (EA), mechanisms inspired by biological evolution, such as selection, crossover and mutation, influence the evolution of the population and implicitly drive the performance of the evolutionary search.
Mitchell and Holland analyzed the features of the genetic algorithm that promise a speedup and suggested that crossover in an idealized genetic algorithm should create instances with higher fitness [5]. Doerr et al. provided the first theoretical proof of the usefulness of crossover for a non-artificial problem [6]. In this work, we develop mutation and crossover operators which, applied to matrices, are able to provide solutions with lower objective values, and we compare memetic algorithms (with and without the suggested crossover) with a deterministic naive approach on the non-negative matrix tri-factorization (NMTF) problem.

19.2 Multi-Objective Non-Negative Matrix Tri-Factorization Problem

The aim of NMTF is to extract insight into the intra-relations of a data set. If the intra-relations are expressed by a non-negative symmetric matrix R, then an NMTF of the form $R = GSG^T$, where G and S are non-negative matrices of dimensions much smaller than the dimension of R, provides insight into how the data is clustered and what the relations among those clusters are. Further, in the co-clustering version of the NMTF problem, a set of matrices $R_i$ needs to be factorized using the same G, as shown in Eq. (19.1).

$$R_i \approx G S_i G^T \tag{19.1}$$

Here, the columns of G can be interpreted as clusters, while the components of $S_i$ can be interpreted as interactions among these clusters. This problem can be stated as an optimization problem by minimizing the relative square error

$$\mathrm{RSE} = \frac{\sum_i \left\| R_i - G S_i G^T \right\|_F^2}{\sum_i \left\| R_i \right\|_F^2}, \tag{19.2}$$

where $\|\cdot\|_F$ is the Frobenius norm. Note that an optimal solution has RSE = 0, while the trivial solution (G = 0 and $S_i = 0$) has RSE = 1. Since the matrix G is common to all tri-factorizations of the $R_i$, all matrices $S_i$ have the same dimension k × k. This dimension can be interpreted as the number of clusters, and it is not known in advance. To keep the data relationships as interpretable as possible, the second objective is to minimize

$$k = \dim S_i. \tag{19.3}$$

In this respect, RSE minimization ensures the accuracy of the tri-factorization, and k minimization ensures that the size of the representation is as small as possible. Note that the objectives in Eqs. (19.2) and (19.3) are conflicting, because the capacity of the $GSG^T$ model grows with k.
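Equation (19.2) can be evaluated directly. The following NumPy sketch is our illustration rather than the authors' code (the function name `rse` is hypothetical); it takes a list of matrices $R_i$, the shared G and a list of $S_i$, and can be used to check the two reference values stated above: an exact factorization gives RSE = 0 and the trivial solution gives RSE = 1.

```python
import numpy as np


def rse(Rs, G, Ss):
    """Relative square error of Eq. (19.2) for the co-clustering model R_i ≈ G S_i G^T."""
    # Numerator: squared Frobenius residual summed over all R_i.
    num = sum(np.linalg.norm(R - G @ S @ G.T, "fro") ** 2 for R, S in zip(Rs, Ss))
    # Denominator: total squared Frobenius norm of the data, independent of G and S_i.
    den = sum(np.linalg.norm(R, "fro") ** 2 for R in Rs)
    return num / den
```

Because the denominator does not depend on G or $S_i$, minimizing RSE is equivalent to minimizing the summed squared residual; the normalization only makes values comparable across data sets.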

19.3 Classical Evolutionary Operators

The motivation for developing novel evolutionary operators tailored specifically to the NMTF problem (presented in Sect. 19.5.1) is the inability of classical evolutionary operators to pass good traits from the parents to the offspring. Several classical crossover operators, listed in Table 19.1, were tested. The behavior of these operators when applied to the NMTF problem is depicted in Fig. 19.1. In none of the cases does the offspring inherit the low RSE values of its parents. Classical crossover operators do diversify the population, which is also their aim, but to such a degree that the offspring appear random compared to their parents. In this regard, such an evolutionary search resembles random search.

Furthermore, classical evolutionary operators are not suitable for the multi-objective NMTF problem, because they require parents of the same dimension. In the multi-objective NMTF problem, different individuals have different k, so the dimension of individuals in the population varies. To fully benefit from solving this problem in the multi-objective paradigm, a crossover is needed that can take parents with different values of k. In this way, information about promising clusters can flow among solutions with different numbers of clusters.

Table 19.1 Classical crossover operators applied to NMTF problems
Acronym   Full name
SBX       Simulated binary crossover
SPX       Simplex crossover
DE        Differential evolution
RC        Random column

Fig. 19.1 Distributions of RSE of individuals produced by crossover from parents with RSE from the shaded range (plot: RSE on the y axis, with the RSE of the trivial solution marked; crossover operator on the x axis: no crossover, SBX, SPX, DE, RC)

19.4 Naive Approach

A simple approach to solving the multi-objective NMTF problem is to repeatedly perform gradient descent for different values of k until a satisfactory approximation of the Pareto front is found. If k is fixed, RSE can be minimized via gradient descent, since RSE from Eq. (19.2) is a differentiable function of G and $S_i$. Libraries for automatic differentiation such as TensorFlow, Theano or CNTK can be used to calculate the gradient of RSE with respect to G and $S_i$, which are then updated in the direction opposite to the gradient. It must be stressed that the second objective, k, is not differentiable; hence gradient descent can only be used to minimize one objective.
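For illustration, the gradients that an automatic differentiation library would return can also be derived by hand. The derivation below is our own sketch rather than part of the chapter; it writes $E_i = R_i - G S_i G^T$ for the residual of one summand and uses the fact that the denominator of Eq. (19.2) does not depend on G or $S_i$:

$$
\begin{aligned}
f_i &= \|E_i\|_F^2 = \operatorname{tr}\!\left(E_i^T E_i\right), \qquad
\mathrm{d}f_i = 2\operatorname{tr}\!\left(E_i^T\,\mathrm{d}E_i\right),\\
\mathrm{d}E_i &= -\left(\mathrm{d}G\,S_i G^T + G S_i\,\mathrm{d}G^T + G\,\mathrm{d}S_i\,G^T\right),\\
\frac{\partial f_i}{\partial S_i} &= -2\,G^T E_i G, \qquad
\frac{\partial f_i}{\partial G} = -2\left(E_i G S_i^T + E_i^T G S_i\right),\\
\frac{\partial\,\mathrm{RSE}}{\partial G} &= \frac{\sum_i \partial f_i / \partial G}{\sum_j \|R_j\|_F^2}, \qquad
\frac{\partial\,\mathrm{RSE}}{\partial S_i} = \frac{\partial f_i / \partial S_i}{\sum_j \|R_j\|_F^2}.
\end{aligned}
$$

A descent step moves G and $S_i$ against these gradients. When both $R_i$ and $S_i$ are symmetric, $E_i$ is symmetric as well and the G-gradient of one summand simplifies to $-4\,E_i G S_i$.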


In this work, a special version of the gradient descent algorithm called Adam [9] is used. This algorithm is well suited for large problems and has two main benefits. The first is adaptive learning rate control, which changes the step size during descent in response to changes in gradient magnitude. The second is the use of momentum, which prevents oscillations in narrow valleys of the search space and gives the descent the ability to skip shallow local minima. Both traits reduce the number of steps needed to find a local minimum. The non-negativity constraint in the problem definition can easily be fulfilled by applying the absolute value to G and $S_i$ before every RSE calculation. In this way, a gradient descent step that crosses the boundary of the feasible region is effectively bounced back into it. A stopping criterion for gradient descent was devised in which the relative differences in the objective function between successive steps are used as an indicator of convergence. The median of several past differences was found to be a reliable estimate of the pace of convergence.¹ When this pace falls far below previously encountered values, gradient descent is stopped. Additionally, descent is also stopped if the maximum number of steps is exceeded.

Even though k is not differentiable, it is possible to solve the two-objective optimization problem using only gradient descent. In the naive approach (NA), Adam is performed for many different values of k until a satisfactory approximation of the Pareto front is acquired. However, there are no guidelines on how large the desired k might be, and the search can be concentrated in a region where k is too small. In such cases, valuable computational time is wasted.
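The descent described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: hand-derived gradients of Eq. (19.2) stand in for an automatic differentiation library, the update is the standard Adam rule of Kingma and Ba [9] with the α, β1, β2 values reported in Sect. 19.6 (ε = 1e-8 is our assumption), and the stopping rule is replaced by a fixed number of steps for brevity. The absolute value is applied to the free parameters before every RSE evaluation, so a step that crosses zero is reflected back into the feasible region. Function names are hypothetical.

```python
import numpy as np


def rse_and_grads(Rs, G, Ss):
    """Return RSE (Eq. 19.2) and its gradients with respect to G and each S_i."""
    den = sum(np.sum(R * R) for R in Rs)  # sum_i ||R_i||_F^2, constant during descent
    num, gG, gS = 0.0, np.zeros_like(G), []
    for R, S in zip(Rs, Ss):
        E = R - G @ S @ G.T  # residual of one summand
        num += np.sum(E * E)
        gG += -2.0 * (E @ G @ S.T + E.T @ G @ S)
        gS.append(-2.0 * G.T @ E @ G)
    return num / den, gG / den, [g / den for g in gS]


def adam_nmtf(Rs, G0, Ss0, steps=2000, alpha=1e-3, beta1=0.9, beta2=0.99, eps=1e-8):
    """Minimize RSE with Adam; non-negativity is kept by evaluating |G| and |S_i|."""
    params = [G0.copy()] + [S.copy() for S in Ss0]  # unconstrained parameters
    m = [np.zeros_like(p) for p in params]  # first-moment estimates
    v = [np.zeros_like(p) for p in params]  # second-moment estimates
    history = []
    for t in range(1, steps + 1):
        G = np.abs(params[0])
        Ss = [np.abs(p) for p in params[1:]]
        f, gG, gS = rse_and_grads(Rs, G, Ss)
        history.append(f)
        # Chain rule through the absolute value: d|p|/dp = sign(p).
        grads = [gG * np.sign(params[0])] + [g * np.sign(p) for g, p in zip(gS, params[1:])]
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            params[i] = params[i] - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return np.abs(params[0]), [np.abs(p) for p in params[1:]], history
```

Starting from small random matrices, as in Sect. 19.6, the first recorded RSE is close to 1 and then decreases as Adam grows the factors; the naive approach would simply call such a routine once per value of k.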

19.5 Memetic Algorithm

The term memetic algorithm describes a synergetic combination of an evolutionary approach and a local improvement procedure. In this work, a memetic algorithm for the NMTF problem described above is developed, with the main motivation of combining the good hereditary features of an EA with the efficiency of Adam, which is used as the local improvement procedure. The stopping criteria for Adam are the same as in the naive approach.

19.5.1 Evolutionary Algorithm

Standard EA operators are adapted in such a way that offspring inherit good traits from their predecessors, i.e., a comparably low RSE value is ensured regardless of the dimension of S. The aim of the EA is to provide good starting individuals of various dimensions k for further treatment with Adam, which is usually able to decrease RSE further. In this manner, the evolutionary algorithm is used only to find good initial points for gradient descent, which should reduce the computational load to a high degree. The difference between the suggested adapted EA and the classical one is that in the suggested algorithm the evolutionary process starts with a very small initial population, which grows linearly over time. Tournament selection is used to select individuals for breeding. We favor individuals on which Adam was successful; hence, the criterion for winning the tournament is the lowest RSE. The evolutionary search also gradually switches from exploration to exploitation: in the beginning, parents are selected at random, while in later generations individuals with lower RSE are preferably chosen to become parents. This is accomplished by keeping the ratio between the tournament size and the number of individuals in the population constant over all generations; as the population grows, the tournament grows with it, and so does the selection pressure.

The crossover operator used in this work combines two parents and produces one offspring. The offspring's G matrix is a concatenation of the parents' G matrices, with the columns of both parents placed side by side, while the offspring's $S_i$ matrices are direct sums of the parents' $S_i$ matrices; see Fig. 19.2 for an illustration. Note that the crossover operator yields individuals with an enlarged dimension k, but it holds that

$$\mathrm{RSE}(\text{offspring}) \le \tfrac{1}{2}\left(\mathrm{RSE}(\text{parent}_1) + \mathrm{RSE}(\text{parent}_2)\right). \tag{19.4}$$

Fig. 19.2 Crossover of two parents

¹ The mean was found to be too susceptible to outliers, which are also encountered during gradient descent.

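In order to prove statement (19.4), it is sufficient to show that the inequality holds for a single summand in Eq. (19.2), since the denominator of RSE is the same for the parents and the offspring. Let $M_1$ and $M_2$ be the $GSG^T$-products of the parents, while the offspring's $GSG^T$-product is $\tfrac{1}{2}(M_1 + M_2)$ by definition. Writing $R - \tfrac{1}{2}(M_1 + M_2) = \tfrac{1}{2}(R - M_1) + \tfrac{1}{2}(R - M_2)$, it follows that

$$\left\| R - \tfrac{1}{2}(M_1 + M_2) \right\|_F^2 \le \tfrac{1}{4}\left( \|R - M_1\|_F + \|R - M_2\|_F \right)^2 \tag{19.5}$$

$$\le \tfrac{1}{2}\left( \|R - M_1\|_F^2 + \|R - M_2\|_F^2 \right), \tag{19.6}$$

where the triangle inequality was used in (19.5) and the fact that $2xy \le x^2 + y^2$ was used in (19.6).² Clearly, an offspring inherits a comparably low RSE value from its parents.

² $2xy \le x^2 + y^2$ is equivalent to $0 \le (x - y)^2$.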
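The operator and the bound (19.4) can be checked numerically. In the sketch below (hypothetical names, not the authors' implementation), the factor 1/2 that makes the offspring's product equal to ½(M1 + M2) is placed on the direct sum of the $S_i$; the text does not say where the factor is applied, and scaling G by $1/\sqrt{2}$ would be equivalent. A tournament selection whose size is a fixed fraction `ratio` of the population (the value is hypothetical) is included to show how a growing population raises the selection pressure.

```python
import numpy as np


def direct_sum(A, B):
    """Block-diagonal matrix diag(A, B)."""
    out = np.zeros((A.shape[0] + B.shape[0], A.shape[1] + B.shape[1]))
    out[:A.shape[0], :A.shape[1]] = A
    out[A.shape[0]:, A.shape[1]:] = B
    return out


def crossover(G1, Ss1, G2, Ss2):
    """One offspring with k = k1 + k2 whose product G S_i G^T equals (M1 + M2) / 2."""
    G = np.hstack([G1, G2])  # columns (clusters) of both parents side by side
    # Assumption: the factor 1/2 is applied to S_i; scaling G by 1/sqrt(2) is equivalent.
    Ss = [0.5 * direct_sum(S1, S2) for S1, S2 in zip(Ss1, Ss2)]
    return G, Ss


def rse(Rs, G, Ss):
    """Relative square error, Eq. (19.2)."""
    num = sum(np.linalg.norm(R - G @ S @ G.T, "fro") ** 2 for R, S in zip(Rs, Ss))
    return num / sum(np.linalg.norm(R, "fro") ** 2 for R in Rs)


def tournament_select(fitness, ratio, rng):
    """Index of the lowest-RSE contestant; tournament size is a fixed fraction of the population."""
    # A size of 1 amounts to a random pick, which is what happens while the population is small.
    size = min(len(fitness), max(1, int(round(ratio * len(fitness)))))
    contestants = rng.choice(len(fitness), size=size, replace=False)
    return int(contestants[np.argmin(np.asarray(fitness)[contestants])])
```

The inequality holds for any parents, not only descended ones, because it follows from the triangle inequality alone.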


Fig. 19.3 Mutation where deletion of one column of G was applied

The mutation operator used in this work either deletes or adds columns of matrix G and the corresponding rows and columns of matrices $S_i$; see Fig. 19.3 for an illustration. Columns and rows added by mutation are populated with small random values. In the case of deletion, the columns and rows that contribute the least to $\|G S_i G^T\|_F$ are chosen. If the columns of G are normalized, it is easy to determine which columns contribute the least by inspecting the smallest components of $S_i$. Accordingly, the least significant rows and columns of $S_i$ and the corresponding columns of G are deleted. By applying mutation to the selected individual, a new offspring with a different k and a minimally altered RSE is obtained.

New individuals, constructed by the crossover and mutation operators, are improved using the Adam algorithm at the end of every generation. Offspring created by mutation or crossover both inherit good RSE values from their predecessors. In this regard, the main purpose of the proposed evolutionary operators is to construct good initial points for gradient descent from individuals in the population that have already been descended. By using the evolutionary operators, information about good clusters and their interactions can flow across individuals whose matrices have different dimensions. More importantly, good clusters found in lower dimensions can be passed to higher-dimensional individuals, for which the computational load of Adam is more pronounced. By passing this information to large individuals, the number of gradient descent steps should be reduced to a large degree. Two versions of the memetic algorithm were used in this work: the first, M1, performs only mutations, while the second, M2, performs both crossovers and mutations.
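Both mutation variants can be sketched as follows, under assumptions that go beyond the text: the contribution of cluster j is scored as the total absolute weight in row and column j of all $S_i$ after the columns of G have been normalized (the text only says that the smallest components of $S_i$ are inspected), newly added entries are drawn from [0, 0.01) to match the initialization in Sect. 19.6, and the mutation size is drawn from a geometric distribution on {1, 2, ...} with mean 3. All function names are hypothetical.

```python
import numpy as np


def normalize_columns(G, Ss):
    """Give G unit-norm columns and move the scale into every S_i (G S_i G^T is unchanged)."""
    d = np.linalg.norm(G, axis=0)
    d[d == 0] = 1.0  # leave all-zero columns untouched
    Gn = G / d  # G = Gn @ diag(d)
    Sn = [d[:, None] * S * d[None, :] for S in Ss]  # diag(d) @ S @ diag(d)
    return Gn, Sn


def mutate_delete(G, Ss, n_del=1):
    """Delete the n_del clusters that contribute least (assumed score: |S_i| mass in row + column)."""
    Gn, Sn = normalize_columns(G, Ss)
    score = sum(np.abs(S).sum(axis=0) + np.abs(S).sum(axis=1) for S in Sn)
    keep = np.sort(np.argsort(score)[n_del:])  # indices of the clusters that survive
    return Gn[:, keep], [S[np.ix_(keep, keep)] for S in Sn]


def mutate_add(G, Ss, n_add, rng, scale=0.01):
    """Append n_add clusters whose new entries are small random values in [0, scale)."""
    n, k = G.shape
    G_new = np.hstack([G, scale * rng.random((n, n_add))])
    Ss_new = []
    for S in Ss:
        S_big = scale * rng.random((k + n_add, k + n_add))
        S_big[:k, :k] = S  # keep the existing cluster interactions
        Ss_new.append(S_big)
    return G_new, Ss_new


def draw_mutation_size(rng, mean=3.0):
    """Columns to add or delete: geometric on {1, 2, ...} with the given mean (p = 1 / mean)."""
    return int(rng.geometric(1.0 / mean))
```

Normalizing first matters: without it, a column of G with a large norm paired with small $S_i$ entries could be deleted even though it contributes strongly to the product.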

19.6 Experiments

A test problem was constructed with 5 matrices $R_i$ of dimension 800 × 800 that have a known minimum at RSE = 0 and k = 50. Approximately 1/3 of the components of the $R_i$ were non-zero, and their magnitude was around one. Three algorithms were used to solve this problem, i.e., M1, M2, and NA. Due to limited computational resources, each algorithm was run 12 times. The basic component of all algorithms in this work is Adam. The parameters of the Adam algorithm were manually tuned beforehand, starting with the values proposed in the literature [9]. The optimal parameters found were α = 0.001, β1 = 0.9 and β2 = 0.99, where the notation from [9] is assumed. To ensure a reasonable execution time, the maximum number of steps for Adam was set to 5000. The convergence criterion was fulfilled when the median of the last 150 relative differences dropped below one third of the worst median seen so far. This criterion was devised by observing how gradient descent progresses on this type of problem. Initial experiments showed that, once this criterion is fulfilled, there is only a small probability that continuing gradient descent will bring further improvements. The initial matrices G and $S_i$ for all algorithms were chosen randomly, with components drawn from the uniform distribution on the interval [0, 0.01]. In this way, the initial matrices are close to the trivial solution (G = 0 and $S_i = 0$) with RSE ≈ 1. Initial experiments showed that larger initial components result in a larger number of steps before gradient descent converges.

Algorithm NA started with k = 1 and incremented k by one in each generation. For each k, Adam was executed starting from random initial matrices. NA was stopped when it encountered an individual with RSE < 0.01. Both M1 and M2 started with a population of 4 individuals whose k was drawn from the uniform distribution on the set {1, 2, …, 7}. M1 performs only mutations, while M2 performs both crossovers and mutations; for each crossover, M2 performs two mutations. The number of columns deleted or added during mutation was drawn from a geometric distribution with an expected value of 3.0. At the end of each generation, gradient descent was performed on all new individuals. Crossover of a parent with itself was prevented because gradient descent is unable to improve such an offspring. The stopping criterion is the same as for NA: it is fulfilled when RSE < 0.01 for some individual in the population.
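The convergence test can be sketched as a small stateful object. This is an illustration under assumptions, not the authors' code: the relative difference is taken as |f_{t-1} − f_t| / |f_{t-1}|, and the "worst median" is read as the largest median observed so far (the fastest recent progress), so descent stops once the current pace falls below one third of it. The window of 150, the fraction 1/3 and the 5000-step limit follow the values given above; the class name is hypothetical.

```python
import statistics


class MedianStopper:
    """Stopping-rule sketch: median of recent relative improvements vs. the largest median seen."""

    def __init__(self, window=150, fraction=1 / 3, max_steps=5000):
        self.window = window
        self.fraction = fraction
        self.max_steps = max_steps
        self.prev = None
        self.diffs = []
        self.worst_median = 0.0  # assumption: "worst" = largest median observed so far
        self.steps = 0

    def update(self, f):
        """Record the objective after one step; return True when descent should stop."""
        self.steps += 1
        if self.prev is not None:
            # Relative difference between successive objective values.
            self.diffs.append(abs(self.prev - f) / max(abs(self.prev), 1e-300))
        self.prev = f
        if self.steps >= self.max_steps:
            return True
        if len(self.diffs) < self.window:
            return False
        med = statistics.median(self.diffs[-self.window:])  # robust to outlier steps
        self.worst_median = max(self.worst_median, med)
        return med < self.fraction * self.worst_median
```

For a run that improves by a steady 1% per step and then stalls, the rule fires once more than half of the window consists of zero improvements, while a run that keeps improving at a constant relative pace only stops at the step limit.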

19.7 Results

The algorithms M1, M2 and NA were compared with regard to the hypervolume and the number of evaluations. Figures 19.4 and 19.5 depict the distributions of these two indicators, and the Pareto front approximations obtained over all runs by each algorithm are shown in Figs. 19.6, 19.7 and 19.8. The number of runs is rather small for a statistical comparison; however, it can give indications for further work. The data for both comparisons were analyzed using the traditional approach [10] and the Deep Statistical Comparison (DSC) approach [11]. In the traditional approach, the normality condition (checked with the one-dimensional Kolmogorov–Smirnov test for normality) was satisfied for the number of evaluations but not for the hypervolume. For both comparisons, homoscedasticity was checked with Levene's test, and in both cases the condition was not satisfied. For this reason, the Kruskal–Wallis test was selected as an appropriate omnibus statistical test.

With regard to the hypervolume, there is a statistically significant difference among the three algorithms, and it comes from the pairs (M1, NA) and (M2, NA). For the same comparison, the DSC approach also showed a statistically significant difference among M1, M2 and NA, ranking them 2, 3 and 1, respectively, which can also be seen in Fig. 19.4. Regarding the number of evaluations, the Kruskal–Wallis test showed a statistically significant difference among the three algorithms, and Dunn's post hoc test showed that it comes from the pairs (M1, M2) and (M1, NA), while there is no difference between M2 and NA. However, the traditional approach with the Kruskal–Wallis test compares medians and does not take into account the different standard deviations of the distributions. For this reason, the recently proposed DSC approach was also used, in which the comparison takes the whole distribution of the data into account. It shows a statistically significant difference among M1, M2 and NA, ranking them 1, 2 and 3, respectively, from which it follows that NA needs the most evaluations on average.

Fig. 19.4 Box plot of hypervolumes for 12 runs of algorithms M1, M2 and NA

Fig. 19.5 Box plot of number of evaluations for 12 runs of algorithms M1, M2 and NA

Fig. 19.6 Summary attainment surfaces of Pareto fronts over 12 runs for M1

Fig. 19.7 Summary attainment surfaces of Pareto fronts over 12 runs for M2

The fact that NA finds better-quality solutions than M1 and M2 is very surprising. Even though the evolutionary operators generate initial points with lower RSE values, those initial points apparently do not lead gradient descent to good regions. Evidently, starting with a low RSE does not guarantee good convergence. This indicates that a low RSE should not be the only trait inherited if the evolutionary operators are to be efficient. To further explore this counterintuitive behavior, the data gathered during the optimizations were analyzed. All individuals with k = 10 generated during M1, M2 or NA were collected. Such individuals can be produced by crossover or mutation followed by gradient descent, or solely by gradient descent starting from a random initial point.
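The hypervolume indicator used in this comparison measures the area of objective space that a Pareto front approximation dominates, bounded by a reference point. For two minimized objectives such as (RSE, k), it can be computed with a single sweep over the points sorted by the first objective. This is a generic sketch; the chapter does not state its reference point, so `ref` is a free, hypothetical parameter, and the function name is ours.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (both objectives minimized) inside the box bounded by `ref`."""
    r1, r2 = ref
    # Keep only points that dominate the reference point, sorted by the first objective.
    pts = sorted(p for p in points if p[0] < r1 and p[1] < r2)
    hv, prev_f2 = 0.0, r2
    for f1, f2 in pts:
        if f2 < prev_f2:  # otherwise the point is dominated by one already counted
            hv += (r1 - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```

Each accepted point adds the horizontal strip between its second objective and that of the previous non-dominated point, so dominated points contribute nothing and the result is the area of the union of the dominated boxes.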


Fig. 19.8 Summary attainment surfaces of Pareto fronts over 12 runs for NA

Figure 19.9 shows the progression of gradient descent for individuals generated by crossover and for randomly generated individuals. Because crossover combines previously optimized parents, the offspring has a low initial RSE compared to a random point, whose RSE ≈ 1 at the start of gradient descent. When gradient descent is applied to individuals created by crossover, RSE falls steeply, but shortly afterwards convergence becomes very slow. Gradient descent seems to enter a region of slow convergence, which could indicate a plateau in the optimization landscape. On the other hand, when gradient descent starts from a random initial point, convergence is quite even, with no abrupt changes in steepness. It seems that crossover produces initial points from which gradient descent is drawn to a plateau with a very small gradient.

Figure 19.10 shows the progression of gradient descent for individuals generated by mutation. Only mutations in which columns were added were considered. The extent to which an individual is mutated is measured by the number of columns added to it, Δk. Gradient descent performed on mutated individuals converges to higher RSE values than gradient descent performed on random individuals. The only mutation that leads gradient descent to RSE values close to those of the nonmutated case is the addition of one column (Δk = 1). In this case, the number of steps needed to reach convergence is also approximately three times smaller than in the nonmutated case. This is one explanation of why M1 requires fewer evaluations than M2 and NA. The distributions of RSE values after gradient descent for mutated individuals are shown in Fig. 19.11. As with crossover, mutation seems to lead gradient descent to unfavorable regions of the search space.

Fig. 19.9 Progress of gradient descent when the initial points are individuals created by crossover, compared to random individuals (plot: RSE versus steps of gradient descent, 0 to 5,000; curves: random individuals and individuals created by crossover). All solutions here have k = 10, and crossovers were performed using already descended individuals with k = 3, 4, 5, 6, 7 (3 + 7, 4 + 6 and 5 + 5). Full lines are the medians, and the shaded areas represent the range of the central 66% of runs

Fig. 19.10 Median progress of gradient descent when the initial points are random individuals compared to mutated individuals (plot: RSE versus steps of gradient descent, 0 to 5,000; curves for Δk = 1, …, 6 and no mutation). All solutions here have k = 10, and mutations were performed using already descended individuals with k = 4, 5, …, 9. The quantity Δk gives the number of columns added via mutation and represents the extent to which an individual was mutated

Fig. 19.11 Distributions of RSE after performing gradient descent on mutated individuals compared to nonmutated ones. The mutation considered here is the addition of a specific number of columns (x axis) to the matrices G and $S_i$ of already descended individuals. All solutions here have k = 10; therefore, mutations were performed on already descended individuals with k = 3, 4, …, 9

19.8 Conclusion and Future Work

The comparison of the three algorithms showed that the naive approach consistently surpasses the memetic algorithms in the quality of solutions. Even though memetic algorithms can generate individuals that inherit a low RSE, these offspring do not lead Adam to solutions of the same quality as random starting points do. This was further confirmed by analyzing the gradient descent progress of individuals generated via crossover, via mutation, or randomly. This may indicate an interesting property of the fitness landscape that needs further exploration. In particular, Adam in the memetic algorithms starts on matrices with a few dominant non-zero entries provided by the evolutionary operators, while in the naive approach Adam starts its search from matrices whose entries are uniformly distributed on [0, 0.01]. This might be the reason for the low local search performance and might indicate that the choice of evolutionary operators should also depend on the local search used, not only on good hereditary features. Also, it seems that inheriting traits from good individuals of lower dimension somehow guides gradient descent to regions where the gradient has a very small magnitude. Nonetheless, the memetic algorithms proved to be significantly faster than the naive approach.

In future work, we will continue our study of efficient evolutionary operators for this problem. Since further testing of the behavior of evolutionary operators on this problem requires great computational resources, the gradient descent step will be accelerated on GPGPU. The evolutionary algorithm presented in this work will be further developed so that the matrices G have orthogonal columns. This constraint restricts the problem and enforces the classical interpretation of clustering. The algorithm will also be adapted to asymmetric matrices $R_i$, which introduces several G matrices of different dimensions, one for each data type. In this way, the algorithm will be able to perform both classical and soft clustering of possibly heterogeneous data.

References

1. C. Ding, X. He, H.D. Simon, On the equivalence of nonnegative matrix factorization and spectral clustering, in Proceedings of the 2005 SIAM International Conference on Data Mining (SIAM, Bangkok, 2005), pp. 606–610
2. C. Ding, Orthogonal nonnegative matrix tri-factorizations for clustering, in SIGKDD (2006), pp. 126–135
3. Y. Pei, N. Chakraborty, K. Sycara, Nonnegative matrix tri-factorization with graph regularization for community detection in social networks, in Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15 (AAAI Press, 2015), pp. 2083–2089
4. N. Del Buono, G. Pio, Non-negative matrix tri-factorization for co-clustering: an analysis of the block matrix. Inf. Sci. 301, 13–26 (2015)
5. M. Mitchell, J.H. Holland, When will a genetic algorithm outperform hill climbing?, in Proceedings of the 5th International Conference on Genetic Algorithms, Urbana-Champaign, IL, USA, June 1993, ed. by S. Forrest (Morgan Kaufmann, Burlington, 1993), p. 647
6. B. Doerr, E. Happ, C. Klein, Crossover can provably be useful in evolutionary computation. Theor. Comput. Sci. 425, 17–33 (2012) (Theoretical Foundations of Evolutionary Computation)
7. N. Gillis, The why and how of nonnegative matrix factorization. Regul. Optim. Kernels Support Vector Mach. 12, 257 (2014)
8. V. Gligorijević, N. Malod-Dognin, N. Pržulj, Integrative methods for analyzing big data in precision medicine. Proteomics 16(5), 741–758 (2016)
9. D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)
10. S. García, D. Molina, M. Lozano, F. Herrera, A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: a case study on the CEC’2005 special session on real parameter optimization. J. Heuristics 15(6), 617 (2009)
11. T. Eftimov, P. Korošec, B.K. Seljak, A novel approach to statistical comparison of meta-heuristic stochastic optimization algorithms using deep statistics. Inf. Sci. 417, 186–215 (2017)

Chapter 20

Quaternion Simulated Annealing

Abdellatif El Afia, Mohamed Lalaoui, and El-ghazali Talbi

Abstract Simulated annealing (SA) is a well-known stochastic local search algorithm for solving unconstrained optimization problems. It mimics the annealing process used in metallurgy to approximate the global optimum of an optimization problem and uses the temperature to control the search. Unfortunately, the effectiveness of simulated annealing drops drastically when dealing with large-scale optimization problems. This is generally due to premature convergence or stagnation. Both phenomena can be avoided by a good balance between exploitation and exploration. This paper addresses this problem of simulated annealing using quaternions, a number system that extends the complex numbers. The quaternion representation helps simulated annealing smooth the fitness landscape and thus avoid getting stuck in local optima by expanding the original search space. An empirical analysis was conducted on many numerical benchmark functions. The experimental results show that the quaternion representation of the neighborhood improves the solution quality compared with classical simulated annealing. Our approach was also compared with other nature-inspired optimization algorithms, and quaternion simulated annealing outperformed the other heuristics in terms of solution quality on most of the benchmark functions.

A. El Afia (B) · M. Lalaoui
National School of Computer Science and Systems Analysis, Mohammed V University, Rabat, Morocco

E. Talbi
Polytech’Lille, University Lille, INRIA, CNRS, Lille, France

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906,
https://doi.org/10.1007/978-3-030-58930-1_20


A. El Afia et al.

20.1 Introduction

An optimization problem with continuous variables aims at minimizing an objective function. If this objective function depends on real variables with no restrictions on their values, the problem is called unconstrained optimization; otherwise, it is constrained optimization. Mathematically, let Ω be the set of feasible solutions, and let $f : \Omega \to \mathbb{R}$ be the objective function defined over the solution space. The purpose is to solve the unconstrained optimization problem (20.1):

$$\min_{x} \; f(x) \quad \text{s.t.} \quad x \in \Omega \tag{20.1}$$

We seek the global minimum x* in the solution space Ω, i.e., a point such that f(x*) ≤ f(x) for every x ∈ Ω. Let N(x) denote the neighborhood of x ∈ Ω, and consider the classical simulated annealing described in Algorithm 1. SA starts from an initial solution x_0 ∈ Ω. Then, at each iteration n of the inner loop, it generates a new solution x_n from the previous solution x_{n−1} according to a probability distribution, and decides whether to accept it through the Metropolis criterion. The acceptance probability is

$$P_{T_n}(x_{new} \mid x_{current}) = \begin{cases} \exp\left(-\dfrac{f(x_{new}) - f(x_{current})}{T_{n-1}}\right), & \text{if } f(x_{new}) - f(x_{current}) > 0 \\ 1, & \text{otherwise} \end{cases} \qquad (20.2)$$

where T_{n−1} is the previous temperature. For each temperature stage T_n, this process is repeated L_max times, after which the current temperature is decreased. One of the most widely used cooling schedules is the geometric one, T_n = α T_{n−1} with 0 < α < 1. SA repeats these steps until a stopping criterion is met, such as reaching the final temperature T_f or an upper limit on the number of iterations. In the early stages the temperature is high, so the probability of accepting a worse solution is large. As the algorithm proceeds, the temperature decreases and so does the probability of accepting a worse solution; in the final stage, SA behaves like a descent method that accepts almost only improving moves. In general, SA can be seen as an iterative improvement process composed of three functions: generation, acceptance and cooling. These three functions determine the convergence of SA [1], but parameters such as the initial temperature, the initial configuration, and the inner-loop and outer-loop stopping criteria can also have a significant impact on its finite-time behavior. That is, the computation time in practice depends on the three functions as well as on these parameters. Most research on SA has concentrated on the cooling (update) and acceptance functions and on various algorithmic parameters; only limited attention has been paid to the generation function.

20 Quaternion Simulated Annealing

Algorithm 1 Simulated annealing algorithm
Input: f: cost function; T_0: initial temperature; T_f: final temperature; T_n: temperature at the n-th stage; α: cooling factor (0 < α < 1); L_max: length of a temperature stage; x_0: initial solution
Output: x_best: best solution for the cost function f
Initialization: n ← 0; x_current ← x_0
while T_n > T_f do
    for k = 1 to L_max do
        x_new ← Generate_solution(x_current)
        if f(x_new) − f(x_current) ≤ 0 then
            x_current ← x_new
        else
            generate a pseudorandom number ε from the uniform distribution U(0, 1)
            if ε < exp(−(f(x_new) − f(x_current)) / T_n) then
                x_current ← x_new
            end if
        end if
    end for
    T_{n+1} ← α T_n
    n ← n + 1
end while
x_best ← x_current

Many studies have tried to enhance the performance of metaheuristics with learning techniques [2–22]. To the best of our knowledge, the first work integrating a learning technique into the neighborhood function was proposed by Corana [23], who introduced an adaptive approach to adjust the neighborhood range of SA for continuous optimization problems. However, [24] showed that this method is no better than SA with a well-chosen neighborhood range. Miki et al. [24] tried to enhance the performance of SA by adjusting the neighborhood range to the landscape of the given problem using opposition-based learning. Opposition-based learning, however, only increases the diversity of the candidate solutions by selecting not only a random guess but also its opposite, and this approach can become computationally expensive when the algorithm gets close to the global optimum.

Unlike these approaches, which use parameters to adjust the neighborhood range during the exploration of the search domain, this research uses the quaternion representation to enhance neighborhood exploration. The proposed method does not use parameters to adjust the neighborhood range; instead, it explores the quaternion space, converting each dimension of the initial vector into a 4-dimensional quaternion. The search becomes easier even though the quaternion space is larger than the original one. In addition, real-valued data are often best understood when embedded in the complex domain. Fister et al. [25] first reported that the quaternion representation can help an algorithm balance exploration and exploitation efficiently: it expands the original search space, and the fitness landscape based on this representation becomes smoother, which makes it easier to explore. Fister et al. [25] used quaternions to enhance the firefly algorithm and avoid premature convergence, reporting that the quaternion representation of individuals creates a balance between exploration and exploitation. Papa et al. [26] introduced a quaternion-based harmony search algorithm whose aim was to smooth the fitness landscape of non-convex functions in high-dimensional spaces, where each candidate solution in N dimensions is modeled as a tensor of dimensions 4 × N. Fister et al. [27] applied the same approach to the bat algorithm to reduce stagnation, representing each individual as quaternions. In addition, Khuat and Le [28] represented the individuals of a genetic algorithm using quaternions, mapping each 1-dimensional real value to a 4-dimensional quaternion; the method gives better final solutions and improves convergence. Up to now, quaternions have only been applied to population-based algorithms. This article presents the first attempt to apply them to a stochastic local search method, simulated annealing, and to validate its effectiveness. The rest of this paper is organized as follows. Section 20.2 gives background information, including a description of simulated annealing and of quaternion algebra; Sect. 20.3 describes the concept of quaternion simulated annealing; Sect. 20.4 presents the experimental results and discussion; and Sect. 20.5 concludes the paper.
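For readers who want to experiment, the classical SA loop of Algorithm 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the neighbor generator is passed in as a function, the default values of `t0`, `t_final`, `alpha` and `l_max` are illustrative only, and the sketch also records the best point visited, which Algorithm 1 leaves implicit.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def simulated_annealing(
    f: Callable[[Vector], float],
    x0: Vector,
    generate: Callable[[Vector], Vector],
    t0: float = 1000.0,
    t_final: float = 1e-3,
    alpha: float = 0.9,
    l_max: int = 100,
    seed: int = 0,
) -> Vector:
    """Classical SA: Metropolis acceptance (Eq. 20.2) and geometric cooling."""
    rng = random.Random(seed)
    x_current, f_current = list(x0), f(x0)
    x_best, f_best = list(x_current), f_current
    t = t0
    while t > t_final:                    # outer loop: one temperature stage
        for _ in range(l_max):            # inner loop: L_max candidate moves
            x_new = generate(x_current)
            f_new = f(x_new)
            delta = f_new - f_current
            # Accept improvements; accept worse moves with prob. exp(-delta / T)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                x_current, f_current = x_new, f_new
                if f_current < f_best:    # keep track of the best point seen
                    x_best, f_best = list(x_current), f_current
        t *= alpha                        # geometric cooling: T_{n+1} = alpha * T_n
    return x_best


if __name__ == "__main__":
    # Usage: minimise a 2-D sphere function with a hypothetical Gaussian step.
    sphere = lambda x: sum(v * v for v in x)
    step_rng = random.Random(1)
    gaussian_step = lambda x: [v + step_rng.gauss(0.0, 0.1) for v in x]
    best = simulated_annealing(sphere, [2.0, -1.5], gaussian_step)
    print(best, sphere(best))
```

The Gaussian step in the usage example is only a placeholder neighborhood; any generator with the same signature, such as a hit-and-run move, can be substituted.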

20.2 Background

In this section, we first describe how simulated annealing generates candidate solutions in continuous optimization, and then give a brief introduction to quaternion algebra.

20.2.1 Neighborhood Structure

We suppose that for each state x in S there is a set N(x) ⊂ S, called the set of neighbors of x, containing the states reachable from x in exactly one move. Each move is reversible (y ∈ N(x) ⟹ x ∈ N(y)), and the same number of moves ω = |N(x)| is available from every state x. In addition, any state must be reachable in a finite number of moves. Each move is proposed with probability 1/ω, and the proposed move is accepted according to (20.2). Simulated annealing explores the search space using a random-walk method called the hit-and-run generator, introduced by Smith in 1984 [29] and summarized in Algorithm 2. The underlying idea of this Markov chain sampling technique is to generate a sequence of points by taking steps of random length in randomly generated directions. First, the hit-and-run algorithm generates a random direction uniformly distributed over a set of directions on the unit hypersphere in ℝ^n. This is done by generating n independent scalars d_i, i = 1, 2, ..., n, from the normal distribution N(0, 1) and then scaling them to obtain the unit direction vector D_k [30]:

$$D_k = (d_1, d_2, \dots, d_n)\left(\sum_{i=1}^{n} d_i^2\right)^{-1/2} \qquad (20.3)$$

Then the hit-and-run algorithm generates a step length λ uniformly distributed over the intersection of the line through the current point with direction D_k and the feasible set S.

Algorithm 2 Generate solution
Input: x_current: the current solution; D: the problem dimension; k: the current inner iteration index
Output: x_new: the newly generated solution
Begin
    x_{k+1} = x_k + λ D_k, where D_k is a random direction uniformly distributed over a direction set on the unit hypersphere in ℝ^D, and x_{k+1} is uniformly distributed over the line set L_k = {x : x ∈ S and x = x_k + λ D_k, λ a real scalar}
End
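A minimal Python sketch of the hit-and-run move of Eq. (20.3) and Algorithm 2 is given below. It assumes (our assumption, not the chapter's) that the feasible set S is a box [lower, upper]^n, which matches the variable ranges of the benchmarks used later; the helper names are hypothetical. The direction is obtained by normalizing Gaussian draws, and λ is drawn uniformly from the chord on which the line through x stays inside the box.

```python
import math
import random
from typing import List, Sequence


def random_unit_direction(n: int, rng: random.Random) -> List[float]:
    """Eq. (20.3): normalise n i.i.d. N(0, 1) draws -> uniform direction on the unit hypersphere."""
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(sum(v * v for v in d))
    return [v / scale for v in d]


def hit_and_run_step(
    x: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    rng: random.Random,
) -> List[float]:
    """One hit-and-run move inside the box S = [lower, upper] (box shape is an assumption)."""
    d = random_unit_direction(len(x), rng)
    # Interval [lam_min, lam_max] of step lengths for which x + lam * d stays in the box.
    lam_min, lam_max = -math.inf, math.inf
    for xi, di, lo, hi in zip(x, d, lower, upper):
        if di > 0:
            lam_min = max(lam_min, (lo - xi) / di)
            lam_max = min(lam_max, (hi - xi) / di)
        elif di < 0:
            lam_min = max(lam_min, (hi - xi) / di)
            lam_max = min(lam_max, (lo - xi) / di)
    lam = rng.uniform(lam_min, lam_max)   # step length uniform on the chord L_k ∩ S
    return [xi + lam * di for xi, di in zip(x, d)]
```

Each call returns a new point on a random chord through x that stays inside the box, so repeated calls produce the Markov chain described above.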

20.2.2 Quaternions

A quaternion [31] is defined by q = a_0 + a_1 i + a_2 j + a_3 k, where a_0, a_1, a_2, a_3 are real numbers and i, j and k are the imaginary units. These fundamental quaternion units satisfy the following equations:

$$ij = k, \quad jk = i, \quad ki = j, \quad ji = -k, \quad kj = -i, \quad ik = -j, \qquad i^2 = j^2 = k^2 = -1 \qquad (20.4)$$

For two quaternions q_1 = a_0 + a_1 i + a_2 j + a_3 k and q_2 = b_0 + b_1 i + b_2 j + b_3 k in the 4-dimensional space over the real numbers, the following operations can be defined [32]:

– Addition and subtraction are determined by

$$q_1 \pm q_2 = (a_0 + a_1 i + a_2 j + a_3 k) \pm (b_0 + b_1 i + b_2 j + b_3 k) = (a_0 \pm b_0) + (a_1 \pm b_1)i + (a_2 \pm b_2)j + (a_3 \pm b_3)k \qquad (20.5)$$

– Multiplication (the Hamilton product) is determined by

$$q_1 q_2 = (a_0 + a_1 i + a_2 j + a_3 k)(b_0 + b_1 i + b_2 j + b_3 k) = a_0' + a_1' i + a_2' j + a_3' k \qquad (20.6)$$

where

$$\begin{aligned} a_0' &= a_0 b_0 - a_1 b_1 - a_2 b_2 - a_3 b_3 \\ a_1' &= a_0 b_1 + a_1 b_0 + a_2 b_3 - a_3 b_2 \\ a_2' &= a_0 b_2 - a_1 b_3 + a_2 b_0 + a_3 b_1 \\ a_3' &= a_0 b_3 + a_1 b_2 - a_2 b_1 + a_3 b_0 \end{aligned}$$

The product of two quaternions is not commutative, i.e., in general q_1 q_2 ≠ q_2 q_1.

– The conjugate is a unary operation defined by

$$\bar{q}_1 = \overline{a_0 + a_1 i + a_2 j + a_3 k} = a_0 - a_1 i - a_2 j - a_3 k \qquad (20.7)$$

– The norm of a quaternion is determined by

$$\|q_1\| = \|a_0 + a_1 i + a_2 j + a_3 k\| = \sqrt{a_0^2 + a_1^2 + a_2^2 + a_3^2} \qquad (20.8)$$

and satisfies the following properties:

$$\|\bar{q}_1\| = \|q_1\|, \qquad \|q_0 q_1\| = \|q_0\|\,\|q_1\| \qquad (20.9)$$

This function is used to map a 4-dimensional quaternion to a 1-dimensional real-valued scalar.

– The multiplicative inverse of a quaternion q_1, denoted q_1^{-1}, is

$$q_1^{-1} = \frac{\bar{q}_1}{\|q_1\|^2} \qquad (20.10)$$

– The multiplicative inverse satisfies the properties

$$q_1 q_1^{-1} = q_1^{-1} q_1 = 1, \qquad (q_1^{-1})^{-1} = q_1, \qquad (q_1 q_2)^{-1} = q_2^{-1} q_1^{-1} \qquad (20.11)$$

– The distance between two quaternions q_1 and q_2 is defined by

$$\mathrm{dist}(q_1, q_2) = \sqrt{(a_0 - b_0)^2 + (a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2} \qquad (20.12)$$
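The operations (20.5) to (20.12) can be collected in a small Python class. The sketch below is a self-contained illustration; the class and method names are our own and do not come from the chapter or from any library. The usage lines check the unit relations of (20.4), which also show that the product is not commutative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    """q = a0 + a1*i + a2*j + a3*k with real components."""
    a0: float
    a1: float = 0.0
    a2: float = 0.0
    a3: float = 0.0

    def __add__(self, o: "Quaternion") -> "Quaternion":   # Eq. (20.5)
        return Quaternion(self.a0 + o.a0, self.a1 + o.a1, self.a2 + o.a2, self.a3 + o.a3)

    def __sub__(self, o: "Quaternion") -> "Quaternion":   # Eq. (20.5)
        return Quaternion(self.a0 - o.a0, self.a1 - o.a1, self.a2 - o.a2, self.a3 - o.a3)

    def __mul__(self, o: "Quaternion") -> "Quaternion":   # Hamilton product, Eq. (20.6)
        a0, a1, a2, a3 = self.a0, self.a1, self.a2, self.a3
        b0, b1, b2, b3 = o.a0, o.a1, o.a2, o.a3
        return Quaternion(
            a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
        )

    def conjugate(self) -> "Quaternion":                  # Eq. (20.7)
        return Quaternion(self.a0, -self.a1, -self.a2, -self.a3)

    def norm(self) -> float:                              # Eq. (20.8)
        return math.sqrt(self.a0 ** 2 + self.a1 ** 2 + self.a2 ** 2 + self.a3 ** 2)

    def inverse(self) -> "Quaternion":                    # Eq. (20.10)
        n2 = self.norm() ** 2
        c = self.conjugate()
        return Quaternion(c.a0 / n2, c.a1 / n2, c.a2 / n2, c.a3 / n2)

    def distance(self, o: "Quaternion") -> float:         # Eq. (20.12)
        return (self - o).norm()


if __name__ == "__main__":
    i, j, k = Quaternion(0, 1), Quaternion(0, 0, 1), Quaternion(0, 0, 0, 1)
    print(i * j == k, j * i == k)   # ij = k but ji = -k, so this prints: True False
    q = Quaternion(1.0, 2.0, 3.0, 4.0)
    print(q.norm(), q.distance(q))
```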

These quaternion operations are used to implement the quaternion hit-and-run generator of simulated annealing. The next section describes in detail how they are used within the new simulated annealing with the quaternion hit-and-run generator.


20.3 Simulated Annealing Based Quaternions

The main idea behind our approach is to explore the quaternion space instead of the Euclidean space. Each solution is modeled as a set of D quaternions q^i ∈ ℝ^4 with components (x_{i0}, x_{i1}, x_{i2}, x_{i3}), i = 1, ..., D. The quaternion-based simulated annealing (see Algorithm 3) therefore searches for a solution of this form and then maps each quaternion to a real value using (20.13):

$$x_i = \mathrm{Norm}(q^i) = \left(x_{i0}^2 + \sum_{j=1}^{3} x_{ij}^2\right)^{1/2}, \qquad i = 1, \dots, D \qquad (20.13)$$

In other words, simulated annealing tries to find the quaternions that minimize the cost function for each variable. The quaternion simulated annealing is based on the classical version; we only changed the way the candidate solution is generated, from the Euclidean space to the quaternion space. The quaternion representation moves the search toward more promising areas: although the quaternion space is larger than the original one, it can be smoother to explore [25]. In the first step, we generate a random set of quaternions q^i_initial, i ∈ {1, ..., D}, using Eq. (20.14) defined by [25]:

$$\mathrm{Random\_Quaternion}() = \{a_l = N(0, 1) \mid l = 0, \dots, 3\} \qquad (20.14)$$

where each quaternion component is initialized with a random number drawn from the Gaussian distribution N(0, 1) with zero mean and unit standard deviation. Next, each quaternion of a candidate solution within the neighborhood is mapped to the corresponding real value before the objective function is evaluated. In the next section, we study the convergence behavior of the quaternion simulated annealing and examine whether Q-SA keeps its convergence properties in the quaternion space.
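The initialization (20.14) and the norm mapping (20.13) are short enough to show directly. In this minimal sketch each quaternion is stored as a plain 4-tuple, which is a representation choice of ours, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Quat = Tuple[float, float, float, float]


def random_quaternion(rng: random.Random) -> Quat:
    """Eq. (20.14): each of the four components is drawn from N(0, 1)."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def random_quaternion_solution(dim: int, rng: random.Random) -> List[Quat]:
    """One quaternion per decision variable, i = 1, ..., D."""
    return [random_quaternion(rng) for _ in range(dim)]


def to_real(solution: List[Quat]) -> List[float]:
    """Eq. (20.13): x_i = ||q^i||; because it is a norm, every mapped x_i is >= 0."""
    return [math.sqrt(sum(c * c for c in q)) for q in solution]


if __name__ == "__main__":
    rng = random.Random(0)
    q = random_quaternion_solution(3, rng)
    print(to_real(q))                        # three non-negative reals
    print(to_real([(3.0, 4.0, 0.0, 0.0)]))   # [5.0]
```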

20.4 Experimental Results

Experiments were performed to determine how the quaternion representation of the neighborhood improves the solution quality and the convergence rate of simulated annealing in high-dimensional search spaces. Furthermore, we compared the outcomes of our approach with other optimization algorithms: particle swarm optimization (PSO) [33], the genetic algorithm (GA) [34], the artificial bee colony algorithm (ABC) [35], the bat algorithm (BA) [36] and generalized simulated annealing (Gen-SA) [37]. These results are analyzed using a statistical test and then discussed.


Algorithm 3 Simulated annealing algorithm based on quaternions
Input: f: cost function; T_0: initial temperature; T_f: final temperature; T_n: temperature at the n-th stage; α: cooling factor (0 < α < 1); L_max: length of a temperature stage
Output: q_best: the best quaternion solution for the cost function f
Initialization: n ← 0; q_initial = q_current = Random_Quaternion(); N_current = Norm(q_current)
while T_n > T_f do
    for k = 1 to L_max do
        q_new ← Generate_quaternion_solution(q_current)
        N_new = Norm(q_new)
        if f(N_new) − f(N_current) ≤ 0 then
            q_current ← q_new; N_current ← N_new
        else
            generate a pseudorandom number ε from the uniform distribution U(0, 1)
            if ε < exp(−(f(N_new) − f(N_current)) / T_n) then
                q_current ← q_new; N_current ← N_new
            end if
        end if
    end for
    T_{n+1} ← α T_n
    n ← n + 1
end while
q_best ← q_current
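Putting the pieces together, Algorithm 3 can be sketched in Python as follows. The chapter does not spell out Generate_quaternion_solution beyond calling it a quaternion hit-and-run generator, so the move used here (a random unit direction in the 4D-dimensional quaternion space, with a step drawn uniformly from [−step, step]) and the `step` parameter are our assumptions, as are the default cooling parameters. Candidates are evaluated in the original space through the norm mapping of Eq. (20.13), and the sketch keeps the best solution seen.

```python
import math
import random
from typing import Callable, List, Tuple

QSolution = List[List[float]]  # D quaternions, each stored as [a0, a1, a2, a3]


def norm_map(q: QSolution) -> List[float]:
    """Eq. (20.13): map every quaternion to its norm."""
    return [math.sqrt(sum(c * c for c in qi)) for qi in q]


def quaternion_move(q: QSolution, step: float, rng: random.Random) -> QSolution:
    """Hypothetical neighbour: move along a random unit direction in R^(4D)."""
    d = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in q]
    scale = math.sqrt(sum(c * c for di in d for c in di))
    lam = rng.uniform(-step, step)
    return [[c + lam * dc / scale for c, dc in zip(qi, di)] for qi, di in zip(q, d)]


def quaternion_sa(
    f: Callable[[List[float]], float],
    dim: int,
    t0: float = 1000.0,
    t_final: float = 1e-3,
    alpha: float = 0.9,
    l_max: int = 100,
    step: float = 0.5,
    seed: int = 0,
) -> Tuple[QSolution, float]:
    rng = random.Random(seed)
    q_cur = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(dim)]  # Eq. (20.14)
    f_cur = f(norm_map(q_cur))
    q_best, f_best = q_cur, f_cur
    t = t0
    while t > t_final:
        for _ in range(l_max):
            q_new = quaternion_move(q_cur, step, rng)
            f_new = f(norm_map(q_new))          # evaluate in the original real space
            delta = f_new - f_cur
            if delta <= 0 or rng.random() < math.exp(-delta / t):   # Metropolis
                q_cur, f_cur = q_new, f_new
                if f_cur < f_best:
                    q_best, f_best = q_cur, f_cur
        t *= alpha                              # geometric cooling
    return q_best, f_best


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    q_best, f_best = quaternion_sa(sphere, dim=5)
    print(norm_map(q_best), f_best)
```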

20.4.1 Benchmark Functions

To assess the performance of our approach and how it improves solution quality, we selected six n-dimensional functions from the literature [38] (see Table 20.1). They are divided into unimodal and multimodal functions; the multimodal ones have multiple local minima scattered throughout the search space and therefore test the algorithm's ability to escape from local minima. The n-dimensional functions can also be categorized as separable or non-separable; the latter are the most difficult to optimize because of the interdependencies between variables.
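Table 20.1 lists the benchmarks without formulas. The sketch below uses the standard textbook definitions of five of them, which is an assumption on our part since the chapter does not state which variants [38] provides; the Xin-She Yang function is omitted because several different functions carry that name. Each function returns 0 at its known minimizer.

```python
import math
from typing import Sequence


def rastrigin(x: Sequence[float]) -> float:
    """Separable, multimodal; minimum 0 at x = 0."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def griewank(x: Sequence[float]) -> float:
    """Non-separable (product term), multimodal; minimum 0 at x = 0."""
    s = sum(v * v for v in x) / 4000.0
    p = 1.0
    for i, v in enumerate(x, start=1):
        p *= math.cos(v / math.sqrt(i))
    return 1.0 + s - p


def rosenbrock(x: Sequence[float]) -> float:
    """Non-separable, unimodal valley; minimum 0 at x = (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2 for i in range(len(x) - 1))


def levy(x: Sequence[float]) -> float:
    """Multimodal; minimum 0 at x = (1, ..., 1)."""
    w = [1.0 + (v - 1.0) / 4.0 for v in x]
    head = math.sin(math.pi * w[0]) ** 2
    body = sum((wi - 1.0) ** 2 * (1.0 + 10.0 * math.sin(math.pi * wi + 1.0) ** 2) for wi in w[:-1])
    tail = (w[-1] - 1.0) ** 2 * (1.0 + math.sin(2.0 * math.pi * w[-1]) ** 2)
    return head + body + tail


def salomon(x: Sequence[float]) -> float:
    """Non-separable, multimodal (ripples around the origin); minimum 0 at x = 0."""
    r = math.sqrt(sum(v * v for v in x))
    return 1.0 - math.cos(2.0 * math.pi * r) + 0.1 * r
```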

Table 20.1 The benchmark functions

No  Name          Range                Minimum  Characteristics
1   Rastrigin     x_i ∈ [−5.12, 5.12]  0        separable, multimodal
2   Griewank      x_i ∈ [−600, 600]    0        non-separable, multimodal
3   Rosenbrock    x_i ∈ [−5, 10]       0        non-separable, unimodal
4   Levy          x_i ∈ [−10, 10]      0        non-separable, multimodal
5   Xin-She Yang  x_i ∈ [−5, 5]        0        separable, multimodal
6   Salomon       x_i ∈ [−100, 100]    0        non-separable, multimodal

20.4.2 Comparison of the Convergence Speed

To assess convergence speed, experiments were conducted on the benchmarks with dimension D = 10 over 10,000 neighborhood generations; the cost values are reported in Fig. 20.1. For the non-separable functions (f_2, f_3, f_4), the figures show that classical SA rapidly gets stuck in local minima and converges slowly in the high-dimensional search space. In contrast, Q-SA shows a good rate of convergence toward the global minimum, especially for non-separable functions. For the separable function (f_1), Q-SA reaches better solutions than the original SA in the early stages. SA with the quaternion representation converges in a few generations and is able to fine-tune the solution around the local minimum. As Fig. 20.1 shows, Q-SA converges to much better solutions, and much faster, in the later stages of the algorithm. The two algorithms behave differently in the early stages, however: for some functions, classical SA converges faster than Q-SA at first. For example, SA is faster than Q-SA in the initial stage on Rastrigin and Griewank, whereas on all other functions Q-SA greatly exceeds SA from the start. In subsequent stages, SA searches more locally than Q-SA, which is why it quickly falls into a local optimum. We conclude that the proposed Q-SA achieves higher precision and tends to find the global optimum faster than SA on these benchmark functions.

Fig. 20.1 Convergence curves of the Q-SA and the SA on the benchmark functions: (a) Rastrigin, (b) Griewank, (c) Rosenbrock, (d) Levy

20.4.3 Performance Comparison with Other Optimization Algorithms

This subsection compares the results of Q-SA with other well-known optimization algorithms: particle swarm optimization (PSO) [33], the genetic algorithm (GA) [34], the artificial bee colony algorithm (ABC) [35], the bat algorithm (BA) [36] and generalized simulated annealing (Gen-SA) [37]. This experiment also measures the effect of the quaternion representation on classical SA. The results were tested using the Friedman statistical test. These experiments were conducted using LibOPT, an open-source library developed in C that implements PSO, GA, ABC and BA [39]. For dimension D = 50, we set the population size to PS = 100, as reported by [25]. The maximal number of generations (MaxGen) was calculated using the following formula [25]:

$$MaxGen = \frac{5000 \times D}{PS} \qquad (20.15)$$

Therefore, MaxGen = 2500 was used in this study for PSO, GA, ABC and BA. In addition, a specific set of parameters was used for each optimization algorithm; the parameters of GA, BA and ABC were the same as those used by [25].

• Q-SA and SA parameters: T_0 = 1000, 100 inner iterations and 100 outer iterations.
• Gen-SA parameters, selected as in [37]: T_0 = 5230, visiting parameter q_v = 2.62, acceptance parameter q_a = −5.0, and Markov chain length L_max = 2 × dimension.
• GA parameters: crossover rate CR = 0.9, probability of diversity p = 0.15, and number of individuals stored in the archive pool m = 40.


• BA parameters: loudness A_0 = 0.5, pulse rate r_0 = 0.5, minimum frequency Q_min = 0.0, and maximum frequency Q_max = 0.1.
• ABC parameters: 50 employed bees, 50 onlooker bees, and a limit of 100 cycles during which a food source may fail to improve.
• PSO parameters, based on [40]: acceleration constants c_1 = c_2 = 2, inertia weight w = 0.7, minimal inertia weight w_min = 0.4, and maximal inertia weight w_max = 0.9.

The numerical results of each algorithm on the six benchmark functions (the best, worst, mean and median values and the corresponding standard deviations) are presented in Table 20.2. As Table 20.2 shows, Q-SA solves the Rosenbrock and Levy problems efficiently and shows acceptable performance on Rastrigin and Griewank. Table 20.2 also shows that, for D = 50, ABC clearly outperforms the other algorithms (Q-SA, SA, Gen-SA, PSO, GA and BA) on the Rastrigin and Salomon functions.

20.4.4 Statistical Test

The significance of the results was evaluated using Friedman's test [41], the non-parametric equivalent of the parametric repeated-measures ANOVA. The hypotheses of the Friedman test are formulated as follows:

– H_0: the rankings of the metaheuristics within each problem are similar (i.e., there is no difference between them), so that, for instance, the population medians are equal:

$$H_0: \mu_1 = \dots = \mu_k \qquad (20.16)$$

– H_1: at least one of the metaheuristics performs differently from at least one of the other metaheuristics:

$$H_1: \mu_1, \dots, \mu_k \text{ not all equal} \qquad (20.17)$$

In addition, we rank the results of the metaheuristics on each benchmark function, giving rank 1 to the best algorithm and rank 7 to the worst. Let r(p_ij) be the rank of the j-th of k algorithms on the i-th of N benchmark functions, where k = 7 and N = 6 in our experiment. The average ranks of the algorithms were then computed as

$$R_j = \frac{1}{N}\sum_{i=1}^{N} r(p_{ij}), \qquad j \in \{1, \dots, 7\},$$

and are shown in Table 20.3. The average ranks by themselves give a useful performance comparison. As Table 20.3 shows, Q-SA ranks first with an average rank of 1.83, followed by Gen-SA with an average rank of 2.50; ABC, PSO, BA and SA rank third, fourth, fifth and sixth, respectively. GA has the worst performance of all algorithms.

Table 20.2 Performance comparison of Q-SA with other optimization algorithms for dimension D = 50

Function      Measure  Q-SA      SA        Gen-SA    PSO       GA        ABC       BA
Rastrigin     Best     3.06E-04  3.51E+02  3.38E+01  6.45E+01  6.88E+02  0.00E+00  4.08E+02
              Worst    1.29E-01  5.72E+02  8.56E+01  1.96E+02  7.93E+02  0.00E+00  1.07E+03
              Mean     1.41E-02  4.75E+02  5.39E+01  1.35E+02  7.40E+02  0.00E+00  7.81E+02
              Stdev    2.52E-02  5.65E+01  1.28E+01  2.91E+01  2.49E+01  0.00E+00  2.02E+02
              Median   4.82E-03  4.71E+02  5.37E+01  1.39E+02  7.41E+02  0.00E+00  8.19E+02
Griewank      Best     7.92E-07  2.39E+01  3.31E-12  8.07E-01  2.57E+01  0.00E+00  2.44E-01
              Worst    6.34E-03  3.31E+01  4.13E-11  1.06E+00  3.21E+01  1.34E+01  2.60E+01
              Mean     9.05E-04  2.96E+01  1.55E-11  9.55E-01  2.90E+01  7.02E+00  6.80E+00
              Stdev    1.46E-03  2.14E+00  9.84E-12  6.17E-02  1.69E+00  5.67E+00  6.62E+00
              Median   2.05E-04  2.97E+01  1.28E-11  9.58E-01  2.91E+01  1.02E+01  5.52E+00
Rosenbrock    Best     1.77E-04  3.43E+08  1.98E-13  4.00E+02  1.83E+06  4.87E+01  9.26E+01
              Worst    4.29E+00  4.97E+08  8.97E+01  3.84E+03  3.59E+06  1.10E+06  9.42E+05
              Mean     2.57E-01  4.12E+08  2.21E+01  1.87E+03  2.69E+06  3.56E+05  3.27E+04
              Stdev    7.88E-01  4.29E+07  2.59E+01  1.04E+03  5.30E+05  4.33E+05  1.75E+05
              Median   6.24E-02  4.12E+08  1.44E+01  1.63E+03  2.77E+06  4.90E+01  1.53E+02
Levy          Best     5.95E-07  2.19E+02  1.35E+01  5.73E+00  3.17E+02  7.93E+00  9.30E+01
              Worst    4.55E-04  3.36E+02  1.18E+02  4.11E+01  4.81E+02  9.56E+00  9.43E+05
              Mean     9.03E-05  2.79E+02  4.81E+01  1.55E+01  4.13E+02  8.83E+00  1.30E+05
              Stdev    1.23E-04  3.03E+01  2.02E+01  7.52E+00  4.16E+01  4.00E-01  3.31E+05
              Median   4.13E-05  2.79E+02  4.38E+01  1.46E+01  4.13E+02  8.95E+00  1.53E+02
Xin-She Yang  Best     3.83E-08  2.40E+11  0.00E+00  1.61E-10  9.86E+18  0.00E+00  4.79E+16
              Worst    1.41E-05  2.24E+19  0.00E+00  1.10E+01  1.70E+25  0.00E+00  9.36E+37
              Mean     2.60E-06  1.35E+18  0.00E+00  5.52E-01  1.16E+24  0.00E+00  4.26E+36
              Stdev    2.79E-06  4.20E+18  0.00E+00  2.07E+00  3.25E+24  0.00E+00  1.81E+37
              Median   1.79E-06  4.28E+16  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
Salomon       Best     4.91E-04  2.92E+01  7.20E+00  2.00E-01  3.22E+01  0.00E+00  1.72E+01
              Worst    2.07E-01  3.61E+01  2.22E+01  3.50E+00  3.64E+01  0.00E+00  2.67E+01
              Mean     9.56E-02  3.39E+01  1.73E+01  2.02E+00  3.45E+01  0.00E+00  2.20E+01
              Stdev    7.30E-02  1.29E+00  2.66E+00  7.56E-01  9.67E-01  0.00E+00  2.20E+00
              Median   1.01E-01  3.42E+01  1.77E+01  2.20E+00  3.43E+01  0.00E+00  2.18E+01

Table 20.3 The rank of each algorithm on each benchmark function and their average rank

Function            Q-SA  SA    Gen-SA  PSO   GA    ABC   BA
Rastrigin           2     5     3       4     6     1     7
Griewank            2     7     1       3     6     5     4
Rosenbrock          1     7     2       3     6     5     4
Levy                1     5     4       3     6     2     7
Xin-She Yang        3     5     1       4     6     1     7
Salomon             2     6     4       3     7     1     5
Average rank (R_j)  1.83  5.83  2.50    3.33  6.17  2.50  5.67

The Friedman statistic is calculated by the following formula:

$$\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_{j=1}^{7} R_j^2 - \frac{k(k+1)^2}{4}\right] = 24.60 \qquad (20.18)$$

Then we calculate the Iman and Davenport [42] statistic F_F to overcome the conservative behavior of the Friedman statistic χ²_F:

$$F_F = \frac{(N-1)\,\chi_F^2}{N(k-1) - \chi_F^2} = 10.80 \qquad (20.19)$$

where the F_F statistic is distributed according to the F-distribution with k − 1 = 6 and (k − 1)(N − 1) = 30 degrees of freedom. Since F_F = 10.80 is greater than the critical value F(6, 30) ≈ 2.42 at the significance level α = 0.05, we reject the null hypothesis and conclude that the performances of the algorithms differ significantly. We can therefore proceed with a post hoc test to determine whether algorithms i and j differ, using the Holm–Bonferroni method [44]. First, the p-values p_1, p_2, ..., p_{k−1} associated with the hypotheses H_1, H_2, ..., H_{k−1} are sorted in ascending order. The Holm–Bonferroni procedure rejects the null hypotheses H_1 to H_{j−1}, where j is the smallest integer such that p_j > α/(k − j), and α is the significance level, equal to 0.05 in this study. Using the R_j values computed for the Friedman test (Table 20.3), Q-SA is taken as the reference algorithm: R_1 denotes the rank of Q-SA and R_j, j = 2, ..., 7, the ranks of the remaining algorithms. To calculate the p-value p_j for each pair of algorithms, we first compute z_j as

$$z_j = \frac{R_1 - R_j}{\sqrt{\dfrac{k(k+1)}{6N}}} \qquad (20.20)$$

The probability values p_j are then obtained from the standard normal distribution N(0, 1) and compared with 0.05/(7 − j); the results of the Holm–Bonferroni procedure are given in Table 20.4. The null hypothesis is rejected when Q-SA is compared to GA, SA and BA; in other words, Q-SA statistically outperforms GA, SA and BA. However, the null hypothesis cannot be rejected when Q-SA is compared to PSO, ABC and Gen-SA, meaning that the performance of Q-SA is statistically indistinguishable from that of PSO, ABC and Gen-SA on the selected benchmark functions. These results suggest that the quaternion representation not only improves the classical version of simulated annealing but may also be an alternative to population-based algorithms such as BA, PSO and ABC for large-scale problems.
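The statistics of this subsection can be recomputed from the ranks in Table 20.3 with a short, dependency-free Python sketch. The function names are ours. The chapter does not state whether the p-values of Table 20.4 are one- or two-sided, so this sketch assumes two-sided normal tail probabilities (computed with `math.erfc`); its individual p-values may therefore differ from the tabulated ones.

```python
import math
from typing import Dict, List, Tuple

ALGORITHMS = ["Q-SA", "SA", "Gen-SA", "PSO", "GA", "ABC", "BA"]
# Ranks from Table 20.3: one row per benchmark function, one column per algorithm.
RANKS = [
    [2, 5, 3, 4, 6, 1, 7],  # Rastrigin
    [2, 7, 1, 3, 6, 5, 4],  # Griewank
    [1, 7, 2, 3, 6, 5, 4],  # Rosenbrock
    [1, 5, 4, 3, 6, 2, 7],  # Levy
    [3, 5, 1, 4, 6, 1, 7],  # Xin-She Yang
    [2, 6, 4, 3, 7, 1, 5],  # Salomon
]


def average_ranks(ranks: List[List[int]]) -> List[float]:
    n = len(ranks)
    return [sum(row[j] for row in ranks) / n for j in range(len(ranks[0]))]


def friedman_chi2(avg: List[float], n: int) -> float:
    """Eq. (20.18)."""
    k = len(avg)
    return 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0)


def iman_davenport(chi2: float, n: int, k: int) -> float:
    """Eq. (20.19): F-distributed with k-1 and (k-1)(n-1) degrees of freedom."""
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


def holm_vs_reference(avg: List[float], n: int, ref: int = 0,
                      alpha: float = 0.05) -> Dict[str, Tuple[float, float, bool]]:
    """Eq. (20.20) z-scores, two-sided normal p-values, then Holm's step-down test."""
    k = len(avg)
    se = math.sqrt(k * (k + 1) / (6.0 * n))
    others = [m for m in range(k) if m != ref]
    z = {m: (avg[ref] - avg[m]) / se for m in others}
    p = {m: math.erfc(abs(z[m]) / math.sqrt(2.0)) for m in others}
    result, still_rejecting = {}, True
    for step, m in enumerate(sorted(others, key=lambda idx: p[idx]), start=1):
        threshold = alpha / (k - step)            # alpha / (k - j) in Table 20.4
        still_rejecting = still_rejecting and p[m] <= threshold
        result[ALGORITHMS[m]] = (z[m], p[m], still_rejecting)
    return result


if __name__ == "__main__":
    n, k = len(RANKS), len(ALGORITHMS)
    avg = average_ranks(RANKS)
    chi2 = friedman_chi2(avg, n)
    print(f"chi2_F = {chi2:.2f}, F_F = {iman_davenport(chi2, n, k):.2f}")
    for name, (z, p, rejected) in holm_vs_reference(avg, n).items():
        print(f"Q-SA vs {name}: z = {z:.3f}, p = {p:.2e}, H0 rejected: {rejected}")
```

With the ranks of Table 20.3, this sketch gives χ²_F ≈ 24.6 and F_F ≈ 10.80, and it rejects H_0 for GA, SA and BA while retaining it for PSO, ABC and Gen-SA.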


Table 20.4 Results of the Holm–Bonferroni procedure (k = 7)

j  Q-SA vs  z_j     p_j       α/(k − j)  Hypothesis
1  GA       −3.474  4.49E-07  8.33E-03   Rejected
2  SA       −3.207  1.14E-04  1.00E-02   Rejected
3  BA       −3.073  2.36E-04  1.25E-02   Rejected
4  PSO      −1.202  5.41E-02  1.67E-02   Accepted
5  ABC      −0.534  2.25E-01  2.50E-02   Accepted
6  Gen-SA   −0.532  3.53E-01  5.00E-02   Accepted

20.5 Conclusion

This article introduces a novel approach that enhances simulated annealing through a quaternion representation of the neighborhood structure. Research on quaternions for solving optimization problems is still very limited. This work proposes a quaternion representation of the neighborhood structure in the simulated annealing algorithm that associates each 1-dimensional real-valued scalar with a 4-dimensional quaternion. Although the quaternion representation enlarges the search space, exploration becomes more effective. This study shows that problems such as premature convergence or stagnation arising in large-scale unconstrained optimization can be reduced or even avoided using quaternions. Numerical results show that our approach significantly improves solution quality on large-scale problems compared to classical simulated annealing. In addition, Q-SA is competitive with other optimization algorithms. Further research should apply quaternion algebra to other local search algorithms. Furthermore, Q-SA appears promising for real-world optimization problems.

References

1. Y. Xin, Dynamic neighbourhood size in simulated annealing, in Proceedings of the IEEE Symposium on Foundations of Computational Intelligence (2017)
2. M. Lalaoui, A. El Afia, A versatile generalized simulated annealing using type-2 fuzzy controller for the mixed-model assembly line balancing problem. IFAC-PapersOnLine 52(13), 2804–2809 (2019). https://doi.org/10.1016/j.ifacol.2019.11.633
3. A. El Afia, M. Lalaoui, R. Chiheb, A self-controlled simulated annealing algorithm using hidden markov model state classification. Proc. Comput. Sci. 148, 512–521 (2019). https://doi.org/10.1016/j.procs.2019.01.024
4. M. Lalaoui, A. El Afia, A Fuzzy generalized simulated annealing for a simple assembly line balancing problem. IFAC-PapersOnLine 51(32), 600–605 (2018). https://doi.org/10.1016/j.ifacol.2018.11.489
5. M. Lalaoui, A. El Afia, R. Chiheb, A self-tuned simulated annealing algorithm using hidden Markov model. Int. J. Electr. Comput. Eng. (IJECE) 8(1), 291–298 (2017). https://doi.org/10.11591/ijece.v8i1.pp291-298


6. M. Lalaoui, A. El Afia, R. Chiheb, A self-tuned simulated annealing algorithm using hidden Markov model, in the International Conference on Learning and Optimization Algorithms: Theory and Application (LOPAL’2018) (2018). https://doi.org/10.1145/3230905.3230963
7. A. El Afia, M. Lalaoui, R. Chiheb, Fuzzy logic controller for an adaptive Huang cooling of simulated annealing, in The 2nd International Conference on Big Data, Cloud and Applications (CloudTech’17) IEEE Conference (2017). https://doi.org/10.1145/3090354.3090420
8. M. Lalaoui, A. El Afia, R. Chiheb, A self-adaptive very fast simulated annealing based on hidden Markov model, in The 3rd International Conference on Cloud Computing Technologies and Applications, ACM Conference (2017). https://doi.org/10.1109/CloudTech.2017.8284698
9. M. Lalaoui, A. El Afia, R. Chiheb, Hidden Markov model for a self-learning of simulated annealing cooling law, in The 5th International Conference on Multimedia Computing and Systems IEEE Conference, ICMCS’16. https://doi.org/10.1109/ICMCS.2016.7905557
10. S. Bouzbita, A. El Afia, R. Faizi, A novel based Hidden Markov model approach for controlling the ACS-TSP evaporation parameter, in The 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 633–638 (2016). https://doi.org/10.1109/ICMCS.2016.7905544
11. S. Bouzbita, A. El Afia, R. Faizi, M. Zbakh, Dynamic adaptation of the ACS-TSP local pheromone decay parameter based on the Hidden Markov model, in The 2nd International Conference on Cloud Computing Technologies and Applications (CloudTech), pp. 344–349 (2016). https://doi.org/10.1109/CloudTech.2016.7847719
12. A. El Afia, S. Bouzbita, R. Faizi, The effect of updating the local pheromone on acs performance using fuzzy logic. Int. J. Electr. Comput. Eng. 7(4), 2161–2168 (2017). https://doi.org/10.11591/ijece.v7i3.pp2161-2168
13. S. Bouzbita, A. El Afia, R. Faizi, Hidden Markov model classifier for the adaptive ACS-TSP Pheromone parameters, in Bioinspired Heuristics for Optimization, vol. 774 (Springer, Berlin, 2018), p. 153. https://doi.org/10.1007/978-3-319-95104-1_10
14. S. Bouzbita, A. El Afia, R. Faizi, Parameter adaptation for ant colony system algorithm using hidden markov model for tsp problems, in The Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (ACM, New York, 2018), p. 6. https://doi.org/10.1145/3230905.3230962
15. S. Bouzbita, A. El Afia, R. Faizi, Adjusting population size of ant colony system using fuzzy logic controller, in The International Conference on Computational Collective Intelligence, vol. 11684 (Springer, Berlin, 2019), pp. 309–320. https://doi.org/10.1007/978-3-030-28374-2_27
16. A. El Afia, M. Sarhani, O. Aoun, A probabilistic finite state machine design of particle swarm optimization, in Bioinspired Heuristics for Optimization (Springer, Cham, 2019), pp. 185–201. https://doi.org/10.1007/978-3-319-95104-1_12
17. A. El Afia, O. Aoun, S. Garcia, Adaptive cooperation of multi-swarm particle swarm optimizer-based hidden Markov model. Prog. Artif. Intell. 8, 441–452 (2019). https://doi.org/10.1007/s13748-019-00183-1
18. O. Aoun, M. Sarhani, A. El Afia, Hidden Markov model classifier for the adaptive particle swarm optimization, in Recent Developments in Metaheuristics (Springer International Publishing, Cham, 2018), pp. 1–15. https://doi.org/10.1007/978-3-319-58253-5_1
19. O. Aoun, M. Sarhani, A. El Afia, Particle swarm optimisation with population size and acceleration coefficients adaptation using hidden Markov model state classification, in International Journal of Metaheuristics, vol. 7(1) (Inderscience Publishers (IEL), Geneva, 2018), pp. 1–29. https://doi.org/10.1504/IJMHEUR.2018.091867
20. O. Aoun, A. El Afia, S. Garcia, Self inertia weight adaptation for the particle swarm optimization, in Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (ACM, New York, 2018), pp. 8:1–8:6. https://doi.org/10.1145/3230905.3230964
21. A. El Afia, M. Sarhani, O. Aoun, Hidden Markov model control of inertia weight adaptation for Particle swarm optimization. IFAC-PapersOnLine 50(1), 9997–10002 (2017). https://doi.org/10.1016/j.ifacol.2017.08.2030


A. El Afia et al.

22. O. Aoun, M. Sarhani, A. El Afia, Investigation of hidden Markov model for the tuning of metaheuristics in airline scheduling problems. IFAC-PapersOnLine 49(3), 347–352 (2016). https://doi.org/10.1016/j.ifacol.2016.07.058
23. A. Corana, M. Marchesi, C. Martini, Simulated annealing with adaptive neighborhood using fuzzy logic controller. ACM Trans. Math. Softw. 13(3), 262–280 (1987)
24. M. Miki, T. Hiroyasu, K. Ono, Simulated annealing with advanced adaptive neighborhood, in The Second International Workshop on Intelligent Systems Design and Application, Atlanta, GA, USA (2002), pp. 113–118
25. I. Fister, X.S. Yang, J. Brest, Modified firefly algorithm using quaternion representation. Expert Syst. Appl. 40, 7220–7230 (2013)
26. J. Papa, D. Pereira, A. Baldassin, X.S. Yang, On the Harmony Search Using Quaternions, in IAPR Workshop on Artificial Neural Networks in Pattern Recognition, pp. 126–137
27. I. Fister, J. Brest, Modified bat algorithm with quaternion representation, in Evolutionary Computation (CEC), IEEE Congress, Sendai, Japan (2015)
28. T.T. Khuat, M.H. Le, A genetic algorithm with multi-parent crossover using quaternion representation for numerical function optimization. Appl. Intell. 46(4), 810–826 (2017)
29. R.L. Smith, Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions. Operat. Res. 32, 1296–1308 (1984)
30. D.E. Knuth, The Art of Computer Programming, vol. 2 (Addison-Wesley, Boston, 1969)
31. W.R. Hamilton, Lectures on Quaternions (Royal Irish Academy, Dublin, 1853)
32. D. Eberly, Quaternion algebra and calculus. Geometric Tools, LLC (1999)
33. J. Kennedy, R.C. Eberhart, Y. Shi, Swarm Intelligence, 1st edn. (Morgan Kaufmann, Burlington, 2001)
34. J. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, Cambridge, 1992)
35. D. Karaboga, B. Basturk, A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J. Global Optim. 39, 459–471 (2007)
36. X.S. Yang, A.H. Gandomi, Bat algorithm: a novel approach for global engineering optimization. J. Eng. Comput. 29(5), 464–483 (2012)
37. C. Tsallis, D.A. Stariolo, Generalized simulated annealing. Phys. A: Stat. Mech. Appl. 233, 395–406 (1996)
38. M. Jamil, X.S. Yang, A literature survey of benchmark functions for global optimization problems. Int. J. Math. Modell. Numer. Optim. 4(2), 150–194 (2013)
39. J.P. Papa, G.H. Rosa, D. Rodrigues, X.S. Yang, Libopt: An Open-Source Platform for Fast Prototyping Soft Optimization Techniques, in Conference’17, Washington, DC, USA (2017)
40. F. Marini, B. Walczak, Particle swarm optimization (PSO). A tutorial. Chemom. Intell. Lab. Syst. 149(Part B), 153–165 (2015)
41. M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
42. R.L. Iman, J.M. Davenport, Approximations of the critical region of the Friedman statistic. J. Commun. Stat. A Theory Meth. 9(6), 571–595 (1980)
43. D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, 3rd edn. (Chapman & Hall/CRC, Boca Raton, 2004)
44. S. Holm, A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70 (1979)

Chapter 21

A Cooperative Multi-swarm Particle Swarm Optimizer Based Hidden Markov Model

Oussama Aoun, Abdellatif El Afia, and El-Ghazali Talbi

Abstract Particle swarm optimization (PSO) is a population-based stochastic metaheuristic that has been successful on a wide range of optimization problems. Many PSO variants have been developed to boost its optimization capabilities, in particular to cope with more complex problems. In this paper, we propose a new multi-population particle swarm optimizer with a cooperation strategy. The proposed algorithm splits the PSO population into four sub-swarms and assigns a main role to each one. At the individual level, a machine learning technique allows each particle to determine its suitable sub-swarm membership at each iteration. At the collective level, cooperation rules between swarms, based on a Master/Slave scheme, ensure more diversity and lead to better solutions. Several simulations on a set of benchmark functions compare the performance of this approach with a range of state-of-the-art PSO variants. The experiments show that the proposed method is computationally efficient and performs distinctly well compared with these variants.

O. Aoun (B)
ENSEM - Hassan II University, Casablanca, Morocco
e-mail: [email protected]
A. El Afia
ENSIAS - Institute of Computer Science, Mohammed V University, Rabat, Morocco
e-mail: [email protected]
E.-G. Talbi
Polytech’Lille - University of Lille, Lille, France
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_21

O. Aoun et al.

21.1 Introduction

A number of cooperative approaches have emerged to improve swarm intelligence optimizers. In particular, various efforts have been made to improve the collective search behavior of swarm intelligence algorithms such as particle swarm optimization [29] and ant colony optimization [19]. The main purpose of these efforts is to induce complex global behaviors through local interactions, by sharing information between agents and improving their learning capacity. Communicating with other agents may also help them adapt to unexpected variations (dynamic optimization).

For particle swarm optimization (PSO), one challenging issue is to formalize the design of such cooperative multi-swarm behavior of agents (called particles in PSO). Its main advantage is to enhance the diversity of the algorithm or to achieve a trade-off between exploration and exploitation. A commonly used form of cooperative PSO relies on multiple swarms (multi-populations): the whole search space is divided into local subspaces, each of which may cover one or a small number of local optima, and each subspace is then searched separately. Another way to define cooperative PSO is to assign different roles to particles: different particles can play different roles, and each particle can change its role during the search. A challenging task in this variant is deciding which role each particle should assume.

In this paper, the hidden Markov model (HMM), a machine learning algorithm, is applied at the individual level (each particle) to model how each particle decides which sub-swarm it belongs to. That is, the HMM learns and predicts the most probable swarm for each particle in order to control particle behavior in PSO. This is done with the Viterbi algorithm, which gives the most likely path of states for each particle at each PSO iteration. Each sub-swarm is associated with a role: exploration, exploitation, convergence, or jumping out. Then, at the collective level of the swarm, a cooperative design guides the search toward different promising sub-regions. A Master/Slave scheme is chosen to set the cooperation rules between sub-swarms.

The rest of the paper is organized as follows: Sect. 21.2 outlines related work, Sect. 21.3 presents our multi-swarm approach, and Sect. 21.4 presents the experimental results. Finally, we conclude and give perspectives for future work.

21.2 Literature Review

In recent years, there has been increased interest in using learning methods inside PSO to control its behavior and thereby improve its performance. Various methods have been proposed to control PSO behavior and to improve the learning ability of particles. Two kinds of control approaches used inside PSO can be distinguished in the literature. In the first, the control depends on the iteration, so the whole swarm follows the same strategy (as in our previous work [4]). In the second, the control depends on the particle itself: at each iteration, particles are grouped into sub-swarms, and the particles of each sub-swarm have a specific role in the swarm. A summary of the PSO variant types found in the literature is given in Table 21.1.

21 A Cooperative Multi-swarm Particle Swarm Optimizer …

Table 21.1 PSO variants classification

PSO variant type | Examples from literature
Tuning the control parameters | [4, 51]
Changing the neighborhood topology | [36, 37]
Hybridizing | [6, 26]
Multi-swarm design | [35, 52]

The multi-swarm design type of PSO variant is the focus of this paper, so we review below the papers that address this issue. First, [34] proposed four operators that play roles similar to the four states defined by the adaptive PSO of [51]: exploration, exploitation, convergence, and jumping out. Their approach assigns each particle one of these operators according to its rewards. Moreover, [52] defined a cooperative PSO that divides the population into four sub-swarms according to the four states. This type of control is related to the concept of cooperative swarms introduced by [47], which uses multiple swarms to optimize different components of the solution vector cooperatively. This multi-swarm variant of PSO has been presented in the literature as a distinct algorithm known as multi-swarm optimization [10], in which the authors were motivated by the quantum model of atoms to define a quantum swarm. Another grouping methodology was proposed by [41]. A further cooperative PSO approach relies on clustering techniques, as proposed by [50]: particles are assigned to different promising sub-regions determined by a hierarchical clustering method. More generally, four main classes of approaches have been proposed to enhance PSO performance: configuration of the parameters (adaptive control), the neighborhood topology of particles in the swarm, hybridization with other optimization algorithms, and integration of learning strategies (diversity control). With respect to the two types of control mentioned at the beginning of this section, the first of these classes corresponds to the first type, while the last relates to the second.
Furthermore, control of the PSO parameters has been proposed in a number of papers to achieve a trade-off between diversity and convergence speed. It has generally been done with learning strategies such as comprehensive learning [36], in which each particle learns from another particle chosen according to a learning probability. Parameter setting is also an important task for iterative optimization algorithms [45] such as PSO. Parameter tuning can be done offline by choosing the best configuration for a particular optimization problem, as in [3]. In [5], an approach for adapting the swarm size is proposed. In [4], the acceleration coefficients are adapted, and in [1] the inertia weight is controlled. Heterogeneous control of the PSO parameters has been done in [2] and also in [22, 25]. Hybridization is a long-standing line of PSO research, and an example of improvement can be found in [46]. In addition, the hybridization approach has been applied to other metaheuristics such as ant colony optimization [11–15, 23] and simulated annealing [21, 24, 30–33].

The literature shows that many papers have drawn on approaches from multi-agent systems to define automated cooperative approaches. An example of using the multi-agent concept in PSO can be found in [18], where incremental social learning, often used to improve the scalability of systems composed of multiple learning agents, is used to improve the performance of PSO. Furthermore, [6] proposed a multi-agent approach that combines simulated annealing (SA) and PSO; this idea is related to the generic notion of hyper-heuristics, which consists of finding the most suitable configuration of heuristic algorithms. Monett Diaz [40] cited the many features obtained by using agents to configure metaheuristics: distributed execution, remote execution, cooperation, and autonomy. This issue (the interaction between swarm intelligence and multi-agent systems) has received much attention in the last few years, in particular with the popularization of swarm robotics. In particular, [9] argued that the concept of a swarm is nowadays closely associated with intelligent systems that carry out useful tasks, and qualitatively analyzed the impact of automation concepts on the definition of intelligent swarms. Moreover, [16] outlined the main characteristics of swarm robotics and analyzed the collective behavior of individuals in several fields, stating that finite state machines are one of the most widely used approaches to model this behavior. Another commonly used approach for this purpose is reinforcement learning. In particular, multi-agent concepts can help self-organize the particles of PSO with simple rules, as done by [8]. Their main idea was to define six states: cohesion, alignment, separation, seeking, clearance, and avoidance. A finite state machine is used for movement control, with states defined by a collection of velocity components and their behavior-specific parameters. The population is also divided into two swarms to introduce a divide-and-conquer concept using genetic operators. Another automation approach that can be used inside PSO is cellular automata (CA), which can, for instance, split the population of particles into different groups across the cells of the automaton. Shi et al. [44] integrated CA into the velocity update to modify the trajectories of particles. In terms of multi-swarm design of PSO, [35] proposed a multi-swarm and multi-best particle swarm optimization algorithm: particles are randomly split into multiple populations, and velocities and positions are updated with multiple gbest and pbest values rather than a single gbest and pbest. Liu et al. [39] proposed a variant known as Center PSO, which uses an extra particle, the center particle, to control the search direction of the entire swarm. Also, [42] built a multi-swarm cooperative particle swarm optimizer based on a master-slave model: the slave swarms behave as single PSOs, while the master swarm iterates based on its own knowledge as well as the knowledge of the slave swarms. In light of this literature on enhancing PSO performance and building more efficient variants of the algorithm, this paper proposes a new PSO variant based on a cooperative multi-swarm design with coefficient adaptation.


21.3 Cooperative Multi-swarm Conception of PSO

In this section, the standard PSO algorithm is defined together with its parameters. Then, we describe how sub-swarms are identified from the individual particle states given by the HMM. Each sub-swarm has its own configuration of the parameters of its particles. Cooperation rules are defined to ensure information exchange between sub-swarms during the search process.

21.3.1 Standard PSO

The standard PSO is a population-based metaheuristic first introduced by [29]. It starts with a population of random solutions; during the search, particles look for optima by moving through the search space. In PSO, particles are potential solutions in a D-dimensional search space, each with a velocity that is adjusted dynamically according to both individual and social experience. The velocity and position of every particle are therefore updated according to Eqs. (21.1) and (21.2):

$$v_i = w\, v_i + c_1 r_1 (pBest - x_i) + c_2 r_2 (gBest - x_i) \qquad (21.1)$$

$$x_i = x_i + v_i \qquad (21.2)$$

where $r_1$ and $r_2$ are two D-dimensional vectors of independent random numbers drawn uniformly from [0, 1], applied component-wise. pBest is the best position found so far by the particle and gBest is the global best of the entire swarm; in our multi-swarm case, several gBest positions are defined, one per sub-swarm. w is the inertia weight, and $c_1$ and $c_2$ are the acceleration coefficients. Equation (21.1) computes the new velocity of the particle, while Eq. (21.2) updates its position from its previous position and its new velocity. More details on these parameters can be found, for example, in [4]. In our approach, we focus especially on dividing the population into multiple populations, also called multi-swarms. Each swarm has its own characteristics and search behavior. The next subsection describes how the sub-swarms are formed and how each one is customized to a specific role in the search space.
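A minimal sketch of Eqs. (21.1) and (21.2) for a single particle, in plain Python with lists standing in for D-dimensional vectors. The function name `pso_step` is hypothetical, and its defaults simply reuse the settings reported in Sect. 21.4.1 (w = 0.9, c1 = c2 = 2); velocity clamping and boundary handling, which the chapter does not describe, are deliberately left out.

```python
import random


def pso_step(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0, rng=random):
    """Apply Eqs. (21.1)-(21.2) once to one particle.

    x, v, pbest, gbest are lists of length D. r1 and r2 are drawn
    independently for every dimension from U(0, 1).
    Returns the new (position, velocity) pair.
    """
    new_v = []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        new_v.append(w * v[d]
                     + c1 * r1 * (pbest[d] - x[d])    # individual (cognitive) term
                     + c2 * r2 * (gbest[d] - x[d]))   # social term
    new_x = [xd + vd for xd, vd in zip(x, new_v)]     # Eq. (21.2)
    return new_x, new_v


# Usage: a particle sitting on both its pBest and gBest keeps only its inertia.
x1, v1 = pso_step([1.0, 2.0], [2.0, -2.0], [1.0, 2.0], [1.0, 2.0],
                  w=0.5, rng=random.Random(0))
```

When x coincides with pBest and gBest, both attraction terms vanish and the velocity is just w times the old velocity, which is why the example yields v = [1.0, -1.0] and x = [2.0, 1.0].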

21.3.2 Sub-swarms Constitution

In this approach, the swarm is divided into sub-swarms to achieve a good trade-off between population diversity and convergence speed, and in particular a good management of exploration and exploitation during execution, so as to reach the best possible solution in the minimum number of iterations. Inspired by the definition of the PSO evolutionary states in [51], each sub-swarm groups the particles that are in one of four evolutionary states: exploration ($s_1$), exploitation ($s_2$), convergence ($s_3$), and jumping out ($s_4$). Each particle is thus viewed as a Markov chain with states $\{s_i\}_{i \in [1,4]}$. At a given iteration, a particle in state $s_i$ is a member of sub-swarm i. A particle can change its state from one iteration to the next and consequently change its sub-swarm; the arrows in Fig. 21.1 indicate the possible movements between sub-swarms.

Fig. 21.1 Sub-swarms and possible particle movements

To model the swarm associated with a particle, a Markov chain with states $\{s_i\}_{i \in [1,4]}$ is attached to each particle. However, the state of a particle cannot be perceived directly, only through some key parameters observed across iterations. Hence, a hidden Markov model is defined for each particle by a triple $\lambda = (\Pi, A, B)$, with all processes defined on a probability space $(\Omega, \mathcal{F}, P)$:

• $\Pi = (\pi_i)$: the vector of the initial probability distribution over states;
• $A = (a_{ij})$: the state transition matrix, $a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i)$, $i, j \in [1, N]$, where N is the number of states, t the iteration number, and $q_t$ the state at iteration t;
• $B = (b_{jk})$: the emission matrix, also called the confusion matrix, $b_{jk} = P(o_t = k \mid q_t = s_j)$, $j \in [1, N]$, $k \in [1, M]$, where M is the number of observations and $o_t$ the observation at iteration t.

The states $\{q_t\}_{t \in \mathbb{N}}$ take values in the set $S = \{s_i\}_{i \in [1,4]}$, which stand respectively for exploration, exploitation, convergence, and jumping out. The change of state follows the PSO sequence $(q_1 = s_2) \Rightarrow (q_2 = s_1) \Rightarrow (q_3 = s_2) \Rightarrow \dots$ deduced by [51], which corresponds to the Markov chain. We also define the corresponding initial transition probabilities $P(q_t = s_j \mid q_{t-1} = s_i)$, $i, j \in [1, 4]$, which govern all transitions between the states of the PSO search. We set the transition probability to 0.5 for every possible transition between states $s_i$ and $s_j$.


The initial state distribution corresponds to a deterministic start in the exploration state (Eq. (21.3)):

$$\Pi = (\pi_i) = [1\;\; 0\;\; 0\;\; 0] \qquad (21.3)$$

The observed parameter of this hidden chain is the evolutionary factor f of APSO (defined in [51]). We divide [0, 1] into seven subintervals, [0, 0.2[, [0.2, 0.3[, [0.3, 0.4[, [0.4, 0.6[, [0.6, 0.7[, [0.7, 0.8[ and [0.8, 1], and the observation set $Y = \{y_i\}_{i \in [1,7]}$ is the index of the subinterval to which f belongs. Let $\mathrm{sub}: [0, 1] \to \{1, 2, \dots, 7\}$ be the function that returns this index, which is also the observation:

$$\mathrm{sub}(f) = \delta_{[0,0.2[}(f) + 2\,\delta_{[0.2,0.3[}(f) + 3\,\delta_{[0.3,0.4[}(f) + 4\,\delta_{[0.4,0.6[}(f) + 5\,\delta_{[0.6,0.7[}(f) + 6\,\delta_{[0.7,0.8[}(f) + 7\,\delta_{[0.8,1]}(f) \qquad (21.4)$$

with, for any interval $I \subset [0, 1]$ and $x \in \mathbb{R}$,

$$\delta_I(x) = \begin{cases} 1, & x \in I \\ 0, & \text{otherwise} \end{cases} \qquad (21.5)$$
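Eq. (21.4) amounts to locating f among the subinterval edges. The sketch below assumes f has already been computed as in [51] and uses the standard-library `bisect` module; the name `sub` mirrors the text, and the edge list is read directly off the seven subintervals above.

```python
import bisect

# Right edges of the half-open subintervals [0,0.2[, [0.2,0.3[, ..., [0.7,0.8[.
# The last subinterval [0.8, 1] is closed on the right.
_EDGES = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]


def sub(f: float) -> int:
    """Map an evolutionary factor f in [0, 1] to its observation in {1, ..., 7}."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    # bisect_right counts the edges <= f, so a value equal to an edge
    # (e.g. f = 0.2) falls into the subinterval that starts there.
    return bisect.bisect_right(_EDGES, f) + 1


observations = [sub(f) for f in (0.0, 0.1, 0.2, 0.35, 0.5, 0.65, 0.75, 0.8, 1.0)]
```

Using `bisect_right` rather than `bisect_left` is what reproduces the half-open intervals of Eq. (21.4).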

The emission probabilities are deduced from the defuzzification process of [51], as in [4]. Rows correspond to the states $s_1$ to $s_4$ and columns to the observations 1 to 7:

$$B = \begin{bmatrix} 0 & 0 & 0 & 0.5 & 0.25 & 0.25 & 0 \\ 0 & 0.25 & 0.25 & 0.5 & 0 & 0 & 0 \\ 2/3 & 1/3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1/3 & 2/3 \end{bmatrix} \qquad (21.6)$$

Once the HMM parameters are initialized, the Baum–Welch algorithm [7] is used at each iteration to re-estimate the emission and transition matrices of the HMM; this makes the HMM more adaptive and more accurate for the classification step. Then, the Viterbi algorithm is used with the estimated parameters to find the most probable sequence of hidden states given a sequence of observations. By induction over t (the iteration number), it finds the state sequence $Q = q_1 q_2 \dots q_T$ that maximizes the probability of a given observation sequence $O = o_1 o_2 \dots o_T$, that is, the highest-probability path of states [43]. The Viterbi algorithm (Algorithm 1) therefore determines how each particle moves between sub-swarms. Indeed, an HMM can learn the states of our automaton from observations by maximum likelihood estimation [20]; this learning feature is used to control the particles across PSO iterations.
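The chapter applies Baum–Welch at every iteration without spelling out the update. The sketch below shows one standard scaled re-estimation step for a discrete HMM, fed with the emission matrix of Eq. (21.6). Assumptions made for illustration only: states are indexed 0 to 3 in the order exploration, exploitation, convergence, jumping out; observations are 0-based (observation k of Eq. (21.4) becomes k - 1); A is stored row-wise, A[i][j] = P(q_t = s_j | q_{t-1} = s_i); and the uniform transition matrix in the usage lines is hypothetical, not taken from the chapter.

```python
import math


def baum_welch_step(pi, A, B, obs):
    """One scaled Baum-Welch (EM) re-estimation of a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : P(q_t = s_j | q_{t-1} = s_i)  (row-stochastic)
    B[i][k] : P(o_t = k | q_t = s_i)        (row-stochastic)
    obs     : list of 0-based observation indices
    Returns (new_pi, new_A, new_B, log-likelihood of obs under the input model).
    """
    N, M, T = len(pi), len(B[0]), len(obs)

    # Forward pass, normalized at every step to avoid underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            raw = [pi[i] * B[i][obs[0]] for i in range(N)]
        else:
            raw = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                   for j in range(N)]
        c = sum(raw)
        if c == 0.0:
            raise ValueError(f"observation sequence has zero probability at t={t}")
        scale.append(c)
        alpha.append([a / c for a in raw])

    # Backward pass, reusing the forward scaling factors.
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   / scale[t + 1] for i in range(N)]

    # gamma[t][i] = P(q_t = s_i | O);  xi[t][i][j] = P(q_t = s_i, q_{t+1} = s_j | O).
    gamma = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(g)
        gamma.append([x / total for x in g])
    xi = []
    for t in range(T - 1):
        m = [[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
              for j in range(N)] for i in range(N)]
        total = sum(map(sum, m))
        xi.append([[x / total for x in row] for row in m])

    # M-step; rows of states that receive no posterior mass are left unchanged.
    new_A, new_B = [], []
    for i in range(N):
        den_a = sum(gamma[t][i] for t in range(T - 1))
        if den_a > 0:
            new_A.append([sum(xi[t][i][j] for t in range(T - 1)) / den_a for j in range(N)])
        else:
            new_A.append(A[i][:])
        den_b = sum(gamma[t][i] for t in range(T))
        if den_b > 0:
            new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / den_b
                          for k in range(M)])
        else:
            new_B.append(B[i][:])
    return gamma[0][:], new_A, new_B, sum(math.log(c) for c in scale)


# Usage: Pi of Eq. (21.3), B of Eq. (21.6), and a HYPOTHETICAL uniform A.
PI = [1.0, 0.0, 0.0, 0.0]
A0 = [[0.25] * 4 for _ in range(4)]
B0 = [[0, 0, 0, 0.5, 0.25, 0.25, 0],
      [0, 0.25, 0.25, 0.5, 0, 0, 0],
      [2 / 3, 1 / 3, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 1 / 3, 2 / 3]]
obs = [3, 3, 4, 1, 0, 0]          # observations 4, 4, 5, 2, 1, 1 of Eq. (21.4)
pi1, A1, B1, ll0 = baum_welch_step(PI, A0, B0, obs)
```

Because this is an EM step, feeding the returned parameters back in cannot lower the log-likelihood of the same observation sequence; that property is a convenient sanity check when wiring such an update into the PSO loop.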


Algorithm 1 Viterbi algorithm
Initialization: observations of length T, state graph of length N
Create a path probability matrix viterbi[N, T]
Create a path backpointer matrix backpointer[N, T]
for state s = 1 to N do
    viterbi[s, 1] ← π_s × b_s(o_1)
    backpointer[s, 1] ← 0
end for
for time step t = 2 to T do
    for state s = 1 to N do
        viterbi[s, t] ← max_{s' = 1..N} viterbi[s', t − 1] × a_{s',s} × b_s(o_t)
        backpointer[s, t] ← argmax_{s' = 1..N} viterbi[s', t − 1] × a_{s',s}
    end for
end for
return best path of states, obtained by following backpointer from argmax_s viterbi[s, T]: the classified current state
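Algorithm 1 can be turned into a runnable decoder. This sketch works in the log domain to avoid underflow over long iteration histories and maps log 0 to minus infinity so that impossible transitions or emissions are never selected. Conventions assumed here: states 0 to 3 are exploration, exploitation, convergence, jumping out; observations are 0-based (observation k of Eq. (21.4) becomes k - 1); A is row-wise with A[i][j] = P(q_t = s_j | q_{t-1} = s_i). The uniform transition matrix in the usage lines is hypothetical.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """log(p), with log(0) = -inf so impossible moves are never chosen."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(pi, A, B, obs):
    """Most likely hidden-state path of a discrete HMM, in the log domain.

    Returns (path, log-probability of that path).
    """
    N, T = len(pi), len(obs)
    # Initialization: viterbi[s, 1] = pi_s * b_s(o_1)
    score = [[_log(pi[s]) + _log(B[s][obs[0]]) for s in range(N)]]
    back = [[0] * N]
    for t in range(1, T):
        row, ptr = [], []
        for s in range(N):
            # Best predecessor s' of state s: argmax of viterbi[s', t-1] * a_{s', s}
            prev = max(range(N), key=lambda sp: score[t - 1][sp] + _log(A[sp][s]))
            ptr.append(prev)
            row.append(score[t - 1][prev] + _log(A[prev][s]) + _log(B[s][obs[t]]))
        score.append(row)
        back.append(ptr)
    # Termination, then backtracking along the stored pointers.
    last = max(range(N), key=lambda s: score[T - 1][s])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[T - 1][last]


PI = [1.0, 0.0, 0.0, 0.0]                      # Eq. (21.3)
A0 = [[0.25] * 4 for _ in range(4)]            # HYPOTHETICAL transition matrix
B0 = [[0, 0, 0, 0.5, 0.25, 0.25, 0],           # Eq. (21.6)
      [0, 0.25, 0.25, 0.5, 0, 0, 0],
      [2 / 3, 1 / 3, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 1 / 3, 2 / 3]]
path, logp = viterbi(PI, A0, B0, [3, 4, 1, 0, 6])  # observations 4, 5, 2, 1, 7
current_state = path[-1]  # state used to assign the particle to a sub-swarm
```

For this observation sequence the decoded path is exploration, exploration, convergence, convergence, jumping out, so the particle would end up in the jumping-out sub-swarm.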

21.3.3 Sub-swarms Parameters Adaptation

The PSO parameters are then adjusted according to the state associated with each sub-swarm, in particular the acceleration coefficients c1 and c2 and the inertia weight w, with elitist learning in the convergence sub-swarm [51]. This follows the APSO parameter update of [4, 51] (see Algorithm 2):

Algorithm 2 Adaptive acceleration update in APSO [51]
Initialization: positions and acceleration factors c1 and c2
if sub-swarm = exploration then
    increase c1 and decrease c2
else if sub-swarm = exploitation then
    increase c1 and slightly decrease c2
else if sub-swarm = jumping out then
    slightly increase c1 and increase c2
else if sub-swarm = convergence then
    decrease c1 and increase c2
end if
return c1 and c2: the updated acceleration factors
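Algorithm 2 only gives the direction of each change. The sketch below makes the rules concrete. The step size `DELTA`, the "slight" step of half that size, the clamping range [1.5, 2.5], and the cap c1 + c2 ≤ 4 are assumed values chosen for illustration, not settings reported in this chapter; the state names are hypothetical dictionary keys.

```python
# ASSUMED constants: the chapter gives no numeric step sizes or bounds.
DELTA = 0.05            # base adjustment step
SLIGHT = 0.5 * DELTA    # "slight" adjustment step
C_MIN, C_MAX = 1.5, 2.5
C_SUM_MAX = 4.0

# (change of c1, change of c2) per sub-swarm, following Algorithm 2.
RULES = {
    "exploration":  (+DELTA, -DELTA),
    "exploitation": (+DELTA, -SLIGHT),
    "jumping_out":  (+SLIGHT, +DELTA),
    "convergence":  (-DELTA, +DELTA),
}


def update_acceleration(c1, c2, sub_swarm):
    """Apply the Algorithm 2 rule of `sub_swarm`, then clamp and cap the coefficients."""
    d1, d2 = RULES[sub_swarm]
    c1 = min(max(c1 + d1, C_MIN), C_MAX)
    c2 = min(max(c2 + d2, C_MIN), C_MAX)
    total = c1 + c2
    if total > C_SUM_MAX:   # rescale proportionally so that c1 + c2 stays bounded
        c1, c2 = c1 * C_SUM_MAX / total, c2 * C_SUM_MAX / total
    return c1, c2


c1, c2 = update_acceleration(2.0, 2.0, "exploration")
```

Rescaling both coefficients by the same factor keeps their ratio, and hence the balance between individual and social attraction chosen by the rule, while bounding their sum.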

For all sub-swarms, the inertia weight is set as follows:

$$\omega(l) = \frac{1}{1 + 1.5\, e^{-2.6\, l}} \in [0.4, 0.9], \quad \forall l \in [0, 1] \qquad (21.7)$$
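Evaluating Eq. (21.7) at the ends of its domain confirms the stated range: ω(0) = 1/2.5 = 0.4 and ω(1) ≈ 0.90. A short check, with a hypothetical function name:

```python
import math


def inertia_weight(l: float) -> float:
    """Eq. (21.7): omega(l) = 1 / (1 + 1.5 * exp(-2.6 * l)) for l in [0, 1]."""
    return 1.0 / (1.0 + 1.5 * math.exp(-2.6 * l))


for l in (0.0, 0.5, 1.0):
    print(f"omega({l}) = {inertia_weight(l):.4f}")
```

The sigmoid is increasing in l, so the three printed values are about 0.40, 0.71 and 0.90.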

A real-time state estimation procedure identifies the appropriate swarm of each particle: exploration, exploitation, convergence, or jumping out. This provides automatic control of the sub-swarms.


Fig. 21.2 Sub-swarms and the master/slave interactions

21.3.4 Multi-swarms Cooperation

To exploit the multi-swarm design described in the previous subsections, a cooperation model is needed that makes use of the search capabilities of each sub-swarm. As in [42], a master/slave cooperation model is chosen: the slave swarms behave as single PSOs, while the master swarm iterates based on its own knowledge as well as the knowledge of the slave swarms. In our case, the master swarm is the swarm associated with the convergence state, and the slave swarms are those associated with the exploration, exploitation, and jumping-out states. Each slave swarm, with some n particles, adapts itself separately according to its own evolutionary state, so a slave swarm can be viewed as an independent swarm not connected to the other slaves. The particles of the master swarm, in contrast, improve not only according to the social knowledge of the master swarm but also according to that of the slave swarms. This is achieved by adding a new term to the velocity update of the master particles, which becomes:



$$v_i^C(t+1) = w\, v_i^C(t) + c_1 r_1 \left(pBest - x_i^C(t)\right) + c_2 r_2 \left(gBest^C - x_i^C(t)\right) + c_3 r_3 \left(gBest^s - x_i^C(t)\right) \qquad (21.8)$$

where C denotes the convergence sub-swarm, $c_3$ is called the migration coefficient, $r_3$ is a uniform random sequence in [0, 1], $gBest^C$ is the global best of the convergence swarm, and $gBest^s$ is the global best of the slave sub-swarms, namely exploration ($gBest^{s_1}$), exploitation ($gBest^{s_2}$), and jumping out ($gBest^{s_4}$). Figure 21.2 shows the communication scheme between sub-swarms. The overall algorithm of this approach, named MsHMM-PSO, is described in Algorithm 3.
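A sketch of Eq. (21.8) for one master (convergence) particle. The chapter does not say how gBest^s is formed from the three slave global bests; the hypothetical helper `best_slave_gbest` assumes it is the fittest of them under minimization, which is only one plausible reading. r1, r2 and r3 are drawn per dimension, as in Eq. (21.1).

```python
import random


def best_slave_gbest(slave_gbests, fitness):
    """ASSUMPTION: gBest^s is the fittest (lowest) of the slave swarms' gbests."""
    return min(slave_gbests, key=fitness)


def master_velocity(x, v, pbest, gbest_c, gbest_s, w, c1, c2, c3, rng=random):
    """Eq. (21.8): new velocity of a particle of the convergence (master) swarm."""
    return [w * v[d]
            + c1 * rng.random() * (pbest[d] - x[d])     # own best position
            + c2 * rng.random() * (gbest_c[d] - x[d])   # best of the master swarm
            + c3 * rng.random() * (gbest_s[d] - x[d])   # migration term from slaves
            for d in range(len(x))]


def sphere(p):
    return sum(c * c for c in p)


g_s = best_slave_gbest([[1.0, 1.0], [0.5, 0.0], [3.0, 3.0]], sphere)
v_new = master_velocity([2.0, 2.0], [0.0, 0.0], [2.0, 2.0], [2.0, 2.0], g_s,
                        w=0.9, c1=2.0, c2=2.0, c3=0.8, rng=random.Random(3))
```

In the usage lines the particle already sits on its own best and on the master best, so only the migration term (with c3 = 0.8, as in Sect. 21.4.1) pulls it toward the best slave position.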


Algorithm 3 MsHMM-PSO
Data: the objective function f
Initialization: positions and velocities of the particles, acceleration factors of all four swarms; set t ← 0
while t ≤ t_max do
    for i = 1 to the number of particles do
        decode the state of particle i (Viterbi)
        assign particle i to its decoded sub-swarm
        update w according to Eq. (21.7)
        update c1 and c2 according to the corresponding state (Algorithm 2)
    end for
    for each particle i do
        if particle i belongs to the convergence swarm then
            update its velocity according to Eq. (21.8)
        else
            update its velocity according to Eq. (21.1)
        end if
        update its position according to Eq. (21.2)
        compute f(x_i)
    end for
    for each sub-swarm k do
        for each particle i of sub-swarm k do
            if f(x_i) ≤ f(pbest_i) then
                pbest_i ← x_i
                if f(pbest_i) ≤ f(gbest_k) then
                    gbest_k ← pbest_i
                end if
                if sub-swarm k = convergence then
                    apply elitist learning [51]
                end if
            end if
        end for
    end for
    t ← t + 1
end while
return the best solution found (best particle in the population) and its fitness value

21.4 Experimentation

In this section, we conduct an experimental analysis of the proposed cooperative multi-swarm PSO, called MsHMM-PSO. Simulations are run on various unimodal and multimodal benchmark functions, and the results are compared with other state-of-the-art PSO variants.

21.4.1 Parameters Setting

Twenty benchmark functions are used as fitness functions in the experiments. As shown in Table 21.2, they are split into unimodal and multimodal functions and include shifted and rotated functions. For every function, the best and the average values across runs are used for comparison. The experiments were run on a machine with an eighth-generation i7 processor at 2.5 GHz, 8 GB of RAM, and 512 GB of storage. Various versions of the PSO algorithm from the literature are selected for comparison (see Table 21.3); for each of them, we ran its publicly available code on every benchmark function. The same parameters, c1 = c2 = 2 and ω = 0.9, are used for all PSO variants, and c3 = 0.8 for MsHMM-PSO. The swarm size is 30 and the problem dimension is 30. Each run consists of 1000 generations of the optimization process. Performance is assessed on three aspects: solution accuracy, convergence speed, and statistical tests.

Table 21.2 Used benchmark functions

Test function | Name | Type
f1 | Rotated Ackley | Multimodal
f2 | Ackley | Multimodal
f3 | Dropwave | Multimodal
f4 | Rotated Elliptic | Unimodal
f5 | Elliptic | Unimodal
f6 | Griewank | Multimodal
f7 | Rotated Griewank | Multimodal
f8 | Quadric | Multimodal
f9 | Rotated Rastrigin | Multimodal
f10 | Shifted Rastrigin | Multimodal
f11 | Rastrigin | Multimodal
f12 | Shifted Rosenbrock | Unimodal
f13 | Rosenbrock | Unimodal
f14 | Schwefel | Multimodal
f15 | Shifted Schwefel | Multimodal
f16 | Shifted Sphere | Multimodal
f17 | Sphere | Multimodal
f18 | Step | Unimodal
f19 | Tablet | Unimodal
f20 | Rotated Weierstrass | Multimodal
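The chapter does not list the function formulas, shift vectors, or rotation matrices. For reference, here are the common textbook (unshifted, unrotated) forms of a few functions from Table 21.2; the exact variants used in the experiments may differ. Dropwave is normally defined in two dimensions, and applying it to the squared norm of a vector of any dimension is an assumption made here.

```python
import math

# Textbook forms, all minimized, with global minimum at x = 0.


def sphere(x):
    return sum(xi * xi for xi in x)


def rastrigin(x):
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def ackley(x):
    n = len(x)
    s1 = sum(xi * xi for xi in x) / n
    s2 = sum(math.cos(2 * math.pi * xi) for xi in x) / n
    return -20 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20 + math.e


def griewank(x):
    s = sum(xi * xi for xi in x) / 4000
    p = math.prod(math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1))
    return 1 + s - p


def step(x):
    return sum(math.floor(xi + 0.5) ** 2 for xi in x)


def dropwave(x):
    # ASSUMPTION: generalized to any dimension through the squared norm of x.
    r2 = sum(xi * xi for xi in x)
    return -(1 + math.cos(12 * math.sqrt(r2))) / (0.5 * r2 + 2)


zero = [0.0] * 30   # dimension 30, as in the experiments
```

Dropwave reaches -1 at the origin, which matches the best value of -1 reported for f3 in Table 21.4; the other functions reach 0.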


Table 21.3 Compared variants of PSO

Algorithm | Name | Reference
YSPSO | PSO with compressibility factor | [38]
SELPSO | Natural selection based PSO | [27]
SecVibratPSO | Order oscillating PSO | [28]
SecPSO | Swarm-core evolutionary PSO | [48]
SAPSO | Self-adaptive PSO | [27]
RandWPSO | Random inertia weight PSO | [49]
LinWPSO | Linear decreasing weights PSO | [49]
CLSPSO | Cooperative line search PSO | [48]
AsyLnCPSO | Asynchronous PSO | [27]
SimuAPSO | PSO with Simulated Annealing | [48]
MsPSO | Heterogeneous multi-swarm PSO | [17]

21.4.2 Performance Evaluation

First, we gather the results of all runs to compare the solution accuracy of MsHMM-PSO. The best and average values obtained in the experiments are given in Table 21.4. In most cases, the proposed method gives much better results than all the state-of-the-art PSO variants used for comparison, and its solution accuracy is improved on both unimodal and multimodal functions. MsHMM-PSO therefore performs markedly better in terms of solution accuracy. Convergence speed is compared in Figs. 21.3, 21.4, 21.5, 21.6 and 21.7. In these figures, the black line showing the MsHMM-PSO runs lies below all the other lines: MsHMM-PSO converges faster than all the other PSO variants from the literature used for comparison, and on every function it speeds up the optimization across iterations. MsHMM-PSO thus also shows its superiority in convergence speed. Given these results, we can reasonably conclude that multi-swarm cooperation with Master/Slave rules based on a hidden Markov model gives the PSO algorithm noticeably better performance in terms of both solution accuracy and convergence speed. The next subsection provides a statistical comparison.

21.4.3 Statistical Tests

For further comparison of the algorithms, we performed a parametric two-sided t-test with a significance level of 0.05 between MsHMM-PSO and

Table 21.4 Results comparisons with other variants of PSO

Function | Stat | APSO | PSO | SimuA-PSO | Sec-PSO | Rand-WPSO | YSPSO | Sel-PSO | SecVibratPSO | SAPSO | Lin-WPSO | AsyLnCPSO | Ms-PSO | MsHMMPSO
f1 | Best | 20.83 | 20.83 | 20.96 | 20.84 | 20.89 | 20.71 | 20.83 | 20.81 | 20.79 | 20.83 | 20.52 | 20.67 | 20.32
f1 | Mean | 20.98 | 20.98 | 21.03 | 21.01 | 21.02 | 20.82 | 21.01 | 21.03 | 20.96 | 21.02 | 20.78 | 20.94 | 20.75
f2 | Best | 1.456 | 1.456 | 7.098 | 4.011 | 4.857 | 3.659 | 5.174 | 1.835 | 4.254 | 4.463 | 4.596 | 0.03 | 0.001
f2 | Mean | 2.254 | 2.254 | 9.071 | 5.022 | 5.888 | 4.466 | 6.267 | 5.186 | 5.511 | 5.467 | 5.951 | 0.22 | 0.016
f3 | Best | −1 | −1 | −0.93 | −1 | −0.992 | −1 | −0.999 | −0.994 | −1 | −1 | −1 | −1 | −1
f3 | Mean | −0.986 | −0.986 | −0.793 | −0.958 | −0.943 | −0.990 | −0.967 | −0.823 | −0.961 | −0.963 | −0.994 | −0.99 | −1
f4 | Best | 1.9E+5 | 1.5E+5 | 6.7E+6 | 1.3E+6 | 3.2E+6 | 2.4E+5 | 2.5E+6 | 5.4E+6 | 1.5E+6 | 2.2E+6 | 3.7E+5 | 2.1E+5 | 1745
f4 | Mean | 6.4E+5 | 6.4E+5 | 2.2E+7 | 5.2E+6 | 8.0E+6 | 9.2E+5 | 4.6E+6 | 2.1E+7 | 5.4E+6 | 5.7E+6 | 1.1E+6 | 5.9E+5 | 7.65E+4
f5 | Best | 2.9E+3 | 2.9E+3 | 1.1E+6 | 1.6E+5 | 4.4E+5 | 3.6E+4 | 3.7E+5 | 9.1E+4 | 8.6E+4 | 3.7E+4 | 2.2E+4 | 460.49 | 6.03
f5 | Mean | 1.3E+4 | 1.3E+4 | 4.8E+6 | 4.4E+5 | 1.2E+6 | 1.1E+5 | 9.0E+5 | 2.2E+6 | 3.7E+5 | 6.2E+5 | 1.0E+5 | 5677.4 | 38
f6 | Best | 0.006 | 0.006 | 0.503 | 0.107 | 0.157 | 0.071 | 0.182 | 0.039 | 0.118 | 0.130 | 0.068 | 0.00 | 8.6E−4
f6 | Mean | 0.026 | 0.026 | 0.919 | 0.293 | 0.535 | 0.167 | 0.408 | 0.465 | 0.341 | 0.323 | 0.195 | 0.04 | 3.2E−3
f7 | Best | 0.657 | 0.657 | 1.335 | 1.141 | 1.148 | 1.052 | 1.143 | 1.311 | 1.104 | 1.131 | 1.050 | 0.83 | 0.013
f7 | Mean | 0.852 | 0.852 | 1.521 | 1.227 | 1.278 | 1.080 | 1.236 | 1.477 | 1.202 | 1.241 | 1.100 | 0.995 | 0.081
f8 | Best | 8.7E+4 | 8.7E+4 | 4.5E+8 | 2.1E+7 | 6.5E+7 | 1.7E+6 | 3.8E+7 | 1.3E+7 | 2.8E+7 | 3.4E+6 | 3.7E+6 | 166679 | 2697
f8 | Mean | 8.9E+5 | 8.9E+5 | 1.4E+9 | 1.5E+8 | 3.1E+8 | 6.9E+6 | 1.4E+8 | 6.2E+8 | 9.4E+7 | 9.9E+7 | 1.8E+7 | 934189 | 6684
f9 | Best | 445.55 | 445.55 | 1134.15 | 718.55 | 703.37 | 544.54 | 727.49 | 1250.14 | 713.11 | 838.73 | 612.94 | 375.65 | 99.43
f9 | Mean | 580.54 | 580.54 | 1642.53 | 1005.86 | 1077.63 | 688.26 | 983.55 | 1603.80 | 972.70 | 1068.78 | 878.03 | 438.01 | 339.95
f10 | Best | 20.21 | 20.21 | 637.66 | 370.84 | 514.18 | 306.51 | 456.01 | 688.28 | 427.16 | 496.76 | 377.54 | 172.22 | 0.00
f10 | Mean | 54.62 | 54.62 | 847.51 | 596.74 | 691.50 | 448.77 | 604.19 | 856.43 | 591.19 | 620.23 | 506.11 | 219.15 | 42.11

(continued)

21 A Cooperative Multi-swarm Particle Swarm Optimizer … 327

Table 21.4 (continued)

| Functions | | APSO | PSO | SimuA-PSO | Sec-PSO | Rand-WPSO | YSPSO | Sel-PSO | SecVibratPSO | SAPSO | Lin-WPSO | AsyLnCPSO | Ms-PSO | MsHMMPSO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f11 | Mean | 57.36 | 57.36 | 473.35 | 253.05 | 313.34 | 182.18 | 308.80 | 328.07 | 246.16 | 258.39 | 290.52 | 56.81 | 14.70 |
| | Best | 37.39 | 37.39 | 328.92 | 166.76 | 204.31 | 109.96 | 232.42 | 209.51 | 184.92 | 170.47 | 197.77 | 0.29 | 6.89 |
| f12 | Mean | 2151.7 | 2151.7 | 2.3E+6 | 7.6E+5 | 9.5E+5 | 1.4E+5 | 7.6E+5 | 2.3E+6 | 5.4E+5 | 6.4E+5 | 1.2E+5 | 552.02 | 33.14 |
| | Best | 299.73 | 299.73 | 1.1E+6 | 2.6E+5 | 5.5E+5 | 4.1E+4 | 3.0E+5 | 1.1E+6 | 2.0E+5 | 2.8E+5 | 3.4E+4 | 216.11 | 6.19 |
| f13 | Mean | 1038 | 1038 | 295411 | 11206 | 41622 | 2676 | 30657 | 34762 | 15669 | 22428 | 8823 | 192.79 | 53.31 |
| | Best | 377.54 | 377.54 | 76023 | 3199 | 6897 | 646.9 | 7122 | 581.75 | 3899 | 4034 | 1005 | 86.12 | 21.67 |
| f14 | Mean | 56.33 | 56.33 | 456.22 | 84.95 | 176.98 | 24.71 | 140.74 | 233.83 | 66.77 | 66.34 | 58.79 | 3.83 | −118.34 |
| | Best | 33.24 | 33.24 | 162.89 | 29.81 | 68.25 | 15.51 | 82.98 | 10.87 | 38.24 | 32.56 | 18.61 | 0.00 | −118.35 |
| f15 | Mean | 243.9 | 243.9 | 1306.04 | 668.86 | 1003.57 | 374.87 | 795.69 | 1766.13 | 622.24 | 697.59 | 469.77 | 219.82 | 2.98 |
| | Best | 111.30 | 111.30 | 681.07 | 387.95 | 642.76 | 260.79 | 492.64 | 822.37 | 384.93 | 428.14 | 343.36 | 72.11 | 0.91 |
| f16 | Mean | 3.75 | 3.75 | 615.97 | 304.52 | 365.06 | 128.92 | 296.61 | 558.07 | 266.69 | 311.65 | 113.44 | 38.60 | 5.12E−05 |
| | Best | 1.35 | 1.35 | 387.18 | 130.82 | 206.81 | 87.26 | 188.43 | 442.44 | 187.64 | 223.45 | 79.21 | 11.2 | 4.28E−07 |
| f17 | Mean | 2.50 | 2.50 | 186.93 | 34.14 | 60.72 | 14.45 | 53.07 | 64.75 | 29.65 | 31.62 | 40.88 | 0.05 | 0.0068 |
| | Best | 0.49 | 0.49 | 108.73 | 18.32 | 27.28 | 7.31 | 25.68 | 4.61 | 13.16 | 15.91 | 16.53 | 0.00 | 0.0041 |
| f18 | Mean | 0 | 0 | 5.2E−2 | 2.4E−9 | 2.3E−4 | 0.0E+0 | 6.7E−5 | 9.2E−2 | 0 | 0 | 1E−29 | 0.0025 | 0 |
| | Best | 0 | 0 | 8.7E−5 | 4E−32 | 1.1E−9 | 0.0E+0 | 1.1E−11 | 1.6E−4 | 0 | 0 | 0 | 0.00 | 0 |
| f19 | Mean | 4.23 | 4.23 | 494.37 | 96.05 | 211.17 | 41.88 | 93.82 | 195.12 | 84.60 | 105.43 | 43.91 | 0.94 | 0.0133 |
| | Best | 1.14 | 1.14 | 283.61 | 40.73 | 118.63 | 22.07 | 58.79 | 3.89 | 19.06 | 34.46 | 24.10 | 0.14 | 0.0078 |
| f20 | Mean | 37.59 | 37.59 | 40.87 | 41.06 | 41.71 | 38.74 | 42.18 | 41.23 | 40.21 | 40.04 | 37.99 | 38.19 | 31.83 |
| | Best | 30.21 | 30.21 | 36.55 | 36.11 | 39.54 | 33.21 | 39.78 | 37.75 | 35.35 | 36.15 | 33.04 | 30.51 | 26.33 |

328 O. Aoun et al.

Fig. 21.3 Comparison on Elliptic rotated and Ackley rotated functions

Fig. 21.4 Comparison on Sphere and Tablet functions

Fig. 21.5 Comparison on Griewank rotated and Rastrigin shifted functions

Fig. 21.6 Comparison on Rosenbrock shifted and Sphere shifted functions

Fig. 21.7 Comparison on Schwefel shifted and Drop wave functions

the other PSO variants. The variance of the results is estimated from the samples of independent executions. Student's t-test then determines whether the difference between two means is statistically significant. We consider the following two hypotheses:
• Hypothesis H0: there is no significant difference between MsHMM-PSO and the compared PSO variant.
• Hypothesis H1: MsHMM-PSO differs significantly from the compared PSO variant.
Following [51], Table 21.5 reports the p-values of this two-tailed test for every function, at a significance level of 0.05. The test is performed using the statistics toolbox of Matlab. Rows "+1 (Better)," "0 (Same)," and "−1 (Worse)" give the number of functions on which MsHMM-PSO performs significantly better than, almost the same as, and significantly worse than each compared algorithm. According to the t-tests over the thirty executions, MsHMM-PSO significantly outperforms each of the other PSO variants on at least 17 of the 20 functions. It is never significantly worse than any of them, which indicates a competitive improvement in PSO performance.
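The decision rule behind Table 21.5 can be made concrete with a minimal, dependency-free sketch of a two-sided two-sample t-test and the resulting +1/0/−1 verdict. The sketch rests on three assumptions:
• The pooled (equal-variance) Student form is used, because the chapter does not state which variant was computed in Matlab.
• The p-value comes from Simpson integration of the t density rather than from a library call.
• The function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_two_sided_p(t: float, dof: int, steps: int = 2000) -> float:
    """Two-sided p-value of Student's t distribution with `dof` degrees of freedom.

    The density is integrated over [0, |t|] with Simpson's rule (steps must be even);
    because the density is symmetric, p = 1 - 2 * integral.
    """
    x_max = abs(t)
    if x_max == 0.0:
        return 1.0
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (dof + 1) / 2 * math.log1p(x * x / dof))

    h = x_max / steps
    total = pdf(0.0) + pdf(x_max)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def pooled_t_test(a, b):
    """Two-sample Student t-test with pooled variance; returns (t, p)."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    if sp2 == 0.0:  # our convention when both samples are constant (e.g. all runs hit 0)
        diff = mean(a) - mean(b)
        return (0.0, 1.0) if diff == 0 else (math.copysign(math.inf, diff), 0.0)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, t_two_sided_p(t, dof)


def verdict(ours, other, alpha=0.05):
    """+1: ours significantly better (lower mean error); -1: significantly worse; 0: same."""
    _, p = pooled_t_test(ours, other)
    if p >= alpha:
        return 0
    return 1 if mean(ours) < mean(other) else -1
```

Applying `verdict` to the per-run errors of every function and every variant, then counting the outcomes, gives rows shaped like "+1", "0" and "−1" in Table 21.5. The p-values can be cross-checked against a library implementation of the same pooled test.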

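Table 21.5 Statistical tests

| Funct. | APSO | PSO | SimuAPSO | SecPSO | RandWPSO | YSPSO | SelPSO | SecVibratPSO | SAPSO | LinWPSO | AsyLnCPSO | MsPSO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f1 | 0 | 0 | 0 | 0 | 0 | 0.026 | 0 | 0 | 0 | 0 | 0 | 0 |
| f2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f3 | 0 | 0 | 0 | 0.0018 | 0 | 0 | 0 | 0.0011 | 0 | 0.0612 | 0 | 0.0721 |
| f4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f9 | 0.062 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.248 |
| f10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f18 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| f19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| f20 | 0.0031 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| +1 | 18 | 20 | 19 | 20 | 20 | 20 | 20 | 19 | 20 | 19 | 20 | 17 |
| 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 3 |
| −1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |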

Although our approach gives better results, it is more expensive in CPU time, because the machine-learning control mechanism adds extra computation at each iteration. Our approach is therefore likely to be most useful for more complex optimization problems, where solution quality matters more than the additional CPU-time cost.

21.5 Conclusion

In conclusion, we have presented a new approach named MsHMM-PSO. It uses a multi-swarm design based on a hidden Markov model with a master/slave cooperation rule. Each PSO particle uses its historical information and its current swarm to choose the next swarm to which it will belong. Our multi-swarm approach relies on a hidden Markov chain attached to each particle, which controls the particle's swarm membership during the search process. The acceleration coefficients are updated according to each swarm, and the cooperation between swarms further boosts the search. Experimental results have shown very competitive performance in comparison to several selected PSO variants. From the obtained results, we can deduce that combining a machine-learning-based multi-swarm design with a cooperation strategy significantly enhances PSO performance.

References

1. O. Aoun, A. El Afia, M. Sarhani, Hidden Markov model control of inertia weight adaptation for particle swarm optimization. IFAC-PapersOnLine 50(1), 9997–10002 (2017)
2. O. Aoun, A. El Afia, S. Garcia, Self inertia weight adaptation for the particle swarm optimization, in Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications, LOPAL'18, New York, NY, USA (ACM, 2018), pp. 8:1–8:6
3. O. Aoun, M. Sarhani, A. El Afia, Investigation of hidden Markov model for the tuning of metaheuristics in airline scheduling problems. IFAC-PapersOnLine 49(3), 347–352 (2016), in 14th IFAC Symposium on Control in Transportation Systems CTS 2016, Istanbul, Turkey, 18–20 May 2016
4. O. Aoun, M. Sarhani, A. El Afia, Hidden Markov Model Classifier for the Adaptive Particle Swarm Optimization (Springer International Publishing, Cham, 2018), pp. 1–15
5. O. Aoun, M. Sarhani, A. El Afia, Particle swarm optimisation with population size and acceleration coefficients adaptation using hidden Markov model state classification. Int. J. Metaheuristics 7(1), 1–29 (2018)
6. M.E. Aydin, Coordinating metaheuristic agents with swarm intelligence. J. Intell. Manuf. 23(4), 991–999 (2012)
7. L.E. Baum, T. Petrie, G. Soules, N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 164–171 (1970)
8. B. Bengfort, P.Y. Kim, K. Harrison, J.A. Reggia, Evolutionary design of self-organizing particle systems for collective problem solving, in 2014 IEEE Symposium on Swarm Intelligence (SIS) (IEEE, 2014), pp. 1–8
9. G. Beni, From swarm intelligence to swarm robotics, in Swarm Robotics (Springer, 2004), pp. 1–9
10. T. Blackwell, J. Branke, et al., Multi-swarm optimization in dynamic environments, in EvoWorkshops, vol. 3005 (Springer, 2004), pp. 489–500
11. S. Bouzbita, A. El Afia, R. Faizi, A novel based hidden Markov model approach for controlling the ACS-TSP evaporation parameter, in 2016 5th International Conference on Multimedia Computing and Systems (ICMCS) (IEEE, 2016), pp. 633–638
12. S. Bouzbita, A. El Afia, R. Faizi, Hidden Markov model classifier for the adaptive ACS-TSP pheromone, in Bioinspired Heuristics for Optimization, vol. 774 (2018), p. 153
13. S. Bouzbita, A. El Afia, R. Faizi, Parameter adaptation for ant colony system algorithm using hidden Markov model for TSP problems, in Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (ACM, 2018), p. 6
14. S. Bouzbita, A. El Afia, R. Faizi, Adjusting population size of ant colony system using fuzzy logic controller, in International Conference on Computational Collective Intelligence (Springer, 2019), pp. 309–320
15. S. Bouzbita, A. El Afia, R. Faizi, M. Zbakh, Dynamic adaptation of the ACS-TSP local pheromone decay parameter based on the hidden Markov model, in 2016 2nd International Conference on Cloud Computing Technologies and Applications (CloudTech) (IEEE, 2016), pp. 344–349
16. M. Brambilla, E. Ferrante, M. Birattari, M. Dorigo, Swarm robotics: a review from the swarm engineering perspective. Swarm Intell. 7(1), 1–41 (2013)
17. N.J. Cheung, X.-M. Ding, H.-B. Shen, Optifel: a convergent heterogeneous particle swarm optimization algorithm for Takagi–Sugeno fuzzy modeling. IEEE Trans. Fuzzy Syst. 22(4), 919–933 (2014)
18. M.A.M. De Oca, T. Stützle, K. Van den Enden, M. Dorigo, Incremental social learning in particle swarms. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 41(2), 368–384 (2011)
19. M. Dorigo, L.M. Gambardella, Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans. Evol. Comput. 1(1), 53–66 (1997)
20. P. Dupont, F. Denis, Y. Esposito, Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern Recognit. 38(9), 1349–1371 (2005)
21. A. El Afia, M. Lalaoui, R. Chiheb, A self controlled simulated annealing algorithm using hidden Markov model state classification. Procedia Comput. Sci. 148, 512–521 (2019), in The 2nd International Conference on Intelligent Computing in Data Sciences, ICDS2018
22. A. El Afia, O. Aoun, S. Garcia, Adaptive cooperation of multi-swarm particle swarm optimizer-based hidden Markov model. Prog. Artif. Intell. 8(4), 441–452 (2019)
23. A. El Afia, S. Bouzbita, R. Faizi, The effect of updating the local pheromone on ACS performance using fuzzy logic. Int. J. Electr. Comput. Eng. 7(4), 2161 (2017)
24. A. El Afia, M. Lalaoui, R. Chiheb, Fuzzy logic controller for an adaptive huang cooling of simulated annealing, in Proceedings of the 2nd International Conference on Big Data, Cloud and Applications, BDCA'17, New York, NY, USA (Association for Computing Machinery, 2017)
25. A. El Afia, M. Sarhani, O. Aoun, A probabilistic finite state machine design of particle swarm optimization, in Bioinspired Heuristics for Optimization (Springer, Berlin, 2019), pp. 185–201
26. M.G. Epitropakis, V.P. Plagianakos, M.N. Vrahatis, Evolving cognitive and social experience in particle swarm optimization through differential evolution: a hybrid approach. Inf. Sci. 216, 50–92 (2012)
27. W. Jiang, Y. Zhang, R. Wang, Comparative study on several PSO algorithms, in The 26th Chinese Control and Decision Conference (2014 CCDC), May 2014 (2014), pp. 1117–1119
28. H. Jianxiu, Z. Jianchao, A two-order particle swarm optimization model. J. Comput. Res. Dev. 11, 004 (2007)
29. J. Kennedy, R.C. Eberhart, Particle swarm optimization, in Proceedings of IEEE International Conference Neural Networks, pp. 1942–1948 (IEEE, 1995)
30. M. Lalaoui, A. El Afia, R. Chiheb, Hidden Markov model for a self-learning of simulated annealing cooling law, in 2016 5th International Conference on Multimedia Computing and Systems (ICMCS) (IEEE, 2016), pp. 558–563
31. M. Lalaoui, A. El Afia, R. Chiheb, A self-adaptive very fast simulated annealing based on hidden Markov model, in 2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech) (IEEE, 2017), pp. 1–8
32. M. Lalaoui, A. El Afia, R. Chiheb, A self-tuned simulated annealing algorithm using hidden Markov model. Int. J. Electr. Comput. Eng. 8(1), 291 (2018)
33. M. Lalaoui, A. El Afia, R. Chiheb, Simulated annealing with adaptive neighborhood using fuzzy logic controller, in Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications, LOPAL'18, New York, NY, USA (Association for Computing Machinery, 2018)
34. C. Li, S. Yang, T.T. Nguyen, A self-learning particle swarm optimizer for global optimization problems. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 42(3), 627–646 (2012)
35. J. Li, X. Xiao, Multi-swarm and multi-best particle swarm optimization algorithm, in 7th World Congress on Intelligent Control and Automation, 2008. WCICA 2008 (IEEE, 2008), pp. 6281–6286
36. J.J. Liang, A.K. Qin, P.N. Suganthan, S. Baskar, Comprehensive learning particle swarm optimizer for global optimization of multimodal functions. IEEE Trans. Evol. Comput. 10(3), 281–295 (2006)
37. W.H. Lim, N.A.M. Isa, An adaptive two-layer particle swarm optimization with elitist learning strategy. Inf. Sci. 273, 49–72 (2014)
38. L.-L. Liu, X.-B. Gao, An adaptive simulation of bacterial foraging algorithm. Basic Sci. J. Text. Univ. 4, 022 (2012)
39. Y. Liu, Z. Qin, Z. Shi, L. Jiang, Center particle swarm optimization. Neurocomputing 70(4–6), 672–679 (2007)
40. D. Monett Diaz, Agent-based configuration of (metaheuristic) algorithms. Ph.D. thesis, Humboldt University of Berlin, 2005
41. S. Mirjalili, A. Lewis, A.S. Sadiq, Autonomous particles groups for particle swarm optimization. Arab. J. Sci. Eng. 39(6), 4683–4697 (2014)
42. B. Niu, Y. Zhu, X. He, W. Henry, Mcpso: a multi-swarm cooperative particle swarm optimizer. Appl. Math. Comput. 185(2), 1050–1062 (2007)
43. L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)
44. Y. Shi, H. Liu, L. Gao, G. Zhang, Cellular particle swarm optimization. Inf. Sci. 181(20), 4460–4493 (2011)
45. T. Stützle, M. López-Ibáñez, Automated design of metaheuristic algorithms, in Handbook of Metaheuristics (Springer, Berlin, 2019), pp. 541–579
46. S. Sun, H. Liu, Particle swarm algorithm: convergence and applications, in Swarm Intelligence and Bio-Inspired Computation, ed. by X.-S. Yang, Z. Cui, R. Xiao, A.H. Gandomi, M. Karamanoglu (Elsevier, Oxford, 2013), pp. 137–168
47. F. van den Bergh, A. Engelbrecht, A cooperative approach to particle swarm optimization. IEEE Trans. Evol. Comput. 8(3), 225–239 (2004)
48. S. Wang, M. Chen, D. Huang, X. Guo, C. Wang, Dream effected particle swarm optimization algorithm. J. Inf. Comput. Sci. 11(15), 5631–5640 (2014)
49. Z. Wu, Optimization of distribution route selection based on particle swarm algorithm. Int. J. Simul. Model. (IJSIMM) 13(2) (2014)
50. S. Yang, C. Li, A clustering particle swarm optimizer for locating and tracking multiple optima in dynamic environments. IEEE Trans. Evol. Comput. (2010)
51. Z.-H. Zhan, J. Zhang, Y. Li, H.S.-H. Chung, Adaptive particle swarm optimization. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39(6), 1362–1381 (2009)
52. J. Zhang, X. Ding, A multi-swarm self-adaptive and cooperative particle swarm optimization. Eng. Appl. Artif. Intell. 24(6), 958–967 (2011)

Chapter 22

Experimental Sensitivity Analysis of Grid-Based Parameter Adaptation Method

Vasileios A. Tatsis and Konstantinos E. Parsopoulos

Abstract The grid-based parameter adaptation method was recently proposed as a general-purpose approach for online parameter adaptation in metaheuristics. The method is independent of the technicalities of the specific algorithm. It operates directly in the parameter domain, which is discretized to form multiple grids. Short runs of the algorithm are conducted to estimate its behavior under different parameter configurations. This distinguishes it from relevant methods, which usually incorporate ad hoc procedures designed for specific metaheuristics. The method has been demonstrated on two popular population-based metaheuristics with promising results. Like other parameter tuning and control methods, the grid-based approach has its own decision parameters: three parameters that control the granularity of the grids and the length of the algorithm runs. The present study extends a preliminary analysis of the impact of each parameter, based on experimental statistical analysis. The differential evolution algorithm is used as the targeted metaheuristic, and the established CEC 2013 test suite serves as the experimental testbed. The obtained results and analysis confirm previous evidence of the method's tolerance to its parameter values, and they also offer insight into the interplay among the parameters.

V. A. Tatsis (corresponding author) · K. E. Parsopoulos
Department of Computer Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_22

22.1 Introduction

Metaheuristics have long been established as essential search and optimization procedures that can offer (sub-)optimal solutions in cases where traditional optimization methods are either not applicable or deficient [8, 9, 21]. All metaheuristics typically have a number of parameters that influence their dynamics. For example, the popular evolutionary algorithms require the user to set parameters such as the population size and the mutation and crossover types and rates, among others [6]. Ongoing research on the development of new metaheuristics, and on the improvement of established ones, has kept the parameter setting problem among the most active research topics in the relevant literature.

There are two main types of parameter setting methodologies in metaheuristics: offline parameter tuning and online parameter control. Offline methods are based on preliminary trial-and-error experimentation, whereas online methods adapt the parameters dynamically during the run. Offline methods are better suited to cases where a multitude of related problems need to be solved. In such cases, a training subset of the problems can be used to identify a promising parameter setting that is then adopted for solving problems of the same type. This approach makes the parameters reusable, but the computational cost of the preliminary experiments can be prohibitively high. Over-specialization is another deficiency that should be taken into consideration. Design of experiments [2], F-race [3], sequential model-based optimization [11], and ParamILS [10] are typical examples of such methods.

Online parameter control, on the other hand, does not provide reusable results. Instead, it controls the parameters dynamically, based on performance feedback received during the algorithm's run. The algorithm can thus adjust its behavior to maintain an appropriate trade-off between exploration and exploitation during the different phases of the optimization procedure. For this purpose, ad hoc methodologies developed for the specific algorithm are usually adopted. Various online adaptation approaches designed for the differential evolution algorithm, which is also used in this study, can be found in [4, 6, 7, 13, 15, 22].
The general-purpose grid-based parameter adaptation method (henceforth abbreviated as GPAM) was recently proposed in [17]. It belongs to the category of online parameter adaptation methods, and it has been successfully applied to two popular metaheuristics, namely differential evolution and particle swarm optimization. It is based on grid search in a discretized parameter domain, exploiting performance estimations obtained through short runs of the algorithm under alternative parameter settings. GPAM offered very promising results on state-of-the-art testbeds of various dimensions [16–19].

All parameter adaptation methods involve a number of inner decision parameters. The number of these parameters should be reasonably small; otherwise, the adaptation method would require another external procedure to "tune the tuner". Besides a small number of inner decision parameters, mild parameter sensitivity is highly desirable for adaptation methods. Identifying the impact of each parameter, as well as possible interactions among them, is therefore important for the effectiveness of the method. A first study on the sensitivity of GPAM to its parameters was offered in [20]. That study investigated the main effect of various levels of each of the three parameters individually, by changing one parameter at a time. Its results offered some interesting initial insight into which parameter matters most. The present work extends the previous study through extensive experimentation and statistical analysis, in order to also identify the interplay between the parameters. For this purpose, differential evolution was selected as the underlying algorithm, and the state-of-the-art CEC 2013 test suite was adopted as our testbed. Several levels of the GPAM parameters were considered, and full factorial experimentation was conducted, accompanied by hypothesis testing.

The rest of the chapter is organized as follows. Section 22.2 briefly presents GPAM and differential evolution. The experimental configuration and analysis are offered in Sect. 22.3. Finally, Sect. 22.4 concludes the chapter.

22.2 Background Information

In the following paragraphs, the differential evolution algorithm is briefly presented along with the GPAM approach.

22.2.1 Differential Evolution

Differential evolution (henceforth abbreviated as DE) [14] is a state-of-the-art metaheuristic for numerical optimization problems. Its adaptability and simplicity have placed it among the most popular metaheuristics [5], despite its known sensitivity to its control parameters. Consider the general continuous bound-constrained $n$-dimensional optimization problem:

$$\min_{x} f(x), \quad x \in X \subset \mathbb{R}^n.$$

For this problem, DE utilizes a population of $N$ search points:

$$P \stackrel{\text{def}}{=} \{x_1, x_2, \ldots, x_N\},$$

with $x_i \in X$ for all $i \in I = \{1, 2, \ldots, N\}$. All population members are randomly initialized in the search domain $X$. At each iteration $t$, the mutation, crossover, and selection operators are applied to each $x_i$. Mutation produces a new vector $u_i$ for each $x_i$ by combining randomly selected members of the population. The most common mutation operators with one difference vector are defined as:

$$(\mathrm{DE}/\psi/1) \qquad u_i^{(t+1)} = x_{\alpha_1}^{(t)} + F\left(x_{\alpha_2}^{(t)} - x_{\alpha_3}^{(t)}\right),$$

where $F > 0$ is a scalar parameter. For $\psi =$ "best", we set $\alpha_1 = g$ and $\alpha_2, \alpha_3 = \mathrm{randi}(I)$. Here, $g$ is the index of the best member of the population (i.e., the one with the smallest function value), and each call to $\mathrm{randi}(I)$ returns a random integer from the index set $I$ defined above. For $\psi =$ "rand", we instead set $\alpha_1, \alpha_2, \alpha_3 = \mathrm{randi}(I)$. In both cases, $\alpha_j \neq \alpha_k \neq i$ must hold for all $j \neq k$. Alternative operators with two difference vectors are defined as:

$$(\mathrm{DE}/\psi/2) \qquad u_i^{(t+1)} = x_{\alpha_1}^{(t)} + F\left(x_{\alpha_2}^{(t)} - x_{\alpha_3}^{(t)} + x_{\alpha_4}^{(t)} - x_{\alpha_5}^{(t)}\right),$$

where for $\psi =$ "best" we set $\alpha_1 = g$ and $\alpha_2, \alpha_3, \alpha_4, \alpha_5 = \mathrm{randi}(I)$, while for $\psi =$ "rand" we have $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5 = \mathrm{randi}(I)$. Another variant, with $\psi =$ "current-to-best", defines $\alpha_1 = i$, $\alpha_2 = g$, $\alpha_3 = i$, and $\alpha_4, \alpha_5 = \mathrm{randi}(I)$. Again, the randomly selected indices must satisfy $\alpha_j \neq \alpha_k \neq i$ for all $j \neq k$. The above cases define the five most common mutation operators.

Mutation is followed by crossover, which produces a trial vector $v_i$ for each $x_i$. Each component of $v_i$ is taken from the corresponding component of $u_i$ with probability $CR \in (0, 1]$, another scalar parameter of the algorithm; otherwise, it is taken from $x_i$:

$$v_{ij} = \begin{cases} u_{ij}, & \text{if } \mathrm{rand}() \leq CR \ \text{ or } \ j = \zeta, \\ x_{ij}, & \text{otherwise,} \end{cases} \qquad j \in \{1, 2, \ldots, n\},$$

where $\zeta = \mathrm{randi}(\{1, 2, \ldots, n\})$ is a randomly selected dimension component that is always inherited from the mutated vector, and $\mathrm{rand}()$ is a pseudo-random number generator in the range $[0, 1]$. Finally, the new vector $v_i$ competes with $x_i$. If it has a better function value, it replaces $x_i$ in the population for the next iteration. A detailed presentation of the DE algorithm can be found in relevant sources such as [12].
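The operators above can be illustrated with a compact plain-Python sketch of a DE run. It is not the authors' implementation, and several choices are our assumptions:
• the parameter defaults (N = 20, F = 0.5, CR = 0.9);
• the synchronous, generation-wise replacement;
• clipping trial vectors to the box X;
• the hypothetical function names.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
STRATEGIES = ("rand/1", "best/1", "rand/2", "best/2", "current-to-best")


def mutant(pop: Sequence[Vector], i: int, g: int, F: float, strategy: str,
           rng: random.Random) -> Vector:
    """Mutant vector u_i for one of the five mutation operators."""
    # Randomly chosen indices are mutually distinct and differ from i (and from g
    # when the best member is involved); this requires at least 7 members.
    excluded = {i} if strategy.startswith("rand") else {i, g}
    r = rng.sample([k for k in range(len(pop)) if k not in excluded], 5)
    if strategy == "rand/1":
        base, pairs = r[0], [(r[1], r[2])]
    elif strategy == "best/1":
        base, pairs = g, [(r[0], r[1])]
    elif strategy == "rand/2":
        base, pairs = r[0], [(r[1], r[2]), (r[3], r[4])]
    elif strategy == "best/2":
        base, pairs = g, [(r[0], r[1]), (r[2], r[3])]
    elif strategy == "current-to-best":  # alpha_1 = i, alpha_2 = g, alpha_3 = i
        base, pairs = i, [(g, i), (r[0], r[1])]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [pop[base][d] + F * sum(pop[p][d] - pop[q][d] for p, q in pairs)
            for d in range(len(pop[i]))]


def crossover(x: Vector, u: Vector, CR: float, rng: random.Random) -> Vector:
    """Binomial crossover; component zeta is always inherited from u."""
    zeta = rng.randrange(len(x))
    return [u[j] if (rng.random() <= CR or j == zeta) else x[j]
            for j in range(len(x))]


def run_de(f: Callable[[Vector], float], n: int, lo: float, hi: float,
           N: int = 20, F: float = 0.5, CR: float = 0.9, iters: int = 1000,
           strategy: str = "rand/1", seed: int = 0) -> float:
    """Minimize f over [lo, hi]^n and return the best function value found."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(n)] for _ in range(N)]
    fvals = [f(x) for x in pop]
    for _ in range(iters):
        g = min(range(N), key=fvals.__getitem__)  # index of the best member
        new_pop, new_f = [], []
        for i, x in enumerate(pop):
            v = crossover(x, mutant(pop, i, g, F, strategy, rng), CR, rng)
            v = [min(max(c, lo), hi) for c in v]  # assumption: clip to the box X
            fv = f(v)
            # Selection: the trial vector replaces x_i only if it is better.
            if fv < fvals[i]:
                new_pop.append(v)
                new_f.append(fv)
            else:
                new_pop.append(x)
                new_f.append(fvals[i])
        pop, fvals = new_pop, new_f
    return min(fvals)
```

For example, `run_de(lambda x: sum(c * c for c in x), 5, -100.0, 100.0)` minimizes a 5-dimensional sphere function with DE/rand/1. Passing `strategy="current-to-best"` switches to the third variant described above.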

22.2.2 Grid-Based Parameter Adaptation Method

Let us now describe GPAM, proposed in [17], as applied to the DE algorithm. The two scalar parameters of DE are $F$ and $CR$, as presented in the previous section. The mutation operator type can be considered a categorical parameter, and GPAM can also handle such parameters, as shown in [16]. Given discretization steps $\lambda_{CR}$ and $\lambda_F$ for the corresponding scalar parameters, we define their discrete domains $S_{CR}$ and $S_F$, respectively. Then, we can define a parameter grid as [17]:

$$G = \{(CR, F) : CR \in S_{CR},\ F \in S_F\}.$$

Smaller step sizes define fine-grained grids that may require longer grid searches to detect appropriate parameter pairs. In general, optimal step sizes depend on the studied algorithm and the optimization problem at hand, as well as on the available computational resources. Previous experience with the algorithm's parameter sensitivity, where available, may provide useful insight for the proper selection of step sizes.
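The discrete domains and the grid $G$ can be built as follows, under the typical setting $F, CR \in [0, 1]$ considered in [17]. This is a sketch. Including both endpoints 0 and 1 as grid values is our assumption; the DE description requires $F > 0$ and $CR > 0$, so a practical implementation might drop the zero values. The helper names are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def discrete_domain(step: float, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Values lo, lo + step, ..., hi, rounded to suppress floating-point drift."""
    count = round((hi - lo) / step)
    return [round(lo + k * step, 10) for k in range(count + 1)]


def parameter_grid(step_cr: float, step_f: float,
                   lo: float = 0.0, hi: float = 1.0) -> List[Tuple[float, float]]:
    """G = {(CR, F) : CR in S_CR, F in S_F} as a list of (CR, F) pairs."""
    s_cr = discrete_domain(step_cr, lo, hi)
    s_f = discrete_domain(step_f, lo, hi)
    return list(product(s_cr, s_f))
```

With $\lambda_{CR} = \lambda_F = 0.1$, the grid contains 11 × 11 = 121 pairs, and its central point (0.5, 0.5) is a natural initial pair.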


GPAM starts with an initial parameter pair in $G$ (the central point is a reasonable choice) and initializes a primary population, $P_p$, with this parameter pair, denoted as $(CR_p, F_p)$. Then, $P_p$ is evolved for $t_{pri} = \alpha \times n$ iterations, where $\alpha > 1$ is an integer and $n$ is the problem's dimension. In [17], the typical case $F, CR \in [0, 1]$ was considered, and the selected parameter values were $\lambda_{CR} = \lambda_F = 0.1$ and $\alpha = 10$. After the $t_{pri}$ iterations, three phases are applied iteratively.

The first phase is the cloning phase. The primary parameter pair has eight immediate neighboring parameter pairs in the grid, defined as:

$$CR' = CR_p + i\,\lambda_{CR}, \quad F' = F_p + j\,\lambda_F, \quad i, j \in \{-1, 0, 1\}, \qquad (22.1)$$

where $i = j = 0$ corresponds to the current primary parameter pair. Each of the eight neighboring pairs, as well as the primary one, is assigned to one of nine secondary populations, $P_{s_j}$, $j \in \{1, 2, \ldots, 9\}$, which are initialized as clones of the primary population $P_p$. To adapt the mutation operator as well, as proposed in [16], four additional secondary populations, called bridging populations, are used. Each bridging population is a copy of the primary population with the same scalar parameters but a different mutation operator, selected from the five operators defined in Sect. 22.2.1. The bridging populations are denoted as $P_{s_j}$, $j \in \{10, \ldots, 13\}$.

The second phase is the performance estimation, where each of the 13 secondary populations is evolved for a small number of iterations, $t_{sec} \ll t_{pri}$. These short runs probe the local performance dynamics of the secondary populations, providing evidence about how the current primary population would behave under the different assigned parameter settings. Typically, $t_{sec}$ should be significantly smaller than $t_{pri}$ in order to spare computational resources (function evaluations). For the performance assessment of the secondary populations, the average objective value (AOV) measure was proposed in [17]. The AOV of the secondary population $P_{s_j}$ is defined as:

$$\mathrm{AOV}_j = \frac{1}{N} \sum_{i=1}^{N} f(x_i), \qquad (22.2)$$

with $x_i \in P_{s_j}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, 13$. It reflects the average performance that the corresponding secondary population attains under its assigned parameter pair. To also take into account the crucial issue of diversity, an additional performance measure was considered, namely the objective value standard deviation (OVSD) [16], defined as:

$$\mathrm{OVSD}_j = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( f(x_i) - \mathrm{AOV}_j \right)^2}, \qquad (22.3)$$

with $x_i \in P_{s_j}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, 13$. Higher values of OVSD are associated with higher diversity, which is preferable for alleviating premature convergence and search stagnation. Using these two performance metrics, the secondary populations are compared in a Pareto manner. The best among them, i.e., the one with the lowest AOV and the highest OVSD, is selected as the new primary population along with its parameters. Note that more than one non-dominated population may appear. Since these populations are mutually incomparable, one of them is selected at random to become the new primary population. Moreover, the concept of sufficient improvement can be implemented by replacing the primary population with a new one only if the AOV improves by at least some $\varepsilon > 0$; otherwise, the current primary population is retained.

Finally, the dynamics deployment phase takes place, where the selected new primary population is evolved for $t_{pri}$ iterations to reveal its dynamics under the adopted parameters. This step completes a full cycle of GPAM applied to DE. Following the notation in [16], the combination of DE with GPAM is denoted as DEGPOA. For further details on the method, the reader is referred to [16, 17].
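The building blocks of one GPAM cycle can be sketched as follows: the neighbor pairs of Eq. (22.1), the AOV and OVSD measures of Eqs. (22.2) and (22.3), and the Pareto selection with the sufficient-improvement rule. This is our reading of the description, not the reference implementation. Three points are assumptions:
• Neighbors falling outside the parameter range are dropped.
• Ties among non-dominated populations are broken uniformly at random.
• "Sufficient improvement" is measured as the drop in AOV relative to the current primary population.
All names are hypothetical.

```python
import random
from statistics import fmean, pstdev
from typing import Dict, List, Optional, Sequence, Tuple


def neighbor_pairs(cr_p: float, f_p: float, lam_cr: float, lam_f: float,
                   lo: float = 0.0, hi: float = 1.0) -> List[Tuple[float, float]]:
    """Eq. (22.1): the primary pair (i = j = 0) plus its up-to-eight grid neighbors."""
    pairs = []
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            cr = round(cr_p + i * lam_cr, 10)
            f = round(f_p + j * lam_f, 10)
            # Assumption: neighbors falling outside [lo, hi] are simply dropped.
            if lo <= cr <= hi and lo <= f <= hi:
                pairs.append((cr, f))
    return pairs


def aov(values: Sequence[float]) -> float:
    """Eq. (22.2): average objective value of a secondary population."""
    return fmean(values)


def ovsd(values: Sequence[float]) -> float:
    """Eq. (22.3): standard deviation of the objective values (1/N normalization)."""
    return pstdev(values)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a, b = (AOV, OVSD); a lower AOV and a higher OVSD are both preferred."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def select_new_primary(secondary: Dict[str, Sequence[float]], current_aov: float,
                       eps: float, rng: random.Random) -> Optional[str]:
    """Pareto selection over the (up to 13) secondary populations.

    `secondary` maps a hypothetical population label to the objective values of its
    members after the t_sec-iteration short run. Returns the label of the new
    primary population, or None if the current primary population is retained.
    """
    scores = {k: (aov(v), ovsd(v)) for k, v in secondary.items()}
    front = [k for k, s in scores.items()
             if not any(dominates(o, s) for o in scores.values())]
    chosen = rng.choice(sorted(front))  # non-dominated populations are incomparable
    if current_aov - scores[chosen][0] < eps:  # sufficient-improvement rule
        return None
    return chosen
```

Sorting the front before the random choice only makes runs reproducible for a fixed seed; it does not change which populations are eligible.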

22.3 Experimental Analysis

In [20], a preliminary study was presented on how the three parameters of GPAM, namely $t_{pri}$, $t_{sec}$, and $\lambda = \lambda_F = \lambda_{CR}$, affect DEGPOA. The study was based on a simple experimental configuration in which one parameter was changed at a time while the rest were kept fixed. The considered parameter levels are reported in Table 22.1. In the present study, we extend the previous results with a full factorial design. Each of the resulting $4^3 = 64$ DEGPOA instances is named after the values of the three parameters in the form $t_{sec}$\_$t_{pri}$\_$\lambda$, where the factor $n$ is omitted from $t_{pri}$.

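Table 22.1 The considered parameter values in our experimental setting

| Parameter | Level 1 | Level 2 | Level 3 | Level 4 |
|---|---|---|---|---|
| $t_{sec}$ | 5 | 10 | 15 | 20 |
| $t_{pri}$ | 5n | 10n | 15n | 20n |
| $\lambda$ | 0.05 | 0.1 | 0.15 | 0.2 |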

22 Experimental Sensitivity Analysis of Grid-Based Parameter Adaptation Method


For example, 10_5_0.15 stands for the instance with t_sec = 10, t_pri = 5n, and λ = 0.15. All experiments were conducted on the state-of-the-art CEC 2013 test suite [1], strictly following its guidelines. The test suite consists of 28 unimodal, multimodal, and composite functions, henceforth denoted as f_1–f_28. The most common dimensions, n = 10 and n = 30, were considered, while the search space for all test problems was [−100, 100]^n. As dictated by the test suite, the maximum computational budget was set to T_max = 10^4 × n function evaluations. Solution quality was measured by the error gap between the detected solution and the known optimal one. A fixed population size was used in all experiments, and 51 independent runs per test problem were conducted. The preliminary statistical analysis in [20] offered clear indications that t_sec is the most influential parameter, followed by λ and t_pri; the reader is referred to [20] for a detailed explanation. The present work expands the previous analysis through a full factorial design that can reveal possible parameter interactions and correlations. For this reason, the experimental configuration reported above is identical to the previous one in [20]. The first part of our analysis was based on 3-way ANOVA on all 64 DEGPOA instances, using their average error values over the whole test suite.¹ The resulting ANOVA tables for both dimensions are reported in Table 22.2. The tables contain the typical ANOVA information, with the last column (marked “Prob > F”) providing the p-values. For the 10-dimensional case (upper part of Table 22.2), the p-values are smaller than 0.05 for t_sec and λ, which suggests a significant impact of these two parameters, in contrast to t_pri. However, there appear to be no statistically significant interactions between any two of the parameters. This is in line with the results of the preliminary analysis in [20].
In the 30-dimensional case, the impact of t_sec remains significant, but now t_pri exhibits a significant impact instead of λ. This interesting evidence suggests that, as dimension increases, the granularity of the grid becomes less important than the effort spent on deploying the algorithm’s dynamic. It also shows that the computational budget devoted to estimating the best parameter setting through the secondary populations is always important. This is a reasonable effect, because t_sec is the key factor for proper assessment of the secondary populations and, consequently, for an effective grid search in the parameter domain. In addition to this evidence, we followed an alternative analysis to identify the most promising parameter combinations. For this purpose, we employed Wilcoxon rank-sum tests to compare all algorithm instances with each other on each test function individually, based on their solution errors in the 51 independent runs. For each comparison between two algorithm instances, the rank-sum test determined whether the difference between the compared error samples was statistically significant. If it was, the algorithm with the smaller mean error was awarded a win and its opponent a loss. In case of an insignificant difference, both algorithms were assigned a draw.

¹ The MathWorks Matlab® software was used for this purpose.


V. A. Tatsis and K. E. Parsopoulos

Table 22.2 Three-way ANOVA for the 64 DEGPOA instances

Dimension: n = 10
Source          Sum sq.   d.f.   Mean sq.   F      Prob > F
t_sec           12.38     3      4.12667    6.15   0.0025
t_pri           3.1402    3      1.04675    1.56   0.2219
λ               13.7274   3      4.5758     6.82   0.0014
t_sec ∗ t_pri   8.2554    9      0.91727    1.37   0.251
t_sec ∗ λ       4.5431    9      0.50478    0.75   0.6592
t_pri ∗ λ       2.8411    9      0.31568    0.47   0.8815
Error           18.1128   27     0.67084
Total           63        63

Dimension: n = 30
Source          Sum sq.   d.f.   Mean sq.   F      Prob > F
t_sec           15.949    3      5.31632    8.6    0.0004
t_pri           9.1731    3      3.05771    4.95   0.0073
λ               3.7093    3      1.23643    2      0.1376
t_sec ∗ t_pri   4.9029    9      0.54477    0.88   0.5534
t_sec ∗ λ       6.8171    9      0.75746    1.23   0.3208
t_pri ∗ λ       5.7618    9      0.6402     1.04   0.4382
Error           16.6868   27     0.61803
Total           63        63
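The Mean sq. and F columns of Table 22.2 follow from the Sum sq. and d.f. columns: each mean square is Sum sq./d.f., and F is that mean square divided by the error mean square (Sum sq. 18.1128 on 27 d.f. for n = 10). The hypothetical helper below reproduces a few entries as a consistency check; the sums of squares for n = 10 also add up to the Total of 63.

```python
def anova_row(sum_sq, df, error_sum_sq, error_df):
    """Mean square and F ratio of one ANOVA source against the error term."""
    mean_sq = sum_sq / df
    mse = error_sum_sq / error_df
    return mean_sq, mean_sq / mse


# Sum sq. column of Table 22.2 (n = 10): t_sec, t_pri, λ, three interactions, error.
SS_N10 = [12.38, 3.1402, 13.7274, 8.2554, 4.5431, 2.8411, 18.1128]
```

The p-values in the last column additionally require the survival function of the F distribution with (d.f., 27) degrees of freedom, e.g. `scipy.stats.f.sf(F, 3, 27)` if SciPy is available; this sketch does not compute them.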

After all pairwise comparisons, we calculated a rank measure for each algorithm, defined as the difference between its wins and losses: rank(alg) = wins_alg − losses_alg. Since each algorithm is compared with 63 others on 28 functions, it holds that −1764 ≤ rank(alg) ≤ 1764, with higher values denoting better performance (more wins than losses). All ranks are sorted and graphically illustrated in Fig. 22.1. The sorted ranks clearly show that the performance of the algorithm instances varies significantly in some cases. To get a better picture, we isolated the algorithm instances with positive ranks and conducted the same rank analysis strictly among them. The corresponding sorted ranks are given in Fig. 22.2. As we can see, the top-performing instances for both dimensions prefer larger values of t_pri (15n or 20n) but smaller values of t_sec (typically 5 or 10). Interpreting this evidence, we conclude that the most efficient algorithms are the ones putting more emphasis on the dynamic’s deployment phase rather than on the performance estimation phase, where rough estimations are adequate to guide the grid search.

Fig. 22.1 Sorted ranks of the 64 DEGPOA instances

Interestingly, higher levels of λ appear more frequently in the top-performing algorithms. This shows that the DEGPOA approach does not require highly fine-grained grids, probably as a consequence of DE’s reduced sensitivity to marginal changes of its scalar parameters (this finding has also been reported in [16]). The observed patterns of higher t_pri and λ levels and lower t_sec levels are more clearly illustrated in Figs. 22.3 and 22.4. These figures show all algorithm instances sorted according to their ranks, along with their parameter levels. To avoid scaling issues, the level indices (i.e., 1, 2, 3, 4) of the parameter values are used instead of their actual values. The aforementioned pattern is clearly observed on the right part of the figures, where the best-performing algorithm instances lie, verifying the previous findings.
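The win/loss ranking based on Wilcoxon rank-sum tests can be sketched as follows. As an assumption, the rank-sum test is implemented from scratch with the usual normal approximation and average ranks for ties (no tie correction in the variance), similar to what `scipy.stats.ranksums` computes; the significance level `alpha = 0.05` and the input layout `errors[alg][func]` are hypothetical choices, since the text does not state them. With k instances and F functions every score lies in [−(k − 1)F, (k − 1)F], i.e., [−1764, 1764] for k = 64 and F = 28.

```python
import math
from itertools import combinations


def ranksum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum test, normal approximation, average ranks for ties."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def win_loss_ranks(errors, alpha=0.05):
    """errors[alg][func] -> list of final errors; returns rank = wins - losses."""
    algs = list(errors)
    funcs = list(errors[algs[0]])
    score = {alg: 0 for alg in algs}
    for func in funcs:
        for a, b in combinations(algs, 2):
            ea, eb = errors[a][func], errors[b][func]
            if ranksum_pvalue(ea, eb) < alpha:  # statistically significant difference
                mean_a, mean_b = sum(ea) / len(ea), sum(eb) / len(eb)
                winner, loser = (a, b) if mean_a < mean_b else (b, a)
                score[winner] += 1
                score[loser] -= 1
            # otherwise the comparison is a draw and neither score changes
    return score
```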


Fig. 22.2 Sorted ranks of the 64 DEGPOA instances

Fig. 22.3 The 64 DEGPOA instances sorted by their ranks, along with their corresponding parameter levels for the 10-dimensional functions


Fig. 22.4 The 64 DEGPOA instances sorted by their ranks, along with their corresponding parameter levels for the 30-dimensional functions

22.4 Conclusion

We presented an extended analysis of GPAM applied to the DE algorithm, adding to previous experimental analyses. The study is based on a full factorial statistical analysis of various parameter levels of the resulting DEGPOA approach on the established CEC 2013 test suite. The main findings can be summarized as follows: (a) In the lower dimension, t_sec and λ have a higher impact on the algorithms than t_pri, whereas in the higher dimension, t_sec and t_pri exhibit higher statistical significance than λ. Thus, as dimension increases, the granularity of the grid becomes less important than the deployment of the algorithm’s dynamic. (b) Higher values of t_pri and λ, and lower values of t_sec, are associated with the best-performing algorithm instances. These findings largely confirm previous findings and justify the parameter value choices in previous works on DEGPOA. Naturally, the observed results are closely tied to the specific algorithm and testbed. Further experimentation is needed to probe the algorithm’s behavior in different experimental environments, as well as for different algorithms.

References

1. Complementary material: Special session & competition on real-parameter single objective optimization at CEC’2013, http://www.ntu.edu.sg
2. T. Bartz-Beielstein, Experimental Research in Evolutionary Computation (Springer, Berlin, 2006)
3. M. Birattari, Tuning Metaheuristics: A Machine Learning Perspective (Springer, Berlin, 2009)
4. J. Brest, M.S. Maucec, Self-adaptive differential evolution algorithm using population size reduction and three strategies. Soft Comput. 15, 2157–2174 (2011)
5. S. Das, P.N. Suganthan, Differential evolution: A survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15(1), 4–31 (2011)
6. A.E. Eiben, R. Hinterding, Z. Michalewicz, Parameter control in evolutionary algorithms. IEEE Trans. Evol. Comput. 3(2), 124–141 (1999)
7. A.E. Eiben, S.K. Smit, Evolutionary algorithm parameters and methods to tune them, in Autonomous Search, chapter 2, eds. by Y. Hamadi, E. Monfroy, F. Saubion (Springer, Berlin, 2011), pp. 15–36
8. M. Gendreau, J. Potvin, Handbook of Metaheuristics, 2nd edn. (Springer, New York, 2010)
9. A. Gogna, A. Tayal, Metaheuristics: review and application. J. Exp. Theor. Artif. Intell. 25(4), 503–526 (2013)
10. H.H. Hoos, Automated algorithm configuration and parameter tuning, in Autonomous Search, chapter 3, eds. by Y. Hamadi, E. Monfroy, F. Saubion (Springer, Berlin, 2011), pp. 37–72
11. F. Hutter, H.H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, in Learning and Intelligent Optimization: 5th International Conference, LION 5, Rome, Italy. Selected Papers, ed. by A.C. Coello Coello (Springer, Berlin, 2011), pp. 507–523
12. K.V. Price, R.M. Storn, J.A. Lampinen, Differential Evolution: A Practical Approach to Global Optimization (Springer, Berlin, 2005)
13. A.K. Qin, V.L. Huang, P.N. Suganthan, Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans. Evol. Comput. 13(2), 398–417 (2009)
14. R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Opt. 11, 341–359 (1997)
15. R. Tanabe, A. Fukunaga, Improving the search performance of SHADE using linear population size reduction, in 2014 IEEE Congress on Evolutionary Computation (2014)
16. V.A. Tatsis, K.E. Parsopoulos, Grid search for operator and parameter control in differential evolution, in 9th Hellenic Conference on Artificial Intelligence, SETN ’16 (ACM, 2016), pp. 1–9
17. V.A. Tatsis, K.E. Parsopoulos, Differential evolution with grid-based parameter adaptation. Soft Comput. 21(8), 2105–2127 (2017)
18. V.A. Tatsis, K.E. Parsopoulos, Grid-based parameter adaptation in particle swarm optimization, in 12th Metaheuristics International Conference (MIC 2017) (2017), pp. 217–226
19. V.A. Tatsis, K.E. Parsopoulos, Experimental assessment of differential evolution with grid-based parameter adaptation. Int. J. Artif. Intell. Tools 27(04), 1–20 (2018)
20. V.A. Tatsis, K.E. Parsopoulos, On the sensitivity of the grid-based parameter adaptation method, in 7th International Conference on Metaheuristics and Nature Inspired Computing (META 2018) (2018), pp. 86–94
21. J. Torres-Jiménez, J. Pavón, Applications of metaheuristics in real-life problems. Prog. Artif. Intell. 2(4), 175–176 (2014)
22. J. Zhang, A.C. Sanderson, JADE: Adaptive differential evolution with optional external archive. IEEE Trans. Evol. Comput. 13, 945–958 (2009)

Chapter 23

Auto-Scaling System in Apache Spark Cluster Using Model-Based Deep Reinforcement Learning

Kundjanasith Thonglek, Kohei Ichikawa, Chatchawal Sangkeettrakarn, and Apivadee Piyatumrong

Abstract Real-time processing is a fast and prompt processing technology that needs to complete execution within a limited time constraint almost equal to the input time. Executing such real-time processing requires an efficient auto-scaling system that provides sufficient resources to complete the processing within the time constraint. We use the Apache Spark framework to build a cluster that supports real-time processing. The major challenge of automatically scaling an Apache Spark cluster for real-time processing is handling both the unpredictable input data size and the unpredictable resource availability of the underlying cloud infrastructure. If scaling out the cluster is too slow, the application cannot be executed within the time constraint because of insufficient resources. If scaling in the cluster is slow, resources are left idle, which lowers resource utilization. This research follows the real-world scenario where the computing resources are bounded by a certain number of computing nodes due to a limited budget, and the computing time is limited due to the nature of near real-time applications. We design an auto-scaling system that applies a deep reinforcement learning technique, DQN (Deep Q-Network), to improve resource utilization efficiently. Our model-based DQN optimizes the scaling of the cluster automatically, because the DQN can autonomously learn the features of the given environment and take suitable actions that maximize the reward under the limited execution time and number of worker nodes.

K. Thonglek (✉) · K. Ichikawa
Nara Institute of Science and Technology (NAIST), Nara, Japan

C. Sangkeettrakarn · A. Piyatumrong
National Electronics and Computer Technology Center (NECTEC), Pathum Thani, Thailand

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_23


K. Thonglek et al.

23.1 Introduction

Nowadays, big data technology is more important than ever because the amount of data is increasing exponentially over time [1]. The more quality data is available, the greater the expectation of finding valuable information in it. Thus, organizations, companies, and SMEs alike are all trying to extract valuable information from their data. To that end, they need researchers and data scientists/engineers who can efficiently handle large-scale data processing systems. However, data processing science is a relatively new field that requires advanced knowledge of a huge variety of techniques, tools, and theories. For example, a near real-time data computing application might have to handle input data of different sizes at different times, as well as apply different Machine Learning techniques for different purposes at the same time. Yet the task has to produce output within a certain time to guarantee the quality of service of a so-called ‘near real-time’ application. Hence, new workers who are inexperienced in the field need to be trained and supported by one or more tool packages. As a consequence, there are many big data technology tools, such as Apache Flink, Apache NiFi, Apache Kafka, Apache Hadoop, and Apache Spark, that help both experienced and inexperienced users process data faster [2]. Among these technologies, Apache Spark is one of the most popular general-purpose open-source frameworks for distributed cluster computing [3]. It is used as a processing engine for big data, equipped with rich language-integrated APIs and a wide range of libraries [4]. This work aims to optimize the scaling mechanism to accommodate real-time processing applications on Apache Spark such that (1) the applications successfully complete within the bounded execution time and (2) the computing resources are utilized efficiently.
The methodology employs a learning algorithm to automate the scaling of computing nodes in an Apache Spark cluster while satisfying a set of constraints. The main challenge, apart from working under the near real-time scenario, is to make correct scaling decisions under highly dynamic input data sizes and workloads of the underlying infrastructure. There are two main approaches in the literature to scale a cluster automatically: rule-based scaling and data-driven scaling. Rule-based scaling uses fixed rules to control the scale of the cluster. Data-driven scaling, on the other hand, observes the system and collects data in order to learn to predict the suitable scaling action. This research adapts a deep reinforcement learning technique to scale the Apache Spark cluster, so that it can learn from environment features that are analyzed and selected in this work. The learning agent then decides which actions the system should take. The goal of this work is to dynamically improve the resource utilization rate of Apache Spark on an OpenStack system. The rest of this paper is organized as follows. Section 23.2 describes related research on scaling resources and deep reinforcement learning technology. Section 23.3 explains our proposed methodology for realizing the auto-scaling system on an Apache Spark cluster using deep reinforcement learning. Section 23.4 presents and discusses the experimental results. Section 23.5 presents the conclusions of this research and future work.

23 Auto-Scaling System in Apache Spark Cluster …

23.2 Background

23.2.1 Apache Spark on OpenStack

Apache Spark is a general-purpose data processing framework optimized for distributed computing. Its underlying platform uses a data structure called Resilient Distributed Datasets (RDD), which supports the distributed computing operations within an Apache Spark cluster. Besides the RDD-oriented functional style of programming, Spark provides two restricted forms of shared variables: broadcast variables reference read-only data that needs to be available on all nodes, while accumulators can be used to program reductions in an imperative style. Figure 23.1 shows the two kinds of RDD operations: a Transformation produces a new RDD from existing RDDs, while an Action computes an RDD and returns the result as a value [5]. Apache Spark can run in either standalone or cluster mode. Figure 23.2 shows the key components of an Apache Spark cluster.

Fig. 23.1 The two operations for RDDs

Fig. 23.2 Key components of Apache Spark cluster

The cluster manager is an important component of an Apache Spark cluster, since it controls data operations between the data node, the master node, and the worker nodes. In the cluster mode shown in Fig. 23.2, the SparkContext, located inside the Driver Program, connects to the cluster manager to allocate resources (the worker nodes) to applications. SparkContext is the Apache Spark component used to create RDDs, accumulators, and broadcast variables, access Spark services, and run jobs. Once connected, Spark acquires executors on the worker nodes of the cluster. These worker nodes run computations and store data for the applications. The application code is submitted to the executors on the worker nodes via the cluster manager on the master node [3]. Computation on an RDD is distributed by design: to achieve an even data distribution and to leverage data locality, an RDD is divided into a fixed number of partitions, i.e., logical chunks of the data. The division is logical and exists for processing only; internally, the data is not physically divided. Each partition consists of records.

Big data computing and real-time analytic applications are known for the challenges of handling huge data sizes and large-scale computing resources. Naturally, these applications require High Performance Computing (HPC) or cloud platforms, where computing resources tend to be abundant. In this work, we are interested in an Apache Spark cluster on top of the OpenStack platform. OpenStack is a free and open-source software platform for cloud computing, whereby virtual servers and other resources are made available to customers. It is a cloud operating system that controls large pools of computing, storage, and networking resources, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface [6]. Sahara [7] is the OpenStack component that enables cluster scaling. The Sahara API works together with OpenStack templates that preconfigure the operating system and the organization of resources. There are two types of cluster scaling: scale-out adds worker nodes to the cluster, while scale-in removes worker nodes from it. With respect to resource efficiency, scaling out provides sufficient resources, while scaling in is the key to removing wasted resources. The main challenge of cluster scaling in this work is to support real-time processing of incoming data whose size changes dynamically.
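The Transformation/Action distinction and the fixed partitioning of an RDD can be illustrated without a Spark installation. The class below is a hypothetical pure-Python toy, not Spark's API: transformations only record a lineage of pending functions, and actions evaluate that lineage partition by partition. In PySpark, the corresponding real calls would be `SparkContext.parallelize(data, numSlices)` for creation, `map` and `filter` as transformations, and `collect`, `count` and `reduce` as actions.

```python
from functools import reduce as fold


class ToyRDD:
    """Hypothetical, illustrative stand-in for an RDD (not Spark's API)."""

    def __init__(self, partitions, ops=()):
        self.partitions = partitions  # list of lists: logical chunks of records
        self._ops = tuple(ops)        # pending transformations (the lineage)

    @classmethod
    def parallelize(cls, data, num_partitions):
        """Split data into num_partitions contiguous, near-equal partitions."""
        k, m = divmod(len(data), num_partitions)
        bounds = [i * k + min(i, m) for i in range(num_partitions + 1)]
        return cls([data[bounds[i]:bounds[i + 1]] for i in range(num_partitions)])

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, fn):
        return ToyRDD(self.partitions, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self.partitions, self._ops + (("filter", pred),))

    def _compute(self, part):
        """Apply the recorded lineage to one partition."""
        for kind, fn in self._ops:
            part = [fn(r) for r in part] if kind == "map" else [r for r in part if fn(r)]
        return part

    # Actions: evaluate the pending transformations partition by partition.
    def collect(self):
        return [r for p in self.partitions for r in self._compute(p)]

    def count(self):
        return sum(len(self._compute(p)) for p in self.partitions)

    def reduce(self, fn):
        partials = [fold(fn, rs) for rs in map(self._compute, self.partitions) if rs]
        return fold(fn, partials)
```

Because the toy keeps all data in a single process, it shows only laziness and per-partition evaluation, not distribution across executors.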

23.3 Methodology

This section describes our proposed methodology for creating an auto-scaling Apache Spark cluster using a deep reinforcement learning technique. The method consists of three steps. First, features are selected from the data. Then, DQN is applied to our scenario by constructing states, actions, and a reward. Finally, the auto-scaling system is designed.


23.3.1 Feature Selection

Since the scenario of this study is to scale an Apache Spark cluster on an OpenStack cloud system, features from both the Apache Spark API and the OpenStack API are considered. As discussed in Sect. 23.2.1, Apache Spark has two main operations, Transformation and Action, and these two operations consume memory differently. Moreover, since Apache Spark relies mainly on in-memory processing [11], we paid particular attention to the memory usage of both operations. In in-memory computation, the data is kept in random access memory instead of slow disk drives and is processed in parallel, which makes it possible to detect patterns in, and analyze, large data. An empirical analysis of Apache Spark logs was carried out within this study: all features exposed by the Apache Spark API were monitored from the output console while applications ran. The full set of features we analyzed from the Apache Spark log can be found at [12]. The observations show that memory usage differs strongly between the Action and the Transformation operations. Thus, we chose these two quantities as our features.

Features selected from the Apache Spark API:
m_a is the percentage of memory used when Apache Spark performs an Action operation;
m_t is the percentage of memory used when Apache Spark performs a Transformation operation.

For the OpenStack framework, as a cloud computing platform, CPU and network utilization are significant factors, since they can affect the performance of cluster computation on the cloud [13]. We therefore observe CPU and network features via the OpenStack API. From these observations, we found that the CPU usage of user processes, the CPU usage of system processes, and the inbound and outbound network usage of the Apache Spark cluster behaved differently and affected the overall run-time.

Features selected from the OpenStack API:
c_u is the percentage of CPU usage for user processes;
c_s is the percentage of CPU usage for system processes;
b_i is the percentage of network usage for inbound traffic;
b_o is the percentage of network usage for outbound traffic.


Fig. 23.3 The Features and Actions Flow in the Proposed Solution System

23.3.2 Applied DQN for Auto-Scaling Task

DQN is the deep reinforcement learning technique that we apply in our proposed system. Three important elements must be defined in order to apply DQN: states, actions, and the reward function (Fig. 23.3).

STATES AND THEIR CONSTRAINTS: States are the possible environment statuses of the system under study. In our scenario, the Apache Spark cluster is spawned with at least one master node and one worker node, according to the pre-configured OpenStack template used for scaling. The state is simply the current status of the Apache Spark cluster, which reflects the number of worker nodes, the expected bounded execution time T, and a bounded maximum number of worker nodes N due to the limited cost of the computing infrastructure. A state is defined by

$$S_x^{T,N}, \qquad (23.1)$$

where x is the current number of worker nodes and 1 ≤ x ≤ N. For example, if the current cluster comprises a single worker node, the available resources allow extending it to 10 worker nodes, and the computation is expected to finish within 5 min, the state is represented as S_1^{5,10}.

ACTIONS: The actions the reinforcement learning agent can take to scale the Apache Spark cluster, where N is the maximum number of worker nodes in the cluster, are defined by

$$A_y^{o|i}, \qquad (23.2)$$

where A_y^o is the action by which the Apache Spark cluster scales out by adding y worker nodes, subject to (x + y) ≤ N, and A_y^i is the action by which the cluster, currently with x worker nodes, scales in by removing y worker nodes, subject to 1 ≤ (x − y).
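The state and action constraints can be encoded directly. The sketch below is an illustrative assumption: the `(kind, y)` tuple encoding of A_y^o, A_y^i and the not-scaling action A_0^neutral (introduced in the note that follows) is hypothetical, as are the names `State`, `valid_actions` and `next_state`. For any state it yields exactly N valid actions: N − x scale-out actions, x − 1 scale-in actions and one neutral action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """S_x^{T,N}: x current workers, T execution-time bound, N worker bound."""
    x: int
    T: int
    N: int

    def __post_init__(self):
        if not 1 <= self.x <= self.N:
            raise ValueError("a state requires 1 <= x <= N")


def valid_actions(s):
    """Actions as (kind, y) pairs respecting the constraints of Eq. (23.2)."""
    actions = [("neutral", 0)]                                # no scaling, y = 0
    actions += [("out", y) for y in range(1, s.N - s.x + 1)]  # x + y <= N
    actions += [("in", y) for y in range(1, s.x)]             # x - y >= 1
    return actions


def next_state(s, action):
    """Apply a scaling action; the State constructor re-checks 1 <= x <= N."""
    kind, y = action
    delta = {"out": y, "in": -y, "neutral": 0}[kind]
    return State(s.x + delta, s.T, s.N)
```

For the example state S_1^{5,10}, `valid_actions(State(1, 5, 10))` contains the neutral action and scale-out by 1 to 9 nodes, but no scale-in action.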


Note that for the not-scaling action, the value of y is zero; this action is represented by A_0^{neutral}.

REWARD FUNCTION: The reward function gives a reward (r) to the agent when it decides to scale the Apache Spark cluster, which must always keep at least one worker node. The reward function uses the features selected and explained earlier as well as the constraints of the cluster state (m_a, m_t, c_u, c_s, b_i, b_o, T, N). Furthermore, it must take into account the number of worker nodes y scaled by either a scale-out action (A_y^o) or a scale-in action (A_y^i). Let w denote the signed number of worker nodes for the action A_y^{o|i} chosen by the learning agent:

$$w(y) = \begin{cases} +y, & \text{when } A_y^o \text{ (the agent takes a scale-out action)},\\ -y, & \text{when } A_y^i \text{ (the agent takes a scale-in action)}. \end{cases}$$

The reward function is defined as

$$r = \frac{\left(1 - \frac{w}{N-1}\right) + m_a + m_t + c_u + c_s + b_i + b_o + \left(1 + \frac{T - t}{T}\right)}{U}, \qquad (23.3)$$

where t is the execution time of the current round and U is the number of features, which equals 10 in this work. The reward increases with resource utilization and decreases with execution time. A scale-in action saves resources; therefore, the reward increases when the resource utilization rate is high, the learning agent scales in (A_y^i), and the execution time t is less than the execution time limit T. Conversely, the reward decreases when the Apache Spark cluster has low resource utilization, or when the agent scales out (A_y^o) and the execution time t exceeds the bound T. Resource utilization is calculated as the percentage ratio between the resource usage and the resources allocated in the Apache Spark cluster.
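A minimal sketch of the reward, reading Eq. (23.3) as the scaling term 1 − w/(N − 1), the six selected features and the time term 1 + (T − t)/T, summed and divided by U = 10. Scaling the features to fractions in [0, 1] is an assumption (the text calls them percentages), and the function names and the `(kind, y)` action encoding are hypothetical.

```python
def w(kind, y):
    """Signed number of scaled worker nodes: +y for scale-out, -y for scale-in."""
    return {"out": y, "in": -y, "neutral": 0}[kind]


def reward(kind, y, features, t, T, N, U=10):
    """Reward read as (scaling term + features + time term) / U.

    features: (m_a, m_t, c_u, c_s, b_i, b_o), assumed here to be fractions in [0, 1].
    """
    if N < 2:
        raise ValueError("N must be at least 2 so that N - 1 > 0")
    scaling_term = 1 - w(kind, y) / (N - 1)  # larger for scale-in, smaller for scale-out
    time_term = 1 + (T - t) / T              # above 1 when the round finishes early
    return (scaling_term + sum(features) + time_term) / U
```

With all six features at 0.5, no scaling and t = T, the reward is (1 + 3 + 1)/10 = 0.5; scaling in or finishing early raises it, while scaling out or exceeding T lowers it.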

23.3.3 Auto-Scaling System Design

Our proposed auto-scaling system comprises three main components: (1) a data publishing engine, (2) a learning and scaling engine, and (3) a scaling-mode web interface, as shown in Fig. 23.4. The data publishing engine publishes information about the Apache Spark cluster. The learning and scaling engine subscribes to this information, learns from it with the deep reinforcement learning model, and returns the suitable scaling action to the Apache Spark cluster. The scaling-mode web interface switches the cluster scaling mode between manual-scaling mode and auto-scaling mode. These modules were designed and implemented for auto-scaling an Apache Spark cluster on the OpenStack platform, as stated earlier, and they can be adopted in any deployment with the same system architecture configuration.

Fig. 23.4 The proposed auto-scaling system

(1) DATA PUBLISHING ENGINE COMPONENT publishes data from three different data sources: the Apache Spark API, the Ganglia API, and the Sahara library. Data from the Apache Spark API consists of the memory usage when Apache Spark performs an Action (m_a) and the memory usage when it performs a Transformation (m_t). Data from the Ganglia API on the OpenStack platform consists of the CPU usage of user processes (c_u), the CPU usage of system processes (c_s), the inbound network usage (b_i), and the outbound network usage (b_o). Data from the Sahara library on the OpenStack platform consists of the current number of worker nodes (x) of the cluster together with the cluster status (i.e., processing, stop, killed).

(2) LEARNING & SCALING ENGINE COMPONENT subscribes to the data publishing service. The DQN learns from the subscribed data, i.e., the environment features, in order to take the action suitable for the current cluster and the application running on it. The suitable action (A_y^o or A_y^i) is a scaling action that allows the cluster to complete the task within the execution time limit (T) while staying within the maximum number of worker nodes (N) of the cluster. The learning and scaling engine sends the action to the OpenStack platform via the Sahara library to apply it to the Apache Spark cluster.

(3) SCALING-MODE WEB INTERFACE helps prepare scaling actions for the Apache Spark cluster. It presents the monitored cluster resource utilization (m_a, m_t, c_u, c_s, b_i, b_o) to the user. Moreover, if the user provides the T and N values to the web interface, it also returns the predicted value of A_y^{o|i}. Thus, inexperienced users can benefit from this module when preparing a suitable cluster for the application they expect to run. In this way, resource utilization can be optimized easily through the user's decisions from the very beginning of the computing work. This module also lets the user choose, at this portal, between auto-scaling by the proposed system and manual scaling by the user.
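The learning and scaling engine is built on DQN. The sketch below shows two standard DQN ingredients, using the discount factor (0.99) and the exploration range (1 to 0.1) listed in Table 23.1: epsilon-greedy action selection with a linearly annealed epsilon, and the one-step target computed from a target network's Q-values. The annealing horizon `anneal_steps` is hypothetical, since the table does not give it; the Q-network, replay memory and optimizer are omitted, and all names are illustrative.

```python
import random


def epsilon(step, start=1.0, end=0.1, anneal_steps=100_000):
    """Linearly annealed exploration rate (anneal_steps is a hypothetical value)."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)


def choose_action(q_values, step, rng=random):
    """Epsilon-greedy choice over the agent's action indices."""
    if rng.random() < epsilon(step):
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


def td_target(r, next_q_target, done, gamma=0.99):
    """One-step target: r if terminal, else r + gamma * max_a' Q_target(s', a')."""
    return r if done else r + gamma * max(next_q_target)
```

In a full agent, the Q-network would be regressed towards `td_target` on minibatches (32 in Table 23.1) sampled from the replay memory.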

23.4 Evaluation

This research follows the real-world scenario in which the computing resources are bounded by a certain number of computing nodes due to a limited budget, and the computing time is limited due to the nature of near real-time applications. The experimental setup is therefore designed according to this scenario. The OpenStack system is prepared and stacked up with the Apache Spark cluster configuration in the necessary templates, such as the master node template, worker node template, data node template, and the Apache Spark cluster template, where each cluster must have at least one master and one worker node. The underlying system has been used for large-scale data mining and data analytics for over a year without problems. This confirms the stability of the system for this research work and qualifies it as a real system used in a real-world scenario. The Apache Spark cluster is launched on the prepared system in homogeneous mode. Each node has 4 vCPUs, 8 GB of memory, and a 20 GB storage disk. Three independent variables are varied over the experiment: the bounded execution time T, the bounded maximum number of worker nodes N, and the input data size D. Our proposed experiment is limited by the existing computing resources on the OpenStack platform and by the execution time of real-time processing. The possible values of each variable are as follows:

T = {5, 6, …, 10} minutes
N = {5, 6, …, 10} nodes
D = {0.1, 0.2, …, 1.0} GB

In order to imitate a real workload, the Word2Vec algorithm is used in this study. Word2Vec is an algorithm that computes distributed vector representations of words. For near real-time applications such as social media monitoring, texts are produced very fast, and the final data at each episode of the real-time application cannot be predicted but tends to be enormous. Furthermore, Word2Vec is memory- and compute-intensive, which suits our experimental scenario.
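The experimental variables can be enumerated as a grid. The text does not state whether every combination was actually run; the sketch below simply assumes a full grid, which gives 6 × 6 × 10 = 360 (T, N, D) configurations. All names are hypothetical.

```python
from itertools import product

T_VALUES = list(range(5, 11))              # bounded execution time T (minutes)
N_VALUES = list(range(5, 11))              # maximum number of worker nodes N
D_VALUES = [i / 10 for i in range(1, 11)]  # input data size D (GB)


def configurations():
    """Every (T, N, D) combination of the listed values (a full-grid assumption)."""
    return list(product(T_VALUES, N_VALUES, D_VALUES))
```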
The hyperparameters of the DQN agent, which is located in the Learning & Scaling Engine, are defined in Table 23.1. Reinforcement learning problems are notoriously sensitive to hyperparameters, which makes it necessary to evaluate many different hyperparameter settings. For hyperparameter tuning, two important additions to the model are needed: (1) specifying the metric we want to optimize, and (2) mapping each model output to a suitable scaling action. We compare our proposed solution with the previous setup using a Linear Regression model, which was trained offline as a single model and never retrained afterwards. Figures 23.5 and 23.6 show the failure rate on

K. Thonglek et al.

Table 23.1 DQN agent's hyperparameters
Hyperparameter                    Value
Minibatch size                    32
Replay memory size                1000000
Agent history length              The number of maximum worker nodes
Target network update frequency   1000000
Discount factor                   0.99
Action repeat                     The number of maximum worker nodes
Update frequency                  The number of maximum worker nodes
Learning rate                     0.00025
Gradient momentum                 0.95
Squared gradient momentum         0.95
Min squared gradient              0.01
Initial exploration               1
Final exploration                 0.1
Replay start size                 50000

Fig. 23.5 The percentage of job failure with deep Q-network (DQN)

the Word2Vec application under our experiment setup, comparing our proposed DQN method with the linear regression model when the input data size is 5 GB. The average failure rate when DQN scales the Apache Spark cluster is 0.116, whereas it is 0.296 when linear regression scales the cluster. Since the average failure rate with DQN is lower, DQN optimizes auto-scaling better than the linear regression model, so that the system can execute the Word2Vec application within the time constraint (T) and the maximum number of possible worker nodes (N). DQN can make

23 Auto-Scaling System in Apache Spark Cluster …


Fig. 23.6 The percentage of job failure with linear regression (LR)

decisions more wisely and act more quickly to scale, improving the overall efficiency of computing utilization (fewer failed jobs). Table 23.2 shows the sacrifice period and the stabilize period of DQN and LR. Most of DQN's failures happened at the very beginning of the experiment: the latest round in which DQN still failed to make a good decision is round 9 out of the 100 experiment rounds per test case. In contrast, the failures of LR are scattered over all 100 rounds of the experiment, confirming the need for a learning phase to adapt to a dynamic environment, which LR lacks. The table confirms the ability of our proposed DQN to use resources efficiently by learning how to scale under changes in constraints and input data size. Our proposed auto-scaling system using DQN has a short sacrifice period (at most nine rounds of application runs). The disappearance of failures after round 9 shows how good the policy is once the algorithm has stabilized, while rounds 1 to 9 show the cost of the learning algorithm and how fast the proposed DQN adapts itself. The average last-failure round is 4 with DQN scaling, compared with 62 with linear regression. As a result, when the Apache Spark cluster is scaled automatically using deep reinforcement learning, the last failure within the 100 experimental rounds occurs within the first quarter of the rounds. The advantage of deep reinforcement learning lies in making decisions sequentially: the output depends on the state of the current input, and the next input depends on the output for the previous input. The general reinforcement-learning problem is how to enable an agent to maximize an external reward signal by acting in an unknown environment.
To ensure a well-defined problem, we make assumptions about the types of possible worlds. To make the problem tractable, we settle for


Table 23.2 The sacrifice and stabilize period of DQN and LR
(number of failures per block of 25 experiment rounds for each time constraint T in minutes; L marks the round of the last failure)

T    Model  1–25      26–50     51–75     76–100
5    LR     4         2         2         2 (L=90)
5    DQN    5 (L=9)   0         0         0
6    LR     4         3         2 (L=73)  0
6    DQN    5 (L=7)   0         0         0
7    LR     3         0         1         2 (L=96)
7    DQN    3 (L=5)   0         0         0
8    LR     2         1         1         1 (L=84)
8    DQN    2 (L=3)   0         0         0
9    LR     0         1 (L=34)  0         0
9    DQN    0         0         0         0
10   LR     0         0         0         0
10   DQN    0         0         0         0

near-optimal rather than optimal behavior on all but a polynomial number of time steps, and for a small allowable failure probability. Linear regression, in contrast, assumes that the data are independent, i.e., that the scores of one subject have nothing to do with those of another. This is often, but not always, sensible; two common cases where it does not hold are clustering in space and clustering in time. Therefore, linear regression is not suitable for scaling the cluster, because scaling requires making decisions sequentially.
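The sacrifice-period bookkeeping behind Table 23.2 can be sketched as follows, assuming each test case is logged as the list of rounds in which the job failed. The helper names are hypothetical, and treating a test case with no failures as a last-failure round of 0 is our assumption. Under that assumption, averaging the last-failure rounds (L) marked in Table 23.2 gives 4 for DQN and about 62.8 for LR, in line with the averages of 4 and 62 reported in the text.

```python
from statistics import mean


def failures_per_block(failed_rounds, total_rounds=100, block=25):
    """Count failed runs in consecutive blocks of rounds (1-25, 26-50, ...).

    failed_rounds: iterable of 1-based round numbers in which the job failed.
    """
    counts = [0] * (total_rounds // block)
    for r in failed_rounds:
        counts[(r - 1) // block] += 1
    return counts


def last_failure(failed_rounds):
    """Round of the last failure; 0 if the test case never failed (assumption)."""
    return max(failed_rounds, default=0)


# Last-failure rounds (L) marked in Table 23.2, one per time constraint T;
# test cases without any failure are counted as 0. Order does not affect the mean.
L_DQN = [9, 7, 5, 3, 0, 0]
L_LR = [90, 73, 96, 84, 34, 0]

if __name__ == "__main__":
    log = [1, 2, 5, 9]               # hypothetical failure log of one test case
    print(failures_per_block(log))   # [4, 0, 0, 0]
    print(last_failure(log))         # 9
    print(mean(L_DQN))               # 4
    print(round(mean(L_LR), 1))      # 62.8
```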

23.5 Conclusion
In this research, we study how to optimize the scaling of computing nodes in an Apache Spark cluster automatically using a deep reinforcement learning technique. Six significant features directly affect the performance of near real-time applications running on an Apache Spark cluster. This performance is subject to two constraints: the limit on execution time and the maximum number of worker nodes per cluster. The six features are acquired from two main data sources, the Apache Spark API and the OpenStack API. The Apache Spark API provides memory usage while actions and transformations are performed on RDDs. The OpenStack API provides the CPU usage of user and system processes, together with the inbound and outbound network usage of the cluster. These six features are acquired by our Data Publishing Engine, which also publishes the data so that other services can subscribe to it. Our proposed Learning & Scaling Engine subscribes to the six features and the cluster's state feature, then applies our DQN to select auto-scaling actions. With all these features, our proposed DQN, equipped with the reward function (Eq. 23.3), can learn how to scale suitably for each situation of an application dynamically. The efficiency of the cluster's resource utilization and the success of the application's outcome are incorporated into the reward function of our proposed DQN, which is where the learning and the optimization occur.
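The chapter does not spell out the DQN update rule, so the following is only a generic sketch of two standard DQN ingredients, epsilon-greedy action selection and the bootstrapped target y = r + γ · max Q_target(s′, a′), using the discount factor (0.99) and the exploration values (1 and 0.1) from Table 23.1. The action set, the linear annealing schedule and its horizon, and all function names are hypothetical; neither the reward function of Eq. 23.3 nor the Q-network itself is reproduced here.

```python
import random

GAMMA = 0.99                   # discount factor (Table 23.1)
EPS_START, EPS_END = 1.0, 0.1  # initial / final exploration (Table 23.1)
ANNEAL_STEPS = 1_000_000       # hypothetical annealing horizon (not given in the chapter)

# Hypothetical scaling actions: change in the number of worker nodes.
ACTIONS = (-1, 0, +1)


def epsilon(step):
    """Linearly anneal epsilon from EPS_START to EPS_END (assumed schedule)."""
    frac = min(step / ANNEAL_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_values, step, rng=random):
    """Epsilon-greedy: random action index with prob. epsilon, else argmax Q."""
    if rng.random() < epsilon(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_target, done):
    """DQN target: r at a terminal state, else r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_target)


if __name__ == "__main__":
    q = [0.2, 0.7, 0.1]                   # made-up Q(s, a) values for ACTIONS
    a = select_action(q, step=2_000_000)  # mostly greedy late in training
    print(ACTIONS[a])
    print(td_target(1.0, [0.0, 2.0, 1.0], done=False))
```

In a full agent, the target would be computed with a periodically updated copy of the network (the target network in Table 23.1) and regressed against Q(s, a) on minibatches sampled from the replay memory.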


Our proposed solutions are ready to be adopted in an Apache Spark cluster system on the OpenStack platform. To preserve the open-source spirit of Apache Spark and OpenStack, we package our proposed solutions in two Docker containers, which can be accessed via our project repository on GitHub at https://github.com/Kundjanasith/scaling-sparkcluster [12]; one can install and apply the system easily using the provided Docker images. The first image is the Data Publishing Engine, which should be installed on the Apache Spark cluster. The second image, containing the Learning & Scaling Engine and the Scaling-Mode Web Interface, should be installed outside the cluster that is to be scaled by the DQN engine. The Scaling-Mode Web Interface further enhances the efficiency of resource utilization by suggesting the right cluster configuration for the user's specific application. One of our future works is to investigate the behaviour of DQN on a bigger system with a huge number of nodes, where scalability can be observed and enhanced. Moreover, more types of workloads and applications will be investigated in future studies.

References
1. P.D.B. Parolo, R.K. Pan, R. Ghosh, B.A. Huberman, K. Kaski, S. Fortunato, Attention decay in science. J. Inf. 9(4), 734–745 (2015). https://doi.org/10.1016/j.joi.2015.07.006
2. S. Singh, P. Singh, R. Garg, P.K. Mishra, Big data: technologies, trends and applications. Int. J. Comput. Sci. Inf. Technol. 6(10), 4633–4639 (2015)
3. M. Zaharia, M. Chowdhury, M.J. Franklin, S. Shenker, I. Stoica, Spark: cluster computing with working sets, pp. 10 (2010). http://dl.acm.org/citation.cfm?id=1863103.1863113
4. M. Armbrust, T. Das, A. Davidson, A. Ghodsi, A. Or, J. Rosen, I. Stoica, P. Wendell, R. Xin, M. Zaharia, Scaling spark in the real world: performance and usability. Proc. VLDB Endow. 8(12), 1840–1843 (2015). https://doi.org/10.14778/2824032.2824080
5. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M.J. Franklin, S. Shenker, I. Stoica, Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing, pp. 15–28 (2012). https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/zaharia
6. Openstack platform. https://www.openstack.org/, Accessed 31 Oct. 2018
7. Sahara documentation. https://docs.openstack.org/sahara/ocata/userdoc/features.html, Accessed 01 Nov. 2018
8. A.Y. Nikravesh, S.A. Ajila, C.-H. Lung, Towards an autonomic auto-scaling prediction system for cloud resource provisioning, pp. 35–45 (2015). http://dl.acm.org/citation.cfm?id=2821357.2821365
9. H. Arabnejad, C. Pahl, P. Jamshidi, G. Estrada, A comparison of reinforcement learning techniques for fuzzy cloud auto-scaling, CoRR (2017). http://arxiv.org/abs/1705.07114
10. A. Graves, S. Fernández, J. Schmidhuber, Multi-dimensional recurrent neural networks, CoRR (2007). http://arxiv.org/abs/0705.2011
11. C. Engle, A. Lupher, R. Xin, M. Zaharia, M.J. Franklin, S. Shenker, I. Stoica, Shark: fast data analysis using coarse-grained distributed memory, pp. 689–692 (2012). https://doi.org/10.1145/2213836.2213934
12. K. Thonglek, C. Sangkeettrakarn, A. Piyatumrong, Open-source software of this research: to optimize an auto-scaling apache spark cluster using deep reinforcement learning, https://github.com/Kundjanasith/scaling-sparkcluster, Accessed 01 Nov. 2018
13. A. Al-Shaikh, H. Khattab, A. Sharieh, A. Sleit, Resource utilization in cloud computing as an optimization problem. Int. J. Adv. Comput. Sci. Appl. 7(6) (2016). https://doi.org/10.14569/IJACSA.2016.070643

Chapter 24

Innovation Networks from Inter-organizational Research Collaborations
Saharnaz Dilmaghani, Apivadee Piyatumrong, Grégoire Danoy, Pascal Bouvry, and Matthias R. Brust

Abstract
We consider the problem of automating network generation from inter-organizational research collaboration data. The resulting networks promise crucial insights. In this paper, we propose a method to convert relational data into a set of networks using a single parameter, called the Linkage Threshold (LT). To analyze the impact of the LT value, we apply standard network metrics, such as network density and centrality measures, to each network produced. The feasibility and impact of our approach are demonstrated on a real-world collaboration dataset from an established research institution. We show how the produced network layers can reveal insights and patterns by presenting a correlation matrix.

S. Dilmaghani (✉) · M. R. Brust
SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg

A. Piyatumrong
NECTEC, A member of NSTDA, Khlong Luang, Thailand

G. Danoy · P. Bouvry
FSTM-DCS/SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_24


S. Dilmaghani et al.

24.1 Introduction
Network structures have drawn significant attention in big data because of the various ways they offer to represent datasets. Different techniques in network theory support a variety of analyses, such as visualization, link prediction, and clustering. Network construction from a dataset plays an important role and has been applied in areas ranging from biology and neuroscience [28] (e.g., brain networks [5]) to modeling and analyzing galaxy distributions [18] and quantifying reputation in art [16]. Network theory often provides computationally efficient algorithms with lower complexity than a tabular structure does [12]. Numerous algorithms can be applied directly to networks, such as Louvain's algorithm [7], a community detection algorithm, and PageRank, which identifies the most influential objects within a network. Furthermore, data transformed into network layers can provide evidence for missing or omitted information [22, 30] and help predict the growth of the network in terms of nodes and links [27]. Networks also provide mechanisms for adaptability and a formal approach to capturing dynamicity [9]. With these advantages at hand, we are confronted with the challenge of how to transform relational data into appropriate networks [15]. The challenge is twofold: it concerns not only how to represent the elements of a network but also the specific construction principles, since for each dataset there are numerous ways to transform the data into a network [10]. Each network reveals a particular perspective on the input dataset, emphasizing some characteristics while diminishing the dominance of others [14]. In this paper, we propose a method that transforms scientific collaboration data into network layers. Our approach treats scientific projects as nodes, with links generated using a specific Linkage Threshold (LT).
We investigate different LT values for transforming an inter-organizational collaboration dataset into networks. We apply our method to real-world data describing collaborations of researchers within the National Electronics and Computer Technology Center (NECTEC) in Thailand. Our study uses a selected set of local and global metrics to determine the influence of the network structure on the network properties. We present a correlation matrix and discuss the results. The remainder of this paper is structured as follows. Section 24.2 reviews related work. We describe our method to complement data with network layers in Sect. 24.3. The experiment setup designed to analyze the proposed algorithm is presented in Sect. 24.4. Section 24.5 discusses the outcome, in particular the correlation matrix, and future work. Finally, Sect. 24.7 concludes the paper.

24.2 Related Work
The challenge of converting data to networks is an important issue when it comes to geographical data [13, 17]. Graph studies on spatial data reveal valuable information

24 Innovation Networks from Inter-organizational Research Collaborations


on route networks, complex urban systems [23] and the relationships between different urban areas [31]. Nevertheless, raw geographical data alone are not sufficient for proper graph studies. Karduni et al. [19] focus on this challenge and introduce an approach for geographical systems that defines a protocol describing the network properties needed to convert spatial polyline data into a network. Other studies have stressed inferring links from relational data in order to build a network from it. Casiraghi et al. [11] developed a generalized hypergeometric ensembles approach to address the problem of inferring connections within relational data; their study takes a link-prediction perspective while applying predictive analysis. From a similar point of view, Xiang et al. [29] established a link-based latent variable model to infer friendship relations from social interactions. In another study [26], international relations are extracted by a tensor factorization technique from a dataset consisting of news from different countries. Moreover, Akbas et al. [1] proposed a model to construct a social network from interactions (e.g., phone calls) recorded in smartphone data. Assigning a weight to each type of interaction, the authors define a link value as the combination of the various interaction types. Akbas et al. followed up their study on network generation from interaction patterns with studies on how to infer social networks of animal groups [2–4]. Building on these studies, in particular on [1], we followed a similar approach to define our network construction models. Similarly, Hong et al. [18] focused on different networks representing the cosmic web in order to investigate the architecture of the universe. They introduced three network models, each with its own linkage model.
Furthermore, Newman [20] built networks from scientific publication datasets and modeled collaboration as a network whose nodes are authors, connecting two authors if they had collaborated. Scientific collaboration networks have also been studied using a particular network structure, hypergraphs, by Ouvrard et al. [21], who emphasized enhancing the visualization of these networks with respect to network properties. We follow a different approach in order to automate the network generation process and to stress the impact of the link definition on network analysis.

24.3 Network Generation with Linkage Threshold
We propose a method to construct networks from an inter-organizational research collaboration dataset. The dataset contains information on the contributions of researchers within various teams that deliver a particular outcome (i.e., IP, paper, or prototype). Each collaborator may dedicate a certain level of contribution to each team. We use this information to define a Linkage Threshold, which measures the contribution strength between two teams.


Fig. 24.1 The network structure is defined by nodes (gray circles), which represent teams and contain collaborators. The links describe the collaboration between the nodes. Each pair of teams may have common collaborators (non-gray arrows) and other members who have not collaborated with other teams (gray arrows). In this example, Team_i and Team_j have two common collaborators with different contribution levels (p_i), and Team_j and Team_k have only one collaborator in common. However, Team_l has no collaborator in common with any other team.

The network structure is defined such that nodes represent teams and links represent the contribution of the common collaborators of each pair of teams. An illustration of the network structure is shown in Fig. 24.1. As explained in this figure, two teams are connected if they have common collaborators. Hence, if two teams do not collaborate, there is no link between the corresponding nodes, and a team that shares no collaborator with any other team remains an isolated node. Furthermore, to prune the network based on the collaboration level between teams, we use LT: increasing LT yields a sparser network whose remaining links reflect higher collaboration levels than those of a network built with a lower LT. We define LT using two features that describe collaboration: (1) the number of common collaborators of each pair of teams, and (2) the contribution percentage of these common collaborators. Assume Team_i and Team_j have n common collaborators, each of whom contributes up to a certain level within the team. We define p_i^m as the contribution percentage of collaborator m in Team_i, and M = Team_i ∩ Team_j as the set of common collaborators of Team_i and Team_j, so that n = |M|. We thus formalize LT between each Team_i and Team_j as

$$LT = \frac{1}{2n} \sum_{\forall m \in M} \left( p_i^{m} + p_j^{m} \right) \qquad (24.1)$$


where LT is the Linkage Threshold, which measures the contribution strength between two teams. LT ranges from 0 to 100%; each value represents the average contribution percentage of the common collaborators of a pair of teams. For instance, with LT equal to 20%, two teams are connected in the network if the average contribution percentage of their common members is equal to or greater than 20%; those two teams are then neighbors in the network. In other words, a particular LT keeps only the links between teams whose common collaborators contribute at least that much on average. If the collaboration level between a team and every other team does not meet the LT condition, that team remains an isolated node in the network.

Algorithm 1 Data-to-network layers
Input: D, a dataset of research collaboration.
Output: 𝒢, a vector of generated network layers.
1: procedure Data-to-Networks(D)
2:   nodesList ← teams within D
3:   for each threshold in range(0, 100) do
4:     for each tuple (team_i, team_j) in nodesList do
5:       LT ← LinkageThreshold(team_i, team_j)
6:       if LT ≥ threshold then
7:         append (team_i, team_j) to linksList
8:       end if
9:       Network G ← GenerateNetwork(nodesList, linksList)
10:      insert G into 𝒢
11:    end for
12:  end for
13:  return 𝒢

24.3.1 Description of the Algorithm
Assume D is a collaboration dataset. To generate networks, we need to identify the list of nodes and the list of links of the network. The research teams are extracted from D and treated as nodes (see line 2). For each pair of teams, LT is calculated as described in Eq. (24.1). Given the current threshold value (see line 3), the links that satisfy the condition specified in the algorithm (see line 6) are appended to the list of links (see line 7). Finally, a set of networks is constructed from the node and link lists and stored in the output vector (see line 9).
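A minimal Python sketch of Eq. (24.1) and of the Data-to-Network loop follows. It reflects our reading of Algorithm 1 rather than the authors' code: it computes LT once per pair of teams, starts each threshold with an empty link list, and builds one network layer per threshold after all pairs have been examined. Teams are represented as hypothetical dicts mapping collaborator IDs to contribution percentages, the function names are ours, and the default thresholds 0, 10, …, 100 follow the 11 values used in Sect. 24.5.

```python
from itertools import combinations


def linkage_threshold(team_i, team_j):
    """Eq. (24.1): LT = (1 / 2n) * sum over m in M of (p_i^m + p_j^m).

    team_i, team_j: dicts {collaborator_id: contribution percentage}.
    Returns None if the teams share no collaborator (M is empty).
    """
    common = team_i.keys() & team_j.keys()  # M: the common collaborators
    if not common:
        return None
    n = len(common)
    return sum(team_i[m] + team_j[m] for m in common) / (2 * n)


def data_to_networks(teams, thresholds=range(0, 101, 10)):
    """Return {threshold: (nodes, links)}, one network layer per threshold.

    teams: dict {team_id: {collaborator_id: contribution percentage}}.
    """
    nodes = sorted(teams)
    # LT depends only on the pair of teams, so compute it once per pair.
    lt = {}
    for a, b in combinations(nodes, 2):
        value = linkage_threshold(teams[a], teams[b])
        if value is not None:
            lt[(a, b)] = value
    layers = {}
    for threshold in thresholds:
        # Fresh link list per threshold: keep pairs whose LT meets the threshold.
        links = [pair for pair, value in lt.items() if value >= threshold]
        layers[threshold] = (nodes, links)
    return layers


# Hypothetical toy data: contribution percentages per collaborator.
TEAMS = {
    "A": {"x": 60, "y": 20},
    "B": {"x": 40, "y": 10, "z": 50},
    "C": {"z": 100},
    "D": {"w": 100},
}

if __name__ == "__main__":
    layers = data_to_networks(TEAMS)
    print(layers[30][1])  # [('A', 'B'), ('B', 'C')]
    print(layers[40][1])  # [('B', 'C')]
```

Here LT(A, B) = (60 + 40 + 20 + 10) / (2 · 2) = 32.5 and LT(B, C) = (50 + 100) / 2 = 75, so the A–B link disappears once the threshold exceeds 32.5, while team D stays isolated in every layer.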

24.3.2 Complexity Analysis
The complexity of Algorithm 1 depends on its two main parts: (a) comparing each pair of nodes to find those that satisfy the condition for the threshold, which is the most expensive part, with a complexity of O(n^2); and (b) generating the network, whose complexity is linear: for n nodes and m links it is O(n + m).
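Written out under stated assumptions (Algorithm 1 visits every unordered pair of the n team nodes once per threshold, there are K thresholds in total, with K = 11 in Sect. 24.5, one network is generated per threshold, and evaluating LT for a pair is treated as constant time), the two parts combine as follows:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} \in O(n^2) \quad \text{pairs per threshold}, \qquad
T_{\text{total}}(n, m) \in O\big(K\,(n^2 + n + m)\big) = O\big(K\,(n^2 + m)\big).
```

Because a simple undirected graph has at most n(n − 1)/2 links, m ∈ O(n^2), and the total therefore simplifies to O(K n^2).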

24.4 Experiment Setup
24.4.1 Dataset
We use collaboration data of researchers from the National Electronics and Computer Technology Center (NECTEC) in Thailand. The organization consists of different R&D departments and researchers working on electronics and computer science topics (e.g., AI and advanced electronic sensing, intelligent systems and networks). NECTEC is organized such that experts collaborate within teams, each of which may produce various deliverables: intellectual property (IP), papers, or prototypes. Information on collaborations and contributors is stored in a relational database covering collaborations conducted between July 2013 and July 2018. The dataset was retrieved from NECTEC's knowledge management system with two key pieces of information: (1) the type of deliverable, and (2) the team contributors and their contributions. The dataset of combined deliverables consists of almost 8,000 records for more than 3,000 teams. Table 24.1 details the statistics of the dataset. Overall, more than 1,000 researchers in the dataset contribute to different teams. NECTEC evaluates each member of each team according to (1) the contribution percentage, and (2) the IC-score. The contribution percentage of each member varies from 0 to 100% and represents the member's contribution to the particular team. The IC-score was developed by NECTEC to measure the impact of each team based on its status. For example, an industrial-level prototype has a higher IC-score than a lab-level one. To obtain the IC-score of a member, the total IC-score of the deliverable is divided according to the member's contribution percentage. We perform a data analysis on the NECTEC dataset to acquire information about the features describing the collaboration. As described in Eq. (24.1), LT is defined based on these features.
We first examine the statistical details related to the features and present the results in Table 24.2. IC-score represents a small value

Table 24.1 Statistics of the NECTEC collaboration dataset
Deliverable type   #Researchers   #Teams
Paper              576            1717
Prototype          524            539
IP                 489            630


Table 24.2 NECTEC dataset statistics: the mean (μ), standard deviation (σ) and variance (σ²) of IC-score and contribution percentage
Features                  μ       σ       σ²
IC-score                  3.16    4.24    1.79
Contribution percentage   23.30   22.80   5.20

Fig. 24.2 Histograms of IC-score and contribution percentage in the collaboration dataset, showing the number of members at each value of IC-score (or contribution percentage)

on average (3.16), which, given how the variable is defined, depends strongly on the type of deliverable and the contribution of the members. The contribution percentage of members, however, has a mean value of 23.30. We use histograms to plot the distributions of the IC-score and the contribution percentage (cf. Fig. 24.2). Both histograms show that a large number of members participated in teams with low IC-scores and low contribution percentages.
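The descriptive statistics and histogram counts used here can be computed with Python's standard `statistics` module. The sketch below assumes population (not sample) statistics, which the chapter does not specify, uses a hypothetical list of contribution percentages rather than the NECTEC data, and its helper names are ours.

```python
import statistics


def describe(values):
    """Return (mean, standard deviation, variance) using population formulas."""
    return (
        statistics.fmean(values),
        statistics.pstdev(values),
        statistics.pvariance(values),
    )


def histogram(values, bin_width):
    """Count values per bin [k * w, (k + 1) * w); keys are the bin start values."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    contributions = [10, 10, 20, 30, 80]  # hypothetical contribution percentages
    mu, sigma, var = describe(contributions)
    print(mu)                             # 30.0
    print(histogram(contributions, 25))   # {0: 3, 25: 1, 75: 1}
```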

24.4.2 Network Metrics
A network (or graph) G = (V, E) consists of a set of nodes V connected by links from a set E. We choose a set of metrics to analyze the network layers obtained from the proposed Data-to-Network algorithm in Sect. 24.3. The metrics are chosen so that different levels of local and global information about the network are captured. Metrics are considered global when their computation requires information about the wider structure of the network; metrics that only consider information accessible from the individual node and/or its neighbors are local. Hence, we choose network density, centrality measures,


and connected components to analyze the generated networks from both global and local perspectives.

Definition 24.1 Network density. The network density d of a network G is calculated as

$$d = \frac{2m}{n(n-1)}$$

where n is the number of nodes and m is the number of links in G. The network density is the ratio of existing links to potential links in a network and ranges from 0 to 1. A density close to 0 indicates a sparse network, while a higher density describes a dense network.

Definition 24.2 Closeness centrality. The closeness centrality [24] of node v in network G is calculated as

$$C_C(v) = \frac{1}{\sum_{u \neq v} d(v, u)}$$

where d(v, u) is the distance between nodes v and u. Closeness is thus inversely related to the average shortest-path distance from a node to all other nodes in the network: the more central a node is, the closer it is to all other nodes.

Definition 24.3 Betweenness centrality. The betweenness centrality [8] of node v in network G is calculated as

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v. Betweenness centrality indicates how often a node acts as a bridge along the shortest paths between two other nodes. Nodes with high betweenness may have significant influence in a network because they control the flow of information passing between others through them.

Definition 24.4 Degree centrality. The degree centrality of node v in network G is

$$C_D(v) = \deg(v)$$

where deg(v) is the number of direct links connected to v. Degree centrality captures the importance of nodes: a higher degree represents a node's greater immediate risk of catching whatever, such as information, is flowing through the network.
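For unweighted, undirected networks such as the team networks here, the two path-based centralities above can be computed with breadth-first search. The sketch below is a plain-Python illustration, not the tooling used in the chapter: `closeness` follows Def. 24.2 literally, summing distances only to the nodes reachable from v (our choice for disconnected layers), and `betweenness` implements Def. 24.3 with Brandes' algorithm, unnormalized, counting each unordered pair {s, t} once. Graphs are assumed to be adjacency dicts mapping each node to its neighbors.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """Def. 24.2: C_C(v) = 1 / sum of d(v, u) over the nodes u reachable from v."""
    total = sum(bfs_distances(adj, v).values())  # d(v, v) = 0 adds nothing
    return 1.0 / total if total > 0 else 0.0


def betweenness(adj):
    """Def. 24.3 via Brandes' algorithm for an unweighted, undirected graph.

    Every unordered pair {s, t} is visited from both ends, hence the final / 2.
    """
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}


if __name__ == "__main__":
    star = {"h": {"1", "2", "3", "4"}, "1": {"h"}, "2": {"h"}, "3": {"h"}, "4": {"h"}}
    print(betweenness(star)["h"])  # 6.0: the hub lies on all C(4, 2) leaf-to-leaf paths
    print(closeness(star, "h"))    # 0.25
```

Brandes' accumulation avoids enumerating shortest paths explicitly, which keeps the cost at O(nm) for unweighted graphs instead of growing with the number of paths.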


Definition 24.5 Clustering coefficient. The clustering coefficient [25] of node v in network G represents the degree to which nodes in a graph tend to cluster together and is defined as

$$CC(v) = \frac{2T(v)}{\deg(v)\,(\deg(v) - 1)}$$

where T(v) is the number of triangles through node v and deg(v) is the degree of v. The clustering coefficient indicates how strongly nodes are grouped in local clusters. Its value lies between 0 for a star network, in which a node's neighbors are not connected to each other, and 1 for a clique, in which every two distinct nodes are adjacent.

Definition 24.6 Connected components. The connected components of a network G are counted as the number of sub-networks in which at least two nodes are connected to each other through a path. Two nodes are in the same connected component if there is a path between them in the network.

We use the above set of standard metrics to analyze the networks generated with different Linkage Thresholds.
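The remaining metrics only need neighbor sets. The following plain-Python sketch covers Defs. 24.1, 24.4, 24.5 and 24.6 for an undirected simple graph stored as an adjacency dict of sets (a representation we assume for illustration). Returning a clustering coefficient of 0 for nodes with fewer than two neighbors is a common convention and our assumption; the definition leaves that case undefined.

```python
def density(adj):
    """Def. 24.1: d = 2m / (n(n - 1)) for an undirected simple graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each link is counted twice
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def degree_centrality(adj):
    """Def. 24.4: C_D(v) = deg(v)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def clustering(adj, v):
    """Def. 24.5: CC(v) = 2T(v) / (deg(v)(deg(v) - 1)); 0 when deg(v) < 2."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    # T(v): links among v's neighbours, i.e. triangles passing through v.
    triangles = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * triangles / (k * (k - 1))


def connected_components(adj, min_size=2):
    """Def. 24.6: components with at least min_size nodes (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(comp) >= min_size:
            comps.append(comp)
    return comps


# Hypothetical example: a triangle a-b-c, a pair d-e and an isolated node f.
EXAMPLE = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
           "d": {"e"}, "e": {"d"}, "f": set()}

if __name__ == "__main__":
    print(clustering(EXAMPLE, "a"))                 # 1.0
    print(len(connected_components(EXAMPLE)))       # 2
    print(degree_centrality(EXAMPLE)["f"])          # 0
```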

24.5 Network Analysis
We applied the proposed Data-to-Network method to the NECTEC dataset, transforming the collaboration dataset into a set of network layers. As described in Sect. 24.3, our methodology provides a team-based perspective in which members collaborate within teams. Hence, the networks represent teams as nodes, and two teams are linked if they satisfy the defined threshold LT. As a result, we obtained a vector of network layers, each representing a certain LT. For each network, we calculated the set of standard network metrics, namely network density, closeness centrality, betweenness centrality, degree centrality, clustering coefficient and connected components, as described in Sect. 24.4. We chose 11 thresholds from 0 to 100 in steps of 10 (cf. Table 24.3) and fed the NECTEC dataset to our algorithm, which produces a network for each LT. Figure 24.3 shows some of the obtained networks. During our experiments with different thresholds, we observed that isolated nodes influence the outcomes of the network metrics. To analyze the produced networks independently of these nodes, the network metrics are measured after removing isolated nodes. Increasing LT generates different views of the collaboration within the institute. At the extremes, LT = 0 represents the network in which teams are connected whenever they share at least one collaborator, whereas at LT = 100 teams are connected only if their common members contribute fully to both teams. Therefore, this network layer represents only strong collaboration, in which the members participate fully in projects.
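Removing isolated nodes before measuring, as described above, amounts to keeping only the nodes that appear in at least one link. The small sketch below assumes each layer is stored as a (nodes, links) pair with links as 2-tuples (our assumption) and also converts the pruned layer into an adjacency dict for metric computation; the helper names are hypothetical.

```python
# The 11 thresholds used in Sect. 24.5: 0, 10, ..., 100.
THRESHOLDS = list(range(0, 101, 10))


def remove_isolated(nodes, links):
    """Keep only nodes incident to at least one link (original order preserved)."""
    touched = {v for link in links for v in link}
    return [v for v in nodes if v in touched], list(links)


def to_adjacency(nodes, links):
    """Build an undirected adjacency dict {node: set of neighbours}."""
    adj = {v: set() for v in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj


if __name__ == "__main__":
    nodes, links = remove_isolated(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
    print(len(THRESHOLDS))             # 11
    print(nodes)                       # ['A', 'B', 'C']
    print(to_adjacency(nodes, links))  # neighbour sets of A, B and C
```

Pruning before computing metrics matters for Def. 24.1 in particular: every isolated node enlarges the denominator n(n − 1) without adding links, so leaving them in would lower the reported density.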


Fig. 24.3 Visualization of the generated networks. In each network, the size of a node represents its degree and the color indicates its component: blue marks the components with the highest number of nodes, gray the smallest components, and green and red the components whose sizes lie between these two cases. The networks are visualized using Gephi [6] with the Fruchterman–Reingold layout

We first analyze the generated networks with global metrics to gain a general perspective on the networks. We then take a detailed perspective by analyzing node-level attributes in the different generated network layers. The former results are given in Table 24.3 and the latter in Fig. 24.4. Table 24.3 lists the number of nodes and links of each network setting and the density of each network. As shown in the table, the number of teams that satisfy a given LT decreases as the threshold increases. The number of connected components first increases, so that at LT = 50 the generated network is the most fragmented layer with 112 components, and then decreases. The density of the networks provides additional information: the networks generated with higher LT (> 70) have a denser structure than those generated with intermediate LT values. Closeness centrality offers a further insight into the nodes: in the networks with lower contribution levels, the maximum number of teams acting as broadcasters in the network is relatively high. Figure 24.4 shows the results of a set of centrality metrics (betweenness, degree, and closeness) and the clustering coefficient. As shown in the figure, the betweenness centrality values of the nodes are quite low. This is expected from the dataset, because NECTEC teams are organized to work within certain domains of their specialty; hence, very few teams lie on the shortest paths between other nodes. The degree centrality of the nodes does not change dramatically as the threshold changes. The network layers that are not constructed with

24 Innovation Networks from Inter-organizational Research Collaborations


Table 24.3 The global perspective of the generated networks. The number of nodes (#Nodes), number of links (#Links), number of connected components (n_comp), and network density (d) are calculated for each LT

LT     #Nodes   #Links   n_comp   d
0      2334     65162    36       0.024
10     2330     43614    38       0.016
20     2228     19083    45       0.008
30     1960     10146    65       0.005
40     1631     6442     93       0.005
50     1282     4562     112      0.005
60     826      2950     99       0.009
70     526      1898     91       0.014
80     379      1299     77       0.018
90     298      1084     60       0.024
100    210      875      43       0.04

Fig. 24.4 Measuring a set of network metrics to analyze the nodes’ behavior from the generated network layers


S. Dilmaghani et al.

Fig. 24.5 The correlation of the measured metrics for each network generated with a given LT

the very small or very high values of LT show teams collaborating only with a consistent set of teams, although at different levels of collaboration. In both extreme cases, where collaboration is very low (LT = 0) or very high (LT = 100), the maximum number of teams with which a particular team collaborates is higher than in the other networks. The clustering coefficient has the most variable values across the networks; at LT = 50, its distribution is approximately normal and covers the full spectrum. Even though this network is the most fragmented layer in the set, it contains well-clustered components. Overall, all networks have a high clustering coefficient, which matches the nature of collaboration data. In particular, the networks with LT < 30 and LT > 80 have considerably high clustering coefficients (>0.65), together with a small number of connected components and a high network density, respectively. Hence, among all network layers, these networks present components in which the teams tend to have stronger collaborations. The correlation between each metric and LT is shown in Fig. 24.5, in which the clustering coefficient is the metric most correlated with LT. We also generate a correlation matrix, illustrated in Fig. 24.6, to investigate the correlation between the values of the metrics extracted from the different network layers. The high correlation between the metrics reveals that each network layer is informative and captures a particular aspect of the collaboration structure that we have constructed.
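The global metrics of Table 24.3 can be recomputed for any thresholded layer with a few lines of code. The sketch below is a minimal, dependency-free illustration, not the authors' implementation: it assumes (hypothetically) that each team pair carries a collaboration level in [0, 100], that a layer keeps the pairs whose level is at least LT, and that the layer's nodes are the teams incident to a kept link. Density uses the usual undirected definition d = 2m/(n(n − 1)), which agrees with the table; for LT = 0, 2 · 65162/(2334 · 2333) ≈ 0.024.

```python
from collections import defaultdict


def build_layer(levels, lt):
    """Adjacency map of one layer; keeps pairs with level >= lt (assumed rule).

    `levels` maps (team_a, team_b) -> collaboration level (hypothetical input).
    """
    adj = defaultdict(set)
    for (a, b), level in levels.items():
        if level >= lt and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def density(n, m):
    """Undirected density: fraction of the n(n-1)/2 possible links present."""
    return 2.0 * m / (n * (n - 1)) if n > 1 else 0.0


def count_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen, n_comp = set(), 0
    for start in adj:
        if start in seen:
            continue
        n_comp += 1
        seen.add(start)
        stack = [start]
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return n_comp


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        closed = sum(1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]])
        total += 2.0 * closed / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def global_metrics(adj):
    """Row of Table 24.3 style metrics for one layer."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return {"nodes": n, "links": m, "n_comp": count_components(adj),
            "density": density(n, m), "clustering": average_clustering(adj)}
```

Looping `lt` over 0, 10, ..., 100 and collecting `global_metrics(build_layer(levels, lt))` would give one row per layer in the format of Table 24.3.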

24.6 Discussion

Our methodology for constructing network layers from collaboration data reveals several optimization criteria. Optimizing the number of network layers while retaining the maximum amount of distinct information for enhanced analytics is a challenging task. Moreover, the LT defined in this paper can be generalized to a utility function applicable to any given collaboration dataset. In addition, choosing an optimal LT based on predefined criteria and conditions could both improve the performance and widen the applicability of our algorithm.


Fig. 24.6 The correlation matrix of the metrics from the networks generated with different LTs

24.7 Conclusion

The approach outlined in this paper infers possible collaboration networks of researchers within the teams of an organization. Our method uses an LT to automatically generate these network layers from the relational input data. We conducted a network analysis of the produced networks using metrics such as the clustering coefficient, closeness, and betweenness centrality, and illustrated their impact on the different network layers. We then used the results of the metrics as an important input to visualize the generated graphs in each configuration. We conclude that the LT has a crucial impact on the network properties and must be chosen with caution. We also show how each network layer can reveal a particular perspective on the same dataset. Moreover, the influence of the LT on the metric values indicates that the network representation can be optimized. Finally, in future work we will consider different network representations for the same data. We also plan to use more real-world collaboration data from distinct sources to further generalize our approach.

Acknowledgements This work is partially funded by the joint research programme UL/SnT–ILNAS on Digital Trust for Smart-ICT.


References

1. M. Akbas, M. Brust, D. Turgut, Social network generation and role determination based on smartphone data, in IEEE International Conference on Computer Communications (INFOCOM) Student Workshop (2012)
2. M.I. Akbas, M.R. Brust, C.H.C. Ribeiro, D. Turgut, Deployment and mobility for animal social life monitoring based on preferential attachment, in IEEE Conference on Local Computer Networks (2011), pp. 484–491
3. M.I. Akbas, M.R. Brust, C.H.C. Ribeiro, D. Turgut, fAPEbook - animal social life monitoring with wireless sensor and actor networks, in IEEE Global Telecommunications Conference GLOBECOM (2011), pp. 1–5
4. M.I. Akbas, M.R. Brust, D. Turgut, C.H. Ribeiro, A preferential attachment model for primate social networks. Comput. Netw. 76, 207–226 (2015)
5. D.S. Bassett, P. Zurn, J.I. Gold, On the nature and use of models in network neuroscience. Nat. Rev. Neurosci. 1 (2018)
6. M. Bastian, S. Heymann, M. Jacomy, Gephi: an open source software for exploring and manipulating networks, in Third International AAAI Conference on Weblogs and Social Media (2009)
7. V.D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 2008(10), P10008 (2008)
8. U. Brandes, A faster algorithm for betweenness centrality. J. Math. Sociol. 25(2), 163–177 (2001)
9. M.R. Brust, H. Frey, S. Rothkugel, Adaptive multi-hop clustering in mobile networks, in International Conference on Mobile Technology, Applications, and Systems (ACM, 2007), pp. 132–138
10. C.T. Butts, Revisiting the foundations of network analysis. Science 325(5939), 414–416 (2009)
11. G. Casiraghi, V. Nanumyan, I. Scholtes, F. Schweitzer, From relational data to graphs: inferring significant links using generalized hypergeometric ensembles, in International Conference on Social Informatics (Springer, 2017), pp. 111–120
12. J.G. Davis, J.K. Panford, J.B. Hayfron-Acquah, Big and connected data analysis with graph and relational databases using collaborative filtering technique. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 15(12) (2017)
13. S. Derrible, C. Kennedy, Applications of graph theory and network science to transit network design. Transport Rev. 31(4), 495–519 (2011)
14. S. Dilmaghani, M.R. Brust, A. Piyatumrong, G. Danoy, P. Bouvry, Link definition ameliorating community detection in collaboration networks. Front. Big Data 2, 22 (2019)
15. S.E. Dilmaghani, A. Piyatumrong, P. Bouvry, M.R. Brust, Transforming collaboration data into network layers for enhanced analytics (2019), arXiv:1902.09364
16. S.P. Fraiberger, R. Sinatra, M. Resch, C. Riedl, A.-L. Barabási, Quantifying reputation and success in art. Science (2018)
17. M.T. Gastner, M.E. Newman, The spatial structure of networks. Eur. Phys. J. B 49(2), 247–252 (2006)
18. S. Hong, B.C. Coutinho, A. Dey, A.-L. Barabási, M. Vogelsberger, L. Hernquist, K. Gebhardt, Discriminating topology in galaxy distributions using network analysis. Mon. Not. R. Astron. Soc. 459(3), 2690–2700 (2016)
19. A. Karduni, A. Kermanshah, S. Derrible, A protocol to convert spatial polyline data to network formats and applications to world urban road networks. Sci. Data 3, 160046 (2016)
20. M.E. Newman, Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 64(1), 016131 (2001)
21. X. Ouvrard, J.-M.L. Goff, S. Marchand-Maillet, Networks of collaborations: hypergraph modeling and visualisation (2017), arXiv:1707.00115
22. L. Pan, T. Zhou, L. Lü, C.-K. Hu, Predicting missing links and identifying spurious links via likelihood analysis. Sci. Rep. 6, 22955 (2016)
23. F. Peiravian, A. Kermanshah, S. Derrible, Spatial data analysis of complex urban systems, in IEEE International Conference on Big Data (IEEE, 2014), pp. 54–59


24. G. Sabidussi, The centrality index of a graph. Psychometrika 31(4), 581–603 (1966)
25. J. Saramäki, M. Kivelä, J.-P. Onnela, K. Kaski, J. Kertesz, Generalizations of the clustering coefficient to weighted complex networks. Phys. Rev. E 75(2), 027105 (2007)
26. A. Schein, J. Paisley, D.M. Blei, H. Wallach, Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts, in ACM International Conference on Knowledge Discovery and Data Mining (2015)
27. Z. Sha, Y. Huang, J.S. Fu, M. Wang, Y. Fu, N. Contractor, W. Chen, A network-based approach to modeling and predicting product coconsideration relations. Complexity 2018 (2018)
28. S. Shirinivas, S. Vetrivel, N. Elango, Applications of graph theory in computer science an overview. Int. J. Eng. Sci. Technol. 2(9), 4610–4621 (2010)
29. R. Xiang, J. Neville, M. Rogati, Modeling relationship strength in online social networks, in Proceedings of the 19th International Conference on World Wide Web (ACM, 2010), pp. 981–990
30. J. Yang, X.-D. Zhang, Predicting missing links in complex networks based on common neighbors and distance. Sci. Rep. 6, 38208 (2016)
31. C. Zhong, S.M. Arisona, X. Huang, M. Batty, G. Schmitt, Detecting the dynamics of urban structure through spatial network analysis. Int. J. Geogr. Inf. Sci. 28(11), 2178–2199 (2014)

Chapter 25

Assessing Film Coefficients of Microchannel Heat Sinks via Cuckoo Search Algorithm

Jorge M. Cruz-Duarte, Arturo García-Pérez, Iván M. Amaya-Contreras, and Rodrigo Correa

Abstract The film transfer coefficient is one of the most challenging variables to measure in experimental heat transfer, because it depends on many other variables. Examples include the type of medium (gas or liquid), body geometry, fluid flow, thermal conductivity, and many other thermodynamic properties. This chapter proposes a strategy for estimating the film transfer coefficient by solving an inverse heat transfer problem with the Cuckoo Search global optimization algorithm. The microchannel heat sink designs were obtained through the entropy generation minimization criterion, also powered by Cuckoo Search, under several specifications (material, working fluid, and heat power). The results show accurate estimations for signal-to-noise ratios above 30 dB, a level that virtually any modern temperature sensor can reach.

J. M. Cruz-Duarte · I. M. Amaya-Contreras
Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Monterrey, Nuevo León, Mexico

A. García-Pérez
Universidad de Guanajuato, División de Ingenierías del Campus Irapuato-Salamanca, Salamanca, Guanajuato, Mexico

R. Correa (✉)
Universidad Industrial de Santander, Escuela de Ingenierías Eléctrica, Electrónica y de Telecomunicaciones, Bucaramanga, Santander, Colombia

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_25

25.1 Introduction

Thermal management of modern electronic devices is a well-known problem, which arises throughout their design and operation [17]. A large number of reported solutions can be found, including theoretical, numerical and


J. M. Cruz-Duarte et al.

experimental studies [1, 7, 16, 31]. Specifically, microchannel heat sinks (MCHSs) have been employed to address this problem, and they are a recurrent approach in high thermal power dissipation applications [15, 22, 39]. This idea began with an early implementation proposed by Tuckerman and Pease [35]. MCHSs aim to maximize the dissipation of electronic power losses while minimizing additional energy consumption. For that reason, multiple MCHS design strategies have appeared in the literature [18, 32]. Among the most recent and powerful is the one pioneered by Bejan [2], which focuses on the thermodynamics-based Entropy Generation Minimization (EGM) criterion. Any practical electronic thermal management system, such as an MCHS, may have its performance altered when operating in a noisy environment. Moreover, practical engineering applications require precise estimation of some parameters from temperature measurements, for example, thermophysical properties of materials and coolants, boundary and initial conditions, and energy source distributions. This kind of problem is known as an Inverse Heat Transfer Problem (IHTP) [29], and its solution can help explain the differences between theoretical and real system performance. For that reason, after implementing an MCHS, as with any practical engineering system, an estimation procedure should be added to eventually tune the system, which can help prevent marginal or unexpected behaviors. Inverse Heat Transfer Problems have been broadly used in diverse areas of practical engineering, with very creative methodologies. For example, Wang et al. estimated the grinding heat flux from a 2D temporal and spatial temperature distribution during a surgical bone grinding process, using the sequential function specification method and sequential programming [36]. Li et al.
determined multiple heat sources for a mold heating system of an injection machine by solving a 3D inverse heat conduction problem [20]. Cornejo et al. estimated the thermophysical properties of food in the freezing temperature range with an inverse method based on simulation results [8]. Wang et al. determined heat transfer coefficients in a continuous casting process of steel billets under large disturbances, using an approach based on weighted least squares and Levenberg-Marquardt (LM) [37]. Mohebbi et al. accurately determined thermal conductivity as a linear function of temperature in a 2D steady-state heat conduction process [26]. Luo and Yang calculated the total heat exchange factor by solving the IHTP via a combined gradient-based method, and used it as reference trajectories for a control system in a reheating furnace [24]. On the other hand, Chanda et al. obtained a practical thermal conductivity tensor of anisotropic composite materials via a characterization methodology powered by an artificial neural network and a genetic algorithm [5]. Dhiman and Prasad found empirical correlations for the local and average Nusselt numbers of a heated hollow cylinder in a cross flow of air using the LM method [10]. Furthermore, other works have aimed to improve traditional IHTP solution methodologies by including Tikhonov regularization. An alternative inversion model was studied by Mu et al., who solved an optimization problem by splitting it out

25 Assessing Film Coefficients of MCHSs via CS …


into several simpler sub-problems and then tackling each one with the Broyden-Fletcher-Goldfarb-Shanno algorithm [28]. Additionally, Duda proposed an inverse method that avoids the stability drawbacks of classical methods when solving IHTPs with real-time measurements [11]. Nevertheless, very few manuscripts have reported IHTP solutions in heat sink applications. For example, Bodla et al. found the optimal ranges for uncertainties related to geometry and operating conditions in fin heat sinks using a stochastic methodology [3, 4]. Huang et al. estimated the optimal perforation diameter of a perforated pin-fin heat sink for a desired reference temperature [13]. Sudheesh and Siva characterized some parameters associated with heat transfer in trailing heat sinks for reducing stresses and distortions in welded structures [34]. Chen et al. determined average convective heat transfer coefficients of an operating plate-fin heat sink under different fluid flow conditions, solving the IHTP with the help of FLUENT simulation data [6]. More specific works have dealt with microchannel heat sinks, such as that of Huang et al., who accurately estimated the dynamic heat transfer coefficient in a multi-microchannel evaporator under disturbances [14]. The authors approached the three-dimensional IHTP using a combined strategy of the tridiagonal matrix algorithm, Newton-Raphson, and local energy balance. Moreover, Maciejewska and Piasecka determined the thermal coefficients involved in a deep, vertically oriented minichannel filled with an electronic cooling liquid, using the Trefftz method [25]. As mentioned above, most works focus on gradient-based techniques. Thus, we strive to fill the knowledge gap regarding the performance of modern optimization techniques for these kinds of problems. Hence, we use an alternative methodology for solving inverse heat transfer problems.
As stated, we consider a modern optimization algorithm instead of gradient-based techniques. To illustrate this methodology, average convective heat transfer coefficients were estimated for different scenarios. We analyze several noise levels by creating different synthetic temperature measurements. To solve the IHTP, we use the least-squared error (LSE) criterion as the objective function and the Cuckoo Search (CS) algorithm as the optimizer. Noiseless temperature data were obtained by solving the forward problem, which represents an MCHS designed under certain specifications with the EGM criterion, also powered by CS. The results show accurate estimations for signal-to-noise ratios (SNR) above 30 dB. This chapter starts by describing the heat transfer problems, i.e., the thermal management scenario and its corresponding design, as well as the forward and inverse problems. The employed optimization algorithm, Cuckoo Search, is introduced in the next section. Subsequently, the procedure is detailed in the Methodology section, and the results are presented and discussed afterwards. The chapter concludes by summarizing the main highlights and remarks.


25.2 Heat Transfer Problem

The steady-state heat transfer process of any thermal mechanism can be effectively approached through its equivalent thermal resistance (R [K/W]). This statement also covers a microchannel heat sink structure. Thus:

$$\dot{Q} = \frac{\theta}{R}, \tag{25.1}$$

where $\dot{Q}$ [W] is the net heat transfer rate entering the system, and θ [°C] is the finite difference of temperatures between the system isothermal boundary ($T_i$ [°C]) and its surroundings ($T_a$ [°C]) [18, 21, 27]. The temperature difference θ is a measurable quantity that allows engineers to make decisions or control their systems. Specifically, MCHS performance is directly related to the behavior of θ in microelectronic thermal management applications, where electronic components must operate below a threshold temperature to avoid failure. Thus, the finite difference of temperatures between the chip-heat sink interface ($T_i$ [°C]) and the ambient ($T_a$ [°C]) is written as

$$\theta = T_i - T_a = \dot{Q} R. \tag{25.2}$$

Moreover, according to the literature [23, 33], R comprises several heat transfer mechanisms within the MCHS. In this study, two simple components are employed:

$$R = \frac{1}{2 h N L_{hs} (w_c + \eta_p H_c)} + \frac{1}{\rho_{wf} G_{wf} c_{wf}}, \tag{25.3}$$

where the right-hand terms of (25.3) correspond to the resistances due to convection inside the channels and to the heat capacity of the working fluid, respectively. Hence, R depends on three kinds of parameters: design specifications, thermophysical properties, and correlations. The first set mainly consists of geometrical parameters, such as $W_{hs}$ [m] and $L_{hs}$ [m], which represent the width and length of the heat sink, respectively. $H_c$ [m], $W_c = 2w_c$ [m], $W_p = 2w_p$ [m], and $N = (W_{hs} - W_c)/(W_c + W_p)$ are the channel height, channel width, wall width, and number of channels. This group also includes the non-geometrical design parameter $G_{wf}$ [m³/s], which is the volume flow rate of the working fluid. The second group contains the thermophysical properties of the heat sink body material (hs) and the working fluid (wf): mass density (ρ [kg/m³]), thermal conductivity (k [W/m·K]), and specific heat capacity (c [J/kg·K]). The last group corresponds to empirical correlations and related expressions, such as the convective heat transfer coefficient (h [W/m²·K]) and the wall efficiency ($\eta_p$). These parameters have commonly accepted forms:

$$h = \frac{k_{wf}}{D_h} Nu, \quad \text{with} \quad D_h = \frac{2 W_c H_c}{W_c + H_c}, \quad \text{and} \tag{25.4}$$

$$\eta_p = \frac{\tanh(m H_c)}{m H_c}, \quad \text{with} \quad m = \sqrt{\frac{2h(w_p + L_{hs})}{k\, w_p L_{hs}}}, \tag{25.5}$$

where Nu is the dimensionless Nusselt number, and $D_h$ [m] is the effective hydraulic diameter of one channel. Once the design specifications, thermophysical properties, and correlations are defined, the temperature difference in a heat sink application, θ, can be approximated via (25.2). Let x be a vector containing all the parameters required to obtain θ, so that $\theta(x): \mathbb{R}^D \to \mathbb{R}$, since $x \in \mathbb{R}^D$. There are infinitely many possible values for x, but their selection depends on the practical application and the engineer's expertise. Say x is formed by two parameter vectors, x = (y, z), where y is the set of known parameters from a practical setup specification, e.g., size constraints, fluid flow pumping power, and net heat flux. Likewise, z is the vector of parameters subject to expert knowledge, whose values can enhance or jeopardize the system performance. As an illustrative example, y and z could be $y = (W_{hs}, L_i, L_{hs}, H_c, \rho_{hs}, \rho_{wf}, k_{hs}, k_{wf}, c_{wf}, \ldots)$ and $z = (w_c, w_p, G_{wf})$. Nevertheless, it is possible to reduce this uncertainty by objectively finding the best values of z. Such a configuration can be reached through several conceptual schemes, for example, by minimizing the equivalent thermal resistance. A recurrent and powerful methodology for finding these parameters is based on the second law of thermodynamics: the Entropy Generation Minimization (EGM) criterion. EGM has been employed by several authors since 2009 for designing microchannel heat sinks [12, 18, 30]. The idea is to minimize the total entropy generation rate ($\dot{S}_{gen}$ [W/K]), given by

$$\dot{S}_{gen}(z) = \left. \left( \frac{\dot{Q}^2 R(x)}{T_a T_i(x)} + \frac{G_{wf} \Delta P(x)}{T_a} \right) \right|_y, \tag{25.6}$$

where $T_a$ and $T_i$ are the temperatures of the ambient and the chip-heat sink interface, respectively. Both are related to θ as $\theta = T_i - T_a$, and thus to $\dot{Q}$ and R by (25.1). Moreover, ΔP [Pa] is the pressure drop experienced by the working fluid flowing through the system with a volume flow rate $G_{wf}$ [m³/s], which is modeled as shown in (25.7), where $f_c$ is the Darcy friction factor for the inner walls of the channels:

$$\Delta P = \frac{\rho_{wf}}{2} \left( \frac{G_{wf}}{N W_c H_c} \right)^2 \left[ \frac{f_c L_{hs}}{D_h} + 1.79 - \frac{2.32 W_c}{W_c + W_p} + \frac{0.53 W_c^2}{(W_c + W_p)^2} \right]. \tag{25.7}$$
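To make the direct model concrete, the sketch below evaluates (25.2) to (25.7) for a single design. It is a minimal illustration under stated assumptions, not the authors' code: h and $f_c$ are passed in as plain numbers (their correlations appear later), the conductivity k in (25.5) is taken to be that of the heat-sink material (our assumption), and all function names are hypothetical. Lower-case `w_c` and `w_p` are the half widths, so that $W_c = 2w_c$ and $W_p = 2w_p$.

```python
import math


def hydraulic_diameter(W_c, H_c):
    """Eq. (25.4): D_h = 2 W_c H_c / (W_c + H_c), with W_c the full channel width."""
    return 2.0 * W_c * H_c / (W_c + H_c)


def wall_efficiency(h, w_p, L_hs, H_c, k_wall):
    """Eq. (25.5); k_wall is ASSUMED to be the heat-sink material conductivity."""
    m = math.sqrt(2.0 * h * (w_p + L_hs) / (k_wall * w_p * L_hs))
    return math.tanh(m * H_c) / (m * H_c)


def thermal_resistance(h, N, L_hs, w_c, H_c, eta_p, rho_wf, G_wf, c_wf):
    """Eq. (25.3): channel convection plus fluid heat-capacity resistance [K/W]."""
    r_conv = 1.0 / (2.0 * h * N * L_hs * (w_c + eta_p * H_c))
    r_fluid = 1.0 / (rho_wf * G_wf * c_wf)
    return r_conv + r_fluid


def pressure_drop(rho_wf, G_wf, N, W_c, W_p, H_c, f_c, L_hs):
    """Eq. (25.7): friction plus entrance/exit losses [Pa]."""
    velocity = G_wf / (N * W_c * H_c)   # mean velocity in one channel
    sigma = W_c / (W_c + W_p)           # channel width over channel pitch
    losses = (f_c * L_hs / hydraulic_diameter(W_c, H_c)
              + 1.79 - 2.32 * sigma + 0.53 * sigma ** 2)
    return 0.5 * rho_wf * velocity ** 2 * losses


def entropy_generation(Q, R, T_a, G_wf, dP):
    """Eq. (25.6) with T_i = T_a + Q R from Eq. (25.2); temperatures in kelvin."""
    T_i = T_a + Q * R
    return Q ** 2 * R / (T_a * T_i) + G_wf * dP / T_a
```

In a design loop, one would compute $D_h$, then h and $f_c$ from the correlations, then $\eta_p$, R, and ΔP, and finally $\dot{S}_{gen}$ for each candidate z.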


For brevity, more information about all the described concepts and formulae can be found in [9]. However, some parameters from the third group, i.e., empirical correlations, are just approximate forms obtained from general cases and may be inaccurate in specific applications [19]. Hence, a more accurate form or value for these parameters is needed. Many studies have dealt with the estimation of unknown quantities, which usually comes from the solution of an inverse problem. Authors have used measurements from an experimental setup together with the traditional Least Squared Error (LSE) criterion, which consists of minimizing ε(z) [40], such as

$$\varepsilon(z) = \left\| \theta_m - \left. \theta(x) \right|_y \right\|_2^2, \tag{25.8}$$

where $\theta_m$ [°C] is the vector of temperature difference measurements at the chip-heat sink interface with respect to the ambient, and θ(x) [°C] is the vector of values calculated for the same temperature difference using a known functional form (or model), a set of known parameters y, and a set of candidate values for the unknown parameters z, with x = (y, z). Here, $\|\cdot\|_2^2$ is the squared Euclidean two-norm. Finally, the electronic thermal management scenario described above, based on a microchannel heat sink, can be summarized in three well-known heat transfer problems, which are described next.

25.2.1 Design Heat Transfer Problem

In a given practical scenario for a microelectronic thermal management problem (i.e., the conceptual design of a microchannel heat sink), with a defined set of specifications y and bounds $z_l$ and $z_u$, it is possible to determine the set of parameters z that optimizes the steady-state system performance in the sense of entropy generation. From (25.6),

$$z^* = \arg\min_{z} \left. \dot{S}_{gen}(x) \right|_y, \quad \text{s.t.} \quad z_l \preceq z \preceq z_u. \tag{25.9}$$

25.2.2 Objective of the Direct Heat Transfer Problem

The direct heat transfer problem is laid out with the objective of calculating the temperature at a chip-heat sink interface for a specific microelectronic heat sinking process. The heat power dissipation is assumed to be uniform and homogeneous. The other known parameters and constraints are given by x = (y, z). They are evaluated in the mathematical model from (25.2), as

$$\theta(x) = \dot{Q} R(x), \tag{25.10}$$

where $x = (W_{hs}, L_{hs}, H_c, w_c, w_p, \ldots)$ is the vector of all the parameters detailed in (25.3). Some parameters of x can be determined via a design process, as described in the design heat transfer problem.

25.2.3 Objective of the Inverse Heat Transfer Problem

In contrast, the inverse heat transfer problem is laid out with the objective of estimating some parameter values of x that are unrelated to the design specifications y, namely z, with x = (y, z). The selected scenario corresponds to a practical microchannel heat sink where at least one temperature is measurable. This can be achieved by solving

$$z^* = \arg\min_{z} \left. \varepsilon(x) \right|_y, \quad \text{s.t.} \quad z_l \preceq z \preceq z_u, \tag{25.11}$$

where y and z are the vectors of known and unknown parameters, respectively, and x = (y, z) is the vector of all parameters from the direct problem formulation, that is, $x = (W_{hs}, L_{hs}, H_c, w_c, w_p, \ldots)$. The set of values of z for the experimental setup can be obtained from a conceptual design or from previous knowledge of the direct problem solution.
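The inverse problem (25.11) pairs the LSE objective (25.8) with a bounded search over z. The sketch below is a deliberately simplified, hypothetical instance rather than the chapter's setup: z contains only h, the direct model is reduced to θ = Q̇(a/h + b) with a and b held fixed (which freezes the wall efficiency $\eta_p$), and a golden-section search stands in for Cuckoo Search. The substitution is safe here because ε is a quadratic in 1/h, hence unimodal in h on any positive interval.

```python
import math


def lse(theta_m, theta_model):
    """Eq. (25.8): squared Euclidean norm of the measurement residual."""
    return sum((tm - tc) ** 2 for tm, tc in zip(theta_m, theta_model))


def direct_model(h, q_values, a, b):
    """Simplified direct problem: theta = Q R(h), R(h) = a / h + b (a, b ASSUMED fixed)."""
    return [q * (a / h + b) for q in q_values]


def golden_section_min(func, lo, hi, tol=1e-6):
    """Minimize a unimodal scalar function on [lo, hi] (stand-in for Cuckoo Search)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc < fd:          # the minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = func(c)
        else:                # the minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)


def estimate_h(theta_m, q_values, a, b, h_lo, h_hi):
    """Eq. (25.11) with z = (h,): bounded argmin of the LSE objective."""
    def objective(h):
        return lse(theta_m, direct_model(h, q_values, a, b))
    return golden_section_min(objective, h_lo, h_hi)
```

With noiseless data generated from a known h, the estimate recovers that h; with noisy $\theta_m$ it returns the least-squares compromise inside the bounds $[h_{lo}, h_{hi}]$, which play the role of $z_l$ and $z_u$.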

25.3 Cuckoo Search Algorithm

Cuckoo Search (CS) is a widely implemented modern global optimization algorithm formulated by Yang and Deb in 2009 as a bio-inspired technique that mimics the brood parasitism of certain cuckoo species [38]. CS can be briefly described as a mutation-based swarm-intelligence algorithm with Lévy flights. It is formally described by the following mathematical definitions, and its overall logic is laid out in Algorithm 1.

Definition 25.1 Let $X^t = \{x_1^t, x_2^t, \ldots, x_N^t\}$ be a finite set of candidate solutions for an optimization problem in $\mathbb{R}^D$ with known cost function $f: \mathbb{R}^D \to \mathbb{R}$. N is the number of candidate solutions, D is the number of unknown variables, and $x_k^t \in \mathbb{R}^D$ denotes the kth candidate solution at step t of an iterative procedure, here the Cuckoo Search algorithm.

Definition 25.2 Let $X^{t+1}$ be the finite set of new candidate solutions that improves the previous set $X^t$ through two strategies, i.e., Lévy flights and egg discovery. Each candidate of $X^{t+1}$ satisfies $f(x_k^{t+1}) \le f(x_k^t)$.


Definition 25.3 Let $x_*^t \in X^t$ be the best solution found at the tth iteration, determined by $x_*^t = \arg\min\{f(X^t)\}$.

Definition 25.4 Strategy 1: Let $x_k^{t+1} \in \mathbb{R}^D$ be a new candidate position obtained by

$$x_k^{t+1} = x_k^t + \delta_x \lambda_L \circ (x_k^t - x_*^t), \tag{25.12}$$

where $\delta_x$ is the step size, commonly set to 0.1 for D ≤ 3 and 0.01 for D > 3, $\lambda_L$ is a vector of i.i.d. symmetric Lévy stable random numbers, and ∘ is the Hadamard–Schur product.

Definition 25.5 Strategy 2: Let $x_k^{t+1} \in \mathbb{R}^D$ be a new candidate position with an associated probability $p_D$, related to the chance that a host bird discovers a hidden egg, determined as

$$x_k^{t+1} = x_k^t + u \circ H(u - p_D) \circ (x_i^t - x_j^t), \tag{25.13}$$

where $u \in \mathbb{R}^D$ is a vector of i.i.d. uniform random numbers between 0 and 1, $H: \mathbb{R}^D \to \{0, 1\}^D$ is the multidimensional form of the Heaviside function, and $x_i^t, x_j^t \in X^t$ with i ≠ j, where i and j are randomly selected.

Algorithm 1 Cuckoo Search (CS)
Require: $f: \mathbb{R}^D \to \mathbb{R}$, N, $\delta_x$, $p_D$, and stopping criteria: maximum number of iterations M, and others (if any)
Ensure: $x_*^t$ from Definition 25.3.
1: Make t ← 0 and initialize $X^0$ {Definition 25.1}
2: Find $x_*^0$ using Definition 25.3
3: while (t ≤ M) & (no other stopping criterion is reached) do
4:   Update $X^{t+1}$ via Definition 25.2 with Definition 25.4 {Strategy 1}
5:   Update $X^{t+1}$ via Definition 25.2 with Definition 25.5 {Strategy 2}
6:   Find $x_*^{t+1}$ using Definition 25.3, and make t ← t + 1
7: end while
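Algorithm 1 maps onto a short loop. The sketch below is a minimal, dependency-free reading of Definitions 25.1 to 25.5, not the authors' implementation: it draws the Lévy-stable numbers with Mantegna's algorithm (an assumed choice, with stability index 1.5), clips candidates to the box bounds, and uses greedy replacement to enforce $f(x_k^{t+1}) \le f(x_k^t)$ from Definition 25.2. The defaults for N, $p_D$, and M are illustrative, not the tuned values of [9].

```python
import math
import random


def levy(beta, rng):
    """Mantegna's approximation of a symmetric Levy-stable random number."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = max(abs(rng.gauss(0.0, 1.0)), 1e-12)  # guard against division by zero
    return u / v ** (1 / beta)


def cuckoo_search(f, lower, upper, n=25, p_d=0.25, max_iter=200,
                  step=None, beta=1.5, seed=None):
    """Minimize f over the box [lower, upper] following Algorithm 1."""
    rng = random.Random(seed)
    dim = len(lower)
    if step is None:  # delta_x as suggested in Definition 25.4
        step = 0.1 if dim <= 3 else 0.01

    def clip(x):
        return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

    def greedy(k, cand):
        # Definition 25.2: keep the new position only if it does not worsen f
        cand = clip(cand)
        fc = f(cand)
        if fc <= fit[k]:
            pop[k], fit[k] = cand, fc

    # Definition 25.1: random initial population inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n)]
    fit = [f(x) for x in pop]

    for _ in range(max_iter):
        best = pop[min(range(n), key=fit.__getitem__)]  # Definition 25.3
        # Strategy 1, Eq. (25.12): Levy flight scaled by the distance to the best
        for k in range(n):
            greedy(k, [xk + step * levy(beta, rng) * (xk - xb)
                       for xk, xb in zip(pop[k], best)])
        # Strategy 2, Eq. (25.13): egg discovery along x_i - x_j with i != j
        for k in range(n):
            i, j = rng.sample(range(n), 2)
            u = [rng.random() for _ in range(dim)]
            greedy(k, [xk + uk * (1.0 if uk > p_d else 0.0) * (xi - xj)
                       for xk, uk, xi, xj in zip(pop[k], u, pop[i], pop[j])])

    k_best = min(range(n), key=fit.__getitem__)
    return pop[k_best], fit[k_best]
```

For instance, wrapping an LSE objective such as (25.8) as `f` and passing the bounds $z_l$ and $z_u$ as `lower` and `upper` would give a CS-based estimate for (25.11).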

25.4 Methodology

All experiments were performed on a numerical computing platform running on an iMac model 15.1 with an Intel Core i5 CPU at 1.6–2.7 GHz, 8 GB RAM, and macOS Sierra v10.12.1. Each case study was repeated 100 times for statistical purposes. As previously stated, we used the Cuckoo Search (CS) algorithm as the optimizer, with its parameters tuned according to [9]. This work was carried out in two stages, which are graphically presented in Fig. 25.1, whereas the system specifications are laid out in Table 25.1. Moreover,


Fig. 25.1 Direct and inverse heat transfer problems

Table 25.1 Values assumed for design parameters of the system

Design specifications, y          Design variables, z
Parameter   Value                 Parameter   Value
H_c         1.7 mm                G_wf        (0, 0.01] m³/s
L_hs        51 mm                 α           (0, 1]
W_hs        51 mm                 β           [1, 100]
T_a         300 K
Q̇           25, 30, ..., 150 W

two materials for the heat sink body and two fluids for the coolant were considered, as Table 25.2 shows. For the body, we used Silicon (Si) and High Thermal Conductive Graphite (HTCG); for the coolant, we employed air and ammonia gas (NH3). Figure 25.1 shows the solution of the forward problem in (25.2) under several design conditions and specifications (y, $z_l$, and $z_u$), and with multiple target net heat power ($\dot{Q}$ [W]) values. The design parameters (z) were obtained by minimizing the entropy generation rate of the entire system, i.e., minimizing (25.6) with CS in Algorithm 1. Parameters h and $f_c$ from (25.4) and (25.7), respectively, were calculated using (25.14) and (25.15), where Re is the Reynolds number [9].

$$h = \frac{k_{wf}}{D_h} \left[ 2.253 + 8.164 \left( 1 + \frac{W_c}{H_c} \right)^{-1.5} \right], \tag{25.14}$$

Dh 1.14 1 9.82Dh 2 7.6953Re fc = + 24.34 − , Re L hs Wc + Hc

(25.15)
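Equation (25.14) is (25.4) with a closed-form Nusselt number that depends only on the channel aspect ratio. The sketch below evaluates it; the Reynolds-number helper uses the standard definition Re = V D_h/ν with V the mean channel velocity $G_{wf}/(N W_c H_c)$, which is our assumption because the chapter defers the details to [9]. Correlation (25.15) for $f_c$ is not reproduced here.

```python
def hydraulic_diameter(W_c, H_c):
    """Eq. (25.4): D_h = 2 W_c H_c / (W_c + H_c)."""
    return 2.0 * W_c * H_c / (W_c + H_c)


def nusselt(W_c, H_c):
    """Nusselt number implied by Eq. (25.14): 2.253 + 8.164 (1 + W_c/H_c)^-1.5."""
    return 2.253 + 8.164 * (1.0 + W_c / H_c) ** -1.5


def film_coefficient(W_c, H_c, k_wf):
    """Eq. (25.14): h = (k_wf / D_h) * Nu, in W/(m^2 K)."""
    return k_wf / hydraulic_diameter(W_c, H_c) * nusselt(W_c, H_c)


def reynolds(G_wf, N, W_c, H_c, nu_wf):
    """ASSUMED standard definition: Re = V D_h / nu with V = G_wf / (N W_c H_c)."""
    velocity = G_wf / (N * W_c * H_c)
    return velocity * hydraulic_diameter(W_c, H_c) / nu_wf
```

With air ($k_{wf}$ = 0.0261 W/m·K from Table 25.2) and the 1.7 mm channel height of Table 25.1, a hypothetical 0.2 mm wide channel gives h of roughly 670 W/m²·K.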


Table 25.2 Thermophysical property values for heat sink materials and working fluids: High Thermal Conductive Graphite (HTCG), Silicon (Si), Air, and Ammonia Gas (NH3)

Property       HTCG   Si     Air           NH3
ρ (kg/m³)      1000   2330   1.1614        0.7
k (W/m·K)      1900   148    0.0261        0.0270
c (J/kg·K)     742    703    1007          2158
ν (m²/s)       –      –      1.58 × 10⁻⁵   1.47 × 10⁻⁵

25.5 Results and Discussion

Figure 25.2 shows the optimal values of the design parameters $z^* = (G_{wf}^*, \alpha^*, \beta^*)$, that is, $\dot{S}_{gen,min} = \dot{S}_{gen}(\alpha^*, \beta^*, G_{wf}^*)$. In other words, these parameters minimize the entropy generation of the microchannel heat sink. Each plotted marker corresponds to the optimal design for one combination of build material, Silicon (Si) or High Thermal Conductive Graphite (HTCG), and working fluid, Air or Ammonia Gas (NH3). Likewise, Fig. 25.3 presents the minimal entropy generation rate ($\dot{S}_{gen,min}$) for each of the obtained designs, plotted for different electronic power dissipation levels ($\dot{Q}$). As expected, the entropy generation rate increases as the net heat power increases. There is also a noticeable difference when choosing a different material (i.e., Si or HTCG) to build the MCHS. This effect is also evident when choosing a different coolant (i.e., Air or NH3). The HTCG-NH3 combination yielded the best results in terms of minimal entropy generation, outperforming the traditional Si-Air combination, which corroborates results reported in the literature. Moreover, Fig. 25.3 evidences the influence of the choice of material or fluid. For high power dissipation applications, i.e., beyond 60 W, this

Fig. 25.2 Optimal values reached for the design variables $G^*_{wf}$ [m³/s], $\alpha^*$ and $\beta^*$

25 Assessing Film Coefficients of MCHSs via CS …


Fig. 25.3 Minimal entropy generation rate ($\dot{S}_{gen,min}$ [W/K])

Fig. 25.4 Optimal values of equivalent thermal resistance ($R^*$ [K/W])

Fig. 25.5 Optimal values of temperature finite difference ($\theta$ [°C])

effect becomes more evident. This influence is strongly reflected in the behavior of the optimal equivalent thermal resistance ($R^*$). Figure 25.4 shows these data for each fluid: $R^*$ decreases as $\dot{Q}$ increases, which avoids an excessive temperature rise inside the electronic package and reduces the overall effect of irreversibilities in the MCHS. Furthermore, the optimal temperature differences $\theta^*$ can be calculated from the $R^*$ data (cf. Fig. 25.4) using (25.2); these values are presented in Fig. 25.5. This information complements the minimal entropy production reached by the design procedure. Besides, $\theta^*$ is a practical, measurable quantity that describes the performance of the thermal management system.
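Assuming that (25.2), which is not reproduced in this excerpt, reduces to the usual definition of an equivalent thermal resistance, $\theta = R\,\dot{Q}$, the optimal temperature difference follows directly from $R^*$. A minimal sketch with hypothetical numbers:

```python
def temperature_difference(r_eq: float, q_net: float) -> float:
    """theta = R * Q_dot, the standard thermal-resistance relation (assumed form of (25.2))."""
    return r_eq * q_net


# Hypothetical optimum: R* = 0.5 K/W at a net heat power of 60 W
theta_star = temperature_difference(0.5, 60.0)
print(f"theta* = {theta_star:.1f} K")
```

Under this assumption, a falling $R^*$ at higher $\dot{Q}$ is what keeps $\theta^*$ from growing in proportion to the dissipated power.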


Fig. 25.6 Measured temperature difference ($\theta_m$ [°C]) values

Fig. 25.7 Estimated temperature difference ($\theta_e$ [°C]) values

Figure 25.6 displays results from the forward problem, in terms of $\theta$, contaminated with additive white Gaussian noise (AWGN) to emulate a measured dataset. We employed several signal-to-noise ratio values (SNR [dB]) to analyze the performance of our approach under different conditions. The measured temperature difference is denoted $\theta_m$. Figure 25.7 presents results from the inverse heat transfer problem, where $\theta_e$ is the estimated temperature difference, and the estimated convection heat transfer coefficients are shown in Fig. 25.8. To complement these data, Table 25.3 summarizes the root-mean-square error (RMSE) for each combination of material and fluid and for all noise levels.
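The noise-contamination step can be sketched as follows. The chapter does not state how signal power is defined, so this sketch assumes the common convention of the mean square of the noiseless signal; the temperature values are hypothetical, and the RMSE helper mirrors the metric reported in Table 25.3.

```python
import numpy as np


def add_awgn(signal, snr_db, rng):
    """Return signal plus white Gaussian noise at the requested SNR (in dB)."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)                # assumed SNR convention
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(noise_power), signal.shape)


def rmse(estimate, reference):
    """Root-mean-square error between two equally shaped arrays."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))


rng = np.random.default_rng(0)
theta_true = np.linspace(10.0, 40.0, 10_000)       # hypothetical temperature differences
theta_m = add_awgn(theta_true, snr_db=30.0, rng=rng)
print(f"RMSE of emulated measurements: {rmse(theta_m, theta_true):.3f}")
```

Lowering `snr_db` by 10 dB multiplies the noise standard deviation by about 3.16, which is why errors grow quickly at the low-SNR end of the table.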

25.6 Conclusions
This work proposed an alternative strategy for estimating convection heat transfer coefficients in electronic thermal management applications by solving the associated inverse heat transfer problem. In the present case, we tackled this inverse problem with the Cuckoo Search (CS) algorithm, minimizing a least-squares cost function. The methodology was illustrated with a microchannel heat


Fig. 25.8 Estimated heat transfer coefficient ($h_e$ [W/m²·K]) values, varying the net heat power dissipation ($\dot{Q}$ [W])

Table 25.3 Root-mean-square error (RMSE) of the estimated values of $h$, RMSE{$h_e$}, varying the net heat power $\dot{Q}$, with different materials and working fluids

SNR (dB)     Si–Air   Si–NH3   HTCG–Air   HTCG–NH3
Noiseless    0.00     0.00     0.00       0.00
40           2.04     1.64     0.27       0.34
30           3.17     7.52     0.70       0.99
20           8.55     72.22    5.79       4.39
10           12.48    27.54    5.33       10.00
5            19.28    33.78    6.22       10.70

sink under several design conditions. Average convective heat transfer coefficients were estimated from several synthetic temperature measurements with different noise levels. Reference temperature data were obtained by solving the forward problem, based on the equivalent thermal resistance model of a previously designed microchannel heat sink. These designs were achieved through the entropy generation minimization criterion, also powered by CS, employing several specifications (material, working fluid and heat power). Results agreed remarkably well for signal-to-noise ratios above 30 dB, which can be reached with modern temperature sensors.
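For readers unfamiliar with CS, the following is a generic sketch of the scheme of Yang and Deb [38]: Lévy flights generated with Mantegna's algorithm, plus abandonment of a fraction pa of nests. It is not the authors' Algorithm 1, and the parameter values are typical defaults rather than those used in this work. The toy cost is a least-squares fit of a hypothetical linear model, standing in for the temperature-based cost of the inverse problem.

```python
import math

import numpy as np


def levy_steps(rng, shape, beta=1.5):
    """Draw Levy-distributed step lengths with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)


def cuckoo_search(cost, lower, upper, n_nests=25, n_iter=500, pa=0.25,
                  alpha=0.01, seed=0):
    """Minimise `cost` inside the box [lower, upper]; returns (best_x, best_cost)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    nests = rng.uniform(lower, upper, (n_nests, lower.size))
    costs = np.array([cost(x) for x in nests])

    def greedy_update(candidates):
        # Keep a candidate only where it improves on the nest it would replace
        cand = np.clip(candidates, lower, upper)
        cand_costs = np.array([cost(x) for x in cand])
        better = cand_costs < costs
        nests[better] = cand[better]
        costs[better] = cand_costs[better]

    for _ in range(n_iter):
        best = nests[costs.argmin()].copy()
        # Global exploration: Levy flights scaled by the distance to the best nest
        step = alpha * levy_steps(rng, nests.shape) * (nests - best)
        greedy_update(nests + step * rng.standard_normal(nests.shape))
        # Local exploration: each component moves with probability 1 - pa along
        # the difference of two randomly permuted nests
        move_mask = rng.random(nests.shape) > pa
        diff = nests[rng.permutation(n_nests)] - nests[rng.permutation(n_nests)]
        greedy_update(nests + rng.random() * diff * move_mask)

    i = costs.argmin()
    return nests[i].copy(), float(costs[i])


# Usage: least-squares fit of y = a*x + b to hypothetical noiseless data
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x - 1.0


def sse(params):
    return float(np.sum((params[0] * x + params[1] - y) ** 2))


best, best_cost = cuckoo_search(sse, [-5.0, -5.0], [5.0, 5.0])
print(best, best_cost)
```

The greedy replacement keeps the search monotone in cost, while the heavy-tailed Lévy steps occasionally produce long jumps that help escape poor regions.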

References
1. A.A. Alfaryjat, D. Stanciu, A. Dobrovicescu, V. Badescu, M. Aldhaidhawi, Numerical investigation of entropy generation in microchannels heat sink with different shapes. IOP Conf. Ser.: Mater. Sci. Eng. 147, 012134 (2016)
2. A. Bejan, Entropy generation minimization: the new thermodynamics of finite-size devices and finite-time processes. J. Appl. Phys. 79(3), 1191–1218 (1996)


3. K.K. Bodla, J.Y. Murthy, S.V. Garimella, Optimization under uncertainty applied to heat sink design. J. Heat Transf. 135(1), 011012 (2012)
4. K.K. Bodla, J.Y. Murthy, S.V. Garimella, Optimization under uncertainty for electronics cooling design applications, 13th IEEE ITHERM Conference, pp. 1191–1201 (2012)
5. S. Chanda, C. Balaji, S.P. Venkateshan, G.R. Yenni, Estimation of principal thermal conductivities of layered honeycomb composites using ANN–GA based inverse technique. Int. J. Therm. Sci. 111, 423–436 (2017)
6. H.-T. Chen, H.-C. Tseng, S.-W. Jhu, J.-R. Chang, Numerical and experimental study of mixed convection heat transfer and fluid flow characteristics of plate-fin heat sinks. Int. J. Heat Mass Transf. 111, 1050–1062 (2017)
7. X. Chen, H. Ye, X. Fan, T. Ren, G. Zhang, A review of small heat pipes for electronics. Appl. Therm. Eng. 96, 1–17 (2016)
8. I. Cornejo, G. Cornejo, C. Ramírez, S. Almonacid, R. Simpson, Inverse method for the simultaneous estimation of the thermophysical properties of foods at freezing temperatures. J. Food Eng. 191, 37–47 (2016)
9. J.M. Cruz-Duarte, A. Garcia-Perez, I.M. Amaya-Contreras, C.R. Correa-Cely, R.J. Romero-Troncoso, J.G. Avina-Cervantes, Design of microelectronic cooling systems using a thermodynamic optimization strategy based on cuckoo search, in IEEE Transactions on Components, Packaging and Manufacturing Technology, pp. 1–9 (2017)
10. J.K. Dhiman, S.K. Prasad, Inverse estimation of heat flux from a hollow cylinder in cross-flow of air. Appl. Therm. Eng. 113(113), 952–961 (2017)
11. P. Duda, Solution of inverse heat conduction problem using the Tikhonov regularization method. J. Therm. Sci. 26(1), 60–65 (2017)
12. A. Ebrahimi, F. Rikhtegar, A. Sabaghan, E. Roohi, Heat transfer and entropy generation in a microchannel with longitudinal vortex generators using nanofluids. Energy 101, 190–201 (2016)
13. C.H. Huang, Y.C. Liu, H. Ay, The design of optimum perforation diameters for pin fin array for heat transfer enhancement. Int. J. Heat Mass Transf. 84, 752–765 (2015)
14. H. Huang, N. Borhani, N. Lamaison, J.R. Thome, A new method for reducing local heat transfer data in multi-microchannel evaporators. Int. J. Therm. Sci. 115, 112–124 (2017)
15. S.T. Kadam, R. Kumar, Twenty first century cooling solution: microchannel heat sinks. Int. J. Therm. Sci. 85, 73–92 (2014)
16. S.G. Kandlikar, Review and projections of integrated cooling systems for three-dimensional integrated circuits. J. Electron. Packag. 136(2), 24001 (2014)
17. S.S. Khaleduzzaman, M.R. Sohel, R. Saidur, I.M. Mahbubul, I.M. Shahrul, B.A. Akash, J. Selvaraj, Energy and exergy analysis of alumina-water nanofluid for an electronic liquid cooling system. Int. Commun. Heat Mass Transf. 57, 118–127 (2014)
18. W.A. Khan, J.R. Culham, M.M. Yovanovich, Optimization of microchannel heat sinks using entropy generation minimization method. IEEE Trans. Compon. Packag. Technol. 32(2), 243–251 (2009)
19. F. Kreith, R.M. Manglik, M.S. Bohn, Principles of Heat Transfer, 7th edn. (Cengage Learning, Stamford, CT, 2011)
20. J. Li, N. Jiang, Z. Gao, H. Liu, G. Wang, An inverse heat conduction problem of estimating the multiple heat sources for mould heating system of the injection machine. Inverse Probl. Sci. Eng. 24(9), 1587–1605 (2016)
21. J.H. Lienhard IV, J.H. Lienhard V, A Heat Transfer Textbook, 4th edn. (Phlogiston Press, Cambridge, 2012)
22. L. Lin, Y.-Y. Chen, X.-X. Zhang, X.-D. Wang, Optimization of geometry and flow rate distribution for double-layer microchannel heat sink. Int. J. Therm. Sci. 78, 158–168 (2014)
23. D. Liu, S.V. Garimella, Analysis and optimization of the thermal performance of microchannel heat sinks. Int. J. Numer. Methods Heat Fluid Flow 15(1), 7–26 (2005)
24. X. Luo, Z. Yang, A new approach for estimation of total heat exchange factor in reheating furnace by solving an inverse heat conduction problem. Int. J. Heat Mass Transf. 112, 1062–1071 (2017)


25. B. Maciejewska, M. Piasecka, Trefftz function-based thermal solution of inverse problem in unsteady-state flow boiling heat transfer in a minichannel. Int. J. Heat Mass Transf. 107, 925–933 (2017)
26. F. Mohebbi, M. Sellier, T. Rabczuk, Estimation of linearly temperature-dependent thermal conductivity using an inverse analysis. Int. J. Therm. Sci. 117, 68–76 (2017)
27. A.M. Morega, Principles of heat transfer, in Mechanical Engineer’s Handbook, Chap. 7, 1st edn. by D.B. Marghitu (Academic Press, Cambridge, 2001), pp. 445–557
28. S. Mu, H. Li, J. Wang, X. Liu, Optimization based inversion method for the inverse heat conduction problems. IOP Conf. Ser.: Earth Environ. Sci. 64(1), 9 (2017)
29. M.N. Ozisik, Inverse Heat Transfer: Fundamentals and Applications (CRC Press, Boca Raton, 2000)
30. A. Reddy, A critical review of entropy generation analysis in micro channel using nano fluids. Int. J. Sci. Dev. Res. 1(5), 7–12 (2016)
31. S.K. Sahoo, M.K. Das, P. Rath, Application of TCE-PCM Based Heat Sinks for Cooling of Electronic Components: A Review, vol. 59 (Elsevier, Amsterdam, 2016)
32. B. Shao, Z. Sun, L. Wang, Optimization design of microchannel cooling heat sink. Int. J. Numer. Methods Heat Fluid Flow 17(6), 628–637 (2007)
33. B. Shao, L. Wang, H. Cheng, J. Li, Optimization and Numerical Simulation of Multi-layer Microchannel Heat Sink. Proc. Eng. 31, 928–933 (2012)
34. R.S. Sudheesh, N.S. Prasad, Comparative study of heat transfer parameter estimation using inverse heat transfer models of a trailing liquid nitrogen jet in welding. Heat Transf. Eng. 36(2), 178–185 (2015)
35. D.B. Tuckerman, R.F.W. Pease, High-performance heat sinking for VLSI. IEEE Electron Device Lett. 2(5), 126–129 (1981)
36. G. Wang, L. Zhang, X. Wang, B.L. Tai, An inverse method to reconstruct the heat flux produced by bone grinding tools. Int. J. Therm. Sci. 101, 85–92 (2016)
37. Y. Wang, X. Luo, Yu. Yang, Q. Yin, Evaluation of heat transfer coefficients in continuous casting under large disturbance by weighted least squares Levenberg-Marquardt method. Appl. Therm. Eng. 111, 989–996 (2017)
38. X.-S. Yang, S. Deb, Cuckoo search via Lévy flights, in 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC) (IEEE, Piscataway, N.J., 2009), pp. 210–214
39. Y.-T. Yang, K.-T. Tsai, Y.-H. Wang, S.-H. Lin, Numerical study of microchannel heat sink performance using nanofluids. Int. Commun. Heat Mass Transf. 57, 27–35 (2014)
40. N. Zabaras, Inverse problems in heat transfer, in Handbook of Numerical Heat Transfer (Wiley, New York, NY, 2006), pp. 525–557

Chapter 26

One-Class Subject Authentication Using Feature Extraction by Grammatical Evolution on Accelerometer Data
Stefano Mauceri, James Sweeney, and James McDermott

Abstract In this study, Grammatical Evolution (GE) is used to extract features from accelerometer time series in order to improve the performance of a Kernel Density Estimation (KDE) classifier. Time series are collected through nine wrist-worn accelerometers, one per subject. The goal is to distinguish each subject from all the others in a one-class classification framework. GE-evolved solutions, referred to as feature extractors, are thoroughly analyzed. Each solution is a function able to target a specific sub-sequence of a time series and reduce it to a single scalar. In this way, a long time series can be summarized by an arbitrary number of features. Results show that the proposed evolutionary algorithm outperforms two strong baselines.

S. Mauceri (corresponding author), Natural Computing Research and Applications Group (NCRA), University College Dublin, Dublin, Ireland
J. Sweeney, University of Limerick, Limerick, Ireland
J. McDermott, National University of Ireland, Galway, Ireland
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_26

26.1 Introduction
The aim of subject authentication is to confirm the identity of an individual. There are several authentication technologies and applications [24]. In this study, we wish to develop a method that confirms that the accelerometer data produced by a source device correspond to the individual to whom the device is assigned. In clinical research, scientists would like to use accelerometer data to monitor the efficacy of treatment options on movement disorders or the impact of drugs on


S. Mauceri et al.

subjects’ free-living activity levels [15]. However, they first need classification models to confirm that a given device is worn only by the intended subject for the whole trial period: both errors and misconduct could invalidate studies of considerable cost and duration. Accelerometer data are also used for activity recognition [22], but this application is not investigated in this study. Accelerometer data come as a sequence of time-ordered real values, so the problem we treat falls within the time series classification domain: given a set of accelerometer time series, we want to separate those generated by a specific subject from those that are not. We address this as a feature-based time series one-class classification problem. In one-class classification, the aim is to learn a concept using only examples of that concept [14]. After all, to distinguish an apple from other types of fruit, humans do not need to be trained on every type of fruit on the planet; a few examples of the “class” apple suffice to learn what an apple is and to separate it from what it is not. Likewise, we infer a suitable decision rule from a set of accelerometer time series all coming from one subject, and use it to separate that subject's time series from those of others. In related work [18], we proposed a feature-based time series classification method that shows good performance. A set of 25 features is manually selected from the statistical and time series domains. An accelerometer time series recorded over an entire day is divided into 24 equally sized sub-sequences (one per hour). The 25 features are extracted from each sub-sequence and collected in a feature vector (of length 25 × 24 = 600), which is then used for classification. This approach, although effective, has some limitations.
For instance, how can we know in advance into how many sub-sequences a time series should be divided? How can we know whether we need to focus on a single sub-sequence spanning from a to b or on the entire time series? Again, how can we know whether, for a given sub-sequence [a : b], it is important to know its mean value or the result of a less intuitive function such as log(mean(sin(x)))? These considerations justify the need for a flexible feature extraction method for time series classification. In terms of flexibility, GE [20] stands out as a powerful learning framework. GE allows us to set the general rules for the creation of suitable solutions in a bespoke grammar. A grammar-guided search therefore leads to open-ended solutions that can reveal subtle or complex relationships in the mapping from data to class labels, relationships that would be difficult for humans to identify. In summary, we use GE to create one or more feature extractors, each of which selects a sub-sequence of a time series and reduces it to a single scalar (i.e., a feature). A sub-sequence is reduced to a scalar by a symbolic model composed of functions that naturally “synthesise” a series of numbers into a single one, e.g., the mean. In this way, a time series can be reduced to an arbitrary number of features. This feature-based representation is meant to maximize the performance of a KDE classifier on the aforementioned subject authentication problem. All the features are evolved sequentially. The evolution of the first feature is driven by the search for better classification performance in terms of AUROC (area under the receiver operating characteristic curve). Subsequent features are evolved according

26 One-Class Subject Authentication Using Feature Extraction …


to two objectives: the classification performance, to be maximized, and the average coefficient of determination (R²) with previous features, to be minimized in order to reduce linear dependencies between features. The algorithm achieves excellent classification performance in terms of AUROC using only two features. The paper is organized as follows: related work is presented in Sect. 26.2, and the data are described in Sect. 26.3. Section 26.4 describes the evolutionary system and the grammar in use. The experiment design is illustrated in Sect. 26.5, and results are discussed in Sect. 26.6. Finally, conclusions are drawn in Sect. 26.7.

26.2 Related Work
A time series is a collection of equally spaced measurements over time. This type of data is extremely common across a variety of scientific fields [7]. There are three major approaches to time series classification: (1) feature-based classification, in which a time series is summarized in a feature vector; (2) distance-based classification, where one or more distance functions are used to measure the distance between two time series; and (3) model-based classification, where the focus is on the underlying model generating the time series [26]. Feature-based methods are less sensitive to noise in the data, facilitate working with long time series, do not require all time series to have the same length as long as the same number of features is extracted, and can highlight both local and global properties of the data [19, 25]. However, feature extraction is a domain-specific and time-consuming process that requires a long trial-and-error procedure, especially when a new problem is addressed [6]. Nonetheless, the choice of features makes the difference in terms of classification performance. Some of the information contained in a time series can be redundant or irrelevant and can negatively affect the predictive performance of learning models [1], and how to divide a time series into a set of highly descriptive sub-sequences is not known a priori [2]. Using equally sized sub-sequences oversimplifies this problem; hence, a dynamic method is preferable [4]. GE [20] is a grammar-based approach to Genetic Programming (GP) [16] that evolves human-readable code according to rules specified by a grammar. It can be useful when the modeler has a weak understanding of the relation between the explanatory and the dependent variables [3]. GP has been used for feature selection in EEG signal classification [11], speaker verification [17], and selection of the best feature extractor/classifier pair for fault detection [13].
GE has been used to create a non-linear mapping of pre-existing features for fetal heart rate classification [10]. Although related to feature extraction, feature selection is a different problem. GP has also been used for feature extraction in image classification [23], which is a different application domain. In the following, we review two general-purpose feature extraction algorithms for time series based on GE/GP. Eads et al. [6] evolve a single set of grammar-generated feature extractors. Each feature extractor reduces a sub-sequence of a time series to


a single scalar. This hill-climbing algorithm does not use any crossover operator, but allows modifications to the current solution (inclusion, dismissal, or mutation of feature extractors) only if the changes increase classification accuracy, or cause a negligible change in accuracy but a decrease in runtime. Todd et al. [12] propose a GP-based algorithm in which a set of 35 functions is used to evolve sets of feature extractors. When the process is complete, the best set is used to derive a feature-based representation. Nearly all the functions take a time series as input and output a transformed time series; to reduce a time series to a single scalar, each feature extractor must end with a summation. The algorithm includes a set of rules to reduce the redundancy of the final feature extractors and increase their interpretability. Its classification performance is tested on simulated data. To the best of our knowledge, the present study represents a novel population-based GE approach to the feature-based time series one-class classification problem and to subject authentication from accelerometer data.

26.3 Preliminaries
This section introduces the data-set in use (Sect. 26.3.1), the data preparation steps (Sect. 26.3.2), and the design of the training, validation and test sets (Sect. 26.3.3).

26.3.1 Data-Set Description
Data are collected through wrist-worn tri-axial accelerometers able to measure linear acceleration within a range of ±16 g per axis.¹ The magnitude of the acceleration is calculated as the square root of the sum of the squares of the accelerations along the three axes, rounded to the closest integer. The time series studied here are therefore sequences of magnitude values at a resolution of one data point per minute over an entire day (24 h). A graph of magnitude recordings for one typical day is shown in Fig. 26.1. All the recorded variables are shown in Table 26.1: ID is the subject's unique, anonymous identifier, and Axis_1, Axis_2 and Axis_3 show acceleration. A sample of 9 volunteers (not enrolled in a clinical trial), 3 females and 6 males, were required to wear the device for approximately 40 days in free-living conditions. All participants are office workers based in the same location and working from Monday to Friday. Only weekdays are used in this study because weekend days are relatively few.
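The magnitude computation is a one-liner; the sketch below checks it against the sample row of Table 26.1. It is an illustration, not the authors' code.

```python
import math


def magnitude(axis_1: int, axis_2: int, axis_3: int) -> int:
    """Vector magnitude of one tri-axial sample, rounded to the nearest integer."""
    return round(math.sqrt(axis_1 ** 2 + axis_2 ** 2 + axis_3 ** 2))


# Sample row of Table 26.1: Axis_1 = 168, Axis_2 = 544, Axis_3 = 563
print(magnitude(168, 544, 563))  # 801
```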

¹ For further information see: http://www.actigraphcorp.com/.


Fig. 26.1 Magnitude recordings for one day for one subject

Table 26.1 Data-set variables
ID          Date        Hour   Minute   Axis_1   Axis_2   Axis_3   Magnitude
Subject_1   2016-03-21  11     36       168      544      563      801

26.3.2 Data Preparation
The algorithmic implementation in use requires a full day of recordings, i.e., each time series has to contain 1440 observations (24 h × 60 min). The original data-set is filtered as follows.
• Time series with a cumulative magnitude lower than 500,000 g are removed. These time series predominantly contain ‘non-wear time’, i.e., time when the device is on but set down and stationary; in such a scenario, the device records only a long sequence of zeros. Note that this does not exclude the possibility of finding some ‘non-wear time’ in the remaining time series.
• Some time series contain sequences of missing values, i.e., they contain fewer than 1440 observations. This happens because, from time to time, the data must be downloaded from the limited internal memory of the device (4 GB) and its batteries recharged. To deal with this, when a sequence of N contiguous missing values is found, it is filled with a copy of the N values that precede it. The maximum allowed length for a sequence of missing values is 90 min; if this limit is exceeded, the time series is discarded. This strategy for handling missing data may seem naive, but it is adopted for its simplicity and because it is required only two to three times per subject.
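A minimal sketch of both filters follows. It assumes, hypothetically, that a day is stored as a 1440-slot array with NaN marking missing minutes (the original files simply contain fewer rows). Gaps that start before enough preceding values exist are discarded here, a case the text does not address.

```python
import numpy as np

MINUTES_PER_DAY = 1440          # 24 h x 60 min
MIN_TOTAL_MAGNITUDE = 500_000   # days below this are treated as mostly non-wear time
MAX_GAP_MINUTES = 90            # longest run of missing values that is repaired


def fill_gaps(day):
    """Fill each run of N missing minutes (NaN) with the N values preceding it.

    Returns None if a run is longer than MAX_GAP_MINUTES, or if it starts too
    early in the day to have N preceding values (the latter rule is an assumption).
    """
    x = np.asarray(day, dtype=float).copy()
    i = 0
    while i < x.size:
        if not np.isnan(x[i]):
            i += 1
            continue
        j = i
        while j < x.size and np.isnan(x[j]):
            j += 1
        n = j - i
        if n > MAX_GAP_MINUTES or n > i:
            return None
        x[i:j] = x[i - n:i]   # copy the N values that precede the gap
        i = j
    return x


def prepare_day(day):
    """Apply both filters; return a clean 1440-value day or None if discarded."""
    if len(day) != MINUTES_PER_DAY:
        raise ValueError("expected one value (or NaN) per minute of the day")
    if np.nansum(day) < MIN_TOTAL_MAGNITUDE:
        return None
    return fill_gaps(day)
```

Copying the preceding values, rather than interpolating, keeps the repaired stretch realistic in texture, which matters because features may later be computed on short sub-sequences.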


26.3.3 Training, Validation and Test Sets
After cleaning the data-set as described in Sect. 26.3.2, the number of time series differs between subjects; the subject with the fewest has 23. For subjects with more, 23 time series are randomly selected. For each subject, 18 of the 23 time series are included in the training set and the remaining 5 in the test set. This yields a training set of 162 time series (18 time series × 9 subjects) and a test set of 45 time series (5 time series × 9 subjects). A validation set is created using a 2:1 split of the training data, maintaining a constant number of time series per subject.
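The per-subject split works out to 12 fitting, 6 validation and 5 test days. The sketch below uses hypothetical day identifiers and Python's `random` module; it illustrates the counts rather than the authors' exact sampling code.

```python
import random


def split_subject(series, rng, n_keep=23, n_train=18):
    """Per-subject split: 23 days -> 12 fitting, 6 validation, 5 test days."""
    chosen = rng.sample(list(series), n_keep)
    train, test = chosen[:n_train], chosen[n_train:]
    n_fit = 2 * len(train) // 3          # 2:1 split of the training days
    return train[:n_fit], train[n_fit:], test


rng = random.Random(42)
# Hypothetical day identifiers: subject s has 23 + s usable days
days = {s: [f"s{s}_d{d}" for d in range(23 + s)] for s in range(1, 10)}
splits = {s: split_subject(d, rng) for s, d in days.items()}
```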

26.4 Evolutionary System
The overall evolutionary system is described in Sect. 26.4.1, and the grammar in use is discussed in Sect. 26.4.2. The system is implemented in Python and relies on the PonyGE2 library [8]; the KDE classifier uses Scikit-Learn [21].

26.4.1 System Overview
The evolutionary process involves one subject at a time; thus, evolved features are specifically developed to perform well for that one subject. Every element of a population of GE-generated feature extractors is evaluated according to the process depicted in Fig. 26.2. A feature extractor (FE) selects a sub-sequence of a time series and reduces it to a scalar (i.e., a feature). This is done for both the training and the validation sets. A KDE classifier is fit on the feature-based representation of the training set and then used to estimate the probability that each instance of the feature-based representation of the validation set belongs to the target class. The estimated probabilities are used to calculate the AUROC, the area under the receiver operating characteristic curve. Using the classification scores assigned to test (validation) samples by the KDE classifier, the AUROC is obtained by computing the area under a curve constructed by plotting the true positive rate against the false positive rate at various threshold settings. Threshold values are calculated as the midpoints between consecutive sorted classification scores. For each threshold value, all samples with a score lower than the threshold are assigned to the negative class and those with a greater score to the positive class. The algorithm can sequentially extract a user-defined number of features. This number is set through a hyper-parameter k that influences the fitness evaluation. By choosing the number of features to be evolved, a long time series can be reduced to a feature vector of arbitrary dimensionality.
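The evaluation loop can be sketched with NumPy alone. The chapter uses Scikit-Learn's KDE, whose `KernelDensity.score_samples` also returns log-densities; the hand-written Gaussian KDE below, its bandwidth of 0.5 and the toy data are assumptions for illustration. The AUROC helper uses the Mann-Whitney form (a positive/negative tie counts 1/2), which gives the same area as sweeping thresholds at the midpoints between sorted scores and integrating with the trapezoidal rule.

```python
import numpy as np


def _as_2d(a):
    """Treat a 1-D array as n samples of a single feature."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def kde_log_density(train, query, bandwidth=0.5):
    """Log-density of `query` rows under a Gaussian KDE fitted on `train` rows."""
    train, query = _as_2d(train), _as_2d(query)
    n, d = train.shape
    sq_dist = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    a = -sq_dist / (2.0 * bandwidth ** 2)
    a_max = a.max(axis=1, keepdims=True)            # log-sum-exp for stability
    log_sum = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1))
    return log_sum - np.log(n) - 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)


def auroc(scores, labels):
    """Area under the ROC curve; ties between a positive and a negative count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Hypothetical one-feature example: 12 target-subject fitting days; validation
# days labelled 1 (same subject, 6 days) or 0 (8 other subjects x 6 days)
rng = np.random.default_rng(3)
train_feat = rng.normal(0.0, 1.0, 12)
val_feat = np.concatenate([rng.normal(0.0, 1.0, 6), rng.normal(3.0, 1.0, 48)])
val_labels = np.r_[np.ones(6), np.zeros(48)]
score = auroc(kde_log_density(train_feat, val_feat), val_labels)
print(f"AUROC = {score:.2f}")
```

Only the target subject's days are used to fit the density, which is what makes the setting one-class; the other subjects appear only as negatives when the AUROC is computed.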


Fig. 26.2 Evolutionary system summary: evaluation of a feature extractor

The fitness function is a core component of the algorithm because it drives the search through the space of candidate solutions. In this study, when k = 1, each feature extractor is assigned a fitness score (i.e., a measure of quality) equal to the classification error, 1 − AUROC. When k > 1, feature extractors are assigned a fitness score equal to the classification error plus the average coefficient of determination $\overline{R^2}$ defined in Eq. 26.1. $\overline{R^2}$ is calculated on the feature-based representation of the training data, between the feature produced by the feature extractor currently being evaluated (FE) and those produced by the feature extractors output by previous runs of the algorithm (FE_i, i = 1, …, k − 1):

$$\overline{R^2} = \frac{1}{k-1}\sum_{i=1}^{k-1} R^2(FE, FE_i) \tag{26.1}$$

The system works to minimize the classification error of the current feature extractor and to minimize $\overline{R^2}$ with previous features, in order to reduce linear dependencies between extracted features. When the evolutionary process is completed, the feature extractor with the lowest fitness is taken as the final solution of the algorithm and tested on a portion of unseen data. We use a KDE classifier because it is a simple and computationally efficient model; however, any other one-class classifier could be plugged into the system instead.
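The fitness of Eq. 26.1 can be sketched as below. For a simple linear regression of one feature on another, R² equals the squared Pearson correlation, which is what this sketch computes; the feature values are hypothetical, and a constant feature would make the correlation undefined (NaN).

```python
import numpy as np


def r_squared(feature_a, feature_b):
    """R^2 of a simple linear regression of one feature on the other.

    For simple linear regression this equals the squared Pearson correlation.
    """
    r = np.corrcoef(feature_a, feature_b)[0, 1]
    return float(r * r)


def fitness(auroc_score, new_feature, previous_features):
    """Fitness to minimise: 1 - AUROC, plus the mean R^2 of Eq. 26.1 when k > 1."""
    error = 1.0 - auroc_score
    if len(previous_features) == 0:          # k = 1: no earlier features yet
        return error
    mean_r2 = np.mean([r_squared(new_feature, prev) for prev in previous_features])
    return error + float(mean_r2)


rng = np.random.default_rng(1)
f_new = rng.normal(size=50)                          # hypothetical training-set feature
print(fitness(0.9, f_new, []))                       # k = 1
print(fitness(0.9, f_new, [2.0 * f_new + 1.0]))      # k = 2, collinear predecessor
```

A perfectly collinear predecessor adds a full 1.0 to the fitness, so a redundant feature is penalized as heavily as a classifier with zero AUROC.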


Fig. 26.3 Grammar. All the functions in use are listed in the text

26.4.2 Grammar
A grammar in Backus-Naur form, shown in Fig. 26.3, guides the creation of a feature extractor. The mapping starts from the start production rule, which offers two production choices, both intended to select a sub-sequence from a time series and reduce it to a single scalar. A sub-sequence is selected using a lower bound and an upper bound drawn with uniform probability from the range [1 : 1440]. Since the lower bound must be below the upper bound, the two are swapped if this condition is violated; if the lower bound is equal to the upper bound, the latter is increased by 1. As shown in Fig. 26.4, once the bounds are set, a sub-sequence can be treated according to two different strategies: Select and Select_fold. The Select function picks a sub-sequence and applies a function to it, outputting a single scalar. The Select_fold function picks a sub-sequence and splits it into a number of folds chosen by the grammar; a function is applied to each fold, and the resulting values are reduced to a single one by a second function. If a sub-sequence cannot be split into equally sized sub-sub-sequences, the upper bound is adjusted according to |upper − lower| mod (number of folds). Both reduction production rules include the same functions: mean, standard deviation, variance, median, mode, skewness, kurtosis, max, min, sum, autocorrelation, mean absolute deviation and entropy. All these functions reduce a selected sub-sequence to a single value. Together with the transformation functions listed below, these are among the simplest and most common functions used in feature-based time series classification [9]. The set of terminals is defined in its own production rule: it is possible to select the time series X or a function from the transformation production rule. None of the transformation functions alter the shape of their inputs. This set includes the four operators {+, −, ∗, /}, sine and cosine, logarithm, square root, absolute value, remove linear trend, exponential smoothing, moving variance, moving max, moving min and moving average.
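The two selection strategies can be sketched in Python as follows. The bounds are treated as half-open slice indices `x[lower:upper]`, which is an assumption (the text only gives the range [1 : 1440]); capping the number of folds at the sub-sequence length is also an assumption, and the function names simply echo Select and Select_fold.

```python
import numpy as np


def order_bounds(lower, upper):
    """Apply the bound rules described above: swap if reversed, widen if equal."""
    if lower > upper:
        lower, upper = upper, lower
    if lower == upper:
        upper += 1
    return lower, upper


def select(x, lower, upper, reduce_fn):
    """Select: reduce the sub-sequence x[lower:upper] to one scalar."""
    lower, upper = order_bounds(lower, upper)
    return float(reduce_fn(x[lower:upper]))


def select_fold(x, lower, upper, n_folds, fold_fn, combine_fn):
    """Select_fold: split x[lower:upper] into equal folds, reduce each, then combine."""
    lower, upper = order_bounds(lower, upper)
    n_folds = max(1, min(n_folds, upper - lower))   # assumption: never more folds than points
    upper -= (upper - lower) % n_folds              # trim so the folds are equally sized
    folds = np.asarray(x[lower:upper], dtype=float).reshape(n_folds, -1)
    return float(combine_fn([fold_fn(fold) for fold in folds]))


x = np.arange(10.0)                                 # toy series 0, 1, ..., 9
print(select(x, 5, 2, np.mean))                     # bounds swapped -> mean of [2, 3, 4]
print(select_fold(x, 0, 7, 3, np.mean, np.max))     # 7 % 3 = 1 -> x[0:6] in 3 folds
```

Trimming the upper bound, rather than padding, keeps every fold made of real observations at the cost of dropping at most `n_folds - 1` points.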


Fig. 26.4 Sub-sequence selection strategies. (1) Select. (2) Select_fold

26.5 Experiment Design
For each subject in the data-set, we search for the best features that allow his/her authentication. We sequentially evolve 5 feature extractors per subject and repeat this for 30 runs. The optimal number of features is not known a priori; however, it is expected to depend on the quality of the features and on the problem difficulty, since some subjects can be authenticated more easily than others. An increasing number of feature extractors, from 1 to 5, is tested on a portion of unseen data. The training and validation sets used during the evolutionary process are merged into a new training set. Both the new training set and the test set are reduced to a certain number of features according to the number of feature extractors at hand. Since extracted features can have different scales, column-wise standardization is applied by removing the mean and scaling to unit variance. A KDE classifier with a Gaussian kernel is fit on the training set and then used to estimate the probability that the instances in the test set belong to the target class.
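The standardization step and the percentile decision threshold mentioned in the caption of Fig. 26.6 can be sketched as follows. Reusing the training-set statistics to scale the test set is an assumption, as is the rule that a score at or above the threshold is accepted as the target subject.

```python
import numpy as np


def standardize(train, test):
    """Column-wise z-scoring with training-set mean and standard deviation (assumption)."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)     # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std


def decision_threshold(train_scores, percentile=20.0):
    """Accept a sample as the target subject if its score is at least this value."""
    return float(np.percentile(train_scores, percentile))


train_std, test_std = standardize([[1.0, 10.0], [3.0, 10.0]], [[2.0, 10.0]])
print(train_std, test_std)
```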

26.5.1 Baselines
GE-evolved feature extractors are compared with the results achieved by a high-performing manual feature extraction method we developed in another work [18]. A second baseline is obtained by passing the entire time series to the KDE classifier. Two further baselines are also used: (1) most frequent, which always predicts the most frequent label in the training set; and (2) uniform, which predicts uniformly at random among the labels in the training set.


26.5.2 Run Parameters
Evolution follows a generational approach with a population of 500 individuals for 40 generations. The population is initialized by creating random genomes with uniform probability. The remaining parameters are as follows: genome length 500, maximum derivation tree depth 15, tournament selection of size 5, int-flip mutation with a per-gene probability of 0.01, two-point crossover with probability 0.8, and no wraps. Elitism is used to preserve the best individual of the population over generations.
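For readers new to GE, the variation operators listed above can be sketched on fixed-length integer genomes as follows. This is a simplified illustration, not PonyGE2's implementation; `CODON_SIZE` and the toy population are hypothetical, and lower fitness is treated as better, matching the minimization in Sect. 26.4.1.

```python
import random

CODON_SIZE = 100_000   # hypothetical upper bound for integer codon values


def int_flip_mutation(genome, rng, p_gene=0.01):
    """Replace each codon, independently with probability p_gene, by a random integer."""
    return [rng.randrange(CODON_SIZE) if rng.random() < p_gene else codon
            for codon in genome]


def two_point_crossover(parent_a, parent_b, rng, p_cross=0.8):
    """Swap the segment between two random cut points (equal-length parents)."""
    if rng.random() >= p_cross:
        return parent_a[:], parent_b[:]
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    return (parent_a[:i] + parent_b[i:j] + parent_a[j:],
            parent_b[:i] + parent_a[i:j] + parent_b[j:])


def tournament(population, fitnesses, rng, size=5):
    """Return the fittest (lowest-fitness) of `size` randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), size)
    return population[min(contenders, key=lambda idx: fitnesses[idx])]


rng = random.Random(7)
genome_len = 500
pop = [[rng.randrange(CODON_SIZE) for _ in range(genome_len)] for _ in range(10)]
fits = [rng.random() for _ in pop]
child_a, child_b = two_point_crossover(tournament(pop, fits, rng),
                                       tournament(pop, fits, rng), rng)
child_a = int_flip_mutation(child_a, rng)
```

With a per-gene probability of 0.01 and 500 codons, about five codons change per mutated genome on average.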

26.6 Results and Discussion A summary of the experimental results is shown in Table 26.2. Table entries represent the average performance of 30 independent runs in terms of AUROC for all the subjects in our data-set and for each number of features from 1 to 5. The table also includes the following lines:
• Maximum: maximum performance achieved per subject.
• Manual FE: performance of the manual feature extraction baseline.
• Raw Data: performance achieved by passing the raw time series to the classifier.
• Uniform: classifier choosing class labels with uniform probability.
• Most Frequent: classifier always choosing the most frequent class label.

Table 26.2 Results and baselines in terms of AUROC (%)
                 Subject
Features         1    2    3    4    5    6    7    8    9    Average
1                98   80   93   83   91   74   64   93   93   86
2                97   86   94   83   90   77   68   94   93   87
3                95   89   91   80   91   79   68   94   89   86
4                95   91   89   78   92   81   71   94   84   86
5                94   92   89   79   92   81   72   94   86   87
Average          96   88   91   81   91   79   69   94   89   86
Maximum          98   92   94   83   92   81   72   94   93   89
Baselines
Manual FE        94   89   69   67   86   83   65   58   90   78
Raw Data         78   38   78   60   76   64   62   73   66   66
Uniform          52   56   60   81   50   48   61   39   70   58
Most Frequent    50   50   50   50   50   50   50   50   50   50

26 One-Class Subject Authentication Using Feature Extraction …

Fig. 26.5 AUROC averaged over subjects per number of features

Fig. 26.6 Training samples, positive test samples and negative test samples. The shaded area gets darker as the density estimated by the KDE classifier decreases. Dashed lines correspond to the decision thresholds, set at the 20th percentile of the training scores

Results show that the GE system achieves higher AUROC than all the baselines, including the manual feature extraction method we previously developed. While Manual FE achieves an average performance of 78% AUROC using 600 features, the GE system achieves an average performance of 87% AUROC using only 2 features. The gap between the two approaches is 9 percentage points, and it is larger if the maximum performance per subject over the different numbers of features is considered: in this case the GE system achieves an average performance of 89% AUROC. Fig. 26.5 shows the average AUROC score per subject and per number of features. Only subjects 2, 6 and 7 are highlighted because, for these subjects, classification performance increases as more features are used. The largest improvement, 12 percentage points of AUROC between 1 and 5 features, is observed for subject 2. This suggests that the present GE system might benefit from a mechanism that keeps evolving more features until the classification performance stops improving. In Fig. 26.6, we use the first two features evolved in a run for subjects 1 and 2 to project the data onto a 2D space. The GE system tends to concentrate the instances of the target class around the origin for both features, even though this is not explicitly required. This suggests that the proposed GE system could work well with a "simple" radial basis function classifier, which has the advantage of requiring only a single hyper-parameter, i.e. a distance threshold from the origin.
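The single-threshold radial classifier suggested above can be sketched by scoring each instance with its negative distance from the origin and, borrowing the rule from the Fig. 26.6 caption, placing the threshold at the 20th percentile of the training scores. The 2-D feature values are hypothetical, and computing the percentile with `statistics.quantiles` (Python 3.8+, default interpolation) is an assumption about how it would be done.

```python
import math
import statistics


def fit_radius(train_points, percentile=20):
    """Threshold = `percentile`-th percentile of training scores, where score = -distance."""
    scores = [-math.dist(p, [0.0] * len(p)) for p in train_points]
    cut = statistics.quantiles(scores, n=100)[percentile - 1]
    return -cut  # largest accepted distance from the origin


def accept(point, radius):
    """Authenticate the point if it lies within `radius` of the origin."""
    return math.dist(point, [0.0] * len(point)) <= radius


# Hypothetical 2-D evolved-feature values for the target subject.
train = [(0.1 * i, 0.05 * (i % 3)) for i in range(1, 21)]
radius = fit_radius(train)
print(round(radius, 3))
print(accept((0.0, 0.0), radius), accept((5.0, 5.0), radius))  # True False
```

The only hyper-parameter is the radius; raising the percentile tightens the accepted region and rejects more genuine training instances.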

26.6.1 Frequently Selected Sub-sequences and Functions It is interesting to understand whether the GE system prefers certain sub-sequences and functions over others. In Fig. 26.7, for each subject, all the time series in the training set are averaged and the resulting time series is shown in black. The shaded area is obtained by counting how many times a given minute falls within a selected sub-sequence, over all the feature extractors evolved for that subject. Y-axis limits are set according to the maximum Magnitude or Selection Frequency found. The first row from the top shows the selection frequencies we would observe if 10,000 lower bounds and 10,000 upper bounds were generated at random, i.e. without regard to fitness. Rows Sub_1 to Sub_9 show how evolution drives the selection towards certain sub-sequences rather than others for each subject. For instance, the hours between 6am and 9am are the most often selected for subjects 1, 2, 4, 5 and 7, but not for the others. Evolution is not only over-selecting certain sub-sequences, but it is also completely ignoring large sections of the time series (e.g. for subjects 1, 2 and 7) in order to focus on restricted areas that enable better classification performance. In Fig. 26.8 we count how many times each function of the grammar is used in a GE-evolved feature, over all the features evolved for all the subjects. Counts are expressed as a percentage of the total; the average selection frequency is 3.7%. The most frequently selected functions are those that measure the central tendency and variability of the data: mean, standard deviation, variance and mean absolute deviation. The sum function also belongs to this group. Preprocessing strategies seem to be important for classification performance: smoothing functions such as exponential smoothing, moving average, moving max and moving min are frequently selected.
Also among the preprocessing strategies, the three operators +, ∗ and square root are selected more often than average. Of these, the + operator is the most frequently selected.
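The selection-frequency shading and the random-baseline row of Fig. 26.7, as well as the percentages in Fig. 26.8, amount to simple counting. The sketch assumes one sample per minute over a day (1440 points), inclusive integer bounds, and hypothetical function names. Interval counts are accumulated with a difference array, so 10,000 random intervals are cheap to process.

```python
import random
from collections import Counter
from itertools import accumulate

MINUTES_PER_DAY = 1440  # assumption: one sample per minute over 24 hours


def selection_frequency(bounds, length=MINUTES_PER_DAY):
    """Count, for every minute, how many (lower, upper) sub-sequences contain it."""
    diff = [0] * (length + 1)
    for lo, hi in bounds:
        diff[lo] += 1       # the interval starts contributing at `lo` ...
        diff[hi + 1] -= 1   # ... and stops after `hi`
    return list(accumulate(diff[:length]))


def random_bounds(n, rng, length=MINUTES_PER_DAY):
    """Fitness-free baseline: n random (lower, upper) pairs."""
    pairs = []
    for _ in range(n):
        a, b = rng.randrange(length), rng.randrange(length)
        pairs.append((min(a, b), max(a, b)))
    return pairs


def function_percentages(extractors):
    """Share (%) of each grammar function over all evolved feature extractors."""
    counts = Counter(name for extractor in extractors for name in extractor)
    total = sum(counts.values())
    return {name: 100 * c / total for name, c in counts.most_common()}


baseline = selection_frequency(random_bounds(10_000, random.Random(3)))
print(baseline[0], baseline[720], baseline[-1])  # edges rarely covered, middle often

# Hypothetical extractors, each listed by the functions it uses.
extractors = [["mean", "moving_average"], ["std", "+"], ["mean", "+", "sqrt"]]
print(function_percentages(extractors))
```

Random bounds cover the middle of the day far more often than the edges, which is why the evolved selections are compared against this baseline rather than against a flat profile.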

Fig. 26.7 Most frequently selected sub-sequences per subject

Fig. 26.8 Most frequently selected functions over all subjects

26.7 Conclusions This study presents a grammar-guided evolutionary system for subject authentication using accelerometer data. The GE system explores a solution space of feature extractors. Each feature extractor selects a sub-sequence of a time series and reduces it to a single scalar (i.e. a feature). By choosing the number of features to be evolved, a long time series can be reduced to an arbitrary number of features. This feature-based representation is used with a KDE classifier to authenticate nine subjects in nine distinct one-class classification experiments. Our algorithm is designed to overcome some difficulties encountered in manual feature-based time series classification. The core idea is to relieve the modeler from making any assumptions about which sub-sequences and which features enable good classification performance. We found that the GE system effectively drives the search towards sub-sequences and features that enable high classification performance. A peak in classification performance in terms of AUROC is found using just two features. Results are compared with a manual feature extraction method we previously developed. The current method not only uses two features, as opposed to the 600 required by the previous one, but also outperforms it by 9 percentage points (87% AUROC vs. 78% AUROC). Future research is needed to test our algorithm on the problems of the UCR/UEA archive [5], the main collection of time series data-sets used in the time series classification literature.

Acknowledgements This work is funded by ICON plc.

References
1. D. Bacciu, Unsupervised feature selection for sensor time-series in pervasive computing applications. Neural Comput. Appl. 27(5), 1077–1091 (2016)
2. E. Bingham, A. Gionis, N. Haiminen, H. Hiisilä, H. Mannila, E. Terzi, Segmentation and dimensionality reduction, in Proceedings of the 2006 SIAM International Conference on Data Mining (SIAM, 2006), pp. 372–383
3. A. Brabazon, K. Meagher, E. Carty, M. O'Neill, P. Keenan, Grammar-mediated time-series prediction. J. Intell. Syst. 14(2–3), 123–142 (2005)
4. F.-L. Chung, T.-C. Fu, V. Ng, R.W. Luk, An evolutionary approach to pattern-based time series segmentation. IEEE Trans. Evol. Comput. 8(5), 471–489 (2004)
5. H.A. Dau, A. Bagnall, K. Kamgar, C.-C.M. Yeh, Y. Zhu, S. Gharghabi, C.A. Ratanamahatana, E. Keogh, The UCR time series archive. arXiv preprint arXiv:1810.07758 (2018)
6. D. Eads, K. Glocer, S. Perkins, J. Theiler, Grammar-guided feature extraction for time series classification, in Proceedings of the 9th Annual Conference on Neural Information Processing Systems (NIPS'05) (2005)
7. P. Esling, C. Agon, Time-series data mining. ACM Comput. Surv. (CSUR) 45(1), 12 (2012)
8. M. Fenton, J. McDermott, D. Fagan, S. Forstenlechner, E. Hemberg, M. O'Neill, PonyGE2: grammatical evolution in Python, in Proceedings of the Genetic and Evolutionary Computation Conference Companion (ACM, 2017), pp. 1194–1201
9. B.D. Fulcher, N.S. Jones, Highly comparative feature-based time-series classification. IEEE Trans. Knowl. Data Eng. 26(12), 3026–3037 (2014)
10. G. Georgoulas, D. Gavrilis, I.G. Tsoulos, C. Stylios, J. Bernardes, P.P. Groumpos, Novel approach for fetal heart rate classification introducing grammatical evolution. Biomed. Sig. Process. Control 2(2), 69–79 (2007)
11. L. Guo, D. Rivero, J. Dorado, C.R. Munteanu, A. Pazos, Automatic feature extraction using genetic programming: an application to epileptic EEG classification. Expert Syst. Appl. 38(8), 10425–10436 (2011)
12. D.Y. Harvey, M.D. Todd, Automated feature design for numeric sequence classification by genetic programming. IEEE Trans. Evol. Comput. 19(4), 474–489 (2015)
13. D.Y. Harvey, K. Worden, M.D. Todd, Robust evaluation of time series classification algorithms for structural health monitoring, in SPIE Smart Structures and Materials + Nondestructive Evaluation and Health Monitoring (International Society for Optics and Photonics, 2014), pp. 90640K
14. H. He, Y. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications (Wiley, 2013)
15. L.A. Kelly, D.G. McMillan, A. Anderson, M. Fippinger, G. Fillerup, J. Rider, Validity of actigraphs uniaxial and triaxial accelerometers for assessment of physical activity in adults in laboratory conditions. BMC Med. Phys. 13(1), 5 (2013)
16. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, vol. 1 (MIT Press, 1992)
17. R. Loughran, A. Agapitos, A. Kattan, A. Brabazon, M. O'Neill, Feature selection for speaker verification using genetic programming. Evol. Intell. 1–21 (2017)
18. S. Mauceri, L. Smith, J. Sweeney, J. McDermott, Subject recognition using wrist-worn triaxial accelerometer data, in International Workshop on Machine Learning, Optimization, and Big Data (Springer, 2017), pp. 574–585
19. A. Nanopoulos, R. Alcock, Y. Manolopoulos, Feature-based classification of time-series data. Int. J. Comput. Res. 10(3), 49–61 (2001)
20. M. O'Neill, C. Ryan, Grammatical evolution, in Grammatical Evolution (Springer, 2003), pp. 33–47
21. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
22. N. Ravi, N. Dandekar, P. Mysore, M.L. Littman, Activity recognition from accelerometer data. AAAI 5, 1541–1546 (2005)
23. L. Shao, L. Liu, X. Li, Feature learning for image classification via multiobjective genetic programming. IEEE Trans. Neural Netw. Learn. Syst. 25(7), 1359–1371 (2014)
24. J. Wayman, A. Jain, D. Maltoni, D. Maio, An introduction to biometric authentication systems. Biometric Syst. 1–20 (2005)
25. S.J. Wilson, Data representation for time series data mining: time domain approaches. Wiley Interdiscip. Rev. Comput. Stat. 9(1) (2017)
26. Z. Xing, J. Pei, E. Keogh, A brief survey on sequence classification. ACM SIGKDD Explor. Newsl. 12(1), 40–48 (2010)

Chapter 27

Semantic Composition of Word-Embeddings with Genetic Programming
R. Santana

Abstract Word-embeddings are vectorized numerical representations of words that are increasingly applied in natural language processing. The spaces formed by these representations can capture semantic and other relationships between words. In this paper we show that it is possible to learn methods for word composition in semantic spaces using genetic programming (GP). We frame the creation of word embeddings with a target semantic content as an automatic program generation problem, which we solve using GP. Using a word analogy task as a benchmark, we also show that GP-generated programs obtain accuracy values above those produced by the commonly used human-designed rule for algebraic manipulation of word vectors. Finally, we show the robustness of our approach by executing the evolved programs on the word2vec GoogleNews vectors, learned over 3 billion running words, and assessing their accuracy on the same word analogy task.

R. Santana, Department of Computer Science and Artificial Intelligence, University of the Basque Country (UPV/EHU), P. Manuel de Lardizabal, 20018 Gipuzkoa, Spain e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_27

27.1 Introduction Recent research has shown that word embeddings obtained using neural networks can capture linguistic or relational regularities between pairs of words. In this type of representation, each word of a given corpus is encoded by a vector of real values. One reason that makes this representation relevant is that several natural language processing (NLP) tasks can be efficiently implemented on it. In particular, machine learning methods based on word embeddings have been proposed for named entity recognition [20], question answering [7] and machine translation [10], among other tasks. Another convenient feature of word embeddings is that simple algebraic vector operations can capture some of the semantics encoded in the vector space. For example, these regularities can manifest as constant vector offsets between pairs of words sharing a particular relationship [9, 11]. Let $\vec{W}$ denote the vector representation of the word W; this offset property can then be illustrated as $\vec{\text{geckos}} - \vec{\text{gecko}} \approx \vec{\text{ants}} - \vec{\text{ant}}$. In another example, in the vector space constructed by Mikolov et al., the algebraic operation $\vec{\text{king}} - \vec{\text{man}} + \vec{\text{woman}}$ produces a real-valued vector whose closest word in the word-embedding space is “queen”. More generally, other semantic relationships, such as gender inflections and geographical relationships, can be recovered using algebraic operations between vectors. A relevant question is whether other types of operations could support more precise semantic relationships or unearth more complex or subtle relationships hidden in the semantic spaces. In this paper we address this question using genetic programming (GP) [8]. We propose using GP to find a sequence of word vector operations that captures a semantic relationship implicitly encoded in a set of training examples. This constitutes an automatic way to unveil the algebraic operations that express or support a given semantic relationship. We frame the general question of finding a suitable transformation of word vectors in terms of the more specific word analogy task [9, 13]. This task consists of answering a question such as “a is to b as c is to ?”. A correct answer is the exact word that fits the analogy. Given the vector representations of the three known words, the problem to be solved by GP is to produce a vector whose closest word in the corpus is the one that correctly answers the question. Note that, from the point of view of machine learning, the word analogy task is not a classification problem, even though the quality of a solution can be given in terms of accuracy, as the fraction of correctly answered questions.
Nor is it a classical regression problem, since each input and each output is a high-dimensional vector. GP has been less investigated in this context than for classical classification and regression problems. However, GP has been applied to a variety of tasks in information retrieval [2, 4, 12, 16, 17, 19]. In particular, Oren [12] combines GP with a vector-based representation of documents for information retrieval. Other problems that involve text classification have also been addressed with GP. Two related areas where GP has been applied are document ranking [19] and term-weighting learning [2, 4]. In [16, 17], GP is combined with word embeddings to create kernels, applied to NLP classification and regression problems respectively. Both the tasks addressed and the GP grammar used differ from those proposed in this work. The remainder of the paper is structured as follows. The next section gives some background on word embeddings. Section 27.3 presents our GP approach. In Sect. 27.4, the word analogy task benchmark used in the paper is described. Section 27.5 introduces the approach for automatically learning compositional methods using GP. Experiments to evaluate the accuracy of the evolved programs and their transferability across corpora are presented in Sect. 27.6. Section 27.7 concludes the paper.
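The vector-offset rule can be written as a nearest-neighbour search by cosine similarity. The sketch below uses hypothetical 3-dimensional embeddings built so that king − man + woman lands exactly on queen. Excluding the three question words from the candidates is a common evaluation convention that we assume here; the text does not state it.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def answer_analogy(emb, a, b, c):
    """'a is to b as c is to ?' via the offset rule d = c - a + b."""
    target = [vc - va + vb for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # Assumption: the three question words are not allowed as answers.
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Hypothetical 3-d embeddings: (royalty, femaleness, person).
emb = {
    "man": [0.0, 0.0, 1.0],
    "woman": [0.0, 1.0, 1.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.2, 0.1, -1.0],
}
print(answer_analogy(emb, "man", "woman", "king"))  # queen
```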

27 Semantic Composition of Word-Embeddings …


Fig. 27.1 Continuous Bag-of-Words (CBOW) model as proposed in [9]

27.2 Learning Word Embeddings Using Neural Networks In [9], two neural-network-based models were proposed to learn embeddings: the Skip-gram and Continuous Bag-of-Words (CBOW) models. Skip-gram learns to predict the surrounding words of a given word in a sentence. CBOW learns to predict, given the surrounding words, the word most likely to be in the center. We focus on the CBOW model. CBOW is a feed-forward neural net language model [1] with a number of changes. The most important difference is that the hidden layer has been removed. The rationale behind this modification was to explore simpler models: they cannot represent the non-linear interactions that neural networks with hidden layers can, but they are much more efficient for learning from millions of words. The CBOW network also uses a Huffman binary tree for a more efficient representation of the word vocabulary, together with a hierarchical softmax scheme. Figure 27.1 shows a schematic representation of the CBOW architecture [9]. Learning is done by scanning the corpus and considering, for each target word w(t), a window comprising the words from t − k to t + k, where 2k is the window size. In the results reported in [9], the best results were obtained using k = 4. The model was trained using stochastic gradient descent and backpropagation.
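The CBOW prediction step can be illustrated with its two ingredients: the context window from t − k to t + k, and a projection that simply averages the context vectors, with no hidden layer. This sketch uses untrained random vectors and a toy vocabulary, and it deliberately swaps in a plain full softmax in place of the hierarchical softmax (or negative sampling) used by word2vec.

```python
import math
import random


def context_windows(tokens, k):
    """Yield (context, target) pairs using the words from t - k to t + k around position t."""
    for t, target in enumerate(tokens):
        context = tokens[max(0, t - k):t] + tokens[t + 1:t + k + 1]
        yield context, target


def cbow_probabilities(context, in_vecs, out_vecs):
    """Average the context vectors (no hidden layer), then a full softmax over the vocabulary."""
    dim = len(next(iter(in_vecs.values())))
    h = [sum(in_vecs[w][i] for w in context) / len(context) for i in range(dim)]
    logits = {w: sum(a * b for a, b in zip(h, v)) for w, v in out_vecs.items()}
    top = max(logits.values())
    z = sum(math.exp(x - top) for x in logits.values())
    return {w: math.exp(x - top) / z for w, x in logits.items()}


tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
rng = random.Random(0)
in_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(4)] for w in vocab}   # untrained
out_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(4)] for w in vocab}
windows = list(context_windows(tokens, k=2))
print(windows[2])  # (['the', 'cat', 'on', 'the'], 'sat')
probs = cbow_probabilities(windows[2][0], in_vecs, out_vecs)
```

Training would adjust `in_vecs` and `out_vecs` so that the probability of the true centre word increases; the rows of `in_vecs` are what end up being used as word embeddings.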

27.2.1 Generation of the Embeddings To generate the embeddings used in this paper, the text8.zip corpus¹ is used. This corpus was extracted from the English Wikipedia.² The vocabulary obtained from it comprises 71291 words.
1 Available from http://mattmahoney.net/dc/text8.zip.
2 Details on the procedure used to extract the data are available from https://cs.fit.edu/%7Emmahoney/compression/textdata.html.


Table 27.1 Parameters used by word2vec to train the CBOW model
Parameter      Value
Vector size    200
Window         8
Negative       25
Sample         1e-4
Threads        6
Binary         1
hs             0
Iter           15

We use the original word2vec implementation³ of Mikolov et al. [9, 11] to train the CBOW network on the corpus and generate the embeddings. The parameters used by the word2vec program are described in Table 27.1. The CBOW model is trained only once. Regarding the GP implementation, the most important parameter is the vector size. A larger vector size may allow a more accurate representation of the words. However, the vector size also influences the computational cost of the algebraic operations between word vectors, which are applied intensively while GP searches for an optimal way to compose the words. To evaluate the scalability and robustness of the programs evolved by GP, we also used a much larger dictionary of embeddings. The word2vec word vector model⁴ comprises 3 million 300-dimensional English word vectors and was trained on the Google News corpus (3 billion running words).
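For reference, Table 27.1 maps onto command-line flags of the original word2vec C tool. The sketch below only assembles the command; the executable path `./word2vec` and the file names are hypothetical, and `-cbow 1` (which selects the CBOW architecture) is not listed in Table 27.1 but follows from the text.

```python
import shlex

# Values from Table 27.1, keyed by the corresponding word2vec flag names.
TABLE_27_1 = {
    "size": 200, "window": 8, "negative": 25, "hs": 0,
    "sample": "1e-4", "threads": 6, "binary": 1, "iter": 15,
}


def word2vec_command(corpus, output, params=TABLE_27_1):
    """Build the argument list for training a CBOW model with the given parameters."""
    cmd = ["./word2vec", "-train", corpus, "-output", output, "-cbow", "1"]
    for flag, value in params.items():
        cmd += [f"-{flag}", str(value)]
    return cmd


print(shlex.join(word2vec_command("text8", "vectors.bin")))
```

Passing the list to `subprocess.run` rather than a shell string avoids quoting problems if file names contain spaces.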

3 http://code.google.com/p/word2vec.
4 Available from https://github.com/mmihaltz/word2vec-GoogleNews-vectors.

27.3 A Genetic Programming Approach for Word-Embedding Composition Genetic programming [8, 14] is a domain-independent method for the automatic creation of programs that solve a given problem. Each GP program can be seen as a candidate solution to the problem, and finding the optimal solution is posed as a search in the space of possible programs. The search follows a traditional evolutionary optimization approach in which sets (populations) of programs are evolved and transformed by the application of the so-called mutation and crossover operators. Key issues in the application of GP are the choice of the program representation, the operators used by the programs, and the objective or fitness function used to evaluate them. We discuss these issues in more detail in Sect. 27.5. However, to build some intuition on the particular way in which GP is used in this paper, we present a simple example of the representation. Consider that the three words in the question “a is to b as c is to ?” are transformed into their vector representations, which are the three arguments of a program: a → ARG0, b → ARG1, c → ARG2. Then the linear algebraic rule used to compute the answer to the question, i.e., d = c − a + b, could be represented as add(ARG2, sub(ARG1, ARG0)), where add denotes addition and sub denotes subtraction. Figure 27.2 shows four GP programs that produce the same rule.

Fig. 27.2 Four programs evolved by the GP algorithm. All implement the linear algebraic rule d = c − a + b

The representation shown in Fig. 27.2 is called a tree-based GP representation and is the one used in this paper. The tree representation is a convenient way to recursively organize the evaluation of a particular composition of the word vectors. Depending on the set of available operators (those defined in the non-terminal nodes of the trees), a richer space of possible word vector compositions can be represented. The GP algorithm biases the search toward those programs that maximize a given fitness function.
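The tree representation can be evaluated recursively. In this sketch, trees are nested tuples and vectors are plain lists; the four programs are illustrative equivalents of d = c − a + b and are not necessarily the exact trees drawn in Fig. 27.2.

```python
PRIMITIVES = {
    "add": lambda u, v: [a + b for a, b in zip(u, v)],
    "sub": lambda u, v: [a - b for a, b in zip(u, v)],
    "neg": lambda u: [-a for a in u],
}


def evaluate(tree, args):
    """Evaluate a nested-tuple tree; terminals "ARG0", "ARG1", ... index into args."""
    if isinstance(tree, str):
        return args[int(tree[3:])]
    name, *children = tree
    return PRIMITIVES[name](*(evaluate(child, args) for child in children))


programs = [
    ("add", "ARG2", ("sub", "ARG1", "ARG0")),           # the rule as written in the text
    ("sub", ("add", "ARG2", "ARG1"), "ARG0"),
    ("add", ("add", "ARG2", ("neg", "ARG0")), "ARG1"),
    ("sub", "ARG1", ("sub", "ARG0", "ARG2")),
]
a, b, c = [1, 2], [3, 5], [10, 20]   # toy 2-d "word vectors"
for p in programs:
    print(evaluate(p, [a, b, c]))    # each prints [12, 23], i.e. c - a + b
```

Syntactically different trees can compute the same function, which is one reason why the GP search space contains many redundant programs.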

27.4 Problem Benchmark: Word Analogy Task The word analogy task consists of answering a question such as “a is to b as c is to ?”. A correct answer is the exact word that fits the analogy. Table 27.2 shows several example questions and their answers. We used the benchmark proposed by Mikolov et al. [11], in which questions are separated into 13 groups. In Table 27.2, Group refers to the group from which the example was taken. Table 27.3 describes the word analogy task benchmark. In this table, $N_q^{\text{orig}}$ is the number of questions in the original benchmark and $N_q$ is the number of questions remaining after removing those containing words that do not appear in the reduced corpus used in our experiments. Since this corpus is relatively small, for 4 of the 13 groups of questions (“capital-world”, “currency”, “city-in-state”, “nationality-adjective”) one or more of the four words of each question could not be found. Therefore, these four groups of questions were initially excluded from our analysis.
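Building the benchmark subsets of Table 27.3 amounts to parsing the question file and discarding questions with out-of-vocabulary words. The sketch assumes the layout of the original analogy file (lines starting with ":" name a group, other lines hold four words) and lowercases words to match a lowercase corpus such as text8; the sample lines and the vocabulary are hypothetical.

```python
def parse_groups(lines):
    """Parse ': group-name' headers followed by four-word questions."""
    groups, current = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            current = line[1:].strip()
            groups[current] = []
        else:
            groups[current].append(tuple(line.lower().split()))
    return groups


def filter_questions(groups, vocabulary):
    """Keep only questions whose four words are all in the embedding vocabulary."""
    vocab = set(vocabulary)
    return {name: [q for q in qs if all(w in vocab for w in q)]
            for name, qs in groups.items()}


lines = [
    ": family",
    "boy girl sons daughters",
    "king queen prince princess",
    ": gram8-plural",
    "banana bananas car cars",
]
vocabulary = ["boy", "girl", "sons", "daughters", "king", "queen",
              "banana", "bananas", "car", "cars"]
kept = filter_questions(parse_groups(lines), vocabulary)
for name, qs in kept.items():
    print(name, len(qs))  # family 1, gram8-plural 1 ("prince", "princess" are missing)
```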


Table 27.2 Examples of questions in the word analogy task
Group   Word 1     Word 2      Word 3    Answer
4       Boy        Girl        Sons      Daughters
5       Amazing    Amazingly   Slow      Slowly
6       Honest     Dishonest   Known     Unknown
7       Bad        Worse       Old       Older
8       Bad        Worst       Good      Best
9       Code       Coding      Walk      Walking
11      Dancing    Danced      Feeding   Fed
12      Banana     Bananas     Car       Cars
13      Decrease   Decreases   Say       Says

Table 27.3 Description of the word analogy task benchmark, where $N_q^{\text{orig}}$ is the number of questions in the original database and $N_q$ is the number of questions remaining after removing those with words that do not appear in the reduced corpus
Group   Name                           N_q^orig   N_q
4       Family (gender inflections)    506        305
5       Gram1-adjective-to-adverb      992        755
6       Gram2-opposite                 812        305
7       Gram3-comparative              1332       1259
8       Gram4-superlative              1122       505
9       Gram5-present-participle       1056       991
11      Gram7-past-tense               1560       1331
12      Gram8-plural (nouns)           1332       991
13      Gram9-plural-verbs             870        649

27.5 Description of the GP Approach In the word analogy task, learning how to compose words can be framed as follows: given the vector representations of the three words that define a question, GP must produce a vector whose closest word in the corpus correctly answers the question. The CBOW model learned using word2vec is used for two purposes: to obtain the vector that encodes a given word, and to find the word in the model whose vector is closest to a target vector. The selection method we use for tree-based GP is truncation selection: after sorting the individuals according to their fitness, the best 100 solutions are kept for crossover and mutation. Uniform mutation randomly selects a point in the tree and replaces it with a random subtree. For recombination, one-point crossover is used: it randomly selects a subtree in each of the two parents and exchanges them. The probabilities of mutation and crossover were $p_m = p_{cx} = 0.5$. The genetic operators were chosen to be as simple as possible to enhance the readability of the algorithm. While more sophisticated GP methods exist, our focus here is a proof of concept of the automatic generation of compositions, and for that purpose this choice of operators was appropriate. We also conducted a set of preliminary experiments with other mutation and selection operators⁵ and did not observe significant changes in the results when all groups of analogy questions were considered. Some operators can produce more accurate programs for a particular group, but they are then outperformed by other operators in other groups. In the experiments, the population size was $N = 500$ and the stopping criterion was a maximum number of generations, $n_{gen} = 250$. Our GP implementation was written in Python. It is based on the EA software DEAP⁶ [5] and the gensim package,⁷ a Python implementation of NLP algorithms [15]. gensim includes methods for querying the model generated by word2vec. Our code is openly available.⁸
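The selection and variation scheme can be sketched on nested-tuple trees. This is a simplified stand-in for DEAP's operators, not a copy of them: any node, including the root, may be chosen as a crossover or mutation point, the function set is a small subset of Table 27.4, the depth limit d = 10 is not enforced, and the fitness values in the usage example are random placeholders.

```python
import random

FUNCS = {"add": 2, "sub": 2, "neg": 1}   # small subset of Table 27.4
TERMS = ["ARG0", "ARG1", "ARG2"]


def random_tree(rng, depth):
    """Grow a random tree as nested tuples, e.g. ("add", "ARG2", ("neg", "ARG0"))."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    name = rng.choice(list(FUNCS))
    return (name, *(random_tree(rng, depth - 1) for _ in range(FUNCS[name])))


def paths(tree, path=()):
    """Yield the index path of every node (the root is the empty path)."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, path + (i,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def is_valid(tree):
    if isinstance(tree, str):
        return tree in TERMS
    name, *children = tree
    return FUNCS.get(name) == len(children) and all(is_valid(c) for c in children)


def truncation_selection(population, fitness, k=100):
    """Keep the k fittest individuals (sorting on fitness only)."""
    ranked = sorted(zip(fitness, population), key=lambda t: t[0], reverse=True)
    return [ind for _, ind in ranked[:k]]


def mut_uniform(tree, rng, max_depth=3):
    """Replace a randomly chosen node with a freshly grown random subtree."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, max_depth))


def cx_one_point(t1, t2, rng):
    """Swap one randomly chosen subtree of each parent."""
    p1, p2 = rng.choice(list(paths(t1))), rng.choice(list(paths(t2)))
    return replace(t1, p1, get(t2, p2)), replace(t2, p2, get(t1, p1))


def vary(parents, n, rng, p_cx=0.5, p_mut=0.5):
    """Produce n offspring with crossover and mutation probabilities of 0.5."""
    offspring = []
    while len(offspring) < n:
        a, b = rng.sample(parents, 2)
        if rng.random() < p_cx:
            a, b = cx_one_point(a, b, rng)
        offspring += [mut_uniform(t, rng) if rng.random() < p_mut else t for t in (a, b)]
    return offspring[:n]


rng = random.Random(5)
population = [random_tree(rng, 4) for _ in range(500)]
fitness = [rng.random() for _ in population]   # placeholder for F0, F1 or F2
parents = truncation_selection(population, fitness, k=100)
offspring = vary(parents, 500, rng)
```

Sorting on the fitness key alone avoids comparing trees when two individuals have equal fitness.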

27.5.1 GP Operators The set of operators used by the programs is shown in Table 27.4. All operators are defined on vectors of the same dimension. There are two classes of operators: binary and unary. The +, − and ∗ operators denote vector addition, vector subtraction and component-wise vector multiplication, respectively, while % corresponds to protected division (usual division, except that a division by zero in any vector component returns zero for that component). We discarded the possibility of including fixed (vector) random constants as terminals of the programs, since they depend on the size of the vector and our aim was to produce programs scalable to any vector dimension. We set a maximum tree depth of d = 10 to reduce the complexity of the programs.
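Several operators of Table 27.4 are easy to express on plain Python lists. The sketch covers a subset (Rint is left out because int(w) could mean either truncation or rounding). Returning NaN from Log1p when Norm(w) reaches −1, and leaving an all-zero vector unchanged in Norm, are our assumptions, not details given in the text.

```python
import math


def protected_div(w, v):
    """The % operator: component-wise division that yields 0 where the divisor is 0."""
    return [a / b if b != 0 else 0.0 for a, b in zip(w, v)]


def roll(w):
    """(w1, w2, ..., wn) -> (w2, ..., wn, w1)."""
    return w[1:] + w[:1]


def half(w):
    return [a / 2 for a in w]


def norm(w):
    """Scale by the largest absolute component (zero-vector guard is an assumption)."""
    m = max(abs(a) for a in w)
    return [a / m for a in w] if m else list(w)


def log1p_norm(w):
    """log(1 + Norm(w)); math.log1p fails at -1, so NaN is returned there instead."""
    return [math.log1p(a) if a > -1 else float("nan") for a in norm(w)]


UNARY = {
    "neg": lambda w: [-a for a in w],
    "diff": lambda w: [1 + a for a in w],
    "abs": lambda w: [abs(a) for a in w],
    "cos": lambda w: [math.cos(a) for a in w],
    "sin": lambda w: [math.sin(a) for a in w],
    "Roll": roll,
    "Half": half,
    "Norm": norm,
    "Log1p": log1p_norm,
}
BINARY = {
    "add": lambda w, v: [a + b for a, b in zip(w, v)],
    "sub": lambda w, v: [a - b for a, b in zip(w, v)],
    "mul": lambda w, v: [a * b for a, b in zip(w, v)],
    "saveDiv": protected_div,
}

print(BINARY["saveDiv"]([1.0, 2.0], [0.0, 4.0]))  # [0.0, 0.5]
print(UNARY["Roll"]([1, 2, 3]))                    # [2, 3, 1]
```

Because every operator acts component-wise or by a fixed permutation, a program built from them works unchanged for any vector dimension, which is what makes the evolved programs transferable to the 300-dimensional GoogleNews vectors.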

5 Those included in the DEAP library used to implement the algorithms.
6 http://deap.readthedocs.io/en/master/api/tools.html.
7 https://radimrehurek.com/gensim/.
8 https://github.com/rsantana-isg/GP_word2vec.

Table 27.4 Set of operators and terminals used by the tree-based GP algorithm. Binary and unary operators are represented as B. and U., respectively
B. op     w op v    U. op   op w      U. op   op w
add       w + v     neg     −w        Roll    (w2, ..., wn, w1)
sub       w − v     diff    1 + w     Rint    int(w)
mul       w ∗ v     abs     abs(w)    Half    w/2
saveDiv   w % v     cos     cos(w)    Norm    w / max(abs(w))
                    sin     sin(w)    Log1p   log(1 + Norm(w))

27.5.2 Fitness Function A critical component of a GP implementation is the definition of the fitness function. We implement three variants of the fitness function, as follows. To evaluate a candidate program, it is applied to a training set of questions. For each question, the word vectors of the first three words are first obtained from the CBOW model. The program is then evaluated using these three word vectors as arguments, and the program's output vector is used to compute the quality of the program for the question. Consider as an example the program add(sin(ARG0), sub(ARG2, ARG1)) and the first question in Table 27.2. First, we obtain the word vectors $\vec{\text{boy}}$, $\vec{\text{girl}}$ and $\vec{\text{sons}}$. Then, executing the GP program yields the vector $\vec{\text{ANS}} = \text{add}(\sin(\vec{\text{boy}}), \text{sub}(\vec{\text{sons}}, \vec{\text{girl}}))$. The quality of the program is measured by comparing $\vec{\text{ANS}}$ with the word “daughters”, or with its vector representation $\vec{\text{daughters}}$. Three different fitness functions are proposed to measure this quality:
• F0: Find the word in the CBOW model that is closest to $\vec{\text{ANS}}$. If this word coincides with the answer to the question, a counter of correctly answered questions is increased. The final fitness is the proportion of questions in the training set that were correctly answered.
• F1: The cosine similarity between the answer produced by the program and the vector representation of the target answer is computed. The final fitness of the program is the average cosine similarity over all the questions in the training set.
• F2: Similar to F1, but instead of the cosine similarity, the linear correlation between the program-produced vector and the target vector is computed.
Function F0 is a direct assessment of program quality because it tests whether the program produces vectors whose semantics is the one encoded by the question. However, it has an important drawback: the computational cost of repeatedly querying the model to determine the closest word to a given vector is very high, and it would increase with the size of the vocabulary if larger corpora were used.
To reduce this cost, we introduced four changes to the GP scheme. (1) Limited number of function evaluations: when using F0, the population size was limited to $N = 500$ and the number of generations to $n_{gen} = 250$.


(2) Restricted vocabulary size for querying: the word2vec implementation allows the search for the most likely word given a vector to be restricted to the l most frequent words in the vocabulary. Out of the 71291 words in the vocabulary, we set l = 30000, which reduces the computational time of the fitness function. (3) Partial evaluation: each program is evaluated on a fraction c of the questions from the training set; we set c = 1/5. When evaluating a program, a subset of the questions from the training set is first randomly selected, and the accuracy of the program is measured on this subset. This means that different programs are evaluated on distinct subsets of questions. (4) Early halt: while sequentially evaluating the questions in the (random subset of the) training set, the program halts without completing the evaluation if (a) a NaN output is generated for any of the questions, or (b) after at least ten questions have been answered, the proportion of correctly answered questions falls below 0.05 at some point, which indicates a clearly poorly performing program. All the previous enhancements considerably increase the efficiency of the algorithm. While partial evaluation adds some variability to the fitness of the programs, good programs are in general good across subsets of questions, and poor programs cannot specialize in niches of questions since the subsets of the training set are selected randomly. The implementations of F1 and F2 incorporate all the efficiency enhancements except the early halt, since the model is not queried.
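The three fitness functions and the cost-saving measures for F0 can be combined into one sketch. Assumptions: `emb` is a dictionary ordered from most to least frequent word (so its first `restrict` entries stand in for the l most frequent words), the nearest word is found by cosine similarity, an early halt on NaN returns a fitness of 0, and the toy vectors are hypothetical.

```python
import math
import random
import statistics


def cosine(u, v):
    nu, nv = math.hypot(*u), math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def pearson(u, v):
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    du, dv = [a - mu for a in u], [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0


def f_similarity(program, questions, emb, similarity):
    """F1 (similarity=cosine) or F2 (similarity=pearson): mean similarity to the answer."""
    return statistics.fmean(similarity(program(emb[a], emb[b], emb[c]), emb[d])
                            for a, b, c, d in questions)


def nearest(vec, emb, vocab):
    return max(vocab, key=lambda w: cosine(emb[w], vec))


def f0(program, questions, emb, rng, fraction=0.2, restrict=30000):
    """F0 with partial evaluation, a restricted vocabulary and early halting."""
    vocab = list(emb)[:restrict]          # assumes `emb` is ordered by word frequency
    subset = rng.sample(questions, max(1, round(fraction * len(questions))))
    correct = 0
    for n, (a, b, c, d) in enumerate(subset, start=1):
        out = program(emb[a], emb[b], emb[c])
        if any(math.isnan(x) for x in out):
            return 0.0                    # early halt on NaN (fitness value assumed)
        correct += nearest(out, emb, vocab) == d
        if n >= 10 and correct / n < 0.05:
            return correct / n            # early halt: clearly poor program
    return correct / len(subset)


emb = {  # hypothetical 4-d vectors, listed from most to least frequent
    "man": [0.0, 0.0, 1.0, 0.5],
    "woman": [0.0, 1.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0, 0.9],
    "queen": [1.0, 1.0, 1.0, 0.6],
}
questions = [("man", "woman", "king", "queen"), ("woman", "man", "queen", "king")]
rule = lambda a, b, c: [z - x + y for x, y, z in zip(a, b, c)]   # d = c - a + b
print(f0(rule, questions, emb, random.Random(0), fraction=1.0))
print(f_similarity(rule, questions, emb, cosine), f_similarity(rule, questions, emb, pearson))
```

F1 and F2 never query the vocabulary, which is why they are much cheaper than F0 and need no early halt.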

27.6 Experiments

The main objective of the experiments is to determine the quality of the programs generated by GP. We compare their results on the word analogy task with those obtained by applying the linear algebraic rule d = c − a + b, which is the rule commonly used to compose words for this problem. In addition, we evaluate the transferability of the best programs by applying them to a vector space comprising 3 · 10^6 vectors, roughly 100 times the size of the vector space we used to learn the programs. For each fitness function and each group of questions described in Table 27.3, 30 independent runs of the algorithm were executed. In total, 9 × 30 = 270 executions were conducted for each fitness function. Each group of questions was split into a training set and a test set with the same number of questions. The questions in the test set were not used at any time during evolution.
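The evaluation protocol (the equal-size training/test split described here, plus the best-of-run selection on training accuracy and the max/mean over the 30 runs described in Sect. 27.6.1) can be sketched as follows. The function names are hypothetical, `accuracy(program, questions)` stands in for the F0-style accuracy, and because the chapter does not say how odd-sized groups were split, giving the extra question to the test set is an assumption.

```python
import random
from statistics import mean

def split_half(questions, rng):
    """Randomly split one group of questions into training and test sets of
    equal size (with an odd count, the test set gets the extra question)."""
    shuffled = list(questions)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def summarize_runs(final_populations, accuracy, train, test):
    """Per run, keep the program of the final population with the highest
    training accuracy; return max and mean accuracy over runs on both sets."""
    best = [max(pop, key=lambda p: accuracy(p, train)) for pop in final_populations]
    train_acc = [accuracy(p, train) for p in best]
    test_acc = [accuracy(p, test) for p in best]
    return {"train_max": max(train_acc), "train_mean": mean(train_acc),
            "test_max": max(test_acc), "test_mean": mean(test_acc)}
```

Selecting on the training set only, as here, is why a run's chosen program need not be the one with the best test accuracy.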


R. Santana

Table 27.5 Results of the GP algorithm in terms of the maximum and mean accuracy of the best program. Training and test sets of questions are used to evaluate the accuracy of the best program in each of the 30 runs. The last column corresponds to the algebraic rule d = c − a + b

Group | Training set Max. | Training set Mean | Test set Max. | Test set Mean | Rule
4 | 84.21 | 82.74 | 83.66 | 82.77 | 77.70
5 | 24.67 | 21.62 | 26.46 | 21.31 | 16.16
6 | 49.34 | 41.86 | 50.98 | 41.11 | 24.92
7 | 67.09 | 66.00 | 61.27 | 58.62 | 60.44
8 | 46.83 | 44.96 | 41.11 | 38.72 | 40.40
9 | 48.08 | 45.09 | 44.56 | 39.89 | 36.83
11 | 45.71 | 41.97 | 46.70 | 44.47 | 37.94
12 | 76.16 | 73.13 | 72.38 | 69.41 | 66.50
13 | 49.69 | 40.42 | 38.77 | 32.38 | 34.21

27.6.1 Numerical Results

We evaluate the performance of the GP algorithms by looking at the accuracy of the best GP programs found. The accuracy, for each group of questions, is the proportion of questions correctly answered by a GP program. For each of the 30 runs, we keep all the solutions in the last selected population (100 solutions per run). Among these 100 programs, the one with the highest accuracy on the training set is selected, and its accuracy on the test set is then computed. Using the 30 selected programs, the maximum and mean accuracy are calculated on the training and test sets. Table 27.5 shows these values for the 9 groups of questions, together with the accuracy produced by the algebraic rule. It is important to emphasize that, since no previous method has been proposed for the automatic creation of such rules, we cannot compare against other algorithms. The best GP-evolved programs outperform the algebraic rule on all the groups of questions, although the difference is more noticeable for some groups (e.g., group 6). The mean accuracy of the programs on the test set is also higher than that of the algebraic rule for 6 of the 9 groups of questions. Notice that, since our selection of the best programs was based on training-set accuracy, there might be programs with a higher accuracy on the test set; we did identify some of these programs. Interestingly, for some groups of questions (e.g., group 11), the maximum and mean accuracy on the training set are lower than on the test set.

In a second phase of the experiments, the best program in the last generation of each execution of the GP algorithm was selected based on the sum of its training and test set accuracy for the corresponding group of questions. We evaluated this set of 270 programs using the word2vec GoogleNews vectors. These vectors have a larger dimension (300 vs. 200 in the text8 vector space) and comprise around 3 million words. As a consequence, they contain all the words of the 13 original groups of questions introduced in [9]. Recall that we did not use four of the original groups of questions because the text8 vector space did not include vector representations for all the words in these groups. Using the word2vec GoogleNews vectors, we can test the evolved programs on all the data sets. The same operations encoded in the programs are applied, but this time to the new vector representation. The output vector is then submitted to the model, which determines whether the closest word in the space of word2vec GoogleNews vectors is the right answer to the question.

The results of this evaluation are shown in Table 27.6, where each row corresponds to one of the 13 original groups of questions. Each column j shows the accuracy of the best of the 30 programs evolved with fitness function F0 on group of questions j, and the last column shows the accuracy of the algebraic rule. In each row, the results better than that of the algebraic rule are marked with an asterisk. Notice that we can evaluate the 270 GP programs on all 13 groups of questions independently of the group used to learn them: since all the questions have the same structure, the programs can be applied to any of them. Applying solutions across different problems is related to transfer learning or solution transferability, an area where evolutionary algorithms have shown great potential [3, 6, 18].

Table 27.6 Results of the best programs produced by the GP algorithm on all groups of questions when the set of GoogleNews vectors is used to execute the GP programs. Last column corresponds to the algebraic rule d = c − a + b

Group | 4 | 5 | 6 | 7 | 8 | 9 | 11 | 12 | 13 | Rule
1 | 77.80 | 63.41 | 51.43 | 81.72* | 81.49 | 76.23 | 81.41 | 78.60 | 81.49 | 81.49
2 | 14.68 | 10.87 | 2.08 | 16.18* | 15.95 | 14.68 | 16.07* | 16.07* | 15.95 | 15.95
3 | 65.94 | 41.16 | 33.62 | 72.75 | 72.75 | 69.06 | 68.98 | 70.48 | 74.17* | 72.75
4 | 73.27 | 62.18 | 48.71 | 77.62 | 77.62 | 73.66 | 77.23 | 77.62 | 77.62 | 77.62
5 | 27.75 | 18.77 | 17.66 | 28.86 | 28.86 | 21.39 | 29.87* | 28.86 | 28.86 | 28.86
6 | 34.16* | 27.87 | 28.61 | 32.68* | 32.43 | 34.28* | 33.54* | 32.31 | 32.43 | 32.43
7 | 88.66 | 63.19 | 71.83 | 89.56 | 89.56 | 86.33 | 89.93* | 89.26 | 89.56 | 89.56
8 | 65.21 | 48.26 | 33.27 | 67.53 | 67.53 | 46.65 | 69.05* | 68.06* | 67.53 | 67.53
9 | 70.90 | 60.00 | 61.90 | 73.18 | 73.18 | 61.99 | 71.09 | 72.70 | 74.41* | 73.18
10 | 85.17 | 83.79 | 71.78 | 85.42 | 85.42 | 82.23 | 85.36 | 85.36 | 85.42 | 85.42
11 | 62.92 | 43.30 | 48.69 | 68.44* | 66.20 | 64.27 | 64.46 | 66.13 | 66.20 | 66.20
12 | 76.93 | 74.76 | 74.68 | 78.36 | 78.36 | 75.13 | 78.21 | 78.14 | 78.36 | 78.36
13 | 55.12 | 21.75 | 29.69 | 65.71* | 63.87 | 56.85 | 62.83 | 62.83 | 63.87 | 63.87

There are a number of remarkable facts in the results shown in Table 27.6:


(1) Some of the programs improved the accuracy for groups of questions that were not in the original reduced benchmark of 9 groups. This is the case for group of questions 3.
(2) The best program for group of questions j is not, in general, a program evolved to answer this group of questions.
(3) There are programs evolved for some groups of questions that are good at answering questions from all groups. For example, this happens with programs learned using group of questions 11.

27.6.2 Evaluating Answers and Evolved Programs

One important issue is the interpretability of the evolved programs and how they relate to the human-designed algebraic rule. Out of the 270 programs tested, 8 were equivalent to the algebraic rule. Four of these programs are shown in Fig. 27.2. It can be seen how the same rule is implemented in distinct ways using only the operators add, sub, and neg. These results show that GP, as an algorithm to create word compositions, can automatically learn compositional methods designed by humans. We also analyzed some of the GP programs that outperformed the algebraic rule. An example of this type of program is shown in Fig. 27.3. It was the best program found for group of questions 3. Its accuracy using the word2vec GoogleNews vectors was 74.17, above the 72.75 accuracy of the algebraic rule for the same group of questions. The tree shown in Fig. 27.3 is a slight modification of the algebraic rule. Instead of adding ARG2 to the rule, this program adds 45 ARG2, and this change allows it to increase the accuracy for the group of questions. A trend observed in other evolved programs was that they contained building blocks from the algebraic rule. As in the case of the programs shown in Fig. 27.2, these structural features were not specifically induced; they were acquired as part of the evolutionary process.

[Program tree with internal nodes sub, sub, sub, Half, neg, Half and leaves ARG0, ARG1, ARG2, ARG1]
Fig. 27.3 Program number 247, according to the indices in Table 27.7. It was learned from the group of questions 13 and produced the best accuracy, among the 270 programs selected, for group of questions 3. Its accuracy using the word2vec GoogleNews vectors was 74.17, above the 72.75 accuracy of the algebraic rule for the same group of questions. See Tables 27.6 and 27.7 for details of the program behavior

Table 27.7 Indices of the best programs produced by the GP algorithm on all groups of questions

Group | 4 | 5 | 6 | 7 | 8 | 9 | 11 | 12 | 13
1 | 23 | 52 | 76 | 104 | 129 | 153 | 185 | 235 | 261
2 | 9 | 52 | 76 | 104 | 129 | 153 | 185 | 235 | 261
3 | 15 | 58 | 76 | 120 | 129 | 153 | 198 | 235 | 247
4 | 23 | 58 | 76 | 120 | 129 | 178 | 185 | 235 | 261
5 | 23 | 52 | 76 | 120 | 129 | 178 | 208 | 235 | 261
6 | 23 | 58 | 76 | 111 | 129 | 178 | 196 | 224 | 261
7 | 23 | 58 | 76 | 120 | 129 | 178 | 208 | 235 | 261
8 | 23 | 52 | 76 | 120 | 129 | 178 | 208 | 235 | 261
9 | 23 | 58 | 76 | 104 | 129 | 153 | 195 | 235 | 247
10 | 23 | 58 | 76 | 120 | 129 | 178 | 208 | 224 | 261
11 | 23 | 58 | 76 | 104 | 129 | 178 | 208 | 235 | 261
12 | 23 | 37 | 76 | 120 | 129 | 154 | 208 | 235 | 261
13 | 23 | 58 | 76 | 104 | 129 | 153 | 195 | 235 | 261
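Checking whether an evolved tree is equivalent to the algebraic rule is straightforward when the tree uses only linear operators: every such program reduces to a coefficient vector over (ARG0, ARG1, ARG2), and the rule d = c − a + b is ARG2 − ARG0 + ARG1, i.e. (−1, 1, 1). The sketch below is one possible way to perform this check, not necessarily the procedure used in the chapter; the nested-tuple tree encoding and the assumption that Half(x) = x/2 are ours.

```python
from fractions import Fraction

def coefficients(tree):
    """Coefficients of a linear program on (ARG0, ARG1, ARG2).

    tree is a leaf such as "ARG1" or a nested tuple such as
    ("sub", "ARG2", ("sub", "ARG0", "ARG1")). Only linear operators are
    handled; Half is assumed to mean multiplication by 1/2.
    """
    if isinstance(tree, str):
        coef = [Fraction(0)] * 3
        coef[int(tree[3:])] = Fraction(1)  # "ARGk" -> unit coefficient on k
        return coef
    op, *children = tree
    kids = [coefficients(c) for c in children]
    if op == "add":
        return [a + b for a, b in zip(*kids)]
    if op == "sub":
        return [a - b for a, b in zip(*kids)]
    if op == "neg":
        return [-a for a in kids[0]]
    if op == "Half":
        return [a / 2 for a in kids[0]]
    raise ValueError(f"non-linear or unknown operator: {op}")

# d = c - a + b  ->  ARG2 - ARG0 + ARG1
RULE = [Fraction(-1), Fraction(1), Fraction(1)]

def equivalent_to_rule(tree):
    return coefficients(tree) == RULE
```

For instance, sub(ARG2, sub(ARG0, ARG1)) and add(neg(ARG0), add(ARG1, ARG2)) both reduce to (−1, 1, 1), whereas sub(ARG2, Half(ARG0)) gives (−1/2, 0, 1). Exact fractions avoid spurious mismatches from floating-point rounding.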

27.6.3 Comparison of the Different Fitness Functions

Table 27.8 shows the mean and maximum accuracy computed from the 30 experiments for the three fitness functions. The accuracy has been computed using the test set of questions. The table also shows the accuracy produced by the arithmetic rule. The table reveals that function F0 produces the best results: it outperforms the arithmetic rule on all the instances. The average accuracy is very similar for fitness functions F1 and F2. This shows that the Pearson correlation can be used as a surrogate for the cosine similarity when evolving the programs. We observed that programs equivalent to the algebraic rule were generated during the optimization of functions F1 and F2. However, in terms of cosine similarity and correlation, other GP programs produced better fitness values and, as a consequence, the programs encoding the arithmetic rule were very often not kept in the population. This may explain why, for 5 of the 9 groups of questions, the best programs evolved with F1 and F2 did not reach the accuracy of the arithmetic rule.
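One way to see why F1 and F2 behave similarly: the Pearson correlation of two vectors equals the cosine similarity of their mean-centred versions, so the two measures coincide for zero-mean vectors and can diverge otherwise. The small check below uses toy vectors chosen only for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def center(u):
    m = sum(u) / len(u)
    return [a - m for a in u]

def pearson(u, v):
    """Pearson correlation from the covariance / standard-deviation formula."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)
```

For example, [1, 2] and [2, 1] have cosine similarity 0.8 but Pearson correlation −1, because centring removes their shared positive offset; when the components of each vector average close to zero, the two scores are close.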


Table 27.8 Comparison of the results of the GP algorithm for functions F0, F1, and F2. The test set of questions is used to evaluate the accuracy of the programs. Last column corresponds to the arithmetic rule d = c − a + b

Group | Mean F0 | Mean F1 | Mean F2 | Max F0 | Max F1 | Max F2 | Rule
4 | 83.97 | 75.82 | 78.02 | 84.97 | 83.01 | 83.66 | 81.05
5 | 21.48 | 18.37 | 18.50 | 27.25 | 20.11 | 19.84 | 17.46
6 | 42.29 | 40.28 | 40.28 | 50.98 | 40.52 | 40.52 | 26.80
7 | 59.78 | 54.86 | 55.15 | 61.90 | 56.67 | 57.30 | 58.57
8 | 40.75 | 28.84 | 28.96 | 41.90 | 32.02 | 33.60 | 39.13
9 | 41.28 | 25.51 | 28.72 | 45.56 | 33.47 | 34.68 | 36.09
11 | 45.93 | 30.37 | 28.05 | 47.45 | 35.59 | 32.88 | 39.64
12 | 70.85 | 67.55 | 67.72 | 72.58 | 69.35 | 69.76 | 63.91
13 | 33.78 | 19.06 | 18.28 | 41.54 | 22.77 | 24.31 | 30.46

27.7 Conclusions and Future Work

While semantic spaces and word vector representations are able to capture some of the semantic relationships between the words, compositional methods are necessary to extend their use to multi-word constructions. In this paper we have proposed representing compositional vector operations as simple programs that can be automatically learned from data. We have shown that, using GP, it is possible to encode a set of vector operations as a program, that the programs can be evolved to achieve higher accuracy than the human rules conceived to manipulate the words, and that the programs are valid for datasets other than those from which they have been learned, i.e., they are transferable programs. Furthermore, our results indicate that it is possible to learn programs using vector vocabularies of small to moderate sizes and then apply them to bigger domains where the evaluation of a program is more costly.

Acknowledgements This work has been supported by the TIN2016-78365-R (Spanish Ministry of Economy, Industry and Competitiveness), PID2019-104966GB-I00 (Spanish Ministry of Science and Innovation), the IT-1244-19 (Basque Government) program and project 3KIA (KK-2020/00049) funded by the SPRI-Basque Government through the ELKARTEK program.

References
1. Y. Bengio, R. Ducharme, P. Vincent, C. Jauvin, A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
2. R. Cummins, C. O'Riordan, An analysis of the solution space for genetically programmed term-weighting schemes in information retrieval, in 17th Artificial Intelligence and Cognitive Science Conference (AICS 2006), ed. by P.S.P.M.D. Bell (Queen's University, Belfast, 2006)


3. T.T.H. Dinh, T.H. Chu, Q.U. Nguyen, Transfer learning in genetic programming, in Proceedings of the IEEE Congress on Evolutionary Computation CEC-2015, Sendai, Japan (IEEE Press, 2015), pp. 1145–1151
4. H.J. Escalante, M.A. García-Limón, A. Morales-Reyes, M. Graff, M. Montes-y Gómez, E.F. Morales, J. Martínez-Carranza, Term-weighting learning via genetic programming for text classification. Knowl.-Based Syst. 83, 176–189 (2015)
5. F.-A. Fortin, D. Rainville, M.-A.G. Gardner, M. Parizeau, C. Gagné et al., DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13(1), 2171–2175 (2012)
6. U. Garciarena, R. Santana, A. Mendiburu, Evolved GANs for generating Pareto set approximations, in Proceedings of the 2018 on Genetic and Evolutionary Computation Conference (ACM, 2018), pp. 434–441
7. M. Iyyer, J.L. Boyd-Graber, L.M.B. Claudino, R. Socher, H. Daumé III, A neural network for factoid question answering over paragraphs, in Empirical Methods in Natural Language Processing (EMNLP) (2014), pp. 633–644
8. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (The MIT Press, Cambridge, 1992)
9. T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space (2013). CoRR, arXiv:abs/1301.3781
10. T. Mikolov, Q.V. Le, I. Sutskever, Exploiting similarities among languages for machine translation (2013). CoRR, arXiv:abs/1309.4168
11. T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems (2013), pp. 3111–3119
12. N. Oren, Improving the effectiveness of information retrieval with genetic programming. Master's thesis, Faculty of Science of the University of Witwatersrand, Johannesburg, 2002
13. J. Pennington, R. Socher, C.D. Manning, GloVe: global vectors for word representation, in Empirical Methods in Natural Language Processing (EMNLP), vol. 14 (2014), pp. 1532–1543
14. R. Poli, W.B. Langdon, N.F. McPhee, J.R. Koza, A Field Guide to Genetic Programming (www.Lulu.com, Morrisville, 2008)
15. R. Řehůřek, P. Sojka, Software framework for topic modelling with large corpora, in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, May 2010 (ELRA, 2010), pp. 45–50
16. I. Roman, A. Mendiburu, R. Santana, J.A. Lozano, Evolving Gaussian Process kernels for translation editing effort estimation, in Proceedings of the Learning and Intelligent Optimization Conference (LION) (ACM, Chania, Greece, 2019a), pp. 304–318
17. I. Roman, R. Santana, A. Mendiburu, J.A. Lozano, Sentiment analysis with genetically evolved Gaussian kernels, in Proceedings of the 2019 on Genetic and Evolutionary Computation Conference (ACM, Prague, Czech Republic, 2019b), pp. 1328–1336
18. R. Santana, R. Armañanzas, C. Bielza, P. Larrañaga, Network measures for information extraction in evolutionary algorithms. Int. J. Comput. Intell. Syst. 6(6), 1163–1188 (2013)
19. A. Trotman, Learning to rank. Inf. Retr. 8(3), 359–381 (2005)
20. J. Turian, L. Ratinov, Y. Bengio, Word representations: a simple and general method for semi-supervised learning, in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics, 2010), pp. 384–394

Chapter 28

New Approach for Continuous and Discrete Optimization: Optimization by Morphological Filters

Chahinez Nour El Houda Khelifa and Abderrahim Belmadani

Abstract In this paper, we propose a new metaheuristic algorithm called Optimization by Morphological Filters (OMF), inspired by image processing methods. The OMF algorithm mimics a morphological transformation called functional erosion to search for the global optimum in a multidimensional space. The algorithm was benchmarked on 13 well-known test functions and validated by comparing its results with those of PSO, GSA, and GHS; the results obtained are very convincing. It was also applied to integer programming problems and verified on 6 test problems, where the proposed approach shows very competitive results, which confirms that OMF could have an important place among optimization algorithms.

28.1 Introduction

The term "metaheuristic" combines the Greek prefix "meta" (beyond) with "heuristic", from a Greek verb meaning "to find", "to know", or "to guide an investigation" [1]. A metaheuristic can be defined as a process that finds a good solution in reasonable computational time compared with iterative methods or heuristic algorithms [2], which makes metaheuristics well suited to complex and difficult optimization problems. They are general stochastic optimization approaches with iterative behavior that either start the search from a single point and proceed sequentially (e.g., Simulated Annealing [3]) or start from a random initial population and explore the space in parallel (e.g., swarm intelligence algorithms).

C. N. E. H. Khelifa (B) · A. Belmadani Université des sciences et de la technologie d’Oran Mohamed Boudiaf, USTO-MB, P 1505, EL M’naouer, Oran 31000, Algérie e-mail: [email protected]; [email protected] A. Belmadani e-mail: [email protected]; [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1_28


C. N. E. H. Khelifa and A. Belmadani

Due to their simplicity, robustness, stochastic nature, and ability to explore the search space without being overly sensitive to its dimension, metaheuristics are employed to solve a large variety of real-world problems in which the search space is usually unknown and very complex, such as image processing [4, 5], pattern recognition [6, 7], and economic dispatch [8, 9], and scientists continue to develop new algorithms and to explore other domains. In this paper, we use morphological transformations as a source of inspiration to develop a new stochastic algorithm, namely Optimization by Morphological Filter (OMF), for solving optimization problems. The remainder of this paper is organized as follows: Sect. 28.2 gives a brief overview of existing metaheuristic algorithms, Sect. 28.3 outlines the proposed OMF algorithm, Sect. 28.4 presents and discusses results on benchmark functions from the literature to demonstrate the effectiveness and robustness of OMF, and the last section concludes the paper and presents some perspectives.

28.2 A Brief Overview of Existing Metaheuristic Algorithms

Metaheuristics draw their robustness from nature: they generally imitate the features of natural systems that make those systems competent and effective. Some of them mimic the concept of evolution in nature, such as the Genetic Algorithm [10], which conceptualizes Darwinian evolution, Differential Evolution [11], Evolution Strategies [12], and Evolutionary Programming [13]. Others imitate the social intelligence of creatures; this branch of algorithms was first proposed in 1993 [14] and is generally inspired by the hunting and search behaviors of swarms. Among them are Cuckoo Search [15], Particle Swarm Optimization [16], Ant Colony [17], Bee Colony [18], the Bat-Inspired Algorithm [19], the Biogeography-Based Optimizer [20], and the Grey Wolf Optimizer [21]. There are also physics-based approaches that mimic physical laws, such as the Gravitational Search Algorithm [22], Big Bang–Big Crunch [23], Charged System Search [24], Central Force Optimization [25], the Artificial Chemical Reaction Optimization Algorithm [26], Black Hole [27], and Curved Space Optimization [28]. Finally, some recent algorithms are inspired by other phenomena, such as Harmony Search and its variants (HS, GHS) [29, 30]. Harmony Search is a music-based metaheuristic, inspired by the observation that the aim of music is to search for a perfect state of harmony.

28.3 Optimization by Morphological Filter

Our aim is to establish a new stochastic optimization algorithm inspired by morphological transformations. The OMF algorithm mimics the erosion transformation, which consists of finding the minimum combination of pixel values in the neighborhood

28 New Approach for Continuous and Discrete …


of the structuring element. In the next subsection, a brief overview of functional erosion is given to provide the necessary background, followed by an explanation of OMF.

28.3.1 Inspiration

Mathematical morphology was developed from set theory. It was first introduced by Matheron [31] to analyze the geometric structure of metallic samples, and it was extended to image processing by Serra [32]. It extracts features from the original image by comparing it with a structuring element (a set or function describing a shape, here called a morphological filter). It was initially developed to process binary images and then extended to grayscale (multilevel) images. Functional erosion is one of the fundamental operations of mathematical morphology (the others are dilation, opening, and closing). It is applied to grayscale images and tends to erode them, shrinking the grayscale values of the original image. Functional erosion corresponds to taking the minimum of the pixel values covered by the kernel function (structuring element). It can be defined mathematically as:

$$\varepsilon_B(f)(x) = (f \ominus B^{t})(x) = \inf\{\, f(y) : y \in B_x \,\} \qquad (28.1)$$

This transformation reduces the "peaks" of gray levels and widens the "valleys"; it tends to homogenize and darken the image and to spread the edges of the darkest objects. Figure 28.1 shows the result of erosion by a flat 7 × 7 structuring element, where the properties discussed above can be clearly seen; Fig. 28.2 compares a classical structuring element with the OMF filter.

Fig. 28.1 Results of Erosion with a flat structuring element (panels: Initial image; Eroded image)

Fig. 28.2 Structuring element versus OMF filter (panels: Structuring element in image processing; OMF filter)

As previously said, functional erosion takes the original image and a structuring element as input to extract features from the analyzed image. The results obtained depend strongly on the characteristics of the structuring element. For this reason, OMF replaces the traditional structuring element by a star whose center (C1: diamond) is the current solution and whose points N1, N2, N3, and N4 are randomly generated neighbors (the morphological filter). As mentioned above, OMF is a stochastic approach in which randomness must be present. Randomly generated variables lie in [0, R] so as to form the search space (a hypercube with side length R). This is ensured by the normalization process [33] defined in Eq. (28.2):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \cdot R \qquad (28.2)$$

where x represents the randomly generated variables, and x_min and x_max are user-fixed parameters (the boundaries of the random generation interval). The search process starts by launching many filters operating in parallel. A filter applied to the objective function explores the neighborhood of its center and, if one exists, returns the neighbor with the best fitness, which becomes the new center of the filter. Otherwise, the size of the filter is reduced in order to examine a closer neighborhood. This procedure is repeated on all filters until the neighborhood search space is exhausted (size of all filters < ε). The OMF steps are summarized in Algorithm 39:

Algorithm 39 Optimization by Morphological Filter Algorithm
Require: Min, Max, Filter size, NB-FILTER, search space dimension
Ensure: global optimum
repeat
  for all i ∈ NB-FILTER do
    Neighborhood calculation Procedure()
    Movement Procedure()
  end for
until the size of all filters < ε
return the best solution found
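Functional erosion with a flat structuring element, the operation of Eq. (28.1) that inspires OMF, can be sketched in a few lines of Python for an image stored as a list of lists. This is an illustration only: the structuring element is a (2r+1) × (2r+1) square (r = 3 gives the 7 × 7 element of Fig. 28.1), and truncating the window at the image borders is our assumption, since border handling is not discussed.

```python
def erode(image, radius):
    """Flat grayscale erosion (Eq. 28.1): each output pixel is the minimum of
    the input over a (2*radius+1) x (2*radius+1) square window centred on it.
    Windows are truncated at the image borders (an assumption)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            min(
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

An isolated bright pixel (a "peak") disappears after erosion, while a single dark pixel (a "valley") spreads to its whole neighborhood, which are the two properties described above.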

In Algorithm 39, the Neighborhood calculation() and Movement() procedures detail the neighborhood computation and the movement applied to each filter; they are given in Algorithms 40 and 41. The neighborhood calculation procedure ensures the two fundamental characteristics of any metaheuristic algorithm: intensification (exploitation) and diversification (exploration).

Algorithm 40 Neighborhood calculation Procedure()
for all neighbors of all filter centers do
  choose randomly: X_{j,f} = X_{j,f} + a · Filter size, or X_{j,f} = random · R
end for

The jth coordinate of each neighbor is calculated by taking one of the following options: do not change position, shift left, or shift right, which guarantees a search around the current solution (intensification property). These three options represent 75% of the cases and are realized with Eq. (28.3):

$$X_{j,f} = X_{j,f} + a \cdot \text{Filter size} \qquad (28.3)$$

where Filter size denotes the size of the fth filter and a is a parameter chosen randomly from the set {1, −1, 0}. The remaining 25% of the cases are used to diversify the search and to explore unvisited areas; in this case the jth coordinate is calculated with Eq. (28.4):

$$X_{j,f} = \text{random} \cdot R \qquad (28.4)$$

Algorithm 41 Movement Procedure()
for all filters do
  if any neighbor is better than the filter center then
    move to the best one
  else
    reduce the size of the filter
  end if
end for

The displacement of the filter center is ensured by the Movement procedure() given in Algorithm 41. For each filter, the neighbor with the best fitness becomes the new center of the filter if it is better than the current center. Otherwise, the size of this filter is reduced using Eq. (28.5). Note that the initial size of all the filters can be taken equal to R.

$$\text{Filter size} = \frac{R}{C^{k}} \qquad (28.5)$$

where R is the range of the search space, k denotes the number of reductions of the size of the current filter, and C is a constant, fixed in our case at 1.001.
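Putting Algorithms 39 to 41 and Eqs. (28.2) to (28.5) together, a minimal Python sketch of OMF for minimization might look as follows. It is a sketch under stated assumptions, not the authors' code: `omf` and its parameter names are hypothetical, each coordinate picks one of the four options (stay, left, right, random) with probability 25%, `random` is taken as uniform in [0, 1), and neighbors leaving [0, R] are clipped back (the boundary coordinates ±2 in Table 28.1 suggest clipping, but the text does not state it).

```python
import random

def omf(objective, dim, lower, upper, n_filters=30, n_neighbors=10,
        R=10.0, c=1.001, eps=1e-6, seed=None):
    """Minimal sketch of Optimization by Morphological Filters (minimization).

    Filter centres live in the normalised hypercube [0, R]^dim and are mapped
    back to [lower, upper]^dim (inverse of Eq. 28.2) before calling objective.
    """
    rng = random.Random(seed)

    def to_real(z):
        return [lower + zj / R * (upper - lower) for zj in z]

    def fit(z):
        return objective(to_real(z))

    centers = [[rng.random() * R for _ in range(dim)] for _ in range(n_filters)]
    values = [fit(z) for z in centers]
    k = [0] * n_filters        # number of size reductions of each filter
    sizes = [R] * n_filters    # initial filter size taken equal to R

    while max(sizes) >= eps:   # stop when every filter is smaller than eps
        for f in range(n_filters):
            # Neighborhood calculation (Algorithm 40)
            best_nb, best_val = None, values[f]
            for _ in range(n_neighbors):
                nb = []
                for j in range(dim):
                    option = rng.randrange(4)  # stay, left, right, random: 25% each
                    if option < 3:
                        a = (0, -1, 1)[option]
                        x = centers[f][j] + a * sizes[f]   # Eq. (28.3)
                    else:
                        x = rng.random() * R               # Eq. (28.4)
                    nb.append(min(max(x, 0.0), R))         # assumed clipping
                v = fit(nb)
                if v < best_val:
                    best_nb, best_val = nb, v
            # Movement (Algorithm 41)
            if best_nb is not None:
                centers[f], values[f] = best_nb, best_val
            else:
                k[f] += 1
                sizes[f] = R / c ** k[f]                  # Eq. (28.5)

    best = min(range(n_filters), key=lambda i: values[i])
    return to_real(centers[best]), values[best]
```

The constant C controls run length: with C = 1.001, shrinking a filter from 10 to 10^−5 takes k = ln(10^6)/ln(1.001), i.e. about 13,800 reductions, which plausibly fits the 18,889 iterations of the Rosenbrock example in Fig. 28.6, since iterations in which a filter moves do not shrink it. A larger C such as 1.01 shortens quick experiments at the cost of a coarser search.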


Fig. 28.3 Example: Iteration 1

Fig. 28.4 Example: Iteration 2

Figures 28.3, 28.4, and 28.5 illustrate the neighborhood calculation and movement procedures with the fitness function f(x, y) = x² + y². In iteration 1, the center C1 has fitness 2, and the neighbor coordinates are calculated with the neighborhood calculation procedure() (Algorithm 40): N1(−5, 0), N2(0, 1.25), N3(5, 2.25), and N4(0, −5), with f(N1) = 25, f(N2) = 1.56, f(N3) = 6.25, and f(N4) = 25. The movement procedure(), shown in Algorithm 41, compares the fitness of the current solution (C1) with those of its neighbors and performs either a movement or a reduction of the parameter Filter size. Since f(N2) in iteration 1 is better than f(C1), N2 becomes the new center of the filter in iteration 2. Unlike iteration 1 (Fig. 28.3), no neighbor gives a better solution in iteration 2 (Fig. 28.4), so the algorithm reduces the size of the filter (Fig. 28.5). Note that the neighbor coordinates are always generated by the neighborhood calculation procedure.

Fig. 28.5 Example: Iteration 3

To further explain our approach, we show the behavior of OMF when the objective function is the two-dimensional Rosenbrock function in the range [−2, 2]. For this example, we apply OMF using three filters with four neighbors and an initial filter size = 10. The stopping criterion is ε = 10^−5. The centers of the filters (solutions) C1, C2, and C3 are represented by small black rhombuses and the neighbors by circles. The four neighbors of the fth solution are Nf.1 to Nf.4, where f is the filter number. The panels Iteration 1 and Movement show the solutions connected to their neighbors by straight lines; the other panels show only the positions of the new solutions (centers of the filters) at the current iteration. The global optimum is at (x1 = 1, x2 = 1), where the function returns 0. Figure 28.6 (a) shows the results of the first iteration. Here, the positions of the filter centers are generated randomly and the neighbors are calculated with the Neighborhood calculation() procedure (see Table 28.1). For example, the first coordinate of N1.1 is calculated by Eq. (28.3) with a = 0 (randomly chosen), while the first coordinate of N2.4 in iteration 2 is calculated with Eq. (28.4). The movement procedure, shown in Algorithm 41, compares the fitness of each solution with those of its neighbors and performs either a movement or a reduction of the parameter Filter size. Here, N1.4 becomes the new C1, N2.3 the new C2, N3.2 the new C3, and N4.4 the new C4. In this example, all the centers move at this iteration; in later iterations, if no neighbor performs better than a center, that center remains unchanged and the parameter Filter size is reduced using Eq. (28.5). The results after 25, 50, 75, and 100% of the iterations are shown in Fig. 28.6 (c)–(f), respectively. The coordinates of the filter centers in the last iteration are given in Table 28.2; all filter centers are close to the Rosenbrock optimum (1, 1), and their movement toward it is shown in Fig. 28.6.

Fig. 28.6 OMF behavior using Rosenbrock function (panels: (a) Iteration 1; (b) Movement; (c) Iteration 4722 (25%); (d) Iteration 9445 (50%); (e) Iteration 14167 (75%); (f) Iteration 18889 (100%))

Table 28.1 Iteration 1

Center | Coordinate | Fitness
C1 | (−2, −2) | 90.04482
C2 | (−0.10835, 0.95416) | 1229.06620
C3 | (1.96606, 0.36093) | 259.53315
C4 | (−1.59565, 0.95614) | 229.822

Neighbor | Coordinate | Fitness
N1.1 | (−0.10835, 1.52559) | 230.40353
N1.2 | (−0.67978, 1.52559) | 115.92217
N1.3 | (0.46307, 0.95416) | 55.00818
N1.4 | (−0.67978, 0.95416) | 27.03384
N2.1 | (1.96606, 0.932362) | 861.20809
N2.2 | (2, 0.932362) | 942.04000
N2.3 | (1.39463, 0.36093) | 251.08458
N2.4 | (1.96606, 0.93236) | 861.20809
N3.1 | (−1.59565, 0.38471) | 473.89537
N3.2 | (−1.02422, 1.52757) | 26.99790
N3.3 | (−2, 0.38471) | 1316.02620
N3.4 | (−1.59565, 0.38471) | 473.89537
N4.1 | (0.37332, 1.55049) | 199.51868
N4.2 | (0.37332, 2) | 346.58657
N4.3 | (0.37332, 1.55049) | 199.51868
N4.4 | (−0.19810, 0.97906) | 89.76120

Table 28.2 Final iteration

Center | Coordinate | Fitness
C1 | (0.999999429112051, 0.999998278876708) | 4.07653E−10
C2 | (1.00000200647046, 1.00000602199272) | 1.58660E−10
C3 | (0.999998892129236, 0.999999038982221) | 1.10218E−10
C4 | (0.999998988222136, 0.999996931480444) | 3.38902E−11

28.4 Results and Discussion

28.4.1 Real Programming Problems

As with any new optimization algorithm, it is necessary to test its credibility on a set of benchmark functions. In this section, 13 classical test functions used in the literature [16, 21, 22, 30] are used to validate OMF. They are minimization problems with various characteristics, such as modality, separability, and dimension of the space. This allows us to evaluate OMF and to identify the kinds of problems on which it performs better than other algorithms. The benchmark functions are listed in Table 28.3, where Fct gives the names of the functions, D the dimension of the function, Range Space (RS) the boundaries of the function's search space, and Fmin the theoretical global optimum. The OMF algorithm was run 30 times on each benchmark function with NB-filters = 30 filters, 10 neighbors, and a minimum filter size as the stopping criterion (size of each filter less than ε = 10^−24). The values of the OMF parameters were fixed empirically: we chose the values that gave the best ratio of running time to optimum quality. The statistical results (average and standard deviation) are reported in Table 28.4. Among the algorithms cited in Sect. 28.2, we chose those that have been successfully applied to a wide range of continuous optimization problems. OMF results are compared with those of PSO [16] as a swarm-intelligence (SI) technique and GSA [22] as a physics-based algorithm, in addition to the results of GWO [21] and GHS [30] as recent and robust algorithms. Our focus is to test the ability of OMF to find an appropriate balance between exploitation and exploration. The unimodal functions (the first five functions) are used to test exploitation, in contrast to the multimodal functions (the next five functions), where the number of local minima rises exponentially with the dimension of the space, which makes them suitable for testing whether OMF can escape from local optima (exploration). Table 28.4 shows that OMF provides very competitive results on both unimodal and multimodal functions. It outperforms all the others on the Schwefel, Griewangk, and Michalewicz functions. For the fixed-dimension functions, OMF clearly outperforms all the others on every function (3/3). Table 28.4 also shows that OMF is able to return

434

C. N. E. H. Khelifa and A. Belmadani

Table 28.3 Benchmarck functions Fct Sphere Schwefel problem

Formulation 100 2 f (x) = i=1 xi N N f (x) = i=1 |x| + i=1 |x| N

2 j=1 x j )

f (x) =

Rosenbrock

f (x) =  100  2 2 2 i=1 100 ∗ (xi − xi−1 ) + (xi−1 − 1) 100 f (x) = i=1 (xi + 0.5)2  100 f (x) = i=1 (xi sin( |xi |))

Step Schwefel function Rastring Ackley

Griewangk Michalewicz Six-Hump Camel Braninrcos Golden-Price

i=1 (

i

Rotated hyper ellipsoid

D

RS

Fmin

30

[−100, 100]

0

30

[−10, 10]

0

30

[−100, 100]

0

30

[−30, 30]

0

30

[−100, 100]

0

30

[−500, 500]

−12569.5

30

[−5.12, 5.12]

0

30

[−32, 32]

0

30

[−600, 600]

0

30

[0, π ]

−4.687

30

[−5, 5]

−1.0316

f (x) = (x2 − 5.12 x12 + π5 x1 − 6)2 + 10(1 −

30

[−100, 100]

0.398

f (x) = (1 + (x1 + x2 + 1)2 ∗ (19 − 14x1 + 3x12 − 14x2 + 6x1 x2 + 3x22 )) ∗ (30 + (2x1 − 3x2 )2 ∗ (18 − 32x1 + 12x12 + 48x2 − 36x1 x2 + 27x22 ))

30

[−2, 2]

3

 100  2 ) + 10 i=1 x i − 10 cos(2π xi   100 2 1 f (x) = −20ex p i=1 xi − 30

1 cos 2π x + 20 + e1 ex p 30 i xi 1  N x 2 −  N cos √ f (x) = 4000 +1 i=1 i i=1 f (x) =

i (i∗xi2 )20 f (x) = − i=1 sin xi ∗ sin π f (x) = 4x 2 − 1.2x14 + 13 x16 + x1 x2 + 4x22 + 4x24

100

4π 1 8π ) cos x 1 + 10

exactly the global optimum of many functions (Six-Hump Camel back, Braninrcos, Golden-Price and Griewangk). By combining the results of all benchmark functions, we can conclude that OMF has a good balance between intensification and diversification. This capability is assured by the mechanism discussed above. It will be preferable to do a static comparative study, but the computational time of cited algorithms was not mentioned. Generally OMF do not exceed 3 min to find the optimum benchmark functions (between 7.098 s for Golden price function, 64.866 s for Michalewicz and 172.35 s for Ackley function).
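For readers who want to reproduce the comparison, several functions of Table 28.3 translate directly into Python. The sketch below uses only the standard library; the function names are illustrative, and the Ackley constant 0.2 follows the usual definition of that function. Each function can be checked at its known minimizer.

```python
import math
from typing import Sequence


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def schwefel_problem(x: Sequence[float]) -> float:
    # Sum plus product of absolute values ("Schwefel problem" row)
    return sum(abs(v) for v in x) + math.prod(abs(v) for v in x)


def rosenbrock(x: Sequence[float]) -> float:
    return sum(100.0 * (x[i] - x[i - 1] ** 2) ** 2 + (x[i - 1] - 1.0) ** 2
               for i in range(1, len(x)))


def rastrigin(x: Sequence[float]) -> float:
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


def ackley(x: Sequence[float]) -> float:
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e


def griewank(x: Sequence[float]) -> float:
    total = sum(v * v for v in x) / 4000.0
    product = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return total - product + 1.0


def six_hump_camel(x: Sequence[float]) -> float:
    x1, x2 = x
    return 4 * x1**2 - 2.1 * x1**4 + x1**6 / 3 + x1 * x2 - 4 * x2**2 + 4 * x2**4


if __name__ == "__main__":
    zeros = [0.0] * 30
    for fn in (sphere, schwefel_problem, rastrigin, ackley, griewank):
        print(f"{fn.__name__:>16}(0) = {fn(zeros):.3e}")
    print(f"{'rosenbrock':>16}(1) = {rosenbrock([1.0] * 30):.3e}")
    print(f"{'six_hump_camel':>16}    = {six_hump_camel((0.0898, -0.7126)):.4f}")
```

The same `rosenbrock` definition reproduces the fitness values listed for the neighbors in Table 28.1, which is a quick way to confirm that an implementation matches the formulation used in the chapter.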

28.4.1.1 Scalability Study

When the dimension of the functions increases from 30 to 100, the performance of all the methods degrades. As shown in Table 28.5, the comparison of OMF with HS and its variants (IHS and GHS) shows that OMF performs better in 4 of the 6 cases.

Table 28.4 Results of benchmark functions: average (standard deviation)

| Fct | GWO [21] | PSO [16] | GSA [22] | GHS [30] | OMF |
|---|---|---|---|---|---|
| Sphere | 6.59E−28 (6.34E−5) | 0.000136 (0.000202) | 2.53E−16 (9.6E−17) | 1.0E−5 (2.2E−5) | 3.32E−33 (2.7E−48) |
| Schwefel problem | 7.18E−17 (0.029014) | 0.042144 (0.04542) | 0.055655 (0.194074) | 0.0728 (0.1144) | 0.02058 (0) |
| Rotated hyper ellipsoid | 3.29E−6 (79.14958) | 70.12562 (22.11924) | 896.5347 (318.9559) | 5146.2 (6348.7) | 0.050921 (7.05E−17) |
| Rosenbrock | 26.81258 (69.90499) | 96.71832 (60.11559) | 67.54309 (62.22534) | 49.669 (6348.8) | 0.145492 (5.64E−17) |
| Step | 0.816579 (0.000126) | 0.000102 (8.28E−5) | 2.5E−16 (1.74E−16) | 0 (0) | 3.755E−33 (4.701E−33) |
| Six-Hump Camel back | −1.0316 (−1.03163) | −1.03163 (6.25E−16) | −1.03163 (4.88E−16) | −1.0316 (0.000018) | −1.0316 (0) |
| Branin RCOS | 0.397889 (0.397887) | 0.397887 (0) | 0.397887 (0) | *** | 0.398 (0) |
| Goldstein-Price | 3.000028 (3) | 3 (1.33E−15) | 3 (4.17E−15) | *** | 3 (0) |
| Schwefel function | −6123.1 (−4087.44) | −4841.29 (1152.814) | −2821.07 (493.0375) | −12569.46 (0.050) | −12569.5 (1.85E−12) |
| Rastrigin | 0.310521 (47.35612) | 46.70423 (11.62938) | 25.96841 (7.470068) | 0.0086 (29) | 0.031064 (5.646E−17) |
| Ackley | 1.06E−13 (0.077835) | 0.276015 (0.50901) | 0.062087 (0.23628) | 0.0209 (0.0216) | 2.378E−7 (1.129E−16) |
| Griewangk | 0.004485 (0.006659) | 0.009215 (0.007724) | 27.70154 (5.040343) | 0.1024 (0.1756) | 0.00021 (0) |
| Michalewicz | 4.042493 (4.252799) | 3.627168 (2.560828) | 5.859838 (3.831299) | *** | −4.687 (1.80E−15) |

Table 28.5 Results of benchmark functions, DIM = 100: average (standard deviation)

| Fct | HS [29] | IHS [34] | GHS [30] | OMF |
|---|---|---|---|---|
| Sphere | 8.683062 (0.775134) | 8.840449 (0.762496) | 2.230721 (0.565271) | 1.79E−13 (1.28375E−28) |
| Schwefel problem | 82.926284 (6.717904) | 82.548978 (6.341707) | 19.020813 (5.093733) | 43.136096 (1.445E−14) |
| Rotated hyper ellipsoid | 215052.904398 (28276.375538) | 213812.584732 (28305.249583) | 321780.353575 (39589.041160) | 1.49E+05 (2.96014E−11) |
| Rastrigin | 343.49779 (27.24538) | 343.23204 (25.14946) | 80.657677 (30.368471) | 192.0721007 (1.1563E−13) |
| Ackley | 13.857189 (0.284945) | 13.801383 (0.530388) | 8.767846 (0.880066) | 8.75 (3.61345E−15) |
| Griewangk | 195.59257 (24.80835) | 204.29151 (19.15717) | 54.252289 (18.600195) | 32.04980615 (2.89076E−14) |

28.4.1.2 Effect of Stop Criterion

The stop criterion is the smallest allowed value of the Filter size parameter, denoted here by ε. Table 28.6 shows the effect of this parameter on the results of the OMF algorithm. The algorithm is applied to the benchmark functions of Table 28.3 with four different values of ε, from the largest, ε = 10⁻⁹, to the smallest, ε = 10⁻²⁴. OMF gives its best results at the highest precision. This can be explained by the fact that a small ε makes it possible to sweep regions that were not visited before. However, some of the optima found do not change significantly, which shows that the OMF optimization process is stable.
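Why a smaller ε implies a longer and finer search can be made concrete with a little arithmetic. Eq. 28.5 is not reproduced in this section, so the sketch below assumes, purely for illustration, that each unsuccessful iteration halves the filter size starting from 1; `reductions_until` is a hypothetical helper that counts how many reductions a single filter needs before its size drops below ε.

```python
def reductions_until(eps: float, size0: float = 1.0, factor: float = 0.5) -> int:
    """Count geometric reductions size <- size * factor until size < eps.
    ASSUMPTION: a constant reduction factor stands in for Eq. 28.5."""
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must lie strictly between 0 and 1")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    size, steps = size0, 0
    while size >= eps:
        size *= factor
        steps += 1
    return steps


if __name__ == "__main__":
    for eps in (1e-9, 1e-12, 1e-16, 1e-24):
        print(f"eps = {eps:.0e}: {reductions_until(eps)} reductions")
```

Under these assumptions the four values of ε used in Table 28.6 require 30, 40, 54 and 80 reductions per filter, so going from ε = 10⁻⁹ to ε = 10⁻²⁴ multiplies the number of shrink steps by roughly 2.7, on top of any movements made between them.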

28.4.2 Integer Programming Problems

Many real-world applications require the variables to be integers; such problems are called integer programming problems. In general, a continuous optimizer can easily be adapted to integer programming by rounding the real-valued optimum to the nearest integers. Here, we choose six common integer programming benchmark problems (see Table 28.7) to investigate the performance of OMF. The OMF algorithm was applied to these test problems, and its results, shown in Table 28.8, are compared with those of GHS [30]. The two approaches perform comparably: both find the global optimum of all the benchmark problems except F4. However, with a higher precision (a smaller ε), OMF finds the global optimum of F4 as well.
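The rounding strategy mentioned above can be illustrated on problems F4 and F5 of Table 28.7. This is an illustrative sketch, not the procedure used to produce Table 28.8: the hypothetical helper `rounded` wraps an objective so that it is evaluated at the nearest integer point, and `brute_force_min` enumerates a small integer box, which is only practical for two-variable problems like these. F5 is written in its commonly used form, including the squared terms.

```python
import itertools
from typing import Callable, Tuple


def f4(x1: float, x2: float) -> float:
    """Integer benchmark F4: 2*x1^2 + 3*x2^2 + 4*x1*x2 - 6*x1 - 3*x2."""
    return 2 * x1**2 + 3 * x2**2 + 4 * x1 * x2 - 6 * x1 - 3 * x2


def f5(x1: float, x2: float) -> float:
    """Integer benchmark F5 in its commonly used form (minimum -3833.12 at (0, 1))."""
    return (-3803.84 - 138.08 * x1 - 232.92 * x2
            + 123.08 * x1**2 + 203.64 * x2**2 + 182.25 * x1 * x2)


def rounded(f: Callable[[float, float], float]) -> Callable[[float, float], float]:
    """Wrap a continuous objective so it is evaluated at the nearest integer point.
    Note: Python's round() sends exact halves to the nearest even integer."""
    def wrapper(x1: float, x2: float) -> float:
        return f(round(x1), round(x2))
    return wrapper


def brute_force_min(f: Callable[[int, int], float], lo: int = -10,
                    hi: int = 10) -> Tuple[Tuple[int, int], float]:
    """Exhaustively search the integer box [lo, hi]^2 (feasible only for tiny problems)."""
    best = min(itertools.product(range(lo, hi + 1), repeat=2), key=lambda p: f(*p))
    return best, f(*best)


if __name__ == "__main__":
    # The continuous minimizer of F4 is (3, -1.5), with value -6.75.
    print(f4(3, -1.5))            # -6.75
    print(rounded(f4)(3, -1.5))   # -6, evaluated at (3, -2)
    print(brute_force_min(f4))    # ((2, -1), -6)
    print(brute_force_min(f5))    # best point (0, 1), value about -3833.12
```

Here rounding the continuous optimum of F4 happens to land on an integer optimum (several integer points reach −6), but rounding is not guaranteed to do so in general, which is one reason integer test problems are worth checking separately.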

Table 28.6 The effect of stop criterion: average (standard deviation)

| Fct | ε = 10⁻⁹ | ε = 10⁻¹² | ε = 10⁻¹⁶ | ε = 10⁻²⁴ |
|---|---|---|---|---|
| Sphere | 2.26E−16 (1.5E−31) | 1.07E−22 (7.1E−38) | 4.70E−31 (2.2E−5) | 3.32E−33 (2.7E−48) |
| Schwefel problem | 0.09435 (0) | 0.090409 (2.8E−17) | 0.046469 (1.4E−17) | 0.02058 (0) |
| Rotated hyper ellipsoid | 1.22E−01 (7.0E−17) | 0.114811 (4.2E−17) | 0.114811 (2.8E−17) | 0.050921 (7.05E−17) |
| Rosenbrock | 0.145492 (5.64E−17) | 0.145492 (2.82144E−17) | 0.145424 (0) | 0.145492 (5.64E−17) |
| Step | 1.15E−17 (4.701E−33) | 5.72E−23 (1.19493E−38) | 5.36E−31 (2.67235E−46) | 3.755E−33 (4.701E−33) |
| Six-Hump Camel back | −1.0316 (0) | −1.03164 (0) | −1.03162 (0) | −1.0316 (0) |
| Branin RCOS | 0.397887 (0) | 0.397887 (0) | 0.397887 (0) | 0.398 (0) |
| Goldstein-Price | 3 (0) | 3 (0) | 3 (0) | 3 (0) |
| Schwefel function | −12569.4 (1.8E−12) | −12569.4 (1.8E−12) | −12569.4 (1.8E−12) | −12569.5 (1.85E−12) |
| Rastrigin | 0.100070 (5.6E−17) | 0.098044 (2.8E−17) | 0.096518 (1.4E−17) | 0.031064 (5.646E−17) |
| Ackley | 4.25E−01 (1.1E−16) | 0.192977 (5.6E−17) | 0.102348 (4.2E−17) | 2.378E−7 (1.129E−16) |
| Griewangk | 0.00021 (0) | 0.000219 (1.1E−19) | 0.000219 (1.6E−19) | 0.00021 (0) |
| Michalewicz | −4.68765 (1.8E−15) | −4.68765 (1.8E−15) | −4.68765 (1.8E−15) | −4.687 (1.8E−15) |

Table 28.7 Integer problem benchmark

| No | Formulation | D | RS | Fmin |
|---|---|---|---|---|
| 1 | $f(x)=\sum_{i=1}^{N}\lvert x_i\rvert$ | 5, 15, 30 | [−100, 100] | 0 |
| 2 | $f(x)=(9x_1^2+2x_2^2-11)^2+(3x_1^2+4x_2^2-7)^2$ | 2 | [−100, 100] | 0 |
| 3 | $f(x)=(x_1+10x_2)^2+5(x_3+x_4)^2+(x_2+2x_3)^4+10(x_1+x_4)^4$ | 4 | [−100, 100] | 0 |
| 4 | $f(x)=2x_1^2+3x_2^2+4x_1x_2-6x_1-3x_2$ | 2 | [−100, 100] | −6 |
| 5 | $f(x)=-3803.84-138.08x_1-232.92x_2+123.08x_1^2+203.64x_2^2+182.25x_1x_2$ | 2 | [−100, 100] | −3833.12 |
| 6 | $f(x)=x^{T}x$ | 2 | [−100, 100] | 0 |

Table 28.8 Results of integer programming problems: average (standard deviation)

| Fct | GHS | OMF |
|---|---|---|
| F1 (D = 5) | 0 (0) | 0 (0) |
| F1 (D = 15) | 0 (0) | 0 (0) |
| F1 (D = 30) | 0 (0) | 0 (0) |
| F2 | 0 (0) | 0 (0) |
| F3 | 0 (0) | 0 (0) |
| F4 | −5 (1) | −6 (0) |
| F5 | −3833.12 (0) | −3833.12 (0) |
| F6 | 0 (0) | 0 (0) |

28.5 Conclusion

This paper introduces a novel optimization algorithm inspired by image processing tools. OMF mimics functional erosion to search for the global optimum in a multidimensional space. We used 13 benchmark functions to test the OMF algorithm in terms of exploitation, exploration, scalability and the effect of the filter size, and compared its performance with that of well-known heuristics such as PSO, GSA, GHS and GWO. The results on the unimodal functions confirm the potential of the proposed approach in terms of exploitation. The multimodal functions were then used to test exploration, and the results show that OMF is able to explore the search space extensively. The scalability of the proposed approach is confirmed by increasing the dimension of the search space from 30 to 100. The effect of the filter size, an important parameter of the algorithm, is also shown. Finally, OMF was applied to six integer programming test problems, with very competitive results. We do not claim to offer a universal solution to optimization problems, since no such solution exists; rather, we have developed a new optimization algorithm based on a new approach. We are currently testing the OMF algorithm on constrained problems and engineering problems, and the results will be reported in the near future.

References

1. A. Lazar, R.G. Reynolds, Heuristic knowledge discovery for archaeological data using genetic algorithms and rough sets, in Artificial Intelligence Laboratory, Department of Computer Science, Wayne State University (2003)
2. S.J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, Upper Saddle River, 1995)
3. S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimization by simulated annealing. Science 220, 671–680 (1983)
4. O. Cordón, S. Damas, J. Santamaría, A fast and accurate approach for 3D image registration using the scatter search evolutionary algorithm. Pattern Recognit. Lett. 27, 1191–1200 (2006)
5. H. Nezamabadi-pour, S. Saryazdi, E. Rashedi, Edge detection using ant algorithms. Soft Comput. 10, 623–628 (2006)
6. Y. Liu, Z. Yi, H. Wu, M. Ye, K. Chen, A tabu search approach for the minimum sum-of-squares clustering problem. Inform. Sci. 178, 2680–2704 (2008)
7. X. Tan, B. Bhanu, Fingerprint matching by genetic algorithms. Pattern Recognit. 39, 465–477 (2006)
8. M. Pradhan, P.K. Roy, T. Pal, Grey wolf optimization applied to economic load dispatch problems. Int. J. Electr. Power Energy Syst., 325–334 (2016)
9. D. Singh, J.S. Dhillon, Ameliorated grey wolf optimization for economic load dispatch problem. Energy (2018)
10. J.H. Holland, Genetic algorithms and the optimal allocation of trials. SIAM J. Comput. 2(2), 88–105 (1973)
11. R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11, 341–359 (1997)
12. I. Rechenberg, Evolution strategy. Comput. Intell. Imitat. Life 1 (1994)
13. X. Yao, Y. Liu, G. Lin, Evolutionary programming made faster. IEEE Trans. Evol. Comput. 3, 82–102 (1999)
14. G. Beni, J. Wang, Swarm intelligence in cellular robotic systems, in Robots and Biological Systems: Towards a New Bionics? (Springer, Berlin, 1993), pp. 703–712
15. X.S. Yang, S. Deb, Cuckoo search via Lévy flights, in Proceedings of the World Congress on Nature and Biologically Inspired Computing (IEEE Publications, 2009), pp. 210–214
16. J. Kennedy, R. Eberhart, Particle swarm optimization, in Proceedings of the IEEE International Conference on Neural Networks (1995), pp. 1942–1948
17. A. Colorni, M. Dorigo, V. Maniezzo, Distributed optimization by ant colonies, in Proceedings of the 1st European Conference on Artificial Life, pp. 134–142
18. D. Karaboga, An idea based on honey bee swarm for numerical optimization. Technical Report TR06, Erciyes University, Engineering Faculty, Computer Engineering Department
19. X.S. Yang, A new metaheuristic bat-inspired algorithm, in Nature Inspired Cooperative Strategies for Optimization (Springer, Berlin, 2010), pp. 65–74
20. D. Simon, Biogeography-based optimization. IEEE Trans. Evol. Comput. 12, 702–713 (2008)
21. S. Mirjalili, S.M. Mirjalili, A. Lewis, Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014)
22. E. Rashedi, H. Nezamabadi-pour, S. Saryazdi, GSA: a gravitational search algorithm. Inform. Sci. 179, 2232–2248 (2009)
23. O.K. Erol, I. Eksin, A new optimization method: big bang–big crunch. Adv. Eng. Softw. 37, 106–111 (2006)
24. A. Kaveh, S. Talatahari, A novel heuristic optimization method: charged system search. Acta Mech. 213, 267–289 (2010)
25. R.A. Formato, Central force optimization: a new metaheuristic with applications in applied electromagnetics. Prog. Electromagn. Res. 77, 425–491 (2007)
26. B. Alatas, Artificial chemical reaction optimization algorithm for global optimization. Expert Syst. Appl. 38, 13170–13180 (2011)
27. A. Hatamlou, Black hole: a new heuristic optimization approach for data clustering. Inform. Sci. (2012)
28. F.F. Moghaddam, R.F. Moghaddam, M. Cheriet, Curved space optimization: a random search based on general relativity theory. arXiv:1208.2214 (2012)
29. Z.W. Geem, J.H. Kim, G.V. Loganathan, A new heuristic optimization algorithm: harmony search. Simulation 76(2), 60–68 (2001)
30. M.G.H. Omran, M. Mahdavi, Global-best harmony search. Appl. Math. Comput. (2007)
31. G. Matheron, Éléments pour une théorie des milieux poreux (Masson, France, 1967)
32. J. Serra, Image Analysis and Mathematical Morphology (Academic Press, New York, 1982)
33. J.G. Postaire, C.P.A. Vasseur, An approximate solution to normal mixture identification with application to unsupervised pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 3(2), 163–179 (1981)
34. M. Mahdavi, M. Fesanghary, E. Damangir, An improved harmony search algorithm for solving optimization problems. Appl. Math. Comput. 188, 1567–1579 (2007)

Index

A
Accelerometer data, 393; Accelerometer time series, 393; Advection diffusion equation, 223; Allocation, 123; Ant colony system, 253; Archived Multi-Objective Simulated Annealing (AMOSA), 1; Authentication, 393

B
Bio-inspired, 199; Bulk terminal, 123

C
Cell formation, 269; Classification, 393; Collaboration data, 361; Column generation, 123; Completion time, 1; ε-constraint, 33; Correlation matrix, 361; Cuckoo Search, 377

D
Differential evolution, 183

E
ELSHADE, 103; Energy consumption, 33, 151; Entropy generation, 377; Evolutionary algorithm, 223; Evolutionary operators, 285

F
Facility layout, 87; Feature selection, 139; Feature set problem, 139; Film transfer coefficient, 377; Financial market, 239; Forecasting, 239; Fuzzy logic, 253

G
Generic programming, 409; Genetic algorithm, 51, 269; GISMOO, 69; Grammatical evolution, 393; Greedy algorithm, 17

H
Heat transfer, 377; Hidden Markov model, 167; Hybrid flow shop, 17; Hybridization, 183

I
Identity, 393; Island model, 69

K
Kernel density estimation, 393; Knapsack problem, 51

L
Large scale, 299; Local search, 151; LSHADE, 103

M
Makespan, 33; Markov model, 253; Matrix tri-factorization, 285; Memetic algorithm, 285; Meshless method, 223; Mixed integer programming, 33; Multi-objective, 1, 33, 51, 69, 87, 183; Multi-population, 315

N
Neighborhood search, 17; Network density, 361; Network generation, 361; Non-dominated Sorting Genetic Algorithm (NSGA-2), 1

P
Parallel machine, 17; Parallel model, 69; Pareto, 33, 51, 183; Particle Swarm Optimization (PSO), 199, 239, 315; Partitioning model, 269; Permutation flow shop, 33; Pixel location, 199; Planning, 123; Price exchange rate, 239; Production system, 269

Q
Quadcopter, 151; Quadratic Assignment Problem (QAP), 69; Quaternion, 299

R
Reconfigurable, 1; Reconfigurable Manufacturing Systems (RMS), 1; Routing optimization, 151

S
Scheduling, 33, 123; Scheduling problem, 17; Secret message, 199; Semantic, 409; Setup time, 17; Similarities, 139; Similarity model, 87; Simulated Annealing (SA), 167, 299; SPACMA, 103; Steganographic, 199

T
Temperature sensors, 377; Time series, 393; Time windows, 151; Total flow time, 17; Travelling Salesman Problem (TSP), 253

W
Word composition, 409; Word-embeddings, 409

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Yalaoui et al. (eds.), Heuristics for Optimization and Learning, Studies in Computational Intelligence 906, https://doi.org/10.1007/978-3-030-58930-1