English, 1163 pages, 2020
Advances in Intelligent Systems and Computing 991
Hoai An Le Thi • Hoai Minh Le • Tao Pham Dinh (Editors)
Optimization of Complex Systems: Theory, Models, Algorithms and Applications
Advances in Intelligent Systems and Computing Volume 991
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science & Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on the theory, applications, and design methods of intelligent systems and intelligent computing. Virtually all disciplines, such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare and life science, are covered. The list of topics spans all areas of modern intelligent systems and computing, such as: computational intelligence; soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms; social intelligence; ambient intelligence; computational neuroscience; artificial life; virtual worlds and society; cognitive science and systems; perception and vision; DNA and immune-based systems; self-organizing and adaptive systems; e-learning and teaching; human-centered and human-centric computing; recommender systems; intelligent control; robotics and mechatronics including human-machine teaming; knowledge-based paradigms; learning paradigms; machine ethics; intelligent data analysis; knowledge management; intelligent agents; intelligent decision making and support; intelligent network security; trust management; interactive entertainment; and Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, of both a foundational and an applicable character. An important characteristic of the series is the short publication time and worldwide distribution, which permits a rapid and broad dissemination of research results.

Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and SpringerLink.
More information about this series at http://www.springer.com/series/11156
Editors

Hoai An Le Thi
Computer Science and Applications Department, LGIPM, University of Lorraine, Metz Cedex 03, France

Hoai Minh Le
Computer Science and Applications Department, LGIPM, University of Lorraine, Metz Cedex 03, France

Tao Pham Dinh
Laboratory of Mathematics, National Institute for Applied Sciences (INSA) Rouen Normandie, Saint-Étienne-du-Rouvray Cedex, France
ISSN 2194-5357 (print), ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-21802-7 (print), ISBN 978-3-030-21803-4 (eBook)
https://doi.org/10.1007/978-3-030-21803-4

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface
WCGO 2019 was the sixth event in the World Congress on Global Optimization conference series; it took place on July 8–10, 2019, in Metz, France. The conference aimed to bring together leading specialists in both the theoretical and algorithmic aspects of nonconvex programming and global optimization, as well as in a variety of its application domains, to highlight recent advances, trends and challenges, and to discuss how to expand the role of these fields in several potential high-impact application areas. The WCGO series is the biennial conference of the International Society of Global Optimization (iSoGO). The first event, WCGO 2009, took place in Hunan, China. The second event, WCGO 2011, was held in Chania, Greece, followed by the third event, WCGO 2013, in Huangshan, China. The fourth event, WCGO 2015, took place in Florida, USA, while the fifth event was held in Texas, USA. One of the highlights of this biennial meeting is the announcement of the Constantin Carathéodory Prize of iSoGO, awarded in recognition of lifetime contributions to the field of global optimization.

WCGO 2019 was attended by about 180 scientists and practitioners from 40 countries. The scientific program included the oral presentation of 112 selected full papers as well as several selected abstracts covering all main topic areas. In addition, the conference program was enriched by six plenary lectures, given by Prof. Aharon Ben-Tal (Israel Institute of Technology, Israel), Prof. Immanuel M. Bomze (University of Vienna, Austria), Prof. Masao Fukushima (Nanzan University, Japan), Prof. Anna Nagurney (University of Massachusetts Amherst, USA), Prof. Panos M. Pardalos (University of Florida, USA), and Prof. Anatoly Zhigljavsky (Cardiff University, UK).

This book contains the 112 papers selected from about 250 submissions to WCGO 2019. Each paper was peer-reviewed by at least two members of the International Program Committee and the International Reviewer Board.
The book covers both theoretical and algorithmic aspects of nonconvex programming and global optimization, as well as their applications to modeling and solving decision problems in various domains. It is composed of ten parts, each of which deals with either the theory and/or methods in a branch of optimization, such as continuous optimization, DC programming and DCA, discrete optimization and network optimization, multiobjective programming, and optimization under uncertainty, or models and optimization methods in a specific application area, including data science, economics and finance, energy and water management, engineering systems, transportation, logistics, resource allocation and production management. We hope that researchers and practitioners working in nonconvex optimization and the related application areas will find here many inspiring ideas and useful tools and techniques for their work.

We would like to thank the chairs and members of the International Program Committee as well as the reviewers for their hard work in the review process, which helped us guarantee the high quality of the papers selected for the conference. We cordially thank the organizers and chairs of the special sessions for their contributions to the success of the conference. Thanks are also due to the plenary lecturers for their interesting and informative talks of a world-class standard.

The conference was organized by the Computer Science and Applications Department, LGIPM, University of Lorraine, France. We wish to especially thank all members of the Organizing Committee for their excellent work in making the conference a success; the conference would not have been possible without their considerable effort. We would like to express our sincere thanks to our main sponsors: Réseau de Transport d’Électricité (France), Conseil régional du Grand Est (France), Metz Métropole (France), Conseil départemental de la Moselle (France), Université de Lorraine (France), Laboratoire de Génie Informatique, de Production et de Maintenance (LGIPM) - Université de Lorraine, UFR Mathématique Informatique Mécanique Automatique - Université de Lorraine, and DCA Solutions (Vietnam). Our special thanks go to all the authors for their valuable contributions, and to the other participants who contributed to the success of the conference.
Finally, we cordially thank Springer for their help in publishing this book.

July 2019

Hoai An Le Thi
Hoai Minh Le
Tao Pham Dinh
Organization
WCGO 2019 was organized by the Computer Science and Applications Department, LGIPM, University of Lorraine, France.
Conference Chair
Hoai An Le Thi (University of Lorraine, France)

Program Chairs
Hoai An Le Thi (University of Lorraine, France)
Tao Pham Dinh (National Institute for Applied Sciences - Rouen Normandie, France)
Yaroslav D. Sergeyev (University of Calabria, Italy)

Publicity Chair
Hoai Minh Le (University of Lorraine, France)
International Program Committee Members

Paula Amaral (University NOVA de Lisboa, Portugal)
Adil Bagirov (Federation University, Australia)
Balabhaskar Balasundaram (Oklahoma State University, USA)
Paul I. Barton (Massachusetts Institute of Technology, USA)
Aharon Ben-Tal (Technion - Israel Institute of Technology, Israel)
Immanuel M. Bomze (University of Vienna, Austria)
Radu Ioan Bot (University of Vienna, Austria)
Sergiy Butenko (Texas A&M University, USA)
Stéphane Canu (National Institute for Applied Sciences - Rouen, France)
Emilio Carrizosa (University of Seville, Spain)
Leocadio G. Casado (University of Almería, Spain)
Tibor Csendes (University of Szeged, Hungary)
Yu-Hong Dai (Chinese Academy of Sciences, China)
Gianni Di Pillo (University of Rome La Sapienza, Italy)
Ding-Zhu Du (University of Texas at Dallas, USA)
Matthias Ehrgott (University of Auckland, New Zealand)
Shu-Cherng Fang (North Carolina State University, USA)
José Fernández Hernández (University of Murcia, Spain)
Dalila B. M. M. Fontes (University of Porto, Portugal)
Masao Fukushima (Nanzan University, Japan)
Vladimir Grishagin (N. I. Lobachevsky State University of Nizhny Novgorod, Russia)
Ignacio E. Grossmann (Carnegie Mellon University, USA)
Yann Guermeur (LORIA, France)
Mounir Haddou (National Institute for Applied Sciences - Rennes, France)
Milan Hladík (Charles University, Czech Republic)
Joaquim Judice (University of Coimbra, Portugal)
Oleg Khamisov (Energy Systems Institute, Russian Academy of Sciences, Irkutsk, Russia)
Diethard Klatte (University of Zurich, Switzerland)
Pavlo Krokhmal (University of Arizona, USA)
Dmitri Kvasov (University of Calabria, Italy)
Carlile Lavor (University of Campinas, Brazil)
Dung Muu Le (Institute of Mathematics, Hanoi, Vietnam)
Hoai Minh Le (University of Lorraine, France)
Gue Myung Lee (Pukyong National University, Korea)
Jon Lee (University of Michigan, USA)
Vincent Lefieux (RTE, France)
Duan Li (City University of Hong Kong, Hong Kong, China)
Leo Liberti (Ecole Polytechnique, France)
Hsuan-Tien Lin (National Taiwan University, Taiwan)
Abdel Lisser (Paris-Sud University, France)
Angelo Lucia (University of Rhode Island, USA)
Stefano Lucidi (University of Rome “La Sapienza”, Italy)
Andreas Lundell (Åbo Akademi University, Turku, Finland)
Lina Mallozzi (University of Naples Federico II, Italy)
Pierre Maréchal (University of Toulouse - Paul Sabatier, France)
Kaisa Miettinen (University of Jyväskylä, Finland)
Michel Minoux (Sorbonne University, France)
Shashi Kant Mishra (Banaras Hindu University, India)
Dolores Romero Morales (Copenhagen Business School, Denmark)
Ngoc Thanh Nguyen (Wroclaw University of Science and Technology, Poland)
Viet Hung Nguyen (Sorbonne University, France)
Yi-Shuai Niu (Shanghai Jiao Tong University, China)
Ivo Nowak (Hamburg University of Applied Sciences, Germany)
Jong-Shi Pang (University of Southern California, USA)
Panos Pardalos (University of Florida, USA)
Hoang Pham (Rutgers University, USA)
Janos D. Pinter (Lehigh University, USA)
Efstratios N. Pistikopoulos (Texas A&M University, USA)
Oleg Prokopyev (University of Pittsburgh, USA)
Stefan Ratschan (Academy of Sciences of the Czech Republic, Czech Republic)
Steffen Rebennack (Karlsruhe Institute of Technology, Germany)
Franz Rendl (University of Klagenfurt, Austria)
Ana Maria Rocha (University of Minho, Braga, Portugal)
Ruey-Lin Sheu (National Cheng-Kung University, Taiwan)
Jianming Shi (Tokyo University of Science, Japan)
Christine A. Shoemaker (National University of Singapore, Singapore)
Eduardo Souza De Cursi (National Institute for Applied Sciences - Rouen, France)
Alexander Strekalovsky (Russian Academy of Sciences, Irkutsk, Russia)
Jie Sun (Curtin University, Australia)
Akiko Takeda (University of Tokyo, Japan)
Michael Ulbrich (Technical University of Munich, Germany)
Luis Nunes Vicente (Lehigh University, USA)
Stefan Vigerske (Zuse Institute Berlin, Germany)
Gerhard-Wilhelm Weber (Poznan University of Technology, Poland)
Yichao Wu (University of Illinois at Chicago, USA)
Jack Xin (University of California, Irvine, USA)
Fengqi You (Cornell University, USA)
Wuyi Yue (Konan University, Japan)
Ahmed Zidna (University of Lorraine, France)
Antanas Zilinskas (Vilnius University, Lithuania)
External Reviewers

Manuel Arana-Jimenez (University of Cádiz, Spain)
Abdessamad Amir (University of Mostaganem, Algeria)
Domingo Barrera (University of Granada, Spain)
Victor Blanco (University of Granada, Spain)
Miguel A. Fortes (University of Granada, Spain)
Olivier Gallay (University of Lausanne, Switzerland)
Pedro González-Rodelas (University of Granada, Spain)
Luo Hezhi (Zhejiang University of Technology, China)
Vinh Thanh Ho (University of Lorraine, France)
Baktagul Imasheva (International University of Information Technology, Kazakhstan)
Amodeo Lionel (University of Technology of Troyes, France)
Aiman Moldagulova (Al-Farabi Kazakh National University, Kazakhstan)
Samat Mukhanov (International University of Information Technology, Kazakhstan)
Canh Nam Nguyen (Hanoi University of Science and Technology, Vietnam)
Duc Manh Nguyen (Hanoi National University of Education, Vietnam)
Manh Cuong Nguyen (Hanoi University of Industry, Vietnam)
Thi Bich Thuy Nguyen (VNU University of Science, Vietnam)
Thi Minh Tam Nguyen (Vietnam National University of Agriculture, Vietnam)
Viet Anh Nguyen (University of Lorraine, France)
Miguel Pasadas (University of Granada, Spain)
Thi Hoai Pham (Hanoi University of Science and Technology, Vietnam)
Duy Nhat Phan (University of Lorraine, France)
Jakob Puchinger (Paris-Saclay University, France)
Rafael Lopez (Federal University of Santa Catarina, Brazil)
Sabina Rakhmetulayeva (International University of Information Technology, Kazakhstan)
Hagen Salewski (University of Kaiserslautern, Germany)
Daniel Schermer (University of Kaiserslautern, Germany)
Ryskhan Satybaldiyeva (International University of Information Technology, Kazakhstan)
Bach Tran (University of Lorraine, France)
Thi Thuy Tran (FPT University, Vietnam)
Yong Xia (Beihang University, China)
Xuan Thanh Vo (Ho Chi Minh City University of Science, Vietnam)
Baiyi Wu (Guangdong University of Foreign Studies, China)
Xiaojin Zheng (Tongji University, China)
Plenary Lecturers

Aharon Ben-Tal (Israel Institute of Technology, Israel)
Immanuel M. Bomze (University of Vienna, Austria)
Masao Fukushima (Nanzan University, Japan)
Anna Nagurney (University of Massachusetts Amherst, USA)
Panos M. Pardalos (University of Florida, USA)
Anatoly Zhigljavsky (Cardiff University, UK)
Special Session Organizers

1. Combinatorial Optimization: Viet Hung Nguyen (Sorbonne University, France), Kim Thang Nguyen (Paris-Saclay University, France), and Ha Duong Phan (Institute of Mathematics, Vietnam)
2. Recent Advances in DC Programming and DCA: Theory, Algorithms and Applications: Hoai An Le Thi and Hoai Minh Le (University of Lorraine, France)
3. Mixed-Integer Optimization: Yi-Shuai Niu (Shanghai Jiao Tong University, China)
4. Quadratically Constrained Quadratic Programming (QCQP): Duan Li (City University of Hong Kong, Hong Kong, China) and Rujun Jiang (Fudan University, Shanghai, China)
5. Uncertainty Quantification and Optimization: Eduardo Souza de Cursi (National Institute for Applied Sciences - Rouen, France) and Rafael Holdorf (Federal University of Santa Catarina, Brazil)
6. Computing, Engineering and Data Science: Raissa Uskenbayeva and Sabina Rakhmetulayeva (International Information Technology University, Kazakhstan)
7. Complementarity Problems: Applications, Theory and Algorithms: Mounir Haddou (National Institute for Applied Sciences - Rennes, France), Ibtihel Ben Gharbia, and Quang Huy Tran (IFP Energies nouvelles, France)
8. Optimization Methods under Uncertainty: Manuel Arana Jimenez (University of Cádiz, Spain)
9. Spline Approximation & Optimization with Applications: Ahmed Zidna and Dominique Michel (University of Lorraine, France)
10. Surrogate Global Optimization for Expensive Multimodal Functions: Christine Shoemaker (National University of Singapore, Singapore)
11. Novel Technologies and Optimization for Last-Mile Logistics: Mahdi Moeini and Hagen Salewski (Technische Universität Kaiserslautern, Germany)
12. Sustainable Supply Chains and Logistics Networks: Daniel Roy and Sophie Hennequin (University of Lorraine, France)
Organizing Committee Members

Hoai Minh Le (University of Lorraine, France)
Vinh Thanh Ho (University of Lorraine, France)
Bach Tran (University of Lorraine, France)
Viet Anh Nguyen (University of Lorraine, France)
Aurélie Lallemand (University of Lorraine, France)
Sponsoring Institutions

Réseau de Transport d’Électricité, France
Conseil régional du Grand Est, France
Metz Métropole, France
Conseil départemental de la Moselle, France
Université de Lorraine, France
Laboratoire de Génie Informatique, de Production et de Maintenance (LGIPM) - Université de Lorraine
UFR Mathématique Informatique Mécanique Automatique - Université de Lorraine
DCA Solutions, Vietnam
Springer
Contents
Continuous Optimization

A Hybrid Simplex Search for Global Optimization with Representation Formula and Genetic Algorithm
  Hafid Zidani, Rachid Ellaia, and Eduardo Souza de Cursi

A Population-Based Stochastic Coordinate Descent Method
  Ana Maria A. C. Rocha, M. Fernanda P. Costa, and Edite M. G. P. Fernandes

A Sequential Linear Programming Algorithm for Continuous and Mixed-Integer Nonconvex Quadratic Programming
  Mohand Bentobache, Mohamed Telli, and Abdelkader Mokhtari

A Survey of Surrogate Approaches for Expensive Constrained Black-Box Optimization
  Rommel G. Regis

Adaptive Global Optimization Based on Nested Dimensionality Reduction
  Konstantin Barkalov and Ilya Lebedev

A B-Spline Global Optimization Algorithm for Optimal Power Flow Problem
  Deepak D. Gawali, Bhagyesh V. Patil, Ahmed Zidna, and Paluri S. V. Nataraj

Concurrent Topological Optimization of a Multi-component Arm for a Tube Bending Machine
  Federico Ballo, Massimiliano Gobbi, and Giorgio Previati

Discrete Interval Adjoints in Unconstrained Global Optimization
  Jens Deussen and Uwe Naumann

Diving for Sparse Partially-Reflexive Generalized Inverses
  Victor K. Fuentes, Marcia Fampa, and Jon Lee

Filtering Domains of Factorable Functions Using Interval Contractors
  Laurent Granvilliers

Leveraging Local Optima Network Properties for Memetic Differential Evolution
  Viktor Homolya and Tamás Vinkó

Maximization of a Convex Quadratic Form on a Polytope: Factorization and the Chebyshev Norm Bounds
  Milan Hladík and David Hartman

New Dynamic Programming Approach to Global Optimization
  Anna Kaźmierczak and Andrzej Nowakowski

On Chebyshev Center of the Intersection of Two Ellipsoids
  Xiaoli Cen, Yong Xia, Runxuan Gao, and Tianzhi Yang

On Conic Relaxations of Generalization of the Extended Trust Region Subproblem
  Rujun Jiang and Duan Li

On Constrained Optimization Problems Solved Using the Canonical Duality Theory
  Constantin Zălinescu

On Controlled Variational Inequalities Involving Convex Functionals
  Savin Treanţă

On Lagrange Duality for Several Classes of Nonconvex Optimization Problems
  Ewa M. Bednarczuk and Monika Syga

On Monotone Maps: Semidifferentiable Case
  Shashi Kant Mishra, Sanjeev Kumar Singh, and Avanish Shahi

Parallel Multi-memetic Global Optimization Algorithm for Optimal Control of Polyarylenephthalide’s Thermally-Stimulated Luminescence
  Maxim Sakharov and Anatoly Karpenko

Proper Choice of Control Parameters for CoDE Algorithm
  Petr Bujok, Daniela Einšpiglová, and Hana Zámečníková

Semidefinite Programming Based Convex Relaxation for Nonconvex Quadratically Constrained Quadratic Programming
  Rujun Jiang and Duan Li

Solving a Type of the Tikhonov Regularization of the Total Least Squares by a New S-Lemma
  Huu-Quang Nguyen, Ruey-Lin Sheu, and Yong Xia

Solving Mathematical Programs with Complementarity Constraints with a Penalization Approach
  Lina Abdallah, Tangi Migot, and Mounir Haddou

Stochastic Tunneling for Improving the Efficiency of Stochastic Efficient Global Optimization
  Fábio Nascentes, Rafael Holdorf Lopez, Rubens Sampaio, and Eduardo Souza de Cursi

The Bernstein Polynomials Based Globally Optimal Nonlinear Model Predictive Control
  Bhagyesh V. Patil, Ashok Krishnan, Foo Y. S. Eddy, and Ahmed Zidna

Towards the Biconjugate of Bivariate Piecewise Quadratic Functions
  Deepak Kumar and Yves Lucet

Tractable Relaxations for the Cubic One-Spherical Optimization Problem
  Christoph Buchheim, Marcia Fampa, and Orlando Sarmiento
DC Programming and DCA

A DC Algorithm for Solving Multiobjective Stochastic Problem via Exponential Utility Functions
  Ramzi Kasri and Fatima Bellahcene

A DCA-Based Approach for Outage Constrained Robust Secure Power-Splitting SWIPT MISO System
  Phuong Anh Nguyen and Hoai An Le Thi

DCA-Like, GA and MBO: A Novel Hybrid Approach for Binary Quadratic Programs
  Sara Samir, Hoai An Le Thi, and Mohammed Yagouni

Low-Rank Matrix Recovery with Ky Fan 2-k-Norm
  Xuan Vinh Doan and Stephen Vavasis

Online DCA for Time Series Forecasting Using Artificial Neural Network
  Viet Anh Nguyen and Hoai An Le Thi

Parallel DC Cutting Plane Algorithms for Mixed Binary Linear Program
  Yi-Shuai Niu, Yu You, and Wen-Zhuo Liu

Sentence Compression via DC Programming Approach
  Yi-Shuai Niu, Xi-Wei Hu, Yu You, Faouzi Mohamed Benammour, and Hu Zhang
Discrete Optimization and Network Optimization

A Horizontal Method of Localizing Values of a Linear Function in Permutation-Based Optimization
  Liudmyla Koliechkina and Oksana Pichugina

An Experimental Comparison of Heuristic Coloring Algorithms in Terms of Found Color Classes on Random Graphs
  Deniss Kumlander and Aleksei Kulitškov

Cliques for Multi-term Linearization of 0–1 Multilinear Program for Boolean Logical Pattern Generation
  Kedong Yan and Hong Seo Ryoo

Gaining or Losing Perspective
  Jon Lee, Daphne Skipper, and Emily Speakman

Game Equilibria and Transition Dynamics with Networks Unification
  Alexei Korolev and Ilia Garmashov

Local Search Approaches with Different Problem-Specific Steps for Sensor Network Coverage Optimization
  Krzysztof Trojanowski and Artur Mikitiuk

Modelling Dynamic Programming-Based Global Constraints in Constraint Programming
  Andrea Visentin, Steven D. Prestwich, Roberto Rossi, and Armagan Tarim

Modified Extended Cutting Plane Algorithm for Mixed Integer Nonlinear Programming
  Wendel Melo, Marcia Fampa, and Fernanda Raupp

On Proximity for k-Regular Mixed-Integer Linear Optimization
  Luze Xu and Jon Lee

On Solving Nonconvex MINLP Problems with SHOT
  Andreas Lundell and Jan Kronqvist

Reversed Search Maximum Clique Algorithm Based on Recoloring
  Deniss Kumlander and Aleksandr Porošin

Sifting Edges to Accelerate the Computation of Absolute 1-Center in Graphs
  Wei Ding and Ke Qiu

Solving an MINLP with Chance Constraint Using a Zhang’s Copula Family
  Adriano Delfino

Stochastic Greedy Algorithm Is Still Good: Maximizing Submodular + Supermodular Functions
  Sai Ji, Dachuan Xu, Min Li, Yishui Wang, and Dongmei Zhang

Towards Multi-tree Methods for Large-Scale Global Optimization
  Pavlo Muts and Ivo Nowak
Optimization under Uncertainty

Fuzzy Pareto Solutions in Fully Fuzzy Multiobjective Linear Programming
  Manuel Arana-Jiménez

Minimax Inequalities and Variational Equations
  Maria Isabel Berenguer, Domingo Gámez, A. I. Garralda-Guillem, and M. Ruiz Galán

Optimization of Real-Life Integrated Solar Desalination Water Supply System with Probability Functions
  Bayrammyrat Myradov

Social Strategy of Particles in Optimization Problems
  Bożena Borowska

Statistics of Pareto Fronts
  Mohamed Bassi, E. Pagnacco, Eduardo Souza de Cursi, and R. Ellaia

Uncertainty Quantification in Optimization
  Eduardo Souza de Cursi and Rafael Holdorf Lopez

Uncertainty Quantification in Serviceability of Impacted Steel Pipe
  Renata Troian, Didier Lemosse, Leila Khalij, Christophe Gautrelet, and Eduardo Souza de Cursi
Multiobjective Programming

A Global Optimization Algorithm for the Solution of Tri-Level Mixed-Integer Quadratic Programming Problems
  Styliani Avraamidou and Efstratios N. Pistikopoulos

A Method for Solving Some Class of Multilevel Multi-leader Multi-follower Programming Problems
  Addis Belete Zewde and Semu Mitiku Kassa

A Mixture Design of Experiments Approach for Genetic Algorithm Tuning Applied to Multi-objective Optimization
  Taynara Incerti de Paula, Guilherme Ferreira Gomes, José Henrique de Freitas Gomes, and Anderson Paulo de Paiva

A Numerical Study on MIP Approaches over the Efficient Set
  Kuan Lu, Shinji Mizuno, and Jianming Shi

Analytics-Based Decomposition of a Class of Bilevel Problems
  Adejuyigbe Fajemisin, Laura Climent, and Steven D. Prestwich

KMCGO: Kriging-Assisted Multi-objective Constrained Global Optimization
  Yaohui Li, Yizhong Wu, Yuanmin Zhang, and Shuting Wang

Multistage Global Search Using Various Scalarization Schemes in Multicriteria Optimization Problems
  Victor Gergel and Evgeniy Kozinov

Necessary Optimality Condition for Nonlinear Interval Vector Programming Problem Under B-Arcwise Connected Functions
  Mohan Bir Subba and Vinay Singh

On the Applications of Nonsmooth Vector Optimization Problems to Solve Generalized Vector Variational Inequalities Using Convexificators
  Balendu Bhooshan Upadhyay, Priyanka Mishra, Ram N. Mohapatra, and Shashi Kant Mishra

SOP-Hybrid: A Parallel Surrogate-Based Candidate Search Algorithm for Expensive Optimization on Large Parallel Clusters
  Taimoor Akhtar and Christine A. Shoemaker

Surrogate Many Objective Optimization: Combining Evolutionary Search, ε-Dominance and Connected Restarts
  Taimoor Akhtar, Christine A. Shoemaker, and Wenyu Wang

Tropical Analogues of a Dempe-Franke Bilevel Optimization Problem
  Sergeĭ Sergeev and Zhengliang Liu

Weak Slater Constraint Qualification in Nonsmooth Multiobjective Semi-infinite Programming
  Ali Sadeghieh, David Barilla, Giuseppe Caristi, and Nader Kanzi
Data science: Machine Learning, Data Analysis, Big Data and Computer Vision

A Discretization Algorithm for k-Means with Capacity Constraints . . . Yicheng Xu, Dachuan Xu, Dongmei Zhang, and Yong Zhang
713
A Gray-Box Approach for Curriculum Learning . . . . . . . . . . . . . . . . . Francesco Foglino, Matteo Leonetti, Simone Sagratella, and Ruggiero Seccia
720
A Study on Graph-Structured Recurrent Neural Networks and Sparsification with Application to Epidemic Forecasting . . . . . . . . Zhijian Li, Xiyang Luo, Bao Wang, Andrea L. Bertozzi, and Jack Xin
730
Automatic Identification of Intracranial Hemorrhage on CT/MRI Image Using Meta-Architectures Improved from Region-Based CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thi-Hoang-Yen Le, Anh-Cang Phan, Hung-Phi Cao, and Thuong-Cang Phan
740
Bayesian Optimization for Recommender System . . . . . . . . . . . . . . . . . Bruno Giovanni Galuzzi, Ilaria Giordani, A. Candelieri, Riccardo Perego, and Francesco Archetti
751
Creation of Data Classification System for Local Administration . . . . . Raissa Uskenbayeva, Aiman Moldagulova, and Nurzhan K. Mukazhanov
761
Face Recognition Using Gabor Wavelet in MapReduce and Spark . . . Anh-Cang Phan, Hung-Phi Cao, Ho-Dat Tran, and Thuong-Cang Phan
769
Globally Optimal Parsimoniously Lifting a Fuzzy Query Set Over a Taxonomy Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dmitry Frolov, Boris Mirkin, Susana Nascimento, and Trevor Fenner
779
K-Medoids Clustering Is Solvable in Polynomial Time for a 2d Pareto Front . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nicolas Dupin, Frank Nielsen, and El-Ghazali Talbi
790
Learning Sparse Neural Networks via ℓ0 and Tℓ1 by a Relaxed Variable Splitting Method with Application to Multi-scale Curve Classification . . . Fanghui Xue and Jack Xin
800
Pattern Recognition with Using Effective Algorithms and Methods of Computer Vision Library . . . S. B. Mukhanov and Raissa Uskenbayeva
810
The Practice of Moving to Big Data on the Case of the NoSQL Database, Clickhouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Baktagul Imasheva, Azamat Nakispekov, Andrey Sidelkovskaya, and Ainur Sidelkovskiy
820
Economics and Finance

Asymptotically Exact Minimizations for Optimal Management of Public Finances . . . Jean Koudi, Babacar Mbaye Ndiaye, and Guy Degla
831
Features of Administrative and Management Processes Modeling . . . Ryskhan Satybaldiyeva, Raissa Uskenbayeva, Aiman Moldagulova, Zuldyz Kalpeyeva, and Aygerim Aitim
842
Optimization Problems of Economic Structural Adjustment and Problem of Stability . . . Abdykappar Ashimov, Yuriy Borovskiy, and Mukhit Onalbekov
850
Research of the Relationship Between Business Processes in Production and Logistics Based on Local Models . . . Raissa Uskenbayeva, Kuandykov Abu, Rakhmetulayeva Sabina, and Bolshibayeva Aigerim
861
Sparsity and Performance Enhanced Markowitz Portfolios Using Second-Order Cone Programming . . . . . . . . . . . . . . . . . . . . . . . Noam Goldberg and Ishy Zagdoun
871
Managing Business Process Based on the Tonality of the Output Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Raissa Uskenbayeva, Rakhmetulayeva Sabina, and Bolshibayeva Aigerim
882
Energy and Water Management

Customer Clustering of French Transmission System Operator (RTE) Based on Their Electricity Consumption . . . Gabriel Da Silva, Hoai Minh Le, Hoai An Le Thi, Vincent Lefieux, and Bach Tran
893
Data-Driven Beetle Antennae Search Algorithm for Electrical Power Modeling of a Combined Cycle Power Plant . . . . . . . . . . . . . . . Tamal Ghosh, Kristian Martinsen, and Pranab K Dan
906
Finding Global-Optimal Gearbox Designs for Battery Electric Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Philipp Leise, Lena C. Altherr, Nicolai Simon, and Peter F. Pelz
916
Location Optimization of Gas Power Plants by a Z-Number Data Envelopment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Farnoosh Fakhari, R. Tavakkoli-Moghaddam, M. Tohidifard, and Seyed Farid Ghaderi
926
Optimization of Power Plant Operation via Stochastic Programming with Recourse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tomoki Fukuba, Takayuki Shiina, Ken-ichi Tokoro, and Tetsuya Sato
937
Randomized-Variants Lower Bounds for Gas Turbines Aircraft Engines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mahdi Jemmali, Loai Kayed B. Melhim, and Mafawez Alharbi
949
Robust Design of Pumping Stations in Water Distribution Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gratien Bonvin, Sophie Demassey, and Welington de Oliveira
957
Engineering Systems

Application of PLS Technique to Optimization of the Formulation of a Geo-Eco-Material . . . S. Imanzadeh, Armelle Jarno, and S. Taibi
971
Databases Coupling for Morphed-Mesh Simulations and Application on Fan Optimal Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zebin Zhang, Martin Buisson, Pascal Ferrand, and Manuel Henner
981
Kriging-Based Reliability-Based Design Optimization Using Single Loop Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hongbo Zhang, Younes Aoues, Hao Bai, Didier Lemosse, and Eduardo Souza de Cursi
991
Sensitivity Analysis of Load Application Methods for Shell Finite Element Models . . . Wilson Javier Veloz Parra, Younes Aoues, and Didier Lemosse
1001

Transportation, Logistics, Resource Allocation and Production Management

A Continuous Competitive Facility Location and Design Problem for Firm Expansion . . . Boglárka G.-Tóth, Laura Anton-Sanchez, José Fernández, Juana L. Redondo, and Pilar M. Ortigosa
1013
A Genetic Algorithm for Solving the Truck-Drone-ATV Routing Problem . . . Mahdi Moeini and Hagen Salewski
1023
A Planning Problem with Resource Constraints in Health Simulation Center . . . Simon Caillard, Laure Brisoux Devendeville, and Corinne Lucet
1033
Edges Elimination for Traveling Salesman Problem Based on Frequency K5 s . . . Yong Wang
1043
Industrial Symbioses: Bi-objective Model and Solution Method . . . Sophie Hennequin, Vinh Thanh Ho, Hoai An Le Thi, Hajar Nouinou, and Daniel Roy
1054
Intelligent Solution System Towards Parts Logistics Optimization . . . Yaoting Huang, Boyu Chen, Wenlian Lu, Zhong-Xiao Jin, and Ren Zheng
1067
Optimal Air Traffic Flow Management with Carbon Emissions Considerations . . . Sadeque Hamdan, Oualid Jouini, Ali Cheaitou, Zied Jemai, Imad Alsyouf, and Maamar Bettayeb
1078
Scheduling Three Identical Parallel Machines with Capacity Constraints . . . Jian Sun, Dachuan Xu, Ran Ma, and Xiaoyan Zhang
1089
Solving the Problem of Coordination and Control of Multiple UAVs by Using the Column Generation Method . . . Duc Manh Nguyen, Frédéric Dambreville, Abdelmalek Toumi, Jean-Christophe Cexus, and Ali Khenchaf
1097
Spare Parts Management in the Automotive Industry Considering Sustainability . . . David Alejandro Baez Diaz, Sophie Hennequin, and Daniel Roy
1109
The Method for Managing Inventory Accounting . . . Duisebekova Kulanda, Kuandykov Abu, Rakhmetulayeva Sabina, and Kozhamzharova Dinara
1119
The Traveling Salesman Drone Station Location Problem . . . Daniel Schermer, Mahdi Moeini, and Oliver Wendt
1129
Two-Machine Flow Shop with a Dynamic Storage Space and UET Operations . . . Joanna Berlińska, Alexander Kononov, and Yakov Zinder
1139
Author Index
1149
Continuous Optimization
A Hybrid Simplex Search for Global Optimization with Representation Formula and Genetic Algorithm Hafid Zidani1,2(B) , Rachid Ellaia1 , and Eduardo Souza de Cursi2 1
2
LERMA, Mohammed V University - Engineering Mohammedia School, Rabat, BP. 765 Ibn Sina avenue, Agdal, Morocco [email protected],[email protected], [email protected] Laboratory of Mechanics of Normandy, National Institute for Applied Sciences Rouen, BP. 08, universit´e avenue, 76801 St Etienne du Rouvray Cedex, France [email protected]
Abstract. We consider the problem of minimizing a given function f : Rn → R on a regular non-empty closed set S. When f attains a global minimum at exactly one point x* ∈ S, x* can be characterized by a representation formula involving a convenient random variable X and a convenient function g : R² → R. In this paper, we propose to use this Representation Formula (RF) to numerically generate an initial population. In order to obtain more accurate results, the Representation Formula has been coupled with other algorithms:
• Classical Genetic Algorithm (GA). We obtain a new algorithm called RFGA.
• Genetic Algorithm using the Nelder Mead algorithm at the mutation stage (GANM). We obtain a new algorithm called RFGANM.
• Nelder Mead Algorithm. We obtain a new algorithm called RFNM.
All six algorithms (RF, GA, RFGA, GANM, RFGANM, RFNM) were tested on 21 benchmark functions, with a complete analysis of the effect of the different parameters of the methods. The experiments show that RFNM is the most successful algorithm: its performance was compared with that of the other algorithms and observed to be more effective, robust, and stable.

Keywords: Global optimization · Genetic algorithm · Representation formula · Nelder Mead algorithm

1 Introduction
In the context of the resolution of engineering problems, many optimization algorithms have been proposed, tested and analyzed in the last decades. However, optimization in engineering remains an active research field, since many real-world engineering optimization problems remain very complex in nature and quite difficult to solve with the existing algorithms. The existing literature presents intensive research efforts on several difficult points which remain incompletely solved and for which only partial responses have been obtained. Among these, we may cite: handling of non-convexity, specially when optimization restrictions are involved; working with incomplete or erroneous evaluations of the functions and restrictions; increasing the number of optimization variables up to those of realistic designs in practical situations; dealing with non-regular (discontinuous or non-differentiable) functions; and determining convenient starting points for iterative methods (Floudas [5]).

We observe that the difficulties concerning non-convexity and the determination of starting points are connected: efficient methods for the optimization of regular functions are often deterministic and involve gradients, but depend strongly on the initial point - they can be trapped by local minima if a non-convenient initial guess is used. Alternatively, methods based on the exploration of the space of the design variables usually involve a stochastic aspect - thus a significant increase in the computational cost - and are less dependent on the initial choice, but improvements in their performance require combination with deterministic methods and may introduce a dependence on the initial choice. This last approach leads to the use of hybrid procedures involving both approaches, which try to benefit from the best of each method; for these reasons, the literature on mixed stochastic/deterministic methods has grown in the last years [2]. These hybrid algorithms perform better if the initial point belongs to an attraction area of the optimum, which shows the importance of the initial guess in optimization algorithms [8]. Hence, we would like in this paper to use a representation formula to provide a convenient initial guess of the solution. Let S denote a closed bounded regular domain of the n-dimensional Euclidean space Rn, and let f be a continuous function defined on S and taking its values on R.

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 3–15, 2020. https://doi.org/10.1007/978-3-030-21803-4_1
An unconstrained optimization problem can be formulated, in general, as follows:

$$x^* = \mathop{\mathrm{Arg\,min}}_{x \in S} f(x), \qquad (1)$$

In the literature, representation formulas have been introduced in order to characterize explicitly the solutions of problem (1). In general, these representations assume that S contains a single optimal point x* (but many local minima may exist on S). For instance, Pincus [9] has proposed the representation formula:

$$x^* = \lim_{\lambda \to +\infty} \frac{\int_S x\, e^{-\lambda f(x)}\, dx}{\int_S e^{-\lambda f(x)}\, dx}.$$

More recently, the original representation proposed by Pincus has been reformulated by Souza de Cursi [3] as follows: let X be a random variable taking its values on S and g : R² → R be a function. If these elements are conveniently chosen, then

$$x^* = \lim_{\lambda \to +\infty} \frac{E\big(X\, g(\lambda, f(X))\big)}{E\big(g(\lambda, f(X))\big)} \qquad (2)$$

The formulation of Pincus corresponds to g(λ, s) = e^{-λs}, which is a convenient choice. The general properties of X and g are detailed, for instance, in [4]. An extension to infinite dimensional situations can be found in [4]. In this work,
we propose the use of the representation given by Eq. (3), hybridized with the Nelder Mead algorithm and a genetic algorithm, for the global optimization of multimodal functions.

2 Hybrid Simplex Search with Representation Formula and Genetic Algorithm

Hybrid methods have been introduced to keep the flexibility of stochastic methods and the efficiency of deterministic ones. In this paper, the hybrid method for solving optimization problems couples the representation formula proposed by Pincus [9] with the Nelder Mead algorithm and a genetic algorithm. The representation formula is used first to find the region containing the global solution, based on generating finite samples of the random variables involved in the expression and on an approximation of the limit. For instance, we may choose λ large enough and generate a sample using standard random number generators. The points can be generated using either a uniform distribution or a Gaussian one. In the Gaussian case, when a generated trial point lies outside S, it is projected back in order to obtain an admissible point. In order to obtain more accurate results, it is convenient to refine this estimate with the following algorithms:

- Classical Genetic Algorithm (GA). We obtain a new algorithm called RFGA.
- Genetic Algorithm using the Nelder Mead algorithm at the mutation stage (GANM). We obtain a new algorithm called RFGANM.
- Nelder Mead Algorithm. We obtain a new algorithm called RFNM.

2.1 Representation Formula

As previously observed, if f attains its global minimum at exactly one point x* on S, we have

$$x^* = \lim_{\lambda \to +\infty} \frac{E\big(X\, g(\lambda, f(X))\big)}{E\big(g(\lambda, f(X))\big)}, \qquad (3)$$

where g : R² → R is continuous and strictly positive, $s \mapsto g(\lambda, s)$ is strictly decreasing for any s ∈ f(S) and λ > 0, while X is a convenient random variable. These conditions are fulfilled, for instance, when X is uniformly distributed or Gaussian and g(λ, s) = e^{-λs} (which corresponds to the classical choice of Pincus). We use these particular choices in the sequel. A numerical implementation can be performed by taking a large fixed value of λ in order to represent the limit λ → +∞. In order to prevent an overflow, λ should be increased gradually up to the desired value, and it may be convenient to use positive functions f (for instance, by adding a constant to the original f). A finite sample of X is generated according to a probability P - this consists
simply in generating N admissible points (x1, x2, ..., xN) ∈ S - and estimations of the means are used to approximate the exact means, which leads to

$$x^*_c = \frac{\sum_{i=1}^{N} x_i\, g(\lambda, f(x_i))}{\sum_{i=1}^{N} g(\lambda, f(x_i))}$$
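As a concrete illustration, the estimator above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: the one-dimensional multimodal test function and all parameter values below are illustrative assumptions, and the exponent is shifted by min f(x_i) (the constant factor cancels in the ratio) to avoid the overflow/underflow issue mentioned in the text.

```python
import numpy as np

def rf_estimate(f, lower, upper, lam=50.0, n_samples=200_000, rng=None):
    """Monte Carlo approximation of x*_c = sum(x_i g(lam, f(x_i))) / sum(g(lam, f(x_i)))
    with the Pincus choice g(lam, s) = exp(-lam * s) and X uniform on the box S."""
    rng = np.random.default_rng(rng)
    lower, upper = np.atleast_1d(lower), np.atleast_1d(upper)
    x = rng.uniform(lower, upper, size=(n_samples, lower.size))
    fx = np.array([f(xi) for xi in x])
    # shift by min(f) before exponentiating: the ratio is unchanged, overflow avoided
    w = np.exp(-lam * (fx - fx.min()))
    return (w[:, None] * x).sum(axis=0) / w.sum()

# Illustrative multimodal 1-D function: global minimum at x = 0, local minima elsewhere.
f = lambda x: float(x[0]**2 + 1.0 - np.cos(5.0 * x[0]))
x_est = rf_estimate(f, -3.0, 3.0, lam=50.0, rng=0)
```

With λ = 50 the weights concentrate sharply around the global minimizer, so the weighted sample mean lands close to x* = 0 even though most of the sample lies far from it.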
3 Test Bed
To demonstrate the efficiency and accuracy of the hybrid algorithms, 21 typical benchmark functions with different levels of complexity and multimodality were chosen from the global optimization literature [6]. One hundred runs were performed for each test function to estimate the probability of success of the methods. The test functions used are: Bohachevsky 1 (BO1), Bohachevsky 2 (BO2), Branin (BR), Camel (CA), Cosine mixture (CO), DeJoung (DE), Goldstein and Price (GO), Griewank (GR), Hansen (HN), Hartman 3 (HR3), Hartman 6 (HR6), Rastrigin (RA), Rosenbrock (RO), Shekel 5 (SK5), Shekel 7 (SK7), Shekel 10 (SK10), Shubert 1 (SH1), Shubert 2 (SH2), Shubert 3 (SH3), Shubert 4 (SH4) and the Wolfe nondifferentiable function (WO).
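For reference, two of the listed benchmarks can be written down in their standard textbook forms; the paper's exact variants and search domains may differ slightly, so these definitions are the commonly used ones rather than the authors' exact code.

```python
import numpy as np

def rastrigin(x):
    """RA: global minimum f = 0 at x = (0, ..., 0); many regularly spaced local minima."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def bohachevsky1(x):
    """BO1: global minimum f = 0 at (0, 0); a quadratic bowl with cosine ripples."""
    x1, x2 = x
    return (x1**2 + 2.0 * x2**2
            - 0.3 * np.cos(3.0 * np.pi * x1) - 0.4 * np.cos(4.0 * np.pi * x2) + 0.7)
```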
4 Numerical Results

In this section we focus on the efficiency of the six algorithms, i.e. Representation Formula (RF), Classical Genetic Algorithm (GA), Representation Formula with GA (RFGA), Genetic Algorithm using the Nelder Mead algorithm at the mutation stage (GANM), Representation Formula with GA and Nelder Mead (RFGANM), and Representation Formula with Nelder Mead (RFNM). A series of experiments was performed to analyze their behavior. To avoid attributing the optimization results to the choice of particular conditions, and to conduct fair comparisons, we performed each test 100 times, starting from various randomly selected points in the hyper-rectangular search domain. The parameters used in the genetic algorithm are: population size from 2 to 50; mutation rate set to 0.2; selection by rank weighting; the stopping criteria are the maximum number of iterations (set to 2000 for GA) and the maximum number of consecutive iterations without improvement of the solution (set to 1000 for GA). Concerning NM, we adopted the standard parameters recommended by the authors; the stopping criteria are: maximum number of function evaluations maxfun = 50000, maximum number of iterations maxiter = 10000, termination tolerance on the function value tolf = 10^-5, and termination tolerance on the variable tolx = 10^-6.
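The RFNM pipeline (an RF point followed by a simplex refinement) can be sketched as follows. This is a hedged sketch, not the authors' implementation: SciPy's Nelder-Mead is used as a stand-in for their simplex search (with tolerances mirroring the ones quoted above), and the multimodal test function, λ, and sample size are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def ripple(x):
    """Illustrative multimodal function: global minimum f = 0 at the origin,
    surrounded by rings of local minima (not one of the paper's benchmarks)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x**2 + 1.0 - np.cos(5.0 * x)))

def rfnm(f, lower, upper, lam=50.0, n_samples=50_000, seed=0):
    """RF stage: weighted sample mean with g(lam, s) = exp(-lam s) approximates x*.
    NM stage: refine the RF point with a Nelder-Mead simplex search."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, size=(n_samples, len(lower)))
    fx = np.array([f(xi) for xi in x])
    w = np.exp(-lam * (fx - fx.min()))            # shifted exponent avoids underflow
    x0 = (w[:, None] * x).sum(axis=0) / w.sum()   # RF initial guess
    res = minimize(f, x0, method="Nelder-Mead",
                   options={"maxfev": 50_000, "maxiter": 10_000,
                            "fatol": 1e-5, "xatol": 1e-6})
    return res.x, res.fun

x_best, f_best = rfnm(ripple, [-3.0, -3.0], [3.0, 3.0])
```

The RF stage places the starting point deep inside the attraction basin of the global minimum, so the local simplex search is not trapped by the surrounding local minima.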
Extensive experiments on the effect of different parameters have been performed: influence of the Pincus function, influence of the population size for GA, influence of the sample size for RF, and comparison with other methods. Because of space limitations, only a few experiments are presented, chosen to illustrate significant aspects. The following abbreviations are introduced to lighten the text: TestF: test function; Dim: dimension; PopGA: genetic algorithm population size; SS: sample size used in RF; SR: success rate; SD: standard deviation; CPUT: CPU time (in seconds); NEvalF: number of function evaluations. The reported results are in terms of the rate of successful optimizations (SR), the standard deviation, the CPU time and the average number of function evaluations. SR is the number of successful runs, i.e. runs in which the algorithm generates a solution with the required accuracy, where the 'required accuracy' is a given maximum value computed as the absolute difference between the solution found and the global optimum, divided by the global optimum (when it is non-zero). The chosen accuracy for all the tests is 10^-4.
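The success criterion just described can be made precise in a couple of lines. This is a sketch of our reading of the definition; in particular, falling back to the absolute difference when the global optimum is zero is an assumption suggested by the "when it is non zero" clause.

```python
def is_success(f_found, f_opt, tol=1e-4):
    """A run succeeds when the relative error |f_found - f_opt| / |f_opt| is
    within tol; when the global optimum is zero, the absolute difference is
    used instead (assumed fallback)."""
    err = abs(f_found - f_opt)
    return (err <= tol) if f_opt == 0 else (err / abs(f_opt) <= tol)

def success_rate(results, f_opt, tol=1e-4):
    """SR in percent over a list of best objective values, one per run."""
    return 100.0 * sum(is_success(f, f_opt, tol) for f in results) / len(results)
```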
4.1 Influence of the Pincus Function
In the representation formula, the Pincus expression corresponds to g(λ, s) = e^{-λs}, which is a convenient choice. In our experiments, four other descent functions (continuous and strictly decreasing) have been used for solving the benchmark functions with the proposed algorithms: $\frac{1}{\lambda s^3}$, $\frac{1}{\lambda \ln(s)}$, $e^{-\lambda s^3}$, and $10^{-\lambda s}$. The tests show that the choice of the function has no significant effect on the solution quality and the performance of the algorithms; only a small difference in execution time has been observed.

4.2 Influence of the Population Size Used in GA
To examine the effect of the population size on the solution quality, as well as on the computational effort required to reach the optimal solution using GA, GANM, RFGA and RFGANM, eight population sizes (2, 4, 6, 8, 12, 16, 20 and 50) are examined; the experimental results are reported in Table 1 for the GA and GANM methods and in Table 2 for RFGA and RFGANM. The sample size for RF depends on the function dimension: SS = 60 for Dim = 1 or 2, SS = 100 for Dim = 3, SS = 300 for Dim = 4 or 5, and SS = 500 for Dim = 6 or 10. A general observation is that, as the population size increases, all methods require more time (a larger number of function evaluations) and tend to find better solutions, as indicated by a smaller standard deviation and a higher success rate. For the more complex problems (e.g. the Rosenbrock function), GA and RFGA failed to find a solution with the required accuracy (SR = 0%) even for PopGA = 200. A slight improvement is obtained with the representation formula (RFGA). A worsening of the solution obtained by GA is observed for GANM (except for SK10, BO2, GR and RO). The results obtained by RFGANM are the best in terms of success rate (SR = 100% for PopGA = 50), at the cost of additional function evaluations.
Table 1. Influence of the population size for GA and GANM

| TestF | Dim | PopGA | GA SR | GA SD | GA CPUT | GA NEvalF | GANM SR | GANM SD | GANM CPUT | GANM NEvalF |
|---|---|---|---|---|---|---|---|---|---|---|
| SH4 | 4 | 12 | 60% | 2,13E-02 | 0,35 | 23 613 | 4% | 2,68E-01 | 14,88 | 213 782 |
| SH4 | 4 | 20 | 94% | 2,13E-02 | 0,43 | 39 860 | 19% | 3,16E-01 | 25,05 | 360 290 |
| SH4 | 4 | 50 | 100% | 5,77E-06 | 0,72 | 98 557 | 77% | 1,11E-01 | 74,01 | 1 060 096 |
| SK10 | 4 | 12 | 27% | 3,52E-01 | 0,80 | 23 812 | 83% | 2,08E-01 | 68,77 | 218 926 |
| SK10 | 4 | 20 | 33% | 3,33E-01 | 0,88 | 39 762 | 94% | 1,22E-01 | 120,73 | 381 753 |
| SK10 | 4 | 50 | 61% | 3,31E-01 | 1,12 | 95 898 | 100% | 5,64E-08 | 341,22 | 1 079 693 |
| BO2 | 2 | 12 | 24% | 1,07E-01 | 0,23 | 20 562 | 100% | 5,72E-11 | 0,29 | 6 359 |
| BO2 | 2 | 20 | 53% | 1,06E-01 | 0,25 | 31 919 | 100% | 2,68E-11 | 0,48 | 10 824 |
| BO2 | 2 | 50 | 85% | 7,61E-02 | 0,37 | 72 035 | 100% | 3,47E-12 | 1,24 | 28 147 |
| CA | 2 | 12 | 93% | 1,00E-04 | 0,24 | 20 489 | 98% | 1,11E-01 | 3,39 | 64 233 |
| CA | 2 | 20 | 100% | 1,39E-05 | 0,28 | 33 515 | 100% | 3,72E-08 | 5,74 | 109 435 |
| CA | 2 | 50 | 100% | 6,06E-06 | 0,43 | 77 729 | 100% | 3,72E-08 | 15,85 | 303 313 |
| CO | 4 | 12 | 100% | 1,16E-05 | 0,30 | 23 818 | 29% | 2,46E-01 | 1,87 | 35 774 |
| CO | 4 | 20 | 100% | 1,22E-06 | 0,35 | 39 336 | 65% | 2,02E-01 | 1,91 | 38 478 |
| CO | 4 | 50 | 100% | 1,09E-06 | 0,56 | 93 199 | 100% | 6,88E-08 | 2,41 | 54 340 |
| GR | 5 | 12 | 0% | 1,87E-01 | 0,33 | 23 998 | 93% | 4,17E-02 | 24,97 | 376 071 |
| GR | 5 | 20 | 1% | 1,75E-01 | 0,39 | 39 884 | 99% | 1,46E-02 | 42,64 | 642 752 |
| GR | 5 | 50 | 44% | 1,18E-01 | 0,67 | 102 050 | 100% | 1,95E-13 | 113,94 | 1 716 703 |
| HN | 2 | 12 | 81% | 5,06E-02 | 0,27 | 21 712 | 52% | 2,62E-01 | 4,69 | 80 671 |
| HN | 2 | 20 | 94% | 3,85E-02 | 0,32 | 36 018 | 76% | 1,29E-01 | 8,29 | 142 888 |
| HN | 2 | 50 | 100% | 3,19E-06 | 0,47 | 82 607 | 99% | 1,76E-02 | 22,37 | 384 416 |
| RA | 2 | 12 | 27% | 5,05E-03 | 0,22 | 19 169 | 51% | 1,24E+00 | 1,35 | 26 337 |
| RA | 2 | 20 | 64% | 3,62E-04 | 0,26 | 32 278 | 67% | 6,11E-01 | 1,57 | 31 274 |
| RA | 2 | 50 | 93% | 1,11E-04 | 0,34 | 67 520 | 98% | 1,40E-01 | 1,61 | 35 270 |
| RO | 10 | 12 | 0% | 3,45E+01 | 0,33 | 24 012 | 100% | 7,88E-12 | 44,78 | 683 973 |
| RO | 10 | 20 | 0% | 3,37E+01 | 0,40 | 40 020 | 100% | 5,49E-12 | 75,83 | 1 161 545 |
| RO | 10 | 50 | 0% | 2,92E+01 | 0,69 | 102 050 | 100% | 2,99E-12 | 200,54 | 3 071 508 |
| WO | 2 | 12 | 1% | 4,63E-02 | 0,23 | 20 512 | 100% | 0,00E+00 | 2,04 | 39 624 |
| WO | 2 | 20 | 6% | 2,59E-02 | 0,25 | 31 105 | 100% | 0,00E+00 | 3,55 | 69 076 |
| WO | 2 | 50 | 43% | 1,07E-02 | 0,37 | 70 768 | 100% | 0,00E+00 | 9,42 | 183 422 |

4.3 Influence of the Sample Size for RF
To investigate the effect of the sample size used in the representation formula on the search quality of the hybrid algorithms, we chose six levels of sample size (SS = 60, 100, 500, 1000, 2000 and 5000) and set the population size to 6. The experimental results are reported in Table 3. Because of space limitations, only 10 test functions are presented, chosen to illustrate significant aspects. We observe that the success rate and the number of function evaluations increase with the sample size. The RF method failed in almost all the tests (SR = 0% in most cases), except for the HR3 function (SR = 100% for SS = 5000), CA (SR = 97% for SS = 5000), SH1 (SR = 100% for SS = 60) and SH2 (SR = 100% for SS = 5000) with an accuracy of 10^-3. Regarding the RFGANM algorithm, the results improve (SR = 100%) as soon as SS ≥ 1000, except for SH4, SK10 and GR. A success rate of 100% is obtained for RFNM even for sample size SS = 60.

Table 2. Influence of the population size for RFGA and RFGANM

| TestF | Dim | PopGA | RFGA SR | RFGA SD | RFGA CPUT | RFGA NEvalF | RFGANM SR | RFGANM SD | RFGANM CPUT | RFGANM NEvalF |
|---|---|---|---|---|---|---|---|---|---|---|
| SH4 | 4 | 12 | 50% | 5,08E-02 | 0,57 | 41 613 | 90% | 6,42E-02 | 14,82 | 228 329 |
| SH4 | 4 | 20 | 89% | 2,13E-02 | 0,80 | 69 809 | 98% | 3,00E-02 | 25,57 | 391 947 |
| SH4 | 4 | 50 | 100% | 7,11E-06 | 1,63 | 172 398 | 100% | 5,05E-08 | 70,56 | 1 072 064 |
| SK10 | 4 | 12 | 45% | 2,64E-01 | 1,12 | 41 626 | 98% | 7,16E-02 | 68,23 | 234 259 |
| SK10 | 4 | 20 | 84% | 1,95E-01 | 1,41 | 69 439 | 100% | 5,64E-08 | 117,89 | 401 052 |
| SK10 | 4 | 50 | 94% | 1,22E-01 | 2,45 | 170 297 | 100% | 5,64E-08 | 342,74 | 1 155 194 |
| BO2 | 2 | 12 | 34% | 9,73E-02 | 0,26 | 24 310 | 98% | 3,07E-02 | 0,36 | 10 864 |
| BO2 | 2 | 20 | 62% | 8,04E-02 | 0,29 | 38 068 | 100% | 3,02E-11 | 0,50 | 16 495 |
| BO2 | 2 | 50 | 92% | 3,74E-02 | 0,43 | 82 245 | 100% | 4,28E-12 | 1,27 | 41 913 |
| CA | 2 | 12 | 96% | 4,03E-05 | 0,27 | 23 890 | 100% | 3,72E-08 | 3,35 | 66 514 |
| CA | 2 | 20 | 99% | 2,33E-05 | 0,31 | 38 013 | 100% | 3,72E-08 | 5,72 | 114 142 |
| CA | 2 | 50 | 100% | 4,52E-06 | 0,49 | 84 487 | 100% | 3,72E-08 | 15,76 | 314 430 |
| CO | 4 | 12 | 100% | 1,52E-05 | 0,41 | 41 790 | 100% | 6,88E-08 | 0,58 | 28 515 |
| CO | 4 | 20 | 100% | 1,36E-06 | 0,56 | 69 246 | 100% | 6,88E-08 | 0,97 | 47 799 |
| CO | 4 | 50 | 100% | 4,89E-07 | 1,07 | 170 920 | 100% | 6,88E-08 | 2,52 | 122 363 |
| GR | 5 | 12 | 0% | 1,76E-01 | 0,47 | 42 012 | 74% | 9,71E-02 | 25,04 | 392 890 |
| GR | 5 | 20 | 0% | 1,62E-01 | 0,64 | 70 011 | 88% | 5,05E-02 | 42,92 | 673 123 |
| GR | 5 | 50 | 46% | 1,24E-01 | 1,27 | 177 050 | 100% | 2,13E-13 | 114,24 | 1 787 308 |
| HN | 2 | 12 | 97% | 4,53E-05 | 0,32 | 25 478 | 100% | 6,37E-08 | 4,69 | 83 688 |
| HN | 2 | 20 | 100% | 1,31E-05 | 0,39 | 40 485 | 100% | 6,37E-08 | 8,08 | 144 040 |
| HN | 2 | 50 | 100% | 1,68E-06 | 0,65 | 93 454 | 100% | 6,37E-08 | 22,25 | 393 771 |
| RA | 2 | 12 | 26% | 1,43E-03 | 0,25 | 23 131 | 95% | 2,18E-01 | 0,39 | 11 430 |
| RA | 2 | 20 | 53% | 5,02E-04 | 0,29 | 36 968 | 98% | 1,40E-01 | 0,56 | 17 506 |
| RA | 2 | 50 | 95% | 7,22E-05 | 0,44 | 83 305 | 100% | 1,41E-11 | 1,28 | 42 039 |
| RO | 10 | 12 | 0% | 1,97E+00 | 0,60 | 54 012 | 100% | 8,42E-12 | 43,90 | 696 293 |
| RO | 10 | 20 | 0% | 9,84E-01 | 0,85 | 90 020 | 100% | 4,57E-12 | 74,82 | 1 188 783 |
| RO | 10 | 50 | 0% | 6,08E-01 | 1,80 | 227 050 | 100% | 2,36E-12 | 197,37 | 3 130 515 |
| WO | 2 | 12 | 0% | 6,09E-02 | 0,27 | 23 663 | 100% | 0,00E+00 | 2,05 | 42 744 |
| WO | 2 | 20 | 6% | 2,96E-02 | 0,30 | 35 050 | 100% | 0,00E+00 | 3,52 | 73 376 |
| WO | 2 | 50 | 46% | 6,06E-03 | 0,50 | 82 856 | 100% | 0,00E+00 | 9,55 | 198 171 |

4.4 Comparison Between Methods
The results presented in Tables 4 and 5 are based on PopGA = 12 for the GA and GANM algorithms, and PopGA = 6 for the others. The sample size chosen for the test functions depends on their dimension: SS = 60 for Dim = 1 or 2, 100 for Dim = 3, 500 for Dim = 4 or 5, 1000 for Dim = 6 and 2000 for Dim = 10. Tables 4 and 5 summarize the results (i.e. SR, SD, CPUT and NEvalF) obtained from the 100 runs of the six algorithms, for the 21 benchmark functions.

Table 3. Influence of the sample size used in RF

| TestF | Dim | SS | RF SR | RF NEvalF | RFGA SR | RFGA NEvalF | RFGANM SR | RFGANM NEvalF | RFNM SR | RFNM NEvalF |
|---|---|---|---|---|---|---|---|---|---|---|
| SH4 | 4 | 60 | 0% | 1 800 | 7% | 15 460 | 38% | 103 489 | 43% | 7 981 |
| SH4 | 4 | 100 | 0% | 3 000 | 9% | 16 892 | 34% | 104 957 | 47% | 9 137 |
| SH4 | 4 | 500 | 0% | 15 000 | 8% | 28 885 | 74% | 117 804 | 100% | 21 010 |
| SH4 | 4 | 5000 | 0% | 150 000 | 10% | 163 800 | 97% | 252 327 | 100% | 155 899 |
| SK10 | 4 | 60 | 0% | 1 800 | 0% | 15 470 | 78% | 108 783 | 100% | 8 002 |
| SK10 | 4 | 100 | 0% | 3 000 | 2% | 16 669 | 72% | 110 233 | 100% | 9 209 |
| SK10 | 4 | 500 | 0% | 15 000 | 4% | 28 783 | 91% | 121 826 | 100% | 21 198 |
| SK10 | 4 | 5000 | 0% | 150 000 | 1% | 163 758 | 99% | 255 924 | 100% | 155 946 |
| BO2 | 2 | 60 | 0% | 1 800 | 10% | 14 320 | 83% | 9 338 | 100% | 4 421 |
| BO2 | 2 | 100 | 0% | 3 000 | 9% | 15 477 | 78% | 11 953 | 100% | 5 532 |
| BO2 | 2 | 500 | 0% | 15 000 | 14% | 26 791 | 92% | 20 139 | 100% | 17 332 |
| BO2 | 2 | 5000 | 0% | 150 000 | 17% | 160 866 | 100% | 152 901 | 100% | 152 160 |
| CA | 2 | 60 | 1% | 1 800 | 68% | 13 957 | 100% | 32 760 | 100% | 4 338 |
| CA | 2 | 100 | 1% | 3 000 | 71% | 15 106 | 100% | 33 982 | 100% | 5 565 |
| CA | 2 | 500 | 1% | 15 000 | 71% | 27 101 | 100% | 45 952 | 100% | 17 481 |
| CA | 2 | 5000 | 11% | 150 000 | 73% | 160 307 | 100% | 180 812 | 100% | 152 405 |
| CO | 4 | 60 | 0% | 1 800 | 58% | 15 421 | 92% | 8 142 | 96% | 8 990 |
| CO | 4 | 100 | 0% | 3 000 | 50% | 16 752 | 93% | 9 167 | 96% | 10 219 |
| CO | 4 | 500 | 0% | 15 000 | 43% | 28 507 | 100% | 20 130 | 100% | 22 540 |
| CO | 4 | 5000 | 0% | 150 000 | 52% | 163 426 | 100% | 155 101 | 100% | 157 744 |
| GR | 5 | 60 | 0% | 1 800 | 0% | 15 685 | 43% | 183 996 | 97% | 27 227 |
| GR | 5 | 100 | 0% | 3 000 | 0% | 16 866 | 32% | 185 066 | 100% | 28 212 |
| GR | 5 | 500 | 0% | 15 000 | 0% | 28 872 | 49% | 196 576 | 100% | 40 228 |
| GR | 5 | 5000 | 0% | 150 000 | 0% | 163 962 | 33% | 330 770 | 100% | 172 940 |
| HN | 2 | 60 | 1% | 1 800 | 63% | 14 338 | 100% | 41 534 | 100% | 4 008 |
| HN | 2 | 100 | 2% | 3 000 | 63% | 15 387 | 100% | 42 604 | 100% | 5 252 |
| HN | 2 | 500 | 1% | 15 000 | 57% | 27 103 | 100% | 54 571 | 100% | 17 107 |
| HN | 2 | 5000 | 14% | 150 000 | 72% | 161 323 | 100% | 189 857 | 100% | 152 137 |
| RA | 2 | 60 | 0% | 1 800 | 7% | 13 856 | 64% | 10 726 | 93% | 4 054 |
| RA | 2 | 100 | 0% | 3 000 | 13% | 15 105 | 79% | 9 321 | 100% | 5 256 |
| RA | 2 | 500 | 0% | 15 000 | 8% | 26 663 | 96% | 18 566 | 100% | 17 306 |
| RA | 2 | 5000 | 1% | 150 000 | 5% | 160 487 | 100% | 152 868 | 100% | 152 329 |
| RO | 10 | 60 | 0% | 1 800 | 0% | 15 806 | 99% | 326 193 | 96% | 133 508 |
| RO | 10 | 100 | 0% | 3 000 | 0% | 16 979 | 99% | 328 651 | 100% | 133 634 |
| RO | 10 | 500 | 0% | 15 000 | 0% | 29 006 | 100% | 341 868 | 100% | 146 056 |
| RO | 10 | 5000 | 0% | 150 000 | 0% | 163 858 | 100% | 477 806 | 100% | 281 391 |
| WO | 2 | 60 | 0% | 1 800 | 0% | 14 399 | 100% | 21 077 | 100% | 6 467 |
| WO | 2 | 100 | 0% | 3 000 | 0% | 15 090 | 100% | 22 260 | 100% | 7 567 |
| WO | 2 | 500 | 0% | 15 000 | 0% | 27 260 | 100% | 34 235 | 100% | 19 385 |
| WO | 2 | 5000 | 0% | 150 000 | 0% | 160 978 | 100% | 169 121 | 100% | 154 137 |
Table 4. Comparison between methods: GA, GANM and RF. [Table flattened in extraction: for each of the 21 test functions (Num, FTest, Dim; BO1, BO2, BR, CA, CO, DE, GO, GR, HN, HR3, HR6, RA, RO, SH1, SH2, SH3, SH4, SK10, SK5, SK7, WO), the table reports SR, SD, CPUT and NEvalF for GA, GANM and RF; the individual entries are not recoverable.]
A Hybrid Simplex Search for Global Optimization 11
Table 5. Comparison between methods: RFGA, RFGANM and RFNM. [Table flattened in extraction: same layout as Table 4, reporting SR, SD, CPUT and NEvalF for RFGA, RFGANM and RFNM over the 21 test functions; the individual entries are not recoverable.]
12 H. Zidani et al.
From Tables 4 and 5, we notice that:
– The success rates for GA are generally modest (4% to 70%, and SR = 0% for GR and RO), except in the case of SH1 and HR3 (80% and 90%, respectively).
– The results are improved for GANM, for which the success rate is 100% for 7 test functions, and SR ≥ 90% for 6 test functions. These results are similar to those obtained in [1].
– The representation formula RF has failed for almost all tests, except for SH1 where SR = 67% (SR = 100% for an accuracy of 10−3). We notice that the number of function evaluations increased considerably.
– The accuracy of RFGA is lower than GANM, with a larger number of function evaluations. Total success is obtained only for 5 functions (SR ≥ 99%). The number of function evaluations is similar to the RFGA method.
– In view of the algorithm's effectiveness and efficiency on the whole, the RFGANM hybrid approach remains the closest competitor to RFNM among the other methods. Indeed, the SR is 100% for 12 test functions, and SR ≥ 82% for 18 functions (SR = 100% for PopGA = 50, see Table 2, for all the test functions). The number of function evaluations is similar to that of the RFGA method.
– Concerning RFNM, the experimental data obtained from the 21 test functions show high accuracy for all the test problems, with a 100% success rate on all examples and a smaller number of function evaluations than RFGANM and RFGA.

4.5 Comparison with Other Methods
In this section, the experiments aim to compare the performance of RFNM against the five global optimization methods listed below (Table 6). In order to make the results comparable, all the conditions are set to the same values (100 runs, and the accuracy for SR is set to 10−4). In previous tests, we used the same parameters for all test functions, and the results showed that the RFNM method is robust (SR = 100%). To compare it to the algorithms listed in the table, we took appropriate settings for each function.

Table 6. Comparison with other methods, list of methods

Methods                                           References
Representation Formula with Nelder Mead (RFNM)    This work
Enhanced Continuous Tabu Search (ECTS)            [2]
Staged Continuous Tabu Search (SCTS)              [7]
Continuous Hybrid Algorithm (CHA)                 [7]
Differential Evolution (DE)                       [7]
LPτNM Algorithm (LPτNM)                           [7]
Table 7. Comparison with other methods in terms of success rate and number of function evaluations

TestF(Dim)  LPτNM NEvalF(SR)  SCTS NEvalF(SR)  CHA NEvalF(SR)  DE NEvalF(SR)  RFNM NEvalF(SR)
SH2(2)      303(85%)          370(100%)        345(100%)       4498(95%)      677(100%)
GO(2)       182(100%)         231(100%)        259(100%)       595(100%)      286(100%)
BR(2)       247(100%)         245(100%)        295(100%)       807(100%)      165(100%)
HR3(3)      292(100%)         548(100%)        492(100%)       679(100%)      312(100%)
SK10(4)     1079(96%)         898(75%)         635(85%)        -(-%)          1680(100%)
SK7(4)      837(100%)         910(80%)         620(85%)        3064(100%)     1059(100%)
SK5(4)      839(100%)         825(75%)         698(85%)        3920(100%)     874(100%)
HR6(6)      1552(100%)        1520(100%)       930(100%)       -(-%)          2551(100%)
RO(10)      9188(88%)         15720(85%)       14532(83%)      54134(100%)    42572(100%)
Table 7 shows that RFNM performs better than the other algorithms for three functions (BR, SK10, RO), with a success rate of 100%, and with some additional function evaluations in some other cases.
5 Conclusion

In this paper, we proposed a new approach based on a representation formula for solving global optimization problems. Simulated experiments on the optimization of nonlinear multimodal and nondifferentiable functions using this representation formula, hybridized with the Genetic and Nelder-Mead algorithms, showed that RFNM is superior to the other methods in finding the global optimum. The RFGANM remains robust for high values of the population size in GA, at the cost of a larger number of function evaluations. Extensive experiments concerning the effect of different parameters have been performed: the influence of the choice of the probability distribution, of the Pincus function, of the population size for GA, and of the sample size for RF. A comparison with other algorithms suggested in the literature shows that RFNM is in general superior in efficiency. Further research will include combining other stochastic algorithms, such as particle swarm optimization, with the representation formula.
References

1. Chelouah, R., Siarry, P.: Genetic and Nelder-Mead algorithms hybridized for a more accurate global optimization of continuous multiminima functions. Eur. J. Oper. Res. 148(2), 335–348 (2003)
2. Chelouah, R., Siarry, P.: A hybrid method combining continuous tabu search and Nelder-Mead simplex algorithms for the global optimization of multiminima functions. Eur. J. Oper. Res. 161(3), 636–654 (2005)
3. Souza de Cursi, J.: Representation of solutions in variational calculus. In: Variational Formulations in Mechanics: Theory and Applications, pp. 87–106 (2007)
4. Souza de Cursi, J., El Hami, A.: Representation of solutions in continuous optimization. Int. J. Simul. Multidiscip. Design Optim. (2009)
5. Floudas, C.A., Pardalos, P.M. (eds.): Encyclopedia of Optimization. Springer, Boston (2009)
6. Gaviano, M., Kvasov, D., Lera, D., Sergeyev, Y.D.: Software for generation of classes of test functions with known local and global minima for global optimization. ACM Trans. Math. Softw. 29(4), 469–480 (2003)
7. Georgieva, A., Jordanov, I.: A hybrid meta-heuristic for global optimisation using low-discrepancy sequences of points. Comput. Oper. Res. 37(3), 456–469 (2010)
8. Ivorra, B., Mohammadi, B., Ramos, A.M., Redont, I.: Optimizing initial guesses to improve global minimization. Pre-publication MA-UCM-No 2008-06, Department of Applied Mathematics, Universidad Complutense de Madrid, Plaza de Ciencias 3, 28040 Madrid, Spain, p. 17 (2008)
9. Pincus, M.: A closed formula solution of certain programming problems. Oper. Res. 16(3), 690–694 (1968)
A Population-Based Stochastic Coordinate Descent Method

Ana Maria A. C. Rocha 1,2 (B), M. Fernanda P. Costa 3,4, and Edite M. G. P. Fernandes 1

1 ALGORITMI Center, University of Minho, 4710-057 Braga, Portugal
  {arocha,emgpf}@dps.uminho.pt
2 Department of Production and Systems, University of Minho, Braga, Portugal
3 Centre of Mathematics, University of Minho, 4710-057 Braga, Portugal
  [email protected]
4 Department of Mathematics, 4800-058 Guimarães, Portugal
Abstract. This paper addresses the problem of solving a bound constrained global optimization problem by a population-based stochastic coordinate descent method. To improve efficiency, a small subpopulation of points is randomly selected from the original population at each iteration. The coordinate descent directions are based on the gradient computed at a special point of the subpopulation. This point could be the best point, the center point or the point with highest score. Preliminary numerical experiments are carried out to compare the performance of the tested variants. Based on the results obtained with the selected problems, we may conclude that the variants based on the point with highest score are the most robust and the variants based on the best point the least robust, although the latter win on efficiency, but only for the simpler, easy-to-solve problems.

Keywords: Global optimization · Stochastic coordinate descent

1 Introduction
© Springer Nature Switzerland AG 2020. H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 16–25, 2020. https://doi.org/10.1007/978-3-030-21803-4_2

The optimization methods for solving problems that have a big size of data, like large-scale machine learning, can make use of classical gradient-based methods, namely the full gradient, accelerated gradient and conjugate gradient methods, classified as batch approaches [1]. Using intuitive schemes to reduce the information data, the stochastic gradient approaches have shown to be more efficient than the batch methods. An appropriate approach to solve this type of problems is through coordinate descent methods. Despite the fact that they were the first optimization methods to appear in the literature, they have received much attention recently. Although the global optimization (GO) problem addressed in this paper does not have a big size of data, the herein proposed solution method is iterative, stochastic and relies on a population of candidate solutions at each iteration. Thus, a large amount of calculations may be required at each iteration. To improve efficiency, we borrow some of the ideas that are present in machine learning techniques and propose a population-based stochastic coordinate descent method. This paper comes in the sequence of the work presented in [2]. We consider the problem of finding a global solution of a bound constrained nonlinear optimization problem in the following form:

min f(x) subject to x ∈ Ω,    (1)
where f : R^n → R is a nonlinear function and Ω = {x ∈ R^n : −∞ < l_i ≤ x_i ≤ u_i < ∞, i = 1, . . . , n} is a bounded feasible region. We assume that the objective function f is differentiable, nonconvex and may possess many local minima in the set Ω. We assume that the optimal set X* of the problem (1) is nonempty and bounded, x* is a global minimizer and f* represents the global optimal value. To solve the GO problem (1), a stochastic or a deterministic method may be selected. A stochastic method provides a solution, in general in a short CPU time, although it may not be globally optimal. On the other hand, a deterministic method is able to compute an interval that contains the global optimal solution, but requires a much larger computational effort [3]. To generate good solutions with less computational effort and time, approximate methods or heuristics may be used. Some heuristics use random procedures to generate candidate solutions and perform a series of operations on those solutions in order to find different and hopefully better solutions. They are known as stochastic heuristics. A method for GO has two main goals. One intends to explore the search domain for the region where the global optimal solution lies, the other intensifies the search in a vicinity of a promising region in order to compute a high quality approximation. This paper aims to present a practical study involving several variants of a population-based stochastic method for solving the GO problem (1). Since our goal is to make the method robust and as efficient as possible, a strategy based on coordinate descent directions is applied. Although a population of candidate solutions/points of large size is initially generated, only a very small subset of those points is randomly selected, at each iteration – henceforward denoted as subpopulation – to provide an appropriate approximation, at the end of each iteration.
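Since Ω is an axis-aligned box, the projection onto Ω (used later whenever a moved point violates the bounds) reduces to clipping each coordinate to [l_i, u_i]. A minimal sketch; the function name is illustrative, not from the paper:

```python
# Coordinate-wise projection onto the box Omega = {x : l <= x <= u}.
# The name project_onto_box is illustrative, not from the paper.
def project_onto_box(x, l, u):
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, l, u)]

print(project_onto_box([-3.0, 0.5, 7.0], [-1.0] * 3, [5.0] * 3))  # → [-1.0, 0.5, 5.0]
```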
Since robustness of the method is to be privileged, the point of each subpopulation that is used to define the search direction to move each point of the subpopulation is carefully chosen in order to potentiate both the exploration and exploitation abilities of the method. The point with the highest score of the subpopulation is proposed. A comparison with the best and the center points is also carried out. This paper is organized as follows. Section 2 briefly presents the coordinate descent method and Sect. 3 describes the herein proposed stochastic coordinate descent method when applied to a population of points. Finally, Sect. 4 contains the results of our preliminary numerical experiments and we conclude the paper in Sect. 5.
2 Coordinate Descent Method
This section briefly presents the coordinate descent method (CDM) and its stochastic variant. The CDM operates by taking steps along the coordinate directions [1,4]. Hence, the search direction for minimizing f from the iterate x^k, at iteration k, is defined as

d^k = −∇_{i_k} f(x^k) e_{i_k}    (2)

where ∇_{i_k} f(·) represents the component i_k of the gradient of f, e_{i_k} represents the i_k-th coordinate vector for some index i_k, usually chosen by cycling through {1, 2, . . . , n}, and x_{i_k} is the i_k-th component of the vector x ∈ R^n. For a positive step size α_k, the new approximation x^{k+1} differs from x^k only in the i_k-th component and is computed by x^{k+1} = x^k + α_k d^k. Note that the direction shown in (2) might not be a negative directional derivative for f at x^k. When the index i_k that defines the search direction and the component of x^k to be adjusted is chosen randomly by the Uniform distribution (U) on {1, 2, . . . , n}, with or without replacement, the CDM is known as a stochastic CDM. This type of method has attracted the attention of the scientific community because of its usefulness in data analysis and machine learning. Applications are varied, in particular in support vector machine problems [5].
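A minimal sketch of the stochastic CDM update (2) with i_k drawn uniformly with replacement. The toy objective, step size and iteration count are illustrative, not from the paper:

```python
import random

# One-coordinate stochastic descent on the toy function
# f(x) = sum_i (x_i - i)^2, whose minimizer is (0, 1, 2).
def grad_component(x, i):
    return 2.0 * (x[i] - i)                # partial derivative df/dx_i

def stochastic_cd(x, steps=2000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        i = rng.randrange(len(x))          # index i_k drawn by U on {1,...,n}
        x[i] -= alpha * grad_component(x, i)   # update touches only x_i
    return x

print([round(v, 3) for v in stochastic_cd([0.0, 0.0, 0.0])])  # → [0.0, 1.0, 2.0]
```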
3 A Population-Based Stochastic Coordinate Descent Method
At each iteration of a population-based algorithm, a set of points is generated aiming to explore the feasible region for a global optimum. Let |P| denote the number of points in the population, where x^i ∈ R^n represents the point with index i of the population, i ∈ P = {1, 2, . . . , |P|}. The likelihood is that the greater the |P|, the better the exploration feature of the algorithm. However, handling and evaluating the objective f for a large number of points is time consuming. In order to improve the efficiency of the method, the number of function evaluations must be reduced. Thus, the method is based on a random selection of points from the original population at each iteration k – herein designated by the subpopulation k. At each iteration, a subpopulation of points (of small size) is selected to be evaluated and potentially moved in the direction of the global optimum. This random selection uses U either with or without replacement to select the indices for the subpopulation from the set {1, 2, . . . , |P|}. Let P_1, P_2, . . . , P_k, . . . be the sets of indices of the subpopulations randomly chosen from P. At each iteration k there is a special point that is maintained for the next iteration. This point is the best point of the current subpopulation. This way, for k > 1, the randomly selected set P_k does not include the index of the best point from the previous subpopulation and the size of the subpopulation k is |P_k| + 1. We note that the size of the subpopulation at the first iteration is |P_1|. The subsets of indices when generating the subpopulation satisfy the following
conditions: (i) P_1 ⊂ P and P_{k+1} ⊂ P \ {k_b} for k ≥ 1; (ii) |P_1| ≪ |P|; (iii) |P_2| + 1 ≤ |P_1| and |P_{k+1}| ≤ |P_k| for k > 1; where k_b is the index of the best point of the subpopulation k. Onwards, P_1^+ = P_1 and P_{k+1}^+ = P_{k+1} ∪ {k_b} for k ≥ 1 are used for simplicity [2]. We now show how each point x^{k_j} (j = 1, . . . , |P_k^+|) of the subpopulation k is moved. For each point, a search direction is generated. Thus, the point x^{k_j} may be moved along the direction d^{k_j} as follows:

x^{k_j} = x^{k_j} + α_{k_j} d^{k_j}    (3)
where 0 < α_{k_j} ≤ 1 is the step length computed by a backtracking strategy. The direction d^{k_j} used to move the point x^{k_j} is defined by

d^{k_j} = −∇_i f(x^{k_H}) e_i    (4)
where e_i represents the i-th coordinate vector for some index i, randomly selected from the set {1, 2, . . . , n}. We note that the search direction is along a component of the gradient computed at a special point of the subpopulation k, x^{k_H}, further on denoted by the point with the highest score. Since d^{k_j} might not be a descent direction for f at x^{k_j}, the movement according to (3) is applied only if d^{k_j} is descent for f at x^{k_j}. Otherwise, the point x^{k_j} is not moved. Whenever the new position of the point falls outside the bounds, a projection onto Ω is carried out. The index of the point with highest score, k_H, at iteration k, satisfies

k_H = arg max_{j=1,...,|P_k^+|} s(x^{k_j}), where s(x^{k_i}) = D̂(x^{k_i}) − f̂(x^{k_i})    (5)
is the score of the point x^{k_i} [6]. The normalized distance D̂(x^{k_i}), from x^{k_i} to the center point of the subpopulation k, and the normalized objective function value f̂(x^{k_i}) at x^{k_i} are defined by

D̂(x^{k_i}) = (D(x^{k_i}) − min_{j=1,...,|P_k^+|} D(x^{k_j})) / (max_{j=1,...,|P_k^+|} D(x^{k_j}) − min_{j=1,...,|P_k^+|} D(x^{k_j}))    (6)

and

f̂(x^{k_i}) = (f(x^{k_i}) − min_{j=1,...,|P_k^+|} f(x^{k_j})) / (max_{j=1,...,|P_k^+|} f(x^{k_j}) − min_{j=1,...,|P_k^+|} f(x^{k_j}))    (7)

respectively. The distance function D(x^{k_i}) (to the center point x̄^k) is measured by ‖x^{k_i} − x̄^k‖_2, and the center point is evaluated as follows:

x̄^k = (1/|P_k^+|) Σ_{j=1}^{|P_k^+|} x^{k_j}    (8)
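The selection rule built from the score (5) with the normalized distance (6) and normalized objective (7) can be sketched in code. The subpopulation, objective and names below are toy illustrations, not from the paper:

```python
import math

# Score s(x) = D_hat(x) - f_hat(x): prefer points that are far from the
# subpopulation center (exploration) and have a low objective value
# (exploitation). Both quantities are min-max normalized over the
# subpopulation.
def highest_score_index(points, f):
    n = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(n)]
    dist = [math.dist(p, center) for p in points]   # D(x) = ||x - center||_2
    fval = [f(p) for p in points]

    def normalize(v):
        lo, hi = min(v), max(v)
        return [(vi - lo) / (hi - lo) if hi > lo else 0.0 for vi in v]

    d_hat, f_hat = normalize(dist), normalize(fval)
    scores = [d - fv for d, fv in zip(d_hat, f_hat)]
    return max(range(len(points)), key=scores.__getitem__)

# Toy subpopulation: the point that is both far from the center and has
# the lowest f value gets the highest score.
pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
obj = lambda x: (x[0] - 4) ** 2 + (x[1] - 4) ** 2
print(highest_score_index(pts, obj))   # → 3
```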
We note here that the point with the highest score in each subpopulation is the point that lies far away from the center of the region defined by its points (translated by x̄) and has the lowest function value. This way, looking for the largest distance to x̄, the algorithm potentiates its exploration ability, and choosing the one with the lowest f value, the algorithm reinforces its local exploitation capability. For each point with index k_j, j = 1, . . . , |P_k^+|, the gradient coordinate index i may be randomly selected by U on the set {1, 2, . . . , n}, one at a time for each k_j, with replacement. However, the random choice may also be done using U on {1, 2, . . . , n} but without replacement. In this latter case, when all indices have been chosen, the set {1, 2, . . . , n} is shuffled [5]. The stopping condition of our population-based stochastic coordinate descent algorithm aims to guarantee a solution in the vicinity of f*. Thus, if

|f(x^{k_b}) − f*| ≤ ε|f*| + ε²,    (9)

where x^{k_b} is the best point of the subpopulation k and f* is the known global optimum, is satisfied for a given tolerance ε > 0, the algorithm stops. Otherwise, the algorithm runs until a specified number of function evaluations, nfmax, is reached. The main steps of the algorithm are shown in Algorithm 1.
Randomly generate the population in Ω
repeat
    Randomly select a subpopulation for iteration k and select x^{k_H}
    for each point x^{k_j} in the subpopulation do
        Randomly select i ∈ {1, . . . , n} to choose the component of ∇f at x^{k_H}
        Compute the search direction d^{k_j} according to (4)
        if d^{k_j} is descent for f at x^{k_j} then
            Move x^{k_j} according to (3)
    Select the best point x^{k_b} of the subpopulation
until (9) is satisfied or the number of function evaluations exceeds nfmax

Algorithm 1. Population-based stochastic coordinate descent algorithm
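Algorithm 1 can be condensed into a short runnable sketch. The version below is the best-point variant (the paper's main proposal uses the highest-score point x^{k_H} instead), minimizes the sphere function over a box, and uses a simple step-halving backtracking rule; all names, constants and these simplifications are illustrative, not the authors' implementation:

```python
import random

# Sketch of Algorithm 1 (best-point variant) on f(x) = sum(x_i^2)
# over the box [-5, 5]^n. Population sizes, iteration budget and the
# halving backtracking rule are illustrative choices, not the paper's.
def f(x):
    return sum(xi * xi for xi in x)

def grad_i(x, i):
    return 2.0 * x[i]                          # analytic partial df/dx_i

def pbscd(n=2, pop=30, sub=6, iters=300, lo=-5.0, hi=5.0, seed=7):
    rng = random.Random(seed)
    P = [[rng.uniform(lo, hi) for _ in range(n)] for _ in range(pop)]
    best = min(P, key=f)                       # special point of iteration k
    for _ in range(iters):
        idx = rng.sample(range(pop), sub)      # random subpopulation P_k
        for j in idx:
            x, i = P[j], rng.randrange(n)      # coordinate index i by U
            d = -grad_i(best, i)               # d = -grad_i f(best) e_i
            if grad_i(x, i) * d >= 0.0:        # move only along descent dirs
                continue
            alpha, fx = 1.0, f(x)
            while alpha > 1e-10:               # backtracking step length
                trial = list(x)
                trial[i] = min(max(x[i] + alpha * d, lo), hi)  # project on box
                if f(trial) < fx:
                    P[j] = trial
                    break
                alpha *= 0.5
        best = min([best] + [P[j] for j in idx], key=f)  # keep best point
    return best

best = pbscd()
print(round(f(best), 8))
```

With the sphere function, accepted moves shrink one coordinate of a point at a time, and the retained best point steadily approaches the global minimizer at the origin.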
4 Numerical Experiments
During the preliminary numerical experiments, well-known benchmark problems are used: BO (Booth, n = 2), BP (Branin, n = 2), CB6 (Camel6, n = 2), DA (Dekkers & Aarts, n = 2), GP (Goldstein & Price, n = 2), HSK (Hosaki, n = 2), MT (Matyas, n = 2), MC (McCormick, n = 2), MHB (Modified Himmelblau, n = 2), NF2 (Neumaier2, n = 4), PWQ (Powell Quadratic, n = 4), RG-2, RG-5, RG-10 (Rastrigin, n = 2, n = 5, n = 10), RB (Rosenbrock, n = 2), WF (Wood, n = 4); see the full description in [7]. The MATLAB (MATLAB is a registered trademark of The MathWorks, Inc.) programming language is used to code the algorithm and the tested problems. The parameter values are set as follows: |P| = 500, |P_1| = 0.01|P|, |P_k| = |P_1| − 1 for all k > 1, ε = 1E−04 and nfmax = 50000.
In our previous work [2], we have used the gradient computed at x̄. Besides this variant, we have also tested a variant where the gradient is computed at the best point of the subpopulation. These variants are now compared with the new strategy based on the gradient computed at the point with highest score, summarized in the previous section. All the tested variants are termed as follows:
– best w (best wout): gradient computed at the best point, with the coordinate index i (see (4)) randomly selected by U with (without) replacement;
– center w (center wout): gradient computed at x̄, with the coordinate index i randomly selected by U with (without) replacement;
– hscore w (hscore wout): gradient computed at the point with highest score, with the coordinate index i randomly selected by U with (without) replacement;
– best full g (center full g / hscore full g): using the full gradient computed at the best point (x̄ / the point with highest score) to define the search direction.
Each variant was run 30 times with each problem. Tables 1 and 2 show the average of the obtained f solution values over the 30 runs, favg, the minimum f solution value obtained after the 30 runs, fmin, the average number of function evaluations, nfavg, and the percentage of successful runs, %s, for the variants best w, center w, hscore w and best wout, center wout, hscore wout. A successful run is a run which stops with the stopping condition for the specified ε, see (9). The other statistics also reported in the tables are: (i) the % of problems with 100% of successful runs (% prob 100%); (ii) the average nf in problems with 100% of successful runs (nfavg 100%); (iii) the average nf in problems with 100% of successful runs simultaneously in the 3 variants tested in each table (nfavg all100%). A result printed in 'bold' refers to the best variant shown and compared in that particular table.
From the results, we may conclude that choosing the coordinate index i (see (4)) with or without replacement has no influence on the robustness and efficiency of the variant based on the gradient computed at the best point. Variants best w and best wout are the least robust, and variants center w, hscore w and hscore wout are the most robust. When computing the average number of function evaluations for the problems that have 100% of successful runs in all the 3 tested variants, best w wins, followed by hscore w and then by center w (the same is true for best wout, hscore wout and center wout). We remark that these average numbers of evaluations correspond to the simpler and easy to solve problems. For the most difficult and larger problems, the variants hscore w (75% against 50% and 69%) and hscore wout (69% against 50% and 63%) win as far as robustness is concerned. This justifies their larger nfavg 100% values. The results reported in Table 3 aim to show that robustness has not been improved when the full gradient is used. All the values and statistics have the same meaning as in the previous tables. Similarly, the variant based on the gradient computed at the best point reports the lowest nfavg all100% but also reaches the lowest % prob 100%. The use of the full gradient has deteriorated the results, mostly for the variant center full g when compared with both center w and center wout.
Table 1. Results based on the use of one coordinate of the gradient, randomly selected with replacement. [Table flattened in extraction: for each test problem and for the variants best w, center w and hscore w, the table reports favg, fmin, nfavg and %s, together with the statistics % prob 100%, nfavg 100% and nfavg all100%; the individual entries are not recoverable.]
Table 2. Results based on the use of one coordinate of the gradient, randomly selected without replacement. [Table flattened in extraction: same layout as Table 1, for the variants best wout, center wout and hscore wout; the individual entries are not recoverable.]
Table 3. Results based on the use of the full gradient. [Table flattened in extraction: for each test problem and for the variants best full g, center full g and hscore full g, the table reports favg, nfavg and %s, together with the statistics % prob 100%, nfavg 100% and nfavg all100%; the individual entries are not reliably recoverable.]
Table 4 compares the results obtained with five of the above mentioned problems with those presented in [2]. The comparison involves the three tested variants center w, hscore w and hscore wout, which provided the highest percentages of successful runs: 69%, 75% and 69%, respectively. This table reports the values of favg and nfavg after 30 runs. We note that the stopping condition used herein is the same as that of [2]. All reported variants have 100% of successful runs when solving GP, MHB, RG-2 and RG-5. However, only the variant hscore w reaches 100% success when solving RG-10 (see the last row in the table).

Table 4. Comparative results.

        Results in [2]     center w           hscore w           hscore wout
        favg      nfavg    favg      nfavg    favg      nfavg    favg      nfavg
GP      3.00E+00  833      3.00E+00  1262     3.00E+00  1564     3.00E+00  1518
MHB     5.10E-09  1229     5.53E-09  1721     4.25E-09  1450     3.79E-09  1357
RG-2    3.40E-09  1502     4.57E-09  1505     4.16E-09  2074     4.14E-09  2082
RG-5    3.76E-09  7759     3.52E-09  13576    4.03E-09  5918     3.86E-09  6981
RG-10   2.65E-01  30104    3.32E-02  13911    3.16E-09  20202    5.59E-08  22677
(% s)             (77)               (97)               (100)              (97)
5 Conclusions
In this paper, we present a population-based stochastic coordinate descent method for bound constrained GO problems. Several variants are compared in order to find the most robust, especially when difficult and larger problems are considered. The idea of using the point with highest score to generate the coordinate descent directions to move all the points of the subpopulation has shown to be more robust than the other tested ideas and worth pursuing. Future work will be directed to include, in the set of tested problems, instances with varied dimensions to analyze the influence of the dimension n on the performance of the algorithm. Another matter is related to choosing a specified (yet small) number of gradient coordinate indices (rather than just one) by the uniform distribution on the set {1, 2, . . . , n}, to move each point of the subpopulation.

Acknowledgments. This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the Projects Scope: UID/CEC/00319/2019 and UID/MAT/00013/2013.
References

1. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. Technical Report arXiv:1606.04838v3, Computer Sciences Department, University of Wisconsin-Madison (2018)
2. Rocha, A.M.A.C., Costa, M.F.P., Fernandes, E.M.G.P.: A stochastic coordinate descent for bound constrained global optimization. AIP Conf. Proc. 2070, 020014 (2019)
3. Kvasov, D.E., Mukhametzhanov, M.S.: Metaheuristic vs. deterministic global optimization algorithms: the univariate case. Appl. Math. Comput. 318, 245–259 (2018)
4. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
5. Wright, S.J.: Coordinate descent algorithms. Math. Program. Ser. B 151(1), 3–34 (2015)
6. Liu, H., Xu, S., Chen, X., Wang, X., Ma, Q.: Constrained global optimization via a DIRECT-type constraint-handling technique and an adaptive metamodeling strategy. Struct. Multidisc. Optim. 55(1), 155–177 (2017)
7. Ali, M.M., Khompatraporn, C., Zabinsky, Z.B.: A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J. Glob. Optim. 31(4), 635–672 (2005)
A Sequential Linear Programming Algorithm for Continuous and Mixed-Integer Nonconvex Quadratic Programming

Mohand Bentobache, Mohamed Telli, and Abdelkader Mokhtari

Laboratory of Pure and Applied Mathematics, University Amar Telidji of Laghouat, BP 37G, Ghardaïa Road, 03000 Laghouat, Algeria
[email protected], [email protected], [email protected]
Abstract. In this work, we propose a new approach called the "Sequential Linear Programming (SLP) algorithm" for finding an approximate global minimum of continuous and mixed-integer nonconvex quadratic programs (qps). In order to compare our algorithm with the existing approaches, we developed a MATLAB implementation and present numerical experiments which compare the performance of our algorithm with the branch and cut algorithm implemented in CPLEX12.8 on 28 concave quadratic test problems, 64 nonconvex quadratic test problems and 12 mixed-integer nonconvex qps. The numerical results show that our algorithm successfully found the same global objective values as CPLEX12.8 on almost all the considered test problems, and that it is competitive with CPLEX12.8, particularly in solving large problems (number of variables greater than 50 and less than 1000). Keywords: Concave quadratic programming · Nonconvex quadratic programming · Mixed-integer quadratic programming · Linear programming · Approximate global optimum · Extreme point · Numerical experiments
1 Introduction
Nonconvex quadratic programming is a very important branch of optimization. No polynomial-time algorithm is known for finding the global optimum of nonconvex quadratic programs, so they are considered NP-hard optimization problems. Several approaches have been proposed for finding local optimal solutions (DCA [16], interior-point methods [2], a simplex algorithm for the concave quadratic case [4], etc.) and approximate global solutions (branch and cut [18], branch and bound [10,12,13], DC combined with branch and bound [1], integer linear programming reformulation approaches [21], approximation set and linear programming (LP) approaches [3,5,17], etc.)

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 26–36, 2020. https://doi.org/10.1007/978-3-030-21803-4_3
In [5], a new and very interesting approach based on the concept of an approximation set and LP for finding an approximate global solution of a strictly concave quadratic program with inequality and nonnegativity constraints is proposed. This approach computes a finite number of feasible points on the level line passing through the initial feasible solution, then solves a sequence of linear programs. After that, the current point is improved using the global optimality criterion proposed in [9]. In [3,17], the previous approach is adapted and extended to solve concave quadratic programs written in general form (the matrix of the quadratic form is negative semi-definite, the problem can contain equality and inequality constraints, and the bounds of the variables can take finite or infinite values). In order to improve the current solution, the global optimality criterion proposed in [14] was used. However, the previous global criteria [9,14] rely on the concavity assumption, so they cannot be used in the general nonconvex case. In this work, we generalize the algorithms proposed in [3,5,17] to solve nonconvex quadratic programming problems written in general form. Hence, a new approach called the "sequential linear programming algorithm" is proposed. This algorithm starts with an initial extreme point, which is the solution of the linear program corresponding to the minimization of the linear part of the objective function over the feasible set of the quadratic problem; it then moves from the current extreme point to a new one with a better objective function value by solving a sequence of LP problems. The algorithm stops when no improvement is possible. Our algorithm finds a good approximate global extreme point for continuous as well as mixed-integer quadratic programs, is easy to implement, and has polynomial average complexity. In order to compare our algorithm with the existing approaches, we developed an efficient implementation with MATLAB2018a [11].
Then, we present some numerical experiments which compare the performance of the developed nonconvex solver (SLPqp) with the branch and cut algorithm implemented in CPLEX12.8 [6] on a collection of 104 test problems: 64 nonconvex quadratic test problems and 20 concave quadratic test problems from the library Globallib [8], 8 concave test problems randomly generated with the algorithm proposed in [15], and 12 mixed-integer concave quadratic test problems constructed from the continuous qps [7,15] by considering 50% of their variables as integers. This paper is organized as follows. In Sect. 2, we state the problem and recall some definitions and results of nonconvex quadratic programming. In Sect. 3, we describe the proposed algorithm and illustrate it with two numerical examples (a continuous nonconvex qp [19] and an integer concave qp [20]). In Sect. 4, we present some numerical experiments which compare our solver with the branch and cut solver of CPLEX12.8. Finally, we conclude the paper and outline some future work.
2 Presentation of the Problem and Definitions
We consider the nonconvex quadratic programming problem presented in the following general form:

min f(x) = (1/2) x^T D x + c^T x,   (1)
s.t. A_1 x ≤ b_1,   (2)
     A_2 x = b_2,   (3)
     l ≤ x ≤ u,   (4)
     x_j ∈ Z, j = 1, 2, . . . , n_1,   (5)
     x_j ∈ R, j = n_1 + 1, . . . , n,   (6)
where D is a square symmetric matrix which can be negative semi-definite or indefinite; c, x, l, u are vectors in R^n, and the components of l and u can take the values ±∞; A_1 is a real matrix of dimension m_1 × n, A_2 is a real matrix of dimension m_2 × n, and b_1, b_2 are vectors in R^{m_1} and R^{m_2}, respectively.
• The set of n-vectors satisfying constraints (2)–(6) is called the feasible set of problem (1)–(6) and is denoted by S. Any vector x ∈ S is called a feasible solution of problem (1)–(6).
• A vector x^* ∈ S is called a global optimal solution of problem (1)–(6) if f(x^*) ≤ f(x) for all x ∈ S.
• Let f be a function from R^n to R and let z ∈ R^n. The level line of f passing through z is defined by E_{f(z)}(f) = {y ∈ R^n : f(y) = f(z)}.

Let z be an initial feasible point. In [5], the authors proposed an algorithm for solving a continuous strictly concave quadratic program which is based on the construction of a finite set of points y^j, j = 1, 2, . . . , r, belonging to the set E_{f(z)}(f) ∩ S. The following lemma [3,5,17] allows us to construct points belonging to the set E_{f(z)}(f) which are not necessarily feasible.

Lemma 1. Let h ∈ R^n be such that h^T D h ≠ 0, and consider the real number γ calculated as follows:

γ = −2 h^T (D z + c) / (h^T D h).

Then the point y_γ = z + γh ∈ E_{f(z)}(f).
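Lemma 1 is straightforward to verify numerically: stepping from z along any direction h with h^T D h ≠ 0 by the step length γ above lands back on the level line of f. A minimal sketch (the quadratic data D, c and the points z, h are randomly generated for illustration, not taken from the paper's test problems):

```python
import numpy as np

def f(x, D, c):
    # Quadratic objective f(x) = 0.5 x^T D x + c^T x
    return 0.5 * x @ D @ x + c @ x

def level_line_point(z, h, D, c):
    """Return y = z + gamma*h with f(y) = f(z) (Lemma 1), assuming h^T D h != 0."""
    gamma = -2.0 * (h @ (D @ z + c)) / (h @ D @ h)
    return z + gamma * h

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
D = -(A @ A.T)                  # negative semi-definite: a concave quadratic
c = rng.standard_normal(n)
z = rng.standard_normal(n)
h = rng.standard_normal(n)      # here h^T D h = -||A^T h||^2 < 0 almost surely

y = level_line_point(z, h, D, c)
print(abs(f(y, D, c) - f(z, D, c)))  # ~0 up to rounding: y is on the level line of z
```

The identity behind it: f(z + γh) − f(z) = (1/2)γ² h^T D h + γ h^T(Dz + c), which vanishes exactly for the γ of Lemma 1.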
3 Steps of the SLP Algorithm
Let z^0 be an initial feasible point and let ∇f(z^0) = D z^0 + c be the gradient of f at the point z^0, with ∇f(z^0) ≠ 0. The scheme of the sequential linear programming algorithm for finding an approximate global extreme point of the nonconvex quadratic programming problem (1)–(6) is described in the following steps:
Algorithm 1. (SLP algorithm)
Step 1. Choose a number r ∈ N* and set k = 0;
Step 2. Choose the (n × r)-matrix H = (h_j, j = 1, 2, . . . , r), h_j ∈ R^n, and set J = ∅;
Step 3. Calculate the points y^j: for j = 1, 2, . . . , r, if h_j^T D h_j ≠ 0, then

γ_j = −2 h_j^T (D z^k + c) / (h_j^T D h_j),  y^j = z^k + γ_j h_j,  J = J ∪ {j};

Step 4. Solve the linear programs min_{x ∈ S} x^T ∇f(y^j), j ∈ J. Let u^j, j ∈ J, be the optimal solutions of these LP problems;
Step 5. Calculate the index p ∈ J such that f(u^p) = min_{j ∈ J} f(u^j);
Step 6. If f(u^p) < f(z^k), then set k = k + 1, z^k = u^p and go to Step 2. Otherwise, z^k is an approximate global minimizer for problem (1)–(6).

In order to solve qp (1)–(6), we propose the following two-phase approach:

Algorithm 2. (SLPqp)
Phase I:
Step 1. Solve the linear program min_{x ∈ S} c^T x and let z^0 be the obtained optimal solution (in [1], it is shown that a good starting point can accelerate the convergence to a global solution);
Step 2. Apply the SLP algorithm (Algorithm 1) with r = n and H = I_n, starting from the initial feasible extreme point z^0 (I_n is the identity matrix of order n). Let z^1 be the obtained approximate global minimizer;
Phase II:
Step 3. Apply the SLP algorithm with r = 50n if n < 500 and r = n if n ≥ 500, where the elements of H are randomly generated from the uniform distribution on the interval [−1, 1], starting from the point z^1 found in the first phase. The obtained point z^* is an approximate global extreme point for qp (1)–(6).

Remark 1. Since the number of extreme points of the set S is finite and each new point satisfies f(z^{k+1}) < f(z^k), SLPqp finds an approximate global extreme point in a finite number of iterations.

Let us solve the following nonconvex quadratic programs by SLPqp.

Example 1. Consider the continuous nonconvex quadratic program [19]:

min f(x) = x_1 − 10x_2 + 10x_3 + x_8 − x_1^2 − x_2^2 − x_3^2 − x_4^2 − 7x_5^2 − 4x_6^2 − x_7^2 − 2x_8^2 + 2x_1x_2 + 6x_1x_5 + 6x_2x_5 + 2x_3x_4,
s.t. x_1 + 2x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 ≤ 8,
2x_1 + x_2 + x_3 ≤ 9,
x_3 + x_4 + x_5 ≤ 5,
0.5x_5 + 0.5x_6 + x_7 + 2x_8 ≤ 3,
2x_2 − x_3 − 0.5x_4 ≤ 5,
x_1 ≤ 6,
x_j ≥ 0, j = 1, 2, . . . , 8.
The global minimizer is x^* = (0, 0, 0, 0, 5, 1, 0, 0)^T with f(x^*) = −179 [19]. The current point and its corresponding objective value at each iteration of SLPqp are shown in the left part of Table 1.

Example 2. Consider the integer concave quadratic program [20]:

min f(x) = −5x_1^2 + 8x_1 − 3x_2^2 + 7x_2,
s.t. −9x_1 + 5x_2 ≤ 9,
x_1 − 6x_2 ≤ 6,
3x_1 + x_2 ≤ 9,
1 ≤ x_j ≤ 7, x_j integer, j = 1, 2.

The optimal solution is x^* = (2, 3)^T with f(x^*) = −10 [20]. The current point and its corresponding objective value at each iteration of SLPqp are shown in the right part of Table 1.

Table 1. Results of Examples 1 and 2 (iterations span Phases I and II of SLPqp).

Example 1:
k | z^k | f(z^k)
0 | (0, 3, 0, 2, 0, 0, 0, 0)^T | −43
1 | (0, 1.5, 0, 0, 5, 0, 0, 0)^T | −147.25
2 | (0, 0, 0, 0, 5, 0, 0, 0)^T | −175
3 | (0, 0, 0, 0, 5, 1, 0, 0)^T | −179
4 | (0, 0, 0, 0, 5, 1, 0, 0)^T | −179

Example 2:
k | z^k | f(z^k)
0 | (1, 1)^T | 7
1 | (2, 3)^T | −10
2 | (2, 3)^T | −10
3 | (2, 3)^T | −10

4 Numerical Experiments
In order to compare our algorithm (SLPqp) with the branch and cut solver of CPLEX12.8 (CPLEX), i.e., the "globalqpex1" function with the parameter "optimalitytarget" set to 3 (global), we developed an implementation with MATLAB2018a. In this implementation, we used the barrier interior-point algorithm of CPLEX12.8 (the "cplexlp" function with the parameter "lpmethod" set to 4) to solve the intermediate continuous linear programs, and the "cplexmilp" function to solve the intermediate mixed-integer LPs. In the comparison, the different solvers were executed on a PC with a CORE i7-4790 CPU at 3.60 GHz, 8 GB of RAM and the Windows 10 operating system. We considered 104 nonconvex quadratic test problems (these qps can be downloaded from "https://www.sciencedz.net/perso.php?id=mbentobache&p=253"):
(A) Twelve mixed-integer concave quadratic test problems obtained by considering 50% of the variables of the following problems as integers: nine qps taken from [7] (the problems miqp7, miqp8 and miqp9 are obtained by setting, in Problem 7, page 12 of [7], (λ_i, α_i) = (1, 2), (λ_i, α_i) = (1, −5) and (λ_i, α_i) = (1, 8)
respectively). The last three problems, miqp10, miqp11 and miqp12, are obtained from the first three generated continuous qps, Rosen-qp1, Rosen-qp2 and Rosen-qp3, shown in Table 6. The results of the two solvers for these miqps are shown in Table 2.
(B) Sixty-four nonconvex quadratic test problems of the library Globallib [8]. Results are shown in Tables 3 and 4.
(C) Twenty concave quadratic test problems of the library Globallib [8]. Results are shown in Table 5.
(D) Eight concave quadratic test problems randomly generated with the algorithm proposed in [15]. These qps are written in the form: min 0.5 x^T D x + c^T x, s.t. Ax ≤ b, x ≥ 0, with A an (n + 1) × n matrix, c ∈ R^n, b ∈ R^{n+1}, and D ∈ R^{n×n} a symmetric negative semi-definite matrix. See Table 6.

In the different tables, f^*, It1, CPU1, It, CPU and Error designate respectively the approximate global value, the number of Phase I iterations of SLPqp, the CPU time of Phase I of SLPqp in seconds, the total number of iterations of SLPqp, the total CPU time, and the absolute value of the difference between the approximate global minimum and the known global minimum. The obtained results are quite encouraging:

• Our algorithm successfully found the same global objective values as CPLEX on 77 test problems, better objective values than CPLEX on 23 problems, and worse ones on 4 nonconvex qps (ex2-1-9, st-e23, st-glmp-ss1, st-jcbpafex; see Table 3). Probably the global solution of these 4 problems is not an extreme point.

Table 2. Mixed-integer concave quadratic test problems

QP | Problem | m | n | n1 | SLPqp f* | It1 | CPU1 | It | CPU | CPLEX f* | CPLEX CPU
miqp1 | 1, page 5 | 1 | 5 | 3 | −17.000 | 4 | 0.20 | 6 | 3.47 | −17.000 | 0.14
miqp2 | 2, page 6 | 2 | 6 | 3 | −361.500 | 1 | 0.09 | 2 | 2.08 | −361.500 | 0.04
miqp3 | 3, page 7 | 9 | 13 | 6 | −195.000 | 5 | 0.19 | 7 | 9.48 | −195.000 | 0.03
miqp4 | 4, page 8 | 5 | 6 | 3 | −14.800 | 1 | 0.07 | 2 | 3.46 | −14.800 | 0.05
miqp5 | 5, page 10 | 11 | 10 | 5 | −217.959 | 1 | 0.13 | 2 | 5.80 | −217.959 | 0.05
miqp6 | 6, page 11 | 5 | 10 | 5 | −39.000 | 2 | 0.24 | 4 | 9.65 | −38.999 | 0.04
miqp7 | 7, page 12 | 10 | 20 | 10 | −335.841 | 3 | 0.76 | 6 | 40.25 | −335.841 | 0.58
miqp8 | 7, page 12 | 10 | 20 | 10 | −615.841 | 2 | 0.47 | 5 | 37.09 | −615.841 | 6.40
miqp9 | 7, page 12 | 10 | 20 | 10 | −99.647 | 2 | 0.51 | 5 | 35.40 | −99.647 | 0.10
miqp10 | – | 11 | 10 | 5 | 693.453 | 2 | 0.27 | 4 | 10.98 | – | >10800
miqp11 | – | 21 | 20 | 10 | 1447.480 | 2 | 0.65 | 4 | 29.04 | – | >10800
miqp12 | – | 31 | 30 | 15 | 3949.075 | 2 | 1.96 | 4 | 88.69 | – | >10800
Table 3. Nonconvex quadratic qps of Globallib [8]

No | QP | m | n | SLPqp f* | It1 | CPU1 | It | CPU | CPLEX f* | CPLEX CPU
1 | ex2-1-1 | 1 | 5 | −17.000 | 4 | 0.07 | 6 | 0.31 | −17.000 | 0.46
2 | ex2-1-10 | 10 | 20 | −498345.482 | 4 | 0.10 | 6 | 1.02 | −498345.482 | 0.11
3 | ex2-1-2 | 2 | 6 | −213.000 | 1 | 0.06 | 2 | 0.22 | −213.000 | 0.03
4 | ex2-1-3 | 9 | 13 | −15.000 | 5 | 0.07 | 7 | 0.58 | −15.000 | 0.10
5 | ex2-1-4 | 5 | 6 | −11.000 | 1 | 0.06 | 2 | 0.24 | −11.000 | 0.06
6 | ex2-1-6 | 5 | 10 | −39.000 | 2 | 0.07 | 5 | 0.67 | −39.000 | 0.07
7 | ex2-1-9 | 1 | 10 | 0.000 | 1 | 0.06 | 2 | 0.28 | −0.375 | 0.28
8 | nemhaus | 5 | 5 | 31.000 | 1 | 0.06 | 2 | 0.20 | 31.000 | 0.03
9 | qp1 | 2 | 50 | 0.063 | 2 | 0.10 | 4 | 2.20 | – | >14400
10 | qp2 | 2 | 50 | 0.104 | 2 | 0.10 | 4 | 2.17 | – | >14400
11 | qp3 | 52 | 100 | 0.006 | 1 | 0.06 | 3 | 7.44 | – | >14400
12 | st-bpaf1a | 10 | 10 | −45.380 | 1 | 0.06 | 3 | 0.49 | −45.380 | 0.07
13 | st-bpaf1b | 10 | 10 | −42.963 | 1 | 0.06 | 3 | 0.50 | −42.963 | 0.23
14 | st-bpk1 | 6 | 4 | −13.000 | 1 | 0.06 | 3 | 0.25 | −13.000 | 0.07
15 | st-bpk2 | 6 | 4 | −13.000 | 1 | 0.06 | 3 | 0.25 | −13.000 | 0.10
16 | st-bpv2 | 5 | 4 | −8.000 | 1 | 0.06 | 3 | 0.26 | −7.999 | 0.12
17 | st-bsj2 | 5 | 3 | 1.000 | 1 | 0.06 | 2 | 0.17 | 1.000 | 0.55
18 | st-bsj3 | 1 | 6 | −86768.550 | 5 | 0.07 | 7 | 0.32 | −86768.550 | 0.03
19 | st-bsj4 | 4 | 6 | −70262.050 | 4 | 0.07 | 6 | 0.33 | −70262.050 | 0.06
20 | st-e22 | 5 | 2 | −85.000 | 2 | 0.06 | 4 | 0.19 | −85.000 | 0.04
21 | st-e23 | 2 | 2 | −0.750 | 1 | 0.06 | 2 | 0.15 | −1.083 | 0.05
22 | st-e24 | 4 | 2 | 8.000 | 1 | 0.06 | 3 | 0.19 | 8.000 | 0.04
23 | st-e25 | 8 | 4 | 0.870 | 1 | 0.06 | 2 | 0.19 | 0.870 | 0.05
24 | st-e26 | 4 | 2 | −185.779 | 2 | 0.06 | 4 | 0.19 | −185.779 | 0.04
25 | st-fp1 | 1 | 5 | −17.000 | 4 | 0.07 | 6 | 0.29 | −17.000 | 0.05
26 | st-fp2 | 2 | 6 | −213.000 | 1 | 0.06 | 2 | 0.22 | −213.000 | 0.03
27 | st-fp3 | 10 | 13 | −15.000 | 5 | 0.07 | 7 | 0.59 | −15.000 | 0.06
28 | st-fp4 | 5 | 6 | −11.000 | 1 | 0.06 | 2 | 0.23 | −11.000 | 0.04
29 | st-fp5 | 11 | 10 | −268.015 | 1 | 0.06 | 2 | 0.30 | −268.015 | 0.06
30 | st-fp6 | 5 | 10 | −39.000 | 2 | 0.07 | 5 | 0.67 | −39.000 | 0.06
31 | st-glmp-fp1 | 8 | 4 | 10.000 | 1 | 0.06 | 3 | 0.27 | 10.000 | 0.05
32 | st-glmp-fp2 | 9 | 4 | 7.345 | 1 | 0.06 | 3 | 0.28 | 7.345 | 0.14
33 | st-glmp-fp3 | 8 | 4 | −12.000 | 1 | 0.06 | 3 | 0.27 | −12.000 | 0.04
34 | st-glmp-kk90 | 7 | 5 | 3.000 | 1 | 0.06 | 3 | 0.31 | 3.000 | 0.05
35 | st-glmp-kk92 | 8 | 4 | −12.000 | 1 | 0.06 | 3 | 0.27 | −12.000 | 0.03
36 | st-glmp-kky | 13 | 7 | −2.500 | 1 | 0.06 | 2 | 0.25 | −2.500 | 0.06
37 | st-glmp-ss1 | 11 | 5 | −24.000 | 1 | 0.06 | 3 | 0.31 | −24.571 | 0.07
38 | st-glmp-ss2 | 8 | 5 | 3.000 | 1 | 0.06 | 3 | 0.30 | 3.000 | 0.06
39 | st-ht | 3 | 2 | −1.600 | 3 | 0.06 | 5 | 0.19 | −1.600 | 0.11
40 | st-iqpbk1 | 7 | 8 | −621.488 | 3 | 0.07 | 5 | 0.40 | −621.488 | 0.07
41 | st-iqpbk2 | 7 | 8 | −1195.226 | 3 | 0.07 | 5 | 0.40 | −1195.226 | 0.09
42 | st-jcbpaf2 | 13 | 10 | −794.856 | 1 | 0.06 | 4 | 0.68 | −794.856 | 0.07
43 | st-jcbpafex | 2 | 2 | −0.750 | 1 | 0.06 | 2 | 0.15 | −1.083 | 0.05
44 | st-kr | 5 | 2 | −85.000 | 2 | 0.06 | 4 | 0.19 | −85.000 | 0.07
45 | st-pan1 | 4 | 3 | −5.284 | 3 | 0.06 | 5 | 0.23 | −5.284 | 0.05
46 | st-pan2 | 1 | 5 | −17.000 | 4 | 0.07 | 6 | 0.29 | −17.000 | 0.05
47 | st-ph1 | 5 | 6 | −230.117 | 3 | 0.07 | 5 | 0.34 | −230.117 | 0.05
48 | st-ph10 | 4 | 2 | −10.500 | 1 | 0.06 | 2 | 0.15 | −10.500 | 0.03
Table 4. Nonconvex quadratic qps of Globallib [8]

No | QP | m | n | SLPqp f* | It1 | CPU1 | It | CPU | CPLEX f* | CPLEX CPU
49 | st-ph11 | 4 | 3 | −11.281 | 4 | 0.06 | 6 | 0.22 | −11.281 | 0.04
50 | st-ph12 | 4 | 3 | −22.625 | 4 | 0.06 | 6 | 0.22 | −22.625 | 0.04
51 | st-ph13 | 10 | 3 | −11.281 | 4 | 0.06 | 6 | 0.23 | −11.281 | 0.04
52 | st-ph14 | 10 | 3 | −229.722 | 2 | 0.06 | 4 | 0.23 | −229.125 | 0.04
53 | st-ph15 | 4 | 4 | −392.704 | 3 | 0.06 | 5 | 0.27 | −392.704 | 0.06
54 | st-ph2 | 5 | 6 | −1028.117 | 3 | 0.07 | 5 | 0.34 | −1028.117 | 0.04
55 | st-ph20 | 9 | 3 | −158.000 | 3 | 0.06 | 5 | 0.24 | −158.000 | 0.04
56 | st-ph3 | 5 | 6 | −420.235 | 3 | 0.06 | 5 | 0.33 | −420.235 | 0.05
57 | st-phex | 5 | 2 | −85.000 | 2 | 0.06 | 4 | 0.19 | −85.000 | 0.05
58 | st-qpc-m0 | 2 | 2 | −5.000 | 3 | 0.06 | 5 | 0.19 | −5.000 | 0.04
59 | st-qpc-m3a | 10 | 10 | −382.695 | 2 | 0.07 | 4 | 0.49 | −382.695 | 0.03
60 | st-qpk1 | 4 | 2 | −3.000 | 1 | 0.06 | 3 | 0.19 | −3.000 | 0.05
61 | st-qpk2 | 12 | 6 | −12.250 | 2 | 0.06 | 4 | 0.35 | −12.250 | 0.07
62 | st-qpk3 | 22 | 11 | −36.000 | 3 | 0.07 | 5 | 0.59 | −36.000 | 0.08
63 | st-z | 5 | 3 | 0.000 | 1 | 0.06 | 2 | 0.17 | 0.000 | 0.05
64 | stat | 5 | 3 | 0.000 | 1 | 0.06 | 2 | 0.17 | 0.000 | 0.05
• In terms of CPU time, CPLEX is slightly faster than SLPqp on almost all the Globallib test problems, except the nonconvex problems qp1, qp2 and qp3 (see Table 3), for which CPLEX failed to obtain the solution within 4 h, while SLPqp found an approximate global extreme point in less than 8 s.
• SLPqp outperforms CPLEX in solving all the generated test qps shown in Table 6: our algorithm found the known global minimum of all the generated problems with good accuracy (2.91 × 10^−11 ≤ Error ≤ 1.86 × 10^−5). Moreover, SLPqp solved problem Rosen-qp3 of dimension 31 × 30 in 1.67 s, while CPLEX found the solution in 542.39 s (about 9 min); SLPqp solved Rosen-qp4 of dimension 41 × 40 in 2.69 s, while CPLEX found the solution in 63120.91 s (17.53 h); SLPqp solved Rosen-qp5 of dimension 51 × 50 in 4.26 s, while CPLEX failed to find the solution after 238482.58 s (66.25 h). Finally, for the problems of dimension 201 × 200, 401 × 400 and 1001 × 1000, SLPqp found the global optimal values in less than 5142.77 s (1.43 h), while we interrupted the execution of CPLEX after 4 h.
• Since the global optimal solution of a concave quadratic program is an extreme point, SLPqp gives the global optimum with good accuracy for this type of problem.
Table 5. Concave quadratic qps of Globallib [8]

No | QP | m | n | SLPqp Error | It1 | CPU1 | It | CPU | CPLEX Error | CPLEX CPU
1 | ex2-1-5 | 11 | 10 | 4.74E-10 | 1 | 0.07 | 2 | 0.32 | 4.74E-10 | 0.23
2 | ex2-1-7 | 10 | 20 | 1.74E-09 | 4 | 0.09 | 6 | 0.93 | 1.74E-09 | 0.17
3 | ex2-1-8 | – | 24 | 0 | 2 | 0.08 | 4 | 1.02 | 0 | 0.08
4 | st-fp7a | 10 | 20 | 6.12E-04 | 2 | 0.08 | 5 | 1.31 | 6.12E-04 | 0.13
5 | st-fp7b | 10 | 20 | 6.12E-04 | 5 | 0.10 | 8 | 1.34 | 6.12E-04 | 0.13
6 | st-fp7c | 10 | 20 | 2.68E+03 | 4 | 0.09 | 7 | 1.32 | 3.18E-04 | 0.12
7 | st-fp7d | 10 | 20 | 6.12E-04 | 3 | 0.08 | 6 | 1.31 | 6.12E-04 | 0.18
8 | st-fp7e | 10 | 20 | 1.34E-04 | 4 | 0.09 | 7 | 1.31 | 1.34E-04 | 0.16
9 | st-m1 | 11 | 20 | 0 | 1 | 0.07 | 2 | 0.55 | 8.89E-02 | 0.10
10 | st-m2 | 21 | 30 | 0 | 1 | 0.07 | 2 | 0.89 | 9.67E-01 | 0.14
11 | st-qpc-m1 | 5 | 5 | 0 | 2 | 0.06 | 4 | 0.30 | 4.50E-09 | 0.04
12 | st-qpc-m3b | 10 | 10 | 0 | 1 | 0.06 | 2 | 0.31 | 0 | 0.04
13 | st-qpc-m3c | 10 | 10 | 0 | 1 | 0.06 | 2 | 0.31 | 0 | 0.03
14 | st-qpc-m4 | 10 | 10 | 0 | 1 | 0.06 | 2 | 0.30 | 0 | 0.04
15 | st-rv1 | 5 | 10 | 0 | 2 | 0.07 | 4 | 0.51 | 1.42E-14 | 0.07
16 | st-rv2 | 10 | 20 | 0 | 1 | 0.07 | 2 | 0.55 | 1.42E-14 | 0.08
17 | st-rv3 | 20 | 20 | 0 | 2 | 0.08 | 4 | 1.02 | 0 | 0.21
18 | st-rv7 | 20 | 30 | 0 | 2 | 0.09 | 4 | 1.50 | 7.87E-10 | 0.26
19 | st-rv8 | 20 | 40 | 0 | 3 | 0.12 | 5 | 2.09 | 0 | 0.17
20 | st-rv9 | 20 | 50 | 0 | 2 | 0.11 | 5 | 3.99 | 0 | 0.68
Table 6. Randomly generated concave qps [15]

QP | n | SLPqp Error | It1 | CPU1 | It | CPU | CPLEX Error | CPLEX CPU
Rosen-qp1 | 10 | 5.38E-09 | 2 | 0.07 | 4 | 0.50 | 3.20E-04 | 0.78
Rosen-qp2 | 20 | 2.91E-11 | 2 | 0.08 | 4 | 0.99 | 7.06E-04 | 5.01
Rosen-qp3 | 30 | 5.82E-11 | 2 | 0.09 | 4 | 1.66 | 2.61E-03 | 542.39
Rosen-qp4 | 40 | 7.33E-09 | 2 | 0.11 | 4 | 2.69 | 1.30E-05 | 63120.91
Rosen-qp5 | 50 | 1.16E-08 | 2 | 0.15 | 4 | 4.26 | Failure | 238482.58
Rosen-qp6 | 200 | 9.69E-08 | 1 | 1.97 | 3 | 274.77 | – | >14400
Rosen-qp7 | 400 | 1.86E-05 | 1 | 23.17 | 3 | 4782.68 | – | >14400
Rosen-qp8 | 1000 | 4.77E-06 | 1 | 677.29 | 3 | 5142.77 | – | >14400
• SLPqp successfully found the same global optimum as CPLEX for problems miqp1, ..., miqp9 (see Table 2). For test problems miqp10, miqp11 and miqp12, our algorithm found mixed-integer approximate global optimal solutions in less than 89 s, while we interrupted the execution of CPLEX after 3 h.
5 Conclusion
In this work, we have proposed a sequential linear programming algorithm for finding an approximate global extreme point of nonconvex quadratic programming problems. The approach is easy to implement, efficient, and gives accurate global optimal solutions for almost all the considered test problems. In future work, we will test the performance of our approach on other collections of test problems. Furthermore, we will combine it with the simplex algorithm [4], DCA [16] or branch and bound approaches in order to find better approximate global solutions in less computational time.
References 1. An, L.T.H., Tao, P.D.: A branch and bound method via dc optimization algorithms and ellipsoidal technique for box constrained nonconvex quadratic problems. J. Global Optim. 13(2), 171–206 (1998) 2. Absil, P.-A., Tits, A.L.: Newton-KKT interior-point methods for indefinite quadratic programming. Comput. Optim. Appl. 36(1), 5–41 (2007) 3. Bentobache, M., Telli, M., Mokhtari, A.: A global minimization algorithm for concave quadratic programming. In: Proceedings of the 29th European Conference on Operational Research, EURO 2018, p. 329, University of Valencia, 08–11 July 2018 4. Bentobache, M., Telli, M., Mokhtari, A.: A simplex algorithm with the smallest index rule for concave quadratic programming. In: Proceedings of the Eighth International Conference on Advanced Communications and Computation, INFOCOMP 2018, pp. 88–93, Barcelona, Spain, 22–26 July 2018 5. Chinchuluun, A., Pardalos, P.M., Enkhbat, R.: Global minimization algorithms for concave quadratic programming problems. Optimization 54(6), 627–639 (2005) 6. CPLEX12.8, IBM Ilog. Inc., NY (2017) 7. Floudas, C.A., Pardalos, P.M., Adjiman, C., Esposito, W.R., Gumus, Z.H., Harding, S.T., Klepeis, J.L., Meyer, C.A., Schweiger, C.A.: Handbook of Test Problems in Local and Global Optimization. Nonconvex Optimization and its Applications. Springer, Boston (1999) 8. Globallib: Gamsworld global optimization library. http://www.gamsworld.org/ global/globallib.htm. Accessed 15 Jan 2019 9. Hiriart-Urruty, J.B., Ledyaev, Y.S.: A note on the characterization of the global maxima of a (tangentially) convex function over a convex set. J. Convex Anal. 3, 55–62 (1996) 10. Horst, R.: An algorithm for nonconvex programming problems. Math. Program. 10, 312–321 (1976) 11. Matlab2018a. Mathworks, Inc., NY (2018)
12. Pardalos, P.M., Rodgers, G.: Computational aspects of a branch and bound algorithm for quadratic zero-one programming. Computing 45(2), 131–144 (1990) 13. Rusakov, A.I.: Concave programming under simplest linear constraints. Comput. Math. Math. Phys. 43(7), 908–917 (2003) 14. Strekalovsky, A.S.: Global optimality conditions for nonconvex optimization. J. Global Optim. 12(4), 415–434 (1998) 15. Sung, Y.Y., Rosen, J.B.: Global minimum test problem construction. Math. Program. 24(1), 353–355 (1982) 16. Tao, P.D., An, L.T.H.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math. Vietnam. 22, 289–355 (1997) 17. Telli, M., Bentobache, M., Mokhtari, A.: A successive linear approximations approach for the global minimization of a concave quadratic program. Submitted to Computational and Applied Mathematics, Springer (2019) 18. Tuy, H.: Concave programming under linear constraints. Doklady Akademii Nauk SSSR 159, 32–35 (1964) 19. Tuy, H.: DC optimization problems. In: Convex Analysis and Global Optimization. Springer Optimization and Its Applications, vol. 110, pp. 167–228, 2nd edn. Springer, Cham (2016) 20. Wang, F.: A new exact algorithm for concave knapsack problems with integer variables. Int. J. Comput. Math. 96(1), 126–134 (2019) 21. Xia, W., Vera, J., Zuluaga, L.F.: Globally solving non-convex quadratic programs via linear integer programming techniques. arXiv preprint, arXiv:1511.02423v3 (2018)
A Survey of Surrogate Approaches for Expensive Constrained Black-Box Optimization

Rommel G. Regis

Department of Mathematics, Saint Joseph's University, Philadelphia, PA 19131, USA
[email protected]
Abstract. Numerous practical optimization problems involve black-box functions whose values come from computationally expensive simulations. For these problems, one can use surrogates that approximate the expensive objective and constraint functions. This paper presents a survey of surrogate-based or surrogate-assisted methods for computationally expensive constrained global optimization problems. The methods can be classified by the type of surrogate used (e.g., kriging or radial basis function) or by the type of infill strategy. This survey also mentions algorithms that can be parallelized and that can handle infeasible initial points and high-dimensional problems. Keywords: Global optimization · Black-box optimization · Constraints · Surrogates · Kriging · Radial basis functions

1 Introduction
Many engineering optimization problems involve black-box objective or constraint functions whose values are obtained from computationally expensive finite element (FE) or computational fluid dynamics (CFD) simulations. Moreover, for some of these problems, the calculation of the objective or constraint functions might fail at certain inputs, indicating the presence of hidden constraints. For such optimization problems, a natural approach involves using surrogate models that approximate the expensive objective or constraint functions. Commonly used surrogates include kriging or Gaussian process models and Radial Basis Function (RBF) models. In the literature, various strategies for selecting sample points where the objective and constraint functions are evaluated (also known as infill strategies) have been proposed, including those for problems with expensive black-box constraints and cheap explicitly defined constraints.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 37–47, 2020. https://doi.org/10.1007/978-3-030-21803-4_4
This paper provides a survey of approaches for computationally expensive constrained optimization problems of the following general form: s.t.
min f (x) x ∈ Rd , ≤ x ≤ u gi (x) ≤ 0, i = 1, . . . , m hj (x) = 0, j = 1, . . . , p x ∈ X ⊆ Rd
(1)
where , u ∈ Rd , m ≥ 0, p ≥ 0, at least one of the objective or constraint functions f, g1 , . . . , gm , h1 , . . . , hp is black-box and computationally expensive, and X ⊆ Rd is meant to capture the region where hidden constraints are not violated. Here, we allow for the possibility that m = 0 or p = 0 (no inequality or no equality constraints or both). Also, we allow X = Rd (no hidden constraints). In general, hidden constraints cannot be relaxed (i.e., hard constraints) while the above inequality and equality constraints can be relaxed (i.e., soft constraints). Note that in a practical problem, there might be a mixture of expensive black-box constraints and cheap explicitly defined constraints. It is also possible that the objective may be cheap to evaluate and only some (or all) of the constraints are black-box and expensive. These problems are referred to grey-box models [7]. Most of the algorithms discussed here are meant for problems with expensive objective and inequality constraint functions (no equality and hidden constraints). Define the vector-valued functions G(x) := (g1 (x), . . . , gm (x)) and H(x) := (h1 (x), . . . , hp (x)), and let [, u] := {x ∈ Rd : ≤ x ≤ u}, and let D := {x ∈ [, u] ∩ X : G(x) ≤ 0, H(x) = 0} be the feasible region of (1). Here, [, u] ⊆ Rd is the search space of problem (1), and one simulation for a given input x ∈ [, u] ∩ X yields the values of f (x), G(x) and H(x). In the computationally expensive setting, one wishes to find the global minimum of (1) (or at least a feasible solution with good objective function value) given a relatively limited number of simulations. If D = ∅, f , g1 , . . . , gm , h1 , . . . , hp are all continuous functions on [, u] and X = Rd (or X contains the region defined by all the constraints), then D is a compact set and f is guaranteed to have a global minimizer in D. 
Now some of the black-box objective and constraint functions may not be continuous in practice, but it is helpful to consider situations when a global minimizer is guaranteed to exist. This paper is organized as follows. Section 2 provides the general structure of surrogate methods for constrained optimization. Section 3 provides two widely used surrogates, Radial Basis Functions and Kriging. Section 4 discusses some of the infill strategies for constrained optimization. An infill strategy is a way to select sample points for function evaluations. Finally, Sect. 5 provides a summary and some future directions for surrogate-based constrained optimization.
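The feasibility notation above can be made concrete with a small membership test for the region D := {x ∈ [ℓ, u] ∩ X : G(x) ≤ 0, H(x) = 0}; the constraint functions and bounds below are invented purely for illustration:

```python
import numpy as np

def is_feasible(x, lo, hi, G, H, in_X=lambda x: True, tol=1e-8):
    """Membership test for D = {x in [lo, hi] ∩ X : G(x) <= 0, H(x) = 0}.
    G and H return arrays of inequality / equality constraint values;
    in_X models the hidden constraint (simulation would fail outside X)."""
    x = np.asarray(x, dtype=float)
    if not in_X(x):
        return False
    return bool(np.all(x >= lo - tol) and np.all(x <= hi + tol)
                and np.all(G(x) <= tol) and np.all(np.abs(H(x)) <= tol))

# Illustrative instance: one inequality and one equality constraint on [0, 1]^2
G = lambda x: np.array([x[0] + x[1] - 1.0])   # g1(x) <= 0
H = lambda x: np.array([x[0] - x[1]])         # h1(x) = 0
print(is_feasible([0.4, 0.4], 0.0, 1.0, G, H))  # True
print(is_feasible([0.8, 0.8], 0.0, 1.0, G, H))  # False: g1 = 0.6 > 0
```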
2 General Structure of Surrogate Methods for Constrained Optimization
Surrogate-based methods for expensive black-box optimization generally follow the same structure. There is an initialization phase where the objective and constraint functions are evaluated at initial points, typically from a space-filling design such as a Latin hypercube design. After the initial sample points are obtained, the initial surrogates are built. Here, a sample point refers to a point in the search space (i.e., x ∈ [ℓ, u]) where the objective and constraint function values (f(x), G(x) and H(x)) are known. Depending on the type of method, the surrogates may be global surrogates, in the sense that all sample points are used, or local surrogates, in that only a subset of the sample points is used. After initialization, the method enters a sampling phase where the surrogates, and possibly additional information from previous sample points, are used to select one or more new points where the simulations will take place. The sampling phase is typically where the various algorithms differ. Some methods solve optimization subproblems to determine new points, while others select points from a randomly generated set of points, where the probability distribution is chosen in a way that makes it more likely to generate feasible points with good objective function values. After the sampling phase, simulations are run on the chosen point(s), yielding new data points. The surrogates are then updated with the new information obtained, and the process iterates until some termination condition is satisfied. In practice, the method usually terminates when the computational budget, in terms of the maximum number of simulations allowed, is reached. In the case of constrained black-box optimization, one major consideration is finding feasible sample points to begin with. For problems where the feasible region is relatively small in relation to the search space, it is not easy to obtain a feasible initial point by uniform random sampling.
In this situation, part of the iterations could be devoted first to finding a good feasible point and then the remaining iterations are used to improve on this feasible point.
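The overall loop described in this section (initial design, fit surrogates, choose an infill point, run the simulation, refit) can be sketched as follows. Quadratic polynomial least squares stands in for the kriging/RBF surrogates discussed later, the infill rule is "best predicted-feasible candidate", and all problem data are illustrative inventions:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Expensive" objective and one black-box inequality constraint (both invented)
f = lambda x: (x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2
g = lambda x: x[0] + x[1] - 1.2              # feasible iff g(x) <= 0
lo, hi, d = 0.0, 1.0, 2

def basis(X):
    # Quadratic polynomial features: a cheap stand-in for kriging/RBF surrogates
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# Initialization phase: evaluate the expensive functions on an initial design
X = rng.uniform(lo, hi, size=(10, d))
F = np.array([f(x) for x in X])
G = np.array([g(x) for x in X])

for _ in range(20):
    # Fit global surrogates of the objective and the constraint to all sample points
    A = basis(X)
    cf = np.linalg.lstsq(A, F, rcond=None)[0]
    cg = np.linalg.lstsq(A, G, rcond=None)[0]
    # Sampling phase (infill): among random candidates, take the best predicted-feasible one
    C = rng.uniform(lo, hi, size=(500, d))
    pf, pg = basis(C) @ cf, basis(C) @ cg
    pf[pg > 0] = np.inf
    x_new = C[np.argmin(pf)]
    # Run the "simulation" at the chosen point and update the data
    X = np.vstack([X, x_new])
    F = np.append(F, f(x_new))
    G = np.append(G, g(x_new))

best = F[G <= 0].min()
print(best)   # close to the true feasible optimum f(0.7, 0.3) = 0
```

A feasibility-first variant of the same loop would simply rank candidates by predicted constraint violation until a feasible point has been found, then switch to the objective-driven rule above.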
3 Surrogates for Constrained Black-Box Optimization

3.1 Radial Basis Function Model
The Radial Basis Function (RBF) interpolation model described in Powell [17] has been successfully used in various surrogate-based methods for constrained optimization, including some that can handle high-dimensional problems with hundreds of decision variables and many black-box inequality constraints [18–20]. Below we describe the procedure for building this RBF model. Let u(x) be the objective function or one of the constraint functions g_i or h_j for some i or j. Given n distinct sample points (x_1, u(x_1)), . . . , (x_n, u(x_n)) ∈ R^d × R, we use an interpolant of the form

s_n(x) = Σ_{i=1}^{n} λ_i φ(‖x − x_i‖) + p(x),  x ∈ R^d.   (2)
Here, ‖·‖ is the Euclidean norm, λ_i ∈ R for i = 1, …, n, and p(x) is a polynomial in d variables. In some surrogate-based methods φ has the cubic form (φ(r) = r³) and p(x) is a linear polynomial. Other possible choices for φ include the thin plate spline, multiquadric and Gaussian forms (see [17]). To build the above RBF model in the case where the tail p(x) is a linear polynomial, define the matrix Φ ∈ R^{n×n} by Φ_{ij} := φ(‖x_i − x_j‖), i, j = 1, …, n. Also, define the matrix P ∈ R^{n×(d+1)} whose ith row is [1, x_i^T]. Now, the RBF model that interpolates the sample points (x_1, u(x_1)), …, (x_n, u(x_n)) is obtained by solving the linear system

[ Φ     P                 ] [ λ ]   [ U       ]
[ P^T   0_{(d+1)×(d+1)}   ] [ c ] = [ 0_{d+1} ],   (3)

where 0_{(d+1)×(d+1)} ∈ R^{(d+1)×(d+1)} is a matrix of zeros, U = (u(x_1), …, u(x_n))^T, 0_{d+1} ∈ R^{d+1} is a vector of zeros, λ = (λ_1, …, λ_n)^T ∈ R^n, and c = (c_0, c_1, …, c_d)^T ∈ R^{d+1} consists of the coefficients of the linear function p(x). The coefficient matrix in (3) is nonsingular if and only if rank(P) = d + 1 (Powell [17]). This condition is equivalent to having a subset of d + 1 affinely independent points among the points {x_1, …, x_n}.
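For concreteness, system (3) for the cubic RBF with linear tail can be assembled and solved directly with NumPy. The function names below are ours, not from [17]:

```python
import numpy as np

def fit_cubic_rbf(X, u):
    """Fit the cubic RBF interpolant (2) with linear tail by solving system (3).

    X: (n, d) list/array of distinct sample points; u: (n,) function values.
    Returns (lam, c): the RBF coefficients and the d+1 linear-tail
    coefficients [c0, c1, ..., cd]. Requires rank(P) = d + 1.
    """
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    n, d = X.shape
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    Phi = r ** 3                                               # phi(r) = r^3
    P = np.hstack([np.ones((n, 1)), X])                        # i-th row [1, x_i^T]
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])  # coefficient matrix in (3)
    rhs = np.concatenate([u, np.zeros(d + 1)])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]

def rbf_predict(X, lam, c, x):
    """Evaluate s_n(x) = sum_i lam_i ||x - x_i||^3 + c0 + c[1:]^T x."""
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(X - x, axis=1)
    return float(lam @ r ** 3 + c[0] + c[1:] @ x)
```

Because of the side conditions P^T λ = 0, the model reproduces linear functions exactly: fitting data sampled from a linear u forces λ = 0 and recovers the linear tail.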
3.2 Kriging Model
A widely used kriging surrogate model is described in Jones et al. [13] and Jones [12] (sometimes called the DACE model), where the values of the black-box function f are assumed to be the outcomes of a stochastic process. That is, before f is evaluated at any point, assume that f(x) is a realization of a Gaussian random variable Y(x) ∼ N(μ, σ²). Moreover, for any two points x_i and x_j, the correlation between Y(x_i) and Y(x_j) is modeled by

Corr[Y(x_i), Y(x_j)] = exp( − Σ_{ℓ=1}^{d} θ_ℓ |x_{iℓ} − x_{jℓ}|^{p_ℓ} ),   (4)

where θ_ℓ, p_ℓ (ℓ = 1, …, d) are parameters to be determined. This correlation model is only one of many types of correlation functions that can be used in kriging metamodels. Note that when x_i and x_j are close, Y(x_i) and Y(x_j) will be highly correlated according to this model. As x_i and x_j move farther apart, the correlation drops to 0. Given n points x_1, …, x_n ∈ R^d, the uncertainty about the values of f at these points can be modeled by using the random vector Y = (Y(x_1), …, Y(x_n))^T. Note that E(Y) = Jμ, where J is the n×1 vector of all ones, and Cov(Y) = σ²R, where R is the n × n matrix whose (i, j) entry is given by (4). Suppose the function f has been evaluated at the points x_1, …, x_n ∈ R^d. Let y_1 = f(x_1), …, y_n = f(x_n) and let y = (y_1, …, y_n)^T be the vector of observed function values. Fitting the kriging model in [12] through the data points (x_1, y_1), …, (x_n, y_n) involves finding the maximum likelihood estimates
(MLEs) of the parameters μ, σ², θ_1, …, θ_d, p_1, …, p_d. The MLEs of these parameters are typically obtained by solving a numerical optimization problem. Now the value of the kriging predictor at a new point x* is given by the formula [12]

ŷ(x*) = μ̂ + r^T R^{−1} (y − J μ̂),   (5)

where μ̂ = (J^T R^{−1} y)/(J^T R^{−1} J) and r = (Corr[Y(x*), Y(x_1)], …, Corr[Y(x*), Y(x_n)])^T. Moreover, a measure of error of the kriging predictor at x* is given by

s²(x*) = σ̂² [ 1 − r^T R^{−1} r + (1 − J^T R^{−1} r)² / (J^T R^{−1} J) ],   (6)

where σ̂² = (1/n)(y − J μ̂)^T R^{−1} (y − J μ̂).
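Formulas (5) and (6) are straightforward to evaluate once the correlation parameters are fixed. The following sketch (our own, with hypothetical function names) takes θ as an input rather than computing its MLE, and uses a single exponent p for all coordinates as a common simplification:

```python
import numpy as np

def kriging_fit_predict(X, y, x_new, theta, p=2.0):
    """Kriging predictor (5) and error estimate (6) for fixed correlation
    parameters. The MLEs of theta and p would normally come from a numerical
    optimizer; here they are plain inputs (one exponent p for all coordinates)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = len(y)

    def corr(a, b):  # correlation model (4)
        return float(np.exp(-np.sum(theta * np.abs(a - b) ** p)))

    R = np.array([[corr(X[i], X[j]) for j in range(n)] for i in range(n)])
    J = np.ones(n)
    Ri = np.linalg.inv(R)
    mu = (J @ Ri @ y) / (J @ Ri @ J)    # generalized least-squares mean estimate
    resid = y - J * mu
    sigma2 = (resid @ Ri @ resid) / n   # MLE of sigma^2 given theta, p
    r = np.array([corr(x_new, X[i]) for i in range(n)])
    y_hat = mu + r @ Ri @ resid         # predictor (5)
    s2 = sigma2 * (1.0 - r @ Ri @ r
                   + (1.0 - J @ Ri @ r) ** 2 / (J @ Ri @ J))  # error (6)
    return float(y_hat), max(float(s2), 0.0)
```

At a sample point the predictor interpolates the data and the error estimate vanishes, which is the behavior the infill criteria of Sect. 4.2 exploit.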
4 Infill Strategies for Constrained Optimization

4.1 Radial Basis Function Methods
One effective infill strategy for problems with expensive black-box objective and inequality constraints (no equality constraints and no hidden constraints) is provided by the COBRA algorithm [19]. COBRA uses the above RBF model to approximate the objective and constraint functions, though one can use other types of surrogates with its infill strategy. It treats each inequality constraint individually instead of combining them into a penalty function, and builds/updates RBF surrogates for the objective and constraints in each iteration. Moreover, it handles infeasible initial sample points using a two-phase approach where Phase I finds a feasible point while Phase II improves on this feasible point. In Phase I, the next iterate is a minimizer of the sum of the squares of the predicted constraint violations (as predicted by the RBF surrogates) subject only to the bound constraints. In Phase II, the next iterate is a minimizer of the RBF surrogate of the objective subject to RBF surrogates of the inequality constraints within some small margin and also satisfying a distance requirement from previous iterates. That is, the next iterate x_{n+1} solves the optimization subproblem:

min_x  s_n^{(0)}(x)
s.t.   x ∈ R^d, ℓ ≤ x ≤ u,
       s_n^{(i)}(x) + ε_n^{(i)} ≤ 0, i = 1, 2, …, m,
       ‖x − x_j‖ ≥ ρ_n, j = 1, …, n.   (7)

Here, s_n^{(0)}(x) is the RBF model of f(x) while s_n^{(i)}(x) is the RBF model of g_i(x) for i = 1, …, m. Moreover, ε_n^{(i)} > 0 is the margin for the ith constraint and ρ_n is the distance requirement given the first n sample points. The margins are meant to facilitate the generation of feasible iterates. The ρ_n's are allowed to cycle between large values meant to enforce global search and small values that promote
local search. In the original implementation, the optimization subproblem in (7) is solved using Matlab’s gradient-based fmincon solver from a good starting point obtained by a global search scheme, but one can also combine this with a multistart approach. COBRA performed well compared to alternatives on 20 test problems and on the large-scale 124-D MOPTA08 benchmark with 68 black-box inequality constraints [11]. One issue with COBRA [19] observed by Koch et al. [14] is that sometimes the solution returned by the solver for the subproblem is infeasible. Hence, they developed a variant called COBRA-R [14] that incorporates a repair mechanism that guides slightly infeasible points to the feasible region (with respect to the RBF constraints). Moreover, another issue with COBRA is that its performance can be sensitive to the choice of the distance requirement cycle (DRC) that specifies the ρn ’s in (7). To address this, Bagheri et al. [3] developed SACOBRA (Self-Adjusting COBRA), which includes an automatic DRC adjustment and selects appropriate ρn values based on the information obtained after initialization. In addition, SACOBRA re-scales the search space to [−1, 1]d , performs a logarithmic transformation on the objective function, if necessary, and normalizes the constraint function values. Numerical experiments in [3] showed that SACOBRA outperforms COBRA with different fixed parameter settings. An alternative to COBRA [19] is the ConstrLMSRBF algorithm [18], which also uses RBF models of the objective and constraint functions though one can also use other types of surrogates. ConstrLMSRBF is a heuristic that selects sample points from a set of randomly generated candidate points, typically from a Gaussian distribution centered at the current best point. 
In each iteration, the sample point is chosen to be the best candidate point according to two criteria (predicted objective function value and minimum distance from previous sample points) from among the candidate points with the minimum number of predicted constraint violations. When it was first introduced at ISMP 2009, ConstrLMSRBF was the best known algorithm for the MOPTA08 problem [11]. The original ConstrLMSRBF [18] assumes that there is a feasible point among the initial points. Extended ConstrLMSRBF [19] was developed to deal with infeasible initial points by following a two-phase structure similar to that of COBRA [19], where Phase I searches for a feasible point while Phase II improves the feasible point found. In Phase I of Extended ConstrLMSRBF, the next sample point is the one with the minimum number of predicted constraint violations among the candidate points, with ties being broken by using the maximum predicted constraint violation as an additional criterion. Another way to generate infill points for constrained optimization using RBF surrogates is the CONORBIT trust region approach [23], which is an extension of the ORBIT algorithm [27]. CONORBIT uses only a subset of previous sample points that are close to the current trust region center to build RBF models for the objective and constraint functions. In a typical iteration, the next sample point is obtained by minimizing the RBF model of the objective subject to RBF models of the constraints within the current trust region. As with COBRA [19], it uses a small margin for the RBF constraints.
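A much-simplified sketch of this candidate-point selection might look as follows. The weighting of the two criteria and the distance-from-previous-points requirement used by the actual ConstrLMSRBF are omitted, and all names are ours:

```python
import numpy as np

def select_candidate(x_best, surr_obj, surr_cons, n_cand=100, sigma=0.1,
                     rng=None):
    """Simplified ConstrLMSRBF-style candidate selection (a sketch).

    surr_obj(x) -> predicted objective value; surr_cons(x) -> array of
    predicted constraint values (predicted feasible when all <= 0). The
    actual method also trades off distance from previous sample points
    against the predicted objective; that criterion is omitted here.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_best = np.asarray(x_best, dtype=float)
    # Candidate points from a Gaussian centered at the current best point.
    cands = x_best + sigma * rng.standard_normal((n_cand, x_best.size))
    # Keep the candidates with the fewest predicted constraint violations...
    viol = np.array([int(np.sum(surr_cons(c) > 0)) for c in cands])
    pool = cands[viol == viol.min()]
    # ...and among those pick the best predicted objective value.
    pred = np.array([surr_obj(c) for c in pool])
    return pool[np.argmin(pred)]
```

The same two-stage filter (fewest predicted violations first, predicted objective second) reappears in CEP-RBF and CARS-RBF below.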
4.2 Kriging-Based Methods
The most popular kriging-based infill strategy is the expected improvement criterion [25], which forms the basis of the original Efficient Global Optimization (EGO) method [9,13] for bound-constrained problems. Here, we use the notation from Sect. 3.2. In this strategy, the next sample point is the point x that maximizes the expected improvement function EI(x) over the search space, where

EI(x) = (f_min − ŷ(x)) Φ( (f_min − ŷ(x)) / s(x) ) + s(x) φ( (f_min − ŷ(x)) / s(x) ),   (8)

if s(x) > 0, and EI(x) = 0 if s(x) = 0. Here, Φ and φ are the cdf and pdf of the standard normal distribution, respectively. Also, f_min is the current best objective function value. Extensions and modifications of the EI criterion include generalized expected improvement [25] and weighted expected improvement [26]. Moreover, alternatives to EI are given in [24], including the WB2 (locating the regional extreme) criterion, which maximizes WB2(x) = −ŷ(x) + EI(x). This attempts to minimize the kriging surrogate while also maximizing EI. When the problem has black-box inequality constraints g_i(x) ≤ 0, i = 1, …, m, we fit a kriging surrogate ĝ_i(x) for each g_i(x). That is, for each i and a given x, assume that g_i(x) is the realization of a Gaussian random variable G_i(x) ∼ N(μ_{g_i}, σ²_{g_i}), where the parameters of this distribution are estimated by maximum likelihood as in Sect. 3.2. A standard way to handle these inequality constraints is to find the sample point x that maximizes a penalized expected improvement function, obtained by multiplying the EI by the probability that x will be feasible (assuming the G_i(x)'s are independent) [25]:
EI_p(x) = EI(x) Π_{i=1}^{m} P(G_i(x) ≤ 0) = EI(x) Π_{i=1}^{m} Φ( −μ̂_{g_i} / σ̂_{g_i} ),   (9)

where μ̂_{g_i} and σ̂²_{g_i} are the MLEs of the parameters of the random variable G_i(x), and where f_min for the EI in (8) is the objective function value of the current best feasible solution (or of the point closest to being feasible if no feasible points are available yet) [16]. Sasena et al. [24], Parr et al. [16], Basudhar et al. [4] and Bagheri et al. [2] presented extensions of EGO for constrained optimization. Moreover, Bouhlel et al. [6] developed SEGOKPLS+K, an extension of SuperEGO [24] to constrained high-dimensional problems that uses the KPLS+K (Kriging with Partial Least Squares) model [5]. SEGOKPLS+K uses the WB2 (locating the regional extreme) criterion described above, in which the surrogate is minimized while also maximizing the EI criterion. Moreover, it replaces the kriging models by the KPLS(+K) models, which are more suitable for high-dimensional problems.
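Criteria (8) and (9) reduce to a few evaluations of the standard normal cdf and pdf given the kriging prediction and error estimate at a point; a sketch (function names are ours):

```python
import math

def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(f_min, y_hat, s):
    """EI criterion (8) at a point with kriging prediction y_hat and
    error estimate s; EI = 0 wherever the model is certain (s = 0)."""
    if s <= 0.0:
        return 0.0
    z = (f_min - y_hat) / s
    return (f_min - y_hat) * norm_cdf(z) + s * norm_pdf(z)

def penalized_ei(f_min, y_hat, s, mu_g, sigma_g):
    """Penalized EI (9): EI times the probability that each constraint
    G_i(x) <= 0, with the G_i treated as independent Gaussians."""
    p_feas = 1.0
    for m, sd in zip(mu_g, sigma_g):
        p_feas *= norm_cdf(-m / sd)
    return expected_improvement(f_min, y_hat, s) * p_feas
```

Note that EI is always at least the plug-in improvement max(f_min − ŷ(x), 0), with the gap growing as the error estimate s(x) grows; this is what drives EGO's exploration.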
4.3 Surrogate-Assisted Methods for Constrained Optimization
An alternative approach for constrained expensive black-box optimization is to use surrogates to accelerate or enhance an existing method, typically a
metaheuristic. We refer to these as surrogate-assisted methods. For example, CEP-RBF [20] is a Constrained Evolutionary Programming (EP) algorithm that is assisted by RBF surrogates. Recall that in a standard Constrained (μ + μ)-EP, each parent generates one offspring using only mutations (typically from a Gaussian or Cauchy distribution) and does not perform any recombination. Moreover, the offspring are compared using standard rules such as: between two feasible solutions, the one with the better objective function value wins; between a feasible solution and an infeasible solution, the feasible solution wins; and between two infeasible solutions, the one with the smaller constraint violation (according to some metric) wins. In each generation of CEP-RBF, a large number of trial offspring are generated by each parent. Then, RBF surrogates are used to identify the most promising among these trial offspring for each parent, and this becomes the sample point where the simulation will take place. Here, a promising trial offspring is one with the best predicted objective function value from among those with the minimum number of predicted constraint violations. Once the simulation is performed and the objective and constraint function values are known, the selection of the new parent population proceeds as in a regular Constrained EP and the process iterates. Another surrogate-assisted metaheuristic is the CONOPUS (CONstrained Optimization by Particle swarm Using Surrogates) framework [21]. In each iteration of CONOPUS, multiple trial positions for each particle in the swarm are generated, and surrogates for the objective and constraint functions are used to identify the most promising trial position where the simulations are performed.
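The comparison rules recalled above can be written as a small helper; this is a generic sketch (names are ours), with the constraint violation aggregated into a single nonnegative number:

```python
def ep_better(a, b):
    """Standard constrained comparison rule (sketch): a and b are
    (objective, violation) pairs, where violation = 0 means feasible and
    larger values mean more infeasible. Returns True when a beats b."""
    fa, va = a
    fb, vb = b
    if va == 0 and vb == 0:
        return fa < fb   # both feasible: better objective wins
    if va == 0 or vb == 0:
        return va == 0   # feasible beats infeasible
    return va < vb       # both infeasible: smaller violation wins
```

The same ordering (violation first, objective second) is what the RBF surrogates approximate when ranking trial offspring in CEP-RBF.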
Moreover, it includes a refinement step where the current overall best position is replaced by the minimizer of the surrogate of the objective within a neighborhood of that position, subject to surrogate inequality constraints with a small margin and a distance requirement from all previous sample points. In addition, one can also use surrogates to assist provably convergent algorithms. For example, quadratic models have been used in the direct search method NOMAD [8]. Moreover, CARS-RBF [15] is an RBF-assisted version of Constrained Accelerated Random Search (CARS), which extends the Accelerated Random Search (ARS) algorithm [1] to constrained problems. In each iteration of CARS, a sample point is chosen uniformly within a box centered at the current best point. The initial size of the box is chosen so that it covers the search space. If the sample point is worse than the current best point, then the size of the box is reduced. Otherwise, if the sample point is an improvement over the current best point, then the size of the box is reset to the initial value so that the box again covers the search space. CARS [15] has been shown to converge to the global minimum almost surely. Further, it was shown numerically to converge faster than the constrained version of Pure Random Search on many test problems. In CARS-RBF, a large number of trial points is generated uniformly at random within the current box, and as before, RBF surrogates are used to identify the most promising among these trial points using the same criteria as ConstrLMSRBF [18]. The simulations are then carried out at this promising trial point and the algorithm proceeds in the same manner as CARS.
4.4 Parallelization and Handling High Dimensions
To make it easier to find good solutions to computationally expensive problems, one can generate multiple sample points that can be evaluated in parallel in each iteration. Metaheuristics such as evolutionary and swarm algorithms are naturally parallel, and so the surrogate-assisted CEP-RBF [20] and CONOPUS [21] algorithms are easy to parallelize. COBRA [19] and ConstrLMSRBF [18] can be parallelized using ideas in [22]. Moreover, parallel EGO approaches are described in [10,28], and these can be extended to constrained problems. For high-dimensional problems, RBF methods have been shown to be effective (e.g., ConstrLMSRBF [18], COBRA [19], CEP-RBF [20] and CONOPUS-RBF [21]). The standard constrained EGO, however, has difficulties in high dimensions because of the computational overhead and numerical issues involved in fitting the kriging model. To alleviate these issues, Bouhlel et al. [6] introduced SEGOKPLS+K, which can handle problems with about 50 decision variables.
5 Summary and Future Directions
This paper gave a brief survey of some of the surrogate-based and surrogate-assisted methods for constrained optimization. The methods discussed are based on RBF and kriging models, though other types of surrogates and ensembles may be used. Various infill strategies were discussed. Moreover, parallel surrogate-based methods and algorithms that can handle high-dimensional problems were mentioned. One possible future direction of research for constrained expensive black-box optimization is to develop methods that can handle black-box equality constraints. Relatively few such methods have been developed, and one approach is described in [3]. Another direction is to deal with hidden constraints. Finally, it is important to develop more methods that can be proved to converge to the global minimum, or at least to first-order points.
References

1. Appel, M.J., LaBarre, R., Radulović, D.: On accelerated random search. SIAM J. Optim. 14(3), 708–731 (2004)
2. Bagheri, S., Konen, W., Allmendinger, R., Branke, J., Deb, K., Fieldsend, J., Quagliarella, D., Sindhya, K.: Constraint handling in efficient global optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2017), pp. 673–680. ACM, New York (2017)
3. Bagheri, S., Konen, W., Emmerich, M., Bäck, T.: Self-adjusting parameter control for surrogate-assisted constrained optimization under limited budgets. Appl. Soft Comput. 61, 377–393 (2017)
4. Basudhar, A., Dribusch, C., Lacaze, S., Missoum, S.: Constrained efficient global optimization with support vector machines. Struct. Multidiscip. Optim. 46(2), 201–221 (2012)
5. Bouhlel, M.A., Bartoli, N., Otsmane, A., Morlier, J.: Improving kriging surrogates of high-dimensional design models by partial least squares dimension reduction. Struct. Multidiscip. Optim. 53(5), 935–952 (2016)
6. Bouhlel, M.A., Bartoli, N., Regis, R.G., Otsmane, A., Morlier, J.: Efficient global optimization for high-dimensional constrained problems by using the kriging models combined with the partial least squares method. Eng. Optim. 50(12), 2038–2053 (2018)
7. Boukouvala, F., Hasan, M.M.F., Floudas, C.A.: Global optimization of general constrained grey-box models: new method and its application to constrained PDEs for pressure swing adsorption. J. Global Optim. 67(1), 3–42 (2017)
8. Conn, A.R., Le Digabel, S.: Use of quadratic models with mesh-adaptive direct search for constrained black box optimization. Optim. Methods Softw. 28(1), 139–158 (2013)
9. Forrester, A.I.J., Sobester, A., Keane, A.J.: Engineering Design Via Surrogate Modelling: A Practical Guide. Wiley (2008)
10. Ginsbourger, D., Le Riche, R., Carraro, L.: Kriging Is Well-Suited to Parallelize Optimization, pp. 131–162. Springer, Heidelberg (2010)
11. Jones, D.R.: Large-scale multi-disciplinary mass optimization in the auto industry. In: MOPTA 2008, Modeling and Optimization: Theory and Applications Conference, Ontario, Canada, August 2008
12. Jones, D.R.: A taxonomy of global optimization methods based on response surfaces. J. Global Optim. 21(4), 345–383 (2001)
13. Jones, D., Schonlau, M., Welch, W.: Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4), 455–492 (1998)
14. Koch, P., Bagheri, S., Konen, W., Foussette, C., Krause, P., Bäck, T.: A new repair method for constrained optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2015), pp. 273–280 (2015)
15. Nuñez, L., Regis, R.G., Varela, K.: Accelerated random search for constrained global optimization assisted by radial basis function surrogates. J. Comput. Appl. Math. 340, 276–295 (2018)
16. Parr, J.M., Keane, A.J., Forrester, A.I., Holden, C.M.: Infill sampling criteria for surrogate-based optimization with constraint handling. Eng. Optim. 44(10), 1147–1166 (2012)
17. Powell, M.J.D.: The theory of radial basis function approximation in 1990. In: Light, W. (ed.) Advances in Numerical Analysis, Volume 2: Wavelets, Subdivision Algorithms and Radial Basis Functions, pp. 105–210. Oxford University Press, Oxford (1992)
18. Regis, R.G.: Stochastic radial basis function algorithms for large-scale optimization involving expensive black-box objective and constraint functions. Comput. Oper. Res. 38(5), 837–853 (2011)
19. Regis, R.G.: Constrained optimization by radial basis function interpolation for high-dimensional expensive black-box problems with infeasible initial points. Eng. Optim. 46(2), 218–243 (2014)
20. Regis, R.G.: Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions. IEEE Trans. Evol. Comput. 18(3), 326–347 (2014)
21. Regis, R.G.: Surrogate-assisted particle swarm with local search for expensive constrained optimization. In: Korošec, P., Melab, N., Talbi, E.G. (eds.) Bioinspired Optimization Methods and Their Applications, pp. 246–257. Springer International Publishing, Cham (2018)
22. Regis, R.G., Shoemaker, C.A.: Parallel radial basis function methods for the global optimization of expensive functions. Eur. J. Oper. Res. 182(2), 514–535 (2007)
23. Regis, R.G., Wild, S.M.: CONORBIT: constrained optimization by radial basis function interpolation in trust regions. Optim. Methods Softw. 32(3), 552–580 (2017)
24. Sasena, M.J., Papalambros, P., Goovaerts, P.: Exploration of metamodeling sampling criteria for constrained global optimization. Eng. Optim. 34(3), 263–278 (2002)
25. Schonlau, M.: Computer Experiments and Global Optimization. Ph.D. thesis, University of Waterloo, Canada (1997)
26. Sóbester, A., Leary, S.J., Keane, A.J.: On the design of optimization strategies based on global response surface approximation models. J. Global Optim. 33(1), 31–59 (2005)
27. Wild, S.M., Regis, R.G., Shoemaker, C.A.: ORBIT: optimization by radial basis function interpolation in trust-regions. SIAM J. Sci. Comput. 30(6), 3197–3219 (2008)
28. Zhan, D., Qian, J., Cheng, Y.: Pseudo expected improvement criterion for parallel EGO algorithm. J. Global Optim. 68(3), 641–662 (2017)
Adaptive Global Optimization Based on Nested Dimensionality Reduction

Konstantin Barkalov and Ilya Lebedev

Lobachevsky State University of Nizhni Novgorod, Nizhni Novgorod, Russia
[email protected]
Abstract. In the present paper, multidimensional multiextremal optimization problems and numerical methods for solving them are considered. The only general assumption made on the objective function is that it satisfies the Lipschitz condition with a Lipschitz constant not known a priori. Problems of this type are frequent in applications. Two approaches to dimensionality reduction for multidimensional optimization problems are considered. The first one uses Peano-type space-filling curves mapping a one-dimensional interval onto a multidimensional domain. The second one is based on the nested optimization scheme, which reduces a multidimensional problem to a family of one-dimensional subproblems. A generalized scheme combining these two approaches is proposed. In this novel scheme, solving a multidimensional problem is reduced to solving a family of problems of lower dimensionality, in which space-filling curves are used. An adaptive algorithm, in which all arising subproblems are solved simultaneously, has been implemented. Numerical experiments on several hundred test problems confirm the efficiency of the proposed generalized scheme.

Keywords: Global optimization · Multiextremal functions · Dimensionality reduction · Peano curve · Nested optimization · Numerical methods

1 Introduction
This paper considers “black-box” global optimization problems of the following form:

ϕ(y*) = min{ϕ(y) : y ∈ D},  D = {y ∈ R^N : a_i ≤ y_i ≤ b_i, 1 ≤ i ≤ N}.   (1)
This study was supported by the Russian Science Foundation, project No. 16-11-10150.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 48–57, 2020. https://doi.org/10.1007/978-3-030-21803-4_5
The objective function is assumed to satisfy the Lipschitz condition

|ϕ(y′) − ϕ(y″)| ≤ L ‖y′ − y″‖,  y′, y″ ∈ D,  0 < L < ∞,

with the constant L unknown a priori. The multistart scheme is a well-known method for solving multiextremal problems. In such schemes, a grid is seeded in the search domain, its points are used as starting points for searching for extrema by some local method, and then the lowest of the extrema found is chosen. The choice of the starting points, performed as a rule on the basis of the Monte Carlo method, is a special problem within this approach [25]. The approach works well for problems with a small number of local minima, which have a wide field of application, but for essentially multiextremal problems its efficiency falls drastically. At present, genetic algorithms, which are one way or another based on the random search concept, are widely used for solving global optimization problems (see, for example, [24]). Because of their simplicity of implementation and use, they have gained large popularity. However, their quality of work (for which the number of problems from some set solved correctly can serve as a quantitative measure) is essentially lower than that of deterministic algorithms [14,20]. As for deterministic global optimization methods, many methods of this class are based on various ways of dividing the search domain into a system of subdomains, followed by the selection of the most promising subdomain for placing the next trial (computing the objective function). Results in this direction are presented in [5,13,15,16,19]. Finally, the approach based on reducing multidimensional problems to equivalent one-dimensional ones (or to a system of one-dimensional subproblems), with subsequent solution of the one-dimensional problems by efficient univariate optimization algorithms, is widely used for the development of multidimensional optimization methods.
Two such schemes are used: the reduction based on Peano-type space-filling curves (evolvents) [22,23], and the nested optimization scheme [18,23]. An adaptive reduction scheme generalizing the classical nested optimization scheme was proposed in [8]. The adaptive scheme essentially enhances optimization efficiency as compared to the base prototype [10]. In the present work, a generalization of the adaptive dimensionality reduction scheme combining the use of nested optimization and Peano curves is proposed. In this approach, the nested subproblems in the adaptive scheme can be one-dimensional as well as multidimensional. In the latter case, evolvents are used to reduce the dimensionality of the nested subproblems.
2 The Global Search Algorithm
As a core problem we consider a one-dimensional multiextremal optimization problem

ϕ* = ϕ(x*) = min{ϕ(x) : x ∈ [0, 1]}   (2)
with the objective function satisfying the Lipschitz condition. Let us give a description of the global search algorithm (GSA) applied for solving the above problem (according to [23]). GSA involves constructing a sequence of points x^i, where the values of the objective function z^i = ϕ(x^i) are calculated. Let us call the function value calculation process a trial. According to the algorithm, the first two trials are executed at the ends of the interval [0, 1], i.e., x^0 = 0, x^1 = 1. The function values z^0 = ϕ(x^0), z^1 = ϕ(x^1) are computed and the number k is set to 1. In order to select the point of a new trial x^{k+1}, k ≥ 1, it is necessary to perform the following steps.

Step 1. Renumber by subscripts (beginning from zero) the points x^i, 0 ≤ i ≤ k, of the previous trials in increasing order, i.e., 0 = x_0 < x_1 < … < x_k = 1. Juxtapose to the points x_i, 0 ≤ i ≤ k, the function values z_i = ϕ(x_i), 0 ≤ i ≤ k.

Step 2. Compute the maximum absolute value of the first divided differences

μ = max_{1≤i≤k} |z_i − z_{i−1}| / Δ_i,   (3)

where Δ_i = x_i − x_{i−1}. If the above formula yields a zero value, assume that μ = 1.

Step 3. For each interval (x_{i−1}, x_i), 1 ≤ i ≤ k, calculate the characteristic

R(i) = rμΔ_i + (z_i − z_{i−1})² / (rμΔ_i) − 2(z_i + z_{i−1}),   (4)

where r > 1 is a predefined parameter of the method.

Step 4. Find the interval (x_{t−1}, x_t) with the maximum characteristic

R(t) = max_{1≤i≤k} R(i).   (5)

Step 5. Execute the new trial at the point

x^{k+1} = (x_{t−1} + x_t)/2 − (z_t − z_{t−1}) / (2rμ).   (6)
The algorithm terminates if the condition Δ_t < ε is satisfied; here t is from (5) and ε > 0 is the preset accuracy. As estimates of the global solution, the values

z*_k = min_{0≤i≤k} ϕ(x^i),  x*_k = arg min_{0≤i≤k} ϕ(x^i)

are selected. The theory of algorithm convergence is presented in [23].
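The five steps above can be sketched directly in code. The following is our own illustrative implementation (not the authors'), in which keeping the trial points in a sorted list plays the role of the renumbering in Step 1:

```python
def gsa_minimize(phi, r=2.0, eps=1e-4, max_iters=10000):
    """One-dimensional Global Search Algorithm (GSA) on [0, 1], following
    Steps 1-5 above; the sorted list xs plays the role of x_0 < ... < x_k."""
    xs = [0.0, 1.0]
    zs = [phi(0.0), phi(1.0)]
    for _ in range(max_iters):
        # Step 2: Lipschitz-constant estimate via divided differences (3)
        mu = max(abs(zs[i] - zs[i - 1]) / (xs[i] - xs[i - 1])
                 for i in range(1, len(xs)))
        if mu == 0.0:
            mu = 1.0

        # Step 3: characteristic (4) of the interval (x_{i-1}, x_i)
        def R(i):
            d = xs[i] - xs[i - 1]
            return (r * mu * d + (zs[i] - zs[i - 1]) ** 2 / (r * mu * d)
                    - 2.0 * (zs[i] + zs[i - 1]))

        # Step 4: interval with the maximum characteristic (5)
        t = max(range(1, len(xs)), key=R)
        if xs[t] - xs[t - 1] < eps:  # termination criterion
            break
        # Step 5: new trial point (6); since mu bounds the divided
        # differences and r > 1, it always falls inside (x_{t-1}, x_t)
        x_new = 0.5 * (xs[t - 1] + xs[t]) - (zs[t] - zs[t - 1]) / (2.0 * r * mu)
        xs.insert(t, x_new)
        zs.insert(t, phi(x_new))
    k = min(range(len(xs)), key=lambda i: zs[i])
    return xs[k], zs[k]
```

Note that the point (6) always lies strictly inside the selected interval, so the sorted order is preserved without re-sorting.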
3 Dimensionality Reduction

3.1 Dimensionality Reduction Using Peano-Type Space-Filling Curves
The use of a Peano curve y(x),

{y ∈ R^N : −2^{−1} ≤ y_i ≤ 2^{−1}, 1 ≤ i ≤ N} = {y(x) : 0 ≤ x ≤ 1},   (7)

unambiguously mapping the interval [0, 1] of the real axis onto an N-dimensional cube, is the first of the dimension reduction methods considered. Problems of numerical construction of Peano-type space-filling curves and the corresponding theory are considered in detail in [22,23]. Here we note that a numerically constructed curve (evolvent) is a 2^{−m}-accurate approximation of the theoretical Peano curve, where m is an evolvent construction parameter. By using this kind of mapping it is possible to reduce the multidimensional problem (1) to the univariate problem

ϕ(y*) = ϕ(y(x*)) = min{ϕ(y(x)) : x ∈ [0, 1]}.   (8)

An important property of such a mapping is the preservation of boundedness of relative differences of the function (see [22,23]). If the function ϕ(y) satisfies the Lipschitz condition in the domain D, then the function ϕ(y(x)) satisfies a uniform Hölder condition on the interval [0, 1]:

|ϕ(y(x_1)) − ϕ(y(x_2))| ≤ H |x_1 − x_2|^{1/N},   (9)

where the Hölder constant H is linked to the Lipschitz constant L by the relation

H = 2L √(N + 3).   (10)

Condition (9) allows adapting the algorithm for solving one-dimensional problems presented in Sect. 2 to multidimensional problems reduced to one-dimensional ones. For this, the lengths of the intervals Δ_i involved in rules (3), (4) of the algorithm are substituted by the lengths

Δ_i = (x_i − x_{i−1})^{1/N},   (11)
and the following expression is introduced instead of formula (6):

x^{k+1} = (x_t + x_{t−1})/2 − sign(z_t − z_{t−1}) (1/(2r)) [ |z_t − z_{t−1}| / μ ]^N.   (12)

3.2 Nested Optimization Scheme
The nested optimization scheme of dimensionality reduction is based on the well-known relation (see, e.g., [4])

min_{y∈D} ϕ(y) = min_{a_1≤y_1≤b_1} min_{a_2≤y_2≤b_2} … min_{a_N≤y_N≤b_N} ϕ(y),   (13)
which allows replacing the solution of the multidimensional problem (1) by the solution of a family of one-dimensional subproblems related to each other recursively. In order to describe the scheme, let us introduce a set of reduced functions as follows:

ϕ_N(y_1, …, y_N) = ϕ(y_1, …, y_N),   (14)

ϕ_i(y_1, …, y_i) = min_{a_{i+1}≤y_{i+1}≤b_{i+1}} ϕ_{i+1}(y_1, …, y_i, y_{i+1}),  1 ≤ i ≤ N − 1.   (15)
Then, according to relation (13), solving the multidimensional problem (1) is reduced to solving the one-dimensional problem

ϕ* = min_{a_1≤y_1≤b_1} ϕ_1(y_1).   (16)
But in order to evaluate the function ϕ_1 at a fixed point y_1 it is necessary to solve the one-dimensional problem of the second level

ϕ_1(y_1) = min_{a_2≤y_2≤b_2} ϕ_2(y_1, y_2),   (17)
and so on, up to the univariate minimization of the function ϕ_N(y_1, …, y_N) with fixed coordinates y_1, …, y_{N−1} at the N-th level of recursion. For the nested scheme presented above, a generalization (the block nested optimization scheme), which combines the use of evolvents and the nested scheme, was elaborated in [2]. Let us consider the vector y as a vector of block variables

y = (y_1, y_2, …, y_N) = (u_1, u_2, …, u_M),   (18)

where the i-th block variable u_i is a vector of components of y taken serially, i.e., u_1 = (y_1, y_2, …, y_{N_1}), u_2 = (y_{N_1+1}, y_{N_1+2}, …, y_{N_1+N_2}), …, u_M = (y_{N−N_M+1}, y_{N−N_M+2}, …, y_N), where N_1 + N_2 + … + N_M = N. Using the new variables, the main relation of the nested scheme (13) can be rewritten in the form

min_{y∈D} ϕ(y) = min_{u_1∈D_1} min_{u_2∈D_2} … min_{u_M∈D_M} ϕ(y),   (19)
where the subdomains D_i, 1 ≤ i ≤ M, are the projections of the initial search domain D onto the subspaces corresponding to the variables u_i, 1 ≤ i ≤ M.

The formulae defining the method for solving problem (1) on the basis of relation (19) are, in general, the same as those of the nested scheme (14)–(16). It is only necessary to substitute the block variables u_i, 1 ≤ i ≤ M, for the original variables y_i, 1 ≤ i ≤ N. In this case, the nested subproblems

    ϕ_i(u_1, ..., u_i) = min_{u_{i+1}∈D_{i+1}} ϕ_{i+1}(u_1, ..., u_i, u_{i+1}),  1 ≤ i ≤ M − 1,    (20)

of the block scheme are multidimensional ones, and the dimension reduction method based on Peano curves can be applied for solving them. This is the principal difference from the initial nested scheme.
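The recursive structure of (13)–(17) maps directly onto code. The following sketch is an illustration only (it is not the authors' GSA: the univariate solver at every level is a plain grid search, and the objective and bounds are made up); it shows how each evaluation of ϕ_i triggers a complete one-dimensional search at level i + 1.

```python
# Nested optimization scheme: minimize over y1; every evaluation of
# phi_1(y1) runs a full 1-D search over y2, and so on down to level N,
# where the original objective phi is finally evaluated.

def nested_minimize(phi, bounds, n_grid=201):
    """Minimize phi(y1, ..., yN) over a box by recursive 1-D grid searches."""
    def level(i, fixed):
        a, b = bounds[i]
        h = (b - a) / (n_grid - 1)
        best_val, best_tail = float("inf"), None
        for j in range(n_grid):
            y = a + j * h
            if i == len(bounds) - 1:       # deepest level: evaluate phi itself
                val, tail = phi(*fixed, y), (y,)
            else:                          # inner subproblem phi_{i+1}
                val, inner = level(i + 1, fixed + (y,))
                tail = (y,) + inner
            if val < best_val:
                best_val, best_tail = val, tail
        return best_val, best_tail
    return level(0, ())

if __name__ == "__main__":
    f = lambda y1, y2: (y1 - 0.3) ** 2 + (y2 + 0.4) ** 2
    val, point = nested_minimize(f, [(-1.0, 1.0), (-1.0, 1.0)])
    print(val, point)   # near-zero value at approximately (0.3, -0.4)
```

The exponential cost of the recursion (n_grid^N evaluations here) is exactly the waste of information that the adaptive scheme of Sect. 3.3 is designed to avoid.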
Adaptive Global Optimization

3.3 Block Adaptive Optimization Scheme
The solving of the arising set of subproblems (15) (for the nested optimization scheme) or (20) (for the block nested optimization scheme) can be organized in various ways. A straightforward way (developed in detail for the nested optimization scheme in [9,18] and for the block nested optimization scheme in [2,3]) is based on solving the subproblems according to the order of their generation. However, in this case a considerable part of the information on the objective function is lost when solving the multidimensional problem. Another approach is the adaptive scheme, in which all subproblems are solved simultaneously; this allows a more complete accounting of the information on the multidimensional problem and accelerates the process of its solving. For the case of one-dimensional subproblems the adaptive scheme was theoretically substantiated and tested in [8,10,11]. The present work proposes a generalization of the adaptive scheme to the case of multidimensional subproblems. Let us give a brief description of its basic elements.

Let us assume the nested subproblems (20) to be solved with the use of the multidimensional global search algorithm described in Sect. 3.1. Then, each subproblem (20) can be associated with a numerical value called the characteristic of this problem. The value R(t) from (5) (i.e., the maximum characteristic of the intervals formed within the subproblem) can be taken as such a characteristic. According to the rule (4) for computing the characteristics, the higher the value of the characteristic, the more promising the subproblem for continuing the search for the global minimum of the initial problem (1). Therefore, at each iteration the subproblem with the highest characteristic is selected for executing the next trial. This trial either computes the objective function value ϕ(y) (if the selected subproblem belongs to the level j = M) or generates new subproblems according to (20) when j ≤ M − 1. In the latter case, the newly generated problems are added to the current problem set, their characteristics are computed, and the process is repeated. The optimization process is finished when the stopping condition is fulfilled for the algorithm solving the root problem.
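The bookkeeping of the adaptive scheme can be illustrated with a toy one-dimensional analogue. This is a sketch only: the characteristic below is a simple Lipschitz lower bound over an interval, not the characteristic R(t) of the GSA, and `lip` is an assumed known Lipschitz constant.

```python
import heapq

def adaptive_lipschitz_min(f, a, b, lip, n_trials=200):
    """1-D global search: keep all subintervals in a priority queue and
    always spend the next trial on the interval whose Lipschitz lower
    bound (its 'characteristic') is the most promising."""
    fa, fb = f(a), f(b)
    best_x, best_f = (a, fa) if fa <= fb else (b, fb)
    # queue entries: (lower_bound, left, f(left), right, f(right))
    heap = [(min(fa, fb) - 0.5 * lip * (b - a), a, fa, b, fb)]
    for _ in range(n_trials):
        _, l, fl, r, fr = heapq.heappop(heap)   # most promising subproblem
        m = 0.5 * (l + r)
        fm = f(m)                               # the trial
        if fm < best_f:
            best_x, best_f = m, fm
        for u, fu, v, fv in ((l, fl, m, fm), (m, fm, r, fr)):
            lb = min(fu, fv) - 0.5 * lip * (v - u)
            heapq.heappush(heap, (lb, u, fu, v, fv))
    return best_x, best_f
```

The single priority queue is the point: promising subproblems are refined regardless of the order in which they were generated, which is what distinguishes the adaptive scheme from solving the subproblems in generation order.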
4 Results of Numerical Experiments

One of the well-known approaches to investigating and comparing multiextremal optimization algorithms is based on applying these methods to a set of randomly generated test problems. The comparison of the algorithms has been carried out using the Grishagin test problems Fgr (see, for example, [17], test function 4) and the GKLS generator [7]. In [1,11] the global search algorithm (GSA) using the evolvents, as well as in combination with the adaptive dimensionality reduction scheme, was shown to outperform many well-known optimization algorithms, including DIRECT [12] and DIRECTl [6]. Therefore, in the present study we limit ourselves to the comparison of the variants of GSA with different dimensionality reduction schemes.
In order to compare the efficiencies of the algorithms, two criteria were used: the average number of trials and the operating characteristic. The operating characteristic of an algorithm is the function P(k) defined as the fraction of the problems from the considered series for whose solution not more than k trials have been required. A problem was considered solved if the algorithm generated a trial point y^k in the vicinity of the global minimizer y*, i.e., ‖y^k − y*‖ ≤ δ‖b − a‖, where δ = 0.01 and a, b are the boundaries of the search domain D.

The first series of experiments has been carried out on the two-dimensional problems from the classes Fgr, GKLS Simple, and GKLS Hard (100 functions from each class). Table 1 presents the averaged numbers of trials executed by GSA with the use of evolvents (Ke), the nested optimization scheme (Kn), and the adaptive nested optimization scheme (Kan). Figures 1 and 2(a, b) present the operating characteristics of the algorithms obtained on the problem classes Fgr, GKLS Simple, and GKLS Hard, respectively. The solid line corresponds to the algorithm using the evolvents (GSA-E), the short dashed line to the adaptive nested optimization scheme (GSA-AN), and the long dashed line to the nested optimization scheme (GSA-N). The results of the experiments demonstrate that GSA with the adaptive nested optimization scheme shows almost the same speed as GSA with the evolvents, and both algorithms considerably exceed the algorithm using the nested optimization scheme. Therefore further experiments were limited to the comparison of different variants of the adaptive dimensionality reduction scheme.

Table 1. Average number of trials for 2D problems.

        Fgr   GKLS Simple   GKLS Hard
Ke      180   252           674
Kn      341   697           1252
Kan     215   279           815
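The definition of the operating characteristic translates into one line of code; the trial counts used below are made-up numbers, purely for illustration (unsolved problems are recorded with an infinite trial count).

```python
# Operating characteristic P(k): the fraction of problems from the series
# solved with not more than k trials.

def operating_characteristic(trial_counts, k):
    return sum(1 for t in trial_counts if t <= k) / len(trial_counts)

counts = [120, 340, 95, 800, float("inf"), 210]   # hypothetical series
print(operating_characteristic(counts, 300))       # 3 of 6 problems -> 0.5
```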
Fig. 1. Operating characteristics using the Fgr class
Fig. 2. Operating characteristics using 2D GKLS Simple (a) and Hard (b) classes
The second series of experiments has been carried out on the four-dimensional problems from the classes GKLS Simple and GKLS Hard (100 functions of each class). Table 2 presents the averaged numbers of trials executed by GSA with the adaptive nested optimization scheme (Kan) and the block adaptive nested optimization scheme (Kban) with two levels of subproblems of equal dimensionality, N1 = N2 = 2. Note that when solving a problem of dimensionality N = 4 using the initial variant of the adaptive scheme, four levels of one-dimensional subproblems are formed, which complicates their processing. Figure 3(a, b) presents the operating characteristics of the algorithms obtained on the classes GKLS Simple and GKLS Hard, respectively. The dashed line corresponds to GSA using the adaptive nested optimization scheme (GSA-AN), the solid line to the block adaptive nested optimization scheme (GSA-BAN). The results of the experiments demonstrate that the block adaptive nested optimization scheme provides a considerable gain in the number of trials (up to 35%) as compared to the initial adaptive nested optimization scheme.

Table 2. Average number of trials for 4D problems.

        GKLS Simple   GKLS Hard
Kan     21747         35633
Kban    13894         31620
Fig. 3. Operating characteristics using 4D GKLS Simple (a) and Hard (b) classes
5 Conclusion

In the present work, a generalized adaptive dimensionality reduction scheme for global optimization problems, combining the use of Peano space-filling curves and the nested (recursive) optimization scheme, has been proposed. For solving the reduced subproblems of lower dimensionality, the global search algorithm was applied. The computational scheme of the algorithm was given, and the main issues related to the use of the adaptive dimensionality reduction scheme were considered. Numerical experiments have been carried out on series of test problems in order to compare the efficiencies of different dimensionality reduction schemes. The results of the experiments demonstrated that the use of the block adaptive nested optimization scheme can essentially reduce the number of trials required to solve a problem with given accuracy. Further work on the development of global optimization methods based on this approach to dimensionality reduction may involve the use of local estimates of the Lipschitz constant (considered, for example, in [9,21]) in different subproblems.
References

1. Barkalov, K., Gergel, V., Lebedev, I.: Use of Xeon Phi coprocessor for solving global optimization problems. Lecture Notes in Computer Science, vol. 9251, pp. 307–318 (2015)
2. Barkalov, K., Lebedev, I.: Solving multidimensional global optimization problems using graphics accelerators. Commun. Comput. Inf. Sci. 687, 224–235 (2016)
3. Barkalov, K., Gergel, V.: Multilevel scheme of dimensionality reduction for parallel global search algorithms. In: OPT-i 2014 Proceedings of 1st International Conference on Engineering and Applied Sciences Optimization, pp. 2111–2124 (2014)
4. Carr, C., Howe, C.: Quantitative Decision Procedures in Management and Economics: Deterministic Theory and Applications. McGraw-Hill, New York (1964)
5. Evtushenko, Y., Posypkin, M.: A deterministic approach to global box-constrained optimization. Optim. Lett. 7, 819–829 (2013)
6. Gablonsky, J.M., Kelley, C.T.: A locally-biased form of the DIRECT algorithm. J. Glob. Optim. 21(1), 27–37 (2001)
7. Gaviano, M., Kvasov, D.E., Lera, D., Sergeyev, Ya.D.: Software for generation of classes of test functions with known local and global minima for global optimization. ACM Trans. Math. Softw. 29(4), 469–480 (2003)
8. Gergel, V., Grishagin, V., Gergel, A.: Adaptive nested optimization scheme for multidimensional global search. J. Glob. Optim. 66(1), 35–51 (2016)
9. Gergel, V., Grishagin, V., Israfilov, R.: Local tuning in nested scheme of global optimization. Proc. Comput. Sci. 51(1), 865–874 (2015)
10. Grishagin, V., Israfilov, R., Sergeyev, Y.: Comparative efficiency of dimensionality reduction schemes in global optimization. AIP Conf. Proc. 1776, 060011 (2016)
11. Grishagin, V., Israfilov, R., Sergeyev, Y.: Convergence conditions and numerical comparison of global optimization methods based on dimensionality reduction schemes. Appl. Math. Comput. 318, 270–280 (2018)
12. Jones, D., Perttunen, C., Stuckman, B.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)
13. Jones, D.R.: The DIRECT global optimization algorithm. In: The Encyclopedia of Optimization, pp. 725–735. Springer, Heidelberg (2009)
14. Kvasov, D.E., Mukhametzhanov, M.S.: Metaheuristic vs. deterministic global optimization algorithms: the univariate case. Appl. Math. Comput. 318, 245–259 (2018)
15. Paulavičius, R., Žilinskas, J.: Advantages of simplicial partitioning for Lipschitz optimization problems with linear constraints. Optim. Lett. 10(2), 237–246 (2016)
16. Paulavičius, R., Žilinskas, J., Grothey, A.: Investigation of selection strategies in branch and bound algorithm with simplicial partitions and combination of Lipschitz bounds. Optim. Lett. 4(2), 173–183 (2010)
17. Sergeyev, Y., Grishagin, V.: Sequential and parallel algorithms for global optimization. Optim. Methods Softw. 3(1–3), 111–124 (1994)
18. Sergeyev, Y., Grishagin, V.: Parallel asynchronous global search and the nested optimization scheme. J. Comput. Anal. Appl. 3(2), 123–145 (2001)
19. Sergeyev, Y., Kvasov, D.: A deterministic global optimization using smooth diagonal auxiliary functions. Commun. Nonlinear Sci. Numer. Simul. 21(1–3), 99–111 (2015)
20. Sergeyev, Y., Kvasov, D., Mukhametzhanov, M.: On the efficiency of nature-inspired metaheuristics in expensive global optimization with limited budget. Sci. Rep. 8(1), 435 (2018)
21. Sergeyev, Y., Mukhametzhanov, M., Kvasov, D., Lera, D.: Derivative-free local tuning and local improvement techniques embedded in the univariate global optimization. J. Optim. Theory Appl. 171(1), 186–208 (2016)
22. Sergeyev, Y.D., Strongin, R.G., Lera, D.: Introduction to Global Optimization Exploiting Space-Filling Curves. SpringerBriefs in Optimization. Springer, New York (2013)
23. Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-convex Constraints. Sequential and Parallel Algorithms. Kluwer Academic Publishers, Dordrecht (2000)
24. Yang, X.S.: Nature-Inspired Metaheuristic Algorithms. Luniver Press, Frome (2008)
25. Zhigljavsky, A., Žilinskas, A.: Stochastic Global Optimization. Springer, New York (2008)
A B-Spline Global Optimization Algorithm for Optimal Power Flow Problem

Deepak D. Gawali1(B), Bhagyesh V. Patil2,3, Ahmed Zidna4, and Paluri S. V. Nataraj5

1 Vidyavardhini's College of Engineering and Technology, Palghar, India
[email protected]
2 Cambridge Centre for Advanced Research and Education in Singapore, Singapore
[email protected]
3 John Deere Technology Centre, Magarpatta City, Pune, India
4 LGIPM, University of Lorraine, Metz, France
[email protected]
5 Systems and Control Engineering, Indian Institute of Technology Bombay, Mumbai, India
[email protected]
Abstract. This paper addresses a nonconvex optimal power flow (OPF) problem. Specifically, a new B-spline approach in the context of the OPF problem is introduced. The applicability of this new approach is shown on a real-world 3-bus power system. The numerical results obtained with this new approach for the 3-bus system reveal a satisfactory improvement in terms of optimality when compared against the traditional interior-point method based MATPOWER toolbox. Similarly, the results are also found to be satisfactory with respect to global optimization solvers such as BARON and GloptiPoly.

Keywords: Polynomial B-spline · Global optimization · Polynomial optimization · Constrained optimization

1 Introduction
The optimal power flow (OPF) problem has a rich research history since it was first introduced by Carpentier in 1962 [1]. In practice, the OPF problem aims at minimizing the electric generator fuel cost to meet the desired load demand for a power system under various operating conditions, such as system thermal dissipation, voltages and powers. Briefly, the classical formulation of the OPF problem can be stated as follows:
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 58–67, 2020. https://doi.org/10.1007/978-3-030-21803-4_6
    min_x  f(x)                          (Fuel cost of generation)
    s.t.   g(x) ≤ 0,                     (Branch flow limits)
           h(x) = 0,                     (Nonlinear power balance equations)
           x_min ≤ x ≤ x_max,            (Bounds on the decision variables)    (1)
Here x = [θ V Pg Qg]^T, where θ is a voltage angle, V is a voltage magnitude, Pg is a real power generation, and Qg is a reactive power generation.

In the literature, several solution approaches have been investigated to solve the OPF problem (1). These include linear programming, Newton–Raphson, quadratic programming, Lagrange relaxation, interior-point methods, genetic algorithms, evolutionary programming, and particle swarm optimization [2–7]. However, it may be noted that the OPF problem is generally nonconvex in nature, and multiple local optimum solutions exist (cf. [10]). Hence, the above-mentioned solution approaches, which are based on a convexity assumption for the optimization problem, may result in a local optimal fuel cost value. As such, it is of paramount interest to look for an alternative solution approach that can guarantee global optimality of the fuel cost value in (1).

We note that the OPF problem possesses a polynomial formulation, i.e., the functions f, g, and h in (1) are polynomial functions in x. Based on this fact, in the present work we investigate a new approach for polynomial global optimization of the OPF problem. It is based on the polynomial B-spline form and uses several attractive geometric properties associated with it. Specifically, we use the polynomial B-spline form for a higher-order approximation of the OPF problem. It is noteworthy that the B-spline coefficients provide lower and upper bounds for the range of a function. Generally, this bound is over-estimated, but it can be improved either by raising the degree of the B-spline or by adding spline (or control) points. In this work, we particularly increase the number of B-spline control points to get sharper bounds on the range of the OPF problem. Further, we incorporate the obtained B-spline approximation of the OPF problem in a classical interval branch-and-bound framework. This enables us to locate the correct global optimal fuel cost value for the OPF problem to a user-specified accuracy.

The rest of the paper is organized as follows. In Sect. 2, we briefly give the notations and definitions of the polynomial B-spline form, describe the univariate and multivariate cases, and present the B-spline global algorithm. In Sect. 3, we report the numerical experiments performed with the B-spline global optimization algorithm on a 3-bus power system and compare the quality of the global minimum with those obtained using the well-known solvers BARON [12], GloptiPoly [13] and MATPOWER [11]. In Sect. 4, we give the conclusions of our work.
2 Background: Polynomial B-Spline Approach for Global Optimization

In this section, we briefly introduce the polynomial B-spline form [17]. This form is the basis of the main B-spline global optimization algorithm reported in Sect. 2.3.
Let s ∈ N be the number of variables and x = (x_1, x_2, ..., x_s) ∈ R^s. A multi-index I is defined as I := (i_1, i_2, ..., i_s) ∈ (N ∪ {0})^s, and the multi-power x^I is defined as x^I := x_1^{i_1} x_2^{i_2} ··· x_s^{i_s}. Given a multi-index N := (n_1, n_2, ..., n_s) and an index r, we define N_{r,−l} = (n_1, ..., n_{r−1}, n_r − l, n_{r+1}, ..., n_s), where 0 ≤ n_r − l ≤ n_r. Inequalities I ≤ N for multi-indices are meant componentwise, i.e., i_l ≤ n_l, l = 1, 2, ..., s. With I = (i_1, ..., i_{r−1}, i_r, i_{r+1}, ..., i_s) we associate the index I_{r,l} given by I_{r,l} = (i_1, ..., i_{r−1}, i_r + l, i_{r+1}, ..., i_s), where 0 ≤ i_r + l ≤ n_r. A real bounded and closed interval x_r is defined as x_r ≡ [inf x_r, sup x_r] ∈ IR, where IR denotes the set of compact intervals. Let wid x_r denote the width of x_r, that is, wid x_r := sup x_r − inf x_r.

Let I = [a, b], let m (the degree of the polynomial B-spline) and k (the number of B-spline segments) be given positive integers, and let u = {x_i}_{i=−m}^{k+m}, with mesh length h = (b − a)/k, be a uniform grid partition defined by x_{−m} = x_{−m+1} = ··· = x_0, x_i = a + ih for i = 1, ..., k, and x_{k+1} = x_{k+2} = ··· = x_{k+m}. Then the associated polynomial spline space of degree m is defined by S_m(I, u) = {s ∈ C^{m−1}(I) : s|_{[x_i, x_{i+1}]} ∈ P_m}, where P_m is the space of polynomials of degree at most m. It is well known that the set of the classical normalized B-splines {N_i^m, i = −m, ..., k − 1} is a basis for S_m(I, u) that satisfies interesting properties; for example, each N_i^m is positive on its support, and {N_i^m}_{i=−m}^{k−1} form a partition of unity. On the other hand, as P_m ⊂ S_m(I, u), the power basis functions {x^t}_{t=0}^{m} can be expressed in terms of B-splines through the relations

    x^t = Σ_{j=−m}^{k−1} π_j^{(t)} N_j^m(x),  t = 0, ..., m,    (2)

where the π_j^{(t)} are the symmetric polynomials given by

    π_j^{(t)} = Sym_t(j + 1, ..., j + m) / \binom{m}{t},  t = 0, ..., m.    (3)
The B-splines can be computed by the recurrence formula

    N_i^m(x) = γ_{i,m}(x) N_i^{m−1}(x) + (1 − γ_{i+1,m}(x)) N_{i+1}^{m−1}(x),    (4)

where

    γ_{i,m}(x) = (x − x_i) / (x_{i+m} − x_i)  if x_i < x_{i+m},  and  γ_{i,m}(x) = 0  otherwise,    (5)

and

    N_i^0(x) := 1  if x ∈ [x_i, x_{i+1}),  and  N_i^0(x) := 0  otherwise.    (6)
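The recurrence (4)–(6) can be implemented directly. The sketch below (function names are ours, not the paper's) builds the clamped uniform knot grid described above, where the zero-denominator convention in (5) handles the repeated end knots, and checks the partition-of-unity property of the basis.

```python
# Cox-de Boor recurrence (4)-(6) on a clamped uniform knot grid:
# knots x_{-m} = ... = x_0 = a, x_i = a + i h, x_k = ... = x_{k+m} = b,
# stored 0-based so original index q maps to list index q + m.

def clamped_knots(a, b, k, m):
    h = (b - a) / k
    return [a] * m + [a + i * h for i in range(k + 1)] + [b] * m

def bspline_basis(i, m, x, t):
    """N_i^m(x) for the knot list t (0-based index i)."""
    if m == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    # gamma factors of (5), with the 0/0 := 0 convention for repeated knots
    g1 = 0.0 if t[i + m] == t[i] else (x - t[i]) / (t[i + m] - t[i])
    g2 = 0.0 if t[i + m + 1] == t[i + 1] else \
        (t[i + m + 1] - x) / (t[i + m + 1] - t[i + 1])
    return g1 * bspline_basis(i, m - 1, x, t) + \
        g2 * bspline_basis(i + 1, m - 1, x, t)

if __name__ == "__main__":
    a, b, k, m = 0.0, 1.0, 4, 3
    t = clamped_knots(a, b, k, m)
    x = 0.37
    # k + m basis functions N_i^m, i = -m..k-1 (shifted to 0..k+m-1)
    total = sum(bspline_basis(i, m, x, t) for i in range(k + m))
    print(total)  # ~1.0 (partition of unity)
```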
In order to easily compute bounds for the range of a multivariate polynomial of degree N over an s-dimensional box, one can derive its B-spline representation [14,15]:

    p(x) = Σ_{I≤N} a_I x^I,  x ∈ R^s.    (7)
2.1 Univariate Case

Firstly, we consider a univariate polynomial

    p(x) := Σ_{t=0}^{n} a_t x^t,  x ∈ [a, b],    (8)

to be expressed in terms of the B-spline basis of the space of polynomial splines of degree m ≥ n (i.e., order m + 1). By substituting (2) into (8) we get

    p(x) = Σ_{t=0}^{n} a_t ( Σ_{j=−m}^{k−1} π_j^{(t)} N_j^m(x) )
         = Σ_{j=−m}^{k−1} ( Σ_{t=0}^{n} a_t π_j^{(t)} ) N_j^m(x)
         = Σ_{j=−m}^{k−1} d_j N_j^m(x),    (9)

where

    d_j = Σ_{t=0}^{n} a_t π_j^{(t)}.    (10)
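As an illustration of (8)–(10), the sketch below computes the coefficients d_j using the knot-based form of the symmetric polynomials (Marsden's identity: Sym_t is taken over the actual knots x_{j+1}, ..., x_{j+m}, which also covers the repeated boundary knots of the clamped grid) and checks numerically that [min d_j, max d_j] encloses the range of p. Function names are ours, not the paper's.

```python
# B-spline coefficients d_j of a univariate polynomial on [lo, hi] with k
# segments, degree m = deg p. By the convex-hull property of B-splines,
# [min d_j, max d_j] encloses the range of p on [lo, hi].

from math import comb

def elem_sym(vals, t):
    """Elementary symmetric polynomial Sym_t of the numbers in vals."""
    e = [0.0] * (t + 1)
    e[0] = 1.0
    for v in vals:
        for r in range(t, 0, -1):
            e[r] += v * e[r - 1]
    return e[t]

def bspline_coeffs(a_coeffs, lo, hi, k):
    """d_j for p(x) = sum_t a_coeffs[t] x^t, via pi_j^(t) of (3)."""
    m = len(a_coeffs) - 1                      # spline degree = poly degree
    h = (hi - lo) / k
    knots = [lo] * m + [lo + i * h for i in range(k + 1)] + [hi] * m
    d = []
    for j in range(k + m):                     # j = -m..k-1, shifted
        window = knots[j + 1: j + 1 + m]       # knots x_{j+1}..x_{j+m}
        d.append(sum(a_coeffs[t] * elem_sym(window, t) / comb(m, t)
                     for t in range(m + 1)))
    return d

if __name__ == "__main__":
    a_coeffs = [1.0, -2.0, 0.0, 1.0]           # p(x) = x^3 - 2x + 1
    d = bspline_coeffs(a_coeffs, 0.0, 2.0, 8)
    p = lambda x: x ** 3 - 2 * x + 1
    samples = [p(2.0 * i / 400) for i in range(401)]
    print(min(d) <= min(samples), max(samples) <= max(d))  # True True
```

Note that the first and last coefficients interpolate the endpoint values p(lo) and p(hi), while the interior coefficients may over-estimate the range; refining k tightens the enclosure, which is exactly the mechanism the paper exploits.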
2.2 Multivariate Case

Now, we derive the B-spline representation of a given multivariate polynomial (7),

    p(x_1, x_2, ..., x_s) = Σ_{i_1=0}^{n_1} ··· Σ_{i_s=0}^{n_s} a_{i_1...i_s} x_1^{i_1} ··· x_s^{i_s} = Σ_{I≤N} a_I x^I,    (11)

where I := (i_1, i_2, ..., i_s) and N := (n_1, n_2, ..., n_s). By substituting (2) for each x_t, (11) can be written as

    p(x_1, ..., x_s) = Σ_{i_1=0}^{n_1} ··· Σ_{i_s=0}^{n_s} a_{i_1...i_s} ( Σ_{j_1=−m_1}^{k_1−1} π_{j_1}^{(i_1)} N_{j_1}^{m_1}(x_1) ) ··· ( Σ_{j_s=−m_s}^{k_s−1} π_{j_s}^{(i_s)} N_{j_s}^{m_s}(x_s) )
                     = Σ_{j_1=−m_1}^{k_1−1} ··· Σ_{j_s=−m_s}^{k_s−1} ( Σ_{i_1=0}^{n_1} ··· Σ_{i_s=0}^{n_s} a_{i_1...i_s} π_{j_1}^{(i_1)} ··· π_{j_s}^{(i_s)} ) N_{j_1}^{m_1}(x_1) ··· N_{j_s}^{m_s}(x_s)
                     = Σ_{j_1=−m_1}^{k_1−1} ··· Σ_{j_s=−m_s}^{k_s−1} d_{j_1...j_s} N_{j_1}^{m_1}(x_1) ··· N_{j_s}^{m_s}(x_s),    (12)

i.e., we have expressed p as

    p(x) = Σ_{I≤N} d_I(x) N_I^N(x),    (13)

with the coefficients d_I(x) given by

    d_{j_1,...,j_s} = Σ_{i_1=0}^{n_1} ··· Σ_{i_s=0}^{n_s} a_{i_1...i_s} π_{j_1}^{(i_1)} ··· π_{j_s}^{(i_s)}.    (14)
Global optimization of polynomials using the polynomial B-spline approach requires the transformation of the given multivariate polynomial from its power form into its polynomial B-spline form. The B-spline coefficients are then collected in an array D(x) = (d_I(x))_{I∈S}, where S = {I : I ≤ N}. This array is called a patch. Let p be a polynomial of degree N and let p̄(x) denote the range of p on the given domain x. Then, for a patch D(x) of B-spline coefficients, it holds [8,9,14,15] that

    p̄(x) ⊆ [min D(x), max D(x)].    (15)

The B-spline range enclosure for the polynomial p is thus given by [min D(x), max D(x)].

Remark 1. Equation (15) says that the minimum and maximum B-spline coefficients of a multivariate polynomial p on x, obtained by transforming it from the power form to the B-spline form, provide an enclosure for the range of p. We shall obtain such a B-spline transformation for the OPF problem (1), followed by an interval branch-and-bound procedure to locate the correct global optimal solution of (1).

2.3 Main B-Spline Global Optimization Algorithm
A classical algorithm for deterministic global optimization performs an exhaustive search over the feasible region using an interval branch-and-bound procedure. Typically, interval branch-and-bound has means to compute upper and lower bounds (based on an inclusion function), followed by a divide-and-conquer approach. Similarly to an interval branch-and-bound algorithm, our B-spline global optimization algorithm uses the B-spline range enclosure property as an inclusion function. This provides upper and lower bounds for an instance of the optimization problem. If both bounds are within the user-specified accuracy, then an optimal solution has been found. Otherwise, the feasible region is divided into two or more sub-regions, and the same procedure is applied to each of the sub-regions until the termination condition is satisfied. Below we give a pseudo-code description of the polynomial B-spline global optimization algorithm (henceforth referred to as B-spline Opt).
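The bound/subdivide/cut-off structure described above can be sketched generically. The toy version below is not Algorithm 2.1: it is one-dimensional and unconstrained, and its inclusion function is a plain interval-arithmetic Horner evaluation rather than the B-spline range enclosure, but the branch-and-bound skeleton is the same. All names are ours.

```python
import heapq

def imul(a, b):
    """Product of two intervals a = (lo, hi), b = (lo, hi)."""
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(ps), max(ps))

def horner_enclosure(coeffs, box):
    """Interval enclosure of p over box = (lo, hi); coeffs in descending powers."""
    r = (coeffs[0], coeffs[0])
    for c in coeffs[1:]:
        r = imul(r, box)
        r = (r[0] + c, r[1] + c)
    return r

def bb_minimize(coeffs, lo, hi, tol=1e-4, max_iter=200000):
    def p(x):
        r = 0.0
        for c in coeffs:
            r = r * x + c
        return r
    best = min(p(lo), p(hi), p(0.5 * (lo + hi)))       # upper bound on the minimum
    heap = [(horner_enclosure(coeffs, (lo, hi))[0], lo, hi)]
    for _ in range(max_iter):
        if not heap or best - heap[0][0] < tol:
            break                                      # bounds have met
        lb, l, r = heapq.heappop(heap)
        if lb > best:                                  # cut-off test
            continue
        m = 0.5 * (l + r)
        best = min(best, p(m))
        for u, v in ((l, m), (m, r)):                  # subdivide at midpoint
            e = horner_enclosure(coeffs, (u, v))
            if e[0] <= best:
                heapq.heappush(heap, (e[0], u, v))
    return best

if __name__ == "__main__":
    # p(x) = x^4 - 3x^2 + 1, global minimum -1.25 at x = +/- sqrt(1.5)
    print(bb_minimize([1.0, 0.0, -3.0, 0.0, 1.0], -2.0, 2.0))  # close to -1.25
```

Swapping `horner_enclosure` for the B-spline range enclosure (15) tightens the lower bounds and is what makes Algorithm 2.1 effective on the polynomial OPF formulation.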
3 Case Study: 3-Bus Power System

In this section, we report the numerical results obtained using the polynomial B-spline global optimization algorithm (B-spline Opt) applied to the 3-bus power system shown in Fig. 1. Specifically, we show the optimality achieved in the result, i.e., the quality of the global minimum f* in (1), with respect to different solution approaches. First, we compare against the interior-point method based MATPOWER toolbox [4] used to solve the OPF problem. Further, to validate our
Algorithm 2.1. B-spline Opt (Ac, Nc, Kc, x, ε, ε_zero)

Input: Ac is a cell structure containing the coefficient arrays of the objective and all constraint polynomials; Nc is a cell structure containing the degree vector N for the objective and all constraints, where the elements of the degree vector N define the degree of each variable occurring in the objective and constraint polynomials; Kc is a cell structure containing the vectors corresponding to the objective polynomial, Ko, and to all constraints, i.e., Kgi, Khj, where the elements of these vectors define the number of B-spline segments in each variable direction; the initial box x; the tolerance limit ε; and the tolerance parameter ε_zero to which the equality constraints are to be satisfied.

Output: Global minimum p̂ and all the global minimizers z(i) in the initial search box x to the specified tolerance ε.

Begin Algorithm
1 {Compute the B-spline segment numbers} For each entry K of Kc, compute K = N + 2.
2 {Compute the B-spline coefficients} Compute the B-spline coefficient arrays of the objective and constraint polynomials on the initial search domain x, i.e., Do(x), Dgi(x) and Dhj(x), respectively.
3 {Initialize current minimum estimate} Initialize the current minimum estimate p̃ = max Do(x).
4 {Set flag vector} Set F = (F1, ..., Fp, Fp+1, ..., Fp+q) := (0, ..., 0).
5 {Initialize lists} L ← {x, Do(x), Dgi(x), Dhj(x), F}, Lsol ← {}.
6 {Sort the list L} Sort the list L in descending order of min Do(x).
7 {Start iteration} If L = ∅, go to step 12; else pick the last item from L, denote it as {b, Do(b), Dgi(b), Dhj(b), F}, and delete this item from L.
8 {Perform cut-off test} If min Do(b) > p̃, discard the item and go to step 7.
9 {Subdivision decision} If wid b < ε and max Do(b) − min Do(b) < ε, enter the item {b, min Do(b)} into Lsol and go to step 7; else go to step 10.
10 {Generate two subboxes} Choose the subdivision direction along the longest direction of b and the subdivision point as the midpoint. Subdivide b into two subboxes b1 and b2 such that b = b1 ∪ b2.
11 For r ← 1 to 2:
   (a) {Set flag vector} Set F^r = (F1^r, ..., Fp^r, F^r_{p+1}, ..., F^r_{p+q}) := F.
   (b) {Compute B-spline coefficients and range enclosures for br} Compute the B-spline coefficient arrays of the objective and constraint polynomials on box br and the corresponding B-spline range enclosures Do(br), Dgi(br) and Dhj(br).
   (c) {Set local current minimum estimate} Set p̃_local = min Do(br).
   (d) If p̃_local < p̃, then
       (I) for i ← 1 to p: if Fi = 0 and Dgi(br) ≤ 0, set Fi^r = 1;
       (II) for j ← 1 to q: if Fp+j = 0 and Dhj(br) ⊆ [−ε_zero, ε_zero], set F^r_{p+j} = 1.
   (e) If F^r = (1, ..., 1), set p̃ := min(p̃, max Do(br)).
   (f) Enter {br, Do(br), Dgi(br), Dhj(br), F^r} into the list L.
12 {Compute the global minimum} Set the global minimum to the current minimum estimate p̂ = p̃.
13 {Compute the global solution} Find all those items in Lsol for which min Do(b) = p̂. The first entries of these items are the global minimizer(s) z(i).
14 Return the global minimum p̂ and all the global minimizers z(i) found above.
End Algorithm
numerical results, we choose the global optimization solvers BARON and GloptiPoly. For BARON, the 3-bus system is modeled in GAMS and solved via the NEOS server for optimization [16]. The B-spline Opt algorithm is implemented in MATLAB, and the OPF instance for the 3-bus system is also modeled in MATLAB. It is solved on a PC with an Intel Core i3-370M CPU running at 2.40 GHz with 6 GB RAM. The termination accuracy ε and the equality constraint feasibility tolerance ε_zero are set to 0.001. Table 1 shows the numerical results (global minimum, f*) for the different solution approaches. We found that the B-spline Opt algorithm is able to find a better optimal solution than MATPOWER. It is worth noting that practically this accounts for around 3 $/hr savings in the fuel cost required for the electricity generation. Our numerical results are further validated with respect to BARON (cf. Table 1). We further note that, for GloptiPoly, the relaxation order needs to be systematically increased to obtain convergence to the final result. However, GloptiPoly exhausts the memory even with a small relaxation order (in this case just 2).
Fig. 1. 3-bus power system.
Table 1. Comparison of the optimal fuel cost value (f*) in (1) for a 3-bus system with the different solution approaches.

Solver/Algorithm   f* ($/hr)
B-spline Opt       5703.52*
BARON              5703.52*
GloptiPoly         **
MATPOWER           5707.11***

* Indicates the best obtained fuel cost value.
** Indicates that the solver did not give the result even after one hour and was therefore terminated.
*** Indicates a local optimal fuel cost value.
Remark 2. In practice, local optima exist for the OPF problem. Reference [10] shows, for small bus power systems where the voltage is within practical limits, that standard fixed-point optimization packages such as MATPOWER converge to a local optimum. Similarly, we observed that the global optimization software package GloptiPoly successfully solved the OPF instance of the 3-bus power system. However, it took a significantly large amount of computational time to report the final global optimum.
4 Conclusions

This paper addressed an important planning problem in power systems, termed optimal power flow (OPF). A new global optimization algorithm based on the polynomial B-spline form was proposed for solving the OPF problem. The applicability of the B-spline algorithm was demonstrated on the OPF instance corresponding to a real-world 3-bus system. A notable saving in the fuel cost (3 $/hr) was achieved using the B-spline algorithm with respect to the traditional MATPOWER toolbox. Similarly, the results obtained using the proposed B-spline algorithm were further validated against the generic global optimization solver BARON and were found to be satisfactory.
References

1. Capitanescu, F.: Critical review of recent advances and further developments needed in AC optimal power flow. Electr. Power Syst. Res. 136, 57–68 (2016)
2. Huneault, M., Galiana, F.: A survey of the optimal power flow literature. IEEE Trans. Power Syst. 6(2), 762–770 (1991)
3. Torres, G., Quintana, V.: Optimal power flow by a nonlinear complementarity method. In: Proceedings of the 21st IEEE International Conference on Power Industry Computer Applications (PICA'99), pp. 211–216 (1999)
4. Wang, H., Murillo-Sanchez, C., Zimmerman, R., Thomas, R.: On computational issues of market-based optimal power flow. IEEE Trans. Power Syst. 22(3), 1185–1193 (2007)
5. Momoh, J.: Electric Power System Applications of Optimization. Marcel Dekker, New York (2001)
6. Momoh, J., El-Hawary, M., Adapa, R.: A review of selected optimal power flow literature to 1993. Part I: Nonlinear and quadratic programming approaches. IEEE Trans. Power Syst. 14(1), 96–104 (1999)
7. Momoh, J., El-Hawary, M., Adapa, R.: A review of selected optimal power flow literature to 1993. Part II: Newton, linear programming and interior point methods. IEEE Trans. Power Syst. 14(1), 105–111 (1999)
8. Gawali, D., Zidna, A., Nataraj, P.: Algorithms for unconstrained global optimization of nonlinear (polynomial) programming problems: the single and multi-segment polynomial B-spline approach. Comput. Oper. Res. 87, 201–220 (2017)
9. Gawali, D., Zidna, A., Nataraj, P.: Solving nonconvex optimization problems in systems and control: a polynomial B-spline approach. In: Modelling, Computation and Optimization in Information Systems and Management Sciences, pp. 467–478. Springer (2015)
10. Bukhsh, W., Grothey, A., McKinnon, K., Trodden, P.: Local solutions of the optimal power flow problem. IEEE Trans. Power Syst. 28(4), 4780–4788 (2013)
11. Zimmerman, R., Murillo-Sanchez, C., Thomas, R.: MATPOWER: steady-state operations, planning, and analysis tools for power systems research and education. IEEE Trans. Power Syst. 26(1), 12–19 (2011)
12. Tawarmalani, M., Sahinidis, N.: A polyhedral branch-and-cut approach to global optimization. Math. Program. 103(2), 225–249 (2005)
13. Henrion, D., Lasserre, J.: GloptiPoly: global optimization over polynomials with Matlab and SeDuMi. ACM Trans. Math. Softw. (TOMS) 29(2), 165–194 (2003)
14. Lin, Q., Rokne, J.: Methods for bounding the range of a polynomial. J. Comput. Appl. Math. 58(2), 193–199 (1995)
15. Lin, Q., Rokne, J.: Interval approximation of higher order to the ranges of functions. Comput. Math. Appl. 31(7), 101–109 (1996)
16. NEOS Server for Optimization. http://www.neos-server.org/neos/solvers/
17. De Boor, C.: A Practical Guide to Splines. Applied Mathematical Sciences. Springer, Berlin (2001)
Concurrent Topological Optimization of a Multi-component Arm for a Tube Bending Machine

Federico Ballo(B), Massimiliano Gobbi, and Giorgio Previati

Politecnico di Milano, 20156 Milan, Italy
{federicomaria.ballo,giorgio.previati}@polimi.it
Abstract. In this paper the problem of the concurrent topological optimization of two different bodies sharing a region of the design space is dealt with. This design problem focuses on the simultaneous optimization of two bodies (components) where not only the material distribution of each body has to be optimized, but the design space also has to be divided between the two bodies. This novel optimization formulation represents a design problem in which more than one component has to be located inside a limited allowable room. Each component has its own function and load-carrying requirements. The paper presents a novel development in the solution algorithm. The algorithm has already been presented for the concurrent optimization of two bodies where the same mesh is used for both bodies in the shared portion of the domain. In this version of the algorithm, this requirement has been removed and each of the two bodies can be meshed with an arbitrary mesh. This development allows the application of the method to any real geometry. The algorithm is applied to the design of a multi-component arm for a tube bending machine.

Keywords: Structural optimization · Topology optimization · Multi-component system optimization · SIMP
1 Introduction
In the literature, many different engineering problems involving the topological optimization of structural components can be found (see for instance [3–5,9,13,15]). In most cases, only one component is considered. Recently, however, problems involving multi-domain optimization, such as the design of multi-phase or multi-material structures [12], have been considered. Referring to the particular problem of the optimization of systems composed of two bodies sharing the same design domain, some applications can be found. In [7,11], the level set method is used for the optimization of the

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 68–77, 2020.
https://doi.org/10.1007/978-3-030-21803-4_7
Concurrent Topological Optimization of a Multi-component
69
interface between two, or more, phases (components) in order to obtain a prescribed interaction force. In [14], a completely different approach, the Moving Morphable Components method, which seems adaptable to such problems, is discussed. In [2,8], the authors presented a simple algorithm based on the SIMP approach [3,9]. The method is able to optimize the material distribution of two different bodies while, at the same time, allocating the space between the bodies efficiently. The algorithm has also proved easy to implement and can be used along with available optimization algorithms. However, its applicability is limited to bodies presenting the same mesh in the shared portion of the domain. This limitation prevents applying the algorithm to arbitrarily shaped optimization domains. The present paper aims to remove this limitation, extending the applicability of the algorithm to any design domain and any kind of finite element (Sect. 2). The algorithm is tested on a simple two-dimensional example (Sect. 3). Then, in Sect. 4, the application of the presented algorithm to the optimization of a multi-component arm for a tube bending machine is shown.
2 Problem Formulation
The formulation of the concurrent structural optimization of two bodies sharing the same design space has been given and discussed in [8]; in this section it is briefly summarized. Figure 1 depicts the two bodies sharing (a portion of) the design space. The following nomenclature is used.

– Ω1 and Ω2: bodies to be optimized.
– Ω1−2: shared portion of the design space, which can be occupied by either of the two bodies (or left void). Any given point of this region can be occupied by only one of the two bodies.
– Ω1∗ and Ω2∗: the two unshared parts of the domains Ω1 and Ω2 respectively (s.t. Ω1 = Ω1∗ + Ω1−2 and Ω2 = Ω2∗ + Ω1−2).
– Ω1−2^(1) and Ω1−2^(2): portions of the shared design space assigned to Ω1 or Ω2 respectively.
– Ω1 and Ω2 (actual design spaces of the two bodies): given by Ω1 = Ω1∗ ∪ Ω1−2^(1) and Ω2 = Ω2∗ ∪ Ω1−2^(2).
– f1 and f2: applied loads on Ω1 and Ω2 respectively.
– Γ1 and Γ2: boundary conditions on Ω1 and Ω2 respectively.

Obviously, for a physically meaningful solution of the problem, Ω1 and Ω2 must be connected. It must be emphasized that the portions of the shared domain Ω1−2 allocated to each body (i.e. Ω1−2^(1) and Ω1−2^(2)) are not given a priori, but must be determined during the optimization process and can change according to the evolution of the shapes of the two bodies themselves.
70
F. Ballo et al.
Fig. 1. Generalized geometries of the design domains (Ω1 and Ω2 ) of two bodies sharing a portion of the design space. Each body has its own system of applied forces (f1 and f2 ) and boundary constraints (Γ1 and Γ2 ). Ω1−2 represents the shared portion of the design space. Left: initial definition of the domains. Right: division of the domains with assignment of the shared portion of the design space to each body.
Under the hypotheses that

– the bodies are linear elastic with small deformations;
– the two bodies do not interact in the shared portion of the domain (i.e. no contact is considered between the two bodies in the shared portion of the domain; interactions between the two bodies outside this region are possible);
– each body is made of only one isotropic material;
– both materials have the same reference density, but they can have different elastic moduli;
– loads and boundary conditions can be applied at any location of the domains, even in the shared portion;
– the structural problem is formulated by the finite element theory;

the problem can be stated in the framework of finite elements as

$$
\begin{aligned}
\text{Find}\quad & \min_{u_1, u_2, \rho}\; f_1^T u_1 + f_2^T u_2 \\
\text{s.t.:}\quad & K(E_{e1})\, u_1 = f_1 \ \text{ and } \ K(E_{e2})\, u_2 = f_2 \\
& E_{e1} = \rho(x_e)^p E_{e1}^*, \quad x_e \in \Omega_1 \\
& E_{e2} = \rho(x_e)^p E_{e2}^*, \quad x_e \in \Omega_2 \\
& \Omega_{1\text{-}2}^{(1)} \cup \Omega_{1\text{-}2}^{(2)} = \Omega_{1\text{-}2} \ \text{ and } \ \Omega_{1\text{-}2}^{(1)} \cap \Omega_{1\text{-}2}^{(2)} = \emptyset \\
& \sum_{e=1}^{N_1} \rho(x_e) + \sum_{e=1}^{N_2} \rho(x_e) \le V, \qquad 0 < \rho_{\min} \le \rho \le 1 \\
\text{where:}\quad & K(E_{e1}) = \sum_{e=1}^{N_1} K_e(E_{e1}) \ \text{ and } \ K(E_{e2}) = \sum_{e=1}^{N_2} K_e(E_{e2}) \\
& \Omega_1 \ \text{and} \ \Omega_2 \ \text{connected}
\end{aligned}
\tag{1}
$$
where u1 and u2 are the displacement fields of the two bodies, xe are the coordinates of the centre of the considered element, K is the stiffness matrix of each body with Ke the element stiffness matrix, Ee1 and Ee2 are the elastic moduli of each element of the two bodies, with Ee1∗ and Ee2∗ the reference elastic moduli, N1 and N2 are the numbers of elements of each body, and ρ is the pseudodensity, with p the penalty term.

Fig. 2. Diagram of the algorithm for the concurrent topological optimization. Symbols refer to Fig. 1.

In [2,8], the authors have presented a solution algorithm able to solve the problem stated in Eq. 1 under the condition that the two bodies present the same mesh in the shared part of the domain, i.e. in this region there is a one-to-one correspondence between the elements of the two bodies. In the following, this condition is removed. The modified solution algorithm is reported in the diagram of Fig. 2. The algorithm is divided in two parts, i.e. the dashed rectangles labeled A and B in Fig. 2. The sub-algorithm A represents a standard topology optimization algorithm (see [3]). This sub-algorithm follows the solution of the finite element model
of the two bodies (block 1 in Fig. 2). The sub-algorithm B is the part of the algorithm devoted to the allocation of the shared part of the domain (Ω1−2) to each of the two bodies. It implements the following steps.

– Interpolation of the sensitivity fields on Ω1−2 computed in Block A1. Interpolation is required as there is no correspondence between the elements of the two bodies. To interpolate the sensitivities, the value of sensitivity computed on each element is normalized with respect to the volume of the element. The interpolation is performed on local patches.
– Comparison of the two sensitivities and allocation of the shared design space. The shared design space is allocated to Ω1−2^(1) where the sensitivity computed on Ω1 is greater than the sensitivity computed on Ω2, and to Ω1−2^(2) otherwise.
– Construction of Ω1 = Ω1∗ ∪ Ω1−2^(1) and Ω2 = Ω2∗ ∪ Ω1−2^(2).
– Imposition of the connectivity on Ω1 and Ω2. Each non-connected region of the two sets, if present, is found and assigned to the other set. In this way, the two sets result connected.
– The elements of Ω1 and Ω2 belonging to Ω1 and Ω2 respectively are found and marked as active. The remaining elements are marked as non-active. The information on the active or non-active elements is transferred to Block A3 and the new pseudodensity of each element is computed by the standard method described in [10]. The target volume fraction is enforced at system level and each body can have a different value of volume fraction.
– Penalization of the non-active elements of the shared part of the domain, i.e. their pseudodensity is set equal to the minimum value of pseudodensity.

After the sub-algorithm B is completed, the new finite element model is constructed and the solution algorithm is repeated until convergence or until the maximum number of iterations is reached.
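The comparison-and-allocation step of sub-algorithm B can be sketched as follows, assuming the two sensitivity fields have already been interpolated to common sample points of Ω1−2; the function name and the toy data are illustrative, not the authors' implementation:

```python
def allocate_shared_domain(sens1, vol1, sens2, vol2):
    """Assign each sampled point of the shared domain to body 1 where its
    volume-normalized sensitivity exceeds that of body 2, else to body 2."""
    s1 = [s / v for s, v in zip(sens1, vol1)]   # normalize by element volume
    s2 = [s / v for s, v in zip(sens2, vol2)]
    return [a > b for a, b in zip(s1, s2)]      # True -> point goes to body 1

# toy data: 5 sample points in the shared domain, unit element volumes
mask = allocate_shared_domain([4.0, 1.0, 3.0, 0.5, 2.0], [1.0] * 5,
                              [2.0, 2.0, 1.0, 1.0, 2.0], [1.0] * 5)
print(mask)   # points 0 and 2 are assigned to body 1
```

The connectivity repair described above would then run on the resulting masks before the active/non-active flags are passed back to Block A3.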
3 Two Dimensional Problem
In this section a simple two-dimensional problem is reported. The problem is depicted in Fig. 3 and consists of two cantilevers with an end load sharing part of the domain. The two cantilevers have the same geometry, material, load and boundary conditions. However, the two domains are meshed with very different meshes. Domain Ω1 is meshed by three-node triangular elements with an unstructured distribution and a mean element size of 1 mm. Domain Ω2 is meshed by six-node triangular elements with a structured distribution and a mean element size of 1 mm. At the first step of the analysis, the same pseudodensity has been assigned to all the elements of the two bodies, including the elements in the shared part of the domain.
Fig. 3. Two dimensional example definition.
In Fig. 4 the results of the concurrent topological optimization of the two cantilevers are shown. The figure shows both the division of the common part of the design space and the two optimized structures. The two optimized structures are very similar, as expected since the two sub-problems are identical. In fact, the mass difference is 0.17% and the difference in compliance is 0.42%. The obtained topology is also similar to the theoretical solution of the problem of the cantilever with end load ([3], p. 49).
4 Concurrent Topological Optimization of a Tool Support Swing Arm
The described algorithm has been employed for the optimization of a tool support swing arm for a tube bending machine. This research activity has been carried out in collaboration with BLM Group [1]. Due to the very high production rate of the machine, the swing arm is subjected to high accelerations and, as a consequence, an inertial torque arises. The optimization of the system is thus important to reduce energy consumption and increase the production rate. In Fig. 5, the tool support swing arm is depicted. The figure shows only half of the model due to the symmetry of the system geometry. The arm is composed of two parts, namely the support arm, which rotates around a vertical rotation axis, and the sledge, which slides on a guide rail on the support arm. The tool load is applied to the sledge by a multi-point constraint. A contact interaction is imposed between the sledge and the support arm at the guide rail. It is worth noting that, by including this contact condition, a nonlinear finite element analysis has to be run for each optimization step. The sledge is positioned with respect to the support arm by a screw drive, actuated by a motor connected to a gear.
Fig. 4. Solution of the two dimensional problem in Fig. 3.
Fig. 5. Tool support swing arm model [1].
The system is subjected to an angular acceleration around the rotation axis. Also, three different loads are considered. The loads are applied one at a time and for different positions of the sledge along the rail. Forces F2 and F3 act only in the y direction, while force F1 has x and y components. Due to the angular acceleration and force F1, the system undergoes a non-symmetric deformation. To maintain the geometrical symmetry of the components, a symmetry constraint is applied in the optimization algorithm. The symmetry constraint
is enforced as explained in [6]. The components are manufactured by additive manufacturing, so no manufacturability constraint is considered. The optimization domains of the support arm and of the sledge are also reported in Fig. 5. In the figure, a relatively small region is highlighted in green. This region is quite critical for the stiffness of both components but cannot be used by both, as they would interfere when the sledge is in its innermost position. This region is considered as the shared domain that can be assigned to either component during the optimization process. With respect to the algorithm of Fig. 2, in the optimization of the tool support swing arm not only the connectedness of the two design spaces is enforced, but also the additional condition that the shape of the domains must allow the sliding of the sledge. The two components to be optimized are meshed by four-node tetrahedra with a mean size of 1.5 mm. A total of about 1.65 · 10^6 elements is present in the model. The optimization problem is solved by using Abaqus 2016 for the solution of the nonlinear finite element model and a custom code to read the results, implement the optimization algorithm and write the new input file. The custom code is written in Python and Matlab. As in the previous example, at the beginning of the analysis the same pseudodensity has been assigned to all the elements. The result of the optimization process, with an overall target volume fraction of 0.55, is shown in Fig. 6. On the left, an overall view of the optimized system is reported. On the right, the detail of the optimization of the two components in the shared part of the domain can be seen. The shared domain has been divided in a quite complex way, allowing an internal reinforcement for the sledge and a lower reinforcement for the support arm.
Fig. 6. Optimization results - surfaces with pseudodensity greater than 0.3. Left: complete system. Right: detail of the shared part of the domain.
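The overall iterate described above — nonlinear FE solve, sensitivity read-back, density update, repeat until convergence — can be sketched as a toy loop. Here the external Abaqus solve is replaced by a stub returning synthetic sensitivities, and all helper names, numbers and the crude update rule are hypothetical, not the authors' Python/Matlab code:

```python
def fe_solve_and_sensitivities(densities):
    # stand-in for the external nonlinear FE solve; in the real workflow an
    # input file is written, the solver is run, and results are read back
    return [1.0 / max(rho, 0.1) for rho in densities]

def update_densities(densities, sens, step=0.1, target_volume=2.0):
    # crude SIMP-like update: move density toward high-sensitivity elements,
    # then rescale so the total material matches the target volume
    new = [min(1.0, max(0.01, rho + step * s)) for rho, s in zip(densities, sens)]
    scale = target_volume / sum(new)
    return [min(1.0, rho * scale) for rho in new]

def optimization_loop(densities, max_iter=50, tol=1e-4):
    for _ in range(max_iter):
        sens = fe_solve_and_sensitivities(densities)
        new = update_densities(densities, sens)
        if max(abs(a - b) for a, b in zip(new, densities)) < tol:
            return new
        densities = new
    return densities

result = optimization_loop([0.5, 0.5, 0.5, 0.5])
print(sum(result))   # total material stays at the target volume of 2.0
```

In the actual workflow the solve step also re-runs the shared-domain allocation of sub-algorithm B before the densities are updated.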
5 Conclusion
In the present paper, an improved algorithm has been presented for the concurrent optimization of two bodies sharing part of the design domain. The improved algorithm allows the use of an arbitrary mesh on each body. Also,
the algorithm has been used with a commercial finite element software by considering a SIMP approach and a symmetry constraint in the solution, proving the applicability of the method with existing optimization algorithms. In this way, the algorithm can be applied to real-world optimization problems. The new algorithm has been tested on a simple two-dimensional problem and then applied to the optimization of the arm of a tube bending machine designed in collaboration with BLM Group. The application has shown the ability of the algorithm to solve real problems and find non-trivial, efficient solutions for the assignment of the shared domain. Further developments of the method, considering the possibility to include contact interactions in the shared part of the domain, will be investigated.
References

1. BLM Group. http://www.blmgroup.com. Accessed 24 Jan 2019
2. Ballo, F., Gobbi, M., Previati, G.: Concurrent topological optimisation: optimisation of two components sharing the design space. In: EngOpt 2018 Proceedings of the 6th International Conference on Engineering Optimization, pp. 725–738. Springer International Publishing, Cham (2019)
3. Bendsøe, M.P., Sigmund, O.: Topology Optimization. Theory, Methods, and Applications, 2nd edn. Springer, Berlin (2004)
4. Eschenauer, H.A., Olhoff, N.: Topology optimization of continuum structures: a review. Appl. Mech. Rev. 54(4), 331 (2001)
5. Guo, X., Cheng, G.D.: Recent development in structural design and optimization. Acta Mech. Sin. 26(6), 807–823 (2010)
6. Kosaka, I., Swan, C.C.: A symmetry reduction method for continuum structural topology optimization. Comput. Struct. 70(1), 47–61 (1999)
7. Lawry, M., Maute, K.: Level set shape and topology optimization of finite strain bilateral contact problems. Int. J. Numer. Methods Eng. 113(8), 1340–1369 (2018)
8. Previati, G., Ballo, F., Gobbi, M.: Concurrent topological optimization of two bodies sharing design space: problem formulation and numerical solution. Struct. Multidiscip. Optim. (2018)
9. Rozvany, G.I.N.: A critical review of established methods of structural topology optimization. Struct. Multidiscip. Optim. 37(3), 217–237 (2009)
10. Sigmund, O.: A 99 line topology optimization code written in Matlab. Struct. Multidiscip. Optim. 21(2), 120–127 (2001)
11. Strömberg, N.: Topology optimization of orthotropic elastic design domains with mortar contact conditions. In: Schumacher, A., Vietor, T., Fiebig, S., Bletzinger, K.-U., Maute, K. (eds.) Advances in Structural and Multidisciplinary Optimization: Proceedings of the 12th World Congress of Structural and Multidisciplinary Optimization, pp. 1427–1438. Springer International Publishing, Braunschweig, Germany (2018)
12. Tavakoli, R., Mohseni, S.M.: Alternating active-phase algorithm for multimaterial topology optimization problems: a 115-line MATLAB implementation. Struct. Multidiscip. Optim. 49(4), 621–642 (2014)
13. Zhang, W., Zhu, J., Gao, T.: Topology Optimization in Engineering Structure Design. Elsevier, Oxford (2016)
14. Zhang, W., Yuan, J., Zhang, J., Guo, X.: A new topology optimization approach based on Moving Morphable Components (MMC) and the ersatz material model. Struct. Multidiscip. Optim. 53(6), 1243–1260 (2016)
15. Zhu, J.H., Zhang, W.H., Xia, L.: Topology optimization in aircraft and aerospace structures design. Arch. Comput. Methods Eng. 23(4), 595–622 (2016)
Discrete Interval Adjoints in Unconstrained Global Optimization

Jens Deussen(B) and Uwe Naumann

Software and Tools for Computational Engineering, RWTH Aachen University, Aachen, Germany
{deussen,naumann}@stce.rwth-aachen.de
Abstract. We describe how to deploy interval derivatives up to second order in the context of unconstrained global optimization with a branch and bound method. For computing these derivatives we combine the Boost interval library and the algorithmic differentiation tool dco/c++. The differentiation tool also computes the required floating-point derivatives for a local search algorithm that is embedded in our branch and bound implementation. First results are promising in terms of the utility of interval adjoints in global optimization.

Keywords: Discrete adjoints · Algorithmic differentiation · Interval arithmetic · Branch and bound · Global optimization
1 Introduction
The preferred numerical method to compute derivatives of a given computer code at a specified point, by exploiting the chain rule and elemental symbolic differentiation rules, is algorithmic differentiation (AD) [1,2]. The tangent mode of AD computes the Jacobian at a cost proportional to the number of arguments. In the case of a high-dimensional domain and a low-dimensional codomain, the adjoint mode is advantageous for the derivative computation, as its cost is proportional to the number of outputs. AD methods are successfully applied in e.g. machine learning [3], computational finance [4], and fluid dynamics [5]. Interval arithmetic (IA) has the property that all values the function can attain on a given domain are reliably contained in the output of the corresponding interval evaluation. This has the advantage that, instead of evaluating the function at several points, a single function evaluation in IA suffices to obtain semi-local information on the function value. Among others, IA can be used to estimate errors of floating-point computations [6,7], and in optimization to find global optima [8–10]. Branch and bound algorithms are often applied in this context. Combining the discrete differentiation techniques of AD and the inclusion property of IA yields semi-local derivative information. This information can e.g. be used to compute worst-case approximations of the error that can occur in a neighborhood of an evaluation point. Another application field for interval adjoints is approximate and unreliable computing [11].

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 78–88, 2020.
https://doi.org/10.1007/978-3-030-21803-4_8
Discrete Interval Adjoints in Unconstrained Global Optimization
79
In this paper we compute discrete interval adjoints up to second order to improve a simple branch and bound algorithm for unconstrained global optimization. The interval derivative information is used to eliminate branches and to converge faster. Another important task for branch and bound algorithms is updating the bound on the optimum. We apply different methods for this task, ranging from pure interval information to local optimization techniques. The paper is organized as follows: Section 2 presents the well-established basic concepts of interval arithmetic and algorithmic differentiation in a nutshell, as well as the combination of both techniques. In Sect. 3 the implemented branch and bound algorithm is described. This algorithm is used in Sect. 4 to investigate the utility of interval derivatives for global optimization. The last section gives a conclusion and an outlook.
2 Methodology

2.1 Interval Arithmetic
IA is a concept that enables computing bounds of a function evaluation on a given interval. Since this chapter gives only a brief introduction to IA, the reader is referred to [6–8] for more information on the topic. We will use the following notation for an interval of a variable x with lower bound $\underline{x}$ and upper bound $\overline{x}$:

$$[x] = [\underline{x}, \overline{x}] = \{x \in \mathbb{R} \mid \underline{x} \le x \le \overline{x}\}.$$

The real numbers giving the midpoint m[x] and the width w[x] are defined as

$$m[x] = 0.5\,(\underline{x} + \overline{x}), \qquad w[x] = \overline{x} - \underline{x}.$$

Evaluating a univariate scalar function y = g(x) in IA on [x] results in

$$[y] = g([x]) \supseteq \{g(x) \in \mathbb{R} \mid x \in [x]\}.$$
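These interval operations can be sketched with a minimal Python class. The sketch assumes exact arithmetic; a rigorous implementation such as Boost's additionally uses directed rounding so that the computed bounds remain valid under floating-point error:

```python
class Interval:
    """Toy closed interval [lo, hi] with inclusion-preserving +, -, *."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)
    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))

x = Interval(-1.0, 2.0)
sq = x * x       # [-2, 4]: overestimates the true range [0, 4] of x**2
diff = x - x     # [-3, 3]: not [0, 0]
print(sq.lo, sq.hi, diff.lo, diff.hi)
```

The two printed results already exhibit the dependency problem discussed next: both uses of x are treated as independent intervals, so the enclosure is valid but not tight.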
The superset relation states that the interval [y] can be an overestimation of all possible values on [x], but it guarantees that these values are contained. To ensure this inclusion property, arithmetic operations and elementary functions must be redefined; more complex functions can then be composed of these basic functions. One reason for the already mentioned overestimation is that the underlying data format (e.g. floating-point numbers) cannot represent the exact bounds. For a lower bound, IA rounds towards negative infinity, and for an upper bound towards positive infinity. Overestimation can also be caused by the dependency problem: if a function evaluation uses a variable multiple times, IA does not take into account that the actual values taken from these intervals are equal. The larger the intervals are, the more significant the overestimation is. Another problem of applying IA occurs if the implementation contains conditional branches that depend on interval arguments. Comparisons of intervals are only well-defined if these intervals do not intersect. By accepting further
80
J. Deussen and U. Naumann
overestimation, a comparison of two intervals can be reduced to a comparison of their difference with a scalar. But if the scalar is included in the interval, the comparison is still ambiguous. Approaches to address this problem are either to compute both branches of a condition or to evaluate the function on subdomains that cover the original domain. The second approach is referred to as splitting or bisection [8]. Recursively refining the argument intervals results in some subdomains for which the comparison is well-defined and some for which it is still ambiguous. Since the dependency problem depends on the interval width, the overestimation becomes smaller when evaluating the function on smaller domains.

2.2 Algorithmic Differentiation
AD techniques use the chain rule to compute, in addition to the function value of a primal implementation, the derivative of the function value with respect to arguments and intermediate variables at a specified point. Differentiability of the underlying function is required for the application of AD. In the following we will only consider multivariate scalar functions with n arguments x and a single output y:

$$f : \mathbb{R}^n \to \mathbb{R}, \qquad y = f(x). \tag{1}$$
The first derivative of such a function is the gradient ∇f(x) ∈ ℝⁿ, and the second derivative is the Hessian matrix ∇²f(x) ∈ ℝⁿˣⁿ. The next subsections briefly introduce the basic modes of AD. More detailed and general derivations of these models can e.g. be found in [1,2].

Tangent Mode. The tangent model can be derived by differentiating the functional dependence. The model consists of the function evaluation in (1) and

$$y^{(1)} = \frac{\partial y}{\partial x_j}\, x_j^{(1)}. \tag{2}$$

Einstein notation implies summation over all values of j = 0, …, n − 1. Following [2], the superscript (1) denotes the tangent of a variable. Equation (2) can be interpreted as an inner product of the gradient and the tangent x^(1):

$$y^{(1)} = \nabla f(x) \cdot x^{(1)}.$$
For each evaluation with x^(1) set to the i-th Cartesian basis vector e_i in ℝⁿ (also called seeding), an entry of the gradient can be extracted from y^(1) (also called harvesting). Using this model to obtain all entries of the gradient requires n evaluations, i.e. a cost proportional to the number of arguments. The cost of this method is similar to that of a finite difference approximation, but AD methods are accurate up to machine precision.
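The tangent model of (2) can be sketched with a small dual-number class; the class name and the example function are illustrative assumptions, not part of dco/c++:

```python
class Tangent:
    """Forward (tangent) mode: carry value v and directional derivative d."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        return Tangent(self.v + o.v, self.d + o.d)
    def __mul__(self, o):
        return Tangent(self.v * o.v, self.v * o.d + self.d * o.v)

def f(x0, x1):
    return x0 * x0 * x1          # f(x) = x0^2 * x1

# gradient at x = (3, 2) by n = 2 tangent evaluations, seeding e_0 and e_1
g0 = f(Tangent(3.0, 1.0), Tangent(2.0, 0.0)).d   # df/dx0 = 2*x0*x1 = 12
g1 = f(Tangent(3.0, 0.0), Tangent(2.0, 1.0)).d   # df/dx1 = x0^2 = 9
print(g0, g1)
```

Each evaluation seeds one Cartesian basis vector and harvests one gradient entry, illustrating the cost proportional to n.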
Discrete Interval Adjoints in Unconstrained Global Optimization
81
Adjoint Mode. The adjoint mode is also called reverse mode, due to the reverse computation of the adjoints compared to the computation of the values. A data-flow reversal of the program is required in order to store additional information on the computation (e.g. partial derivatives) [12], which potentially leads to high memory requirements. The data structure used to store this additional information is often called tape. Again following [2], first-order adjoints are denoted with a subscript (1):

$$x_{(1),j} = \frac{\partial y}{\partial x_j}\, y_{(1)}. \tag{3}$$

This equation is computed for each j = 0, …, n − 1. Note that the evaluation of the primal (1) is also part of the adjoint model. The reverse mode yields a product of the gradient with the adjoint y_(1):

$$x_{(1)} = y_{(1)} \cdot \nabla f(x). \tag{4}$$
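A minimal sketch of the reverse sweep behind (3)-(4); the Var class and its implicit tape of (parent, partial) pairs are illustrative assumptions, not the dco/c++ tape:

```python
class Var:
    """Each operation records its parents and the local partial derivatives."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.adj = value, parents, 0.0
    def __add__(self, o):
        return Var(self.value + o.value, ((self, 1.0), (o, 1.0)))
    def __mul__(self, o):
        return Var(self.value * o.value, ((self, o.value), (o, self.value)))

def backward(y, seed=1.0):
    # reverse sweep: propagate adjoints from the output back to the inputs;
    # a stack is safe here because only leaf inputs are reused -- a general
    # implementation processes the tape in reverse topological order
    y.adj = seed
    stack = [y]
    while stack:
        v = stack.pop()
        for parent, partial in v.parents:
            parent.adj += v.adj * partial
            stack.append(parent)

x0, x1 = Var(3.0), Var(2.0)
y = x0 * x0 * x1                 # f(x) = x0^2 * x1 at (3, 2)
backward(y)                      # seed y_(1) = 1
print(x0.adj, x1.adj)            # whole gradient (12, 9) in one reverse sweep
```

In contrast to the tangent sketch, a single adjoint computation yields all gradient entries.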
By seeding y_(1) = 1, the resulting x_(1) contains all entries of the gradient; a single adjoint computation is required.

Higher Derivatives. Higher derivatives of the primal implementation can be derived by combining the two basic modes. The pure second-order tangent model is obtained by applying the differentiation rules to the first-order tangent model in (2), which yields

$$y^{(2)} = \frac{\partial y}{\partial x_j}\, x_j^{(2)}, \qquad y^{(1,2)} = \frac{\partial y}{\partial x_j}\, x_j^{(1,2)} + \frac{\partial^2 y}{\partial x_j \partial x_k}\, x_j^{(1)} x_k^{(2)}. \tag{5}$$
The complete model also contains the evaluation of (1) and (2). The superscript (2) indicates that the component belongs to the second application of an AD mode. Seeding the Cartesian basis of ℝⁿ for the tangents x^(1) and x^(2) independently, and setting all other components to zero, the entries of the Hessian can be obtained with n² evaluations of (5). The other three combinations apply the adjoint mode at least once, so the cost of the Hessian computation differs from the pure tangent model: the Hessian can be computed with n second-order adjoint model evaluations. The second-order adjoint model is obtained by evaluating (1) and (3) and applying the tangent mode to (3), which yields

$$y^{(2)} = \frac{\partial y}{\partial x_j}\, x_j^{(2)}, \qquad x_{(1),k}^{(2)} = y_{(1)}^{(2)}\, \frac{\partial y}{\partial x_k} + y_{(1)}\, \frac{\partial^2 y}{\partial x_k \partial x_j}\, x_j^{(2)}. \tag{6}$$

Again, seeding the adjoint y_(1) = 1, the tangent x^(2) with the Cartesian basis of ℝⁿ and everything else with zero results in a row of the Hessian in x_(1)^(2).
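The second-order tangent model (5) can be sketched by nesting the dual-number idea once: the outer level carries x^(2), the inner level x^(1), and the second derivative is harvested from the tangent-of-tangent component. Names and the example function are illustrative:

```python
class Tangent:
    """Components v and d may themselves be Tangents, enabling nesting."""
    def __init__(self, v, d):
        self.v, self.d = v, d
    def __add__(self, o):
        return Tangent(self.v + o.v, self.d + o.d)
    def __mul__(self, o):
        return Tangent(self.v * o.v, self.v * o.d + self.d * o.v)

def f(x0, x1):
    return x0 * x0 * x1          # f(x) = x0^2 * x1

# seed x^(1) = e_0 (inner level) and x^(2) = e_0 (outer level) at x = (3, 2)
x0 = Tangent(Tangent(3.0, 1.0), Tangent(1.0, 0.0))
x1 = Tangent(Tangent(2.0, 0.0), Tangent(0.0, 0.0))
y = f(x0, x1)
print(y.v.v, y.d.d)   # f(3,2) = 18 and Hessian entry d^2f/dx0^2 = 2*x1 = 4
```

Sweeping both seeds over the Cartesian basis reproduces the n² evaluations mentioned above; the n-evaluation second-order adjoint model (6) would instead nest a tangent type over the adjoint sketch.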
2.3 Interval Adjoint Algorithmic Differentiation
Combining the two concepts of interval computation and adjoint AD yields, with a single evaluation, the interval value of the function and additionally its interval-valued gradient. Compared to the traditional approach of AD, in which the derivatives are only computed at specified points, we now obtain semi-local derivative values: the interval gradient contains all possible values of the gradient on the specified domain. These derivatives can be an overestimation of the real derivative values, as was already stated for the interval values. Moreover, it is possible to compute higher derivatives in IA, e.g. the Hessian. Since the interval version of the second-order adjoint model looks the same as (1), (3) and (6), except for the interval notation, we omit it here.
3 Branch and Bound with Interval Adjoints
Branch and bound algorithms [8] can be used to solve the unconstrained global optimization problem on a previously defined domain D:

$$y^* = \min_{x \in D \subseteq \mathbb{R}^n} f(x). \tag{7}$$
The general idea of these algorithms is to partition the domain and to remove those parts that cannot contain a global minimum. Furthermore, the algorithms find a bound for the global minimum y∗ on the domain D and return subdomains of a desired precision that contain the minimum. The algorithm referred to in this paper is described in Algorithm 1. It uses a task queue Q to manage all (sub)domains that need to be analyzed. At the beginning there is only a single task in the queue (domain [x] = D); the algorithm terminates when the queue is empty. In line 7 the tangent component of [x] and the adjoint component of [y] are seeded. The adjoint model itself is called in line 8. After that, three conditions that must hold at the global minimum are verified:

1. The value must be less than any other value in the domain.
2. The first-order optimality condition requires that the gradient is zero.
3. The second-order optimality condition requires a positive-definite Hessian.

To eliminate those parts of the domain that cannot contain a global minimum, these conditions are reformulated in IA: domains whose lower bound of the function value y is larger than the upper bound of the global minimum y∗, and domains that do not contain zeros in the gradient intervals ∇_[x] f, are removed. The third check removes a domain if the Hessian is not positive-definite, i.e. if

$$\exists\, v \in \mathbb{R}^n : \; v \cdot \nabla^2_{[x]} f \cdot v < 0.$$

The product of the interval Hessian with a random vector [x]^(2) (line 7) can be harvested from [x]_(1)^(2). This product is multiplied by the random vector again; if the resulting interval is negative, the Hessian is not positive-definite.
Discrete Interval Adjoints in Unconstrained Global Optimization
83
Algorithm 1. Branch and bound with interval adjoints

 1: procedure BranchAndBound([x0], εX, εN, L, y∗, LN)
 2:   y∗ ← ∞
 3:   Q ← [x0]
 4:   while Q ≠ ∅ do
 5:     [x] ← Pop(Q)
 6:     try
 7:       [x]^(2) ← RandomVector(), [y]_(1) ← 1.0, [y]_(1)^(2) ← 0.0
 8:       [y], [y]^(2), [x]_(1), [x]_(1)^(2) ← f_(1)^(2)([x], [x]^(2), [y]_(1), [y]_(1)^(2))
 9:       if ValueCheckFails([y], y∗) or GradientCheckFails([x]_(1)) or HessianCheckFails([x]^(2), [x]_(1)^(2)) then
10:         Eliminate([x])
11:       else
12:         y∗ ← UpdateBoundGlMin([x], [y], y∗)
13:         if width([x]) > εX then
14:           Branch(Q, [x])
15:         else
16:           Push(L, [x])
17:         end if
18:       end if
19:     catch exception
20:       if width([x]) > εN then
21:         Branch(Q, [x])
22:       else
23:         Push(LN, [x])
24:       end if
25:     end try
26:   end while
27:   return L, y∗, LN
28: end procedure
In line 12 the bound for the global minimum y∗ is updated. The implemented branch and bound algorithm provides three methods to update this upper bound: 1. Store the minimal upper bound of the interval value [y]. 2. Store the smallest value evaluated at the midpoint of the domains, f(m[x]). 3. Perform a few gradient descent steps on the domain to advance towards a (local) minimum and store the smallest function value. While the first method only needs the already computed interval function value, the other two methods require further function evaluations, and the third even requires derivatives in floating-point arithmetic. If none of the previous checks failed and the domain is still larger than the desired precision εX, the domain will be refined by splitting (line 14) in every direction. This procedure appends 2^n new tasks to the task queue. Whenever an undefined comparison occurs the IA implementation is assumed to throw an exception, which is handled in lines 6 and 19. The algorithm only
J. Deussen and U. Naumann

[Fig. 1 panels: axes x[0] vs x[1].]
Fig. 1. Isolines of the composed objective function (left) with non-smooth regions in red and global optima in green. Results of the algorithm (right). Red domains can be non-smooth, green domains can contain global optima, blue, orange and purple indicate if the value, gradient or Hessian check failed, respectively.
splits these domains down to a previously defined width εN. This has the advantage of preventing the algorithm from generating huge numbers of tasks. But it also leaves a neighborhood around each potential non-smoothness that needs to be investigated further by techniques other than the applied IA. The implementation of the branch and bound algorithm uses the AD tool dco/c++1 [13] to compute all required derivatives in interval as well as in floating-point arithmetic. The Boost library [14] is used as the implementation of IA. Both template libraries make use of the concept of operator overloading. Choosing the interval datatype as a base type for the adjoint datatype results in the desired first-order interval derivatives. Nesting the interval adjoint type into a tangent type yields an interval with second-order derivative information. The adjoint models are only evaluated if the value check passed. Furthermore, OpenMP was used for its implementation of a task queue, which also takes care of the parallelization on a shared memory architecture. The user of the branch and bound implementation can decide which of the conditions described in the previous section should be verified. The software only uses second-order adjoint types if the second-order optimality condition is checked. Moreover, the user can select which method should be used to update the bound of the global minimum. If the third method is chosen, the user needs to decide how many gradient descent steps should be performed for every task.
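The branching step, which splits a domain at its midpoint in every direction and so appends 2^n new tasks, can be sketched as follows (boxes as lists of (lo, hi) pairs; an illustrative sketch, not the dco/c++- and OpenMP-based implementation):

```python
from itertools import product

def branch(box):
    """Split a box at its midpoint in every direction, producing 2^n subdomains."""
    halves = []
    for lo, hi in box:
        mid = 0.5 * (lo + hi)
        halves.append([(lo, mid), (mid, hi)])
    # Cartesian product of the per-dimension halves gives all subboxes.
    return [list(combo) for combo in product(*halves)]

subboxes = branch([(-3.0, 3.0), (-3.0, 3.0)])
print(len(subboxes))  # 4 new tasks for n = 2
```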
4 Case Studies

4.1 Ambiguous Control-Flow Branches
As a first test case the global minima of the six-hump camel function [15] are computed. To show how the algorithm treats control-flow branches the imple1
https://www.nag.co.uk/content/adjoint-algorithmic-differentiation.
[Fig. 2 plot panels: (a) run time in seconds, (b) tasks in million; y-axis: y∗; curves: GD(2 log(w[x]/εX)), GD(log(w[x]/εX)), GD(16), GD(4), f(m[x]), ȳ.]
Fig. 2. Convergence of the bound for the minimum y ∗ for the different update approaches over time (left) and over number of tasks (right). For the gradient descent methods (GD) the number of performed steps is given in brackets.
mentation computes a quadratic function in both directions if the function value of the six-hump camel function is greater than 10. Figure 1 (left) shows isolines of the objective function and the highlighted non-smooth area at y = 10 (red) as well as the two global optima located at x = (0.0898, −0.7126) and x = (−0.0898, 0.7126) with value y∗ = −1.0316 (green). The initial domain is set to [x0] = [−3, 3] in both directions. All three conditions are checked and the function value at the midpoint of a domain is used to update the bound of the minimum. For visualization purposes the algorithm stops branching when the width of a subdomain is smaller than εX = εN = 0.1. The results of the branch and bound algorithm are visualized in Fig. 1 (right). Green domains can contain global minima. Red denotes a potential non-smooth region. Blue domains failed the value check, while orange domains do not contain a zero in the gradient interval. Purple is assigned to non-convex domains. The two global and two of the local minima are still contained in the green subdomains. The upper bound for these minima was computed to be y∗ = −1.0241 after 1545 tasks. The area around the control-flow branch is larger than required due to the overestimation of IA.

4.2 Minimum Bound Update
To determine which method is best for updating y∗, we performed tests for the Griewank function [16] with n = 8 on the domain [x0] = [−200, 220] in each direction. The target interval width was set to εX = 10^−13. In Fig. 2 we compare the convergence of the minimum bound for the proposed update methods. Evaluating the function at the midpoint of the domain improves the branch and bound compared to using the upper bounds of the interval values. The (incomplete) local search for a minimum implemented by gradient descent can decrease y∗ even faster, although it has higher computational costs due to the computation of the gradient for every task. We observe that the gradient descent method with 16 steps converges faster in the beginning, but it loses this lead after some time due to the high computational effort. Thus, choosing
the number of descent steps to be dependent on the width of the domain (brown) requires an unchanged number of tasks while the run time decreases. Computing even more gradient descent steps (green) reduces the number of computed tasks, and with that the run time, to less than a half.

4.3 Interval Derivative Conditions
The third set of tests evaluates how useful interval derivatives are in the context of branch and bound algorithms. Therefore, we compute the global optimum of the Griewank, the Rosenbrock [17] and the Styblinski-Tang [18] functions with the specification of the algorithm as given in Sect. 4.2. For updating the bound of the minimum, 16 gradient descent steps are computed. Since the adjoint models are only evaluated if the value check passes, the average time per task is strongly dependent on the problem. If the value check fails for almost every task, as is the case for the Griewank function, the additional average cost per task is very low, as stated in Table 1. For the other two objectives the additional average cost per task is still less than 50%. Since there are only a few cases left in which the interval Hessian can be computed, the average cost per task increases only slightly compared to the interval gradient computation. The tests without interval derivatives were aborted after 30 minutes, such that it is impossible to quantify the achieved savings for using the interval gradient. Nevertheless, all tests using interval derivative information found the global minima in less than two minutes. More than 30% of the tasks were eliminated by the gradient check for the Styblinski-Tang function, such that the impact of the derivative information on that function is larger than on the other two functions. The Hessian condition was violated in a few cases only.

Table 1. Additional average costs per task for computing interval derivative information if required (left) and relative amount of tasks failing the particular conditions (right) for the Griewank (GW), Rosenbrock (RB) and Styblinski-Tang (ST) function.

           Additional average cost per task    Failing condition
           GW      RB      ST                  GW      RB      ST
Value      –       –       –                   99.1%   89.7%   68.9%
Gradient   4.0%    29.5%   42.0%               0.5%    10.0%   30.7%
Hessian    0.0%    …       …                   …       …       …
0}. We see that the frontier of its domain of definition is delimited by a hyperbola, a circle and lines represented by the domain constraints. A paving of the solution set of a numerical CSP can be computed by an interval branch-and-contract algorithm that recursively divides and contracts the initial box until reaching the desired precision, following a contractor programming approach [5]. In this framework, an interval contractor is associated with one or many constraints to tighten a box by removing inconsistent values from its bounds, using different techniques such as consistency techniques adapted to numerical computations with intervals [11] or interval Newton operators [10]. Several contractors can be applied in a fixed-point loop known as constraint propagation [12]. Finding inner boxes can be done by considering the negations of the constraints [2] or by means of inflation techniques [4,6]. In the following, we introduce an interval branch-and-contract algorithm that calculates a paving of the domain of definition of a real function within a given box. A set of rules is proposed to derive the system of domain constraints entailed by the expression of the function. We adapt the HC4Revise contractor [3] in order to process these specific constraints. Finally, a new heuristic for generating maximal inner boxes is devised. A set of experiments permits us to evaluate the quality of the computed pavings. The rest of this paper is organized as follows. Interval arithmetic and the notion of interval contractor will be introduced in Sect. 2. The new algorithms will be described in Sect. 3. Section 4 is devoted to the experimental results, followed by a conclusion.
Filtering Domains of Factorable Functions Using Interval Contractors
2 Interval Computations

2.1 Interval Arithmetic
An interval is a closed and connected set of real numbers. The set of intervals is denoted by I. The empty interval represents an empty set of real numbers. The width of an interval [a, b] is equal to b − a. The interval hull of a set of real numbers S is the interval [inf S, sup S], denoted by hull S. Given an integer n ≥ 1, an n-dimensional box X is a Cartesian product of intervals X1 × · · · × Xn. A box is empty if one of its components is empty. The width of a box X, denoted by wid X, is the maximum of the componentwise widths. Interval arithmetic is a set extension of real arithmetic [13]. Let g : D → R be a real function with D ⊆ R^n. An interval extension of g is an interval function G : I^n → I such that (∀X ∈ I^n)(∀x ∈ X ∩ D) g(x) ∈ G(X). This property, called the fundamental theorem of interval arithmetic, implies that the interval G(X) encloses the range of g over X. When g corresponds to a basic operation, it is possible to implement the interval operation in a way that calculates the hull of the range by exploiting monotonicity properties, limits and extrema. More complex functions can be extended in several ways. In particular, the natural interval extension of a factorable function consists of evaluating the function with interval operations given interval arguments.

2.2 Interval Contractors
Given a vector of unknowns x ∈ R^n, an interval contractor associated with a constraint c(x) is an operator Γ : I^n → I^n verifying the following properties:

    (∀X ∈ I^n)  Γ(X) ⊇ {x ∈ X : c(x)}  (consistency)
                Γ(X) ⊆ X               (contractance)

An interval contractor aims at removing inconsistent values at the bounds of the variable domains. There are many kinds of contractors; here we present the forward-backward contraction algorithm called HC4Revise [3]. Given an equation g(x) = 0 or an inequality constraint g(x) ≤ 0 and a box X, the first phase is an evaluation of the natural extension of g from the leaves to the root. We then consider the interval I associated with the relation symbol, namely [0, 0] for an equation and [−∞, 0] for an inequality. There are three cases: if the intersection G(X) ∩ I is empty then the constraint is inconsistent; if we have the inclusion G(X) ⊆ I then the constraint is consistent and X is an inner box for this constraint, which is said to be inactive; otherwise the second phase calculates projections from the root of g, whose value is restricted to G(X) ∩ I, down to the leaves, possibly contracting the variable domains. An example is presented in Fig. 2. As previously shown, an HC4Revise contractor is able to detect that a box is an inner box after the first phase. Now it is possible to apply it to the negation
L. Granvilliers
Fig. 2. Let g(x) ≤ 0 be an inequality constraint with g(x1, x2, x3) = 2x1 + x2^2 − x3 and let X be the box [0, 10] × [−5, 5] × [−1, 4]. The interval on the right at each node of g is the result of the interval evaluation phase of the HC4Revise contractor. The interval at the root node is intersected with the interval I = [−∞, 0] associated with the relation symbol. The interval on the left at each node is the result of the projection phase from the root to the leaves. For example, let u ∈ [−4, 0], v ∈ [0, 20] and w ∈ [−4, 26] be three variables respectively labelling the + node, the × node and the − node. We have to project the equation v + w = u over v and w, which propagates the new domain at the root node to its children nodes. To this end the equation is inverted and equivalently rewritten as v = u − w. The new domain for v is calculated as [0, 20] ∩ ([−4, 0] − [−4, 26]), which leads to the new domain [0, 4] at the × node. The new domain for w is derived similarly. At the end of this backward phase we obtain the new box [0, 2] × [−2, 2] × [0, 4].
of a constraint in order to generate inner boxes inside a box, as follows. Given an inequality constraint g(x) ≤ 0, let Γ be an HC4Revise contractor associated with its negation g(x) > 0. Given a box X, it follows from the consistency property of Γ that every element of the region X \ Γ(X) violates the constraint negation, hence satisfies the constraint itself. When this region is nonempty, it is possible to generate inner boxes for the constraint, as shown in Fig. 3. Since the rounding errors of machine computations generally prevent dealing with open intervals, the constraint negation is safely relaxed as g(x) ≥ 0.
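The forward-backward pass illustrated in Fig. 2 can be reproduced in a few lines of interval code (a hand-rolled sketch for the single constraint 2x1 + x2^2 − x3 ≤ 0; intervals are plain tuples, outward rounding is ignored, and this is not the Realpaver implementation):

```python
import math

def meet(a, b):                       # interval intersection
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def add(a, b): return (a[0] + b[0], a[1] + b[1])
def sub(a, b): return (a[0] - b[1], a[1] - b[0])
def sqr(a):
    lo, hi = sorted((a[0] ** 2, a[1] ** 2))
    return (0.0, hi) if a[0] < 0 < a[1] else (lo, hi)

X1, X2, X3 = (0, 10), (-5, 5), (-1, 4)

# Forward phase: evaluate 2*x1 + x2^2 - x3 bottom-up.
v = (2 * X1[0], 2 * X1[1])            # 2*x1      -> [0, 20]
s = sqr(X2)                           # x2^2      -> [0, 25]
w = sub(s, X3)                        # x2^2 - x3 -> [-4, 26]
u = add(v, w)                         # root      -> [-4, 46]

# Intersect the root with I = [-inf, 0] for the relation "<= 0".
u = meet(u, (-math.inf, 0.0))         # -> [-4, 0]

# Backward phase: project the root interval down to the leaves.
v = meet(v, sub(u, w))                # v = u - w -> [0, 4]
w = meet(w, sub(u, v))                # w = u - v -> [-4, 0]
X1 = meet(X1, (v[0] / 2, v[1] / 2))   # x1 = v/2  -> [0, 2]
s = meet(s, add(w, X3))               # s = w + x3 -> [0, 4]
X3 = meet(X3, sub(s, w))              # x3 = s - w -> [0, 4]
r = math.sqrt(s[1])                   # invert the square
X2 = meet(X2, (-r, r))                # -> [-2, 2]

print(X1, X2, X3)
```

The resulting box matches the one derived in the caption of Fig. 2.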
3 Filtering Domains of Functions

3.1 Domain Constraints
Several operations have restricted domains, such as the division x → 1/x defined in R \ {0}, the square root defined in R+, the logarithm defined in (0, +∞), the arccosine and arcsine functions defined in [−1, 1] and the tangent function defined at every point that is not of the form π/2 + kπ. A factorable function whose definition involves one of these operations may not be defined everywhere in a box, and, a fortiori, it may not be continuous. It naturally yields domain constraints that must be verified, as illustrated by Fig. 4.
Fig. 3. Let c be the inequality constraint x21 + x22 ≤ 4 that defines a disk in the Euclidean plane and let X be the box [0, 4] × [−1, 1]. The hatched surface is returned by an HC4Revise contractor Γ associated with the negation of c. The gray region X \ Γ (X) is thus an inner region for c (every point satisfies c) and it is a box here.
Every term op(u1, . . . , uk) occurring in a factorable real function f : D ⊆ R^n → R^m such that the domain of op is a strict subset of R^k entails a constraint. A constraint system C can then be generated from f using the following (non-exhaustive) set of rules. There are different kinds of constraints, such as (strict and non-strict) inequality constraints and disequations. The algorithms introduced thereafter will also consider their negations.

    √u     |=  u ≥ 0
    log u  |=  u > 0
    1/u    |=  u ≠ 0
    acos u |=  −1 ≤ u ≤ 1
    asin u |=  −1 ≤ u ≤ 1
    tan u  |=  u ≠ π/2 + kπ  (k ∈ Z)

Given a box Ω ⊆ R^n, every x ∈ Ω satisfying all the constraints from C must belong to D. Finding the set Ω ∩ D is then equivalent to solving the numerical CSP ⟨C, Ω⟩. It is worth noting that the set C may be separable. For example, the function f(x1, x2) = log(x1) + 1/x2 entails two constraints x1 > 0 and x2 ≠ 0 sharing no variable, which can thus be handled separately.
Fig. 4. The function f(x) = √(x^2 − x) is undefined in the open interval (0, 1) since the square root is defined in R+ and g(x) = x^2 − x is negative for all x such that 0 < x < 1. The restricted domain of the square root entails the constraint x^2 − x ≥ 0.
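The rules above can be applied mechanically to an expression tree. A small illustrative sketch (expressions as nested tuples; the rule-table encoding is an assumption of this sketch, and tan's periodic rule is omitted):

```python
# Partial-domain operations and the constraint each one entails,
# following the rules of Sect. 3.1 (tan's periodic rule omitted).
RULES = {
    'sqrt': lambda u: ('geq0', u),     # sqrt(u) |= u >= 0
    'log':  lambda u: ('gt0', u),      # log(u)  |= u > 0
    'inv':  lambda u: ('neq0', u),     # 1/u     |= u != 0
    'acos': lambda u: ('in_m1_1', u),  # acos(u) |= -1 <= u <= 1
    'asin': lambda u: ('in_m1_1', u),
}

def domain_constraints(expr):
    """Collect the domain constraints entailed by a factorable expression."""
    cs = []
    if isinstance(expr, tuple):
        op, *args = expr
        if op in RULES:
            cs.append(RULES[op](args[0]))
        for a in args:                 # recurse into subexpressions
            cs.extend(domain_constraints(a))
    return cs

# f(x) = sqrt(x^2 - x), the function of Fig. 4:
f = ('sqrt', ('sub', ('sqr', 'x'), 'x'))
print(domain_constraints(f))  # [('geq0', ('sub', ('sqr', 'x'), 'x'))]
```

For the separable example f(x1, x2) = log(x1) + 1/x2, the same walk yields the two independent constraints x1 > 0 and x2 ≠ 0.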
3.2 Branch-and-Contract Algorithm
Algorithm 1 implements a classical interval branch-and-contract algorithm that calculates a paving of the domain of definition of a function f within a given box Ω. It maintains a list L from which a CSP ⟨C, X⟩ is extracted at each iteration. This CSP is reduced and divided by two algorithms, contract and branch, that are specific to our problem. If the set C becomes empty then X is inserted in the set of inner boxes X i. A tolerance ε > 0 permits stopping the processing of too small boxes, which are inserted in the set of outer boxes X o.

Algorithm 1. Branch-and-contract algorithm.
Input:
  – a function f : D → R^m with D ⊆ R^n
  – a box Ω ⊆ R^n
  – a tolerance ε > 0
Output:
  – a paving (X i, X o) of Ω ∩ D at tolerance ε
Algorithm:
  generate the set of domain constraints C from f
  initialize L with the CSP ⟨C, Ω⟩
  assign (X i, X o) to the empty paving
  while L is not empty do
    extract an element ⟨C, X⟩ from L
    contract ⟨C, X⟩
    if X ≠ ∅ then
      if C = ∅ then insert X in X i
      elif wid X ≤ ε then insert X in X o
      else branch ⟨C, X⟩
      endif
    endif
  endwhile
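A stripped-down, runnable rendering of this loop for a one-dimensional instance (the constraint x^2 − x ≥ 0 entailed by the function of Fig. 4). The contract step here is only an interval satisfaction test, roughly in the spirit of strategy S1 of Sect. 4, not the full constraint propagation:

```python
def g_range(x):
    """Natural interval extension of g(x) = x^2 - x over the interval x = (lo, hi)."""
    lo, hi = x
    if lo < 0 < hi:
        s = (0.0, max(lo * lo, hi * hi))           # enclosure of x^2
    else:
        s = tuple(sorted((lo * lo, hi * hi)))
    return (s[0] - hi, s[1] - lo)                  # x^2 - x

def pave(box, eps):
    inner, outer, todo = [], [], [box]
    while todo:
        x = todo.pop()
        glo, ghi = g_range(x)
        if ghi < 0:                                # g < 0 everywhere: outside D
            continue
        if glo >= 0:                               # constraint proved: inner box
            inner.append(x)
        elif x[1] - x[0] <= eps:                   # small undecided box: outer
            outer.append(x)
        else:                                      # branch by bisection
            mid = 0.5 * (x[0] + x[1])
            todo += [(x[0], mid), (mid, x[1])]
    return inner, outer

inner, outer = pave((-2.0, 3.0), 0.05)
# No inner box may intersect (0, 1), where x^2 - x < 0 and f is undefined.
print(all(x[1] <= 0 or x[0] >= 1 for x in inner))  # True
```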
Given a CSP ⟨C, X⟩, a contractor is associated with each constraint from the set C. The contract component classically implements a constraint propagation algorithm that applies the contractors to reduce X until reaching a fixed-point. Moreover, every constraint detected as inactive is removed from C. The HC4Revise contractor has been designed to handle non-strict inequality constraints and equations, since it is not possible in general to manage open intervals due to the rounding errors. The more specific domain constraints are handled as follows. Let g(x) be a real function, let G be the natural interval extension of g and let X be a box.
– A strict inequality constraint g(x) > 0 is safely relaxed as g(x) ≥ 0 since every point that violates the relaxation also violates the constraint.
– A disequation g(x) ≠ 0 is violated if the interval G(X) is reduced to 0; it is inactive if we have max G(X) < 0 or min G(X) > 0; and nothing happens in the backward phase otherwise.
– A double inequality constraint a ≤ g(x) ≤ b is simply handled by setting the interval G(X) ∩ [a, b] at the root node after the first phase.
– The periodic domain constraint g(x) ≠ π/2 + kπ (k ∈ Z) does not permit contracting X. The constraint is simply detected as inactive if G(X) does not contain π/2 + kπ for any k, and nothing happens otherwise.
The branch algorithm divides a CSP ⟨C, X⟩ into sub-problems. Let C be the set of constraints {c1, . . . , cp}. A contractor Γi is associated with the negation of ci for i = 1, . . . , p. Each contractor is applied to X and it follows that the region
p
Γi (X)
i=1
is an inner region for the CSP, which means that every point of this region satisfies all the constraints from C, as illustrated in Fig. 5.
Fig. 5. A box X is contracted by three contractors Γ1, Γ2, Γ3 associated with constraint negations, leading to the hatched boxes in Fig. (a). The complementary gray region is an inner region for the original constraints. Fig. (b) shows that X can be split into two boxes X1 ∪ X2, where X1 is the largest inner slice at one bound of X.
We then define the branching heuristic as follows. Let the box

    H = hull (Γ1(X) ∪ · · · ∪ Γp(X))

be the interval hull of the contracted boxes with respect to the constraint negations. If H is empty then X is an inner box and it is inserted in X i. Now suppose that H is not empty. Let d−i = min Hi − min Xi and d+i = max Xi − max Hi be the inter-bound distances between X and H for i = 1, . . . , n. Let

    d = max{d−1, . . . , d−n, d+1, . . . , d+n}

be the maximum inter-bound distance. If d is greater than the tolerance ε then there exists an inner box at one bound of X that is large enough. Assuming for
instance that d = d−j for some j, X is split into two sub-boxes X i ∪ X o at xj = min Xj + d−j. The maximal inner box X i is directly inserted in the set of inner boxes X i and the CSP ⟨C, X o⟩ is inserted in L. Otherwise, a bisection of the largest component of X generates two sub-boxes X′ ∪ X′′ and the CSPs ⟨C, X′⟩ and ⟨C, X′′⟩ are added to L, which ensures the convergence of the branch-and-contract algorithm.
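The hull-and-slice heuristic can be sketched as follows (boxes as lists of (lo, hi) pairs; function names are illustrative, not Realpaver's API):

```python
def hull(boxes):
    """Interval hull of a list of boxes; None for an empty list."""
    if not boxes:
        return None
    n = len(boxes[0])
    return [(min(b[i][0] for b in boxes), max(b[i][1] for b in boxes))
            for i in range(n)]

def split_at_inner_slice(X, contracted, eps):
    """Split X at its largest boundary inner slice.

    Returns (inner_box, remainder); remainder is None when X itself is
    inner, and the result is None when every inter-bound distance is at
    most eps (the caller should then bisect the largest component)."""
    H = hull(contracted)
    if H is None:                      # all constraint negations inconsistent
        return X, None
    best = (-1.0, 0, 'low')            # (distance, dimension, side)
    for i, ((xl, xu), (hl, hu)) in enumerate(zip(X, H)):
        for dist, side in ((hl - xl, 'low'), (xu - hu, 'high')):
            if dist > best[0]:
                best = (dist, i, side)
    d, j, side = best
    if d <= eps:
        return None
    inner, rest = list(X), list(X)
    xl, xu = X[j]
    if side == 'low':                  # maximal inner slice at the lower bound
        inner[j], rest[j] = (xl, xl + d), (xl + d, xu)
    else:                              # maximal inner slice at the upper bound
        inner[j], rest[j] = (xu - d, xu), (xl, xu - d)
    return inner, rest

# The situation of Fig. 3: X = [0,4] x [-1,1], one contracted box [2,4] x [-1,1].
inner_box, rest = split_at_inner_slice([(0, 4), (-1, 1)], [[(2, 4), (-1, 1)]], 0.1)
print(inner_box, rest)  # [(0, 2), (-1, 1)] [(2, 4), (-1, 1)]
```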
4 Experimental Results
The interval branch-and-contract algorithm has been developed in the interval solver Realpaver [9]. The interval operations are safely implemented with an outward rounding mode, the MPFR library [7] providing the elementary functions with correct rounding. As a consequence, the interval computations in Realpaver are rigorous. All experiments were conducted on a 64-bit Intel Core i7 4910MQ 2.90 GHz processor. Three strategies will be compared in the following: S3 corresponds to Algorithm 1; S2 mimics S3 but the split component always bisects the largest component (no inner box is computed); and S1 corresponds to S2 except that the backward phase of the HC4Revise contractors is disabled in the contract component (only a satisfaction test is done). The quality of a paving can be measured by its cardinality (#X i, #X o) and the volume of X i. The introductory problem has been processed by S3, S2 and S1 given ε = 0.1 and we respectively obtain pavings with cardinalities (330, 646), (738, 736) and (570, 696). There are about the same number of outer boxes, which are required to enclose the frontier of the domain of definition at tolerance ε. However, the sets of inner boxes depicted in Fig. 6 are much different. S3 is able to calculate a small number of maximal inner boxes as compared to S2 and S1. S1 generates a regular paving (a quadtree of boxes here). S2 is able to contract every box, which here leads to an increase in the number of inner boxes.
[Figure panels: (S3), (S2), (S1)]
Fig. 6. The sets of inner boxes X i computed by the three strategies for the introductory problem at tolerance ε = 0.1: 330 boxes for S3, 738 for S2 and 570 for S1. Their total areas are respectively equal to 30.38, 29.95 and 29.74.
Another function involving an arccosine, a square root and a division has been handled, and the pavings computed by the three strategies at precision ε = 0.01 are depicted in Fig. 7. Their cardinalities are respectively equal to (1147, 3374), (3187, 3558) and (1896, 2871). The surfaces of the sets of inner boxes are respectively equal to 6.962, 6.933 and 6.918. Once again, S3 generates the best paving, with only 1147 inner boxes covering a total area equal to 6.962. S2 derives a paving with too many boxes as compared to the other strategies, but the area covered by its inner boxes, 6.933, is slightly better than the one obtained from S1, equal to 6.918.
[Figure panels: (S3), (S2), (S1)]
Fig. 7. Given the real function f(x1, x2) = acos(x2 − x1^2) + 1/√(x1 + x2) and the box Ω = [−5, 5]^2, the figures above depict the pavings obtained from S3, S2 and S1 using the interval branch-and-contract algorithm applied to the CSP ⟨C, Ω⟩ given the set of domain constraints C = {−1 ≤ x2 − x1^2 ≤ 1, x1 + x2 > 0} generated from f.
These experiments suggest that combining the detection of maximal inner boxes with branching is efficient. On the one hand, this strategy tends to maximize the volume of the set of inner boxes. On the other hand, no more than two sub-boxes are generated at each branching step, which tends to minimize the number of boxes explored during the search.
5 Discussion and Perspectives
We have presented an interval branch-and-contract algorithm that rigorously calculates a paving of the domain of definition of a factorable real function. An inner box is a guarantee for interval tests and interval operators that require the continuity property, as motivated in [8] in the context of bound-constrained global optimization. For an inclusion in other methods, it could be interesting to extract from our work a domain contractor that returns a union of an inner box included in the domain of definition of the function and an outer box. The problem studied in this paper has been taken into account by the recent IEEE 1788 standard for interval arithmetic [1]. This standard proposes to decorate the intervals with different flags including a dac flag ensuring that an
operation is defined and continuous on the given domain. Implementing a solver on top of an IEEE 1788-compliant interval arithmetic library could then be useful to assert that the result of an interval evaluation has the required property. In the future, we plan to experiment with several inflation techniques [4,6] and to compare them with the currently implemented method based on the constraint negations. It could be interesting to investigate other branching heuristics and to associate suitable interval contractors with the domain constraints, for instance contractors enforcing strong consistency techniques when those constraints are complex with many occurrences of variables. Acknowledgment. The author would like to thank Christophe Jermann for interesting discussions about these topics and his careful reading of a preliminary version of this paper.
References
1. IEEE Std 1788-2015: IEEE Standard for Interval Arithmetic (2015)
2. Benhamou, F., Goualard, F.: Universally quantified interval constraints. In: Proceedings of International Conference on Principles and Practice of Constraint Programming (CP), pp. 67–82 (2000)
3. Benhamou, F., Goualard, F., Granvilliers, L., Puget, J.F.: Revising hull and box consistency. In: Proceedings of International Conference on Logic Programming (ICLP), pp. 230–244 (1999)
4. Chabert, G., Beldiceanu, N.: Sweeping with continuous domains. In: Proceedings of International Conference on Principles and Practice of Constraint Programming (CP), pp. 137–151 (2010)
5. Chabert, G., Jaulin, L.: Contractor programming. Artif. Intell. 173(11), 1079–1100 (2009)
6. Collavizza, H., Delobel, F., Rueher, M.: Extending consistent domains of numeric CSP. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 406–413 (1999)
7. Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., Zimmermann, P.: MPFR: a multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Softw. 33(2) (2007)
8. Granvilliers, L.: A new interval contractor based on optimality conditions for bound constrained global optimization. In: Proceedings of International Conference on Tools with Artificial Intelligence (ICTAI), pp. 90–97 (2018)
9. Granvilliers, L., Benhamou, F.: Algorithm 852: RealPaver: an interval solver using constraint satisfaction techniques. ACM Trans. Math. Softw. 32(1), 138–156 (2006)
10. Hentenryck, P.V., McAllester, D., Kapur, D.: Solving polynomial systems using a branch and prune approach. SIAM J. Numer. Anal. 34(2), 797–827 (1997)
11. Lhomme, O.: Consistency techniques for numeric CSPs. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 232–238 (1993)
12. Mackworth, A.K.: Consistency in networks of relations. Artif. Intell. 8, 99–118 (1977)
13. Moore, R.E.: Interval Analysis. Prentice-Hall (1966)
Leveraging Local Optima Network Properties for Memetic Differential Evolution

Viktor Homolya and Tamás Vinkó

Department of Computational Optimization, University of Szeged, Szeged, Hungary
{homolyav,tvinko}@inf.u-szeged.hu
Abstract. Population-based global optimization methods can be extended by properly defined networks in order to explore the structure of the search space, to describe how the method performed on a given problem, and to inform the optimization algorithm so that it can be more efficient. The memetic differential evolution (MDE) algorithm using a local optima network (LON) is investigated for these aspects. Firstly, we report the performance of the classical variants of differential evolution applied to MDE, including the structural properties of the resulting LONs. Secondly, a new restarting rule is proposed, which aims at avoiding early convergence and uses the LON built up during the evolutionary search of MDE. Finally, we show the promising results of this new rule, which contributes to the efforts of combining optimization methods with network science.

Keywords: Global optimization · Memetic differential evolution · Local optima network · Network science

1 Introduction
Consider the global optimization problem

    min f(x),  x ∈ D ⊂ R^n,    (1)
where f is a continuous function, which we aim to solve by means of memetic differential evolution (MDE) [10]. Recent benchmarking results [1,5] show the promising efficiency of MDE on challenging optimization problems. Differential evolution (DE) is a well-known iterative, population-based algorithm [12] using only the function value of f as information. Memetic approaches apply a local optimization method in each and every iteration, hence the population members are always local optima of the objective function [7,9]. MDE is a simple extension of DE; the formal description of the algorithm is the following.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 109–118, 2020. https://doi.org/10.1007/978-3-030-21803-4_11
1. Start with a random population {p1, . . . , pm} (pi ∈ R^n).
2. For each pi iterate until the stopping conditions hold:
   (a) Select three pairwise different elements from the population: pj, pk, pl, all different from pi.
   (b) Let c = pj + F · (pk − pl) be a candidate solution.
   (c) Modify vector c by applying a CR-crossover using vector pi.
   (d) Execute a local search from vector c.
   (e) Replace vector pi with vector c if f(c) ≤ f(pi) holds.

As can be seen, MDE has some parameters: m is the population size, F ∈ (0, 2) is the differential weight and CR ∈ (0, 1) is the crossover probability. In Step 2(c) the CR-crossover for the candidate solution c ∈ R^n means that for each dimension of c a number r is generated uniformly at random in (0, 1). If r > CR then that dimension of c is made equal to the same dimension of pi. To guarantee getting a new vector c, the CR-crossover is skipped for a randomly selected dimension, so the linear combination of the three other vectors in this dimension is kept.

Our contributions can be summarized as follows. First, we numerically investigate the classical x/y/z variants in the context of MDE. Then the MDE algorithm is extended by the concept of a local optima network (LON). In general, LONs are graphs in which the nodes correspond to local optima of the optimization problem and the edges represent useful information related either to the problem (e.g. critical points of f) or to the optimization method in use. Similarly to our earlier work [4], the directed edges of MDE LONs are formed in such a way that they represent parent-child relations. Thus at the end of the MDE run, we obtain a graph representation of how the method discovered the landscape of the optimization problem. Apart from the standard performance metrics, we also report and compare certain characteristics of the resulting LONs using some global metrics.
One of the detailed analyses shows the relation between the function values of nodes and the function values of their out-neighbors. Based on this and some graph properties, we propose an extension to MDE which can lead to better performance on the test functions used in this paper.
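A minimal Python sketch of the MDE loop above (the local search is a placeholder argument, whereas the paper uses MINOS, and the stopping conditions are reduced to a fixed iteration budget; an illustration rather than the authors' implementation):

```python
import random

def mde(f, local_search, n, m=20, F=0.5, CR=0.1, iters=50, bounds=(-5.0, 5.0)):
    """Minimal MDE sketch following Steps 1-2 above; `local_search` is a
    placeholder for a real local solver, and the stopping conditions are
    reduced to a fixed iteration budget."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n)] for _ in range(m)]
    for _ in range(iters):
        for i in range(m):
            # Step 2(a): three pairwise different members, all different from p_i
            j, k, l = random.sample([t for t in range(m) if t != i], 3)
            # Step 2(b): candidate c = p_j + F * (p_k - p_l)
            c = [pop[j][d] + F * (pop[k][d] - pop[l][d]) for d in range(n)]
            # Step 2(c): CR-crossover, skipping one random dimension
            keep = random.randrange(n)
            for d in range(n):
                if d != keep and random.random() > CR:
                    c[d] = pop[i][d]
            c = local_search(c)          # Step 2(d)
            if f(c) <= f(pop[i]):        # Step 2(e): greedy replacement
                pop[i] = c
    return min(pop, key=f)
```

For instance, with the identity as a stand-in local search, `mde(lambda x: sum(v*v for v in x), lambda c: c, n=2)` minimizes the sphere function.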
2 Definitions

2.1 Strategies
The most popular DE variants, which apply different strategies, are distinguished by the notation DE/x/y/z, where

– x specifies the solution to be perturbed, and it can be either rand or best, i.e., a random one or the current best solution. In the above algorithm description it defines the way pj is chosen in Step 2(a).
Leveraging Local Optima Network Properties for Memetic DE
111
– y specifies the number of difference vectors (i.e. the difference between two randomly selected and distinct population members) to be used in the perturbation done in Step 2(b); its typical values are 1 or 2. The choice y = 1 is considered the default, and hence Steps 2(a) and 2(b) are as already given in the description. In the case y = 2, besides pk and pl two further vectors, pm and pn, are also selected in order to create another difference vector.
– z identifies which probability distribution function is used by the crossover operator: either bin (as binomial) or exp (as exponential). In bin, a dimension index d is chosen at random; in Step 2(c) the vector c is then modified so that for every index e ≠ d we let ce := pi,e with probability 1 − CR. In exp, a dimension index d is chosen at random; starting from d, we step over every dimension e and modify ce to pi,e, and at every step the modification is finished with probability 1 − CR.
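The bin and exp crossovers can be sketched as follows (a hedged reading of the description above; the helper names are ours, and the probability convention follows the CR-crossover of Sect. 1):

```python
import random

def crossover_bin(c, target, CR):
    """DE/x/y/bin: keep the mutant entry c_e with probability CR, except for
    one randomly kept index d, which always stays mutated."""
    n = len(c)
    d = random.randrange(n)
    out = list(c)
    for e in range(n):
        if e != d and random.random() > CR:
            out[e] = target[e]
    return out

def crossover_exp(c, target, CR):
    """DE/x/y/exp: starting from a random index d, copy target entries into c
    stepping through the dimensions, finishing with probability 1 - CR per
    step (one literal interpretation of the description above)."""
    n = len(c)
    d = random.randrange(n)
    out = list(c)
    for step in range(n):
        e = (d + step) % n
        out[e] = target[e]
        if random.random() > CR:   # terminate with probability 1 - CR
            break
    return out
```

Every output entry comes either from the mutant c or from the target vector, which is all the selection step in 2(e) relies on.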
2.2 Local Optima Network
As already briefly described in the Introduction, given problem (1), a local optima network (LON) is a graph in which the vertices are local optima of the function f and the edges are defined between vertices separated by a critical point [13]. It is important to note that other kinds of LONs can also be introduced, which then specifically depend on the optimization method in use as well. In our work two vertices (local optimizers) are connected if they are in a parent-child relation, i.e., the parent vertex is the target vector, the base vector or a member of the difference vector(s), and the child vertex is the result of the MDE iteration with the mentioned vectors. The edges are directed towards the children. Loops are allowed, and the LON can be weighted to represent multi-edges. Another possibility has been developed and analyzed in [11] for DE, in which the nodes are the population members and the weighted edges also represent a parent-child relation. However, the resulting network captures the evolution of the population members rather than the detection of the local optima.
2.3 Network Measures
It is expected that different MDE variants lead to different LONs at the end of their runs. In order to characterize these differences we can use global measures characterizing the entire graph, which are the following:

– the number of nodes (N) and edges (M);
– the diameter (D), the length of the longest of all directed shortest paths;
– the average degree (d) (the average in-degree is equal to the average out-degree).

A larger N value means more local optima found. The diameter corresponds to the maximal number of times Step 2(e) gets fulfilled for a given population member. Finally, for the average degree, d < 3.5 for the y = 1 variants and d < 5 for the y = 2 variants is an indication of early convergence.
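The paper computes these measures with NetworkX; for illustration, a dependency-free sketch of the same global metrics over a directed edge list might look like this (function name ours; it assumes a nonempty edge list):

```python
def lon_metrics(edges):
    """Global metrics of a directed LON given as a list of (parent, child)
    edges: number of nodes N, number of edges M (multi-edges counted),
    diameter D (longest directed shortest path over reachable pairs) and
    average degree d = M / N."""
    nodes = {u for edge in edges for u in edge}
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)

    def bfs(src):
        # breadth-first search distances from src
        dist = {src: 0}
        queue = [src]
        for u in queue:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    diameter = max(d for u in nodes for d in bfs(u).values())
    return len(nodes), len(edges), diameter, len(edges) / len(nodes)
```

On the small example below, the longest directed shortest path is 1 → 3 → 4.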
3 Benchmarking the Classic Variants
The MDE and the LON creator and analyzer were implemented in Python with the Pyomo [3] and NetworkX [2] packages. The local solver in MDE was MINOS [8].
3.1 Test Functions
Following the numerical experiments done in [1,5] we tested the MDE variants on two test functions:

– Rastrigin:

  fR(x) = 10n + Σ_{i=1}^{n} (x_i^2 − 10 cos(2πx_i)),  x ∈ [−5.12, 5.12]^n,

which is a single-funnel function with about 10^n local minimizers, and its global minimum value is 0.

– Schwefel:

  fS(x) = Σ_{i=1}^{n} −x_i sin(√|x_i|),  x ∈ [−500, 500]^n,
which is a highly multi-funnel function with 2^n funnel bottoms, and its global minimum value is −418.98129n. In fact, we used modified versions of these functions: we applied shifting and rotation to Rastrigin, fR(W(x − x̂)), and rotation to Schwefel, fS(W(x)), where W is an n-dimensional orthogonal matrix and x̂ is an n-dimensional shift vector. These transformations result in even more challenging test functions, as they are non-separable and their global minimizer points do not lie in the center of the search space (as in the original versions).
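The two benchmark functions and the transformation pattern can be written down directly; the following sketch is illustrative only (the concrete orthogonal matrix W and shift vector used in the experiments are not reproduced here):

```python
import math

def rastrigin(x):
    """Single-funnel Rastrigin; global minimum value 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def schwefel(x):
    """Multi-funnel Schwefel; global minimum value approx. -418.98129 * n."""
    return sum(-v * math.sin(math.sqrt(abs(v))) for v in x)

def shifted_rotated(f, W, shift):
    """Build f(W(x - shift)) for an orthogonal matrix W given as a list of
    rows; use a zero shift for rotation only, as done for Schwefel."""
    def g(x):
        y = [xi - si for xi, si in zip(x, shift)]
        return f([sum(wij * yj for wij, yj in zip(row, y)) for row in W])
    return g
```

With the identity matrix and a nonzero shift, the global minimizer of the transformed Rastrigin moves to the shift vector, illustrating why the transformed problems are harder for center-biased heuristics.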
3.2 Performance Metrics
After fixing the shift vectors and the rotation matrices for the test functions we executed K = 50 independent runs for all MDE variants. The performance metrics used to compare their efficiency are the following:

– S is the percentage of success, i.e. how many times we reached the global minimizer;
– 'Best' is the best function value found out of the K runs;
– 'Avg' is the average of the function values;
– 'Adf' is the average distance between the found function value and the global optimum value in those runs where a failure occurred [5];
– 'LS' is the average number of local searches per successful run;
– 'SP' is the success performance [1], which is calculated as

  mean(# local searches over successful runs) × K / (# successful runs).

Note that for all metrics except S, a lower number indicates better performance.
3.3 Stop Conditions
The following stopping conditions were used:

– the sum of pairwise differences of the current population members' values is less than 10^−4;
– no population member has been replaced over the last 100 iterations;
– the best found value did not change during the last 20,000 local searches (# iterations × m, where m is the population size).
3.4 Results
As already mentioned, we executed K = 50 independent runs for every variant. Both the dimension and the population size were fixed to 20. The MDE parameters were set to F = 0.5 and CR = 0.1 in all experiments. For the Rastrigin function the tested strategies resulted in different performance metrics, as can be seen in Table 1. The most successful is the rand/2/bin variant, as it was able to find the global optimum in all cases. Overall the rand/y/z strategies did quite well, except rand/1/exp, which resulted in the highest SP value. Among the best/y/z ones, best/2/bin got the highest success rate and the lowest SP value, whereas best/1/exp did not succeed at all. Regarding the LONs, we can notice that the x/2/z strategies led to larger graphs, as expected. This is a clear indication that these versions discover wider regions during the optimization runs. Note that larger LONs, such as those of rand/2/z, have not resulted in larger diameters. The small size of the LONs of the best/1/z strategies and their low average degree are evidences of early convergence to local optima.

Table 1. Performance and graph metrics for rotated and shifted Rastrigin-20
Rule        S    Best  Avg    Adf    LS      SP     N       M        D      d
best/1/bin  4    0     6.10   6.35   490     12250  358.5   1345.7   10.8   3.74
best/1/exp  0    1.98  10.51  10.5   ∞       ∞      224.3   837.3    9.24   3.72
best/2/bin  44   0     0.73   1.31   1462.7  3324   1370.5  7809.7   12.52  5.69
best/2/exp  14   0     1.48   1.72   1042.8  7449   879.6   5002     12.02  5.68
rand/1/bin  54   0     0.69   1.51   1938.5  3590   1721.8  6954.0   14.38  4.07
rand/1/exp  12   0     2.54   2.88   1393    11611  1106.1  4504.6   14.22  4.03
rand/2/bin  100  0     0      0      6325    6325   6203.7  37212.3  14     5.99
rand/2/exp  92   0     0.09   1.24   3964.3  4309   3817.6  22901.3  13.38  5.99
As expected, the Schwefel problem turned out to be much more challenging for the MDE versions, see Table 2. Only three out of the eight strategies were able to find the global optimum at least once. For this function rand/2/bin has the largest success rate and the lowest Adf and SP values, being essentially better than any other variant. However, the relatively good performance of rand/2/bin is related to the highest number of nodes and edges in its LONs, hence it spends considerably more computational time than the others. An overall observation
Table 2. Performance and graph metrics for rotated Schwefel-20

Rule        S   Best     Avg      Adf     LS     SP      N        M        D     d
best/1/bin  0   −7905.9  −7371.6  1007.9  ∞      ∞       176.1    633.3    7.1   3.57
best/1/exp  0   −8142.7  −7204.9  1174.6  ∞      ∞       100      341.7    5.9   3.39
best/2/bin  0   −8261.2  −7886.8  492.8   ∞      ∞       1857.5   10573.4  7.5   5.64
best/2/exp  0   −8024.3  −7629.7  749.9   ∞      ∞       821.7    4658.6   7.1   5.62
rand/1/bin  2   −8379.6  −7875.2  514.6   3520   176000  2186.8   8676.4   10.8  3.97
rand/1/exp  0   −8024.3  −7639.5  740.1   ∞      ∞       931.7    3754.9   10.1  4.03
rand/2/bin  20  −8379.6  −8202.2  221.7   15408  77040   14946.1  88927.2  9.8   5.94
rand/2/exp  4   −8379.6  −8114.2  276.4   5530   138250  8763.3   52254.4  9.6   5.96
is that the diameters are certainly lower for the Schwefel problem than for the Rastrigin. On the other hand, the average degree values are very similar for the two problems.
4 MDE Supported by Network Analysis
Apart from reporting the LONs and analyzing their basic characteristics, we aim at extending the MDE algorithm with rules exploiting network properties, which provide a rich amount of information about how the execution of the optimization method proceeded. There are lots of possibilities to do so; here we report on one of them, which turns out to be useful to guide MDE towards better performance. Based on the analysis reported below we propose a modified version of MDE.
Fig. 1. Function values of out-neighbors for fS with n = 20; the most successful runs for: best/1/bin (left) and rand/1/bin (right)
During the MDE run the corresponding LON gets built up, and it is possible to store the function values of the nodes. We can investigate the out-neighbors of a node u and compare their function values against that of u. Figure 1 contains two plots of this kind, showing two different runs of two MDE variants. The x-axis contains the function values of LON nodes with positive out-degree. Each dot
shows the function values of the out-neighbors. The straight line helps us to notice the number of neighbors with higher and lower function values for each node. Having more dots above the line indicates that the MDE variant created more children with worse function values from a given node. The side effect of this behavior is a wider discovery of the search space, which can be quite beneficial especially on multi-funnel functions such as Schwefel. Although the rand/1/bin variant resulted in much larger LONs than the best/1/bin one, Fig. 1 clearly shows that rand/1/bin has relatively many more dots above the line than below. For the other rand/y/z variants we obtained similar figures, and we know from Table 2 that some of these variants were able to find the global minimizer. On the other hand, best/1/bin got stuck in a local minimizer point, and in the plot we can see the sign of greedy behavior. The fact that the more successful variants can show similar behavior for the single-funnel Rastrigin function is shown in Fig. 2. Greedy behavior for this function could lead to better performance; nevertheless, even the most successful run (in terms of the best function value reached) of best/1/exp converged to a local minimizer (left-hand side of Fig. 2).
Fig. 2. Function values of out-neighbors for fR with n = 20; the most successful runs for: best/1/exp (left) and rand/1/exp (right)
Based on these observations we are ready to propose an extension to MDE using the LON.
4.1 Above-Below Rule
To avoid early convergence we propose a restart rule to be applied to some members of the population. The problematic members are those which drive the convergence while MDE has not yet explored enough of the search space, i.e., the nodes which have more out-neighbors below the line. These members are removed from the population and new random ones are added. Permanent restarting would prevent convergence, so the restart is applied only in every α-th iteration of MDE. We noticed that when the diameter of the LON is high enough,
the population has visited a fairly large part of the space, so it has good chances to converge to the global optimum if we use MDE without this modification. We propose to extend the MDE algorithm in its Step 2 with the following rule, which has three integer parameters, δ > 0, α > 0 and θ ≤ 0. If the diameter of the current LON is lower than δ, then in every α-th iteration for all pi do the following:

– collect the out-neighbors of pi into the set Ni;
– calculate the function values of the elements of Ni;
– let Nia := {q ∈ Ni : f(q) > f(pi)} and Nib := {q ∈ Ni : f(q) < f(pi)};
– if |Nia| − |Nib| < θ then replace pi by a newly generated random vector.
Note that the function values of the nodes are stored directly in the LON, so in practice they need to be calculated only once.
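The rule can be sketched as follows; the hooks `out_neighbors` and `new_random` are hypothetical stand-ins for the LON lookup and the random-member generator of an actual MDE implementation:

```python
def above_below_restart(pop, f, out_neighbors, diameter, delta, alpha, theta,
                        iteration, new_random):
    """Above-below restart sketch: every alpha-th iteration, while the LON
    diameter is still below delta, restart members whose out-neighborhood is
    dominated by better (lower-valued) children, i.e. |N_a| - |N_b| < theta."""
    if diameter >= delta or iteration % alpha != 0:
        return pop
    result = []
    for p in pop:
        neighbors = out_neighbors(p)
        above = sum(1 for q in neighbors if f(q) > f(p))   # |N_a|
        below = sum(1 for q in neighbors if f(q) < f(p))   # |N_b|
        result.append(new_random() if above - below < theta else p)
    return result
```

With θ ≤ 0, a member is restarted only when its below-the-line children outnumber the above-the-line ones by more than |θ|, which matches the greedy-behavior diagnosis of the previous section.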
4.2 Numerical Experiment
Using the above introduced rule we carried out extensive benchmarking in order to see the performance indicators. Our aim was to find a combination of the three parameters which leads to improved efficiency. Hence we did a parameter sweep: δ ∈ [6, 9], α ∈ [3, 6], and θ ∈ [−2, 0]. The choice of the interval from which the values of δ are taken is motivated by the fact that, according to Tables 1 and 2, the diameters of the LONs for a given MDE variant are much larger for Rastrigin than for Schwefel, and the diameter never goes beyond 10 for fS. On the other hand, when the population members for fR already have function values close to 0, it is unwise to make MDE explore the search space. We report the results of the experiments for n = 20 only. According to our findings, the combination δ = 7, α = 3, θ = −1 led to the best performance improvements for the tested functions. The indicators are reported in Tables 3 and 4, where improved metrics are highlighted by underline.

Table 3. Performance and graph metrics for rotated and shifted Rastrigin-20 using the new rule
Rule        S    Best  Avg    Adf    LS      SP     N       M        D      d
best/1/bin  0    0.99  5.56   5.56   ∞       ∞      378.1   1424.3   11.52  3.76
best/1/exp  0    1.99  17.38  17.38  ∞       ∞      226.4   843.5    9.52   3.71
best/2/bin  60   0     0.61   1.54   1449.3  2415   1349.8  7677.1   12.88  5.68
best/2/exp  18   0     2.39   2.92   993.3   5519   879.8   4989.7   12.16  5.66
rand/1/bin  60   0     0.55   1.39   1996.0  3327   1794.4  7244.1   14.96  4.03
rand/1/exp  14   0     2.21   2.57   1411.4  10082  1110.1  4521.1   14.1   4.07
rand/2/bin  100  0     0      0      6341.6  6342   6225.1  37318.3  14.64  5.99
rand/2/exp  98   0     0.03   1.78   3897.1  3977   3776.1  22641.9  14.04  5.99
We can see that our rule improved the percentage of success (S) for the single-funnel Rastrigin function by up to 16%, and resulted in lower average
Table 4. Performance and graph metrics for rotated and shifted Schwefel-20 using the new rule

Rule        S   Best     Avg      Adf    LS     SP      N        M        D     d
best/1/bin  0   −8142.7  −7685.6  693.9  ∞      ∞       278.1    1014.3   7.3   3.63
best/1/exp  0   −8024.3  −7464.8  914.8  ∞      ∞       149.8    532.8    6.3   3.51
best/2/bin  0   −8261.2  −7993.3  386.2  ∞      ∞       1991.5   11323.6  7.4   5.66
best/2/exp  0   −8261.2  −7834.0  545.6  ∞      ∞       1731.6   9869.2   7.5   5.67
rand/1/bin  0   −8142.7  −7899.7  479.8  ∞      ∞       2370.9   9408.0   11.2  3.97
rand/1/exp  0   −8261.2  −7639.2  740.4  ∞      ∞       1065.6   4270.5   10.2  4.01
rand/2/bin  26  −8379.6  −8188.6  258.1  13524  52017   16304.1  96986.8  9.9   5.94
rand/2/exp  4   −8379.6  −8111.9  278.8  5850   146250  8384.2   50048.2  9.8   5.96
function values for six out of eight variants. We obtained a 7% improvement in success performance with best/2/bin. For the multi-funnel Schwefel function the new rule does not help the variants which were unsuccessful in the original versions to find the global optimum. However, it made them find local optima with lower function values on average and hence decreased their 'average difference failure' measure. The most efficient rand/2/bin variant improved its SP measure by 32%.
5 Conclusions
To the best of our knowledge, our paper is the first one reporting benchmarking results on MDE variants. According to the numerical experiments, the rand/2/bin strategy provides overall the best percentage of success, especially when applied to multi-funnel problems. This is somewhat in line with the results reported in [6] for DE. For a single-funnel function the best/2/bin variant can be advantageous if one needs good success performance, i.e. lower computational time. We have shown that incorporating certain knowledge of the local optima network of the MDE into the evolutionary procedure can lead us to formalize restarting rules that enhance the diversification of the population. Our numerical tests indicate that the proposed restarting rule is beneficial on average for most of the MDE variants. In this work we have developed a computational tool in Python using the Pyomo and NetworkX packages which provides us with a general framework to discover further possibilities in the field of (evolutionary) global optimization and network science. We plan to extend our codebase with further MDE rules, in particular those involving network centrality measures for selection [4].

Acknowledgment. This research has been partially supported by the project "Integrated program for training new generation of scientists in the fields of computer science", no EFOP-3.6.3-VEKOP-16-2017-0002. The project has been supported by the European Union and co-funded by the European Social Fund. Ministry of Human Capacities, Hungary grant 20391-3/2018/FEKUSTRAT is acknowledged.
References

1. Cabassi, F., Locatelli, M.: Computational investigation of simple memetic approaches for continuous global optimization. Comput. Oper. Res. 72, 50–70 (2016)
2. Hagberg, A., Schult, D., Swart, P.: Exploring network structure, dynamics, and function using NetworkX. Technical report, Los Alamos National Laboratory (LANL), Los Alamos, NM, United States (2008)
3. Hart, W.E., Laird, C.D., Watson, J.P., Woodruff, D.L., Hackebeil, G.A., Nicholson, B.L., Siirola, J.D.: Pyomo – Optimization Modeling in Python, vol. 67. Springer, Heidelberg (2012)
4. Homolya, V., Vinkó, T.: Memetic differential evolution using network centrality measures. In: AIP Conference Proceedings 2070, 020023 (2019)
5. Locatelli, M., Maischberger, M., Schoen, F.: Differential evolution methods based on local searches. Comput. Oper. Res. 43, 169–180 (2014)
6. Mezura-Montes, E., Velázquez-Reyes, J., Coello Coello, C.A.: A comparative study of differential evolution variants for global optimization. In: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pp. 485–492. ACM (2006)
7. Moscato, P.: On evolution, search, optimization, genetic algorithms and martial arts: towards memetic algorithms. Caltech Concurrent Computation Program, C3P Report 826 (1989)
8. Murtagh, B.A., Saunders, M.A.: MINOS 5.5.1 user's guide. Technical Report SOL 83-20R (2003)
9. Neri, F., Cotta, C.: Memetic algorithms and memetic computing optimization: a literature review. Swarm Evol. Comput. 2, 1–14 (2012)
10. Piotrowski, A.P.: Adaptive memetic differential evolution with global and local neighborhood-based mutation operators. Inf. Sci. 241, 164–194 (2013)
11. Skanderova, L., Fabian, T.: Differential evolution dynamics analysis by complex networks. Soft Comput. 21(7), 1817–1831 (2017)
12. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11, 341–359 (1997)
13. Vinkó, T., Gelle, K.: Basin hopping networks of continuous global optimization problems. Cent. Eur. J. Oper. Res. 25, 985–1006 (2017)
Maximization of a Convex Quadratic Form on a Polytope: Factorization and the Chebyshev Norm Bounds

Milan Hladík¹ and David Hartman²,³

¹ Faculty of Mathematics and Physics, Department of Applied Mathematics, Charles University, Malostranské nám. 25, 11800 Prague, Czech Republic
[email protected]
https://kam.mff.cuni.cz/~hladik
² Computer Science Institute, Charles University, Malostranské nám. 25, 11800 Prague, Czech Republic
[email protected]
³ Institute of Computer Science of the Czech Academy of Sciences, Prague, Czech Republic
Abstract. Maximization of a convex quadratic form on a convex polyhedral set is an NP-hard problem. We focus on computing an upper bound based on a factorization of the quadratic form matrix and employment of the maximum vector norm. Effectivity of this approach depends on the factorization used. We discuss several choices as well as iterative methods to improve the performance of a particular factorization. We carried out numerical experiments to compare various alternatives and to compare our approach with other standard approaches, including McCormick envelopes.

Keywords: Convex quadratic form · Relaxation · NP-hardness · Interval computation

1 Introduction
We consider one of the basic global optimization problems [6,9,15,16], maximization of a convex quadratic form on a convex polyhedral set

f* = max x^T Ax subject to x ∈ M.    (1)
Herein, A ∈ R^{n×n} is symmetric positive semidefinite and M is a convex polyhedral set described by a system of linear inequalities. If M is bounded, the global optimum is attained at a vertex of M [9]. This makes the problem computationally intractable. It is NP-hard even when M is a hypercube [11,17] and for other special cases [4]. There are also identified polynomially solvable sub-classes [1]. Supported by the Czech Science Foundation Grant P403-18-04735S. © Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 119–127, 2020. https://doi.org/10.1007/978-3-030-21803-4_12
120
M. Hladík and D. Hartman
There are various methods developed for solving (1). This includes cutting plane methods [10], reformulation-linearization/convexification and branch & bound methods [3,15], among others. Polynomial time approximation methods also exist [18]. There are many works on quadratic programming [7] and concave function minimization [14] giving a more detailed state of the art. In this paper, we focus on the computation of a cheap upper bound on f*. Tight upper bounds are important, for instance, when quadratic functions in a nonlinear model are relaxed. Maybe more importantly, tight bounds are crucial for the effectivity of a branch & bound approach when solving nonlinear optimization problems.

Notation. For a matrix A, we use A_{i,*} to denote its ith row. Inequalities and absolute values are applied entry-wise for vectors and matrices. The vector of ones is denoted by e = (1, . . . , 1)^T and the identity matrix of size n by I_n. We use two vector norms, the Euclidean norm ‖x‖_2 = √(x^T x) and the maximum (Chebyshev) norm ‖x‖_∞ = max_i |x_i|. For a matrix M ∈ R^{n×n}, we use the induced maximum norm ‖M‖_∞ = max_i Σ_j |M_{ij}|.

Factorization. The matrix A can be factorized as A = G^T G. Then x^T Ax = x^T G^T Gx = ‖Gx‖_2^2 and we can formulate the problem as maximization of the squared Euclidean norm

max ‖Gx‖_2^2 subject to x ∈ M.    (2)
Upper bound. Replacing the Euclidean norm by another norm, we obtain an approximation of the optimal value. The equivalence of vector norms gives us guaranteed bounds. In particular, we utilize the maximum norm:

f* = max_{x∈M} ‖Gx‖_2^2 ≤ n · max_{x∈M} ‖Gx‖_∞^2 ≡ g*(G).    (3)

The upper bound g*(G) is effectively computable by means of linear programming (LP). Write

g*(G) = n · max_{x∈M} ‖Gx‖_∞^2 = n · max_i max_{x∈M} (G_{i,*} x)^2.
The inner optimization problem max_{x∈M} G_{i,*} x has the form of an LP problem, and we have to solve max_{x∈M} ±G_{i,*} x for each i = 1, . . . , n. So in order to calculate g*(G), it is sufficient to solve 2n LP problems in total. The quality of the upper bound g*(G) depends on the factorization A = G^T G. Our problem thus reads: Find the factorization A = G^T G such that the upper bound (3) is as tight as possible.
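For illustration, when M happens to be a box each of the 2n inner LPs has a closed-form solution, so g*(G) can be computed without an LP solver (for a general polyhedral M one would call an LP solver for each max ±G_{i,*}x instead; function name ours):

```python
def g_star_box(G, lo, hi):
    """Upper bound g*(G) = n * max_i max_{x in M} (G_{i,*} x)^2 for the box
    M = [lo, hi]; the inner LP max (sign * G_{i,*}) x decomposes coordinate-wise
    into sum_j max(g_ij * lo_j, g_ij * hi_j)."""
    n = len(G)
    best = 0.0
    for row in G:
        for sign in (1.0, -1.0):
            # closed-form maximum of a linear function over the box
            val = sum(max(sign * g * l, sign * g * h)
                      for g, l, h in zip(row, lo, hi))
            best = max(best, val * val)
    return n * best
```

For G = I on the box [−1, 1]^2 the bound equals 2, which coincides with the true optimum f* = 2 attained at a box corner, so (3) can be tight.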
2 Methods
There are two natural choices for the factorization A = G^T G:
– Cholesky decomposition A = G^T G, where G is upper triangular with nonnegative diagonal.
– Square root A = G², where G is symmetric positive semidefinite.

Denote by H the set of orthogonal matrices of size n. Let H ∈ H and denote R := HG. Then R^T R = (HG)^T HG = G^T G = A is another factorization of A. This motivates us to seek a suitable H ∈ H such that g*(HG) gives a tight upper bound. An important sub-class of orthogonal matrices are Householder matrices. Let u ∈ R^n \ {0}; the corresponding Householder matrix is defined as

H(u) = I_n − (2/(u^T u)) uu^T.
Each orthogonal matrix can be factored into a product of at most n Householder matrices, so there is no loss of generality in restricting to Householder matrices only. The upper bound (3) needn't be tight because n‖·‖_∞^2 overestimates ‖·‖_2^2. The overestimation vanishes for vectors whose entries are the same in absolute value, that is, ‖y‖_2^2 = n‖y‖_∞^2 for each y ∈ {±1}^n and its multiples. This brings us to the following heuristic: Find H ∈ H such that HG has possibly constant row absolute sums. To this end, denote y := |G|e. Let H ∈ H be a Householder matrix transforming y to α · e, where α := (1/√n)‖y‖_2. Thus we have Hy = α · e. The matrix H can be constructed simply as the Householder matrix H(u) with u := α · e − y. In general, there is no guarantee that the resulting matrix HG has constant row absolute sums and gives tighter bounds. We can, however, iterate this procedure to obtain more promising candidates. Thus we suggest the following iterative method.

Algorithm 1. (Factorization A = R^T R)
Input: Let A = G^T G be an initial factorization.
1: Put R := G.
2: Put y := |R|e.
3: Put α := (1/√n)‖y‖_2.
4: Put H := H(α · e − y).
5: If ‖HR‖_∞ < ‖R‖_∞, put R := HR and go to step 2.
Output: factorization A = R^T R.

Alternative Approaches

In order to carry out a comparison, we consider three alternative methods.

Exact method by enumeration. The optimal value f* is attained at a vertex of the feasible set M. Thus, to compute f*, we enumerate all vertices of M and take the maximum. Due to the high computational complexity of this method, we use it in small dimensions only.
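Algorithm 1 translates almost line by line into code. The following pure-Python sketch applies H(u)R via the rank-one update R − (2/u^T u)·u(u^T R) and iterates while the induced max-norm decreases; the max_iter cap is our safeguard, not part of the paper's pseudocode:

```python
import math

def householder_iterate(G, max_iter=50):
    """Algorithm 1 sketch: repeatedly apply H(alpha*e - y) with y = |R|e and
    alpha = ||y||_2 / sqrt(n), accepting the update only while the induced
    max-norm ||R||_inf (maximum absolute row sum) strictly decreases."""
    n = len(G)
    R = [row[:] for row in G]

    def row_abs_sums(M):
        return [sum(abs(v) for v in row) for row in M]

    def inf_norm(M):
        return max(row_abs_sums(M))

    for _ in range(max_iter):
        y = row_abs_sums(R)                      # y := |R| e
        alpha = math.sqrt(sum(v * v for v in y)) / math.sqrt(n)
        u = [alpha - yi for yi in y]             # u := alpha * e - y
        utu = sum(v * v for v in u)
        if utu == 0.0:                           # y already constant
            break
        # HR = R - (2 / u^T u) * u (u^T R), a rank-one update
        utR = [sum(u[i] * R[i][j] for i in range(n)) for j in range(n)]
        HR = [[R[i][j] - 2.0 * u[i] * utR[j] / utu for j in range(n)]
              for i in range(n)]
        if inf_norm(HR) < inf_norm(R):
            R = HR
        else:
            break
    return R
```

Since every H(u) is orthogonal, the product R^T R is preserved throughout, so the output is a valid factorization of the same A with a row-sum profile at least as balanced as the input's.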
Trivial upper bound. Let x̲, x̄ ∈ R^n be lower and upper bounds on M, respectively. That is, for each x ∈ M we have x̲ ≤ x ≤ x̄. Then an upper bound is simply calculated by interval arithmetic [8,13]. Let x := [x̲, x̄] be the corresponding interval vector and evaluate f = [f̲, f̄] = x^T Ax. Then f* ≤ f̄. In order that the upper bound is tight, we use the interval hull of M in our experiments. That is, x is the smallest interval vector enclosing M. This can be computed by solving 2n LP problems, each of them calculating the minimum or maximum in a particular coordinate. The computational effort can be further reduced by a suitable order of the LP problems to solve [2].

McCormick envelopes. We relax the quadratic term x^T Ax using the standard McCormick envelopes [5,12]. As above, let x̲, x̄ ∈ R^n be lower and upper bounds on M, respectively, and let x_c := (x̲ + x̄)/2 denote their midpoint. Split A into the positive and negative parts A = A⁺ − A⁻, A⁺, A⁻ ≥ 0. Then

x^T A⁺x ≤ x̄^T A⁺x + x^T A⁺x̲ − x̄^T A⁺x̲ = (x̲ + x̄)^T A⁺x − x̄^T A⁺x̲ = 2x_c^T A⁺x − x̄^T A⁺x̲,

and

x^T A⁻x ≥ x̲^T A⁻x + x^T A⁻x̲ − x̲^T A⁻x̲ = 2x̲^T A⁻x − x̲^T A⁻x̲,
x^T A⁻x ≥ x̄^T A⁻x + x^T A⁻x̄ − x̄^T A⁻x̄ = 2x̄^T A⁻x − x̄^T A⁻x̄.

Now, an upper bound on f* can be computed by the LP problem

max z subject to
z ≤ 2x_c^T A⁺x − x̄^T A⁺x̲ − 2x̲^T A⁻x + x̲^T A⁻x̲,
z ≤ 2x_c^T A⁺x − x̄^T A⁺x̲ − 2x̄^T A⁻x + x̄^T A⁻x̄,
x ∈ M,

or in the standard form,

max z subject to
2(x̲^T A⁻ − x_c^T A⁺)x + z ≤ −x̄^T A⁺x̲ + x̲^T A⁻x̲,
2(x̄^T A⁻ − x_c^T A⁺)x + z ≤ −x̄^T A⁺x̲ + x̄^T A⁻x̄,
x ∈ M.
3 Comparison and Numerical Experiments
We carried out a series of numerical experiments to compare the methods presented. For a given dimension n, we randomly constructed the matrix A ∈ R^{n×n} as A := G^T G, where the entries of G ∈ R^{n×n} were generated randomly in [−1, 1] with uniform distribution. The feasible set M is described by n² inequalities. An inequality a^T x ≤ b is generated such that the entries a_i are chosen randomly uniformly
in [−1, 1] and b is chosen randomly uniformly in [0, e^T|a|]. For larger dimensions (n ≥ 70), we set to zero a randomly selected 80% of the entries of the constraint matrix and run the computations in sparse mode. For small dimensions, the effectivity of a method is evaluated relative to the exact method. That is, we record the ratio bm/f*, where bm is the upper bound by the given method and f* is the optimal value. For higher dimensions, the exact method is too time consuming, so the effectivity of a method is evaluated relative to the trivial method. That is, we record the ratio bm/btriv, where bm is the upper bound by the given method and btriv is the upper bound by the trivial method. The computations were carried out in MATLAB R2017b on an eight-processor machine AMD Ryzen 7 1800X, with 32187 MB RAM. The symbols used in the tables have the following meaning:

– runs: the number of runs, for which the mean values in each row are computed;
– triv: the trivial upper bound using the interval hull of M;
– McCormick: the upper bound using the McCormick relaxation and the interval hull of M;
– sqrtm: our upper bound using G as the square root of A;
– sqrtm+it: our upper bound using G as the square root of A and iterative modification of G by means of Algorithm 1;
– chol: our upper bound using G from the Cholesky decomposition of A;
– chol+it: our upper bound using G from the Cholesky decomposition of A and iterative modification of G by means of Algorithm 1;
– chol+rand: our upper bound using G from the Cholesky decomposition of A and iterative improvement of G by trying 10 random Householder matrices.
As the dimension increases, computation of the exact optimal value becomes more time consuming. The running times of the upper bound methods are more-or-less the same. Of course, chol+rand is about ten times slower since it run ten instances. Our approach is more effective with respect to tightness provided a suitable factorization is used. The square root of A behaves better than the Cholesky decomposition on average. Algorithm 1 can improve the performance of the Cholesky approach, but not that of the square root one. The random generation of Householder matrices has the best performance, indicating that there is a high potential of using a suitable factorization. On average, the random Householder matrix generation performs similarly when applied on sqrtm or on chol, so we numerically tested only the latter.
As the dimension increases, all the bounds (the trivial ones, the McCormick ones and our bounds) tend to improve. This is rather surprising, and we have no complete explanation for this behaviour. It seems to be affected by the geometry of the convex polyhedron in connection with the way the bounds are constructed.

Table 1. Efficiency of the methods – small dimensions. The best efficiencies highlighted in boldface.
n   runs  triv   McCormick  sqrtm  sqrtm+it  chol   chol+it  chol+rand
3   100   65.55  51.17      65.22  67.52     78.33  75.12    48.96
5   100   24.01  19.31      25.20  23.16     33.54  27.43    18.98
7   100   26.47  21.90      20.63  21.36     28.15  23.26    16.59
9   20    19.57  16.48      14.90  14.83     19.81  13.65    11.27
10  20    22.26  18.75      13.25  13.54     19.75  14.08    11.92
Table 2. Computational times of the methods (in 10⁻³ s) – small dimensions.

n   runs  exact   triv   McCormick  sqrtm  sqrtm+it  chol   chol+it  chol+rand
3   100   0.8256  38.83  44.87      36.95  36.94     36.78  36.85    36.96
5   100   101.5   64.10  69.79      61.10  61.60     61.19  61.39    616.1
7   100   7160    91.87  97.62      89.01  88.86     88.48  88.01    887.7
9   20    141900  119.1  123.8      114.8  115.2     115.0  114.6    1145
10  20    240000  132.3  137.7      126.4  126.9     125.2  125.9    1257
Higher dimension. Tables 3 and 4 show the results for higher dimensions. By definition, the effectivity of the trivial method is 1. For smaller n, random Householder matrix generation performs best, but for larger n the number of random matrices is not sufficient and the winner is the square root of A. Sometimes its tightness is improved by additional iterations, but not always. Again, the computation times are very similar to each other. This is not surprising, since all the methods basically need to solve 2n LP problems. For n ≥ 70, we ran the computations in sparse mode. We can see from the tables that the calculations took less time due to the sparse mode. With respect to the efficiencies, the methods perform similarly to the previous dense case. Again, as the dimension increases, our bounds tend to improve. Since we relate the displayed efficiency of the bounds to the trivial ones, this behaviour might be caused by the worse quality of the trivial bounds in higher dimensions.
Maximization of a Convex Quadratic Form: Factorization and Bounds
M. Hladík and D. Hartman

Table 3. Efficiency of the methods – higher dimensions. The best efficiencies highlighted in boldface. The bottom part was run in sparse mode.

n    runs  triv  McCormick  sqrtm   sqrtm+it  chol    chol+it  chol+rand
20   100   1     0.8737     0.4614  0.4625    0.6682  0.5013   0.4260
30   100   1     0.8879     0.3730  0.3731    0.5587  0.4046   0.3582
40   100   1     0.9019     0.3170  0.3170    0.4707  0.3471   0.3216
50   100   1     0.9102     0.2725  0.2719    0.4273  0.3113   0.2940
60   100   1     0.9196     0.2396  0.2401    0.3806  0.2781   0.2692
70   20    1     0.9101     0.2709  0.2709    0.4344  0.3133   0.3062
80   20    1     0.9127     0.2445  0.2445    0.3905  0.2923   0.2900
90   20    1     0.9201     0.2237  0.2237    0.3604  0.2845   0.2779
100  20    1     0.9229     0.1993  0.1993    0.3496  0.2706   0.2677

Table 4. Computational times of the methods (in seconds) – higher dimensions. The bottom part was run in sparse mode.

n    runs  triv    McCormick  sqrtm  sqrtm+it  chol   chol+it  chol+rand
20   100   0.4686  0.4799     0.4587 0.4575    0.4601 0.4573   4.583
30   100   2.115   2.150      2.075  2.073     2.087  2.087    20.80
40   100   7.889   7.983      7.735  7.725     7.812  7.780    77.74
50   100   25.16   25.44      24.71  24.72     24.93  24.85    248.4
60   100   64.89   63.97      63.97  64.19     64.92  64.43    641.1
70   20    12.36   12.57      12.99  12.94     12.89  13.25    131.2
80   20    24.09   24.23      24.61  24.64     25.34  25.19    251.5
90   20    43.97   44.10      45.71  45.45     46.25  46.62    465.9
100  20    78.92   79.77      84.74  84.22     85.08  86.19    855.7

4 Conclusion

We proposed a simple and cheap method to compute an upper bound for the problem of maximizing a convex quadratic form over a convex polyhedron. The method is based on a factorization of the quadratic form matrix and an application of the Chebyshev vector norm. The numerical experiments indicate that (at least for the randomly generated instances), with essentially the same running time, the method gives tighter bounds than the trivial method or the McCormick relaxation approach. For small dimensions, the performance of all the considered approximation methods was low even in comparison with exact optimum computation. However, in medium and larger dimensions, the effectiveness of our approach becomes very significant. Therefore, it may serve as a promising approximation method for solving large-scale problems. Indeed, the larger the dimension, the tighter our bounds were relative to the trivial or McCormick ones. In the future, it would also be interesting to compare our approach with other approximation methods, including the state-of-the-art technique of semidefinite programming. As an open problem, there remains the question of finding a suitable factorization. In our experiments, the square root approach behaves best. Algorithm 1 can sometimes slightly improve the tightness of the resulting bounds with almost no additional effort. Nevertheless, as the numerical experiments with random Householder matrices suggest, there is high potential for achieving even better results. The problem of finding the best factorization is challenging – so far, there are no complexity-theoretic results or any kind of characterization.
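The random Householder experiments mentioned above rely on the fact that any orthogonal matrix H preserves a factorization: if A = RᵀR, then HR is another factor of A, since (HR)ᵀ(HR) = RᵀHᵀHR = RᵀR. A minimal pure-Python sketch of this factor-randomization step (not the authors' code):

```python
import random

def householder(v):
    # H = I - 2 v v^T / (v^T v) is orthogonal (a reflection).
    n = len(v)
    nv2 = sum(c * c for c in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / nv2
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def random_factor(r):
    # If A = R^T R, then (HR)^T (HR) = A for any orthogonal H, so HR is an
    # equally valid factor of A that may yield a tighter Chebyshev-norm bound.
    v = [random.uniform(-1.0, 1.0) for _ in range(len(r))]
    return matmul(householder(v), r)
```

Drawing many such H and keeping the factor with the smallest resulting bound is one way to search over factorizations cheaply.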
References

1. Allemand, K., Fukuda, K., Liebling, T.M., Steiner, E.: A polynomial case of unconstrained zero-one quadratic optimization. Math. Program. 91(1), 49–52 (2001)
2. Baharev, A., Achterberg, T., Rév, E.: Computation of an extractive distillation column with affine arithmetic. AIChE J. 55(7), 1695–1704 (2009)
3. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, Hoboken (2006)
4. Černý, M., Hladík, M.: The complexity of computation and approximation of the t-ratio over one-dimensional interval data. Comput. Stat. Data Anal. 80, 26–43 (2014)
5. Floudas, C.A.: Deterministic Global Optimization: Theory, Methods and Applications. Nonconvex Optimization and Its Applications, vol. 37. Kluwer, Dordrecht (2000)
6. Floudas, C.A., Visweswaran, V.: Quadratic optimization. In: Horst, R., Pardalos, P.M. (eds.) Handbook of Global Optimization, pp. 217–269. Springer, Boston (1995)
7. Gould, N.I.M., Toint, P.L.: A quadratic programming bibliography. RAL Internal Report 2000-1, Science and Technology Facilities Council, Scientific Computing Department, Numerical Analysis Group, 28 March 2012. ftp://ftp.numerical.rl.ac.uk/pub/qpbook/qp.pdf
8. Hansen, E.R., Walster, G.W.: Global Optimization Using Interval Analysis, 2nd edn. Marcel Dekker, New York (2004)
9. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer, Heidelberg (1990)
10. Konno, H.: Maximizing a convex quadratic function over a hypercube. J. Oper. Res. Soc. Jpn. 23(2), 171–188 (1980)
11. Kreinovich, V., Lakeyev, A., Rohn, J., Kahl, P.: Computational Complexity and Feasibility of Data Processing and Interval Computations. Kluwer, Dordrecht (1998)
12. McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: Part I – Convex underestimating problems. Math. Program. 10(1), 147–175 (1976)
13. Moore, R.E., Kearfott, R.B., Cloud, M.J.: Introduction to Interval Analysis. SIAM, Philadelphia (2009)
14. Pardalos, P., Rosen, J.: Methods for global concave minimization: a bibliographic survey. SIAM Rev. 28(3), 367–379 (1986)
15. Sherali, H.D., Adams, W.P.: A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems. Kluwer, Boston (1999)
16. Tuy, H.: Convex Analysis and Global Optimization. Springer Optimization and Its Applications, vol. 110, 2nd edn. Springer, Cham (2016)
17. Vavasis, S.A.: Nonlinear Optimization: Complexity Issues. Oxford University Press, New York (1991)
18. Vavasis, S.A.: Polynomial time weak approximation algorithms for quadratic programming. In: Pardalos, P.M. (ed.) Complexity in Numerical Optimization, pp. 490–500. World Scientific Publishing, Singapore (1993)
New Dynamic Programming Approach to Global Optimization

Anna Kaźmierczak and Andrzej Nowakowski

Faculty of Mathematics and Computer Science, University of Lodz, Banacha 22, 90-238 Lodz, Poland
{anna.kazmierczak,andrzej.nowakowski}@wmii.uni.lodz.pl
Abstract. The paper deals with the problem of finding the global minimum of a function in a subset of Rn described by values of solutions to a system of semilinear parabolic equations. We propose a construction of a new dual dynamic programming to formulate a new optimization problem. As a consequence we state and prove a verification theorem for the global minimum and investigate a dual optimal feedback control for the global optimization.
Keywords: Global optimization · Dynamic programming · Feedback control

1 Introduction
In a classical optimization problem, our aim is to minimize a real-valued objective function, defined on a subset of a Euclidean space, which is determined by a family of constraint functions. Depending on the type of those functions (linear, convex, nonconvex, nonsmooth), different tools from analysis and numerical analysis can be applied in order to find the minimum (or an approximate minimum) of the objective function (see e.g. [1]). However, some sets that are interesting from a practical point of view are very difficult to describe by constraints. Sometimes such problematic sets can be characterized as controllability sets of dynamics, e.g. differential equations depending on controls. The aim of this paper is to present one such dynamics, a system of parabolic differential equations, and to construct a new dynamic programming to derive a verification theorem for the optimization problem. As a consequence, we can define a dual feedback control and an optimal dual feedback, and state a theorem giving sufficient optimality conditions in terms of the feedback control.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 128–134, 2020. https://doi.org/10.1007/978-3-030-21803-4_13
2 The Optimization Problem
Let P ⊂ Rn and let R be a function defined on P, i.e. R : P → R. Consider the following optimization problem R: minimize R(x) on P. Notice that nothing is assumed about R, and the set P can be very irregular. It is not easy to study such a problem and, in fact, the theory of optimization does not offer suitable tools to perform that task. We develop a new method to handle the problem R. To this end, we transform R into the language of optimal control theory. Let us introduce an open, bounded domain Ω ⊂ Rn of the variables z, a compact set U ⊂ Rm (m ≥ 1) and an interval [0, T]. Define a family of controls

U = {u(t, z), (t, z) ∈ [0, T] × Ω : u ∈ L1([0, T] × Ω), u(t, z) ∈ U}

and a function f : [0, T] × Ω × Rn × U → Rn, sufficiently regular (at least Lipschitz continuous), constituting the nonlinearity of a system of parabolic differential equations

xt(t, z) − Δx(t, z) = f(t, z, x(t, z), u(t, z)), (t, z) ∈ [0, T] × Ω,
x(0, z) = x0(z), z ∈ Ω.    (1)

The regularity of f should ensure existence of solutions to (1), belonging to the Sobolev space (W1,2([0, T] × Ω))n ∩ (C([0, T] × Ω))n for x0(·) ∈ (C0(Ω))n and each control u ∈ U. Then, using (1) and with a proper choice of U, we can characterize P as:

P = {∫Ω x(T, z)dz ∈ Rn : x is a solution to (1) for u ∈ U}.

Hence, R is now transformed into a well-known problem of optimal control theory, which reads:

minimize R(∫Ω x(T, z)dz),

subject to

xt(t, z) − Δx(t, z) = f(t, z, x(t, z), u(t, z)), (t, z) ∈ [0, T] × Ω,    (2)
x(0, z) = x0(z), z ∈ Ω, u ∈ U.    (3)
We denote that problem by Rc and stress that Rc still concerns minimizing the function R on the set P. Thus finding a solution to Rc is equivalent to finding a solution to the original problem R. However, to solve R we will deal with Rc and develop suitable new tools to study Rc. In this way we avoid the lack of regularity of the problem R. We denote by Ad the set of all pairs (x, u) satisfying (2), (3). Note that the problem Rc is, in fact, a classical optimal control problem with distributed parameters u. Thus we can apply tools from optimal control theory. Of course, one may wonder whether this machinery is too complicated for optimizing R on P. Everything depends on the type of the set P, as well as on how smooth the function R is. If R and P are regular enough, we have many instruments in the theory of optimization to solve the problem R, also numerically, but when there is no sufficient regularity, these methods become very complicated or, in the case of very bad data, cannot be applied at all. In order to derive verification conditions, in fact sufficient optimality conditions for Rc, we develop quite a new dual method based on ideas from [3]. Using that dual method we also construct a new dual optimal feedback control for Rc. An essential point of the proposed approach is that we do not need any regularity of R on P, as we move all considerations related to Rc to an extended space.

Remark 1. Notice that if we omit the integral in the definition of P, then P becomes a subset of an infinite-dimensional space, but the method developed in subsequent sections can be applied also to that case (i.e. to the problem of finding a minimum in a subset of an infinite-dimensional space).
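To make the role of the dynamics (1) concrete, the sketch below integrates a one-dimensional instance with an explicit finite-difference scheme and evaluates the terminal integral ∫Ω x(T, z)dz that defines a point of P. The choice f(t, z, x, u) = u (a constant control), the horizon, and all grid sizes are illustrative assumptions, not data from the paper.

```python
def terminal_integral(u_val, nz=50, nt=1000, horizon=0.1):
    """Explicit finite-difference solution of x_t - x_zz = u on
    (0, horizon) x (0, 1) with zero Dirichlet boundary and x(0, z) = 0,
    returning the terminal integral of x(T, z) over (0, 1)."""
    dz = 1.0 / nz
    dt = horizon / nt
    assert dt <= dz * dz / 2.0       # stability of the explicit scheme
    x = [0.0] * (nz + 1)
    for _ in range(nt):
        new = [0.0] * (nz + 1)       # Dirichlet boundary stays zero
        for i in range(1, nz):
            lap = (x[i - 1] - 2.0 * x[i] + x[i + 1]) / (dz * dz)
            new[i] = x[i] + dt * (lap + u_val)
        x = new
    return sum(x) * dz               # crude quadrature of the integral
```

Different admissible controls u produce different terminal integrals, i.e. different points of the set P; for this linear choice of f the map u ↦ ∫ x(T, z)dz is itself linear.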
3 Dual Approach to Rc

The dual approach to optimal control problems was first introduced in [3] and then developed in several papers for different problems of that kind, governed by elliptic, parabolic and wave equations (see e.g. [2,4]). In that method we do not deal directly with a value function but with an auxiliary function, defined on an extended set and satisfying a dual dynamic equation, which allows us to derive verification conditions for the primal value function. One of the benefits of that technique is that we do not need any properties of the value function, such as smoothness or convexity. In this paper we construct a new dual method to treat the problem Rc. We start with the definition of a dual set: P ⊂ Rn, an open set of the variables p. The set P is chosen by us! Let P ⊂ R2n+1 be an open set of the variables (t, z, p), (t, z) ∈ [0, T] × Ω, p ∈ P, i.e.

P = {(t, z, p) ∈ R2n+1 : (t, z) ∈ [0, T] × Ω, p ∈ P}.    (4)
Why do we extend the primal space of the (t, z) variables? In the classical approach to necessary optimality conditions of the Pontryagin maximum principle, both in one variable and with distributed parameters, we work with the space of variables (t, z) and with the so-called conjugate variable (y0, p), the multiplier. In dual dynamic programming, (y0, p) is nothing more than the multiplier p attached to the constraints and y0 attached to the functional. However, the novelty of our method is that we move all our study to the extended space (t, z, p), but we do not use p as a multiplier and we drop the multiplier y0. Denote by W1:2(P) the specific Sobolev space of real-valued functions of the variables (t, z, p), having the first-order derivative with respect to t and the second-order weak or generalized derivative (in the sense of distributions) with respect to z. This notation for the function space is used for functions depending on the primal variable (t, z) and the dual variable p. The primal and the dual variables are independent, and the functions in the space W1:2(P) enjoy different properties with respect to (t, z) and p. The strategy of dual dynamic programming consists in building all notions in the dual space; this concerns also a dynamic programming equation. Thus the question is: how to construct that equation in our case? The answer is neither easy nor unique: on the left-hand side of (2) there is a linear differential operator acting on the state x. Certainly, the auxiliary function V has to be real-valued, as it must relate somehow to a value function. This implies that the system of dynamic equations has to consist of one equation only, despite the fact that (2) is a system of n equations. The main problem is to choose a proper differential operator for the auxiliary function V and a correct Hamiltonian, as these choices depend on each other. We have decided that in our approach it is best to apply to V the parabolic operator ∂/∂t − Δ only. We state the dynamic equation in a strong form (see (5)); this equation is considered in the set P, i.e. in the set of the variables (t, z, p). Therefore, we require that a function V(t, z, p), V ∈ W1:2(P), satisfies in P, for some y0 ∈ L2([0, T] × Ω) continuous in t, the parabolic partial differential equation of dual dynamic programming

∂/∂t V(t, z, p) − ΔzV(t, z, p) − inf{pf(t, z, V(t, z, p), u) : u ∈ U}
= ∂/∂t V(t, z, p) − ΔzV(t, z, p) − pf(t, z, V(t, z, p), u(t, z, p)) = y0(t, z), (t, z, p) ∈ P,    (5)

as well as the initial condition

∫Ω y0(T, z)dz ≤ R(∫Ω pV(T, z, p)dz), p ∈ P,    (6)
where u(t, z, p) is a function on P for which the infimum in (5) is attained. Since the function f is continuous and U is a compact set, u(t, z, p) exists and is continuous. Denote by p(t, z), (t, z) ∈ [0, T] × Ω, p ∈ L2([0, T] × Ω), a new trajectory having the property that for some u ∈ U and some y(·) ∈ L2([0, T] × Ω), y ≤ y0, p is a solution to the following equation:

∂/∂t V(t, z, p(t, z)) − ΔzV(t, z, p(t, z)) − p(t, z)f(t, z, V(t, z, p(t, z)), u(t, z)) = y(t, z),    (7)

while V(t, z, p) is a solution to (5). We call p(·) a dual trajectory, while x(·) stands for a primal trajectory. Moreover, we say that a dual trajectory p(·) is dual to x(·) if both are generated by the same control u(t, z). Further, we confine ourselves only to those admissible trajectories x(·) which satisfy the equation x(t, z) = p(t, z)V(t, z, p(t, z)) for (t, z) ∈ [0, T] × Ω. Thus denote

AdV = {(x, u) ∈ Ad : there exists p ∈ L2([0, T] × Ω), dual to x(t, z) and such that x(t, z) = p(t, z)V(t, z, p(t, z)) for (t, z) ∈ [0, T] × Ω}.

Actually, this means that we are going to study the problem Rc possibly in some smaller set AdV, which is determined by V. All the above was simply a precise description of the family AdV. This means we must reformulate Rc to:

RV = inf_{(x,u)∈AdV} R(∫Ω x(T, z)dz).    (8)

We name RV the dual optimal value, in contrast to the optimal value

Ro = inf_{(x,u)∈Ad} R(∫Ω x(T, z)dz),

as RV depends strongly upon the dual trajectories p(t, z) which, in fact, determine the set AdV. Moreover, an essential point is that the set AdV is, in general, smaller than Ad, i.e. AdV ⊂ Ad, so the dual optimal value RV may be greater than the optimal value Ro, i.e. RV ≥ Ro. In order to find the set AdV, first we must find the function V, i.e. solve equation (5), and then define the set of admissible dual trajectories. This is not easy, but it permits us to assert that a suspected trajectory is really optimal with respect to all trajectories lying in AdV. Such a result appears in the literature for the first time.
Remark 2. We need not worry about the problem R if AdV is strictly smaller than Ad, since the given P can be characterized with the help of the set AdV. In practice, we extend Ad so that the (possibly smaller) set AdV corresponds precisely to P.
4 Sufficient Optimality Conditions for the Problem (8)

Below we formulate and prove the verification theorem, which gives sufficient conditions for the existence of the optimal value RV, as well as for the optimal pair (relative to the set AdV).

Theorem 1. Assume that there exists a W1:2(P)-solution V of (5), (6) on P, i.e. there exists y0 ∈ L2([0, T] × Ω) such that V fulfills (5) and (6). Let p̄ ∈ L2([0, T] × Ω), with the corresponding ū(t, z), satisfy (7) and let

∫Ω y0(T, z)dz = R(∫Ω p̄(T, z)V(T, z, p̄(T, z))dz).    (9)

Moreover, assume that x̄(t, z) = p̄(t, z)V(t, z, p̄(t, z)), (t, z) ∈ [0, T] × Ω, together with ū, belong to AdV. Then (x̄(·), ū(·)) is the optimal pair relative to all (x(·), u(·)) ∈ AdV.

Proof. Let us take any (x(·), u(·)) ∈ AdV and p(·) generated by u(·), i.e. such that (u(t, z), p(t, z)), (t, z) ∈ [0, T] × Ω, satisfy (7) for some y ∈ L2([0, T] × Ω), y ≤ y0. Hence, from the definition of AdV, the control u(·) generates x(t, z) = p(t, z)V(t, z, p(t, z)), (t, z) ∈ [0, T] × Ω. Then, on the basis of (9) and (6), we can write

R(∫Ω x̄(T, z)dz) = R(∫Ω p̄(T, z)V(T, z, p̄(T, z))dz) = ∫Ω y0(T, z)dz
≤ R(∫Ω p(T, z)V(T, z, p(T, z))dz) = R(∫Ω x(T, z)dz),

which gives the assertion.
5 Feedback Control for the Problem Rc

In this section we present suitable notions to define a completely new optimal dual feedback control for the problem Rc. After the appropriate definitions we state and prove sufficient conditions for optimality in terms of feedback control, which follow from the verification theorem. Notice that a suggestion for the feedback already appears in the definition of dual dynamic programming in (5). A function u(t, z, p) on P is called a dual feedback control if there exists a solution x(t, z, p) on P, x ∈ W1:2(P), of the equation

xt(t, z, p) − Δx(t, z, p) = f(t, z, x(t, z, p), u(t, z, p)), (t, z, p) ∈ P.    (10)

A dual feedback control ū(t, z, p), (t, z, p) ∈ P, is named optimal if there exist:
(i) a function x̄(t, z, p), (t, z, p) ∈ P, x̄ ∈ W1:2(P), satisfying (10) with ū(t, z, p),
(ii) V ∈ W1:2(P), given by the relation x̄(t, z, p) = pV(t, z, p), satisfying (6) for some y0 ∈ L2([0, T] × Ω) and defining

Adx̄ = {(x, u) ∈ Ad : x(t, z) = x̄(t, z, p(t, z)) for some p ∈ L2([0, T] × Ω) satisfying (7) with u(t, z) = ū(t, z, p(t, z)) and some y ∈ L2([0, T] × Ω), y ≤ y0},

(iii) a dual trajectory p̄(·) ∈ L2([0, T] × Ω), such that the pair

x̄(t, z) = x̄(t, z, p̄(t, z)), ū(t, z) = ū(t, z, p̄(t, z)), (t, z) ∈ [0, T] × Ω,

is optimal relative to the set Adx̄ and p̄ satisfies (7) together with ū.

The next theorem asserts the existence of an optimal dual feedback control, again in terms of the function V(t, z, p).

Theorem 2. Let ū(t, z, p) be a dual feedback control on P and let x̄(t, z, p), (t, z, p) ∈ P, be defined according to (10). Suppose that for some y0 ∈ L2([0, T] × Ω) there exists a function V ∈ W1:2(P) satisfying (6), and that

pV(t, z, p) = x̄(t, z, p), (t, z, p) ∈ P.    (11)

Let p̄(·) ∈ L2([0, T] × Ω), (t, z, p̄(t, z)) ∈ P, be such a function that the pair x̄(t, z) = x̄(t, z, p̄(t, z)), ū(t, z) = ū(t, z, p̄(t, z)) belongs to Adx̄ and p̄ satisfies (7) with ū and V. Moreover, assume that

R(∫Ω p̄(T, z)V(T, z, p̄(T, z))dz) = R(∫Ω y0(T, z)dz).    (12)

Then ū(t, z, p), (t, z, p) ∈ P, is an optimal dual feedback control.

Proof. Take any function p(t, z), p ∈ L2([0, T] × Ω), dual to x(t, z) = x̄(t, z, p(t, z)) and such that for u(t, z) = ū(t, z, p(t, z)), (x, u) ∈ Adx̄. By (11), it follows that x(t, z) = p(t, z)V(t, z, p(t, z)) for (t, z) ∈ [0, T] × Ω. Analogously to the proof of Theorem 1, Eqs. (6) and (12) give

R(∫Ω p̄(T, z)V(T, z, p̄(T, z))dz) ≤ R(∫Ω x(T, z)dz).    (13)

As a conclusion from (13), we get

R(∫Ω x̄(T, z)dz) = R(∫Ω p̄(T, z)V(T, z, p̄(T, z))dz) ≤ inf_{(x,u)∈Adx̄} R(∫Ω x(T, z)dz),

which is sufficient to show that ū(t, z, p) is an optimal dual feedback control, by the above definition.
References

1. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
2. Galewska, E., Nowakowski, A.: A dual dynamic programming for multidimensional elliptic optimal control problems. Numer. Funct. Anal. Optim. 27, 279–289 (2006)
3. Nowakowski, A.: The dual dynamic programming. Proc. Am. Math. Soc. 116, 1089–1096 (1992)
4. Nowakowski, A., Sokolowski, J.: On dual dynamic programming in shape control. Commun. Pure Appl. Anal. 11, 2473–2485 (2012)
On Chebyshev Center of the Intersection of Two Ellipsoids

Xiaoli Cen, Yong Xia, Runxuan Gao, and Tianzhi Yang

LMIB of the Ministry of Education, School of Mathematics and System Sciences, Beihang University, Beijing 100191, People's Republic of China
[email protected]
Abstract. We study the problem of finding the smallest ball covering the intersection of two ellipsoids, which is also known as the Chebyshev center problem (CC). Semidefinite programming (SDP) relaxation is an efficient approach to approximate (CC). In this paper, we first establish the worst-case approximation bound of (SDP). Then we show that (CC) can be globally solved in polynomial time. As a by-product, one can randomly generate Celis-Dennis-Tapia subproblems having positive Lagrangian duality gap with high probability. Keywords: Chebyshev center · Semidefinite programming · Approximation bound · Polynomial solvability · CDT subproblem
1 Introduction

We study the problem of finding the Chebyshev center of the intersection of two ellipsoids:

(CC) min_z max_{x∈Ω} ‖x − z‖²,    (1)

where ‖·‖ = √((·)ᵀ(·)) is the Euclidean norm,

Ω := {x ∈ Rn : ‖Fix + gi‖² ≤ 1, i = 1, 2},

and Fi ∈ Rmi×n, gi ∈ Rmi for i = 1, 2. We assume one of the two ellipsoids is nondegenerate, so that Ω is bounded. To this end, we let F1 be of full column rank. We also assume that Ω has at least one interior point. Without loss of generality, we assume the origin 0 is an interior point of Ω, that is, ‖gi‖ < 1, i = 1, 2. Under these assumptions, (CC) has an optimal solution (z∗, x∗). Then z∗ is the Chebyshev center of Ω and the ball centered at z∗ with radius ‖x∗ − z∗‖ is the smallest ball covering Ω. (CC) has a direct application in bounded error estimation. Consider the linear regression model Ax ≈ b, where A is ill-conditioned. In order to stabilize the estimation, a regularization constraint ‖Lx‖² ≤ η is introduced to restrict

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 135–144, 2020. https://doi.org/10.1007/978-3-030-21803-4_14
x. Therefore, the admissible solutions to the linear system are given by the intersection of two ellipsoids [13]:

F = {x ∈ Rn : ‖Lx‖² ≤ η, ‖Ax − b‖² ≤ ρ}.

As a robust approximation of the true solution, Beck and Eldar [4] suggested the Chebyshev center of F, which leads to the minimax optimization (CC). (CC) is difficult to solve. Relaxing the inner nonconvex quadratic optimization problem to its Lagrange dual (which can be reformulated as a semidefinite programming (SDP) minimization), Beck and Eldar [4] proposed the SDP relaxation approach for (CC). Their numerical experiments demonstrated that this approximation is "pretty good" in practice. Interestingly, when (CC) is defined over the complex domain rather than the real space, there is no gap between (CC) and this SDP relaxation, since strong duality holds for the inner quadratic maximization with two quadratic constraints over the complex domain [3]. The other zero-duality case is reported in [2], when both ellipsoids are Euclidean balls and n ≥ 2. The SDP relaxation approach was later extended by Eldar et al. [10] to find the Chebyshev center of the intersection of multiple ellipsoids, where an alternative derivation of the SDP relaxation was presented. To the best of our knowledge, there is no particular global optimization method for solving (CC). Moreover, the answers to the following two questions are unknown:
– The SDP relaxation has been shown to be "pretty good" only in numerical experiments [4]. Is there any theoretical guarantee?
– Can (CC) be globally solved in polynomial time?
In this paper, we positively answer the above two questions. In particular, we establish in Sect. 2 the worst-case approximation bound of the SDP relaxation of (CC). In Sect. 3, we propose a global optimization method to solve (CC) and show that it can be done in polynomial time. As a by-product, in Sect. 4, we show that based on (CC) one can randomly generate Celis-Dennis-Tapia (CDT) subproblems having a positive Lagrangian duality gap with high probability.

Notations. Let σmax(·) and σmin(·) be the largest and smallest singular values of the matrix (·), respectively. Denote by In the n × n identity matrix. v(·) denotes the optimal value of the problem (·). For two n × n symmetric matrices A and B, Tr(AB) = ∑i=1..n ∑j=1..n aij bij returns the inner product of A and B. A ≻ (⪰) B means that the matrix A − B is positive (semi)definite. Let 0n and O be the n-dimensional zero vector and the n × n zero matrix, respectively. For a real number x, ⌈x⌉ denotes the smallest integer larger than or equal to x.
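The geometry behind (CC) can be explored numerically without any SDP machinery: sample Ω on a grid and run a Bădoiu–Clarkson farthest-point iteration, which converges to the center of the minimum enclosing ball of the sample. This is only an illustrative approximation, not the ellipsoid method or the SDP relaxation studied in this paper; the grid range and iteration count are arbitrary choices.

```python
import math

def in_ellipsoid(f, g, x):
    # Membership test ||F x + g||^2 <= 1.
    y = [sum(f[i][j] * x[j] for j in range(len(x))) + g[i]
         for i in range(len(g))]
    return sum(c * c for c in y) <= 1.0

def approx_chebyshev_center(f1, g1, f2, g2, grid=60, iters=1000):
    """Grid-sample Omega and run the Badoiu-Clarkson farthest-point update
    z <- z + (p_far - z) / (k + 1), which converges to the minimum enclosing
    ball center of the sampled points."""
    pts = []
    for i in range(grid + 1):
        for j in range(grid + 1):
            x = [-2.0 + 4.0 * i / grid, -2.0 + 4.0 * j / grid]
            if in_ellipsoid(f1, g1, x) and in_ellipsoid(f2, g2, x):
                pts.append(x)
    z = [sum(p[d] for p in pts) / len(pts) for d in range(2)]
    for k in range(1, iters + 1):
        far = max(pts, key=lambda p: (p[0] - z[0]) ** 2 + (p[1] - z[1]) ** 2)
        z = [z[d] + (far[d] - z[d]) / (k + 1) for d in range(2)]
    radius = math.sqrt(max((p[0] - z[0]) ** 2 + (p[1] - z[1]) ** 2
                           for p in pts))
    return z, radius
```

On a row-major reading of the Example 1 data given in Sect. 4, this crude scheme should land near the reported center z∗ ≈ (−0.60, −0.29)ᵀ with radius close to √v(CC).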
2 SDP Relaxation and Its Approximation Bound

In this section, we provide a theoretical approximation bound of the SDP relaxation, which was first introduced in [4] and then re-established in [10] based on an alternative approach.
We first introduce the SDP relaxation in a new, simple way. Consider the inner nonconvex maximization of (CC):

(QP(z)) max_{x∈Ω} {xᵀx − 2zᵀx + zᵀz}.    (2)

The first step is to write its Lagrangian dual problem as in [4]:

(D(z)) min_{αi≥0, λ} λ + zᵀz
       s.t. [∑i=1,2 αiFiᵀFi − In  ∑i=1,2 αiFiᵀgi + z; ∑i=1,2 αigiᵀFi + zᵀ  ∑i=1,2 (‖gi‖² − 1)αi + λ] ⪰ 0.    (3)
Combining (D(z)) with the outer minimization yields a convex relaxation of (CC), i.e., minz v(D(z)). Then we reduce this convex relaxation to an SDP in a new way. Notice that for any z, it holds that

zᵀz = min{μ : [In  −z; −zᵀ  μ] ⪰ 0}.

Consequently, we have

minz v(D(z)) = min_{z, αi≥0, λ, μ} λ + μ
    s.t. [∑i=1,2 αiFiᵀFi  ∑i=1,2 αiFiᵀgi; ∑i=1,2 αigiᵀFi  ∑i=1,2 (‖gi‖² − 1)αi + λ + μ] − [In  −z; −zᵀ  μ] ⪰ 0,
         [In  −z; −zᵀ  μ] ⪰ 0,

  ≥ min_{αi≥0, t} t    (4)
    s.t. [∑i=1,2 αiFiᵀFi  ∑i=1,2 αiFiᵀgi; ∑i=1,2 αigiᵀFi  ∑i=1,2 (‖gi‖² − 1)αi + t] ⪰ 0,    (5)
         ∑i=1,2 αiFiᵀFi ⪰ In.    (6)
The last inequality actually holds as an equality, since one can verify that

[A  b; bᵀ  c] ⪰ 0,  A ⪰ In  ⟹  [A  b; bᵀ  c] − [In  A⁻¹b; bᵀA⁻¹  bᵀA⁻²b] ⪰ 0.    (7)

Denote by (SDP) the SDP relaxation (4)–(6). Let (α1∗, α2∗) be an optimal solution of (SDP). Then, according to (7), the optimal solution argminz v(D(z)) is recovered by

z = −(∑i=1,2 αi∗FiᵀFi)⁻¹ (∑i=1,2 αi∗Fiᵀgi).    (8)
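Implication (7) can be sanity-checked numerically on small random instances. The sketch below constructs A ⪰ In and a PSD block matrix through its Schur complement, then tests positive semidefiniteness via the principal-minor criterion; it is an independent check of the inequality, not part of the paper's development.

```python
import itertools, random

def det(m):
    # Laplace expansion; adequate for the tiny matrices used here.
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] *
               det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def is_psd(m, tol=1e-9):
    # A symmetric matrix is PSD iff all its principal minors are nonnegative.
    n = len(m)
    return all(det([[m[i][j] for j in idx] for i in idx]) >= -tol
               for k in range(1, n + 1)
               for idx in itertools.combinations(range(n), k))

def implication_holds(seed):
    rng = random.Random(seed)
    mm = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    # A = M^T M + I guarantees A >= I_2
    a = [[sum(mm[k][i] * mm[k][j] for k in range(2)) + (1.0 if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    b = [rng.uniform(-1, 1) for _ in range(2)]
    d = det(a)
    ainv = [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]
    ainv_b = [sum(ainv[i][j] * b[j] for j in range(2)) for i in range(2)]
    ainv2_b = [sum(ainv[i][j] * ainv_b[j] for j in range(2)) for i in range(2)]
    # choosing c >= b^T A^{-1} b makes the big block PSD (Schur complement)
    c = sum(b[i] * ainv_b[i] for i in range(2)) + rng.uniform(0.0, 1.0)
    big = [[a[0][0], a[0][1], b[0]],
           [a[1][0], a[1][1], b[1]],
           [b[0], b[1], c]]
    small = [[1.0, 0.0, ainv_b[0]],
             [0.0, 1.0, ainv_b[1]],
             [ainv_b[0], ainv_b[1],
              sum(b[i] * ainv2_b[i] for i in range(2))]]
    diff = [[big[i][j] - small[i][j] for j in range(3)] for i in range(3)]
    return is_psd(big) and is_psd(diff)
```

The matrix subtracted in (7) is exactly [In −z; −zᵀ zᵀz] evaluated at the recovered point z = −A⁻¹b, which is why (8) closes the gap in the chain above.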
Now, we establish the approximation bound of (SDP).

Theorem 1. Let z be the recovered solution (8). Then,

v(SDP) ≥ max_{x∈Ω} ‖x − z‖² ≥ v(CC) ≥ ((1 − √γ)/(√2 + √γ))² v(SDP),    (9)

where the parameter γ (0 ≤ γ < 1) is the optimal value of the following univariate concave maximization problem:

γ = sup_{0≤λ≤1} { λ‖g1‖² + (1 − λ)‖g2‖² − l(λ)ᵀ(λF1ᵀF1 + (1 − λ)F2ᵀF2)⁻¹ l(λ) }.

Proposition 1. If v(SDP) > v(CC), then the CDT subproblem (QP(z∗)) (2) has a positive duality gap, where z∗ is the optimal solution of (CC).

Proof. According to the definitions of (QP(z)) (2) and (D(z)) (3), we have

v(CC) = v(QP(z∗)) = minz v(QP(z)) ≤ minz v(D(z)) ≤ v(D(z∗)),
where the first inequality follows from weak duality and the second inequality holds trivially. Under the assumption v(SDP) > v(CC), it follows from the above chain of inequalities and the definition v(SDP) = minz v(D(z)) that v(QP(z∗)) < v(D(z∗)). The proof is complete, since (D(z∗)) is the Lagrangian dual problem of the CDT subproblem (QP(z∗)).

We tested 1000 instances of (CC) in two and three dimensions, respectively, where each component of the input Fi and gi (i = 1, 2) is randomly, independently and uniformly generated from {0, 0.01, 0.02, · · · , 0.99, 1}. v(CC) and v(SDP) are solved by the ellipsoid method of Sect. 3 and the solver CVX [11], respectively. To our surprise, among the 1000 two-dimensional instances, there are 766 instances satisfying v(SDP) > v(CC), while for the 1000 three-dimensional instances, the number of instances satisfying v(SDP) > v(CC) is 916. This implies that, with the help of (CC) and Proposition 1, one can generate CDT subproblems admitting a positive duality gap with high probability.

Finally, we illustrate two small examples of (CC) and the corresponding CDT subproblems (QP(z∗)). For each example, we plot in Fig. 1 the exact Chebyshev center and the corresponding SDP approximation, the smallest covering circle with radius √v(CC), and the approximated circle via SDP relaxation whose radius is √v(SDP). One can observe that the smaller the distance between the centers of the two input ellipses, the tighter the SDP relaxation. This demonstrates the relation (10) in Theorem 1.

Example 1. Let

n = 2, F1 = [1 0; 0 1], g1 = (0.94, 0.19)ᵀ, F2 = [0.01 0.88; 0.72 0.39], g2 = (0.51, 0.15)ᵀ.

We can calculate v(CC) = 0.8044, z∗ = (−0.5956, −0.2890)ᵀ and v(SDP) = 1. The worst-case approximation ratio of (SDP) is ((1 − √γ)/(√2 + √γ))² = 0.0982. The CDT subproblem (QP(z∗)) has a positive duality gap, which is equal to v(D(z∗)) − v(QP(z∗)) = 1.2705 − 0.8044 = 0.4661.

Example 2. Let

n = 2, F1 = [0.35 0.91; 0.40 0.40], g1 = (0.45, 0.32)ᵀ, F2 = [0.47 0.69; 0.89 0.66], g2 = (0.15, 0.87)ᵀ.

We have v(CC) = 10.2672, z∗ = (−0.9975, −0.1392)ᵀ and v(SDP) = 10.8632. The worst-case approximation ratio of (SDP) is ((1 − √γ)/(√2 + √γ))² = 0.2170. The CDT subproblem (QP(z∗)) has a positive duality gap, which is equal to v(D(z∗)) − v(QP(z∗)) = 10.9235 − 10.2672 = 0.6563.
Fig. 1. Two examples in two dimensions, where the input ellipses are plotted in solid lines. The dotted and dashed circles are the Chebyshev solutions and the SDP approximations, respectively. Chebyshev centers and the corresponding SDP approximations are marked by ∗ and +, respectively.
Acknowledgments. This research was supported by the National Natural Science Foundation of China under grants 11822103, 11571029 and 11771056, and by Beijing Natural Science Foundation Z180005.
References 1. Ai, W., Zhang, S.: Strong duality for the CDT subproblem: a necessary and sufficient condition. SIAM J. Optim. 19(4), 1735–1756 (2009) 2. Beck, A.: Convexity properties associated with nonconvex quadratic matrix functions and applications to quadratic programming. J. Optim. Theory Appl. 142(1), 1–29 (2009) 3. Beck, A., Eldar, Y.: Strong duality in nonconvex quadratic optimization with two quadratic constraints. SIAM J. Optim. 17(3), 844–860 (2006) 4. Beck, A., Eldar, Y.: Regularization in regression with bounded noise: a Chebyshev center approach. SIAM J. Matrix Anal. Appl. 29(2), 606–625 (2007) 5. Bienstock, D.: A note on polynomial solvability of the CDT problem. SIAM J. Optim. 26(1), 488–498 (2016) 6. Burer, S.: A gentle, geometric introduction to copositive optimization. Math. Program. 151(1), 89–116 (2015) 7. Burer, S., Anstreicher, K.M.: Second-order-cone constraints for extended trustregion subproblems. SIAM J. Optim. 23(1), 432–451 (2013) 8. Chen, X., Yuan, Y.: On local solutions of the Celis-Dennis-Tapia subproblem. SIAM J. Optim. 10(2), 359–383 (2000) 9. Consolini, L., Locatelli, M.: On the complexity of quadratic programming with two quadratic constraints. Math. Program. 164(1–2), 91–128 (2017) 10. Eldar, Y., Beck, A.: A minimax Chebyshev estimator for bounded error estimation. IEEE Trans. Signal Process. 56(4), 1388–1397 (2008) 11. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming error estimation, version 2.1. (March 2014). http://cvxr.com/cvx 12. Hsia, Y., Wang, S., Xu, Z.: Improved semidefinite approximation bounds for nonconvex nonhomogeneous quadratic optimization with ellipsoid constraints. Oper. Res. Lett. 43(4), 378–383 (2015)
X. Cen et al.
On Conic Relaxations of Generalization of the Extended Trust Region Subproblem

Rujun Jiang¹ and Duan Li²

¹ School of Data Science, Fudan University, Shanghai, China, [email protected]
² School of Data Science, City University of Hong Kong, Hong Kong, China, [email protected]
Abstract. The extended trust region subproblem (ETRS) of minimizing a quadratic objective over the unit ball with additional linear constraints has attracted a lot of attention in the last few years due to its theoretical significance and wide spectrum of applications. Several sufficient conditions to guarantee the exactness of its semidefinite programming (SDP) relaxation or second-order cone programming (SOCP) relaxation have recently been developed in the literature. In this paper, we consider a generalization of the extended trust region subproblem (GETRS), in which the unit ball constraint in the ETRS is replaced by a general, possibly nonconvex, quadratic constraint. We demonstrate that the SDP relaxation can be further reformulated as an SOCP problem under a simultaneous diagonalization condition on the quadratic forms. We then derive several sufficient conditions under which the SOCP relaxation of the GETRS is exact, assuming Slater condition holds.
1 Introduction
We consider the following quadratically constrained quadratic programming (QCQP) problem,

(P0)  min  (1/2) z^T C z + c^T z
      s.t. (1/2) z^T B z + b^T z + e ≤ 0,    (1)
           A^T z ≤ d,

where C and B are n × n symmetric matrices, not necessarily positive semidefinite, A is an n × m matrix, c, b ∈ R^n, e ∈ R and d ∈ R^m.

(Supported by Shanghai Sailing Program 18YF1401700, Natural Science Foundation of China (NSFC) 11801087 and Hong Kong Research Grants Council under Grants 14213716 and 14202017.)

© Springer Nature Switzerland AG 2020. H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 145–154, 2020. https://doi.org/10.1007/978-3-030-21803-4_15

Problem (P0) is nonconvex since both the quadratic objective and the quadratic constraint may
be nonconvex. In fact, problem (P0) is NP-hard even when there is no quadratic constraint [21]. When there are no linear constraints and the quadratic constraint (1) is a unit ball constraint, problem (P0) reduces to the classical trust region subproblem (TRS). The TRS first arises in the trust region method for unconstrained optimization problems [7], and also admits important applications in robust optimization [1]. Various methods have been developed to solve the TRS [10,20,23]. When there are no additional linear constraints, problem (P0) reduces to the generalized trust region subproblem (GTRS), which is also a well studied subject in the literature [2,3,9,14,15,17,19,26,27]. When the quadratic constraint (1) reduces to a unit ball constraint, problem (P0) is termed the extended trust region subproblem (ETRS), which has recently attracted much attention in the literature [5,6,8,11–13,18,27,30]. The ETRS is nonconvex and semidefinite programming (SDP) relaxation has been a widely used technique for solving the ETRS. However, the SDP relaxation is often not tight enough and consequently only offers a lower bound, even for the case with m = 1 [27]. Jeyakumar and Li [13] first provided the following dimension condition under which the SDP relaxation is exact,

dim Ker(C − λmin(C) In) ≥ dim span{a1, . . . , am} + 1,

where λmin(C) stands for the minimal eigenvalue of C and [a1, . . . , am] = A, and showed its immediate application in robust least squares and a robust SOCP model problem. Hsia and Sheu [12] derived a more general sufficient condition,

rank[C − λmin(C) In, a1, . . . , am] ≤ n − 1.

After that, using the KKT conditions of the SDP relaxation (in fact, an equivalent SOCP relaxation) of the ETRS, Locatelli [18] presented a better sufficient condition than [12], which corresponds to the solution conditions of a specific linear system.
Meanwhile, Ho-Nguyen and Kilinc-Karzan [11] also developed a sufficient condition by identifying the feasibility of a linear system. In fact, the two conditions in [11,18] are equivalent for the ETRS, as stated in [11]. In this paper, we mainly focus on a generalization of the ETRS (GETRS), which replaces the unit ball constraint in the ETRS with a general, possibly nonconvex, quadratic constraint. To the best of our knowledge, the current literature lacks a study of the equivalence between the GETRS and its SDP relaxation. Our study of this equivalence is motivated not only by the wide applications of the GETRS, but also by its theoretical implication for a more general class of QCQP problems. The GETRS is much more difficult than the ETRS: the feasible region of the GETRS is no longer compact, the optimal solution may be unattainable in some cases, and the null space of C + uB in the GETRS is more complicated than that in the ETRS, where u is the corresponding KKT multiplier of constraint (1). To introduce our investigation of sufficient conditions under which the SDP relaxation is exact, we first define the set IPSD = {λ : C + λB ⪰ 0}, which is in fact an interval [19]. Define IP+SD = IPSD ∩ R+, where R+ is the nonnegative real axis. We then focus
on the condition that the set IP+SD has a nonempty interior. We mainly show that under this condition the SDP relaxation is equivalent to an SOCP reformulation. We then derive sufficient conditions under which the SDP relaxation of problem (P) is tight.

Notation. For any index set J, we define AJ as the restriction of matrix A to the rows indexed by J and vJ as the restriction of vector v to the entries indexed by J. We denote by J^C the complement of J. The notation ‖v‖ denotes the Euclidean norm of vector v. We use Diag(A) and diag(a) to denote the vector formed by the diagonal entries of matrix A and the diagonal matrix formed by vector a, respectively, and v(·) represents the optimal value of problem (·). We use Null(A) to denote the null space of matrix A.
2 Optimality Conditions
In this section, to simplify our problem, we consider the case when Slater condition of the SDP relaxation holds and IP+SD has a nonempty interior, and we show a sufficient exactness condition for the SDP relaxation; the nonempty-interior assumption is also known as the regular condition in the study of the GTRS [19,26]. In fact, int(IP+SD) ≠ ∅ implies that the two matrices C and B are simultaneously diagonalizable (SD) [28]. That is, there exists a nonsingular matrix U such that U^T CU and U^T BU are both diagonal. Problem (P0) can then be reformulated, via a change of variables z = U x, as follows,

(P)  min  Σ_{i=1}^n (1/2) δi xi² + Σ_{i=1}^n εi xi
     s.t. Σ_{i=1}^n (1/2) αi xi² + Σ_{i=1}^n βi xi + e ≤ 0,
          Ā^T x ≤ d,

where δ = Diag(U^T CU), α = Diag(U^T BU), ε = U^T c, β = U^T b and Ā = U^T A. By invoking augmented variables yi = xi² and relaxing to yi ≥ xi², we have the following SOCP relaxation,

(SOCP)  min  Σ_{i=1}^n (1/2) δi yi + Σ_{i=1}^n εi xi
        s.t. Σ_{i=1}^n (1/2) αi yi + Σ_{i=1}^n βi xi + e ≤ 0,
             Ā^T x ≤ d,
             xi² ≤ yi, i = 1, . . . , n.

The equivalence of (SOCP) and (SDP) is obvious and thus we only need to focus on identifying the exactness of (SOCP).
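Under the condition int(IP+SD) ≠ ∅, the congruence U used above can be computed numerically: if S := C + u0 B ≻ 0 for some u0, then U = S^{-1/2} Q, where Q orthogonally diagonalizes S^{-1/2} B S^{-1/2}, diagonalizes both C and B (since U^T S U = I and U^T B U is diagonal). A minimal numpy sketch on toy data, with u0 assumed known:

```python
import numpy as np

def simultaneous_diagonalizer(C, B, u0):
    """Given symmetric C, B with S = C + u0*B positive definite, return a
    nonsingular U such that U.T @ C @ U and U.T @ B @ U are both diagonal."""
    S = C + u0 * B
    w, V = np.linalg.eigh(S)                       # S = V diag(w) V.T with w > 0
    S_inv_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    _, Q = np.linalg.eigh(S_inv_half @ B @ S_inv_half)
    return S_inv_half @ Q

# toy instance built so that C + 1.5*B is positive definite
rng = np.random.default_rng(0)
R = rng.standard_normal((3, 3)) + 3 * np.eye(3)    # generically nonsingular
C = R.T @ np.diag([-2.0, 4.0, 1.0]) @ R
B = R.T @ np.diag([2.0, -2.0, 0.5]) @ R
U = simultaneous_diagonalizer(C, B, 1.5)
for M in (C, B):
    D = U.T @ M @ U
    assert np.allclose(D, np.diag(np.diag(D)), atol=1e-7)  # both congruences diagonal
```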
It is well known that under Slater condition any optimal solution of a convex problem must satisfy the KKT conditions [4]. This fact enables us to find sufficient conditions that guarantee the exactness of the SDP relaxation. Let us denote the jth column of matrix Ā by ā^j. Then the KKT conditions of the convex problem (SOCP) are given as follows:

(1/2)(δi + u αi) − wi = 0,  i = 1, . . . , n,
εi + u βi + Σ_{j=1}^m vj āji + 2 wi xi = 0,  i = 1, . . . , n,
Σ_{i=1}^n (1/2) αi yi + Σ_{i=1}^n βi xi + e ≤ 0,
(ā^j)^T x ≤ dj,  j = 1, . . . , m,
xi² ≤ yi,  i = 1, . . . , n,                                        (2)
u (Σ_{i=1}^n (1/2) αi yi + Σ_{i=1}^n βi xi + e) = 0,
vj ((ā^j)^T x − dj) = 0,  j = 1, . . . , m,
wi (xi² − yi) = 0,  i = 1, . . . , n,
u, vj, wi ≥ 0,  j = 1, . . . , m, i = 1, . . . , n,

where u is the KKT multiplier of the constraint Σ_{i=1}^n (1/2) αi yi + Σ_{i=1}^n βi xi + e ≤ 0, vj is the KKT multiplier of the constraint (ā^j)^T x ≤ dj, j = 1, . . . , m, and wi is the KKT multiplier of the constraint xi² ≤ yi, i = 1, . . . , n.

The following lemma shows that the SDP relaxation is always bounded from below and the optimal value is attainable if int(IP+SD) ≠ ∅ and problem (P) is feasible, which is weaker than Slater condition of the original problem (P).

Lemma 1. If int(IP+SD) ≠ ∅ and problem (P) is feasible, then the SDP relaxation of (P) is bounded from below and the optimal value is attainable.

Proof. Consider the following Lagrangian dual problem of (P) ([24,25]), which is also the conic dual problem of (SDP),

(L)  max_{u,v,τ}  −τ/2 + u e − d^T v
     s.t.  M := [ C + uB            ε + uβ + Av
                  (ε + uβ + Av)^T   τ          ] ⪰ 0,
           u ≥ 0, v ≥ 0.

Since int(IP+SD) ≠ ∅, we can always find some (v, τ) such that the matrix M is positive semidefinite for any u ∈ int(IP+SD). In fact, for any u ∈ int(IP+SD) we have C + uB ≻ 0 and thus there exists τ ≥ 0 such that M ⪰ 0 for every v ≥ 0, e.g., τ = (ε + uβ + Av)^T (C + uB)^{-1} (ε + uβ + Av) + 1. This means (τ, u, v) satisfies Slater condition for problem (L). As problem (P) is feasible, so is (SDP), and hence v(SDP) < +∞. Moreover, problem (L) is bounded from above due to weak duality, i.e., v(L) ≤ v(SDP). Hence, from strong duality, the optimal value of the SDP relaxation equals that of problem (L) and is attainable [4].

For any u ∈ IP+SD, let us define J(u) = {i : δi + u αi = 0, i = 1, . . . , n}. We will use J instead of J(u) for simplicity if it does not cause any confusion. So
we have ĀJ = [ā¹_J, . . . , ā^m_J], where the superscript denotes the column index. We next show a sufficient condition, a generalization of the result in [18], that guarantees the exactness of the SDP relaxation.

Condition 2. The interior of IP+SD is not empty. For any u ∈ ∂IP+SD, if J ≠ ∅, then {v : εJ + uβJ + ĀJ v = 0} ∩ R^m_+ = ∅.

Theorem 3. Assume that Slater condition holds for problem (SOCP). If Condition 2 holds, the SDP relaxation is exact and the optimal values of both the SDP relaxation and problem (P) are attainable.

Proof. From Lemma 1, we obtain that (SOCP) is bounded from below and the optimal solution is attainable. Then, due to Slater condition, every optimal solution of (SOCP) must be a KKT solution of system (2). So we have the following two cases:
1. If u ∈ ∂IP+SD, then either J = ∅ or J ≠ ∅. In the first case, (1/2)(δi + uαi) − wi = 0 implies that wi = (1/2)(δi + uαi) > 0 for all i. This, together with the complementary slackness wi(xi² − yi) = 0, implies that xi² = yi, i.e., (SOCP) is already exact. In the latter case, the KKT condition (1/2)(δi + uαi) − wi = 0, i = 1, . . . , n, implies that wi = 0, ∀i ∈ J. But Condition 2 shows {v : εJ + uβJ + ĀJ v = 0} ∩ R^m_+ = ∅, i.e., there is no KKT solution satisfying the second equation in (2) in this case.
2. Otherwise, u ∈ int(IP+SD) and wi = (1/2)(δi + uαi) > 0 for all i = 1, . . . , n. By the complementary slackness wi(xi² − yi) = 0, we have xi² − yi = 0, ∀i = 1, . . . , n, and thus the SOCP relaxation is exact.
As the optimal value of the SDP relaxation is attainable, so is problem (P).
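The choice of τ in the proof of Lemma 1 is a standard Schur-complement argument: for H ≻ 0, the block matrix [H, g; g^T, τ] is positive definite whenever τ > g^T H^{-1} g. A quick numerical illustration with random stand-in data (H plays the role of C + uB, g that of ε + uβ + Av):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M0 = rng.standard_normal((n, n))
H = M0 @ M0.T + np.eye(n)          # positive definite stand-in for C + uB
g = rng.standard_normal(n)         # stand-in for eps + u*beta + A v
tau = g @ np.linalg.solve(H, g) + 1.0
Mmat = np.block([[H, g[:, None]], [g[None, :], np.array([[tau]])]])
# Schur complement: tau - g^T H^{-1} g = 1 > 0 and H > 0, hence Mmat > 0
assert np.linalg.eigvalsh(Mmat).min() > 0
```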
Let us consider now the following illustrative example. In this problem, IP+SD = [1, 2] is an interval and ∂IP+SD = {1, 2}. One may check that Condition 2 is satisfied. The optimal value of the SDP relaxation is −2.44082 with x = (−1.2907, −0.8161)^T and X = [1.6660, 1.0534; 1.0534, 0.6660]. It is easy to verify that X = xx^T and the SDP relaxation is exact.

Motivated by the perturbation condition in Theorem 3.1 in [18], we propose the following condition to extend Condition 2.

Condition 4. The interior of IP+SD is not empty. For any u ∈ ∂IP+SD, if J ≠ ∅, then ∀ϵ > 0, ∃ η ∈ R^J such that ‖η‖ ≤ ϵ and {v : εJ + η + uβJ + ĀJ v = 0} ∩ R^m_+ = ∅.

Condition 4 also guarantees the exactness of the SDP relaxation under the same mild assumptions as in Theorem 3.

Theorem 5. Assume that Slater condition holds for problem (SOCP). If Condition 4 holds, the SDP relaxation is exact and the optimal values of the SDP relaxation and problem (P) are both attainable. For a detailed proof, please see our working paper [16].
Remark 6. When B reduces to the identity matrix, problem (P) reduces to the ETRS and an exactness condition is given in [18]. The difficulty in our proof, compared to the results in [18], mainly comes from the possible non-compactness of the feasible region.

Now let us consider another illustrative example,

(P2)  min  −x1² + 2x2²
      s.t. x1² − x2² ≤ 1,
           x1 + x2 ≤ 1.

In the above example, IP+SD = [1, 2] is an interval and ∂IP+SD = {1, 2}. It is easy to verify that Condition 2 is not fulfilled, but Condition 4 is fulfilled for any ϵ > 0 and η = t(1, 1)^T, where t ∈ R and |t| ≤ √2 ϵ/2. The optimal value of the SDP relaxation is −1 with x = (−1, 0)^T and X = [1, 0; 0, 0]. So we have X = xx^T and the SDP relaxation is exact.

In fact, Condition 2 holds if and only if the following linear programming problem has no solution,

(LP)  min_v  0
      s.t.  εJ + uβJ + ĀJ v = 0, v ≥ 0.

The dual of problem (LP) is

(LD)  max_y  −(εJ + uβJ)^T y
      s.t.  Ā^T_J y ≤ 0.

We show in the following lemma a nontrivial observation from strong duality of linear programming. The proof is similar to Proposition 3.3 in [18], but we give one here for completeness.

Lemma 7. Condition 2 is fulfilled if and only if (LD) is unbounded from above; and when Condition 2 fails, Condition 4 holds if and only if (LD) has multiple optimal solutions.

Proof. The first statement follows directly from the infeasibility of (LP) and strong duality. Condition 4 is equivalent to: ∀ϵ > 0, ∃ η ∈ R^J with ‖η‖ ≤ ϵ and {v : εJ + η + uβJ + ĀJ v = 0} ∩ R^m_+ = ∅, i.e.,

(LP′)  min_v  0
       s.t.  εJ + η + uβJ + ĀJ v = 0, v ≥ 0.
The dual of problem (LP′) is

(LD′)  max_y  −(εJ + η + uβJ)^T y
       s.t.  Ā^T_J y ≤ 0.

Similarly, we conclude that (LD′) is unbounded from above.
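Since Condition 2 is precisely an LP infeasibility statement, it can be checked with any LP solver; a sketch using scipy's linprog (the toy data and the helper name are ours, for illustration only):

```python
import numpy as np
from scipy.optimize import linprog

# Condition 2 asks whether {v >= 0 : eps_J + u*beta_J + Abar_J v = 0} is empty.
# Feasibility of this system is itself an LP with zero objective.
def condition2_holds(rhs, AbarJ):
    """rhs stands for eps_J + u*beta_J; True iff the system has no solution v >= 0."""
    n_v = AbarJ.shape[1]
    res = linprog(c=np.zeros(n_v), A_eq=AbarJ, b_eq=-rhs,
                  bounds=[(0, None)] * n_v, method="highs")
    return res.status == 2  # status 2 = LP proved infeasible

AbarJ = np.array([[1.0], [1.0]])                            # hypothetical 2x1 data
assert condition2_holds(np.array([1.0, -1.0]), AbarJ)       # v = -1 and v = 1: inconsistent
assert not condition2_holds(np.array([-1.0, -1.0]), AbarJ)  # v = 1 >= 0 is feasible
```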
The above lemma shows that Condition 2 holds if and only if (LD) is unbounded from above, which is equivalent to the existence of a nonzero ȳ such that Ā^T_J ȳ ≤ 0 and −(εJ + uβJ)^T ȳ > 0: defining ỹ = kȳ, we have Ā^T_J ỹ ≤ 0 and −(εJ + uβJ)^T ỹ → ∞ as k → ∞. On the other hand, when Condition 2 fails, Condition 4 holds if and only if there exists a nonzero ȳ such that Ā^T_J ȳ ≤ 0 and −(εJ + uβJ)^T ȳ = 0. The above two statements can be unified as: there exists a nonzero ȳ such that Ā^T_J ȳ ≤ 0 and (εJ + uβJ)^T ȳ ≤ 0. Assume θ ∈ R^n with θ_{J^C} = 0, θJ = ȳ. Then we have Ā^T θ = Ā^T_J ȳ ≤ 0 and (ε + uβ)^T θ = (εJ + uβJ)^T ȳ ≤ 0, which is equivalent to, by defining z = Uθ, the existence of z ∈ R^n such that (C + uB)z = 0, A^T z ≤ 0 and (c + ub)^T z ≤ 0. (Note that U is the congruent matrix such that U^T CU = diag(δ) and U^T BU = diag(α), as mentioned in the beginning of this section.) The above implication suggests that Conditions 2 and 4 can be combined as the following condition.

Condition 8. For any u ∈ ∂IP+SD, if Null(C + uB) ≠ {0}, then there exists a nonzero z ∈ R^n such that (C + uB)z = 0, A^T z ≤ 0 and (c + ub)^T z ≤ 0.

We can then summarise our main result under the condition int(IP+SD) ≠ ∅ in the following theorem.

Theorem 9. When int(IP+SD) ≠ ∅ and Condition 8 holds, the SDP relaxation is exact. Moreover, if Slater condition holds, both problem (P) and its SDP relaxation are bounded from below and the optimal values are attainable.

An advantage of Condition 8 is that it can be directly checked with the original data, i.e., we do not need to invoke the congruence transformation to obtain the SD form. In particular, when the quadratic constraint reduces to the unit ball constraint, problem (P) reduces to the ETRS, and Condition 8 reduces to Condition 2.1 in [11], i.e., there exists a nonzero vector z such that (C + λmin(C)I)z = 0, A^T z ≤ 0 and c^T z ≤ 0; Conditions 2 and 4 reduce to (13) and (14) in [18].
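For example (P2), both Condition 8 at the two boundary multipliers u ∈ {1, 2} and the claimed optimal value −1 can be sanity-checked numerically. A small numpy sketch (the sign flip below is a convenience that suffices for this data):

```python
import numpy as np

# data of example (P2): f(x) = -x1^2 + 2*x2^2, x1^2 - x2^2 <= 1, x1 + x2 <= 1
C = np.diag([-2.0, 4.0])
B = np.diag([2.0, -2.0])
A = np.array([[1.0], [1.0]])      # A^T x <= d with d = 1
c = np.zeros(2)
b = np.zeros(2)

for u in (1.0, 2.0):              # boundary points of IP+SD = [1, 2]
    H = C + u * B
    w, V = np.linalg.eigh(H)
    null = V[:, np.abs(w) < 1e-9]               # Null(C + uB) is nontrivial here
    z = null[:, 0]
    if not (np.all(A.T @ z <= 1e-9) and (c + u * b) @ z <= 1e-9):
        z = -z                                   # try the opposite sign
    assert np.all(A.T @ z <= 1e-9) and (c + u * b) @ z <= 1e-9  # Condition 8 holds

# exactness claim: the optimum of (P2) equals the SDP value -1, attained at (-1, 0)
g = np.linspace(-3, 3, 241)
X1, X2 = np.meshgrid(g, g)
feas = (X1**2 - X2**2 <= 1) & (X1 + X2 <= 1)
assert (-X1**2 + 2 * X2**2)[feas].min() >= -1 - 1e-9
```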
As a result, for problem (BP), Condition 2.1 in [11] is equivalent to (13) and (14) in [18], which was also indicated in [11]. Together with the fulfillment of Slater condition, we further have the following S-lemma with linear inequalities.

Theorem 10 (S-lemma with linear inequalities). Assume that there exists (X, x) such that (1/2) B • X + b^T x + e ≤ 0, A^T x ≤ d and X ≻ xx^T, int(IP+SD) ≠ ∅ and Condition 8 holds. Then the following two statements are equivalent:
(i) (1/2) x^T Bx + b^T x + e ≤ 0 and A^T x ≤ d ⇒ (1/2) x^T Cx + c^T x + γ ≥ 0.
(ii) ∃ u, v1, . . . , vm ≥ 0 such that ∀x ∈ R^n, (1/2) x^T Cx + c^T x + γ + u((1/2) x^T Bx + b^T x + e) + v^T (A^T x − d) ≥ 0.
Proof. It is obvious that (ii) ⇒ (i). Next let us prove (i) ⇒ (ii). From Theorem 9, we obtain that the SDP relaxation is bounded from below. So the SDP relaxation is equivalent to the Lagrangian dual of problem (P) [24]. Hence,

max_{u≥0, v≥0} min_x L(x, u, v) := (1/2) x^T Cx + c^T x + u((1/2) x^T Bx + b^T x + e) + v^T (A^T x − d)
  = v(SDP) = v(P)
  = min_x { (1/2) x^T Cx + c^T x : (1/2) x^T Bx + b^T x + e ≤ 0, A^T x ≤ d }.

Thus, min_x { (1/2) x^T Cx + c^T x : (1/2) x^T Bx + b^T x + e ≤ 0, A^T x ≤ d } ≥ −γ is equivalent to

max_{u≥0, v≥0} min_x L(x, u, v) ≥ −γ.

The latter statement implies that ∃ u, v1, . . . , vm ≥ 0 such that ∀x ∈ R^n, (1/2) x^T Cx + c^T x + γ + u((1/2) x^T Bx + b^T x + e) + v^T (A^T x − d) ≥ 0, which is exactly statement (ii).

Remark 11. The classical S-lemma, first proposed by Yakubovich [29], and its variants have many real-world applications; see the survey paper [22]. To the best of our knowledge, our S-lemma is the most general one with linear constraints, while the S-lemma in Jeyakumar and Li [13] is confined to a unit ball constraint.
3 Conclusions
In this paper, we investigate sufficient conditions to guarantee the exactness of the SDP relaxation for the GETRS. Our main contribution is to propose different sufficient conditions to guarantee the exactness under a regular condition and Slater condition, based on the KKT system for the SDP relaxation of the GETRS. In fact, when the quadratic constraint becomes an equality in problem (P), our sufficient conditions still guarantee, albeit with a slight modification, the exactness of the SDP relaxation. Since the technique is similar, we omit the details for the case of problem (P) with an equality quadratic constraint. For future research directions, we will investigate more general sufficient conditions to guarantee the exactness of the SDP relaxation and extend our sufficient conditions in this paper to a wider class of QCQP problems.
References

1. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press (2009)
2. Ben-Tal, A., den Hertog, D.: Hidden conic quadratic representation of some nonconvex quadratic optimization problems. Math. Program. 143(1–2), 1–29 (2014)
3. Ben-Tal, A., Teboulle, M.: Hidden convexity in some nonconvex quadratically constrained quadratic programming. Math. Program. 72(1), 51–63 (1996)
4. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press (2004)
5. Burer, S., Anstreicher, K.M.: Second-order-cone constraints for extended trust-region subproblems. SIAM J. Optim. 23(1), 432–451 (2013)
6. Burer, S., Yang, B.: The trust region subproblem with non-intersecting linear constraints. Math. Program. 149(1–2), 253–264 (2015)
7. Conn, A.R., Gould, N.I., Toint, P.L.: Trust Region Methods, vol. 1. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2000)
8. Fallahi, S., Salahi, M., Karbasy, S.A.: On SOCP/SDP formulation of the extended trust region subproblem (2018). arXiv:1807.07815
9. Feng, J.M., Lin, G.X., Sheu, R.L., Xia, Y.: Duality and solutions for quadratic programming over single non-homogeneous quadratic constraint. J. Glob. Optim. 54(2), 275–293 (2012)
10. Hazan, E., Koren, T.: A linear-time algorithm for trust region problems. Math. Program. 1–19 (2015)
11. Ho-Nguyen, N., Kilinc-Karzan, F.: A second-order cone based approach for solving the trust-region subproblem and its variants. SIAM J. Optim. 27(3), 1485–1512 (2017)
12. Hsia, Y., Sheu, R.L.: Trust region subproblem with a fixed number of additional linear inequality constraints has polynomial complexity (2013). arXiv:1312.1398
13. Jeyakumar, V., Li, G.: Trust-region problems with linear inequality constraints: exact SDP relaxation, global optimality and robust optimization. Math. Program. 147(1–2), 171–206 (2014)
14. Jiang, R., Li, D.: Novel reformulations and efficient algorithm for the generalized trust region subproblem (2017). arXiv:1707.08706
15. Jiang, R., Li, D.: A linear-time algorithm for generalized trust region problems (2018). arXiv:1807.07563
16. Jiang, R., Li, D.: Exactness conditions for SDP/SOCP relaxations of generalization of the extended trust region subproblem. Working paper (2019)
17. Jiang, R., Li, D., Wu, B.: SOCP reformulation for the generalized trust region subproblem via a canonical form of two symmetric matrices. Math. Program. 169(2), 531–563 (2018)
18. Locatelli, M.: Exactness conditions for an SDP relaxation of the extended trust region problem. Optim. Lett. 10(6), 1141–1151 (2016)
19. Moré, J.J.: Generalizations of the trust region problem. Optim. Methods Softw. 2(3–4), 189–209 (1993)
20. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)
21. Pardalos, P.M.: Global optimization algorithms for linearly constrained indefinite quadratic problems. Comput. Math. Appl. 21(6), 87–97 (1991)
22. Pólik, I., Terlaky, T.: A survey of the S-lemma. SIAM Rev. 49(3), 371–418 (2007)
23. Rendl, F., Wolkowicz, H.: A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Program. 77(1), 273–299 (1997)
24. Shor, N.Z.: Quadratic optimization problems. Sov. J. Comput. Syst. Sci. 25(6), 1–11 (1987)
25. Shor, N.: Dual quadratic estimates in polynomial and boolean programming. Ann. Oper. Res. 25(1), 163–168 (1990) 26. Stern, R.J., Wolkowicz, H.: Indefinite trust region subproblems and nonsymmetric eigenvalue perturbations. SIAM J. Optim. 5(2), 286–313 (1995) 27. Sturm, J.F., Zhang, S.: On cones of nonnegative quadratic functions. Math. Oper. Res. 28(2), 246–267 (2003) 28. Uhlig, F.: Definite and semidefinite matrices in a real symmetric matrix pencil. Pac. J. Math. 49(2), 561–568 (1973) 29. Yakubovich, V.A.: S-procedure in nonlinear control theory. Vestnik Leningrad University, vol. 1, pp. 62–77 (1971) 30. Ye, Y., Zhang, S.: New results on quadratic minimization. SIAM J. Optim. 14(1), 245–267 (2003)
On Constrained Optimization Problems Solved Using the Canonical Duality Theory

Constantin Zălinescu¹,²

¹ University "Al. I. Cuza" Iasi, Bd. Carol I 11, Iasi, Romania, [email protected]
² Octav Mayer Institute of Mathematics, Bd. Carol I 8, Iasi, Romania
Abstract. D.Y. Gao, together with some of his collaborators, applied his canonical duality theory (CDT) to solve a class of constrained optimization problems. Unfortunately, several papers on this subject contain unclear statements, unconvincing proofs, or even false results. Our aim in this work is to study rigorously this class of constrained optimization problems in finite dimensional spaces and to point out several false results published in the last ten years.
1 Preliminaries
We consider the following constrained minimization problem

(PJ)  min f(x)  s.t. x ∈ XJ,

where J ⊂ 1,m, XJ := {x ∈ R^n | [∀j ∈ J : gj(x) = 0] ∧ [∀j ∈ J^c : gj(x) ≤ 0]} with J^c := 1,m \ J, f := g0 and

gk(x) := qk(x) + Vk(Λk(x))   (x ∈ R^n, k ∈ 0,m),

qk and Λk being quadratic functions on R^n, and Vk ∈ Γsc := Γsc(R) for k ∈ 0,m.

© Springer Nature Switzerland AG 2020. H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 155–163, 2020. https://doi.org/10.1007/978-3-030-21803-4_16

To be more precise, we take

qk(x) := (1/2)⟨x, Ak x⟩ − ⟨bk, x⟩ + ck  ∧  Λk(x) := (1/2)⟨x, Ck x⟩ − ⟨dk, x⟩ + ek   (x ∈ R^n)

with Ak, Ck ∈ Sn, bk, dk ∈ R^n (seen as column matrices), and ck, ek ∈ R for k ∈ 0,m, where Sn denotes the set of n × n real symmetric matrices; of course, c0 can be taken to be 0. Γsc(R^p) is the class of those functions h : R^p → R̄ := R ∪ {−∞, +∞} which are essentially strictly convex and essentially smooth, that is the class of proper lsc convex functions of Legendre type (see [1, Sect. 26]). For h ∈ Γsc(R^p) we have: h∗ ∈ Γsc(R^p), dom ∂h = int(dom h), and h is differentiable on int(dom h), where the conjugate h∗ of h is defined by h∗(σ) := sup{⟨y, σ⟩ − h(y) | y ∈
R^p} ∈ R̄; moreover, ∇h : int(dom h) → int(dom h∗) is bijective and continuous with (∇h)^{-1} = ∇h∗. It follows that Γsc := Γsc(R) is the class of those proper convex and lsc functions h : R → R̄ with the property that h and h∗ are strictly convex and derivable on the interior of their domains; hence h′ : int(dom h) → int(dom h∗) is continuous and bijective, and (h′)^{-1} = (h∗)′ whenever h ∈ Γsc.

The problem (P1,m) [resp. (P∅)], denoted by (Pe) [resp. (Pi)], is a minimization problem with equality [resp. inequality] constraints whose feasible set is Xe := X1,m [resp. Xi := X∅]. In many examples considered by D.Y. Gao and his collaborators, some functions gk are quadratic, that is gk = qk; to take this situation into account we set Q := {k ∈ 0,m | gk = qk}, Q0 := Q \ {0} = 1,m ∩ Q. For k ∈ Q we take Λk := 0 and Vk(t) := (1/2)t² for t ∈ R; then clearly Vk∗ = Vk ∈ Γsc. Clearly, Ck = 0 ∈ Sn, dk = 0 ∈ R^n and ek = 0 ∈ R for k ∈ Q. We also use the notations

Ik := dom Vk,  Ik∗ := dom Vk∗  (k ∈ 0,m),  I∗ := ∏_{k=0}^m Ik∗;   (1)

of course, Ik = Ik∗ = R for k ∈ Q. In order to simplify the writing, in the sequel λ0 := λ̄0 := 1.

To the functions f (= g0) and (gj)_{j∈1,m} we associate several sets and functions. The Lagrangian L : X × R^m → R is defined by

L(x, λ) := f(x) + Σ_{j=1}^m λj gj(x) = Σ_{k=0}^m λk [qk(x) + Vk(Λk(x))],

where λ := (λ1, ..., λm)^T ∈ R^m, and

X := {x ∈ R^n | ∀k ∈ 0,m : Λk(x) ∈ dom Vk} = ∩_{k=0}^m Λk^{-1}(dom Vk),
X0 := {x ∈ R^n | ∀k ∈ 0,m : Λk(x) ∈ int(dom Vk)} ⊂ int X;

clearly X0 is open and L is differentiable on X0. Using Gao's procedure, we consider the "extended Lagrangian" Ξ associated to f and (gj)_{j∈1,m}:

Ξ : R^n × R^m × I∗ → R,   Ξ(x, λ, σ) := Σ_{k=0}^m λk [qk(x) + σk Λk(x) − Vk∗(σk)],

where I∗ is defined in (1) and σ := (σ0, σ1, ..., σm) ∈ R × R^m = R^{1+m}. Clearly, Ξ(·, λ, σ) is a quadratic function for every fixed (λ, σ) ∈ R^m × I∗. Considering the mappings G : R^m × R^{1+m} → Sn, F : R^m × R^{1+m} → R^n, E : R^m × R^{1+m} → R defined by

G(λ, σ) := Σ_{k=0}^m λk (Ak + σk Ck),   F(λ, σ) := Σ_{k=0}^m λk (bk + σk dk),   E(λ, σ) := Σ_{k=0}^m λk (ck + σk ek),

for (λ, σ) ∈ R^m × I∗ we have that

Ξ(x, λ, σ) = (1/2)⟨x, G(λ, σ)x⟩ − ⟨F(λ, σ), x⟩ + E(λ, σ) − Σ_{k=0}^m λk Vk∗(σk).   (2)
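Identity (2) is a direct regrouping of the quadratic pieces of Ξ and is easy to verify numerically; a sketch on random data (the concrete Vk∗ below is a stand-in, as the identity holds for any choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2

def sym():
    M = rng.standard_normal((n, n))
    return M + M.T                     # random symmetric matrix

Ak = [sym() for _ in range(m + 1)]
Ck = [sym() for _ in range(m + 1)]
bk = [rng.standard_normal(n) for _ in range(m + 1)]
dk = [rng.standard_normal(n) for _ in range(m + 1)]
ck = rng.standard_normal(m + 1)
ek = rng.standard_normal(m + 1)
Vstar = lambda s: 0.5 * s**2           # stand-in for V_k^*

x = rng.standard_normal(n)
lam = np.concatenate(([1.0], rng.standard_normal(m)))   # lambda_0 := 1
sig = rng.standard_normal(m + 1)

q = lambda k: 0.5 * x @ Ak[k] @ x - bk[k] @ x + ck[k]
Lam = lambda k: 0.5 * x @ Ck[k] @ x - dk[k] @ x + ek[k]
Xi_direct = sum(lam[k] * (q(k) + sig[k] * Lam(k) - Vstar(sig[k])) for k in range(m + 1))

G = sum(lam[k] * (Ak[k] + sig[k] * Ck[k]) for k in range(m + 1))
F = sum(lam[k] * (bk[k] + sig[k] * dk[k]) for k in range(m + 1))
E = sum(lam[k] * (ck[k] + sig[k] * ek[k]) for k in range(m + 1))
Xi_quad = 0.5 * x @ G @ x - F @ x + E - sum(lam[k] * Vstar(sig[k]) for k in range(m + 1))
assert np.isclose(Xi_direct, Xi_quad)  # identity (2)
```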
Remark 1. Note that G, F and E do not depend on σk for k ∈ Q. Moreover, G, F and E are affine functions when 1,m ⊂ Q, that is Q0 = 1,m.

For λ ∈ R^m and J ⊂ 1,m we set

M≠(λ) := {j ∈ 1,m | λj ≠ 0},   M≠0(λ) := M≠(λ) ∪ {0}

and

ΓJ := {λ ∈ R^m | λj ≥ 0 ∀j ∈ J^c} ⊃ R^m_+ := {λ ∈ R^m | λj ≥ 0 ∀j ∈ 1,m},

respectively; clearly,

Γ∅ = R^m_+,   Γ1,m = R^m,   Γ_{J∩K} = ΓJ ∩ ΓK   ∀J, K ⊂ 1,m.

Useful relations among the Lagrangian L, the extended Lagrangian Ξ, and the objective function f are provided in the next result.

Lemma 1. Let x ∈ X and J ⊂ 1,m. Then

L(x, λ) = sup_{σ ∈ IJ,Q} Ξ(x, λ, σ)   ∀λ ∈ Γ_{J∩Q},

where IJ,Q := ∏_{k=0}^m Ik∗∗ with Ik∗∗ := {0} if k ∈ J ∩ Q and Ik∗∗ := Ik∗ if k ∈ 0,m \ (J ∩ Q), and

sup_{(λ,σ) ∈ Γ_{J∩Q} × IJ,Q} Ξ(x, λ, σ) = sup_{λ ∈ Γ_{J∩Q}} L(x, λ) = { f(x) if x ∈ X_{J∩Q};  ∞ if x ∈ X \ X_{J∩Q} }.
We consider also the sets

TQ := {(λ, σ) ∈ R^m × I∗ | det G(λ, σ) ≠ 0 ∧ [∀k ∈ Q : σk = 0]},
TQ,col := {(λ, σ) ∈ R^m × I∗ | F(λ, σ) ∈ Im G(λ, σ) ∧ [∀k ∈ Q : σk = 0]} ⊇ TQ,
TQ^{J+} := {(λ, σ) ∈ TQ | λ ∈ Γ_{J∩Q}, G(λ, σ) ≻ 0},
TQ,col^{J+} := {(λ, σ) ∈ TQ,col | λ ∈ Γ_{J∩Q}, G(λ, σ) ⪰ 0} ⊇ TQ^{J+},

as well as the sets

T := T∅,   Tcol := T∅,col,   T^+ := T∅^{∅+},   Tcol^+ := T∅,col^{∅+};

in general TQ^{J+} and TQ,col^{J+} are not convex, unlike their corresponding sets Y^+, Ycol^+ and S^+, Scol^+ from [2,3], respectively. However, Tcol^+, TQ^{J+} and TQ,col^{J+} are convex whenever Q0 = 1,m. In the present context it is natural (in fact necessary) to take λ ∈ Γ_{Q0}. As in [2,3], we consider the (dual objective) function

D : Tcol → R,   D(λ, σ) := Ξ(x, λ, σ) with G(λ, σ)x = F(λ, σ);
D is well defined by [2, Lem. 1 (ii)]. For further use consider

ξ : T → R^n,   ξ(λ, σ) := G(λ, σ)^{-1} F(λ, σ).   (3)
Taking into account (2), we have that Ξ(·, λ, σ) is [strictly] convex for (λ, σ) ∈ + [(λ, σ) ∈ T + ], and so Tcol D(λ, σ) = minn Ξ(x, λ, σ) x∈R
∀(λ, σ) ∈ Tcol such that G(λ, σ) 0,
the minimum being attained uniquely at ξ(λ, σ) when, moreover, G(λ, σ) 0. The next result shows that the so-called “complimentary-dual principle” (or “perfect duality formula”) holds under mild assumptions on the data. n m ∗ Proposition 1. Let (x, λ, σ) ∈ R × R × I be such that ∇x Ξ(x, λ, σ) = 0, ∂Ξ ∂σ0 (x, λ, σ) = 0, and λ, ∇λ Ξ(x, λ, σ) = 0. Then (λ, σ) ∈ Tcol and
f (x) = Ξ(x, λ, σ) = D(λ, σ).
(5)
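Formula (4) and the attainment claim rest on the elementary fact that, for G ≻ 0, min_x { (1/2)⟨x, Gx⟩ − ⟨F, x⟩ } = −(1/2)⟨F, G^{-1}F⟩, attained at x = G^{-1}F; a two-assert numerical check with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)        # positive definite stand-in for G(lambda, sigma)
F = rng.standard_normal(n)         # stand-in for F(lambda, sigma)
quad = lambda x: 0.5 * x @ G @ x - F @ x
x_star = np.linalg.solve(G, F)     # unique stationary point G x = F
assert np.allclose(G @ x_star - F, 0.0)            # gradient vanishes at x_star
assert np.isclose(quad(x_star), -0.5 * F @ x_star)  # value is -(1/2)<F, G^{-1}F>
```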
Other relations between L and Ξ are provided by the next result.

Lemma 2. Let (x, λ, σ) ∈ X0 × R^m × int I∗ be such that ∇σ Ξ(x, λ, σ) = 0 and σk = 0 for k ∈ Q. Then L(x, λ) = Ξ(x, λ, σ) and ∇x L(x, λ) = ∇x Ξ(x, λ, σ). Moreover, for j ∈ 1,m, ∂L/∂λj(x, λ) ≥ ∂Ξ/∂λj(x, λ, σ), with equality if j ∈ M≠(λ) ∪ Q0; in particular ∇λ L(x, λ) = ∇λ Ξ(x, λ, σ) if M≠(λ) ⊃ Q0^c (= 1,m \ Q).
Observe that T ∩ (R^m × int I∗) ⊂ int T, and for any σ ∈ I∗ we have that the set {λ ∈ R^m | (λ, σ) ∈ T} is open. Similarly to the computation of ∂D(λ)/∂λj in [2, p. 5], using the expression of D(λ, σ) in (4), we get

∂D(λ, σ)/∂λj = qj(ξ(λ, σ)) + σj Λj(ξ(λ, σ)) − Vj∗(σj)   ∀j ∈ 1,m, ∀(λ, σ) ∈ T,

and

∂D(λ, σ)/∂σk = λk [Λk(ξ(λ, σ)) − (Vk∗)′(σk)]   ∀k ∈ 0,m, ∀(λ, σ) ∈ T ∩ (R^m × int I∗).

Lemma 3. Let (λ, σ) ∈ (R^m × int I∗) ∩ T and set x := ξ(λ, σ). Then

∇x Ξ(x, λ, σ) = 0 ∧ ∇λ Ξ(x, λ, σ) = ∇λ D(λ, σ) ∧ ∇σ Ξ(x, λ, σ) = ∇σ D(λ, σ).

In particular, (x, λ, σ) is a critical point of Ξ if and only if (λ, σ) is a critical point of D.
Similarly to [2], we say that (x, λ) ∈ X0 × Rm is a J-LKKT point of L if ∇x L(x, λ) = 0 and
∂L ∂L ∂L ∀j ∈ J c : λj ≥ 0 ∧ ∂λ (x, λ) ≤ 0 ∧ λ (x, λ) = 0 ∧ ∀j ∈ J : (x, λ) = 0 , j ∂λ ∂λ j j j or, equivalently, x ∈ XJ
∧ λ ∈ ΓJ
∧
∀j ∈ J c : λj gj (x) = 0 ;
moreover, we say that x ∈ X0 is a J-LKKT point of (PJ ) if there exists λ ∈ Rm such that (x, λ) is a J-LKKT point of L. Inspired by these notions, we say that (x, λ, σ) ∈ Rn × Rm × intI ∗ is a J-LKKT point of Ξ if ∇x Ξ(x, λ, σ) = 0, ∂Ξ (x, λ, σ) = 0 for all j ∈ J and ∇σ Ξ(x, λ, σ) = 0, ∂λ j ∀j ∈ J c : λj ≥ 0 ∧
∂Ξ ∂λj (x, λ, σ)
∂Ξ ≤ 0 ∧ λj ∂λ (x, λ, σ) = 0, j
and (λ, σ) ∈ (Rm × intI ∗ ) ∩ T is a J-LKKT point of D if ∇σ D(λ, σ) = 0 and
∀j ∈ J^c: [λj ≥ 0 ∧ ∂D/∂λj (λ, σ) ≤ 0 ∧ λj ∂D/∂λj (λ, σ) = 0] ∧ [∀j ∈ J: ∂D/∂λj (λ, σ) = 0].

In the case in which J = ∅ we obtain the notions of KKT points for Ξ and D. So, (x, λ, σ) ∈ Rn × Rm × int I* is a KKT point of Ξ if ∇x Ξ(x, λ, σ) = 0, ∇σ Ξ(x, λ, σ) = 0 and

λ ∈ Rm+ ∧ ∇λ Ξ(x, λ, σ) ∈ Rm− ∧ ⟨λ, ∇λ Ξ(x, λ, σ)⟩ = 0,   (6)

where Rm− := {λ ∈ Rm | λj ≤ 0 ∀j ∈ 1,m}, and (λ, σ) ∈ Rm × int I* is a KKT point of D if ∇σ D(λ, σ) = 0 and

λ ∈ Rm+ ∧ ∇λ D(λ, σ) ∈ Rm− ∧ ⟨λ, ∇λ D(λ, σ)⟩ = 0.
The definition of a KKT point for Ξ is suggested by the proof of [4, Th. 3]. Observe that a triple (x, λ, σ) verifying the conditions in (6) is called a critical point of Ξ in [5, p. 477].

Corollary 1. Let (λ, σ) ∈ (Rm × int I*) ∩ T.
(i) If x := ξ(λ, σ), then (x, λ, σ) is a J-LKKT point of Ξ if and only if (λ, σ) is a J-LKKT point of D.
(ii) If M=(λ) = 1,m, then (x, λ, σ) is a J-LKKT point of Ξ if and only if (x, λ, σ) is a critical point of Ξ, if and only if x = ξ(λ, σ) and (λ, σ) is a critical point of D.

Remark 2. Taking into account Remark 1, as well as (3) and Lemma 3, the functions ∇x Ξ, ξ, ∇σ D do not depend on σk for k ∈ Q. Consequently, if (x̄, λ̄, σ̄) is a J-LKKT point of Ξ then σ̄k = 0 for k ∈ Q ∩ M=(λ̄), and (x̄, λ̄, σ̃) is also a J-LKKT point of Ξ, where σ̃k := 0 for k ∈ Q and σ̃k := σ̄k for k ∈ 0,m \ Q. Conversely, taking into account that ∇σ D does not depend on σk for k ∈ Q, if (λ̄, σ̄) ∈ T is a J-LKKT point of D then (λ̄, σ̃) is also a J-LKKT point of D, where σ̃k := 0 for k ∈ Q and σ̃k := σ̄k for k ∈ 0,m \ Q.
C. Zălinescu
Having in view the previous remark, without loss of generality, in the sequel (if not mentioned otherwise) we shall assume that σ̄k = 0 for k ∈ Q whenever (x̄, λ̄, σ̄) ∈ Rn × Rm × int I* is a J-LKKT point of Ξ, or (λ̄, σ̄) ∈ T is a J-LKKT point of D.
2 The Main Result
The main result of the paper is the next one; in it we can see the roles of the different hypotheses in obtaining the main conclusion, that is, the min-max duality formula provided by Eq. (7).

Proposition 2. Let (x̄, λ̄, σ̄) ∈ Rn × Rm × int I* be a J-LKKT point of Ξ such that σ̄k = 0 for k ∈ Q.
(i) Then λ̄ ∈ ΓJ, (λ̄, σ̄) ∈ TQ,col, ⟨λ̄, ∇λ Ξ(x̄, λ̄, σ̄)⟩ = 0, L(x̄, λ̄) = Ξ(x̄, λ̄, σ̄), ∇x L(x̄, λ̄) = 0, and (5) holds.
(ii) Moreover, assume that Q0^c ⊂ M=(λ̄). Then ∇λ L(x̄, λ̄) = ∇λ Ξ(x̄, λ̄, σ̄), (x̄, λ̄) is a J-LKKT point of L and x̄ ∈ X_{J∪Q0^c}.
(iii) Furthermore, assume that λ̄j > 0 for all j ∈ Q0^c and G(λ̄, σ̄) ⪰ 0. Then x̄ ∈ X_{J∪Q0^c} ⊂ XJ ⊂ X_{J∩Q}, (λ̄, σ̄) ∈ T_{Q,col}^{J+}, and

f(x̄) = inf_{x ∈ X_{J∩Q}} f(x) = Ξ(x̄, λ̄, σ̄) = L(x̄, λ̄) = sup_{(λ,σ) ∈ T_{Q,col}^{J+}} D(λ, σ) = D(λ̄, σ̄);   (7)

moreover, if G(λ̄, σ̄) ≻ 0 then x̄ is the unique global solution of problem (P_{J∩Q}).

The variant of Proposition 2 in which Q is not taken into consideration, that is, the case when one does not observe that Vk ∘ Λk = 0 for some k (if any), is much weaker; however, the conclusions coincide for Q = {0}.

Proposition 3. Let (x̄, λ̄, σ̄) ∈ Rn × Rm × int I* be a J-LKKT point of Ξ.
(i) Then λ̄ ∈ ΓJ, (λ̄, σ̄) ∈ Tcol, ⟨λ̄, ∇λ Ξ(x̄, λ̄, σ̄)⟩ = 0, L(x̄, λ̄) = Ξ(x̄, λ̄, σ̄), ∇x L(x̄, λ̄) = 0, and (5) holds.
(ii) Assume that M=(λ̄) = 1,m. Then ∇λ L(x̄, λ̄) = ∇λ Ξ(x̄, λ̄, σ̄) = 0, whence (x̄, λ̄, σ̄) is a critical point of Ξ, (x̄, λ̄) is a critical point of L, and x̄ ∈ Xe ⊂ XJ ⊂ Xi.
(iii) Assume that λ̄ ∈ Rm++ := int Rm+ and G(λ̄, σ̄) ⪰ 0. Then x̄ ∈ Xe, (λ̄, σ̄) ∈ T_col^+ and

f(x̄) = inf_{x ∈ Xi} f(x) = Ξ(x̄, λ̄, σ̄) = L(x̄, λ̄) = sup_{(λ,σ) ∈ T_col^+} D(λ, σ) = D(λ̄, σ̄);

moreover, if G(λ̄, σ̄) ≻ 0 then (λ̄, σ̄) ∈ T+ and x̄ is the unique global solution of problem (Pi).

The remark below refers to the case Q = ∅. A similar remark (but a bit less dramatic) is valid for Q0 = ∅.
Remark 3. It is worth observing that, given the functions f, g1, ..., gm of type q + V ∘ Λ with q, Λ quadratic functions and V ∈ Γsc, for any choice of J ⊂ 1,m one finds the same x̄ using Proposition 3 (iii). So, in practice, if one wishes to solve one of the problems (Pe), (Pi) or (PJ) using CDT, it is sufficient to find those critical points (x̄, λ̄, σ̄) of Ξ such that λ̄ ∈ Rm++ and G(λ̄, σ̄) ≻ 0; if we are successful, x̄ ∈ Xe and x̄ is the unique solution of (Pi), and so x̄ is also a solution of all problems (PJ) with J ⊂ 1,m; moreover, (λ̄, σ̄) is a global maximizer of D on T_col^+.

The next example shows that the condition Q0^c ⊂ M=(λ̄) is essential for x̄ to be a feasible solution of problem (PJ); moreover, it shows that, unlike in the quadratic case (see [2, Prop. 9]), it is not possible to replace T_{Q,col}^{J+} by {(λ, σ) ∈ Tcol | λ ∈ ΓJ, G(λ, σ) ⪰ 0} in (7). The problem is a particular case of the one considered in [7, Ex. 1], "which is very simple, but important in both theoretical study and real-world applications since the constraint is a so-called double-well function, the most commonly used nonconvex potential in physics and engineering sciences [7]";¹ more precisely, q := 1, c := 6, d := 4, e := 2.

Example 1. Let us take n = m = 1, J ⊂ {1}, q0(x) := ½x² − 6x, Λ1(x) := ½x² − 4, q1(x) := Λ0(x) := 0, V0(t) := V1(t) + 2 := ½t² for x, t ∈ R. Then f(x) = ½x² − 6x and g1(x) = ½(½x² − 4)² − 2. Hence Q = {0} (whence Q0 = ∅) and Xe = {−2√3, 2√3, −2, 2} ⊂ [−2√3, −2] ∪ [2, 2√3] = Xi. Taking σ := (σ0, σ1),

Ξ(x; λ; σ) = ½x² − 6x − ½σ0² + λ(σ1(½x² − 4) − ½σ1² − 2).

We have that G(λ; σ) = 1 + λσ1, Tcol = T = {(λ; σ) ∈ R × R² | 1 + λσ1 ≠ 0} and

D(λ; σ) = −18/(1 + λσ1) − ½σ0² − λ(½σ1² + 4σ1 + 2).

The critical points of Ξ are (2; −1; (0, −2)), (−2; 2; (0, −2)), (6; 0; (0, 14 + 8√3)), (6; 0; (0, 14 − 8√3)), (−2√3; −½√3 − ½; (0, 2)) and (2√3; ½√3 − ½; (0, 2)), and so 1 + λ̄σ̄1 ∈ {3, −3, 1, −√3, √3} for (x̄, λ̄, σ̄) a critical point of Ξ, whence (λ̄, σ̄) (∈ T) is a critical point of D by Lemma 3. For λ̄ = 0 the corresponding x̄ (= 6) is not in Xi ⊃ Xe; in particular, (x̄, λ̄) is not a critical point of L. For λ̄ ≠ 0, Proposition 2 says that (x̄, λ̄) is a critical point of L; in particular x̄ ∈ Xe. For λ̄ ∈ {2, −½√3 − ½}, 1 + λ̄σ̄1 < 0, and so Proposition 2 says nothing about the optimality of x̄ or (λ̄, σ̄); in fact, for λ̄ = −½√3 − ½, the corresponding x̄ (= −2√3) is the global maximizer of f on Xe. For λ̄ := ½√3 − ½ > 0, 1 + λ̄σ̄1 = √3 > 0, and so Proposition 2 says that x̄ = 2√3 (∈ Xe) is the global solution of (Pi), and (λ̄, σ̄) = (½√3 − ½; (0, 2)) is a global maximizer of D on T_col^+ = T+ = {(λ, σ) ∈ R+ × R² | 1 + λσ1 > 0}. For λ̄ = −1, 1 + λ̄σ̄1 = 3 > 0,
¹ The reference "[7]" is "Gao, D.Y.: Nonconvex semi-linear problems and canonical duality solutions. In: Gao, D.Y., Ogden, R.W. (eds.) Advances in Mechanics and Mathematics II, pp. 261–311. Kluwer Academic Publishers (2003)".
but (λ̄, σ̄) is not a local extremum of D, as is easily seen by taking σ0 := 0, (λ, σ1) := (t − 1, t − 2) with |t| sufficiently small.

When Q = 0,m, problem (PJ) reduces to the quadratic problem with equality and inequality quadratic constraints considered in [2, (PJ)], which is denoted here by (PJq) and whose Lagrangian and dual function are denoted by Lq and Dq, respectively. Of course, in this case X = X0 = Rn, and so

Ξ(x, λ, σ) = Lq(x, λ) − ½ Σ_{k=0}^{m} λk σk²   (x ∈ Rn, λ ∈ Rm, σ ∈ R × Rm)

with λ0 := 1. It follows that ∇x Ξ(x, λ, σ) = ∇x Lq(x, λ),

∇σ Ξ(x, λ, σ) = −(λk σk)_{k∈0,m},  ∇λ Ξ(x, λ, σ) = ∇λ Lq(x, λ) − ½(σj²)_{j∈1,m} = (qj(x) − ½σj²)_{j∈1,m}.
Moreover, G(λ, σ) = A(λ), F(λ, σ) = b(λ), E(λ, σ) = c(λ), and so T = Y × R^{1+m}, Tcol = Ycol × R^{1+m}, D(λ, σ) = Dq(λ) − ½ Σ_{k=0}^{m} λk σk², where A(λ), b(λ), c(λ), Y, Ycol, Dq (denoted there by D) are introduced in [2]. Applying Proposition 3 to this case we get the next result, which is much weaker than [2, Prop. 9].

Corollary 2. Let (x̄, λ̄) ∈ Rn × Rm be a J-LKKT point of L.
(i) Then λ̄ ∈ Y_col^J := Ycol ∩ ΓJ, ⟨λ̄, ∇λ Lq(x̄, λ̄)⟩ = 0, and q0(x̄) = Lq(x̄, λ̄) = Dq(λ̄).
(ii) Assume that M=(λ̄) = 1,m. Then ∇λ Lq(x̄, λ̄) = 0, and so (x̄, λ̄) is a critical point of Lq, and x̄ ∈ Xe ⊂ XJ ⊂ Xi.
(iii) Assume that λ̄ ∈ Rm++ and A(λ̄) ⪰ 0. Then x̄ ∈ Xe, λ̄ ∈ Y_col^+ and

q0(x̄) = inf_{x ∈ Xi} q0(x) = Lq(x̄, λ̄) = sup_{λ ∈ Y_col^{i+}} Dq(λ) = Dq(λ̄);

moreover, if A(λ̄) ≻ 0 then λ̄ ∈ Y^{i+} and x̄ is the unique global solution of problem (Pi).

However, applying Proposition 2 we get assertion (i) and the last part of assertion (ii) of [2, Prop. 9].

We have to remark that in all papers of D.Y. Gao on constrained optimization problems in which CDT is used there is a result stating the "complementary-dual principle", and at least one result stating a min-max duality formula. However, we did not find a convincing proof of that min-max duality formula in these papers. We mention below some results which are not true.

The problem considered by Gao, Ruan and Sherali in [5] is of type (Pi). Theorem 2 (Global Optimality Condition) of [5] is false because under the mentioned conditions x̄ is not necessarily in Xi, as Example 1 shows. Also Theorem 1 (Complementary-Dual Principle) and Theorem 3 (Triality Theory) are false because (λ̄, σ̄) = (0; (0, 14 + 8√3)) is a critical point of D (by Lemma 3), while the assertion "x̄ is a KKT point of (P)" is not true because x̄ = 6 ∉ Xi. It is
shown in [6, Ex. 6] that the "double-min or double-max" duality of Theorem 3, that is, its assertion in the case G(λ̄, σ̄) ≺ 0, is also false.

The problem considered by Latorre and Gao in [7] is of type (PJ), in which the Vk are "differentiable canonical functions". As in the case of [5, Th. 1], Example 1 shows that Theorem 1 is false. Even without assuming that Sa+ is convex in Theorem 2, this theorem is false for the same reason.

The results established in Sect. 3 of Ruan and Gao's paper [4] refer to (Pi), in which qk = 0, the Λk are Gâteaux differentiable on their domains and the Vk are "canonical functions" for k ∈ 0,m. As in the case of [5, Th. 1], Example 1 shows that Theorem 3 is false.

The problem considered in Morales-Silva and Gao's paper [8] refers to (Pe) with m = 2 and Q = {1}. The equality max_{(λ,μ,ς)∈Sa+} Π^d(λ, μ, ς) = Π^d(λ̄, μ̄, ς̄) from [8, Eq. (20)] is not true. To see this, consider n := 1, A := 1, r := α := η := c := 1 and f := (9/8)√2; this is the particular case γ := (9/8)√2 of the problem (P) considered in [9].
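As a numerical sanity check of Example 1 (our addition, not part of the paper), the feasible sets Xe and Xi, the identified optimizers, and the perfect-duality value f(x̄) = D(λ̄, σ̄) can be verified in a few lines of Python:

```python
import math

# Data of Example 1 (n = m = 1): f(x) = x^2/2 - 6x, g1(x) = (x^2/2 - 4)^2/2 - 2.
f = lambda x: 0.5 * x**2 - 6 * x
g1 = lambda x: 0.5 * (0.5 * x**2 - 4) ** 2 - 2

r3 = math.sqrt(3.0)

# Xe = {x : g1(x) = 0} = {-2*sqrt(3), 2*sqrt(3), -2, 2}.
for x in (-2 * r3, 2 * r3, -2.0, 2.0):
    assert abs(g1(x)) < 1e-12

# Grid check: Xi = {x : g1(x) <= 0} = [-2*sqrt(3), -2] U [2, 2*sqrt(3)].
for k in range(-5000, 5001):
    x = k / 1000.0
    feasible = g1(x) <= 1e-12
    in_union = (-2 * r3 - 1e-9 <= x <= -2 + 1e-9) or (2 - 1e-9 <= x <= 2 * r3 + 1e-9)
    assert feasible == in_union, x

# x = 2*sqrt(3) minimizes f on Xi; x = -2*sqrt(3) maximizes f on Xe.
vals = [f(k / 1000.0) for k in range(-5000, 5001) if g1(k / 1000.0) <= 0]
assert abs(min(vals) - f(2 * r3)) < 1e-3
assert max(f(x) for x in (-2 * r3, 2 * r3, -2.0, 2.0)) == f(-2 * r3)

# Perfect duality at (lam, sig) = ((sqrt(3)-1)/2, (0, 2)): D equals f(2*sqrt(3)).
lam, s0, s1 = (r3 - 1) / 2, 0.0, 2.0
D = -18.0 / (1 + lam * s1) - 0.5 * s0**2 - lam * (0.5 * s1**2 + 4 * s1 + 2)
assert abs(D - f(2 * r3)) < 1e-9
```

All assertions pass, matching the values stated in the example.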
References
1. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton, N.J. (1972)
2. Zălinescu, C.: On quadratic optimization problems and canonical duality theory. arXiv:1809.09032 (2018)
3. Zălinescu, C.: On unconstrained optimization problems solved using CDT and triality theory. arXiv:1810.09009 (2018)
4. Ruan, N., Gao, D.Y.: Canonical duality theory for solving nonconvex/discrete constrained global optimization problems. In: Gao, D.Y., Latorre, V., Ruan, N. (eds.) Canonical Duality Theory. Advances in Mechanics and Mathematics, vol. 37, pp. 187–201. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58017-3_9
5. Gao, D.Y., Ruan, N., Sherali, H.: Solutions and optimality criteria for nonconvex constrained global optimization problems with connections between canonical and Lagrangian duality. J. Global Optim. 45, 473–497 (2009)
6. Voisei, M.-D., Zălinescu, C.: Counterexamples to some triality and tri-duality results. J. Global Optim. 49, 173–183 (2011)
7. Latorre, V., Gao, D.Y.: Canonical duality for solving general nonconvex constrained problems. Optim. Lett. 10, 1763–1779 (2016)
8. Morales-Silva, D., Gao, D.Y.: On minimal distance between two surfaces. In: Gao, D.Y., Latorre, V., Ruan, N. (eds.) Canonical Duality Theory. Advances in Mechanics and Mathematics, vol. 37, pp. 359–371. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58017-3_18
9. Voisei, M.-D., Zălinescu, C.: A counter-example to 'minimal distance between two non-convex surfaces'. Optimization 60, 593–602 (2011)
On Controlled Variational Inequalities Involving Convex Functionals

Savin Treanță(B)

Department of Applied Mathematics, University Politehnica of Bucharest, Bucharest, Romania
savin [email protected]
Abstract. In this paper, by using several variational techniques and a dual gap-type functional, we study weak sharp solutions associated with a controlled variational inequality governed by a convex path-independent curvilinear integral functional. Also, under some hypotheses, we establish an equivalence between the minimum principle sufficiency property and weak sharpness for a solution set of the considered controlled variational inequality.

Keywords: Controlled variational inequality · Weak sharp solution · Convex path-independent curvilinear integral functional

1 Introduction
Convexity theory is an important foundation for studying a wide class of unrelated problems in a unified and general framework. Based on the notion of unique sharp minimizer introduced by Polyak [11], and taking into account the works of Burke and Ferris [2] and Patriksson [10], and following Marcotte and Zhu [8], variational inequalities have been intensively investigated by using the concept of weak sharp solution. We mention, in this respect, the works of Wu and Wu [15], Oveisiha and Zafarani [9], Alshahrani et al. [1], Liu and Wu [6] and Zhu [16]. In this paper, motivated and inspired by the ongoing research in this area, we introduce and investigate a new class of scalar variational inequalities. More precisely, by using several variational techniques presented in Clarke [3], Treanță [12,13] and Treanță and Arana-Jiménez [14], we develop a new mathematical framework for controlled continuous-time variational inequalities governed by convex path-independent curvilinear integral functionals and, under some conditions and using a dual gap-type functional, we provide some characterization results for the associated solution set. As is well known, functionals of mechanical-work type, due to their physical meaning, are very important in applications. Thus, the importance of this paper is supported by both theoretical and practical reasoning. As well, the ideas and techniques of this paper may stimulate further research in this dynamic field.

Supported by University Politehnica of Bucharest, Bucharest, Romania (Grant No. MA51-18-01).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 164–174, 2020. https://doi.org/10.1007/978-3-030-21803-4_17
2 Notations, Working Hypotheses and Problem Formulation
In this paper, we will consider the following notations and working hypotheses: Rn and Rk are two finite-dimensional Euclidean spaces; Θ ⊂ Rm is a compact domain in Rm, and the point t = (t^β) ∈ Θ, β = 1,m, is a multi-parameter of evolution; for t1 = (t1^1, ..., t1^m), t2 = (t2^1, ..., t2^m) two different points in Θ, let Θ ⊃ Υ: t = t(θ), θ ∈ [a, b] (or t ∈ [t1, t2]) be a piecewise smooth curve joining the points t1 and t2 in Θ; for U ⊆ Rk, P := Θ × Rn × U and i = 1,n, β = 1,m, ς = 1,q, we define the following continuously differentiable functions V = (V_β^i): P → R^{nm}, W = (W_ς): P → R^q; for x_β := ∂x/∂t^β, β = 1,m, let X be the space of piecewise smooth state functions x: Θ → Rn with the norm

‖x‖ = ‖x‖∞ + Σ_{β=1}^{m} ‖x_β‖∞,  ∀x ∈ X;

also, denote by U the space of piecewise continuous control functions u: Θ → U, endowed with the uniform norm ‖·‖∞; consider X × U equipped with the Euclidean inner product

⟨(x, u); (y, w)⟩ = ∫_Υ [x(t) · y(t) + u(t) · w(t)] dt^β,  ∀(x, u), (y, w) ∈ X × U,

and the induced norm; denote by X × U a nonempty, closed and convex subset of X × U, defined as

X × U = {(x, u) ∈ X × U : ∂x^i/∂t^β = V_β^i(t, x, u), W(t, x, u) ≤ 0, x(t1) = x1, x(t2) = x2},

where x, u are simplified notations for x(t), u(t) and x1 and x2 are given; assume that the continuously differentiable functions V_β = (V_β^i), i = 1,n, β = 1,m, satisfy the following conditions of complete integrability

D_ζ V_β^i = D_β V_ζ^i,  β, ζ = 1,m, β ≠ ζ, i = 1,n,

where D_ζ denotes the total derivative operator; for any two q-tuples a = (a1, ..., aq), b = (b1, ..., bq) in Rq, the following rules

a = b ⇔ aς = bς, a ≤ b ⇔ aς ≤ bς, a < b ⇔ aς < bς, ς = 1,q;  a ≨ b ⇔ a ≤ b, a ≠ b
are assumed.

Note. Further, in this paper, summation over repeated indices is assumed.

In the following, J¹(Rm, Rn) denotes the first-order jet bundle associated to Rm and Rn. For β = 1,m, we consider the real-valued continuously differentiable functions (closed Lagrange 1-form densities) lβ, sβ, rβ: J¹(Rm, Rn) × U → R and, for (x, u) ∈ X × U, define the following path-independent curvilinear integral functionals L, S, R: X × U → R:

L(x, u) = ∫_Υ lβ(t, x, x_ϑ, u) dt^β,  S(x, u) = ∫_Υ sβ(t, x, x_ϑ, u) dt^β,  R(x, u) = ∫_Υ rβ(t, x, x_ϑ, u) dt^β.
Definition 2.1 The scalar functional L(x, u) is called convex on X × U if, for any (x, u), (x0, u0) ∈ X × U, the inequality

L(x, u) − L(x0, u0) ≥ ∫_Υ [∂lβ/∂x(t, x0, x0_ϑ, u0)(x − x0) + ∂lβ/∂x_ϑ(t, x0, x0_ϑ, u0) Dϑ(x − x0)] dt^β + ∫_Υ ∂lβ/∂u(t, x0, x0_ϑ, u0)(u − u0) dt^β

is satisfied.

Definition 2.2 For β = 1,m, the variational (functional) derivative δβL(x, u) of the scalar functional L: X × U → R, L(x, u) = ∫_Υ lβ(t, x, x_ϑ, u) dt^β, is defined as

δβL(x, u) = δβL/δx + δβL/δu,

with

δβL/δx = ∂lβ/∂x(t, x, x_ϑ, u) − Dϑ ∂lβ/∂x_ϑ(t, x, x_ϑ, u) ∈ X,  δβL/δu = ∂lβ/∂u(t, x, x_ϑ, u) ∈ U,

and, for (ψ, Ψ) ∈ X × U with ψ(t1) = ψ(t2) = 0, the relation

⟨(δβL/δx, δβL/δu); (ψ, Ψ)⟩ = ∫_Υ [δβL/δx(t) · ψ(t) + δβL/δu(t) · Ψ(t)] dt^β = lim_{ε→0} [L(x + εψ, u + εΨ) − L(x, u)]/ε

is satisfied.
Working assumptions. (i) In this work, it is assumed that the inner product between the variational derivative of a scalar functional and an element (ψ, Ψ) of X × U is accompanied by the condition ψ(t1) = ψ(t2) = 0. (ii) Assume that dU := [∂lβ/∂x_ϑ Dϑ(x − x0)] dt^β is an exact total differential and satisfies U(t1) = U(t2).

At this point, we have the necessary mathematical tools to formulate the following controlled variational inequality problem: find (x0, u0) ∈ X × U such that

(CVIP)  ∫_Υ [∂lβ/∂x(t, x0, x0_ϑ, u0)(x − x0) + ∂lβ/∂x_ϑ(t, x0, x0_ϑ, u0) Dϑ(x − x0)] dt^β + ∫_Υ ∂lβ/∂u(t, x0, x0_ϑ, u0)(u − u0) dt^β ≥ 0,

for any (x, u) ∈ X × U.

The dual controlled variational inequality problem associated to (CVIP) is formulated as follows: find (x0, u0) ∈ X × U such that

(DCVIP)  ∫_Υ [∂lβ/∂x(t, x, x_ϑ, u)(x − x0) + ∂lβ/∂x_ϑ(t, x, x_ϑ, u) Dϑ(x − x0)] dt^β + ∫_Υ ∂lβ/∂u(t, x, x_ϑ, u)(u − u0) dt^β ≥ 0,

for any (x, u) ∈ X × U. Denote by (X × U)* and (X × U)_* the solution sets associated to (CVIP) and (DCVIP), respectively, and assume they are nonempty.

Remark 2.1 As can be easily seen (see (ii) in the Working assumptions), we can reformulate the above controlled variational inequality problems as follows: find (x0, u0) ∈ X × U such that

(CVIP)  ⟨(δβL/δx0, δβL/δu0); (x − x0, u − u0)⟩ ≥ 0,  ∀(x, u) ∈ X × U,

respectively: find (x0, u0) ∈ X × U such that

(DCVIP)  ⟨(δβL/δx, δβL/δu); (x − x0, u − u0)⟩ ≥ 0,  ∀(x, u) ∈ X × U.
In the following, in order to describe the solution set (X × U)* associated to (CVIP), we introduce the following gap-type path-independent curvilinear integral functionals.

Definition 2.3 For (x, u) ∈ X × U, the primal gap-type path-independent curvilinear integral functional associated to (CVIP) is defined as

S(x, u) = max_{(x0,u0) ∈ X×U} { ∫_Υ [∂lβ/∂x(t, x, x_ϑ, u)(x − x0) + ∂lβ/∂x_ϑ(t, x, x_ϑ, u) Dϑ(x − x0)] dt^β + ∫_Υ ∂lβ/∂u(t, x, x_ϑ, u)(u − u0) dt^β },

and the dual gap-type path-independent curvilinear integral functional associated to (CVIP) is defined as follows

R(x, u) = max_{(x0,u0) ∈ X×U} { ∫_Υ [∂lβ/∂x(t, x0, x0_ϑ, u0)(x − x0) + ∂lβ/∂x_ϑ(t, x0, x0_ϑ, u0) Dϑ(x − x0)] dt^β + ∫_Υ ∂lβ/∂u(t, x0, x0_ϑ, u0)(u − u0) dt^β }.

For (x, u) ∈ X × U, we introduce the following notations:

A(x, u) := {(z, ν) ∈ X × U : S(x, u) = ∫_Υ [∂lβ/∂x(t, x, x_ϑ, u)(x − z) + ∂lβ/∂x_ϑ(t, x, x_ϑ, u) Dϑ(x − z)] dt^β + ∫_Υ ∂lβ/∂u(t, x, x_ϑ, u)(u − ν) dt^β },

Z(x, u) := {(z, ν) ∈ X × U : R(x, u) = ∫_Υ [∂lβ/∂x(t, z, z_ϑ, ν)(x − z) + ∂lβ/∂x_ϑ(t, z, z_ϑ, ν) Dϑ(x − z)] dt^β + ∫_Υ ∂lβ/∂u(t, z, z_ϑ, ν)(u − ν) dt^β }.

In the following, in accordance with Marcotte and Zhu [8], we introduce some central definitions.

Definition 2.4 The polar set (X × U)° associated to X × U is defined as

(X × U)° = {(y, w) ∈ X × U : ⟨(y, w); (x, u)⟩ ≤ 0, ∀(x, u) ∈ X × U}.

Definition 2.5 The normal cone to X × U at (x, u) ∈ X × U is defined as

N_{X×U}(x, u) = {(y, w) ∈ X × U : ⟨(y, w); (z, ν) − (x, u)⟩ ≤ 0, ∀(z, ν) ∈ X × U},  (x, u) ∈ X × U,
N_{X×U}(x, u) = ∅,  (x, u) ∉ X × U,

and the tangent cone to X × U at (x, u) ∈ X × U is T_{X×U}(x, u) = [N_{X×U}(x, u)]°.

Remark 2.2 By using the definition of the normal cone at (x, u) ∈ X × U, we observe the following: (x*, u*) ∈ (X × U)* ⟺ (−δβL/δx*, −δβL/δu*) ∈ N_{X×U}(x*, u*).
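For intuition, here is a finite-dimensional analogue of the primal and dual gap functionals S and R (an illustration of ours, not part of the paper; the set K and operator F(x) = x are our choices): for the variational inequality ⟨F(x0), x − x0⟩ ≥ 0 on K = [−1, 1], the primal gap is g(x) = max_{y∈K} ⟨F(x), x − y⟩, which is nonnegative and vanishes exactly at the solution x = 0.

```python
# 1-D analogue of the gap functionals: F(x) = x on K = [-1, 1].
# primal gap: max over y in K of F(x)*(x - y)  (equals x^2 + |x|, max at y = -sign(x))
# dual gap:   max over y in K of F(y)*(x - y)  (equals x^2/4 for |x| <= 2)
K = [k / 1000.0 for k in range(-1000, 1001)]

def primal_gap(x):
    return max(x * (x - y) for y in K)

def dual_gap(x):
    return max(y * (x - y) for y in K)

# primal gap at x = 0.5: 0.5 * (0.5 - (-1)) = 0.75 = x^2 + |x|
assert abs(primal_gap(0.5) - 0.75) < 1e-9
# both gaps vanish at the solution x* = 0 and are nonnegative on K
assert primal_gap(0.0) == 0.0
assert dual_gap(0.0) == 0.0
# dual gap at x = 0.5: max_y y*(0.5 - y) attained at y = 0.25, value 0.0625
assert abs(dual_gap(0.5) - 0.0625) < 1e-9
# monotonicity of F gives dual_gap <= primal_gap pointwise
assert all(dual_gap(x) <= primal_gap(x) + 1e-12 for x in (-0.8, -0.2, 0.3, 0.9))
```

The same structure appears in Definition 2.3: S evaluates the operator at the current point (x, u), while R evaluates it at the comparison point (x0, u0).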
3 Preliminary Results
In this section, in order to formulate and prove the main results of the paper, several auxiliary propositions are established.

Proposition 3.1 Let the path-independent curvilinear integral functional L(x, u) be convex on X × U. Then:
(i) the equality

∫_Υ [∂lβ/∂x(t, x2, x2_ϑ, u2)(x1 − x2) + ∂lβ/∂x_ϑ(t, x2, x2_ϑ, u2) Dϑ(x1 − x2)] dt^β + ∫_Υ ∂lβ/∂u(t, x2, x2_ϑ, u2)(u1 − u2) dt^β = 0

is fulfilled for any (x1, u1), (x2, u2) ∈ (X × U)*;
(ii) (X × U)* ⊂ (X × U)_*.

Remark 3.1 The continuity of the variational derivative δβL(x, u) implies (X × U)_* ⊂ (X × U)*. By Proposition 3.1, we conclude (X × U)* = (X × U)_*. Also, the solution set (X × U)_* associated to (DCVIP) is a convex set and, consequently, the solution set (X × U)* associated to (CVIP) is a convex set.

Proposition 3.2 Let the path-independent curvilinear integral functional R(x, u) be differentiable on X × U. Then the inequality

⟨(δβR/δx, δβR/δu); (v, μ)⟩ ≥ ⟨(δβL/δy, δβL/δw); (v, μ)⟩

is satisfied for any (x, u), (v, μ) ∈ X × U, (y, w) ∈ Z(x, u).

Proposition 3.3 Let the path-independent curvilinear integral functional R(x, u) be differentiable on (X × U)* and the path-independent curvilinear integral functional L(x, u) be convex on X × U. Also, assume that the implication

⟨(δβR/δx*, δβR/δu*); (v, μ)⟩ ≥ ⟨(δβR/δz, δβR/δν); (v, μ)⟩ ⟹ (δβL/δx*, δβL/δu*) = (δβL/δz, δβL/δν)

is true for any (x*, u*) ∈ (X × U)*, (v, μ) ∈ X × U, (z, ν) ∈ Z(x*, u*). Then Z(x*, u*) = (X × U)*, ∀(x*, u*) ∈ (X × U)*.
4 Main Results

In this section, taking into account the preliminary results established in the previous section, we investigate weak sharp solutions for the considered controlled variational inequality governed by a convex path-independent curvilinear integral functional. Concretely, following Marcotte and Zhu [8] and in accordance with Ferris and Mangasarian [4], the weak sharpness property of (X × U)* associated to (CVIP) is studied. In this regard, two characterization results are established.
In this section, taking into account the preliminary results established in the previous section, we investigate weak sharp solutions for the considered controlled variational inequality governed by convex path-independent curvilinear integral functional. Concretely, following Marcotte and Zhu [8], in accordance with Ferris and Mangasarian [4], the weak sharpness property of (X × U)∗ associated to (CV IP ) is studied. In this regard, two characterization results are established.
170
S. Treant¸a ˘
Definition 4.1 The solution set (X × U)* associated to (CVIP) is called weakly sharp if there exists γ > 0 such that

γB ⊂ (δβL/δx*, δβL/δu*) + [T_{X×U}(x*, u*) ∩ N_{(X×U)*}(x*, u*)]°,  ∀(x*, u*) ∈ (X × U)*

(here int(Q) denotes the interior of the set Q and B the open unit ball in X × U), or, equivalently,

(−δβL/δx*, −δβL/δu*) ∈ int( ⋂_{(x,u) ∈ (X×U)*} [T_{X×U}(x, u) ∩ N_{(X×U)*}(x, u)]° ),

for all (x*, u*) ∈ (X × U)*.
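To build intuition for weak sharpness in one dimension (our illustration; the functions and the modulus γ below are not from the paper): the minimum of f(x) = |x| is weakly sharp, since the solution set is {0} and f(x) − f* ≥ γ·d(x, {0}) holds with γ = 1; for f(x) = x² no such γ > 0 exists, because the ratio f(x)/d(x, {0}) = |x| tends to 0 near the solution.

```python
def dist_to_solution_set(x):
    # distance to the solution set {0} of min f over R
    return abs(x)

sample = [k / 100.0 for k in range(-300, 301) if k != 0]

# f(x) = |x|: weakly sharp minimum with modulus gamma = 1.
assert all(abs(x) >= 1.0 * dist_to_solution_set(x) for x in sample)

# f(x) = x^2: the ratio f(x)/d(x, {0}) = |x| tends to 0, so no gamma > 0 works.
ratios = [x * x / dist_to_solution_set(x) for x in sample]
assert min(ratios) < 0.02   # the ratio gets arbitrarily small near the solution
```

This is the finite-dimensional picture behind Definition 4.1: weak sharpness demands a linear lower growth of the (gap) functional with the distance to the solution set.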
◦ δβ L δβ L , γB ⊂ + TX×U (y, w) ∩ N(X×U)∗ (y, w) , δy δw
∀(y, w) ∈ (X × U)∗ (1)
if and only if
(
δβ L δ β L , ); (z, ν) ≥ γ (z, ν) , δy δw
∀(z, ν) ∈ TX×U (y, w) ∩ N(X×U)∗ (y, w).
(2) The first characterization result of weak sharpness for (X×U)∗ is formulated in the following theorem. Theorem 4.1 Let the path-independent curvilinear integral functional R(x, u) be differentiable on (X × U)∗ and the path-independent curvilinear integral functional L(x, u) be convex on X × U. Also, assume that: (a) the following implication
δβ R δβ R δβ L δβ L δβ L δβ L δβ R δβ R , ); (v, μ) =⇒ , , =
( ∗ , ∗ ); (v, μ) ≥ ( δx δu δz δν δx∗ δu∗ δz δν is true, for any (x∗, u∗ ) ∈ (X × U)∗ , (v, μ) ∈ X × U, (z, ν) ∈ Z(x∗ , u∗ ); δβ L δ β L (b) , is constant on (X × U)∗ . δx∗ δu∗ Then (X × U)∗ is weakly sharp if and only if there exists γ > 0 such that R(x, u) ≥ γd ((x, u), (X × U)∗ ) , where d ((x, u), (X × U)∗ ) =
min
(y,w)∈(X×U)∗
∀(x, u) ∈ X × U,
(x, u) − (y, w) .
Proof. "⟹" Assume (X × U)* is weakly sharp. Then, by Definition 4.1, there exists γ > 0 such that (1) (or (2)) is fulfilled. Further, taking into account the convexity of the solution set (X × U)* associated to (CVIP) (see Remark 3.1), it follows that proj_{(X×U)*}(x, u) = (ŷ, ŵ) ∈ (X × U)*, ∀(x, u) ∈ X × U,
and, following Hiriart-Urruty and Lemaréchal [5], we obtain (x, u) − (ŷ, ŵ) ∈ T_{X×U}(ŷ, ŵ) ∩ N_{(X×U)*}(ŷ, ŵ). By hypothesis and Lemma 4.1, we get

∫_Υ [∂lβ/∂x(t, ŷ, ŷ_ϑ, ŵ)(x − ŷ) + ∂lβ/∂x_ϑ(t, ŷ, ŷ_ϑ, ŵ) Dϑ(x − ŷ)] dt^β + ∫_Υ ∂lβ/∂u(t, ŷ, ŷ_ϑ, ŵ)(u − ŵ) dt^β ≥ γ d((x, u), (X × U)*),  ∀(x, u) ∈ X × U.   (3)

Since

R(x, u) ≥ ∫_Υ [∂lβ/∂x(t, ŷ, ŷ_ϑ, ŵ)(x − ŷ) + ∂lβ/∂x_ϑ(t, ŷ, ŷ_ϑ, ŵ) Dϑ(x − ŷ)] dt^β + ∫_Υ ∂lβ/∂u(t, ŷ, ŷ_ϑ, ŵ)(u − ŵ) dt^β,  ∀(x, u) ∈ X × U,

by (3) we obtain R(x, u) ≥ γ d((x, u), (X × U)*), ∀(x, u) ∈ X × U.

"⟸" Assume there exists γ > 0 such that R(x, u) ≥ γ d((x, u), (X × U)*), ∀(x, u) ∈ X × U. Obviously, for any (y, w) ∈ (X × U)*, in the case T_{X×U}(y, w) ∩ N_{(X×U)*}(y, w) = {(0, 0)} we have [T_{X×U}(y, w) ∩ N_{(X×U)*}(y, w)]° = X × U and, consequently, γB ⊂ (δβL/δy, δβL/δw) + [T_{X×U}(y, w) ∩ N_{(X×U)*}(y, w)]°, ∀(y, w) ∈ (X × U)*, holds trivially. In the following, let (0, 0) ≠ (x, u) ∈ T_{X×U}(y, w) ∩ N_{(X×U)*}(y, w); then there exists a sequence (xk, uk) converging to (x, u), with (y, w) + tk(xk, uk) ∈ X × U (for some sequence of positive numbers {tk} decreasing to zero), such that

d((y, w) + tk(xk, uk), (X × U)*) ≥ d((y, w) + tk(xk, uk), H_{x,u}) = tk ⟨(x, u); (xk, uk)⟩ / ‖(x, u)‖,   (4)

where H_{x,u} = {(x̄, ū) ∈ X × U : ⟨(x, u); (x̄, ū) − (y, w)⟩ = 0} is a hyperplane passing through (y, w) and orthogonal to (x, u). By hypothesis and (4), it results that R((y, w) + tk(xk, uk)) ≥ γ tk ⟨(x, u); (xk, uk)⟩ / ‖(x, u)‖, or, equivalently (since R(y, w) = 0, ∀(y, w) ∈ (X × U)*),

[R((y, w) + tk(xk, uk)) − R(y, w)] / tk ≥ γ ⟨(x, u); (xk, uk)⟩ / ‖(x, u)‖.   (5)

Further, by taking the limit for k → ∞ in (5) and using a classical result of functional analysis, we obtain

lim_{λ→0} [R((y, w) + λ(x, u)) − R(y, w)] / λ ≥ γ‖(x, u)‖,   (6)
where λ > 0. By Definition 2.2, the inequality (6) becomes

⟨(δβR/δy, δβR/δw); (x, u)⟩ ≥ γ‖(x, u)‖.   (7)

Now, taking into account the hypothesis and (7), for any (b, υ) ∈ B it follows that

⟨γ(b, υ) − (δβL/δy, δβL/δw); (x, u)⟩ = γ⟨(b, υ); (x, u)⟩ − ⟨(δβR/δy, δβR/δw); (x, u)⟩ ≤ γ‖(x, u)‖ − γ‖(x, u)‖ = 0,

and the proof is complete.

The second characterization result of weak sharpness for (X × U)* is based on the notion of minimum principle sufficiency property, introduced by Ferris and Mangasarian [4].

Definition 4.2 The controlled variational inequality (CVIP) satisfies the minimum principle sufficiency property if A(x*, u*) = (X × U)*, ∀(x*, u*) ∈ (X × U)*.

Lemma 4.2 The inclusion argmax_{(y,w) ∈ X×U} ⟨(x, u); (y, w)⟩ ⊂ (X × U)* is fulfilled for any (x, u) ∈ int( ⋂_{(x,u) ∈ (X×U)*} [T_{X×U}(x, u) ∩ N_{(X×U)*}(x, u)]° ) ≠ ∅.
Theorem 4.2 Let the solution set (X × U)* associated to (CVIP) be weakly sharp and the path-independent curvilinear integral functional L(x, u) be convex on X × U. Then (CVIP) satisfies the minimum principle sufficiency property.

Theorem 4.3 Consider that the functional R(x, u) is differentiable on (X × U)* and the path-independent curvilinear integral functional L(x, u) is convex on X × U. Also, for any (x*, u*) ∈ (X × U)*, (v, μ) ∈ X × U, (z, ν) ∈ Z(x*, u*), assume that the implication

⟨(δβR/δx*, δβR/δu*); (v, μ)⟩ ≥ ⟨(δβR/δz, δβR/δν); (v, μ)⟩ ⟹ (δβL/δx*, δβL/δu*) = (δβL/δz, δβL/δν)

is fulfilled and (δβL/δx*, δβL/δu*) is constant on (X × U)*. Then (CVIP) satisfies the minimum principle sufficiency property if and only if (X × U)* is weakly sharp.

Proof. "⟹" Let (CVIP) satisfy the minimum principle sufficiency property. In consequence, A(x*, u*) = (X × U)*, for any (x*, u*) ∈ (X × U)*. Obviously, for (x*, u*) ∈ (X × U)* and (x, u) ∈ X × U, we obtain

R(x, u) ≥ ∫_Υ [∂lβ/∂x(t, x*, x*_ϑ, u*)(x − x*) + ∂lβ/∂x_ϑ(t, x*, x*_ϑ, u*) Dϑ(x − x*)] dt^β + ∫_Υ ∂lβ/∂u(t, x*, x*_ϑ, u*)(u − u*) dt^β.   (8)

Further, for P(x, u) = ⟨(δβL/δx*, δβL/δu*); (x, u)⟩, (x, u) ∈ X × U, we get that A(x*, u*) is the solution set of min_{(x,u) ∈ X×U} P(x, u). For other related investigations, the readers are directed to Mangasarian and Meyer [7]. We can write P(x, u) − P(x̃, ũ) ≥ γ d((x, u), A(x*, u*)), ∀(x, u) ∈ X × U, (x̃, ũ) ∈ A(x*, u*), or ⟨(δβL/δx*, δβL/δu*); (x, u) − (x*, u*)⟩ ≥ γ d((x, u), (X × U)*), ∀(x, u) ∈ X × U, or, equivalently,

∫_Υ [∂lβ/∂x(t, x*, x*_ϑ, u*)(x − x*) + ∂lβ/∂x_ϑ(t, x*, x*_ϑ, u*) Dϑ(x − x*)] dt^β + ∫_Υ ∂lβ/∂u(t, x*, x*_ϑ, u*)(u − u*) dt^β ≥ γ d((x, u), (X × U)*),  ∀(x, u) ∈ X × U.   (9)

By (8), (9) and Theorem 4.1, we get that (X × U)* is weakly sharp.
"⟸" This is a consequence of Theorem 4.2.
References
1. Alshahrani, M., Al-Homidan, S., Ansari, Q.H.: Minimum and maximum principle sufficiency properties for nonsmooth variational inequalities. Optim. Lett. 10, 805–819 (2016)
2. Burke, J.V., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control Optim. 31, 1340–1359 (1993)
3. Clarke, F.H.: Functional Analysis, Calculus of Variations and Optimal Control. Springer, London (2013)
4. Ferris, M.C., Mangasarian, O.L.: Minimum principle sufficiency. Math. Program. 57, 1–14 (1992)
5. Hiriart-Urruty, J.-B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, Berlin (2001)
6. Liu, Y., Wu, Z.: Characterization of weakly sharp solutions of a variational inequality by its primal gap function. Optim. Lett. 10, 563–576 (2016)
7. Mangasarian, O.L., Meyer, R.R.: Nonlinear perturbation of linear programs. SIAM J. Control Optim. 17, 745–752 (1979)
8. Marcotte, P., Zhu, D.: Weak sharp solutions of variational inequalities. SIAM J. Optim. 9, 179–189 (1998)
9. Oveisiha, M., Zafarani, J.: Generalized Minty vector variational-like inequalities and vector optimization problems in Asplund spaces. Optim. Lett. 7, 709–721 (2013)
10. Patriksson, M.: A unified framework of descent algorithms for nonlinear programs and variational inequalities. Ph.D. thesis, Linköping Institute of Technology (1993)
11. Polyak, B.T.: Introduction to Optimization. Optimization Software, Publications Division, New York (1987)
12. Treanță, S.: Multiobjective fractional variational problem on higher-order jet bundles. Commun. Math. Stat. 4, 323–340 (2016)
13. Treanță, S.: Higher-order Hamilton dynamics and Hamilton-Jacobi divergence PDE. Comput. Math. Appl. 75, 547–560 (2018)
14. Treanță, S., Arana-Jiménez, M.: On generalized KT-pseudoinvex control problems involving multiple integral functionals. Eur. J. Control 43, 39–45 (2018)
15. Wu, Z., Wu, S.Y.: Weak sharp solutions of variational inequalities in Hilbert spaces. SIAM J. Optim. 14, 1011–1027 (2004)
16. Zhu, S.K.: Weak sharp efficiency in multiobjective optimization. Optim. Lett. 10, 1287–1301 (2016)
On Lagrange Duality for Several Classes of Nonconvex Optimization Problems

Ewa M. Bednarczuk1,2 and Monika Syga2(B)

1 Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01447 Warsaw, Poland
[email protected]
2 Warsaw University of Technology, Faculty of Mathematics and Information Science, ul. Koszykowa 75, 00662 Warsaw, Poland
[email protected]
Abstract. We investigate a general framework for studying Lagrange duality in some classes of nonconvex optimization problems. To this aim we use an abstract convexity theory, namely Φ-convexity theory, which provides tools for investigating nonconvex problems in the spirit of convex analysis (via suitably defined subdifferentials and conjugates). We prove a strong Lagrangian duality theorem for optimization of Φlsc-convex functions which is based on a minimax theorem for general Φ-convex functions. The class of Φlsc-convex functions contains, among others, prox-regular functions, DC functions, weakly convex functions and para-convex functions. An important ingredient of the study is the regularity condition under which our strong Lagrangian duality theorem holds. This condition appears to be weaker than a number of already known regularity conditions, even for convex problems.

Keywords: Abstract convexity · Φ-convexity · Minimax theorem · Lagrangian duality · Nonconvex optimization · Weakest constraint qualification condition
1 Introduction
Duality theory is an important tool in global optimization. The analysis of pairs of dual problems provides significant additional knowledge about a given optimization problem and makes it possible to construct algorithms for finding its global solutions. There exist numerous attempts to construct pairs of dual problems in nonconvex optimization, e.g. for DC functions [10], for composite functions [5], and for DC and composite functions [16]. The theory of Φ-convexity provides a general framework for dealing with some classes of nonconvex optimization problems. This theory has been developed in [7,11,14] and many other works. Φ-convex functions are defined as pointwise suprema of functions from a given class. Such an approach to abstract convexity generalizes

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 175–181, 2020. https://doi.org/10.1007/978-3-030-21803-4_18
the classical fact that each proper lower semicontinuous convex function is the upper envelope of a certain set of affine functions. In the present paper we use Φ-convexity to investigate duality for a wide class of nonconvex optimization problems.

The aim of this paper is to investigate Lagrangian duality for optimization problems involving Φlsc-convex functions. The class of Φlsc-convex functions consists of lower semicontinuous functions defined on Hilbert spaces and minorized by quadratic functions. This class embodies many important classes of functions appearing in optimization, e.g. prox-regular functions [3], also known as prox-bounded functions [12], DC (difference of convex) functions [20], weakly convex functions [21], para-convex functions [13], and lower semicontinuous convex (in the classical sense) functions. Our main Lagrangian duality result (Theorem 4) is based on a general minimax theorem for Φ-convex functions proved in [18]. An important ingredient of Theorem 4 is condition (3), which can be viewed as a regularity condition guaranteeing strong duality. Condition (3) appears to be weaker than many already known regularity conditions [4].

The organization of the paper is as follows. In Sect. 2 we recall basic facts on Φ-convexity. In Sect. 3 we define the subgradient for a particular class of Φ-convex functions, namely the class of Φlsc-convex functions. In Sect. 4 we formulate a Lagrangian duality theorem in the class of Φlsc-convex functions (Theorem 4). Condition (3) of Theorem 4 is a regularity condition ensuring that Lagrangian duality for optimization problems with Φlsc-convex functions holds. Let us note that Φlsc-convex functions as defined above may admit the value +∞, which allows us to consider also indicator functions in our framework.
2 Φ-Convexity
We start with definitions related to abstract convexity. Let X be a set, and let Φ be a set of real-valued functions ϕ : X → R. Let R̂ := R ∪ {−∞} ∪ {+∞}. For any f, g : X → R̂,

f ≤ g ⇔ f(x) ≤ g(x) ∀x ∈ X.

Let f : X → R̂. The set

supp(f, Φ) := {ϕ ∈ Φ : ϕ ≤ f}

is called the support of f with respect to Φ. We will use the notation supp(f) if the class Φ is clear from the context.

Definition 1. ([7,11,14]) A function f : X → R̂ is called Φ-convex if

f(x) = sup{ϕ(x) : ϕ ∈ supp(f)} ∀x ∈ X.
By convention, supp(f) = ∅ if and only if f ≡ −∞. In this paper we always assume that supp(f) ≠ ∅, i.e. we limit our attention to functions f : X → R̄ := R ∪ {+∞}.

We say that a function f : X → R̄ is proper if the effective domain of f is nonempty, i.e. dom(f) := {x ∈ X : f(x) < +∞} ≠ ∅.

In this paper we consider Φ-convexity with respect to the following class Φlsc:

Φlsc := {ϕ : X → R : ϕ(x) = −a‖x‖² + ⟨ℓ, x⟩ + c, x ∈ X, ℓ ∈ X*, a ≥ 0, c ∈ R},

where X is a Hilbert space. In the following theorem we recall a characterization of Φlsc-convex functions.

Theorem 1. ([14], Example 6.2) Let X be a Hilbert space and let f : X → R̄ be lower semicontinuous on X. If supp(f) ≠ ∅, then f is Φlsc-convex.

The class of Φlsc-convex functions is very broad and contains many well-known classes of nonconvex functions appearing in optimization, e.g. prox-regular (also called prox-bounded) functions [3,12], DC functions [15,20], weakly convex functions [21], para-convex functions [6,13]. Let us note that the set of all Φlsc-convex functions defined on a Hilbert space X contains all proper lower semicontinuous and convex (in the classical sense) functions defined on X.
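As a quick numerical illustration of Theorem 1 (a sketch of ours, not code from the paper), the nonconvex DC function f(x) = x⁴ − x² on R is Φlsc-convex: with a = 1 the shifted function f(x) + x² = x⁴ is convex, so the Φlsc minorants obtained from its affine supports recover f as their pointwise supremum.

```python
# Numerical illustration of Theorem 1 (our sketch): the nonconvex DC function
# f(x) = x^4 - x^2 equals the supremum of its minorants from Phi_lsc of the
# form phi(x) = -a*x^2 + l*x + c. With a = 1 the shift f(x) + x^2 = x^4 is
# convex, so minorants tangent to it at points t recover f exactly.

def f(x):
    return x ** 4 - x * x

def phi(t, x):
    # Phi_lsc minorant built from the affine support of x^4 at t:
    # phi(x) = -x^2 + t^4 + 4 t^3 (x - t) <= f(x), with equality at x = t.
    return -x * x + t ** 4 + 4 * t ** 3 * (x - t)

grid = [i / 50 for i in range(-100, 101)]  # points in [-2, 2]

# every phi(t, .) minorizes f ...
violation = max(phi(t, x) - f(x) for t in grid for x in grid)
# ... and their pointwise supremum reconstructs f
recon_err = max(abs(max(phi(t, x) for t in grid) - f(x)) for x in grid)
print(violation, recon_err)
```

The inequality phi(t, x) ≤ f(x) here is exact: phi(t, x) − f(x) = −(x⁴ − 4t³x + 3t⁴) ≤ 0 by the AM–GM inequality, with equality at x = t.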
3 Subgradients
Now we introduce the definition of a subgradient of a Φlsc-convex function. Such subgradients were considered in [11,14], but in a slightly different form.

Definition 2. An element (a, v) ∈ R+ × X is called a Φlsc-subgradient of a function f : X → R̄ at x̄ if the following inequality holds:

f(x) − f(x̄) ≥ ⟨v, x − x̄⟩ − a‖x‖² + a‖x̄‖², ∀x ∈ X. (1)

The set of all Φlsc-subgradients of f at x̄ is denoted by ∂lsc f(x̄).

It can be shown that many subgradients which appear in the literature are Φlsc-subgradients. Examples of such subgradients are: proximal subgradients [19], subgradients for DC functions [1], for weakly convex functions [21], for para-convex functions [13], and the classical subgradient for lower semicontinuous convex functions.
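Inequality (1) can be verified directly on a simple nonconvex example (our sketch; the candidate subgradients below follow from an elementary computation for this particular f, they are not taken from the paper).

```python
# Checking Definition 2 numerically for f(x) = -x^2 on X = R (our sketch).
# Although f is concave, it is Phi_lsc-convex (minorized by -a*x^2 for a >= 1),
# and (a, 2(a-1)*xbar) satisfies (1) at every xbar for a >= 1: the defect in
# inequality (1) is exactly (a - 1)*(x - xbar)^2 >= 0.

def f(x):
    return -x * x

def satisfies_ineq_1(a, v, xbar, xs, tol=1e-9):
    # inequality (1): f(x) - f(xbar) >= v*(x - xbar) - a*x^2 + a*xbar^2
    return all(
        f(x) - f(xbar) >= v * (x - xbar) - a * x * x + a * xbar * xbar - tol
        for x in xs
    )

xs = [i / 20 for i in range(-60, 61)]  # test points in [-3, 3]

phi_lsc_ok = all(
    satisfies_ineq_1(a, 2.0 * (a - 1.0) * xbar, xbar, xs)
    for xbar in (-1.0, 0.5, 2.0)
    for a in (1.0, 2.0, 5.0)
)
# with a = 0, (1) reduces to the classical subgradient inequality,
# which fails for this nonconvex f (take xbar = 2, v = f'(2) = -4)
classical_ok = satisfies_ineq_1(0.0, -4.0, 2.0, xs)
print(phi_lsc_ok, classical_ok)  # True False
```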
3.1 Minimax Theorem for Φ-Convex Functions
Our Lagrangian duality theorem for Φlsc-convex functions is based on a general minimax theorem for bifunctions which are Φ-convex with respect to one of the variables. In this section we recall the necessary definitions and the formulation of the general minimax theorem for Φ-convex functions as it appeared in [18].

For a given function a : X × Y → R̄ such that, for every y ∈ Y, the functions a(·, y) are Φ-convex, a sufficient and necessary condition for the minimax equality to hold is the so-called intersection property. The intersection property was investigated in [2,17,18].

Let ϕ1, ϕ2 : X → R be functions from the set Φlsc and let α ∈ R. We say that the intersection property holds for ϕ1 and ϕ2 on X at the level α if and only if

[ϕ1 < α] ∩ [ϕ2 < α] = ∅,

where [ϕ < α] := {x ∈ X : ϕ(x) < α} is the strict lower level set of a function ϕ : X → R.

The general minimax theorem for Φ-convex functions is proved in Theorem 3.3.3 of [18]. In the case Φ = Φlsc, Theorem 3.3.3 of [18] can be rewritten in the following way.

Theorem 2. Let X, Y be Hilbert spaces and let a : X × Y → R̄. Assume that for any y ∈ Y the function a(·, y) : X → R̄ is Φlsc-convex on X and for any x ∈ X the function a(x, ·) : Y → R̄ is concave on Y. The following conditions are equivalent:

(i) for every α ∈ R, α < inf_{x∈X} sup_{y∈Y} a(x, y), there exist y1, y2 ∈ Y and ϕ1 ∈ supp a(·, y1), ϕ2 ∈ supp a(·, y2) such that the intersection property holds for ϕ1 and ϕ2 on X at the level α;
(ii) sup_{y∈Y} inf_{x∈X} a(x, y) = inf_{x∈X} sup_{y∈Y} a(x, y).
To make condition (i) of Theorem 2 operational, we proved a slightly weaker version of Theorem 2. Let β = inf_{x∈X} sup_{y∈Y} a(x, y) < +∞.
Theorem 3. Let X, Y be Hilbert spaces. Let a : X × Y → R̄ be a function such that for any y ∈ Y the function a(·, y) : X → R̄ is Φlsc-convex on X and for any x ∈ X the function a(x, ·) : Y → R̄ is concave on Y. If there exist y1, y2 ∈ Y and x̄ ∈ [a(·, y1) ≥ β] ∩ [a(·, y2) ≥ β] such that

0 ∈ co(∂a(·, y1)(x̄) ∪ ∂a(·, y2)(x̄)),

then

sup_{y∈Y} inf_{x∈X} a(x, y) = inf_{x∈X} sup_{y∈Y} a(x, y)

and there exists ȳ ∈ Y such that inf_{x∈X} a(x, ȳ) ≥ β.

Proof. For the proof see [19].
Remark 1. Let us note that if inf_{x∈X} sup_{y∈Y} a(x, y) = −∞, then the equality sup_{y∈Y} inf_{x∈X} a(x, y) = inf_{x∈X} sup_{y∈Y} a(x, y) holds. If inf_{x∈X} sup_{y∈Y} a(x, y) = +∞, then for the minimax equality to hold we need to assume that the assumption of Theorem 3 is true for every β < +∞.
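On finite grids the minimax equality of Theorems 2 and 3 can be observed numerically. The following toy sketch (our construction, not the paper's machinery) also shows a duality gap appearing once convexity/concavity is dropped, which is why conditions such as the intersection property are needed.

```python
# Grid illustration of the minimax equality (finite grids stand in for the
# Hilbert-space setting; helper names are ours).

def inf_sup(a, xs, ys):
    return min(max(a(x, y) for y in ys) for x in xs)

def sup_inf(a, xs, ys):
    return max(min(a(x, y) for x in xs) for y in ys)

h = 0.05
xs = [i * h for i in range(-40, 41)]
ys = xs[:]

# a(x, y) = x^2 + 2xy - y^2: convex (hence Phi_lsc-convex) in x and concave
# in y, so the minimax equality holds; the common value is a(0, 0) = 0.
a = lambda x, y: x * x + 2 * x * y - y * y
lo, hi = sup_inf(a, xs, ys), inf_sup(a, xs, ys)
print(lo, hi)  # both ~0; weak duality lo <= hi always holds

# Without convexity/concavity the equality may fail: a 2x2 "matching pennies"
# table has sup inf = 0 < 1 = inf sup.
table = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
b = lambda x, y: table[(x, y)]
gap_lo, gap_hi = sup_inf(b, [0, 1], [0, 1]), inf_sup(b, [0, 1], [0, 1])
print(gap_lo, gap_hi)  # 0.0 1.0
```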
4 Lagrange Duality for Φ-Convex Functions
Lagrangian duality for functions satisfying some generalized convexity conditions has recently been investigated by many authors, e.g. for prox-regular functions see [9], for DC functions see [8]. The following construction of the Lagrangian for Φlsc-convex functions is based on the general Lagrangian for Φ-convex functions introduced in [7] and investigated in [11,14].

Let X, Y be Hilbert spaces. Let us consider the optimization problem

Min_{x∈X} f(x),   (P)

where f : X → R ∪ {+∞} is a Φlsc-convex function. Let p : X × Y → R̄ be a function satisfying p(x, y0) = f(x), where y0 ∈ Y. Consider the family of problems

Min_{x∈X} p(x, y).   (P_y)

The Lagrangian L : X × R+ × Y* → R̄ is defined as

L(x, a, v) = −a‖y0‖² + ⟨v, y0⟩ − p_x*(a, v),   (2)

where p_x*(a, v) = sup_{y∈Y} {−a‖y‖² + ⟨v, y⟩ − p(x, y)} is the Φ_lsc^Y-conjugate of the function p(x, ·). The problem

inf_{x∈X} sup_{(a,v)∈R+×Y*} L(x, a, v)

is equivalent to (P) if and only if the function p(x, ·) is Φ_lsc^Y-convex on Y for all x ∈ X. Indeed,

sup_{(a,v)∈R+×Y*} L(x, a, v) = sup_{(a,v)∈R+×Y*} {−a‖y0‖² + ⟨v, y0⟩ − p_x*(a, v)} = p_x**(y0) = p(x, y0).

The dual problem to (P) is defined as follows:

sup_{(a,v)∈R+×Y*} inf_{x∈X} L(x, a, v).   (D)

Let β := inf_{x∈X} sup_{(a,v)∈R+×Y*} L(x, a, v). The following theorem is based on the general minimax theorem.
Theorem 4. Let X, Y be Hilbert spaces. Let L : X × R+ × Y* → R̄ be the Lagrangian defined by (2). Assume that for any (a, v) ∈ R+ × Y* the function L(·, a, v) : X → R̄ is Φlsc-convex on X. If there exist (a1, v1), (a2, v2) ∈ R+ × Y* and x̄ ∈ [L(·, a1, v1) ≥ β] ∩ [L(·, a2, v2) ≥ β] such that

0 ∈ co(∂L(·, a1, v1)(x̄) ∪ ∂L(·, a2, v2)(x̄)),   (3)

then

sup_{(a,v)∈R+×Y*} inf_{x∈X} L(x, a, v) = inf_{x∈X} sup_{(a,v)∈R+×Y*} L(x, a, v)

and there exists (ā, v̄) ∈ R+ × Y* such that inf_{x∈X} L(x, ā, v̄) ≥ β.
Proof. Follows immediately from Theorem 3.

Example 1. Consider the problem (P) with X = R and

f(x) = { 1, if x > 0;  −1, if x = 0;  +∞, if x < 0 }.

Then it is easy to see that β = −1 and

L(x, a, b) = { ax² − bx, if x > 0;  −1, if x = 0;  +∞, if x < 0 }.

… ⇒ ⟨F(x), x − y⟩ > 0.

Definition 2.7. [3] A map F from U into R^n is quasimonotone on an open convex subset U of R^n if for every pair of distinct points x, y ∈ U we have

⟨F(y), x − y⟩ > 0 ⇒ ⟨F(x), x − y⟩ ≥ 0.

Remark 2.5. Quasimonotonicity of the semidifferential map of f is implied by

(df)+(y, x − y) > 0 =⇒ (df)+(x, x − y) ≥ 0, ∀x, y ∈ U.

3 Main Results
Theorem 3.1. Let U be a non-empty open and convex subset of R^n. Then a semidifferentiable function f : U → R is convex if and only if (df)+(·, x − y) is monotone on U.

Proof. Suppose the semidifferentiable function f is convex on U; then

f(x) − f(y) ≥ (df)+(y, x − y), ∀x, y ∈ U. (1)

Interchanging the roles of x and y, we get

f(y) − f(x) ≥ (df)+(x, y − x), ∀x, y ∈ U. (2)

Adding inequalities (1) and (2), we get

0 ≥ (df)+(y, x − y) + (df)+(x, y − x), ∀x, y ∈ U,

i.e.

0 ≥ (df)+(y, x − y) − (df)+(x, x − y), ∀x, y ∈ U,

i.e.

(df)+(x, x − y) − (df)+(y, x − y) ≥ 0, ∀x, y ∈ U.

Therefore, (df)+(·, x − y) is monotone on U.

Conversely, suppose (df)+(·, x − y) is monotone on U, i.e.

(df)+(x, x − y) − (df)+(y, x − y) ≥ 0, ∀x, y ∈ U. (3)

By the Mean-Value Theorem, ∃ z = λx + (1 − λ)y for some λ ∈ (0, 1) such that

f(x) − f(y) = (df)+(z, x − y) = (1/λ)(df)+(z, z − y). (4)
186
S. K. Mishra et al.
Since U is convex and z, y ∈ U, by (3) we have

(1/λ)(df)+(z, z − y) ≥ (1/λ)(df)+(y, z − y). (5)

From (4) and (5), we have

f(x) − f(y) ≥ (df)+(y, x − y), ∀x, y ∈ U.

Therefore, f is convex on U.

Example 3.1. Let f : R → R be defined by

f(x) = max{e^x, e^{−x}}, x ∈ R.

Here f is not differentiable but is semidifferentiable and convex; the semidifferential of f at y ∈ R in the direction x − y is

(df)+(y, x − y) = e^{−y}(y − x) for y < 0,  (df)+(y, x − y) = e^{y}(x − y) for y ≥ 0.

It is easy to see that (df)+(·, x − y) is a monotone map (Fig. 1).

Fig. 1. Semidifferentiable function f
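The monotonicity claim of Example 3.1 can be sanity-checked numerically (our sketch; the value |d| at the kink y = 0 is our addition, since the example's two branches cover y ≠ 0).

```python
# Numerical check of Theorem 3.1 on Example 3.1 (our sketch). For
# f(x) = max(e^x, e^{-x}) the semidifferential at y in direction d is
# e^y * d for y > 0, -e^{-y} * d for y < 0, and |d| at the kink y = 0.
import math
import random

def df_plus(y, d):
    if y > 0:
        return math.exp(y) * d
    if y < 0:
        return -math.exp(-y) * d
    return abs(d)  # kink case (our addition)

rng = random.Random(7)
pairs = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(2000)]

# monotonicity: (df)+(x, x - y) - (df)+(y, x - y) >= 0 for all pairs
worst = min(df_plus(x, x - y) - df_plus(y, x - y) for x, y in pairs)
print(worst)
```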
Theorem 3.2. Let U be a non-empty open and convex subset of R^n. Then a semidifferentiable function f : U → R is pseudoconvex if and only if (df)+(·, x − y) is pseudomonotone on U.

Proof. Suppose that the semidifferentiable function f is pseudoconvex on U; then

(df)+(y, x − y) ≥ 0 =⇒ f(x) ≥ f(y), ∀x ≠ y ∈ U. (6)
On Monotone Maps: Semidifferentiable Case
187
We have to show that

(df)+(x, x − y) ≥ 0. (7)

Suppose that (7) does not hold, i.e. (df)+(x, x − y) < 0, i.e.

(df)+(x, y − x) > 0 =⇒ f(y) ≥ f(x),

which contradicts (6). Hence,

(df)+(y, x − y) ≥ 0 =⇒ (df)+(x, x − y) ≥ 0.

Therefore, (df)+(·, x − y) is pseudomonotone on U.

Conversely, suppose that (df)+(·, x − y) is pseudomonotone on U, i.e.

(df)+(y, x − y) ≥ 0 =⇒ (df)+(x, x − y) ≥ 0, ∀x ≠ y ∈ U. (8)

We have to show that f(x) ≥ f(y). Suppose on the contrary that f(x) < f(y). By the Mean-Value Theorem ∃ z = λx + (1 − λ)y for some λ ∈ (0, 1) such that

0 > f(x) − f(y) = (df)+(z, x − y), (9)

i.e.

(df)+(z, x − y) < 0,  =⇒ (df)+(z, (z − y)/λ) < 0,  i.e. (df)+(z, z − y) < 0, ∀y ≠ z ∈ U.

By Proposition 2.2, we have

(df)+(y, z − y) < 0,

i.e.

(df)+(y, λ(x − y)) < 0, ∀x ≠ y ∈ U, for some λ ∈ (0, 1),

i.e.

(df)+(y, x − y) < 0 (∵ λ > 0),

which contradicts the assumption of pseudomonotonicity of (df)+(·, x − y). Thus, f is pseudoconvex on U.

Example 3.2. Let f : U = [−π, π] → R be defined by f(x) = x + sin x. Here f is semidifferentiable and pseudoconvex on U = [−π, π]. The semidifferential of f is given by

(df)+(y, x − y) = (x − y)(1 + cos y),

which is pseudomonotone on U.

Theorem 3.3. Let U be a non-empty open and convex subset of R^n. Then a semidifferentiable function f : U → R is quasiconvex if and only if (df)+(·, x − y) is quasimonotone on U.

Proof. Suppose that the semidifferentiable function f is quasiconvex on U, i.e.

f(x) ≤ f(y) =⇒ (df)+(y, x − y) ≤ 0, ∀x ≠ y ∈ U. (10)
S. K. Mishra et al.
Let for any x, y ∈ U, x = y be such that (df )+ (y, x − y) > 0.
(11)
We have to show that (df )+ (x, x − y) ≥ 0. From the inequality (10), we get (df )+ (y, x − y) > 0 =⇒ f (x) > f (y). ∵ f is quasiconvex on U, f (y) < f (x) =⇒ (df )+ (x, y − x) ≤ 0. i.e., (df )+ (x, x − y) ≥ 0. + Hence, (df ) (., x − y) is quasimonotone on U. Conversely, suppose that (df )+ (., x − y) is quasimonotone on U. Assume that f is not quasiconvex, then ∃ x, y ∈ U such that ¯ ∈ (0, 1) such that for x ¯ + (1 − λ)y ¯ and f (x) ≤ f (y) and λ ¯ = λx ¯ + (1 − λ)y) ¯ f (λx > f (y), f (x) ≤ f (y) < f (¯ x).
(12)
ˆ + (1 − λ)y ˆ and x∗ = λ∗ x + (1 − λ∗ )y such By the Mean-Value Theorem ∃ x ˆ = λx that x, x ¯ − x), (13) f (¯ x) − f (x) = (df )+ (ˆ and
f (¯ x) − f (y) = (df )+ (x∗ , x ¯ − y).
(14)
¯ 0. (15) (df )+ (ˆ From statements (12) and (14), we get ¯ − y) > 0. (df )+ (x∗ , x
(16)
From (15) and (16), we have ¯ x(1 − λ)(y − x)) > 0, (df )+ (ˆ
¯ (∵ x ¯ − x = (1 − λ)(y − x))
(17)
and
¯ − y)) > 0. ¯ − y)) (∵ x ¯ − y = λ(x (df )+ (x∗ , λ(x ¯ > 0 and λ ¯ > 0, hence from (17) and (18), we have ∵ 1−λ
and
(18)
x, y − x) > 0, (df )+ (ˆ
(19)
(df )+ (x∗ , x − y) > 0.
(20)
Inequalities (19) and (20) can be rewritten as

(df)+(x̂, (x* − x̂)/(λ̂ − λ*)) > 0,

i.e.

(df)+(x̂, x* − x̂) > 0, (21)

and

(df)+(x*, (x̂ − x*)/(λ̂ − λ*)) > 0,

i.e.

(df)+(x*, x* − x̂) < 0. (22)

From (21) and (22), we have

(df)+(x̂, x* − x̂) > 0 =⇒ (df)+(x*, x* − x̂) < 0,

which is a contradiction to the quasimonotonicity of (df)+(·, x − y) on U. Therefore, f is quasiconvex on U.
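A numerical sanity check of Theorem 3.2 on Example 3.2 (our sketch): on the open interval (−π, π) the semidifferential map (df)+(y, x − y) = (x − y)(1 + cos y) should satisfy the pseudomonotonicity implication for every sampled pair.

```python
# Numerical check of pseudomonotonicity for Example 3.2 (our sketch):
# on the open interval (-pi, pi), (df)+(y, x - y) >= 0 must imply
# (df)+(x, x - y) >= 0.
import math
import random

def df_plus(y, d):
    return d * (1.0 + math.cos(y))

rng = random.Random(42)
violations = 0
for _ in range(5000):
    x = rng.uniform(-math.pi + 1e-6, math.pi - 1e-6)
    y = rng.uniform(-math.pi + 1e-6, math.pi - 1e-6)
    if x != y and df_plus(y, x - y) >= 0 and df_plus(x, x - y) < -1e-12:
        violations += 1
print(violations)  # 0
```

The check passes because 1 + cos t > 0 strictly inside (−π, π), so (x − y)(1 + cos y) ≥ 0 forces x ≥ y, and then (x − y)(1 + cos x) ≥ 0 as well.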
Remark 3.1. Theorem 3.2 and Theorem 3.3 extend Theorem 3.1 of Karamardian [1] and Proposition 5.2 of Karamardian and Schaible [3], respectively, to the semidifferentiable case.

Example 3.3. Let f : R → R be defined by

f(x) = { 2x + 1, for x ≤ −1;  x, for −1 ≤ x ≤ 1;  2x − 1, for x ≥ 1 }.

Here f is semidifferentiable and quasiconvex on R. The semidifferential of f is given by

(df)+(y, x − y) = x − y for −1 ≤ y ≤ 1,  (df)+(y, x − y) = 2(x − y) otherwise,

which is quasimonotone on R.

Acknowledgements. The first author is financially supported by the Department of Science and Technology, SERB, New Delhi, India, through grant no. MTR/2018/000121. The second author is financially supported by CSIR-UGC JRF, New Delhi, India, through Reference no. 1272/(CSIR-UGC NET DEC.2016). The third author is financially supported by a UGC-BHU Research Fellowship, through sanction letter no. Ref.No./Math/Res/Sept.2015/2015-16/918.
References

1. Karamardian, S.: Complementarity problems over cones with monotone and pseudomonotone maps. J. Optim. Theory Appl. 18, 445–454 (1976)
2. Rockafellar, R.T.: Characterization of the subdifferentials of convex functions. Pacific J. Math. 17, 497–510 (1966)
3. Karamardian, S., Schaible, S.: Seven kinds of monotone maps. J. Optim. Theory Appl. 66(1), 37–46 (1990)
4. Minty, G.J.: On the monotonicity of the gradient of a convex function. Pacific J. Math. 14, 243–247 (1964)
5. Ye, M., He, Y.: A double projection method for solving variational inequalities without monotonicity. Comput. Optim. Appl. 60(1), 141–150 (2015)
6. Kaul, R.N., Kaur, S.: Generalizations of convex and related functions. European J. Oper. Res. 9(4), 369–377 (1982)
7. Delfour, M.C.: Introduction to Optimization and Semidifferential Calculus. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2012)
8. Penot, J.-P., Quang, P.H.: Generalized convexity of functions and generalized monotonicity of set-valued maps. J. Optim. Theory Appl. 92(2), 343–356 (1997)
9. Komlósi, S.: Generalized monotonicity and generalized convexity. J. Optim. Theory Appl. 84(2), 361–376 (1995)
10. Mangasarian, O.L.: Nonlinear Programming. McGraw-Hill Book Co., New York-London-Sydney (1969)
11. Castellani, M., Pappalardo, M.: On the mean value theorem for semidifferentiable functions. J. Global Optim. 46(4), 503–508 (2010)
12. Durdil, J.: On Hadamard differentiability. Comment. Math. Univ. Carolinae 14, 457–470 (1973)
13. Penot, J.-P.: Calcul sous-différentiel et optimisation. J. Funct. Anal. 27(2), 248–276 (1978)
14. Delfour, M.C., Zolésio, J.-P.: Shapes and Geometries. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2001)
15. Giannessi, F., Maugeri, A. (eds.): Variational Inequalities and Network Equilibrium Problems. Plenum Press, New York (1995)
Parallel Multi-memetic Global Optimization Algorithm for Optimal Control of Polyarylenephthalide's Thermally-Stimulated Luminescence

Maxim Sakharov(✉) and Anatoly Karpenko

Bauman MSTU, Moscow, Russia
[email protected]
Abstract. This paper presents a modification of the parallel multi-memetic global optimization algorithm based on the Mind Evolutionary Computation algorithm, which is designed for loosely coupled computing systems. The algorithm implies a two-level adaptation strategy based on the proposed landscape analysis procedure and the utilization of multi-memes. It is also consistent with the architecture of loosely coupled computing systems due to the new static load balancing procedure that allows allocating more computational resources to promising sub-areas of the search domain while maintaining an approximately equal load of the computational nodes. The new algorithm and its software implementation were utilized to solve a computationally expensive optimal control problem for a model of chemical reaction dynamics for the thermally-stimulated luminescence of polyarylenephthalides. Results of the numerical experiments are presented in this paper.

Keywords: Global optimization · Multi-memetic algorithms · Parallel algorithms
1 Introduction

Many real-world global optimization problems are computationally expensive due to the non-trivial landscape of an objective function and the high dimension of a problem. To cope with such problems within reasonable time, it is required to utilize parallel computing systems. Nowadays, grid systems made of heterogeneous personal computers (desktop grids) are widely used for scientific computations [1]. Such systems belong to a class of loosely coupled computing systems. Their popularity is caused by a relatively low cost and simple scaling. On the other hand, desktop grids require intermediate software to organize communication between computing nodes as well as task scheduling.

In general, real-world global optimization problems are frequently solved using various population-based algorithms [2]. One of the main advantages of this class of algorithms, apart from their simplicity of implementation, is a high probability of localizing so-called sub-optimal solutions, in other words, solutions that are close to the global optimum. In many real-world optimization problems, such solutions are

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 191–201, 2020. https://doi.org/10.1007/978-3-030-21803-4_20
sufficient. However, the efficiency of population-based algorithms heavily depends on the numeric values of their free parameters, which should be selected based on the characteristics of the problem at hand. It should also be noted that when one deals with computationally expensive objective functions, the number of evaluations becomes crucial. Additionally, empirical studies suggest that the more information on a problem is included into an algorithm, the better it operates [3]. However, it is not always feasible to modify an algorithm or tune it to every optimization problem. This is why modern optimization techniques often utilize preliminary analysis and preprocessing of a problem. This includes initial data analysis, dimensionality reduction of a search domain, landscape analysis of the objective function, etc. [4].

In [5] the authors proposed a two-level adaptation technique for population-based algorithms, designed to extract information from an objective function prior to the optimization process at the first level and provide adaptation capabilities for fine tuning at the second level. The first level is based on the proposed landscape analysis (LA) method, which utilizes the concept of Lebesgue integrals and allows grouping objective functions into three categories. Each category suggests the usage of specific values of the basic algorithm's free parameters. At the second level it was proposed to utilize multi-memetic hybridization of the basic algorithm with suitable local search techniques. Such an approach helps to adjust the algorithm to a specific problem while maintaining some adaptation capabilities in cases when the LA procedure fails to determine all of the objective function's distinct features.

When solving an optimization problem on parallel computing systems in general, and on loosely coupled systems in particular, one of the main difficulties is the optimal mapping problem [6] – how to distribute groups of sub-problems over processors.
It should be noted that a problem of optimal mapping of computational processes onto a parallel computing system is one of the main issues associated with parallel computations. It is well known that such a problem is NP-complete and can be solved with exact methods within a very narrow class of problems [6]. Various methods of load balancing are applied to obtain an approximate solution of the optimal mapping problem. The main idea behind those methods is to distribute the computations over the processors in such a way that the total computing and communication load is approximately the same for each processor. In [7] the authors proposed a static load balancing method for loosely coupled systems that minimizes the number of interactions between computation nodes. The static load balancing is based on the results of the LA procedure and helps to allocate more computational resources to the promising search domain’s sub-areas. Such a tight integration between the algorithm and the load balancing provides consistency of the algorithm with the architecture of the computing system. This work deals with the Simple Mind Evolutionary Computation (Simple MEC, SMEC) algorithm [8]. It was selected for investigation because it is highly suitable for parallel computations, especially for loosely coupled systems. In general, to be efficient on loosely coupled systems, a basic optimization algorithm must imply a minimum number of interactions between sub-populations which evolve on separate computing nodes. Only a few currently known population-based algorithms, including the SMEC algorithm, meet this requirement.
This paper presents the modified parallel MEC algorithm with the incorporated LA procedure, accompanied by the static load balancing method. An outline of the algorithm as well as a brief description of its software implementation are given in this paper. In addition, the computationally expensive optimal control problem of polyarylenephthalide's thermally-stimulated luminescence was studied in this work and solved using the proposed technique.
2 Problem Statement and the SMEC Algorithm

In this paper we consider a deterministic global constrained minimization problem

min_{X∈D⊆R^n} U(X) = U(X*) = U*.   (1)

Here U(X) is the scalar objective function, U(X*) = U* is the required minimal value, X = (x1, x2, ..., xn) is the n-dimensional vector of variables, R^n is the n-dimensional arithmetical space, and D is the constrained search domain. Initial values of the vector X are generated within a domain D0, which is defined as follows:

D0 = {X | x_i^min ≤ x_i ≤ x_i^max, i ∈ [1 : n]} ⊂ R^n.

In this work, the SMEC algorithm is considered as the basic algorithm. It belongs to the class of MEC algorithms [9], which are inspired by a human society and simulate some aspects of human behavior. An individual s is considered as an intelligent agent which operates in a group S made of analogous individuals. During the evolution process every individual is affected by other individuals within its group. This simulates the following logic: in order to achieve a high position within a group, an individual has to learn from the most successful individuals in this group, and groups themselves should follow the same principle to stay alive in the intergroup competition. The detailed description of the SMEC algorithm is presented in [10].
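Problem (1) with a box domain D0 can be encoded as follows (an illustration of ours; the objective and bounds are placeholders, not the paper's test problem — the feasibility test anticipates the death-penalty constraint handling used during the similar taxis stage).

```python
# Minimal encoding of problem (1) with a box search domain D0 (our sketch;
# U, lo and hi are placeholder choices, not the paper's problem).
import random

def U(X):                        # scalar objective U(X)
    return sum(x * x for x in X)

lo = [-2.0, -2.0, -2.0]          # x_i^min
hi = [2.0, 2.0, 2.0]             # x_i^max

def sample_D0(rng):
    return [rng.uniform(a, b) for a, b in zip(lo, hi)]

def feasible(X):
    # "death penalty" style test: an individual outside D0 is rejected
    return all(a <= x <= b for a, x, b in zip(lo, X, hi))

rng = random.Random(0)
population = [sample_D0(rng) for _ in range(100)]
best = min(population, key=U)
print(U(best))
```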
3 Parallel M3MEC Algorithm

This paper presents the new parallel Modified Multi-Memetic MEC (M3MEC) algorithm. The SMEC algorithm is based on three stages: initialization, similar taxis and dissimilation [10]. In turn, the initialization stage of the M3MEC algorithm contains the LA procedure, which is based on the concept of the Lebesgue integral [5,11] and divides the objective function's range space into levels based on the values U(X). This stage can be described as follows.

1. Generate N quasi-random n-dimensional vectors within the domain D0. In this work the LPτ sequence was used to generate quasi-random numbers since it provides a high-quality coverage of a domain.
2. For every X_r, r ∈ [1 : N], calculate the corresponding value of the objective function U_r, and sort the vectors in ascending order of the values U_r, r ∈ [1 : N].
3. Equally divide the set of vectors (X1, X2, ..., XN) into K sub-domains so that sub-domain κ1 contains the lowest values U(X).
4. For every sub-domain κ_l, l ∈ [1 : K], calculate the value of its diameter d_l – the maximum Euclidean distance between any two individuals within this sub-domain (Fig. 1).

Fig. 1. Determining a diameter of the first sub-domain for the benchmark composition function 1 from CEC'14: (a) distribution of individuals for four sub-populations; (b) determining a diameter of the first sub-population
5. Build a linear approximation of the dependency of the diameter d on the sub-domain number l, using the least squares method.
6. Put the objective function U(X) into one of three categories based on the calculated values (Table 1). Each of the three categories represents a certain topology of the objective function U(X).

Table 1. Classification of objective functions based on the LA results.

d(l) increases: nested sub-domains with the dense first domain (category I)
d(l) neither increases nor decreases: non-intersected sub-domains of the same size (category II)
d(l) decreases: distributed sub-domains with potential minima (category III)

There are three possible cases for the approximated dependency d(l): d can be an increasing function of l; d can decrease as l grows; d(l) can be neither decreasing nor increasing. Within the scope of this work it is assumed that the latter scenario takes place when the slope angle of the approximated line is within ±5°. Each case represents a
certain set of the numeric values of M3MEC's free parameters, suggested on the basis of the numerical studies [10].

The similar taxis stage was modified in M3MEC in order to include meme selection and a local improvement stage. Meme selection is performed in accordance with a simple random hyper-heuristic [12]. Once the most suitable meme is selected for a specific sub-population, it is applied randomly to a half of its individuals for k_ls = 10 iterations. The dissimilation stage of SMEC was not modified in M3MEC. To handle the constraints of a search domain D, the death penalty technique [2] was utilized during the similar taxis stage. In this work four local search methods were utilized, namely, the Nelder-Mead method [13], the Hooke-Jeeves method [14], the Monte-Carlo method [15], and Random Search on a Sphere [16]. Only zero-order methods were used in order to deal with problems where the objective function's derivative is not available explicitly and its approximation is computationally expensive.

The similar taxis and dissimilation stages are performed in parallel independently for each sub-population. To map those sub-populations onto the available computing nodes, a static load balancing method was proposed by the authors [5] specifically for loosely coupled systems. We modify the initialization stage described above so that at step 2, apart from calculating the values of the objective function U_r, the time t_r required for those calculations is also measured. The proposed adaptive load balancing method can be described as follows.
All available computing nodes are sorted by their computation power then the first sub-population K1 is sent to the first node. 4. Individuals in other sub-populations are re-distributed between neighboring subpopulations starting from K2 so that the average calculation time would be approximately the same for every sub-populations. Balanced sub-populations Kl , l 2 ½2 : jK j are then mapped onto the computational nodes. The modified similar taxis stage along with the dissimilation stage are launched on each node with the specific values of the free parameters in accordance with the results of landscape analysis. Each computing node utilizes the stagnation of computational process as a termination criterion while the algorithm in general works in a synchronous mode so that the final result is calculated when all computing nodes completed their tasks.
4 Optimal Control of Polyarylenephthalide's Thermally-Stimulated Luminescence

The Modified Multi-Memetic MEC algorithm (M3MEC) along with the utilized memes were implemented by the authors in Wolfram Mathematica. The software implementation has a modular structure, which helps to modify the algorithm easily and extend it with additional assisting methods and memes. The proposed parallel algorithm and its software implementation were used to solve an optimal control problem for the thermally-stimulated luminescence of polyarylenephthalides (PAP).

Nowadays, organic polymer materials are widely used in the field of optoelectronics. The polyarylenephthalides are high-molecular compounds that belong to a class of unconjugated cardo polymers. PAPs exhibit good optical and electrophysical characteristics along with thermally-stimulated luminescence. Determining the origins of PAP's luminescent states is of both fundamental and practical importance [17].

4.1 Thermally-Stimulated Luminescence of Polyarylenephthalides
Physical experiments [18] suggest that there are at least two types of stable reactive species produced in PAP. These species have different activation energy levels or, in other words, trap states. The dynamic model studied in this work was proposed in the Institute of Petrochemistry and Catalysis of the Russian Academy of Sciences (IPC RAS) [18, 19]. It includes the following processes: recombination of stable ion-radicals (y1, y2); repopulation of ion-radical trap states; luminescence or, in other words, deactivation of the excited state y3 and emission of a quantum of light y4. The model can be represented as follows:

$$
\begin{cases}
y_1'(t) = -374\,\exp\!\left(-\dfrac{69944}{10^{-10} + 8.31\,T(t)}\right) y_1(t)^2,\\[4pt]
y_2'(t) = -396680\,\exp\!\left(-\dfrac{101630}{10^{-10} + 8.31\,T(t)}\right) y_2(t)^2 + 1.99\cdot 10^{8}\,\exp\!\left(-\dfrac{21610}{10^{-10} + 8.31\,T(t)}\right) y_3(t),\\[4pt]
y_3'(t) = 187\,\exp\!\left(-\dfrac{69944}{10^{-10} + 8.31\,T(t)}\right) y_1(t)^2 + 198340\,\exp\!\left(-\dfrac{101630}{10^{-10} + 8.31\,T(t)}\right) y_2(t)^2\\[4pt]
\qquad\quad -\; 9.98\cdot 10^{7}\,\exp\!\left(-\dfrac{21610}{10^{-10} + 8.31\,T(t)}\right) y_3(t) - 2\cdot 10^{10}\, y_3(t),\\[4pt]
y_4'(t) = 2\cdot 10^{10}\, y_3(t).
\end{cases}
\tag{2}
$$

Here y1, y2 represent the initial stable species of various nature; y3 is a certain excited state towards which y1 and y2 evolve; y4 denotes quanta of light. T(t) is the reaction temperature. The luminescence intensity I(t) is calculated according to the formula I(t) = 544663240·y3(t); relative units are used for measuring I(t). Initial concentrations of the species in the reaction equal y1(0) = 300, y2(0) = 1000, y3(0) = y4(0) = 0. The integration interval equals [0, 2000] seconds.
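At a fixed temperature the model can be integrated with a stiff solver; the sketch below uses SciPy's BDF method (the BDF integrator is the one mentioned in Sect. 4.2, though the paper's implementation is in Wolfram Mathematica). The coefficients are transcribed from (2); where the scanned equation is ambiguous they should be treated as an assumption:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, T):
    # right-hand side of model (2) at a constant temperature T (kelvin)
    y1, y2, y3, _ = y
    k = lambda E: np.exp(-E / (1e-10 + 8.31 * T))   # Arrhenius-type factor
    dy1 = -374.0 * k(69944.0) * y1 ** 2
    dy2 = -396680.0 * k(101630.0) * y2 ** 2 + 1.99e8 * k(21610.0) * y3
    dy3 = (187.0 * k(69944.0) * y1 ** 2
           + 198340.0 * k(101630.0) * y2 ** 2
           - 9.98e7 * k(21610.0) * y3
           - 2e10 * y3)
    dy4 = 2e10 * y3
    return [dy1, dy2, dy3, dy4]

T = 423.15  # a constant 150 degC profile, as in the first set of experiments
sol = solve_ivp(rhs, (0.0, 2000.0), [300.0, 1000.0, 0.0, 0.0],
                args=(T,), method="BDF", rtol=1e-6, atol=1e-14)
intensity = 544663240.0 * sol.y[2]   # I(t) in relative units
```

The −2·10^10 y3 term makes the system very stiff, which is why an implicit method such as BDF is needed here.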
Parallel Multi-memetic Global Optimization Algorithm
4.2 Optimal Control Problem and Numerical Experiments
For the chemical reaction it is required to determine the law of variation of the temperature over time, T(t), which would guarantee the desired law of variation of the thermally-stimulated luminescence intensity I(t) of PAP. The physical experimental setup for the chemical reaction imposes restrictions on the minimal and maximal values of the temperature: 298 K ≤ T(t) ≤ 460 K. The optimal control problem was transformed in this work into a global optimization problem in the following manner. The integration interval [0, 2000] is discretized so that the length of one section [t_i, t_{i+1}] meets the restrictions imposed by the experimental setup on the velocity of the change in temperature T(t). The values T(t_i) are the components of the vector X = (x_0, ..., x_{n-1}). A piecewise linear function was selected for the approximation of T(t). The following objective function was proposed in this study:

$$
J(T(t)) = \int_{0}^{2000} \left( I_{\mathrm{ref}}(t) - I(T(t)) \right)^2 dt \;\to\; \min_{T(t)}. \tag{3}
$$
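A minimal sketch of how (3) becomes a finite-dimensional objective: the components of X are the node temperatures, T(t) is their piecewise-linear interpolant, the box constraint is enforced with the death penalty, and the integral is approximated by the trapezoidal rule. The intensity simulator is passed in as a function; the `toy` stand-in below is purely illustrative (the real one integrates model (2)):

```python
import numpy as np

def objective(X, t_grid, I_ref, simulate_intensity):
    # death penalty for the temperature box constraint 298 K <= T(t) <= 460 K
    if np.any(X < 298.0) or np.any(X > 460.0):
        return float("inf")
    T = lambda t: np.interp(t, t_grid, X)      # piecewise-linear control T(t)
    I = simulate_intensity(T, t_grid)          # I(T(t)) sampled on the grid
    e = (I_ref - I) ** 2
    # trapezoidal approximation of the integral in (3)
    return float(np.sum(0.5 * (e[1:] + e[:-1]) * np.diff(t_grid)))

# purely illustrative stand-in for the model-(2) simulator
toy = lambda T, tg: np.array([0.5 * T(t) for t in tg])
t_grid = np.linspace(0.0, 2000.0, 21)
X = np.full(21, 430.0)                         # a constant 430 K profile
J = objective(X, t_grid, np.full(21, 300.0), toy)
```

Any population-based global optimizer can then minimize `objective` over the vector X.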
The global minimization problem (3) was solved using the proposed M3MEC-P algorithm and its software implementation. The following values of the algorithm's free parameters were utilized: the number of groups c = 90; the number of individuals in each group |S| = 50; the stagnation iteration number k_stop = 100; the tolerance used for identifying stagnation ε = 10^-5. All computations were performed with the use of a desktop grid made of eight personal computers that did not communicate with each other. The number of sub-populations |K| = 8 was selected to be equal to the number of computing nodes. In order to increase the probability of localizing the global optimum, the multi-start method with 15 launches was used. The BDF integration method was utilized at every evaluation to solve (2). The first set of experiments was devoted to studying the dynamics of the model (2) under constant values of the temperature in the reaction within the range of T = 150–215 °C with a step size of 10 degrees (Fig. 2). The obtained results demonstrate that under any constant temperature the luminescence intensity I(t) decreases over time. Furthermore, the higher the temperature, the faster the luminescence intensity decreases. The second set of experiments was devoted to maintaining a constant value of the luminescence intensity I(t). The results obtained for the target value I_ref(t) = 300 are displayed in Fig. 3. They suggest that in order to maintain a constant value of I(t), the reaction temperature has to increase approximately according to a linear law of variation (by about 1 °C every 300 s). This implies a restriction on the reaction time, as it is impossible to maintain constant growth of the temperature in the experimental setup. The third set of experiments was conducted to determine the law of variation of the temperature T(t) that would provide the required pulse changes in the luminescence intensity I(t). The obtained results are presented in Fig. 4.
The optimal control trajectory repeats the required law of I(t) variation with the addition of a linear increasing trend.
Fig. 2. PAP's luminescence intensity under various constant values of the temperature T ∈ [150, 215] °C
Fig. 3. Optimal control for maintaining the constant value of PAP's luminescence intensity: (a) obtained and required constant luminescence intensity; (b) obtained optimal control temperature
Fig. 4. Optimal control for providing pulse changes in the luminescence intensity: (a) obtained and required luminescence intensity with two pulses; (b) obtained optimal control temperature
Figure 5 displays the results of numerical experiments that were conducted in order to determine the law of variation of the reaction temperature T(t) that would provide harmonic oscillations of the luminescence intensity I(t) with an amplitude of 40 relative units and an oscillation period of approximately 200 s. Once again, the optimal control trajectory repeats the required law of I(t) variation with the addition of a linear increasing trend.

Fig. 5. Optimal control for providing harmonic oscillations of the luminescence intensity: (a) obtained and required harmonic oscillations of luminescence intensity; (b) obtained optimal control temperature
5 Conclusions This paper presents a new modified parallel population-based global optimization algorithm designed for loosely coupled systems, together with its software implementation. The M3MEC-P algorithm is based on the adaptation strategy and the landscape analysis procedure originally proposed by the authors and incorporated into the traditional SMEC algorithm. The algorithm is capable of adapting to various objective functions using both static and dynamic adaptation. Static adaptation was implemented with the use of landscape analysis, while dynamic adaptation was made possible by utilizing several memes. The proposed landscape analysis is based on the concept of the Lebesgue integral and allows one to group objective functions into six categories. Each category suggests the usage of a specific set of values for the algorithm's free parameters. The proposed algorithm and its software implementation proved to be efficient when solving a real-world computationally expensive global optimization problem: determination of the kinetics of the thermally-stimulated luminescence of polyarylenephthalides. Further research will be devoted to the study of asynchronous stopping criteria, as well as the investigation of different architectures of loosely coupled systems. Acknowledgments. This work was supported by the RFBR under grant 18-07-00341.
References 1. Sakharov, M.K., Karpenko, A.P., Velisevich, Ya.I.: Multi-memetic mind evolutionary computation algorithm for loosely coupled systems of desktop computers. In: Science and Education of the Bauman MSTU, vol. 10, pp. 438–452 (2015). https://doi.org/10.7463/1015.0814435 2. Karpenko, A.P.: Modern Search Optimization Algorithms. Nature-Inspired Optimization Algorithms. Bauman MSTU Publ., Moscow, 446 p. (2014) 3. Neri, F., Cotta, C., Moscato, P.: Handbook of Memetic Algorithms, 368 p. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-23247-3 4. Mersmann, O., et al.: Exploratory landscape analysis. In: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pp. 829–836. ACM (2011). https://doi.org/10.1145/2001576.2001690 5. Sakharov, M., Karpenko, A.: Multi-memetic mind evolutionary computation algorithm based on the landscape analysis. In: Theory and Practice of Natural Computing, 7th International Conference, TPNC 2018, Dublin, Ireland, 12–14 Dec 2018, Proceedings, pp. 238–249. Springer (2018). https://doi.org/10.1007/978-3-030-04070-3 6. Voevodin, V.V., Voevodin, Vl.V.: Parallel Computations, 608 p. BHV-Peterburg, SPb. (2004) 7. Sakharov, M.K., Karpenko, A.P.: Adaptive load balancing in the modified mind evolutionary computation algorithm. Supercomput. Front. Innov. 5(4), 5–14 (2018). https://doi.org/10.14529/jsfi180401 8. Jie, J., Zeng, J.: Improved mind evolutionary computation for optimizations. In: Proceedings of the 5th World Congress on Intelligent Control and Automation, Hangzhou, China, pp. 2200–2204 (2004). https://doi.org/10.1109/WCICA.2004.1341978 9. Chengyi, S., Yan, S., Wanzhen, W.: A survey of MEC: 1998–2001. In: 2002 IEEE International Conference on Systems, Man and Cybernetics (IEEE SMC 2002), Hammamet, Tunisia, 6–9 October, vol. 6, pp. 445–453. Institute of Electrical and Electronics Engineers Inc. (2002). https://doi.org/10.1109/ICSMC.2002.1175629 10.
Sakharov, M., Karpenko, A.: Performance investigation of mind evolutionary computation algorithm and some of its modifications. In: Proceedings of the First International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'16), pp. 475–486. Springer (2016). https://doi.org/10.1007/978-3-319-33609-1_43 11. Sakharov, M., Karpenko, A.: A new way of decomposing search domain in a global optimization problem. In: Proceedings of the Second International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'17), pp. 398–407. Springer (2018). https://doi.org/10.1007/978-3-319-68321-8_41 12. Ong, Y.S., Lim, M.H., Zhu, N., Wong, K.W.: Classification of adaptive memetic algorithms: a comparative study. IEEE Trans. Syst. Man Cybern. Part B Cybern. 36(1), 141–152 (2006) 13. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7, 308–313 (1965) 14. Karpenko, A.P.: Optimization Methods (Introductory Course). http://bigor.bmstu.ru/. Accessed 25 Mar 2019 15. Sokolov, A.P., Pershin, A.Y.: Computer-aided design of composite materials using reversible multiscale homogenization and graph-based software engineering. Key Eng. Mater. 779, 11–18 (2018). https://doi.org/10.4028/www.scientific.net/KEM.779.11
16. Agasiev, T., Karpenko, A.: The program system for automated parameter tuning of optimization algorithms. Procedia Comput. Sci. 103, 347–354 (2017). https://doi.org/10.1016/j.procs.2017.01.120 17. Antipin, V.A., Shishlov, N.M., Khursan, S.L.: Photoluminescence of polyarylenephthalides. VI. DFT study of charge separation process during polymer photoexcitation. Bulletin of Bashkir University 20(1), 30–42 (2015) 18. Akhmetshina, L.R., Mambetova, Z.I., Ovchinnikov, M.Y.: Mathematical modeling of thermoluminescence kinetics of polyarylenephthalides. In: V International Scientific Conference on Mathematical Modeling of Processes and Systems, pp. 79–83 (2016) 19. Antipin, V.A., Mamykin, D.A., Kazakov, V.P.: Recombination luminescence of poly(arylene phthalide) films induced by visible light. High Energy Chem. 45(4), 352–359 (2011)
Proper Choice of Control Parameters for CoDE Algorithm Petr Bujok, Daniela Einšpiglová, and Hana Zámečníková University of Ostrava, 30. Dubna 22, 70200 Ostrava, Czech Republic {petr.bujok,daniela.einspiglova,hana.zamecnikova}@osu.cz
Abstract. An adaptive variant of the CoDE algorithm uses three couples of settings of two control parameters. These combinations provide good performance when solving various types of optimisation problems. The aim of this paper is to replace the original values of the control parameters in CoDE in order to achieve better efficiency on real-world problems. Two different variants of the enhanced CoDE algorithm are proposed and compared with the original CoDE variant. The new combinations of the F and CR parameters are selected from the results of a preliminary study in which 441 various combinations of these parameters were evaluated. The results show that the newly proposed CoDE variants (CoDEFCR1 and CoDEFCR2) perform better than the original CoDE in most of 22 real-world problems. Keywords: Global optimisation · Differential evolution · Control parameters · CoDE · Real-world problems · Experimental comparison
1 Introduction
A proper setting of the control parameters of the Differential Evolution (DE) algorithm plays an important role when solving various optimisation problems. More precisely, there is no single setting which performs best on most problems (No-Free-Lunch theorem [16]). Although there are many approaches for adapting the values of DE parameters, none of them can be the best. This paper is focused on a more proper setting of two control parameters of the DE algorithm, based on preliminary work. Our preliminary comprehensive experiment provides very interesting results of the DE algorithm solving real-world problems. A lot of combinations of two DE control parameters are studied and evaluated on selected problems in order to rank them from more efficient to less efficient. Although DE has only a few control parameters, its efficiency is very sensitive to the setting of the F and CR values. Unfortunately, simple trial-and-error tuning of the parameters requires a lot of time. Several authors have recommended settings of the DE control parameters [7,9,10]; unfortunately, these values are valid only for a part of optimisation problems. As a result, a lot of adaptive mechanisms controlling the values of F and CR have been proposed, e.g. [1,11,12,14,15]. The summary of DE research has been presented © Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 202–212, 2020. https://doi.org/10.1007/978-3-030-21803-4_21
recently in several comprehensive papers [4,6,8]. One of the well-performing adaptive DE algorithms is called CoDE [14], and in this paper the CoDE algorithm was selected for enhancing the settings of its control parameters. This algorithm differs from other DE variants by using three different settings of the control parameters. The three settings of the original CoDE are replaced by two other sets of three settings which achieved substantial efficiency in our preliminary experiments. The original and the two newly proposed variants of the CoDE algorithm are compared on a set of real-world optimisation problems, CEC 2011 [5]. The main goal of this paper is to show whether a better-performing setting of the DE control parameters increases the efficiency of the CoDE algorithm substantially. In global optimisation, problems are represented by an objective function f(x), x = (x1, x2, ..., xD) ∈ R^D, defined on the search domain Ω limited by lower and upper boundaries, i.e. Ω = ∏_{j=1}^{D} [aj, bj], aj < bj, j = 1, 2, ..., D. The solution of the problem is the global minimum point x*, which satisfies the condition f(x*) ≤ f(x), ∀x ∈ Ω. The rest of the paper is organised as follows. Section 2 shows a description of the adaptive DE algorithm used in the experiments. A brief report of the experimental study regarding DE control parameters is given in Sect. 3. Newly proposed variants of adaptive DE algorithms are described in Sect. 4. The experimental setting and the methods applied to statistical assessment are described in Sect. 5. Experimental results on real-world optimisation problems are presented in Sect. 6. Section 7 brings the conclusion of the paper with some final remarks.
2 Adaptive Variant of CoDE
In this experiment we use the DE algorithm with composite trial vector generation strategies and control parameters (called CoDE) presented by Wang et al. in 2011 [14]. The authors of the CoDE algorithm compared it with four adaptive DE variants (jDE, SaDE, JADE, EPSDE), and the results showed that CoDE is at least competitive with the algorithms in the comparison. In CoDE, three well-studied trial vector strategies with three control parameter settings are randomly combined to generate trial vectors. The strategies are rand/1/bin, rand/2/bin, and current-to-rand/1. Therefore, three different offspring vectors are generated, and the vector with the least function value from this triplet is selected as the trial vector. This mechanism promises faster convergence because the best 'possible' solution is preferred. On the other hand, for each solution three vectors are evaluated by the objective function, which consumes the search budget faster (time is measured by the number of function evaluations). The values of the control parameters F and CR are also chosen randomly, from the parameter pool containing [F = 1.0, CR = 0.1], [F = 1.0, CR = 0.9], and [F = 0.8, CR = 0.2]. After the current-to-rand/1 mutation no crossover is applied, because this strategy includes a so-called arithmetic crossover, which makes it rotation invariant.
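The generation step just described can be sketched compactly. This is an illustrative Python sketch under simplifying assumptions (no bound handling and no selection step; `f` is the objective function and the RNG is fixed), not the authors' Matlab implementation:

```python
import numpy as np

rng = np.random.default_rng(7)
POOL = [(1.0, 0.1), (1.0, 0.9), (0.8, 0.2)]   # original CoDE parameter pool

def code_trial(pop, i, f):
    # build three candidates for target i and keep the best one
    N, D = pop.shape
    def others(k):
        idx = [j for j in range(N) if j != i]
        sel = rng.choice(idx, size=k, replace=False)
        return [pop[j] for j in sel]
    def bin_cross(x, v, CR):
        jr = rng.integers(D)                   # guaranteed crossover position
        mask = rng.random(D) < CR
        mask[jr] = True
        return np.where(mask, v, x)
    F, CR = POOL[rng.integers(3)]              # rand/1/bin
    r1, r2, r3 = others(3)
    c1 = bin_cross(pop[i], r1 + F * (r2 - r3), CR)
    F, CR = POOL[rng.integers(3)]              # rand/2/bin
    r1, r2, r3, r4, r5 = others(5)
    c2 = bin_cross(pop[i], r1 + F * (r2 - r3) + F * (r4 - r5), CR)
    F, _ = POOL[rng.integers(3)]               # current-to-rand/1, no crossover
    r1, r2, r3 = others(3)
    c3 = pop[i] + rng.random() * (r1 - pop[i]) + F * (r2 - r3)
    return min((c1, c2, c3), key=f)
```

Note that each call consumes three objective evaluations, which is the budget cost mentioned in the text.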
Although CoDE is not very often used in real applications, the results of the following experiments show that there are some problems in which this DE variant performs better compared to other algorithms [2,3].
3 Efficiency of DE Control Parameters
A lot of different approaches to adapting the values of F and CR during the search process of the DE algorithm have been studied. Several widely-used adaptive DE variants are described and partially compared in [4,6,13]. Although many adaptive DE algorithms are very efficient, the values of the F and CR control parameters are usually sampled from the interval [0, 1]. It is clear that although one combination of F, CR values provides the best results for one optimisation problem, the same setting may perform poorly in another task. When a relatively wide sampling interval of the control parameters is used, it promises the possibility of also using some efficient settings. On the other hand, a relatively big amount of unsuccessful F, CR values is also sampled, which can cause slow convergence of the DE algorithm. This fact was the main inspiration for the experiment, where the usual sampling interval of the DE control parameters is divided equidistantly to obtain a large number of combinations of F, CR values. All the combinations were evaluated on real-world problems, and the obtained results were statistically assessed. In our experimental study, the same equidistant values with step 0.05 are used; that is, the 21 values {0, 0.05, 0.1, 0.15, ..., 0.95, 1} are used for each parameter. It is necessary to note that the most frequently used mutation variant, rand/1, and binomial crossover were selected in this experiment. This gives in total 21 × 21 = 441 combinations of the F and CR settings. Each setting was evaluated on the 22 real-world CEC 2011 optimisation problems; details of this set are provided in Sect. 5. All predefined combinations are statistically assessed using the non-parametric Friedman test. It provides global insight into the performance of the classic DE variant when various control parameter settings are used. The test was carried out on the medians of the minimal function values at the end of the search. The null hypothesis on equivalent efficiency of the settings was rejected with p < 5 × 10−6.
Each combination of F and CR is evaluated by a mean-rank value which represents the overall performance across all selected problems; lower mean-rank values represent DE settings which provide better overall results. The plot representing the efficiency of all 441 combinations of F and CR on all 22 real-world problems is shown in Fig. 1. The most efficient combination (the least mean-rank value) is represented by a black square, and the least efficient setting (the biggest mean-rank value) is illustrated by a white square. We can see that there is an interesting continuous dark area in which the best combination, F = 0.45 and CR = 0.95, is located. Conversely, the worst performance is provided by the combination of F = 0 and CR = 1. This is caused by a zero diversity step (F = 0) and high propagation of such a solution (CR = 1). There are two bright regions where the efficiency of the DE setting is rather worse, especially where CR is close to 1 and F is close to zero.
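The mean-rank computation just described can be sketched as follows; SciPy's `rankdata` and `friedmanchisquare` are assumed to be available, and the small demo matrix is invented for illustration (it is not the paper's data):

```python
import numpy as np
from scipy.stats import rankdata, friedmanchisquare

def mean_ranks(results):
    # results[p, s]: median of the best function values of setting s on
    # problem p; rank the settings within each problem, average over problems
    ranks = np.vstack([rankdata(row) for row in results])
    return ranks.mean(axis=0)        # lower mean rank = better setting

# invented demo: 4 problems x 3 settings, setting 0 always best
demo = np.array([[1.0, 2.0, 3.0],
                 [0.1, 0.5, 0.4],
                 [5.0, 9.0, 7.0],
                 [2.0, 3.0, 8.0]])
mr = mean_ranks(demo)                     # setting 0 gets mean rank 1.0
stat, p = friedmanchisquare(*demo.T)      # global test across the settings
```

In the study, `results` would be a 22 × 441 matrix, and the 441 columns would be ordered by their mean ranks.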
Fig. 1. Mean ranks from the Friedman test for all 441 F, CR settings over all problems.
4 Proposed Variants of CoDE Algorithm
The preliminary experiment with many combinations of the F and CR parameters shows that the three original combinations of DE parameters used in CoDE are not very efficient. The rank of [F = 1, CR = 0.1] is 221, the combination [F = 1, CR = 0.9] is in 418th position, and the last setting, [F = 0.8, CR = 0.2], has rank 286. The aim is to select three other combinations of F and CR that achieved better overall results. The results were obtained only with classic DE and rand/1/bin; CoDE is based on a rand approach (rand/1, rand/2, and current-to-rand), so similarity with classic DE is supposed. In this paper, two different well-performing settings of the DE parameters are selected to replace the original values in the adaptive CoDE algorithm. For simplicity, the newly proposed algorithms are labelled CoDEFCR1 and CoDEFCR2, and a detailed description of both variants is given in the following paragraphs.
4.1 CoDEFCR1: Better Average Performing Setting
The original CoDE algorithm uses three combinations of F and CR settings which achieve absolute ranks 221, 286, and 418, i.e. the estimated average mean rank of the CoDE setting is approximately 308. In the first proposed enhanced variant, CoDEFCR1, three different couples of control parameters with good efficiency are selected to replace the original settings. The best results are provided by [F = 0.45, CR = 0.95] (absolute rank 1), the combination [F = 0.95, CR = 1] achieves absolute rank 8, and the combination [F = 0.05, CR = 0.05] is in 20th position. These settings achieve an average rank of 10 in the preliminary experiment. Substantially better-performing combinations of control
parameters are thus used in CoDE to increase its performance. The remaining setting of CoDEFCR1 is the same as the setting of the original CoDE algorithm.
4.2 CoDEFCR2: Worse Average Performing Setting
The best performing setting used in the previously enhanced CoDE algorithm ([F = 0.45, CR = 0.95]) is replaced by the combination [F = 0.3, CR = 0.8], which achieves the 46th position in the preliminary experiment. The other two settings remain the same: the combination [F = 0.95, CR = 1] is in 8th position, and [F = 0.05, CR = 0.05] is in 20th position. The estimated average mean rank of the settings of CoDEFCR2 is 18. All remaining parameters of this algorithm are set according to the original CoDE variant.
5 Experimental Settings
The main aim of this study is to increase the efficiency of the CoDE algorithm on real-world problems. Therefore, the test suite of 22 real-world problems selected for the CEC 2011 competition in the Special Session on Real-Parameter Numerical Optimization [5] is used as a benchmark in the experimental comparison. The functions in the benchmark differ in computational complexity and in the dimension of the search space, which varies from D = 1 to D = 240. For each algorithm and problem, 25 independent runs were carried out. A run of the algorithm stops when the prescribed number of function evaluations MaxFES = 150000 is reached. The partial results of the algorithms after reaching one-third and two-thirds of MaxFES were also recorded for further analysis. The point in the terminal population with the smallest function value is the solution of the problem found in the run. The minimal function values of the problems are unknown; the algorithm providing the lower function value is considered better performing. The experiments in this paper can be divided into two parts. In the first part, a classic DE algorithm with the strategy rand/1/bin and 441 different combinations of the F and CR settings was studied, as mentioned in Sect. 3. The only remaining control parameter is the population size, which is set to N = 100. In the second part of the experiment, the original CoDE algorithm and the two newly proposed enhanced CoDE variants (CoDEFCR1, CoDEFCR2) were applied to the set of 22 real-world problems. The names of the newly proposed variants are abbreviated to FCR1 (CoDEFCR1) and FCR2 (CoDEFCR2) in some parts of the results. The other control parameters are set according to the recommendations of the authors in the original paper. All the algorithms are implemented in Matlab 2017b, and all computations were carried out on a standard PC with Windows 7, Intel(R) Core(TM) i7-4790 CPU 3.6 GHz, 16 GB RAM.
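The evaluation protocol above (a fixed budget of MaxFES evaluations with snapshots at one-third and two-thirds of the budget) can be sketched generically; `optimizer_step` is a hypothetical interface, assumed here to return one candidate value and the number of evaluations it consumed:

```python
import math

def run_with_budget(optimizer_step, f, max_fes=150000, fractions=(1/3, 2/3, 1.0)):
    # record the best-so-far value each time a budget checkpoint is reached
    marks = [math.ceil(c * max_fes) for c in fractions]
    fes, best, snapshots = 0, float("inf"), []
    while fes < max_fes:
        value, used = optimizer_step(f)   # hypothetical: one optimizer iteration
        fes += used
        best = min(best, value)
        while marks and fes >= marks[0]:
            snapshots.append(best)
            marks.pop(0)
    return snapshots
```

Each of the 25 independent runs per algorithm and problem would be one call of such a harness.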
6 Results
The original CoDE algorithm and two newly proposed enhanced CoDE variants are compared on 22 real-world problems. Global insight into overall algorithms
performance is provided by the Friedman statistical test applied to the median values for each problem and algorithm. The null hypothesis about the equality of the algorithms' results was rejected at each stage of the run, with the significance level set at 1 × 10−3. For better illustration, the mean ranks of the three compared algorithms are illustrated in Fig. 2. We can see that the original CoDE algorithm performs substantially worse compared to the newly proposed variants. The performance of CoDEFCR1 is better than that of the original CoDE, and the difference between the algorithms decreases with an increasing number of function evaluations. On the other hand, the second proposed variant, CoDEFCR2, performs the best, and the efficiency of this algorithm is rather invariant to increasing function evaluations. The on-average better-performing combinations of F and CR used in CoDEFCR1 cause better results compared to the original CoDE algorithm. It is surprising that the on-average worse-performing combinations of F and CR (where the best combination is replaced by a substantially worse-performing one) used in CoDEFCR2 achieve the best performance over all real-world problems.
Fig. 2. Mean rank values from the Friedman test for the three CoDE variants in three stages of the search.
More detailed results from the comparison of the three CoDE variants are provided by the non-parametric Kruskal-Wallis test with Dunn's multiple comparison method. This test is applied to each problem separately to show which setting of the control parameters in CoDE is more efficient. The null hypothesis about equal performance of the algorithms was rejected in most of the problems at the significance level 1 × 10−4. The minimum and median values of 25 runs for each variant, along with the best and the worst performing variant for each problem, are shown in Table 1. In the 'best' column the significantly better performing variants are listed, and in the 'worst' column the variants providing the worst efficiency. Clearly, the most frequently worst performing variant is the original CoDE algorithm, which loses in 12 out of 22 problems. On the other hand, both newly proposed CoDE variants win in 12 problems out of 22; the variant CoDEFCR2 outperforms CoDEFCR1 significantly in only one problem. In six problems out of 22, all CoDE variants perform similarly. On the left side of the table, the least median value of each problem is printed bold and underlined. The original CoDE provides the best median value in 7 out of 22 problems, the CoDEFCR1 variant is best performing in 4 out of 22 problems, and CoDEFCR2 achieves the least median in 9 out of 22 problems.

Table 1. Median values of three CoDE variants and results of Kruskal-Wallis tests.

Table 2. Average frequencies of use of three CoDE variants over 22 real-world problems.

CoDE
            rand/1                 rand/2                 current-to-rand/1
F     1     1     0.8        1     1     0.8        1     1     0.8
CR    0.1   0.9   0.2        0.1   0.9   0.2        0.1   0.9   0.2
Avg   11.8  10.3  18.8       3.4   2.1   19.2       6.2   5.4   22.9

CoDEFCR1    rand/1                 rand/2                 current-to-rand/1
F     0.05  0.45  0.95       0.05  0.45  0.95       0.05  0.45  0.95
CR    0.05  0.95  1          0.05  0.95  1          0.05  0.95  1
Avg   12.2  10.1  26.3       6.3   3.9   18.8       5.0   3.5   13.9

CoDEFCR2    rand/1                 rand/2                 current-to-rand/1
F     0.05  0.3   0.95       0.05  0.3   0.95       0.05  0.3   0.95
CR    0.05  0.8   1          0.05  0.8   1          0.05  0.8   1
Avg   11.9  9.8   25.6       5.1   3.2   21.4       5.5   3.9   13.8
Furthermore, the frequencies of use of all nine settings in each of the three CoDE variants are studied. The three compared CoDE variants use the same three strategies (rand/1/bin, rand/2/bin, and current-to-rand/1), which are combined with three couples of F and CR. The average frequencies over 25 runs for each problem are computed and illustrated in Fig. 3. The problems (horizontal axis) are sorted by dimension; problems with the same dimension are ordered by their names (for example, the problem T03 comes first and T04 second). The same types of lines are used for the same strategies. There is no big difference between the plots of CoDEFCR1 and CoDEFCR2. Although different combinations of parameters are used (F = 0.45 and CR = 0.95 versus F = 0.3 and CR = 0.8), both settings are used with similar frequencies. The only visible difference is observed in the strategy rand/2/bin with F = 0.95 and CR = 1: this setting is used more frequently in CoDEFCR2. In the original CoDE variant, the settings with F = 0.8 and CR = 0.2 are used about as frequently as the combinations with F = 0.95 and CR = 1 in the new variants, and the curves for CoDE are more similar to each other.

Fig. 3. Frequencies of strategies in CoDE, CoDEFCR1, and CoDEFCR2 for 22 real-world problems (rand/1/bin: dotted line, rand/2/bin: dashed line, current-to-rand/1: solid line).
7 Conclusion
Based on the experimental results, it is obvious that the original CoDE algorithm performs substantially worse compared to the newly proposed variants. The performance of CoDEFCR1 is better than that of the original CoDE, and the difference between the algorithms decreases with an increasing number of function evaluations. The second proposed variant, CoDEFCR2, performs best, and its efficiency is rather invariant to increasing function evaluations. The on-average worse-performing combinations of F and CR used in CoDEFCR2 achieve the best overall performance on real-world problems. The most frequently worst performing variant is the original CoDE algorithm, which loses in 12 out of 22 problems; both newly proposed CoDE variants win in 12 out of 22 problems. CoDEFCR2 outperforms CoDEFCR1 significantly in only one problem. The proposed variants of the CoDE algorithm are able to outperform the original CoDE. It is interesting that the rather worse settings used in CoDEFCR2 achieve slightly better results than CoDEFCR1 with the best combination of F and CR. The proposed methods outperform the winner of the CEC 2011 competition (GA-MPC) in 7 out of 22 problems. More proper combinations of control parameters in adaptive DE variants will be studied in further research.
References

1. Brest, J., Maučec, M.S., Bošković, B.: Single objective real-parameter optimization: algorithm jSO. In: 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 1311–1318 (2017)
2. Bujok, P.: Migration model of adaptive differential evolution applied to real-world problems. In: 17th International Conference on Artificial Intelligence and Soft Computing (ICAISC), Zakopane, Poland. Artificial Intelligence and Soft Computing, Part I. Lecture Notes in Computer Science, vol. 10841, pp. 313–322 (2018)
3. Bujok, P., Tvrdík, J., Poláková, R.: Differential evolution with exponential crossover revisited. In: Matoušek, R. (ed.) MENDEL 2016, 22nd International Conference on Soft Computing, pp. 17–24. Brno, Czech Republic (2016)
4. Das, S., Mullick, S.S., Suganthan, P.N.: Recent advances in differential evolution – an updated survey. Swarm Evol. Comput. 27, 1–30 (2016)
5. Das, S., Suganthan, P.N.: Problem definitions and evaluation criteria for CEC 2011 competition on testing evolutionary algorithms on real world optimization problems. Tech. rep., Jadavpur University, India and Nanyang Technological University, Singapore (2010)
6. Das, S., Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15, 27–54 (2011)
7. Feoktistov, V.: Differential Evolution: In Search of Solutions. Springer (2006)
8. Neri, F., Tirronen, V.: Recent advances in differential evolution: a survey and experimental analysis. Artif. Intell. Rev. 33, 61–106 (2010)
9. Price, K.V., Storn, R., Lampinen, J.: Differential Evolution: A Practical Approach to Global Optimization. Springer (2005)
10. Storn, R., Price, K.V.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11, 341–359 (1997)
11. Tang, L., Dong, Y., Liu, J.: Differential evolution with an individual-dependent mechanism. IEEE Trans. Evol. Comput. 19(4), 560–574 (2015)
12. Tvrdík, J.: Competitive differential evolution. In: Matoušek, R., Ošmera, P. (eds.) MENDEL 2006, 12th International Conference on Soft Computing, pp. 7–12. University of Technology, Brno (2006)
13. Tvrdík, J., Poláková, R., Veselský, J., Bujok, P.: Adaptive variants of differential evolution: towards control-parameter-free optimizers. In: Zelinka, I., Snášel, V., Abraham, A. (eds.) Handbook of Optimization – From Classical to Modern Approach. Intelligent Systems Reference Library, vol. 38, pp. 423–449. Springer, Berlin Heidelberg (2012)
14. Wang, Y., Cai, Z., Zhang, Q.: Differential evolution with composite trial vector generation strategies and control parameters. IEEE Trans. Evol. Comput. 15, 55–66 (2011)
15. Wang, Y., Li, H.X., Huang, T., Li, L.: Differential evolution based on covariance matrix learning and bimodal distribution parameter setting. Appl. Soft Comput. 18, 232–247 (2014)
16. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1, 67–82 (1997)
Semidefinite Programming Based Convex Relaxation for Nonconvex Quadratically Constrained Quadratic Programming

Rujun Jiang1 and Duan Li2(B)

1 School of Data Science, Fudan University, Shanghai, China
[email protected]
2 School of Data Science, City University of Hong Kong, Hong Kong, China
[email protected]
Abstract. In this paper, we review recent developments in semidefinite programming (SDP) based convex relaxations for nonconvex quadratically constrained quadratic programming (QCQP) problems. QCQP problems are well known to be NP-hard nonconvex problems. We focus on convex relaxations of QCQP, which form the basis of global algorithms for solving QCQP. We review SDP relaxations, the reformulation-linearization technique, SOC-RLT constraints, and various other techniques based on lifting and linearization.
1 Introduction
We consider in this survey paper the following class of quadratically constrained quadratic programming (QCQP) problems:

(P)   min  x^T Q_0 x + c_0^T x
      s.t. x^T Q_i x + c_i^T x + d_i ≤ 0,  i = 1, ..., l,
           a_j^T x ≤ b_j,  j = 1, ..., m,
where Q_i is an n × n symmetric matrix, c_i ∈ R^n, i = 0, ..., l, d_i ∈ R, i = 1, ..., l, and a_j ∈ R^n, b_j ∈ R, j = 1, ..., m. QCQP is in general NP-hard [14,18], although some special cases of QCQP are polynomially solvable [3–6,17]. Finding a global optimal solution of QCQP is generally hard. Branch-and-bound methods have been developed in the literature to find exact solutions for QCQP problems [8,13]; their efficiency depends on two major factors: the quality of the relaxation bound and its associated computational cost. This paper focuses on a review of various semidefinite programming (SDP) based convex relaxations strengthened with different valid inequalities for QCQP problems. We will also point out that for several special problems,

Supported by Shanghai Sailing Program 18YF1401700, Natural Science Foundation of China (NSFC) 11801087 and Hong Kong Research Grants Council under Grants 14213716 and 14202017.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 213–220, 2020. https://doi.org/10.1007/978-3-030-21803-4_22
SDP relaxations enhanced with valid inequalities may be tight for the original problems, i.e., there exists a rank-one SDP solution, and an optimal solution of the original problem can be recovered from the SDP solution. The remainder of this paper is organized as follows. In Sect. 2, we review various valid inequalities to strengthen the basic SDP relaxation. We conclude the paper in Sect. 3.

Notations. We use v(·) to denote the optimal value of problem (·). Let ‖x‖ denote the Euclidean norm of x, i.e., ‖x‖ = √(x^T x), and ‖A‖_F denote the Frobenius norm of a matrix A, i.e., ‖A‖_F = √(tr(A^T A)). The notation A ⪰ 0 means that the matrix A is positive semidefinite and symmetric, and the notation A ⪰ B for matrices A and B means that A − B ⪰ 0 and both A and B are symmetric. The inner product of two symmetric matrices is defined by A · B = Σ_{i,j=1,...,n} A_ij B_ij, where A_ij and B_ij are the (i, j) entries of A and B, respectively. We also use A_{i,·} and A_{·,i} to denote the ith row and column of the matrix A, respectively. For a positive semidefinite n × n matrix A with spectral decomposition A = U^T D U, where D is an n × n diagonal matrix and U is an n × n orthogonal matrix, we use the notation A^{1/2} to denote U^T D^{1/2} U, where D^{1/2} is a diagonal matrix with √(D_ii) as its ith diagonal entry.
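The matrix square root defined in the notation above can be computed directly from the spectral decomposition. The following sketch (using numpy, with `eigh` for the symmetric eigendecomposition; the clipping of tiny negative eigenvalues is an added numerical safeguard) illustrates it:

```python
import numpy as np

def sqrtm_psd(A):
    """Square root of a symmetric PSD matrix via its spectral
    decomposition A = U^T D U (here U = V^T from eigh)."""
    w, V = np.linalg.eigh(A)          # A = V @ diag(w) @ V.T
    w = np.clip(w, 0.0, None)         # guard tiny negative eigenvalues
    return V @ np.diag(np.sqrt(w)) @ V.T

# A^{1/2} A^{1/2} recovers A
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T                           # random PSD matrix
R = sqrtm_psd(A)
assert np.allclose(R @ R, A)
```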
2 Convex Relaxations Based Only on Constraints
In this section, we review the basic SDP relaxation for problem (P) and its strengthened variants with RLT, SOC-RLT, GSRT and other valid inequalities. By lifting x to the matrix X = xx^T and relaxing X = xx^T to X ⪰ xx^T, which is further equivalent to

[1, x^T; x, X] ⪰ 0

due to the Schur complement, we have the following basic SDP relaxation for problem (P):

(SDP)   min  Q_0 · X + c_0^T x
        s.t. Q_i · X + c_i^T x + d_i ≤ 0,  i = 1, ..., l,   (1)
             a_j^T x ≤ b_j,  j = 1, ..., m,                 (2)
             [1, x^T; x, X] ⪰ 0,                            (3)

where Q_i · X = trace(Q_i X) is the inner product of the matrices Q_i and X. Direct SDP relaxations are often loose, except for some special cases, e.g., problems with only one quadratic constraint [17] and homogeneous quadratic problems with two quadratic constraints [19]. The optimal solution of the SDP relaxation can also be used to generate a feasible solution for the original QCQP problem. One of the most famous cases is the max cut problem, where a randomized rounding solution is expected to have an objective value that is at least 0.878 times the optimal objective value [11]. Further investigation shows that the SDP relaxation is the conic dual of
the Lagrangian dual of problem (P),

(L)   max  τ
      s.t. [Q_0, c_0/2; c_0^T/2, −τ] − Σ_{i=1}^{l} λ_i [Q_i, c_i/2; c_i^T/2, d_i] − Σ_{j=1}^{m} μ_j [0, a_j/2; a_j^T/2, −b_j] ⪰ 0,
           λ_i ≥ 0, i = 1, ..., l,  μ_j ≥ 0, j = 1, ..., m,

also known as Shor's relaxation [16]. Strong duality holds for (SDP) when (SDP) is bounded from below and the Slater condition holds for (SDP).

We next review valid inequalities that have been considered in the literature to strengthen (SDP). Sherali and Adams [15] first introduced the "reformulation-linearization technique" (RLT) to formulate a linear programming relaxation for problem (P). The RLT [15] linearizes the product of any pair of linear constraints, i.e.,

(b_i − a_i^T x)(b_j − a_j^T x) = b_i b_j − (b_j a_i^T + b_i a_j^T) x + a_i^T x x^T a_j ≥ 0.

A tighter relaxation for problem (P) can be obtained by enhancing the (SDP) relaxation with the RLT constraints:

(SDPRLT)
min  Q_0 · X + c_0^T x
s.t. (1), (2), (3),
     a_i a_j^T · X + b_i b_j − b_j a_i^T x − b_i a_j^T x ≥ 0,  ∀ 1 ≤ i < j ≤ m.   (4)
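The RLT constraint (4) is valid because it is exactly the linearization of the product of two nonnegative linear slacks under the exact lifting X = xx^T. A quick numerical check of this validity, illustrative only and with randomly generated data, is:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((m, n))          # rows a_j^T
x = rng.standard_normal(n)
b = A @ x + rng.uniform(0.1, 1.0, m)     # make x strictly feasible: a_j^T x <= b_j

X = np.outer(x, x)                       # exact lifting X = x x^T
for i in range(m):
    for j in range(i + 1, m):
        ai, aj, bi, bj = A[i], A[j], b[i], b[j]
        # linearized product (4): a_i a_j^T . X + b_i b_j - b_j a_i^T x - b_i a_j^T x
        rlt = ai @ X @ aj + bi * bj - bj * (ai @ x) - bi * (aj @ x)
        # equals the product of the two nonnegative slacks
        assert np.isclose(rlt, (bi - ai @ x) * (bj - aj @ x))
        assert rlt >= 0
```

On the relaxed feasible set, X is no longer forced to equal xx^T, so (4) cuts off lifted points that violate this product structure.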
Anstreicher [1] provided a theoretical analysis of why RLT constraints successfully remove a large portion of the feasible region of the relaxation, and suggested that a combination of SDP and RLT constraints leads to a tighter bound.

From now on, we assume that Q_i is not a zero matrix for i = 1, ..., l. We further partition the quadratic constraints into the following two groups:

C = {i : Q_i is positive semidefinite, i = 1, ..., l},
N = {i : Q_i is not positive semidefinite, i = 1, ..., l},

and denote by k (k ≤ l) the cardinality of C. Sturm and Zhang [17] showed that combining the so-called SOC-RLT constraints with the basic SDP relaxation solves exactly the problem of minimizing a quadratic objective function subject to one convex quadratic constraint and one linear constraint. More specifically, they rewrote the convex quadratic constraint as a second order cone (SOC) constraint and linearized the product of the SOC and linear constraints. It has been shown in [7] and [17] that SOC-RLT constraints can be used to strengthen the convex relaxation (SDPRLT) for general QCQP problems. To obtain the SOC-RLT valid inequality, we first decompose a positive semidefinite matrix Q_i as Q_i = B_i^T B_i, i ∈ C, and rewrite the convex quadratic constraint in an SOC
form, i.e.,

x^T Q_i x + c_i^T x + d_i ≤ 0  ⟺  ‖B_i x‖^2 ≤ −d_i − c_i^T x  (which requires −d_i − c_i^T x ≥ 0)
⟺  ‖( B_i x ; (1/2)(c_i^T x + d_i + 1) )‖ ≤ (1/2)(−d_i − c_i^T x + 1).   (5)
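The squared-norm-to-SOC rewriting used in (5) rests on the identity ‖y‖^2 ≤ s ⟺ ‖(y; (s−1)/2)‖ ≤ (s+1)/2 for s ≥ 0. A quick numerical sanity check (illustrative only):

```python
import numpy as np

def soc_form_holds(y, s):
    """SOC reformulation of ||y||^2 <= s used in (5)."""
    lhs = np.sqrt(np.dot(y, y) + ((s - 1.0) / 2.0) ** 2)
    return lhs <= (s + 1.0) / 2.0 + 1e-12

rng = np.random.default_rng(0)
for _ in range(1000):
    y = rng.standard_normal(3)
    s = rng.uniform(0.0, 10.0)
    # the SOC constraint holds exactly when ||y||^2 <= s
    assert soc_form_holds(y, s) == (np.dot(y, y) <= s + 1e-12)
```

Squaring both sides makes the equivalence clear: ‖y‖^2 + (s−1)^2/4 ≤ (s+1)^2/4 simplifies to ‖y‖^2 ≤ s.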
Multiplying the linear term b_j − a_j^T x ≥ 0 on both sides of the above SOC constraint yields the valid inequality

‖( B_i x (b_j − a_j^T x) ; (1/2)(1 + d_i + c_i^T x)(b_j − a_j^T x) )‖ ≤ (1/2)(b_j − a_j^T x)(1 − d_i − c_i^T x).

Linearization of the above inequality (replacing x x^T by X) gives the following SOC-RLT constraint:

‖( B_i (b_j x − X a_j) ; (1/2)(−c_i^T X a_j + (b_j c_i^T − d_i a_j^T − a_j^T) x + (1 + d_i) b_j) )‖
  ≤ (1/2)(c_i^T X a_j + (d_i a_j^T − a_j^T − b_j c_i^T) x + (1 − d_i) b_j),  i ∈ C, j = 1, ..., m.   (6)

Enhancing (SDPRLT) with the SOC-RLT constraints gives rise to a tighter relaxation:

(SDPSOC-RLT)   min  Q_0 · X + c_0^T x
               s.t. (1), (2), (3), (4), (6).

Recently, Burer and Yang [9] demonstrated that the SDP+RLT+(SOC-RLT) relaxation has no gap for an extended trust region problem of minimizing a quadratic function subject to a unit ball constraint and multiple linear constraints, where the linear constraints do not intersect with each other in the interior of the ball.

Stimulated by the construction of the SOC-RLT constraints, the authors in [12] derived the GSRT constraints from nonconvex quadratic constraints and linear constraints. They first decompose each indefinite matrix in the quadratic constraints according to the signs of its eigenvalues, i.e., Q_i = L_i^T L_i − M_i^T M_i, i ∈ N, where L_i corresponds to the positive eigenvalues and M_i to the negative eigenvalues. One such decomposition is the spectral decomposition

Q_i = Σ_{j=1}^{r} λ_ij v_ij v_ij^T + Σ_{j=p+1}^{n} λ_ij v_ij v_ij^T,

where λ_i1 ≥ λ_i2 ≥ ··· ≥ λ_ir > 0 > λ_{i,p+1} ≥ ··· ≥ λ_in, 0 ≤ r ≤ p < n, and correspondingly L_i = (√λ_i1 v_i1, ..., √λ_ir v_ir)^T and M_i = (√(−λ_{i,p+1}) v_{i,p+1}, ..., √(−λ_in) v_in)^T. They introduced an augmented variable z_i to reformulate problem (P) as follows:

(RP)
min  x^T Q_0 x + c_0^T x
s.t. x^T Q_i x + c_i^T x + d_i ≤ 0,  i = 1, ..., l,
     ‖( L_i x ; (1/2)(c_i^T x + d_i + 1) )‖ ≤ z_i,  i ∈ N,   (7)
     ‖( M_i x ; (1/2)(c_i^T x + d_i − 1) )‖ = z_i,  i ∈ N,   (8)
     a_j^T x ≤ b_j,  j = 1, ..., m.
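The eigenvalue-sign splitting Q_i = L_i^T L_i − M_i^T M_i above can be computed and verified numerically. A minimal sketch, illustrative only, using numpy's `eigh`:

```python
import numpy as np

def split_indefinite(Q, tol=1e-12):
    """Split a symmetric indefinite Q into Q = L^T L - M^T M via the
    spectral decomposition: L collects sqrt(lambda_j) v_j^T over the
    positive eigenvalues, M collects sqrt(-lambda_j) v_j^T over the
    negative ones."""
    w, V = np.linalg.eigh(Q)                  # Q = V diag(w) V^T
    pos, neg = w > tol, w < -tol
    L = (np.sqrt(w[pos]) * V[:, pos]).T       # rows sqrt(l_j) v_j^T
    M = (np.sqrt(-w[neg]) * V[:, neg]).T
    return L, M

rng = np.random.default_rng(0)
S = rng.standard_normal((5, 5))
Q = (S + S.T) / 2                             # symmetric, generically indefinite
L, M = split_indefinite(Q)
assert np.allclose(L.T @ L - M.T @ M, Q)
```

With this split, x^T Q_i x = ‖L_i x‖^2 − ‖M_i x‖^2, which is exactly the difference of squared norms used to build (7) and (8).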
Denote by [X, S; S^T, Z] the lifting of (x; z)(x^T, z^T). We then relax the intractable nonconvex constraint

[X, S; S^T, Z] = (x; z)(x^T, z^T)

to [X, S; S^T, Z] ⪰ (x; z)(x^T, z^T), which is equivalent, by the Schur complement, to the following LMI:

[1, x^T, z^T; x, X, S; z, S^T, Z] ⪰ 0.

Multiplying b_j − a_j^T x into ‖( L_i x ; (1/2)(c_i^T x + d_i + 1) )‖ ≤ z_i, we further get

‖( L_i x (b_j − a_j^T x) ; (1/2)(c_i^T x + d_i + 1)(b_j − a_j^T x) )‖ ≤ z_i (b_j − a_j^T x),

i.e.,

‖( L_i b_j x − L_i x x^T a_j ; (1/2)(c_i^T (b_j x − x x^T a_j) + (d_i + 1)(b_j − a_j^T x)) )‖ ≤ z_i b_j − z_i x^T a_j.

The linearization of the above formula then gives rise to

‖( L_i b_j x − L_i X a_j ; (1/2)(c_i^T (b_j x − X a_j) + (d_i + 1)(b_j − a_j^T x)) )‖ ≤ z_i b_j − S_{·,i}^T a_j.   (9)

Since the equality constraint (8) is nonconvex and intractable, relaxing (8) to an inequality yields the following tractable SOC constraint:

‖( M_i x ; (1/2)(c_i^T x + d_i − 1) )‖ ≤ z_i.   (10)

Similarly, linearizing the product of (10) and b_j − a_j^T x gives rise to the following valid inequalities:

‖( M_i b_j x − M_i X a_j ; (1/2)(c_i^T (b_j x − X a_j) + (d_i − 1)(b_j − a_j^T x)) )‖ ≤ z_i b_j − S_{·,i}^T a_j.   (11)

We also linearize the quadratic form of (8),

‖( M_i x ; (1/2)(c_i^T x + d_i − 1) )‖^2 = z_i^2,

to the tractable linearization

Z_{i−k,i−k} = X · M_i^T M_i + (1/4)(c_i c_i^T · X + (d_i − 1)^2 + 2 c_i^T x (d_i − 1)).   (12)

Finally, (7), (9), (10), (11) and (12) together make up the GSRT constraints. With the GSRT constraints, we strengthen (SDPRLT) to the following tighter relaxation:

(SDPGSRT-A)   min  Q_0 · X + c_0^T x
              s.t. (1), (2), (4), (6), (7), (9), (10), (11), (12),
                   [1, x^T, z^T; x, X, S; z, S^T, Z] ⪰ 0.
The following theorem, which shows the relationship among all the above convex relaxations, follows immediately from the nested inclusion relationship of the feasible regions of this sequence of relaxations.

Theorem 1. v(P) ≥ v(SDPGSRT-A) ≥ v(SDPSOC-RLT) ≥ v(SDPRLT) ≥ v(SDP).
A natural extension of GSRT is to apply a similar idea to linearize the product of a pair of SOC constraints. From the above paragraphs, we see that SOC constraints can be generated from both convex and nonconvex (by adding an augmented variable z_i) constraints; denote them by

‖C^i x + ξ^i‖ ≤ l_i(x, z),   (13)
where l_i(x, z) is a linear function of x and z. Multiplying two SOC constraints yields the valid inequality

‖C^s x x^T (C^t)^T + C^s x (ξ^t)^T + ξ^s x^T (C^t)^T + ξ^s (ξ^t)^T‖_F ≤ l_s l_t.   (14)

Linearizing (14) yields the following constraint, termed the SOC-SOC-RLT (SST) constraint:
‖C^s X (C^t)^T + C^s x (ξ^t)^T + ξ^s x^T (C^t)^T + ξ^s (ξ^t)^T‖_F ≤ β_{s,t},

where β_{s,t}(X, S, Z) = (ζ^s)^T X ζ^t + (ζ^s)^T S η^t + (ζ^t)^T S η^s + (η^s)^T Z η^t + (θ^s ζ^t + θ^t ζ^s)^T x + (θ^s η^t + θ^t η^s)^T z + θ^s θ^t is a linear function of the variables x, z, X, S, Z, linearized from l_s(x, z) l_t(x, z).

We next review a recently proposed family of valid inequalities, the KSOC valid inequalities, obtained by linearizing Kronecker products of semidefinite matrices derived from valid SOC constraints, which were first proposed in the recent work [2]. Anstreicher [2] introduced a new kind of constraint with an RLT-like technique for the well-known CDT problem [10],

min  x^T B x + b^T x
s.t. ‖x‖ ≤ 1,  ‖Ax + c‖ ≤ 1,

where B is an n × n symmetric matrix and A is an m × n matrix with full row rank. Using the Schur complement, the two quadratic constraints can be reformulated as the following LMIs:

[I, x; x^T, 1] ⪰ 0   and   [I, Ax + c; (Ax + c)^T, 1] ⪰ 0.   (15)

Anstreicher [2] proposed a valid LMI by linearizing the Kronecker product of the above two matrices, because the Kronecker product of any two positive semidefinite matrices is positive semidefinite. A drawback of directly linearizing the Kronecker product is that the matrix dimension, n^2 × n^2, is too high for
computation. To reduce the large dimension of the Kronecker matrix, he further proposed KSOC cuts to handle the problem of dimensionality. This idea can in fact be extended to the following two semidefinite matrices,

[l_s(x, z) I_p, h_s(x); (h_s(x))^T, l_s(x, z)]   and   [l_t(x, z) I_q, h_t(x); (h_t(x))^T, l_t(x, z)],

which are derived from (and equivalent to) the GSOC constraints in (13) by the Schur complement, where h_j(x) = C^j x + ξ^j, j = s, t. Linearizing the above Kronecker product yields the KSOC valid inequalities.
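The key fact used here, that the Kronecker product of two positive semidefinite matrices is positive semidefinite, is easy to check numerically; its eigenvalues are exactly the pairwise products of the factors' eigenvalues (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T                      # B B^T is always PSD

P, Q = random_psd(3), random_psd(4)
K = np.kron(P, Q)                       # 12 x 12 Kronecker product

# all eigenvalues of the Kronecker product are products of
# eigenvalues of the factors, hence nonnegative
assert np.all(np.linalg.eigvalsh(K) >= -1e-9)
ep, eq = np.linalg.eigvalsh(P), np.linalg.eigvalsh(Q)
prods = np.sort(np.outer(ep, eq).ravel())
assert np.allclose(np.sort(np.linalg.eigvalsh(K)), prods)
```

The n^2 × n^2 size visible in `K` is exactly the dimensionality problem that the KSOC cuts are designed to avoid.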
3 Concluding Remark

In this survey paper, we have reviewed various valid inequalities to tighten the SDP relaxations for nonconvex QCQP problems. In fact, we can further rewrite the objective function as min τ and add a new constraint x^T Q_0 x + c_0^T x ≤ τ with a new variable τ. The original problem is then equivalent to minimizing τ, and all the techniques developed in this paper can be applied to the new constraint x^T Q_0 x + c_0^T x ≤ τ to achieve a tighter lower bound. A drawback of the convex relaxations in this paper is their large number of valid inequalities, which prevents efficient computation. A future direction is to investigate how to find the valid inequalities that are violated most and add them dynamically when solving the original problem.
References

1. Anstreicher, K.: Semidefinite programming versus the reformulation-linearization technique for nonconvex quadratically constrained quadratic programming. J. Global Optim. 43(2–3), 471–484 (2009)
2. Anstreicher, K.: Kronecker product constraints with an application to the two-trust-region subproblem. SIAM J. Optim. 27(1), 368–378 (2017)
3. Anstreicher, K., Chen, X., Wolkowicz, H., Yuan, Y.X.: Strong duality for a trust-region type relaxation of the quadratic assignment problem. Linear Algebr. Its Appl. 301(1–3), 121–136 (1999)
4. Anstreicher, K., Wolkowicz, H.: On Lagrangian relaxation of quadratic matrix constraints. SIAM J. Matrix Anal. Appl. 22(1), 41–55 (2000)
5. Beck, A., Eldar, Y.C.: Strong duality in nonconvex quadratic optimization with two quadratic constraints. SIAM J. Optim. 17(3), 844–860 (2006)
6. Burer, S., Anstreicher, K.: Second-order-cone constraints for extended trust-region subproblems. SIAM J. Optim. 23(1), 432–451 (2013)
7. Burer, S., Saxena, A.: The MILP road to MIQCP. In: Mixed Integer Nonlinear Programming, pp. 373–405. Springer (2012)
8. Burer, S., Vandenbussche, D.: A finite branch-and-bound algorithm for nonconvex quadratic programming via semidefinite relaxations. Math. Program. 113(2), 259–282 (2008)
9. Burer, S., Yang, B.: The trust region subproblem with non-intersecting linear constraints. Math. Program. 149(1–2), 253–264 (2013)
10. Celis, M., Dennis, J., Tapia, R.: A trust region strategy for nonlinear equality constrained optimization. Numer. Optim. 1984, 71–82 (1985)
11. Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42(6), 1115–1145 (1995)
12. Jiang, R., Li, D.: Convex relaxations with second order cone constraints for nonconvex quadratically constrained quadratic programming (2016)
13. Linderoth, J.: A simplicial branch-and-bound algorithm for solving quadratically constrained quadratic programs. Math. Program. 103(2), 251–282 (2005)
14. Pardalos, P.M., Vavasis, S.A.: Quadratic programming with one negative eigenvalue is NP-hard. J. Global Optim. 1(1), 15–22 (1991)
15. Sherali, H.D., Adams, W.P.: A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems, vol. 31. Springer Science & Business Media (2013)
16. Shor, N.Z.: Quadratic optimization problems. Sov. J. Comput. Syst. Sci. 25(6), 1–11 (1987)
17. Sturm, J.F., Zhang, S.: On cones of nonnegative quadratic functions. Math. Oper. Res. 28(2), 246–267 (2003)
18. Vavasis, S.A.: Quadratic programming is in NP. Inf. Process. Lett. 36(2), 73–77 (1990)
19. Ye, Y., Zhang, S.: New results on quadratic minimization. SIAM J. Optim. 14(1), 245–267 (2003)
Solving a Type of the Tikhonov Regularization of the Total Least Squares by a New S-Lemma

Huu-Quang Nguyen1,2, Ruey-Lin Sheu2(B), and Yong Xia3

1 Department of Mathematics, Vinh University, Vinh, Vietnam
[email protected]
2 Department of Mathematics, National Cheng Kung University, Tainan, Taiwan
[email protected]
3 State Key Laboratory of Software Development Environment, School of Mathematics and System Sciences, Beihang University, Beijing, China
[email protected]
Abstract. We present a new S-lemma with two quadratic equalities and use it to minimize a special type of polynomial of degree 4. As a result, by the Dinkelbach approach with two SDPs (semidefinite programs), the minimum value and the minimum solution of the Tikhonov regularization of the total least squares problem with L = I can be nicely obtained.
Keywords: S-lemma with equality · Tikhonov regularization · Total least squares · Dinkelbach method

1 Introduction
The well-known S-lemma due to Yakubovich [15] is a fundamental tool in control theory, optimization and robust analysis. Given two quadratic functions f(x) = x^T P x + 2p^T x + p_0 and g(x) = x^T Q x + 2q^T x + q_0 with symmetric matrices P and Q, the S-lemma asserts that, if g(x) ≤ 0 satisfies Slater's condition (i.e., g(x̄) < 0 for some x̄), [...] where ρ > 0 is a penalty parameter, consider the following problem:

(TRTLS)   min_{E,r,x} {‖E‖^2 + ‖r‖^2 + ρ‖Lx‖^2 : (A + E)x = b + r},
where E ∈ R^{m×n} and r, x ∈ R^n. Then (TRTLS) can be transformed into the following fractional problem:

min_{E,r,x} {‖E‖^2 + ‖r‖^2 + ρ‖Lx‖^2 : (A + E)x = b + r}
  = min_x min_{E,r} {‖E‖^2 + ‖r‖^2 + ρ‖Lx‖^2 : (A + E)x = b + r}
  = min_{x∈R^n} ( ‖Ax − b‖^2 / (‖x‖^2 + 1) + ρ‖Lx‖^2 ).   (3)
For L = I, Beck and Ben-Tal [1] used the Dinkelbach method [4], combined with a bisection search, to solve (3). We show, in Sect. 3, that (3) can be resolved by solving two SDPs, one SDP to obtain its optimal value and the other for the optimal solution; no bisection is needed. The remainder of this study is organized as follows. In Sect. 2, we provide the proof of Theorem 1 and solve problem (PoD4). In Sect. 3, we use the Dinkelbach method together with two SDPs to solve (TRTLS) for the case L = I. Finally, we give a short discussion of future extensions in Sect. 4.
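The inner minimization in (3), eliminating (E, r) for fixed x, admits a closed-form least-norm solution whose objective value is exactly the ratio term. This can be verified numerically; the following is an illustrative check of that identity only (with the ρ‖Lx‖^2 term omitted, since it does not involve E or r):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

res = A @ x - b                         # residual of the uncorrected system
denom = x @ x + 1.0
E = -np.outer(res, x) / denom           # minimal-norm correction of A
r = res / denom                         # minimal-norm correction of b

# feasibility: (A + E) x = b + r
assert np.allclose((A + E) @ x, b + r)
# attained objective equals the ratio term in (3)
assert np.isclose(np.sum(E**2) + r @ r, (res @ res) / (x @ x + 1.0))
```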
2 Proof for the New Version of the S-Lemma
The proof uses an important result by Polyak [8]: under Condition (1), the joint numerical range (f(x), g(x)) is a convex subset of R^2.

Proof. (G1) ⟹ (G2): By a result in [8, Theorem 2.2], the set

D1 = {(z1, z2) | f(x) − z1 = 0, g(x) − z2 = 0, x ∈ R^n} ⊂ R^2   (4)

is convex. Let

D2 = {(z1, z2) | z1 a + z2 b ≤ c},   (5)
and it is easy to see that D2 ⊂ R^2 is also convex. The statement (G1) can then be recast as: (z1, z2) ∈ D1 ∩ D2 ⟹ F(z) − γ = z^T Θ z + θ^T z − γ ≥ 0. Equivalently, (D1 ∩ D2) ∩ {(z1, z2) | F(z1, z2) − γ < 0} = ∅. By Condition (2), Θ ⪰ 0, so the set {(z1, z2) | F(z1, z2) − γ < 0} is convex. Therefore, there exist ᾱ, β̄ such that the line {(z1, z2) | ᾱ z1 + β̄ z2 + γ̄ = 0} separates D1 ∩ D2 and {(z1, z2) | F(z1, z2) − γ < 0}. Without loss of generality, we assume that

ᾱ z1 + β̄ z2 + γ̄ ≥ 0,  ∀ (z1, z2) ∈ D1 ∩ D2,   (6)
ᾱ z1 + β̄ z2 + γ̄ < 0,  ∀ (z1, z2) ∈ {(z1, z2) | F(z1, z2) − γ < 0}.   (7)

From (7), it follows that

ᾱ z1 + β̄ z2 + γ̄ ≥ 0 ⟹ F(z1, z2) − γ ≥ 0.

By the S-lemma, there exists t ≥ 0 such that

F(z1, z2) − γ − t(ᾱ z1 + β̄ z2 + γ̄) ≥ 0,  ∀ (z1, z2) ∈ R^2.   (8)

If t = 0, then (G2) holds with α = β = 0, μ = 0. If t > 0, then by (6) the system

t ᾱ z1 + t β̄ z2 + t γ̄ < 0,  z1 a + z2 b − c ≤ 0,  (z1, z2) ∈ D1

is not solvable. By the Farkas theorem (see [9, Theorem 21.1], [10, Sect. 6.10], [6, Theorem 2.1]), there exists μ ∈ R^m_+ such that

t ᾱ z1 + t β̄ z2 + t γ̄ + μ^T (z1 a + z2 b − c) ≥ 0,  ∀ (z1, z2) ∈ D1.

Therefore, we have

t ᾱ f(x) + t β̄ g(x) + t γ̄ + μ^T (f(x) a + g(x) b − c) ≥ 0,  ∀ x ∈ R^n.   (9)

Let α = μ^T a + t ᾱ and β = μ^T b + t β̄. Then

(9) ⟺ (μ^T a + t ᾱ) f(x) + (μ^T b + t β̄) g(x) + t γ̄ − μ^T c ≥ 0
    ⟺ α f(x) + β g(x) + (μ^T a + t ᾱ − α) z1 + (μ^T b + t β̄ − β) z2 + t γ̄ − μ^T c ≥ 0
    ⟺ α (f(x) − z1) + β (g(x) − z2) + μ^T (z1 a + z2 b − c) ≥ −t ᾱ z1 − t β̄ z2 − t γ̄.   (10)

Combining (8) and (10), we get (G2). (G2) ⟹ (G1): trivial.

2.1 Optimizing a Class of Polynomials of Degree 4 (PoD4)
Applying Theorem 1, we can now solve problem (PoD4) by solving the SDP (11) below, under the assumption that f, g satisfy Condition (1) and θ1, θ2, θ3 ∈ R satisfy Condition (2):

min_{x∈R^n} G(x) = θ1 f(x)^2 + 2 θ2 f(x) g(x) + θ3 g(x)^2 + θ4 f(x) + θ5 g(x)
  = min_{f(x)=z1, g(x)=z2} F(z1, z2)
  = max { γ : {(z1, z2, x) | f(x) = z1, g(x) = z2, F(z1, z2) − γ < 0} = ∅ }
  = max { γ : (f(x) = z1, g(x) = z2) ⟹ F(z1, z2) − γ ≥ 0 }
  = max_{γ, α, β ∈ R} { γ : F(z1, z2) − γ + α (f(x) − z1) + β (g(x) − z2) ≥ 0 }
  = max_{γ, α, β ∈ R} { γ : [ θ1, θ2, [0], (θ4 − α)/2 ;
                              θ2, θ3, [0], (θ5 − β)/2 ;
                              [0]^T, [0]^T, αP + βQ, αp + βq ;
                              (θ4 − α)/2, (θ5 − β)/2, (αp + βq)^T, αp_0 + βq_0 − γ ] ⪰ 0 }.   (11)
3 Dinkelbach Method for Solving (TRTLSI)
It is interesting to see that problem (PoD4) allows us to solve the total least squares problem with Tikhonov identical regularization (see [1,16]) by solving two SDPs. Let us consider the following quadratic fractional problem:

min_{x∈R^n} [ (θ1 f(x)^2 + θ4 f(x) + θ) / (g(x) + γ) + θ3 g(x) + 2 θ2 f(x) ]
  = min_{x∈R^n} (θ1 f(x)^2 + 2 θ2 f(x) g(x) + θ3 g(x)^2 + (θ4 + 2γθ2) f(x) + γθ3 g(x) + θ) / (g(x) + γ)
  = min_{x∈R^n} h(x) / l(x),   (12)

where f, g are quadratic functions and θ1, θ2, θ3 ∈ R satisfy the condition

[θ1, θ2; θ2, θ3] ⪰ 0,  Q ⪰ 0  and  γ > 0.   (C2)

In fact, problem (12) covers the problem (TRTLSI) in [1,16] as a special case: with γ = 1, θ = 0, θ1 = 0, θ2 = 0, θ3 = ρ, θ4 = 1, f(x) = ‖Ax − b‖^2 and g(x) = ‖x‖^2, (12) reduces to (TRTLSI). Notice that (12) is a single-ratio fractional programming problem h(x)/l(x). It can be solved by the well-known Dinkelbach method [4]. To this end, define

π(t) = min_{x∈R^n} {h(x) − t l(x)}
     = min_{x∈R^n} {θ1 f(x)^2 + 2 θ2 f(x) g(x) + θ3 g(x)^2 + (θ4 + 2γθ2) f(x) + (γθ3 − t) g(x) + θ − tγ}.
x∈Rn
h(x) } = t∗ l(x)
if and only if
min {h(x) − t∗ l(x)} = π(t∗ ) = 0
x∈Rn
(13)
Since π(t) is strictly decreasing, then we conclude that t∗ is maximum of all t such that π(t) ≥ 0. Then, we can recast (12) to become t∗ = max{t : π(t) ≥ 0} = max {t : minn (h(x) − tl(x) ≥ 0)} t∈R
t∈R
x∈R
= max {t : h(x) − tl(x) ≥ 0, ∀x ∈ Rn } t∈R
= max {t : θ1 f (x)2 + 2θ2 f (x)g(x) + θ3 g(x)2 + t∈R
+(θ4 + 2γθ2 )f (x) + (γθ3 − t)g(x) + θ − tγ ≥ 0, ∀x ∈ Rn } = max {t : θ1 z12 + 2θ2 z1 z2 + θ3 z22 + (θ4 + 2γθ2 )z1 + t∈R
+(γθ3 − t)z2 + θ − tγ ≥ 0, (z1 = f (x), z2 = g(x))} = max t : θ1 z12 + 2θ2 z1 z2 + θ3 z22 + (θ4 + 2 θ2 )z t, α, β∈R
+(γθ3 − t)z2 + θ − tγ + α(f (x) − z1 ) + β(g(x) − z2 ) ≥ 0
(14)
226
H.-Q. Nguyen et al.
where the last equation (14) is due to Theorem 1 by re-defining the notations as θ4 + 2γθ2 := θ4 , γθ3 − t := θ5 , θ − tγ := −γ. Moreover, we can write (14) as the following SDP: ⎛ θ4 +2γθ2 −α ⎞ θ1 θ2 2 [0] γθ3 −t−β ⎟ ⎜ θ2 θ3 ⎜ ⎟ 0, 2 (15) t∗ = ⎝ [0] αP + βQ αp + βq ⎠ θ4 +2γθ2 −α γθ3 −t−β αpT + βq T ξ 2 2 where ξ = αp0 + βq0 + θ − tγ. In other words, the optimal value t∗ of (12), and thus the optimal value of the problem (TRTLSI), can be computed through solving the SDP (15). After getting the optimal value t∗ of (12) from (15), by (13), we can find the corresponding optimal solution x∗ by solving the following problem min {h(x) − t∗ l(x)}
x∈Rn
(16)
where h(x) − t∗ l(x) = θ1 f (x)2 + 2θ2 f (x)g(x) + θ3 g(x)2 + (θ4 + 2γθ2 )f (x) + (γθ3 − t∗ )g(x) + θ − t∗ γ. Since (16) is a special form of (PoD4), therefore we are able to get x∗ by solving another SDP similar to (11).
4
Discussion
In this paper, we propose a set of sufficient conditions (1)–(2) under which (G1 ) ∼ (G2 ). It can be easily verified that, when m = 1, a = 1, b = c = θ1 = . . . = θ4 = γ = 0, θ5 = 1, (G1 ) ∼ (G2 ) reduces to (S1 ) ∼ (S2 ) and we get the classical S-lemma. Similarly, (G1 ) ∼ (G2 ) covers (I1 ) ∼ (I2 ) with m = 2, a = (1, −1)T , b = (0, 0)T , c = (v0 , −u0 )T , θ1 = θ2 = θ3 = θ4 = γ = 0 and θ5 = 1. Moreover, if we further have u0 = v0 = 0, (G1 ) ∼ (G2 ) becomes (E1 ) ∼ (E2 ). In other words, if the sufficient conditions (1)–(2) van be removed, (G1 ) ∼ (G2 ) would be the most general results summarizing all previous results on S-lemma so far.
References 1. Beck, A., Ben-Tal, A.: On the solution of the Tikhonov regularization of the total least squares problem. SIAM J. Optim. 17(1), 98–118 (2006) 2. Beck, A., Eldar, Y.C.: Strong duality in nonconvex quadratic optimization with two quadratic constraint. SIAM J. Optim. 17(3), 844–860 (2006) 3. Derinkuyu, K., Pınar, M.C ¸ .: On the S-procedure and some variants. Math. Methods Oper. Res. 64(1), 55–77 (2006) 4. Dinkelbach, W.: On nonlinear fractional programming. Manag. Sci. 13, 492–498 (1967) 5. Nguyen, V.B., Sheu, R.L., Xia, Y.: An SDP approach for quadratic fractional problems with a two-sided quadratic constraint. Optim. Methods Softw. 31(4), 701–719 (2016)
Solving a Type of the Tikhonov Regularization
227
6. Polik, I., Terlaky, T.: A survey of the S-lemma. SIAM Rev. 49(3), 371–418 (2007) 7. Pong, T.K., Wolkowicz, H.: The generalized trust region subprobelm. Comput. Optim. Appl. 58, 273–322 (2014) 8. Polyak, B.T.: Convexity of quadratic transformations and its use in control and optimization. J. Optim. Theory Appl. 99(3), 553–583 (1998) 9. Rockefellar, R.T.: Convex Analysis. Princeton University Press (1970) 10. Stoer, J., Witzgall, C.: Convexity and Optimization in Finite Dimensions, vol. I. Springer-Verlag, Heidelberg (1970) 11. Stern, R., Wolkowicz, H.: Indefinite trust region subproblems and nonsymmetric eigenvalue perturbations. SIAM J. Optim. 5(2), 286–313 (1995) 12. Wang, S., Xia, Y.: Strong duality for generalized trust region subproblem: S-lemma with interval bounds. Optim. Lett. 9(6), 1063–1073 (2015) 13. Xia, Y., Wang, S., Sheu, R.L.: S-lemma with equality and its applications. Math. Program. Ser. A. 156(1), 513–547 (2016) 14. Tuy, H., Tuan, H.D.: Generalized S-lemma and strong duality in nonconvex quadratic programming. J. Global Optim. 56, 1045–1072 (2013) 15. Yakubovich, V.A.: S-procedure in nonlinear control theory. Vestn. Leningr. Univ. 1, 62–77 (1971). (in Russian) 16. Yang, M., Yong, X., Wang, J., Peng, J.: Efficiently solving total least squares with Tikhonov identical regularization. Comput. Optim. Appl. 70(2), 571–592 (2018)
Solving Mathematical Programs with Complementarity Constraints with a Penalization Approach Lina Abdallah1(B) , Tangi Migot2 , and Mounir Haddou3 1
3
Lebanese University, Tripoli, Lebanon lina [email protected] 2 University of Guelph, Guelph, ON, Canada [email protected] INSA-IRMAR, UMR-CNRS 6625, Rennes, France [email protected]
Abstract. In this paper, we consider mathematical problems with complementarity constraints. To solve it, we propose a penalization approach based on concave and nondecreasing functions. We give the link between the penalized problem and our original problem. This approach was already used in [3]. The main difference is that, we do not use any constraint qualification assumption. Some numerical results are presented to show the validity of this approach. Keywords: Constrained optimization Mpcc · Penalty function
1
· Nonlinear programming ·
Introduction
The Mathematical Program with Equilibrium Constraints (MPEC) is a constrained optimization problem in which the constraints include equilibrium constraints, such as variational inequalities or complementarity conditions. In this paper, we consider a special case of MPEC, the Mathematical Program with Complementarity Constraints (MPCC), in which the equilibrium constraints are complementarity constraints. MPCC is an important class of problems, since such problems arise frequently in applications in engineering design, economic equilibrium and multilevel games [18]. One main source of MPCC comes from bilevel programming problems, which have numerous applications in practice [27]. A way to solve a standard nonlinear programming problem is to solve its Karush-Kuhn-Tucker (KKT) system using numerical methods such as Newton-type methods. However, the classical MFCQ, which is very often used to guarantee convergence of algorithms, is violated at every feasible point when the MPCC is treated as a standard nonlinear programming problem; hence a local minimizer of the MPCC may not be a solution of the classical KKT system. This is partly due to the geometry of the complementarity constraint, which always has an empty relative interior.

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 228–237, 2020. https://doi.org/10.1007/978-3-030-21803-4_24
Solving Mathematical Programs with Complementarity Constraints
229
A wide range of numerical methods have been proposed to solve this problem, such as relaxation methods [5,6], interior-point methods [16,20,25], penalty methods [10,18,24], SQP methods [8], DC methods [23], filter methods [15] and Levenberg–Marquardt methods [14]. In this study, following [3], we study a penalization method to solve the MPCC. We regularize the complementarity constraints using the concave, nondecreasing functions introduced in [9], and then penalize the constraints. This approach allows us to consider the regularization parameter as a variable of the problem. We prove that every cluster point of the KKT points of the penalty problems yields a local minimum of the MPCC. We improve the result from [3] by proving a convergence theorem without any constraint qualification, thus removing a restrictive assumption. Numerical tests on randomly generated problems show the efficiency and robustness of this approach. This paper is organized as follows. In Sect. 2, we present some preliminaries on the smoothing functions and our problem formulation. In Sect. 3, we present our penalty method and give the link between the penalized problem and the original problem. The last section presents a set of numerical experiments concerning a simple number partitioning problem.
2 Preliminaries
In this section, we present some preliminaries concerning the regularization and approximation process. We consider the following problem:

$$(\tilde{P}) \quad \begin{cases} f^* = \min f(x, y) \\ \langle x, y \rangle = 0 \\ (x, y) \in D \end{cases}$$

where $f : \mathbb{R}^{2n} \to \mathbb{R}$ is continuously differentiable, $D = [0, v]^{2n}$ and $\langle \cdot, \cdot \rangle$ denotes the inner product on $\mathbb{R}^n$. We made this choice of $D$ only to simplify the exposition; any bounded set $D$ could be considered. Now, we reformulate this problem by using a smoothing technique. This technique has been studied in the context of complementarity problems [1,2] and uses a family of non-decreasing, continuously differentiable and concave functions $\theta$ satisfying

$$\theta(0) = 0 \quad \text{and} \quad \lim_{t \to +\infty} \theta(t) = 1.$$
One generic way to build such functions is to consider a non-increasing probability density function $f : \mathbb{R}_+ \to \mathbb{R}_+$ and take the corresponding cumulative distribution function

$$\theta(t) = \int_0^t f(x)\,dx, \quad t \ge 0.$$

By definition of $f$,

$$\lim_{t \to +\infty} \theta(t) = \int_0^{+\infty} f(x)\,dx = 1 \quad \text{and} \quad \theta(0) = \int_0^0 f(x)\,dx = 0.$$
230
L. Abdallah et al.
The hypothesis on $f$ gives the concavity of $\theta$. We introduce $\theta_\varepsilon(t) := \theta(t/\varepsilon)$ for $\varepsilon > 0$; this definition is similar to that of perspective functions in convex analysis. These functions satisfy $\theta_\varepsilon(0) = 0$ and, for all $t > 0$, $\lim_{\varepsilon \searrow 0} \theta_\varepsilon(t) = 1$.
Some interesting examples of this family, for $t \ge 0$, are:

$$\theta_\varepsilon^1(t) = \frac{t}{t + \varepsilon}, \quad \theta_\varepsilon^2(t) = 1 - e^{-t/\varepsilon}, \quad \theta_\varepsilon^3(t) = \frac{\log(1+t)}{\log(1+t+\varepsilon)}.$$
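The three example smoothing functions above are straightforward to implement. The following sketch (Python, used here only for illustration) evaluates them and checks the stated properties — $\theta_\varepsilon(0) = 0$, monotonicity in $t$, and $\theta_\varepsilon(t) \to 1$ as $\varepsilon \to 0$ for fixed $t > 0$:

```python
import math

# Three smoothing functions theta_eps from the text (t >= 0, eps > 0).
def theta1(t, eps):
    return t / (t + eps)

def theta2(t, eps):
    return 1.0 - math.exp(-t / eps)

def theta3(t, eps):
    return math.log(1.0 + t) / math.log(1.0 + t + eps)

for theta in (theta1, theta2, theta3):
    assert theta(0.0, 0.1) == 0.0              # theta_eps(0) = 0
    assert theta(2.0, 0.1) >= theta(1.0, 0.1)  # nondecreasing in t
    # For fixed t > 0, theta_eps(t) -> 1 as eps -> 0.
    assert abs(theta(1.0, 1e-9) - 1.0) < 1e-6
```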
Lemma 1. For all $x \in [0, v]$ and all $\varepsilon \in (0, \varepsilon_0]$, there exists $m > 0$ such that $|\partial_\varepsilon \theta_\varepsilon(x)| \le m/\varepsilon^2$.

Proof. Since $\theta_\varepsilon(x) := \theta(x/\varepsilon)$, we have $\partial_\varepsilon \theta_\varepsilon(x) = -\frac{x}{\varepsilon^2}\,\theta'(x/\varepsilon)$. By the concavity of $\theta$, for $x \ge 0$ we have $0 \le \theta'(x/\varepsilon) \le \theta'(0)$. Hence $-m/\varepsilon^2 \le -\frac{x}{\varepsilon^2}\,\theta'(x/\varepsilon) \le 0$ with $m = x\,\theta'(0)$.
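As a sanity check (not part of the proof), the bound of Lemma 1 can be verified numerically for $\theta(t) = t/(1+t)$, for which $\theta_\varepsilon(x) = x/(x+\varepsilon)$, $\partial_\varepsilon \theta_\varepsilon(x) = -x/(x+\varepsilon)^2$ and $\theta'(0) = 1$, so $m = x$:

```python
# Numerical check of |d theta_eps / d eps| <= m / eps^2 with m = x * theta'(0),
# for theta(t) = t / (1 + t), i.e. theta_eps(x) = x / (x + eps).
def d_eps_theta(x, eps):
    return -x / (x + eps) ** 2  # analytic derivative w.r.t. eps

theta_prime_0 = 1.0  # theta'(0) for theta(t) = t / (1 + t)
for x in (0.0, 0.1, 0.5, 1.0):
    for eps in (0.01, 0.1, 0.5):
        m = x * theta_prime_0
        assert abs(d_eps_theta(x, eps)) <= m / eps ** 2 + 1e-15
```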
Using the $\theta_\varepsilon$ functions, we regularize each complementarity constraint $x_i y_i = 0$ by $\theta_\varepsilon(x_i) + \theta_\varepsilon(y_i) \le 1$ $(i = 1, \ldots, n)$. We then transform these inequality constraints into equality constraints by introducing slack variables $e$, replacing $x_i y_i = 0$ by

$$(G_\varepsilon(x, y, e))_i := \theta_\varepsilon(x_i) + \theta_\varepsilon(y_i) + e_i - 1 = 0.$$
The formulation of our approach can be written as follows:

$$(P_\varepsilon) \quad \begin{cases} \min f(x, y) \\ G_\varepsilon(x, y, e) = 0 \\ (x, y, e) \in \tilde{D} \end{cases}$$

where $\tilde{D} = [0, v]^{2n} \times [0, 1]^n$. The limit problem $(P_\varepsilon)$ for $\varepsilon = 0$ is denoted $(P)$. Moreover, $(P)$ is equivalent to $(\tilde{P})$, see [2, Lemma 2.1].
3 A Penalization Approach
In this section, we consider a penalization of $(P_\varepsilon)$ through the following penalization function $f_\sigma : \tilde{D} \times [0, \bar{\varepsilon}] \to \mathbb{R}$:

$$f_\sigma(z, \varepsilon) := \begin{cases} f(x, y) & \text{if } \varepsilon = \Delta(z, \varepsilon) = 0, \\ f(x, y) + \frac{1}{2\varepsilon}\Delta(z, \varepsilon) + \sigma\beta(\varepsilon) & \text{if } \varepsilon > 0, \\ +\infty & \text{if } \varepsilon = 0 \text{ and } \Delta(z, \varepsilon) \ne 0, \end{cases}$$
where $z := (x, y, e)$ and the feasibility violation $\Delta$ is defined by $\Delta(z, \varepsilon) := \|G_\varepsilon(z)\|^2$. The function $\beta : [0, \bar{\varepsilon}] \to [0, \infty)$ is continuously differentiable on $(0, \bar{\varepsilon}]$ with $\beta(0) = 0$ ($\bar{\varepsilon}$ is fixed). This penalization function was introduced in [11] in the smooth case.

Remark 1. For all $z \in \tilde{D}$, $\Delta(z, 0) = 0$ $\Leftrightarrow$ $z$ is feasible for $(P)$ $\Leftrightarrow$ $z$ is feasible for $(\tilde{P})$.
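For concreteness, the penalization function $f_\sigma$ can be sketched as below (Python, for illustration only; `f_xy` stands for the objective value $f(x,y)$, `delta` for the feasibility violation $\Delta(z, \varepsilon)$, and $\beta(\varepsilon) = \sqrt{\varepsilon}$ is the choice used later in the numerical section):

```python
import math

def beta(eps):
    return math.sqrt(eps)  # beta(eps) = sqrt(eps), the choice used in Sect. 4

def f_sigma(f_xy, delta, eps, sigma):
    """Penalization function f_sigma(z, eps), given f(x, y) and Delta(z, eps)."""
    if eps > 0.0:
        return f_xy + delta / (2.0 * eps) + sigma * beta(eps)
    return f_xy if delta == 0.0 else math.inf

# eps = 0 and feasible: the penalty reduces to the objective.
assert f_sigma(2.0, 0.0, 0.0, 10.0) == 2.0
# eps = 0 and infeasible: the penalty is infinite.
assert f_sigma(2.0, 0.5, 0.0, 10.0) == math.inf
```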
Then, we consider the following penalized problem:

$$(P_\sigma) \quad \min f_\sigma(z, \varepsilon) \quad \text{s.t.} \quad (z, \varepsilon) \in \tilde{D} \times [0, \bar{\varepsilon}].$$

The term $\sigma\beta(\varepsilon)$ allows us to consider $\varepsilon$ as a new optimization variable and to minimize simultaneously over $z$ and $\varepsilon$. Let us now recall the Mangasarian–Fromovitz constraint qualification (MFCQ).

Definition 1 [21]. We say that MFCQ for $(P_\varepsilon)$ holds at $z \in \tilde{D}$ if $G_\varepsilon'(z)$ has full rank and there exists a vector $p \in \mathbb{R}^{3n}$ such that $G_\varepsilon'(z)p = 0$, where

$$p_i \begin{cases} > 0 & \text{if } z_i = 0 \\ < 0 & \text{if } z_i = w_i \end{cases} \quad (1)$$

with

$$w_i = \begin{cases} v & \text{if } i \in \{1, \ldots, 2n\} \\ 1 & \text{if } i \in \{2n+1, \ldots, 3n\}. \end{cases}$$
The following lemma proves that MFCQ is satisfied whenever $\varepsilon > 0$. This is a significant improvement with respect to [3], where this was a crucial assumption.

Lemma 2. Let $\varepsilon > 0$. Any $z$ feasible for $(P_\varepsilon)$ satisfies MFCQ.
Proof. (i) Let $z$ be a feasible point of $(P_\varepsilon)$. The matrix $G_\varepsilon'(z) \in \mathbb{R}^{n \times 3n}$, given by

$$G_\varepsilon'(z) = \begin{pmatrix} \theta_\varepsilon'(x_1) & \cdots & 0 & \theta_\varepsilon'(y_1) & \cdots & 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \theta_\varepsilon'(x_n) & 0 & \cdots & \theta_\varepsilon'(y_n) & 0 & \cdots & 1 \end{pmatrix},$$

is of full rank. (ii) We have to prove that there exists $p \in \mathbb{R}^{3n}$ such that $G_\varepsilon'(z)p = 0$ and $p_i$ satisfies (1). Let $p = (p_1, \ldots, p_{3n})$. $G_\varepsilon'(z)p = 0$ implies that
$$\theta_\varepsilon'(x_i)\,p_i + \theta_\varepsilon'(y_i)\,p_{n+i} + p_{2n+i} = 0, \quad \text{for } i = 1, \ldots, n. \quad (2)$$
The equality constraints $G_\varepsilon(z) = 0$ give $\theta_\varepsilon(x_j) + \theta_\varepsilon(y_j) + e_j - 1 = 0$, for $j = 1, \ldots, n$. So, we consider three cases:
1. $e_j = 1$;  2. $e_j = 0$;  3. $0 < e_j < 1$.

(1) In the first case, $e_j = 1$, so $x_j = y_j = 0$, since $\theta_\varepsilon$ vanishes only at 0. Substituting into (2), we obtain
$$\theta_\varepsilon'(0)(p_j + p_{n+j}) = -p_{2n+j}.$$
We can take $p_j = p_{n+j} = 1 > 0$ and $p_{2n+j} = -2\theta_\varepsilon'(0) < 0$ (since $\theta_\varepsilon'(0) > 0$), so MFCQ is verified in this case.

(2) In the second case, $e_j = 0$, we have to consider the subcases $x_j \ne 0, y_j \ne 0$; $x_j \ne 0, y_j = 0$; and $x_j = 0, y_j \ne 0$ (since $\theta_\varepsilon(x_j) \ne 0$ when $x_j \ne 0$); the resulting sign requirements are collected in Table 1, where "no" means that there is no constraint on $p_j$ or $p_{n+j}$.
(i) Taking $p_j = -1$, $p_{n+j} = -1$ in (2), we obtain $p_{2n+j} = \theta_\varepsilon'(v) + \theta_\varepsilon'(v) > 0$ (since $\theta_\varepsilon'(v) > 0$), and MFCQ is verified.
(2i) There are no constraints on $p_j$, $p_{n+j}$; only $p_{2n+j}$ must be positive. So MFCQ is verified.
(3i) Taking $p_j = -1$, $p_{2n+j} = 1$ in (2), we get $p_{n+j} = \dfrac{\theta_\varepsilon'(v) - 1}{\theta_\varepsilon'(y_j)}$. So MFCQ is verified.
(4i) Taking $p_j = 1$, $p_{2n+j} = 1$ in (2), we get $p_{n+j} = -\dfrac{\theta_\varepsilon'(0) + 1}{\theta_\varepsilon'(v)} < 0$. So MFCQ is verified.
(5i, 6i, 7i, 8i) As above, it is easy to see that MFCQ is verified.
(3) In the third case, $0 < e_j < 1$, we can consider the same subcases as in (2), with additionally no constraint on $p_{2n+j}$. In all these cases, MFCQ is verified.
The following theorem yields a condition to find a solution of $(P_\sigma)$; it also establishes a direct link to $(\tilde{P})$ (Table 1).

Theorem 1. Suppose that $\beta'(\varepsilon) \ge \beta_1 > 0$ for $0 < \varepsilon < \bar{\varepsilon}$. Let $(z^k, \varepsilon^k)$ be a KKT point of $(P_{\sigma_k})$ corresponding to $\sigma = \sigma_k$ with $\sigma_k \uparrow \infty$ as $k \to \infty$, and let $(z^*, \varepsilon^*)$ be a cluster point of $\{(z^k, \varepsilon^k)\}$ with $f_\sigma(z^*, \varepsilon^*)$ finite. Then $\varepsilon^* = 0$ and $(x^*, y^*)$ is a local minimum of the MPCC problem.

Proof. Let $(z, \varepsilon)$ be a KKT point of $(P_\sigma)$ with $\varepsilon > 0$. Then there exist $\lambda$ and $\mu \in \mathbb{R}^{3n+1}$ such that:

(i) $\nabla f_\sigma(z, \varepsilon) = \lambda - \mu$,
(ii) $\min(\lambda_i, z_i) = \min(\mu_i, w_i - z_i) = 0$, $i = 1, \ldots, 3n$,
(iii) $\lambda_{3n+1} = \min(\mu_{3n+1}, \bar{\varepsilon} - \varepsilon) = 0$,   (3)
Table 1. Case ej = 0 xj
yj
pj
v
v
pn+j p2n+j
0
(3i) xj = v
0 < yj < v 0
(4i) 0
yj = v
>0 0
(5i) 0 < xj < v yj = v
no 0
(6i) v
0
0
>0
(7i) 0 < xj < v 0
no >0
>0
0 < yj < v >0 no
>0
(i)
(8i) 0
where $\nabla f_\sigma$ is the gradient of $f_\sigma$ with respect to $(z, \varepsilon)$. Let $(z_k, \varepsilon_k)$ be a sequence of KKT points of $(P_{\sigma_k})$ with $\varepsilon_k \ne 0$ for all $k$ and $\lim_{k \to +\infty} \sigma_k = +\infty$.
Since $\tilde{D}$ is compact, it holds (up to a subsequence) that

$$\lim_{k \to +\infty} \varepsilon_k = \varepsilon^* \quad \text{and} \quad \lim_{k \to +\infty} z_k = z^*.$$
(3.i) yields $\partial_\varepsilon f_{\sigma_k}(z^k, \varepsilon_k) = -\mu_{3n+1} \le 0$. Then, denoting $\Delta_k = \Delta(z^k, \varepsilon_k)$, we have

$$\partial_\varepsilon f_{\sigma_k} = -\frac{1}{2\varepsilon_k^2}\Delta_k + \frac{1}{2\varepsilon_k}\partial_\varepsilon \Delta_k + \sigma_k \beta'(\varepsilon_k).$$

Multiplying by $2\varepsilon_k^3$ and using $\partial_\varepsilon f_{\sigma_k} \le 0$, we obtain

$$\varepsilon_k^2\,\partial_\varepsilon \Delta_k + 2\varepsilon_k^3\,\sigma_k\,\beta'(\varepsilon_k) \le \varepsilon_k \Delta_k.$$

Then $\beta'(\varepsilon) \ge \beta_1 > 0$ yields $\varepsilon_k^2\,\partial_\varepsilon \Delta_k + 2\varepsilon_k^3\,\sigma_k\,\beta_1 \le \varepsilon_k \Delta_k$. Since $\Delta_k$, $\theta_\varepsilon$ and $\varepsilon^2 \partial_\varepsilon \theta_\varepsilon$ are bounded (by Lemma 1) and $\sigma_k \to \infty$ as $k \to \infty$, we get $\varepsilon^* = 0$. Let $V$ be a neighborhood of $(z^*, 0)$. For any $z$ feasible for $(P)$ such that $(z, 0) \in V$ we have

$$f_\sigma(z^*, 0) \le f_\sigma(z, 0) = f(x, y) < +\infty \quad (4)$$

since $\Delta(z, 0) = 0$. Since $f_\sigma(z^*, 0)$ is finite, it follows that $\Delta(z^*, 0) = 0$. So $\langle x^*, y^* \rangle = 0$, and therefore $(x^*, y^*)$ is a feasible point of $(\tilde{P})$. Thus, (4) gives $f(x^*, y^*) = f_\sigma(z^*, 0) \le f_\sigma(z, 0) = f(x, y)$. Therefore, $(x^*, y^*)$ is a local minimum of the MPCC problem.
4 Numerical Results
Thanks to Theorem 1, and by driving $\sigma$ to infinity in a controlled way, we also improved the numerical results with respect to [3]. We consider some randomly generated partitioning problems, which can be cast as MPCCs. The simulations have been carried out in the AMPL language [4] with the SNOPT solver [13]. In all our tests, we use the same function $\beta$, defined by $\beta(\varepsilon) := \sqrt{\varepsilon}$ [11].

Partitioning problem. We now describe the formulation of this partitioning problem. We consider a set of numbers $S = \{s_1, s_2, s_3, \ldots, s_n\}$. The goal is to divide $S$ into two subsets such that the subset sums are as close to each other as possible. Let $x_j = 1$ if $s_j$ is assigned to subset 1, and 0 otherwise. Let

$$\text{sum}_1 = \sum_{j=1}^{n} s_j x_j \quad \text{and} \quad \text{sum}_2 = \sum_{j=1}^{n} s_j - \sum_{j=1}^{n} s_j x_j.$$

The difference of the sums is

$$\text{diff} = \text{sum}_2 - \text{sum}_1 = c - 2\sum_{j=1}^{n} s_j x_j, \quad \text{where } c = \sum_{j=1}^{n} s_j.$$
We minimize the square of diff:

$$\text{diff}^2 = \left(c - 2\sum_{j=1}^{n} s_j x_j\right)^2.$$

Note that $\text{diff}^2$ can be written as $\text{diff}^2 = c^2 + 4x^T Q x$, where $q_{ii} = s_i(s_i - c)$ and $q_{ij} = s_i s_j$ for $i \ne j$. Dropping the additive and multiplicative constants, we obtain the following optimization problem:

$$(UQP) \quad \min x^T Q x \quad \text{s.t.} \quad x \in \{0, 1\}^n.$$

This formulation can be written as an MPCC problem:

$$\min x^T Q x \quad \text{s.t.} \quad x \cdot (1 - x) = 0.$$

To get some local solutions of $(UQP)$, we use the approach described in the previous section. We generated random problems of various sizes ($n = 25, 50, 100, 150, 200, 250, 300$), with the elements drawn randomly from the interval (50, 100).
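The identity $\text{diff}^2 = c^2 + 4x^T Q x$ is easy to verify numerically for binary $x$ (it relies on $x_i^2 = x_i$, so it does not hold for fractional $x$). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.integers(50, 100, size=8).astype(float)  # set elements in (50, 100)
c = s.sum()

# Build Q: q_ii = s_i * (s_i - c), q_ij = s_i * s_j for i != j.
Q = np.outer(s, s)
np.fill_diagonal(Q, s * (s - c))

x = rng.integers(0, 2, size=8).astype(float)     # a binary assignment
diff = c - 2.0 * s @ x
assert abs(diff**2 - (c**2 + 4.0 * x @ Q @ x)) < 1e-8
```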
Table 2 summarizes the computational effort of the SNOPT solver, using respectively the $\theta^1$, $\theta^2$ and $\theta^3$ functions; we fix $\bar{\varepsilon} = 1$. For each problem, we used 100 different initial points generated randomly from the interval [0, 1]:

– Best Sum Diff: the best value of $\left|\sum_{i=1}^{100} (Q * \text{round}(x[i])) - 0.5 * c\right|$;
– Integrality measure: $\max_i |\text{round}(x_i) - x_i|$;
– nb: the number of tests for which the best sum is attained;
– nb10: the number of tests for which $\sum_{i=1}^{100} (Q * \text{round}(x[i])) - 0.5 * c \le 10$.
Table 2. Results on the partitioning problem using $(\theta^1, \theta^2, \theta^3)$

 n   | Best Sum Diff   | nb        | nb10         | Integrality measure
 25  | (0, 0, 0)       | (6, 6, 4) | (25, 28, 22) | (0, 0, 0)
 50  | (0.5, 0.5, 0.5) | (0, 0, 0) | (18, 23, 13) | (0, 9.190e−23, 1.810e−10)
 100 | (0.5, 0.5, 0.5) | (0, 0, 0) | (19, 23, 23) | (0, 0, 0)
 150 | (0, 0, 0)       | (1, 1, 2) | (13, 9, 10)  | (0, 0, 0)
 200 | (0.5, 0.5, 0.5) | (0, 0, 0) | (21, 23, 17) | (1.640e−16, 4.070e−11, 0)
 250 | (0, 0, 0.5)     | (1, 4, 0) | (22, 26, 22) | (1.122e−7, 2.220e−16, 0)
 300 | (0, 0, 0)       | (4, 4, 2) | (17, 19, 22) | (0, 0, 1.312e−9)
With our formulation, we obtain local solutions of $(UQP)$. We remark that the three functions $\theta^1$, $\theta^2$ and $\theta^3$ yield the same result in almost all of the considered test problems. The results obtained on these random problems validate our approach.
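The reported measures can be computed from a relaxed solution $x$ as sketched below (Python; for clarity the subset-sum gap is written directly in terms of the set weights $s_j$):

```python
import numpy as np

def partition_metrics(x, s):
    """Sum gap and integrality measure for a (possibly fractional) solution x."""
    c = s.sum()
    xr = np.round(x)
    sum_diff = abs(s @ xr - 0.5 * c)      # distance to a perfect split
    integrality = np.max(np.abs(xr - x))  # how far x is from binary
    return sum_diff, integrality

s = np.array([3.0, 1.0, 4.0, 2.0])        # c = 10, perfect half-sum = 5
sum_diff, integ = partition_metrics(np.array([1.0, 0.0, 0.0, 1.0]), s)
assert sum_diff == 0.0 and integ == 0.0   # {3, 2} vs {1, 4}: exact split
```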
5 Conclusion

In this paper, we presented a penalty approach to solve mathematical programs with complementarity constraints. Under mild hypotheses and without any constraint qualification assumption, we proved the link between the penalized problem and the MPCC. Tested on randomly generated partitioning problems, the approach gives very promising results.
References

1. Abdallah, L., Haddou, M., Migot, T.: A sub-additive DC approach to the complementarity problem. Comput. Optim. Appl. 1–26 (2019)
2. Abdallah, L., Haddou, M., Migot, T.: Solving absolute value equation using complementarity and smoothing functions. J. Comput. Appl. Math. 327, 196–207 (2018)
3. Abdallah, L., Haddou, M.: An exact penalty approach for mathematical programs with equilibrium constraints. J. Adv. Math. 9, 2946–2955 (2014)
4. AMPL. http://www.ampl.com
5. Dussault, J.P., Haddou, M., Kadrani, A., Migot, T.: How to compute a M-stationary point of the MPCC. Optimization-online.org (2017)
6. Dussault, J.P., Haddou, M., Migot, T.: The new butterfly relaxation method for mathematical programs with complementarity constraints. Optimization-online.org (2016)
7. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vols. I and II. Springer, New York (2003)
8. Fletcher, R., Leyffer, S., Ralph, D., Scholtes, S.: Local convergence of SQP methods for mathematical programs with equilibrium constraints. SIAM J. Optim. 17(1), 259–286 (2006)
9. Haddou, M.: A new class of smoothing methods for mathematical programs with equilibrium constraints. Pac. J. Optim. 5, 87–95 (2009)
10. Hu, X.M., Ralph, D.: Convergence of a penalty method for mathematical programming with complementarity constraints. J. Optim. Theory Appl. 123, 365–390 (2004)
11. Huyer, W., Neumaier, A.: A new exact penalty function. SIAM J. Optim. 13(4), 1141–1158 (2003)
12. Facchinei, F., Jiang, H., Qi, L.: A smoothing method for mathematical programs with equilibrium constraints. Math. Program. 85, 81–106 (1995)
13. Gill, P., Murray, W., Saunders, M.: SNOPT: a solver for large-scale smooth optimization problems having linear or nonlinear objectives and constraints. http://www-neos.mcs.anl.gov/neos/solvers
14. Guo, L., Lin, G.H., Ye, J.J.: Solving mathematical programs with equilibrium constraints. J. Optim. Theory Appl. 166, 234–256 (2015)
15.
Leyffer, S., Munson, T.S.: A globally convergent filter method for MPECs, April 2009. ANL/MCS-P1457-0907
16. Leyffer, S., López-Calva, G., Nocedal, J.: Interior methods for mathematical programs with complementarity constraints. SIAM J. Optim. 17(1), 52–77 (2006)
17. Lin, G.H., Fukushima, M.: Some exact penalty results for nonlinear programs and mathematical programs with equilibrium constraints. J. Optim. Theory Appl. 118, 67–80 (2003)
18. Luo, Z.Q., Pang, J.S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge, UK (1996)
19. Liu, G., Ye, J., Zhu, J.: Partial exact penalty for mathematical programs with equilibrium constraints. J. Set-Valued Anal. 16, 785–804 (2008)
20. Liu, X., Sun, J.: Generalized stationary points and an interior-point method for mathematical programs with equilibrium constraints. Math. Program. 101(1), 231–261 (2004)
21. Mangasarian, O.L., Fromovitz, S.: The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. J. Math. Anal. Appl. 17, 37–47 (1967)
22. Mangasarian, O.L., Pang, J.S.: Exact penalty functions for mathematical programs with linear complementarity constraints. J. Glob. Optim. 5 (1994)
23. Marechal, M., Correa, R.: A DC (difference of convex functions) approach for the MPECs. Optimization-online.org (2014)
24. Monteiro, M.T.T., Meira, J.F.P.: A penalty method and a regularization strategy to solve MPCC. Int. J. Comput. Math. 88(1), 145–149 (2011)
25. Raghunathan, A.U., Biegler, L.T.: An interior point method for mathematical programs with complementarity constraints (MPCCs). SIAM J. Optim. 15(3), 720–750 (2005)
26. Ralph, D., Wright, S.J.: Some properties of regularization and penalization schemes for MPECs. Springer, New York (2000)
27. Ye, J.J., Zhu, D.L., Zhu, Q.J.: Exact penalization and necessary optimality conditions for generalized bilevel programming problems. SIAM J. Optim. 7, 481–507 (1997)
Stochastic Tunneling for Improving the Efficiency of Stochastic Efficient Global Optimization

Fábio Nascentes 1,2, Rafael Holdorf Lopez 1, Rubens Sampaio 3, and Eduardo Souza de Cursi 4

1 Center for Optimization and Reliability in Engineering (CORE), Universidade Federal de Santa Catarina, Florianópolis 88037-000, Brazil — [email protected]
2 Departamento de Áreas Acadêmicas, Instituto Federal de Educação, Ciência e Tecnologia de Goiás (IFG), Jataí 75804-714, Brazil — [email protected]
3 Departamento de Engenharia Mecânica, PUC-Rio, Rio de Janeiro 22453-900, Brazil
4 Département Mécanique, Institut National des Sciences Appliquées (INSA) de Rouen, Saint-Étienne-du-Rouvray Cedex 76801, France
Abstract. This paper proposes the use of a normalization scheme for increasing the performance of the recently developed Adaptive Target Variance Stochastic Efficient Global Optimization (sEGO) method. Such a method is designed for the minimization of functions that depend on expensive to evaluate and high dimensional integrals. The results showed that the use of the normalization in the sEGO method yielded very promising results for the minimization of integrals. Indeed, it was able to obtain more precise results, while requiring only a fraction of the computational budget of the original version of the algorithm. Keywords: Stochastic efficient global optimization · Stochastic tunneling · Global optimization · Robust design
1 Introduction
The optimization of a variety of engineering problems may require the minimization (or maximization) of expensive-to-evaluate and high dimensional integrals. These problems become more challenging if the resulting objective function turns out to be non-convex and multimodal. Examples of this kind may arise, for instance, from the maximization of the expected performance of a mechanical system, vastly applied in robust design [10], from the multidimensional integral of performance-based design optimization [2], or from the double integral of optimal design of experiment problems [3]. A powerful approach to handle these issues is Efficient Global Optimization (EGO) [9], which exploits the information provided by the Kriging metamodel to iteratively add new points, improving the surrogate accuracy and at

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 238–246, 2020. https://doi.org/10.1007/978-3-030-21803-4_25
Stochastic Tunneling for Improving the Efficiency
239
the same time seeking its global minimum. For problems presenting variability (or uncertainty), the Stochastic Kriging (SK) [1] was developed. The use of SK within the EGO framework, or stochastic Efficient Global Optimization (sEGO), is relatively recent. For example, [11] benchmarked different infill criteria for the noisy case, while [8] compared Kriging-based methods in heterogeneous noise situations. Recently, an Adaptive Variance Target sEGO approach [4] was proposed for the minimization of integrals. It employs Monte Carlo Integration (MCI) to approximate the objective function and includes the variance of the integration error in the SK framework. This variance is adaptively managed by the method, providing an efficient optimization process that spends the available computational budget rationally. The method reached promising results, especially in high dimensional problems [4]. This paper thus aims at enhancing the performance of the Adaptive Variance Target sEGO [4] by proposing the use of a normalization scheme during the optimization process. This normalization is the result of the so-called stochastic tunneling approach, applied together with Simulated Annealing (SA) for the global minimization of complex potential energy landscapes [12]. In the SA context, the physical idea behind stochastic tunneling is to allow the particle to "tunnel" through high-energy regions of the domain, once it is realized that they are not relevant for the low-energy properties of the problem. In the sEGO context, it is expected that this normalization reduces the variability level in the regions of the design domain that have high objective function values, as well as the dependency of the quality of the search on the parameters of the SK. The rest of the paper is organized as follows: Sect. 2 presents the problem statement. The Adaptive Variance Target sEGO is presented in Sect. 3, together with the proposed normalization scheme.
Numerical examples are studied in Sect. 6 to show the efficiency and robustness of the normalization. Finally, the main conclusions are listed in Sect. 7.
2 Problem Statement
The goal of this paper is to solve the minimization of a function $y$ that depends on an integral, as in

$$\min_{d \in S} \; y(d) = \int_\Omega \varphi(d, x)\,w(x)\,dx, \quad (1)$$

where $d \in \mathbb{R}^n$ is the design vector, $x \in \mathbb{R}^{n_x}$ is the parameter vector, $\varphi : \mathbb{R}^n \times \mathbb{R}^{n_x} \to \mathbb{R}$ is a known function, $S$ is the design domain, $w(x)$ is a known weight function (e.g. a probability distribution) and $\Omega \subseteq \mathbb{R}^{n_x}$ is the integration domain (e.g. the support of the probability distribution). We also assume here that the design domain $S$ comprises only box constraints. Here, we are interested in situations where $\varphi$ is a black-box function and is computationally demanding,
while the resulting objective function $y$ is non-convex and multimodal. Applying MCI to estimate $y$, we have

$$y(d) \approx \bar{y}(d) = \frac{1}{n_r}\sum_{i=1}^{n_r} \varphi(d, x^{(i)}), \quad (2)$$

where $n_r$ is the sample size and the $x^{(i)}$ are sample points randomly drawn from the distribution $w(x)$. One of the advantages of MCI is that we are able to estimate the variance of the error of the approximation as

$$\sigma^2(d) = \frac{1}{n_r(n_r - 1)}\sum_{i=1}^{n_r} \left(\varphi_i - \bar{y}(d)\right)^2, \quad (3)$$

where $\varphi_i = \varphi(d, x^{(i)})$. Thus, by increasing the sample size $n_r$ (i.e. the number of replications), the variance estimate decreases and the approximation in Eq. (2) gets closer to the exact value of Eq. (1).
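Equations (2) and (3) amount to the following sketch (Python; `phi` and `sampler` are illustrative placeholders for the black-box function and the sampling distribution $w$):

```python
import numpy as np

def mci_estimate(phi, d, sampler, nr):
    """Monte Carlo estimate of y(d) and of the variance of its error, Eqs. (2)-(3)."""
    vals = np.array([phi(d, sampler()) for _ in range(nr)])
    ybar = vals.mean()              # Eq. (2)
    var = vals.var(ddof=1) / nr     # Eq. (3): sample variance divided by nr
    return ybar, var

rng = np.random.default_rng(0)
# Toy check: phi(d, x) = d * x with x ~ N(1, 0.1) has exact mean d.
ybar, var = mci_estimate(lambda d, x: d * x, 2.0,
                         lambda: rng.normal(1.0, 0.1), 4000)
assert abs(ybar - 2.0) < 0.05 and var > 0.0
```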
3 The Adaptive Variance Target sEGO Approach
sEGO methods generally follow these steps:
1. Construction of the initial sampling plan;
2. Construction of the SK metamodel;
3. Addition of a new infill point to the sampling plan, returning to step 2.
For the construction of the initial sampling plan, we employ here the Latin hypercube scheme detailed in [5]. Steps 2 and 3 are then repeated until a stopping criterion is met, e.g. a maximum number of function evaluations. The manner in which the infill points are added in each iteration is what distinguishes the different sEGO approaches. In this paper, we employ the AEI criterion as the infill point strategy, since it has provided promising results [11]. Step 2 constructs a prediction model, given in this paper by the SK, whose formulation is presented in the next subsection.

3.1 Stochastic Kriging (SK)

The work of [1] proposed an SK accounting for the sampling variability that is inherent to a stochastic simulation. To accomplish this, they characterized both the intrinsic error inherent in a stochastic simulation and the extrinsic error that comes from the metamodel approximation. The SK prediction can then be seen as:
$$\hat{y}(d_i) = \underbrace{M(d_i)}_{\text{trend}} + \underbrace{Z(d_i)}_{\text{extrinsic}} + \underbrace{\epsilon(d_i)}_{\text{intrinsic}}, \quad (4)$$
where $M(d)$ is the usual average trend and $Z(d)$ accounts for the model uncertainty, now referred to as extrinsic noise. The additional term $\epsilon$, called intrinsic noise, accounts for the simulation uncertainty or variability. In this paper, the variability is due to the error in the approximation of the integral from Eq. (1) caused by MCI. Recall that MCI provides an estimate of the variance of this error; that is, we are able to estimate the intrinsic noise and, consequently, introduce this information into the metamodel framework. To accomplish this, we construct the covariance matrix $\Sigma_\epsilon$ of the intrinsic noise among the current sampling plan points. Since the intrinsic error is assumed to be i.i.d. and normal, the covariance matrix is diagonal, with components

$$(\Sigma_\epsilon)_{ii} = \sigma^2(d_i), \quad i = 1, 2, \ldots, n_s, \quad (5)$$

where $\sigma^2$ is given by Eq. (3). Then, considering the Best Linear Unbiased Predictor (BLUP) shown in [1], the prediction of the SK at a given point $d_u$ is

$$\hat{y}(d_u) = \hat{\mu} + r^T(\Psi + \Sigma_\epsilon)^{-1}(y - \mathbf{1}\hat{\mu}), \quad (6)$$

which is the usual Kriging prediction with the diagonal covariance matrix of the intrinsic noise added. Similarly, the predicted error takes the form

$$s_n^2(d) = \hat{\sigma}^2\left[1 + \lambda(d) - r^T(\Psi + \Sigma_\epsilon)^{-1}r + \frac{\left(1 - \mathbf{1}^T(\Psi + \Sigma_\epsilon)^{-1}r\right)^2}{\mathbf{1}^T(\Psi + \Sigma_\epsilon)^{-1}\mathbf{1}}\right], \quad (7)$$

where $\lambda(d)$ corresponds to the regression term.
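The BLUP of Eq. (6) is a standard Kriging predictor with the intrinsic-noise matrix $\Sigma_\epsilon$ added to $\Psi$. The sketch below assumes a 1D design space, a Gaussian correlation with fixed hyperparameters, and known intrinsic variances; it is illustrative only, not the authors' implementation:

```python
import numpy as np

def sk_predict(D, y, s2_intrinsic, du, theta=10.0):
    """Stochastic Kriging prediction (Eq. 6) at du; 1D case, constant trend."""
    Psi = np.exp(-theta * (D[:, None] - D[None, :]) ** 2)  # correlation matrix
    C = Psi + np.diag(s2_intrinsic)                        # Psi + Sigma_eps
    Cinv = np.linalg.inv(C)
    one = np.ones(len(D))
    mu = (one @ Cinv @ y) / (one @ Cinv @ one)             # GLS trend estimate
    r = np.exp(-theta * (D - du) ** 2)                     # correlations to du
    return mu + r @ Cinv @ (y - mu * one)

D = np.array([0.0, 0.5, 1.0])
y = np.array([1.0, -2.0, 0.5])
# With zero intrinsic noise, the predictor interpolates the data.
assert abs(sk_predict(D, y, np.zeros(3), 0.5) - (-2.0)) < 1e-8
```

Adding nonzero entries to `s2_intrinsic` makes the predictor regress (smooth over) the noisy observations instead of interpolating them, which is exactly the effect of the intrinsic noise in Eq. (6).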
4 Adaptive Target Selection
The adaptive target selection is briefly introduced in this section; for a more detailed description, the reader is referred to [4]. With the framework presented so far, we are able to incorporate error estimates from MCI within the sEGO scheme. Note that the number of MCI samples is an input parameter, i.e. the designer has to set $n_r$ in Eq. (3). Consequently, the designer is able to control the magnitude of $\Sigma_\epsilon$ and $\lambda$ by changing the sample size $n_r$. In practice, however, a target variance $\bar{\sigma}_0^2$ is first chosen and the sample size is iteratively increased until the evaluated variance is close to the target value. Thus, for a constant target variance, the regression parameter is enforced by the MCI procedure to be

$$\lambda(d) = \bar{\sigma}_0^2. \quad (8)$$

The choice of the target variance must take two facts into account: (a) if the target variance is too high, the associated error may lead to a poor and deceiving
approximation of the integral; and (b) if the target tends to zero, so does the error, and we retrieve the deterministic case, however at the expense of a huge computational effort. The advantage here is that the Adaptive Variance Target selection automatically defines the variance target for each new infill point in the sEGO search. That is, the adaptive approach starts by exploring the design domain, evaluating the objective function value of each design point using MCI with a high target variance, so that each evaluation requires only a few samples. It then gradually reduces the target variance for the evaluation of additional infill points in already visited regions. A flowchart of the proposed stochastic EGO algorithm, including the proposed adaptive target selection, is shown in Fig. 1. In the next paragraphs, each of its steps is detailed.
Fig. 1. Flowchart of the algorithm
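The target-variance mechanism described above — keep adding replications until the estimated variance of the MCI error drops below the target — can be sketched as follows (Python; the function names and the doubling schedule are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def simulate_to_target(phi, d, sampler, target_var, nr0=10, nr_max=100_000):
    """Grow the MCI sample until the estimated error variance meets the target."""
    vals = [phi(d, sampler()) for _ in range(nr0)]
    while True:
        v = np.var(vals, ddof=1) / len(vals)        # Eq. (3)
        if v <= target_var or len(vals) >= nr_max:
            return float(np.mean(vals)), v, len(vals)
        vals += [phi(d, sampler()) for _ in range(len(vals))]  # double the sample

rng = np.random.default_rng(1)
ybar, v, nr = simulate_to_target(lambda d, x: d + x, 1.0,
                                 lambda: rng.normal(0.0, 1.0), 1e-3)
assert v <= 1e-3 and nr <= 100_000
```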
After the construction of the SK metamodel for the initial sampling plan, the infill stage begins, for which the AEI method is employed. Here, an initial target variance $\bar{\sigma}_0^2$ is set and the first infill point is added to the model, being simulated up to this target variance. From the second infill point on, the adaptive target selection scheme takes place. We propose the use of an exponential decay equation parameterized
by the problem dimension ($n$) and the number of points already sampled near the new infill point ($n_{\text{close}}$), defined as the number of points in the model located within a given distance ($r_{hc}$) of the infill point. Here, we consider a hypercube with half-sides $r_{hc}$ around the selected infill point to evaluate $n_{\text{close}}$. When the infill point is located in an unsampled region, its target variance is set to the initial target variance. On the other hand, when the infill point is located in a region with existing sampled points, a lower target variance ($\sigma^2_{\text{adapt}}$) is employed for the approximation of its objective function value. This allocates more computational effort to the regions that need to be exploited: when points start to group up, the focus changes to landscape exploitation, and the target MCI variance is set to a lower value, increasing the model accuracy. The expression proposed to calculate the adaptive target value at each iteration of the sEGO algorithm is

$$\sigma^2_{\text{adapt}} = \frac{\bar{\sigma}_0^2}{\exp(a_1 + a_2 \cdot n + a_3 \cdot n_{\text{close}} - a_4 \cdot n_{\text{close}} \cdot n)}, \quad (9)$$

where the $a_i$ are given constants. We also set minimum and maximum values for the adaptive target, in order to avoid a computationally intractable number of samples; we thus enforce

$$\sigma^2_{\min} \le \sigma^2_{\text{adapt}} \le \bar{\sigma}_0^2, \quad (10)$$

where $\sigma^2_{\min}$ is a lower bound on the target.
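Equations (9)–(10) can be sketched directly (Python; the constants $a_i$ and the bounds default to the values used later in the numerical section):

```python
import math

def adaptive_target(sigma2_0, n, n_close,
                    a1=0.5, a2=0.5, a3=0.5, a4=0.01, sigma2_min=1e-6):
    """Adaptive MCI target variance, Eq. (9), clamped as in Eq. (10)."""
    t = sigma2_0 / math.exp(a1 + a2 * n + a3 * n_close - a4 * n_close * n)
    return min(max(t, sigma2_min), sigma2_0)

# For small n, the more neighbours an infill point has, the tighter the target.
t0 = adaptive_target(1e-2, 2, 0)
t5 = adaptive_target(1e-2, 2, 5)
assert 1e-6 <= t5 <= t0 <= 1e-2
```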
5 The Proposed Normalization Scheme
The normalization employed in this paper is based on the stochastic tunneling approach, usually employed in the SA algorithm, which consists in allowing the particle to "tunnel" through high-energy regions of the domain, avoiding getting trapped in local minima. According to [12], this may be done by applying the following nonlinear transformation to $\bar{y}$:

$$J(d) = 1 - \exp\left(-\gamma\left(\bar{y}(d) - y_0\right)\right), \quad (11)$$

where $\gamma$ and $y_0$ are given parameters. The sEGO approach then minimizes $J$ instead of the original approximated function $\bar{y}$. In the sEGO context, it is expected that this normalization reduces the variability level in the regions of the design domain that have high objective function values, as well as the dependency of the quality of the search on the parameters of the SK and adaptive methods.
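The transformation of Eq. (11) is a one-liner; note that it maps the value $y_0$ to 0, preserves the ordering of objective values, and squashes all values far above $y_0$ towards 1, which is what flattens the high-valued regions (Python sketch):

```python
import math

def tunnel(y, y0, gamma=0.01):
    """Stochastic-tunneling transformation, Eq. (11)."""
    return 1.0 - math.exp(-gamma * (y - y0))

assert tunnel(0.0, 0.0) == 0.0                 # y0 maps to 0
assert tunnel(10.0, 0.0) < tunnel(20.0, 0.0)   # monotone: ordering preserved
assert tunnel(1e3, 0.0) < 1.0                  # high regions squashed below 1
```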
6 Numerical Examples
In this section, we analyze the minimization of two multimodal problems taken from [4]. The first is a stochastic version of the 2D multimodal Branin function:

$$\varphi(d, X) = p_1\left(d_2 - p_2 d_1^2 + p_3 d_1 - p_4\right)^2 X_1 + p_5(1 - p_6)\cos(d_1)\,X_2 + p_5 + 5d_1, \quad (12)$$

with parameters $p_1 = 1$, $p_2 = 5.1/(4\pi^2)$, $p_3 = 5/\pi$, $p_4 = 6$, $p_5 = 10$, $p_6 = 1/(8\pi)$. The design domain is $S = \{d_1 \in [-5, 10],\, d_2 \in [0, 15]\}$. $X_1$ and $X_2$ are Normal random variables given by $(X_1, X_2) \sim N(1, 0.05)$.
n−1
(pi − 1)2 [1 + 10 sin2 (πpi + 1)]
i=1
+ (pn − 1) [1 + sin2 (2πpn )], 2
(13)
where pi = 1 + di4Xi for i = 1, 2, ..., n. Here we take n = 10 and a design domain S = {di ∈ [−10, 10], i = 1, 2, ..., 10}. The random variables Xi follow a Normal distribution with σX = 0.01, i.e., Xi ∼ N (1, 0.01). In both problems, the weight function w from Eq. 1 is taken as the probability density function (PDF) fX (x) of the random vector X. We employ the framework described in Fig. 1 with and without the proposed normalization approach. The following parameters are kept constant: initial sampling plans comprised of ns = 4n points, rhc = 0.1, a1 = a2 = a3 = 1/2 and a4 = 1/100 (9), σ 2min = 10−6 and σ 20 = 10−2 . For the proposed normalization, we employed γ = 0.01, and y0 as the deterministic optimum of each problem, y0 = −16.644022 and 0 for the 2D and 10D problems, respectively. For the AEI infill criterion, α = 1.00 was used as suggested by [7]. The efficiency of each algorithm is measured by the number of function evaluations (NFE), i.e. the number of times the function φ is evaluated, which also is employed as the stopping criterion. It is important to point out that the optimization procedure presented depends on random quantities. Therefore, the results obtained are not deterministic and may change when the algorithm is run several times. For this reason, when dealing with stochastic algorithms, it is appropriate to present statistical results over a number of algorithm runs [6]. Thus, for each problem, the average as well as the 5 and 95 percentiles of the results found over the set of 20 independent runs are presented as box plots. In order to highlight the increase in performance when the proposed normalization scheme is applied, we assign different computational budgets (NFE values as stopping criterion) for the case with and without normalization. That is, for the 2D Multimodal problem, we set NFE = 100 as stopping criterion when using the normalization, while NFE = 1000 for the case not employing such a normalization. 
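The stochastic Branin of Eq. (12) reduces, at $X = (1, 1)$, to the deterministic Branin plus the $5d_1$ term, which provides a quick correctness check: at the deterministic Branin minimizer $(\pi, 2.275)$ the Branin part equals $\approx 0.397887$ (Python sketch):

```python
import math

def phi_branin(d, X):
    """Stochastic 2D Branin, Eq. (12)."""
    p1, p2, p3 = 1.0, 5.1 / (4.0 * math.pi**2), 5.0 / math.pi
    p4, p5, p6 = 6.0, 10.0, 1.0 / (8.0 * math.pi)
    return (p1 * (d[1] - p2 * d[0]**2 + p3 * d[0] - p4)**2 * X[0]
            + p5 * (1.0 - p6) * math.cos(d[0]) * X[1] + p5 + 5.0 * d[0])

# At X = (1, 1) and d = (pi, 2.275), phi = 0.397887 + 5*pi.
val = phi_branin((math.pi, 2.275), (1.0, 1.0))
assert abs(val - (0.397887 + 5.0 * math.pi)) < 1e-3
```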
For the 10D Multimodal problem, we employed NFE = 200 and 1000 for the cases with and without the normalization, respectively (Fig. 2).
Stochastic Tunneling for Improving the Efficiency
245
(a) Multimodal 2D (Branin): σX = 0.05
(b) Multimodal 10D (Levy): σX = 0.01
Fig. 2. Comparison between the normalized and non-normalized solutions for σ²_0 = 0.01.
7
Conclusion
This paper proposed the use of a normalization scheme to increase the performance of the recently developed Adaptive Target Variance sEGO method for the minimization of functions that depend on expensive-to-evaluate, high-dimensional integrals. As in the original version of the method, the integral to be minimized was approximated by MCI and the variance of the error in this approximation was included in the SK framework. The AEI infill criterion was employed to guide the addition of new points to the metamodel. The modification proposed here was to minimize a normalized version of ȳ, employing the nonlinear transformation proposed by [12].
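The transformation of [12] is the stochastic tunneling map f_STUN(y) = 1 − exp(−γ(y − y0)), which compresses the landscape far above the reference level y0 into (0, 1) while preserving the location of every minimum. A minimal sketch, using the γ and y0 values quoted above (illustrative code, not the authors' implementation):

```python
import math

def stun(y, y0, gamma=0.01):
    """Stochastic-tunneling normalization of Wenzel & Hamacher [12]:
    maps an objective value y to 1 - exp(-gamma * (y - y0)).  Values far
    above the reference level y0 are squashed toward 1, while values near
    y0 stay close to 0, so minima keep their locations."""
    return 1.0 - math.exp(-gamma * (y - y0))

# Reference level y0 = deterministic optimum of the 2D problem, gamma = 0.01
y0, gamma = -16.644022, 0.01
for y in (y0, y0 + 10.0, y0 + 1000.0):
    print(round(stun(y, y0, gamma), 4))  # prints 0.0, then 0.0952, then 1.0
```

The squashing is what allows a much smaller NFE budget: the surrogate no longer needs to resolve the (large) variations of the raw objective far from the optimum.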
F. Nascentes et al.
Overall, the use of the normalization in the sEGO method yielded very promising results for the minimization of integrals. Indeed, it was able to obtain more precise results, while requiring only a fraction of the computational budget of the original version of the algorithm. However, since the results presented here are preliminary, the use of the normalization deserves further investigation in order to better assess its impact in the sEGO search in different situations and problems. Acknowledgements. The authors acknowledge the financial support and thank the Brazilian research funding agencies CNPq and CAPES.
References
1. Ankenman, B., Nelson, B.L., Staum, J.: Stochastic kriging for simulation metamodeling. Oper. Res. 58(2), 371–382 (2010)
2. Beck, A.T., Kougioumtzoglou, I.A., dos Santos, K.R.M.: Optimal performance-based design of non-linear stochastic dynamical RC structures subject to stationary wind excitation. Eng. Struct. 78, 145–153 (2014)
3. Beck, J., Dia, B.M., Espath, L.F.R., Long, Q., Tempone, R.: Fast Bayesian experimental design: Laplace-based importance sampling for the expected information gain. Comput. Methods Appl. Mech. Eng. 334, 523–553 (2018). https://doi.org/10.1016/j.cma.2018.01.053
4. Carraro, F., Lopez, R.H., Miguel, L.F.F., Torii, A.J.: Optimum design of planar steel frames using the search group algorithm. Struct. Multidiscip. Optim. (2019, to appear)
5. Forrester, A., Sobester, A., Keane, A.: Engineering Design via Surrogate Modelling: A Practical Guide. Wiley, Chichester (2008)
6. Gomes, W.J., Beck, A.T., Lopez, R.H., Miguel, L.F.: A probabilistic metric for comparing metaheuristic optimization algorithms. Struct. Saf. 70, 59–70 (2018)
7. Huang, D., Allen, T.T., Notz, W.I., Zeng, N.: Global optimization of stochastic black-box systems via sequential kriging meta-models. J. Glob. Optim. 34(3), 441–466 (2006)
8. Jalali, H., Nieuwenhuyse, I.V., Picheny, V.: Comparison of kriging-based algorithms for simulation optimization with heterogeneous noise. Eur. J. Oper. Res. 261(1), 279–301 (2017). https://doi.org/10.1016/j.ejor.2017.01.035
9. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13, 455–492 (1998). https://doi.org/10.1023/a:1008306431147
10. Lopez, R., Ritto, T., Sampaio, R., de Cursi, J.S.: A new algorithm for the robust optimization of rotor-bearing systems. Eng. Optim. 46(8), 1123–1138 (2014). https://doi.org/10.1080/0305215X.2013.819095
11. Picheny, V., Wagner, T., Ginsbourger, D.: A benchmark of kriging-based infill criteria for noisy optimization. Struct. Multidiscip. Optim. 48(3), 607–626 (2013)
12. Wenzel, W., Hamacher, K.: Stochastic tunneling approach for global minimization of complex potential energy landscapes. Phys. Rev. Lett. 82, 3003–3007 (1999). https://doi.org/10.1103/PhysRevLett.82.3003
The Bernstein Polynomials Based Globally Optimal Nonlinear Model Predictive Control

Bhagyesh V. Patil1, Ashok Krishnan1,2(B), Foo Y. S. Eddy2, and Ahmed Zidna3

1 Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore
[email protected], [email protected]
2 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
[email protected]
3 LGIPM, Université de Lorraine, France
[email protected]
Abstract. Nonlinear model predictive control (NMPC) has shown considerable success in the control of nonlinear systems due to its ability to deal directly with nonlinear models. However, the inclusion of a nonlinear model in the NMPC framework potentially results in a highly nonlinear (usually 'nonconvex') optimization problem. This paper proposes a solution technique for such optimization problems. Specifically, this paper proposes an improved Bernstein global optimization algorithm. The proposed algorithm contains a Newton-based box trim operator which extends the classical Newton method using the geometrical properties associated with the Bernstein polynomial. This operator accelerates the convergence of the Bernstein global optimization algorithm by discarding those regions of the solution search space which do not contain any solution. The utility of the improved Bernstein algorithm is demonstrated by simulating an NMPC problem for tracking multiple setpoint changes in the reactor temperature of a continuous stirred-tank reactor (CSTR) system. Furthermore, the performance of the proposed algorithm is compared with those of the previously reported Bernstein global optimization algorithm and a conventional sequential-quadratic programming based sub-optimal NMPC scheme implemented in MATLAB.
1
Introduction
The design of efficient controllers to extract the desired performance from physical engineering systems has been a well studied problem in control engineering [3]. This problem can be broadly split into the following two stages: (i)
1 Now with the John Deere Technology Centre, Magarpatta City, Pune, India (email: [email protected]).
2,3 The authors acknowledge funding support from the NTU Start-Up Grant and the MOE Academic Research Fund Tier 1 Grant.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 247–256, 2020. https://doi.org/10.1007/978-3-030-21803-4_26
B. V. Patil et al.
development of a mathematical model for the physical system under study; and (ii) controller design based on the mathematical model developed in (i). In the last decade, tremendous advancements have been made in the development of optimization algorithms and computing platforms used to solve optimization problems. Consequently, several controllers have been designed which utilize advanced computational algorithms to solve complex optimization problems (see, for instance [7,8,10], and the references therein). In recent years, nonlinear model predictive control (NMPC) has emerged as a promising advanced control methodology. In principle, MPC performs a cost optimization subject to specific constraints on the system. The cost optimization is performed repeatedly over a moving horizon window [13]. We note that the following two issues need to be carefully considered while designing any NMPC scheme: (i) Can the nonlinear optimization procedure be completed until a convergence criterion is satisfied to guarantee the optimality of the solution obtained? (ii) Can (i) be achieved within the prescribed sampling time? This work primarily addresses (i) which necessitates the development of an advanced optimization procedure. In the literature, (i) has been addressed by many researchers using various global optimization solution approaches. For instance, the particle swarm optimization (PSO) approach was used in [4]. A branch-and-bound approach was adopted to solve the NMPC optimization problem in [2]. Reference [9] extended the traditional branch-and-bound approach with bound tightening techniques to locate the correct global optimum for the NMPC optimization problem. Apart from these works, Patil et al. advocated the use of Bernstein global optimization procedures for NMPC applications (see, for instance, [1,11]). 
These optimization procedures are based on the Bernstein form of polynomials [12] and use several attractive 'geometrical' properties associated with the Bernstein form of polynomials. This work is a sequential improvement of the previous work reported in [11]. Specifically, [11] introduced a Bernstein branch-and-bound algorithm to solve the nonlinear optimization problems encountered in an NMPC scheme. We note that the algorithm presented in [11] is computationally expensive due to the numerous branchings involved in a typical branch-and-bound framework. This motivates the main contribution of this work, wherein a tool is developed to accelerate the solution search procedure for the Bernstein global optimization algorithm. The developed tool speeds up the Bernstein global optimization algorithm by trimming (discarding) those regions from the solution search space which certainly do not contain any solution. Due to the nature of its main function, the developed tool is called a 'box trim operator'.
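The geometrical property these procedures rely on most heavily is the Bernstein range enclosure: the coefficients of a polynomial written in the Bernstein basis bracket its range over a box. A minimal univariate sketch (the classical basis-conversion formula on [0, 1]; the algorithms in this paper work with multivariate polynomials over general boxes):

```python
from math import comb

def bernstein_coeffs(a):
    """Bernstein coefficients of p(x) = sum_j a[j] * x**j on [0, 1].

    Classical basis conversion: b_i = sum_{j<=i} C(i,j)/C(n,j) * a[j].
    Range enclosure property: min(b) <= p(x) <= max(b) for all x in [0, 1],
    with the endpoint ('vertex') coefficients b_0 = p(0) and b_n = p(1).
    """
    n = len(a) - 1
    return [sum(comb(i, j) / comb(n, j) * a[j] for j in range(i + 1))
            for i in range(n + 1)]

# p(x) = x - x^2 = x(1 - x): true range on [0, 1] is [0, 0.25]
b = bernstein_coeffs([0.0, 1.0, -1.0])
print(b)               # [0.0, 0.5, 0.0]
print(min(b), max(b))  # enclosure [0.0, 0.5] contains the true range
```

The enclosure is cheap to evaluate and never misses the true range, which is exactly what a branch-and-bound or trimming step needs to discard boxes safely.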
2
NMPC Formulation
Consider a class of time-invariant continuous-time systems described using the following nonlinear model:

ẋ = f(x, u),   x(t0) = x0,   (1)
where x ∈ R^{nx} and u ∈ R^{nu} represent the vectors of the system states and the control inputs respectively, while f describes the nonlinear dynamic behavior of the system. The NMPC of a discrete-time system involves the solution of a nonlinear optimization problem at each sampling instant. Mathematically, the NMPC problem formulation can be summarized as follows:

min_{xk, uk}  J = Σ_{k=0}^{N−1} L(xk, uk)   (2a)

subject to
x0 = x̂0   (2b)
x_{k+1} = xk + Δt · f(xk, uk)   (2c)
c(xk, uk) ≤ 0   (2d)
xk^min ≤ xk ≤ xk^max   (2e)
uk^min ≤ uk ≤ uk^max   (2f)
for k = 0, 1, ..., N − 1   (2g)
where N represents the prediction horizon; x̂0 ∈ R^{nx} represents the initial states of the system; and xk ∈ R^{nx} and uk ∈ R^{nu} represent the system states and the control inputs respectively at the kth sampling instant. The objective function (J) in (2a) is defined by the stage cost L(·). The discretized nonlinear dynamics in (2c) are formulated as a set of equality constraints, and c(xk, uk) represents the nonlinear constraints arising from the operational requirements of the system in (1). The system is subjected to the state and input constraints described by (2e)–(2f).
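For concreteness, the equality constraints (2c) amount to a forward-Euler rollout of the model (1) over the horizon. A minimal sketch on a hypothetical scalar system (the function names and the toy dynamics are illustrative, not from the paper):

```python
def rollout(f, x0, controls, dt):
    """Forward-Euler discretization used in constraint (2c):
    x_{k+1} = x_k + dt * f(x_k, u_k).  Given an initial state and a
    control sequence of length N, returns the predicted states
    x_0, ..., x_N over the prediction horizon."""
    xs = [x0]
    for u in controls:
        x = xs[-1]
        xs.append([xi + dt * fi for xi, fi in zip(x, f(x, u))])
    return xs

# Toy linear system x' = -x + u with zero input (hypothetical, for illustration)
f = lambda x, u: [-x[0] + u]
states = rollout(f, [1.0], [0.0] * 4, dt=0.5)
print([s[0] for s in states])  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

In the NMPC problem these predicted states are not simulated separately but appear as decision variables tied together by the equality constraints (2c).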
3
Bernstein Polynomial Approach for Global Optimization
This section briefly introduces some important notions and properties of the Bernstein form of polynomials. Subsequently, these properties are used to present a Newton-based box trim operator. Finally, the Bernstein form of polynomials and the Newton-based box trim operator are used in a suitable branch-and-bound framework to determine the global solutions of the nonlinear optimization problems encountered in the NMPC scheme described in Sect. 2.

We can write an l-variate polynomial pf with real coefficients aI as follows:

pf(x) = Σ_{I≤N} aI x^I,   x ∈ R^l,   (3)

where N is the degree of pf. We transform (3) into the following Bernstein form to obtain bounds for its range over an l-dimensional box x:

pb(x) = Σ_{I≤N} bI(x) B_I^N(x),   (4)

where B_I^N(x) is the Ith Bernstein basis polynomial of degree N, defined as

B_I^N(x) = B_{i1}^{n1}(x1) ··· B_{il}^{nl}(xl),   x ∈ R^l.   (5)

For ij = 0, 1, ..., nj and j = 1, 2, ..., l,

B_{ij}^{nj}(xj) = (nj choose ij) (xj − inf xj)^{ij} (sup xj − xj)^{nj−ij} / w(xj)^{nj},   (6)

where inf xj and sup xj denote the endpoints of the interval xj and w(xj) = sup xj − inf xj its width. Further, bI(x) is an array of Bernstein coefficients, computed as a weighted sum of the coefficients aI of (3) on the box x:

bI(x) = Σ_{J≤I} [(I choose J)/(N choose J)] w(x)^J Σ_{J≤K≤N} (K choose J) (inf x)^{K−J} aK,   I ≤ N.   (7)

Note that all the Bernstein coefficients (bI(x))_{I∈S} form an array, where S = {I : I ≤ N}. Furthermore, we define S0 as a special set comprising only the vertex indices from S, as shown below:

S0 = {(0, 0, ..., 0), (n1, 0, ..., 0), (0, n2, 0, ..., 0), ..., (n1, n2, ..., nl)}.

Remark 3.1. The partial derivative of the polynomial pf in (3) with respect to xr (1 ≤ r ≤ l) can be obtained from its Bernstein coefficients bI(x) using the following relation [12]:

pf,r(x) = (nr / w(x)) Σ_{I≤N_{r,−1}} [b_{I_{r,1}}(x) − bI(x)] B_{N_{r,−1},I}(x),   1 ≤ r ≤ l, x ∈ x,   (8)

where pf,r(x) contains an enclosure of the range of the derivative of the polynomial pf on the box x.

Remark 3.2. (Bernstein range enclosure) The Bernstein coefficients bound the range of pf on the box x: min_{I≤N} bI(x) ≤ pf(x) ≤ max_{I≤N} bI(x) for all x ∈ x. The bound is attained exactly when the extremal coefficient occurs at a vertex index I ∈ S0 (the vertex property).

3.1
Newton-Based Box Trim Operator
An NMPC scheme for the system described by (1) involves solving an optimization problem of the form (2a)–(2g). Such an optimization problem has a set of nonlinear constraints described by (2c)–(2d) and a bounded search space described by (2e)–(2f). Apart from a feasible solution, the search space also contains a set of values that do not satisfy (2c)–(2d) (i.e. inconsistent values). As such, it is imperative to remove these inconsistent values. A box trim operator can help in removing the set of inconsistent values which do not satisfy (2c)–(2d) from the search space described by (2e)–(2f). This is achieved by applying the Newton method to remove the leftmost and rightmost ‘quasi-zero’ points from the search-space described by (2e)–(2f). It is worth noting that such a box trim operator helps in narrowing the search-space, thereby speeding up the solution search procedure in an optimization algorithm.
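As a one-dimensional illustration of this idea (not the operator of Sect. 3.1 itself, which uses Bernstein enclosures of (2c) and of its derivative rather than the hand-coded bounds below), an interval Newton step contracts an interval toward the zeros of a function without ever discarding one:

```python
def interval_newton_step(F, dF_lo, dF_hi, lo, hi):
    """One interval-Newton contraction on [lo, hi] for F(y) = 0, assuming
    [dF_lo, dF_hi] encloses F' on [lo, hi] and does not contain 0.
    Returns the trimmed interval N(m) ∩ [lo, hi]; any zero of F inside
    [lo, hi] is guaranteed to survive the trim (mean value theorem)."""
    m = 0.5 * (lo + hi)
    fm = F(m)
    # Interval division fm / [dF_lo, dF_hi] (sign-definite derivative interval)
    q = sorted([fm / dF_lo, fm / dF_hi])
    n_lo, n_hi = m - q[1], m - q[0]
    return max(lo, n_lo), min(hi, n_hi)

# Trim the box [1, 2] for F(y) = y^2 - 2; F' = 2y is enclosed by [2, 4]
lo, hi = 1.0, 2.0
for _ in range(3):
    lo, hi = interval_newton_step(lambda y: y * y - 2.0, 2.0, 4.0, lo, hi)
print(lo, hi)  # a narrow interval still containing sqrt(2) ≈ 1.41421
```

The box trim operator applies the same safe-contraction logic from each endpoint of each variable's interval, so the discarded subregions provably contain no feasible point.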
In principle, the classical Newton method is used to find successively better approximations of the zeros of a real-valued function. Based on this, the Newton method to find the zero(s) of a function of the form (2c) for the set of variables y = (x, u) is described below:

y^{k+1} = y^k − F(y^k) / F′(y^k),

where F is the evaluation of (2c) at y^k, and F′ is the Jacobian of F. The interval Newton method generalizes the aforementioned procedure to intervals [6, p. 169]. In this context, we note that the Bernstein range enclosure (Remark 3.2) is an interval form [14]. Consequently, we extend the interval Newton method based on the Bernstein range enclosure property to obtain the Newton-like box trim operator shown below:

y_r^{k+1} = y_r^k ∩ ( I_r^k − F(I_r^k) / F′_{I_r^k} ),   (9)

where y = (x, u) is an interval vector or box and the subscript r indicates the rth variable from the box y on which the box trim operator is to be applied. Furthermore, I_r^k := inf y_r^k or sup y_r^k, depending on the endpoint of the rth variable from which the 'quasi-zero' points of (2c) need to be removed. F(I_r^k) represents an interval encompassing the minimum and maximum Bernstein coefficients from an array bI(y) (computed using (7) at I_r^k), F′_{I_r^k} denotes an interval enclosure for the derivative of (2c) on y_r^k obtained using Remark 3.1, and y_r^{k+1} represents the trimmed interval for the rth variable.

3.2
Bernstein Bound-trim-branch Global Optimization Algorithm
This algorithm exhaustively explores the search space of the nonlinear optimization problem in (2a)–(2g) with the objective of finding the best solution among all the available solutions. This is performed at each NMPC step. Specifically, this algorithm involves the three main steps listed below:

– Bounding (2a), (2c) and (2d) using the Bernstein range enclosure property.
– Box trimming operation using (9) for (2e)–(2f) using the relations (2c)–(2d).
– Branching by subdividing the domains of the variables in (2e)–(2f).

These three steps give rise to the name Bernstein bound-trim-branch (BBTB) global optimization algorithm. A pseudo-code listing the steps of the BBTB algorithm is provided below.

Algorithm Bernstein bound-trim-branch: [f∗, y∗] = BBTB(f, gi, hj, y, ϵf, ϵh)

Inputs: The cost function in (2a) as f, the equality constraints in (2c) as hj, the inequality constraints in (2d) as gi, the initial search box comprising the system states (xk) and the control inputs (uk) as y, the tolerance parameter ϵf on the global minimum of (2a), and the tolerance parameter ϵh to which the equality constraints in (2c) need to be satisfied.

Outputs: The global minimum cost value of (2a) as f∗ and the global minimizers for the states (xk) and the control input profile (uk) as y∗.

BEGIN Algorithm

Relaxation step
• Compute the Bernstein coefficient matrices for f, gi, and hj on y as bI,f, bI,gi, and bI,hj respectively. Set f̲ to the minimum Bernstein coefficient from bI,f. Set the global minimum estimate as f̃ = f̲.
• Construct L ← {(f̲, bI,f, bI,gi, bI,hj, y)}, Lsol ← {}.

Box trimming step
• If L is empty, then go to the termination step. Else, sort the items of L in ascending order of f̲ (each item in L is of the form (f̲, bI,f, bI,gi, bI,hj, y)).
• Pick the first item from L by removing its entry.
• Apply the box trim operator (see Sect. 3.1) to y using the constraints in (2c)–(2d) and obtain the trimmed box y′. Compute the new Bernstein coefficient matrices for f, gi, and hj on y′ as b′I,f, b′I,gi, and b′I,hj respectively.
• Set f̲ to the minimum Bernstein coefficient from b′I,f and update the global minimum estimate as f̃ = f̲. Construct the item as (f̲, b′I,f, b′I,gi, b′I,hj, y′).

Constraint feasibility and vertex step
• For the item (f̲, b′I,f, b′I,gi, b′I,hj, y′), check the constraint feasibility as detailed in Remark 3.2. If it is found to be quasi-feasible, go to the branching step.
• Check if the item (f̲, b′I,f, b′I,gi, b′I,hj, y′) satisfies the vertex property. If 'true', then update f̃ with the corresponding vertex Bernstein coefficient of b′I,f and add this item to Lsol. Go to the box trimming step. (Note that the vertex Bernstein coefficients are those obtained from b′I,f using the special set S0.)

Branching step
• For the item (f̲, b′I,f, b′I,gi, b′I,hj, y′), subdivide the box y′ into two subboxes y1 and y2.
• Compute the Bernstein coefficient matrices for f, gi, and hj on y1 and y2. Construct the two items as (f̲k, bI,f,k, bI,gi,k, bI,hj,k, yk), k = 1, 2.
• Discard yk, k = 1, 2, for which min(bI,f,k) > f̃. Enter (f̲k, bI,f,k, bI,gi,k, bI,hj,k, yk) into L (here f̲k := min(bI,f,k)). Go to the box trimming step.
Termination step
• Find the item in Lsol for which the first entry is equal to f̃. Denote that item by If.
• In If, the first entry is the global minimum f∗ while the last entry is the global minimizer y∗.
• Return the global solution (f∗, y∗).

END Algorithm

3.3
Simulation Results
In this section, the performance of the BBTB algorithm (Sect. 3.2) based NMPC scheme is first compared with that of the conventional sequential-quadratic programming based suboptimal NMPC scheme implemented in MATLAB [5]. These two schemes are compared in terms of the optimality of the solutions obtained while solving the nonlinear optimization problems encountered in the respective schemes. Subsequently, the benefits derived from the use of the box trim operator in the BBTB algorithm are assessed. Specifically, the computation time required and the number of boxes processed by the BBTB algorithm based NMPC scheme are compared with those of the previously reported Bernstein algorithm (BBBC) based NMPC scheme from [11]. The following parameter values were chosen for the simulation studies in this paper:

• Sampling time Δt = 0.5 s and prediction horizon N = 7.
• Weighting matrices Q = diag(1, 0.01) and R = 0.01.
• Initial conditions: CA = 0.2 mol/l, T = 370 K, and Tc = 300 K.
• Tolerances ϵf = ϵzero = 0.001 in the BBTB algorithm.
Figure 1 shows the evolution of the system states from their initial values (CA = 0.2, T = 370) for a series of setpoint changes. The closed-loop performances of the BBTB based NMPC scheme and the suboptimal NMPC scheme are compared. We observed that both the system states transitioned smoothly to their new values for multiple setpoint changes when the CSTR system was controlled using the BBTB algorithm based NMPC scheme. On the other hand, some undershoot and overshoot (≈ 2 − 5%) was observed when the suboptimal NMPC scheme was used to control the CSTR system. The settling time was similar for both the NMPC schemes. Figure 2a illustrates the control action observed when the CSTR system was controlled using the BBTB algorithm based NMPC scheme and the suboptimal NMPC scheme. It is apparent that except for the first few samples (≈ 0−20), the BBTB algorithm based NMPC scheme demonstrates smooth control performance. The suboptimal NMPC scheme demonstrates a slightly oscillating control action, particularly when the setpoint changes are applied.
Figure 2b presents the computational times required per sample to solve the nonlinear optimization problems encountered in the BBTB algorithm based NMPC scheme and the BBBC algorithm based NMPC scheme. It is observed that the BBTB algorithm requires 56% less computation time than the BBBC algorithm. This is because the box trim operator discards some regions from the solution search space which do not contain the global solution during the branch-and-bound process of the Bernstein algorithm. Overall, it aids in decreasing the time required to locate the correct global solution. This is also evident from Fig. 3a, which shows the number of boxes processed by the BBBC and BBTB algorithms during the branch-and-bound process. The presence of the box trim operator in the BBTB algorithm reduces the number of boxes processed by an average of 98% when compared with the BBBC algorithm. It is worth mentioning that the nonlinear optimization problem structure at each sampling instant of an NMPC scheme for a CSTR system remains the same. This fact is well exploited by the BBTB algorithm, which consistently processes a similar number of boxes (i.e., 6) at each NMPC iteration. This clearly demonstrates the efficacy of the box trim operator used in the BBTB algorithm. Figure 3b plots the cost function values of the nonlinear optimization problems solved at each sampling instant of the BBTB algorithm based NMPC scheme and the suboptimal NMPC scheme. It is observed that the cost function values obtained are nearly identical for both the NMPC schemes. However, it is observed that at each setpoint change (introduced at samples 0, 50, 100, 150, and 200), the BBTB algorithm based NMPC scheme returned a lower cost function value when compared with the suboptimal NMPC scheme. This can be attributed to the smoother transitions (visible in Figs. 1a, b and 2a) observed while using the BBTB algorithm based NMPC scheme.
Fig. 1. Evolution of the states CA (a) and T (b) when the CSTR system is controlled using the BBTB algorithm based NMPC scheme and the sequential-quadratic programming based suboptimal NMPC scheme.
Fig. 2. (a) Control input (Tc) profile for the CSTR system controlled using the BBTB algorithm based NMPC scheme and the sequential-quadratic programming based suboptimal NMPC scheme. (b) Comparison of the computation times needed for solving a nonlinear optimization problem of the form (2a)–(2g) at each sampling instant using the BBTB and BBBC algorithm based NMPC schemes. The sampling time is 0.5 s.
Fig. 3. (a) Number of boxes processed during the branch-and-bound process of the BBBC and BBTB algorithms. (b) Cost function values of the nonlinear optimization problems of the form (2a)–(2g) solved at each sampling instant when the CSTR system is controlled using the BBTB algorithm based NMPC scheme and the sequential-quadratic programming based suboptimal NMPC scheme. SP1–SP5 mark the samples at which the setpoint changes are implemented.
4
Conclusions
This work presented a global optimization algorithm based NMPC scheme for nonlinear systems. We first discussed the necessity of using a global optimization algorithm based NMPC scheme. Subsequently, we proposed an improvement to the Bernstein global optimization algorithm. The proposed improvement was a Newton method based box trim operator which utilized some attractive geometrical properties associated with the Bernstein form of polynomials. In practice, this operator reduced the computation times for the online nonlinear optimization problems encountered in an NMPC scheme. The BBTB algorithm based NMPC scheme was tested on a CSTR system to demonstrate its efficacy. The results of the case studies performed on the CSTR system demonstrated the superior control performance of the BBTB algorithm based NMPC scheme when compared with a conventional sequential-quadratic programming based suboptimal NMPC scheme. The case studies also showed that the performance of the Bernstein global optimization algorithm based NMPC scheme can be improved
significantly in terms of the computation time by including the Newton based box trim operator described in this paper. This was found to be particularly true when compared against the previously reported Bernstein algorithm based NMPC scheme from the literature.
References
1. Patil, B.V., Bhartiya, S., Nataraj, P.S.V., Nandola, N.N.: Multiple-model based predictive control of nonlinear hybrid systems based on global optimization using the Bernstein polynomial approach. J. Process Control 22(2), 423–435 (2012)
2. Cizniar, M., Fikar, M., Latifi, M.A.: Design of constrained nonlinear model predictive control based on global optimisation. In: 18th European Symposium on Computer Aided Process Engineering - ESCAPE 18, pp. 1–6 (2008)
3. Doyle, J.C., Francis, B.A., Tannenbaum, A.R.: Feedback Control Theory. Dover Publications, USA (2009)
4. Germin Nisha, M., Pillai, G.N.: Nonlinear model predictive control with relevance vector regression and particle swarm optimization. J. Control Theory Appl. 11(4), 563–569 (2013)
5. Grüne, L., Pannek, J.: Nonlinear Model Predictive Control, pp. 43–66. Springer, London (2011)
6. Hansen, E.R., Walster, G.W.: Global Optimization Using Interval Analysis, 2nd edn. Marcel Dekker, New York (2005)
7. Wolf, I.J., Marquardt, W.: Fast NMPC schemes for regulatory and economic NMPC - a review. J. Process Control 44, 162–183 (2016)
8. Lenhart, S., Workman, J.T.: Optimal Control Applied to Biological Models. CRC Press, USA (2007)
9. Long, C., Polisetty, P., Gatzke, E.: Nonlinear model predictive control using deterministic global optimization. J. Process Control 16(6), 635–643 (2006)
10. Åström, K.J., Wittenmark, B.: Computer-Controlled Systems: Theory and Design, 3rd edn. Dover Publications, USA (2011)
11. Patil, B.V., Maciejowski, J., Ling, K.V.: Nonlinear model predictive control based on Bernstein global optimization with application to a nonlinear CSTR. In: Proceedings of the 15th Annual European Control Conference, pp. 471–476. Aalborg, Denmark (2016)
12. Ratschek, H., Rokne, J.: New Computer Methods for Global Optimization. Ellis Horwood Publishers, Chichester, England (1988)
13. Rawlings, J.B., Mayne, D.Q., Diehl, M.M.: Model Predictive Control: Theory, Computation, and Design, 2nd edn. Nob Hill Publishing, USA (2017)
14. Stahl, V.: Interval methods for bounding the range of polynomials and solving systems of nonlinear equations. Ph.D. thesis, Johannes Kepler University, Linz (1995)
Towards the Biconjugate of Bivariate Piecewise Quadratic Functions

Deepak Kumar and Yves Lucet(B)

University of British Columbia Okanagan, 3187 University Way, Kelowna, BC V1V 1V7, Canada
[email protected]
https://people.ok.ubc.ca/ylucet/
Abstract. Computing the closed convex envelope or biconjugate is the core operation that bridges the domain of nonconvex analysis with convex analysis. We focus here on computing the conjugate of a bivariate piecewise quadratic function defined over a polytope. First, we compute the convex envelope of each piece, which is characterized by a polyhedral subdivision such that over each member of the subdivision, it has a rational form (square of a linear function over a linear function). Then we compute the conjugate of all such rational functions. It is observed that the conjugate has a parabolic subdivision such that over each member of its subdivision, it has a fractional form (linear function over square root of a linear function). This computation of the conjugate is performed with a worst-case linear time complexity algorithm. Our results are an important step toward computing the conjugate of a piecewise quadratic function, and further in obtaining explicit formulas for the convex envelope of piecewise rational functions.

Keywords: Conjugate · Convex envelope · Piecewise quadratic function
1
Introduction
Computational convex analysis (CCA) focuses on creating efficient algorithms to compute fundamental transforms arising in the field of convex analysis. Computing the convex envelope or biconjugate is the core operation that bridges the domain of nonconvex analysis with convex analysis. Development of most of the algorithms in CCA began with the Fast Legendre Transform (FLT) in [5], which was further developed in [6,18], and improved to the optimal linear worst-case time complexity in [19] and then [10,20]. More complex operators were then considered [3,4,22] (see [21] for a survey including a list of applications).

Supported by NSERC, CFI.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 257–266, 2020. https://doi.org/10.1007/978-3-030-21803-4_27

Piecewise Linear Quadratic (PLQ) functions (piecewise quadratic functions over a polyhedral partition) are well known in the field of convex analysis [24]
with the existence of linear time algorithms for various convex transforms [4,22]. Computing the full graph of the convex hull of univariate PLQ functions is possible in optimal linear worst-case time complexity [9]. For a function f defined over a region P, the pointwise supremum of all its convex underestimators is called the convex envelope and is denoted conv f_P(x, y). Computing the convex envelope of a multilinear function over a unit hypercube is NP-Hard [7]. However, the convex envelope of functions defined over a polytope P and restricted by the vertices of P can be computed in finite time using a linear program [26,27]. A method to reduce the computation of the convex envelope of functions that are one lower dimension (R^{n−1}) convex and have indefinite Hessian to optimization problems in lower dimensions is discussed in [14]. Any general bivariate nonconvex quadratic function can be linearly transformed to the sum of a bilinear and a linear function. Convex envelopes for bilinear functions over rectangles have been discussed in [23] and validated in [1]. The convex envelope over special polytopes (not containing edges with finite positive slope) was derived in [25], while [15] deals with bilinear functions over a triangle containing exactly one edge with finite positive slope. The convex envelope over general triangles and triangulation of the polytopes through doubly nonnegative matrices (both semidefinite and nonnegative) is presented in [2]. In [16], it is shown that the analytical form of the convex envelope of some bivariate functions defined over polytopes can be computed by solving a continuously differentiable convex problem. In that case, the convex envelope is characterized by a polyhedral subdivision. The Fenchel conjugate f∗(s) = sup_{x∈R^n} [⟨s, x⟩ − f(x)] (we note ⟨s, x⟩ = s^T x) of a function f : R^n → R ∪ {+∞} is also known as the Legendre-Fenchel Transform or convex conjugate or simply conjugate.
It plays a significant role in duality, and computing it is a key step in solving the dual optimization problem [24]. Most notably, the biconjugate is also the closed convex envelope. A method to compute the conjugate known as the fast Legendre transform was introduced in [5] and studied in [6,18]. A linear time algorithm was later introduced by Lucet to compute the discrete Legendre transform [19]. Those algorithms are numeric and do not provide symbolic expressions. Computation of the conjugate of convex univariate PLQ functions has been well studied in the literature, and linear time algorithms have been developed in [8,11]. Recently, a linear time algorithm to compute the conjugate of convex bivariate PLQ functions was proposed in [12]. Let f : Rⁿ → R ∪ {+∞} be a piecewise function, i.e. f(x) = f_i(x) if x ∈ P_i for i = 1, . . . , N. From [13, Theorem 2.4.1], we have (inf_i f_i)* = sup_i f_i*, and from [13, Proposition 2.6.1], conv(inf_i (f_i + I_{P_i})) = conv(inf_i [conv(f_i + I_{P_i})]), where I_{P_i} is the indicator function for P_i. Hence, conv(inf_i (f_i + I_{P_i})) = (sup_i [conv(f_i + I_{P_i})]*)*. This provides an algorithm to compute the closed convex envelope: (1) compute the convex envelope of each piece, (2) compute the conjugate of the convex envelope of each piece, (3) compute the maximum of all
Towards the Biconjugate of Bivariate Piecewise Quadratic Functions
the conjugates, and (4) compute the conjugate of the function obtained in (3) to obtain the biconjugate. The present work focuses on Step (2). Recall that given a quadratic function over a polytope, the eigenvalues of its symmetric matrix determine how difficult its convex envelope is to compute (for computational purposes, we can ignore the affine part of the function). If the matrix is semi-definite (positive or negative), the convex envelope is easily computed. When it is indefinite, a change of coordinates reduces the problem to finding the convex envelope of the function (x, y) → xy over a polytope, for which Step (1) is known [17]. The paper is organized as follows. Section 2 recalls the required notations, Sect. 3 focuses on the domain of the conjugate, while Sect. 4 determines the symbolic expressions. Section 5 concludes the paper with future work.
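Steps (2)–(4) can be illustrated numerically in one dimension, where each conjugate reduces to a discrete Legendre transform. The brute-force O(N²) sketch below (not the linear-time algorithms of [19]; the test function and grids are illustrative assumptions) recovers the closed convex envelope of a nonconvex W-shaped function as its biconjugate:

```python
import numpy as np

def discrete_conjugate(grid, values, s_grid):
    # f*(s) = max_x [ s*x - f(x) ], evaluated on a grid of slopes s
    return np.max(s_grid[:, None] * grid[None, :] - values[None, :], axis=1)

# Nonconvex test function: f(x) = min((x+1)^2, (x-1)^2)
x = np.linspace(-3.0, 3.0, 601)
f = np.minimum((x + 1) ** 2, (x - 1) ** 2)

# The slope grid must cover the gradients of f on [-3, 3] (here |f'| <= 8)
s = np.linspace(-8.0, 8.0, 801)

f_star = discrete_conjugate(x, f, s)          # conjugate
f_bistar = discrete_conjugate(s, f_star, x)   # biconjugate = closed convex envelope

# Envelope is 0 on [-1, 1] and (|x|-1)^2 outside; f ** never exceeds f
print(f_bistar[300], f_bistar[500])  # values at x = 0 and x = 2: ~0 and ~1
```

Note that on any finite grid the discrete biconjugate stays below f pointwise, exactly as in the continuous theory.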
2 Preliminaries and Notations
The subdifferential ∂f(x) of a function f : Rⁿ → R ∪ {+∞} at any x ∈ dom(f) = {x : f(x) < ∞} is ∂f(x) = {s : f(y) ≥ f(x) + ⟨s, y − x⟩, ∀y ∈ dom(f)} (∂f(x) = {∇f(x)} when f is differentiable at x). We note I_P the indicator function of the set P, i.e. I_P(x) = 0 when x ∈ P and I_P(x) = +∞ when x ∉ P. A parabola is a two-dimensional planar curve whose points (x, y) satisfy the equation ax² + bxy + cy² + dx + ey + f = 0 with b² − 4ac = 0. A parabolic region is formed by the intersection of a finite number of parabolic inequalities, i.e. Pr = {x ∈ R² : Cp_i(x) ≤ 0, i ∈ {1, · · · , k}} where Cp_i(x) = a_i x1² + b_i x1x2 + c_i x2² + d_i x1 + e_i x2 + f_i and b_i² − 4a_i c_i = 0. The set Pr_i = {x ∈ R² : Cp_i(x) ≤ 0} is convex, but Ps_i = {x ∈ R² : Cp_i(x) ≥ 0} is not. A convex set R = ∪_{i∈{1,...,m}} R_i, R ⊆ R², defined as the union of a finite number of parabolic regions, is said to have a parabolic subdivision if for any j, k ∈ {1, · · · , m}, j ≠ k, R_j ∩ R_k is either empty or is contained in a parabola.
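The parabola condition can be checked mechanically from the discriminant (a trivial sketch; the sample coefficients are illustrative):

```python
def is_parabola(a, b, c, tol=1e-12):
    # A conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 is a parabola
    # (possibly degenerate) exactly when its discriminant b^2 - 4ac vanishes.
    return abs(b * b - 4.0 * a * c) <= tol

print(is_parabola(1.0, 2.0, 1.0))  # True: x^2 + 2xy + y^2 has discriminant 0
print(is_parabola(1.0, 0.0, 1.0))  # False: ellipse-type quadratic part
```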
3 The Domain of the Conjugate
Given a nonconvex PLQ function, we first compute the closed convex envelope of each piece and obtain a piecewise rational function [17]. We now compute the conjugate of such a rational function over a polytope by first computing its domain, which will turn out to be a parabolic subdivision. Recall that for PLQ functions, dom f* = ∂f(dom f). We decompose the polytope dom f = P into its interior, its vertices, and its edges. Following [17], we write a rational function as

  r(x, y) = (ξ1(x, y))² / ξ2(x, y) + ξ0(x, y),   (1)

where ξi(x, y) are linear functions in x and y.
Proposition 1 (Interior). Consider r defined by (1); there exist αij such that ∪_{x∈dom(r)} ∂r(x) = {s : C_r(s) = 0}, where C_r(s) = α11 s1² + α12 s1s2 + α22 s2² + α10 s1 + α02 s2 + α00 and {s : C_r(s) = 0} is a parabolic curve.

Proof. Note ξ1(x) = ξ11 x1 + ξ12 x2 + ξ10, ξ2(x) = ξ21 x1 + ξ22 x2 + ξ20 and ξ0(x) = ξ01 x1 + ξ02 x2 + ξ00. Since r is differentiable everywhere in dom(r) = R² \ {z : ξ2(z) = 0}, for any x ∈ dom(r) we compute s = ∇r(x) as

  s_i = 2ξ1i t − ξ2i t² + ξ0i for i = 1, 2, where t = (ξ11 x1 + ξ12 x2 + ξ10)/(ξ21 x1 + ξ22 x2 + ξ20).

Hence, s = ∇r(x) represents the parametric equation of a conic section, and by eliminating t, we get C_r(s) = 0 where C_r(s) = α11 s1² + α12 s1s2 + α22 s2² + α10 s1 + α02 s2 + α00, with α11 = ξ21² ξ22², α12 = −2ξ21³ ξ22, α22 = ξ21⁴, and the other αij are functions of the coefficients of r. We check that α12² − 4α11 α22 = (−2ξ21³ ξ22)² − 4 ξ21⁶ ξ22² = 0, so the conic section is a parabola. Consequently, for all x ∈ dom(r), ∂r(x) is contained in the parabolic curve C_r(s) = 0, i.e. ∪_{x∈dom(r)} ∂r(x) ⊂ {s : C_r(s) = 0}.
Conversely, any point s_r that satisfies C_r(s_r) = 0 satisfies the parametric equation as well, so the converse inclusion is true.

Corollary 1 (Interior). For a bivariate rational function r and a polytope P, define f(x) = r(x) + I_P(x); then the set ∪_{x∈int(P)} ∂f(x) is contained inside a parabolic arc.

Proof. We have ∪_{x∈int(P)} ∂f(x) ⊆ ∪_{x∈int(P)} ∂r(x), and ∪_{x∈int(P)} ∂r(x) is contained in a parabolic curve in R² (from Proposition 1). Since P is connected, we obtain that ∪_{x∈int(P)} ∂r(x) is contained in a parabolic arc.

Next we compute the subdifferential at any vertex in the smooth case (the proof involves a straightforward computation of the normal cone).

Lemma 1 (Vertices). For g ∈ C¹, P a polytope, and v a vertex, let f(x) = g(x) + I_P(x). Then ∂f(v) is an unbounded polyhedral set.

There is one vertex at which both numerator and denominator equal zero although the rational function can be extended by continuity over the polytope; we conjecture the result based on numerous observations.

Conjecture 1 (Vertex). Let r be as in (1), f(x) = r(x) + I_P(x), and v a vertex of P with ξ1(v) = ξ2(v) = 0. Then ∂f(v) is a parabolic region.

Lemma 2 (Edges). For g ∈ C¹, a polytope P, and an edge E = {x : x2 = mx1 + c, x1ˡ ≤ x1 ≤ x1ᵘ} between vertices xˡ and xᵘ, let f(x) = g(x) + I_P(x); then ∪_{x∈ri(E)} ∂f(x) = ∪_{x∈ri(E)} {s + ∇g(x) : s1 + ms2 = 0, s2 ≥ 0}.
Proof. For all x ∈ ri(E), ∂f(x) = ∂g(x) + N_P(x). Let L(x) = x2 − mx1 − c be the expression of the line joining xˡ and xᵘ such that P ⊂ {x : L(x) ≤ 0}. (The case P ⊂ {x : L(x) ≥ 0} is analogous.) Since P ⊂ R² is a polytope, for all x ∈ ri(E), N_P(x) = {s : s = λ∇L(x), λ ≥ 0} is the normal cone of P at x and can be written N_P(x) = {s : s1 + ms2 = 0, s2 ≥ 0}. In the special case when E = {x : x1 = d, x2ˡ ≤ x2 ≤ x2ᵘ}, L(x) = x1 − d and N_P(x) = {s : s2 = 0, s1 ≥ 0}. Now for any x ∈ ri(E), ∂f(x) = ∂g(x) + N_P(x) = {s + ∇g(x) : s1 + ms2 = 0, s2 ≥ 0}, so

  ∪_{x∈ri(E)} ∂f(x) = ∪_{x∈ri(E)} {s + ∇g(x) : s1 + ms2 = 0, s2 ≥ 0}.
Proposition 2 (Edges). For r as in (1), a polytope P and an edge E = {x : x2 = mx1 + c, v1⁻ ≤ x1 ≤ v1⁺} between vertices v⁻ and v⁺, let f(x) = r(x) + I_P(x); then ∪_{x∈ri(E)} ∂f(x) is either a parabolic region or a ray.

Proof. From Corollary 1, there exist l, u ∈ R² such that ∪_{x∈ri(E)} ∂r(x) = {s : C_r(s) = 0, l1 ≤ s1 ≤ u1}. So computing ∪_{x∈ri(E)} ∂f(x) leads to the following two cases:

Case 1 (l = u). Same case as when r is quadratic (known result).

Case 2 (l ≠ u). By setting g = r in Lemma 2, for any x ∈ ri(E), ∂f(x) = {s + ∇r(x) : s1 + ms2 = 0, s2 ≥ 0}. Similar to the quadratic case, when ∇r(x) = l, ∂f(x) = {s : s1 + ms2 − (l1 + ml2) = 0, s2 ≥ l2}, and when ∇r(x) = u, ∂f(x) = {s : s1 + ms2 − (u1 + mu2) = 0, s2 ≥ u2}. Assume ∂f(x) ⊂ {s : C_r(s) ≤ 0} (the case ∂f(x) ⊂ {s : C_r(s) ≥ 0} is analogous). Then

  ∪_{x∈ri(E)} ∂f(x) = ∪_{x∈ri(E)} {s + ∇r(x) : s1 + ms2 = 0, s2 ≥ 0}
                    = {s : l1 + ml2 ≤ s1 + ms2 ≤ u1 + mu2, C_r(s) ≤ 0}

is a parabolic region.

By gathering Lemma 1, Proposition 2, and Corollary 1, we obtain:

Theorem 1 (Parabolic domain). Assume Conjecture 1 holds, r is as in (1), P is a polytope, and f(x) = r(x) + I_P(x). Then ∪_{x∈P} ∂f(x) has a parabolic subdivision.

Example 1. For

  r = (36x1² + 21x1x2 + 36x2² − 81x1 + 24x2 − 252) / (−12x1 + 9x2 + 75)

and the polytope P formed by vertices v1 = (−1, 1), v2 = (−3, −3) and v3 = (−4, −3), let f(x) = r(x) + I_P(x). We have ∪_{x∈dom(r)} ∂r(x) = {s : C(s) = 0} where C(s) = 9s1² + 24s1s2 − 234s1 + 16s2² + 200s2 − 527. The parabolic subdivision for this example is shown in Fig. 1.
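Proposition 1 can be sanity-checked on Example 1: gradients of r, computed here by the quotient rule, should land on the parabola C(s) = 0 (a numeric sketch; the sample points are arbitrary):

```python
import numpy as np

def grad_r(x1, x2):
    # r = N / D with N, D as in Example 1; gradient via the quotient rule
    N  = 36*x1**2 + 21*x1*x2 + 36*x2**2 - 81*x1 + 24*x2 - 252
    Nx = 72*x1 + 21*x2 - 81
    Ny = 21*x1 + 72*x2 + 24
    D  = -12*x1 + 9*x2 + 75
    return np.array([(Nx*D - N*(-12)) / D**2, (Ny*D - N*9) / D**2])

def C(s1, s2):
    return 9*s1**2 + 24*s1*s2 + 16*s2**2 - 234*s1 + 200*s2 - 527

for pt in [(0.0, 0.0), (-1.0, 1.0), (-2.5, -0.5)]:
    s1, s2 = grad_r(*pt)
    print(f"C(grad r{pt}) = {C(s1, s2):.2e}")  # all numerically zero
```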
Fig. 1. Parabolic subdivision for r and P from Example 1 (the regions ∂f(v1), ∂f(v2), ∂f(v3), ∪_{x∈ri(E12)} ∂f(x) and ∪_{x∈ri(E13)} ∂f(x))
4 Conjugate Expressions
Now that we know dom f* as a parabolic subdivision, we turn to the computation of its expression on each piece. We note

  g_f(s1, s2) = ψ1(s1, s2) / (ζ00 ψ1/2(s1, s2)) + ψ0(s1, s2),              (2)
  g_q(s1, s2) = ζ11 s1² + ζ12 s1s2 + ζ22 s2² + ζ10 s1 + ζ01 s2 + ζ00,      (3)
  g_l(s1, s2) = ζ10 s1 + ζ01 s2 + ζ00,                                      (4)

where ψ0, ψ1/2 and ψ1 are linear functions in s, and ζij ∈ R.

Theorem 2. Assume Conjecture 1 holds. For r as in (1), a polytope P, and f(x) = r(x) + I_P(x), the conjugate f*(s) has a parabolic subdivision such that over each member of its subdivision it has one of the forms in (2)–(4).

Proof. We compute the critical points for the optimization problem defining f*.

Case 1 (Vertices). For any vertex v, f*(s) = s1v1 + s2v2 − r(v) is a linear function of form (4) defined over an unbounded polyhedral set (from Lemma 1). In the special case when ∂f(v) is a parabolic region (Conjecture 1), the conjugate would again be a linear function but defined over a parabolic region.
Case 2 (Edges). Let F be the set of all the edges, and let E = {x : x2 = mx1 + c, l1 ≤ x1 ≤ u1} ∈ F be an edge between vertices l and u; then f*(s) = sup_{x∈ri(E)} {⟨s, x⟩ − (r(x) + I_P(x))}. By computing the critical points, we have s − (∇r(x) + N_P(x)) = 0, where N_P(x) = {s : s = λ(−m, 1), λ ≥ 0} with m the slope of the edge. So

  s1 = −ξ21 t² + 2ξ11 t + ξ01 − mλ,
  s2 = −ξ22 t² + 2ξ12 t + ξ02 + λ,   (5)

where t = (ξ11 x1 + ξ12 x2 + ξ10) / (ξ21 x1 + ξ22 x2 + ξ20). Since x ∈ ri(E), we have

  x2 = mx1 + c,   (6)

which with (5) gives

  x1 = γ10 s1 + γ01 s2 + γ00                              when ξ21 + mξ22 = 0,
  x1 = γ00 ± γ1/2 √(γ10/2 s1 + γ01/2 s2 + γ00/2)           otherwise,   (7)

where all γij and γij/k are defined in the coefficients of r, and parameters m and c. When ξ21 + mξ22 ≠ 0, solving (5) and (6) leads to a quadratic equation in t with coefficients that are linear functions in s. By substituting (7) and (6) in f*(s), when ξ21 + mξ22 ≠ 0, we have

  f*(s) = ψ1(s1, s2) / (ζ00 ψ1/2(s1, s2)) + ψ0(s1, s2),

and when ξ21 + mξ22 = 0,

  f*(s) = ζ11 s1² + ζ12 s1s2 + ζ22 s2² + ζ10 s1 + ζ01 s2 + ζ00,

where all ζij, ψi and ψi/j are defined in the coefficients of r, and parameters m and c, with ψi(s) and ψi/j(s) linear functions in s. From Proposition 2, ∪_{x∈ri(E)} ∂f(x) is either a parabolic region or a ray. So for any E, the conjugate is a fractional function of form (2) defined over a parabolic region. When ∪_{x∈ri(E)} ∂f(x) is a ray, the computation of the conjugate is deduced from its neighbours by continuity.

Case 3 (Interior). Since ∪_{x∈int(P)} ∂f(x) is contained in a parabolic arc (from Corollary 1), the computation of the conjugate is deduced by continuity.

Example 2. For a bivariate rational function r(x) = x2² / (x2 − x1 + 1) defined over a polytope P with vertices v1 = (1, 1), v2 = (1, 0) and v3 = (0, 0), let f(x) = r(x) + I_P(x).
The conjugate (shown in Fig. 2) can be written

  f*_P(s) = 0                  for s ∈ R1,
            s1                 for s ∈ R2,
            s1                 for s ∈ R3,
            s1 + s2 − 1        for s ∈ R4,
            (1/4)(s1 + s2)²    for s ∈ R5,

where

  R1 = {s : s2 ≥ −s1 + 2, s2 ≥ 1},
  R2 = {s : s2 ≥ s1, s1² + 2s1s2 − 4s1 + s2² ≤ 0},
  R3 = {s : s2 ≤ s1, s2 ≤ 1, s1 ≥ 0},
  R4 = {s : 0 ≤ s1, s2 ≤ −s1},
  R5 = {s : s2 ≥ −s1, s2 ≤ −s1 + 2, s2 ≥ s1, s1² + 2s1s2 − 4s1 + s2² ≥ 0}.
Fig. 2. Conjugate for Example 2
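The conjugate of Example 2 can be approximated by brute force over a grid on the triangle; the 0/0 vertex v2 = (1, 0) is simply excluded (a rough numeric sketch, not the symbolic computation of the paper; grid size and tolerances are arbitrary choices):

```python
import numpy as np

# Triangle with vertices (1,1), (1,0), (0,0): points with 0 <= x2 <= x1 <= 1
t = np.linspace(0.0, 1.0, 401)
X1, X2 = np.meshgrid(t, t, indexing="ij")
mask = X2 <= X1
den = X2 - X1 + 1.0
mask &= np.abs(den) > 1e-9          # skip the 0/0 vertex (1, 0)
x1, x2, d = X1[mask], X2[mask], den[mask]
r = x2**2 / d                        # r(x) = x2^2 / (x2 - x1 + 1), >= 0 here

def conj(s1, s2):
    # grid approximation of f*(s) = sup_{x in P} <s, x> - r(x)
    return np.max(s1 * x1 + s2 * x2 - r)

print(conj(0.0, 0.0))   # sup of -r over P: 0, attained at the origin
print(conj(1.0, -2.0))  # approx 1, supremum approached near vertex (1, 0)
```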
5 Conclusion and Future Work
Figure 3 summarizes the strategy. Given a PLQ function, for each piece, its convex envelope is computed as the convex envelope of a quadratic function over a polytope using [16]. This is the most time-consuming operation since the known algorithms are at least exponential. For each piece, we obtain a piecewise rational function. Then we take each of those pieces and compute its conjugate to obtain
a fractional function over a parabolic subdivision. That computation is complete except for Conjecture 1. Note that there is only a single problematic vertex v, and since the conjugate has full domain, we can deduce ∂f(v) by elimination. Future work will focus on Step 3, which will give the conjugate of the original PLQ function. This will involve solving the map overlay problem repeatedly and is likely to take exponential time. From the hundreds of examples we ran, we expect the result to be a fractional function of unknown kind over a parabolic subdivision; see Fig. 3, bottom row, middle figure. The final step will be to compute the biconjugate (bottom-left in Fig. 3). We know it is a piecewise function over a polyhedral subdivision but do not know the formulas.
Fig. 3. Summary (each piece (Qi, Pi) goes through Step 1 [Loc16] convex envelope to (ri, Pi), Step 2 conjugate to (frj, Prj) with their dual domains, Step 3 max of the conjugates, and Step 4 conjugate, yielding the biconjugate)
References

1. Al-Khayyal, F.A., Falk, J.E.: Jointly constrained biconvex programming. Math. Oper. Res. 8(2), 273–286 (1983)
2. Anstreicher, K.M.: On convex relaxations for quadratically constrained quadratic programming. Math. Program. 136(2), 233–251 (2012)
3. Bauschke, H.H., Goebel, R., Lucet, Y., Wang, X.: The proximal average: basic theory. SIAM J. Optim. 19(2), 766–785 (2008)
4. Bauschke, H.H., Lucet, Y., Trienis, M.: How to transform one convex function continuously into another. SIAM Rev. 50(1), 115–132 (2008)
5. Brenier, Y.: Un algorithme rapide pour le calcul de transformées de Legendre-Fenchel discrètes. Comptes rendus de l'Académie des sciences. Série 1, Mathématique 308(20), 587–589 (1989)
6. Corrias, L.: Fast Legendre-Fenchel transform and applications to Hamilton-Jacobi equations and conservation laws. SIAM J. Numer. Anal. 33(4), 1534–1558 (1996)
7. Crama, Y.: Recognition problems for special classes of polynomials in 0–1 variables. Math. Program. 44(1–3), 139–155 (1989)
8. Gardiner, B., Jakee, K., Lucet, Y.: Computing the partial conjugate of convex piecewise linear-quadratic bivariate functions. Comput. Optim. Appl. 58(1), 249–272 (2014)
9. Gardiner, B., Lucet, Y.: Convex hull algorithms for piecewise linear-quadratic functions in computational convex analysis. Set-Valued Var. Anal. 18(3–4), 467–482 (2010)
10. Gardiner, B., Lucet, Y.: Graph-matrix calculus for computational convex analysis. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 243–259. Springer (2011)
11. Gardiner, B., Lucet, Y.: Computing the conjugate of convex piecewise linear-quadratic bivariate functions. Math. Program. 139(1–2), 161–184 (2013)
12. Haque, T., Lucet, Y.: A linear-time algorithm to compute the conjugate of convex piecewise linear-quadratic bivariate functions. Comput. Optim. Appl. 70(2), 593–613 (2018)
13. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. Springer Science & Business Media (1993)
14. Jach, M., Michaels, D., Weismantel, R.: The convex envelope of (n−1)-convex functions. SIAM J. Optim. 19(3), 1451–1466 (2008)
15. Linderoth, J.: A simplicial branch-and-bound algorithm for solving quadratically constrained quadratic programs. Math. Program. 103(2), 251–282 (2005)
16. Locatelli, M.: A technique to derive the analytical form of convex envelopes for some bivariate functions. J. Glob. Optim. 59(2–3), 477–501 (2014)
17. Locatelli, M.: Polyhedral subdivisions and functional forms for the convex envelopes of bilinear, fractional and other bivariate functions over general polytopes. J. Glob. Optim. 66(4), 629–668 (2016)
18. Lucet, Y.: A fast computational algorithm for the Legendre-Fenchel transform. Comput. Optim. Appl. 6(1), 27–57 (1996)
19. Lucet, Y.: Faster than the fast Legendre transform, the linear-time Legendre transform. Numer. Algorithms 16(2), 171–185 (1997)
20. Lucet, Y.: Fast Moreau envelope computation I: numerical algorithms. Numer. Algorithms 43(3), 235–249 (2006)
21. Lucet, Y.: What shape is your conjugate? A survey of computational convex analysis and its applications. SIAM Rev. 52(3), 505–542 (2010)
22. Lucet, Y., Bauschke, H.H., Trienis, M.: The piecewise linear-quadratic model for computational convex analysis. Comput. Optim. Appl. 43(1), 95–118 (2009)
23. McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: Part I – Convex underestimating problems. Math. Program. 10(1), 147–175 (1976)
24. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer Science & Business Media (1998)
25. Sherali, H.D., Alameddine, A.: An explicit characterization of the convex envelope of a bivariate bilinear function over special polytopes. Ann. Oper. Res. 25(1), 197–209 (1990)
26. Tardella, F.: On the existence of polyhedral convex envelopes. In: Frontiers in Global Optimization, pp. 563–573. Springer (2004)
27. Tardella, F.: Existence and sum decomposition of vertex polyhedral convex envelopes. Optim. Lett. 2(3), 363–375 (2008)
Tractable Relaxations for the Cubic One-Spherical Optimization Problem

Christoph Buchheim¹, Marcia Fampa²(B), and Orlando Sarmiento²

¹ Technische Universität Dortmund, Dortmund, Germany
[email protected]
² Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
{fampa,osarmiento}@cos.ufrj.br
Abstract. We consider the cubic one-spherical optimization problem, consisting in minimizing a homogeneous cubic function over the unit sphere. We propose different lower bounds that can be computed efficiently, using decompositions of the objective function and well-known results for the corresponding quadratic problem variant.

Keywords: Cubic one-spherical optimization problem · Best rank-1 tensor approximation · Trust region subproblem · Convex relaxation

1 Introduction
The cubic one-spherical optimization problem has the following form:

  CSP:  min_{x∈Rⁿ}  f(x) := Ax³ = Σ_{i,j,k=1..n} a_ijk x_i x_j x_k
        s.t.  ‖x‖ = 1,
where n ≥ 2 and A is a third-order (n×n×n)-dimensional real symmetric tensor. Tensor A is symmetric in the sense that its element a_ijk is invariant under any permutation of the indices (i, j, k). As shown by Zhang et al. [9], using a result by Nesterov [6], Problem CSP is NP-hard. This is in contrast to the quadratic variant of CSP, which is the well-known trust region subproblem. In spite of the non-convexity, the latter quadratic problem can be solved efficiently and has a concave dual problem with no duality gap [8]. We will use the latter results in one of the approaches presented in this paper. Applications for the cubic one-spherical optimization problem can be found in signal processing, where a discrete multidimensional signal is treated as a tensor, and the low-rank approximation of the tensor is used to approximate the signal. The CSP, for n = 3, is also used to formulate magnetic resonance signals in biological tissues. Some of these applications are described in [9] and in the references therein [2–5,7,8].

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 267–276, 2020. https://doi.org/10.1007/978-3-030-21803-4_28
C. Buchheim et al.
Our main purpose in this work is to develop new and efficient relaxations for problem CSP. For that, we propose different approaches, described in Sect. 3. In Sect. 4, we present preliminary numerical results concerning the quality of the resulting lower bounds and the computational effort to compute them, for small instances from the literature as well as larger randomly generated instances on up to n = 200 variables.
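The trilinear form Ax³ can be evaluated with a single tensor contraction, which is handy for experimenting with bounds (a numpy sketch; the random symmetrized tensor and the sphere sampling are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Build a symmetric third-order tensor: a_ijk invariant under index permutations
B = rng.uniform(size=(n, n, n))
A = sum(np.transpose(B, p) for p in
        [(0,1,2), (0,2,1), (1,0,2), (1,2,0), (2,0,1), (2,1,0)]) / 6.0

def f(x):
    # f(x) = A x^3 = sum_{i,j,k} a_ijk x_i x_j x_k
    return np.einsum("ijk,i,j,k->", A, x, x, x)

# Crude upper bound on min_{||x||=1} f(x) by sampling the unit sphere
xs = rng.normal(size=(2000, n))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
ub = min(f(x) for x in xs)
print(ub)
```

Because f is odd (f(−x) = −f(x)), the minimum over the sphere is always nonpositive, which the sampled upper bound reflects.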
2 Notation and Preliminaries
We denote by S³(Rⁿ) the space of third-order (n × n × n)-dimensional real symmetric tensors. Given the symmetric tensor A ∈ S³(Rⁿ) appearing in Problem CSP and some ℓ ∈ {1, . . . , n}, we define A_ℓ as the symmetric matrix composed by the elements a_ℓjk of A, for all j, k = 1, . . . , n. Moreover, by Ã_ℓ we denote the submatrix of A_ℓ where the ℓ-th row and column are eliminated. For example, for n = 3, we have A = (A1, A2, A3) with

  A1 = [a111 a112 a113; a121 a122 a123; a131 a132 a133],
  A2 = [a211 a212 a213; a221 a222 a223; a231 a232 a233],
  A3 = [a311 a312 a313; a321 a322 a323; a331 a332 a333],

and

  Ã1 = [a122 a123; a132 a133],  Ã2 = [a211 a213; a231 a233],  Ã3 = [a311 a312; a321 a322].
In the following, for a symmetric real matrix X, we will denote by λmin(X) the smallest eigenvalue of X. Given a vector x ∈ Rⁿ and ℓ ∈ {1, . . . , n}, we define the vector x̂_ℓ ∈ Rⁿ⁻¹ as x̂_ℓ := (x1, . . . , x_{ℓ−1}, x_{ℓ+1}, . . . , xn), i.e., the vector x where the ℓ-th component is omitted.
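In 0-based array indexing, A_ℓ and Ã_ℓ are simple slices (a numpy sketch; the example tensor is arbitrary):

```python
import numpy as np

def A_slice(A, ell):
    # A_ell: the symmetric matrix (a_{ell j k})_{j,k}   (ell is 0-based here)
    return A[ell]

def A_tilde(A, ell):
    # A~_ell: A_ell with its ell-th row and column removed
    M = np.delete(A[ell], ell, axis=0)
    return np.delete(M, ell, axis=1)

def x_hat(x, ell):
    # x^_ell: x with its ell-th component omitted
    return np.delete(x, ell)

n = 3
A = np.arange(n**3, dtype=float).reshape(n, n, n)
print(A_tilde(A, 0))  # the 2x2 block [a122 a123; a132 a133] in 1-based notation
```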
3 Relaxations for the CSP
Our objective is to compute lower bounds for Problem CSP that can be efficiently calculated. For this, we relax the problem in different ways. The common idea is to decompose the sum Σ_{i,j,k=1..n} a_ijk x_i x_j x_k appearing in the objective function of CSP into pieces that can be minimized over the constraint ‖x‖ = 1 efficiently. Combining all minima then yields a lower bound for CSP.
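The "sum of the separate minima" idea is easiest to see for two quadratic forms on the sphere, where each piece is minimized by its smallest eigenvalue (a small numpy illustration; Weyl's inequality guarantees the relation):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
P = rng.normal(size=(n, n)); P = (P + P.T) / 2.0
Q = rng.normal(size=(n, n)); Q = (Q + Q.T) / 2.0

# Minimizing each piece separately over ||x|| = 1 ...
bound = np.linalg.eigvalsh(P)[0] + np.linalg.eigvalsh(Q)[0]
# ... lower-bounds the minimum of the sum:
exact = np.linalg.eigvalsh(P + Q)[0]   # min_{||x||=1} x^T (P+Q) x
print(bound, exact)  # bound is below exact
```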
3.1 Lower Bound by Decomposition – Approach 1
We first decompose the objective function of CSP by the first index, as follows:

  Σ_{i,j,k=1..n} a_ijk x_i x_j x_k = Σ_{i=1..n} [ x_i Σ_{j,k≠i} a_ijk x_j x_k + 2x_i² Σ_{j≠i} a_iij x_j + a_iii x_i³ ].   (1)
Then we conclude

  min_{‖x‖=1} Σ_{i,j,k=1..n} a_ijk x_i x_j x_k ≥ Σ_{i=1..n} min_{‖x‖=1} [ x_i Σ_{j,k≠i} a_ijk x_j x_k + 2x_i² Σ_{j≠i} a_iij x_j + a_iii x_i³ ].   (2)
By a further decomposition, for each i = 1, 2, . . . , n, we have

  min_{‖x‖=1} [ x_i Σ_{j,k≠i} a_ijk x_j x_k + 2x_i² Σ_{j≠i} a_iij x_j + a_iii x_i³ ]
    ≥ min_{x_i∈[−1,1]} ( x_i · min_{‖x̂_i‖=√(1−x_i²)} Σ_{j,k≠i} a_ijk x_j x_k )
      + min_{x_i∈[−1,1]} ( 2x_i² · min_{‖x̂_i‖=√(1−x_i²)} Σ_{j≠i} a_iij x_j )
      + min_{‖x‖=1} a_iii x_i³.   (3)
We now consider each problem on the right-hand side of (3) independently. First, note that

  min_{‖x̂_i‖=√(1−x_i²)} Σ_{j,k≠i} a_ijk x_j x_k = (1 − x_i²) λmin(Ã_i),   (4)

using the notation of Sect. 2. Multiplying the right-hand side with x_i and taking the minimum over x_i ∈ [−1, 1], we obtain

  min_{x_i∈[−1,1]} ( x_i · min_{‖x̂_i‖=√(1−x_i²)} Σ_{j,k≠i} a_ijk x_j x_k ) = −(2√3/9) |λmin(Ã_i)|.   (5)

Moreover,

  min_{‖x̂_i‖=√(1−x_i²)} Σ_{j≠i} a_iij x_j = (−1) √(1 − x_i²) √( Σ_{j≠i} a_iij² )   (6)
and hence

  min_{x_i∈[−1,1]} ( 2x_i² · min_{‖x̂_i‖=√(1−x_i²)} Σ_{j≠i} a_iij x_j ) = −(4√3/9) √( Σ_{j≠i} a_iij² ).   (7)

Finally,

  min_{‖x‖=1} a_iii x_i³ = −|a_iii|.   (8)
Adding up, from (2), (3), and (5)–(8) we obtain

  min_{‖x‖=1} Σ_{i,j,k=1..n} a_ijk x_i x_j x_k ≥ − Σ_{i=1..n} ( (2√3/9) |λmin(Ã_i)| + (4√3/9) √( Σ_{j≠i} a_iij² ) + |a_iii| ).
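This bound only needs one smallest-eigenvalue computation per index i; a direct numpy sketch (the random symmetric tensor and the sphere sampling are only for illustration):

```python
import numpy as np

def approach1_bound(A):
    # -(sum_i (2*sqrt(3)/9)|lambda_min(A~_i)| + (4*sqrt(3)/9)*sqrt(sum_{j!=i} a_iij^2) + |a_iii|)
    n = A.shape[0]
    total = 0.0
    for i in range(n):
        Ai = np.delete(np.delete(A[i], i, axis=0), i, axis=1)
        lam_min = np.linalg.eigvalsh(Ai)[0]
        aiij = np.delete(A[i, i, :], i)
        total += (2*np.sqrt(3)/9) * abs(lam_min) \
               + (4*np.sqrt(3)/9) * np.sqrt(np.sum(aiij**2)) \
               + abs(A[i, i, i])
    return -total

rng = np.random.default_rng(1)
n = 6
B = rng.uniform(size=(n, n, n))
A = sum(np.transpose(B, p) for p in
        [(0,1,2), (0,2,1), (1,0,2), (1,2,0), (2,0,1), (2,1,0)]) / 6.0

lb = approach1_bound(A)
# A valid lower bound must underestimate f on the sphere:
xs = rng.normal(size=(500, n))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
fmin_sampled = min(np.einsum("ijk,i,j,k->", A, x, x, x) for x in xs)
print(lb, fmin_sampled)  # lb is below the sampled minimum
```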
The time to calculate this lower bound is dominated by computing the smallest eigenvalues of the n symmetric (n − 1) × (n − 1) matrices Ã1, . . . , Ãn.

3.2 Lower Bound by Duality – Approach 2
In the second approach, our aim is to decompose the objective function of CSP into fewer terms, thus hoping to obtain a stronger lower bound. For this, we use

  min_{‖x‖=1} Σ_{i,j,k=1..n} a_ijk x_i x_j x_k ≥ Σ_{i=1..n} min_{‖x‖=1} Σ_{j,k=1..n} a_ijk x_i x_j x_k.
Note that, for each i = 1, 2, . . . , n fixed, we obtain

  min_{‖x‖=1} Σ_{j,k=1..n} a_ijk x_i x_j x_k
    = min_{x_i∈[−1,1]} min_{‖x̂_i‖=√(1−x_i²)} Σ_{j,k=1..n} a_ijk x_i x_j x_k
    = min_{x_i∈[−1,1]} min_{‖y‖=1} [ Σ_{j,k≠i} a_ijk x_i (1 − x_i²) y_j y_k + 2 Σ_{j≠i} a_iij x_i² √(1 − x_i²) y_j + a_iii x_i³ ]
    = min_{x_i∈[−1,1]} min_{‖y‖=1} [ x_i (1 − x_i²) yᵀ Ã_i y + x_i² √(1 − x_i²) a_iᵀ y + a_iii x_i³ ],   (9)

where we set

  y := (1/√(1 − x_i²)) (x1, x2, . . . , x_{i−1}, x_{i+1}, . . . , xn) = x̂_i / √(1 − x_i²) ∈ Rⁿ⁻¹,
  a_i := 2 (a_ii1, a_ii2, . . . , a_ii(i−1), a_ii(i+1), . . . , a_ii(n−1), a_iin) ∈ Rⁿ⁻¹.
In the following, we take advantage of the spectral decomposition of Ã_i. For this, let λmin(Ã_i) = λ_i1 ≤ . . . ≤ λ_i(n−1) be the eigenvalues of Ã_i, and v_i1, . . . , v_i(n−1) be a corresponding orthonormal basis of eigenvectors. We have Ã_i = V_i Λ_i V_iᵀ, where V_i := (v_i1, . . . , v_i(n−1)) ∈ R⁽ⁿ⁻¹⁾ˣ⁽ⁿ⁻¹⁾, with V_iᵀ V_i = V_i V_iᵀ = I_{n−1}, and Λ_i := Diag(λ_i1, . . . , λ_i(n−1)). For each i = 1, . . . , n, we then have

  min_{‖y‖=1} [ x_i (1 − x_i²) yᵀ Ã_i y + x_i² √(1 − x_i²) a_iᵀ y + a_iii x_i³ ]
    = min_{‖z‖=1} [ x_i (1 − x_i²) zᵀ Λ_i z + x_i² √(1 − x_i²) b_iᵀ z + a_iii x_i³ ],   (10)
where we substitute z := V_iᵀ y and b_i := V_iᵀ a_i. Recall that x_i is a constant in this context. Note that Problem (10) aims at minimizing a quadratic function over the unit sphere, i.e., it is an instance of the so-called trust region subproblem and can thus be solved efficiently. A dual problem of (10), with no duality gap, is presented in [8]. For its solution, three cases have to be distinguished. In all cases, a lower bound on (10) is given by

  max_{x_i(1−x_i²)Λ_i − μI ⪰ 0} [ μ − (1/4) x_i⁴ (1 − x_i²) b_iᵀ ( x_i (1 − x_i²) Λ_i − μI )⁻¹ b_i ] + a_iii x_i³.   (11)

Assuming for simplicity that b_i has no zero entries, we have to distinguish between the cases x_i ∈ {−1, 0, 1} and x_i ∈ (−1, 0) ∪ (0, 1). In the latter case, we have x_i⁴(1 − x_i²) ≠ 0 and hence the maximizer of (11) will lie in the interior of the feasible set (the so-called "easy case"). In the former case, (10) turns out to be trivial, with optimal value a_iii x_i³. However, we aim at deriving a bound without knowing x_i in advance, so we cannot apply this case distinction a priori, which means we have to use the lower bound given by (11) in all cases. Let us now further divide the case x_i ∈ (−1, 0) ∪ (0, 1) into two subproblems, namely, x_i being negative or positive. Assuming x_i ∈ (−1, 0), we may replace μ by −x_i(1 − x_i²)α in (11), obtaining

  max_{−Λ_i − αI ⪰ 0} [ −x_i (1 − x_i²) α − (1/4) x_i⁴ (1 − x_i²) b_iᵀ ( x_i (1 − x_i²) (Λ_i + αI) )⁻¹ b_i ] + a_iii x_i³.
Therefore, for x_i ∈ (−1, 0), we obtain a lower bound

  min_{‖y‖=1} [ x_i (1 − x_i²) yᵀ Ã_i y + x_i² √(1 − x_i²) a_iᵀ y + a_iii x_i³ ]
    ≥ max_α [ −x_i (1 − x_i²) α − (1/4) x_i³ b_iᵀ ( Λ_i + αI )⁻¹ b_i ] + a_iii x_i³.

… 2, for the two first examples, and smaller than 0.16% for Example 3. These results suggest that the lower bounds obtained by Approach 3 quickly converge to valid bounds when the number of discretization points increases.
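The subproblem min_{‖z‖=1} zᵀΛz + bᵀz underlying (10) can be solved numerically in the easy case (all entries of b nonzero) by bisection on the Lagrange multiplier; a sketch of the standard secular-equation approach, not the dual scheme of [8] itself:

```python
import numpy as np

def sphere_trs(lam, b):
    # min_{||z||=1} z^T diag(lam) z + b^T z, assuming all b_k != 0 ("easy case").
    # KKT: 2(lam_k - mu) z_k = -b_k with mu < min(lam); ||z(mu)|| = 1 pins down mu.
    lam = np.asarray(lam, float); b = np.asarray(b, float)
    def norm2(mu):
        return np.sum(b**2 / (4.0 * (lam - mu)**2))
    hi = lam.min() - 1e-14                             # norm2(hi) >> 1
    lo = lam.min() - 0.5 * np.linalg.norm(b) - 1.0     # norm2(lo) < 1
    for _ in range(200):                               # bisection on the secular equation
        mid = 0.5 * (lo + hi)
        if norm2(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    z = -b / (2.0 * (lam - mu))
    z /= np.linalg.norm(z)                             # tiny renormalization for safety
    return z, float(z @ (lam * z) + b @ z)

lam = np.array([-1.0, 0.5, 2.0])
b   = np.array([ 0.3, -0.7, 0.4])
z, val = sphere_trs(lam, b)
print(val)
```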
4.3 Random Instances
Finally, we generated random third-order (n×n×n)-dimensional real symmetric tensors A, with entries uniformly distributed in (0, 1). The tensors were generated using the open source software Tensor Toolbox for MATLAB [1]. Tables 3 and 4 report average bounds for 20 instances for each n = 3, 5, 10, 30, 50, 100, 200.

Table 2. Approach 3 – Results for different numbers of discretization points

Example 1
it  npoints  lower bound      rel dif
1   5        -7.661209e-001
2   55       -1.082002e+000   41.2312600
3   155      -1.083636e+000   0.1510226
4   305      -1.083790e+000   0.0142051
5   505      -1.084534e+000   0.0686056
6   755      -1.084610e+000   0.0070751
7   1055     -1.084610e+000   0.0000000
8   1405     -1.084610e+000   0.0000000
9   1805     -1.084610e+000   0.0000000
10  2255     -1.084688e+000   0.0071920

Example 2
it  npoints  lower bound      rel dif
1   5        -2.117380e+000
2   55       -2.972419e+000   40.3819833
3   155      -2.973344e+000   0.0311089
4   305      -2.973650e+000   0.0102988
5   505      -2.973650e+000   0.0000000
6   755      -2.973650e+000   0.0000000
7   1055     -2.973688e+000   0.0012582
8   1405     -2.973770e+000   0.0027572
9   1805     -2.973770e+000   0.0000000
10  2255     -2.973770e+000   0.0000000

Example 3
it  npoints  lower bound      rel dif
1   5        -7.405235e+000
2   55       -1.051328e+001   41.9709771
3   155      -1.131087e+001   7.5864276
4   305      -1.131287e+001   0.0176827
5   505      -1.131301e+001   0.0012360
6   755      -1.131359e+001   0.0051961
7   1055     -1.131359e+001   0.0000000
8   1405     -1.131405e+001   0.0039842
9   1805     -1.133194e+001   0.1581602
10  2255     -1.133194e+001   0.0000000
For Approach 3, for each i = 1, . . . , n, we solve the quadratic problem (16), for 200 equally spaced points x_i in the interval [−1, 1], using the algorithm described in [6]. The computational time needed for this approach is large and significantly increases with n. Therefore, we only apply it for the smallest instance in Table 3. We emphasize once more that the main objective of applying Approach 3 is to have an evaluation of the quality of the lower bounds computed by the other approaches. Note that, for all instances with n = 3, in case the number of discretized points in Approach 3 approaches infinity, we should have its solution converging to the best possible bound given by Approach 2. For the larger instances in Table 4, we apply our two approaches actually intended to generate lower bounds for the CSP.

Table 3. Results for random instances, n = 3.
Approach  Lower bound  Time
1         -3.4802001   0.002
2         -3.4515856   0.043
3         -3.4480189   113.790

Table 4. Results for random instances, n = 5, 10, 30, 50, 100, 200.

n = 5
Approach  Lower bound  Time
1         -7.1952054   0.0014
2         -8.2862354   0.0807

n = 10
Approach  Lower bound   Time
1         -19.1051160   0.0016
2         -28.4392343   0.1656

n = 30
Approach  Lower bound    Time
1         -93.9774938    0.0097
2         -220.1731727   0.5579

n = 50
Approach  Lower bound    Time
1         -197.1711761   0.0587
2         -581.0479091   1.1046

n = 100
Approach  Lower bound     Time
1         -540.4581602    0.6257
2         -2203.4444732   5.8779

n = 200
Approach  Lower bound     Time
1         -1495.4581060   9.9779
2         -8483.9467825   36.8817
C. Buchheim et al.
We note that the computational time of Approach 2 can still be reduced, because the computational times reported for Approach 2 take into account the solution of the minimization problems in (15) for 100 different values of . This strategy can be made more practical and more efficient, by a better analysis of the problem with the aim of reducing the number of dual solutions considered and improve the quality of the bounds. The improvement on these computations is part of our future research. Acknowledgments. C. Buchheim has received funding from the European Unions Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 764759. M. Fampa was supported in part by CNPq-Brazil grants 303898/2016-0 and 434683/2018-3. O. Sarmiento contributed much of his work while visiting the Technische Universit¨ at Dortmund, Dortmund, Germany, supported by a Research Fellowship from CAPES-Brazil - Finance Code 001.
References 1. Bader, B.W., Kolda, T.G., et al.: MATLAB Tensor Toolbox Version 3.0-dev, Oct 2017. https://www.tensortoolbox.org 2. Basser, P.J., Mattiello, J., LeBihan, D.: MR diffusion tensor spectroscopy and imaging. Biophys. J. 66, 259–267 (1994) 3. Basser, P.J., Mattiello, J., LeBihan, D.: Estimation of the effective seldiffusion tensor from the NMR spin echo. J. Magn. Reson. B 103, 247–254 (1994) 4. Basser, P.J., Jones, D.K.: Diffusion-tensor MRI: theory, experimental design and data analysis-a technical review. NMR Biomed. 15, 456–467 (2002) 5. Liu, C.L., Bammer, R., Acar, B., Moseley, M.E.: Characterizing non-gaussian diffusion by using generalized diffusion tensors. Magn. Reson. Med. 51, 924–937 (2004) 6. Nesterov, Y.E.: Random walk in a simplex and quadratic optimization over convex polytopes. CORE Discussion Paper 2003/71 CORE-UCL (2003) 7. Nie, J., Wang, L.: Semidefinte relaxations for best rank-1 tensor approximations. SIAM J. Matrix Anal. Appl. 35, 1155–1179 (2014) 8. Stern, R.J., Wolkowicz, H.: Indefinite trust region subproblems and nonsymmetric eigenvalue perturbations. SIAM J. Optim. 5, 286–313 (1995) 9. Zhang, X., Qi, L., Ye, Y.: The cubic spherical optimization problems. Math. Comput. 81(279), 1513–1525 (2012)
DC Programming and DCA
A DC Algorithm for Solving Multiobjective Stochatic Problem via Exponential Utility Functions Ramzi Kasri(B) and Fatima Bellahcene Faculty of Sciences, LAROMAD, Mouloud Mammeri University, BP 17 RP, 15000 Tizi-Ouzou, Algeria [email protected], [email protected]
Abstract. In this paper we suggest an algorithm for solving a multiobjective stochastic linear programming problem with normal multivariate distributions. The problem is first transformed into a deterministic multiobjective problem by introducing the expected value criterion and a utility function. The obtained problem is then reduced to a monobjective quadratic problem using a weighting method. This last problem is solved by a DC algorithm.

Keywords: Multiobjective programming · Stochastic programming · DCA · DC programming · Utility function · Expected value criterion
1 Introduction
Multiobjective stochastic linear programming (MOSLP) is a tool for modeling many concrete real-life problems in which complete data about the problem parameters are not available. Such a class of problems includes investment and energy resources planning [1,20], manufacturing systems in production planning [7,8], mineral blending [12], water use planning [2,5] and multi-product batch plant design [23]. To deal with this type of problem, it is necessary to introduce a randomness framework. In order to obtain solutions for these multiobjective stochastic problems, techniques used in stochastic programming must be combined with techniques from multiobjective programming. From this, two approaches are considered, both involving a double transformation; the difference between them is the order in which the transformations are carried out. Ben Abdelaziz [4] qualified as the multiobjective approach the techniques that first transform the stochastic multiobjective problem into an equivalent deterministic multiobjective problem, and as the stochastic approach the techniques that first transform the stochastic multiobjective problem into a monobjective stochastic problem. Several interactive methods for solving MOSLP problems have been developed. We can mention the Probabilistic Trade-off Development Method (PROTRADE) by Goicoechea et al. [10], the Strange method proposed by
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 279–288, 2020. https://doi.org/10.1007/978-3-030-21803-4_29
Teghem et al. [21], and the interactive method with recourse, which uses a two-stage mathematical programming model, by Klein et al. [11]. In this paper, we propose another approach, a combination of the multiobjective approach and a nonconvex optimization technique (Difference of Convex functions), to solve the multiobjective stochastic linear problem with normal multivariate distributions. DC programming and DCA were introduced by Pham Dinh Tao in 1985 and developed by Le Thi and Pham Dinh since 1994 [13–16]. This method has proved its efficiency on a large number of nonconvex problems [17–19].

The paper is structured as follows. In Sect. 2, the problem formulation is given. Section 3 shows how to reformulate the problem by introducing utility functions and applying the weighting method. Section 4 presents a review of DC programming and DCA. Section 5 illustrates the application of DC programming and DCA to the resulting quadratic problem. Our experimental results are presented in the last section.
2 Problem Statement
Let us consider the multiobjective stochastic linear programming problem formulated as follows:

min (c̃₁x, c̃₂x, …, c̃_q x)  s.t. x ∈ S,   (1)

where x = (x₁, x₂, …, xₙ) denotes the n-dimensional vector of decision variables. The feasible set S is a subset of the n-dimensional real vector space Rⁿ characterized by a set of constraint inequalities of the form Ax ≤ b, where A is an m × n coefficient matrix and b an m-dimensional column vector. We assume that S is nonempty and compact in Rⁿ. Each vector c̃ₖ follows a normal distribution with mean c̄ₖ and covariance matrix Vₖ. Therefore, every objective c̃ₖx follows a normal distribution with mean μₖ = c̄ₖx and variance σₖ² = xᵀVₖx. In the following section, we will mainly be interested in the transformation of problem (1) into an equivalent multiobjective deterministic problem, which in turn will be reformulated as a DC programming problem.
3 Transformations and Reformulation
First, we will take into consideration the notion of risk. Assuming that decision makers' preferences can be represented by utility functions, under plausible assumptions about decision makers' risk attitudes, problem (1) is interpreted as:

min_x (E[U(c̃₁x)], E[U(c̃₂x)], …, E[U(c̃_q x)])  s.t. x ∈ S.   (2)

The utility function U is generally assumed to be continuous and convex. In this paper, we consider an exponential utility function of the form U(r) = 1 − e^{−ar},
where r is the value of the objective and a the coefficient of incurred risk (a large a corresponds to a conservative attitude). Our choice is motivated by the fact that exponential utility functions lead to an equivalent quadratic problem, which encouraged us to design a DC method to solve it simply and accurately. Therefore, if r ∼ N(μ, σ²), we have:

E(U(r)) = ∫_{−∞}^{+∞} (1 − e^{−ar}) e^{−(r−μ)²/(2σ²)} / (σ√(2π)) dr = 1 − e^{σ²a²/2 − μa}.
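This closed form can be sanity-checked numerically. The sketch below (with illustrative values of μ, σ and a that are not from the paper) compares a Monte Carlo estimate of E[1 − e^{−ar}] with 1 − e^{σ²a²/2 − μa}:

```python
import math
import random

def expected_utility_mc(mu, sigma, a, n=200_000, seed=0):
    """Monte Carlo estimate of E[1 - exp(-a*r)] for r ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += 1.0 - math.exp(-a * rng.gauss(mu, sigma))
    return total / n

def expected_utility_exact(mu, sigma, a):
    """Closed form derived above: 1 - exp(sigma^2 a^2 / 2 - mu*a)."""
    return 1.0 - math.exp(0.5 * sigma**2 * a**2 - mu * a)

mu, sigma, a = 2.0, 0.5, 1.0  # illustrative values
mc = expected_utility_mc(mu, sigma, a)
exact = expected_utility_exact(mu, sigma, a)
print(abs(mc - exact))  # small Monte Carlo error
```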
Minimizing E(U(r)) means maximizing σ²a²/2 − μa, or equivalently minimizing μ − σ²a/2. Our aim is to search for efficient solutions of the multiobjective deterministic problem (2) according to the following definition:

Definition 1. [3] A feasible solution x* to problem (1) is an efficient solution if there is no other feasible x such that E[U(c̃ₖx)] ≤ E[U(c̃ₖx*)] for all k, with at least one strict inequality. The resulting criterion vector E[U(c̃ₖx*)] is said to be non-dominated.

Applying the widely used method for finding efficient solutions in multiobjective programming problems, namely the weighted-sum method [3,6], we assign to each objective function in (2) a non-negative weight wₖ and aggregate the objective functions in order to obtain a single function. Thus, problem (2) is reduced to:

min_x Σ_{k=1}^q wₖ E[U(c̃ₖx)]  s.t. x ∈ S, w ∈ Λ,   (3)

or equivalently

min_x E[U(Σ_{k=1}^q wₖ c̃ₖx)]  s.t. x ∈ S, w ∈ Λ,   (4)

where Λ = {w ∈ R^q : Σ_{k=1}^q wₖ = 1, wₖ ≥ 0 ∀k ∈ {1, …, q}}.

Theorem 1. [9] A point x* ∈ S is an efficient solution to problem (2) if and only if x* ∈ S is optimal for problem (4).

Given that the random variable F(x, c̃) = Σ_{k=1}^q wₖc̃ₖx in (4) is a linear function of the random objectives c̃ₖx, its variance depends on the variances of the c̃ₖx and on their covariances. Since each c̃ₖx follows a normal distribution with mean μₖ and variance σₖ², the function F(x, c̃) follows a normal distribution with mean μ and variance σ², where

μ = Σ_{k=1}^q wₖμₖ = Σ_{k=1}^q wₖc̄ₖx,   (5)
σ² = Σ_{k=1}^q wₖ²σₖ² + 2 Σ_{1≤k<s≤q} wₖwₛσ_{ks},   (6)

where σ_{ks} denotes the covariance of the random objectives c̃ₖx and c̃ₛx. Finally, we obtain the following quadratic problem:

min_x Σ_{k=1}^q wₖc̄ₖᵀx − (a/2)(Σ_{k=1}^q wₖ²σₖ² + 2 Σ_{1≤k<s≤q} wₖwₛσ_{ks})  s.t. x ∈ S,   (7)
or

min_x Σ_{k=1}^q wₖc̄ₖᵀx − (a/2)(Σ_{k=1}^q wₖ²xᵀVₖx + 2 Σ_{1≤k<s≤q} wₖwₛσ_{ks})  s.t. x ∈ S.
With λ_kk ≥ 0 and λ_jk ≥ 0, the quadratic constraints are relaxed into linear matrix inequalities:

h_kk^H W_k h_kk ≥ f_kk ⇔ [ λ_kk Q + W_k , W_k h_kk ; h_kk^H W_k , h_kk^H W_k h_kk − f_kk − λ_kk ] ⪰ 0,

h_jk^H W_j h_jk ≤ f_jk ⇔ [ λ_jk Q − W_j , −W_j h_jk ; −h_jk^H W_j , −h_jk^H W_j h_jk + f_jk − λ_jk ] ⪰ 0,  j ≠ k.

The relaxed constraints hold with equalities at the optimal solution [1]. By using the slack variables ξ_k = (2^{R_I,k − R_k} − 1)/Tr(G_kk W_k), the outage constraint is transformed into

− ln p_k − ξ_k σ_3k² − Σ_{j≠k} ln(1 + ξ_k Tr(G_jk W_j)) ≤ 0,
Tr(G_kk W_k) ≤ s_k,  0 < s_k,  0 < ξ_k,
ξ_k ≤ (2^{R_I,k − R_k} − 1)/s_k ⇔ R_k − R_I,k + log₂(1 + ξ_k s_k) ≤ 0.

At the optimum, ξ_k = (2^{R_I,k − R_k} − 1)/Tr(G_kk W_k); if not, we can increase R_k. In addition, Tr(G_kk W_k) = s_k at the optimum; otherwise, we could decrease s_k, which would increase R_k through ξ_k = (2^{R_I,k − R_k} − 1)/Tr(G_kk W_k). Hence, if the relaxed constraints did not hold with equalities at the optimum, the objective function could be further increased.
References

1. Chen, D., He, Y., Lin, X., Zhao, R.: Both worst-case and chance-constrained robust secure SWIPT in MISO interference channels. IEEE Trans. Inf. Forensics Secur. 13(2), 306–317 (2018)
2. Chu, Z., Zhu, Z., Hussein, J.: Robust optimization for AN-aided transmission and power splitting for secure MISO SWIPT system. IEEE Commun. Lett. 20(8), 1571–1574 (2016)
3. Feng, Y., Yang, Z., Zhu, W., Li, Q., Lv, B.: Robust cooperative secure beamforming for simultaneous wireless information and power transfer in amplify-and-forward relay networks. IEEE Trans. Veh. Technol. 66(3), 2354–2366 (2017)
4. Gharavol, E.A., Liang, Y., Mouthaan, K.: Robust downlink beamforming in multiuser MISO cognitive radio networks with imperfect channel-state information. IEEE Trans. Veh. Technol. 59(6), 2852–2860 (2010)
5. Khandaker, M.R.A., Wong, K., Zhang, Y., Zheng, Z.: Probabilistically robust SWIPT for secrecy MISOME systems. IEEE Trans. Inf. Forensics Secur. 12(1), 211–226 (2017)
298
P. A. Nguyen and H. A. L. Thi
6. Krikidis, I., Timotheou, S., Nikolaou, S., Zheng, G., Ng, D.W.K., Schober, R.: Simultaneous wireless information and power transfer in modern communication systems. IEEE Commun. Mag. 52(11), 104–110 (2014)
7. Le, T.A., Vien, Q., Nguyen, H.X., Ng, D.W.K., Schober, R.: Robust chance-constrained optimization for power-efficient and secure SWIPT systems. IEEE Trans. Green Commun. Netw. 1(3), 333–346 (2017)
8. Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: DC programming and DCA for general DC programs. In: Advanced Computational Methods for Knowledge Engineering, pp. 15–35 (2014)
9. Le Thi, H.A., Le, H.M., Phan, D.N., Tran, B.: A DCA-like algorithm and its accelerated version with application in data visualization (2018). arXiv:1806.09620
10. Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133(1), 23–46 (2005)
11. Le Thi, H.A., Pham Dinh, T.: DC programming and DCA: thirty years of developments. Math. Program. 169(1), 5–68 (2018)
12. Lei, H., Ansari, I.S., Pan, G., Alomair, B., Alouini, M.: Secrecy capacity analysis over α−μ fading channels. IEEE Commun. Lett. 21(6), 1445–1448 (2017)
13. Li, Q., Ma, W.: Secrecy rate maximization of a MISO channel with multiple multi-antenna eavesdroppers via semidefinite programming. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3042–3045 (2010)
14. Ma, S., Hong, M., Song, E., Wang, X., Sun, D.: Outage constrained robust secure transmission for MISO wiretap channels. IEEE Trans. Wirel. Commun. 13(10), 5558–5570 (2014)
15. Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
16. Pham Dinh, T., Le Thi, H.A.: D.C. optimization algorithms for solving the trust region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
17. Pham Dinh, T., Le Thi, H.A.: Recent advances in DC programming and DCA. In: Transactions on Computational Intelligence XIII, pp. 1–37. Springer, Heidelberg (2014)
18. Rashid, U., Tuan, H.D., Kha, H.H., Nguyen, H.H.: Joint optimization of source precoding and relay beamforming in wireless MIMO relay networks. IEEE Trans. Commun. 62(2), 488–499 (2014)
19. Tian, M., Huang, X., Zhang, Q., Qin, J.: Robust AN-aided secure transmission scheme in MISO channels with simultaneous wireless information and power transfer. IEEE Signal Process. Lett. 22(6), 723–727 (2015)
20. Wang, K., So, A.M., Chang, T., Ma, W., Chi, C.: Outage constrained robust transmit optimization for multiuser MISO downlinks: tractable approximations by conic optimization. IEEE Trans. Signal Process. 62(21), 5690–5705 (2014)
21. Wang, S., Wang, B.: Robust secure transmit design in MIMO channels with simultaneous wireless information and power transfer. IEEE Signal Process. Lett. 22(11), 2147–2151 (2015)
DCA-Like, GA and MBO: A Novel Hybrid Approach for Binary Quadratic Programs

Sara Samir1(B), Hoai An Le Thi1, and Mohammed Yagouni2

1 Computer Science and Applications Department, LGIPM, University of Lorraine, Metz, France
{sara.samir,hoai-an.le-thi}@univ-lorraine.fr
2 LaROMaD, USTHB, Alger, Algeria
[email protected]
Abstract. To solve binary quadratic programming problems, we suggest a hybrid approach based on the cooperation of a new version of DCA (Difference of Convex functions Algorithm), named DCA-Like, a genetic algorithm and the migrating bird optimization algorithm. The component algorithms start in a parallel way, adopting the master-slave model. The best-found solution is distributed to all algorithms by using the Message Passing Interface (MPI) library. At each cycle, the obtained solution serves as a starting point for the next cycle's component algorithms. To evaluate the performance of our approach, we test on a set of benchmarks of the quadratic assignment problem. The numerical results clearly show the effectiveness of the cooperative approach.

Keywords: Binary quadratic programming problem · DC programming and DCA · Metaheuristics · Parallel and distributed programming · Genetic algorithm · Migrating bird optimization
1 Introduction
The binary quadratic programs (BQPs) are NP-hard combinatorial optimization problems which take the following mathematical form:

(BQP)  min Z(x) = xᵀQx + cᵀx
       s.t. Ax = b, Bx ≤ b′, x ∈ {0, 1}ⁿ,   (1)
where Q ∈ R^{n×n}, c ∈ Rⁿ, A ∈ R^{m×n}, B ∈ R^{p×n}, b ∈ Rᵐ, b′ ∈ Rᵖ. BQP is a common model of several problems in different areas, including scheduling, facility location, assignment and knapsack.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 299–309, 2020. https://doi.org/10.1007/978-3-030-21803-4_31
Many exact methods have been developed to solve the BQP, such as branch-and-bound and cutting plane. The main limit of these methods is their exponential execution time; thus, they quickly become unusable for realistic-size instances. To solve these problems effectively, researchers have directed their efforts towards methods known as heuristics, and metaheuristics have been widely studied: e.g. genetic algorithms, scatter search, ant colony optimization, tabu search, variable neighborhood search, cuckoo search and migrating bird optimization. DC (Difference of Convex functions) programming and DCA (DC Algorithm) constitute another research direction which has been successfully applied to BQP (see [8,9,17]).

The main contribution of this study lies in a new cooperative approach using DCA-like and metaheuristic methods, named COP-DCAl-Meta, for solving BQP. COP-DCAl-Meta is inspired by the collaborative metaheuristic optimization scheme proposed by Yagouni and Hoai An in 2014 [20]. It consists in combining DCA-like (a new variant of DCA), a genetic algorithm and the migrating bird optimization metaheuristic in a cooperative way. The participating algorithms start running in parallel; then, the solution of every algorithm is distributed to the other ones via MPI (Message Passing Interface). We opted for DCA-like due to its power for solving nonconvex programs in different areas. As for GA and MBO, their efficiency has been proved in combinatorial optimization, which motivated us to use them. To evaluate the performance of COP-DCAl-Meta, we test on instances of the well-known quadratic assignment problem (QAP).

DC programming and DCA were first introduced in 1985 by Pham Dinh Tao and have been extensively developed since 1994 by Le Thi Hoai An and Pham Dinh Tao. To understand DC programming and DCA, we refer the reader to the seminal survey [9].
DCA is a continuous approach which has shown its efficiency in solving combinatorial optimization problems [7,18] by using exact penalty techniques (see [10,11]). Genetic algorithms are evolutionary algorithms proposed by Holland [3] and inspired by the process of natural selection; many applications of genetic algorithms can be found in the literature [4,13,14]. Migrating bird optimization (MBO) was presented by Duman et al. in 2012 [1]. MBO is inspired by the V-formation flight of migrating birds, and it has been proved to be efficient in combinatorial optimization (see, e.g., [1,2,19]).

This paper is organized as follows. After the introduction, the component algorithms, including DCA-like, GA and MBO, and their application to solving BQP are briefly presented in Sect. 2. Section 3 is devoted to the cooperative approach for BQP. Numerical results are reported and discussed in Sect. 4, and Sect. 5 concludes the paper.
2 DCA-Like, GA and MBO for Solving BQP
In this study, we use three algorithms to design our cooperative approach. First of all, we briefly outline an overall description of DC programming and DCA, GA, MBO as well as their application to the BQP.
2.1 DC Programming and DCA
DC programming and DCA have been applied to many large-scale nonconvex problems with impressive results in diverse areas. They address DC programs, whose objective function is DC on the whole space Rⁿ or on a convex set C ⊆ Rⁿ. A standard DC program is of the form

min {f(x) = g(x) − h(x) : x ∈ Rⁿ},   (2)
where g, h ∈ Γ₀(Rⁿ), the set of all lower semicontinuous proper convex functions on Rⁿ; g and h are called DC components, while g − h is a DC decomposition of f. To avoid ambiguity in DC programming, (+∞) − (+∞) = +∞ is the usual convention [16]. A constrained DC program is defined by

min {f(x) = g(x) − h(x) : x ∈ C},   (3)
where C is a nonempty closed convex set; clearly, (3) is a special case of (2). A constrained DC program can be transformed into an unconstrained DC program by adding the indicator function χ_C of the set C (χ_C(x) = 0 if x ∈ C, +∞ otherwise) to the first DC component g:

(3) ⇔ min {f(x) = χ_C(x) + g(x) − h(x) : x ∈ Rⁿ}.   (4)
DCA is an iterative algorithm based on local optimality. At each iteration, DCA computes yᵏ ∈ ∂h(xᵏ) and then solves the convex subproblem

min {g(x) − [h(xᵏ) + ⟨x − xᵏ, yᵏ⟩] : x ∈ Rⁿ}.   (5)
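As a minimal illustration of this iteration (a toy example of our own, not from the paper), take f(x) = x² − 2|x| with DC components g(x) = x² and h(x) = 2|x|: then yᵏ = 2·sign(xᵏ) ∈ ∂h(xᵏ), and the subproblem has the closed-form solution xᵏ⁺¹ = yᵏ/2.

```python
def dca_toy(x0, iters=20):
    """DCA on f(x) = x^2 - 2|x| with DC components g(x) = x^2, h(x) = 2|x|.
    Each iteration: y_k = 2*sign(x_k) lies in the subdifferential of h, then
    the convex subproblem argmin_x x^2 - y_k*x gives x_{k+1} = y_k / 2."""
    x = x0
    for _ in range(iters):
        y = 2.0 if x >= 0 else -2.0  # y in dh(x)
        x = y / 2.0                  # solve the convex subproblem
    return x

print(dca_toy(0.3))   # -> 1.0, a global minimizer (f(1) = -1)
print(dca_toy(-0.7))  # -> -1.0, the symmetric global minimizer
```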
2.2 DC Reformulation of BQP
Many works have tackled DCA for solving BQP [6,7,12,18]. In order to apply DC programming and DCA, we reformulate BQP as a concave quadratic program with continuous variables using an exact penalty function [6,7,11,18]. Let Ω = {x ∈ [0,1]ⁿ : Ax = b, Bx ≤ b′}. BQP is equivalent to the following formulation:

min {Z(x) : x ∈ Ω, p(x) ≤ 0},   (6)

where p(x) = Σ_{i=1}^n x_i(1 − x_i) is the penalty function. We can define BQP_t, the penalized program of BQP, as follows:

min {F(x) = Z(x) + t·p(x) : x ∈ Ω}.   (7)
According to the exact penalty theorem [10], there exists a number t₀ > 0 such that for all t > t₀ the problems (1) and (7) are equivalent, in the sense that they have the same optimal value and the same optimal solution set. We can use the following DC components of F:

• G_ρ(x) = (ρ/2)‖x‖² + χ_Ω(x),
• H_ρ(x) = (ρ/2)‖x‖² − Z(x) − t·p(x),
with ρ > 0. G_ρ(x) is convex since Ω is convex. As for H_ρ(x), its convexity depends on the value of ρ: H_ρ is convex when ρ is larger than the spectral radius of ∇²Z(x), denoted ρ(∇²Z(x)). In practice, the best value of ρ is hard to determine and is estimated by a larger one. Since ρ(∇²Z(x)) ≤ ‖∇²Z(x)‖, we can choose ρ = ‖∇²Z(x)‖, where

‖∇²Z(x)‖ = Σ_{i,j,p,q} (a_ip d_jq + a_pi d_qj).   (8)
However, ρ = ‖∇²Z(x)‖ is quite large, which can affect the convergence of DCA. Hence, we update ρ as in the following DCA-like algorithm [5].

Algorithm 1. DCA-Like for solving (7)
Initialization: Choose x⁰, ζ ≥ 1, ρ₀ = ρ/ζ, η₁ > 1, η₂ > 1, t, t_max, ξ ≥ 1 and k = 0.
repeat
  1. Compute ρ = max{ρ₀, ρᵏ/η₁}.
  2. Compute yᵏ = ∇H_ρ(xᵏ) by
       yᵏ = ρxᵏ − ∇Z(xᵏ) − te + 2txᵏ,   (9)
     where e is the all-ones vector.
  3. Compute x̃ ∈ argmin {(ρ/2)‖x‖² − ⟨yᵏ, x⟩ : x ∈ Ω}.
  4. While H_ρ(x̃) < H_{ρᵏ}(xᵏ) + ⟨x̃ − xᵏ, yᵏ⟩ do
     4.1. ρ = η₂ρ.
     4.2. If t < t_max, set t = ξt.
     4.3. Update yᵏ and x̃ by Steps 2 and 3.
  5. Set k = k + 1.
  6. Set xᵏ = x̃.
  7. Set ρᵏ = ρ.
until stopping criterion is satisfied.
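A simplified numerical sketch of the penalized scheme above (plain DCA with a fixed ρ rather than the adaptive DCA-like update, and with Ω reduced to the box [0,1]ⁿ; the dropped linear constraints and the tiny instance below are illustrative assumptions, not data from the paper):

```python
import numpy as np

def dca_penalized_bqp(Q, c, t=10.0, iters=100, seed=0):
    """DCA for the penalized problem min Z(x) + t*p(x) over the box [0,1]^n,
    with Z(x) = x^T Q x + c^T x and p(x) = sum_i x_i(1 - x_i).
    Simplification: the linear constraints Ax = b, Bx <= b' are dropped,
    so the convex subproblem is just a projection onto the box."""
    n = len(c)
    x = np.random.default_rng(seed).random(n)
    # rho must make H(x) = rho/2 ||x||^2 - Z(x) - t*p(x) convex,
    # i.e. rho >= lambda_max(Q + Q^T) - 2t; we take a safe larger value.
    rho = np.linalg.norm(Q + Q.T, 2) + 1.0
    ones = np.ones(n)
    for _ in range(iters):
        grad_Z = (Q + Q.T) @ x + c
        y = rho * x - grad_Z - t * ones + 2.0 * t * x   # y = grad H(x)
        x = np.clip(y / rho, 0.0, 1.0)  # argmin rho/2||x||^2 - <y,x> on box
    return x

# tiny illustrative instance: Z(x) = 4*x1*x2 - x1 - x2, minimum -1 at (1,0)/(0,1)
Q = np.array([[0.0, 2.0], [2.0, 0.0]])
c = np.array([-1.0, -1.0])
x = dca_penalized_bqp(Q, c)
print(x)  # a binary point with Z(x) = -1
```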
2.3 Genetic Algorithms and Their Application to BQP
The GAs proposed by Holland in 1975 [3] are adaptive stochastic evolutionary algorithms, inspired by the biological model of the natural evolution of species. The population of a GA is composed of chromosomes, which are the genetic representation of the solutions to our problem. The quality of each chromosome is evaluated by using the fitness function. A GA is based mainly on the following operators:
Selection: A subset of the population is selected to be mated, using one of the selection procedures, including tournament selection, roulette wheel selection, rank selection, random selection, etc.
Crossover: It is applied to the parents in order to create new individuals called children; a child takes its characteristics from both of its parents.
Mutation: It consists of randomly changing one or more genes of a chromosome. The aim of the mutation is to create new properties in the population.
Elitism: It allows keeping the best individuals for the next generation.
In Algorithm 2, we give the pseudocode of the GA.
Algorithm 2. Pseudocode of GA
1. Initialization.
repeat
  2. Selection.
  3. Crossover operator.
  4. Mutation operator.
  5. Evaluate the fitness of the new chromosomes.
  6. Elitism and updating of the population.
until stopping criterion.
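A minimal GA with these operators might look as follows (an illustrative sketch on a tiny binary quadratic objective of our own choosing, not the exact GA used in the paper):

```python
import random

def ga_min_bqp(Q, c, pop_size=40, gens=60, pm=0.05, seed=1):
    """Minimal GA with tournament selection, one-point crossover, bit-flip
    mutation and elitism, minimizing Z(x) = x^T Q x + c^T x over binary
    strings of length len(c)."""
    rng = random.Random(seed)
    n = len(c)

    def Z(x):
        return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) \
            + sum(c[i] * x[i] for i in range(n))

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if Z(a) <= Z(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=Z)
    for _ in range(gens):
        children = []
        while len(children) < pop_size - 1:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n)                       # one-point crossover
            child = [1 - g if rng.random() < pm else g      # bit-flip mutation
                     for g in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children + [best]                             # elitism
        best = min(pop, key=Z)
    return best, Z(best)

# Z(x) = (sum x)^2 - 6*sum(x): minimized (value -9) by any x with sum(x) = 3
Q = [[1] * 6 for _ in range(6)]
c = [-6] * 6
best, zbest = ga_min_bqp(Q, c)
print(zbest)  # -> -9
```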
2.4 Migrating Bird Optimization
Contemplating and studying the V-formation flight of migrating birds gave rise to a new metaheuristic for solving combinatorial optimization problems. This metaheuristic, called migrating bird optimization (MBO), was introduced by Duman et al. in [1]. It consists in exploiting the power saving of the V-shaped flight to minimize (or maximize) an objective function. MBO is based on a population of birds and uses the neighboring search technique; a bird represents a solution of the combinatorial optimization problem. To keep the context of the V formation, one bird is considered the leader and the others constitute the right and left lines. MBO treats the birds from the leader to the tails along the lines, as shown in Algorithm 3.
3 COP-DCAl-Meta: The Cooperative Approach
COP-DCAl-Meta is inspired by the Metastorming approach [20], which is the brainstorming of metaheuristics. Brainstorming was presented by Osborn in 1948 [15]; its main goal is to give birth to new ideas through a discussion between a group of participants, and all ideas of all participating members are taken into consideration. The similarity between real brainstorming and the brainstorming of metaheuristics is investigated in [20]. The difference between COP-DCAl-Meta and Metastorming lies in the animator, which is not used in our work. The principle of COP-DCAl-Meta is to create a storm between DCA-like, GA and MBO. In COP-DCAl-Meta, the collaboration is reflected by using three
Algorithm 3. MBO algorithm
Initialization: n, m, k, x.
1. Generate the initial population and choose the leader.
repeat
  2. nbrTour = 0.
  3. While nbrTour < m do
     3.1. Generate k neighbors for the leader and improve it.
     3.2. Share the 2x best neighbors of the leader with the left and right lines.
     3.3. For all birds on the right and left lines do
        3.3.1. Generate k − x neighbors.
        3.3.2. Improve the bird by using the shared and the generated neighbors.
        3.3.3. Share the x best neighbors with the next bird.
     End for.
     4. nbrTour = nbrTour + 1.
  End while.
  5. Change the leader.
until stopping criterion.
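The neighbor-sharing mechanism of Algorithm 3 can be sketched as follows (a simplified single-line variant over binary strings with one-bit-flip neighbors, of our own design; the paper's MBO works on the two lines of a V and on QAP permutations):

```python
import random

def mbo_min(f, n, n_birds=7, k=5, x_share=2, m=10, tours=30, seed=3):
    """Simplified migrating-bird-optimization sketch minimizing f over binary
    strings of length n. The leader generates k neighbors; each following
    bird generates k - x_share neighbors and also receives the x_share best
    neighbors of its predecessor."""
    rng = random.Random(seed)

    def neighbor(s):
        t = s[:]
        i = rng.randrange(n)
        t[i] = 1 - t[i]              # flip one bit
        return t

    birds = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_birds)]
    for _ in range(tours):
        for _ in range(m):
            shared = []              # neighbors handed down the line
            for idx, b in enumerate(birds):
                n_own = k if idx == 0 else k - x_share
                own = [neighbor(b) for _ in range(n_own)]
                birds[idx] = min(own + shared + [b], key=f)
                shared = sorted(own, key=f)[:x_share]
        birds.append(birds.pop(0))   # rotate the leader to the tail
    return min(birds, key=f)

best = mbo_min(lambda s: sum(s), 10)
print(sum(best))  # -> 0 (the optimum of this simple objective)
```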
algorithms in parallel, each solving the whole problem. The cooperation is reflected in exchanging and distributing information. Using the master-slave model, we can describe COP-DCAl-Meta as follows.

The Parallel Initialization. The master chooses an initial point for DCA-like. Then, it computes ρ using (8). Both slaves randomly generate their initial populations.

A Cycle (Parallel and Distributed). The master runs one iteration of DCA-like. If a binary solution is found, then it distributes a message indicating the end of the current cycle; otherwise, the order to continue execution is broadcast. Slave 1 (slave 2) performs one iteration of GA (MBO) and receives the information either to rerun or to go to the next step. This step is repeated until DCA-like obtains a binary solution, which marks the end of the cycle.

Parallel Exchanging and Evaluation. At the end of a cycle, all component algorithms exchange the values of their objective functions (SDCA, SGA and SMBO). After that, an evaluation is done to determine the algorithm that obtained the best solution. In this step, every process also broadcasts whether its algorithm satisfies its stopping criterion (Stop-DCA, Stop-GA and Stop-MBO). If all criteria are met, then COP-DCAl-Meta ends and the final solution is BFS*. Otherwise, the best process distributes its solution (BFS) to the others, which use it as an initial solution for DCA-like, as a new chromosome for GA, or as a new bird for MBO. At this level, COP-DCAl-Meta starts a new cycle (Fig. 1).
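The cycle structure above can be sketched schematically (a sequential simulation with two toy "component algorithms" of our own; the paper runs the real components as parallel MPI processes):

```python
def cooperative_cycles(algos, f, x0, cycles=5):
    """Schematic COP-DCAl-Meta cycle: every component algorithm restarts
    from the best shared solution (BFS), and the best result found in the
    cycle is redistributed as the next starting point."""
    bfs = x0
    for _ in range(cycles):
        results = [alg(f, bfs) for alg in algos]  # one cycle of each component
        bfs = min(results + [bfs], key=f)         # exchange and keep the best
    return bfs

# two toy "component algorithms" minimizing f(x) = (x - 7)^2 over the integers
def greedy(f, x):
    for _ in range(3):
        x = min((x - 1, x, x + 1), key=f)  # three downhill unit steps
    return x

def jump(f, x):
    return min((x, x + 5, x - 5), key=f)   # one long-range move

f = lambda x: (x - 7) ** 2
print(cooperative_cycles([greedy, jump], f, 0))  # -> 7
```

Neither toy component reaches the optimum alone in one cycle; the exchange of the best-found solution between cycles is what drives them there, which mirrors the intent of the cooperative scheme.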
Fig. 1. Cooperative-DCA-like-Metaheuristics: cooperative scheme based on metaheuristics and DCA (Info_DCA-like: SDCA and Stop-DCA; Info_GA: SGA and Stop-GA; Info_MBO: SMBO and Stop-MBO).
4 Numerical Results
In this section, we report the computational results of the cooperative approach COP-DCAl-Meta. The proposed approach has been coded in C++ and compiled within Microsoft Visual Studio 2017, using CPLEX version 12.6
as a convex quadratic solver. All experiments were carried out on a Dell desktop computer with an Intel Core(TM) i5-6600 CPU at 3.30 GHz and 8 GB RAM. To study the performance of our approach, we test on seven instances of the quadratic assignment problem taken from QAPLIB (A Quadratic Assignment Problem Library1) in the OR-Library.2 We provide a comparison between COP-DCAl-Meta, the participating algorithms and the best known lower bound (BKLB, which is either the optimal solution or the best-known one) for some instances. The comparison takes into consideration the objective value, the gap (see Eq. (10)) and the running time measured in seconds.

Comparison Between COP-DCAl-Meta, the Component Algorithms and the Best Known Lower Bound

The experiments are performed on four algorithms: COP-DCAl-Meta, DCA-like, GA and MBO. The quality of each algorithm is evaluated by a comparison with the best known lower bound (BKLB), via the gap between the obtained objective value and the BKLB. The gap of an algorithm (A) is computed by the following formula:
|Objective value given by (A) − BKLB| ∗ 100%. Objective value given by (A)
(10)
The results of the four algorithms are reported in Table 1. The first column gives the ID of each dataset, its size, which varies from 12 to 90, and the BKLB taken from the OR-Library. The remaining columns show the objective value, the gap and the CPU time obtained by each algorithm. From the numerical results, it can be seen that:

– In terms of the objective value and the gap, the cooperative approach COP-DCAl-Meta is the most efficient, followed by MBO, DCA-like and finally GA.
– COP-DCAl-Meta obtained the BKLB on 6 out of 7 instances.
– The cooperation between the component algorithms allows them to change their behavior and browse more promising regions to get better results.
– DCA-like is very efficient on big-size problems. This advantage can be exploited by COP-DCAl-Meta to improve it.
– Regarding the running time, COP-DCAl-Meta consumes more time than each component algorithm; the difference is due to the communication between the members.
1 http://anjos.mgi.polymtl.ca/qaplib//inst.html.
2 http://people.brunel.ac.uk/mastjjb/jeb/info.html.
Table 1. Comparison between COP-DCAl-Meta, DCA-like, GA and MBO.

Dataset (n, BKLB)                   Algorithm       Objective value   Gap (%)   CPU (s)
Bur26a (n = 26, BKLB = 5426670)     COP-DCAl-Meta   5426670           0.00      1610.41
                                    DCA-like        5438650           0.22      128.11
                                    GA              5435200           0.16      403.77
                                    MBO             5431680           0.09      241.69
Chr12a (n = 12, BKLB = 9552)        COP-DCAl-Meta   9552              0         112.43
                                    DCA-like        11418             16.34     17.65
                                    GA              10192             6.28      63.39
                                    MBO             9552              0         1.55
Chr25a (n = 25, BKLB = 3796)        COP-DCAl-Meta   3796              0         835.62
                                    DCA-like        5200              27        62.66
                                    GA              4662              7.98      2050.78
                                    MBO             4204              2.32      618.38
Nug12 (n = 12, BKLB = 578)          COP-DCAl-Meta   578               0         167.31
                                    DCA-like        594               2.69      8.65
                                    GA              586               1.37      214.655
                                    MBO             578               0         12.0997
Esc32d (n = 32, BKLB = 200)         COP-DCAl-Meta   200               0         368.00
                                    DCA-like        210               4.76      73.90
                                    GA              200               0         59.8395
                                    MBO             200               0         135.655
Tai35b (n = 35, BKLB = 283315445)   COP-DCAl-Meta   283315445         0         3208.83
                                    DCA-like        283315445         0         295.11
                                    GA              287271000         1.38      151.21
                                    MBO             283335000         0.01      666.65
Lipa90 (n = 90, BKLB = 360630)      COP-DCAl-Meta   362970            0.64      16027.00
                                    DCA-like        360630            0.61      9797.05
                                    GA              363276            0.73      3533.52
                                    MBO             362953            0.64      16618.7

5 Conclusion
We have investigated a new cooperative approach combining the deterministic algorithm DCA-like and two metaheuristics, GA and MBO, for solving BQPs. This approach, COP-DCAl-Meta, is inspired by metastorming. We first gave a brief explanation of the component algorithms of COP-DCAl-Meta and then detailed the cooperation. COP-DCAl-Meta was applied to the quadratic assignment problem. In the experiments, COP-DCAl-Meta outperformed the three component algorithms. This, if anything, shows that the cooperation between the component algorithms has been successfully realized. As an area of further research, we plan to combine DCA with other algorithms to solve other nonconvex problems.
References

1. Duman, E., Uysal, M., Alkaya, A.F.: Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem. Inf. Sci. 217, 65–77 (2012)
2. Duman, E., Elikucuk, I.: Solving credit card fraud detection problem by the new metaheuristics migrating birds optimization. In: Proceedings of the 12th International Conference on Artificial Neural Networks: Advances in Computational Intelligence, vol. II, pp. 62–71. Springer (2013)
3. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press (1975)
4. Julstrom, B.A.: Greedy, genetic, and greedy genetic algorithms for the quadratic knapsack problem. In: Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation, pp. 607–614. ACM (2005)
5. Le Thi, H.A., Le, H.M., Phan, D.N., Tran, B.: A DCA-like algorithm and its accelerated version with application in data visualization (2018). arXiv:1806.09620
6. Le Thi, H.A., Pham Dinh, T.: Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J. Glob. Optim. 11(3), 253–285 (1997)
7. Le Thi, H.A., Pham Dinh, T.: A continuous approach for globally solving linearly constrained quadratic zero-one programming problems. Optimization 50(1–2), 93–120 (2001)
8. Le Thi, H.A., Pham Dinh, T.: A continuous approach for large-scale constrained quadratic zero-one programming. Optimization 45(3), 1–28 (2001). (In honor of Professor Elster, founder of the journal Optimization)
9. Le Thi, H.A., Pham Dinh, T.: DC programming and DCA: thirty years of developments. Math. Program. 169(1), 5–68 (2018)
10. Le Thi, H.A., Pham Dinh, T., Le, D.M.: Exact penalty in DC programming. Vietnam J. Math. 27(2), 169–178 (1999)
11. Le Thi, H.A., Pham Dinh, T., Van Ngai, H.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52(3), 509–535 (2011)
12. Le Thi, H.A., Pham Dinh, T., Yen, N.D.: Properties of two DC algorithms in quadratic programming. J. Glob. Optim. 49(3), 481–495 (2011)
13. Merz, P., Freisleben, B.: Genetic algorithms for binary quadratic programming. In: Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation, vol. 1, pp. 417–424. Morgan Kaufmann (1999)
14. Misevicius, A., Staneviciene, E.: A new hybrid genetic algorithm for the grey pattern quadratic assignment problem. Inf. Technol. Control 47(3), 503–520 (2018)
15. Osborn, A.F.: Your Creative Power: How to Use Imagination to Brighten Life, to Get Ahead, ch. XXXIII, pp. 265–274. Charles Scribner's Sons, New York (1948)
16. Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
17. Pham Dinh, T., Le Thi, H.A., Akoa, F.: Combining DCA (DC algorithms) and interior point techniques for large-scale nonconvex quadratic programming. Optim. Methods Softw. 23, 609–629 (2008)
DCA-Like, GA and MBO: A Novel Hybrid Approach
309
18. Pham, D.T., Canh, N.N., Hoai An, L.T.: An efficient combined DCA and B&B using DC/SDP relaxation for globally solving binary quadratic programs. J. Glob. Optim. 48(4), 595–632 (2010) ¨ 19. Tongur, V., Ulker, E.: Migrating birds optimization for flow shop sequencing problem. J. Comput. Commun. 02, 142–147 (2014) 20. Yagouni, M., Hoai An, L.T.: A collaborative metaheuristic optimization scheme: methodological issues. In: van Do, T., Thi, H.A.L., Nguyen, N.T. (eds.) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol. 282, pp. 3–14. Springer (2014)
Low-Rank Matrix Recovery with Ky Fan 2-k-Norm

Xuan Vinh Doan1,2(B) and Stephen Vavasis3

1 Operations Group, Warwick Business School, University of Warwick, Coventry CV4 7AL, UK
[email protected]
2 The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
3 Combinatorics and Optimization, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
[email protected]
Abstract. We propose Ky Fan 2-k-norm-based models for the non-convex low-rank matrix recovery problem. A general difference of convex algorithm (DCA) is developed to solve these models. Numerical results show that the proposed models achieve high recoverability rates.

Keywords: Rank minimization · Ky Fan 2-k-norm · Matrix recovery

1 Introduction
The matrix recovery problem concerns the construction of a matrix from incomplete information about its entries. This problem has a wide range of applications, such as recommendation systems with incomplete information about users' ratings, or the sensor localization problem with partially observed distance matrices (see, e.g., [3]). In these applications, the matrix is usually known to be (approximately) low-rank. Finding these low-rank matrices is theoretically difficult due to their non-convex properties. Computationally, it is important to study the tractability of these problems given the large scale of the datasets considered in practical applications. Recht et al. [11] studied the low-rank matrix recovery problem using a convex relaxation approach, which is tractable. More precisely, in order to recover a low-rank matrix X ∈ R^{m×n} which satisfies A(X) = b, where the linear map A : R^{m×n} → R^p and b ∈ R^p, b ≠ 0, are given, the following convex optimization problem is proposed:

min_X ‖X‖_*  s.t. A(X) = b,   (1)

where ‖X‖_* = Σ_i σ_i(X) is the nuclear norm, the sum of all singular values of X.

This work is partially supported by the Alan Turing Fellowship of the first author.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 310–319, 2020. https://doi.org/10.1007/978-3-030-21803-4_32

Recht et al. [11] showed the recoverability of this convex approach using some
restricted isometry conditions of the linear map A. In general, these restricted isometry conditions are not satisfied and the proposed convex relaxation can fail to recover the matrix X. Low-rank matrices appear to be appropriate representations of data in other applications such as biclustering of gene expression data. Doan and Vavasis [5] proposed a convex approach to recover low-rank clusters using the dual Ky Fan 2-k-norm instead of the nuclear norm. The Ky Fan 2-k-norm is defined as

‖A‖_{k,2} = (Σ_{i=1}^{k} σ_i^2(A))^{1/2},   (2)

where σ_1 ≥ ... ≥ σ_k ≥ 0 are the k largest singular values of A, k ≤ k_0 = rank(A). The dual norm of the Ky Fan 2-k-norm is denoted by ‖·‖*_{k,2}:

‖A‖*_{k,2} = max_X { ⟨A, X⟩ : ‖X‖_{k,2} ≤ 1 }.   (3)

These unitarily invariant norms (see, e.g., Bhatia [2]) and their gauge functions have been used in sparse prediction problems [1], low-rank regression analysis [6] and multi-task learning regularization [7]. When k = 1, the Ky Fan 2-k-norm is the spectral norm ‖A‖ = σ_1(A), the largest singular value of A, whose dual norm is the nuclear norm. Similar to the nuclear norm, the dual Ky Fan 2-k-norm with k > 1 can be used to compute the k-approximation of a matrix A (Proposition 2.9, [5]), which demonstrates its low-rank property. Motivated by this low-rank property of the (dual) Ky Fan 2-k-norm, which is more general than that of the nuclear norm, and by its usage in other applications, we propose in this paper a Ky Fan 2-k-norm-based non-convex approach for the matrix recovery problem, which aims to recover matrices that are not recoverable by the convex relaxation formulation (1). In Sect. 2, we discuss the proposed models in detail and in Sect. 3, we develop numerical algorithms to solve those models. Some numerical results will also be presented.
2 Ky Fan 2-k-Norm-Based Models
The Ky Fan 2-k-norm is the ℓ2-norm of the vector of the k largest singular values with k ≤ min{m, n}. Thus we have:

‖A‖_{k,2} = (Σ_{i=1}^{k} σ_i^2(A))^{1/2} ≤ ‖A‖_F = (Σ_{i=1}^{min{m,n}} σ_i^2(A))^{1/2},

where ‖·‖_F is the Frobenius norm. Now consider the dual Ky Fan 2-k-norm; using the definition of the dual norm, we obtain the following inequality:

‖A‖_F^2 = ⟨A, A⟩ ≤ ‖A‖_{k,2} · ‖A‖*_{k,2}.

Thus we have:

‖A‖_{k,2} ≤ ‖A‖_F ≤ ‖A‖*_{k,2},  k ≤ min{m, n}.   (4)
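The Ky Fan 2-k-norm in (2) is cheap to compute from a singular value decomposition, so the left inequality in (4) and its behavior on low-rank matrices are easy to check numerically. A minimal NumPy sketch (the helper name is ours):

```python
import numpy as np

def ky_fan_2k_norm(A, k):
    """Ky Fan 2-k-norm: l2-norm of the k largest singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return np.sqrt(np.sum(s[:k] ** 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))           # generic dense matrix, full rank
fro = np.linalg.norm(A, "fro")

# ||A||_{k,2} <= ||A||_F, strictly when k < rank(A)
assert ky_fan_2k_norm(A, 5) < fro

# for a rank-k matrix the Ky Fan 2-k-norm coincides with the Frobenius norm
k = 3
B = rng.standard_normal((50, k)) @ rng.standard_normal((k, 40))  # rank <= k
assert np.isclose(ky_fan_2k_norm(B, k), np.linalg.norm(B, "fro"))
```

The dual norm ‖·‖*_{k,2} has no closed form from the singular values alone and is not checked here; it requires solving (3), e.g., via the semidefinite formulation given later in the paper.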
It is clear that these inequalities become equalities if and only if rank(A) ≤ k. It shows that to find a low-rank matrix X that satisfies A(X) = b with rank(X) ≤ k, we can solve either the following optimization problem

min_X ‖X‖*_{k,2} / ‖X‖_F  s.t. A(X) = b,   (5)

or

min_X ‖X‖*_{k,2} − ‖X‖_F  s.t. A(X) = b.   (6)
It is straightforward to see that these non-convex optimization problems can be used to recover low-rank matrices, as stated in the following theorem, given the norm inequalities in (4).

Theorem 1. If there exists a matrix X ∈ R^{m×n} such that rank(X) ≤ k and A(X) = b, then X is an optimal solution of (5) and (6).

Given the result in Theorem 1, the exact recovery of a low-rank matrix using (5) or (6) relies on the uniqueness of the low-rank solution of A(X) = b. Recht et al. [11] generalized the restricted isometry property of vectors introduced by Candès and Tao [4] to matrices and used it to provide sufficient conditions for the uniqueness of these solutions.

Definition 1 (Recht et al. [11]). For every integer k with 1 ≤ k ≤ min{m, n}, the k-restricted isometry constant is defined as the smallest number δ_k(A) such that

(1 − δ_k(A)) ‖X‖_F ≤ ‖A(X)‖_2 ≤ (1 + δ_k(A)) ‖X‖_F   (7)

holds for all matrices X of rank at most k.

Using Theorem 3.2 in Recht et al. [11], we can obtain the following exact recovery result for (5) and (6).

Theorem 2. Suppose that δ_{2k} < 1 and there exists a matrix X ∈ R^{m×n} which satisfies A(X) = b and rank(X) ≤ k; then X is the unique solution to (5) and (6), which implies exact recoverability.

The condition in Theorem 2 is indeed better than those obtained for the nuclear norm approach (see, e.g., Theorem 3.3 in Recht et al. [11]). The non-convex optimization problems (5) and (6) use a norm ratio and a norm difference. When k = 1, the ratio and difference are computed between the nuclear and Frobenius norms. The idea of using these norm ratios and differences with k = 1 has been used to generate non-convex sparsity-promoting penalties in the vector case, i.e., m = 1. Yin et al. [13] investigated the ratio ℓ_1/ℓ_2 while Yin et al. [14] analyzed the difference ℓ_1 − ℓ_2 in compressed sensing. Note that even though optimization formulations based on these norm ratios and differences are non-convex, they are still relaxations of the ℓ_0-norm minimization problem unless the sparsity level of the optimal solution is s = 1. Our proposed approach is similar to the idea of the truncated difference of the nuclear norm and the Frobenius norm discussed in Ma et al. [8]. Given a parameter t ≥ 0, the truncated difference is defined as

‖A‖_{*,t−F} = Σ_{i=t+1}^{min{m,n}} σ_i(A) − (Σ_{i=t+1}^{min{m,n}} σ_i^2(A))^{1/2} ≥ 0.

For t ≥ k − 1, the problem of truncated difference minimization can be used to recover matrices with rank at most k, given that ‖X‖_{*,t−F} = 0 if rank(X) ≤ t + 1. Similar results for exact recovery as in Theorem 2 are provided in Theorem 3.7(a) in Ma et al. [8]. Despite the similarity with respect to the recovery results, the problems (5) and (6) are motivated from a different perspective. We are now going to discuss how to solve these problems next.
3 Numerical Algorithm

3.1 Difference of Convex Algorithms
We start with the problem (5). It can be reformulated as

max_{Z,z} ‖Z‖_F^2  s.t. ‖Z‖*_{k,2} ≤ 1, A(Z) − z · b = 0, z > 0,   (8)

with the change of variables z = 1/‖X‖*_{k,2} and Z = X/‖X‖*_{k,2}. The compact formulation is

min_{Z,z} δ_Z(Z, z) − ‖Z‖_F^2 / 2,   (9)

where Z is the feasible set of the problem (8) and δ_Z(·) is the indicator function of Z. The problem (9) is a difference of convex (d.c.) optimization problem (see, e.g., [9]). The difference of convex algorithm DCA proposed in [9] can be applied to the problem (9) as follows.

Step 1. Start with (Z^0, z^0) = (X^0/‖X^0‖*_{k,2}, 1/‖X^0‖*_{k,2}) for some X^0 such that A(X^0) = b, and set s = 0.
Step 2. Update (Z^{s+1}, z^{s+1}) as an optimal solution of the following convex optimization problem

max_{Z,z} ⟨Z^s, Z⟩  s.t. ‖Z‖*_{k,2} ≤ 1, A(Z) − z · b = 0, z > 0.   (10)

Step 3. Set s ← s + 1 and repeat Step 2.
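The linearize-and-minimize mechanics of Steps 1-3 can be illustrated on a small unconstrained d.c. toy; the sketch below is our illustration of the generic DCA iteration, not of the subproblem (10) itself, which requires a semidefinite solver. Here f(x) = (1/2)‖x‖^2 − ‖x‖_1 with g(x) = (1/2)‖x‖^2 and h(x) = ‖x‖_1, so the convex subproblem min_x g(x) − ⟨z, x⟩ with z ∈ ∂h(x^s) has the closed-form solution x = z:

```python
import numpy as np

def dca(x0, n_iter=20):
    """DCA for f(x) = 0.5*||x||^2 - ||x||_1 with g = 0.5*||x||^2, h = ||x||_1.
    Each iteration linearizes the concave part -h at x^s (z = sign(x^s) is a
    subgradient of h) and minimizes the convex function g(x) - <z, x>,
    whose unique minimizer is x = z."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        z = np.sign(x)   # z in the subdifferential of h at x
        x = z            # argmin of the convex subproblem
    return x

f = lambda x: 0.5 * np.dot(x, x) - np.abs(x).sum()

x0 = np.array([0.3, -2.0, 1.5])
x_star = dca(x0)
# DCA drives every nonzero coordinate to +/-1, a critical point of f,
# and the objective never increases along the iterates
assert np.allclose(x_star, np.sign(x0))
assert f(x_star) <= f(x0)
```

The monotone decrease of the objective observed here is exactly the general DCA property invoked below for the sequence generated by (10).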
Let X^s = Z^s / z^s; using the general convergence analysis of DCA (see, e.g., Theorem 3.7 in [10]), we can obtain the following convergence results.

Proposition 1. Given the sequence {X^s} obtained from the DCA algorithm for the problem (9), the following statements are true.

(i) The sequence ‖X^s‖*_{k,2} / ‖X^s‖_F is non-increasing and convergent.
(ii) ‖ X^{s+1}/‖X^{s+1}‖*_{k,2} − X^s/‖X^s‖*_{k,2} ‖_F → 0 when s → ∞.

The convergence results show that the DCA algorithm improves the objective of the ratio minimization problem (5). The DCA algorithm can stop if (Z^s, z^s) ∈ O(Z^s), where O(Z^s) is the set of optimal solutions of (10); a point (Z^s, z^s) which satisfies this condition is called a critical point. Note that (local) optimal solutions of (9) can be shown to be critical points. The following proposition gives an equivalent condition for critical points.

Proposition 2. (Z^s, z^s) ∈ O(Z^s) if and only if Y = 0 is an optimal solution of the following optimization problem

min_Y ‖X^s + Y‖*_{k,2} − (‖X^s‖*_{k,2} / ‖X^s‖_F^2) · ⟨X^s, Y⟩  s.t. A(Y) = 0.   (11)
Proof. Consider Y ∈ Null(A), i.e., A(Y) = 0; we then have

( (X^s + Y)/‖X^s + Y‖*_{k,2} , 1/‖X^s + Y‖*_{k,2} ) ∈ Z.

Clearly,

⟨ X^s/‖X^s‖*_{k,2} , (X^s + Y)/‖X^s + Y‖*_{k,2} ⟩ ≤ ⟨ X^s/‖X^s‖*_{k,2} , X^s/‖X^s‖*_{k,2} ⟩

is equivalent to

‖X^s + Y‖*_{k,2} − (‖X^s‖*_{k,2} / ‖X^s‖_F^2) · ⟨X^s, Y⟩ ≥ ‖X^s‖*_{k,2}.

When Y = 0, we achieve equality. We have: (Z^s, z^s) ∈ O(Z^s) if and only if the above inequality holds for all Y ∈ Null(A), which means f(Y; X^s) ≥ f(0; X^s) for all Y ∈ Null(A), where f(Y; X) = ‖X + Y‖*_{k,2} − (‖X‖*_{k,2} / ‖X‖_F^2) · ⟨X, Y⟩. Clearly, this is equivalent to the fact that Y = 0 is an optimal solution of (11).

The result of Proposition 2 shows the similarity between the norm ratio minimization problem (5) and the norm difference minimization problem (6) with respect to the implementation of the DCA algorithm. Indeed, the problem (6) is a d.c. optimization problem and the DCA algorithm can be applied as follows.
Step 1. Start with some X^0 such that A(X^0) = b and set s = 0.
Step 2. Update X^{s+1} = X^s + Y, where Y is an optimal solution of the following convex optimization problem

min_Y ‖X^s + Y‖*_{k,2} − (1/‖X^s‖_F) · ⟨X^s, Y⟩  s.t. A(Y) = 0.   (12)

Step 3. Set s ← s + 1 and repeat Step 2.

It is clear that X^s is a critical point for the problem (6) if and only if Y = 0 is an optimal solution of (12). Both problems (10) and (12) can be written in the general form

min_Y ‖X^s + Y‖*_{k,2} − α(X^s) · ⟨X^s, Y⟩  s.t. A(Y) = 0,   (13)

where α(X^s) = ‖X^s‖*_{k,2} / ‖X^s‖_F^2 for (10) and α(X^s) = 1/‖X^s‖_F for (12), respectively. Given that A(X^s) = b, this problem can be written as

min_X ‖X‖*_{k,2} − α(X^s) · ⟨X^s, X⟩  s.t. A(X) = b.   (14)
The following proposition shows that X^s is a critical point of the problem (14) for many functions α(·) if rank(X^s) ≤ k.

Proposition 3. If rank(X^s) ≤ k, then X^s is a critical point of the problem (14) for any function α(·) which satisfies

1/‖X‖_F ≤ α(X) ≤ ‖X‖*_{k,2} / ‖X‖_F^2.   (15)
Proof. If rank(X^s) ≤ k, we have α(X^s) = 1/‖X^s‖*_{k,2}, since ‖X^s‖_{k,2} = ‖X^s‖_F = ‖X^s‖*_{k,2}. Given that

∂‖A‖*_{k,2} = arg max_{X : ‖X‖_{k,2} ≤ 1} ⟨X, A⟩,

we have α(X^s) · X^s ∈ ∂‖X^s‖*_{k,2}. Thus for all Y the following inequality holds:

‖X^s + Y‖*_{k,2} − ‖X^s‖*_{k,2} ≥ α(X^s) · ⟨X^s, Y⟩.

It implies that Y = 0 is an optimal solution of the problem (13), since the optimality condition is

‖X^s + Y‖*_{k,2} − ‖X^s‖*_{k,2} ≥ α(X^s) · ⟨X^s, Y⟩,  ∀ Y : A(Y) = 0.

Thus X^s is a critical point of the problem (14).
Proposition 3 shows that one can choose different functions α(·), such as α(X) = 1/‖X‖*_{k,2}, for the sub-problem in the general DCA framework to solve the original problem. This generalized sub-problem (14) is a convex optimization problem, which can be formulated as a semidefinite optimization problem given the following calculation of the dual Ky Fan 2-k-norm provided in [5]:

‖X‖*_{k,2} = min p + trace(R)
s.t. kp − trace(P) = 0,
     pI − P ⪰ 0,
     [ P        −(1/2)X
       −(1/2)X^T   R    ] ⪰ 0.   (16)
In order to implement the DCA algorithm, one also needs to consider how to find the initial solution X^0. We can use the nuclear norm minimization problem (1), the convex relaxation of the rank minimization problem, to find X^0. A similar approach is to use the following dual Ky Fan 2-k-norm minimization problem to find X^0, given its low-rank properties:

min_X ‖X‖*_{k,2}  s.t. A(X) = b.   (17)

This initial problem can be considered as an instance of (14) with X^s = 0 (and α(0) = 1), which is equivalent to starting the iterative algorithm with X^0 = 0 one step ahead. We are now ready to provide some numerical results.

3.2 Numerical Results
Similar to Candès and Recht [3], we construct the following experiment. We generate M, an m × n matrix of rank r, by sampling two factors M_L ∈ R^{m×r} and M_R ∈ R^{n×r} with i.i.d. Gaussian entries and setting M = M_L M_R^T. The linear map A is constructed with s independent Gaussian matrices A_i whose entries follow N(0, 1/s), i.e.,

A(X) = b ⇔ ⟨A_i, X⟩ = ⟨A_i, M⟩ = b_i,  i = 1, ..., s.

We generate K = 50 matrices M with m = 50, n = 40, and r = 2. The dimension (number of degrees of freedom) of these matrices is d_r = r(m + n − r) = 176. For each M, we generate s matrices for the random linear map, with s ranging from 180 to 500. We set the maximum number of iterations of the algorithm to N_max = 100. The instances are solved using the SDPT3 solver [12] for semidefinite optimization problems in Matlab. The computer used for these numerical experiments is a 64-bit Windows 10 machine with a 3.70 GHz quad-core CPU and 32 GB RAM. The performance measure is the relative error ‖X − M‖_F / ‖M‖_F, and the threshold ε = 10^{-6} is chosen. We run three different algorithms: nuclear uses the nuclear norm optimization formulation (1), k2-nuclear uses the proposed iterative algorithm with the initial solution obtained from (1), and k2-zero uses the same algorithm with the initial solution X^0 = 0.
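The data-generation protocol above can be sketched in a few lines of NumPy (sizes as in the experiment; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, r, s = 50, 40, 2, 250            # one experiment configuration

# rank-r target: M = M_L @ M_R.T with i.i.d. Gaussian factors
M_L = rng.standard_normal((m, r))
M_R = rng.standard_normal((n, r))
M = M_L @ M_R.T

# linear map A(X)_i = <A_i, X>, entries of each A_i drawn from N(0, 1/s)
A = rng.normal(0.0, np.sqrt(1.0 / s), size=(s, m, n))
b = np.einsum("imn,mn->i", A, M)       # b_i = <A_i, M>

assert np.linalg.matrix_rank(M) == r
assert b.shape == (s,)
assert r * (m + n - r) == 176          # degrees of freedom d_r for r = 2
```

The recovery algorithms themselves are not reproduced here, since each DCA sub-problem is a semidefinite program solved with SDPT3 in Matlab in the paper.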
Fig. 1. Recovery probabilities and average computation times of three algorithms
Figure 1 shows recovery probabilities and average computation times (in seconds) for different sizes of the linear map. The results show that the proposed algorithm can exactly recover the matrix M at a 100% rate when s ≥ 250 with both initial solutions, while the nuclear norm approach cannot recover any matrix at all, i.e., a 0% rate, if s ≤ 300. k2-nuclear is slightly better than k2-zero in terms of recoverability when s is small, while their average computation times are almost the same in all cases. The efficiency of the proposed algorithm when s is small comes with higher average computation times as compared to that of the nuclear norm approach. For example, when s = 180, on average, one needs 80 iterations to reach the solution when the proposed algorithm is used, instead of 1 with the nuclear norm optimization approach. Note that the average number of iterations is computed over all cases, including cases when the matrix M cannot be recovered. For recoverable cases, the average number of iterations is much smaller. For example, when s = 180, the average number of iterations for recoverable cases is 40 instead of 80. When the size of the linear map increases, the average number of iterations decreases significantly. We only need 2 extra iterations when s = 250 or 1 extra iteration
on average when s = 300 to obtain a 100% recovery rate, when the nuclear norm optimization approach still cannot recover any of the matrices (0% rate). These results show that the proposed algorithm achieves a significantly better recovery rate with a small number of extra iterations in many cases. We also test the algorithms with higher ranks, namely r = 5 and r = 10. Figure 2 shows the results when the size of the linear map is s = 1.05 d_r.
Fig. 2. Recovery probabilities and average computation times for different ranks
These results show that when the size of the linear map is small, the proposed algorithms are significantly better than the nuclear norm optimization approach. With s = 1.05 d_r, the recovery probability increases when r increases, and it is close to 1 when r = 10. The computation time increases when r increases, given that the size of the sub-problems depends on the size of the linear map. The number of iterations remains low: when r = 10, the average numbers of iterations are 22 and 26 for k2-nuclear and k2-zero, respectively. This shows that k2-nuclear is slightly better than k2-zero both in terms of recovery probability and computation time.
4 Conclusion
We have proposed non-convex models based on the dual Ky Fan 2-k-norm for low-rank matrix recovery and developed a general DCA framework to solve these models. The computational results are promising. Numerical experiments with larger instances will be conducted, with first-order algorithm development for the proposed models as a future research direction.
References

1. Argyriou, A., Foygel, R., Srebro, N.: Sparse prediction with the k-support norm. In: NIPS, pp. 1466–1474 (2012)
2. Bhatia, R.: Matrix Analysis. Graduate Texts in Mathematics, vol. 169. Springer, New York (1997)
3. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)
4. Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005)
5. Doan, X.V., Vavasis, S.: Finding the largest low-rank clusters with Ky Fan 2-k-norm and ℓ1-norm. SIAM J. Optim. 26(1), 274–312 (2016)
6. Giraud, C.: Low rank multivariate regression. Electron. J. Stat. 5, 775–799 (2011)
7. Jacob, L., Bach, F., Vert, J.P.: Clustered multi-task learning: a convex formulation. NIPS 21, 745–752 (2009)
8. Ma, T.H., Lou, Y., Huang, T.Z.: Truncated ℓ1−2 models for sparse recovery and rank minimization. SIAM J. Imaging Sci. 10(3), 1346–1380 (2017)
9. Pham, D.T., Hoai An, L.T.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Mathematica Vietnamica 22(1), 289–355 (1997)
10. Pham, D.T., Hoai An, L.T.: A DC optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
11. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
12. Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3: a MATLAB software package for semidefinite programming, version 1.3. Optim. Methods Softw. 11(1–4), 545–581 (1999)
13. Yin, P., Esser, E., Xin, J.: Ratio and difference of ℓ1 and ℓ2 norms and sparse representation with coherent dictionaries. Commun. Inf. Syst. 14(2), 87–109 (2014)
14. Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of ℓ1 − ℓ2 for compressed sensing. SIAM J. Sci. Comput. 37(1), A536–A563 (2015)
Online DCA for Time Series Forecasting Using Artificial Neural Network

Viet Anh Nguyen(B) and Hoai An Le Thi

LGIPM, University of Lorraine, Metz, France
{viet-anh.nguyen,hoai-an.le-thi}@univ-lorraine.fr
Abstract. In this work, we study the online time series forecasting problem using artificial neural networks. To solve this problem, different online DCAs (Difference of Convex functions Algorithms) are investigated. We also give a comparison with online gradient descent, the online version of one of the most popular optimization algorithms for neural network problems. Numerical experiments on some benchmark time series datasets validate the efficiency of the proposed methods.

Keywords: Online DCA · DC programming · DCA · Time series forecasting · Artificial neural network

1 Introduction

Time series analysis and forecasting have an important role and a wide range of applications, such as stock market, weather forecasting, energy demand, fuel usage and electricity, and in any domain with specific seasonal or trend changes in time [15]. The information one gets from forecasting time series data can contribute to important decisions of companies or organizations with high priority. The goal of time series analysis is to extract information from a given time series over some period of time. Then, the information is used to construct a model, which can be used for predicting future values of the considered time series. Online learning is a technique in machine learning which is performed in a sequence of consecutive rounds [14,17]. At each round t, we receive a question x_t and have to give a corresponding prediction p_t(x_t). After that, we receive the true answer y_t and suffer the loss between p_t(x_t) and y_t. In many real-world situations, we do not know the entire time series beforehand. New data might arrive sequentially in real time. In those cases, analysis and forecasting of time series should be put in an online learning context. Linear models like autoregressive and autoregressive moving average models are standard tools for time series forecasting problems [2]. However, many processes in the real world are nonlinear. Empirical experience shows that linear models are not always the best to simulate the underlying dynamics of a time series. This gives rise to a demand for better nonlinear models. Recently, artificial neural networks have shown promising results in different applications [9],

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 320–329, 2020. https://doi.org/10.1007/978-3-030-21803-4_33
Algorithm 1. Online learning scheme
1: for t = 1, 2, ... do
2:   Receive question x_t ∈ X.
3:   Predict p(x_t | θ) ∈ Y.
4:   Receive the true answer y_t ∈ Y and suffer loss L(p(x_t | θ), y_t).
5: end for
which comes from the flexibility of those models in approximating functions (see [4,5]). In terms of time series forecasting using neural networks, there are many works to mention [1,11]. Although these works demonstrate the effectiveness of neural networks for time series applications, they used smooth activation functions such as sigmoid or tanh. In recent works, the ReLU activation function has been shown to be better than the above smooth activation functions, with good properties in practice [12]. In this work, we propose an online autoregressive model using a neural network with the ReLU activation function. Unlike other regression works, which use the square loss, we choose the ε-insensitive loss function to reduce the impact of outliers. We limit the architecture of the network to one hidden layer to reduce the overfitting problem. In spite of not being a deep network, fitting a one-hidden-layer neural network is still a nonconvex, nonsmooth optimization problem. To solve such a problem in an online context, we utilize the tools of online DC (Difference of Convex functions) programming and online DCA (DC Algorithm) (see [7,8,13]). The contribution of this work is the proposal and comparison of several online optimization algorithms based on online DCA to solve the time series forecasting problem using a neural network. Numerical experiments on different time series datasets indicate the effectiveness of the proposed methods. The structure of this work is organized as follows. In Sect. 2, we present the online learning scheme, and the DCA and online DCA schemes with two learning rules. In Sect. 3, we formulate the autoregressive model with a neural network and the corresponding online optimization problem. Section 4 contains three online DC algorithms to solve the problem of the previous section. We also consider the online gradient descent algorithm in this section. Numerical experiments are presented in Sect. 5, with conclusions in Sect. 6.
2 Online Learning and Online DCA

2.1 Online Learning
In online learning, the online learner's task is to answer a sequence of questions given the knowledge of the previous ones and possibly additional available information [14]. Online learning has interesting properties in both theoretical and practical aspects and is one of the most important domains in machine learning. The general scheme for online learning is summarized in Algorithm 1. The process of online learning is performed in a sequence of consecutive rounds. At round t, the learner is given a question x_t, which is taken from an
instance domain X, and is required to provide an answer p(x_t | θ) in a target space Y. The learner is determined by its parameter θ in a parameter space S. The learner then receives the true answer y_t from the environment and suffers a loss L(p(x_t | θ), y_t), which is a measure of the quality of the learner. In this work, at each round t, we use the assumption that the loss function is a DC function.
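The round-by-round interaction described above can be sketched as a generic loop; the toy learner below (a one-parameter linear predictor updated by online gradient descent on the squared loss) is our illustration only, not one of the algorithms proposed in this paper:

```python
import numpy as np

def online_learning(stream, predict, update, theta):
    """Generic online loop of Algorithm 1: predict, observe the true answer,
    suffer a loss, then update the learner's parameter."""
    losses = []
    for x_t, y_t in stream:
        p_t = predict(x_t, theta)               # answer the question
        losses.append(abs(p_t - y_t))           # suffer the loss
        theta = update(theta, x_t, y_t, p_t)    # learn from the true answer
    return theta, losses

# toy instance: scalar questions, true answers y = 2x, squared-loss OGD update
rng = np.random.default_rng(0)
data = [(x, 2.0 * x) for x in rng.standard_normal(200)]
predict = lambda x, th: th * x
update = lambda th, x, y, p: th - 0.1 * 2 * (p - y) * x  # gradient of (p - y)^2

theta, losses = online_learning(data, predict, update, theta=0.0)
assert abs(theta - 2.0) < 1e-2      # the learner recovers the target slope
assert losses[-1] < losses[0]       # later rounds suffer smaller losses
```

In the paper the update rule is instead derived from online DCA applied to a DC loss, as described next.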
DCA and Online DCA
DC Programming and DCA constitute the backbone of smooth/nonsmooth nonconvex programming and global optimization. They address the problem of minimizing a function f which is a difference of convex functions on the whole space Rd . Generally speaking, a DC program takes the form (Pdc ), α = inf f (x) := g(x) − h(x) : x ∈ Rd where g, h ∈ Γ0 (Rd ), the set contains all lower semicontinuous proper convex functions on Rd . Such a function f is called a DC function, and g − h, a DC decomposition of f while g and h are DC components of f . A standard DC program with a convex constraint C (a non empty closed convex set in Rd ), which is α = inf {f (x) := g(x) − h(x) : x ∈ C}, can be expressed in the form of (Pdc ) by adding the indicator function of C to the function g. The main idea of DCA is quite simple: each iteration k of DCA approximates the concave part −h by its affine majorization corresponding to taking the subgradient y ∈ ∂h(xk ) and minimizes the resulting convex function. Convergence properties of the DCA and its theoretical basis are described in [7,13]. In the past years, DCA has been successfully applied in several works of various fields among them machine learning, financial optimization, supply chain management [8]. In online learning context, at each round t, we receive a DC loss function ft = gt − ht with gt , ht are functions in Γ0 (S) and S is the parameter set of the online learner. Then, we approximate the concave part −ht by its affine majorization corresponding to taking zt ∈ ∂ht (xt ) and minimize the resulting convex subproblem [3]. The subproblem of online DCA can take two forms. The first form is followthe-leader t learning rule [14], which means at round t, we minimize the cumulative loss i=1 fi . We can also minimize the current loss ft instead the cumulative one. t In short, we can write both learning rules above in a single formula as i=t0 fi , where t0 = 1 or t. Then, the online DCA scheme is given in Algorithm 2.
3 Autoregressive Neural Network for Online Forecasting Problem
Time series analysis is engaged in analyzing the underlying dynamics of a collection of successive values recorded in time, called a time series. The underlying
Algorithm 2. Online DCA
1: Initialization: Let θ_1 ∈ S be the best guess, t_0 ∈ {1, t}.
2: for t = 1, 2, 3, ..., T do
3:   Receive question x_t. Give prediction p(x_t | θ_t).
4:   Receive answer y_t and suffer loss Σ_{i=t_0}^t f_i(θ) = Σ_{i=t_0}^t g_i(θ) − Σ_{i=t_0}^t h_i(θ).
5:   Calculate w_t ∈ ∂h_t(θ_t).
6:   Calculate θ_{t+1} ∈ arg min{ Σ_{i=t_0}^t g_i(θ) − ⟨Σ_{i=t_0}^t w_i, θ⟩ : θ ∈ S }.
7: end for
dynamics can be described as a series of random variables {T_t}_{t∈N}. A time series is then a series {τ_t}_{t∈N} of observed values of those random variables [15]. A common way to simulate a time series is to construct the process as a function of past observed values, which is called autoregression. When the autoregressive function is linear, we call the model linear autoregressive. In the literature, linear autoregressive models are the simplest and oldest models for time series modeling [16]. Assuming that ℓ past values are used for the regression, we denote by AR(ℓ) the linear autoregressive model, which can be written as τ_t = α_0 + α_1 τ_{t−1} + ... + α_ℓ τ_{t−ℓ}. In many cases, linear models are not able to capture the nonlinearities in real-world applications [6]. To overcome this, nonlinear models are the alternative. Recently, the artificial neural network has grown quickly as it has the universal approximation property, which means it is able to approximate any nonlinear function on a compact set [5]. We call the AR-integrated version of the neural network the autoregressive neural network (ARNN). In ARNN(ℓ), the linear AR(ℓ) is replaced with a nonlinear composite transformation of past values. In this work, we consider a neural network with one hidden layer, which contains N hidden nodes activated by the ReLU activation function [9]. Then, the formula of the ARNN(ℓ) model can be written as follows:

τ_t = b + max(0, x_t^T U + a^T) V,   (1)

where U ∈ R^{ℓ×N}, V ∈ R^N, a ∈ R^N, b ∈ R are the parameters of the network and x_t = (τ_{t−1}, τ_{t−2}, ..., τ_{t−ℓ}) is the vector containing past values of the series.

Now we consider the online time series forecasting problem using ARNN(ℓ) to predict the value at the current time of the series {τ_n}_{n∈N}. The features and labels of our dataset are created as x_t = (τ_{t−1}, τ_{t−2}, ..., τ_{t−ℓ}) and y_t = τ_t. We choose the prediction function of ARNN(ℓ) as in (1), that is, p(x_t | θ) = max(0, x_t^T U + a^T) V + b, where θ = (U, V, a, b). We note that the function max in this work is applied elementwise if its input is a vector, which means that if β = (β_1, β_2, ..., β_d) is a vector in R^d, then max(0, β) = max(0, (β_1, ..., β_d)) = (max(0, β_1), ..., max(0, β_d)). To estimate the error between the predicted and true data, we use the ε-insensitive loss: L(p(x_t | θ), y_t) = max(0, |p(x_t | θ) − y_t| − ε), where ε is a
V. A. Nguyen and H. A. Le Thi
positive number. From now on, we use the notation ft to denote the objective function corresponding to a single data point (xt , yt ), which also means ft (θ) = L (p(xt | θ), yt ). In online setting, we have two learning rules: either minimizing the loss ft or t the cumulative loss i=1 fi at round t. Both choices can be written in a single form as min θ∈S
t
fi (θ) =
i=t0
t
max 0, max 0, xTi U + aT V + b − yi − ,
(2)
i=t0
where t0 be either 1 or t. In summary, we follow the online learning scheme 1 to receive question xt , predict p(xt | θ) then suffer the loss ft (θ) = L (p(xt | θ), yt ) and minimize the problem (2). This process is repeated for t = 1, 2, ..., T in real time.
4
Solving the Problem (2) by Online DCA
In this section, we use online DCA to solve problem (2) in the following steps. First, we find a DC decomposition for the objective function ft which corresponds to the time stamp t. Using that decomposition, we solve (2) with t0 = t in Sect. 4.2. Also in Sect. 4.2, two versions of online DCA is studied as well as the online gradient descent algorithm. We solve (2) for the case t0 = 1 in Sect. 4.3, which results in another online DC algorithm. 4.1
DC Decomposition and Subproblems of Online DCA
DC Decomposition. We will find a DC decomposition of the loss function f_i(θ) = L_ε(p(x_i | θ), y_i) = max(0, |p(x_i | θ) − y_i| − ε). The loss is a composition of the max, absolute value, and p functions. Let φ be a DC function with decomposition φ = φ_1 − φ_2; then max(0, φ) = max(φ_1, φ_2) − φ_2 and |φ| = 2 max(φ_1, φ_2) − (φ_1 + φ_2). Assume further that p has a DC decomposition p = q − r. Applying the above formulas to f_i, we obtain

f_i(θ) = max(0, |p(x_i | θ) − y_i| − ε)
       = max(2 max(q(x_i | θ), r(x_i | θ) + y_i), q(x_i | θ) + r(x_i | θ) + y_i + ε) − (q(x_i | θ) + r(x_i | θ) + y_i + ε) =: g_i − h_i.   (3)
Thus, the decomposition f_i = g_i − h_i is determined once we have q and r. In order to find those functions, we first consider the prediction function

p(x_t | θ) = max(0, x_t^T U + a^T)V + b = max(0, x_t^T U + a^T)V^+ − max(0, x_t^T U + a^T)V^− + b,   (4)

where V^+ = max(0, V) and V^− = −min(0, V). We observe that the first two terms of this equation are products of two nonnegative, convex functions.
Online DCA for Times Series Forecasting Using Artificial Neural Network
For two arbitrary nonnegative, convex functions u and v, we have a DC decomposition of their product as follows: uv = (u+v)²/2 − (u²+v²)/2. Applying this formula to (4), we obtain p(x_t | θ) = q(x_t | θ) − r(x_t | θ), where

q(x_t | θ) = b + (1/2) ∑_{j=1}^{N} [ (max(0, x_t^T U_j + a_j) + V_j^+)² + (V_j^−)² ],
r(x_t | θ) = (1/2) ∑_{j=1}^{N} [ (max(0, x_t^T U_j + a_j) + V_j^−)² + (V_j^+)² ].   (5)
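A quick numerical check of the identity p = q − r at a random point (shapes are illustrative assumptions): the terms max(0, x^T U_j + a_j)² cancel between the two squared sums, leaving exactly u·V^+ − u·V^− + b.

```python
import numpy as np

rng = np.random.default_rng(1)
ell, N = 4, 6
U = rng.normal(size=(ell, N))
V = rng.normal(size=N)
a = rng.normal(size=N)
b = -0.3
x = rng.normal(size=ell)

relu = lambda z: np.maximum(0.0, z)
Vp, Vm = relu(V), -np.minimum(0.0, V)           # V = V+ - V-, both nonnegative
u = relu(x @ U + a)                             # hidden activations, one per unit j

p = float(u @ V + b)
q = b + 0.5 * np.sum((u + Vp) ** 2 + Vm ** 2)   # convex component
r = 0.5 * np.sum((u + Vm) ** 2 + Vp ** 2)       # convex component
assert abs(p - (q - r)) < 1e-9                  # DC decomposition: p = q - r
```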
Hence, substituting (5) into (3) gives the DC decomposition of f_i.

Subproblems Formulation. According to the online DCA Scheme 2, at round t we receive the loss function ∑_{i=t_0}^{t} f_i(θ) = ∑_{i=t_0}^{t} g_i(θ) − ∑_{i=t_0}^{t} h_i(θ), with g_i and h_i being the DC components of f_i found above. Then we update the parameters θ_{t+1} of the prediction function by solving the following subproblem:

min_{θ∈S} { ∑_{i=t_0}^{t} g_i(θ) − ⟨ ∑_{i=t_0}^{t} z_i, θ ⟩ },   (6)
where z_i ∈ ∂h_i(θ_t). There are two cases for this subproblem: t_0 = t or t_0 = 1. In the following sections, we consider each case in detail.

4.2 Learning Rule: min {f_t(θ) : θ ∈ S} (Case t_0 = t)
DCA-1a and DCA-1b. The subproblem (6) becomes min_{s∈S} {g_t(s) − ⟨z_t, s⟩}. We can use the projected subgradient method to solve this problem as follows. First, we choose θ_t as the initial point, i.e., s^1 = θ_t (for clarity, superscripts indicate iterations of the subgradient method). Let w_t^k ∈ ∂g_t(s^k) be a subgradient of g_t at s^k and α_k the step size at iteration k; the update formula is

s^{k+1} = Proj_S(s^k − α_k(w_t^k − z_t)).   (7)

Although (7) is an explicit update, in the online context it must be repeated until convergence of s^k at each round t. This nested loop slows convergence. A natural remedy for this issue is to apply (7) for only one iteration per round, which results in Algorithm 3 (DCA-1a). Solving the subproblem with a single iteration reduces the heavy computation, but may lead to poor-quality solutions, since the update θ_{t+1}, which is also the solution of the subproblem at round t, is not optimal. To balance quality and time, we propose a combined strategy: in each of the first T_0 online rounds, we solve the subproblem with K iterations; from round T_0 + 1 to the last round T, we solve it with only one iteration as in DCA-1a. This strategy improves the quality of solutions
Algorithm 3. DCA-1a
1: Initialization: θ_1 ∈ S.
2: for t = 1, 2, 3, ..., T do
3:   Receive question x_t. Give prediction p(x_t | θ_t).
4:   Receive answer y_t and suffer loss f_t(θ) = g_t(θ) − h_t(θ).
5:   Choose step size α_t > 0.
6:   Calculate w_t ∈ ∂g_t(θ_t) and z_t ∈ ∂h_t(θ_t).
7:   Calculate θ_{t+1} = Proj_S(θ_t − α_t(w_t − z_t)).
8: end for
Algorithm 4. DCA-1b
1: Initialization: θ_1 ∈ S and T_0, K ∈ N.
2: for t = 1, 2, 3, ..., T do
3:   Receive question x_t. Give prediction p(x_t | θ_t).
4:   Receive answer y_t and suffer loss f_t(θ) = g_t(θ) − h_t(θ).
5:   Calculate z_t ∈ ∂h_t(θ_t).
6:   if t ≤ T_0 then
7:     Solve min_{θ∈S} {g_t(θ) − ⟨z_t, θ⟩} by the subgradient method with K iterations.
8:   else
9:     Solve min_{θ∈S} {g_t(θ) − ⟨z_t, θ⟩} by the subgradient method with 1 iteration.
10:  end if
11: end for
in the first T_0 rounds and therefore leads to faster convergence. On the other hand, the computational time is kept within an acceptable threshold as we adjust T_0 and K. In view of the regret bound, one might see that ∑_{i=1}^{T_0} f_i is bounded, so it does not affect the sublinearity of the regret bound; the bound only depends on DCA-1a, which is applied for the latter T − T_0 rounds. This combined strategy is described in Algorithm 4 (DCA-1b).

Online Gradient Descent (OGD). The update formula of OGD at round t with step size α_t can be written as θ_{t+1} = Proj_S(θ_t − α_t ∇f_t(θ_t)). From this formula, we see that in order to use OGD, one must have the gradient of the objective function f_t. ReLU, our activation function in the neural network, is non-differentiable at 0. This means there exists a subset S̃ ⊂ S such that f_t is not differentiable at any point θ in S̃. Although networks with ReLU activation do not satisfy the theoretical property of differentiability, gradient-based methods are still widely used in practical deep learning problems [9]. In implementation, one chooses the derivative value of ReLU at 0 as 0 or 1; convergence analysis of such implementations is referred to [10]. Now recall that we have decomposed f_t into DC components g_t − h_t. Let w_t ∈ ∂g_t(θ_t) and z_t ∈ ∂h_t(θ_t) be the subgradients of g_t and h_t at θ_t. If we choose the subgradient of ReLU at 0 with the same value as in the OGD case above, then w_t − z_t equals the gradient of f_t at θ_t. In this
Algorithm 5. DCA-2
1: Initialization: θ_1 ∈ S.
2: for t = 1, 2, 3, ..., T do
3:   Receive question x_t. Give prediction p(x_t | θ_t).
4:   Receive answer y_t and suffer loss ∑_{i=1}^{t} f_i(θ) = ∑_{i=1}^{t} g_i(θ) − ∑_{i=1}^{t} h_i(θ).
5:   Choose step size λ_t > 0.
6:   Calculate w_t ∈ ∂g_t(θ_t) and z_t ∈ ∂h_t(θ_t).
7:   Calculate θ_{t+1} = Proj_S(θ_t − λ_t ∑_{i=1}^{t} (w_i − z_i)).
8: end for
case, the update formulas of DCA-1a and OGD are exactly the same. Therefore, in the numerical experiments, we consider DCA-1a and OGD as one algorithm. More details about OGD for convex loss functions can be found in [14].
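A minimal NumPy sketch of the per-round DCA-1a/OGD update on a trivially learnable constant series. We take S to be the whole parameter space (so the projection is the identity), choose the subgradient of ReLU at 0 as 0, and use a small constant step size — all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
ell, N, eps, lr = 4, 5, 0.1, 0.05
U = 0.1 * rng.normal(size=(ell, N))
a = 0.1 * rng.normal(size=N)
V = 0.1 * rng.normal(size=N)
b = 0.0

losses = []
x, y = np.ones(ell), 1.0            # constant series: every window predicts 1.0
for t in range(200):
    h = np.maximum(0.0, x @ U + a)  # hidden layer
    p = float(h @ V + b)            # prediction
    err = p - y
    losses.append(max(0.0, abs(err) - eps))
    if abs(err) > eps:              # outside the eps-tube: take a subgradient step
        s = np.sign(err)            # w_t - z_t reduces to s * (subgradient of p)
        gate = (x @ U + a > 0).astype(float)
        U -= lr * s * np.outer(x, gate * V)
        a -= lr * s * gate * V
        V -= lr * s * h
        b -= lr * s

assert losses[0] > 0.0 and losses[-1] == 0.0  # prediction enters the eps-tube
```

Once the prediction falls inside the ε-tube the loss (and the subgradient) is zero, so the parameters stop moving — exactly the flat region of the ε-insensitive objective.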
4.3 Learning Rule: min { ∑_{i=1}^{t} f_i(θ) : θ ∈ S } (Case t_0 = 1)

In this case, the subproblem (6) becomes min_{s∈S} { ∑_{i=1}^{t} g_i(s) − ⟨ ∑_{i=1}^{t} z_i, s ⟩ }, where z_i ∈ ∂h_i(θ_i). If we use the subgradient method, we first initialize s^1 = θ_t and let α_k be the step size at iteration k. The update formula is s^{k+1} = s^k − α_k ( ∑_{i=1}^{t} w_i^k − ∑_{i=1}^{t} z_i ), where w_i^k ∈ ∂g_i(s^k). At each iteration k, we have to compute w_i^k for all i in {1, 2, ..., t}. This makes the computation heavy, even with only one iteration of the subgradient method. Another approach is to replace each g_i with a piecewise linear approximation. The linear approximation of g_i at s^0 has the form φ_i^0(s) = g_i(s^0) + ⟨w_i^0, s − s^0⟩ for all s in S, where w_i^0 ∈ ∂g_i(s^0). Assume we have a finite set {s^1, s^2, ..., s^n} ⊂ S. Then g_i can be approximated by the piecewise linear function g_i(s) ≈ max{φ_i^j(s) : j ∈ {1, ..., n}}. Put G_i(s) = max{φ_i^j(s) : j ∈ {1, ..., n}}; the subproblem then becomes
min_{s∈S} { ∑_{i=1}^{t} G_i(s) − ⟨ ∑_{i=1}^{t} z_i, s ⟩ } = min_{s∈S} ∑_{i=1}^{t} max{ ⟨w_i^j − z_i, s⟩ : j ∈ {1, ..., n} }.
For simplicity, we just linearize each g_i at the single point s^i = θ_i, and the above problem becomes min_{s∈S} ⟨ ∑_{i=1}^{t} (w_i − z_i), s ⟩, where w_i ∈ ∂g_i(θ_i). We solve this subproblem using the proximal point method with step size λ_t, whose update rule has the form

θ_{t+1} ∈ argmin_{s∈S} { ⟨ ∑_{i=1}^{t} (w_i − z_i), s ⟩ + ‖s − θ_t‖_2² / (2λ_t) } = argmin_{s∈S} ‖ s − θ_t + λ_t ∑_{i=1}^{t} (w_i − z_i) ‖_2².
So we obtain θ_{t+1} = Proj_S( θ_t − λ_t ∑_{i=1}^{t} (w_i − z_i) ). With this update rule, we have Algorithm 5 (DCA-2).
5 Numerical Experiments
We conduct experiments with three algorithms on five time series datasets taken from the UCI machine learning repository¹. The experimental procedure is implemented as follows. We perform the preprocessing by transforming the time series {τ_t} into a new dataset in which the feature vector x_t has the form x_t = (τ_{t−4}, τ_{t−3}, τ_{t−2}, τ_{t−1}), which we call a window of length 4. For each window, we take the current t-th value of the time series as the label, y_t = τ_t. In short, the online model uses 4 past values to predict the upcoming value. We choose T_0 = 10 and K = 100 for DCA-1b. Mean squared error (MSE) is used as the quality measure. Results are reported in Table 1.

Table 1. Comparative results on time series datasets. L denotes the length of the time series. Bold values correspond to best results for each dataset. We have chosen the subgradients such that DCA-1a and OGD have the same update formulas.

Dataset                  | MSE                      | Time (s)
                         | DCA-1a  DCA-1b  DCA-2    | DCA-1a  DCA-1b  DCA-2
EU stock (L = 536)       |  2.957   1.518   1.679   |  0.015   0.025   0.037
NIKKEI stock (L = 536)   |  2.382   1.691   1.782   |  0.015   0.025   0.039
Appliances (L = 19735)   |  0.075   0.071   0.058   |  2.331   3.962   6.314
Temperature (L = 19735)  | 11.701  11.661  11.521   |  0.851   0.884   3.400
Pressure (L = 20000)     |  0.307   0.269   1.359   |  2.922   0.388   0.796
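The windowing preprocessing described above can be sketched as follows (the function name is our own, purely for illustration):

```python
import numpy as np

def make_windows(series, ell=4):
    # x_t = (tau_{t-ell}, ..., tau_{t-1}) and y_t = tau_t, a window of length ell
    X = np.array([series[t - ell:t] for t in range(ell, len(series))])
    y = np.array(series[ell:])
    return X, y

tau = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_windows(tau, ell=4)
assert X.shape == (2, 4)                 # two complete windows in this toy series
assert list(X[0]) == [1.0, 2.0, 3.0, 4.0]
assert list(y) == [5.0, 6.0]
```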
Comment. In terms of MSE, DCA-2 is better than DCA-1a (or OGD) on all datasets. This can be explained by the fact that DCA-2 minimizes the cumulative loss, which gives the online learner more information. DCA-1b outperforms DCA-2 on the datasets with more instances (Appliances, Temperature and Pressure). Regarding computational time, DCA-1a (or OGD) is the best due to its lightweight update formula. DCA-1b is slower since it loops over K iterations of the subgradient method for the first T_0 rounds. In summary, DCA-2 performs well on both quality and time. Although DCA-1b achieves good MSEs on long time series, its computational time is large compared to the other two algorithms. DCA-1a (or OGD) is the fastest but has the lowest quality.
6 Conclusion
This work presents an approach to online time series forecasting using neural networks. The resulting optimization problem of a neural network with ReLU activation is nonconvex and nonsmooth. To handle it, we have proposed several online DCAs. The effectiveness of these algorithms is shown in the experiments. In future work, we plan to study more DC decompositions and optimization strategies to improve the results. In addition, using deeper neural networks in time series forecasting is an interesting problem worth further investigation.

¹ http://archive.ics.uci.edu/ml
References
1. Anders, U., Korn, O., Schmitt, C.: Improving the pricing of options: a neural network approach. J. Forecast. 17(5–6), 369–388 (1998)
2. Box, G.E., Jenkins, G.M., Reinsel, G.C., Ljung, G.M.: Time Series Analysis: Forecasting and Control. Wiley (2015)
3. Ho, V.T., Le Thi, H.A., Bui Dinh, C.: Online DC optimization for online binary linear classification. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, T.P. (eds.) Intelligent Information and Database Systems, pp. 661–670. Springer, Berlin (2016)
4. Hornik, K.: Some new results on neural network approximation. Neural Netw. 6(8), 1069–1072 (1993)
5. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
6. Kantz, H., Schreiber, T.: Nonlinear Time Series Analysis, vol. 7. Cambridge University Press (2004)
7. Le Thi, H.A., Pham Dinh, T.: The DC (Difference of Convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133(1), 23–46 (2005)
8. Le Thi, H.A., Pham Dinh, T.: DC programming and DCA: thirty years of developments. Math. Program. 169(1), 5–68 (2018)
9. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
10. Li, Y., Yuan, Y.: Convergence analysis of two-layer neural networks with ReLU activation. In: Advances in Neural Information Processing Systems, pp. 597–607 (2017)
11. Medeiros, M.C., Teräsvirta, T., Rech, G.: Building neural network models for time series: a statistical approach. J. Forecast. 25(1), 49–75 (2006)
12. Pan, X., Srikumar, V.: Expressiveness of rectifier networks. In: Proceedings of the 33rd International Conference on Machine Learning, ICML'16, vol. 48, pp. 2427–2435. JMLR.org (2016)
13. Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to d.c. programming: theory, algorithm and applications. Acta Math. Vietnam. 22(1) (1997)
14. Shalev-Shwartz, S., Singer, Y.: Online Learning: Theory, Algorithms, and Applications (2007)
15. Shumway, R.H., Stoffer, D.S.: Time Series Analysis and Its Applications (Springer Texts in Statistics). Springer, Berlin (2005)
16. Yule, G.U.: On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers. Philos. Trans. R. Soc. Lond. Ser. A 226, 267–298 (1927)
17. Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 928–936 (2003)
Parallel DC Cutting Plane Algorithms for Mixed Binary Linear Program

Yi-Shuai Niu^{1,2}(B), Yu You^1, and Wen-Zhuo Liu^3

^1 School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China {niuyishuai,youyu0828}@sjtu.edu.cn
^2 SJTU-Paristech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China
^3 Allée des techniques avancées, Ensta Paristech, 91120 Palaiseau, France
Abstract. In this paper, we propose a new approach based on DC (Difference of Convex) programming, DC cutting planes and DCA (DC Algorithm) for globally solving the mixed binary linear program (MBLP). Using an exact penalty technique, we can reformulate MBLP as a standard DC program which can be solved by DCA. We establish the DC cutting plane (DC cut) to eliminate local optimal solutions of MBLP provided by DCA. Combining the DC cut with classical cutting planes such as lift-and-project and Gomory's cut, we establish a DC cutting plane algorithm (DC-CUT algorithm) for globally solving MBLP. A parallel DC-CUT algorithm is also developed to take advantage of multiple CPUs/GPUs for better computational performance. Preliminary numerical results show the efficiency of our methods.

Keywords: DC programming · DCA · Mixed binary linear program · DC cut · Parallel DC-CUT algorithm

1 Introduction
Consider the mixed binary linear program (MBLP):

min  f(x, y) := c^T x + d^T y
s.t. Ax + By ≥ b,
     (x, y) ∈ {0, 1}^n × R^q_+,   (P)

where the vectors c ∈ R^n, d ∈ R^q, b ∈ R^m, and the matrices A ∈ R^{m×n}, B ∈ R^{m×q}.

The research is funded by the National Natural Science Foundation of China (Grant No: 11601327) and by the Key Construction National "985" Program of China (Grant No: WF220426001).
© Springer Nature Switzerland AG 2020. H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 330–340, 2020. https://doi.org/10.1007/978-3-030-21803-4_34
MBLP is in general NP-hard and is well known as one of Karp's 21 NP-complete problems [21]. Over the past several decades, a variety of optimization techniques have been proposed for MBLP. There are generally two kinds of approaches: exact algorithms and heuristic methods. Exact algorithms include cutting-plane methods, branch-and-bound, and column generation, among others. These methods aim to find a global optimal solution, but due to NP-hardness, they are often inefficient in practice, especially for large-scale cases. Heuristic methods are therefore proposed instead, such as tabu search, simulated annealing, ant colony optimization, and Hopfield neural networks. However, it is usually impossible to certify the global optimality of solutions returned by heuristic methods. Finding a global optimal solution for MBLP is computationally very expensive. In practice, we are often interested in efficient local optimization approaches. Among these, DCA is a promising method providing good-quality local (often global) optimal solutions in various practical applications (see e.g., [5,9,18,19]). Applying DCA to general mixed-integer linear optimization was first studied in [13] (where the integer variables are not only binary), and extended to mixed-integer nonlinear programming [14,15] with various applications including scheduling [6], network optimization [22], and finance [7,20]. This algorithm is based on continuous representation techniques for the integer set, the exact penalty theorem, DCA, and branch-and-bound (BB) algorithms. Recently, the author developed a parallel BB framework (called PDCABB) to use the power of multiple CPUs/GPUs to improve DCABB [17]. Besides the combination of DCA and BB, another global approach, called DCA-CUT, is a cutting plane method based on constructing cutting planes (called DC cuts) from the solutions of DCA.
This kind of method was first studied in [11] and applied to several real-world problems such as the bin-packing problem [10] and the scheduling problem [12]. However, the DCA-CUT algorithm is not fully constructive, since there exist cases where the DC cut cannot be constructed. Due to this drawback, Niu proposed in [16] a hybrid approach combining the constructible DC cuts with DCABB to improve the lower bounds. In this paper, we revisit the DC cutting plane technique and discuss the constructible DC cuts. For the unconstructible cases, we propose to combine classical global cuts such as the lift-and-project cut and Gomory's cut to establish a hybrid cutting plane algorithm, called the DC-CUT algorithm, to globally solve MBLP. A parallel version of our DC-CUT algorithm is also developed for better performance. Moreover, variant algorithms with more cutting planes constructed in each iteration are proposed. The paper is organized as follows: Section 2 presents the DC programming formulation and DCA for MBLP. In Sect. 3, we introduce the DC cutting planes. DC-CUT algorithms (with and without parallelism) and their variants are proposed in the next section. Numerical experimental results are reported in Sect. 5. Conclusions and perspectives are discussed in the last section.
2 DC Programming Formulation and DCA for MBLP
Let S be the feasible set of (P), and let y be bounded above by ȳ. Let K be the linear relaxation of S, defined by K = {(x, y) : Ax + By ≥ b, (x, y) ∈ [0, 1]^n × R^q_+}. The linear relaxation of (P), denoted R(P), is defined as min{f(x, y) : (x, y) ∈ K}; its optimal value, denoted l(P), is a lower bound of (P). The continuous representation technique for the integer set {0, 1}^n consists of finding a continuous DC function¹ p : R^n → R such that {0, 1}^n ≡ {x : p(x) ≤ 0}. For the integer set {0, 1}^n, we often use the piecewise linear function

p(x) = ∑_{i=1}^{n} min{x_i, 1 − x_i},

so that S = K ∩ ({x ∈ R^n : p(x) ≤ 0} × [0, ȳ]). Based on the exact penalty theorem [4,8], if K is nonempty, then there exists a finite number t_0 ≥ 0 such that for all t > t_0, the problem (P) is equivalent to:

min  τ_t(x, y) := f(x, y) + t·p(x)
s.t. (x, y) ∈ K.   (P^t)
A DC decomposition of τ_t(x, y) is given by

τ_t(x, y) = g(x, y) − h(x, y), with g(x, y) := 0 and h(x, y) := −f(x, y) − t·p(x).

A subgradient (v, w) ∈ ∂h(x, y) can be chosen as v = −c + u and w = −d, with

u_i = t if x_i ≥ 1/2; u_i = −t otherwise,   (1)

for all i ∈ {1, ..., n}. DCA for solving the problem (P^t) is described in Algorithm 1. In view of the polyhedral convexity of h, the problem (P^t) is a polyhedral DC program. According to the convergence theorem of DCA for polyhedral DC programs [5,18], it follows that:
(1) Algorithm 1 generates a sequence {(x^k, y^k)} ⊆ V(K)² which converges to a KKT point (x^∗, y^∗) of (P^t) after finitely many iterations.
(2) The sequence {f(x^k, y^k)}_{k=1}^∞ is monotonically decreasing to f(x^∗, y^∗).
(3) If x_i^∗ ≠ 1/2 for all i ∈ {1, ..., n}, then (x^∗, y^∗) is a local minimizer of (P^t).
¹ A function f : R^n → R is called DC if there exist two convex functions g and h (called DC components) such that f = g − h.
² V(K) denotes the vertex set of the polyhedron K.
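The penalty function p and the subgradient choice (1) are simple enough to sketch directly (the helper names are ours, for illustration only):

```python
import numpy as np

def penalty(x):
    # p(x) = sum_i min(x_i, 1 - x_i); zero exactly on {0,1}^n for x in [0,1]^n
    return float(np.minimum(x, 1.0 - x).sum())

def subgrad_u(x, t):
    # the u-part of the subgradient of h = -f - t*p, as chosen in (1)
    return np.where(x >= 0.5, t, -t)

assert penalty(np.array([0.0, 1.0, 1.0])) == 0.0       # binary points are not penalized
assert penalty(np.array([0.5, 0.25, 1.0])) == 0.75     # fractional points are penalized
assert list(subgrad_u(np.array([0.7, 0.2]), 2.0)) == [2.0, -2.0]
```

The sign pattern of u pushes each coordinate of x toward its nearest binary value, which is the rounding effect that DCA exploits.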
Algorithm 1. DCA for (P^t)
1: Input: Initial point (x^0, y^0) ∈ R^n × R^q_+; large enough penalty parameter t > 0; tolerances ε_1, ε_2 > 0.
2: Output: Optimal solution (x^∗, y^∗) and optimal value f^∗.
3: Initialization: Set k = 0.
4: Step 1: Compute (v^k, w^k) ∈ ∂h(x^k, y^k) via (1).
5: Step 2: Solve the linear program by the simplex algorithm to find a vertex solution (x^{k+1}, y^{k+1}) ∈ argmin{−⟨(v^k, w^k), (x, y)⟩ : (x, y) ∈ K}.
6: Step 3 (stopping check): if ‖(x^{k+1}, y^{k+1}) − (x^k, y^k)‖ ≤ ε_1 or |τ_t(x^{k+1}, y^{k+1}) − τ_t(x^k, y^k)| ≤ ε_2 then
7:   (x^∗, y^∗) ← (x^{k+1}, y^{k+1}); f^∗ ← τ_t(x^{k+1}, y^{k+1}); return;
8: else
9:   k ← k + 1; go to Step 1.
10: end
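To make the iteration concrete, here is a minimal sketch of DCA on a tiny pure-binary instance where K is simply the box [0, 1]² (no coupling constraints and no continuous variables — an illustrative assumption), so the LP step has a closed-form vertex solution:

```python
import numpy as np

c = np.array([1.0, -2.0])            # objective: min c^T x over x in {0,1}^2
t = 5.0                              # penalty parameter, large enough here

x = np.array([0.4, 0.6])             # fractional starting point
for _ in range(20):
    u = np.where(x >= 0.5, t, -t)    # via (1)
    v = -c + u                       # v-part of a subgradient of h
    # LP step: a vertex of [0,1]^2 minimizing <-v, x> is x_i = 1 iff v_i > 0
    x_new = (v > 0).astype(float)
    if np.array_equal(x_new, x):
        break                        # DCA has converged
    x = x_new

assert list(x) == [0.0, 1.0]         # the binary global optimum of this instance
```

From other starting points (e.g., (0.6, 0.4)) this iteration can stop at a binary local solution with a worse objective value, which is precisely the situation the DC cuts of Sect. 3 are designed to eliminate.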
3 DC Cutting Planes

3.1 Valid Inequalities Based on DC Programming
Let us denote u^∗ = (x^∗, y^∗) ∈ K, I = {1, ..., n}, and define the index sets J_0(u^∗) = {j ∈ I : x_j^∗ ≤ 1/2} and J_1(u^∗) = I \ J_0(u^∗). The affine function l_{u^∗} is defined by

l_{u^∗}(u) = ∑_{j∈J_0(u^∗)} x_j + ∑_{j∈J_1(u^∗)} (1 − x_j).

To simplify the notation, we identify l_{u^∗}(x) with l_{u^∗}(u) and p(x) with p(u).

Lemma 1. (see [12]) For all u^∗ ∈ K: (i) l_{u^∗}(u) ≥ p(u) ≥ 0, ∀u ∈ K; (ii) l_{u^∗}(u^∗) = p(u^∗), and in particular, if u^∗ ∈ S, then l_{u^∗}(u^∗) = p(u^∗) = 0.

Theorem 1. (see [11,12]) There exists a finite number t_1 ≥ 0 such that for all t > t_1, if u^∗ ∈ V(K) \ S is a local minimizer of (P^t), then l_{u^∗}(u) ≥ l_{u^∗}(u^∗), ∀u ∈ K.

3.2 DC Cut from Infeasible Solution
Let u^∗ be an infeasible solution of (P^t) obtained by DCA (i.e., u^∗ ∉ S). We have the following two cases: (1) all components of x^∗ are different from 1/2 and p(u^∗) ∉ Z; (2) otherwise.
Case 1: When case 1 occurs, a cutting plane that cuts off u^∗ from K is constructed as in Theorem 2.

Theorem 2. Let u^∗ ∉ S be a local minimizer of (P^t) with p(u^∗) ∉ Z. Then the following inequality provides a cutting plane separating u^∗ and S:

l_{u^∗}(u) ≥ ⌈l_{u^∗}(u^∗)⌉.   (2)

Proof. First, it follows from Lemma 1 that l_{u^∗}(u^∗) = p(u^∗); then p(u^∗) ∉ Z implies l_{u^∗}(u^∗) < ⌈l_{u^∗}(u^∗)⌉, so u^∗ does not satisfy inequality (2). Second, when t is sufficiently large, it follows from Theorem 1 that l_{u^∗}(u) ≥ l_{u^∗}(u^∗) for all u ∈ S; since l_{u^∗}(u) ∈ Z for u ∈ S and l_{u^∗}(u^∗) ∉ Z, we get l_{u^∗}(u) ≥ ⌈l_{u^∗}(u^∗)⌉, ∀u ∈ S.

Case 2: In this case, we use classical cuts to separate u^∗ and S. The lift-and-project (LAP) cut, one of the classical cuts, is introduced in Algorithm 2; the reader may refer to [1,2] for more details.
Algorithm 2. LAP cut
1: Input: u^∗ ∈ V(K) \ S. Output: LAP cut.
2: Step 1 (index selection): Select an index j ∈ {1, ..., n} with x_j^∗ ∉ Z.
3: Step 2 (cut generation): Let C_j be the m × (n + q − 1) matrix obtained from [A|B] by removing the j-th column a_j.
4: Let D̃_j be the m × (n + q) zero matrix with only the j-th column set to a_j − b.
5: Set C̃_j ← [A|B] − D̃_j.
6: Solve the linear program max{vb − (wD̃_j + vC̃_j)u^∗ : wC_j − vC_j = 0, (w, v) ≥ 0} to get its solution (w^∗, v^∗).
7: The LAP cut is (w^∗D̃_j + v^∗C̃_j)u ≥ v^∗b, which separates u^∗ and S.
3.3 DC Cut from Feasible Solution
Let u^∗ be a feasible solution of (P^t) obtained by DCA (i.e., u^∗ ∈ S); then it must be a local minimizer of (P^t). The next theorem provides a cutting plane.

Theorem 3. Let u^∗ ∈ S be a feasible local minimizer of (P^t). Then the following inequality cuts off u^∗ from S and retains all better feasible solutions in S:

l_{u^∗}(u) ≥ 1.   (3)
Proof. First, since l_{u^∗}(u^∗) = 0, u^∗ does not satisfy inequality (3). Second, let C_1 = {(x^∗, y) : (x^∗, y) ∈ K}. The problem

min{τ_t(u) : u ∈ C_1}   (P_1)

reduces to the linear program

min{f(u) : u ∈ C_1}   (P_2)

since p(u) = 0, ∀u ∈ C_1. Moreover, since u^∗ is a local minimizer of (P^t), it is also a local minimizer of (P_1) and (P_2), and any local minimizer of a linear program is a global minimizer; thus u^∗ globally solves (P_2), i.e., f(u^∗) ≤ f(u), ∀u ∈ C_1. Therefore, all better feasible solutions in S lie outside C_1, which implies l_{u^∗}(u) ≥ 1, ∀u ∈ S \ C_1.
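Both DC cuts are linear inequalities in x whose coefficients are trivial to assemble from x^∗. A small sketch for the infeasible case (2); the point x_star below and the function name are purely illustrative:

```python
import numpy as np
from math import ceil

def dc_cut(x_star):
    # l_{u*}(x) = sum_{j in J0} x_j + sum_{j in J1} (1 - x_j) >= ceil(l(u*)),
    # rewritten as <coef, x> >= ceil(l(u*)) - |J1| with coef_j = +1 on J0, -1 on J1
    J1 = x_star > 0.5
    coef = np.where(J1, -1.0, 1.0)
    l_star = float(np.where(J1, 1.0 - x_star, x_star).sum())
    rhs = ceil(l_star) - int(J1.sum())
    return coef, rhs, l_star

x_star = np.array([0.2, 0.7, 0.9])       # fractional local solution (illustrative)
coef, rhs, l_star = dc_cut(x_star)
assert coef @ x_star < rhs               # the cut separates x_star ...
xb = np.array([1.0, 0.0, 1.0])           # ... but keeps this binary point, whose
assert coef @ xb >= rhs                  #     l-value is an integer >= ceil(l_star)
```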
4 DC-CUT Algorithms

In this section, we establish DC-CUT algorithms based on the DC cuts and the classical cutting planes presented in the previous section.

4.1 DC-CUT Algorithm Without Parallelism

The DC-CUT algorithm constructs in each iteration a DC cut or a classical cut to progressively reduce the set K. During the iterations, once we find a feasible solution in S, we update the incumbent solution; once the linear relaxation on the reduced set provides a feasible optimal solution in S, or the reduced set is empty, or the gap between the lower and upper bounds is small enough, we terminate the algorithm and return the best feasible solution. Let V^k denote the set of cutting planes constructed at the k-th iteration. Let K^0 = K and S^0 = S; we then update K^{k+1} = K^k ∩ V^k and S^{k+1} = S^k ∩ V^k. We refer to the linear relaxation on K^k as R(P^k), given by min{f(u) : u ∈ K^k}, and to the DC program on K^k as (DCP^k): min{τ_t(u) : u ∈ K^k}. The DC-CUT algorithm is described in Algorithm 3.

4.2 Parallel-DC-CUT Algorithm
Note that each restart of DCA in line 7 of DC-CUT Algorithm 3 yields either a DC cut or a LAP cut (the LAP cut could also be replaced by other classical cuts). Thus, if DCA is applied several times in one iteration with different initial points, several cutting planes can potentially be introduced in one iteration to quickly reduce
the lower bound and to provide more candidates for updating the incumbent upper bound. Considering also the power of multiple CPUs/GPUs, we propose to start DCA simultaneously from random initial points. The differences between the Parallel-DC-CUT Algorithm and DC-CUT Algorithm 3 mainly concern lines 7 to 19. Supposing we want to use s parallel workers, at line 7 we choose s random initial points in [0, 1]^n × [0, ȳ] to start DCA simultaneously and construct cutting planes respectively. Once the parallel block terminates at line 19, we collect all created cutting planes in V^k to update the sets K^k and S^k.
Algorithm 3. DC-CUT Algorithm
Input: Problem (P); penalty parameter t; tolerance ε > 0.
Output: Optimal solution (xopt, yopt) and optimal value fval.
1: Initialize: k ← 0; K^0 ← K; S^0 ← S; UB ← +∞; xopt ← null; yopt ← null.
2: Solve R(P^0) to obtain its optimal solution u = (x, y) and update LB.
3: if x ∈ S then
4:   xopt ← x; yopt ← y; fval ← LB; return;
5: else
6:   while |UB − LB| ≥ ε do
7:     Set u as the initial point for starting DCA on (DCP^k), and get its solution u^∗ = (x^∗, y^∗);
8:     if u^∗ ∉ S^k then
9:       if p(u^∗) ∉ Z and x_i^∗ ≠ 1/2 for all i ∈ {1, ..., n} then
10:        use inequality (2) to add a DC cut to V^k;
11:      else
12:        add a LAP cut to V^k;
13:      end
14:    else
15:      if u^∗ is a better feasible solution then
16:        xopt ← x^∗; yopt ← y^∗; UB ← f(x^∗, y^∗); fval ← UB;
17:      end
18:      use inequality (3) to add a DC cut to V^k;
19:    end
20:    K^{k+1} ← K^k ∩ V^k; S^{k+1} ← S^k ∩ V^k; k ← k + 1;
21:    solve R(P^k) to obtain u = (x, y) and LB;
22:    if R(P^k) is infeasible or LB ≥ UB then
23:      return the current best solution (xopt, yopt) and fval;
24:    else if u ∈ S^k and LB < UB then
25:      xopt ← x; yopt ← y; UB ← LB; fval ← UB; return;
26:    end
27:  end
28: end
4.3 Variant DC-CUT and Parallel-DC-CUT Algorithms
More efficient algorithms can be derived by introducing more cutting planes in each iteration in order to reduce the set K more quickly. For example, instead of adding one cut per iteration of the DC-CUT Algorithm, we can add several cuts, including the DC cut, the LAP cut and Gomory's cut, whenever they are constructible. This strategy helps update the lower bounds more quickly. However, more cutting planes increase the number of constraints, which potentially increases the difficulty of solving the subproblems. Moreover, some of these cuts could be redundant or inefficient (lazy cuts) for updating the lower bound. Therefore, we switch off the variant strategy when the lower bound updating rate is too small, and turn it on when the updating rate is large enough.
5 Experimental Results
In this section, we report some numerical results for our proposed algorithms. The algorithms are implemented in MATLAB, using parfor for parallel computing. The linear subproblems are solved by Gurobi 8.1.0 [3]. The experiments are performed on a laptop equipped with 2 Intel i5-6200U 2.30 GHz CPUs (4 cores) and 8 GB RAM; we therefore use 4 workers for the parallel computing tests.
Fig. 1. Results of different DC-CUT algorithms: (a) DC-CUT, (b) Parallel-DC-CUT, (c) Variant DC-CUT, (d) Variant Parallel-DC-CUT.
We first illustrate the test results for a pure binary linear program example with 10 binary variables and 10 linear constraints; the optimal value is 0. Figure 1 shows the updates of the upper bounds (solid circle line) and lower bounds (dotted square line) with respect to the iterations of the four DC-CUT algorithms (DC-CUT, Parallel-DC-CUT, Variant DC-CUT, and Variant Parallel-DC-CUT). Comparing the parallel cases (b) and (d) with the non-parallel cases (a) and (c), we observe that by introducing parallelism, DCA can find a global optimal solution more quickly, and the number of required iterations is reduced. The computing times for these four algorithms are respectively (a) 3.4 s, (b) 2.3 s, (c) 2.6 s and (d) 1.6 s. Clearly, the parallel cases (b) and (d) are faster than the non-parallel cases (a) and (c), and the variant methods (c) and (d), with more cutting planes introduced in each iteration, performed better in general. The fastest algorithm is case (d). Moreover, in this test example, a negative gap can occur, as in case (c), since the DC cut is a kind of local cut which can cut off some feasible solutions, so that the lower bound on the reduced set may exceed the current upper bound. This feature is particular to the DC cut and quite different from classical global cuts such as lift-and-project. Therefore, the DC cut often provides a deeper cut than the global cuts and plays an important role in accelerating the convergence of the cutting plane method. Another important feature of our algorithms is the possibility of finding a global optimal solution without a small gap between the upper and lower bounds: the cases (a), (b) and (d) terminate without a small gap because the introduced cutting planes yield an empty set, so no further computation is required. More numerical results for large-scale cases will be reported in the full-length paper.
6 Conclusion and Perspectives
In this paper, we have investigated the construction of DC cuts and established four DC-CUT algorithms, with and without parallelism, for solving MBLP. The DC cut is a local cut which often provides a deeper cutting effect than classical global cuts and helps our cutting plane algorithms terminate more quickly without reducing the gap between upper and lower bounds to zero. By introducing parallelism and adding more different types of cutting planes in each iteration, the performance of the DC-CUT algorithms is significantly improved. Concerning future work, we plan more tests of our algorithms against state-of-the-art MBLP solvers such as Gurobi and CPLEX on large-scale cases and real-world applications. Next, we will improve our algorithms by introducing different types of cutting planes such as Gomory's cut, the mixed-integer rounding cut, and the knapsack cut. It is worth investigating the performance of the DC-CUT algorithms when more global cuts are introduced in each iteration. Moreover, we will extend the DC cut to general integer cases and nonlinear cases.
Parallel DC Cutting Plane Algorithms for Mixed Binary Linear Program
339
Sentence Compression via DC Programming Approach

Yi-Shuai Niu1,2(B), Xi-Wei Hu2, Yu You1, Faouzi Mohamed Benammour1, and Hu Zhang1

1 School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
[email protected]
2 SJTU-Paristech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China
Abstract. Sentence compression is an important problem in natural language processing. In this paper, we first establish a new sentence compression model based on the probability model and the parse tree model. Our sentence compression model is equivalent to an integer linear program (ILP) which can both guarantee the syntactic correctness of the compression and preserve its main meaning. We propose using a DC (Difference of Convex functions) programming approach (DCA) for finding a local optimal solution of our model. Combining DCA with a parallel branch-and-bound framework, we can find a global optimal solution. Numerical results demonstrate the good quality of our sentence compression model and the excellent performance of our proposed solution algorithm.

Keywords: Sentence compression · Probability model · Parse tree model · DCA · Parallel-branch-and-bound
1 Introduction
Recent years have been marked by the quick evolution of artificial intelligence (AI) technologies, and sentence compression problems have attracted the attention of researchers due to the necessity of dealing with a huge amount of natural language information in a very short response time. The general idea of sentence compression is to make a summary with shorter sentences containing the most important information while maintaining grammatical rules. Nowadays, various technologies involve sentence compression, such as text summarization, search engines, and question answering. Sentence compression will be a key technology in future human-AI interaction systems.

There are various models proposed for sentence compression. The paper of Jing [3] could be one of the first works addressing this topic, with many rewriting operations such as deletion, reordering, substitution, and insertion. This approach

The research is funded by the Natural Science Foundation of China (Grant No. 11601327) and by the Key Construction National "985" Program of China (Grant No. WF220426001).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 341–351, 2020. https://doi.org/10.1007/978-3-030-21803-4_35
is realized based on multiple knowledge resources (such as WordNet and parallel corpora) to find the parts that cannot be removed if they are detected to be grammatically necessary by some simple rules. Later, Knight and Marcu investigated discriminative models [4]. They proposed a decision-tree model to find the intended words through a tree rewriting process, and a noisy-channel model to construct a compressed sentence from some scrambled words based on the probability of mistakes. McDonald [12] presented a sentence compression model using a discriminative large-margin algorithm. It ranks each candidate compression using a scoring function based on the Ziff-Davis corpus with a Viterbi-like algorithm. The model has a rich feature set defined over compression bigrams, including parts of speech, parse trees, and dependency information, without using a synchronous grammar. Clarke and Lapata [1] reformulated McDonald's model in the context of integer linear programming (ILP) and extended it with constraints ensuring that the compressed output is grammatically and semantically well formed. The corresponding ILP model is solved by a branch-and-bound algorithm. In this paper, we propose a new sentence compression model that both guarantees the grammatical rules and preserves the main meaning. The main contributions of this work are: (1) Taking advantage of the parse tree model and the probability model, we hybridize them to build a new model that can be formulated as an ILP. Using the parse tree model, we can extract the sentence trunk, then fix the corresponding integer variables in the probability model to derive a simplified ILP with improved quality of the compressed result. (2) We propose to use a DC programming approach called PDCABB (a hybrid algorithm combining DCA with a parallel branch-and-bound framework), developed by Niu in [17], for solving our sentence compression model. This approach can often provide a high-quality optimal solution in a very short time.
The paper is organized as follows: Sect. 2 is dedicated to establishing the hybrid sentence compression model. In Sect. 3, we present the DC programming approach for solving the ILP. Numerical simulations and the experimental setup are reported in Sect. 4. Conclusions and future work are discussed in the last section.
2 Hybrid Sentence Compression Model
Our sentence compression model is based on an integer linear programming (ILP) probability model [1] and a parse tree model. In this section, we give a brief introduction to the two models and propose our new hybrid model.

2.1 ILP Probability Model
Let x = {x_1, x_2, ..., x_n} be a sentence with n ≥ 2 words (punctuation is also deemed a word). We add x_0 = 'start' as the start token and x_{n+1} = 'end' as the end token.
Sentence compression consists in choosing a subset of words of x maximizing its probability to be a sentence, under some restrictions on the allowable trigram combinations. This probability model can be described as an ILP as follows:

Decision variables: We introduce the binary decision variables δ_i, i ∈ [[1, n]], for each word x_i (here [[m, n]], with m ≤ n, stands for the set of integers between m and n): δ_i = 1 if x_i is in a compression and 0 otherwise. In order to take context information into consideration, we introduce the context variables (α, β, γ) such that: ∀i ∈ [[1, n]], α_i = 1 if x_i starts a compression and 0 otherwise; ∀i ∈ [[0, n−1]], j ∈ [[i+1, n]], β_{ij} = 1 if the sequence x_i, x_j ends a compression and 0 otherwise; and ∀i ∈ [[0, n−2]], j ∈ [[i+1, n−1]], k ∈ [[j+1, n]], γ_{ijk} = 1 if the sequence x_i, x_j, x_k is in a compression and 0 otherwise. There are in total (n^3 + 3n^2 + 14n)/6 binary variables for (δ, α, β, γ).

Objective function: The objective is to maximize the probability of the compression, computed by

f(α, β, γ) = Σ_{i=1}^{n} α_i P(x_i | start) + Σ_{i=1}^{n−2} Σ_{j=i+1}^{n−1} Σ_{k=j+1}^{n} γ_{ijk} P(x_k | x_i, x_j) + Σ_{i=0}^{n−1} Σ_{j=i+1}^{n} β_{ij} P(end | x_i, x_j),
where P(x_i | start) stands for the probability of a sentence starting with x_i, P(x_k | x_i, x_j) denotes the probability that x_i, x_j, x_k occur successively in a sentence, and P(end | x_i, x_j) means the probability that x_i, x_j ends a sentence. The probability P(x_i | start) is computed by a bigram model, and the others are computed by a trigram model based on some corpora.

Constraints: The following sequential constraints restrict the possible trigram combinations:

Constraint 1. Exactly one word can begin a sentence:

Σ_{i=1}^{n} α_i = 1.    (1)

Constraint 2. If a word is included in a compression, it must either start the sentence, or be preceded by two other words, or be preceded by the 'start' token and one other word:

δ_k − α_k − Σ_{i=0}^{k−2} Σ_{j=i+1}^{k−1} γ_{ijk} = 0, ∀k ∈ [[1, n]].    (2)
Constraint 3. If a word is included in a compression, it must either be preceded by one word and followed by another, or be preceded by one word and end the sentence:

δ_j − Σ_{i=0}^{j−1} Σ_{k=j+1}^{n} γ_{ijk} − Σ_{i=0}^{j−1} β_{ij} = 0, ∀j ∈ [[1, n]].    (3)
Constraint 4. If a word is in a compression, it must either be followed by two words, or be followed by one word and end the sentence:

δ_i − Σ_{j=i+1}^{n−1} Σ_{k=j+1}^{n} γ_{ijk} − Σ_{j=i+1}^{n} β_{ij} − Σ_{h=0}^{i−1} β_{hi} = 0, ∀i ∈ [[1, n]].    (4)
Constraint 5. Exactly one word pair can end the sentence:

Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} β_{ij} = 1.    (5)
Constraint 6. The length of a compression should be bounded:

l ≤ Σ_{i=1}^{n} δ_i ≤ l̄,    (6)
with given lower and upper bounds l and l̄ on the compression length.

Constraint 7. The introducing term of a prepositional phrase (PP) or subordinate clause (SBAR) must be included in the compression if any word of the phrase is included; otherwise, the phrase should be entirely removed. Denoting by I_i = {j : x_j ∈ PP/SBAR, j ≠ i} the index set of the words of the PP/SBAR led by the introducing term x_i, this reads

Σ_{j∈I_i} δ_j ≥ δ_i,  δ_i ≥ δ_j, ∀j ∈ I_i.    (7)
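As a quick illustration, the all-or-nothing logic of constraint (7) can be checked for a candidate assignment of δ (a small sketch; the indices and the phrase example are our own):

```python
def pp_constraint_ok(delta, i, phrase_idx):
    """Check constraint (7) for an introducing word x_i of a PP/SBAR with
    word index set I_i = phrase_idx: sum_{j in I_i} delta_j >= delta_i,
    and delta_i >= delta_j for every j in I_i."""
    return (sum(delta[j] for j in phrase_idx) >= delta[i]
            and all(delta[i] >= delta[j] for j in phrase_idx))

# "saw the dog [with the telescope]": x_5 = 'with' introduces I_5 = {6, 7}
assert pp_constraint_ok({5: 1, 6: 1, 7: 1}, 5, [6, 7])      # whole phrase kept
assert pp_constraint_ok({5: 0, 6: 0, 7: 0}, 5, [6, 7])      # phrase removed
assert not pp_constraint_ok({5: 0, 6: 1, 7: 0}, 5, [6, 7])  # orphan phrase word
assert not pp_constraint_ok({5: 1, 6: 0, 7: 0}, 5, [6, 7])  # dangling introducer
```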
ILP probability model: The optimization model for sentence compression is summarized as the binary linear program

max{ f(α, β, γ) : (1)–(7), (α, β, γ, δ) ∈ {0, 1}^{(n^3 + 3n^2 + 14n)/6} },    (8)

with O(n^3) binary variables and O(n) linear constraints. The advantage of this model is that its solution provides a compression with maximal probability based on the trigram model. However, it carries no information about the syntactic structure of the target sentence, so it may generate ungrammatical sentences. To overcome this disadvantage, we propose to combine it with the parse tree model presented below.
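As a sanity check, the count (n^3 + 3n^2 + 14n)/6 can be verified by enumerating the index ranges of (δ, α, β, γ) given above:

```python
def num_variables(n):
    """Count the binary variables (delta, alpha, beta, gamma) of model (8)
    by enumerating the index ranges defined above."""
    delta = n  # delta_i, i in [[1, n]]
    alpha = n  # alpha_i, i in [[1, n]]
    beta = sum(1 for i in range(0, n) for j in range(i + 1, n + 1))
    gamma = sum(1 for i in range(0, n - 1)
                for j in range(i + 1, n)
                for k in range(j + 1, n + 1))
    return delta + alpha + beta + gamma

# The closed form matches for every n >= 2:
assert all(num_variables(n) == (n**3 + 3*n**2 + 14*n) // 6 for n in range(2, 60))
```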
2.2 Parse Tree Model
A parse tree is an ordered, rooted tree which reflects the syntax of the input language based on some grammar rules (e.g., a context-free grammar, CFG). For constructing a parse tree in practice, we can use the natural language processing toolkit NLTK [19] in Python. Based on NLTK, we have developed a CFG grammar generator which automatically generates a CFG from a target sentence; a recursive descent parser then builds a parse tree. For example, the sentence "The man saw the dog with the telescope." can be parsed as in Fig. 1. It is observed that a higher-level node in the parse tree indicates a more important sentence component (e.g., the sentence S consists of a noun phrase NP, a verb phrase VP, and a symbol SYM), whereas a lower node tends to carry more semantic content (e.g., the prepositional phrase PP consists of the preposition 'with' and the noun phrase 'the telescope'). Therefore, a parse tree presents the structure of a sentence in a clear and logical way.
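In practice the tree comes from a parser (e.g., NLTK's recursive descent parser); as a minimal, library-free sketch, the tree of Fig. 1 can be represented as nested tuples and pruned:

```python
# Parse tree for "The man saw the dog with the telescope." as nested tuples
# (label, children...); leaves are words. Built by hand here; in practice it
# would come from a parser such as NLTK's RecursiveDescentParser.
tree = ("S",
        ("NP", ("Det", "The"), ("N", "man")),
        ("VP", ("V", "saw"),
               ("NP", ("Det", "the"), ("N", "dog")),
               ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope")))),
        ("SYM", "."))

def prune(node, drop_labels):
    """Remove every subtree whose label is in drop_labels."""
    if isinstance(node, str):
        return node
    label, *children = node
    if label in drop_labels:
        return None
    kept = [c for c in (prune(ch, drop_labels) for ch in children) if c is not None]
    return (label, *kept)

def words(node):
    """Collect the leaves of a tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [w for ch in node[1:] for w in words(ch)]

compressed = prune(tree, {"PP"})
print(" ".join(words(compressed)))   # The man saw the dog .
```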
Fig. 1. Parse tree example
Sentence compression can also be considered as finding a subtree which remains grammatically correct and contains the main meaning of the original sentence. Therefore, we can propose a procedure that deletes some nodes of the parse tree. For instance, the sentence above can be compressed to "The man saw the dog." by deleting the node PP.

2.3 New Hybrid Model: ILP-Parse Tree Model
Our proposed model for sentence compression, called the ILP-Parse Tree model (ILP-PT), is based on the combination of the two models described above. The ILP model provides candidate compressions with maximal probability, while the parse tree model helps to guarantee the grammar rules and keep the main meaning of the sentence. The combination proceeds as follows:

Step 1 (Build ILP probability model): Build the ILP model (8) for the target sentence.

Step 2 (Parse sentence): Build a parse tree as described in Subsect. 2.2.

Step 3 (Fix variables for sentence trunk): Identify the sentence trunk in the parse tree and fix the corresponding integer variables to 1 in the ILP model.
This step helps to extract the sentence trunk, keeping the main meaning of the original sentence while reducing the number of binary decision variables. More precisely, we introduce for each node N_i of the parse tree a label s_{N_i} taking values in {0, 1, 2}: the value 0 represents deletion of the node; 1 represents reservation of the node; and 2 indicates that the node can either be deleted or reserved. We set these labels as compression rules for each CFG grammar, so as to support any sentence type of any language. For a word x_i, we go through all its parent nodes up to the root S. If the traversal path contains a 0, then δ_i = 0; else, if the traversal path contains only 1s, then δ_i = 1; otherwise δ_i is left to be determined by solving the ILP model. The sentence trunk is composed of the words x_i whose δ_i are fixed to 1. Using this method, we can extract the sentence trunk and reduce the number of binary variables in the ILP model.

Step 4 (Solve ILP): Apply an ILP solution algorithm to solve the simplified ILP model derived in Step 3 and generate a compression.

In the next section, we introduce a DC programming approach for solving the ILP.
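Before moving on, the traversal rule of Step 3 can be sketched as a small function (the per-node labels s_N along the path from a word to the root are assumed to be given):

```python
def fix_delta(path_labels):
    """Decide delta_i for word x_i from the labels s_N in {0, 1, 2} of its
    ancestor nodes (path from the word up to the root S): a 0 anywhere on
    the path deletes the word, an all-1 path keeps it in the trunk, and
    otherwise the variable is left free for the ILP."""
    if 0 in path_labels:
        return 0       # word is deleted
    if all(s == 1 for s in path_labels):
        return 1       # word belongs to the sentence trunk
    return None        # left free, decided by solving the ILP

assert fix_delta([1, 1, 1]) == 1
assert fix_delta([1, 0, 2]) == 0
assert fix_delta([1, 2, 1]) is None
```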
3 DC Programming Approach for Solving ILP
Solving an ILP is in general NP-hard. A classical and most frequently used method is the branch-and-bound algorithm, as in [1]. Gurobi [2] is currently one of the best ILP solvers, using branch-and-bound combined with various techniques such as presolve, cutting planes, heuristics, and parallelism. In this section, we introduce a Difference of Convex functions (DC) programming approach, called Parallel-DCA-Branch-and-Bound (PDCABB), for solving this model. Combining DCA with branch-and-bound without parallelism (DCABB) was first proposed for solving zero-one programming problems in [6] and later applied to the strategic supply chain design problem [18]. DCABB for solving general mixed-integer linear programs (MILP) is developed in [13] and extended to mixed-integer nonlinear programming [14,15], with various applications including scheduling [8], network optimization [21], cryptography [10] and finance [9,20]. This algorithm is based on continuous representation techniques for integer sets, the exact penalty theorem, DCA, and branch-and-bound. Recently, the author developed a parallel branch-and-bound framework [17] in order to use the power of multiple CPUs and GPUs to improve the performance of DCABB.

The ILP model can be stated in standard matrix form as

min{ f(x) := c^T x : x ∈ S },    (P)

where S = {x ∈ {0, 1}^n : Ax = b}, c ∈ R^n, b ∈ R^m and A ∈ R^{m×n}. Let K denote the linear relaxation of S, defined by K = {x ∈ [0, 1]^n : Ax = b}; thus S = K ∩ {0, 1}^n.
Let R(P) denote the linear relaxation of (P), defined as min{f(x) : x ∈ K}, whose optimal value l(P) is a lower bound for (P). The continuous representation technique for the integer set {0, 1}^n consists in finding a continuous DC function p : R^n → R such that {0, 1}^n ≡ {x : p(x) ≤ 0}. We often use the following functions p, with their DC components:

Function type    | Expression of p                        | DC components of p
Piecewise linear | p(x) = Σ_{i=1}^{n} min{x_i, 1 − x_i}   | g(x) = 0, h(x) = −p(x)
Quadratic        | p(x) = Σ_{i=1}^{n} x_i(1 − x_i)        | g(x) = 0, h(x) = −p(x)
Trigonometric    | p(x) = Σ_{i=1}^{n} sin^2(πx_i)         | g(x) = π^2 ‖x‖^2, h(x) = g(x) − p(x)
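Each of the penalties listed above vanishes exactly on the binary points and is positive elsewhere on [0, 1]^n, which can be checked numerically (a quick sketch):

```python
import math

# p vanishes on {0,1}^n and is positive once some coordinate is fractional.
p_pwl = lambda x: sum(min(xi, 1 - xi) for xi in x)               # piecewise linear
p_quad = lambda x: sum(xi * (1 - xi) for xi in x)                # quadratic
p_trig = lambda x: sum(math.sin(math.pi * xi) ** 2 for xi in x)  # trigonometric

binary = [0.0, 1.0, 1.0, 0.0]
fractional = [0.2, 1.0, 0.5, 0.9]
for p in (p_pwl, p_quad, p_trig):
    assert abs(p(binary)) < 1e-12   # zero on binary points
    assert p(fractional) > 0.1      # positive on fractional points
```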
Based on the exact penalty theorem [5,11], there exists a large enough parameter t ≥ 0 such that the problem (P) is equivalent to the problem

min{ F_t(x) := f(x) + t p(x) : x ∈ K }.    (P_t)

The objective function F_t : R^n → R of (P_t) is also DC, with DC components g_t and h_t defined as g_t(x) = t g(x), h_t(x) = t h(x) − f(x), where g and h are DC components of p. (A function f : R^n → R is called DC if there exist two convex functions g and h, called DC components, such that f = g − h.) Thus (P_t) is a DC program, which can be solved by DCA, described simply by the scheme

x^{i+1} ∈ argmin{ g_t(x) − ⟨x, y^i⟩ : x ∈ K }  with  y^i ∈ ∂h_t(x^i).

The symbol ∂h_t(x^i) denotes the subdifferential of h_t at x^i, which is fundamental in convex analysis. The subdifferential generalizes the derivative in the sense that h_t is differentiable at x^i if and only if ∂h_t(x^i) reduces to the singleton {∇h_t(x^i)}. Concerning the choice of the penalty parameter t, we suggest two methods: the first is to take an arbitrarily large value for t; the second is to increase t in some way along the iterations of DCA (e.g., [14,20]). Note that a smaller parameter t yields a better DC decomposition [16]. DCA often provides an integer solution for (P); thus it often serves as an upper bound algorithm. More details about DCA and its convergence theorem can be found in [7]. Due to the length limitation of this proceedings paper, the combination of DCA with a parallel branch-and-bound algorithm (PDCABB) proposed in [17], as well as its convergence theorem, branching strategies and parallel node selection strategies, will be discussed in our full-length paper.
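To make the scheme concrete, here is a toy run of DCA with the piecewise-linear penalty (g = 0, h = −p) on min{c^T x : x_1 + x_2 = 1, x_3 + x_4 = 1, x ∈ {0,1}^4}; for this feasible set, the linear subproblem argmin{−⟨x, y⟩ : x ∈ K} is solvable in closed form by keeping, in each pair, the coordinate with the larger component of y. The instance and the value of t are our own choices:

```python
def dca_toy(c, t=10.0, max_iter=50):
    """DCA for min{c.x + t*p(x)} over K = {x in [0,1]^4 : x1+x2 = 1, x3+x4 = 1}
    with p(x) = sum_i min(x_i, 1 - x_i) and DC components g = 0, h = -p."""
    x = [0.5] * 4
    for _ in range(max_iter):
        # y in the subdifferential of h_t(x) = -t*p(x) - c.x
        y = [t * (1.0 if xi >= 0.5 else -1.0) - ci for xi, ci in zip(x, c)]
        # DCA step argmin{-<x, y> : x in K}: keep the larger y in each pair
        x_new = [0.0] * 4
        for i, j in ((0, 1), (2, 3)):
            x_new[i if y[i] >= y[j] else j] = 1.0
        if x_new == x:  # stationary point reached
            break
        x = x_new
    return x

print(dca_toy([1.0, 2.0, 3.0, 1.0]))  # [1.0, 0.0, 0.0, 1.0], the global optimum
```

Note that DCA lands on an integer point of K after a single subproblem solve here, illustrating why it serves well as an upper bound algorithm inside branch-and-bound.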
4 Experimental Results
In this section, we present experimental results assessing the performance of the sentence compression model described above. Our sentence compression model is implemented in Python as a natural language processing package, called 'NLPTOOL' (currently supporting multilanguage tokenization, tagging, parsing, automatic CFG grammar generation, and sentence compression), which uses NLTK 3.2.5 [19] for creating parse trees and Gurobi 8.1.0 [2] for solving the linear relaxation problems R(P_i) and the convex optimization subproblems in Step 2 of DCA. The PDCABB algorithm is implemented in C++ and invoked from Python. The parallel computing part of PDCABB is realized with OpenMP.

4.1 F-score Evaluation
We use a statistical measure called F-score to evaluate the similarity between the compression computed by our algorithm and a standard compression provided by a human. Let A denote the number of words occurring both in the compressed result and in the standard result, B the number of words in the standard result but not in the compressed result, and C the number of words in the compressed result but not in the standard result. Then the F-score is defined by

F_μ = (μ^2 + 1) × (P × R) / (μ^2 × P + R),

where P = A/(A + C) and R = A/(A + B) stand for the precision rate and the recall rate, respectively. The parameter μ, called the preference parameter, expresses the preference between precision and recall in evaluating the quality of the results. F_μ is a strictly monotonic function on [0, +∞[ with lim_{μ→0} F_μ = P and lim_{μ→+∞} F_μ = R. In our tests, we use F_1 as F-score. Clearly, a bigger F-score
indicates a better compression.

4.2 Numerical Results
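For concreteness, the F_1 evaluation of Sect. 4.1 can be sketched as follows (comparing words as multisets is our own assumption; the paper does not specify how repeated words are counted):

```python
from collections import Counter

def f_score(compressed, reference, mu=1.0):
    """F_mu between a compressed sentence and a human reference, both given
    as word lists. Words are compared as multisets."""
    c, r = Counter(compressed), Counter(reference)
    a = sum((c & r).values())   # words in both results
    b = sum((r - c).values())   # words in the reference only
    cc = sum((c - r).values())  # words in the compression only
    if a == 0:
        return 0.0
    p, rec = a / (a + cc), a / (a + b)
    return (mu**2 + 1) * p * rec / (mu**2 * p + rec)

ref = "The man saw the dog .".split()
out = "The man saw the dog with the telescope .".split()
print(round(f_score(out, ref), 3))  # 0.8 (P = 2/3, R = 1)
```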
Table 1 illustrates the compression results for 100 sentences obtained by two ILP compression models: our new hybrid model (H) vs. the probability model (P). The Penn Treebank corpus (Treebank) provided in NLTK and the CLwritten corpus (Clarke) provided in [1] are used for sentence compression. We applied Kneser-Ney smoothing for computing trigram probabilities. The compression rates (computed as the length of the compression over the length of the original sentence) are set to 50%, 70% and 90%. We compare the average solution time and the average F-score for these models solved by Gurobi and PDCABB. The experiments are performed on a laptop equipped with 2 Intel i5-6200U 2.30 GHz CPUs (4 cores) and 8 GB RAM. It can be observed that our hybrid model often provides
better F-scores on average for all compression rates, while the computing times for both Gurobi and PDCABB are all very short, less than 0.2 s. We can also see that Gurobi and PDCABB provided different solutions, since their F-scores differ. This is due to the fact that a branch-and-bound algorithm finds only approximate global solutions when the gap between upper and lower bounds is small enough. Even when both solvers provide global optimal solutions, these solutions can still differ, since the global optimal solution of an ILP may be not unique. However, the reliability of our judgment is still guaranteed, since the two algorithms provided very similar F-score results.

Table 1. Compression results

Corpus+Model | Solver | 50% rate: F-score (%) | Time (s) | 70% rate: F-score (%) | Time (s) | 90% rate: F-score (%) | Time (s)
Treebank+P   | Gurobi | 56.5 | 0.099 | 72.1 | 0.099 | 79.4 | 0.081
Treebank+P   | PDCABB | 59.1 | 0.194 | 76.2 | 0.152 | 80.0 | 0.122
Treebank+H   | Gurobi | 79.0 | 0.064 | 82.6 | 0.070 | 81.3 | 0.065
Treebank+H   | PDCABB | 79.9 | 0.096 | 82.7 | 0.171 | 82.1 | 0.121
Clarke+P     | Gurobi | 70.6 | 0.087 | 80.2 | 0.087 | 80.0 | 0.071
Clarke+P     | PDCABB | 81.4 | 0.132 | 80.0 | 0.128 | 81.2 | 0.087
Clarke+H     | Gurobi | 77.8 | 0.046 | 85.5 | 0.052 | 82.4 | 0.041
Clarke+H     | PDCABB | 79.9 | 0.081 | 85.2 | 0.116 | 82.3 | 0.082
The box-plots given in Fig. 2 demonstrate the variation of F-scores for the different models with different corpora. We observe that our hybrid model (Treebank+H and Clarke+H) provides better F-scores on average and is more stable in variation, while the quality of the compressions given by the probability model is worse and varies a lot. Moreover, the choice of corpora will affect the
Fig. 2. Box-plots for different models vs. F-scores
compression quality since the trigram probability depends on corpora. Therefore, in order to provide more reliable compressions, we have to choose the most related corpora to compute the trigram probabilities.
5 Conclusion and Perspectives
We have proposed a hybrid sentence compression model, ILP-PT, based on the probability model and the parse tree model, which guarantees the syntactic correctness of the compressed sentence and preserves its main meaning. We use a DC programming approach, PDCABB, to solve our sentence compression model. Experimental results show that our new model and the solution algorithm can produce high-quality compressed results within a short compression time. Concerning future work, we are very interested in designing a suitable recurrent neural network for sentence compression. With deep learning methods, it is possible to classify automatically the sentence types and fundamental structures; it is also possible to identify fixed collocations in a sentence and make the corresponding variables be retained or deleted together. Research in these directions will be reported subsequently.
References

1. Clarke, J., Lapata, M.: Global inference for sentence compression: an integer linear programming approach. J. Artif. Intell. Res. 31, 399–429 (2008)
2. Gurobi 8.1.0. http://www.gurobi.com
3. Jing, H.: Sentence reduction for automatic text summarization. In: Proceedings of the 6th Applied Natural Language Processing Conference, pp. 310–315 (2000)
4. Knight, K., Marcu, D.: Summarization beyond sentence extraction: a probabilistic approach to sentence compression. Artif. Intell. 139, 91–107 (2002)
5. Le Thi, H.A., Pham, D.T., Le Dung, M.: Exact penalty in DC programming. Vietnam J. Math. 27(2), 169–178 (1999)
6. Le Thi, H.A., Pham, D.T.: A continuous approach for large-scale constrained quadratic zero-one programming. Optimization 45(3), 1–28 (2001)
7. Le Thi, H.A., Pham, D.T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133, 23–46 (2005)
8. Le Thi, H.A., Nguyen, Q.T., Nguyen, H.T., et al.: Solving the earliness tardiness scheduling problem by DC programming and DCA. Math. Balk. 23, 271–288 (2009)
9. Le Thi, H.A., Moeini, M., Pham, D.T.: Portfolio selection under downside risk measures and cardinality constraints based on DC programming and DCA. Comput. Manag. Sci. 6(4), 459–475 (2009)
10. Le Thi, H.A., Minh, L.H., Pham, D.T., Bouvry, P.: Solving the perceptron problem by deterministic optimization approach based on DC programming and DCA. In: Proceedings of INDIN 2009, Cardiff. IEEE (2009)
11. Le Thi, H.A., Pham, D.T., Huynh, V.N.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52(3), 509–535 (2012)
12. McDonald, R.: Discriminative sentence compression with soft syntactic constraints. In: Proceedings of EACL, pp. 297–304 (2006)
13. Niu, Y.S., Pham, D.T.: A DC programming approach for mixed-integer linear programs. In: Modelling, Computation and Optimization in Information Systems and Management Sciences, CCIS, vol. 14, pp. 244–253 (2008)
14. Niu, Y.S.: Programmation DC & DCA en Optimisation Combinatoire et Optimisation Polynomiale via les Techniques de SDP. Ph.D. thesis, INSA, France (2010)
15. Niu, Y.S., Pham, D.T.: Efficient DC programming approaches for mixed-integer quadratic convex programs. In: Proceedings of the International Conference on Industrial Engineering and Systems Management (IESM 2011), pp. 222–231 (2011)
16. Niu, Y.S.: On difference-of-SOS and difference-of-convex-SOS decompositions for polynomials (2018). arXiv:1803.09900
17. Niu, Y.S.: A parallel branch and bound with DC algorithm for mixed integer optimization. In: The 23rd International Symposium in Mathematical Programming (ISMP 2018), Bordeaux, France (2018)
18. Nguyen, H.T., Pham, D.T.: A continuous DC programming approach to the strategic supply chain design problem from qualified partner set. Eur. J. Oper. Res. 183(3), 1001–1012 (2007)
19. NLTK 3.2.5: The Natural Language Toolkit. http://www.nltk.org
20. Pham, D.T., Le Thi, H.A., Pham, V.N., Niu, Y.S.: DC programming approaches for discrete portfolio optimization under concave transaction costs. Optim. Lett. 10(2), 261–282 (2016)
21. Schleich, J., Le Thi, H.A., Bouvry, P.: Solving the minimum m-dominating set problem by a continuous optimization approach based on DC programming and DCA. J. Comb. Optim. 24(4), 397–412 (2012)
Discrete Optimization and Network Optimization
A Horizontal Method of Localizing Values of a Linear Function in Permutation-Based Optimization

Liudmyla Koliechkina1 and Oksana Pichugina2(B)

1 University of Lodz, Uniwersytecka Str. 3, 90-137 Lodz, Poland
[email protected]
2 National Aerospace University Kharkiv Aviation Institute, 17 Chkalova Street, 61070 Kharkiv, Ukraine
[email protected]
Abstract. This paper is dedicated to linear constrained optimization on the set of permutation configurations, namely, to the permutation-based subset sum problem (PB-SSP). With this problem we associate a directed structural graph, connected with a skeleton graph of the permutohedron, which allows performing a directed search to solve this linear program. To solve PB-SSP, a horizontal method of localizing values of a linear objective function is offered, combining Graph Theory tools, geometric and structural properties of a permutation set mapped into Euclidean space, the behavior of linear functions on the set, and branch-and-bound techniques.

Keywords: Discrete optimization · Linear constrained optimization · Combinatorial configuration · Permutation · Skeleton graph · Grid graph · Search tree
1 Introduction
Combinatorial optimization problems (COPs) with permutations as candidate solutions, commonly known as permutation-based problems [13], can be found in a variety of application areas such as balancing problems associated with chip design, ship loading, aircraft outfitting and turbine balancing, as well as in geometric design, facility layout, VLSI design, campus design, assignment, scheduling, routing, process communications, ergonomics, network analysis, cryptography, etc. [4,9,10,13–15,19,23–25,27,34,35]. Many COPs are easily representable by a graph-theoretic approach (GTA) [1–3,6–8,11]. First of all, this concerns COPs on a set E coinciding with the vertex set of its convex hull P (vertex-located sets, VLSs [28,30]). Such COPs are equivalent to optimization problems on the node set of a skeleton graph G = (E, 𝔼) of the polytope P, where 𝔼 is the edge set of P. Note that if E is not a VLS, approaches for equivalently reformulating the COP as an optimization problem on a VLS in a higher-dimensional space can be applied first [31,32].

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 355–364, 2020. https://doi.org/10.1007/978-3-030-21803-4_36
The benefits of using the graph-theoretic approach are not limited to simple illustration; it also provides an opportunity to develop approaches to solving COPs based on configuration graphs [11] and structural graphs [1–3,6–8] of the problems. Localization of COP solutions or of objective function values is an interesting technique allowing one to reduce the search domain considerably, based on deriving specifics of the domain, the type of constraints, and the objective function [1,3,6,7,29]. In particular, the method of ordering the values of an objective function on an image E_n(A) in R^n of a set of n-permutations induced by a set A is considered in [1]. It consists in constructing a Hamiltonian path in the permutation graph, which is a skeleton graph of the permutohedron P_n(A) = conv(E_n(A)). In [2], a similar problem is considered on an image E_nk(A) in Euclidean space of a set of multipermutations induced by a multiset A including k different elements. In this case, a skeleton graph of the generalized permutohedron (the multipermutation graph) is considered instead of the permutation graph. In [8], linear constrained single- and multiobjective COPs on E_nk(A) are solved using the multipermutation graph, etc.

This paper is dedicated to developing GTA techniques [1–3,6–8] for solving permutation-based COPs (PBOPs), related to the localization of objective function values. Namely, we study a generalization of the Subset Sum Problem (SSP) [5,12], a known NP-complete COP, from the Boolean set B_n as an admissible domain to E_n(A) (further referred to as the permutation-based SSP, PB-SSP). Also, we consider versions of PB-SSP where a feasible solution x* is sought (PB-SSP1) or the complete solution set X* is sought (PB-SSP2).
2 The Combinatorial Optimization Problem: Statement and Properties
In the general form, a COP can be formulated as follows: there is a set A of k elements

A = {a1, a2, . . . , ak} ⊂ R1, a1 < · · · < ak,   (1)

on which a finite point configuration E = {e1, e2, . . . , eN} ⊂ Rn and a function f(x) : E → R1 are given. By an e-configuration e ∈ E [26,30], one can understand a permutation, a partial permutation, a combination, a partition, a composition, a partially ordered set induced by A, etc., considered as a point in Rn. It is required to find an extremum z∗ (maximum or minimum) of f(x) and an extremal x∗, or the set X∗ of all extremals, where the extremum is attained and additional constraints are satisfied (further referred to as COP1/COP2, respectively). Thus, their formulations are: find

COP1: z∗ = extr_{x∈E′} f(x), x∗ = argextr_{x∈E′} f(x);

COP2: z∗ = extr_{x∈E′} f(x), X∗ = Argextr_{x∈E′} f(x),

where E′ = {x ∈ E : fi(x) ≤ 0, i ∈ Jm}, Jm = {1, . . . , m}.
A Horizontal Method of Localizing Values in Permutation Optimization
A permutation-based COP (PB-COP) is a particular case of COP, where E ∈ {Πn(A), Πnk(A), En(A), Enk(A)}. Here, Πn(A) is the set of n-permutations induced by a set A = {ai}i∈Jn, ai < ai+1, i ∈ Jn−1; En(A) ⊂ Rn is the image of Πn(A) in Euclidean space; Enk(A) is the image in Euclidean space of the n-multipermutation set Πnk(A) induced by the set (1), k < n. Denoting Enn(A) = En(A), these two are united in the class Enk(A) – the generalized set of permutation e-configurations [16,17,19,26,27]. E = En(A) has many interesting constructive and geometric peculiarities [1–3,16–19,21,26–28,30–32,34–36], e.g.:
– if f(x) = cT x, c ≠ 0, c1 ≤ · · · ≤ cn, then

xmax = argmax_{x∈E} f(x) = (ai)i∈Jn; xmin = argmin_{x∈E} f(x) = (an−i+1)i∈Jn;   (2)
– Xmin = Argmin_{x∈E} f(x) / Xmax = Argmax_{x∈E} f(x) is obtained from xmin / xmax by permuting coordinates within groups of coordinates sharing the same coefficient of f(x), wherefrom

c1 < · · · < cn ⇒ Xmin = {xmin}, Xmax = {xmax};   (3)
– E is a VLS;
– E is inscribed in a hypersphere Sr(b) centered at b = (b, . . . , b) ∈ Rn (b ∈ R1);
– ∀i ∈ Jn, E lies on n parallel hyperplanes Hi = {Hij}j∈Jn: Hij = {x ∈ Rn : xi = aj}, j ∈ Jn. As a result, ∀i ∈ Jn

E = ∪_{j∈Jn} Eij,   (4)

where Eij = E ∩ Hij, i, j ∈ Jn;
– P = conv E = Pn(A) is a permutohedron, which is a simple polytope whose H-representation is:

Σ_{i=1}^{n} xi = Σ_{i=1}^{n} ai;  Σ_{i∈ω} xi ≥ Σ_{i=1}^{|ω|} ai, ∀ω ⊂ Jn;   (5)
– a skeleton graph Gn(A) of the permutohedron Pn(A) has all permutations induced by A as its node set; its adjacent vertices differ by an adjacent transposition (i.e., an (ai, ai+1)-transposition);
– any function f : E → R1 can be extended in a convex way onto an arbitrary convex set K ⊃ E;
– E can be represented analytically in the following ways:
(a) by the equation of Sr(b) together with (5);
(b) Σ_{ω⊆Jn, |ω|=j} Π_{i∈ω} xi = gj, j ∈ Jn, where gj is the j-th elementary symmetric polynomial of the elements of A;
(c) Σ_{i=1}^{n} xi^j = Σ_{i=1}^{n} ai^j, j ∈ Jn.
– if x ∈ E is formed from y ∈ E by a single transposition ai ↔ aj, i < j, then ∀c ∈ Rn: cT x ≤ cT y iff ci ≤ cj (further referred to as Rule1).

Let us consider the following versions of PB-COP: find a solution of the permutation-based versions of COP1/COP2 with the linear function from (2) as the objective, E = En(A), and

f1(x) = f(x) − z0 ≤ 0; f2(x) = −f(x) + z0 ≤ 0,   (6)

where z0 ∈ R1 (further referred to as PB-COP1/PB-COP2). Note that (6) can be rewritten as f(x) = z0, wherefrom PB-COP1 and PB-COP2 are permutation-based feasibility problems of finding a point x0 in E′ or the whole set E′, respectively. PB-COP1 and PB-COP2 are both at least as hard as NP-complete problems, since the subset sum problem – given a real set A of n elements, is there an m-element (m < n) subset whose sum is z0 – is a particular case of PB-COP1, where c1 = · · · = cn−m = 0, cn−m+1 = · · · = cn = 1.
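For intuition, this reduction can be checked by brute force. The following sketch is ours, not from the paper (all names and the tiny instance are illustrative): it enumerates En(A) and keeps the points satisfying cT x = z0.

```python
from itertools import permutations

def pb_ssp(A, c, z0, first_only=False):
    """Enumerate points x of En(A) with c^T x = z0.

    PB-SSP1: return the first feasible point (first_only=True);
    PB-SSP2: return the whole solution set X*.
    """
    X = []
    for x in permutations(A):
        if sum(ci * xi for ci, xi in zip(c, x)) == z0:
            if first_only:
                return [x]
            X.append(x)
    return X

# With c1 = c2 = 0, c3 = c4 = 1, feasibility asks whether some
# 2-element subset of A sums to z0 -- the classic subset sum question.
print(pb_ssp([1, 2, 5, 7], [0, 0, 1, 1], 8, first_only=True))
```

Enumeration is O(n!) and only illustrates the problem statement; the horizontal method of Sect. 3 prunes most of this search.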
3 The Horizontal Method for PB-COP2
Let us introduce a horizontal method for solving PB-COP2 (PB-COP2.HM), which applies the Branch and Bound paradigm to this feasibility problem. To solve PB-COP2, a search tree with the root E = En(A) and leaves {E3(A′)}A′⊂A,|A′|=3 is built. It uses GTA and the listed properties of En(A) and Gn(A). In particular, it applies the decomposition (4) recursively, fixing the last coordinates in the En(A)-subsets that form the tree nodes; the explicit solution (2) of a linear COP; the vertex locality of E; the adjacency criterion; Rule1, and so on. Let us introduce some notation. cd = (cdi)i∈Jlcd is a code of an object [.], where [.] stands for the set E, the skeleton graph G, or the grid-graph G; the code is a partial lcd-permutation from A, and lcd ∈ J0n−1 is the length of the code (J0m = Jm ∪ {0}). cd defines the values of the lcd consecutive last positions of [.](cd). E(∅) = En(A), G(∅) = Gn(A); the grid-graph G(∅) will be defined later. Branching of [.](cd) is based on (4) and performed according to the rule branch([.](cd)) = {[.](cdi)}i∈Jn−lcd, where cdi = (ai(cd), cd), i ∈ Jn−lcd; A(cd) = {ai(cd)}i∈Jn−lcd = A\{cdi}i∈Jlcd, ai(cd) ≤ ai+1(cd), i ∈ Jn−lcd−1. Estimates are found with respect to (2), taking into account the already fixed coordinates; namely, lb([.](cd)) = zmin(cd) / ub([.](cd)) = zmax(cd)
is a lower/upper bound on the branch [.](cd), where zmin(cd) = f(ymin(cd)), zmax(cd) = f(ymax(cd)). G(cd) is a skeleton graph of conv(E(cd)); the grid-graph G(cd) is the directed graph shown in Fig. 1. G(cd) has 2(n − lcd) nodes, two of which have already been examined, namely the top-left and the bottom-right: zmax(cdn−lcd) = zmax(cd), zmin(cd1) = zmin(cd). In the terminology of [8], G(cd) is a two-dimensional structural graph of PB-COP2.
Fig. 1. The grid-graph G(cd)
Fig. 2. The graph G(6, 4, 2)
Fig. 3. The grid-graph G(∅)
Pruning branches:
– if z0 > zmax(cd) or z0 < zmin(cd), then prune E(cd) (rule PB1);
– if z0 = zmax(cd), then find Xmax(cd), update X∗: X∗ = X∗ ∪ Ymax(cd), where Ymax(cd) = {(x, cd)}x∈Xmax(cd), and prune E(cd) (rule PB2);
– if z0 = zmin(cd), then find Xmin(cd), update X∗: X∗ = X∗ ∪ Ymin(cd), where Ymin(cd) = {(x, cd)}x∈Xmin(cd), and prune E(cd) (rule PB3).
Fig. 4. The grid-graph G(2)
Fig. 5. The grid-graph G(4, 2)
By construction, consecutive nodes in a column of the grid G(cd) differ by an adjacent transposition, which enforces the following (by Rule1):

zmax(cdn−lcd) ≥ zmax(cdn−lcd−1) ≥ · · · ≥ zmax(cd1);
zmin(cdn−lcd) ≥ zmin(cdn−lcd−1) ≥ · · · ≥ zmin(cd1);
zmin(cdi) ≤ zmax(cdi), i ∈ Jn−lcd;

– if i ∈ Jn−lcd−1 and z0 > zmax(cdi), then prune E(cdi), . . . , E(cd1);
– if i ∈ Jn−lcd\{1} and z0 < zmin(cdi), then prune E(cd1), . . . , E(cdi).

Remark 1. If PB-COP1 needs to be solved, the PB-COP2.HM scheme is used until a first admissible solution is found.

This version of PB-COP2.HM is directly generalized from En(A) to Enk(A). The only difference is that (4) becomes: ∀i ∈ Jn, E = ∪_{j∈Jk} Eij, where Eij = E ∩ Hij, i ∈ Jn, is a set of the class En−1,k(Bj), j ∈ Jk. Another generalization of PB-COP2 concerns considering

f1(x) = f(x) − z0 ≤ 0; f2(x) = −f(x) + z0 − Δ ≤ 0,
Fig. 6. The grid-graph G(5, 2)
Fig. 7. The grid-graph G(6, 2)
where Δ ≥ 0, instead of (6). Here, minor modifications of the estimate rules are required. Simultaneously with forming X∗, a permutation-based COP of optimizing ϕ : E′ → R1, both single-objective and multiobjective, can be solved [6–8,21,22].
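To make the scheme concrete, here is a compact sequential sketch of the PB-COP2.HM idea (our illustration, not the authors' implementation): the last coordinate is fixed recursively, the bounds zmin(cd), zmax(cd) follow property (2), and rule PB1 prunes a branch when z0 falls outside them. The grid-graph refinement and rules PB2/PB3 are omitted for brevity; all names are ours, and A is assumed to consist of distinct elements.

```python
def hm_solve(A, c, z0):
    """Find all x in En(A) with c^T x = z0 by recursively fixing last coordinates.

    Bounds follow property (2): with the remaining coefficients sorted
    ascending, the minimum of c^T x pairs them with the remaining elements
    in descending order, and the maximum pairs them in ascending order.
    Assumes c is sorted ascending and A has distinct elements.
    """
    n = len(c)
    solutions = []

    def bounds(avail, tail_value):
        cs = c[:len(avail)]                      # coefficients of the free positions
        zmin = tail_value + sum(ci * ai for ci, ai in zip(cs, reversed(avail)))
        zmax = tail_value + sum(ci * ai for ci, ai in zip(cs, avail))
        return zmin, zmax

    def branch(avail, code, tail_value):
        zmin, zmax = bounds(avail, tail_value)
        if z0 < zmin or z0 > zmax:               # rule PB1: prune the branch
            return
        if not avail:
            solutions.append(tuple(code))        # a feasible point of E'
            return
        pos = len(avail) - 1                     # position being fixed (0-based)
        for a in avail:
            rest = [b for b in avail if b != a]
            branch(rest, [a] + code, tail_value + c[pos] * a)

    branch(sorted(A), [], 0)
    return solutions
```

On the example of Sect. 4 (c = (2, 3, 4, 6, 7, 8), A = J6, z0 = 109) this returns the whole solution set X∗ while pruning most permutations.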
4 PB-COP2 Example
Solve PB-COP2 with n = 6, c = (2, 3, 4, 6, 7, 8), A = J6, z0 = 109. The coefficients of c are distinct; therefore, by (2)–(3), rules PB2, PB3 simplify to:
– if z0 = zmax(cd), then X∗ = X∗ ∪ (xmax(cd), cd) (rule PB2');
– if z0 = zmin(cd), then X∗ = X∗ ∪ (xmin(cd), cd) (rule PB3').
Step 1. cd = ∅, X∗ = ∅, lcd = 0. xmin(cd) = xmin = (6, 5, 4, 3, 2, 1), xmax(cd) = xmax = (1, 2, 3, 4, 5, 6), zmin(cd) = 83 < z0 = 109 < zmax(cd) = 127. The branch E(cd) is not discarded. branch(E(cd)) = {E(i)}i∈J6. Graph G(∅) is depicted in Fig. 3 with E(6) on top and E(1) at the bottom, together with the bounds lb(E(i)), ub(E(i)), i ∈ Jn.
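The bounds computed in Step 1 can be verified directly from property (2) (a quick sanity check of ours, not part of the paper):

```python
c = [2, 3, 4, 6, 7, 8]                 # sorted ascending, as (2) requires
A = [1, 2, 3, 4, 5, 6]                 # A = J6
zmax = sum(ci * ai for ci, ai in zip(c, A))            # ascending with ascending
zmin = sum(ci * ai for ci, ai in zip(c, reversed(A)))  # ascending with descending
print(zmin, zmax)  # 83 127
```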
ub(E(1)) = 109, hence, by PB2', X∗ = X∗ ∪ (xmax(cd), cd) = {(2, 3, 4, 5, 6, 1)}, and the branch E(1) is discarded.
Step 2. Explore E(2). cd = (2), lcd = 1, branch(E(cd)) = {E(i, 2)}i∈A(cd), where A(cd) = A\{2}. G(cd) = G(2) is shown in Fig. 4. As can be seen, the branches E(1, 2), E(3, 2) are pruned by PB1.
Step 3. Explore consecutively E(4, 2), E(5, 2), E(6, 2). Here, lcd = 2, cd ∈ {(4, 2), (5, 2), (6, 2)}. Branching is performed into n − lcd = 4 branches (see the graphs G(cd) in Figs. 5–7).
Step 3.a. In E(4, 2), prune the branches E(1, 4, 2), E(3, 4, 2) by PB1. By PB2', X∗ = X∗ ∪ (xmax(5, 4, 2), (5, 4, 2)) = X∗ ∪ {(1, 3, 6, 5, 4, 2)}, and the branch E(5, 4, 2) is discarded (see Fig. 5). The branch E(6, 4, 2) is explored by analyzing the four vertices of G(6, 4, 2) shown in Fig. 2. As a result, two new feasible points x1(6, 4, 2) = (3, 1, 5), x2(6, 4, 2) = (1, 5, 3) are found, therefore X∗ = X∗ ∪ {(xi(cd), cd)}i=1,2 = X∗ ∪ {(3, 1, 5, 6, 4, 2), (1, 5, 3, 6, 4, 2)}.
Step 3.b. In E(5, 2), prune the branches E(3, 5, 2), E(1, 5, 2) by PB1 (see Fig. 6). E(4, 5, 2), E(6, 5, 2) are analyzed similarly to E(6, 4, 2). As a result, one more feasible solution is found and X∗ = X∗ ∪ {(3, 4, 1, 6, 5, 2)}.
Step 3.c. In E(6, 2), prune the branch E(1, 6, 2) by PB1. By PB3', X∗ = X∗ ∪ (xmin(5, 6, 2), (5, 6, 2)) = X∗ ∪ {(4, 3, 1, 5, 6, 2)}, and the branch E(5, 6, 2) is discarded (see Fig. 7). Exploring E(4, 6, 2), E(3, 6, 2) like E(6, 4, 2), we obtain one more admissible solution: X∗ = X∗ ∪ {(1, 5, 4, 3, 6, 2)}.
By now, a third of E has been examined. For that, about 50 of the |E| = 720 points were analyzed, and 7 elements of E′ were found. The set E′ contains 26 points, implying that about a third of E′ has been found. The same proportion holds for the remaining branches E(3)–E(6). As a result, around p = 20% of the points of E are analyzed to obtain E′. PB-COP2.HM was implemented for PB-COP2s of dimensions up to 200.
Numerical results demonstrated that the percentage p decreases as n increases. Also, p decreases as z0 moves from the middle value (zmin + zmax)/2 closer to zmin or zmax.
5 Conclusion
Complex extremal and feasibility combinatorial problems on the set of permutations En(A) have been investigated by embedding it into Euclidean space and associating and utilizing the permutation polytope, the permutation graph, and the grid graph. For the problem PB-SSP, the horizontal method of localizing the values of a linear objective function (PB-COP2.HM) is developed, and directions of its generalization to a wide class of permutation-based problems formalized as linear combinatorial programs are outlined. PB-COP2.HM is supported by an example and illustrations.
References 1. Donec, G.A., Kolechkina, L.M.: Construction of Hamiltonian paths in graphs of permutation polyhedra. Cybern. Syst. Anal. 46(1), 7–13 (2010). https://doi.org/ 10.1007/s10559-010-9178-1 2. Donec, G.A., Kolechkina, L.M.: Extremal Problems on Combinatorial Configurations. RVV PUET, Poltava (2011) 3. Donets, G.A., Kolechkina, L.N.: Method of ordering the values of a linear function on a set of permutations. Cybern. Syst. Anal. 45(2), 204–213 (2009). https://doi. org/10.1007/s10559-009-9092-6 4. Gimadi, E., Khachay, M.: Extremal Problems on Sets of Permutations. Ural Federal University, Yekaterinburg (2016). [in Russian] 5. Kellerer, H., Pferschy, U., Pisinger, D.: Knapsack Problems. Springer, Berlin, New York (2010) 6. Koliechkina, L.M., Dvirna, O.A.: Solving extremum problems with linear fractional objective functions on the combinatorial configuration of permutations under multicriteriality. Cybern. Syst. Anal. 53(4), 590–599 (2017). https://doi.org/10.1007/ s10559-017-9961-3 7. Koliechkina, L.N., Dvernaya, O.A., Nagornaya, A.N.: Modified coordinate method to solve multicriteria optimization problems on combinatorial configurations. Cybern. Syst. Anal. 50(4), 620–626 (2014). https://doi.org/10.1007/s10559-0149650-4 8. Koliechkina, L., Pichugina, O.: Multiobjective Optimization on Permutations with Applications. DEStech Trans. Comput. Sci. Eng. Supplementary Volume OPTIMA 2018, 61–75 (2018). https://doi.org/10.12783/dtcse/optim2018/27922 9. Kozin, I.V., Maksyshko, N.K., Perepelitsa, V.A.: Fragmentary structures in discrete optimization problems. Cybern. Syst. Anal. 53(6), 931–936 (2017). https:// doi.org/10.1007/s10559-017-9995-6 10. Korte, B., Vygen, J.: Combinatorial Optimization: Theory and Algorithms. Springer, New York (2018) 11. Lengauer, T.: Combinatorial Algorithms for Integrated Circuit Layout. Vieweg+Teubner Verlag (1990) 12. Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. 
Wiley, Chichester, New York (1990) 13. Mehdi, M.: Parallel Hybrid Optimization Methods for permutation based problems (2011). https://tel.archives-ouvertes.fr/tel-00841962/document 14. Pichugina, O.: Placement problems in chip design: Modeling and optimization. In: 2017 4th International Scientific-Practical Conference Problems of Infocommunications. Science and Technology (PIC S&T). pp. 465–473 (2017). https://doi.org/ 10.1109/INFOCOMMST.2017.8246440 15. Pichugina, O., Farzad, B.: A human communication network model. In: CEUR Workshop Proceedings, pp. 33–40. KNU, Kyiv (2016) 16. Pichugina, O., Yakovlev, S.: Convex extensions and continuous functional representations in optimization, with their applications. J. Coupled Syst. Multiscale Dyn. 4(2), 129–152 (2016). https://doi.org/10.1166/jcsmd.2016.1103 17. Pichugina, O.S., Yakovlev, S.V.: Functional and analytic representations of the general permutation. East. Eur. J. Enterp. Technol. 79(4), 27–38 (2016). https:// doi.org/10.15587/1729-4061.2016.58550 18. Pichugina, O.S., Yakovlev, S.V.: Continuous representations and functional extensions in combinatorial optimization. Cybern. Syst. Anal. 52(6), 921–930 (2016). https://doi.org/10.1007/s10559-016-9894-2
19. Pichugina, O., Yakovlev, S.: Optimization on polyhedral-spherical sets: Theory and applications. In: 2017 IEEE 1st Ukraine Conference on Electrical and Computer Engineering, UKRCON 2017-Proceedings, pp. 1167–1174. KPI, Kiev (2017). https://doi.org/10.1109/UKRCON.2017.8100436 20. Schrijver, A.: Combinatorial Optimization: Polyhedra and Efficiency. Springer, Berlin, New York (2003) 21. Semenova, N.V., Kolechkina, L.M., Nagirna, A.M.: Multicriteria lexicographic optimization problems on a fuzzy set of alternatives. Dopov. Nats. Akad. Nauk Ukr. Mat. Prirodozn. Tekh. Nauki. (6), 42–51 (2010) 22. Semenova, N.V., Kolechkina, L.N., Nagornaya, A.N.: On an approach to the solution of vector problems with linear-fractional criterion functions on a combinatorial set of arrangements. Problemy Upravlen. Inform. 1, 131–144 (2010) 23. Sergienko, I.V., Kaspshitskaya, M.F.: Models and Methods for Computer Solution of Combinatorial Optimization Problems. Naukova Dumka, Kyiv (1981). [in Russian] 24. Sergienko, I.V., Shilo, V.P.: Discrete Optimization Problems: Challenges. Methods of Solution and Analysis. Naukova Dumka, Kyiv (2003). [in Russian] 25. Stoyan, Y.G., Yakovlev, S.V.: Mathematical Models and Optimization Methods of Geometrical Design. Naukova Dumka, Kyiv (1986). [in Russian] 26. Stoyan, Y.G., Yakovlev, S.V., Pichugina O.S.: The Euclidean Combinatorial Configurations: A Monograph. Constanta (2017). [in Russian] 27. Stoyan, Y.G., Yemets, O.O.: Theory and Methods of Euclidean Combinatorial Optimization. ISSE, Kyiv (1993). [in Ukrainian] 28. Yakovlev, S.: Convex Extensions in Combinatorial Optimization and Their Applications. Optim. Methods Appl. 567–584. Springer, Cham (2017). https://doi.org/ 10.1007/978-3-319-68640-0 27 29. Yakovlev, S.V., Grebennik, I.V.: Localization of solutions of some problems of nonlinear integer optimization. Cybern. Syst. Anal. 29(5), 727–734 (1993). https:// doi.org/10.1007/BF01125802 30. 
Yakovlev, S.V., Pichugina, O.S.: Properties of combinatorial optimization problems over polyhedral-spherical sets. Cybern. Syst. Anal. 54(1), 99–109 (2018). https:// doi.org/10.1007/s10559-018-0011-6 31. Yakovlev, S., Pichugina, O., Yarovaya, O.: On optimization problems on the polyhedral-spherical configurations with their properties. In: 2018 IEEE First International Conference on System Analysis Intelligent Computing (SAIC), pp. 94–100 (2018). https://doi.org/10.1109/SAIC.2018.8516801 32. Yakovlev, S.V., Pichugina, O.S., Yarovaya, O.V.: Polyhedral spherical configuration in discrete optimization. J. of Autom. Inf. Sci. 51, 38–50 (2019) 33. Yakovlev, S., Pichugina, O., Yarovaya, O.: Polyhedral spherical configuration in discrete optimization. J. of Autom. Inf. Sci. 51(1), 38–50 (2019) 34. Yakovlev, S.V., Valuiskaya, O.A.: Optimization of linear functions at the vertices of a permutation polyhedron with additional linear constraints. Ukr. Math. J. 53(9), 1535–1545 (2001). https://doi.org/10.1023/A:1014374926840 35. Yemelichev, V.A., Kovalev, M.M., Kravtsov, M.K.: Polytopes. Graphs and Optimisation. Cambridge University Press, Cambridge (1984) 36. Ziegler, G.M.: Lectures on Polytopes. Springer, New York (1995)
An Experimental Comparison of Heuristic Coloring Algorithms in Terms of Found Color Classes on Random Graphs Deniss Kumlander and Aleksei Kulitškov Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn, Estonia {deniss.kumlander,aleksei.kulitskov}@ttu.ee
Abstract. Well-known graph theory problems are graph coloring and finding the maximum clique in an undirected graph (shortly, MCP), and these problems are closely related. Vertex coloring is usually considered an initial step before finding the maximum clique of a graph. The maximum clique problem is NP-hard, which means that no algorithm is known that solves this kind of problem in polynomial time. Maximum clique algorithms make heavy use of heuristic vertex coloring to find bounds and estimations. One class of such algorithms executes the coloring only once, in the first stage, so those algorithms are less concerned with the running time of the heuristic and more with the number of discovered colors. Researchers always face the question of which heuristic vertex coloring algorithm should be selected to improve the performance of the core algorithm. Here we give an overview of existing heuristic vertex coloring algorithms and compare their ability to find color classes: 17 coloring algorithms are investigated, described, and tested on random graphs.

Keywords: Graph theory · Vertex coloring · Heuristic
1 Introduction
Let G = (V, E) be an undirected graph. Then V is a finite set of elements called vertices and E is a finite set of unordered pairs of vertices, called edges. The cardinality of the set of vertices, or just the number of its elements, is called the order of a graph and is denoted n = |V|. The cardinality of the set of edges is called the size of a graph and is denoted m = |E| [1]. If vi and vj are vertices that belong to one and the same graph and there is a relationship between them, which ends up being an edge, then these vertices are adjacent. The degree of vertex v in graph G is the number of edges incident to it [1], or, in other words, the number of this vertex's neighbors. The maximum degree of a graph is the number of edges of a vertex with the most neighbors; the minimum degree is the number of edges of a vertex with the fewest neighbors. Usually, the degree of a vertex is denoted deg(v). Density is the ratio of the number of edges of graph G to the number of its vertices; it is denoted g(G).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 365–375, 2020. https://doi.org/10.1007/978-3-030-21803-4_37
The complement of graph G has the same vertices as graph G, and any
two vertices in this graph are adjacent only if the same vertices are nonadjacent in the original graph. A simple graph is an undirected graph with finite sets of vertices and edges, which has no loops or multiple edges. A subgraph G′ = (V′, E′) is a subset of the vertices of graph G together with some of the corresponding edges; not all possible edges need be included, i.e., if vertices vi and vj are adjacent in graph G, they may have no edge between them in a subgraph of G: V′ ⊆ V, E′ ⊆ E. An induced subgraph G′ = (V′, E′) is a subset of the vertices of graph G with all their corresponding edges: G[V′] = (V′ ⊆ V, E′ = {(vi, vj) : i ≠ j, (vi, vj) ∈ E, vi, vj ∈ V′}). A complete subgraph G′ = (V′, E′) is a subset of the vertices of graph G with all their corresponding edges, where each pair of vertices is connected by an edge. A clique is a complete subgraph of graph G. A clique V′ in graph G is called maximal if there does not exist any other clique V″ such that V′ ⊂ V″. The size of the largest clique in graph G is called the clique number [1]. An independent set (IS) of a graph G is any subset of vertices V′ ⊆ V whose vertices are not pairwise adjacent. So, it is not hard to conclude that for any clique in graph G there is an independent set in the complement graph and vice versa. A coloring is an assignment of colors to the vertices of a graph. For an undirected graph G = (V, E), the assignment must follow the rule: for every edge (vi, vj) ∈ E, i ≠ j, it holds that c(vi) ≠ c(vj). A color class is the subset of vertices assigned a certain color. The chromatic number of a graph G is the smallest number of colors needed for a proper coloring of G [2], i.e., the smallest number k for which a k-coloring of G exists [1]. It is usually denoted χ(G). An algorithm is considered heuristic if it finds an approximate solution to the problem in acceptable time. A situation where vertices have the same saturation degree is called a tie. The maximal independent set (MIS) problem is to find a subset of vertices that are pairwise nonadjacent or, in other words, a set in which no vertex is connected to another vertex of that set. An IS is called maximal only if no vertex can be added to it without ruining its structure. The MIS problem is NP-hard. The graph coloring problem, or GCP, is to find the least possible number of colors for coloring a particular graph, so that any two vertices connected by an edge are colored differently. GCP is NP-complete. It has a lot in common with the MIS problem, because all the vertices that share the same color, i.e. are in one color class, form an independent set.
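The coloring rule above translates directly into a validity check. A minimal sketch of ours, assuming graphs are stored as adjacency-list dictionaries (names and the tiny example are illustrative):

```python
def is_proper_coloring(adj, color):
    """True iff no edge joins two vertices of the same color class."""
    return all(color[u] != color[v] for u in adj for v in adj[u])

# Triangle 0-1-2 with a pendant vertex 3 attached to 2
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(is_proper_coloring(adj, {0: 0, 1: 1, 2: 2, 3: 0}))  # True
print(is_proper_coloring(adj, {0: 0, 1: 0, 2: 1, 3: 0}))  # False: edge (0, 1)
```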
2 Coloring Algorithms
Many algorithms have been developed to solve the graph coloring problem heuristically, but Greedy remains the basic algorithm for assigning colors in a graph. It provides a relatively good solution in a small amount of time. The order in which the algorithm colors the vertices plays a major role in the process and heavily affects the quality of the coloring. Therefore, many algorithms employ different ordering heuristics to determine the order before coloring the vertices. These algorithms are mostly based on Greedy but use additional vertex ordering to achieve better results. As a rule, they surpass Greedy in the number of colors used, producing better results but taking more time to complete. The most popular ordering heuristics are:
• First-Fit ordering – the most primitive ordering; assigns each vertex the lowest possible color. This technique is the fastest among ordering heuristics.
• Degree-based ordering – uses a certain criterion to order the vertices and then chooses the right one to color. Takes far more time than First-Fit ordering, but produces much better results in terms of the number of used colors. There are many different degree ordering heuristics; the most popular among them are:
a. Random: colors the vertices of a graph in random order or according to a random degree function, i.e., random unique numbers given to every vertex;
b. Largest-First: colors the vertices of a graph in order of decreasing degree, i.e., it takes into account the number of neighbors of each vertex;
c. Smallest-Last: repeatedly assigns weights to the vertices with the smallest degree and removes them from the graph, then colors the vertices according to their weights in decreasing order [4];
d. Incidence: sequentially colors the vertices of a graph according to the highest number of colored neighbors;
e.
Saturation: iteratively colors the vertices of a graph by the largest number of distinctly colored neighbors;
f. Mixed/Combined: uses a combination of known ordering heuristics, for example, saturation degree ordering combined with largest-first ordering, where the latter is used only to resolve ties, i.e., situations when the saturation degree of some vertices is the same.
Sequential algorithms tend to do a lot of tasks that could be executed simultaneously. That is why many popular algorithms have parallel versions.
2.1 Sequential Algorithms
1. Greedy – the classical algorithm introduced by Welsh and Powell in 1967 [3]. It iterates over the vertices of a graph and assigns each vertex the smallest possible color not assigned to any adjacent vertex, i.e., no neighbor may share the same color.
2. Largest-First – Welsh and Powell also suggested an ordering for the greedy algorithm called largest-first. It is based on vertex degrees. The algorithm orders the
vertices according to the number of neighbors that each of them has and then starts the greedy coloring.
3. Largest-First V2 – a slightly modified version of the Largest-First algorithm. In this algorithm more than one vertex can be colored in each iteration: after coloring the vertex with the largest number of neighbors, the algorithm also assigns the same color to all vertices that respect the coloring rules (no adjacent vertices may share the same color) and, finally, removes these vertices from the graph.
4. Largest-First V3 – based on the second version, we made a third edition of the Largest-First algorithm. The main idea is the same as in V2; however, this time the vertices are reordered in each iteration, meaning that when a vertex is removed from the graph, its neighbors' degrees are decreased.
5. DSatur – this heuristic algorithm was developed by Daniel Brelaz in 1979 [5]. Its core idea is to order the vertices by their saturation degrees. If a tie occurs, the vertex is chosen by the largest number of uncolored neighbors. By coloring the vertex with the largest number of distinctly colored neighbors first, DSatur minimizes the possibility of setting an incorrect color [2].
6. DSatur V2 – another interesting version of DSatur [6]. At first, it finds a largest clique of the graph and assigns each of its vertices a distinct color; then it removes the newly colored vertices from the graph. After this procedure, the algorithm executes as the previous DSatur. Equivalently, the greedy algorithm takes the complement graph, finds a largest independent set, colors it with a distinct color, removes these vertices from the graph, and then works as the first version of DSatur.
7. Incidence degree ordering (IDO) – this ordering was first introduced by Daniel Brelaz [5] and modified by Coleman and More in their work [7]. In short, it is a modification of the DSatur algorithm. The main principle of this heuristic is to order vertices by the decreasing number of colored neighbors. If a tie occurs, the vertex to be taken can be decided by the use of random numbers. The coloring itself is done by the Greedy algorithm.
8. MinMax – the MinMax algorithm was introduced by Hilal Almara'Beh and Amjad Suleiman in their work in 2012 [8]. Its main function is to find a maximum independent set, but it can be used for coloring purposes as well, because independent sets are color classes.
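All the sequential heuristics above share one core step: greedy assignment of the smallest free color along some vertex order. A minimal sketch of Greedy, Largest-First and DSatur, assuming adjacency-list dictionaries (function names and tie-breaking details are ours):

```python
def greedy_color(adj, order):
    """Assign each vertex the smallest color unused by its colored neighbors."""
    color = {}
    for v in order:
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color

def largest_first(adj):
    # Welsh-Powell: color in order of decreasing degree
    return greedy_color(adj, sorted(adj, key=lambda v: -len(adj[v])))

def dsatur(adj):
    # Brelaz: repeatedly color the vertex with the most distinctly colored
    # neighbors; ties broken here by total degree (one common variant)
    color, uncolored = {}, set(adj)
    while uncolored:
        v = max(uncolored,
                key=lambda u: (len({color[w] for w in adj[u] if w in color}),
                               len(adj[u])))
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
        uncolored.remove(v)
    return color
```

On an odd cycle such as C5, both heuristics return 3 colors, which equals the chromatic number.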
The main principle of this heuristic is to order vertices by decreasing number of the vertices’ colored neighbors. If a tie occurs, it can be decided, which vertex is going to be chosen, by the usage of random numbers. The coloring itself is done by the Greedy algorithm. MinMax - The MinMax algorithm was introduced by Hilal Almara’Beh and Amjad Suleiman in their work in 2012 [8]. The main function of this algorithm is to find the maximum independent set, but it could be used for coloring purposes as well because independent sets are color classes. Mixed/Combined Algorithms
1. IDO-LDO – this algorithm is a combination of the incidence degree and largest-first ordering heuristics. IDO is the primary heuristic; if a tie occurs, the vertex to be taken is decided by the largest number of neighbors.
2. IDO-LDO-Random – another modified IDO algorithm. This time a random-number function is added to resolve ties. At first, the algorithm orders the vertices by the largest number of colored neighbors, then by the largest number of neighbors, and then, if two or more vertices still have exactly the same characteristics, the one with the largest random number is chosen.
3. LDO-IDO – this modification was introduced by Hussein Al-Omari and Khair Eddin Sabri in their work in 2006 [9]. The basic heuristic is Largest-First; if a tie occurs, the IDO heuristic decides which vertex to take. On the whole, this is almost the same algorithm as Largest-First V3 with an IDO function inside: the first ordering is done by the largest number of neighbors and then by the largest number of colored neighbors.
4. DSatur-LDO – this modification of the DSatur algorithm was also introduced by Al-Omari and Sabri in 2006 [9]. The algorithm works as DSatur, but if a tie occurs, the Largest-First rule steps in to resolve the conflict. According to the results, this heuristic works a little better than the original DSatur within the same amount of time.
5. DSatur-IDO-LDO – in this algorithm ties are resolved by incidence degree ordering first, and the remaining ties by largest degree ordering [10].
2.3 Parallel Algorithms
1. Jones and Plassmann algorithm – first proposed by Jones and Plassmann in their work in 1993 [11] and based on Luby's parallel algorithm [12]. The core idea is to construct a unique set of weights (for example, random numbers) at the beginning and use it throughout the algorithm; a conflict between equal random numbers is resolved by vertex number. In each iteration the JP algorithm finds an independent set of the graph, i.e., all vertices whose weight is higher than the weights of their neighboring vertices, and then assigns colors to these vertices using the Greedy algorithm. Every action is done in parallel.
2. Jones and Plassmann V2 – another version of the JP algorithm was introduced by William Hasenplaugh, Tim Kaler, Tao B. Schardl and Charles E. Leiserson in their work in 2014 [4]. The idea behind the modification is to use recursion. The algorithm orders the vertices by a function p that generates random numbers. It starts by partitioning the neighbors of each vertex into predecessors (the vertices with larger priorities) and successors (the vertices with lower priorities) [4]. If a vertex has no predecessors, the algorithm begins coloring it. A helper function named JpColor uses recursion to color the vertices; the color is chosen by collecting all the colors of the predecessors and choosing the smallest possible one (done in the GetColor helper function). When a vertex with an empty predecessors list is colored, the algorithm scans this vertex's successors for vertices whose predecessor counter has dropped to zero and starts coloring them. All this is done in parallel subtasks.
3. Parallel Largest-First – the JP algorithm is used as the base, but with Largest-First as the heuristic. The main difference is that the weight system of JP is replaced by the degree of each vertex.
However, random numbers are not removed; they are still used for tie-breaking. 4. Parallel Smallest-Last - The Smallest-Last heuristic was first introduced by Matula in 1972 [13]. The SL heuristic's system of weights is more
D. Kumlander and A. Kulitškov
sophisticated and complex. The algorithm uses two phases [14]: a weighting phase and a coloring phase. The weighting phase begins by finding the vertices with the current smallest degree in the graph. These vertices are assigned the current weight and removed from the graph, and the degrees of all neighbors of the deleted vertices are decreased. These steps are repeated until every vertex has received its weight. 5. Non-Parallel Implementations - These algorithms include: • Greedy From Parallel – non-parallel copy of the Jones and Plassmann algorithm; • Greedy V2 From Parallel – non-parallel copy of the Jones and Plassmann V2 algorithm; • Largest-First From Parallel – non-parallel copy of the Parallel Largest-First algorithm; • Smallest-Last From Parallel – non-parallel copy of Parallel Smallest-Last.
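The round structure shared by the weight-based parallel algorithms above can be illustrated with a small sketch. The following Python code (our illustration, not the authors' implementation) simulates the Jones–Plassmann rounds sequentially: random weights with vertex-number tie-breaking, an independent set of local maxima per round, and greedy smallest-color selection.

```python
import random

def jones_plassmann(adj, seed=0):
    """Simulate the Jones-Plassmann scheme: in each round, color the
    independent set of uncolored vertices whose (weight, id) pair beats
    all uncolored neighbors, using the smallest color unused nearby."""
    rng = random.Random(seed)
    # Unique priority per vertex: random weight, ties broken by vertex id.
    w = {v: (rng.random(), v) for v in adj}
    color = {}
    uncolored = set(adj)
    while uncolored:
        # Local maxima among uncolored vertices form an independent set;
        # in a parallel setting each would be colored by its own task.
        indep = [v for v in uncolored
                 if all(w[v] > w[u] for u in adj[v] if u in uncolored)]
        for v in indep:
            used = {color[u] for u in adj[v] if u in color}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        uncolored -= set(indep)
    return color

# A 4-cycle: two colors suffice.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coloring = jones_plassmann(adj)
assert all(coloring[u] != coloring[v] for u in adj for v in adj[u])
```

The sequential simulation only shows that the rounds always produce a proper coloring; it says nothing about parallel speedup.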
3 Tests and Results

In this part we conduct tests to determine the most suitable coloring algorithms for later use within maximum clique algorithms. In this study we focus on the number of colors used as a function of graph density. The random graphs are generated as described by Kumlander in 2005 [2].

3.1 Sequential Algorithms
As can be seen from the charts, every algorithm performs better than the Greedy algorithm in terms of the number of colors used at almost every density. However, at 40% density, for example, the performance of the IDO and MinMax algorithms is similar to that of the Greedy algorithm. It is also worth mentioning that in some particular cases the Largest-First and Largest-First V2 algorithms used the same number of colors as the Greedy algorithm, while taking more time to achieve this result (Fig. 1).
Fig. 1. Randomly generated graphs tests’ results compared in used colors. Sequential algorithms, density 10%.
An Experimental Comparison of Heuristic Coloring Algorithms
Fig. 2. Randomly generated graphs tests’ results compared in used colors. Sequential algorithms, density 50%.
Fig. 3. Randomly generated graphs tests’ results compared in used colors. Sequential algorithms, density 90%.
Among all the sequential algorithms, the best results in terms of the number of colors used were produced by DSatur, DSatur V2 and Largest-First V3. Their performance is much better than that of the Greedy algorithm; however, it comes at the cost of taking more time to complete (Figs. 2 and 3).

3.2 Combined Algorithms
At first sight, the results of the combined algorithms seem very similar to those of the sequential ones. There are again three leading algorithms, which this time are DSatur-LDO, DSatur-IDO-LDO and LDO-IDO. Their results are much better than those of the Greedy algorithm. The higher the density, the more similar the algorithms' performance becomes. At 10% to 70% density we can see a clear division between these three algorithms and the rest. However, at 80% density and higher the difference begins to vanish, although the lead in the number of colors used remains (Fig. 4).
Fig. 4. Randomly generated graphs tests’ results compared in used colors. Combined algorithms, density 10%.
Furthermore, the IDO-LDO-Random algorithm shows clearly strange behavior: in some cases it used more colors than the Greedy algorithm. This behavior might be caused by the random numbers used during its execution and should be investigated in separate research (Figs. 5 and 6).
Fig. 5. Randomly generated graphs tests’ results compared in used colors. Combined algorithms, density 50%.
When it comes to consumed time, LDO-IDO clearly wins among these three, although its running time is still larger than that of the Greedy algorithm.

3.3 Parallel Algorithms
It can be seen from the charts that Parallel Largest-First prevails in almost every situation. The Parallel Smallest-Last algorithm should also be mentioned; however, it shows promising results only at higher densities, using almost the same number of colors and, at 90% density, even outperforming the Parallel Largest-First algorithm. Parallel Jones and Plassmann and its second version perform very similarly to the Greedy algorithm, using slightly fewer or more colors than Greedy (Figs. 7, 8 and 9).
Fig. 6. Randomly generated graphs tests’ results compared in used colors. Combined algorithms, density 90%.
Fig. 7. Randomly generated graphs tests’ results compared in used colors. Parallel algorithms, density 10%.
Fig. 8. Randomly generated graphs tests’ results compared in used colors. Parallel algorithms, density 50%.
Fig. 9. Randomly generated graphs tests’ results compared in used colors. Parallel algorithms, density 90%.
In terms of the time used to complete the task, Parallel Smallest-Last demonstrates the worst results, and the performance of Parallel Largest-First is not far from that of Parallel Smallest-Last. It should be noted, however, that at 30%, 50% and 80% density the execution time of the Parallel Largest-First algorithm is very similar to that of Parallel JP, despite the fact that it uses largest-first ordering.
4 Conclusion

The following algorithms showed the best results within their respective groups in terms of the number of colors used: • Among sequential: DSatur, DSatur V2 and Largest-First V3; • Among combined: DSatur-LDO, DSatur-IDO-LDO and LDO-IDO; • Among parallel: Parallel Largest-First and Parallel Smallest-Last.
References 1. Kubale, M.: Graph Colorings. American Mathematical Society, US (2004) 2. Kumlander, D.: Some practical algorithms to solve the maximum clique problem. Tallinn University of Technology, Tallinn (2005) 3. Welsh, D.J.A., Powell, M.B.: An upper bound for the chromatic number of a graph and its application to timetabling problems. Comput. J. 10(1), 85–86 (1967) 4. Hasenplaugh, W., Kaler, T., Schardl, T.B., Leiserson, C.E.: Ordering heuristics for parallel graph coloring. In: Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures–SPAA’14, pp. 166–177 (2014) 5. Brelaz, D.: New methods to color the vertices of a graph. Commun. ACM 22(4), 251–256 (1979) 6. Andrews, P.S., Timmis, J., Owens, N.D.L., Aickelin, U., Hart, E., Hone, A., Tyrrell, A.M.: Artificial Immune Systems. York, UK (2009)
7. Coleman, T.F., More, J.J.: Estimation of sparse Jacobian matrices and graph coloring problems. SIAM J. Numer. Anal. 20, 187–209 (1983) 8. Almarabeh, H., Suleiman, A.: Heuristic algorithm for graph coloring based on maximum independent set. J. Appl. Comput. Sci. Math. 6(13), 9–18 (2012) 9. Al-Omari, H., Sabri, K.E.: New graph coloring algorithms. J. Math. Stat. 2(4), 439–441 (2006) 10. Saha, S., Baboo, G., Kumar, R.: An efficient EA with multipoint guided crossover for biobjective graph coloring problem. In: Contemporary Computing: 4th International Conference-IC3 2011, pp. 135–145 (2011) 11. Jones, M.T., Plassmann, P.E.: A parallel graph coloring heuristic. SIAM J. Sci. Comput. 14 (3), 654–669 (1993) 12. Luby, M.: A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput. 15(4), 1036–1053 (1986) 13. Matula, D.W., Marble, G., Isaacson, J.D.: Graph coloring algorithms. Academic Press, New York (1972) 14. Allwright, J.R., Bordawekar, R., Coddington, P.D., Dincer, K., Martin, C.L.: A comparison of parallel graph coloring algorithms. Technical Report SCCS-666 (1995)
Cliques for Multi-Term Linearization of 0–1 Multilinear Program for Boolean Logical Pattern Generation

Kedong Yan¹ and Hong Seo Ryoo²

¹ Department of Computer Science and Technology, School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Xuanwu District, Nanjing 210094, Jiangsu, People's Republic of China, [email protected]
² School of Industrial Management Engineering, Korea University, 145 Anam-Ro, Seongbuk-Gu, Seoul 02841, Republic of Korea, [email protected]
Abstract. 0–1 multilinear program (MP) holds a unifying theory to Boolean logical pattern generation. For a tighter polyhedral relaxation of MP, this note exploits cliques in the graph representation of data under analysis to generate valid inequalities for MP that subsume all previous results and, collectively, provide a much stronger relaxation of MP. A preliminary numerical study demonstrates strength and practical benefits of the new results.

Keywords: Logical analysis of data · Pattern · 0–1 multilinear programming · 0–1 polyhedral relaxation · Graph · Clique

1 Introduction and Background
Logical Analysis of Data (LAD) is a combinatorial optimization-based supervised learning methodology, and the key and bottleneck step in LAD is pattern generation, where a set of features and their negations are optimally combined together to form knowledge/rules that distinguish one type of data/observations from the other(s). Without loss of generality, we consider the analysis of two types of + and − data and denote by S• the index set of the • type of data for • ∈ {+, −}. Let S = S⁺ ∪ S⁻. We assume S is duplicate and contradiction free (such that S⁺ ∩ S⁻ = ∅) and that the data under analysis are described by n Boolean attributes aj, j ∈ N := {1, …, n}. We let an+j = ¬aj for j ∈ N and
¹ This work was supported by National Natural Science Foundation of China (Grant Number 61806095).
² Corresponding author. This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant Number 2017R1D1A1A02018729).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 376–386, 2020. https://doi.org/10.1007/978-3-030-21803-4_38
let N̄ := {n + 1, …, 2n} and Ñ := N ∪ N̄. Finally, for each data Ai, i ∈ S, we denote by Aij the j-th attribute value of the data, such that Aij = 1 − Ai,n+j for j ∈ N and Aij = 1 − Ai,j−n for j ∈ N̄. Last, since + and − patterns are symmetric in definition, we present (most of) the material below in the context of + pattern generation for convenience, without loss of generality.

To build a mathematical model for pattern generation, we introduce 0–1 indicator variables xj for j ∈ Ñ and let

    xj = 1, if attribute aj is involved in a pattern; and xj = 0, otherwise.

For i ∈ S, we let

    Ji := {j ∈ Ñ | Aij = 0}.
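As a concrete illustration of this encoding (a hypothetical two-observation, three-attribute example of ours, not from the paper), the negated attributes can be appended as columns n + 1, …, 2n and each Ji read off as the zero positions:

```python
# Hypothetical 0-1 data with n = 3 Boolean attributes per observation.
A = [
    [1, 0, 1],  # observation 1
    [0, 1, 1],  # observation 2
]
n = 3

# Append negated attributes: column n + j holds 1 - a_j.
A_ext = [row + [1 - v for v in row] for row in A]

# J_i collects the indices (1-based, over the 2n columns) where A_ij = 0.
J = [[j + 1 for j, v in enumerate(row) if v == 0] for row in A_ext]

for Ji in J:
    # Exactly one of {a_j, not a_j} is 0 for each j, so |J_i| = n.
    assert len(Ji) == n

print(J)  # [[2, 4, 6], [1, 5, 6]]
```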
Since the dataset is duplicate and contradiction free, all Ji's are unique and |Ji| = n, ∀i ∈ S. In [15], we showed that the 0–1 MP below holds a unifying theory to LAD pattern generation:

    (PG): max { ϕ⁺(x) + l(x) : ϕ⁻(x) = 0, x ∈ {0, 1}²ⁿ }

where l(x) is a linear function and

    ϕ•(x) = Σ_{i∈S•} Π_{j∈Ji} (1 − xj)

for • ∈ {+, −}. It is well-known that the constraint of (PG) is equivalent to a set of minimal cover inequalities [8]:

    Σ_{i∈S⁻} Π_{j∈Ji} (1 − xj) = 0  ⟺  Σ_{j∈Ji} xj ≥ 1, i ∈ S⁻.

The minimal cover inequalities provide a poor linear programming (LP) relaxation bound, however. For 0–1 linearly overestimating ϕ⁺, McCormick concave envelopes for a 0–1 monomial can serve the purpose (e.g., [2,10,12,14]). This 'standard' method achieves the goal by means of introducing m⁺ (where m⁺ = |S⁺|) variables

    yi = Π_{j∈Ji} (1 − xj), i ∈ S⁺    (1)

and n × m⁺ inequalities

    yi ≤ 1 − xj, j ∈ Ji, i ∈ S⁺    (2)
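A quick brute-force check (our sketch; the index set Ji below is hypothetical) confirms why the envelope is exact at 0–1 points: the monomial Π_{j∈Ji}(1 − xj) coincides with the upper bound min_{j∈Ji}(1 − xj) implied by the McCormick inequalities whenever x is binary:

```python
from itertools import product

Ji = [0, 2, 3]  # hypothetical index set J_i over 5 variables
for x in product([0, 1], repeat=5):
    monomial = 1
    for j in Ji:
        monomial *= 1 - x[j]
    # (2) bounds y_i by 1 - x_j for every j in J_i, i.e. by the minimum.
    mccormick_ub = min(1 - x[j] for j in Ji)
    # At 0-1 points the product and the minimum agree exactly.
    assert monomial == mccormick_ub
```

The gap between the two only opens at fractional points of the LP relaxation, which is what the clique inequalities of this note address.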
K. Yan and H. S. Ryoo
to the formulation of a 0–1 linear relaxation of (PG). Alternatively, one may aggregate the constraints in (2) with respect to j to concavify ϕ⁺ by m⁺ valid inequalities (e.g., [13])

    n yi + Σ_{j∈Ji} xj ≤ n, i ∈ S⁺,

or aggregate them with respect to i via standard probing techniques and logical implications in integer programming (e.g., [5–7]) to

    Σ_{i∈Ij⁺} yi ≤ |Ij⁺| (1 − xj), j ∈ Ñ,
where Ij⁺ := {i ∈ S⁺ | j ∈ Ji} for j ∈ Ñ. As aggregation-based, these each provide a weaker relaxation of ϕ⁺ than the McCormick relaxation but can prove useful in data mining applications (e.g., see [16]). As for ϕ⁺, [17] represented the + and − data under analysis as vertices in a graph and introduced an edge between each pair of + and − vertices that are 1 Hamming distance apart. The analysis of the resulting graph revealed that each star of degree d (≥ 2) in the graph generates n + d valid inequalities for (PG) that dominate the n × d McCormick inequalities from the d leaf nodes (that is, + data) of the star. Furthermore, we showed that a set of 'neighboring' stars further reduces the number of linear inequalities in the polyhedral relaxation of (PG) while strengthening it, and demonstrated cases when these inequalities are facet-defining for the 0–1 multilinear polytope associated with the McCormick inequalities that they replace. Recently, [3] studied the facial structure of the polyhedral convex hull of a 0–1 multilinear function via a hypergraph representation of the multilinear function and proposed several lifting techniques for generating facet-defining inequalities. The number of inequalities generated can be exponential in the number of multilinear terms and/or variables, however. Plus, the hypergraph representation of (PG) is never free of Berge or γ-cycles. This implies that results from [4] cannot be (directly) utilized for a tight polyhedral convexification of (PG). In short, we note (at least for now) that recent mathematical discoveries from [3,4] do not constitute a viable means for solving (PG). In this note, we enhance the approach in [17] to study a tight, multi-term relaxation of ϕ⁺ by virtue of an improved, more effective graph representation of the data under analysis.
More specifically, we discover a new useful 'neighborhood' property among data (with respect to generating stronger valid inequalities) that introduces edges in the graph representation of data in a manner that allows for the generation of a valid inequality from each maximal clique of the graph. Once entangled, a set of neighboring data in a maximal clique allows one to replace the corresponding McCormick inequalities by a single, much stronger valid inequality. This makes it possible to polyhedrally relax (PG) by means of a much smaller number of stronger valid inequalities, in comparison with the methods from the aforementioned references. With regard to results from [17], the new
results in this note subsume our earlier results and thus yield a tighter polyhedral relaxation of (PG) in terms of a smaller number of 0–1 linear inequalities. As for organization, this note consists of three parts. Following this section of introduction and background, we present the main results in Sect. 2 and follow them with a preliminary numerical study. The numerical study compares our new relaxation method against the McCormick relaxation method in pattern generation experiments with six machine learning benchmark datasets; recall that McCormick provides the strongest lower bound when compared to the two alternatives in [5–7,13]. In short, the performance of the new results is far superior and well demonstrates their practical utility in data mining applications.
2 Main Results
Definition 1. A clique is a simple graph in which every pair of vertices has an edge. A clique of size k is called a k-clique, where the size means the number of vertices in the clique. A maximal clique is a clique that is not included in a larger clique, while a maximum clique in a graph is a clique with the largest number of vertices.

Definition 2. A vertex clique cover is a set of cliques whose union covers all vertices of a graph. A vertex maximal clique cover is a vertex clique cover in which each clique is maximal.

Definition 3. If πx ≤ π₀ and μx ≤ μ₀ are two valid inequalities for a polytope in the nonnegative orthant, πx ≤ π₀ dominates μx ≤ μ₀ if there exists u > 0 such that uμ ≤ π and π₀ ≤ uμ₀, and (π, π₀) ≠ (uμ, uμ₀).

Finally, let

    IPG := { (x, y) : x ∈ {0, 1}²ⁿ, y ∈ [0, 1]^{m⁺}, (1), ϕ⁻(x) = 0 }.

For variable x regarding a pair of original and negated Boolean attributes, the requirement for a logical statement gives the following complementary relation, which is of great importance in deriving stronger valid inequalities for IPG.

Proposition 1 (Proposition 1 in [16]). For j ∈ Ñ, let jᶜ = n + j if j ∈ N and jᶜ = j − n if j ∈ N̄. Then, the complementarity cut

    xj + xjᶜ ≤ 1,    (3)

is valid for IPG.

In this section, we are interested in generating valid inequalities for IPG of the form

    Σ_{i∈I} yi ≤ 1 − xj, I ⊂ S⁺    (4)
with respect to a specific j ∈ Ñ. When |I| = 1, (4) simply reduces to a McCormick inequality. It is easy to see that (4) becomes stronger as I grows larger. Thus, for a tighter polyhedral relaxation of IPG, we wish to extend I as far as possible while keeping (4) valid. For this purpose, we examine the 0–1 features (variable x) to identify a maximal I. First, we consider a pair of observations in Ij⁺.

Lemma 1. For j ∈ Ñ, suppose there are i, k ∈ Ij⁺ such that j ∈ J := Ji ∩ Jk and J ⊂ Jℓ for some ℓ ∈ Ij⁻ := {i ∈ S⁻ | j ∈ Ji}. Then, the following inequality

    yi + yk ≤ 1 − xj,    (5)

is valid for IPG.

Proof. For Aℓ, ℓ ∈ Ij⁻, we have

    Π_{ι∈Jℓ} (1 − xι) = 0,    (6)

which needs to be satisfied. Since j ∈ Jℓ, if xj = 1, (6) is satisfied and we also have yi = yk = 0. Therefore yi + yk = 0 ≤ 1 − xj. On the other hand, suppose xj = 0 and consider variables with indices in Jℓ \ {j}. It is easy to see that Jℓ \ {j} = (Jℓ \ J) ∪ (J \ {j}). If xι = 1 for any ι ∈ J \ {j}, then (6) is satisfied and yi = yk = 0; that is, yi + yk = 0 < 1 − xj. Assume xι = 0, ∀ι ∈ J \ {j}; to satisfy (6), there must exist ι ∈ Jℓ \ J such that xι = 1. Note that Ñ = J ∪ Jᶜ ∪ (Ji \ J) ∪ (Jk \ J), where Jᶜ := {jᶜ | j ∈ J}. One can see that ι ∉ Jᶜ, for otherwise ιᶜ ∈ J ⊂ Jℓ, which contradicts ι ∈ Jℓ. This indicates either ι ∈ Ji \ J or ι ∈ Jk \ J. Without loss of generality, assume ι ∈ Ji \ J; then yi ≤ 1 − xι = 0, and yk ≤ 1 − xιᶜ = 1 via (3). Thus yi + yk ≤ 1 = 1 − xj. This completes the proof.

To fully extend the result above, we represent the data under analysis in a graph, as done in [16,17]. The difference in this graph representation is that, while each observation in Ij⁺ maps to a unique node in the graph, we now introduce an edge between a pair of vertices if the pair satisfies the condition set forth in Lemma 1. The resulting undirected graph is denoted by Gj⁺. Now, we have the following result for each clique of Gj⁺.
To fully extend the result above, we represent the data under analysis in a graph, as done so in [16,17]. The difference in this graph representation is that, while each observation in Ij+ maps to a unique node in a graph, we now introduce an edge between a pair of vertices if the pair satisfies the condition set forth in Lemma 1. The resulting undirected graph is denoted by G+ j . Now, we have the following result for each clique of G+ . j
Cliques for Multi-term Linearization of 0–1 Multilinear Program
381
Theorem 1. For j ∈ N , consider a clique which contains a set Ω(Ω ⊆ Ij+ , |Ω| ≥ 2) of observations in G+ j . Then, the following inequality
yi ≤ 1 − xj
i∈Ω
is valid for IP G . Furthermore, the inequality above dominates for any Ω ⊂ Ω.
(7)
i∈Ω
yi ≤ 1 − xj
Proof. From the way that G+ j is created we have yi + yk ≤ 1 − xj for each pair i, k ∈ Ω via Theorem 1. So if xj = 1, one has yi = 0, ∀i ∈ Ω. Thus yi = 0 ≤ 1 − xj . i∈Ω
On the other hand, suppose xj = 0, then yi + yk ≤ 1 for each pair i, k ∈ Ω. One can easily verify that at most one yi , i ∈ Ω can take value 1. That is, yi ≤ 1 = 1 − xj . i∈Ω
The dominance result is straightforward since (7) becomes stronger as Ω expands. To examine the strength of (7), we let: ⎧ ⎫ ⎨ x ∈ {0, 1}2n , ⎬ − + (3); Ξ := x ≥ 1, i ∈ I ; y ≤ 1 − x , j ∈ J , i ∈ Ω ∪ I c j i j i j j ⎩ y ∈ [0, 1] ⎭ j∈Ji
Theorem 2. For a clique with vertex set Ω in Theorem 1, (7) defines a facet of conv(Ξ). Proof. First note that (7) is valid via the proof of Theorem 1. Suppose that (7) is not facet-defining. Then, there exists a facet-defining inequality of conv(Ξ) of the form αj xj + βi yi ≤ γ, (8) j∈N
i∈Ω∪Ij+c
where (α, β) = (0, 0), such that (7) defines a face of the facet of conv(Ξ) defined by (8). That is: F := (x, y) ∈ Ξ xj + yi = 1 i∈Ω ⎧ ⎫ ⎪ ⎪ ⎨ ⎬ ⊆ F := (x, y) ∈ Ξ αj xj + βi yi = γ ⎪ ⎪ ⎩ ⎭ j∈N i∈Ω∪Ij+c
Consider the following two cases for the solutions in F.

Case 1. (xj = 1) In this case, xjᶜ = 0 and yi = 0, ∀i ∈ Ω. Since j ∈ Ji, ∀i ∈ Ij⁻, a solution with xj = 1 satisfies all the minimal cover inequalities defining Ξ. Such a solution with xι = 0, ∀ι ∈ Ñ \ {j, jᶜ}, and yi = 0, ∀i ∈ Ω ∪ Ijᶜ⁺, belongs to F, hence to F̂, which yields αj = γ. For this solution, we can set xι = 1 for any ι ∈ Ñ \ {j, jᶜ}, which yields αj + αι = γ. Therefore, we have: αj = γ and αι = 0, ∀ι ∈ Ñ \ {j, jᶜ}. Furthermore, note that j ∉ Ji for i ∈ Ijᶜ⁺ and that a pattern exists for a contradiction-free dataset. This implies that there exists a 0–1 vector x that yields yi = 1 for each i ∈ Ijᶜ⁺, which yields αj + βi = γ, i ∈ Ijᶜ⁺, thus βi = 0, ∀i ∈ Ijᶜ⁺.

Case 2. (xj = 0) By the same argument that a pattern exists for a contradiction-free dataset, we have βi = γ, i ∈ Ω, and αjᶜ + βi = γ, i ∈ Ω, for solutions with xjᶜ = 0 and xjᶜ = 1, respectively. These yield αjᶜ = 0 and βi = γ, ∀i ∈ Ω.

Summarizing, the two cases above show αι = 0, ∀ι ∈ Ñ \ {j}, βi = 0, ∀i ∈ Ijᶜ⁺, and αj = βi = γ > 0, ∀i ∈ Ω, where γ > 0 follows from our supposition that (7) is dominated by (8). This shows that (8) is a positive multiple of (7) and completes the proof.

We close this section with two remarks.

Remark 1. The last statement of Theorem 1 directs that only the maximal cliques of Gj⁺ need to be considered. As finding all maximal cliques is time-consuming (e.g., [11]), we recommend instead using a vertex maximal clique cover of Gj⁺.

Remark 2. We wish to add that two main results – namely, Theorems 1 and 3 – in [17] are subsumed by Theorem 1 above. Specifically, via Theorem 1, we obtain a set of inequalities that dominate those by the aforementioned results from [17] in large part, along with a small number of common ones. (We have theorems and proofs for this but omit them here for reasons of space with respect to the 10 page limitation for the conference proceedings.)
This helps greatly improve the overall efficiency of LAD pattern generation via a more effective multi-term relaxation of (PG) and its solution, as demonstrated in the following section.
3 A Preliminary Experiment
In this preliminary study, we compare the new results of the previous section against the standard, McCormick relaxation method. We used six machine learning benchmark datasets from [9] for this experiment.
For notational simplicity, for • ∈ {+, −}, we denote by •̄ the complementary element of • with respect to the set {+, −}. For generating • patterns by the two methods (for ϕ•) compared, we used the minimal cover inequalities for linearizing ϕ•̄ of (PG). The resulting 0–1 equivalent of (PG) by † ∈ {mccormick, cliques}, where mccormick and cliques denote the McCormick relaxation and the new results in this note, respectively, takes the following form:

    (PG)•†:  maximize  Σ_{i∈S•} yi  over x ∈ {0, 1}²ⁿ, 0 ≤ y ≤ 1
             subject to  (OBJ)•†
                         Σ_{j∈Ji} xj ≥ 1, i ∈ S•̄
For experiments, we implemented the simple pattern generation procedure below:

procedure pg† (for † ∈ {mccormick, cliques})
 1: for • ∈ {+, −} do
 2:   if † = mccormick then
 3:     obtain (OBJ)•† via (2)
 4:   else
 5:     obtain a graph representation G• of the data in S•.
 6:     for j ∈ Ñ do
 7:       retrieve the subgraph G•j from G•.
 8:       find a vertex maximal clique cover for G•j.
 9:       obtain (OBJ)•† via (7).
10:     end for
11:   end if
12:   solve (PG)•† via CPLEX.
13: end for
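Steps 5–9 of the procedure can be sketched as follows (a simplified illustration of ours, not the authors' code; the function names and data structures are hypothetical). Edges follow the condition of Lemma 1, and a vertex maximal clique cover is built greedily, as suggested in Remark 1; each cover clique Ω with |Ω| ≥ 2 then yields one inequality (7), Σ_{i∈Ω} yi ≤ 1 − xj.

```python
def lemma1_edge(i, k, j, J, neg_obs_J):
    """Edge test from Lemma 1: j must lie in J = J_i ∩ J_k, and J must
    be a proper subset of J_l for some negative observation l with
    j in J_l.  J maps positive observation ids to sets of indices."""
    common = J[i] & J[k]
    if j not in common:
        return False
    return any(common < Jl for Jl in neg_obs_J if j in Jl)

def maximal_clique_cover(vertices, is_edge):
    """Greedy vertex maximal clique cover: start a clique at an
    uncovered vertex, extend it until no vertex can be added
    (maximality), and repeat until every vertex is covered."""
    uncovered = set(vertices)
    cliques = []
    while uncovered:
        v = uncovered.pop()
        clique = {v}
        for u in vertices:  # extension may reuse covered vertices
            if u not in clique and all(is_edge(u, w) for w in clique):
                clique.add(u)
        uncovered -= clique
        cliques.append(clique)
    return cliques
```

For a fixed j, the edge test is bound as `is_edge = lambda u, w: lemma1_edge(u, w, j, J, neg_obs_J)`, where `J` holds the Ji of positive observations and `neg_obs_J` lists the Jℓ of negative observations.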
384
K. Yan and H. S. Ryoo Table 1. Root relaxation gap Dataset • (PG)•mccormick
(PG)•cliques
†
bupa
+ 82.6 ± 3.1 (75.2, 89.3) 5.0 ± 6.5 (0.0, 25.6) − 85.0 ± 2.1 (78.7, 88.2) 5.6 ± 4.9 (0.0, 15.4)
94.1 93.5
clev
+ 66.2 ± 6.1 (53.6, 77.0) 3.9 ± 4.3 (0.0, 12.3) − 71.6 ± 4.7 (63.0, 79.4) 2.2 ± 2.9 (0.0, 8.7)
94.4 97.1
cred
+ 70.2 ± 4.5 (58.0, 78.9) 5.3 ± 5.1 (0.0, 21.1) − 75.3 ± 4.5 (64.1, 81.8) 7.7 ± 5.8 (0.0, 20.1)
92.7 90.1
diab
+ 88.5 ± 1.8 (81.9, 91.4) 16.6 ± 7.4 (0.0, 31.4) 81.3 − 80.4 ± 2.9 (73.9, 86.6) 3.4 ± 4.3 (0.0, 16.6) 95.9
hous
+ 65.1 ± 6.7 (52.1, 76.9) 2.2 ± 2.7 (0.0, 8.5) − 74.2 ± 3.7 (66.1, 79.2) 3.2 ± 3.7 (0.0, 11.9)
96.7 95.8
wisc
+ 59.0 ± 5.2 (51.2, 70.4) 1.5 ± 2.5 (0.0, 9.1) − 53.9 ± 4.8 (44.4, 65.3) 1.1 ± 2.3 (0.0, 8.2)
97.5 97.9
(PG)•mccormick and (PG)•cliques are solved without utilizing CPLEX cuts All results are in format ‘average ± 1 standard deviation (min, max)’ †: Measures relative efficacy of (PG)•cliques over (PG)•mccormick : (Average of 30
Gap by(PG)• mccormick −Gap
by(PG)•cliques values) Worse of the 2 results
Table 2. CPU seconds Dataset • (PG)•mccormick
(PG)•cliques
†
bupa
+ 0.38 ± 0.10 (0.24, 0.61) − 0.73 ± 0.11 (0.44, 0.97)
0.04 ± 0.04 (0.00, 0.11) 82.88 0.10 ± 0.08 (0.00, 0.24) 80.97
clev
+ 0.07 ± 0.02 (0.03, 0.10) − 0.10 ± 0.06 (0.04, 0.34)
0.02 ± 0.01 (0.00, 0.04) 69.26 0.01 ± 0.01 (0.00, 0.04) 77.82
cred
+ 1.28 ± 0.57 (0.45, 2.39) − 0.96 ± 0.40 (0.39, 2.09)
0.25 ± 0.22 (0.00, 1.09) 66.79 0.26 ± 0.23 (0.01, 1.10) 61.96
diab
+ 3.13 ± 0.88 (1.49, 5.08) 0.69 ± 0.39 (0.01, 1.50) 73.39 − 4.93 ± 1.89 (2.68, 12.27) 0.40 ± 0.41 (0.01, 1.80) 85.89
hous
+ 0.16 ± 0.05 (0.08, 0.31) − 0.24 ± 0.13 (0.10, 0.66)
0.03 ± 0.03 (0.00, 0.11) 71.61 0.04 ± 0.04 (0.00, 0.14) 73.81
wisc
+ 0.06 ± 0.02 (0.02, 0.09) − 0.02 ± 0.01 (0.01, 0.03)
0.01 ± 0.01 (0.00, 0.03) 81.02 0.00 ± 0.01 (0.00, 0.02) 76.67
(PG)•mccormick and (PG)•cliques are solved without utilizing CPLEX cuts Time by (PG)•cliques includes time for creating graphs and finding cliques. †: Measures relative efficacy of (PG)•cliques over (PG)•mccormick : (Average of 30
Time for(PG)•mccormick −Time for(PG)•cliques values) Worse of the 2 results
the table provides a measure of the overall improvement in efficiency of pattern generation by the results of this note. Briefly summarizing, the numbers in these tables show that the new results of this note provide a tight polyhedral relaxation of (PG) and help achieve a great deal of improvement in the efficiency of pattern generation for LAD. Noting that pattern generation is notorious for being the bottleneck operation in Boolean logical analysis of data, the comparative results in the two tables above demonstrate the practical utility of the new mathematical results well. Before closing, we refer interested readers to Tables 5 and 6 in [17] to note that the new results of this note not only subsume their predecessors but also provide a much tighter polyhedral relaxation of (PG) and, as a result, help in much faster solution of the difficult-to-solve MILP pattern generation instances for Boolean logical analysis of data.
References 1. IBM Corp.: IBM ILOG CPLEX Optimization Studio CPLEX User’s Manual Version 12 Release 8 (2017). https://www.ibm.com/support/knowledgecenter/ SSSA5P 12.8.0/ilog.odms.studio.help/pdf/usrcplex.pdf. Accessed 12 Dec 2018 2. Crama, Y.: Concave extensions for nonlinear 0–1 maximization problems. Math. Program. 61, 53–60 (1993) 3. Del Pia, A., Khajavirad, A.: A polyhedral study of binary polynomial programs. Math. Oper. Res. 42(2), 389–410 (2017) 4. Del Pia, A., Khajavirad, A.: The multilinear polytope for acyclic hypergraphs. SIAM J. Optim. 28(2), 1049–1076 (2018) 5. Fortet, R.: L’alg`ebre de boole dt ses applications en recherche op´erationnelle. ´ Cahiers du Centre d’Etudes de Recherche Op´erationnelle 1(4), 5–36 (1959) 6. Fortet, R.: Applications de l’alg`ebre de boole en recherche op´erationnelle. Revue Fran¸caise d’Informatique et de Recherche Op´erationnelle 4(14), 17–25 (1960) 7. Glover, F., Woolsey, E.: Converting the 0–1 polynomial programming problem to a 0–1 linear program. Oper. Res. 12(1), 180–182 (1974) 8. Granot, F., Hammer, P.: On the use of boolean functions in 0–1 programming. Methods Oper. Res. 12, 154–184 (1971) 9. Lichman, M.: UCI Machine Learning Repository (2013). http://archive.ics.uci. edu/ml. Accessed 12 Dec 2018 10. McCormick, G.: Computability of global solutions to factorable nonconvex programs: part I-convex underestimating problems. Math. Program. 10, 147–175 (1976) 11. Moon, J.W., Moser, L.: On cliques in graphs. Isr. J. Math. 3(1), 23–28 (1965) 12. Rikun, A.: A convex envelope formula for multilinear functions. J. Glob. Optim. 10, 425–437 (1997) 13. Ryoo, H.S., Jang, I.Y.: MILP approach to pattern generation in logical analysis of data. Discret. Appl. Math. 157, 749–761 (2009) 14. Ryoo, H.S., Sahinidis, N.: Analysis of bounds for multilinear functions. J. Glob. Optim. 19(4), 403–424 (2001) 15. Yan, K., Ryoo, H.S.: 0–1 multilinear programming as a unifying theory for LAD pattern generation. Discret. Appl. Math. 218, 21–39 (2017)
16. Yan, K., Ryoo, H.S.: Strong valid inequalities for Boolean logical pattern generation. J. Glob. Optim. 69(1), 183–230 (2017) 17. Yan, K., Ryoo, H.S.: A multi-term, polyhedral relaxation of a 0-1 multilinear function for Boolean logical pattern generation. J. Glob. Optim. https://doi.org/10. 1007/s10898-018-0680-8. (In press)
Gaining or Losing Perspective

Jon Lee¹, Daphne Skipper², and Emily Speakman³

¹ University of Michigan, Ann Arbor, MI, USA, [email protected]
² U.S. Naval Academy, Annapolis, MD, USA, [email protected]
³ Otto-von-Guericke-Universität, Magdeburg, Germany, [email protected]
Abstract. We study MINLO (mixed-integer nonlinear optimization) formulations of the disjunction x ∈ {0} ∪ [l, u], where z is a binary indicator of x ∈ [l, u], and y "captures" x^p, for p > 1. This model is useful when activities have operating ranges, we pay a fixed cost for carrying out each activity, and costs on the levels of activities are strictly convex. One well-known concrete application (with p = 2) is mean-variance optimization (in the style of Markowitz). Using volume as a measure to compare convex bodies, we investigate a family of relaxations for this model, employing the inequality y z^q ≥ x^p, parameterized by the "lifting exponent" q ∈ [0, p − 1]. These models are higher-dimensional-power-cone representable, and hence tractable in theory. We analytically determine the behavior of these relaxations as functions of l, u, p and q. We validate our results computationally, for the case of p = 2. Furthermore, for p = 2, we obtain results on asymptotic behavior and on optimal branching-point selection.

Keywords: Mixed-integer nonlinear optimization · Volume · Integer · Relaxation · Polytope · Perspective · Higher-dimensional power cone
Introduction

Background. Our interest is in studying "perspective reformulations". This technique has been used in the presence of indicator variables: when an indicator is "off", a vector of decision variables is forced to a specific point, and when it is "on", the vector of decision variables must belong to a specific convex set. [6] studied such a situation where binary variables manage terms in a separable-quadratic objective function, with each continuous variable x being either 0 or in a positive interval (also see [4]). The perspective-reformulation approach (see [6]

J. Lee was supported in part by ONR grant N00014-17-1-2296 and LIX, l'École Polytechnique. D. Skipper was supported in part by ONR grant N00014-18-W-X00709. E. Speakman was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 314838170, GRK 2297 MathCoRe.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 387–397, 2020. https://doi.org/10.1007/978-3-030-21803-4_39
and the references therein) leads to very strong conic-programming relaxations, but not all MINLO (mixed-integer nonlinear optimization) solvers are equipped to handle these. So one of our interests is in determining when a natural and simpler non-conic-programming relaxation may be adequate. Generally, our view is that MINLO modelers and algorithm/software developers can usefully factor in analytic comparisons of relaxations in their work. d-dimensional volume is a natural analytic measure for comparing the size of a pair of convex bodies in R^d. [8] introduced the idea of using volume as a measure for comparing relaxations (for fixed-charge, vertex packing, and other relaxations). [10–13] extended the idea to relaxations of graphs of trilinear monomials on box domains. Following up on work of [7,14], [9] compared relaxations of graphical Boolean-quadric polytopes. [2,3] use the volume cut off as a measure for the strength of cuts. Our view of the current relevant convex-MINLO software environment is that it is very unsettled, with a lot to come. One of the best algorithmic options for convex MINLO is "outer approximation", but this is not usually appropriate when constraint functions are not convex (even when the feasible region of the continuous relaxation is a convex set). Even "NLP-based B&B" for convex MINLO may not be appropriate when the underlying NLP solver is presented with a formulation where a constraint qualification does not hold at likely optima. In some situations (like ours), the relevant convex sets can be represented as convex cones, thus handling the constraint-qualification issue, but introducing non-differentiability at likely optima. In this way of thinking, conic constraints are not well handled by general convex-MINLO software (like Knitro, Ipopt, Bonmin, etc.).
The only conic solver that handles integer variables (via B&B) is MOSEK, and then only quadratic cones, and "as long as they do not contain both quadratic objective or constraints and conic constraints at the same time". So not all of our work can be applied today, within the current convex-MINLO software environment, and so we see our work as forward looking.

Our Contribution and Organization. We study MINLO formulations of the disjunction x ∈ {0} ∪ [l, u], where z is a binary indicator of x ∈ [l, u], and y "captures" x^p, for p > 1 (see [1], for example). We investigate a family of relaxations for this model, employing the inequality y z^q ≥ x^p, parameterized by the "lifting exponent" q ∈ [0, p − 1]; we make the convention that 0^0 = 1 (relevant when z = 0 and q = 0). These models are higher-dimensional-power-cone representable, and hence tractable in theory. We bound our formulations using the linear inequality u^p z ≥ y, which is always satisfied at optimality (for the typical application where y replaces x^p in a minimization objective). In Sect. 1 we formally define the sets relevant to our study. For q = 0, we have the most naïve relaxation using y ≥ x^p. For q = 1, we have the naïve perspective relaxation using yz ≥ x^p. For q = p − 1, we get the true perspective relaxation using y z^{p−1} ≥ x^p, which gives the convex hull. Interestingly, this last fact seems to be only very-well known when p = 2, in
which case p − 1 = 1 and the naïve perspective relaxation is the true perspective relaxation. So some might think, even for p > 2, that q = 1 would give the convex hull, but this naïve perspective relaxation is not the strongest; we need to use q = p − 1 to get the convex hull. In Sect. 2, we present a formula for the volumes of all of these relaxations as a means of comparing them. In doing so, we quantify, in terms of l, u, p, and q, how much stronger the convex hull is compared to the weaker relaxations, and when, in terms of l and u, there is much to be gained at all by considering more than the weakest relaxation. Using our formula, and thinking of the baseline of q = 1, namely the naïve perspective relaxation, we quantify the impact of "losing perspective" (e.g., going to q = 0, namely the most naïve relaxation) and of "gaining perspective" (e.g., going to q = p − 1, namely the true perspective relaxation). Depending on l and u for a particular x (of which there may be a great many in a real model), we may adopt different relaxations based on the differences of the volumes of the various relaxation choices and on the solver environment. For p = 2, we obtain further results on asymptotic behavior and on optimal branching-point selection. Compared to earlier work on volume formulae and related branching-point selection relevant to comparing convex relaxations, our present results are the first involving convex sets that are not polytopes. Thus we demonstrate that we can get meaningful results that do not rely on triangulation of polytopes. In Sect. 3 we present some computational experiments (for p = 2) which bear out our theory, as we verify that volume can be used to determine which variables are more important to handle by perspective relaxation.

Notation. Throughout, we use boldface lower-case for vectors and boldface upper-case for matrices, vectors are column vectors, ‖·‖ indicates the 2-norm, and for a vector x, its transpose is indicated by x^T.
1 Definitions
For real scalars u > l > 0 and p > 1, we define

S_p := { (x, y, z) ∈ R^2 × {0, 1} : y ≥ x^p, uz ≥ x ≥ lz },

and, for 0 ≤ q ≤ p − 1, the associated relaxations

S_p^q := { (x, y, z) ∈ R^3 : y z^q ≥ x^p, uz ≥ x ≥ lz, 1 ≥ z ≥ 0, y ≥ 0 }.

Note that even though x^p − y z^q is not a convex function for q > 0 (even for p = 2, q = 1), the set S_p^q is convex. In fact, the set S_p^q is higher-dimensional-power-cone representable, which makes working with it appealing. Still, computationally handling higher-dimensional power cones efficiently is not a trivial matter, and we should not take it on without considering alternatives. These sets are unbounded in the increasing-y direction. This is rather inconvenient because we want to assess relaxations by computing their volumes. But
in applications, y is meant to model/capture x^p via the minimization pressure of an objective function. So for our purposes, we introduce the linear inequality u^p z ≥ y, which captures that z = 0 implies x^p = 0, and that z = 1 implies u^p ≥ x^p. For convenience, we write

S̄ := S ∩ { (x, y, z) ∈ R^3 : u^p z ≥ y }, for S ∈ {S_p, S_p^q}.

So, S̄_p^q is a relaxation of S̄_p. The following result, part of which is closely related to results in [1], is easy to establish.

Proposition 1. For p > 1 and q ∈ [0, p − 1], (i) S̄_p ⊆ S̄_p^q, (ii) S̄_p^q is a convex set, (iii) S̄_p^{q′} ⊆ S̄_p^q, for 0 ≤ q ≤ q′ ≤ p − 1, and (iv) conv(S̄_p) = S̄_p^{p−1}.
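The nesting in Proposition 1(iii) can be illustrated by random sampling. The sketch below (not from the paper; parameter values p = 3, u = 1, l = 0.4 are illustrative) tests membership in S̄_p^q directly from the defining inequalities, using the paper's convention 0^0 = 1, and checks that every sampled point of the true perspective relaxation (q = p − 1 = 2) also lies in the weaker relaxations q = 1 and q = 0.

```python
import random

def in_Sbar(x, y, z, p, q, u, l):
    """Membership test for the bounded relaxation S-bar_p^q (with 0^0 = 1)."""
    zq = 1.0 if (z == 0 and q == 0) else z ** q
    return (y * zq >= x ** p and u * z >= x >= l * z
            and 1 >= z >= 0 and y >= 0 and u ** p * z >= y)

p, u, l = 3.0, 1.0, 0.4
random.seed(0)
for _ in range(100000):
    x = random.uniform(0, u)
    y = random.uniform(0, u ** p)
    z = random.uniform(0, 1)
    # Part (iii): a larger lifting exponent q gives a smaller (tighter) set.
    if in_Sbar(x, y, z, p, 2.0, u, l):          # q' = p - 1 = 2: true perspective
        assert in_Sbar(x, y, z, p, 1.0, u, l)   # q = 1: naive perspective
        assert in_Sbar(x, y, z, p, 0.0, u, l)   # q = 0: most naive relaxation
```

The containment follows from z^q ≥ z^{q′} for z ∈ [0, 1] and q ≤ q′, which is exactly what the sampling confirms.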
2 Our Results

2.1 Volumes
Theorem 2. For p > 1 and 0 ≤ q ≤ p − 1,

vol(S̄_p^q) = [ (p^2 − pq + 3p − q − 1) u^{p+1} + 3 l^{p+1} − (p+1)(p−q+2) l u^p ] / [ 3 (p+1) (p−q+2) ].
Proof. Case 1: 0 < q ≤ p − 1. We proceed using standard integration techniques, and we begin by fixing the variable y and considering the corresponding 2-dimensional slice, R_y, of S̄_p^q. In the (x, z)-space, R_y is described by:

z ≥ x^{p/q} y^{−1/q}, (1)
z ≥ x/u, (2)
z ≥ y/u^p, (3)
z ≤ x/l, (4)
z ≤ 1, (5)
z ≥ 0. (6)
Inequality (6) is implied by (3) because y ≥ 0. Therefore, for the various choices of u, l, and y, the tight inequalities for R_y are among (1), (2), (3), (4), and (5). In fact, the region will always be described by either the entire set of inequalities (if y > l^p), or (1), (2), (3), and (4) (if y ≤ l^p). For an illustration of these two cases with p = 5 and q = 3, see Figs. 1, 2. To understand why these two cases suffice, observe that together (2) and (4) create a 'wedge' in the positive orthant. R_y is composed of this wedge intersected with {(x, z) ∈ R^2 : z ≥ x^{p/q} y^{−1/q}}, for y/u^p ≤ z ≤ 1. With a slight abuse of notation, based on context we use (k), for k = 1, 2, ..., 5, to refer both to the inequality defined above and to the 1-d boundary of the region it describes. Now consider the set of points formed by the wedge and the inequality z ≥ x^{p/q} y^{−1/q}. Curves (1) and (4) intersect at (0, 0) and at a = (x_a, z_a) := ((y/l^q)^{1/(p−q)}, (y/l^p)^{1/(p−q)}). Curves (1) and (2) intersect at (0, 0) and at b = (x_b, z_b) := ((y/u^q)^{1/(p−q)}, (y/u^p)^{1/(p−q)}). To understand the area that we are seeking to compute, we need to ascertain where (0, 0), a, and b fall relative to (3)
Fig. 1. 0 < q ≤ p − 1, y < l^p

Fig. 2. 0 < q ≤ p − 1, y > l^p
and (5), which bound the region y/u^p ≤ z ≤ 1. Note that the origin falls on or below (3), and because u > l, a is always above b (in the sense of higher value of z). We show that b must fall between lines (3) and (5). This is equivalent to y/u^p ≤ (y/u^p)^{1/(p−q)} = z_b ≤ 1. Now, we know y ≤ u^p, which implies y/u^p ≤ 1. From our assumptions on p and q, we also have 0 < 1/(p−q) ≤ 1. From this we can immediately conclude y/u^p ≤ (y/u^p)^{1/(p−q)} = z_b ≤ 1. Furthermore, given that a must be above b, we now have our two cases: a is either above (5) (if y > l^p), or on or below (5) (if y ≤ l^p). Using the observations made above, we can now calculate the area of R_y via integration. We integrate
over z, and the limits of integration depend on the value of y. If y ≤ l^p, then the area is given by the expression:

∫_{y/u^p}^{z_b} (uz − lz) dz + ∫_{z_b}^{z_a} ( (y z^q)^{1/p} − lz ) dz.

If y ≥ l^p, then the area is given by the expression:

∫_{y/u^p}^{z_b} (uz − lz) dz + ∫_{z_b}^{1} ( (y z^q)^{1/p} − lz ) dz.
Note that when y = l^p, these quantities are equal. Furthermore, when q = p − 1 (and we have the hull), the first integral in each sum is equal to zero. Integrating over y, we compute the volume of S̄_p^q as follows:
∫_0^{l^p} ( ∫_{y/u^p}^{(y/u^p)^{1/(p−q)}} (uz − lz) dz + ∫_{(y/u^p)^{1/(p−q)}}^{(y/l^p)^{1/(p−q)}} ( (y z^q)^{1/p} − lz ) dz ) dy
+ ∫_{l^p}^{u^p} ( ∫_{y/u^p}^{(y/u^p)^{1/(p−q)}} (uz − lz) dz + ∫_{(y/u^p)^{1/(p−q)}}^{1} ( (y z^q)^{1/p} − lz ) dz ) dy
= [ (p^2 − pq + 3p − q − 1) u^{p+1} + 3 l^{p+1} − (p+1)(p−q+2) l u^p ] / [ 3 (p+1) (p−q+2) ].
Case 2: q = 0. This case is similar to Case 1. However, here we need to ensure that we avoid division by zero. Consider a slice of the set S̄_p^0 for fixed y, again denoted R_y. In the (x, z)-space, R_y is described by inequalities (2)–(6) and the following inequality (which replaces (1)):

x ≤ y^{1/p}. (1′)

Similarly to before, we have that for the various choices of u, l, and y, R_y is described by either the entire set of inequalities (if y > l^p), or (1′), (2), (3), and (4) (if y ≤ l^p). The reason it suffices to consider the two cases is very similar to the case of q > 0: consider the triangle formed by the 'wedge' inequalities, i.e., (2) and (4), and the inequality x ≤ y^{1/p}. The vertices of this triangle are (0, 0), a = (x_a, z_a) := (y^{1/p}, y^{1/p}/l), and b = (x_b, z_b) := (y^{1/p}, y^{1/p}/u). Again, we want to understand where a and b fall in relation to the lines (3) and (5) describing the region y/u^p ≤ z ≤ 1, and we know that the origin always falls below or on the bottom line. Furthermore, we again have that a is above b (because u > l). We show that b must fall between the two lines (3) and (5). We know y ≤ u^p, which implies y/u^p ≤ 1. We also know 0 < 1/p < 1. From this we can conclude y/u^p ≤ y^{1/p}/u = z_b ≤ 1. We have the same situation as when q > 0: a will either be above both lines, or between the two; b will always be between the two lines.
Similarly to before, we can use this information to compute the volume of S̄_p^0:

vol(S̄_p^0) = ∫_0^{l^p} ( ∫_{y/u^p}^{y^{1/p}/u} (uz − lz) dz + ∫_{y^{1/p}/u}^{y^{1/p}/l} ( y^{1/p} − lz ) dz ) dy
+ ∫_{l^p}^{u^p} ( ∫_{y/u^p}^{y^{1/p}/u} (uz − lz) dz + ∫_{y^{1/p}/u}^{1} ( y^{1/p} − lz ) dz ) dy
= [ (p^2 + 3p − 1) u^{p+1} + 3 l^{p+1} − (p+1)(p+2) l u^p ] / [ 3 (p+1) (p+2) ].
Substituting q = 0 into the theorem statement gives this last expression.
We can now precisely quantify how much better the convex-hull perspective relaxation (q = p − 1) is compared to the most naïve relaxation (q = 0):

Corollary 3. For p > 1,

vol(S̄_p^0) − vol(S̄_p^{p−1}) = (p − 1)(u^{p+1} − l^{p+1}) / [ 3 (p+1) (p+2) ] = (u^3 − l^3)/36, for p = 2.
We can also precisely quantify how much better the convex-hull perspective relaxation (q = p − 1) is compared to the naïve perspective relaxation (q = 1):

Corollary 4. For p ≥ 2,

vol(S̄_p^1) − vol(S̄_p^{p−1}) = (p − 2)(u^{p+1} − l^{p+1}) / [ 3 (p+1)^2 ].

2.2 Asymptotics: The Case of p = 2
It is a direct consequence of our volume formula for p = 2, that in a natural asymptotic regime, the excess volume of the most naïve relaxation, above the volume of the true perspective relaxation, is not a vanishingly small part of the volume of the most naïve relaxation.

Corollary 5. For l = ku, with constant k ∈ [0, 1),

lim_{u→∞} [ vol(S̄_2^0) − vol(S̄_2^1) ] / vol(S̄_2^0) = (1 + k + k^2) / [ 3 (3 − k − k^2) ] ≥ 1/9.
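Corollary 5 can be checked numerically. A minimal sketch (not from the paper): specializing Theorem 2 to p = 2, with l = k·u both volumes scale like u^3, so the ratio is in fact independent of u and already equals the limiting value, which is minimized at k = 0 where it equals 1/9.

```python
def vol2(q, u, l):
    # Theorem 2 specialized to p = 2.
    return ((3 - q) * u ** 3 + l ** 3 - (4 - q) * l * u ** 2) / (3 * (4 - q))

for k in (0.0, 0.25, 0.5, 0.75, 0.9):
    for u in (1.0, 10.0, 1000.0):
        l = k * u
        ratio = (vol2(0, u, l) - vol2(1, u, l)) / vol2(0, u, l)
        target = (1 + k + k * k) / (3 * (3 - k - k * k))
        assert abs(ratio - target) < 1e-9   # ratio matches the limit formula
        assert ratio >= 1 / 9 - 1e-12       # and is bounded below by 1/9
```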
2.3 Branching-Point Selection: The Case of p = 2
"Branching-point selection" is a key algorithmic issue in sBB (spatial branch-and-bound), the main algorithm used in global optimization of "factorable formulations". [11] introduced the investigation of "branching-point selection" using volume as a measure. Their idea is to determine the point at which we can split
the domain of a variable, so that re-convexifying the two child relaxations yields the least volume. For S ∈ {S̄_2^1, S̄_2^0}, let v_S(x̂) be the sum of the volumes of the two pieces of S created by branching on x at x̂ ∈ [l, u]. Interestingly, the branching-point behavior of S̄_2^1 and S̄_2^0 are identical.

Theorem 6. For S ∈ {S̄_2^1, S̄_2^0}, v_S is strictly convex on [l, u], and its minimum is at x̂ = (l + √(l^2 + 3u^2))/3.

Additionally, this suggests biasing branching-point selection up from the common choice of mid-point branching:

Corollary 7. For S ∈ {S̄_2^1, S̄_2^0}, the optimal branching point is at least u/√3 ≈ 0.57735 u, which is achieved if and only if l = 0.
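The formula of Theorem 6 and the bound of Corollary 7 are easy to exercise directly. A minimal sketch (not from the paper; u = 2 is an illustrative value):

```python
import math
import random

def branch_point(l, u):
    # Optimal branching point from Theorem 6 (p = 2).
    return (l + math.sqrt(l * l + 3 * u * u)) / 3

# Corollary 7: the optimizer is at least u/sqrt(3), with equality iff l = 0.
u = 2.0
assert abs(branch_point(0.0, u) - u / math.sqrt(3)) < 1e-12
random.seed(2)
for _ in range(1000):
    l = random.uniform(1e-6, u)
    assert branch_point(l, u) > u / math.sqrt(3)
```

Since branch_point is strictly increasing in l, its minimum over l ∈ [0, u] is attained at l = 0, which is exactly the statement of Corollary 7.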
3 Computational Experiments: The Case of p = 2
We carried out experiments on a 16-core machine (running Windows Server 2012 R2): two Intel Xeon CPU E5-2667 v4 processors running at 3.20 GHz, with 8 cores each, and 128 GB of memory. We used the conic solver SDPT3 4.0 ([15]) under the Matlab "disciplined convex optimization" package CVX ([5]).

3.1 Separable Quadratic-Cost Knapsack Covering
Our first experiment is based on the following model, which we think of as a relaxation of the identical model having the constraints z_i ∈ {0, 1} for i = 1, 2, ..., n. The data c, f, a, l, u ∈ R^n and b ∈ R are all positive. The idea is that we have costs c_i on x_i^2, and x_i is either 0 or in the "operating range" [l_i, u_i]. We pay a cost f_i when x_i is nonzero.

min c^T y + f^T z
subject to:
a^T x ≥ b;
u_i z_i ≥ x_i ≥ l_i z_i, i = 1, ..., n;
u_i^2 z_i ≥ y_i ≥ x_i^2, i = 1, ..., n;
1 ≥ z_i ≥ 0, i = 1, ..., n.
For some of the i, we conceptually replace y_i ≥ x_i^2 with its perspective tightening y_i z_i ≥ x_i^2, y_i ≥ 0; really, we are using a conic solver, so we instead employ an SOCP representation. We do this for the choices of i that are the k highest according to a ranking of all i, 1 ≤ i ≤ n. We let k = n(j/15), with j = 0, 1, 2, ..., 15. Denoting the polytope with no tightening by Q and with tightening by P, we looked at three different rankings: descending values of vol(Q) − vol(P) = (u_i^3 − l_i^3)/36, ascending values of vol(Q) − vol(P), and random. For n = 30,000, we present our results in Fig. 3. As a baseline, we can see that if
we only want to apply the perspective relaxation for some pre-specified fraction of the i's, we get the best improvement in the objective value (thinking of it as a lower bound for the true problem with the constraints z_i ∈ {0, 1}) by preferring i with the largest value of u_i^3 − l_i^3. Moreover, most of the benefit is already achieved at much lower values of k than for the other rankings. [8,12] suggested that for a pair of relaxations P, Q ⊂ R^d, a good measure for evaluating Q relative to P might be vol(Q)^{1/d} − vol(P)^{1/d} (in our present setting, we have d = 3). We did experiments ranking by this, rather than the simpler vol(Q) − vol(P), and we found no significant difference in our results. This can be explained by the fact that ranking by either of these choices is very similar for our test set.

3.2 Mean-Variance Optimization
Next, we conducted a similar experiment on a richer model, though at a smaller scale. Our model is for a (Markowitz-style) mean-variance optimization problem (see [4,6]). We have n investment vehicles. The vector a contains the expected returns for the portfolio/holdings x. The scalar b is our requirement for the minimum expected return of our portfolio. Asset i has a possible range [l_i, u_i], and we limit the number of assets that we hold to κ. Variance is measured, as usual, via a quadratic which is commonly taken to have the form x^T (Q + Diag(c)) x, where Q is positive definite and c is all positive (see [4,6] for details on why this form is used in the application). Taking the Cholesky factorization Q = M M^T, we define w := M^T x, and introduce the scalar variable v. In this way, we arrive at the model:
Fig. 3. Separable-quadratic knapsack covering, n = 30, 000
min v + c^T y
subject to:
a^T x ≥ b; e^T z ≤ κ; w − M^T x = 0; v ≥ ‖w‖^2;
u_i z_i ≥ x_i ≥ l_i z_i, i = 1, ..., n;
u_i^2 z_i ≥ y_i ≥ x_i^2, i = 1, ..., n;
1 ≥ z_i ≥ 0, i = 1, ..., n;
w_i unrestricted, i = 1, ..., n.

Note: The inequality v ≥ ‖w‖^2 is correct; there is a typo in [6], where it is written as v ≥ ‖w‖. The inequality v ≥ ‖w‖^2, while not formulating a Lorentz (second-order) cone, may be re-formulated as an affine slice of a rotated Lorentz cone, or not, depending on the solver employed. Our results followed the same general trend seen in Fig. 3.
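The standard affine-slice reformulation mentioned in the note can be made concrete: v ≥ ‖w‖^2 holds exactly when ‖(2w, v − 1)‖ ≤ v + 1, which is a second-order-cone constraint. The sketch below (ours, not from the paper) verifies this equivalence numerically on random points, skipping the measure-zero boundary where floating-point ties could occur.

```python
import math
import random

def soc_form(v, w):
    # Second-order-cone representation of v >= ||w||^2:
    # || (2*w, v - 1) ||_2 <= v + 1.
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return math.hypot(2 * norm_w, v - 1) <= v + 1

random.seed(3)
for _ in range(10000):
    w = [random.uniform(-2, 2) for _ in range(4)]
    v = random.uniform(0.0, 20.0)
    s = sum(wi * wi for wi in w)
    if abs(v - s) > 1e-9:                 # avoid float ties on the boundary
        assert (v >= s) == soc_form(v, w)
```

The equivalence is elementary: 4‖w‖^2 + (v − 1)^2 ≤ (v + 1)^2 simplifies to 4‖w‖^2 ≤ 4v.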
References

1. Aktürk, M.S., Atamtürk, A., Gürel, S.: A strong conic quadratic reformulation for machine-job assignment with controllable processing times. Oper. Res. Lett. 37(3), 187–191 (2009)
2. Basu, A., Conforti, M., Di Summa, M., Zambelli, G.: Optimal cutting planes from the group relaxations. arXiv:1710.07672 (2018)
3. Dey, S., Molinaro, M.: Theoretical challenges towards cutting-plane selection. arXiv:1805.02782 (2018)
4. Frangioni, A., Gentile, C.: Perspective cuts for a class of convex 0–1 mixed integer programs. Math. Program. 106(2), 225–236 (2006)
5. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1, build 1123. http://cvxr.com/cvx (2017)
6. Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear programs with indicator variables. Math. Program. Ser. B 124, 183–205 (2010)
7. Ko, C.W., Lee, J., Steingrímsson, E.: The volume of relaxed Boolean-quadric and cut polytopes. Discret. Math. 163(1–3), 293–298 (1997)
8. Lee, J., Morris Jr., W.D.: Geometric comparison of combinatorial polytopes. Discret. Appl. Math. 55(2), 163–182 (1994)
9. Lee, J., Skipper, D.: Volume computation for sparse Boolean quadric relaxations. Discret. Appl. Math. (2017). https://doi.org/10.1016/j.dam.2018.10.038
10. Speakman, E., Lee, J.: Quantifying double McCormick. Math. Oper. Res. 42(4), 1230–1253 (2017)
11. Speakman, E., Lee, J.: On branching-point selection for trilinear monomials in spatial branch-and-bound: the hull relaxation. J. Glob. Optim. (2018)
12. Speakman, E., Yu, H., Lee, J.: Experimental validation of volume-based comparison for double-McCormick relaxations. In: Salvagnin, D., Lombardi, M. (eds.) CPAIOR 2017, pp. 229–243. Springer (2017)
13. Speakman, E.E.: Volumetric guidance for handling triple products in spatial branch-and-bound. Ph.D. thesis, University of Michigan (2017)
14. Steingrímsson, E.: A decomposition of 2-weak vertex-packing polytopes. Discret. Comput. Geom. 12(4), 465–479 (1994)
15. Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3 - a MATLAB software package for semidefinite programming. Optim. Methods Softw. 11, 545–581 (1998)
Game Equilibria and Transition Dynamics with Networks Unification

Alexei Korolev and Ilia Garmashov (National Research University "Higher School of Economics" Saint-Petersburg, Saint-Petersburg, Russia; [email protected], [email protected])
Abstract. In this paper we consider the following problem: what affects the Nash equilibrium amount of investment in knowledge when one complete graph joins another complete graph. The solution of this problem will allow us to understand exactly how game agents will behave when deciding whether to enter the other network, what conditions and externalities affect this decision, and how the level of the future equilibrium amount of investment in knowledge can be predicted.

Keywords: Network · Network game · Nash equilibrium · Externality · Productivity · Innovation cluster
1 Introduction

The processes of globalization, post-industrial development and digitalization of the economy make studying the role of innovative firms in world economic development extremely significant. In papers [1, 3] mathematical models of the international innovative economy are constructed, on the basis of which the behavior of innovative firms is analyzed. In particular, the authors of those papers consider an important topic: how firms realize their investment strategy in the development of knowledge, including outside their own region or country. The behavior of agents is determined by various externalities, which can have a completely different nature. The description of such secondary effects is one of the most important directions in network game theory that authors of different articles try to analyze (for example, [5, 6]). There is also another aspect of the question: how to structure and organize agents' behavior in the best way in constantly changing economic and social conditions. In [2], the authors take a new look at the system of organizing the actions of agent-innovators. It is important to take into account the impact (externalities) exerted on agents by the environment, including other network entities. The article [4] shows the necessity of creating regional innovative systems based on clusters. From this follows the relevance of a model description of the process of creating more extensive innovative clusters based on existing ones. In addition, there is a need to model the process of changing the Nash equilibrium investment values, as well as the search for new interior or corner ones.
© Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 398–406, 2020. https://doi.org/10.1007/978-3-030-21803-4_40
This article continues the study of Nash equilibria and their changes in the process of unification of complete graphs. However, this paper contains a number of new elements in comparison with previous studies. To begin with, we study the dynamic behavior of agents, not only by generalizing the simple two-period model of endogenous growth of Romer with production and externalities of knowledge (as in paper [7]), but also by using difference equations. Moreover, we assume that our agents are innovative companies that are interested in knowledge investments. The main content of the article is focused on the analysis of changes in the Nash equilibrium investment values, as well as the description of corner solutions. We study necessary and sufficient conditions, and possible limitations, for the appearance of new different equilibria. The main problem of this research is to study the differences in agents' behavior during the unification of networks of different sizes, and the relations between the amount of actors' knowledge investments and their productivity. To achieve this goal, the following objectives should be fulfilled: (1) to create a model which can describe agents' decisions on the amount of knowledge investment; (2) to find the equilibrium condition of this model that shows the optimal choices of each network agent; (3) to outline the relations between an agent's productivity, network size and value of knowledge investments.
2 Model Description

There is a network (undirected graph) with n nodes, i = 1, 2, ..., n; each node represents an agent. In period 1 each agent i possesses an initial endowment of good, e, and uses it partially for consumption in the first period of life, c_1^i, and partially for investment into knowledge, k_i:

c_1^i + k_i = e, i = 1, 2, ..., n. (1)
Investment immediately transforms one-to-one into knowledge, which is used in production of good for consumption in the second period, c_2^i. Preferences of agent i are described by the quadratic utility function:

U_i(c_1^i, c_2^i) = c_1^i (e − a c_1^i) + b_i c_2^i, (2)
where b_i > 0; a is a satiation coefficient, and b_i is a parameter characterizing the value of comfort and health in the second period of life compared to consumption in the first period. It is assumed that c_1^i ∈ [0, e], the utility increases in c_1^i, and is concave with respect to c_1^i. These assumptions are equivalent to the condition 0 < a < 1/2. Production in node i is described by the production function:
F(k_i, K_i) = B_i k_i K_i, B_i > 0, (3)
which depends on the state of knowledge in the i-th node, k_i, and on the environment, K_i; B_i is a technological coefficient. The environment is the sum of investments by the agent himself and her neighbors:

K_i = k_i + K̃_i, K̃_i = Σ_{j ∈ N(i)} k_j, (4)
where N(i) is the set of neighboring nodes of node i. We will denote the product b_i B_i by A_i and assume that a < A_i. Since an increase of either of the parameters b_i, B_i promotes an increase of second-period consumption, we will call A_i "productivity". We will assume that A_i ≠ 2a, i = 1, 2, ..., n. If A_i > 2a, we will say that the i-th agent is productive, and if A_i < 2a, we will say that the i-th agent is unproductive. Three ways of behavior are possible: agent i is called passive if she makes zero investment, k_i = 0 (i.e. consumes the whole endowment in period 1); active if 0 < k_i < e; hyperactive if she makes the maximally possible investment e (i.e. consumes nothing in period 1). Let us consider the following game. Players are the agents i = 1, 2, ..., n. Possible actions (strategies) of player i are values of investment k_i from the segment [0, e]. A Nash equilibrium with externalities (for shortness, equilibrium) is a profile of knowledge levels (investments) k_1, k_2, ..., k_n, such that each k_i is a solution of the following problem P(K_i) of maximization of the i-th player's utility given the environment K_i:

U_i(c_1^i, c_2^i) → max over c_1^i, c_2^i, k_i,
subject to: c_1^i ≤ e − k_i; c_2^i ≤ F(k_i, K_i); c_1^i ≥ 0, c_2^i ≥ 0, k_i ≥ 0, (5)

where the environment K_i is defined by the profile k_1, k_2, ..., k_n:

K_i = k_i + Σ_{j ∈ N(i)} k_j. (6)
The first two constraints of problem P(K_i) at the optimum point are evidently satisfied as equalities. Substituting into the objective function, we obtain a new function (payoff function):

V_i(k_i, K_i) = e^2 (1 − a) − k_i e(1 − 2a) − a k_i^2 + A_i k_i K_i. (7)
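The payoff function (7) and its partial derivative in k_i (used in (9) below) can be checked against each other numerically. A minimal sketch (ours, not from the paper; e = 1, a = 0.25, A = 0.6 are illustrative parameter values satisfying 0 < a < 1/2 and a < A), where the environment K is held fixed while differentiating:

```python
e, a, A = 1.0, 0.25, 0.6   # illustrative parameters (0 < a < 1/2, a < A)

def V(k, K):
    # Payoff function (7).
    return e * e * (1 - a) - k * e * (1 - 2 * a) - a * k * k + A * k * K

def D1V(k, K):
    # Partial derivative in the first argument, as in (9).
    return e * (2 * a - 1) - 2 * a * k + A * K

# Central-difference check of the derivative at a few points.
h = 1e-6
for k in (0.1, 0.4, 0.8):
    for K in (0.5, 1.0, 2.0):
        fd = (V(k + h, K) - V(k - h, K)) / (2 * h)
        assert abs(fd - D1V(k, K)) < 1e-6
```

Because V is quadratic in k, the central difference is exact up to floating-point rounding, so the tolerance is easily met.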
If all players' solutions are internal (0 < k_i < e, i = 1, 2, ..., n), i.e. all players are active, the equilibrium will be referred to as an inner equilibrium. Clearly, the inner equilibrium (if it exists for given values of parameters) is defined by the system

D_1 V_i(k_i, K_i) = 0, i = 1, 2, ..., n, (8)
or

D_1 V_i(k_i, K_i) = e(2a − 1) − 2a k_i + A_i K_i = 0, i = 1, 2, ..., n. (9)
We introduce the following notation. Regardless of the agent's type of behavior, the root of the equation

D_1 V_i(k_i, K_i) = (A_i − 2a) k_i + A_i K̃_i − e(1 − 2a) = 0 (10)

will be denoted by k̃_i^s. Thus,

k̃_i^s = [ e(2a − 1) + A_i K̃_i ] / (2a − A_i), (11)
where K̃_i is the pure externality of agent i. It is obvious that if agent i is active, then her investment will be equal to k̃_i^s in equilibrium. To analyze equilibria we need the following statement.

Proposition 1. ([5], Lemma 2.1 and Corollary 2.1) A set of agents' investment values (k_1, k_2, ..., k_n) can be an equilibrium only if for each i = 1, 2, ..., n it is true that

1. if k_i = 0, then K̃_i ≤ e(1 − 2a)/A_i;
2. if 0 < k_i < e, then k_i = k̃_i^s;
3. if k_i = e, then K̃_i ≥ e(1 − A_i)/A_i.
eð1 2aÞ ; n1 A 2a
k20 ¼
eð1 2aÞ n2 A 2a
The system (9) for inner equilibrium in joined network is
ð12Þ
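The symmetric values in (12) can be verified against the first-order condition (10). A minimal sketch (ours, not from the paper; e = 1, a = 0.25, A = 0.6 are illustrative): in a clique on n nodes each agent's pure externality is (n − 1)k, and (10) at the symmetric profile reduces to (nA − 2a)k = e(1 − 2a).

```python
e, a, A = 1.0, 0.25, 0.6   # illustrative parameters (0 < a < 1/2, a < A)

def inner_k(n):
    # Symmetric inner-equilibrium investment in a clique on n nodes, as in (12).
    return e * (1 - 2 * a) / (n * A - 2 * a)

for n in (2, 3, 5, 10):
    k = inner_k(n)
    K_tilde = (n - 1) * k                   # pure externality of each agent
    # First-order condition (10) must hold at the symmetric profile:
    assert abs((A - 2 * a) * k + A * K_tilde - e * (1 - 2 * a)) < 1e-12
    assert 0 < k < e                        # the profile is indeed interior
```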
402
A. Korolev and I. Garmashov
8
>
> : t þ 1 ðn1 1ÞA t n2 A t Þ A t k3 þ eð2a1 k3 ¼ 2a k1 þ 2a k2 þ 2a 2a :
where t ¼ 0; 1; 2. . . Characteristic equation for this system is
A t 2a k3
þ
ð18Þ
Game Equilibria and Transition Dynamics
k3 þ k2
ðn1 þ n2 ÞA ðn1 1Þn2 A2 ðn1 1Þn2 A3 ¼ 0: k 2a 4a2 8a3
403
ð19Þ
Definition 2. The equilibrium is called dynamically stable if, after a small deviation of one of the agents from the equilibrium, dynamics starts which returns the equilibrium back to the initial state. In the opposite case the equilibrium is called dynamically unstable.
5 Network Dynamics Model of Net Unification To find the eigenvalues of difference equations (18) in general we need to impose the restrictions: n2 ¼ n1 1 ¼ n. Then the Eq. (19) takes the form nA k 2a 0 nA
nA 2a
2a
k nA 2a
hence k1;2
ðn þ 1ÞA ¼ 4a
nA ðn þ 1ÞA nA2 2 k k k 2 ¼ 0; ¼ 2a 2a 4a k
A 2a A 2a
0 A 2a
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðn þ 1Þ2 A2 nA2 A ¼ n þ 1 n2 þ 6n þ 1 ; þ 16a2 4a2 4a
k3 ¼
ð20Þ
nA : 2a ð21Þ
In this case inner equilibrium in joined network will be k1 ¼ k2 ¼
nA2
2aeð1 2aÞ ; þ 2aðn þ 1ÞA 4a2
k3 ¼
eð1 2aÞðnA þ 2aÞ : nA2 þ 2aðn þ 1ÞA 4a2
Let us find the eigenvectors 8 >
A2a : nA nA 2a x1 þ 2a x2 þ 2a k x3 ¼ 0:
Thus, if k 6¼ nA 2a , then x1 ¼ x2 ¼ x, x3 ¼
ðknA 2a Þx A 2a
ð22Þ
ð23Þ
, and if k ¼ nA 2a , then x3 ¼ 0, x1 ¼ x2 .
Hence we may choose 0
1 1 A e1 ¼ @ p1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ; 1 2 þ 6n þ 1 1 n n 2 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi A n þ 1 n2 þ 6n þ 1 corresponding to k1 ¼ 4a
ð24Þ
404
A. Korolev and I. Garmashov
x3 ¼
A 4a
nþ1
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi nA n2 þ 6n þ 1 2a A 2a
¼
(we supposed x ¼ x1 ¼ x2 ¼ 1) ¼
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 1 n þ 1 n2 þ 6n þ 1 n ¼ 1 n n2 þ 6n þ 1 ; 2 2 0 1 1 A e2 ¼ @ p1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ; 1 2 þ 6n þ 1 1 n þ n 2
ð25Þ
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi A corresponding to k2 ¼ 4a n þ 1 þ n2 þ 6n þ 1 , 0
1 1 e3 ¼ @ 1 A; 0
ð26Þ
corresponding to k3 ¼ nA 2a . Thus, dynamics in joined network is described with following vector equation 0
1 0 1 k1 k1t @ k2t A ¼ C1 kt e1 þ C2 kt e2 þ C3 kt e3 þ @ k2 A 1 2 3 k3t k3
ð27Þ
The constants \(C_1\), \(C_2\), \(C_3\) can be found from the initial conditions. Before the unification both networks were in symmetric inner equilibria:
\[
k_1^0=k_3^0=\frac{e(1-2a)}{(n+1)A-2a},\qquad k_2^0=\frac{e(1-2a)}{nA-2a}. \tag{28}
\]
Hence at \(t=0\) we receive the following equations:
\[
\begin{cases}
\dfrac{e(1-2a)}{(n+1)A-2a}=C_1+C_2+C_3+\dfrac{2ae(1-2a)}{nA^2+2a(n+1)A-4a^2},\\[1mm]
\dfrac{e(1-2a)}{nA-2a}=C_1+C_2-C_3+\dfrac{2ae(1-2a)}{nA^2+2a(n+1)A-4a^2},\\[1mm]
\dfrac{e(1-2a)}{(n+1)A-2a}=\frac12\left(1-n-\sqrt{n^2+6n+1}\right)C_1+\frac12\left(1-n+\sqrt{n^2+6n+1}\right)C_2+\dfrac{e(1-2a)(nA+2a)}{nA^2+2a(n+1)A-4a^2}.
\end{cases} \tag{29}
\]
We add the first two equations of the previous system and obtain the following system of two equations defining \(C_1\) and \(C_2\):
\[
\begin{cases}
C_1+C_2=\dfrac{e(1-2a)\left[(2n+1)A-4a\right]}{2\left[(n+1)A-2a\right]\left(nA-2a\right)}-\dfrac{2ae(1-2a)}{nA^2+2a(n+1)A-4a^2}>0,\\[1mm]
\frac12\left(1-n-\sqrt{n^2+6n+1}\right)C_1+\frac12\left(1-n+\sqrt{n^2+6n+1}\right)C_2=\dfrac{e(1-2a)}{(n+1)A-2a}-\dfrac{e(1-2a)(nA+2a)}{nA^2+2a(n+1)A-4a^2}<0.
\end{cases} \tag{30}
\]
It is easy to check that the right-hand side of the first equation is positive and the right-hand side of the second equation is negative. It is clear that \(\lambda_2\) is the largest eigenvalue in absolute value, and it is positive, as are all the components of the eigenvector \(e_2\). Hence, the nature of the transition process is determined by the sign of the constant \(C_2\). Further, the sign of \(C_2\) is defined by the sign of the following expression:
\[
\tilde D_2=2\left(k_3^0-k_3^\ast\right)+\frac12\left(\sqrt{n^2+6n+1}+n-1\right)\left(k_1^0+k_2^0-k_1^\ast-k_2^\ast\right), \tag{31}
\]
where \(k_1^0\), \(k_2^0\), \(k_3^0\) are the initial investment values.
If \(\tilde D_2\) is positive, then after the transition process the network passes to the corner equilibrium where all the agents are hyperactive (\(k_i=e\)). If \(\tilde D_2\) is negative, then after the transition process the network passes to the corner equilibrium where all the agents are passive (\(k_i=0\)). It is easy to see that in the case under consideration, when both networks were in inner equilibria, for any parameters with \(n>1\) we have \(\tilde D_2>0\) and the network passes to the corner equilibrium where all the agents are hyperactive. If \(n=1\) (a dyad connects to a single agent), then the behavior of the network in the transition process depends on the ratio of \(A\) and \(a\). Namely, if \(A/a\ge\sqrt{96+32\sqrt2}-4\), then the network passes to the corner equilibrium where all the agents are hyperactive; if \(A/a<\sqrt{96+32\sqrt2}-4\), then the network passes to the corner equilibrium where all the agents are passive. So it is obvious (by Cramer's rule) that \(C_1>0\). Comparing the first equation of system (29) with the first equation of system (30), it is obvious that \(C_3>0\).
Let us check that the corner solution where \(k_1=k_2=k_3=e\) is an equilibrium:
1. \(D_1V_1(k_1,K_1)|_{k_1=k_2=k_3=e}=-2ae+Ane-e(1-2a)=Ane-e\ge0\) if \(A\ge\frac1n\),
2. \(D_1V_2(k_2,K_2)|_{k_1=k_2=k_3=e}=-2ae+Ane-e(1-2a)=Ane-e\ge0\) if \(A\ge\frac1n\),
3. \(D_1V_3(k_3,K_3)|_{k_1=k_2=k_3=e}=-2ae+A\cdot2ne-e(1-2a)=2Ane-e\ge0\) if \(A\ge\frac1{2n}\),
corresponding to Corollary 2.4 in [5].
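The cancellation used in items 1–3 above (the \(-2ae\) and \(+2ae\) terms offset) and the numeric value of the \(n=1\) threshold can be checked directly; the sample parameter values below are ours, chosen only for illustration:

```python
import math

# Hypothetical sample values for a quick numeric sanity check.
a, e, n, A = 0.3, 1.0, 2, 0.8

# Marginal payoff at the all-hyperactive corner, as written in items 1-3:
lhs_agents12 = -2 * a * e + A * n * e - e * (1 - 2 * a)
lhs_agent3 = -2 * a * e + A * 2 * n * e - e * (1 - 2 * a)

# The -2ae and +2ae terms cancel, leaving Ane - e and 2Ane - e.
assert math.isclose(lhs_agents12, A * n * e - e)
assert math.isclose(lhs_agent3, 2 * A * n * e - e)

# Threshold for the n = 1 (dyad plus single agent) case:
threshold = math.sqrt(96 + 32 * math.sqrt(2)) - 4
assert 7.8 < threshold < 8.0
```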
6 Conclusion

In this paper, we have described the process of change of the game equilibrium during graph unification using a dynamics model. We have highlighted the significance of the role of productivity, which influences the agents' behavior. Moreover, we have determined the importance of the network sizes, which also affect the decisions that the agents take during the unification process.
We believe that this article offers a base model of game-equilibrium change that can be improved by increasing the number of parameters and by modifying the graph type to incomplete nets or non-oriented graphs.

Acknowledgement. The research is supported by the Russian Foundation for Basic Research (project 17-06-00618).
References

1. Alcácer, J., Chung, W.: Location strategies and knowledge spillovers. Manage. Sci. 53(5), 760–776 (2007)
2. Breschi, S., Lissoni, F.: Knowledge spillovers and local innovation systems: a critical survey. Ind. Corp. Change 10(4), 975–1005 (2001)
3. Chung, W., Alcácer, J.: Knowledge seeking and location choice of foreign direct investment in the United States. Manage. Sci. 48(12), 1534–1554 (2002)
4. Cooke, P.: Regional innovation systems, clusters, and the knowledge economy. Ind. Corp. Change 10(4), 945–974 (2001)
5. Jaffe, A.B., Trajtenberg, M., Henderson, R.: Geographic localization of knowledge spillovers as evidenced by patent citations. Q. J. Econ. 108(3), 577–598 (1993)
6. Katz, M.L., Shapiro, C.: Network externalities, competition, and compatibility. Am. Econ. Rev. 75(3), 424–440 (1985)
7. Matveenko, V.D., Korolev, A.V.: Network game with production and knowledge externalities. Contrib. Game Theory Manag. 8, 199–222 (2015)
8. Matveenko, V.D., Korolev, A.V.: Knowledge externalities and production in network: game equilibria, types of nodes, network formation. Int. J. Comput. Econ. Econ. 7(4), 323–358 (2017)
9. Matveenko, V., Korolev, A., Zhdanova, M.: Game equilibria and unification dynamics in networks with heterogeneous agents. Int. J. Eng. Bus. Manag. 9, 1–17 (2017)
Local Search Approaches with Different Problem-Specific Steps for Sensor Network Coverage Optimization

Krzysztof Trojanowski and Artur Mikitiuk(B)

Cardinal Stefan Wyszyński University, Warsaw, Poland
{k.trojanowski,a.mikitiuk}@uksw.edu.pl
Abstract. In this paper, we study the relative performance of local search methods used for solving the Maximum Lifetime Coverage Problem (MLCP). We consider nine algorithms obtained by swapping problem-specific major steps between three local search algorithms we proposed earlier: LSHMA, LSCAIA, and LSRFTA. A large set of tests carried out with the benchmark data set SCP1 showed that the algorithm based on the hypergraph model approach (HMA) is the most effective. The results of the other algorithms divide them into two groups: effective ones and weak ones. The findings expose the strengths and weaknesses of the problem-specific steps applied in the local search methods.
Keywords: Maximum lifetime coverage problem · Local search · Perturbation operators

1 Introduction
© Springer Nature Switzerland AG 2020. H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 407–416, 2020. https://doi.org/10.1007/978-3-030-21803-4_41

Wireless sensor networks are the subject of many research projects in which an important issue is the maximization of the time during which the system fulfills its tasks, namely the network lifetime. When a set of immobile sensors with a limited battery capacity is randomly distributed over an area to monitor a set of points of interest (POI) and the number of sensors is significant, the monitoring ranges of the majority of sensors overlap. Moreover, not all POIs must be monitored all the time. In many applications, it is sufficient to control 80 or 90% of the POIs at any given time. This percentage of POIs is called the required level of coverage. Thus, not all sensors must be active all the time. Turning off some of them saves their energy and allows the network lifetime to be extended. In this paper, we study the relative performance of local search strategies used to solve the problem of network lifetime maximization. Such an approach consists of three major problem-specific steps: finding any possible schedule, that is, an initial problem solution; using a perturbation procedure to obtain its neighbor, i.e., a solution close to the original one; and refining this neighbor solution. If the refined neighbor is better than its ancestor, it takes the place
of the current best–found schedule. The algorithm repeats the steps of neighbor generation and replacement until some termination condition is satisfied. Earlier, in [8–10] we have proposed three local search algorithms to solve the problem in question. Each of these algorithms employs a different method to obtain an initial solution and a different perturbation procedure to get a neighbor solution. Moreover, each of these algorithms refines the neighbor with the method used to get the initial solution. Due to the regular and universal structure of the three optimization algorithms, one can easily create new ones by swapping selected problem–specific steps between them. In this paper, we construct a group of local search algorithms based on the three proposed earlier. We also evaluate their performance experimentally. The paper is organized as follows. Related work is briefly discussed in Sect. 2. Section 3 defines the Maximum Lifetime Coverage Problem (MLCP) formally. The local search approach is introduced in Sect. 4. Section 5 describes our experiments with local search algorithms for MLCP. Our conclusions are given in Sect. 6.
2 Related Work
The problem of maximization of the sensor network lifetime has been intensively studied for the last two decades. There are many variants of this problem, and various strategies have been employed to solve them. More about these strategies can be found in, e.g., the monograph [11] on the subject. More recently, modern heuristic techniques have also been applied to this problem. One can find, e.g., papers on schedule optimization based on evolutionary algorithms [1,4], local search with problem–specific perturbation operators [8–10], simulated annealing [5,7], particle swarm optimization [13], whale algorithm [12], graph cellular automata [6] or other problem–specific heuristics [2,3,14].
3 Maximum Lifetime Coverage Problem (MLCP)
We assume that NS immobile sensors with a limited battery capacity are randomly deployed over an area to monitor NP points of interest (POI). All sensors have the same sensing range rsens and the same fully charged batteries. We use a discrete-time model where a sensor is either active or asleep during every time slot. Every sensor consumes one unit of energy per time unit for its activity, while in the sleeping state the energy consumption is negligible. The battery capacity allows a sensor to be active during Tbatt time steps (consecutive or not). The assumptions mentioned above give a simplified model, of course. In real life, effective battery capacity depends on various factors, such as temperature, and is hard to predict. Frequently turning the battery on and off shortens its lifetime. In this research, we omit such problems and assume that neither the temperature nor the sensor activity schedule influences the battery parameters. An active sensor monitors all POIs within its sensing range. We assume that every POI can be
monitored by at least one sensor. For effective monitoring, it is sufficient to control just a percentage cov of all POIs (usually 80–90%). To avoid redundancy in the coverage of POIs, we want the percentage of POIs being monitored at any given time not to exceed cov by more than a tolerance factor δ (usually 2–5%). In the real world, sensors also need to communicate with each other about their findings, and turning off some sensors can affect this process. A sensor whose monitoring area overlaps the areas of its neighbor sensors may still be necessary to ensure connectivity in the communication graph. However, in the model under consideration, communication between sensors is not a problem: energy consumption for communication is negligible, and active sensors are always able to communicate, even if some sensors are in a sleeping state. Summing up, we aim to provide a schedule of sensor activity giving a sufficient level of coverage in every time step for as long as possible. Problems from this class are called Maximum Lifetime Coverage Problems (MLCP).
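To make the coverage level concrete, here is a minimal sketch (function and variable names are ours, not from the paper) that computes the fraction of POIs covered by a set of active sensors:

```python
import math

def coverage(active_sensors, pois, r_sens):
    """Fraction of POIs within distance r_sens of at least one active sensor."""
    covered = 0
    for px, py in pois:
        if any(math.hypot(px - sx, py - sy) <= r_sens for sx, sy in active_sensors):
            covered += 1
    return covered / len(pois)

sensors = [(0.0, 0.0), (2.0, 0.0)]
pois = [(0.5, 0.0), (2.5, 0.0), (5.0, 5.0)]
# Two of the three POIs lie within the sensing range of 1 unit.
assert abs(coverage(sensors, pois, 1.0) - 2 / 3) < 1e-12
```

A slot of a schedule is then feasible when this fraction lies in the band [cov, cov + δ] required above.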
4 General Scheme of the Local Search
In [8–10] three algorithms have been proposed: LSHMA, LSCAIA and LSRFTA. Each of them is based on the same scheme given in Algorithm 1. They differ from each other in three steps: the initialization step (line 1) and the two steps that form the main parts of the neighborhood function (lines 3 and 4). This function perturbs the current schedule to obtain its neighbor (line 3) and then refines the neighbor (line 4), hoping to get a better schedule than the current one.
Algorithm 1. Local Search (for the schedule maximization context)
1: Initialize x ∈ D                {Step #1: generate an initial solution x}
2: repeat
3:   x′ = modify(x)                {Step #2: create a neighbor – modification of x}
4:   x″ = refine(x′)               {Step #3: create a neighbor – improvement or repair of x′}
5:   if F(x″) > F(x) then
6:     x = x″                      {the current solution is replaced by its neighbor}
7: until termination condition met
8: return x
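A minimal executable sketch of the scheme of Algorithm 1 (the toy objective and the operator implementations are ours, for illustration only):

```python
import random

def local_search(init, modify, refine, score, iters=1000, seed=1):
    """Generic scheme of Algorithm 1: keep a refined neighbor only if it is better."""
    rng = random.Random(seed)
    x = init(rng)
    for _ in range(iters):
        x2 = refine(modify(x, rng))      # steps #2 and #3
        if score(x2) > score(x):
            x = x2                       # the neighbor replaces the current solution
    return x

# Toy maximization problem: find the integer v maximizing -(v - 7)^2.
best = local_search(
    init=lambda rng: 50,                 # deliberately poor starting point
    modify=lambda v, rng: v + rng.choice([-1, 1]),
    refine=lambda v: v,                  # no repair needed in this toy setting
    score=lambda v: -(v - 7) ** 2,
)
assert best == 7
```

In the MLCP setting, x is a schedule, F is its length, and the three problem-specific steps are the ones swapped between LSHMA, LSCAIA and LSRFTA.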
When step #1 is over, and the initialization procedure returns a new schedule, in almost every case a small set of sensors retains a little energy in their batteries. Even if we turn them all on, they will not provide a sufficient level of POI coverage. Therefore, no feasible slot can be created using these sensors. Perturbation operators make use of this set. A perturbation operator builds a neighbor schedule in two steps (lines 3 and 4). First, the operator modifies the input schedule to make the set of available working sensors larger. In the second step, it builds slots based on these sensors. Eventually, the new list of slots should be longer or at least as long as the list in the input schedule.
The first step of the perturbation operator may follow two opposite strategies. In the first one, the operator turns off selected sensors in the schedule. In [8] a single slot is chosen randomly and removed entirely from the schedule. Thus, all the active sensors from this slot recover one portion of energy. In [10], for each of the slots, the sensors to be turned off are chosen randomly with a given probability. Simulations show that even for a minimal probability like, for example, 0.0005, the number of slots with an unsatisfied coverage level is much larger than one when the procedure is over. Therefore, this perturbation is much stronger than the previous one, because all such invalid slots are removed immediately from the schedule. The second strategy [9] starts with activation of sensors from the pool of the remaining working sensors in random slots of the schedule. For each of the selected slots we draw a sensor from the pool randomly, but for faster improvement, we activate only sensors which increase the level of coverage in the slot. Precisely, for selected slots, we choose a sensor randomly from the pool and then check if its coverage of POIs is fully redundant with any of the sensors already active in this slot. If yes, activation of this sensor is pointless because the slot coverage level does not change. In this case, the selected sensor goes back to the pool, and we try the same procedure of sensor selection and activation with the next slot. When the pool is empty, that is, all the remaining working sensors have been activated, in the modified slots we trim the sets of active sensors so that the coverage level is just a bit above the requested threshold. The sensors saved in this way retain energy and join a new set of working sensors. In the second step of the perturbation operator, we assume that the new set of working sensors is large enough to provide a satisfying level of coverage, so we apply the initialization procedure from step #1.
The procedure creates slots one by one and at the same time decreases the energy in the batteries according to the sensor activities defined in the subsequent new slots. This scheme is the same in HMA, RFTA, and CAIA (albeit they differ in details). Hence, the non-empty schedule and the set of working sensors may successfully represent input data for the initialization procedure called in the second step of the perturbation operator. Eventually, we get three variants for each of the three steps. When we swap these variants between the LS algorithms, we can obtain twenty-seven versions of LS. Let us name these versions of LS according to the origin of the three steps. For example, the notation [HMA, HMA, HMA] represents a Local Search algorithm where all three steps are like in LSHMA, that is, the original, unmodified version of LSHMA. [HMA, RFTA, HMA] represents the case where the initialization and the refine steps come from LSHMA, but the modification comes from LSRFTA.
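The combinatorics of this naming scheme can be sketched as follows (illustrative code of ours):

```python
from itertools import product

methods = ("HMA", "CAIA", "RFTA")

# One choice per problem-specific step: initialization, modification, refinement.
versions = list(product(methods, repeat=3))
assert len(versions) == 27
assert ("HMA", "RFTA", "HMA") in versions  # init and refine from HMA, modify from RFTA

# With the initialization fixed (as in the experiments below), only the two
# perturbation substeps vary, giving nine versions.
perturbations = list(product(methods, repeat=2))
assert len(perturbations) == 9
```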
5 Experiments
The experimental part of the research consists of experiments with new versions of LS. For fair comparisons, all the tested versions of LS should start with the
same initial solutions. The low quality of the initial schedules creates an opportunity to show the efficiency of the compared optimization algorithms. HMA returns the longest schedules, which are hard to improve; therefore it is not taken into account. From the remaining two procedures, we selected CAIA to generate initial schedules for each of the problem instances. CAIA represents the initialization step of the compared LS versions in every case and, what is more important, the main loops of the algorithms begin optimization from the same starting points assigned to the instances. Thus, just the main loop, that is, precisely the perturbation operator of the algorithm, may vary in the subsequent versions of LS. So, in the further text, we label the LS versions according to the construction of just the perturbation operator, and the symbol for the method used in the initialization procedure is omitted. The full list of considered versions of LS is as follows: [HMA, HMA], [HMA, RFTA], [HMA, CAIA], [RFTA, HMA], [RFTA, RFTA], [RFTA, CAIA], [CAIA, HMA], [CAIA, RFTA], and [CAIA, CAIA]. In every case, the loop has a limit of 500 iterations. For a fair evaluation of algorithm efficiency, we should compare the lengths of the obtained schedules with the optimal schedules for each of the instances. Unfortunately, optimal solutions of the instances are unknown, and the complexity of these problems makes them impossible to solve by an exhaustive search in a reasonable time. Therefore, to obtain sub-optimal schedules, we did a set of experiments with different versions of LS for all instances. All these versions employed HMA as the initialization step, hoping that in this way we maximize the chances of getting solutions in close vicinity of the optimum. The lengths of the best-obtained schedules represent reference values in further evaluations of the percentage quality of schedules.

5.1 Benchmark SCP1
For our experiments, we used the set of eight test cases SCP1 proposed earlier [8–10]. In all cases, there are 2000 sensors with a sensing range rsens of one unit (this is an abstract unit, not one of the standard units of length). In these test cases, POIs form nodes of a rectangular or a triangular grid. The area under consideration is a square with possible side sizes: 13, 16, 19, 22, 25, and 28 units. The distance between POIs grows together with the side size of the area. This gives us similar numbers of POIs in all test cases. The POI distribution should not be regular; therefore, about 20% of the nodes in the grid do not get POIs. A grid node has a POI only if a randomly generated value from the range [0, 1] is less than 0.8. Thus, instances of the same test case differ in the number of POIs, from 199 to 240 for the triangular grid and from 166 to 221 for the rectangular grid. Either a random generator or a Halton generator is the source of the sensor localization coordinates. For every test case, a set of 40 instances has been generated. The reader is referred to [8–10] for a more detailed description of the benchmark SCP1. In our experiments, we have assumed cov = 80% and δ = 5%.
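The POI placement rule described above can be sketched as follows (an illustrative reimplementation with our own names, step size and seed, not the benchmark's actual generator):

```python
import random

def generate_pois(side, step, p_poi=0.8, rng=None):
    """Place a POI at each rectangular-grid node with probability p_poi."""
    rng = rng or random.Random(42)
    pois = []
    n_nodes = 0
    x = 0.0
    while x <= side:
        y = 0.0
        while y <= side:
            n_nodes += 1
            if rng.random() < p_poi:
                pois.append((x, y))
            y += step
        x += step
    return pois, n_nodes

pois, n_nodes = generate_pois(side=13, step=0.85)
# Roughly 80% of the grid nodes should carry a POI.
assert 0.6 * n_nodes < len(pois) < 0.95 * n_nodes
```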
5.2 Overall Mean Percentage Quality
Schedule length is the primary output parameter of the experiments. However, the optimal schedule lengths may differ between instances, so straight comparisons of the schedule lengths may be misleading. Therefore, for each of the schedules returned by LS, we calculated its percentage quality relative to the best-known schedule lengths. This kind of normalization allows comparing the efficiency of LS versions over different classes of problems. Table 1 shows mean, min and max percentage qualities of the best-found schedules returned by the LS algorithms for each of the five values of Tbatt from 10 to 30.

Table 1. Mean, min and max percentage qualities of the best-found schedules returned by the LS algorithms for each of the five values of Tbatt from 10 to 30. Codes in column headers: C – CAIA, H – HMA, R – RFTA, e.g., HR represents the version [HMA, RFTA]; init – qualities of the initial schedules generated by CAIA

Tbatt        Init   HH     HR     CH     HC     RH     CR     RR     CC     RC
10   Mean    53.94  96.75  94.55  94.51  93.56  57.37  55.18  54.70  54.55  54.49
     Min     52.18  94.21  92.26  91.51  89.59  54.72  53.36  52.87  52.79  52.80
     Max     56.04  98.72  96.73  98.10  96.23  61.35  57.07  56.74  56.60  56.60
15   Mean    53.99  97.01  94.77  94.31  93.84  56.87  55.44  54.70  54.50  54.46
     Min     52.35  95.01  92.56  91.26  90.72  54.41  53.49  52.98  52.83  52.79
     Max     55.96  99.02  96.93  97.37  96.42  59.79  57.28  56.65  56.47  56.38
20   Mean    53.89  96.95  94.76  94.08  93.83  56.30  55.60  54.56  54.32  54.28
     Min     52.37  94.60  92.70  91.01  91.15  54.16  53.92  53.07  52.83  52.73
     Max     56.04  98.91  96.77  97.14  96.06  59.04  57.80  56.68  56.40  56.37
25   Mean    54.10  97.13  94.92  94.22  94.06  56.34  56.05  54.78  54.48  54.44
     Min     52.37  94.98  92.76  91.13  91.08  54.18  54.14  53.01  52.75  52.73
     Max     55.87  98.95  96.92  97.03  96.14  58.62  58.13  56.65  56.22  56.20
30   Mean    53.86  97.02  94.73  93.69  93.80  55.86  56.07  54.55  54.21  54.17
     Min     52.29  95.19  92.77  90.91  91.31  53.68  54.29  52.93  52.58  52.55
     Max     55.85  98.99  97.00  96.77  96.19  58.33  58.17  56.59  56.24  56.20
Concerning effectiveness, one can divide the LS versions into three groups: weak ones, effective ones, and the master approach, which is [HMA, HMA]. The group of effective ones consists of [HMA, RFTA], [CAIA, HMA] and [HMA, CAIA]. The remaining LS versions belong to the group of weak ones. Thus, all three approaches using the perturbation method from HMA result in obtaining a relatively good schedule, no matter what algorithm is used in the last step to repair or refine the schedule. One could ask why [CAIA, HMA] is an effective approach while [RFTA, HMA] is a weak one. The probable reason is that the perturbation operator used in CAIA removes from the original schedule many more slots
than the one used in RFTA. Thus, the number of sensors available again is higher, and they have more energy in their batteries than in the case of the RFTA perturbation operator. Having a broader set of available sensors (or even the same set but with more energy), the efficient algorithm HMA can extend a shorter input schedule much more than in the opposite case, when it gets as input a longer schedule and a smaller set of available sensors obtained from the perturbation used by RFTA.

5.3 Lengths of Schedules
Mean, min and max lengths of schedules returned by the best representatives of the three groups are presented in Tables 2 ([HMA, HMA]), 3 ([HMA, RFTA]) and 4 ([RFTA, HMA]). One can see that the individual results in Table 2 are often even 5–7% better than the corresponding results in Table 3. However, in some cases results in Table 3 are slightly (less than 1%) better than those in Table 2. The individual results in Table 4 are always much worse than the corresponding

Table 2. Mean, min and max lengths of schedules returned by the version [HMA, HMA] for each of the eight test cases in SCP1 and for five values of Tbatt from 10 to 30

No  Tbatt  Mean     Min   Max   No  Tbatt  Mean    Min  Max
1   10     357.02   348   366   5   10     173.20  167  180
    15     537.92   527   551       15     261.68  250  271
    20     718.58   703   734       20     350.55  340  361
    25     899.67   880   923       25     438.10  425  453
    30     1080.38  1037  1103      30     529.73  510  548
2   10     372.70   366   379   6   10     139.80  136  142
    15     561.77   551   576       15     211.40  208  215
    20     749.17   735   766       20     282.65  272  287
    25     937.00   919   956       25     354.10  346  361
    30     1127.63  1111  1150      30     425.93  419  433
3   10     245.97   242   251   7   10     101.50  97   108
    15     370.52   364   376       15     154.05  149  163
    20     494.77   486   502       20     206.63  198  218
    25     619.40   606   630       25     258.73  249  274
    30     744.48   730   755       30     311.20  299  328
4   10     162.00   150   171   8   10     90.15   89   93
    15     244.75   236   257       15     135.95  134  140
    20     328.52   300   346       20     182.60  177  187
    25     409.60   375   432       25     227.93  222  234
    30     494.65   477   518       30     274.52  269  281
Table 3. Mean, min and max lengths of schedules returned by the version [HMA, RFTA] for each of the eight test cases in SCP1 and for five values of Tbatt from 10 to 30

No  Tbatt  Mean     Min   Max   No  Tbatt  Mean    Min  Max
1   10     336.43   325   344   5   10     171.57  167  176
    15     506.68   494   524       15     259.55  250  267
    20     676.67   661   691       20     347.05  340  355
    25     846.23   819   870       25     436.15  425  448
    30     1014.00  980   1043      30     525.33  510  539
2   10     349.50   337   360   6   10     138.15  135  141
    15     524.92   502   542       15     208.55  206  213
    20     700.23   667   718       20     279.25  274  283
    25     874.40   852   901       25     349.90  345  355
    30     1049.83  1023  1091      30     420.27  414  428
3   10     234.32   226   239   7   10     101.85  99   108
    15     353.05   341   360       15     154.95  149  163
    20     472.00   464   480       20     207.75  199  219
    25     590.20   578   598       25     261.40  249  275
    30     708.75   689   722       30     317.43  299  330
4   10     161.82   150   169   8   10     90.58   89   93
    15     244.50   225   254       15     136.43  134  140
    20     329.85   319   343       20     182.70  179  187
    25     414.35   388   432       25     228.57  224  234
    30     499.55   450   519       30     275.13  269  281
results from the previous two tables. Thus, comparison of absolute results for individual test cases confirms observations from Sect. 5.2 concerning overall mean relative quality of the schedules generated by particular approaches.
6 Conclusions
In this paper, we study the relative performance of local search algorithms used to solve MLCP. The local search method has two problem-specific steps: generation of the initial solution and perturbation of a solution applied for generation of its neighbor. In our case, the perturbation step consists of two substeps: obtaining a neighbor schedule, and refining or repairing this neighbor schedule. Eventually, three problem-specific steps are necessary to adapt the general scheme of local search to the Maximum Lifetime Coverage Problem space. The starting point in our research was three LS algorithms we have proposed earlier: LSHMA, LSCAIA, and LSRFTA. Each of them contained its own versions of the three problem-specific steps. We swapped the steps between the algorithms
Table 4. Mean, min and max lengths of schedules returned by the version [RFTA, HMA] for each of the eight test cases in SCP1 and for five values of Tbatt from 10 to 30

No  Tbatt  Mean    Min  Max   No  Tbatt  Mean    Min  Max
1   10     211.03  204  219   5   10     101.20  95   105
    15     315.98  307  324       15     150.80  144  156
    20     419.77  406  430       20     200.00  193  209
    25     524.67  510  542       25     249.43  242  261
    30     629.45  608  647       30     298.27  287  311
2   10     218.95  212  226   6   10     81.88   78   86
    15     326.55  318  336       15     121.17  115  126
    20     435.45  427  444       20     160.60  157  166
    25     543.80  532  555       25     199.85  196  207
    30     651.25  640  668       30     239.28  230  247
3   10     143.97  139  148   7   10     61.90   57   66
    15     214.38  209  221       15     92.10   87   99
    20     285.13  278  291       20     121.50  115  129
    25     356.18  347  363       25     151.57  143  157
    30     426.15  416  437       30     181.40  172  190
4   10     95.53   89   108   8   10     54.50   52   58
    15     141.82  136  151       15     81.38   78   86
    20     187.53  178  199       20     107.03  103  111
    25     233.82  220  245       25     133.40  130  137
    30     278.57  266  292       30     159.22  154  166
and in this way we obtained new versions of the local search algorithms. In the set of experiments, we compared the efficiency of these new versions. In our experiments, we generated an initial schedule using CAIA, and we tried perturbation methods and refinement/repair methods from all three approaches. We used the benchmark data set SCP1, which we proposed in our previous papers. Our experiments have shown that the best pair of perturbation and refinement/repair methods is the one used in LSHMA, i.e., [HMA, HMA]. The approaches [HMA, RFTA], [CAIA, HMA], and [HMA, CAIA] are also effective, while the remaining combinations give much worse results.
References

1. Gil, J.M., Han, Y.H.: A target coverage scheduling scheme based on genetic algorithms in directional sensor networks. Sensors (Basel, Switzerland) 11(2), 1888–1906 (2011). https://doi.org/10.3390/s110201888
2. Keskin, M.E., Altinel, I.K., Aras, N., Ersoy, C.: Wireless sensor network lifetime maximization by optimal sensor deployment, activity scheduling, data routing and sink mobility. Ad Hoc Netw. 17, 18–36 (2014). https://doi.org/10.1016/j.adhoc. 2014.01.003 3. Roselin, J., Latha, P., Benitta, S.: Maximizing the wireless sensor networks lifetime through energy efficient connected coverage. Ad Hoc Netw. 62, 1–10 (2017). https://doi.org/10.1016/j.adhoc.2017.04.001 4. Tretyakova, A., Seredynski, F.: Application of evolutionary algorithms to maximum lifetime coverage problem in wireless sensor networks. In: IPDPS Workshops, pp. 445–453. IEEE (2013). https://doi.org/10.1109/IPDPSW.2013.96 5. Tretyakova, A., Seredynski, F.: Simulated annealing application to maximum lifetime coverage problem in wireless sensor networks. In: Global Conference on Artificial Intelligence, GCAI, vol. 36, pp. 296–311. EasyChair (2015) 6. Tretyakova, A., Seredynski, F., Bouvry, P.: Graph cellular automata approach to the maximum lifetime coverage problem in wireless sensor networks. Simulation 92(2), 153–164 (2016). https://doi.org/10.1177/0037549715612579 7. Tretyakova, A., Seredynski, F., Guinand, F.: Heuristic and meta-heuristic approaches for energy-efficient coverage-preserving protocols in wireless sensor networks. In: Proceedings of the 13th ACM Symposium on QoS and Security for Wireless and Mobile Networks, Q2SWinet’17, pp. 51–58. ACM (2017). https://doi.org/ 10.1145/3132114.3132119 8. Trojanowski, K., Mikitiuk, A., Guinand, F., Wypych, M.: Heuristic optimization of a sensor network lifetime under coverage constraint. In: Computational Collective Intelligence: 9th International Conference, ICCCI 2017, Nicosia, Cyprus, 27–29 Sept 2017, Proceedings, Part I, LNCS, vol. 10448, pp. 422–432. Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67074-4 41 9. Trojanowski, K., Mikitiuk, A., Kowalczyk, M.: Sensor network coverage problem: a hypergraph model approach. 
In: Computational Collective Intelligence: 9th International Conference, ICCCI 2017, Nicosia, Cyprus, 27–29 Sept 2017, Proceedings, Part I, LNCS, vol. 10448, pp. 411–421. Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67074-4 40 10. Trojanowski, K., Mikitiuk, A., Napiorkowski, K.J.M.: Application of local search with perturbation inspired by cellular automata for heuristic optimization of sensor network coverage problem. In: Parallel Processing and Applied Mathematics, LNCS, vol. 10778, pp. 425–435. Springer International Publishing (2018). https:// doi.org/10.1007/978-3-319-78054-2 40 11. Wang, B.: Coverage Control in Sensor Networks. Computer Communications and Networks. Springer (2010). https://doi.org/10.1007/978-1-84800-328-6 12. Wang, L., Wu, W., Qi, J., Jia, Z.: Wireless sensor network coverage optimization based on whale group algorithm. Comput. Sci. Inf. Syst. 15(3), 569–583 (2018). https://doi.org/10.2298/CSIS180103023W 13. Yile, W.U., Qing, H.E., Tongwei, X.U.: Application of improved adaptive particle swarm optimization algorithm in WSN coverage optimization. Chin. J. Sens. Actuators (2016) 14. Zorbas, D., Glynos, D., Kotzanikolaou, P., Douligeris, C.: BGOP: an adaptive coverage algorithm for wireless sensor networks. In: Proceedings of the 13th European Wireless Conference, EW07 (2007)
Modelling Dynamic Programming-Based Global Constraints in Constraint Programming

Andrea Visentin1(B), Steven D. Prestwich1, Roberto Rossi2, and Armagan Tarim3

1 Insight Centre for Data Analytics, University College Cork, Cork, Ireland
[email protected], [email protected]
2 University of Edinburgh Business School, Edinburgh, UK
[email protected]
3 Cork University Business School, University College Cork, Cork, Ireland
[email protected]
Abstract. Dynamic Programming (DP) can solve many complex problems in polynomial or pseudo-polynomial time, and it is widely used in Constraint Programming (CP) to implement powerful global constraints. Implementing such constraints is a nontrivial task beyond the capability of most CP users, who must rely on their CP solver to provide an appropriate global constraint library. This also limits the usefulness of generic CP languages, some or all of whose solvers might not provide the required constraints. A technique was recently introduced for directly modelling DP in CP, which provides a way around this problem. However, no comparison of the technique with other approaches was made, and it lacked a clear formalisation. In this paper we formalise the approach and compare it with existing techniques on MiniZinc benchmark problems, including the flow formulation of DP in Integer Programming. We further show how it can be improved by state reduction methods.
Keywords: Constraint programming · Dynamic programming · MIP · Encoding

1 Introduction
Constraint Programming (CP) is one of the most active fields in Artificial Intelligence (AI). Designed to solve optimisation and decision problems, it provides expressive modelling languages, development tools and global constraints. An overview of the current status of CP and its challenges can be found in [9]. The Dynamic Programming (DP) approach builds an optimal solution by breaking the problem down into subproblems and solving each to optimality in a recursive manner, achieving great efficiency by solving each subproblem once only.

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 417–427, 2020.
https://doi.org/10.1007/978-3-030-21803-4_42
There are several interesting connections between CP and DP:

– DP has been used to implement several efficient global constraints within CP systems, for example [10,17]. A tag for DP approaches in CP is available in the global constraint catalogue [1].
– [8] used CP to model a DP-relaxed version of the TSP after a state space reduction.
– The DP feature of solving each subproblem once only has been emulated in CP [5], and in Constraint Logic Programming languages including Picat [19], via memoization (or tabling), which remembers the results of subtree searches so that they need not be recomputed. This can improve search performance by several orders of magnitude.
– DP approaches are widely used in binary decision diagrams and multi-valued decision diagrams [3].

Until recently there was no standard procedure for encoding a DP model in CP. If part of a problem required a DP-based constraint not provided by the solver being used, the modeller was forced either to write the global constraint manually or to change solver. This restricts the usefulness of DP in CP. However, a new connection between CP and DP was recently defined. [16] introduced a technique that allows DP to be seamlessly integrated within CP: given a DP model, states are mapped to CP variables while seed values and recurrence equations are mapped to constraints. The resulting model is called a dynamic program encoding (DPE). Using a DPE, a DP model can be solved by pure constraint propagation without search. DPEs can form part of a larger CP model, and provide a general way for CP users to implement DP-based global constraints. In this paper we explore DPEs further:

– We provide a formalisation of the DPE that allows a one-to-one correspondence with a generic DP approach.
– We compare the DPE with a widely known variable redefinition technique for modelling DP in Mixed Integer Programming (MIP) [13], and show its superior performance.
– We show that the performance of a DPE can be further improved by the application of state reduction techniques.
– We show that it is possible to use it to model some DP-based constraints in MiniZinc.

The paper is organised as follows. Section 2 formalises the DPE technique to allow a one-to-one mapping of DP approaches. Section 3 applies the DPE to the shortest path problem in MiniZinc; we then study its application to the knapsack problem, show how it can strongly improve the way we represent DP in CP, and show how to use state reduction techniques to improve performance. Section 4 concludes the paper and discusses when this technique should be used.
2 Method
In this section we formalise the DPE. As mentioned above, it models every DP state with a CP variable, and the seed values and recurrence relations with constraints. [16] introduced the technique informally; here we give a more formal description based on the definition of DP given in [4]. Many problems can be solved with a DP approach that can be modelled as a shortest path on a DAG, for example the knapsack problem [11] or the lot sizing problem [7]. We decided to use the shortest path problem directly, as it was already used as a benchmark for the MiniZinc challenge [18]. One of the most famous DP-like algorithms, Dijkstra's algorithm, solves this problem. We use Fig. 1 to help visualise the problem.
Fig. 1. Graph of a generic shortest path problem.
Most DPs can be described by their three most important characteristics: stages, states and recursive optimisation. The fundamental feature of the DP approach is the structuring of optimisation problems into stages, which are solved sequentially. The solution of each stage helps to define the characteristics of the next stage's problem. In Fig. 1 the stages are shown in grey. In the DPE the stages are simply represented as groups of states, and it is important that each stage depends only on the next stage's states. Since the graph is acyclic we can divide it into groups of nodes that cannot reach the nodes of the previous stages, and cannot be reached from the nodes of the next stages.

With each stage of the problem are associated one or more states. These contain enough information to make future decisions, without regard to how the process reached the current state. In the DPE the states are represented by CP variables, and each contains the optimal value of the subproblem represented by that state. In the graphical representation they are the nodes of the graph. In Dijkstra's algorithm these variables contain the length of the shortest path from that node to the sink. We can identify two particular types of state: the initial state, which contains the optimal solution for the whole problem and no other state's solution
is based on its value; and the base cases or final states, which are the solutions of the smallest subproblems. The solutions of these problems do not depend on any other state. In Fig. 1 they are represented by the source and the sink of the graph.

The last general characteristic of the DP approach is the recursive optimisation procedure. Its goal is to build the overall solution by solving one stage at a time and linking the optimal solution of each state to the optimal solutions of states in subsequent stages. This procedure is generally based on a backward induction process. It has several components, which can generally be modelled as a functional (Bellman) equation [2] together with a recursion order. In the DPE the functional equation is not a single equation, but is applied to every state via a constraint. This constraint contains an equality that binds the optimal value obtainable at that state to those of the next-stage states involved. For every state we have a set of feasible decisions, each of which leads to a state of the next stage; in the graph they are represented by the edges leaving the associated node, and an edge being used in the shortest path means that decision is taken. The constraint also includes the immediate cost associated with each decision, which is the value added to or subtracted from the next-stage state variables. In Fig. 1 these costs are represented by the weights of the edges. In the shortest path problem, the constraint applied to each (non-sink) state assigns to the node's CP variable the minimum, over the nodes reachable by one edge, of the reached node's CP variable plus the edge cost.

The important difference between the encodings is the order in which the states are explored and resolved. In DP they are ordered so that each state is evaluated only when all the subsequent stages are solved, while in the encodings the ordering is delegated to the solvers.
In the MIP flow formulation it is completely replaced by a search on the binary variables, while in the DPE it is done by constraint propagation, which depends on the CP solver's implementation of the propagators. This approach is more robust than search, which in the worst case can spend a significant amount of time exploring a search subtree. The optimality of the solution is guaranteed by the correctness of the DP formulation.
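As a concrete illustration of solving a DPE by propagation alone, the following plain-Python sketch (illustrative only; the graph is invented and this is not an actual CP model) keeps one value per node and repeatedly applies the Bellman-style constraint until a fixpoint is reached:

```python
import math

def dpe_shortest_path(edges, sink):
    """Fixpoint 'propagation' of the Bellman constraints on a DAG:
    dist[sink] = 0 (seed value); dist[v] = min over edges (v, w, c) of dist[w] + c."""
    nodes = {v for v, w, _ in edges} | {w for _, w, _ in edges}
    dist = {v: math.inf for v in nodes}
    dist[sink] = 0  # seed value: the base case of the DP
    changed = True
    while changed:  # chaotic iteration: no fixed order, as when propagation is left to the solver
        changed = False
        for v, w, c in edges:
            if dist[w] + c < dist[v]:  # the 'constraint' for state v can still tighten its value
                dist[v] = dist[w] + c
                changed = True
    return dist

# Invented toy DAG with source 's' and sink 't'
edges = [('s', 'a', 2), ('s', 'b', 5), ('a', 'b', 1), ('a', 't', 7), ('b', 't', 3)]
print(dpe_shortest_path(edges, 't')['s'])  # shortest s-t distance: 6 (s-a-b-t)
```

No search tree is explored: the loop only re-applies the recurrence until no value can decrease, which is exactly the behaviour the DPE delegates to the CP propagators.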
3 Computational Results
We aim to make all our results replicable by other researchers. Our code is available online at: https://github.com/andvise/dpincp. We also decided to use only open source libraries. In the first experiment we used MiniZincIDE 2.1.7, while the second part is coded in Java 10. We used three CP solvers: Gecode 6.0.1, Google OR-Tools 6.7.2 and Choco Solver 4.0.8. As MIP solvers we used the COIN-OR branch-and-cut solver (CBC), IBM ILOG CPLEX 12.8 and Gurobi 8.0.1. All experiments were executed on an Ubuntu system with an Intel i7-3610QM, 8 GB of RAM and 15 GB of swap memory.
3.1 Shortest Path in MiniZinc
MiniZinc is a standard modelling language for CP. It provides a set of standard constraints, with ways of decomposing all of them so that they can be handled by a wide variety of solvers. To achieve standardisation the MiniZinc constraint catalogue is very limited: only constraints that are available in all its solvers, or that can be decomposed into simpler constraints, are included. The decomposition is generally done in a naive way, causing poor performance, and this is particularly true of the DP-based constraints. In this section we focus on applications of the DPE in MiniZinc. We apply the technique to the shortest path problem and solve it with the DPE of Dijkstra's algorithm used as an example in the method section. The shortest path problem was one of the benchmarks of the MiniZinc challenge [18]. The current reduction is based on a flow formulation on the nodes of the graph, which regulates the flow over each node and requires a binary variable for each edge indicating whether the edge is used. This is the same encoding proposed by [13]. Our implementation is based on Dijkstra's algorithm: every decision variable contains the shortest distance to the sink node. The formulation is shorter and more intuitive than the previous one. We compared the methods on the 10 available benchmark instances, using the MiniZincIDE with Gecode as solver and a 20-minute time limit. Table 1 shows the results of the computations. When the flow formulation finds a good or optimal solution quickly, the DPE is approximately twice as fast. However, the flow formulation requires search that can take exponential time, and on some instances it is unable to find a solution before the timeout. The most interesting result is that, by using only constraint propagation, DPE performance is robust and only marginally affected by the structure of the instances.
In some cases, for example instance 7, the flow formulation finds an optimal solution but takes a long time to prove optimality, in which case the DPE is more than 4 orders of magnitude faster.

Table 1. Time required to complete the computation of the 10 benchmark instances in Gecode. '-' represents a timeout.

CP solver         0      1      2      3       4      5      6      7        8         9
Dijkstra          23 ms  19 ms  18 ms  17 ms   24 ms  20 ms  25 ms  23 ms    20 ms     29 ms
Flow formulation  -      50 ms  60 ms  571 ms  46 ms  -      47 ms  1 182 s  4 504 ms  -
The DPE requires a smaller number of variables, since it needs only one per node; by contrast, the flow formulation requires a variable for each edge. This is without taking into account the additional variables created during the decomposition.
In terms of raw performance the DPE cannot rival a state-of-the-art shortest path solver. However, in the case of parameterised shortest path problems, in which the costs of the edges are influenced by other constraints, the DPE allows a more flexible model than a specific global constraint and a more efficient model in MiniZinc. We repeated the above experiment using a MIP solver instead of CP. Table 2 contains the results for the 10 instances solved using the COIN-OR branch-and-cut solver. Interestingly, the situation is inverted: the flow formulation performs efficiently while the DPE fails to find an optimal solution in many cases. This is due to the high number of auxiliary discrete variables needed by the MIP decomposition of the min constraint, because of which the DPE loses one of its main strengths: DP computation by pure constraint propagation. Moreover the MIP can take advantage of the unimodularity of the constraint matrix. We therefore recommend the usual flow-based formulation for MIP and the DPE for CP.

Table 2. Time required to complete the computation of the 10 benchmark instances with COIN-OR CBC. '-' represents a timeout.

MIP solver        0      1      2      3      4      5          6      7      8       9
Dijkstra          375 s  64 ms  -      -      -      20 667 ms  61 ms  -      138 ms  303 ms
Flow formulation  31 ms  39 ms  34 ms  40 ms  46 ms  35 ms      36 ms  40 ms  37 ms   53 ms
3.2 Knapsack Problem
We now apply the DPE to the knapsack problem [11]: it is a widely known NP-hard problem, it has numerous extensions and applications, there is a MiniZinc reduction for this constraint, and it can be modelled with the technique proposed by [13]. We consider the most common version, in which every item can be packed at most once, also known as the 0–1 knapsack problem [15]. Research on this problem is particularly active [12], with many applications and different approaches. The problem consists of a set of items I with volumes v and profits p. The items can be packed in a knapsack of capacity C. The objective is to maximise the total profit of the packed items without exceeding the capacity of the knapsack. The binary variables x represent the packing scheme: x_i is equal to 1 if item i is packed, and 0 otherwise. The model is:

    max  x · p                  (1a)
    s.t. x · v ≤ C              (1b)
         x ∈ {0, 1}^{|I|}       (1c)
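For concreteness, model (1a)–(1c) can be checked on a toy instance by brute-force enumeration. The sketch below is illustrative only (the profits, volumes and capacity are invented) and is not the CP or MIP model used in the experiments:

```python
from itertools import product

def knapsack_bruteforce(p, v, C):
    """Enumerate all x in {0,1}^n and keep the best packing satisfying (1b)."""
    best_profit, best_x = 0, (0,) * len(p)
    for x in product((0, 1), repeat=len(p)):
        if sum(xi * vi for xi, vi in zip(x, v)) <= C:      # capacity constraint (1b)
            profit = sum(xi * pi for xi, pi in zip(x, p))  # objective (1a)
            if profit > best_profit:
                best_profit, best_x = profit, x
    return best_profit, best_x

# Invented instance: 3 items, capacity 8
print(knapsack_bruteforce(p=[10, 7, 4], v=[5, 4, 3], C=8))  # → (14, (1, 0, 1))
```

Enumeration is of course exponential in |I|; the DP discussed next avoids this by sharing subproblem solutions.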
This model can be directly implemented in CP or in MIP. To solve the binary knapsack problem we use the well-known DP-like algorithm described in [11]; refer
to the source for the full description of the algorithm. Using the DP structure described in the previous section: J[i, j], with i ∈ I and j ∈ [0, C], are the states of our DP. Each J[i, j] contains the optimal profit of packing the subset of items I_i = (i, . . . , n) in a knapsack of volume j. The formulation can be represented by a rooted DAG, in this case a tree with node J[1, C] as root and nodes J[n, j], j ∈ [0, C], as leaves. For every internal node J[i, j] the leaving arcs represent the action of packing item i, and their weight is the profit obtained by packing the i-th item. A path from the root to a leaf is equivalent to a feasible packing, and the longest path of this graph is the optimal solution. If we encode this model using a DPE, creating all the CP variables representing the nodes of the graph, then it is solved by pure constraint propagation with no backtracking.

We use this problem to show the potential for speeding up computational times. With the DPE implementation we can use simple and well-known techniques to reduce the state space without compromising optimality. For example, if at state J[i, j] the volume j is large enough to contain all items from i to n (all the items we might pack in the next stages) then we know that the optimal solution of J[i, j] will contain all of them, as their profits are positive. This pruning can be made more effective by sorting the items in decreasing order of size, so the pruning occurs closer to the root and further reduces the size of the search space.

We test the DPE on different types of instances of increasing item set sizes, and compare its performance with several other decompositions of the constraint:

– A CP model that uses the simple scalar product of the model (1a)–(1b) (Naive CP). The MiniZinc encoding of the knapsack constraint uses the same structure.
– The knapsack global constraint available in Choco (Global constraint). This constraint is implemented with scalar products. The propagator uses Dantzig-Wolfe relaxation [6].
– A CP formulation of the encoding proposed in this paper (DPE), solved using Google OR-Tools.
– A DPE with the state space reduction technique introduced above (DPE + sr).
– A DPE with the state space reduction technique and the items sorted (DPE + sr + sorting).
– The MIP flow formulation proposed by [13] (Flow model), which we tested with 3 different MIP solvers: COIN CBC, CPLEX and Gurobi.

To keep the plots readable we show only these solutions, but others are available in our code. As a benchmark we use the instances of Pisinger described in [14]. We did not use the code available online because it does not allow setting a seed to make the experiments replicable. Four different types of instances are defined, in order of decreasing correlation between item weight and profit: subsetsum, strongly correlated, weakly correlated and uncorrelated. Due to space limitations we refer the reader to the original paper for the details of the instances. In our experiments we tested all the types, and we kept the same configuration as [14]'s first
set of experiments. We increased the size of the instances until none of the DP encodings could find the optimal solution before the time limit. A time limit of 10 min was imposed on the MIP and CP solvers, including variable creation overhead.
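The states J[i, j] and the state-reduction rule described above can be sketched in plain Python (a memoised illustration of the recurrence, not the CP encoding; the instance data below are invented):

```python
from functools import lru_cache

def knapsack_dp(p, v, C):
    n = len(p)
    # Suffix sums drive the state-reduction rule
    suf_v, suf_p = [0] * (n + 1), [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suf_v[i] = suf_v[i + 1] + v[i]
        suf_p[i] = suf_p[i + 1] + p[i]

    @lru_cache(maxsize=None)
    def J(i, j):
        """Optimal profit of packing items i..n-1 into residual volume j."""
        if i == n:
            return 0                # base case (leaf state)
        if suf_v[i] <= j:           # state reduction: everything left fits, so pack it all
            return suf_p[i]
        best = J(i + 1, j)          # decision: skip item i
        if v[i] <= j:
            best = max(best, p[i] + J(i + 1, j - v[i]))  # decision: pack item i
        return best

    return J(0, C)

print(knapsack_dp(p=[10, 7, 4], v=[5, 4, 3], C=8))  # → 14
```

Sorting the items in decreasing order of volume makes the suf_v[i] <= j test succeed closer to the root, which corresponds to the sorted variant (DPE + sr + sorting) compared below.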
Fig. 2. Average computational time for: (a) subsetsum instances; and, (b) uncorrelated instances.
Figure 2 shows the computational time as a function of instance size. Due to space limitations we had to limit the number of plots. The DPE clearly outperforms the naive CP formulation and the previous encoding (the flow formulation) solved with an open source solver, CBC. The basic DPE solved with an open source solver is computationally comparable to the flow formulation implemented in CPLEX, and outperforms the one solved by Gurobi on instances where the correlation between item weight and profit is lower, even though the commercial MIP solvers use parallel computation. The DPE outperforms the variable redefinition technique in MIP because of the absence of search. It is also clearly better than a simple CP model of the problem definition, which is the same model used for the MiniZinc constraint. The Choco constraint with an ad-hoc propagator outperforms the DPE in most cases, confirming that a global constraint is faster than a DPE. A particular situation arises on the strongly correlated instances, where the global constraint fails to find the optimal solution on many test instances even with a small number of items; probably the particular structure of the problem makes the search get stuck in some non-optimal branches.

It is interesting to note the speed-up from the space reduction technique. The basic DPE can solve instances of up to 200 items, but it has a memory problem: the state space grows so rapidly that massive usage of swap memory is needed. However, this effect is less marked when a state reduction technique is applied. This effect is stronger when the correlation between item profits and
volumes is stronger. The reduction technique improves considerably when we increase the number of items needed to fill the bin, since the pruning occurs earlier in the search tree: see Fig. 3.
Fig. 3. Computational time on the subsetsum instances with reduced volume per item.
In the case where the constraint has to be enforced multiple times while solving a larger model, the DPE can outperform the pure constraint, since the overhead of creating all the variables is not repeated. This experiment demonstrates the potential of the DPE with state space reduction: even with a simple and intuitive reduction technique we can solve instances 10 times bigger than with a simple CP model. The behaviour of the DPE is stable regardless of the instance type; on the contrary, the performance of the space reduction technique strongly depends on the instance type and the volume of the knapsack. Of course we cannot outperform a pure DP implementation, even though our solution involves a similar number of operations. This is mainly due to the time and space overhead of creating CP variables: in fact the DPE requires more time to create the CP variables than to propagate the constraints.
4 Conclusions
In this paper we have analysed a recently proposed technique for mapping DP into CP, called the dynamic program encoding (DPE), which takes advantage of the optimality substructure of the DP. With a DPE the DP execution is achieved by pure constraint propagation (without search or backtracking). We provided a standard way to model a DP as a DPE. We have demonstrated the potential of the DPE in constraint modelling in several ways: we compared it with another DP-encoding technique using CP and MIP solvers; we showed how
to use state reduction techniques to improve its performance; we showed that it outperforms standard DP encoding techniques in the literature, and greatly outperforms non-DP-based CP approaches to the knapsack problem; and we applied the DPE to MiniZinc benchmarks, showing that its performance is faster and more robust than existing CP techniques. We also showed a negative result: the DPE is unsuitable for use in MIP, where standard methods are much better. To recap the potential applications of the DPE, it can be used when a DP-based constraint is needed but other constraints can affect states inside the DP; when the corresponding DP global constraint is not implemented in the specific solver; and when DP approaches are needed in MiniZinc as a starting point for decomposing more complex problems into simpler instructions.

Acknowledgments. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 which is co-funded under the European Regional Development Fund.
References

1. Beldiceanu, N., Carlsson, M., Rampon, J.X.: Global constraint catalog (revision a) (2012)
2. Bellman, R.: The theory of dynamic programming. Technical report, RAND Corp., Santa Monica, CA (1954)
3. Bergman, D., Cire, A.A., van Hoeve, W.J., Hooker, J.N.: Discrete optimization with decision diagrams. INFORMS J. Comput. 28(1), 47–66 (2016)
4. Bradley, S.P., Hax, A.C., Magnanti, T.L.: Applied Mathematical Programming. Addison-Wesley (1977)
5. Chu, G., Stuckey, P.J.: Minimizing the maximum number of open stacks by customer search. In: International Conference on Principles and Practice of Constraint Programming, pp. 242–257. Springer (2009)
6. Dantzig, G.B., Wolfe, P.: Decomposition principle for linear programs. Oper. Res. 8(1), 101–111 (1960)
7. Eppen, G.D., Martin, R.K.: Solving multi-item capacitated lot-sizing problems using variable redefinition. Oper. Res. 35(6), 832–848 (1987)
8. Focacci, F., Milano, M.: Connections and integrations of dynamic programming and constraint programming. In: CPAIOR 2001 (2001)
9. Freuder, E.C.: Progress towards the holy grail. Constraints 23(2), 158–171 (2018)
10. Malitsky, Y., Sellmann, M., van Hoeve, W.J.: Length-lex bounds consistency for knapsack constraints. In: International Conference on Principles and Practice of Constraint Programming, pp. 266–281. Springer (2008)
11. Martello, S.: Knapsack Problems: Algorithms and Computer Implementations. Wiley-Interscience Series in Discrete Mathematics and Optimization (1990)
12. Martello, S., Pisinger, D., Toth, P.: New trends in exact algorithms for the 0–1 knapsack problem. Eur. J. Oper. Res. 123(2), 325–332 (2000)
13. Martin, R.K.: Generating alternative mixed-integer programming models using variable redefinition. Oper. Res. 35(6), 820–831 (1987)
14. Pisinger, D.: A minimal algorithm for the 0–1 knapsack problem. Oper. Res. 45(5), 758–767 (1997)
15. Plateau, G., Nagih, A.: 0–1 knapsack problems. In: Paradigms of Combinatorial Optimization: Problems and New Approaches, vol. 2, pp. 215–242 (2013)
16. Prestwich, S.D., Rossi, R., Tarim, S.A., Visentin, A.: Towards a closer integration of dynamic programming and constraint programming. In: 4th Global Conference on Artificial Intelligence (2018)
17. Quimper, C.G., Walsh, T.: Global grammar constraints. In: International Conference on Principles and Practice of Constraint Programming, pp. 751–755. Springer (2006)
18. Stuckey, P.J., Feydy, T., Schutt, A., Tack, G., Fischer, J.: The MiniZinc challenge 2008–2013. AI Mag. 35(2), 55–60 (2014)
19. Zhou, N.F., Kjellerstrand, H., Fruhman, J.: Constraint Solving and Planning with Picat. Springer (2015)
Modified Extended Cutting Plane Algorithm for Mixed Integer Nonlinear Programming

Wendel Melo1(B), Marcia Fampa2, and Fernanda Raupp3

1 College of Computer Science, Federal University of Uberlândia, Uberlândia, Brazil
[email protected]
2 Institute of Mathematics and COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
[email protected]
3 National Laboratory for Scientific Computing (LNCC) of the Ministry of Science, Technology and Innovation, Petrópolis, Brazil
[email protected]
Abstract. In this work, we propose a modification of the Extended Cutting Plane algorithm (ECP), which solves convex mixed integer nonlinear programming problems. Our approach, called Modified Extended Cutting Plane (MECP), is inspired by the strategy of updating the set of linearization points in the Outer Approximation algorithm (OA). Computational results over a set of 343 test instances show the effectiveness of the proposed method MECP, which outperforms ECP and is competitive with OA.
Keywords: Mixed integer nonlinear programming · Extended cutting plane · Outer approximation

1 Introduction
In this work, we address the following convex Mixed Integer Nonlinear Programming (MINLP) problem:

    (P)  min_{x,y}  f(x, y)
         s.t.  g(x, y) ≤ 0,
               x ∈ X,  y ∈ Y ∩ Z^{n_y},          (1)
where X and Y are polyhedral subsets of R^{n_x} and R^{n_y}, respectively, Y is bounded, and f : R^{n_x+n_y} → R and g : R^{n_x+n_y} → R^m are convex and continuously differentiable functions. We assume that problem (P) has an optimal solution. The difficulty involved in the solution of problem (P), as well as its applicability in diverse situations, justify the search for efficient algorithms for its

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 428–437, 2020.
https://doi.org/10.1007/978-3-030-21803-4_43
resolution. In this context, several approaches have been proposed to solve MINLP problems (the interested reader can find good bibliographic reviews in [2,5,8,13]). Among these different approaches, the algorithms belonging to the class of linear approximation deserve special emphasis. This class of algorithms solves convex MINLP problems by approximating them by a sequence of Mixed Integer Linear Programming (MILP) problems, whose solutions provide lower bounds (in the minimization case) for the original problem addressed. Such approximations are obtained through first-order derivatives and are based on the convexity of the functions in (P). A good characteristic of these algorithms is that they take advantage of the maturity achieved in the MILP area, with the use of sophisticated computational packages.

A well-known algorithm from the literature in the class of linear approximation is the Outer Approximation (OA) algorithm [6,7]. At each iteration, OA solves a MILP problem and one or two continuous Nonlinear Programming (NLP) problems. This scheme was developed to guarantee the convergence of OA in a finite number of iterations and has shown to be efficient in several practical situations. Another well-known algorithm in the class of linear approximation is the Extended Cutting Plane (ECP) algorithm [14]. The main difference between OA and ECP is that ECP does not solve any NLP problem during its execution, restricting itself to the solution of MILP problems. Although, at first, this seems to be an advantage of ECP, this strategy means that the algorithm is not guaranteed to converge in a finite number of iterations. We have observed that, in many cases, the ECP algorithm demands a larger number of iterations than OA to converge to the optimal solution of problem (P), and thus ECP requires more computational effort than OA.
Nevertheless, as the ECP algorithm does not need to solve NLP problems, it has the advantage of not requiring the computation of second-order derivatives, or any approximation of them. In this work, we propose an algorithm based on ECP. Our main contribution consists of a small modification of the ECP algorithm, making it more similar to OA, in the hope of reducing the number of iterations needed to converge while keeping the friendly characteristic of ECP of not solving any kind of NLP problem, and so avoiding the computation of second-order derivatives (or approximations of them). Though modest, our contribution, which we call the Modified Extended Cutting Plane (MECP) algorithm, has shown promising results on a set of 343 MINLP test instances, indicating that the proposed method is competitive with OA. This paper is organized as follows: Sect. 2 discusses the ECP algorithm, while the OA algorithm is presented in Sect. 3. Our MECP approach is then introduced in Sect. 4. Finally, Sect. 5 presents computational results comparing MECP to OA and ECP, and points out some conclusions about the work developed.
2 The Extended Cutting Plane Algorithm
The Extended Cutting Plane algorithm was proposed in [14] and is based on the approximation of (P) by a Mixed Integer Linear Programming (MILP) problem,
known as the master problem. To facilitate the understanding of the master problem, we note that (P) can be reformulated as a problem with a linear objective function by means of an additional auxiliary variable α:

    (P̄)  min_{α,x,y}  α
          s.t.  f(x, y) ≤ α,
                g(x, y) ≤ 0,
                x ∈ X,  y ∈ Y ∩ Z^{n_y}.          (2)

Since the constraints of (P̄) are convex, when we linearize them by Taylor series about any given point (x̄, ȳ) ∈ X × Y, we obtain the following valid inequalities for (P̄):

    ∇f(x̄, ȳ)^T (x − x̄; y − ȳ) + f(x̄, ȳ) ≤ α          (3)

    ∇g(x̄, ȳ)^T (x − x̄; y − ȳ) + g(x̄, ȳ) ≤ 0          (4)

Therefore, we generate the master problem from a set L = {(x^1, y^1), . . . , (x^k, y^k)} of k linearization points; it is a relaxation of the original problem (P):

    (M^L)  min_{α,x,y}  α
           s.t.  ∇f(x^j, y^j)^T (x − x^j; y − y^j) + f(x^j, y^j) ≤ α,   ∀(x^j, y^j) ∈ L
                 ∇g(x^j, y^j)^T (x − x^j; y − y^j) + g(x^j, y^j) ≤ 0,   ∀(x^j, y^j) ∈ L
                 x ∈ X,  y ∈ Y ∩ Z^{n_y}.          (5)
Let (α̂, x̂, ŷ) be an optimal solution of (M^L). We emphasize that the value α̂ is a lower bound for (P̄) and (P). If (α̂, x̂, ŷ) is feasible for (P̄), then α̂ is also an upper bound for (P̄) and (P). In this case, as (α̂, x̂, ŷ) attains the same value as a lower and an upper bound for (P̄), this solution is optimal for (P̄), and therefore (x̂, ŷ) is also an optimal solution for (P). On the other hand, if (α̂, x̂, ŷ) is not feasible for (P̄), it is necessary to add valid inequalities to (M^L) to cut this solution out of its feasible set, strengthening the relaxation given by this problem. To reach this goal, the ECP algorithm adds the solution (x̂, ŷ) to the set L. The ECP algorithm is presented as Algorithm 1. We point out that ECP does not require the solution of any NLP problem and does not use any information from second-order derivatives or approximations of them. This characteristic can be advantageous in some cases, especially when the computation of second-order derivatives is hard or cannot be accomplished for some reason. We also emphasize that the strategy of adding the solution (x̂^k, ŷ^k) to the set L at the end of each iteration (line 13) does not ensure that ECP has finite convergence.
Modified Extended Cutting Plane Algorithm for MINLP
Input: (P): MINLP problem, ε_c: convergence tolerance.
Output: (x*, y*): optimal solution for (P).
 1: z^l = −∞;
 2: z^u = +∞;
 3: Choose an initial linearization point (x^0, y^0);
 4: L = {(x^0, y^0)};
 5: k = 1;
 6: Let (M^L) be the master problem constructed from (P) over the points in L;
 7: while z^u − z^l > ε_c do
 8:     Let (α̂^k, x̂^k, ŷ^k) be an optimal solution of (M^L);
 9:     z^l = α̂^k;
10:     if (x̂^k, ŷ^k) is feasible for (P) and f(x̂^k, ŷ^k) < z^u then
11:         z^u = f(x̂^k, ŷ^k);
12:         (x*, y*) = (x̂^k, ŷ^k);
13:     L = L ∪ {(x̂^k, ŷ^k)};
14:     k = k + 1;

Algorithm 1. Extended Cutting Plane (ECP) algorithm.

We have observed that the new cuts generated are usually weak, which makes the algorithm require a large number of iterations to converge to an optimal solution.
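As an illustration (our addition, not part of the original paper), the loop of Algorithm 1 can be sketched in Python on a tiny hypothetical convex MINLP with one continuous and one integer variable. The MILP master is solved here by brute force, enumerating the bounded integer variable and minimizing the piecewise-linear bound over a grid of x values, standing in for a real MILP solver such as Cplex:

```python
import math

# Hypothetical toy instance (not from the paper):
#   min  f(x, y) = x^2 + y
#   s.t. g(x, y) = 1 - x - y <= 0,  x in [0, 2],  y in {0, 1, 2}
f  = lambda x, y: x * x + y
g  = lambda x, y: 1.0 - x - y
df = lambda x, y: (2.0 * x, 1.0)            # gradient of f

Y     = [0, 1, 2]                           # bounded integer domain
XGRID = [i / 1000.0 for i in range(2001)]   # x in [0, 2]

def cut(xj, yj):
    """Store the data of the linearization point (xj, yj)."""
    fx, fy = df(xj, yj)
    return (xj, yj, fx, fy)

def solve_master(L):
    """Brute-force stand-in for the MILP master (M^L): enumerate y and
    minimize alpha(x, y) = max over the cuts in L, subject to the (here
    linear, hence exactly represented) constraint g <= 0."""
    best = None
    for y in Y:
        for x in XGRID:
            if g(x, y) > 1e-9:
                continue
            alpha = max(fx * (x - xj) + fy * (y - yj) + f(xj, yj)
                        for (xj, yj, fx, fy) in L)
            if best is None or alpha < best[0]:
                best = (alpha, x, y)
    return best

z_l, z_u, incumbent = -math.inf, math.inf, None
L, k = [cut(2.0, 2)], 1                     # initial linearization point
while z_u - z_l > 1e-6 and k <= 50:
    alpha, x, y = solve_master(L)           # optimal solution of (M^L)
    z_l = alpha                             # lower bound update
    if g(x, y) <= 1e-9 and f(x, y) < z_u:   # feasible for (P)?
        z_u, incumbent = f(x, y), (x, y)    # upper bound and incumbent
    L.append(cut(x, y))                     # line 13: add the new cut
    k += 1
```

On this instance the loop stops after three iterations with z^l = z^u = 1 and incumbent (x, y) = (0, 1).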
3  The Outer Approximation Algorithm
The Outer Approximation (OA) algorithm was proposed in [6,7]. Similarly to ECP, the main foundation of OA is to adopt the master problem (M^L) and, at each iteration, to add a new linearization point to L until the lower bound given by (M^L) becomes sufficiently close to the best known upper bound for (P). Let (α̂^k, x̂^k, ŷ^k) be an optimal solution for (M^L) at iteration k. An attempt to obtain an upper bound for (P) is to solve problem (P_{ŷ^k}), which is the NLP problem obtained from (P) by fixing y at the value ŷ^k:

(P_{ŷ^k})  min_x f(x, ŷ^k)
           s.t. g(x, ŷ^k) ≤ 0,
                x ∈ X.   (6)
If problem (P_{ŷ^k}) is feasible, let (x̃^k, ŷ^k) be an optimal solution for the problem. In this case, f(x̃^k, ŷ^k) is an upper bound for (P) and (P̄), and the point (x̃^k, ŷ^k) is added to the set L. In case (P_{ŷ^k}) is infeasible, the following feasibility problem is solved:

(P^F_{ŷ^k})  min_{u,x} Σ_{i=1}^m u_i
             s.t. g(x, ŷ^k) ≤ u,
                  u ≥ 0, x ∈ X, u ∈ R^m.   (7)
W. Melo et al.
Let (ǔ^k, x̌^k) be an optimal solution for (P^F_{ŷ^k}). The point (x̌^k, ŷ^k) is then added to the set L. After the update of L, with the addition of (x̃^k, ŷ^k) if problem (P_{ŷ^k}) is feasible, or with the addition of (x̌^k, ŷ^k) otherwise, the algorithm starts a new iteration, using as stopping criterion a maximum tolerance for the difference between the best lower and upper bounds obtained. As shown in [7], assuming that the KKT conditions are satisfied at the solutions of (P_{ŷ^k}) and (P^F_{ŷ^k}), the strategy used to update L ensures that a given solution ŷ^k for the integer variable y is not visited more than once by the algorithm, except when it is part of the optimal solution of (P) (in this case the solution may be visited at most twice). As the number of integer solutions is finite by hypothesis, since Y is bounded, the algorithm is guaranteed to find an optimal solution of (P) in a finite number of iterations. Thus, in comparison with the ECP algorithm, OA tends to spend a smaller number of iterations, with the overhead of needing to solve one or two NLP problems at each iteration. Algorithm 2 presents the OA algorithm.
Input: (P): MINLP problem, ε_c: convergence tolerance.
Output: (x*, y*): optimal solution for (P).
 1: z^l = −∞;
 2: z^u = +∞;
 3: Choose an initial linearization point (x^0, y^0) (usually the optimal solution of the continuous relaxation of (P));
 4: L = {(x^0, y^0)};
 5: k = 1;
 6: Let (M^L) be the master problem constructed from (P) over the points in L;
 7: Let (P_{ŷ^k}) be the NLP problem obtained by fixing the variable y of (P) at ŷ^k;
 8: Let (P^F_{ŷ^k}) be the feasibility NLP problem obtained from (P_{ŷ^k});
 9: while z^u − z^l > ε_c do
10:     Let (α̂^k, x̂^k, ŷ^k) be an optimal solution of (M^L);
11:     z^l = α̂^k;
12:     if (P_{ŷ^k}) is feasible then
13:         Let x^k be an optimal solution of (P_{ŷ^k});
14:         if f(x^k, ŷ^k) < z^u then
15:             z^u = f(x^k, ŷ^k);
16:             (x*, y*) = (x^k, ŷ^k);
17:     else
18:         Let (u^k, x^k) be an optimal solution of (P^F_{ŷ^k});
19:     L = L ∪ {(x^k, ŷ^k)};
20:     k = k + 1;

Algorithm 2. Outer Approximation (OA) algorithm.
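To make the contrast with ECP concrete, here is a hedged Python sketch of the OA loop (our addition) on a hypothetical toy instance. The master is again brute-forced, and the NLP subproblem (P_ŷ) has a closed-form solution on this instance, standing in for a real NLP solver; since (P_ŷ) is feasible for every y here, the feasibility problem (P^F_ŷ) never triggers:

```python
import math

# Hypothetical toy instance (not from the paper):
#   min x^2 + y   s.t.  1 - x - y <= 0,  x in [0, 2],  y in {0, 1, 2}
f  = lambda x, y: x * x + y
g  = lambda x, y: 1.0 - x - y
df = lambda x, y: (2.0 * x, 1.0)
Y     = [0, 1, 2]
XGRID = [i / 1000.0 for i in range(2001)]

def cut(xj, yj):
    fx, fy = df(xj, yj)
    return (xj, yj, fx, fy)

def solve_master(L):
    """Brute-force stand-in for the MILP master (M^L)."""
    best = None
    for y in Y:
        for x in XGRID:
            if g(x, y) > 1e-9:
                continue
            alpha = max(fx * (x - xj) + fy * (y - yj) + f(xj, yj)
                        for (xj, yj, fx, fy) in L)
            if best is None or alpha < best[0]:
                best = (alpha, x, y)
    return best

def solve_nlp(y):
    """(P_y): min x^2 + y s.t. x >= 1 - y, x in [0, 2].  This toy NLP has
    the closed-form solution x = max(0, 1 - y); a real OA code would call
    an NLP solver here, and solve the feasibility problem (P^F_y) whenever
    (P_y) is infeasible (it never is in this instance)."""
    return max(0.0, 1.0 - y)

z_l, z_u, incumbent = -math.inf, math.inf, None
L, k = [cut(2.0, 2)], 1
while z_u - z_l > 1e-6 and k <= 50:
    alpha, xh, yh = solve_master(L)      # optimal solution of (M^L)
    z_l = alpha
    xt = solve_nlp(yh)                   # NLP with y fixed at yh
    if f(xt, yh) < z_u:
        z_u, incumbent = f(xt, yh), (xt, yh)
    L.append(cut(xt, yh))                # linearize at the NLP solution
    k += 1
```

The key difference from ECP is that the cut is generated at the NLP solution (x̃^k, ŷ^k) rather than at the master solution itself.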
4  Our Modified Extended Cutting Plane Algorithm
In this section, we present our approach based on the ECP algorithm, which we call Modified Extended Cutting Plane (MECP). Our main motivation is to improve the performance of ECP, making it more similar to OA, while keeping the nice characteristic of ECP of being a first-order method. With this purpose, instead of considering problems (P_{ŷ^k}) and (P^F_{ŷ^k}), MECP considers a linear approximation of problem (P_{ŷ^k}), built over the same set of linearization points L as problem (M^L). We denote this new problem by (M^L_{ŷ^k}), defined as follows:

(M^L_{ŷ^k})  min_{α,x} α
             s.t. ∇f(x^j, y^j)⊤ (x − x^j; ŷ^k − y^j) + f(x^j, y^j) ≤ α,  ∀(x^j, y^j) ∈ L
                  ∇g(x^j, y^j)⊤ (x − x^j; ŷ^k − y^j) + g(x^j, y^j) ≤ 0,  ∀(x^j, y^j) ∈ L   (8)
                  x ∈ X.

Note that (M^L_{ŷ^k}) can be obtained from (M^L) simply by fixing the variable y at the value ŷ^k. Thus, when considering problem (M^L_{ŷ^k}), we expect to obtain good feasible solutions sooner than with the traditional ECP algorithm. These solutions are used for a possible update of the known upper bound z^u and to strengthen the relaxation given by the master problem through their inclusion in the set L. The MECP algorithm is presented as Algorithm 3. Compared to the ECP algorithm, the novelty is the introduction of lines 7 and 12–17. We point out that, at each iteration, between the solution of (M^L) (line 9) and the solution of (M^L_{ŷ^k}) (lines 12–13), the solution (x̂^k, ŷ^k) is added to the set L (line 11), to strengthen the linear relaxation built with the points of this set. For this reason, it is possible that the optimal solution x^k of (M^L_{ŷ^k}) is different from x̂^k. With this strategy, we expect that the MECP algorithm will find feasible solutions sooner and, therefore, can close the integrality gap with less computational effort, when compared to ECP. We also note that the solution obtained when solving (M^L_{ŷ^k}) is added to L as well (line 17), in case the problem is feasible.
As (M^L_{ŷ^k}) is a linear programming problem, its resolution does not add significant computational burden when compared to the resolution of (M^L). It is important to point out that the convergence of MECP to an optimal solution of (P) follows easily from the convergence of ECP, as MECP considers, during its iterations, all linearization points considered by ECP (along with some additional points). We also emphasize that, in this context, approximating the feasibility problem (P^F_{ŷ^k}) by a linear programming problem would make no sense: in case (M^L_{ŷ^k}) is infeasible, the value ŷ^k cannot appear as a solution for y in any problem (M^L) from the current iteration on, and, therefore, the resolution of a feasibility problem is not necessary to cut solutions with value ŷ^k out of (M^L).
Input: (P): MINLP problem, ε_c: convergence tolerance.
Output: (x*, y*): optimal solution for (P).
 1: z^l = −∞;
 2: z^u = +∞;
 3: Choose an initial linearization point (x^0, y^0);
 4: L = {(x^0, y^0)};
 5: k = 1;
 6: Let (M^L) be the master problem constructed from (P) over the points in L;
 7: Let (M^L_{ŷ^k}) be the problem obtained by fixing the variable y of (M^L) at ŷ^k;
 8: while z^u − z^l > ε_c do
 9:     Let (α̂^k, x̂^k, ŷ^k) be an optimal solution of (M^L);
10:     z^l = α̂^k;
11:     L = L ∪ {(x̂^k, ŷ^k)};
12:     if (M^L_{ŷ^k}) is feasible then
13:         Let (α^k, x^k) be an optimal solution of (M^L_{ŷ^k});
14:         if (x^k, ŷ^k) is feasible for (P) and f(x^k, ŷ^k) < z^u then
15:             (x*, y*) = (x^k, ŷ^k);
16:             z^u = f(x^k, ŷ^k);
17:         L = L ∪ {(x^k, ŷ^k)};
18:     k = k + 1;

Algorithm 3. Modified Extended Cutting Plane (MECP) algorithm.

Finally, we note that (M^L_{ŷ^k}) is an attempt to approximate problem (P_{ŷ^k}) using linear programming. Thus, MECP is still a first-order method, which can be implemented with no routines to solve NLP problems. This fact can represent a practical advantage because, besides the reasons already mentioned about the difficulty of computing second-order derivatives in some applications, in many cases even the best computational NLP routines may fail to converge to the optimal solution of the addressed problem, even when the problem is convex and continuously differentiable. In some cases, these routines fail even to find a feasible solution of problems with nonempty feasible sets, incorrectly declaring infeasibility. Thus, because it does not depend on NLP routines, the MECP algorithm appears as a robust approach for solving several convex MINLP problems.
5  Computational Results
We now present computational results obtained with the application of ECP, OA, and our approach MECP on a set of 343 convex MINLP test instances from the libraries [3,9,15]. Table 1 presents statistics on the test problems. Note the generally high percentage of linear constraints present in the instances.
Table 1. Statistics on test problems.

                        Min   Max     Mean     Median
Variables               2     107222  975.34   114
Integer variables (%)   0.00  1.00    0.52     0.40
Constraints             0     108217  1184.65  211
Linear constraints (%)  0.00  1.00    0.89     0.95
The algorithms were implemented in C++ 2011 and compiled with ICPC 16.0.0. To solve the MILP problems, we use the solver Cplex 12.6.0 [4], and to solve the NLP problems, we use the solver Mosek 7.1.0 [1]. The tests were run on a computer with a Core i7 4790 processor (3.6 GHz), under the operating system Open Suse Linux 13.1. All the algorithms were configured to be executed by a single processing thread, which means that each was executed on a single processor at a time on the machine used for the tests. The CPU time of each algorithm on each test instance was limited to 4 hours. Values of 10^−6 and 10^−3 were adopted as absolute and relative convergence tolerances, respectively, for all algorithms.
Fig. 1. Relative comparison of CPU time for the algorithms.
Figure 1 presents a relative comparison between the algorithms with respect to the CPU time spent on the set of test instances considered. Note that the data are normalized with respect to the best result obtained among all approaches for each instance. On the horizontal axis, the abscissa indicates the number of times the result obtained by an algorithm was greater than the best result among the algorithms with respect to computational time. On the vertical axis, the ordinate indicates the percentage of instances reached by each approach.
More specifically, if the curve of a given algorithm passes through the point (α, τ ), this indicates that for τ % of the instances, the result obtained by the algorithm observed is smaller or equal to α times the best computational time among all algorithms. Note, for example, that the OA curve passes through the point (1, 57%). This means that, for 57% of the test instances considered, OA achieves the best result with respect to the computational time (one time greater than the best result). Next, the curve passes through the point (1.2, 63%), indicating that OA was able to solve 63% of the instances spending up to 20% more time than the best algorithm in each (1.2 times greater than the best result). Thus, roughly speaking, we can say that, the more the curve of an algorithm is above the curves of the other algorithms in the graph, the better the algorithm did when compared to the others, with respect to the characteristic analyzed in the graph. Analyzing Fig. 1, we can observe that the performance of OA dominates the performance of ECP. It is also possible to note that our MECP algorithm presents substantially better results than ECP, completely dominating its performance and even becoming competitive in relation to the OA algorithm. It is worth noting that the MECP curve dominates the OA curve for results greater than or equal to 2.2 times the best result. All algorithms were able to solve about 90% of the test instances in the maximum running time stipulated. Finally, we note that the implementations of all the algorithms considered in this study, ECP, MECP and OA, together with heuristics [11] are available in our MINLP solver Muriqui [10,12].
References

1. The MOSEK optimization software. http://www.mosek.com/
2. Bonami, P., Kilinç, M., Linderoth, J.: Algorithms and software for convex mixed integer nonlinear programs. Technical Report 1664, Computer Sciences Department, University of Wisconsin-Madison (2009)
3. CMU-IBM: Open source MINLP project (2012). http://egon.cheme.cmu.edu/ibm/page.htm
4. IBM Corporation: IBM ILOG CPLEX V12.6 User's Manual for CPLEX (2015). https://www.ibm.com/support/knowledgecenter/en/SSSA5P_12.6.0
5. D'Ambrosio, C., Lodi, A.: Mixed integer nonlinear programming tools: a practical overview. 4OR 9(4), 329–349 (2011). https://doi.org/10.1007/s10288-011-0181-9
6. Duran, M., Grossmann, I.: An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 36, 307–339 (1986). https://doi.org/10.1007/BF02592064
7. Fletcher, R., Leyffer, S.: Solving mixed integer nonlinear programs by outer approximation. Math. Program. 66, 327–349 (1994). https://doi.org/10.1007/BF01581153
8. Hemmecke, R., Köppe, M., Lee, J., Weismantel, R.: Nonlinear integer programming. In: Jünger, M., Liebling, T.M., Naddef, D., Nemhauser, G.L., Pulleyblank, W.R., Reinelt, G., Rinaldi, G., Wolsey, L.A. (eds.) 50 Years of Integer Programming 1958–2008, pp. 561–618. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-540-68279-0_15
9. Leyffer, S.: MacMINLP: test problems for mixed integer nonlinear programming (2003). https://wiki.mcs.anl.gov/leyffer/index.php/MacMINLP
10. Melo, W., Fampa, M., Raupp, F.: Integrating nonlinear branch-and-bound and outer approximation for convex mixed integer nonlinear programming. J. Glob. Optim. 60(2), 373–389 (2014). https://doi.org/10.1007/s10898-014-0217-8
11. Melo, W., Fampa, M., Raupp, F.: Integrality gap minimization heuristics for binary mixed integer nonlinear programming. J. Glob. Optim. 71(3), 593–612 (2018). https://doi.org/10.1007/s10898-018-0623-4
12. Melo, W., Fampa, M., Raupp, F.: An overview of MINLP algorithms and their implementation in Muriqui Optimizer. Ann. Oper. Res. (2018). https://doi.org/10.1007/s10479-018-2872-5
13. Trespalacios, F., Grossmann, I.E.: Review of mixed-integer nonlinear and generalized disjunctive programming methods. Chem. Ing. Tech. 86(7), 991–1012 (2014). https://doi.org/10.1002/cite.201400037
14. Westerlund, T., Pettersson, F.: An extended cutting plane method for solving convex MINLP problems. Comput. Chem. Eng. 19(Suppl. 1), 131–136 (1995). https://doi.org/10.1016/0098-1354(95)87027-X. European Symposium on Computer Aided Process Engineering
15. GAMS World: MINLP library 2 (2014). http://www.gamsworld.org/minlp/minlplib2/html/
On Proximity for k-Regular Mixed-Integer Linear Optimization

Luze Xu and Jon Lee
University of Michigan, Ann Arbor, MI, USA
{xuluze,jonxlee}@umich.edu
Abstract. Putting a finer structure on a constraint matrix than is afforded by subdeterminant bounds, we give sharpened proximity results for the setting of k-regular mixed-integer linear optimization.

Keywords: Mixed-integer linear optimization · Proximity · k-regular

1  Introduction
We study the standard-form MILO (mixed-integer linear optimization) problem

min{c⊤x : Ax = b; x ≥ 0; x_i ∈ Z for all i ∈ I},   (I-MIP)
with full row-rank A ∈ Z^{m×n}, b ∈ Q^m, and I ⊆ [n] := {1, 2, …, n}. The main issue that we are interested in is: for distinct I, J ⊆ [n] and an optimal solution x*(I) to I-MIP, find a good upper bound on ‖x*(I) − x*(J)‖∞ for some optimal x*(J) to J-MIP. Mostly we consider the ∞-norm, though it is nice to have results using the 1-norm. A key special case of interest is I = ∅ and J = [n], where we are asking for a bound on how far components of an optimal solution of a pure MILO problem may be from components of a solution of its continuous relaxation, a quantity that is very relevant to the issue of rounding and local search starting from a relaxation solution. In some situations we add further natural conditions (e.g., b ∈ Z^m, x*(∅) is a basic solution, etc.). Even in dimension n = 2, it is easy to construct examples where the solution of a pure MILO problem is far from the solution of its continuous relaxation. Choose p1 < p2 to be a pair of large, relatively-prime positive integers. Consider the integer standard-form problem
(I-P)
By the equation in I-P, every feasible solution (ˆ x1 , x ˆ2 ) satisfies (ˆ x1 +1)/(ˆ x2 +1) = p1 /p2 . Because p1 and p2 are relatively prime, there cannot be a feasible solution to {1, 2}-P with x ˆ1 smaller than p1 − 1. So the optimal solution to {1, 2}-P is (z1∗ , z2∗ ) := (p1 − 1, p2 − 1) . But it is very easy to see that the (unique and basic) Supported in part by ONR grant N00014-17-1-2296. c Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 438–447, 2020. https://doi.org/10.1007/978-3-030-21803-4_44
optimal solution to its continuous relaxation ∅-P is (x1*, x2*) := (0, −1 + p2/p1)⊤, quite far from the optimal solution to {1,2}-P. With such a small example, it is not obvious exactly what drives this behavior, and at a high level our goal is to control and investigate this.

1.1  Literature Review and Outline
Many of the results in this area focus on the general-form MILO problem max{c⊤x : Ax ≤ b, x_i ∈ Z for all i ∈ I}, where A ∈ Z^{m×n}, b ∈ Q^m, and ∅ ≠ I ⊆ [n] := {1, 2, …, n} (see [2,4,5,9,13]). [2] gives a bound nΔ(A) on the ∞-norm distance between optimal solutions to the general-form pure MILO problem and its continuous relaxation, where Δ(A) is the maximum of the absolute values of the determinants of the square submatrices of A. Note that if b is not restricted to be integer, then this bound nΔ(A) is best possible (see [11]). But if we assume that b ∈ Z^m, then it is not known whether this bound is optimal or not. [4] generalizes the objective function from linear functions to convex separable quadratic functions; [5,13] generalize further to convex separable functions. Recently, [9] considered proximity between optimal solutions of general MILO problems that differ only in the sets of indices of integer variables, and obtained a bound of |I1 ∪ I2| Δ(A), where I1 and I2 are the sets of indices of the integer variables. Of course we can move from the standard-form I-MIP to the general form by recasting Ax = b, x ≥ 0 as

[A; −A; −I] x ≤ [b; −b; 0].

Thus we could directly apply the theorems in the general form to the standard form, using the simple fact that Δ([A; −A; −I]) = Δ(A). However, the special structure of the resulting general-form matrix could imply a better bound in some cases. For example, making a clever argument employing the "Steinitz Lemma", [3] establishes ‖x̄ − z*‖_1 ≤ m(2mU + 1)^m when x̄ is a basic optimal solution of ∅-MIP and z* is some optimal solution of [n]-MIP, where U = max_{ij} {|a_{ij}|}. [1] gives an optimal bound Δ(A) − 1 on the ∞-norm distance between basic optimal solutions and feasible integer solutions for standard-form knapsack polyhedra, i.e., the case m = 1.
[6,7] establish a bound of k·dim(n.s.(A)) on the ∞-norm distance between optimal solutions for "k-regular" ∅-MIP and [n]-MIP, where k-regular means that the elementary vectors (i.e., the nonzero vectors with minimal support) in the null space of the constraint matrix A can be scaled to have all entries in {0, ±1, ±2, …, ±k} (see [6–8]). Note that k = 1 (regular) is equivalent to A being equivalent to a totally-unimodular matrix. A nice family of examples with k = 2 is when A is the vertex-edge incidence matrix (or its transpose) of a mixed graph.
In what follows, we focus on k-regular I-MIP. In Sect. 1.2, we review some needed fundamentals. In Sect. 2.1, we establish a proximity result for k-regular I-MIP. In Sect. 2.2, we improve the bound for a 2-regular pure MILO problem relative to a basic optimal solution of its continuous relaxation. In Sect. 3, we consider a special 2-regular case, where A is the vertex-edge incidence matrix of a mixed graph G, and we give a sufficient condition on G such that the ∞-norm distance between any basic optimal solution and some feasible integer solution is at most 1.

1.2  Fundamentals
Let F be an arbitrary field. For any x ∈ F^n, the support of x, denoted supp(x), is the set of coordinates with nonzero entries, i.e., supp(x) := {i ∈ [n] : x_i ≠ 0}. Let V be a vector subspace of F^n. A vector x ∈ V is an elementary vector of V if x ≠ 0 and x has minimal support in V \ {0}; i.e., x ∈ V \ {0}, and there is no y ∈ V \ {0} with supp(y) ⊊ supp(x). The set of elementary vectors of V is denoted F(V). Assume now that F is ordered. A vector y ∈ F^n conforms to x ∈ F^n if x_i y_i > 0 for i ∈ supp(y). The following result of Rockafellar is fundamental: every nonzero x ∈ V can be expressed as a conformal sum of at most min{dim(V), |supp(x)|} elementary vectors from F(V).

Theorem 1 ([10], Theorem 1). Let V be a subspace of F^n, where F is an ordered field. For every x ∈ V \ {0}, there exist elementary vectors v^1, …, v^t ∈ V such that x = Σ_{i=1}^t v^i, where each v^i conforms to x, none has its support contained in the union of the supports of the others, and t ≤ min{dim(V), |supp(x)|}.

Definition 2. Let V be a subspace of R^n. The subspace V is k-regular if for all x ∈ F(V) there exists λ ∈ F \ {0} such that λx_i ∈ {±1, ±2, …, ±k} for all i ∈ supp(x).

We also refer to a standard-form problem as k-regular when the null space of its constraint matrix is k-regular. We have the following simple property for 2-regular standard-form problems.

Proposition 3. Let P := {x : Ax = b, x ≥ 0}, where A ∈ Z^{m×n} and rank(A) = m, and suppose that P has an integer feasible solution (this implies that b ∈ Z^m). If V := n.s.(A) is 2-regular, then every basic solution (feasible or not) x̄ of P satisfies 2x̄ ∈ Z^n.

Proof. Rearranging columns, we may assume that the first m columns of A form a basis matrix A_β corresponding to x̄. That is, A = [A_β, A_η], x̄_β = A_β^{−1} b, and x̄_η = 0. Multiplying both sides of Ax = b by A_β^{−1}, we get [I, M]x = A_β^{−1} b. Also V = r.s.([−M⊤, I]) (r.s.(B) denotes the row space of B), and each row of [−M⊤, I] is in F(V).
Because each row has an entry of 1, and each row can be scaled by a nonzero to have all entries in {0, ±1, ±2}, it follows that all entries of M are in {0, ±1/2, ±1, ±2}. So we have that Ax = b is equivalent to [2I, 2M]x = 2A_β^{−1} b, where now [2I, 2M] is an all-integer matrix. Plugging a feasible integer solution x^0 into [2I, 2M]x = 2A_β^{−1} b, we can conclude that 2x̄_β = 2A_β^{−1} b ∈ Z^m, and so 2x̄ ∈ Z^n.
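As a quick numerical illustration of Proposition 3 (our addition, assuming NumPy is available, on an instance of our own choosing), take the incidence matrix of a small mixed graph: an undirected triangle plus one half arc. Its null space is spanned by (−1, −1, 1, 2), so it is 2-regular, and the basic solution associated with the triangle columns is half-integral but not integral:

```python
import numpy as np

# Incidence matrix of a mixed graph on 3 vertices: an undirected triangle
# (columns 0-2, entries +1 at both endpoints) plus one half arc at vertex 1
# (column 3).  This is an illustrative instance, not one from the paper.
A = np.array([[1., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 1., 0.]])
b = np.ones(3)

# The null space is spanned by (-1, -1, 1, 2): entries in {+-1, +-2},
# so n.s.(A) is 2-regular (but not 1-regular).
assert np.allclose(A @ np.array([-1., -1., 1., 2.]), 0.0)

# P = {x : Ax = b, x >= 0} has an integer feasible point, as required:
assert np.allclose(A @ np.array([0., 0., 1., 1.]), b)

# Basic solution for the basis given by the three triangle columns:
beta = [0, 1, 2]
x_bar = np.zeros(4)
x_bar[beta] = np.linalg.solve(A[:, beta], b)   # = (1/2, 1/2, 1/2)

assert np.allclose(2 * x_bar, np.round(2 * x_bar))   # 2*x_bar is integral,
assert not np.allclose(x_bar, np.round(x_bar))       # but x_bar is not
```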
2  Proximity for k-Regular MILO

2.1  k-Regular Mixed-Integer Linear Optimization
[9] considers the question of bounding the ∞-norm distance between optimal solutions of mixed-integer linear problems that differ only in the sets of indices of integer variables. By using the properties of so-called bimodular systems from [12], they manage to give a tighter bound, Δ(A) − 1, for the special case when Δ(A) ≤ 2 and I1, I2 ∈ {∅, [n]}, which is just in terms of Δ(A) and not relative to |I1 ∪ I2| = n. However, Δ(A) ≤ 2 is a very strong assumption. Of course totally-unimodular A have this property, so we have vertex-edge incidence matrices of digraphs, for example. But we do not know further broad families of examples with Δ(A) ≤ 2. It is natural to think about vertex-edge incidence matrices of mixed graphs. But in general these have subdeterminants that are ±2^k, k ∈ Z_+. For example, if G is a collection of k disjoint undirected triangles, then the square vertex-edge incidence matrix has determinant 2^k. But interestingly, the null space of the vertex-edge incidence matrix of every mixed graph is 2-regular (see [7], for example), so there is an opportunity to get a better proximity bound than afforded by only considering Δ(A). Generally, for integer matrices A, we have k ≤ Δ(A) (see [7]), and so the idea of k-regularity gives a more refined view that can be exploited. We consider the standard-form I-MIP, where we assume that V := n.s.(A) is k-regular and dim(V) = r. Note that [n]-MIP is the k-regular pure-integer problem, while ∅-MIP is its continuous relaxation.

Theorem 4 ([7]). If [n]-MIP is feasible, then for each optimal solution x* to the corresponding continuous relaxation ∅-MIP, there exists an optimal solution z* to [n]-MIP with ‖z* − x*‖∞ ≤ kr.

We are going to generalize Theorem 4, using the technique developed in [9]. Toward this, we restate the main lemma used in [9].

Lemma 5 ([9], Lemma 1). Let d, t ∈ Z_{≥1}, g^1, …, g^t ∈ Z^d, and α_1, …, α_t ≥ 0.
If Σ_{i=1}^t α_i ≥ d, then there exist β_i ∈ [0, α_i] for i ∈ [t] such that not all of β_1, …, β_t are zero and Σ_{i=1}^t β_i g^i ∈ Z^d.

We use a mild generalization of [3, Lemma 5] (from the case I = [n], J = ∅):

Lemma 6. Assume that I ∪ J = [d], where d ∈ [n]. Let x*(I) and x*(J) be optimal solutions of I-MIP and J-MIP, respectively. If there is a vector w ∈ Z^d × R^{n−d} satisfying Aw = 0, w conforms to x*(I) − x*(J), and |w_i| ≤ |x*(I)_i − x*(J)_i| for i ∈ [n], then x*(I) − w and x*(J) + w are also optimal solutions to I-MIP and J-MIP, respectively.

Proof. First we claim that x*(J) + w is also feasible for J-MIP. To see this, because the coordinates of x*(J) and w indexed by J are integer, we observe that the coordinates of x*(J) + w indexed by J are also integer. Furthermore, because Aw = 0, we see that A(x*(J) + w) = b. Also, because w conforms to
x*(I) − x*(J) and |w_i| ≤ |x*(I)_i − x*(J)_i|, we have x*(J) + w ≥ 0. Similarly, x*(I) − w is also feasible for I-MIP. Because of the optimality of x*(I), we have c⊤x*(I) ≤ c⊤(x*(I) − w), i.e., c⊤w ≤ 0, thus c⊤(x*(J) + w) ≤ c⊤x*(J). Therefore x*(J) + w is also an optimal solution of J-MIP and c⊤w = 0. This also implies that x*(I) − w is also an optimal solution of I-MIP.

Next, we generalize Theorem 4, which is the special case when I = ∅, J = [n]. This theorem also gives a better bound than [9] for the k-regular case (because k ≤ Δ(A)). The proof is mainly based on Theorem 1, Lemmas 5 and 6.

Theorem 7. Suppose that V := n.s.(A) is k-regular and dim(V) = r. Let I, J ⊆ [n] with I ≠ J be such that J-MIP has an optimal solution. For every optimal x*(I) of I-MIP, there exists an optimal x*(J) of J-MIP such that ‖x*(I) − x*(J)‖∞ ≤ k min{r, |I ∪ J|}.

Proof. Without loss of generality, assume that I ∪ J = [d], with d ∈ [n]. Let x*(I) ∈ R^n be optimal for I-MIP, and let z̃(J) ∈ R^n be any optimal solution of J-MIP. By Theorem 1, y := x*(I) − z̃(J) ∈ V can be expressed as a conformal sum of at most r vectors in F(V), i.e., y = Σ_{i=1}^t v^i, t ≤ r, where each v^i ∈ F(V) conforms to y. For each summand v^i, because V is k-regular, there exists a positive scalar λ_i so that (1/λ_i) v^i is {0, ±1, …, ±k}-valued. So we have

y = Σ_{i=1}^t λ_i (1/λ_i) v^i := Σ_{i=1}^t λ_i g^i,

where g^i = (1/λ_i) v^i is an integer vector with ‖g^i‖∞ ≤ k and Ag^i = 0; g^i = (1/λ_i) v^i also conforms to y. Next, consider the set

S := {(γ̄_1, …, γ̄_t) : γ̄_i ∈ [0, λ_i] for all i ∈ [t], Σ_{i=1}^t γ̄_i g^i ∈ Z^d × R^{n−d}},

which is non-empty (it contains 0) and compact. Hence, there exists some (γ_1, …, γ_t) ∈ S maximizing Σ_{i=1}^t γ̄_i over S. Let w := Σ_{i=1}^t γ_i g^i. Because g^i conforms to y = x*(I) − z̃(J), we know that w also conforms to y, and |w_j| = |Σ_{i=1}^t γ_i g^i_j| ≤ |Σ_{i=1}^t λ_i g^i_j| = |y_j| for j ∈ [n] because γ_i ∈ [0, λ_i]. Along with Aw = Σ_{i=1}^t γ_i Ag^i = 0, by Lemma 6, we know that x*(J) := z̃(J) + w is also an optimal solution to J-MIP. The distance from x*(J) to x*(I) can be bounded as follows:

‖x*(I) − x*(J)‖∞ = ‖Σ_{i=1}^t (λ_i − γ_i) g^i‖∞ ≤ Σ_{i=1}^t (λ_i − γ_i) ‖g^i‖∞ ≤ Σ_{i=1}^t (λ_i − γ_i) k.

It remains to argue that Σ_{i=1}^t (λ_i − γ_i) ≤ min{d, r}. Since (⌊λ_1⌋, …, ⌊λ_t⌋) ∈ S, we have Σ_{i=1}^t γ_i ≥ Σ_{i=1}^t ⌊λ_i⌋, thus

Σ_{i=1}^t (λ_i − γ_i) ≤ Σ_{i=1}^t (λ_i − ⌊λ_i⌋) ≤ t ≤ r.

Now we only need to argue that Σ_{i=1}^t (λ_i − γ_i) ≤ d. Let α_i := λ_i − γ_i ≥ 0 for i ∈ [t]. Suppose, for the sake of contradiction, that this inequality does not
hold, i.e., Σ_{i=1}^t α_i > d. Let h^i ∈ Z^d be the projection of g^i onto the first d coordinates. We can apply Lemma 5 to the α_i, h^i and obtain β_1, …, β_t with β_i ∈ [0, α_i] such that not all of the β_i are zero and Σ_{i=1}^t β_i h^i ∈ Z^d. Hence Σ_{i=1}^t β_i g^i ∈ Z^d × R^{n−d}. Now we consider γ'_i := γ_i + β_i ≥ 0. Note that γ'_i ≤ γ_i + α_i = λ_i, and Σ_{i=1}^t γ'_i g^i = Σ_{i=1}^t γ_i g^i + Σ_{i=1}^t β_i g^i ∈ Z^d × R^{n−d}. So (γ'_1, …, γ'_t) ∈ S. However, because not all of the β_i are zero, we have Σ_{i=1}^t γ'_i > Σ_{i=1}^t γ_i, which contradicts the maximality of (γ_1, …, γ_t).

2.2  2-Regular Pure-Integer Linear Optimization
To improve the bound of Theorem 7, we focus on the important special case of k = 2 and b ∈ Z^m, for two important subcases: (i) I = ∅, J = [n], when the optimal solution of ∅-MIP is basic, and (ii) I = [n], J = ∅. In this case, the bound of Theorem 7 is 2r, which we now improve to 3r/2.

Theorem 8. Suppose that A has full row rank, V := n.s.(A) is 2-regular, b ∈ Z^m, [n]-MIP is feasible, and ∅-MIP has an optimal solution.
(1) For each basic optimal solution x̄* to ∅-MIP, there exists an optimal solution z* to [n]-MIP with ‖x̄* − z*‖∞ ≤ (3/2)r;
(2) For each optimal solution z* to [n]-MIP, there exists an optimal solution x* to ∅-MIP with ‖z* − x*‖∞ ≤ (3/2)r.

Proof. (1): The proof is similar to that of Theorem 7, with some extra care using Theorem 1 and Proposition 3. Let x̄* be a basic optimal solution of ∅-MIP. If x̄* ∈ Z^n, then z* := x̄* satisfies the conclusion, so we just consider x̄* ∉ Z^n. Because V is 2-regular, by Proposition 3, we have 2x̄* ∈ Z^n. Let z̃ be optimal for [n]-MIP. By Theorem 1, y := x̄* − z̃ ∈ V can be expressed as a conformal sum of at most r vectors in F(V), i.e., y = Σ_{j=1}^t v^j, t ≤ r, and none of the v^j has its support contained in the union of the supports of the others, which means that for each v^j, there exists an index i_j ∈ {1, …, n} such that v^j_{i_j} ≠ 0 and v^l_{i_j} = 0 for l ≠ j. For each summand v^j, because V is 2-regular, there exists a positive scalar λ_j for v^j, so that (1/λ_j) v^j is {0, ±1, ±2}-valued. So we can write
t j=1
λj λ1j v j :=
t j=1
λj g j ,
j 1 j λj v is an integer vector with gi ∈ {0, ±1, ±2}. t := j=1 λj g j ∈ Zn . By Lemma 6, we know that
where g j =
z ∗ := z˜ + w is also Let w an optimal solution to [n]-MIP. For this optimal solution, we have x ¯∗ − z ∗ =
t
j=1 (λj
− λj )g j .
Without loss of generality, we can assume that μj = λj − λj ∈ (0, 1). t t Because 2¯ x∗ ∈ Zn , we have j=1 2μj g j ∈ Zn . For s ∈ {1, . . . , t}, j=1 2μj gijs =
2μj gijs ∈ Z \ {0}, when gijs = 1, μj = 12 , when gijs = 2, μj ∈ { 14 , 12 , 34 }. Therefore, t t 3 3 j (λ −
λ ) g ≤ (λj − λj ) g j ∞ ≤ · 2t ≤ r. ¯ x∗ − z ∗ ∞ = j j 4 2 j=1 j=1 ∞
(2): The proof is similar to that of (1), by choosing a basic optimal solution x̄ first, and then letting x* := x̄ − w.

Remark 9. For the mixed-integer case, we do not have a result like Proposition 3 for the optimal solution; so we cannot generalize Theorem 8 in such a direction.

Next we give an example to demonstrate that, for Theorem 8, Part (1), the bound of (3/2)r cannot be improved to better than r.

Example 10. Let

G = [1 0 1; 1 1 0; 0 1 1],  G^{−1} = [1/2 1/2 −1/2; −1/2 1/2 1/2; 1/2 −1/2 1/2],
e_1 = (1, 0, 0)⊤,  h = G^{−1} e_1 = (1/2, −1/2, 1/2)⊤,

Ḡ = [1 0 0 0; 1 1 0 1; 0 1 1 0; 0 0 1 1],  Ḡ^{−1} = [1 0 0 0; −1/2 1/2 1/2 −1/2; 1/2 −1/2 1/2 1/2; −1/2 1/2 −1/2 1/2],
ē_1 = (1, 0, 0, 0)⊤,  h̄ = Ḡ^{−1} ē_1 = (1, −1/2, 1/2, −1/2)⊤,

A = [Ḡ 0 … 0 ē_1 … ē_1; 0 G … 0 e_1 … 0; ⋮ ⋮ ⋱ ⋮ 0 … 0; 0 0 … G 0 … e_1] ∈ Z^{(4+3p)×(4+4p)},  b = β·1 ∈ Z^{4+3p}.

Let B = diag{Ḡ, G, …, G}, and consider the basic feasible solution u = B^{−1} b = (β, 0, β, 0, (β/2)·1⊤)⊤. For each integer solution x in P ∩ Z^{4+4p}, left-multiplying Ax = b by B^{−1} gives

[I_4 0 … 0 h̄ … h̄; 0 I_3 … 0 h … 0; ⋮ ⋮ ⋱ ⋮ 0 … 0; 0 0 … I_3 0 … h] x = u,

i.e.,

(x_1, x_2, x_3, x_4)⊤ = (β, 0, β, 0)⊤ − (Σ_{i=1}^p x_{3p+4+i}) (1, −1/2, 1/2, −1/2)⊤ ∈ Z^4,
(x_{3i+2}, x_{3i+3}, x_{3i+4})⊤ = (β/2)·1 − x_{3p+4+i} (1/2, −1/2, 1/2)⊤ ∈ Z^3,  i = 1, …, p.
On Proximity for k-Regular Mixed-Integer Linear Optimization
445
Now, let p be an even integer, and let β be a large enough (> p) odd integer. Then $x_{3p+4+i}$ is odd for $i = 1, \dots, p$, because of the integrality of $x_{3i+2}$, which implies $x_{3p+4+i} \ge 1$. In this case, $\|x - u\|_\infty \ge |x_1 - u_1| = \sum_{i=1}^{p} x_{3p+4+i} \ge p$ for every feasible integer solution x. Note that because A has full row rank, $r = n - m = (4+4p) - (4+3p) = p$.
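Since the data in Example 10 is small, it can be double-checked with exact rational arithmetic. The sketch below (plain Python; the helper name and the test value β = 7 are our own choices) verifies G⁻¹, the columns h and h̄, and the blocks of the basic solution u = B⁻¹b:

```python
from fractions import Fraction as F

def mat_inv(M):
    """Gauss-Jordan inverse over the rationals (assumes M is invertible)."""
    n = len(M)
    A = [[F(M[i][j]) for j in range(n)] + [F(int(i == j)) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

G = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
Ginv = mat_inv(G)
assert Ginv == [[F(1, 2), F(1, 2), F(-1, 2)],
                [F(-1, 2), F(1, 2), F(1, 2)],
                [F(1, 2), F(-1, 2), F(1, 2)]]
h = [row[0] for row in Ginv]                  # h = G^{-1} e_1
assert h == [F(1, 2), F(-1, 2), F(1, 2)]

Gbar = [[1, 0, 0, 0], [1, 1, 0, 1], [0, 1, 1, 0], [0, 0, 1, 1]]
Gbarinv = mat_inv(Gbar)
hbar = [row[0] for row in Gbarinv]            # hbar = Gbar^{-1} ebar_1
assert hbar == [1, F(-1, 2), F(1, 2), F(-1, 2)]

# u = B^{-1} b with b = beta*1: the Gbar block gives (beta, 0, beta, 0)
# and each G block gives (beta/2)*1, as claimed in the example.
beta = 7
assert [sum(row) * beta for row in Gbarinv] == [beta, 0, beta, 0]
ublock = [sum(row) * beta for row in Ginv]
assert ublock == [F(beta, 2)] * 3
```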
3
Special Case: The Incidence Matrix of a Mixed Graph
Inspired by the proximity result for bimodular matrices (i.e., Δ(A) ≤ 2) in [9,12], we consider a special case where A is the incidence matrix of a mixed graph. Such an A has a 2-regular null space, but it is not generally bimodular (see [7]). A mixed graph G = G(V, E+, E−, A) has vertices V, positive edges E+, negative edges E−, and arcs A. An edge with identical endpoints is a (positive or negative) loop. An arc may have one void endpoint, in which case it is a half arc. The incidence matrix A of G has a row for each vertex and a column for each edge and arc. For each positive (resp., negative) loop e = (v, v), A_{v,e} = +2 (resp., −2). For all other positive (resp., negative) edges e = (v, w), A_{v,e} = A_{w,e} = +1 (resp., −1). For each half arc a = (v, ∅) (resp., a = (∅, w)), A_{v,a} = +1 (resp., A_{w,a} = −1). For each arc a = (v, w), A_{v,a} = −A_{w,a} = +1. All unspecified entries of A are zero. Mixed graphs, and their incidence matrices, have been studied previously under the names bidirected graphs and signed graphs (see [14]). For a mixed graph G(V, E+, E−, A), we can construct an oriented signed graph which has the same incidence matrix. The signed graph Σ consists of an unsigned graph (V, {E+, E−, A}) and an arc labelling σ, which maps E+ and E− to −1 and maps A (except half arcs) to +1. Because there are in general two possibilities for each column of the incidence matrix of a signed graph, the orientation is chosen to make the incidence matrix the same as that of the mixed graph (see [14]). For a cycle C = e₁e₂⋯e_k not containing a half arc, if the product σ(e₁)σ(e₂)⋯σ(e_k) of the arc labels along the cycle is +1, then the cycle is balanced, and otherwise it is unbalanced. For C with some orientation, the incidence matrix is, up to permutations of rows and columns,
$$C = \begin{bmatrix} 1 & 0 & 0 & \cdots & -\sigma(e_k) \\ -\sigma(e_1) & 1 & 0 & \cdots & 0 \\ 0 & -\sigma(e_2) & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & -\sigma(e_{k-1}) & 1 \end{bmatrix}. \tag{1}$$
Clearly, $\det C = 1 - \sigma(e_1) \cdots \sigma(e_k)$.
For a balanced cycle, det C = 0, and the incidence matrix is not of full rank. For an unbalanced cycle, det C = ±2, and the incidence matrix has full rank. The following is easy to see via Cramer's rule.

Proposition 11. Suppose that C is the incidence matrix of an unbalanced cycle; then every element of $C^{-1}$ is $\pm\frac{1}{2}$.
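Both the determinant formula and Proposition 11 can be checked by enumerating sign patterns of the cycle matrix (1). The sketch below (plain Python; it covers only matrix (1), not loops or half arcs) does so for k = 3, 4, 5:

```python
from fractions import Fraction as F
from itertools import product

def gauss_inv(M):
    """Return (det M, M^{-1}) over the rationals; (0, None) if singular."""
    n = len(M)
    A = [[F(M[i][j]) for j in range(n)] + [F(int(i == j)) for j in range(n)]
         for i in range(n)]
    det = F(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            return F(0), None
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            det = -det
        det *= A[col][col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[col])]
    return det, [row[n:] for row in A]

def cycle_matrix(sigma):
    """Matrix (1): ones on the diagonal, -sigma(e_i) just below (wrapping)."""
    k = len(sigma)
    C = [[F(0)] * k for _ in range(k)]
    for i in range(k):
        C[i][i] = F(1)
        C[i][(i - 1) % k] = F(-sigma[(i - 1) % k])
    return C

for k in (3, 4, 5):
    for sigma in product((1, -1), repeat=k):
        det, inv = gauss_inv(cycle_matrix(sigma))
        prod_sigma = 1
        for s in sigma:
            prod_sigma *= s
        assert det == 1 - prod_sigma       # det C = 1 - sigma(e_1)...sigma(e_k)
        if prod_sigma == -1:               # unbalanced cycle: invertible
            assert all(v in (F(1, 2), F(-1, 2)) for row in inv for v in row)
        else:                              # balanced cycle: singular
            assert inv is None
```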
Lemma 12. If full row-rank A is the incidence matrix of a mixed graph, and B is a basis of A, then up to row/column rearrangement, B is block diagonal, with each block being the incidence matrix of a quasitree, consisting of a spanning tree plus a half arc or an arc forming an unbalanced cycle (including the case that the arc is a loop).

Lemma 12 follows from Theorem 5.1(g) of [14], but [14] has another case for a block of B, namely that it represents a spanning tree; in that case, however, det(B) = 0.

Theorem 13. Let A be the incidence matrix of a mixed graph G such that for every set S of vertex-disjoint unbalanced cycles in G, there is a partition $S = S_1 \cup S_2$ such that each unbalanced cycle in $S_1$ has a half arc in G incident to the cycle, and $S_2$ has a perfect matching in G pairing these unbalanced cycles. Suppose that P := {x : Ax = b, x ≥ 0} and $P \cap \mathbb{Z}^n \ne \emptyset$. Then for each vertex u of P, there exists $y \in P \cap \mathbb{Z}^n$ satisfying $\|y - u\|_\infty \le 1$.

Proof. If $u \in \mathbb{Z}^n$, then $y = u \in P \cap \mathbb{Z}^n$ and the theorem holds. So we assume that $u \notin \mathbb{Z}^n$ and that B is a basis of A satisfying $B u_B = b$, where $u = [u_B; u_N]$ and $u_N = 0$. By Lemma 12, with some rearrangement of the columns, we can assume that B is a block diagonal matrix, with blocks $B_1, \dots, B_s$, where $B_i$ represents a quasitree containing an unbalanced cycle, i.e.,
$$B_i = \begin{bmatrix} C_i & E_i \\ 0 & D_i \end{bmatrix} \in \{0, \pm 1\}^{n_i \times n_i},$$
where $C_i \in \{0, \pm 1\}^{p_i \times p_i}$ represents an unbalanced cycle, or $B_i$ represents a quasitree containing a half arc, i.e., $|\det B_i| = 1$. Note that we also use $B_i$ to denote the column indices of each block. We have
$$u_B = B^{-1} b = \begin{bmatrix} B_1^{-1} b_1 \\ B_2^{-1} b_2 \\ \vdots \\ B_s^{-1} b_s \end{bmatrix} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_s \end{bmatrix}.$$
If $u_i \notin \mathbb{Z}^{n_i}$, then $|\det B_i| = 2$, and there is an unbalanced cycle $C_i$ in this block. Similarly to the proof of Theorem 2 in [12], the lattice L generated by the columns of $B_i^{-1}$ can be divided into two classes: $\mathbb{Z}^{n_i}$ and $u_i + \mathbb{Z}^{n_i}$. For any $j \in [n_i]$, we have $r_j = B_i^{-1} e_j \notin \mathbb{Z}^{n_i}$; otherwise $\mathbf{1}^\top B_i r_j = \mathbf{1}^\top e_j$, where the left-hand side is even because $\mathbf{1}^\top B_i \equiv 0 \pmod 2$, while the right-hand side is odd, resulting in a contradiction. Therefore $r_j \in u_i + \mathbb{Z}^{n_i}$. Also, for $j \in [p_i]$, $r_j = B_i^{-1} e_j = [C_i^{-1}(:, j); 0]$, implying that the first $p_i$ entries of $r_j$ are $\frac{1}{2}$ or $-\frac{1}{2}$ by Proposition 11. Consider the set S of unbalanced cycles in the blocks where $u_i$ is not integer. By the assumption, there is a partition $S = S_1 \cup S_2$ such that each unbalanced cycle in $S_1$ has a half arc in G incident to the cycle, and $S_2$ has a perfect matching in G pairing these unbalanced cycles. For each unbalanced cycle in $S_1$, assume that the cycle is in block $B_i$, and that the half arc incident to it has a nonzero entry (±1) corresponding to the entries
with $e_j$ ($j \in [p_i]$) in block $B_i$. The corresponding column of the half arc in A is denoted $A_{\cdot t}$. Let $r_j = -B_i^{-1} e_j$. We have $u_i \pm r_j \in \mathbb{Z}^{n_i}$. Because $j \in [p_i]$, we have $\|r_j\|_\infty \le \frac{1}{2}$. Now choose r with $r_t = 1$, $r_B = -B^{-1} A_{\cdot t}$, and 0 otherwise for this cycle. We have $Ar = 0$, $\|r_B\|_\infty \le \frac{1}{2}$, and $u_B + r_B$ is integer in block $B_i$. For each pair of unbalanced cycles in $S_2$, assume that the cycles are in blocks $B_{i_1}$ and $B_{i_2}$, and that the edge pairing them has two nonzeros corresponding to the entries with $e_{j_1}$ ($j_1 \in [p_{i_1}]$) in block $B_{i_1}$ and $e_{j_2}$ ($j_2 \in [p_{i_2}]$) in block $B_{i_2}$. The column of the pairing edge in A is denoted $A_{\cdot t}$. Then let $r_{j_1} = -B_{i_1}^{-1} e_{j_1}$ and $r_{j_2} = -B_{i_2}^{-1} e_{j_2}$. We have $u_{i_1} \pm r_{j_1} \in \mathbb{Z}^{n_{i_1}}$ and $u_{i_2} \pm r_{j_2} \in \mathbb{Z}^{n_{i_2}}$. Also, because $j_1 \in [p_{i_1}]$ and $j_2 \in [p_{i_2}]$, we have $\|r_{j_1}\|_\infty \le \frac{1}{2}$ and $\|r_{j_2}\|_\infty \le \frac{1}{2}$. Now choose r with $r_t = 1$, $r_B = -B^{-1} A_{\cdot t}$, and 0 otherwise for this pair. We have $Ar = 0$, $\|r_B\|_\infty \le \frac{1}{2}$, and $u_B + r_B$ is integer in blocks $B_{i_1}$ and $B_{i_2}$. Proceeding in this way, we can construct $r^1, \dots, r^l$ for all unbalanced cycles in $S_1$ and all pairs in $S_2$, and for each block where $u_i$ is not integer, only one $r^i$ is nonzero in that block. Let $y = u + r^1 + \dots + r^l$; then $y \in \mathbb{Z}^n$, $0 \le (y - u)_N \le 1$, and $(y - u)_B \ge -\frac{1}{2}$. Because $u \ge 0$, we have $y_N \ge u_N \ge 0$ and $y_B \ge u_B - \frac{1}{2} \ge -\frac{1}{2}$, which implies $y_B \ge 0$ because of the integrality of y. Therefore $y \in P \cap \mathbb{Z}^n$ and y satisfies $\|y - u\|_\infty \le 1$. □
References

1. Aliev, I., Henk, M., Oertel, T.: Distances to lattice points in knapsack polyhedra. arXiv preprint arXiv:1805.04592 (2018)
2. Cook, W., Gerards, A.M., Schrijver, A., Tardos, É.: Sensitivity theorems in integer linear programming. Math. Program. 34(3), 251–264 (1986)
3. Eisenbrand, F., Weismantel, R.: Proximity results and faster algorithms for integer programming using the Steinitz lemma. In: SODA, pp. 808–816 (2018)
4. Granot, F., Skorin-Kapov, J.: Some proximity and sensitivity results in quadratic integer programming. Math. Program. 47(1–3), 259–268 (1990)
5. Hochbaum, D.S., Shanthikumar, J.G.: Convex separable optimization is not much harder than linear optimization. J. ACM 37(4), 843–862 (1990)
6. Lee, J.: Subspaces with well-scaled frames. Ph.D. dissertation, Cornell University (1986)
7. Lee, J.: Subspaces with well-scaled frames. Linear Algebra Appl. 114, 21–56 (1989)
8. Lee, J.: The incidence structure of subspaces with well-scaled frames. J. Comb. Theory Ser. B 50(2), 265–287 (1990)
9. Paat, J., Weismantel, R., Weltge, S.: Distances between optimal solutions of mixed-integer programs. Math. Program. https://doi.org/10.1007/s10107-018-1323-z (2018)
10. Rockafellar, R.T.: The elementary vectors of a subspace of R^n. In: Combinatorial Mathematics and Its Applications, pp. 104–127. University of North Carolina Press (1969)
11. Schrijver, A.: Theory of Linear and Integer Programming. Wiley (1998)
12. Veselov, S.I., Chirkov, A.J.: Integer program with bimodular matrix. Discret. Optim. 6(2), 220–222 (2009)
13. Werman, M., Magagnosc, D.: The relationship between integer and real solutions of constrained convex programming. Math. Program. 51(1), 133–135 (1991)
14. Zaslavsky, T.: Signed graphs. Discrete Appl. Math. 4(1), 47–74 (1982)
On Solving Nonconvex MINLP Problems with SHOT

Andreas Lundell¹ and Jan Kronqvist²

¹ Faculty of Science and Engineering, Mathematics and Statistics, Åbo Akademi University, Turku, Finland
[email protected]
² Department of Computing, Imperial College London, London, UK
[email protected]
Abstract. The Supporting Hyperplane Optimization Toolkit (SHOT) solver was originally developed for solving convex MINLP problems, for which it has proven to be very efficient. In this paper, we describe some techniques and strategies implemented in SHOT for improving its performance on nonconvex problems. These include utilizing an objective cut to force an update of the best known solution, and strategies for handling infeasibilities resulting from supporting hyperplanes and cutting planes generated from nonconvex constraint functions. For convex problems, SHOT guarantees finding the global optimum, but for general nonconvex problems it is only a heuristic. However, utilizing some automated transformations, it is in some cases possible to reformulate all nonconvexities into linear form, ensuring that the obtained solution is globally optimal. Finally, SHOT is compared to other MINLP solvers on a few nontrivial test problems to illustrate its performance.

Keywords: Nonconvex MINLP · Supporting Hyperplane Optimization Toolkit (SHOT) · Reformulation techniques · Feasibility relaxation
1
Introduction
Mixed-integer nonlinear programming (MINLP) constitutes a difficult class of mathematical optimization problems. As MINLP combines the combinatoric nature of mixed-integer linear programming (MILP) and nonlinearities of nonlinear programming (NLP), there is still today often a practical limit on the size of the problems (with respect to number of constraints and/or variables) that can be solved. While this limit is constantly pushed forward through the means of computational and algorithmic improvement, there are still MINLP problems with only a few variables that are difficult to solve. Most of these cases AL and JK acknowledge support from the Magnus Ehrnrooth Foundation and the Newton International Fellowship by the Royal Society (NIF\R1\82194) respectively. c Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 448–457, 2020. https://doi.org/10.1007/978-3-030-21803-4_45
are nonconvex problems, i.e., MINLP problems with either a nonconvex objective function or one or more nonconvex constraints, e.g., a nonlinear equality constraint. Globally solving convex MINLP problems can nowadays be regarded almost as a technology, as seen in a recent benchmark [10]. However, global nonconvex MINLP is still very challenging. Solvers for this problem class include Antigone [18], BARON [21], Couenne [1] and SCIP [5]. These solvers mostly rely on spatial branch and bound, where convex underestimators and concave overestimators are refined at the nodes of a branching tree. There are also reformulation techniques that can transform special cases of nonconvex problems, e.g., signomial [14] or general twice-differentiable [16] ones, into convex MINLP problems that can then be solved with convex solvers. The requirement to solve the MINLP problem to guaranteed optimality, or to have tight bounds on the best possible solution, may be a nice bonus, but for many real-world cases it is not always a necessity or even possible. Often, end users of optimization software are mostly interested in finding a good-enough feasible solution to the optimization problem at hand within a reasonable time. For these use cases, local MINLP solvers may be worth considering. Here, a local solver for MINLP is defined as a solver that, while it does solve convex problems to global optimality, cannot guarantee that a solution is found for a nonconvex problem, let alone a locally or globally optimal one. However, local solvers are often faster than global solvers, and in many cases they also manage to return the global solution, or a very good approximation of it. Local MINLP solvers include AlphaECP [13], Bonmin [2], DICOPT [6] and SBB [4]. SHOT is a new local solver initially intended mainly for convex MINLP problems [15]. In this paper, the following general type of MINLP problem is considered:
$$\begin{array}{lll}
\text{minimize} & f(x), & \\
\text{subject to} & Ax \le a, \; Bx = b, & \\
& g_k(x) \le 0 & \forall k \in K_I, \\
& h_k(x) = 0 & \forall k \in K_E, \\
& \underline{x}_i \le x_i \le \overline{x}_i & \forall i \in I = \{1, 2, \dots, n\}, \\
& x_i \in \mathbb{R}, \; x_j \in \mathbb{Z} & \forall i, j \in I, \; i \ne j.
\end{array} \tag{1}$$
Here, the nonlinear functions f , g and h are considered to be differentiable, but we set no restriction on the convexity of the functions. In this paper, as is often the case in MINLP, we refer to solutions that fulfill all the constraints in problem (1) as primal solutions, and the objective function value of the best known primal solution, the lowest value in case of a minimization problem, as the primal bound. A dual solution will then be a solution to a relaxed version of problem (1), e.g., where the integer or nonlinear constraints are ignored. The dual bound will then correspond to the best possible value the objective function can take and the goal in global optimization is to reduce the gap between the dual and primal bound to zero. Most deterministic global MINLP solvers will return both a primal and a dual bound, while a local solver normally only provides a primal bound.
2
The SHOT Solver
SHOT is an open source solver,1 combining a primal and a dual strategy, as described in detail in [15]. The dual strategy is based on the extended supporting hyperplane (ESH) [11] and extended cutting plane (ECP) [20] algorithms. These iteratively improve a polyhedral outer approximation, in the form of a MILP problem, of the nonlinear feasible set of the MINLP problem by adding linear cuts in the form of supporting hyperplanes or cutting planes. The dual strategy is tightly integrated with the underlying MILP solver (CPLEX, Gurobi or Cbc). The primal strategy in SHOT includes several (deterministic) heuristics, such as solving NLP subproblems with fixed integer values or utilizing alternative solutions in the MILP solver's solution pool, to find solutions fulfilling all constraints in the MINLP problem. Including other primal strategies, such as the center-cut algorithm [9], is also planned.

Example 1. In Fig. 1, it is illustrated how the original nonconvex strategy in SHOT works on a simple MINLP problem of two variables (2 ≤ x₁ ≤ 8 and x₂ ∈ {0, 1, 2}). The objective is to minimize f(x₁, x₂) = x₁ − x₂. There are two linear constraints l₁(x₁, x₂) ≤ 0 and l₂(x₁, x₂) ≤ 0. The problem also has a nonconvex feasible set resulting from the intersection of two nonlinear constraints. Figure 1 illustrates that SHOT is not well equipped for solving nonconvex problems, and to prepare SHOT for solving these, some additional strategies and techniques, explained in the following sections, are required. These include a heuristic for repairing infeasibilities that may appear when generating cuts for nonconvex constraints, and a strategy for forcing solution updates in case of convergence to a local optimum. In addition, some reformulations implemented in SHOT, which exactly linearize certain classes of functions, are detailed in Sect. 4. □
These reformulations have the advantage that if all nonconvex nonlinearities can be handled in this way, the solution returned by SHOT will actually be the guaranteed global solution.
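To make the root-search step of the ESH algorithm concrete, here is a minimal sketch; the constraint g (a disc), the interior point and the exterior point are made-up stand-ins, not SHOT code:

```python
# One ESH iteration on a single smooth constraint g(x) <= 0: bisect between
# an interior point (g < 0) and an exterior MILP solution (g > 0) to find a
# boundary point x*, then add the supporting-hyperplane cut
#   g(x*) + grad g(x*)'(x - x*) <= 0.

def g(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

def grad_g(x):
    return [2.0 * x[0], 2.0 * x[1]]

def root_search(x_int, x_ext, tol=1e-9):
    """Point on the segment [x_int, x_ext] where g changes sign."""
    point = lambda t: [a + t * (b - a) for a, b in zip(x_int, x_ext)]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(point(mid)) <= 0.0:
            lo = mid
        else:
            hi = mid
    return point(lo)

def supporting_hyperplane(xb):
    """Coefficients (a, rhs) of the cut a'x <= rhs generated at xb."""
    a = grad_g(xb)
    rhs = sum(ai * xi for ai, xi in zip(a, xb)) - g(xb)
    return a, rhs

x_int = [0.2, 0.1]   # strictly feasible for g
x_ext = [2.0, 2.0]   # infeasible point, e.g. a MILP solution
xb = root_search(x_int, x_ext)
a, rhs = supporting_hyperplane(xb)
# The cut separates the exterior point but keeps the interior point feasible.
assert sum(ai * xi for ai, xi in zip(a, x_ext)) > rhs
assert sum(ai * xi for ai, xi in zip(a, x_int)) <= rhs
```

For a convex g, any such cut supports the feasible region, which is exactly why the ESH cuts are safe in the convex case and can cut off feasible points only when g is nonconvex.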
3
SHOT as a Local Solver for Nonconvex MINLP Problems
As could be seen in Example 1, the main issue is that cuts generated exclude viable solution candidates. In general, a simple solution to this shortcoming is to generate fewer and less tight hyperplane cuts, e.g., by generating cutting planes (ECP) instead of supporting hyperplanes (ESH), or by reducing the emphasis given to generating cuts for nonconvex constraints. Also, since it is nontrivial, or in many cases not even possible, to obtain the integer-relaxed interior point required in the ESH algorithm, the ECP method might in general be a safer choice in the dual strategy in SHOT; this is however not considered in this paper. 1
SHOT is available at https://www.github.com/coin-or/shot.
Fig. 1. In the figures, the shaded areas indicate the integer-relaxed feasible region of the MINLP problem. In the figure to the left, the MILP problem in the first iteration, with the feasible set defined by the variable bounds and the two original linear constraints l₁ and l₂, has been solved to obtain the solution point (2, 2). In the middle figure, a root search is performed according to the ESH algorithm between this point and an interior point (6.0, 0.4) of the integer-relaxed feasible region of the MINLP problem to give a point on the boundary. A supporting hyperplane (c₁) is then added to the MILP problem. In the figure to the right, the integer-relaxed feasible region of the updated MILP problem is shown. Since all integer-feasible solutions have been cut off, the problem is infeasible. We cannot continue after this, and no primal solution is found.
MINLP problems containing nonlinear equality constraints also pose a problem; for example, they have no interior points. Currently, the strategy is to replace each constraint h(x) = 0 with (h(x))² ≤ 0, which is better suited for the dual strategy. Other reformulations are also utilized, such as partitioning terms in separable objective or constraint functions. This is explained in [12] for the convex case, but the principle is the same for nonconvex functions. Only the dual problem is reformulated; the original problem formulation is still used in the primal strategy. The original MINLP problem is also used for checking feasibility of solution candidates, since utilizing reformulated constraints could reduce accuracy. A simple technique to improve the performance of SHOT for nonconvex problems is to increase the effort put on the primal strategy. Currently, this can be accomplished by (i) solving more integer-fixed NLP problems and (ii) increasing the maximum number of solutions saved in the MILP solver's solution pool and changing the strategy of the MILP solver to put more emphasis on finding alternative solutions. In addition, after an integer combination has been tested with the fixed-NLP strategy, an integer cut is introduced, similarly to what is done in DICOPT. Note, however, that in SHOT this is currently not possible for problems with nonbinary integer variables.
3.1
Repairing Infeasibilities in the Dual Problems
For nonconvex problems, cuts added for nonconvex constraints normally tend to make the dual problem infeasible as more and more cuts are added. At this
stage, if a primal solution has been found, we can terminate with this solution. One alternative strategy would be to remove some of the added cuts until the problem is feasible again, but this can eliminate the effect of cuts added in previous iterations and result in cycling. SHOT uses a different approach, however, where the MILP problem is relaxed to restore feasibility. The same approach was successfully used in [8], where an implementation of the ECP algorithm in Matlab was connected to the process simulation tool Aspen to solve simulation-based MINLP problems. A similar technique was also used in [7] to determine the feasibility of problems. To find the relaxation needed to restore feasibility, the following MILP problem is solved:
$$\begin{array}{lll}
\text{minimize} & v^\top r & \\
\text{subject to} & Ax \le a, \; Bx = b, & \\
& C_k x + c_k \le r, \; r \ge 0, & \\
& \underline{x}_i \le x_i \le \overline{x}_i & \forall i \in I = \{1, 2, \dots, n\}, \\
& x_i \in \mathbb{R}, \; x_j \in \mathbb{Z} & \forall i, j \in I, \; i \ne j.
\end{array} \tag{2}$$
Here, the matrix C_k and vector c_k contain the cuts added up to the current iteration k. The vector r contains the relaxations for restoring feasibility, and the vector v is used to individually penalize the relaxations of the different cuts. The main strategy is to penalize the relaxation of the most recent cuts more strongly than the relaxation of the cuts from the early iterations. This favors relaxing the early cuts and reduces the risk of cycling, i.e., first adding a cut in one iteration and removing it in the next. The penalty terms in SHOT are currently determined as $v^\top = [1, 2, \dots, N]$, where N is the total number of cuts added. After the relaxation problem (2) is solved, the MILP model is modified as $C_k x + c_k \le \tau r$, where τ > 1 is a parameter to relax the model further. Both CPLEX and Gurobi have built-in functionality to find a feasibility relaxation of an infeasible problem, and this functionality is directly utilized in SHOT. As Cbc lacks this functionality, repairing infeasible cuts is currently not supported for this choice of subsolver. Whenever the MILP solver returns with a status of infeasible, the repair functionality of SHOT tries to modify the bounds in the constraints causing the infeasibility. We do not want to modify the linear constraints that originate from the original MINLP problem, nor the variable bounds. In the MILP solvers' repair functionality, it is possible to minimize either the sum of the numerical modifications to the constraint bounds or the number of modifications; here we have used the former. If it was possible to repair the problem, SHOT will continue to solve the problem normally; otherwise SHOT will terminate with the currently best known primal solution (if found). In Fig. 2, we apply the repair functionality to the problem in Example 1.
3.2
Forcing Primal Updates Using a Cutoff Constraint
As seen in Fig. 2, the dual strategy in SHOT can get stuck in suboptimal solutions. To try to force the dual strategy to search for a better solution, a primal
Fig. 2. The repair functionality is now applied to the infeasible MILP problem illustrated on the left so that the constraint c1 is relaxed and replaced with c2 . Thus, an integer feasible solution (7.8, 1) to the updated MILP problem can be obtained as illustrated in the middle and right figures. 2
Fig. 3. A primal cut p is now introduced to the MILP problem in the left figure, which makes the problem infeasible, as shown in the middle figure. The previously generated cut c₂ is therefore relaxed by utilizing the technique in Sect. 3.1 and replaced with c₃, allowing the updated MILP problem to have a solution (2.3, 1). This solution is better than the previous primal solution (7.8, 1)! After this, we can continue to generate more supporting hyperplanes to try to find an even better primal solution.
objective cut of the type f(x) ≤ γ is introduced (for a minimization problem). The right-hand side γ must force a solution to the dual MIP problem that is better than the current best known primal bound PB. Whenever SHOT has found a solution that it believes is the global solution, it modifies the objective cut so that its right-hand side is less than the current primal bound. The dual problem is then resolved with the MILP solver. The problem will then either be infeasible (in which case the repair functionality discussed in Sect. 3.1 will try to repair the infeasibility), or a new solution with a better objective value will be found. Note, however, that this solution need not be a new primal solution to the MINLP problem, since it is not required to fulfill the nonlinear constraints, only their linearizations through the hyperplane cuts included in the MILP problem. This procedure is then repeated a user-defined number of times. Also, an objective cut update is forced
if a new primal bound has not been found in a specified number of iterations in SHOT. This procedure, applied to Example 1, is exemplified in Fig. 3.
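A toy sketch of how the forced objective-cut updates of this section interact with the repair step of Sect. 3.1. Everything here is a stand-in: solve_milp plays the role of the cut-augmented dual MILP, repair the role of the feasibility relaxation (2), and the candidate values mimic the two primal solutions (7.8, 1) and (2.3, 1) of Example 1:

```python
def forced_update_loop(solve_milp, repair, primal_bound, max_rounds=5):
    """Tighten an objective cut f(x) <= gamma below the incumbent bound;
    on infeasibility, attempt a repair; stop when repair fails."""
    incumbent = None
    for _ in range(max_rounds):
        gamma = primal_bound - 1e-3 * max(1.0, abs(primal_bound))
        result = solve_milp(gamma)
        while result is None and repair():   # infeasible -> relax cuts
            result = solve_milp(gamma)
        if result is None:                   # repair failed: terminate
            break
        x, obj = result
        incumbent, primal_bound = x, min(primal_bound, obj)
    return incumbent, primal_bound

# Stub dual problem: (x1, f) candidate pairs with f = x1 - x2 at x2 = 1;
# "blocked" stands in for points cut off by a nonconvex hyperplane cut.
candidates = [(7.8, 6.8), (2.3, 1.3)]
blocked = {2.3}

def solve_milp(gamma):
    feasible = [(x, f) for x, f in candidates if f <= gamma and x not in blocked]
    return min(feasible, key=lambda t: t[1]) if feasible else None

def repair():
    if blocked:
        blocked.pop()                        # relax one cut
        return True
    return False

best_x, best_f = forced_update_loop(solve_milp, repair, primal_bound=6.8)
assert (best_x, best_f) == (2.3, 1.3)       # the better solution is recovered
```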
4
Automated Reformulations for Linearizing Special Terms
Certain nonlinear terms in nonconvex MINLP problems can always be written in an exact linear form by introducing new (binary) variables and linear constraints. Depending on whether these are the only nonlinearities or nonconvexities in the problem, the reformulated problem may either be a MILP problem or a convex MINLP problem, both of which can then be solved to global optimality using SHOT. The nonlinear terms that can currently be automatically reformulated in SHOT are: bilinear terms with one or two integer variables, and monomials with only binary variables. Note that there are other, possibly more efficient, reformulations available than those mentioned here; at this stage, however, this has not been fully investigated.

Reformulating bilinear terms with at least one binary variable. A product $x_i x_j$ of a binary variable $x_i$ and a continuous or discrete variable $x_j$, where $0 \le \underline{x}_j \le x_j \le \overline{x}_j$, can be exactly represented by replacing the term with the auxiliary variable w and introducing the following linear constraints:
$$\underline{x}_j x_i \le w \le \overline{x}_j x_i, \qquad w \le x_j + \overline{x}_j (1 - x_i) \quad \text{and} \quad w \ge x_j - \overline{x}_j (1 - x_i).$$
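The exactness of the reformulation for a binary x_i is easy to confirm by enumeration. In this sketch (plain Python, not SHOT code) the constraints xlo·x_i ≤ w ≤ xhi·x_i, w ≤ x_j + xhi·(1 − x_i) and w ≥ x_j − xhi·(1 − x_i) are checked; the bounds xlo = 1, xhi = 4 and the grid of x_j values are arbitrary test choices:

```python
from fractions import Fraction as F

xlo, xhi = F(1), F(4)          # arbitrary test bounds 0 <= xlo <= x_j <= xhi

def feasible_w(xi, xj):
    """Interval of w values satisfying the four linear constraints."""
    lo = max(xlo * xi, xj - xhi * (1 - xi))
    hi = min(xhi * xi, xj + xhi * (1 - xi))
    return lo, hi

for xi in (0, 1):
    for num in range(4, 17):   # x_j on a grid spanning [1, 4]
        xj = F(num, 4)
        lo, hi = feasible_w(xi, xj)
        assert lo == hi == xi * xj   # the only feasible w is the product
```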
Reformulating bilinear terms of two integers. A product $c \cdot x_i x_j$, where c is a real coefficient, of two integer variables with bounds $0 \le \underline{x}_i \le x_i \le \overline{x}_i$ and $0 \le \underline{x}_j \le x_j \le \overline{x}_j$ can be exactly represented by replacing the term with an auxiliary variable w. First, binary variables $b_k$ for each discrete value that $x_i$ can assume are introduced. These variables are constrained by
$$\sum_{k=\underline{x}_i}^{\overline{x}_i} b_k = 1 \quad \text{and} \quad x_i = \sum_{k=\underline{x}_i}^{\overline{x}_i} k \cdot b_k.$$
It is beneficial to introduce the set of binaries for the variable with the smaller domain. Also, depending on whether c is negative or positive, one of the following constraints is required:
$$\forall k = \underline{x}_i, \dots, \overline{x}_i: \quad
\begin{cases}
w - k \cdot x_j + \overline{x}_i \overline{x}_j \cdot b_k \le \overline{x}_i \overline{x}_j & \text{if } c > 0, \\
w - k \cdot x_j - \overline{x}_i \overline{x}_j \cdot b_k \ge -\overline{x}_i \overline{x}_j & \text{if } c < 0.
\end{cases}$$

Reformulating monomials of binary variables. A monomial term $x_1 \cdots x_N$, where all $x_i$ are binary, can be reformulated into linear form by replacing the term with the auxiliary variable w that assumes the value one if all $x_i$ are one and zero otherwise. The relationship between w and the $x_i$'s can be modeled as
$$N \cdot w \le \sum_{i} x_i \le w + N - 1.$$
[Table 1 data: per-instance dual and primal gaps (in %) and solution times (in s) for Antigone, BARON, SHOT (nonconvex), SHOT (convex), Bonmin and DICOPT on instances including autocorr bern25-13, blend721, carton7, cecil 13, edgecross14-078, gasnet, graphpart 3pm-0444-0444, multiplants mtg1a, nous1, oil, radar-3000-10-a-8 lat 7, sfacloc2 3 80, sonet18v6, sonetgr17, sporttournament20, squfl030-150persp, sssd12-05persp, tln4, tln5, tln6, tln7, wastepaper4 and waterno2 03.]
Table 1. The table shows the relative gaps (in %) as well as solution times (in seconds) for the global solvers Antigone and BARON and the local solvers SHOT, Bonmin and DICOPT. For the global solvers, both the dual (DG) and primal (PG) gaps are shown, and for the local solvers only the primal gap. Since SHOT is able to reformulate some of the instances into MILP form using the reformulations in Sect. 4, the dual gaps are shown in these cases. A zero in the gap columns means that the gap is below the termination criterion of 0.1%, and '∞' that the respective bound was not found. The time limit used was 1800 s. The gaps were calculated with PAVER [3].
5
Some Numerical Tests
To demonstrate the enhancements to SHOT detailed in the previous sections, we have applied its convex and new nonconvex strategy to some nontrivial instances selected from MINLPLib [17]. More precisely, these are the nonconvex MINLP problems in the benchmark [19]. Additional tln*-instances were also included. SHOT was also compared to the local solvers Bonmin (with its BB-strategy) and DICOPT, as well as the global solvers Antigone and BARON in GAMS 25.1.2. Due to its automatic reformulations, SHOT is for some of the instances a local and for some a global solver. The solvers used default settings except for DICOPT, where the maximum number of cycles was set to 1000 to prevent early termination. The comparison with the global solvers is not entirely fair, since they actually do more work than the local solvers when proving optimality of the solutions. However, we have mainly included them to indicate the difficulty of the test set. The results in Table 1 show that Antigone and BARON are good at finding the global primal solution, but that proving optimality is time-consuming. Bonmin fails to find primal solutions on many instances, and DICOPT seems to terminate too quickly on many problems, which could perhaps have been prevented by increasing the maximum number of cycles further. As expected, the nonconvex strategy in SHOT performs much better than the convex one. Also, the reformulations in Sect. 4 enabled SHOT to find the right primal solution for many instances and SHOT was even faster than BARON and Antigone in some of these.
6
Conclusions
In this paper, some functionality for improving the stability of SHOT for nonconvex MINLP was described. With these modifications, the performance on nonconvex problems is very good compared to the other local MINLP solvers considered. The steps illustrated in this paper are, however, only a starting point for further development, and the goal is to significantly broaden the classes of MINLP problems that can be solved to global optimality by SHOT. To this end, we intend to include convexification techniques based on lifting reformulations for signomial and general twice-differentiable functions based on [14]. For problems with a low to moderate number of nonlinearities, this might prove to be a viable alternative to spatial branch and bound solvers.
References

1. Belotti, P., Lee, J., Liberti, L., Margot, F., Wächter, A.: Branching and bounds tightening techniques for non-convex MINLP. Optim. Methods Softw. 24, 597–634 (2009)
2. Bonami, P., Lee, J.: BONMIN user's manual. Numer. Math. 4, 1–32 (2007)
3. Bussieck, M.R., Dirkse, S.P., Vigerske, S.: PAVER 2.0: an open source environment for automated performance analysis of benchmarking data. J. Glob. Optim. 59(2), 259–275 (2014)
On Solving Nonconvex MINLP Problems with SHOT
4. GAMS: Solver manuals (2018). https://www.gams.com/latest/docs/S_MAIN.html
5. Gleixner, A., Bastubbe, M., Eifler, L., Gally, T., Gamrath, G., Gottwald, R.L., Hendel, G., Hojny, C., Koch, T., Lübbecke, M.E., Maher, S.J., Miltenberger, M., Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D., Schlösser, F., Schubert, C., Serrano, F., Shinano, Y., Viernickel, J.M., Walter, M., Wegscheider, F., Witt, J.T., Witzig, J.: The SCIP Optimization Suite 6.0. Technical report, Optimization Online (July 2018)
6. Grossmann, I.E., Viswanathan, J., Vecchietti, A., Raman, R., Kalvelagen, E., et al.: GAMS/DICOPT: a discrete continuous optimization package. GAMS Corporation Inc. (2002)
7. Guieu, O., Chinneck, J.W.: Analyzing infeasible mixed-integer and integer linear programs. INFORMS J. Comput. 11(1), 63–77 (1999)
8. Javaloyes-Antón, J., Kronqvist, J., Caballero, J.A.: Simulation-based optimization of chemical processes using the extended cutting plane algorithm. In: Friedl, A., Klemeš, J.J., Radl, S., Varbanov, P.S., Wallek, T. (eds.) 28th European Symposium on Computer Aided Process Engineering, Computer Aided Chemical Engineering, vol. 43, pp. 463–469. Elsevier (2018)
9. Kronqvist, J., Bernal, D., Lundell, A., Westerlund, T.: A center-cut algorithm for quickly obtaining feasible solutions and solving convex MINLP problems. Comput. Chem. Eng. (2018)
10. Kronqvist, J., Bernal, D.E., Lundell, A., Grossmann, I.E.: A review and comparison of solvers for convex MINLP. Optim. Eng. 1–59 (2018)
11. Kronqvist, J., Lundell, A., Westerlund, T.: The extended supporting hyperplane algorithm for convex mixed-integer nonlinear programming. J. Glob. Optim. 64(2), 249–272 (2016)
12. Kronqvist, J., Lundell, A., Westerlund, T.: Reformulations for utilizing separability when solving convex MINLP problems. J. Glob. Optim. 1–22 (2018)
13. Lastusilta, T.: GAMS MINLP solver comparisons and some improvements to the AlphaECP algorithm. Ph.D. thesis, Åbo Akademi University (2011)
14. Lundell, A., Westerlund, T.: Solving global optimization problems using reformulations and signomial transformations. Comput. Chem. Eng. (2017). (available online)
15. Lundell, A., Kronqvist, J., Westerlund, T.: The supporting hyperplane optimization toolkit - a polyhedral outer approximation based convex MINLP solver utilizing a single branching tree approach. Preprint, Optimization Online (2018)
16. Lundell, A., Skjäl, A., Westerlund, T.: A reformulation framework for global optimization. J. Glob. Optim. 57(1), 115–141 (2013)
17. MINLPLib: Mixed-integer nonlinear programming library (2018). http://www.minlplib.org/. Accessed 27 May 2018
18. Misener, R., Floudas, C.A.: ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. 59(2–3), 503–526 (2014)
19. Mittelmann, H.: Benchmarks for optimization software (2018). http://plato.asu.edu/bench.html. Accessed 28 Jan 2019
20. Westerlund, T., Petterson, F.: An extended cutting plane method for solving convex MINLP problems. Comput. Chem. Eng. 19, 131–136 (1995)
21. Zhou, K., Kılınç, M.R., Chen, X., Sahinidis, N.V.: An efficient strategy for the activation of MIP relaxations in a multicore global MINLP solver. J. Glob. Optim. 70(3), 497–516 (2018)
Reversed Search Maximum Clique Algorithm Based on Recoloring
Deniss Kumlander and Aleksandr Porošin
TalTech, Ehitajate tee 5, 19086 Tallinn, Estonia
{deniss.kumlander,aleksandr.porosin}@ttu.ee
Abstract. This work concentrates on finding a maximum clique in undirected, unweighted graphs. The maximum clique problem is one of the best-known NP-complete problems, among the hardest problems in the class NP. Many other problems can be transformed into the clique problem, so solving it, or at least finding a faster algorithm for it, automatically helps to solve many other tasks. The main contribution of this work is a new exact algorithm for finding a maximum clique, which works faster than currently existing algorithms on a wide variety of graphs. The main idea is to combine a number of efficient improvements from different algorithms into a new one. At first sight these improvements cannot cooperate, but a new approach of skipping vertices from further expansion, instead of pruning the whole branch, allows all the upgrades to be used at once. Step-by-step examples with explanations demonstrate how to use the proposed algorithm.

Keywords: Graph theory · Maximum clique
1 Introduction
A graph G is a representation of objects, given as a set of vertices V, together with a number of relationships between these objects, called edges, i.e. a set of edges E. The order of G is the number of vertices in G, and the number of edges is called the size of G; that is, the order is |V| and the size is |E|. If two vertices u and v are connected to each other they are called adjacent, e = uv ∈ E(G), and u and v are both incident to e. If uv ∉ E(G), then u and v are nonadjacent. The number of adjacent vertices, or neighbors, of a vertex is called the vertex degree deg(v). A vertex is called even or odd if its degree is even or odd. The maximum vertex degree of a graph G is denoted Δ(G). Vertex support is the sum of the degrees of all neighbors of a given vertex. Graphs can be divided into directed and undirected. A directed graph D, i.e. a digraph, has non-symmetric arcs (a directed edge is called an arc), which means that vertex u can have a relation to vertex v while there is no relation from v to u. On the other hand, an undirected graph always has a symmetric relation between two vertices. Moreover, graphs are divided into weighted and unweighted. A weight is a number (generally a non-negative integer) assigned to each edge; it can represent an additional property such as the length of a route, cost, required power, etc., depending on the problem context. The edges of an unweighted graph, on the contrary, do not carry weights, or all their weights are equal to one. A loop is an edge that connects a vertex to itself.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 458–467, 2020. https://doi.org/10.1007/978-3-030-21803-4_46
A simple graph is an undirected
graph that does not contain any loops and has no more than one edge connecting any two vertices. It should be noted that in this paper we study only unweighted simple graphs. An undirected graph where all the vertices are adjacent to each other is called complete. Conversely, a graph with no edges, i.e. in which no two vertices are adjacent, is called edgeless. A clique is a complete subgraph of a graph G, and an independent set is an edgeless subgraph of G. The complement graph G′ of a simple graph G is a graph that has the same vertex set but whose edge set consists of exactly those edges that are not present in G: G′ = (V, K \ E), where K is the edge set consisting of all possible edges. A vertex cover of a graph G is a vertex set such that each edge of G is incident to at least one vertex from this set. Graph coloring is the process of assigning labels, i.e. colors, to vertices with the special property that no two adjacent vertices may share the same color. A color class is the set of vertices having the same color. It is clear from the coloring property that each color class is nothing more than an independent set. A graph is called k-colorable if it can be colored with k colors. The minimum number of colors required for coloring a graph G is called the chromatic number χ(G), and in this case the graph is called k-chromatic. The maximum clique problem is the problem of finding a maximum complete subgraph of a graph G. Solving the maximum clique problem, or improving algorithms for finding a maximum clique, will not only advance one specific, narrow problem but will help to find better algorithms for all the problems reducible to the maximum clique problem.
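The greedy coloring referred to throughout the following algorithms can be sketched as below. This is an illustrative sketch, not code from the paper; the adjacency-set representation and the function name are our own.

```python
def greedy_coloring(graph, order):
    """Assign each vertex the smallest color not used by its neighbors.

    graph: dict mapping vertex -> set of adjacent vertices.
    order: sequence in which vertices are considered.
    Returns a dict vertex -> color (colors start at 1); every
    color class {v : color[v] == c} is an independent set.
    """
    color = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# A 4-cycle a-b-c-d is 2-colorable: {a, c} and {b, d} are its color classes.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
coloring = greedy_coloring(cycle, ["a", "b", "c", "d"])
```

Note that the number of colors produced depends on the vertex order, which is exactly why the algorithms below choose between different orderings.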
2 Maximum Clique Algorithms Review
1. Carraghan and Pardalos (Carraghan and Pardalos 1990) – the classical branch-and-bound algorithm.
2. Östergård's algorithm (Östergård 2002) – based on backtracking.
3. VColor-BT-u (Kumlander 2005). The idea is to apply an initial vertex coloring, building on Östergård's algorithm (Östergård 2002). It operates not with single vertices but with independent sets, using the coloring to build an efficient bounding function.
4. MCQ – the MCQ algorithm was first introduced in 2003 by Tomita and Seki (Tomita and Seki 2003), and Tomita and Kameda later revised it with more computational experiments in 2007 (Tomita and Kameda 2007). This algorithm is based on the Carraghan and Pardalos idea (Carraghan and Pardalos 1990). Tomita and Seki noted that the number of vertices of a maximum clique ω(G) in a graph G = (V, E) is always less than or equal to the maximum degree Δ(G) plus 1 (ω(G) ≤ Δ(G) + 1). Using this property they reworked the existing pruning formula.
MCR – the article "An efficient branch-and-bound algorithm for finding a maximum clique with computational experiments", published by Tomita and Kameda in 2007 (Tomita and Kameda 2007), introduced the MCR algorithm, a successor of MCQ. Compared to the older version, MCR mainly focused on initial sorting and color numbering. Branch processing, i.e. the EXPAND function, was not changed, so we spotlight only the modified features and skip all the steps inherited from MCQ.
5. MCS – three years after MCR was released, a new improvement of the same algorithm appeared, called MCS (Tomita et al. 2010). This time the authors focused on enhancements to the approximate coloring.
6. MCS improved – the article "Improvements to MCS algorithm for the maximum clique problem" was released in 2013 by Mikhail Batsyn, Boris Goldengorin, Evgeny Maslov and Panos M. Pardalos (Batsyn et al. 2013). MCSI shows very good results on dense graphs by using a high-quality solution obtained with the ILS heuristic algorithm.
3 New Algorithm
In this part we introduce a new algorithm for solving the maximum clique problem. It is called VRecolor-BT-u, as it is a successor of the VColor-BT-u algorithm and implements recoloring on each depth. Multiple algorithms were described previously in this work; the idea of the new one is to gather and combine all the gained knowledge to speed up maximum clique finding even further.
3.1 Description
The main idea of the new algorithm is to combine reversed search by color classes (from VColor-BT-u) and in-depth coloring, i.e. recoloring (from MCQ and its successors). Before we start, some useful properties of the previous algorithms should be noted:
1. Reversed search by color classes means searching for a clique in a constantly growing subgraph, adding color classes one by one and holding a cache b[] for each color class, where the cache entry is the maximum clique found for the given color class. First we consider a subgraph S1 consisting only of the vertices of the first color class C1. After that a subgraph S2 is created with the two color classes C1 and C2. In general, Si = C1 ∪ C2 ∪ … ∪ Ci.
2. The pruning formula for reversed search by color classes, d − 1 + b[C(v_di)] ≤ |CBC|, can be used only if the vertices in each subgraph Si are ordered by the initial color classes (the color classes used to construct a new subgraph on each iteration).
3. If the vertices are ordered by their color numbers and are expanded starting from the largest color number, then all the vertices with a color number lower than the threshold th = |CBC| − (d − 1) can be ignored, as they will not be expanded because of the pruning formula d − 1 + Max{No[p] | p ∈ R} ≤ |CBC|.
4. The pruning formula d − 1 + Max{No[p] | p ∈ R} ≤ |CBC| can be used when coloring is reapplied on each depth and the vertices are reordered according to these colors.
From this it is seen that properties 2 and 4 conflict with each other, since the two pruning formulas require different vertex orderings. As a result, if both bounding rules are used, we will miss some cliques when a promising branch is pruned. To avoid such situations, the formula d − 1 + Max{No[p] | p ∈ R} ≤ |CBC| is used not to prune a branch but to skip the current vertex, as expanding it cannot give a better solution. This means that if the vertices are recolored on each depth but not reordered according to the new colors, we can skip a vertex without expanding it if and only if its color number is lower than the current threshold and it has no neighbor whose color number is larger than the threshold and who stands after the bound obtained from the first pruning formula d − 1 + b[C(v_di)] ≤ |CBC|. The example in Fig. 1 shows how the conflict between the two different colorings is resolved. Green lines show the adjacency of two vertices (not all adjacent vertices are marked with green lines, only the two that are of interest in this specific example). Let us assume that the current depth is 2 and we have the following prerequisites:
• d = 2 (depth is 2)
• |CBC| = 3 (current best clique is 3)
• th = 3 − (2 − 1) = 2 (threshold taken from the skipping formula; we need to expand vertices with a color number bigger than the threshold)
• b[1] = 1, b[2] = 2, b[3] = 3, b[4] = 3 (cache values from previous iterations)
• bnd = 2 (index of the rightmost vertex for which the pruning formula d − 1 + b[C(v_di)] ≤ |CBC| will prune the current branch when it is expanded)
• Ca – array storing the initial color classes, Cb – array storing the in-depth color classes
Fig. 1. Different coloring conflict in depth example
Let us analyze the example (Fig. 1). We start with the rightmost vertex h, with in-depth color number 1 (No[h] = 1). We skip this vertex because its color number is lower than the threshold (th = 2). As you can see, vertex h might be contained in a larger clique, as it is connected with vertex r (No[r] = 3), but we skip it anyway because vertex r will be expanded later. Now we proceed with the next vertex, t. The color number of t is 1 (No[t] = 1), the same as that of vertex h, but in this case it is not possible to skip vertex t, because it is adjacent to vertex k (No[k] = 3). Vertex k stands after the pruning bound (bnd = 2) and therefore will not be expanded at all. If we skipped vertex t right now, we might miss a larger clique, so vertex t must be expanded. The next vertex to analyze is vertex a; we skip it, as its in-depth color number is equal to the threshold (th = No[a] = 2) and there is no adjacent vertex standing after the bound. The last vertex expanded on the current depth is r (No[r] = 3), as its color number is larger than the threshold. It should be noted that skipped vertices are not thrown away from further consideration (when building the next depth); they are stored in a separate array and added to the next depth with their order preserved. There is another pruning formula, used right after recoloring is done. As we already know, the number of color classes obtained by coloring the subgraph Gd is an upper bound for the maximum clique in the current subgraph. This property allows us to use the pruning formula d − 1 + cn ≤ |CBC|, where cn is the number of colors obtained from recoloring.
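The skipping rule can be illustrated on the data of this worked example. Everything below is a hypothetical sketch: the function and the dictionaries merely encode the fragment of Fig. 1 described in the text (th = 2, vertex k standing after the bound, h adjacent to r, t adjacent to k).

```python
def can_skip(v, th, after_bound, neighbors, cb):
    """Skip-rule sketch: v may be skipped if its in-depth color number is
    at most the threshold and no neighbor standing after the pruning
    bound (and hence never expanded) has a color number above it."""
    if cb[v] > th:
        return False  # must be expanded: color number above the threshold
    return all(cb[u] <= th for u in neighbors[v] if u in after_bound)

# Data from the Fig. 1 discussion: in-depth color numbers Cb, the two
# adjacencies mentioned in the text, and vertex k beyond the bound.
cb = {"h": 1, "t": 1, "a": 2, "r": 3, "k": 3}
neighbors = {"h": {"r"}, "t": {"k"}, "a": set(), "r": {"h"}, "k": {"t"}}
after_bound = {"k"}
# h and a can be skipped; t (neighbor k after the bound) and r cannot.
```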
3.2 Coloring Choice Based on Density
There are two coloring algorithms used in VRecolor-BT-u. Both are greedy, but the first one uses swaps during coloring and the other one does not. Each time coloring is applied, we need to determine which algorithm to use. Moreover, there are two places where coloring is needed: the initial coloring, performed once at the beginning of the algorithm, and the in-depth coloring, applied each time a new depth is constructed. The choice of coloring algorithm is made according to graph density using special constants: a density of 0.35 for the initial coloring and 0.55 for the in-depth coloring. The constants 0.35 and 0.55 were found experimentally.
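The density-based choice can be sketched as follows. The function names are our own; only the two thresholds (0.35 and 0.55) and the rule "swaps below the threshold, no swaps above it" come from the text.

```python
def density(n_vertices, n_edges):
    """Fraction of the n*(n-1)/2 possible edges that are present."""
    return 2.0 * n_edges / (n_vertices * (n_vertices - 1))

def use_swaps(n_vertices, n_edges, initial):
    """True -> greedy coloring with swaps, False -> without swaps.

    initial=True selects the 0.35 threshold used for the one-time
    initial coloring; initial=False selects the 0.55 threshold used
    for the in-depth recoloring.
    """
    threshold = 0.35 if initial else 0.55
    return density(n_vertices, n_edges) < threshold
```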
3.3 Algorithm
CBC – current best clique, the largest clique found so far; d – depth; c – index of the currently processed color class; di – index of the currently processed vertex on depth d; b – array saving the maximum clique value for each color class; Ca – initial color classes array; Cb – color classes array recalculated on each depth; Gd – subgraph of graph G induced by the vertices on depth d; cn – number of color classes recalculated on each depth; CanBeSkipped(v_di, c) – function that returns true if a vertex can be skipped without expanding it.
1. Graph density calculation. If the graph density is lower than 35% go to step 2a, else go to step 2b.
2. Heuristic greedy vertex coloring. Two arrays are created to store the initial color classes, defined only once (Ca), and the color classes recalculated on each depth (Cb). During this step both arrays must be equal.
a. Before coloring, the vertices are unordered; they are colored with swaps.
b. Before coloring, the vertices are sorted in decreasing order of degree; they are colored without swaps.
3. Searching. For each color class, starting from the first (current color class index c):
3.1. Subgraph (branch) building. Build the first depth by selecting all the vertices from color classes whose number is equal to or smaller than the current c. Vertices from the first color class stand first; vertices at the end belong to color class c.
3.2. Process the subgraph.
3.2.1. Initialize depth: d = 1.
3.2.2. Initialize the current vertex. Set the index di of the current vertex to be expanded (initially the first expanded vertex is the rightmost one): di = n_d.
3.2.3. Bounding rule check: whether the current branch can possibly contain a larger clique than found so far. If Ca(v_di) < c and d − 1 + b[Ca(v_di)] ≤ |CBC|, then prune; go to step 3.2.7.
3.2.4. Vertex skipping check: whether the current vertex can possibly yield a larger clique than found so far. If d − 1 + Cb(v_di) ≤ |CBC| and CanBeSkipped(v_di, c), skip this vertex: decrease the index, di = di − 1, and go to step 3.2.3.
3.2.5. Expand the current vertex. Form the new depth by selecting all the vertices adjacent to the current vertex v_di (G_{d+1} = N(v_di)). Set the next vertex to expand on the current depth: di = di − 1.
3.2.6. New depth analysis. Check whether the new depth contains vertices.
i. If G_{d+1} = ∅, then check whether the current clique is the largest one; if so, it must be saved. Go to step 3.3.
ii. If G_{d+1} ≠ ∅, then check the graph density. If the density is lower than 55%, apply greedy coloring with swaps to G_{d+1}, else use greedy coloring without swaps. Save the number of color classes (cn) acquired by this coloring. If the number of color classes cannot possibly give a larger clique, i.e. d − 1 + cn ≤ |CBC|, then prune: decrease the index, di = di − 1, and go to step 3.2.3; else increase the depth, d = d + 1, and go to step 3.2.2.
3.2.7. Step back. Decrease the depth, d = d − 1, and delete the expanding vertex from the current depth. If d = 0 go to step 3.3, else go to step 3.2.3.
3.3. Complete the iteration. Save the current best clique value for this color: b[c] = |CBC|.
4. Return the maximum clique: return CBC.

CanBeSkipped function
th – threshold from which a branch will be pruned; CBC – current best clique, the largest clique found so far; d – depth; c – index of the currently processed color class; di – index of the currently processed vertex on depth d; bnd – bound from which vertices cannot be skipped; b – array saving the maximum clique value for each color class; Ca – initial color classes array; Cb – color classes array recalculated on each depth.
1. Define the threshold: th = |CBC| − d.
2. Find the skipping bound. For each vertex index dj from di − 1 down to 0: if Ca(v_dj) < c and b[Ca(v_dj)] ≤ th, then bnd = j.
3. Decide whether the vertex can be skipped. For each vertex adjacent to the currently expanded one, with index dj from bnd down to 0: if Cb(v_dj) > th, then return false. If Cb(v_dj) > th never occurred, return true.
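The pruning machinery above builds on the coloring-bound branch and bound of MCQ and its successors (Sect. 2). The following is a much-simplified generic sketch of that scheme only (a plain MCQ-style search, with none of the reversed search, caching or skipping of VRecolor-BT-u); all names are our own.

```python
def max_clique(graph):
    """Simplified coloring-bound branch and bound (MCQ-style sketch).

    graph: dict mapping vertex -> set of adjacent vertices.
    At each node the candidate set is greedily colored; a candidate
    whose color number cannot improve the incumbent prunes the branch.
    """
    best = []

    def color_sort(cands):
        # Greedy coloring of the candidate set; returns (vertex, color)
        # pairs ordered by increasing color number.
        classes = []
        for v in sorted(cands, key=lambda u: -len(graph[u] & cands)):
            for cls in classes:
                if not (graph[v] & cls):  # independent of this class
                    cls.add(v)
                    break
            else:
                classes.append({v})
        return [(v, i + 1) for i, cls in enumerate(classes) for v in cls]

    def expand(clique, cands):
        nonlocal best
        if len(clique) > len(best):
            best = clique
        for v, col in reversed(color_sort(cands)):
            if len(clique) + col <= len(best):
                return  # bound: cannot beat the incumbent
            expand(clique + [v], cands & graph[v])
            cands = cands - {v}

    expand([], set(graph))
    return best

# Triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 1:
g = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
clique = max_clique(g)
```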
4 Results
In this part we compare the new algorithm to all the previously described ones. The following algorithms take part in the testing: Carraghan and Pardalos, Östergård, VColor-u (Kumlander 2005), VColor-BT-u, MCQ, MCR, MCS, MCS Improved and VRecolor-BT-u. The first part of the tests is devoted to randomly generated graphs. These random tests give a general overview of the algorithms' performance and thereby show whether the new algorithm is worth using for clique finding. All test cases are divided by graph density, and for each density different algorithms are tested. Algorithms that perform much worse than the others are removed from the test results. The second part contains an analysis of the algorithms' results on DIMACS instances. Each DIMACS graph has a special structure corresponding to some specific real problem. Four algorithms were tested with this benchmark: MCS, MCSI, VColor-BT-u and VRecolor-BT-u.
4.1 Generated Test Results / Random Graphs
Figures 2, 3 and 4 demonstrate that VRecolor-BT-u consumes less time than the fastest of the other algorithms on sparse graphs, where the density is lower than 40%. On graphs with very low density (about 10%), the basic algorithms (Carraghan and Pardalos, Östergård) show really good results, as they do not perform any additional operations such as coloring, searching for an initial solution, reordering and so on.
Fig. 2. Randomly generated graphs test, density 30% (time in ms vs. number of vertices, 800–1200; Östergård, VColor-BT-u, MCQ, MCR, VRecolor-BT-u).
Fig. 3. Randomly generated graphs test, density 50% (time in ms vs. number of vertices, 400–500; MCQ, MCR, MCS, MCSI, VRecolor-BT-u).
Fig. 4. Randomly generated graphs test, density 90% (time in ms vs. number of vertices, 125–150; MCR, MCS, MCSI, VRecolor-BT-u).
The basic pruning formulas are really effective at such small densities. Even so, VRecolor-BT-u outperforms them, proving that the skipping technique has an overall positive impact despite the fact that the algorithm needs to spend time on coloring and on proving that a vertex can be skipped. On densities from 20% to 40% the closest results to VRecolor-BT-u are those of MCQ and MCR, but the new algorithm performs about 20–25% faster. Note that some algorithms are missing at certain densities: we filter out those whose performance is exceptionally bad. Based on the randomly generated graph results we can conclude the following:
• Graphs with densities lower than 50% are best solved using the VRecolor-BT-u algorithm.
• When the graph density is about 50%, three algorithms (MCQ, MCR and VRecolor-BT-u) are the fastest, with their time consumption fluctuating a bit relative to each other.
• If the density of the graph lies between 55% and 75%, then the VRecolor-BT-u algorithm is the best choice.
• For dense graphs with density more than 75%, MCS Improved is the fastest algorithm.
4.2 DIMACS Test Results
Here we test four algorithms on the DIMACS graph instances: MCS, MCSI, VColor-BT-u and VRecolor-BT-u. MCS and MCSI were chosen for testing because they have demonstrated the best results on DIMACS instances among modern algorithms. VColor-BT-u is the predecessor of VRecolor-BT-u and is the best candidate for comparison with the new algorithm. In general, the DIMACS instance tests confirm the results obtained on randomly generated graphs: the VRecolor-BT-u algorithm works better on densities lower than 75% (see Table 1).
Table 1. DIMACS graphs results. Time consumption (ms).

Graph            Size  Density     MCS    MCSI  VColor-BT-u  VRecolor-BT-u
c-fat500-1        500     0.04      44     229            6              2
c-fat500-10       500     0.37     190     175           18            136
c-fat500-2        500     0.07      27     112            4              1
c-fat500-5        500     0.19      62     100            6             11
gen200_p0.9_44    200     0.90    3867    2103       140082          21045
gen200_p0.9_55    200     0.90    8988      98         3650           2276
hamming10-2      1024     0.99     496   50636         1290          61271
hamming6-2         64     0.90       0       1            0              1
hamming6-4         64     0.35       0       1            0              0
hamming8-2        256     0.97      10      38           22            245
hamming8-4        256     0.64     633     654            3             14
johnson16-2-4     120     0.76     702     800          244            581
johnson8-2-4       28     0.56       0       0            1              0
johnson8-4-4       70     0.77       1       3            0              0
keller4           171     0.65      85      97          133             73
MANN_a27          378     0.99    7201  291385        68105          10231
MANN_a9            45     0.93       0       2            0              0
p_hat1000-1      1000     0.24    2592    2788         3540           2046
p_hat300-1        300     0.24      35      78           15             12
p_hat300-2        300     0.49     108     109          464            238
p_hat300-3        300     0.74   17796    9200       161323          16421
p_hat500-1        500     0.25     110     198          139             91
p_hat500-2        500     0.50    4816    2613        24391           8539
p_hat700-1        700     0.25     386     453          239            236
san1000          1000     0.50    7270    4989          410            945
san200_0.7_1      200     0.70      18      39       889338           1819
san200_0.7_2      200     0.70      16      45            3              5
san200_0.9_1      200     0.90    2166      38          250             51
san200_0.9_2      200     0.90     265      35         2828           1402
san400_0.5_1      400     0.50      57     341           42             29
5 Summary
The main goal of this study was to develop a new, improved algorithm for maximum clique finding on undirected, unweighted graphs. The new maximum clique algorithm, called VRecolor-BT-u, was demonstrated. This algorithm is constructed on the basis of reversed search by color classes. The main idea is to apply coloring on each depth, so as to preserve the most up-to-date color classes, and to combine the updated vertex colors with the reversed search approach. At first sight the idea of in-depth recoloring might seem unsound, as reversed search is built around the initial color classes, but the introduction of a new skipping technique instead of pruning allows this conflict to be avoided. Furthermore, two different greedy coloring algorithms (with swaps and without swaps) are used for the initial and the in-depth coloring; experimentally obtained constants, which depend on graph density, determine which coloring is applied. The new algorithm shows the best results on random graphs of low density and loses only on dense graphs to the MCS and MCSI algorithms, which are specially designed for high densities. VRecolor-BT-u produces fewer branches than its predecessor on all the DIMACS instances, but there are some cases where the new algorithm consumes more time. A decreasing branch number accompanied by degraded performance might seem contradictory at first glance, but it is explained by a simple fact: in some special cases the additional in-depth recoloring consumes a lot of time while the skipping technique is practically not working, resulting in a slightly lower branch number but increased time consumption. Finally, it was noted that each graph should be solved by a different algorithm depending on its density: at low to mid densities it is advised to use the VRecolor-BT-u algorithm, while the best option for dense graphs is the MCS Improved algorithm.
References
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (2003)
Cook, S.A.: The complexity of theorem proving procedures. In: Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, pp. 151–158 (1971)
Carraghan, R., Pardalos, P.M.: An exact algorithm for the maximum clique problem. Oper. Res. Lett. 9, 375–382 (1990)
Östergård, P.R.J.: A fast algorithm for the maximum clique problem. Discret. Appl. Math. 120, 197–207 (2002)
Kumlander, D.: Some Practical Algorithms to Solve the Maximum Clique Problem. Tallinn University of Technology, Tallinn (2005)
Clarkson, K.: A modification to the greedy algorithm for vertex cover. Inf. Process. Lett. 16(1), 23–25 (1983)
Andrade, D.V., Resende, M.G.C., Werneck, R.F.: Fast local search for the maximum independent set problem. J. Heuristics 18(4), 525–547 (2012)
Tomita, E., Seki, T.: An efficient branch-and-bound algorithm for finding a maximum clique. In: Proceedings of the 4th International Conference on Discrete Mathematics and Theoretical Computer Science, DMTCS'03, pp. 278–289. Springer-Verlag, Berlin, Heidelberg (2003)
Tomita, E., Kameda, T.: An efficient branch-and-bound algorithm for finding a maximum clique with computational experiments. J. Glob. Optim. 37(1), 95–111 (2007)
Tomita, E., Sutani, Y., Higashi, T., Takahashi, S., Wakatsuki, M.: A simple and faster branch-and-bound algorithm for finding a maximum clique. In: Proceedings of the 4th International Conference on Algorithms and Computation, WALCOM'10, pp. 191–203. Springer-Verlag, Berlin, Heidelberg (2010)
Batsyn, M., Goldengorin, B., Maslov, E., Pardalos, P.M.: Improvements to MCS Algorithm for the Maximum Clique Problem. Springer Science+Business Media, New York (2013)
Sifting Edges to Accelerate the Computation of Absolute 1-Center in Graphs
Wei Ding and Ke Qiu
Zhejiang University of Water Resources and Electric Power, Hangzhou 310018, Zhejiang, China
[email protected]
Department of Computer Science, Brock University, St. Catharines, ON, Canada
[email protected]
Abstract. Given an undirected connected graph G = (V, E, w), where V is the set of n vertices, E is the set of m edges and each edge e ∈ E has a positive weight w(e) > 0, a subset T ⊆ V of p terminals and a subset E′ ⊆ E of candidate edges, the absolute 1-center problem (A1CP) asks for a point on some edge in E′ minimizing the longest distance from it to T. We prove that a vertex 1-center (V1C) is just an absolute 1-center (A1C) if the all-pairs shortest paths distance matrix from the vertices covered by the edges in E′ to T has a (global) saddle point. Furthermore, we define the local saddle point of an edge and conclude that a candidate edge having a local saddle point can be sifted. By combining the tool of sifting edges with the framework of Kariv and Hakimi's algorithm, we design an O(m + pm∗ + np log p)-time algorithm for A1CP, where m∗ is the number of the remaining candidate edges. Applying our algorithm to the classic A1CP takes O(m + m∗n + n² log n) time when the distance matrix is known and O(mn + n² log n) time when the distance matrix is unknown, which are smaller than the O(mn + n² log n) time and O(mn + n³) time of Kariv and Hakimi's algorithm, respectively.
Keywords: Absolute 1-center · Sifting edges · Saddle point
· Sifting edges · Saddle point
Introduction Previous Results
Let G = (V, E, w) be an undirected connected graph, where V is the set of n vertices, E is the set of m edges, and w : E → R+ is a positive weight function on edges. The vertex 1-center problem (V1CP) aims to find a vertex of G, called a vertex 1-center (V1C), minimizing the longest distance from it to all the other vertices. The V1CP is tractable, and the whole computation is dominated by finding all-pairs shortest paths in G, which can be done by using Fredman and Tarjan's O(mn + n² log n)-time algorithm [3], or the O(m∗n + n² log n)-time
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 468–476, 2020. https://doi.org/10.1007/978-3-030-21803-4_47
algorithm by Karger et al. [7], where m∗ is the number of edges used by shortest paths, or Pettie's O(mn + n² log log n)-time algorithm [9]. The classic absolute 1-center problem (A1CP) asks for a point on some edge of G, called an absolute 1-center (A1C), minimizing the longest distance from it to all the vertices of G. The A1CP was proposed by Hakimi [4], who showed that an A1C of a vertex-unweighted graph G must be at either one of n(n − 1)/2 break points or one vertex of G. The A1CP admits polynomial-time exact algorithms. For example, Hakimi et al. [5] presented an O(mn log n)-time algorithm. In [8], Kariv and Hakimi first devised a linear-time subroutine to compute a local center on every edge and then found an A1C by comparing all the local centers. As a result, they developed an O(mn + n² log n)-time algorithm when the all-pairs shortest paths distance matrix is known and an O(mn + n³)-time algorithm when the distance matrix is unknown. The A1CP in vertex-unweighted trees admits a linear-time algorithm. Moreover, the A1CP in vertex-weighted graphs admits an O(mn² log n)-time algorithm by Hakimi et al. [5] and Kariv and Hakimi's O(mn log n)-time algorithm [8]. For the A1CP in vertex-weighted trees, Kariv and Hakimi designed an O(n log n)-time algorithm [8]. For more results on A1CP, we refer readers to [2,10] and the references listed therein. The A1CP has applications in the design of minimum diameter spanning subgraphs [1,6].
1.2 Our Results
In this paper, we consider a generalized version of the classic A1CP, formally defined in Sect. 2, in which we are asked to find an absolute 1-center among a given subset of candidate edges so as to minimize the longest distance from the center to a given subset of terminals. Unless otherwise specified, A1CP refers to this generalized version in the remainder of this paper. First, we prove that a V1C is just an A1C if the all-pairs shortest paths distance matrix from the vertices covered by candidate edges to the terminals has a (global) saddle point. Next, we introduce the definition of the local saddle point of an edge, and prove that the local center on an edge can be reduced to one of its endpoints if the edge has a local saddle point. In other words, the candidate edges that have a local saddle point can be sifted. Moreover, we combine the tool of sifting edges with the framework of Kariv and Hakimi's algorithm to design an O(m + pm^* + np log p)-time algorithm for A1CP, where p is the number of terminals and m^* is the number of remaining candidate edges. Applying our algorithm to the classic A1CP takes O(m + m^* n + n^2 log n) time when the distance matrix is known and O(mn + n^2 log n) time when the matrix is unknown, which improves, to some extent, on the O(mn + n^2 log n) and O(mn + n^3) running times of Kariv and Hakimi's algorithm, respectively.

Organization. The rest of this paper is organized as follows. In Sect. 2, we define the notation and A1CP formally. In Sect. 3, we show the fundamental properties which form the basis of our algorithm. In Sect. 4, we present our algorithm and apply it to the classic A1CP. In Sect. 5, we conclude this paper.
W. Ding and K. Qiu

2 Definitions and Notations
Let G = (V, E, w) be an undirected connected graph with n vertices and m edges. All the vertices are labelled 1, 2, ..., n, and all the edges in E are labelled 1, 2, ..., m. For every 1 ≤ i ≤ n, we let v_i denote the vertex with index i. For every 1 ≤ k ≤ m, we let e_k denote the edge with index k, denote its weight by w_k, and denote by e_k^1 and e_k^2 its two endpoints. Let τ(v) be the index of v, for any v ∈ V. For every (i, j) ∈ {1, 2, ..., n}^2, we denote by π^*(v_i, v_j) the v_i-to-v_j shortest path in G, and by d(v_i, v_j) the v_i-to-v_j shortest path distance (SPD), i.e., d(v_i, v_j) = ∑_{e∈π^*(v_i,v_j)} w(e). Let D = (d_{i,j})_{n×n} be the all-pairs shortest paths distance matrix of G. Note that d_{i,j} = d(v_i, v_j) for all i, j, so D is an n × n symmetric matrix. Let I and J be two subsets of {1, 2, ..., n}, and let D(I, J) be the |I| × |J| sub-matrix of D composed of the elements of D at the crossings of the rows with indices in I and the columns with indices in J. For any I, J ⊆ {1, 2, ..., n} with 2 ≤ |I|, |J| ≤ n, a (global) saddle point of D(I, J) is an index pair (i′, j′) ∈ I × J such that

    d_{i′,j} ≤ d_{i′,j′} ≤ d_{i,j′},   ∀i ∈ I \ {j′}, j ∈ J \ {i′}.   (1)
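The saddle-point condition of Eq. (1) can be checked directly. The following sketch (an illustrative helper of ours, not from the paper; ties in the row maximum are broken arbitrarily) scans for an entry that is maximal in its row and minimal in its column, with the self-index exclusions of Eq. (1):

```python
def find_saddle_point(D, I, J):
    """Return (i', j') with D[i'][j] <= D[i'][j'] <= D[i][j'] for all
    i in I \\ {j'} and j in J \\ {i'} (Eq. (1)), or None if no such
    pair exists.  Direct O(|I||J|) scan."""
    for i_s in I:
        cols = [j for j in J if j != i_s]
        if not cols:
            continue
        j_s = max(cols, key=lambda j: D[i_s][j])       # row maximum
        rows = [i for i in I if i != j_s]
        if all(D[i_s][j_s] <= D[i][j_s] for i in rows):  # column minimum
            return (i_s, j_s)
    return None

# Tiny example: rows = candidate vertices, columns = terminals.
D = [[0, 4, 7],
     [4, 0, 3],
     [7, 3, 0]]
print(find_saddle_point(D, I=[0, 1, 2], J=[0, 1, 2]))
```

On this matrix the pair (1, 0) qualifies: d_{1,0} = 4 is the largest entry of row 1 (excluding the diagonal) and the smallest admissible entry of column 0.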
Let T ⊆ {1, 2, ..., n} be the index set of the given terminals and S ⊆ {1, 2, ..., n} be the index set of the given candidate vertices. We also use T and S to denote the sets of terminals and candidate vertices, respectively. For every i ∈ S, the distance from v_i to T, φ(v_i, T), is the maximum of all the v_i-to-v_j SPDs, j ∈ T \ {i}, i.e.,

    φ(v_i, T) = max_{j∈T\{i}} d(v_i, v_j),   ∀i ∈ S.   (2)
The vertex 1-center problem (V1CP) aims to find a vertex v_{i^*}, called a vertex 1-center (V1C), from S that minimizes the value of φ(v_i, T). So,

    φ(v_{i^*}, T) = min_{i∈S} φ(v_i, T).   (3)
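Given the distance matrix D, Eqs. (2)–(3) translate into a few lines of Python (a hypothetical helper for illustration; the names are ours, and T is assumed to contain at least one index other than each candidate):

```python
def vertex_1_center(D, S, T):
    """Vertex 1-center per Eqs. (2)-(3): among the candidate vertices S,
    pick the one minimizing its largest distance to the terminals T."""
    def phi(i):                           # eccentricity of v_i w.r.t. T
        return max(D[i][j] for j in T if j != i)
    i_star = min(S, key=phi)              # ties go to the first candidate
    return i_star, phi(i_star)

# Unit-weight path v0-v1-v2-v3: either interior vertex is a V1C.
D = [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]]
print(vertex_1_center(D, S=[0, 1, 2, 3], T=[0, 1, 2, 3]))
```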
Let P_k be the set of continuum points on edge e_k, for any 1 ≤ k ≤ m, and let P(G) be the set of continuum points on all the edges of G. For any point p ∈ P(G), if p is a vertex then τ(p) denotes the index of vertex p; otherwise τ(p) is empty. Let E ⊆ {1, 2, ..., m} be the index set of the given candidate edges; we also let E denote the set of candidate edges. Let P_E be the set of continuum points on all the edges in E. Clearly,

    P_E = ∪_{k∈E} P_k.   (4)
For every point p ∈ P_E, the distance from p to T, φ(p, T), is the maximum of all the p-to-v_j SPDs, j ∈ T \ {τ(p)}, i.e.,

    φ(p, T) = max_{j∈T\{τ(p)}} d(p, v_j),   ∀p ∈ P_E.   (5)
The absolute 1-center problem (A1CP) asks for a point p^*, called an absolute 1-center (A1C), from P_E that minimizes the value of φ(p, T). So,

    φ(p^*, T) = min_{p∈P_E} φ(p, T).   (6)
Let S_E be the index set of the vertices covered by the edges in E, i.e.,

    S_E = {τ(e_k^1), τ(e_k^2) | k ∈ E}.   (7)
Also, S_E denotes the corresponding subset of vertices. Letting I = S_E and J = T, we extract D(S_E, T) from D. Let I represent an instance of A1CP as follows: given an undirected connected graph G = (V, E, w), a subset T ⊆ {1, 2, ..., n} of terminals, and a subset E ⊆ {1, 2, ..., m} of candidate edges. By taking S = S_E in I, we get an instance σ(I) of V1CP induced by I. Let opt(I) and opt(σ(I)) be optima to I and σ(I), respectively. Specifically, we let I_k represent the special instance of A1CP where E = {k}, 1 ≤ k ≤ m, and S_k denote the corresponding index set, i.e., S_k = {τ(e_k^1), τ(e_k^2)}. For every j ∈ T, the distance from P_E to v_j, ψ(v_j, P_E), is the minimum of all the p-to-v_j SPDs over p ∈ P_E \ {v_j}. Let v_{j^*} be the terminal maximizing the value of ψ(v_j, P_E). We have

    ψ(v_j, P_E) = min_{p∈P_E\{v_j}} d(p, v_j),   ∀j ∈ T,   (8)

and

    ψ(v_{j^*}, P_E) = max_{j∈T} ψ(v_j, P_E).   (9)
Similarly, ψ(v_j, P_k) = min_{p∈P_k\{v_j}} d(p, v_j), ∀k ∈ E. Let ψ(v_j, S_E) denote the distance from S_E to v_j, and let v_{j^{**}} be the terminal which maximizes the value of ψ(v_j, S_E). We have

    ψ(v_j, S_E) = min_{i∈S_E\{j}} d(v_i, v_j),   ∀j ∈ T,   (10)

and

    ψ(v_{j^{**}}, S_E) = max_{j∈T} ψ(v_j, S_E).   (11)

3 Fundamental Properties
In this section, we show several important properties which form the basis of our algorithm.

Lemma 1. φ(v_{i^*}, T) ≥ φ(p^*, T).

Proof. It follows directly from Eqs. (3) and (6) and S_E ⊂ P_E.
Lemma 2. For every k ∈ E and j ∈ T, it always holds that

    ψ(v_j, P_k) ≥ min{d(e_k^1, v_j), d(e_k^2, v_j) | e_k^1, e_k^2 ≠ v_j}.   (12)

Lemma 3. ψ(v_{j^{**}}, S_E) = ψ(v_{j^*}, P_E).

Lemma 4. φ(p^*, T) ≥ ψ(v_{j^*}, P_E).

Lemma 5. ψ(v_{j^{**}}, S_E) = φ(v_{i^*}, T) iff D(S_E, T) has a saddle point.

Theorem 1. If D(S_E, T) has a saddle point, then φ(p^*, T) = φ(v_{i^*}, T).

Proof. When D(S_E, T) has a saddle point, we conclude from Lemmas 3, 4 and 5 that

    φ(v_{i^*}, T) = ψ(v_{j^{**}}, S_E) = ψ(v_{j^*}, P_E) ≤ φ(p^*, T).

Together with Lemma 1, we get φ(p^*, T) = φ(v_{i^*}, T).

Immediately, we obtain the following corollary.

Corollary 1. Given an instance I of A1CP, an optimum to the induced instance σ(I) of V1CP is also an optimum to I provided that D(S_E, T) has a (global) saddle point.
4 A Faster Algorithm for A1CP

In this section, we first introduce a new definition, the local saddle point of an edge, and then employ the tool of sifting edges that have a local saddle point to speed up Kariv and Hakimi's algorithm [8] for the classic A1CP.

4.1 Sifting Edges
For every k ∈ E, the indices of the two endpoints of edge e_k are τ(e_k^1) and τ(e_k^2). We extract the τ(e_k^1)-th and τ(e_k^2)-th rows of D(S_E, T) to obtain a 2 × |T| sub-matrix D(S_k, T). A local saddle point of e_k is the endpoint of e_k with index i_k such that there exists an index pair (i_k, j_k) ∈ S_k × T satisfying

    d_{i_k,j} ≤ d_{i_k,j_k} ≤ d_{i,j_k},   ∀i ∈ S_k \ {j_k}, j ∈ T \ {i_k}.   (13)

Let E_0 denote the index set of the candidate edges having no local saddle point and E_1 denote the index set of those having a local saddle point. Clearly, E = E_0 ∪ E_1. By Eq. (13), we obtain the following lemma.

Lemma 6. For every k ∈ E, e_k has a local saddle point iff D(S_k, T) has a saddle point.

Theorem 2. Given an instance I of A1CP, if e_k, k ∈ E, has a local saddle point, then e_k can be replaced with its two endpoints, e_k^1 and e_k^2.
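By Lemma 6, the sifting decision for an edge reduces to testing whether the 2 × |T| matrix D(S_k, T) has a saddle point, which can be decided from each endpoint's farthest terminal. A minimal Python sketch (our own illustration; the self-index exclusions of Eq. (13) are simplified to excluding the endpoint itself):

```python
def has_local_saddle_point(D, e, T):
    """Edge e = (a, b) has a local saddle point iff D({a, b}, T) has a
    saddle point: one endpoint's distance to its farthest terminal is
    not larger than the other endpoint's distance to that terminal."""
    a, b = e
    ja = max((j for j in T if j != a), key=lambda j: D[a][j])  # farthest from v_a
    jb = max((j for j in T if j != b), key=lambda j: D[b][j])  # farthest from v_b
    return D[a][ja] <= D[b][ja] or D[b][jb] <= D[a][jb]

def sift(D, edges, T):
    """Return E_0: the candidate edges with no local saddle point."""
    return [e for e in edges if not has_local_saddle_point(D, e, T)]

# Unit-weight path v0-v1-v2-v3: only the middle edge survives the sifting.
D = [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]]
print(sift(D, [(0, 1), (1, 2), (2, 3)], [0, 1, 2, 3]))
```

On the path, moving into an end edge from its interior endpoint only worsens the farthest distance, so both end edges are sifted; the absolute center indeed lies on the middle edge.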
By Theorem 2, we only need to select e_k^1 and e_k^2 as candidate vertices of an A1C when e_k, k ∈ E, has a local saddle point. So, Corollary 2 follows.

Corollary 2. Given an instance I of A1CP and for every k ∈ E, an optimum to the induced instance σ(I_k) of V1CP is also an optimum to I_k provided that e_k has a local saddle point.

Furthermore, if every e_k, k ∈ E, has a local saddle point, then we only need to select the endpoints of all the candidate edges as the candidate vertices of an A1C. So, Corollary 3 follows.

Corollary 3. Given an instance I of A1CP, an optimum to the induced instance σ(I) of V1CP is also an optimum to I provided that every e_k, k ∈ E, has a local saddle point.

4.2 The Algorithm
It is well known that Kariv and Hakimi's algorithm is the most popular one for the classic A1CP. The fundamental framework of their algorithm is based on the result that there is surely a classic A1C in the union of all the vertices of the graph and the local centers of all the edges. By Corollaries 2 and 3, we claim for A1CP that the candidate edges having a local saddle point can be sifted, with only their endpoints remaining as candidate vertices of an A1C. In other words, the edges in E_1 can be omitted. In this subsection, we combine the framework of Kariv and Hakimi's algorithm with the tool of sifting candidate edges to design a fast algorithm, named AlgA1CP, for A1CP, which consists of three stages.

Algorithm (AlgA1CP) for A1CP:
Input: an instance I of A1CP and distance matrix D;
Output: an A1C, p^*.
01: Use DFS to traverse G to get S_E, compute φ(v_i, T)
02:   and record j_i^* ← arg max_{j∈T} d(v_i, v_j) for each i ∈ S_E;
03: i^* ← arg min_{i∈S_E} φ(v_i, T); // (the index of a V1C)
04: E_0 ← ∅; S_{E_0} ← ∅;
05: for every k ∈ E do
06:   Determine whether or not e_k has a local saddle point;
07:   if e_k has no local saddle point then
08:     E_0 ← E_0 ∪ {k}; S_{E_0} ← S_{E_0} ∪ {τ(e_k^1), τ(e_k^2)};
09:   endif
10: endfor
11: if E_0 = ∅ then
12:   Return v_{i^*}; // (i.e., p^* = v_{i^*})
13: else
14:   for every i ∈ S_{E_0} do
15:     Sort d(v_i, v_j), j ∈ T, in nonincreasing order;
16:   endfor
17:   for every k ∈ E_0 do
18:     Use Kariv and Hakimi's subroutine [8] to find
19:       a local center p_k^* on e_k, and record φ(p_k^*, T);
20:   endfor
21:   k^* ← arg min_{k∈E_0} φ(p_k^*, T);
22:   if φ(v_{i^*}, T) ≤ φ(p_{k^*}^*, T) then Return v_{i^*};
23:   else Return p_{k^*}^*; endif // (i.e., p^* = p_{k^*}^*)
24: endif

First of all, we recall and introduce some useful definitions and notations. For every 1 ≤ k ≤ m, a local center on e_k, p_k^*, is a point on e_k that minimizes the value of φ(p, T), i.e., φ(p_k^*, T) = min_{p∈P_k} φ(p, T). An optimal local center, p_{k^*}^*, is the one that minimizes the value of φ(p_k^*, T), i.e., φ(p_{k^*}^*, T) = min_{k∈E_0} φ(p_k^*, T). Note that k^* is the index of the candidate edge in E_0 that has an optimal local center. Moreover, for every i ∈ S_{E_0}, we let j_i^* be the index of the terminal that maximizes the value of d(v_i, v_j), i.e., d(v_i, v_{j_i^*}) = max_{j∈T} d(v_i, v_j).

In the first stage (lines 1–3 in AlgA1CP), it takes O(m) time to obtain S_E by using depth-first search (DFS) to traverse G. When the distance matrix D is known, it takes O(|T|) time to compute φ(v_i, T) (i.e., find the maximum element in the i-th row of D) and record the column index j_i^*, for every i ∈ S_E. Then, it takes O(|S_E|) time to find a V1C, i.e., the vertex index i^* such that φ(v_i, T) is minimized. So, the time cost of the first stage is O(m + |S_E||T|).

In the second stage (lines 4–10 in AlgA1CP), the major task is to record E_0 and S_{E_0}. So, we need to determine whether or not e_k has a local saddle point, for every k ∈ E. It is implied by Lemma 6 that this amounts to determining whether or not D(S_k, T) has a saddle point. In detail, we determine that D(S_k, T) has a saddle point if

    d(v_{τ(e_k^1)}, v_{j^*_{τ(e_k^1)}}) ≤ d(v_{τ(e_k^2)}, v_{j^*_{τ(e_k^1)}})  or  d(v_{τ(e_k^2)}, v_{j^*_{τ(e_k^2)}}) ≤ d(v_{τ(e_k^1)}, v_{j^*_{τ(e_k^2)}}),

and that it has no saddle point otherwise. Such a decision takes O(1) time. Accordingly, the update of E_0 and S_{E_0} also takes O(1) time. So, the time cost of the second stage is O(|E|).

In the third stage (lines 11–24 in AlgA1CP), we find a local center p_k^* on e_k for every k ∈ E_0 and then determine the optimal local center p_{k^*}^*. By comparing φ(v_{i^*}, T) and φ(p_{k^*}^*, T), we obtain an A1C, p^*. The main body of this stage is to compute p_{k^*}^*. First, it takes O(|S_{E_0}||T| log |T|) time to sort d(v_i, v_j), j ∈ T, for all i ∈ S_{E_0}. Next, it takes O(|T|) time to apply Kariv and Hakimi's subroutine to the candidate edge e_k to get p_k^*, for every k ∈ E_0. So, it takes O(|E_0||T|) time to compute all p_k^*, k ∈ E_0, and then determine p_{k^*}^*. So, the time cost of the third stage is O(|E_0||T| + |S_{E_0}||T| log |T|).

The above three stages form our algorithm AlgA1CP for A1CP. Therefore, the total time cost of AlgA1CP is

    O(m + (|S_E| + |E_0|)|T| + |S_{E_0}||T| log |T|).   (14)
Since |S_E| ≤ n and |S_{E_0}| ≤ min{2|E_0|, n} ≤ n, we conclude that the total time cost of AlgA1CP is at most

    O(m + |E_0||T| + n|T| log |T|).   (15)
Theorem 3. Given an instance I of A1CP with G having n vertices and m edges and |T| = p, AlgA1CP can find an A1C in at most O(m + pm^* + np log p) time, where m^* is the number of candidate edges having no local saddle point, when the distance matrix is known.

Now, we consider the special case of A1CP where all the candidate edges have a local saddle point, i.e., E_0 = ∅. The time cost of applying AlgA1CP to this special case is obtained by substituting |S_{E_0}| = |E_0| = 0 and |S_E| ≤ n into Eq. (14); see Corollary 4.

Corollary 4. Given an instance I of A1CP with G having n vertices and m edges and |T| = p, AlgA1CP can find an A1C in at most O(m + pn) time, provided that all the candidate edges have a local saddle point, when the distance matrix is known.

4.3 Application
The classic A1CP is the special case of A1CP studied in this paper where T = {1, 2, ..., n} and E = {1, 2, ..., m}. Therefore, when the distance matrix is known, we substitute p = n into Theorem 3 to obtain the time cost of applying AlgA1CP to the classic A1CP; see Theorem 4. When the distance matrix is unknown, we additionally use Pettie's O(mn + n^2 log log n)-time algorithm [9] to compute the matrix.

Theorem 4. Given an undirected connected graph G = (V, E, w) with n vertices and m edges, AlgA1CP can find a classic A1C in at most O(m + m^* n + n^2 log n) time when the distance matrix is known, and in at most O(mn + n^2 log n) time when the distance matrix is unknown, where m^* is the number of edges having no local saddle point.

Next, we consider the special case of the classic A1CP where all the edges have a local saddle point. We substitute p = n into Corollary 4 to obtain the time cost of applying AlgA1CP to this special case. Similarly, when the distance matrix is unknown, we use Pettie's algorithm to compute it.

Corollary 5. Given an undirected connected graph G = (V, E, w) with n vertices and m edges, AlgA1CP can find a classic A1C in at most O(n^2) time when the distance matrix is known, and in at most O(mn + n^2 log log n) time when it is unknown, provided that all the edges have a local saddle point.
5 Conclusions
This paper studies the (generalized) A1CP in an undirected connected graph. We establish an important property: if the distance matrix has a (global) saddle point, then all the candidate edges can be sifted and only their endpoints remain as candidate vertices of an A1C (i.e., a V1C is just an A1C); we further conclude that every candidate edge having a local saddle point can be sifted. Based on this property, we combine the tool of sifting edges with Kariv and Hakimi's subroutine for finding a local center on an edge to design a faster exact algorithm for the classic A1CP, which reduces the O(mn + n^2 log n) time of Kariv and Hakimi's algorithm to O(m + m^* n + n^2 log n) time, where m^* is the number of edges that have no local saddle point, when the distance matrix is known, and reduces its O(mn + n^3) time to O(mn + n^2 log n) time when the matrix is unknown. In this paper, we determine separately whether or not every candidate edge can be sifted (i.e., has a local saddle point). This is a straightforward way. It remains a future research topic to find a faster method of sifting candidate edges to a larger extent.
References

1. Ding, W., Qiu, K.: Algorithms for the minimum diameter terminal Steiner tree problem. J. Comb. Optim. 28(4), 837–853 (2014)
2. Eiselt, H.A., Marianov, V.: Foundations of Location Analysis. Springer, Heidelberg (2011)
3. Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34(3), 596–615 (1987)
4. Hakimi, S.L.: Optimum locations of switching centers and the absolute centers and medians of a graph. Oper. Res. 12(3), 450–459 (1964)
5. Hakimi, S.L., Schmeichel, E.F., Pierce, J.G.: On p-centers in networks. Transport. Sci. 12(1), 1–15 (1978)
6. Hassin, R., Tamir, A.: On the minimum diameter spanning tree problem. Inf. Process. Lett. 53(2), 109–111 (1995)
7. Karger, D.R., Koller, D., Phillips, S.J.: Finding the hidden path: time bounds for all-pairs shortest paths. SIAM J. Comput. 22(6), 1199–1217 (1993)
8. Kariv, O., Hakimi, S.L.: An algorithmic approach to network location problems. I: the p-centers. SIAM J. Appl. Math. 37(3), 513–538 (1979)
9. Pettie, S.: A new approach to all-pairs shortest paths on real-weighted graphs. Theor. Comput. Sci. 312(1), 47–74 (2004)
10. Tansel, B.C., Francis, R.L., Lowe, T.J.: Location on networks: a survey. Part I: the p-center and p-median problems. Manag. Sci. 29(4), 482–497 (1983)
Solving an MINLP with Chance Constraint Using a Zhang's Copula Family

Adriano Delfino
UTFPR - Universidade Tecnológica Federal do Paraná, Pato Branco, Brazil
[email protected]
Abstract. In this work we describe an approach to solve chance-constrained programs with mixed-integer variables. We replace a hard chance-constraint function by a copula. We prove that Zhang's copula family satisfies the properties required by the outer-approximation algorithm, and we use this algorithm to solve the problem with promising results.

Keywords: Mixed-integer programming · Chance-constrained programming · Copula
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 477–487, 2020. https://doi.org/10.1007/978-3-030-21803-4_48

1 Introduction

In recent years, the stochastic programming community has witnessed great development in optimization methods for dealing with stochastic programs with mixed-integer variables [2]. However, there are only a few works on chance-constrained programming with mixed-integer variables [1,3,12,13]. In this work, the problem of interest consists of nonsmooth convex mixed-integer nonlinear programs with chance constraints (CCMINLP). This class of problems can, for instance, be solved by employing the outer-approximation (OA) technique. In general, OA algorithms require solving fewer MILP subproblems than extended cutting-plane algorithms [14], so the former class of methods is preferable to the latter. This justifies why we have chosen the former class of methods to deal with problems of the type

    min_{(x,y)∈X×Y} f_0(x, y)
    s.t. f_i(x, y) ≤ 0, i = 1, ..., m_f − 1,
         P[g(x, y) ≥ ξ] ≥ p,   (1)

where the functions f_i : R^{n_x} × R^{n_y} → R, i = 0, ..., m_f − 1, are convex but possibly nonsmooth, g : R^{n_x} × R^{n_y} → R^m is a concave function, X ⊂ R^{n_x} is a polyhedron, Y ⊂ Z^{n_y} contains only integer variables, and both X and Y are compact sets. Furthermore, ξ ∈ R^m is the random vector, P is the probability measure associated with the random vector ξ, and p ∈ (0, 1) is a given parameter. We assume that P is a 0-concave distribution (thus P is α-concave for all α ≤ 0). Some examples of distribution functions that satisfy the 0-concavity
property are the well-known multidimensional Normal, Log-normal, Gamma and Dirichlet distributions [9]. Under these assumptions, the following function is convex:

    f_{m_f}(x, y) = log(p) − log(P[g(x, y) ≥ ξ]).   (2)

As a result, (1) is a convex (but possibly nonsmooth) MINLP problem fitting the usual notation:

    f_min := min_{(x,y)∈X×Y} f_0(x, y)  s.t.  f_i(x, y) ≤ 0, i ∈ I_c := {1, ..., m_f}.   (3)
Due to the probability function P[g(x, y) ≥ ξ], evaluating the constraint (2) and computing its subgradient is a difficult task: for instance, if P follows a multivariate normal distribution, computing a subgradient of P[g(x, y) ≥ ξ] requires numerically solving m integrals of dimension m − 1. If the dimension m of ξ is too large, then creating a cut for the function log(p) − log(P[g(x, y) ≥ ξ]) is computationally challenging. In this situation, it makes sense to replace the probability measure by a simpler function. Accordingly, this work proposes to approximate the hard chance constraint P[g(x, y) ≥ ξ] ≥ p by a copula C:

    C(F_{ξ_1}(g_1(x, y)), F_{ξ_2}(g_2(x, y)), ..., F_{ξ_m}(g_m(x, y))) ≥ p.   (4)

In addition to the difficulties present in MINLP models, we recall that the constraint function (4) can be nondifferentiable. Our main contribution in this paper is to prove that Zhang's copula family is log-concave; with this result we obtain an approximation of Problem (3) that can be solved in reasonable time. This work is organized as follows: in Sect. 2 we briefly review some basics about copulae and prove that Zhang's family satisfies the conditions needed to use the outer-approximation algorithm developed in [4]. In Sect. 3, a review of outer-approximation is presented. In Sect. 4, we describe a test problem coming from power energy management; in Sect. 5 we present some preliminary numerical results for this problem; and finally, in Sect. 6 we give a short conclusion.
2 Copulae: A Bird's Eye View

When dealing with chance-constrained programs it is very often impossible to get an explicit formula for the probability measure P, because the joint distribution of the variable ξ is unknown. However, it is easier to estimate the marginal distributions F_{ξ_1}, ..., F_{ξ_m}. In what follows, the random variable ξ ∈ R^m is supposed to have known marginal distributions F_{ξ_1}, ..., F_{ξ_m}. In order to model the dependence among these marginals, a copula function will be employed. The concept of copula was introduced by Sklar [11] in 1959, when he was studying the relationship between a multidimensional probability function and its lower-dimensional marginals.
Definition 1. An m-dimensional copula is a function C : [0,1]^m → [0,1] that satisfies the following properties: (i) C(1, ..., 1, u, 1, ..., 1) = u for all u ∈ [0,1]; (ii) C(u_1, ..., u_{i−1}, 0, u_{i+1}, ..., u_m) = 0; and (iii) C is quasi-monotone on [0,1]^m.

In other words, the above definition means that C is an m-dimensional distribution function with all univariate marginals being uniform on the interval [0,1]. Item (iii) means that the C-volume of any box in [0,1]^m is nonnegative (see [8] for more details). Given a random vector ξ with known marginals F_{ξ_i}, i = 1, ..., m, an important result proved by Sklar [11] is a theorem that assures the existence of a copula that reproduces the cumulative distribution F.

Theorem 1. Let F_ξ be an m-dimensional distribution function with marginals F_{ξ_1}, F_{ξ_2}, ..., F_{ξ_m}. Then there exists an m-dimensional copula C such that for all z ∈ R^m,

    F_ξ(z_1, z_2, ..., z_m) = C(F_{ξ_1}(z_1), F_{ξ_2}(z_2), ..., F_{ξ_m}(z_m)).   (5)

If F_{ξ_i}, i = 1, ..., m, are continuous, then C is unique. Otherwise, C is uniquely determined on the image of F_ξ. Conversely, if C is a copula and F_{ξ_1}, ..., F_{ξ_m} are distribution functions, then the function F_ξ defined by (5) is an m-dimensional distribution function with marginals F_{ξ_1}, ..., F_{ξ_m}.

In the above theorem, the functions F_{ξ_i}, i = 1, ..., m, can be different. This is a property of particular interest in energy problems, where power can be generated by several renewable (and uncertain) sources of different nature: the probability distribution governing the amount of water arriving into the reservoir of a hydro plant can be very different from the probability distribution of the wind speed near a wind farm. Observe that this theorem is not constructive; it just ensures the existence of a copula associated with the distribution F_ξ(z). In most cases, a copula providing the equality C(F_{ξ_1}(z_1), ..., F_{ξ_m}(z_m)) = F_ξ(z) must be estimated. The problem of choosing/estimating a suitable copula has been receiving much attention from the statistical community in the last few years [8]. As shown in the book [8], there are many copulae in the literature. By applying "log" to inequality (4), the following function is obtained:

    f_{m_f}(x, y) = log(p) − log C(F_{ξ_1}(g_1(x, y)), F_{ξ_2}(g_2(x, y)), ..., F_{ξ_m}(g_m(x, y))),   (6)

where F_{ξ_i} is the marginal probability distribution of F_ξ(z) = P[z ≥ ξ], which is assumed to be known. The function given by (6) is well defined by Sklar's theorem (Theorem 1). If C is 0-concave, then (3) can be approximated by the convex (but possibly nonsmooth) MINLP

    min_{(x,y)∈X×Y} f_0(x, y)
    s.t. f_i(x, y) ≤ 0, i = 1, ..., m_f − 1,
         log(p) − log C(F_{ξ_1}(g_1(x, y)), F_{ξ_2}(g_2(x, y)), ..., F_{ξ_m}(g_m(x, y))) ≤ 0.   (7)
There is a family of copulae with this property, introduced by Zhang [15]. The family is given by

    C(u_1, ..., u_m) = ∏_{j=1}^{r} min_{1≤i≤m} (u_i^{a_{j,i}}),   (8)

where a_{j,i} ≥ 0 and ∑_{j=1}^{r} a_{j,i} = 1 for all i = 1, ..., m. Different choices of the parameters a_{j,i} give different copulae, all of them nonsmooth functions, but with subgradients easily computed via the chain rule. We prove that this family of copulae is log-concave.

Theorem 2. Let ξ ∈ R^m be a random vector with all marginals F_{ξ_i}, i = 1, ..., m, being 0-concave functions. Suppose that g : R^{n_x} × R^{n_y} → R^m is a concave function. Consider a Zhang's copula C given in (8) for any choice of parameters a_{j,i} satisfying a_{j,i} ≥ 0 and ∑_{j=1}^{r} a_{j,i} = 1 for all i = 1, ..., m. Then C(F_{ξ_1}(g_1(x, y)), F_{ξ_2}(g_2(x, y)), ..., F_{ξ_m}(g_m(x, y))) is α-concave for α ≤ 0.

Proof. Given a pair (x, y) ∈ R^{n_x} × R^{n_y}, we set z = (x, y) to simplify the notation. Let z_1 = (x_1, y_1), z_2 = (x_2, y_2) and z = λz_1 + (1 − λ)z_2 with λ ∈ [0, 1]. As the function g is concave, for all i = 1, ..., m,

    g_i(λz_1 + (1 − λ)z_2) ≥ λg_i(z_1) + (1 − λ)g_i(z_2).   (9)

As F_{ξ_i}, i = 1, ..., m, are increasing functions, applying F_{ξ_i} to inequality (9) gives

    F_{ξ_i}(g_i(λz_1 + (1 − λ)z_2)) ≥ F_{ξ_i}(λg_i(z_1) + (1 − λ)g_i(z_2)).   (10)

Applying log to the above inequality,

    log(F_{ξ_i}(g_i(λz_1 + (1 − λ)z_2))) ≥ log(F_{ξ_i}(λg_i(z_1) + (1 − λ)g_i(z_2))).   (11)

The functions F_{ξ_i} are 0-concave by hypothesis. Then

    log(F_{ξ_i}(λg_i(z_1) + (1 − λ)g_i(z_2))) ≥ λ log(F_{ξ_i}(g_i(z_1))) + (1 − λ) log(F_{ξ_i}(g_i(z_2))).   (12)

By gathering inequalities (11) and (12), we have

    log(F_{ξ_i}(g_i(z))) ≥ log(F_{ξ_i}(λg_i(z_1) + (1 − λ)g_i(z_2))) ≥ λ log(F_{ξ_i}(g_i(z_1))) + (1 − λ) log(F_{ξ_i}(g_i(z_2))).   (13)

The Zhang's copula evaluated at the point z = λz_1 + (1 − λ)z_2 is

    C(F_{ξ_1}(g_1(z)), ..., F_{ξ_m}(g_m(z))) = ∏_{j=1}^{r} min_{1≤i≤m} [F_{ξ_i}(g_i(λz_1 + (1 − λ)z_2))]^{a_{j,i}},

where a_{j,i} ≥ 0. To simplify the notation, we write F_{ξ_1}(g_1(z)), ..., F_{ξ_m}(g_m(z)) as F_ξ(g(z)). So,

    log C(F_ξ(g(z))) = ∑_{j=1}^{r} log (min_{1≤i≤m} [F_{ξ_i}(g_i(z))]^{a_{j,i}}).
As the log function is increasing, log min u = min log u, and therefore

    log C(F_ξ(g(z))) = ∑_{j=1}^{r} min_{1≤i≤m} [log (F_{ξ_i}(g_i(z)))^{a_{j,i}}].

As a_{j,i} ≥ 0, the equality becomes

    log C(F_ξ(g(z))) = ∑_{j=1}^{r} min_{1≤i≤m} [a_{j,i} log (F_{ξ_i}(g_i(z)))].

By using (13) in the above equality, we get

    log C(F_ξ(g(z))) ≥ ∑_{j=1}^{r} min_{1≤i≤m} a_{j,i} [λ log(F_{ξ_i}(g_i(z_1))) + (1 − λ) log(F_{ξ_i}(g_i(z_2)))].

The right-most side of the above inequality is greater than or equal to

    ∑_{j=1}^{r} min_{1≤i≤m} a_{j,i} [λ log(F_{ξ_i}(g_i(z_1)))] + ∑_{j=1}^{r} min_{1≤i≤m} a_{j,i} [(1 − λ) log(F_{ξ_i}(g_i(z_2)))]
    = λ ∑_{j=1}^{r} min_{1≤i≤m} [log(F_{ξ_i}(g_i(z_1)))^{a_{j,i}}] + (1 − λ) ∑_{j=1}^{r} min_{1≤i≤m} [log(F_{ξ_i}(g_i(z_2)))^{a_{j,i}}]
    = λ [log ∏_{j=1}^{r} min_{1≤i≤m} (F_{ξ_i}(g_i(z_1)))^{a_{j,i}}] + (1 − λ) [log ∏_{j=1}^{r} min_{1≤i≤m} (F_{ξ_i}(g_i(z_2)))^{a_{j,i}}]
    = λ log C(F_ξ(g(z_1))) + (1 − λ) log C(F_ξ(g(z_2))).

It has thus been demonstrated that

    log C(F_ξ(g(λz_1 + (1 − λ)z_2))) ≥ λ log C(F_ξ(g(z_1))) + (1 − λ) log C(F_ξ(g(z_2))),

i.e., log C(F_{ξ_1}(g_1(z)), ..., F_{ξ_m}(g_m(z))) is a concave function. In other words, the copula C is α-concave for α ≤ 0.
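Equation (8) and the log-concavity argument are easy to spot-check numerically. The snippet below (our own illustration; the parameters a_{j,i} are arbitrary valid choices) implements Zhang's copula as the product of row-wise minima and verifies both the uniform-marginal property of Definition 1(i) and midpoint log-concavity of log C(u) = ∑_j min_i a_{j,i} log u_i on a random segment:

```python
import math
import random

def zhang_copula(u, a):
    """Zhang's copula, Eq. (8): C(u) = prod_j min_i u_i ** a[j][i],
    where a[j][i] >= 0 and every column of a sums to 1."""
    c = 1.0
    for row in a:
        c *= min(ui ** aji for ui, aji in zip(u, row))
    return c

a = [[0.3, 0.8],          # arbitrary valid parameters:
     [0.7, 0.2]]          # each column sums to 1

# Definition 1(i): C(u, 1) = u (up to floating-point error).
print(zhang_copula([0.37, 1.0], a))

# Midpoint log-concavity along a random segment in (0, 1]^2.
random.seed(1)
u1 = [random.uniform(0.1, 1.0), random.uniform(0.1, 1.0)]
u2 = [random.uniform(0.1, 1.0), random.uniform(0.1, 1.0)]
mid = [(x + y) / 2 for x, y in zip(u1, u2)]
lhs = math.log(zhang_copula(mid, a))
rhs = 0.5 * math.log(zhang_copula(u1, a)) + 0.5 * math.log(zhang_copula(u2, a))
print(lhs >= rhs)
```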
3 Outer Approximation

An important method for solving (3) is the outer-approximation algorithm given in [5] and further extended in [7]. The method solves a sequence of nonlinear and mixed-integer linear subproblems, as described below. At iteration k, the method fixes the integer variable y_k and tries to solve, in the continuous variable x, the following subproblem:

    min_{x∈X} f_0(x, y_k)  s.t.  f_i(x, y_k) ≤ 0, i ∈ I_c.   (14)
If this subproblem is feasible, a feasible point of problem (3) is found and, therefore, an upper bound f_up^k on its optimal value. On the other hand, if (14) is infeasible, the method solves the feasibility subproblem:

    min_{x∈X, u≥0} u  s.t.  f_i(x, y_k) ≤ u, i ∈ I_c.   (15)
With a solution of (14) or (15), the linearizations of f_0 and f_i can be used to approximate problem (3) by the following MILP:

    f_low^k :=  min_{r,x,y} r
                s.t. r < f_up^k,
                     f_0(x^j, y^j) + ⟨s_0, (x − x^j, y − y^j)⟩ ≤ r,  j ∈ T^k,
                     f_i(x^j, y^j) + ⟨s_i, (x − x^j, y − y^j)⟩ ≤ 0,  j ∈ T^k ∪ S^k, i ∈ I_c,
                     r ∈ R, x ∈ X, y ∈ Y,   (16)

where s_i, i = 0, 1, ..., m, are subgradients at the point (x^j, y^j), and the index sets T^k and S^k are defined as T^k := {j ≤ k : subproblem (14) was feasible at iteration j} and S^k := {j ≤ k : subproblem (14) was infeasible at iteration j}. Under convexity and differentiability (in this case the subdifferential is a singleton containing the gradient vector) of the underlying functions, the optimal value f_low^k of (16) is a lower bound on the optimal value of (3). Moreover, the y-part of the solution of (16) is the next integer iterate y^{k+1}. The algorithm stops when the difference between the upper and lower bounds provided respectively by (14) and (16) is within a given tolerance ε > 0.

The outer-approximation algorithm was revisited in 1992 in [10], where the authors proposed an LP/NLP-based branch-and-bound strategy in which the explicit solution of a MILP master problem is avoided at each major iteration k. In our case, the underlying functions might not be differentiable, but only subdifferentiable. As pointed out in [6], replacing gradients by subgradients in the classic OA algorithm entails a serious issue: the OA algorithm is not convergent if the differentiability assumption is removed. In order to have a convergent OA algorithm for nonsmooth convex MINLP, one needs to compute the linearizations (cuts) in (16) by using subgradients that satisfy the KKT system of either subproblem (14) or (15); see [4,6]. Computing solutions and subgradients satisfying the KKT conditions of the nonsmooth subproblems is not a trivial task. For instance, the Kelley cutting-plane method and subgradient methods for nonsmooth convex optimization are ensured to find an optimal solution, but are not ensured to provide a subgradient satisfying the KKT system. Given an optimal solution x^k of (14), there might be infinitely many subgradients of f at x^k if f is nonsmooth. How can a specific subgradient be chosen so as to satisfy the underlying KKT system?

This question was answered by the authors of [4], who proposed a regularized outer-approximation method capable of solving nonsmooth MINLP problems.
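To make the iteration between (14) and (16) concrete, here is a toy OA loop on a hypothetical smooth instance of our own construction (not the paper's problem): the NLP subproblem has a closed-form solution, the master MILP is solved by brute force over a grid instead of a MILP solver, and gradients are ordinary derivatives; the nonsmooth case would additionally require the KKT-compatible subgradients discussed above.

```python
# Toy instance: min (x - 1.2)^2 + (y - 0.8)^2  s.t.  x + y <= 2,
# with x in [0, 2] continuous and y in {0, 1, 2} integer.
def f0(x, y): return (x - 1.2) ** 2 + (y - 0.8) ** 2
def grad0(x, y): return (2 * (x - 1.2), 2 * (y - 0.8))

X = [i / 100 for i in range(201)]        # grid surrogate for x in [0, 2]
Y = [0, 1, 2]

def nlp_sub(y):                          # subproblem (14): closed form here
    x = min(max(1.2, 0.0), 2.0 - y)      # project the unconstrained minimizer
    return x, f0(x, y)

cuts, f_up, best, y_k = [], float("inf"), None, 0
for _ in range(10):
    x_k, val = nlp_sub(y_k)
    if val < f_up:                       # update the upper bound
        f_up, best = val, (x_k, y_k)
    gx, gy = grad0(x_k, y_k)
    cuts.append((x_k, y_k, val, gx, gy)) # linearization of f0 at (x_k, y_k)
    # Master (16): minimize r = max over the cuts, s.t. x + y <= 2.
    f_low, nxt = float("inf"), None
    for y in Y:
        for x in X:
            if x + y > 2:                # f1 is already linear, keep it exact
                continue
            r = max(v + ga * (x - xa) + gb * (y - ya)
                    for xa, ya, v, ga, gb in cuts)
            if r < f_low:
                f_low, nxt = r, (x, y)
    if f_low >= f_up - 1e-6:             # bounds have met: stop
        break
    y_k = nxt[1]                         # next integer iterate

print(best, f_up)
```

On this instance the loop visits y = 0, 2, 1 and stops after three iterations at the optimum x = 1.0, y = 1.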
4 A Power System Management Problem

In this section we consider a power system management model similar to the one given in [1]. The model consists of planning (over a horizon of a few hours) a small system composed of a hydro plant and a wind farm. Our model is constructed from realistic data of the Brazilian energy system.
Solving an MINLP with Chance Constraint Using a Zhang’s Copula Family
4.1 Problem's Description
Consider a power management model consisting of a hydro power plant and a wind farm. The electricity generated by the two units has two purposes: first, to meet the local community's power demand; second, to sell the leftover on the market. The energy generated by the wind farm is designated to supply the local community demand only. If it is not enough, the remaining demand is covered by the hydro power plant. The residual energy generated by the hydro power plant is then sold on the market with the aim of maximizing profit, which varies according to the energy price. Since the intention is to consider a short planning period (e.g., one day), the assumption is that the only uncertainty in this planning problem comes from the wind power production. As a result, the approach considers the inflow to the hydro plant, the market prices and the energy demand as known parameters. The hydro plant has a reservoir that can be used to store water and adapt the water release strategy to profit from the price-demand relation: the price of electricity varies during the day, so it is convenient to store water (if possible) in order to generate electricity at the moments of the day deemed more profitable. In order to exclude production strategies that are optimal over a short period but harm the near-future energy supply (e.g., the planner might be willing to use all the water in the reservoir to maximize profit while prices are high, leaving not enough water in the next hour to produce energy if the wind farm fails to supply the local community, leading to blackouts), a level constraint is imposed on the final water level in the hydro reservoir, i.e., it cannot be lower than a certain level $l_*$. The decision variables of the problem are the energy generated by the hydro power plant to cover the remaining local community demand and the residual energy to be sold on the market.
Since the main purpose of the problem is to maximize the profit of the power plant owner, the objective function is profit maximization. Some of the constraints of this problem are simple bounds on water release, given by the operational limits of the turbine (usually provided by the manufacturer), lower and upper bounds on the hydro reservoir filling level, and demand satisfaction. As in [1], the demand satisfaction constraint is dealt with in a probabilistic manner: random constraints in which a decision has to be taken prior to the observation of the random variable are not well defined in the context of an optimization problem. This motivates the formulation of a corresponding probabilistic constraint, in which a decision is defined to be feasible if the underlying random constraint is satisfied under this decision with at least a certain specified probability p. A further characteristic of this model is the presence of binary decision variables. These variables are needed because turbines cannot be operated at an arbitrary level: they are either off or on (working at a positive level). Such on/off constraints are easily modeled by binary variables. By discretizing the time horizon (one day) into T intervals (hours), the resulting optimization problem is described below:
484
A. Delfino
$$
\begin{array}{ll}
\max_{x,y,z} & \displaystyle\sum_{t=1}^{T} \pi_t z_t \\[4pt]
\text{s.t.} & \mathbb{P}\left[x_t + \xi_t \ge d_t \ \ \forall t = 1,\dots,T\right] \ge p \\[2pt]
& y_t \underline{v} \le x_t + z_t \le y_t \bar{v} \quad \forall t = 1,\dots,T \\[2pt]
& x_t, z_t \ge 0,\ \ y_t \in \{0,1\} \quad \forall t = 1,\dots,T \\[2pt]
& \underline{l} \le l_0 + t\omega - \dfrac{1}{\chi}\displaystyle\sum_{\tau=1}^{t}(x_\tau + z_\tau) \le \bar{l} \quad \forall t = 1,\dots,T \\[6pt]
& l_0 + T\omega - \dfrac{1}{\chi}\displaystyle\sum_{\tau=1}^{T}(x_\tau + z_\tau) \ge l_*,
\end{array} \tag{17}
$$
where:
– $z_t$ is the residual energy produced by the hydro power plant in time interval $t$ and sold on the market;
– $\pi_t$ is the energy price at time $t$;
– $x_t$ is the amount of energy generated by the hydro power plant to supply the remaining demand of the local community at time $t$;
– $d_t$ is the local community demand at time $t$, assumed to be known (due to the short planning horizon of one day);
– $\xi_t$ is the random energy generated by the wind farm at time $t$;
– $p \in (0, 1]$ is the given confidence level for demand satisfaction;
– $\underline{v}$ and $\bar{v}$ are the lower and upper operational limits of the hydro power plant turbine, respectively;
– $y_t$ is the binary variable modeling turning the turbine on/off;
– $l_0$ is the water level of the hydro power plant reservoir at the beginning of the horizon;
– $\underline{l}$ and $\bar{l}$ are the lower and upper water levels of the hydro power plant reservoir at any time, respectively;
– $\omega$ denotes the constant amount of water inflow to the hydro power plant reservoir at any time $t$;
– $\chi$ represents a conversion factor between the released water and the energy produced by the turbine: one unit of water released corresponds to $\chi$ units of energy generated;
– $l_*$ is the minimum water level of the hydro power plant reservoir in the last period $T$ of the time horizon;
– $\mathbb{P}$ is the probability measure associated with the random vector $\xi$.

As in [1], we assume that the wind power generation follows a multivariate normal distribution with mean vector $\mu$ and a positive definite correlation matrix $\Sigma$. This assumption ensures that problem (3) is convex. Theorem 1 ensures that we can replace the probability measure $\mathbb{P}$ by a Zhang's copula $C$ and, finally, Theorem 2 confirms that problem (7) is also convex; consequently, we can use outer approximation to solve it.

4.2 Problem's Data
The demand considered in this problem was extracted from the ONS website (www.ons.org.br). In our numerical tests, the considered daily demand corresponds to eighty percent of the averaged demand of the Southern region of Brazil, divided by the number of cities in that region. Note that we have only one city (or region) and two sources of energy: the hydro power plant and the wind farm. The price $\pi_t$ of energy is directly proportional to the demand and varies between 166.35 and 266.85 per MWh. The configuration of the hydro power plant reservoir is mirrored from [1] and in this problem is set as $\underline{l} = 5000$ hm³ (cubic hectometres), $\bar{l} = 10000$ hm³ and $l_0 = l_* = 7500$ hm³. The amount of water inflow is a constant $\omega = 2500$ hm³ each hour and the conversion factor is $\chi = 0.0035$ MW/m³. When the turbines are turned on,
the minimum power generation is 5 MW and the maximum generation is 20 MW. The marginal distributions $F_{\xi_i}$ follow a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$.
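With these data, the deterministic constraints of model (17) can be checked for a candidate schedule as in the following sketch. This is illustrative only and not the author's code; in particular, the unit conventions (the MW-to-hm³ conversion via $\chi$) are our assumption.

```python
import numpy as np

# Checks the non-probabilistic constraints of model (17) for a candidate
# schedule (x, y, z). Data values follow Sect. 4.2; the code itself is an
# illustrative assumption, not the author's implementation.

T = 24
l_lo, l_up = 5000.0, 10000.0    # reservoir bounds (hm^3)
l0 = l_star = 7500.0            # initial level and final-level requirement
omega, chi = 2500.0, 0.0035     # hourly inflow and water-to-energy factor
v_lo, v_up = 5.0, 20.0          # turbine operating limits (MW)

def feasible(x, y, z):
    level = l0
    for t in range(T):
        total = x[t] + z[t]
        if not (y[t] * v_lo <= total <= y[t] * v_up):   # on/off coupling
            return False
        level += omega - total / chi                    # water balance of (17)
        if not (l_lo <= level <= l_up):                 # reservoir bounds
            return False
    return level >= l_star                              # final-level constraint

# All turbines off: the reservoir overflows after two hours.
off = np.zeros(T)
print(feasible(off, np.zeros(T, dtype=int), off))  # False
# Releasing 8.75 MW each hour exactly balances the 2500 hm^3 inflow.
print(feasible(np.full(T, 8.75), np.ones(T, dtype=int), off))  # True
```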
5 Numerical Experiments
Numerical experiments were performed on a computer with an Intel(R) Core(TM) i7-5500U CPU @ 2.40 GHz and 8 GB of RAM, under Windows 10 (64-bit); our algorithm from [4] was coded in, or called from, MATLAB version 2017a. We solved the power system management problem for T = 24 (one day) using the Zhang's copula to approximate the probability function, and then checked the probability constraint of (17). For dimension T = 24 it was not possible to solve problem (17) with the true probability function within one hour of CPU time, given the considered computer and software. One of the difficulties in using copulae is to find coefficients that model the probability constraint accurately. The parameters of Zhang's copula depend on the size of the problem: if the random vector $\xi$ has dimension T, then the number of parameters is $1 + rT$, namely $r$ itself and coefficients $a_{j,i} \ge 0$ with $\sum_{j=1}^{r} a_{j,i} = 1$ for all $i = 1, \dots, T$. In this work we do not yet focus on the best choice of the copula parameters. For now, we simply set $r = 8$, and the coefficients $a_{j,i}$ were generated following a uniform probability distribution with low sparseness. As shown below, this simple choice gives satisfactory results. These nonsmooth convex mixed-integer programs are solved with the following solvers, coded in or called from MATLAB version 2017a:
– OA: an implementation of the outer-approximation algorithm given in [4] (the classic algorithm);
– OA_1: as solver OA, with integer iterates defined by solving the regularized MILP subproblem (16) with the $\ell_1$ norm; the stability center was set as the current iterate and the prox parameter as $\mu_k = 10$ for all $k$;
– OA_∞: as solver OA_1, with the $\ell_1$ norm replaced by $\ell_\infty$;
– OA_2: as solver OA_1, with the $\ell_1$ norm replaced by $\ell_2$; in this case, the subproblem defining the next iterate is no longer a MILP, but a MIQP.
We solved 21 problems, each based on a day of the week and using p = 0.8, 0.9 and 0.95. Table 1 shows the performance of these algorithms on all problems.
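For concreteness, the copula parameterization just described can be sketched as follows. The functional form $C(u) = \prod_j \min_i u_i^{a_{j,i}}$ is our reading of the Zhang copula family [15] and should be treated as an assumption; the coefficient generation mirrors the uniform random choice described above.

```python
import numpy as np

# Assumed Zhang-family copula C(u) = prod_{j=1}^{r} min_i u_i ** a[j, i],
# with a[j, i] >= 0 and sum_j a[j, i] = 1 for every i -- the 1 + rT
# parameters mentioned in the text. Illustrative sketch only.

rng = np.random.default_rng(0)

def zhang_coefficients(r, T):
    """Random nonnegative coefficients with each column summing to 1."""
    a = rng.uniform(size=(r, T))
    return a / a.sum(axis=0)

def zhang_copula(u, a):
    u = np.asarray(u, dtype=float)
    return float(np.prod(np.min(u ** a, axis=1)))

a = zhang_coefficients(r=8, T=24)
print(zhang_copula(np.ones(24), a))   # C(1, ..., 1) = 1.0 for any copula
```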
Although some problems were not solved by OA within the time limit (one hour), the regularized variants solved all problems with satisfactory results. It is important to note that if we use the probability function instead of the copula, then no problem is solved by the algorithms within the one-hour time limit.
Table 1. Number of MILPs and CPU time for p ∈ {0.8, 0.9, 0.95}

| p | Day | OA k | OA CPU (s) | OA_1 k | OA_1 CPU (s) | OA_∞ k | OA_∞ CPU (s) | OA_2 k | OA_2 CPU (s) |
|------|-----------|-------|-----------|----|--------|------|---------|----|--------|
| 0.8 | Tuesday | 1175 | 3395.79 | 39 | 141.08 | 539 | 1485.71 | 37 | 113.72 |
| 0.8 | Wednesday | 502 | 1327.55 | 44 | 140.20 | 147 | 349.76 | 47 | 102.97 |
| 0.8 | Thursday | 967 | 2958.82 | 57 | 174.08 | 452 | 1235.68 | 52 | 151.84 |
| 0.8 | Friday | 908 | 3600.89∗ | 41 | 173.59 | 781 | 2971.03 | 48 | 152.14 |
| 0.8 | Saturday | 31 | 36.27 | 9 | 9.08 | 53 | 45.53 | 11 | 12.41 |
| 0.8 | Sunday | 197 | 407.31 | 13 | 21.33 | 79 | 79.91 | 13 | 19.61 |
| 0.8 | Monday | 653 | 3605.96∗ | 23 | 96.57 | 1073 | 3347.86 | 28 | 87.82 |
| 0.9 | Tuesday | 1204 | 3600.92∗ | 32 | 179.63 | 390 | 1225.37 | 34 | 107.71 |
| 0.9 | Wednesday | 533 | 1510.74 | 41 | 152.42 | 157 | 465.13 | 52 | 120.90 |
| 0.9 | Thursday | 1031 | 3425.71 | 73 | 213.68 | 472 | 1329.76 | 58 | 176.13 |
| 0.9 | Friday | 820 | 3603.04∗ | 42 | 146.48 | 793 | 3203.98 | 43 | 128.13 |
| 0.9 | Saturday | 29 | 37.92 | 9 | 10.44 | 33 | 27.35 | 11 | 15.92 |
| 0.9 | Sunday | 778 | 3582.22 | 15 | 46.50 | 461 | 1239.08 | 14 | 35.07 |
| 0.9 | Monday | 646 | 3609.38∗ | 23 | 96.59 | 373 | 1288.81 | 28 | 86.89 |
| 0.95 | Tuesday | 1165 | 3601.44∗ | 32 | 204.71 | 394 | 1310.39 | 23 | 112.16 |
| 0.95 | Wednesday | 717 | 2129.51 | 56 | 181.47 | 398 | 1081.85 | 51 | 127.46 |
| 0.95 | Thursday | 1032 | 3600.68∗ | 49 | 219.76 | 374 | 1280.56 | 41 | 99.64 |
| 0.95 | Friday | 771 | 3605.22∗ | 42 | 123.34 | 789 | 3045.20 | 48 | 110.58 |
| 0.95 | Saturday | 38 | 94.58 | 9 | 19.13 | 14 | 22.51 | 10 | 20.93 |
| 0.95 | Sunday | 761 | 3401.13 | 15 | 48.91 | 123 | 356.63 | 14 | 45.96 |
| 0.95 | Monday | 622 | 3600.82∗ | 22 | 98.68 | 986 | 3580.10 | 28 | 84.72 |
| | Sum | 14580 | 15.20 h | 686 | 0.69 h | 8881 | 8.05 h | 691 | 0.53 h |

∗ the time limit was reached

6 Conclusion
In this work we show that using copulas is an excellent alternative for solving problems involving probabilistic constraints. In future work we will move in this direction and improve our numerical results.
References

1. Arnold, T., Henrion, R., Möller, A., Vigerske, S.: A mixed-integer stochastic nonlinear optimization problem with joint probabilistic constraints. Pac. J. Optim. 10, 5–25 (2014)
2. Birge, J., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York (2011)
3. de Oliveira, W.: Regularized optimization methods for convex MINLP problems. TOP 24, 665–692 (2016)
4. Delfino, A., de Oliveira, W.: Outer-approximation algorithms for nonsmooth convex MINLP problems. Optimization 67, 797–819 (2018)
5. Duran, M.A., Grossmann, I.E.: An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 36, 307–339 (1986)
6. Eronen, V.P., Mäkelä, M.M., Westerlund, T.: On the generalization of ECP and OA methods to nonsmooth convex MINLP problems. Optimization 63, 1057–1073 (2014)
7. Fletcher, R., Leyffer, S.: Solving mixed integer nonlinear programs by outer approximation. Math. Program. 66, 327–349 (1994)
8. Nelsen, R.B.: An Introduction to Copulas. Springer, New York (2006)
9. Prékopa, A.: Stochastic Programming. Kluwer, Dordrecht (1995)
10. Quesada, I., Grossmann, I.E.: An LP/NLP based branch and bound algorithm for convex MINLP optimization problems. Comput. Chem. Eng. 16, 937–947 (1992)
11. Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris 8, 229–231 (1959)
12. Song, Y., Luedtke, J., Küçükyavuz, S.: Chance-constrained binary packing problems. INFORMS J. Comput. 26, 735–747 (2014)
13. Vielma, J.P., Ahmed, S., Nemhauser, G.L.: Mixed integer linear programming formulations for probabilistic constraints. Oper. Res. Lett. 40, 153–158 (2012)
14. Westerlund, T., Pettersson, F.: An extended cutting plane method for solving convex MINLP problems. Comput. Chem. Eng. 19, 131–136 (1995)
15. Zhang, Z.: On approximating max-stable processes and constructing extremal copula functions. Stat. Inference Stoch. Process. 12, 89–114 (2009)
Stochastic Greedy Algorithm Is Still Good: Maximizing Submodular + Supermodular Functions

Sai Ji^1, Dachuan Xu^1, Min Li^2, Yishui Wang^3(B), and Dongmei Zhang^4

^1 College of Applied Sciences, Beijing University of Technology, Beijing 100124, People's Republic of China, [email protected], [email protected]
^2 School of Mathematics and Statistics, Shandong Normal University, Jinan, People's Republic of China, [email protected]
^3 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China, [email protected]
^4 School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, People's Republic of China, [email protected]
Abstract. In this paper, we consider the problem of maximizing the sum of a submodular and a supermodular function (a BP function, both nonnegative) under a cardinality constraint and a p-system constraint, respectively, which arises in many real-world applications such as data science, machine learning and artificial intelligence. The greedy algorithm is widely used to design approximation algorithms. However, in many applications, evaluating the value of the objective function is expensive. In order to avoid a waste of time and money, we propose a Stochastic-Greedy (SG) algorithm, a Stochastic-Standard-Greedy (SSG) algorithm and a Random-Greedy (RG) algorithm for the monotone BP maximization problem under a cardinality constraint, the monotone BP maximization problem under a p-system constraint, and the non-monotone BP maximization problem under a cardinality constraint, respectively. The SSG algorithm also works well on the monotone BP maximization problem under a cardinality constraint. Numerical experiments for monotone BP maximization under a cardinality constraint are made to compare the SG algorithm with the SSG algorithm from previous works. The results show that the guarantee of the SG algorithm is worse than that of the SSG algorithm, but the SG algorithm is faster than the SSG algorithm, especially for large-scale instances.
Keywords: BP maximization · Stochastic greedy · Approximation algorithm
Supported by Higher Educational Science and Technology Program of Shandong Province (No. J17KA171) and Natural Science Foundation of China (Grant Nos. 11531014, 11871081, 61433012, U1435215). c Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 488–497, 2020. https://doi.org/10.1007/978-3-030-21803-4_49
Stochastic Greedy Algorithm Is Still Good: Maximizing
1 Introduction
Maximizing a submodular function subject to an independence constraint is related to many machine learning and data science applications such as information gathering [13], document summarization [14], image segmentation [12], and PAC learning [16]. There are many studies on variants of submodular maximization [2,4,8,11]. However, many subset selection problems in data science are not submodular [22]. In this paper, we study the constrained maximization of an objective that may be decomposed into the sum of a submodular and a supermodular function. That is, we consider the following problem:

$$\arg\max_{X \in \mathcal{C}} h(X) = f(X) + g(X),$$

where $f$ is a normalized submodular function and $g$ is a normalized supermodular function. We call this problem the submodular + supermodular (BP) maximization problem, and $f + g$ a BP function [1]. We say a function $h$ admits a BP decomposition if there exist $f, g$ such that $h = f + g$, where $f$ is a submodular function and $g$ is a supermodular function. Not all monotone functions admit a BP decomposition, and there are instances of the BP problem that cannot be approximated to any positive factor in polynomial time [1]. Thus, in this paper, we study only the BP maximization problems that can be decomposed and approximated.

When $g$ is modular, there is an extensive literature on the submodular maximization problem [6,10,17,21]. If $h$ is monotone, the greedy algorithm is guaranteed to obtain a $(1 - e^{-1})$-approximation subject to the cardinality constraint [17], and this result is known to be tight. It also achieves a $\frac{1}{p+1}$-approximation for $p$ matroids [9]. Based on the definition of curvature, the greedy algorithm has a $\frac{1}{K_h}(1 - e^{-K_h})$ guarantee for the cardinality constraint and a $\frac{1}{K_h + p}$ guarantee for the matroid constraint [7]. The Stochastic-Greedy algorithm achieves a $(1 - e^{-1} - \varepsilon)$ guarantee for monotone submodular maximization with a cardinality constraint [15]. If $h$ is non-monotone, the Random-Greedy algorithm achieves an $e^{-1}$-approximation subject to the cardinality constraint [5]. When $g$ is not modular, there is also some good work on the non-submodular maximization problem [3,18,20]. In particular, when $g$ is supermodular, Bai et al. [1] provide a $\frac{1}{K_f}\left[1 - e^{-(1-K_g)K_f}\right]$-approximation algorithm for the monotone BP maximization problem under a cardinality constraint and a $\frac{1-K_g}{(1-K_g)K_f + p}$-approximation algorithm for the monotone BP maximization problem under $p$ matroid independence constraints. In this paper, we consider the monotone BP maximization problem subject to a cardinality constraint, the monotone BP maximization problem subject to a p-system constraint, and the non-monotone BP maximization problem subject to a cardinality constraint, where both the submodular and the supermodular functions are non-negative. For each problem, we provide a stochastic greedy algorithm and give the corresponding theoretical analysis.
2 Preliminaries
Given a set $V = \{v_1, v_2, \dots, v_n\}$, denote $f_v(X) = f(X \cup \{v\}) - f(X)$ as the marginal gain of adding the item $v$ to the set $X \subset V$. A function $f$ is monotone if for any $X \subseteq Y$, $f(X) \le f(Y)$. Without loss of generality, we assume that monotone functions are normalized, i.e., $f(\emptyset) = 0$.

Definition 1 (Submodular curvature [7]). The curvature $K_f$ of a submodular function $f$ is defined as

$$K_f = 1 - \min_{v \in V} \frac{f_v(V \setminus \{v\})}{f(v)}.$$

Definition 2 (Supermodular curvature [7]). The curvature $K_g$ of a supermodular function $g$ is defined as

$$K_g = 1 - \min_{v \in V} \frac{g(v)}{g_v(V \setminus \{v\})}.$$

From Definitions 1 and 2, we have that $0 \le K_f, K_g \le 1$. In this paper, we study the case that $0 < K_f < 1$ and $0 < K_g < 1$.
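As a concrete illustration of these definitions, the submodular curvature can be computed by brute force for a small example function; the weighted coverage function below is our own illustrative choice, not one from the paper.

```python
# Sketch: computing the submodular curvature K_f = 1 - min_v f_v(V\{v}) / f({v})
# numerically for a small coverage function (an example monotone normalized
# submodular function).

def f(S):
    """Coverage: size of the union of per-item sets."""
    cover = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c', 'd'}}
    return len(set().union(*(cover[v] for v in S))) if S else 0

V = {1, 2, 3}

def marginal(fn, v, S):
    return fn(S | {v}) - fn(S)

K_f = 1 - min(marginal(f, v, V - {v}) / f({v}) for v in V)
print(K_f)  # item 2 adds nothing to V\{2}, so the min ratio is 0 and K_f = 1.0
```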
3 Algorithms

In this section, we provide the SG algorithm, the SSG algorithm and the RG algorithm for the monotone BP maximization problem subject to a cardinality constraint, the monotone BP maximization problem subject to a p-system constraint, and the non-monotone BP maximization problem subject to a cardinality constraint, respectively. All three algorithms are iterative processes. In each iteration, the SG algorithm samples a candidate set of size $\frac{n}{k}\ln\frac{1}{\varepsilon}$ uniformly at random from the set $V \setminus S$ (where $S$ is the current solution), then chooses from this candidate set the item with the maximum marginal gain. The SSG algorithm selects one item whose marginal gain is at least $\xi \in (0, 1]$ times the largest marginal gain value, where $\xi$ comes from a distribution $D$. The RG algorithm chooses an item uniformly at random from a set of size $k$ with the largest sum of individual marginal gain values. The detailed algorithms are shown below.

3.1 SG Algorithm
Similar to Lemma 2 of [15], we have the following lemma, which estimates a lower bound on the expected gain at the (i+1)-th step and reveals the relationship between the current solution $S$ and the optimal solution $S^*$.

Lemma 1. Let $t = \frac{n}{k}\ln\frac{1}{\varepsilon}$. The expected gain of Algorithm 1 at the (i+1)-th step is at least

$$\frac{1-\varepsilon}{|S^* \setminus S_i|} \sum_{s \in S^* \setminus S_i} h_s(S_i),$$
where S ∗ is the optimal solution and Si is the subset obtained by Algorithm 1 after i steps.
Algorithm 1 SG algorithm for monotone BP maximization with cardinality constraint
Input: a monotone submodular function f, a monotone supermodular function g, a ground set V, and a budget k
Output: a subset S of V with k items
1: S ← ∅, i ← 1
2: while i ≤ k do
3:   R ← a random subset obtained by sampling t random items from V \ S
4:   s_i ← arg max_{s∈R} h_s(S)
5:   S ← S ∪ {s_i}, i ← i + 1
6: end while
7: return S
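A direct transcription of Algorithm 1 might look as follows; the modular demo objective is an illustrative assumption, since any value oracle h can be plugged in.

```python
import math
import random

# Sketch of Algorithm 1 (SG): at each step, sample t = (n/k) ln(1/eps) items
# from V \ S and greedily add the best item of the sample.

def stochastic_greedy(h, V, k, eps=0.1, rng=random):
    n = len(V)
    t = max(1, math.ceil(n / k * math.log(1 / eps)))
    S = set()
    while len(S) < k:
        pool = list(V - S)
        R = rng.sample(pool, min(t, len(pool)))       # candidate sample
        s = max(R, key=lambda v: h(S | {v}) - h(S))   # best marginal gain in R
        S.add(s)
    return S

w = {v: v for v in range(10)}
h = lambda S: sum(w[v] for v in S)    # modular example objective (assumption)
S = stochastic_greedy(h, set(range(10)), k=3)
print(len(S))  # 3
```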
Lemma 2. Let $S_i$ ($0 \le i \le k$) be the subset obtained by Algorithm 1 after the $i$-th step. Then the following holds for all $0 \le i \le k - 1$:

$$h(S^*) \le K_f \sum_{j:\, s_j \in S_i \setminus S^*} a_j + \sum_{j:\, s_j \in S_i \cap S^*} a_j + \frac{k - |S^* \cap S_i|}{(1 - K_g)(1 - \varepsilon)}\, a_{i+1},$$

where $S^*$ is the optimal solution, $\{s_i\} = S_i \setminus S_{i-1}$, $a_i = E[h_{s_i}(S_{i-1})]$, $K_f$ is the curvature of the submodular function $f$, and $K_g$ is the curvature of the supermodular function $g$.

Combining Lemma C.1 and Lemma D.2 of [1] with Lemmas 1–2, we obtain the following theorem, which estimates the approximation ratio of the SG algorithm.

Theorem 1. Let $t = \frac{n}{k}\ln\frac{1}{\varepsilon}$. For the monotone BP maximization problem with a cardinality constraint, Algorithm 1 finds a subset $S \subseteq V$ with $|S| = k$ and

$$E[h(S)] \ge \frac{1}{K_f}\left[1 - e^{-(1-K_g)K_f} - \varepsilon\right] h(S^*),$$

where $K_f$ and $K_g$ are the curvature of the submodular function $f$ and the curvature of the supermodular function $g$, respectively.

3.2 SSG Algorithm
Lemma 3 [9]. For $\delta_i, \rho_i \ge 0$ with $0 \le i \le k - 1$, if $\sum_{i=0}^{t-1} \delta_i \le t$ for $1 \le t \le k$ and $\rho_{i-1} \ge \rho_i$ for $1 \le i \le k - 1$, then $\sum_{i=0}^{k-1} \delta_i \rho_i \le \sum_{i=0}^{k-1} \rho_i$.
Algorithm 2 SSG algorithm for monotone BP maximization with p-system constraint
Input: a monotone submodular function f, a monotone supermodular function g, an independence system (V, I) and a distribution D
Output: a base of V
1: S ← ∅, U ← V
2: repeat
3:   U ← {v ∈ U | S ∪ {v} ∈ I}
4:   if U ≠ ∅ then
5:     ξ ← randomly sampled from D
6:     s∗ ← an arbitrary item from U s.t. h_{s∗}(S) ≥ ξ · max_{s∈U} h_s(S)
7:     S ← S ∪ {s∗}
8:   end if
9: until U = ∅
10: return S
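Algorithm 2 might be transcribed as in the sketch below. The paper leaves the distribution D and the p-system generic, so taking D uniform on (0.5, 1] and using a simple cardinality independence system are illustrative assumptions.

```python
import random

# Sketch of Algorithm 2 (SSG): repeatedly accept any item whose marginal gain
# reaches a random fraction xi of the best available gain, until no item can
# be added while keeping S independent.

def stochastic_standard_greedy(h, V, is_independent, rng=random):
    S = set()
    while True:
        U = {v for v in V - S if is_independent(S | {v})}
        if not U:
            return S
        xi = rng.uniform(0.5, 1.0)                     # xi ~ D (assumed)
        best = max(h(S | {v}) - h(S) for v in U)
        # any item whose gain reaches xi * best is acceptable
        s = next(v for v in U if h(S | {v}) - h(S) >= xi * best)
        S.add(s)

h = lambda S: len(S)                                   # trivial demo objective
S = stochastic_standard_greedy(h, set(range(8)), lambda S: len(S) <= 4)
print(len(S))  # the cardinality system stops the loop at 4 items
```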
Similar to Theorem 3 of [19], we get the following theorem, based on Lemma 3, to estimate the approximation ratio of the SSG algorithm.

Theorem 2. For the monotone BP maximization problem with a p-system constraint, Algorithm 2 finds a basis $S$ of $V$ with

$$E[h(S)] \ge \frac{(1 - K_g)^2 \mu}{p + (1 - K_g)^2 \mu}\, h(S^*),$$

where $S^*$, $K_f$, $K_g$ and $\mu$ are the optimal solution, the curvature of the submodular function $f$, the curvature of the supermodular function $g$, and the expectation of $\xi \sim D$, respectively.

3.3 RG Algorithm
Lemma 4. Let $h = f + g$, where $f$ is a submodular function and $g$ is a supermodular function. Denote by $A(p)$ a random subset of $A$ in which each element appears with probability at most $p$ (not necessarily independently). Then we have $E[h(A(p))] \ge [1 - (1 - K_g)p]\, h(\emptyset)$, where $K_g$ is the curvature of the supermodular function $g$.

From Lemma 4, we get the following lemma, which is crucial to the analysis of the RG algorithm.

Lemma 5. Let $h = f + g$, where $f$ is a submodular function and $g$ is a supermodular function. For any $1 \le i \le k$, we have

$$E[h(S^* \cup \{s_i\})] \ge \left[1 - (1 - K_g)\left(1 - \left(1 - \tfrac{1}{k}\right)^{i}\right)\right] h(S^*),$$

where $K_g$ is the curvature of the supermodular function $g$ and $s_i$ is the item obtained by Algorithm 3 at the $i$-th step.
Algorithm 3 RG algorithm for non-monotone BP maximization with cardinality constraint
Input: a submodular function f, a supermodular function g (the function f + g may be non-monotone), a ground set V, and a budget k
Output: a subset S of V with k items
1: S ← ∅, i ← 1
2: while i ≤ k do
3:   M_i ← a subset of V \ S of size k maximizing Σ_{s∈M_i} h_s(S)
4:   s_i ← an item chosen uniformly at random from M_i
5:   S ← S ∪ {s_i}, i ← i + 1
6: end while
7: return S
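A transcription of Algorithm 3 might look as follows; the trivial demo objective is an assumption for illustration, and the dummy-element padding used in the analysis is omitted.

```python
import heapq
import random

# Sketch of Algorithm 3 (RG): pick the k items of largest marginal gain,
# then add one of them uniformly at random.

def random_greedy(h, V, k, rng=random):
    S = set()
    for _ in range(k):
        gains = {v: h(S | {v}) - h(S) for v in V - S}
        M = heapq.nlargest(k, gains, key=gains.get)   # top-k marginal gains
        S.add(rng.choice(M))
    return S

h = lambda S: len(S)                 # trivial demo objective (assumption)
S = random_greedy(h, set(range(10)), 3)
print(len(S))  # 3
```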
Theorem 3. For the non-monotone BP maximization problem with a cardinality constraint, Algorithm 3 finds a subset $S \subseteq V$ with $|S| = k$ and $E[h(S)] \ge (1 - K_g)\, e^{-1}\, h(S^*)$, where $S^*$ is the optimal solution and $K_g$ is the curvature of the supermodular function $g$.

Proof. Denote by $S_i$ the subset obtained by Algorithm 3 after $i$ steps. For each $1 \le i \le k$, consider a set $M'_i$ containing the elements of $S^* \setminus S_{i-1}$ plus enough dummy elements to make the size of $M'_i$ exactly $k$. Recalling Algorithm 3 and Lemma C.1 of [1], we have

$$E[h_{s_i}(S_{i-1})] = \frac{1}{k} \sum_{s \in M_i} h_s(S_{i-1}) \ge \frac{1}{k} \sum_{s \in S^* \setminus S_{i-1}} h_s(S_{i-1}) \ge \frac{(1 - K_g)\left[h(S^* \cup S_{i-1}) - h(S_{i-1})\right]}{k}.$$
Combining this with Lemma 5, we have

$$\begin{aligned} E[h(S_i)] &= E[h(S_{i-1})] + E[h_{s_i}(S_{i-1})] \\ &\ge E[h(S_{i-1})] + \frac{(1 - K_g)\big[E[h(S^* \cup S_{i-1})] - E[h(S_{i-1})]\big]}{k} \\ &\ge \left(1 - \frac{1 - K_g}{k}\right) E[h(S_{i-1})] + \frac{1 - K_g}{k}\left(1 - \frac{1}{k}\right)^{i-1} h(S^*). \end{aligned} \tag{1}$$
By (1), we have

$$\begin{aligned} E[h(S)] = E[h(S_k)] &\ge \left(1 - \frac{1 - K_g}{k}\right)^{k} h(\emptyset) + \frac{1 - K_g}{k} \sum_{j=0}^{k-1} \left(1 - \frac{1 - K_g}{k}\right)^{j} \left(1 - \frac{1}{k}\right)^{k-1-j} h(S^*) \\ &\ge \frac{1 - K_g}{k}\, k \left(1 - \frac{1}{k}\right)^{k-1} h(S^*) \\ &\ge (1 - K_g)\, e^{-1}\, h(S^*). \end{aligned}$$

The theorem is proved.
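The final step of the proof uses the elementary bound $(1 - 1/k)^{k-1} \ge e^{-1}$, which is easy to confirm numerically:

```python
import math

# Numeric sanity check of the last proof step: (1 - 1/k)**(k - 1) >= 1/e
# for all k >= 1, giving E[h(S)] >= (1 - K_g) e^{-1} h(S*).
print(all((1 - 1 / k) ** (k - 1) >= 1 / math.e for k in range(1, 1000)))  # True
```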
4 Numerical Experiments
In this section, we conduct numerical experiments to compare the Stochastic-Greedy algorithm (Algorithm 1) and the Stochastic-Standard-Greedy algorithm (see reference [1]) for the BP maximization problem subject to a cardinality constraint. We use the same model as in Bai et al. [1]. In this model, the ground set $V$ is partitioned into $V_1 = \{v_1, \dots, v_k\}$ and $V_2 = V \setminus V_1$. The submodular function is defined as
$$f(S) := \lambda \cdot \left( \frac{k - \alpha |S \cap V_2|}{k} \sum_{i:\, v_i \in S} w_i + \frac{|S \cap V_2|}{k} \right),$$

and the supermodular function is defined as

$$g(S) := (1 - \lambda) \cdot \left( |S| - \beta \min\big(|S \cap V_1| + 1,\ |S|,\ k\big) + \beta \max\Big(|S|,\ |S| + \frac{\varepsilon}{1 - \beta}\big(|S \cap V_2| - k + 1\big)\Big) \right),$$

where $\alpha \in (0, 1]$, $\beta \in [0, 1)$, $\lambda \in [0, 1]$, and $\varepsilon = 10^{-5}$. It is easy to see that the curvatures of $f$ and $g$ are $\alpha$ and $\beta$, respectively. In our experiments, we set $n = 300$, $k = 150$, $\alpha \in \{0.05, 0.1, 0.15, \dots, 1\}$, $\beta \in \{0, 0.05, 0.1, \dots, 0.95\}$, and $\lambda \in \{0.1, 0.3, \dots, 0.9\}$. For the Stochastic-Greedy algorithm, we perform 10 runs and compute the average value for each setting. Guarantee results are shown in Fig. 1, and running times in Table 1. It is not surprising that the Stochastic-Standard-Greedy algorithm performs better than the Stochastic-Greedy algorithm, since the Stochastic-Standard-Greedy algorithm uses a larger candidate set in each iteration. However, the Stochastic-Greedy algorithm is faster than the Stochastic-Standard-Greedy algorithm; especially for large sizes, the gap in running time between these two algorithms
[Figure: panels for λ = 0.1, 0.3, 0.5, 0.7, 0.9 plotting objective value; legend: stochastic greedy vs. standard greedy]

Fig. 1. Guarantees of the Stochastic-Standard-Greedy and Stochastic-Greedy algorithms
is very large. Moreover, it is interesting that for fixed λ and K_g, a lower K_f yields a smaller gap in objective value between the two algorithms, and a larger λ (meaning the function h is closer to being submodular) yields a smaller gap for large K_f.

Table 1. Running times of the Stochastic-Standard-Greedy and Stochastic-Greedy algorithms

| Size | λ | Minimal time (s), standard greedy | Minimal time (s), stochastic greedy | Maximal time (s), standard greedy | Maximal time (s), stochastic greedy | Average time (s), standard greedy | Average time (s), stochastic greedy |
|---|---|---|---|---|---|---|---|
| n = 300, k = 150 | 0.1 | 8.2265 | 0.0923 | 8.7303 | 0.0969 | 8.4784 | 0.0943 |
| | 0.3 | 8.2101 | 0.0921 | 8.7409 | 0.0988 | 8.4771 | 0.0943 |
| | 0.5 | 8.2262 | 0.0923 | 8.6891 | 0.0981 | 8.4800 | 0.0943 |
| | 0.7 | 8.2150 | 0.0921 | 8.6225 | 0.0969 | 8.4808 | 0.0943 |
| | 0.9 | 8.2512 | 0.0922 | 8.7408 | 0.1136 | 8.4816 | 0.0944 |
5 Conclusion
In this paper, we consider the monotone BP maximization problem subject to a cardinality constraint and a p-system constraint, respectively. Then, we consider the non-monotone BP maximization problem subject to a cardinality constraint.
For each problem, we give a stochastic algorithm. The theoretical analysis indicates that the stochastic algorithms work well on the BP maximization problems. Numerical experiments show that the algorithms are effective. There are two possible future research directions. One is to design a better stochastic algorithm for the BP maximization problem subject to a cardinality constraint or a p-system constraint. The other is to study further variants of the constrained submodular maximization problem.
References

1. Bai, W., Bilmes, J.A.: Greed is still good: maximizing monotone submodular + supermodular functions (2018). arXiv preprint arXiv:1801.07413
2. Bian, A., Levy, K., Krause, A., Buhmann, J.M.: Continuous DR-submodular maximization: structure and algorithms. In: Advances in Neural Information Processing Systems, pp. 486–496 (2017)
3. Bian, A.A., Buhmann, J.M., Krause, A., Tschiatschek, S.: Guarantees for greedy maximization of non-submodular functions with applications (2017). arXiv preprint arXiv:1703.02100
4. Bogunovic, I., Zhao, J., Cevher, V.: Robust maximization of non-submodular objectives (2018). arXiv preprint arXiv:1802.07073
5. Buchbinder, N., Feldman, M., Naor, J.S., Schwartz, R.: Submodular maximization with cardinality constraints. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1433–1452 (2014)
6. Chekuri, C., Vondrák, J., Zenklusen, R.: Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM J. Comput. 43(6), 1831–1879 (2014)
7. Conforti, M., Cornuéjols, G.: Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discret. Appl. Math. 7(3), 251–274 (1984)
8. Epasto, A., Lattanzi, S., Vassilvitskii, S., Zadimoghaddam, M.: Submodular optimization over sliding windows. In: Proceedings of the 26th International Conference on World Wide Web, pp. 421–430 (2017)
9. Fisher, M.L., Nemhauser, G.L., Wolsey, L.A.: An analysis of approximations for maximizing submodular set functions – II. Polyhedral Combinatorics, pp. 73–87 (1978)
10. Iwata, S., Orlin, J.B.: A simple combinatorial algorithm for submodular function minimization. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1230–1237 (2009)
11. Kawase, Y., Sumita, H., Fukunaga, T.: Submodular maximization with uncertain knapsack capacity. In: Latin American Symposium on Theoretical Informatics, pp. 653–668 (2018)
12. Kohli, P., Kumar, M.P., Torr, P.H.: P3 & beyond: move making algorithms for solving higher order functions. IEEE Trans. Pattern Anal. Mach. Intell. 31(9), 1645–1656 (2009)
13. Krause, A., Guestrin, C., Gupta, A., Kleinberg, J.: Near-optimal sensor placements: maximizing information while minimizing communication cost. In: Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pp. 2–10 (2006)
14. Lin, H., Bilmes, J.: A class of submodular functions for document summarization, pp. 510–520 (2011)
15. Mirzasoleiman, B., Badanidiyuru, A., Karbasi, A., Vondrák, J., Krause, A.: Lazier than lazy greedy. In: AAAI, pp. 1812–1818 (2015)
16. Narasimhan, M., Bilmes, J.: PAC-learning bounded tree-width graphical models. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 410–417 (2004)
17. Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for maximizing submodular set functions – I. Math. Prog. 14(1), 265–294 (1978)
18. Niazadeh, R., Roughgarden, T., Wang, J.R.: Optimal algorithms for continuous non-monotone submodular and DR-submodular maximization (2018). arXiv preprint arXiv:1805.09480
19. Qian, C., Yu, Y., Tang, K.: Approximation guarantees of stochastic greedy algorithms for subset selection. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), pp. 1478–1484 (2018)
20. Schoenebeck, G., Tao, B.: Beyond worst-case (in)approximability of nonsubmodular influence maximization. In: International Conference on Web and Internet Economics, pp. 368–382 (2017)
21. Sviridenko, M.: A note on maximizing a submodular set function subject to a knapsack constraint. Oper. Res. Lett. 32(1), 41–43 (2004)
22. Wei, K., Iyer, R., Bilmes, J.: Submodularity in data subset selection and active learning. In: International Conference on Machine Learning, pp. 1954–1963 (2015)
Towards Multi-tree Methods for Large-Scale Global Optimization Pavlo Muts(B) and Ivo Nowak Hamburg University of Applied Sciences, Hamburg, Germany {pavlo.muts,ivo.nowak}@haw-hamburg.de
Abstract. In this paper, we present a new multi-tree approach for solving large-scale Global Optimization Problems (GOP), called DECOA (Decomposition-based Outer Approximation). DECOA is based on decomposing a GOP into sub-problems, which are coupled by linear constraints. It computes a solution by alternately solving sub- and master-problems using Branch-and-Bound (BB). Since DECOA does not use a single (global) BB-tree, it is called a multi-tree algorithm. After formulating a GOP as a block-separable MINLP, we describe how piecewise linear Outer Approximations (OA) can be computed by reformulating nonconvex functions as a Difference of Convex functions. This is followed by a description of the main and sub-algorithms of DECOA, including a decomposition-based heuristic for finding solution candidates. Finally, we present preliminary results with MINLPs and conclusions.

Keywords: Global optimization · Decomposition method · Mixed-integer nonlinear programming

1 Introduction
We consider block-separable (or quasi-separable) MINLP problems of the form

    min c^T x   s.t. x ∈ P, x_k ∈ X_k, k ∈ K,    (1)

with

    P := {x ∈ R^n : a_j^T x ≤ b_j, j ∈ J},   X_k := G_k ∩ L_k ∩ Y_k,    (2)

where

    G_k := {y ∈ [x̲_k, x̄_k] ⊂ R^{n_k} : g_kj(y) ≤ 0, j ∈ [m_k]},
    L_k := {y ∈ [x̲_k, x̄_k] ⊂ R^{n_k} : a_kj^T y ≤ b_kj, j ∈ J_k},
    Y_k := {y ∈ R^{n_k} : y_i ∈ Z, i ∈ I_k}.    (3)

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 498–506, 2020. https://doi.org/10.1007/978-3-030-21803-4_50
The vector of variables x ∈ R^n is partitioned into |K| blocks such that n = Σ_{k∈K} n_k, where n_k is the dimension of the k-th block, and x_k ∈ R^{n_k} denotes the variables of the k-th block. The vectors x̲, x̄ ∈ R^n denote lower and upper bounds on the variables. The linear constraints defining the polytope P are called global. The constraints defining the sub-sets X_k are called local. The set X_k is defined by nonlinear local constraints, denoted by G_k, by linear local constraints, denoted by L_k, and by integrality constraints, denoted by Y_k. In this paper, all the nonlinear local constraint functions g_kj : R^{n_k} → R are assumed to be bounded and continuously differentiable within the set [x̲_k, x̄_k]. The linear global constraints of P are defined by a_j ∈ R^n, b_j ∈ R, j ∈ J, and the linear local constraints of L_k are defined by a_kj ∈ R^{n_k}, b_kj ∈ R, j ∈ J_k. The set Y_k defines the integrality conditions on the variables x_ki, i ∈ I_k, where I_k is an index set. The linear objective function is defined by c^T x := Σ_{k∈K} c_k^T x_k, c_k ∈ R^{n_k}, and the matrix A_k ∈ R^{m×n_k}, with m = |J| + |J_k|, is defined by the columns with the indices of the k-th block. Furthermore, we define the sets

    G := ∏_{k∈K} G_k,   Y := ∏_{k∈K} Y_k,   X := ∏_{k∈K} X_k.    (4)
Note that it is possible to reformulate a general sparse MINLP defined by factorable functions gkj as a block-separable optimization problem with a given maximum block-size nk by adding new variables and copy-constraints [5,7,8]. Multi-tree Decomposition Algorithms. Decomposition is a very general approach that can be applied to convex optimization, as well as non-convex optimization and discrete optimization. These methods are based on dividing a model into smaller sub-problems, which can be solved in parallel. The solutions of the subproblems are used for updating a global master problem. If the master problem is a MIP, this type of strategy is called multi-tree, because an individual branchand-bound tree is built for each MIP instance. Using one global master which is updated during the solution process, i.e. new constraints are added during the solution process in order to improve the master problem, is called single-tree strategy. More discussion on single-tree and multi-tree approaches can be found in [3].
2 Polyhedral Outer-Approximation

Fundamental for an OA-solver is a method for computing a polyhedral OA of the feasible set G of problem (1) with an arbitrary precision. An example of an OA master problem is given by

    min c^T x   s.t. x ∈ P, x_k ∈ X̌_k, k ∈ K,    (5)
where

    X̌_k := Y̌_k ∩ L_k ∩ Ǧ_k,   with Y̌_k ∈ {Y_k, R^{n_k}}.    (6)

The sets Ǧ_k ⊇ G_k and X̌_k ⊇ X_k denote a polyhedral OA of G_k and X_k, respectively. Note that X̌ := ∏_{k∈K} X̌_k and Ǧ := ∏_{k∈K} Ǧ_k.

2.1 Piecewise DC Outer Approximation
Consider a DC formulation (Difference of Convex functions)

    g_kj(x) = h_kj(x) − q_kj(x),    (7)

defined by the convexified nonlinear function and the quadratic function

    h_kj(x) := g_kj(x) + q_kj(x)   and   q_kj(x) := σ_kj Σ_{i∈I_kj} φ_ki(x_i),

where φ_ki(x_i) := x_i² and I_kj = {i : ∂g_kj/∂x_i ≠ const} denotes an index set of nonlinear variables of the constraint function g_kj. The convexification parameters are computed by σ_kj = max{0, −v_kj}, where v_kj is a lower bound of the optimal value of the following nonlinear eigenvalue problem

    min y^T H_kj(x) y   s.t. x ∈ [x̲_k, x̄_k], y ∈ R^{n_k}, ‖y‖₂ = 1,    (8)

with H_kj = ∇²g_kj. A convex polyhedral underestimator of h_kj is defined by

    ȟ_kj(x) = max_{ŷ∈T_k} h̄_kj,ŷ(x),    (9)

where

    h̄_kj,ŷ(x) := h_kj(ŷ) + ∇h_kj(ŷ)^T (x − ŷ)

denotes the linearization of h_kj at the sample point ŷ ∈ T_k ⊂ R^{n_k}. A piecewise linear overestimator q̌_kj(x) of q_kj is defined by replacing φ_ki by

    φ̌_ki(x_i) := φ_ki(p_ki,t) (p_ki,t+1 − x_i)/(p_ki,t+1 − p_ki,t) + φ_ki(p_ki,t+1) (x_i − p_ki,t)/(p_ki,t+1 − p_ki,t),    (10)

where x_i ∈ [p_ki,t, p_ki,t+1], t ∈ {1, …, |B_ki| − 1}, regarding breakpoints B_ki := {p_ki,1, …, p_ki,|B_ki|}. Then a DC polyhedral underestimator ǧ_kj of g_kj is given by

    ǧ_kj(x) := ȟ_kj(x) − q̌_kj(x).    (11)
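The construction (7)–(11) can be illustrated on a small one-dimensional example; the function g, the sample points T and the breakpoints B below are illustrative choices, not data from the paper:

```python
# One-dimensional illustration of the DC underestimator (7)-(11) for
# g(x) = -x^2 on [0, 1]. Since g'' = -2 everywhere, v = -2 and
# sigma = max{0, -v} = 2, so h(x) = g(x) + sigma*x^2 = x^2 is convex.

sigma = 2.0
def g(x):      return -x * x
def h(x):      return g(x) + sigma * x * x        # convexified part
def h_grad(x): return 2.0 * x                     # h'(x) for h(x) = x^2

# (9): convex polyhedral underestimator of h as the max of its
# linearizations at the sample points T (an illustrative choice).
T = [0.0, 0.5, 1.0]
def h_check(x):
    return max(h(y) + h_grad(y) * (x - y) for y in T)

# (10): piecewise linear overestimator of phi(x) = x^2 built from
# secants over the breakpoints B (also an illustrative choice).
B = [0.0, 0.5, 1.0]
def phi_check(x):
    for t in range(len(B) - 1):
        p, q = B[t], B[t + 1]
        if p <= x <= q:
            return (p * p * (q - x) + q * q * (x - p)) / (q - p)
    raise ValueError("x outside the breakpoint range")

# (11): the DC polyhedral underestimator of g.
def g_check(x):
    return h_check(x) - sigma * phi_check(x)

# Since h_check <= h and phi_check >= phi, g_check <= g on [0, 1].
grid = [i / 100.0 for i in range(101)]
assert all(h_check(x) <= h(x) + 1e-12 for x in grid)
assert all(phi_check(x) >= x * x - 1e-12 for x in grid)
assert all(g_check(x) <= g(x) + 1e-12 for x in grid)
```

Adding sample points to T and breakpoints to B tightens ȟ and q̌ simultaneously, which is exactly how the cut and breakpoint generation below refines the OA.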
A DC-based OA of G is denoted by

    Ǧ_k = {x_k ∈ L_k : (x_k, r_k) ∈ Č_k ∩ Q̌_k},    (12)

where

    Č_k := {y ∈ [x̲_k, x̄_k], r_k ∈ R^{n_k} : ȟ_kj(y) − σ_kj Σ_{i∈I_kj} r_ki ≤ 0},
    Q̌_k := {y ∈ [x̲_k, x̄_k], r_k ∈ R^{n_k} : r_k − φ̌_k(y) ≤ 0}.

The polytope Č_k is defined by linearization cuts as in (9), and the set Q̌_k is defined by breakpoints B_k as in (10).
3 DECOA

In this section, we describe the DECOA (Decomposition-based Outer Approximation) algorithm for solving (1), depicted in Algorithm 1.

Algorithm 1. Main algorithm of DECOA
 1: function OaSolve
 2:   (x̂, Č, Ľ, B) ← initOa
 3:   v ← ∞
 4:   repeat
 5:     (x̃, Č, B) ← oaLocalSearch(x̂, B)
 6:     (Č, B) ← fixAndRefine(x̃)
 7:     if x̃ ∈ X and c^T x̃ < v then
 8:       x* ← x̃, v ← c^T x*
 9:     if v − c^T x̂ < ε then return (x̂, x*)
10:     (x̲, x̄, Č) ← tightenBounds(x*)
11:     x̂ ← solveOa(Č, B)
12:     for k ∈ K do ŷ_k ← project(x̂_k)
13:     (Č, B) ← addCutsAndPoints(x̂, ŷ, B)

[…] ŷ_ki. Then the procedure addNonconvexLinCutsAndPoints(x̂_k, ŷ_k) is called, which for some nonconvex constraints g_kj adds breakpoints, e.g. as in the AMP algorithm [4], and cuts at the new breakpoints.
Algorithm 2. Cut and breakpoint generation
 1: function addCutsAndPoints(x̂, ŷ, B)
 2:   for k ∈ K do
 3:     Č_k ← addActiveLinCut(ŷ_k)
 4:     (Č_k, B_k) ← addNonconvexLinCutsAndPoints(x̂_k, ŷ_k)
 5:   return (Č, B)
3.2 OA-Start Heuristic

Algorithm 3 describes the procedure initOa for computing an initial OA.

Algorithm 3. OA-Initialization
 1: function initOa
 2:   for k ∈ K do
 3:     for d_k ∈ {c_k, 1, −1} do
 4:       (x̂_k, Č_k, S_k) ← oaSubSolve(d_k)
 5:       Ľ_k ← {x_k ∈ L_k : d_k^T x̂_k ≤ d_k^T x_k}
 6:     [x̲′_k, x̄′_k] ← box(S_k)
 7:     B_k ← {x̲_k, x̲′_k, x̄′_k, x̄_k}
 8:   (x̂, Č) ← addRnlpCuts(x̲′, x̄′)
 9:   return (x̂, Č, Ľ, B)
It uses the procedure oaSubSolve for (approximately) solving the sub-problems

    x̂_k = argmin d_k^T x_k   s.t. x_k ∈ X_k,    (15)

where d_k ∈ R^{n_k} is a search direction.
Note that 1 denotes a vector of ones and box(S) denotes the smallest interval [x̲′, x̄′] containing a set S. The procedure addRnlpCuts(x̲′, x̄′) performs cutting plane iterations for solving the RNLP-OA

    (ỹ, s) = argmin c^T x + γ‖s‖₁
    s.t. Ax ≤ b + s, x ∈ [x̲′, x̄′], s ≥ 0,    (16)
         h_kj(x_k) − q̌_kj(x_k) ≤ 0, j ∈ J_k, k ∈ K,

where

    q̌_kj(x_k) = σ_kj Σ_{i∈[n_k]} ( φ_ki(x̲′_ki) (x̄′_ki − x_ki)/(x̄′_ki − x̲′_ki) + φ_ki(x̄′_ki) (x_ki − x̲′_ki)/(x̄′_ki − x̲′_ki) ).    (17)

Furthermore, adjacentPoints(x̂, B) returns the smallest breakpoint interval containing x̂.

3.3 Solving OA Sub-problems
Algorithm 4 describes the procedure oaSubSolve(d_k) for solving sub-problem (15). It uses the procedure solveSubOa for solving the local OA master-problem

    min d_k^T x_k   s.t. x_k ∈ X̌_k.    (18)

Furthermore, it uses solveFixedNlp(x̂_k) for solving the following local NLP problem with fixed integer variables and starting point x̂_k:

    min c_k^T x_k   s.t. x_k ∈ L_k ∩ G_k, x_ki = x̂_ki, i ∈ I_k.

Note that Algorithm 4 uses temporary breakpoints B′_k, which are initialized using initLocalBreakPoints.

Algorithm 4. OA sub-solver
 1: function oaSubSolve(d_k)
 2:   x̂_k ← solveSubOa(Č_k)
 3:   B′_k ← initLocalBreakPoints
 4:   repeat
 5:     ŷ_k ← project(x̂_k)
 6:     (Č_k, B′_k) ← addCutsAndPoints(x̂_k, ŷ_k, B′_k)
 7:     x̂_k ← solveSubOa(Č_k)
 8:   until stopping criterion
 9:   x*_k ← solveFixedNlp(x̂_k)
10:   S_k ← S_k ∪ {x*_k}
11:   return (x̂_k, Č_k, S_k)
3.4 Fix-and-Refine

The procedure fixAndRefine, described in Algorithm 5, generates cuts and breakpoints per block by solving a partly-fixed sub-problem, similarly as in Algorithm 4. It uses the procedure solveFixOA for solving a MIP-OA problem, where the variables of all blocks except one are fixed:

    min c_k^T x_k + γ‖s‖₁
    s.t. A_k x_k ≤ s + b − Σ_{m∈K\{k}} A_m x̃_m,    (19)
         x_k ∈ X̌_k, s ≥ 0,

where x̃ is a solution candidate of (1).

Algorithm 5. Fixation-based cut and breakpoint generation
 1: function fixAndRefine(x̃, B)
 2:   for k ∈ K do
 3:     x̂_k ← solveFixOA(x̃, Č_k, B_k)
 4:     (Č_k, B_k) ← addCutsAndPoints(x̂_k, x̃_k, B_k)
 5:     repeat
 6:       x̂_k ← solveFixOA(x̃, Č_k, B_k)
 7:       ŷ_k ← project(x̂_k)
 8:       (Č_k, B_k) ← addCutsAndPoints(x̂_k, ŷ_k, B_k)
 9:     until stopping criterion
10:   return (Č, B)
3.5 OA-Based Local Search

Algorithm 6 describes the decomposition-based procedure oaLocalSearch for computing a solution candidate x̃ ∈ X ∩ P of problem (1). It iteratively solves the following restricted MIP master-problem

    x̂ = argmin c^T x   s.t. x ∈ P ∩ Ǧ ∩ [x̲′, x̄′]    (20)

regarding target bounds x̲′, x̄′, projects the point x̂ ∈ P onto X, and adds cuts and breakpoints. Finally, in order to compute a solution candidate x̃ ∈ X ∩ P, Algorithm 6 calls the procedure solveFixedNlp(ŷ) for solving the following NLP master problem with fixed integer variables:

    min c^T x   s.t. x ∈ P ∩ X, x_ki = x̂_ki, i ∈ I_k, k ∈ K.    (21)

Note that the algorithm uses temporary breakpoints B′ without changing the breakpoints B.
Algorithm 6. OA-based local search
 1: function oaLocalSearch(x̂, B)
 2:   (x̲′, x̄′) ← adjacentPoints(x̂, B)
 3:   (Č, x̂) ← addRnlpCuts(x̲′, x̄′)
 4:   repeat
 5:     for k ∈ K do
 6:       ŷ_k ← project(x̂_k)
 7:       B′_k ← {x̲′_k, x̄′_k}
 8:     (Č, B′) ← addCutsAndPoints(x̂, ŷ, B′)
 9:     x̂ ← solveOa(Č, B′)
10:   until stopping criterion
11:   x̃ ← solveFixedNlp(x̂)
12:   B ← B ∪ B′
13:   return (x̃, Č, B)
3.6 Bound Tightening

The method tightenBounds(x*) performs a similar Optimization-Based Bound Tightening (OBBT) strategy as proposed in [4]. It is based on minimizing or maximizing some of the variables x_ki over the set {x ∈ X̌ : c^T x ≤ c^T x*}, using a similar approach as in Algorithm 4.
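As a rough sketch of the OBBT idea (not the actual Decogo implementation), the following example tightens variable bounds over a small polyhedral OA intersected with the objective cut c^T x ≤ c^T x*; the tiny two-dimensional vertex-enumeration "LP solver" exists only to keep the example self-contained:

```python
from itertools import combinations

def solve_2d_lp(constraints, obj, maximize=False):
    """Optimize obj over {x in R^2 : a.x <= b} by enumerating the
    vertices of the polytope (a toy LP solver for this 2D sketch)."""
    verts = []
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:            # parallel constraint pair
            continue
        x = (b1 * a2[1] - b2 * a1[1]) / det
        y = (a1[0] * b2 - a2[0] * b1) / det
        if all(a[0] * x + a[1] * y <= b + 1e-9 for a, b in constraints):
            verts.append((x, y))
    vals = [obj[0] * vx + obj[1] * vy for vx, vy in verts]
    return max(vals) if maximize else min(vals)

# A small polyhedral OA: the box [0, 10]^2 plus the cut x1 + x2 <= 10.
oa = [((1, 0), 10), ((-1, 0), 0), ((0, 1), 10), ((0, -1), 0), ((1, 1), 10)]
c = (1, 1)                  # linear objective min x1 + x2
ub = 6.0                    # incumbent value c^T x*
cut = (c, ub)               # objective cut c^T x <= c^T x*

# OBBT step: minimize and maximize each variable over OA + cut.
tightened = []
for e in [(1, 0), (0, 1)]:
    lo = solve_2d_lp(oa + [cut], e)
    hi = solve_2d_lp(oa + [cut], e, maximize=True)
    tightened.append((lo, hi))

# The objective cut shrinks both upper bounds from 10 to 6.
assert tightened == [(0.0, 6.0), (0.0, 6.0)]
```

In the actual method these min/max sub-problems are MIP-OA problems over X̌, but the mechanism — improving a box [x̲, x̄] by optimizing each coordinate subject to the incumbent cut — is the same.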
4 Numerical Experiments Using Decogo

Algorithm 1 is currently implemented as part of the parallel MINLP-solver Decogo (DECOmposition-based Global Optimizer) [6]. It uses Pyomo [2], an algebraic modelling language in Python, and two sub-solvers: SCIP 5.0 [1] for solving MIP problems and IPOPT 3.12.8 [10] for solving LP and NLP problems.
Fig. 1. Number of MIP solutions per problem size for convex MINLPs
A preliminary version of Algorithm 1 has been tested on 70 convex MINLP instances from MINLPLib [9] with 11 to 2720 variables. The results show that the number of MIP solutions (procedure solveOa of Algorithm 1) is independent of the problem size. Figure 1 illustrates this property of the algorithm. The average number of MIP solutions is 2.37.
5 Conclusions

We introduced DECOA, a multi-tree decomposition-based method for solving large-scale MINLP models (1), based on a DC approach for computing a polyhedral OA. Preliminary numerical experiments with Decogo show that the presented OA-method solves convex MINLPs with a small number of MIP solutions. Many ideas of the presented methods are new, and there is much room for improvement.
References
1. Gleixner, A., Eifler, L., Gally, T., Gamrath, G., Gemander, P., Gottwald, R.L., Hendel, G., Hojny, C., Koch, T., Miltenberger, M., Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D., Schlösser, F., Serrano, F., Shinano, Y., Viernickel, J.M., Vigerske, S., Weninger, D., Witt, J.T., Witzig, J.: The SCIP Optimization Suite 5.0. Technical report, www.optimization-online.org/DB_HTML/2017/12/6385.html (2017)
2. Hart, W.E., Laird, C.D., Watson, J.P., Woodruff, D.L., Hackebeil, G.A., Nicholson, B.L., Siirola, J.D.: Pyomo — Optimization Modeling in Python, vol. 67, 2nd edn. Springer Science & Business Media, Heidelberg (2017)
3. Lundell, A., Kronqvist, J., Westerlund, T.: The supporting hyperplane optimization toolkit. www.optimization-online.org/DB_HTML/2018/06/6680.html (2018)
4. Nagarajan, H., Lu, M., Wang, S., Bent, R., Sundar, K.: An adaptive, multivariate partitioning algorithm for global optimization of nonconvex programs. J. Global Optim. (2019)
5. Nowak, I.: Relaxation and Decomposition Methods for Mixed Integer Nonlinear Programming. Birkhäuser (2005)
6. Nowak, I., Breitfeld, N., Hendrix, E.M.T., Njacheun-Njanzoua, G.: Decomposition-based inner- and outer-refinement algorithms for global optimization. J. Global Optim. 72(2), 305–321 (2018)
7. Tawarmalani, M., Sahinidis, N.: A polyhedral branch-and-cut approach to global optimization. Math. Program. 225–249 (2005)
8. Vigerske, S.: Decomposition in multistage stochastic programming and a constraint integer programming approach to mixed-integer nonlinear programming. Ph.D. thesis, Humboldt-Universität zu Berlin (2012)
9. Vigerske, S.: MINLPLib. http://minlplib.org/index.html (2018)
10. Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Optimization under Uncertainty
Fuzzy Pareto Solutions in Fully Fuzzy Multiobjective Linear Programming

Manuel Arana-Jiménez
Department of Statistics and Operational Research, University of Cádiz, Cadiz, Spain
[email protected]
Abstract. In this work, a new method is proposed for obtaining Pareto solutions of a fully fuzzy multiobjective linear programming problem with fuzzy partial orders and triangular fuzzy numbers, without ranking functions, by means of solving a crisp multiobjective linear problem. An algorithm to generate Pareto solutions is provided.

Keywords: Multiobjective optimization · Fully fuzzy linear programming · Fuzzy numbers

1 Introduction
Fuzzy linear programming is a field where many researchers model decision making in a fuzzy environment [3,8,11,15,17,31,32]. Usually, not all variables and parameters in a fuzzy linear problem are assumed to be fuzzy numbers, although it is interesting to provide a general model for linear problems where all elements are fuzzy, called a fully fuzzy linear programming problem ((FFLP) problem, for short). In this regard, Lotfi et al. [30] proposed a method to find the fuzzy optimal solution of (FFLP) with equality constraints and symmetric fuzzy numbers. Kumar et al. [26] proposed a new method for finding the fuzzy optimal solutions of (FFLP) problems with equality constraints, using a ranking function (see [3] and the bibliography therein). Najafi and Edalatpanah [35] made a correction to the previous method. Khan et al. [24] studied (FFLP) problems with inequalities, and they also use ranking functions to compare the objective function values (see also [10,25]). Ezzati et al. [16] revisited the methods provided by Lotfi et al. [30] and Kumar et al. [26] to propose a new method based on a multiobjective programming problem with equality constraints. Liu and Gao [29] have remarked on some limitations of the existing methods to solve (FFLP) problems. As an application, Chakraborty et al. [12] locate fuzzy optimal solutions in fuzzy transportation problems. Recently, Arana-Jiménez [5] has provided a novel method to find fuzzy optimal (nondominated) solutions of (FFLP) problems with inequality constraints, with triangular fuzzy numbers not necessarily symmetric, via solving a crisp multiobjective linear programming problem. This method does not require ranking functions.

Supported by the research project MTM2017-89577-P (MINECO, Spain) and UCA.

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 509–517, 2020. https://doi.org/10.1007/978-3-030-21803-4_51
On the other hand, some models require the decision maker to face several objectives at the same time. This type of problem includes multiobjective programming problems, where two or more objectives have to be optimized (minimized or maximized), and we deal with conflicts among the objectives. Pareto optimality in multiobjective programming associates the concept of a solution with a property that seems intuitively natural, and it is an important concept in mathematical models, economics, engineering, decision theory and optimal control, among others (see [2]). So, when extending the idea of fuzzy linear programming to fuzzy multiobjective linear programming, again the objectives appearing in it are conflicting in nature. Therefore, a concept of Pareto solution is necessary too. For such fuzzy multiobjective problems, Bharati et al. [9] comment that the choice of the best alternatives among those available requires ranking the fuzzy numbers used in the model. They compare different methods using ranking functions, and adopt the concept of Pareto-optimal solution suggested by Jimenez and Bilbao [22] by means of a ranking function. Some applications can be found in Kumar et al. [27], for instance, to DEA. In the present work, and as an extension of [5], we face the challenge of studying a problem with fuzzy variables and parameters, that is, a fully fuzzy multiobjective linear programming problem ((FFMLP) problem, for short). In this regard, a new method is proposed to get fuzzy Pareto solutions, and no ranking functions are used. The structure is as follows. In the next section, we present notation, arithmetic and partial orders on fuzzy numbers. Later, in Sect. 3, we formulate the fully fuzzy multiobjective linear programming problem and provide an algorithm to generate fuzzy Pareto solutions of (FFMLP) by means of solving an auxiliary crisp multiobjective programming problem. Finally, we conclude the paper and present future works.

Due to length requirements on this text for the congress, proofs and examples are omitted and will be presented in a paper (extended version).
2 Preliminaries on Arithmetic and Partial Order on Fuzzy Numbers

A fuzzy set on R^n is a mapping u : R^n → [0, 1]. Each fuzzy set u has an associated family of α-level sets, which are described as [u]_α = {x ∈ R^n | u(x) ≥ α} for any α ∈ (0, 1], and its support is supp(u) = {x ∈ R^n | u(x) > 0}. The 0-level of u is defined as the closure of supp(u), that is, [u]_0 = cl(supp(u)). A very useful type of fuzzy set to model parameters and variables are the fuzzy numbers. Following Dubois and Prade [13,14], a fuzzy set u on R is said to be a fuzzy number if u is (i) normal, that is, there exists x_0 ∈ R such that u(x_0) = 1, (ii) upper semi-continuous, (iii) convex, and (iv) [u]_0 is compact. F_C denotes the family of all fuzzy numbers. The α-levels of a fuzzy number can be represented by means of real intervals, that is, [u]_α = [u̲_α, ū_α] ∈ K_C, u̲_α, ū_α ∈ R, where K_C is the set of real compact intervals. There exist many families of fuzzy numbers that have been applied to model uncertainty in different situations. Some of the
most popular are the L-R, triangular, trapezoidal, polygonal, Gaussian, quasi-quadric, exponential, and singleton fuzzy numbers. The reader is referred to [7,21,36] for a complete description of these families and their representation properties. Among them, we point out triangular fuzzy numbers, because of their easy modeling and interpretation (see, for instance, [13,23,24,30,36]), whose definition is as follows.

Definition 1. A fuzzy number ã = (a⁻, â, a⁺) is said to be a triangular fuzzy number (TFN for short) if its membership function is given by

    ã(x) = (x − a⁻)/(â − a⁻)   if a⁻ ≤ x ≤ â,
           (a⁺ − x)/(a⁺ − â)   if â < x ≤ a⁺,
           0                   otherwise.

At the same time, given a triangular fuzzy number ã = (a⁻, â, a⁺), its α-levels are formulated as [ã]_α = [a⁻ + (â − a⁻)α, a⁺ − (a⁺ − â)α], for all α ∈ [0, 1]. This means that a triangular fuzzy number is well determined by three real numbers a⁻ ≤ â ≤ a⁺. A unique triangular fuzzy number is characterized by means of the previous formulation of α-levels, such as Goestschel and Voxman [19] established. The set of all TFNs is denoted by T_F. The nonnegativity condition on some parameters and variables in many optimization problems makes useful the following special consideration of TFNs. Let ã be a fuzzy number. We say that ã is a nonnegative fuzzy number (nonpositive, respectively) if a̲_0 ≥ 0 (ā_0 ≤ 0, respectively). So, in the case that ã is a TFN, then ã is nonnegative (nonpositive, respectively) if and only if a⁻ ≥ 0 (a⁺ ≤ 0, respectively). Classical arithmetic operations on intervals are well known, and can be referred to Moore [33,34] and Alefeld and Herzberger [1]. A natural extension of these arithmetic operations to fuzzy numbers u, v ∈ F_C can be found described in [18,28], where the membership function of the operation u ∗ v, with ∗ ∈ {+, ·}, is defined by

    (u ∗ v)(z) = sup_{z=x∗y} min{u(x), v(y)}.    (1)
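Definition 1 and the α-level formula can be sketched as follows; the concrete triple (1, 2, 4) is an illustrative value:

```python
# Membership function and alpha-levels of a TFN a~ = (a-, a^, a+)
# following Definition 1 (a minimal sketch).

def tfn_membership(a, x):
    lo, mid, hi = a
    if lo <= x <= mid:
        return 1.0 if mid == lo else (x - lo) / (mid - lo)
    if mid < x <= hi:
        return (hi - x) / (hi - mid)
    return 0.0

def tfn_alpha_level(a, alpha):
    """[a~]_alpha = [a- + (a^ - a-)alpha, a+ - (a+ - a^)alpha]."""
    lo, mid, hi = a
    return (lo + (mid - lo) * alpha, hi - (hi - mid) * alpha)

a = (1.0, 2.0, 4.0)
assert tfn_membership(a, 2.0) == 1.0           # normality at the peak
assert tfn_membership(a, 3.0) == 0.5
assert tfn_alpha_level(a, 0.0) == (1.0, 4.0)   # the 0-level is the support
assert tfn_alpha_level(a, 0.5) == (1.5, 3.0)
```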
Furthermore, the previous arithmetic operations can be expressed by means of their α-levels as follows (see [Theorem 2.6, [18]]). For any α ∈ [0, 1]:

    [u + v]_α = [u̲_α + v̲_α, ū_α + v̄_α],    (2)
    [λu]_α = [min{λu̲_α, λū_α}, max{λu̲_α, λū_α}],    (3)
    [uv]_α = [u]_α × [v]_α = [min{u̲_α v̲_α, u̲_α v̄_α, ū_α v̲_α, ū_α v̄_α}, max{u̲_α v̲_α, u̲_α v̄_α, ū_α v̲_α, ū_α v̄_α}].    (4)
T_F is closed under addition and multiplication by a scalar. The above operations (2) and (3) are straightforwardly particularized to triangular fuzzy numbers as follows. Given ã = (a⁻, â, a⁺), b̃ = (b⁻, b̂, b⁺) ∈ T_F and λ ∈ R, then

    ã + b̃ = (a⁻ + b⁻, â + b̂, a⁺ + b⁺),    (5)
    λã = (λa⁻, λâ, λa⁺) if λ ≥ 0,   λã = (λa⁺, λâ, λa⁻) if λ < 0.    (6)
However, T_F is not closed under the multiplication operation (4) (see, for instance, the examples in [39]). To avoid this situation, it is usual to apply a different multiplication operation between TFNs, such as those referenced in [5,23,24,26], which can be considered as an approximation to the multiplication given in (1). We provide the following definition for the multiplication:

    ãb̃ = ((ãb̃)⁻, (ãb̃)ˆ, (ãb̃)⁺)
        = (min{a⁻b⁻, a⁻b⁺, a⁺b⁻, a⁺b⁺}, âb̂, max{a⁻b⁻, a⁻b⁺, a⁺b⁻, a⁺b⁺}).    (7)

In the case that ã or b̃ is a nonnegative TFN, the previous multiplication is reduced (see, for instance, [23,26]). For instance, if b̃ is nonnegative, then

    ãb̃ = (a⁻b⁻, âb̂, a⁺b⁺)   if a⁻ ≥ 0,
    ãb̃ = (a⁻b⁺, âb̂, a⁺b⁺)   if a⁻ < 0, a⁺ ≥ 0,
    ãb̃ = (a⁻b⁺, âb̂, a⁺b⁻)   if a⁺ < 0.    (8)

And if ã and b̃ are both nonnegative, then

    ãb̃ = (a⁻b⁻, âb̂, a⁺b⁺).    (9)
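Operations (5)–(7) act directly on the (a⁻, â, a⁺) triples and can be sketched as follows; all numeric values are illustrative:

```python
# TFN arithmetic on (a-, a^, a+) triples following (5), (6) and the
# approximate multiplication (7) (a minimal sketch).

def tfn_add(a, b):                                   # (5)
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def tfn_scale(lam, a):                               # (6)
    if lam >= 0:
        return (lam * a[0], lam * a[1], lam * a[2])
    return (lam * a[2], lam * a[1], lam * a[0])

def tfn_mul(a, b):                                   # (7)
    prods = [a[0] * b[0], a[0] * b[2], a[2] * b[0], a[2] * b[2]]
    return (min(prods), a[1] * b[1], max(prods))

a, b = (1, 2, 4), (2, 3, 5)
assert tfn_add(a, b) == (3, 5, 9)
assert tfn_scale(-2, a) == (-8, -4, -2)       # lower/upper ends swap
assert tfn_mul(a, b) == (2, 6, 20)            # equals (9): both nonnegative
assert tfn_mul((-1, 2, 3), b) == (-5, 6, 15)  # middle case of (8)
```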
To compare two fuzzy numbers, there exist several definitions based on interval binary relations (see e.g. [20]), which provide partial orders on fuzzy sets (see, e.g., [37,38]).

Definition 2. Given u, v ∈ F_C, it is said that:
(i) u ≺ v if and only if u̲_α < v̲_α and ū_α < v̄_α, for all α ∈ [0, 1];
(ii) u ≼ v if and only if u̲_α ≤ v̲_α and ū_α ≤ v̄_α, for all α ∈ [0, 1].

In a similar way, we define ≻, ≽. In the case of TFNs, the previous definition can be greatly reduced, as Arana-Jiménez and Blanco [6] have recently proved:

Theorem 1. Given ã = (a⁻, â, a⁺), b̃ = (b⁻, b̂, b⁺) ∈ T_F, then:
(i) ã ≺ b̃ if and only if a⁻ < b⁻, â < b̂ and a⁺ < b⁺.
(ii) ã ≼ b̃ if and only if a⁻ ≤ b⁻, â ≤ b̂ and a⁺ ≤ b⁺.

The relations ≻, ≽ are obtained in a similar manner. Note that to say that ã is nonnegative is equivalent to writing ã ≽ 0̃ = (0, 0, 0).
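The componentwise criterion of Theorem 1 reduces the comparison of TFNs to three scalar inequalities, with no α-level sweep; a minimal sketch:

```python
# The TFN order of Theorem 1: a~ is (strictly) below b~ iff the three
# components compare (strictly).

def tfn_preceq(a, b):
    return all(ai <= bi for ai, bi in zip(a, b))

def tfn_prec(a, b):
    return all(ai < bi for ai, bi in zip(a, b))

assert tfn_preceq((1, 2, 4), (1, 3, 5))
assert not tfn_prec((1, 2, 4), (1, 3, 5))   # first components are equal
assert tfn_prec((0, 1, 2), (1, 2, 3))
assert tfn_preceq((0, 0, 0), (1, 2, 4))     # i.e. (1, 2, 4) is nonnegative
```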
3 Fully Fuzzy Multiobjective Linear Problem
Consider a fuzzy vector z̃ = (z̃_1, …, z̃_p) ∈ T_F × ⋯ × T_F = (T_F)^p, with p ∈ N. For the sake of simplicity, we write z̃ = (z̃_i)_{i=1}^p. In the same manner, x = (x_1⁻, x̂_1, x_1⁺, …, x_n⁻, x̂_n, x_n⁺) ∈ R^{3n} can be written as x = (x_j⁻, x̂_j, x_j⁺)_{j=1}^n, and so on. Let us define the following formulation of a Fully Fuzzy Multiobjective Linear Problem:

(FFMLP)  Minimize  z̃ = (z̃_i)_{i=1}^p = ( Σ_{j=1}^n c̃_ij x̃_j )_{i=1}^p
         subject to  Σ_{j=1}^n ã_rj x̃_j ≼ b̃_r,  r = 1, …, m,
                     x̃_j ≽ 0̃,  j = 1, …, n,

where z̃ is the fuzzy vector objective function, each c̃_i = (c̃_i1, …, c̃_in) ∈ (T_F)^n is the fuzzy vector with the coefficients of the i-th component of the fuzzy vector function, x̃ = (x̃_1, …, x̃_n) is the fuzzy vector with the fuzzy decision variables, and ã_rj and b̃_r are the fuzzy technical coefficients. Since we deal with (FFMLP) without any kind of ranking function, it is necessary to define a fuzzy nondominated solution concept, as follows.

Definition 3. Let x̃̄ be a feasible solution for (FFMLP). x̃̄ is said to be a fuzzy Pareto solution of (FFMLP) if there does not exist a feasible solution x̃ for (FFMLP) such that Σ_{j=1}^n c̃_ij x̃_j ≼ Σ_{j=1}^n c̃_ij x̃̄_j for all i = 1, …, p, with Σ_{j=1}^n c̃_i₀j x̃_j ≠ Σ_{j=1}^n c̃_i₀j x̃̄_j for some i₀ ∈ {1, …, p}.

Following the notation of TFNs, we have:

    z̃_i = (z_i⁻, ẑ_i, z_i⁺),  i = 1, …, p,
    x̃_j = (x_j⁻, x̂_j, x_j⁺),  j = 1, …, n,
    c̃_ij = (c_ij⁻, ĉ_ij, c_ij⁺),  i = 1, …, p, j = 1, …, n,
    ã_rj = (a_rj⁻, â_rj, a_rj⁺),  r = 1, …, m, j = 1, …, n,
    b̃_r = (b_r⁻, b̂_r, b_r⁺),  r = 1, …, m.

Let us remark that x̃_j is a nonnegative TFN, and so the multiplication c̃_ij x̃_j is computed by one of the three expressions in (8), which only depends on c̃_ij. Since the fuzzy coefficients c̃_ij are known, the expressions of c̃_ij x̃_j = ((c̃_ij x̃_j)⁻, (c̃_ij x̃_j)ˆ, (c̃_ij x̃_j)⁺) are also known. The same occurs for ã_rj x̃_j. Problem (FFMLP) has associated the following crisp multiobjective problem:
(CMLP)  Minimize  f(x) = ( Σ_{j=1}^n (c̃_ij x̃_j)⁻, Σ_{j=1}^n (c̃_ij x̃_j)ˆ, Σ_{j=1}^n (c̃_ij x̃_j)⁺ )_{i=1}^p
        subject to  Σ_{j=1}^n (ã_rj x̃_j)⁻ ≤ b_r⁻,  r = 1, …, m,
                    Σ_{j=1}^n (ã_rj x̃_j)ˆ ≤ b̂_r,  r = 1, …, m,
                    Σ_{j=1}^n (ã_rj x̃_j)⁺ ≤ b_r⁺,  r = 1, …, m,
                    x_j⁻ − x̂_j ≤ 0,  j = 1, …, n,
                    x̂_j − x_j⁺ ≤ 0,  j = 1, …, n,
                    x_j⁻ ≥ 0, x̂_j ≥ 0, x_j⁺ ≥ 0,  j = 1, …, n.

Here f : R^{3n} → R^{3p} is a vector function of the variable x = (x_j⁻, x̂_j, x_j⁺)_{j=1}^n ∈ R^{3n}, with f_h linear functions, h = 1, …, 3p. And since all constraints are represented as linear inequalities on the variable x, (CMLP) is a multiobjective linear programming problem. Recall that a feasible point x̄ ∈ R^{3n} of (CMLP) is said to be a Pareto solution if there does not exist another feasible point x such that f_h(x) ≤ f_h(x̄) for all h = 1, …, 3p, and f_h₀(x) < f_h₀(x̄) for some h₀ ∈ {1, …, 3p}. The relationship between the fuzzy Pareto solutions of (FFMLP) and the Pareto solutions of (CMLP) is as follows.
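The split of one fuzzy constraint row of (FFMLP) into the three crisp inequalities of (CMLP) can be sketched as follows; since the variables are nonnegative TFNs, each product follows the sign cases of (8). The helper names and all numbers are illustrative:

```python
# Splitting one fuzzy row  sum_j a~_rj x~_j <= b~_r  of (FFMLP) into
# the three crisp inequalities of (CMLP).

def tfn_times_nonneg_var(a, x):
    """Components of a~ x~ per (8), for nonnegative x~ = (x-, x^, x+)."""
    lo = a[0] * (x[0] if a[0] >= 0 else x[2])
    hi = a[2] * (x[2] if a[2] >= 0 else x[0])
    return (lo, a[1] * x[1], hi)

def crisp_row(a_row, x_row, b):
    """Check the three crisp inequalities of one fuzzy constraint row."""
    sums = [0, 0, 0]
    for a, x in zip(a_row, x_row):
        p = tfn_times_nonneg_var(a, x)
        sums = [s + pi for s, pi in zip(sums, p)]
    return tuple(s <= bi for s, bi in zip(sums, b))

a_row = [(1, 2, 3), (-1, 0, 1)]     # fuzzy coefficients a~_r1, a~_r2
x_row = [(0, 1, 2), (1, 1, 2)]      # candidate nonnegative fuzzy values
b = (0, 3, 10)                      # fuzzy right-hand side b~_r
assert crisp_row(a_row, x_row, b) == (True, True, True)
```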
Theorem 2. x̃ = (x̃_1, …, x̃_n), with x̃_j = (x_j⁻, x̂_j, x_j⁺) ∈ T_F, j = 1, …, n, is a fuzzy Pareto solution of (FFMLP) if and only if x = (x_1⁻, x̂_1, x_1⁺, …, x_n⁻, x̂_n, x_n⁺) ∈ R^{3n} is a Pareto solution of (CMLP).
In the literature, we can find several methods to generate Pareto solutions of a multiobjective linear problem (see [2] and the bibliography therein). The most popular methods are based on scalarization. One of them is by means of related weighting problems, which can be formulated as follows. Given (CMLP) and w = (w_1, …, w_3p) ∈ R^{3p}, with w_i > 0 and Σ_{i=1}^{3p} w_i = 1, we define the related weighting problem as

(CMLP)_w  Minimize  Σ_{i=1}^{3p} w_i f_i(x)
          subject to  Σ_{j=1}^n (ã_rj x̃_j)⁻ ≤ b_r⁻,  r = 1, …, m,
                      Σ_{j=1}^n (ã_rj x̃_j)ˆ ≤ b̂_r,  r = 1, …, m,
                      Σ_{j=1}^n (ã_rj x̃_j)⁺ ≤ b_r⁺,  r = 1, …, m,
                      x_j⁻ − x̂_j ≤ 0,  j = 1, …, n,
                      x̂_j − x_j⁺ ≤ 0,  j = 1, …, n,
                      x_j⁻ ≥ 0, x̂_j ≥ 0, x_j⁺ ≥ 0,  j = 1, …, n.
Theorem 3. Given w = (w_1, …, w_3p) ∈ R^{3p}, with w_i > 0 and Σ_{i=1}^{3p} w_i = 1, if x = (x_j⁻, x̂_j, x_j⁺)_{j=1}^n ∈ R^{3n} is an optimal solution of the weighting optimization problem (CMLP)_w, then x̃ = (x̃_1, …, x̃_n), with x̃_j = (x_j⁻, x̂_j, x_j⁺) ∈ T_F, j = 1, …, n, is a fuzzy Pareto solution of (FFMLP).

The previous result allows us to outline a method to obtain fuzzy Pareto solutions for the (FFMLP) problem, which can be written via the following algorithm.

Algorithm
Step 1  Define k ∈ N and a set of weights S_W = {w_s = (w_s1, …, w_s3p) ∈ R^{3p} : s = 1, …, k}, with w_si > 0 for all i, and Σ_{i=1}^{3p} w_si = 1, for all s = 1, …, k.
        D ← ∅
        s ← 1
Step 2  Solve (CMLP)_{w_s} → x_s = (x_s,1⁻, x̂_s,1, x_s,1⁺, …, x_s,n⁻, x̂_s,n, x_s,n⁺) ∈ R^{3n}.
        If no solution, then go to Step 4.
Step 3  x̃_s,j ← (x_s,j⁻, x̂_s,j, x_s,j⁺), j = 1, …, n
        x̃_s ← (x̃_s,1, x̃_s,2, …, x̃_s,n)
        D ← D ∪ {x̃_s}
Step 4  s ← s + 1. If s ≤ k, then go to Step 2.
Step 5  End

Given k ∈ N and a set of weights S_W, the application of the previous algorithm yields a set D of fuzzy Pareto solutions for the (FFMLP) problem.
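The weighting algorithm (Steps 1–5) can be sketched on a tiny instance; instead of solving (CMLP)_w with an LP solver, the weighted sum below is minimized by brute force over a coarse integer grid, and all problem data are illustrative:

```python
from itertools import product

# Brute-force sketch of the weighting method on a tiny (FFMLP) with one
# fuzzy variable x~ = (x-, x^, x+) and two fuzzy objectives
# c~_1 = (1, 1, 1) and c~_2 = (-1, -1, -1).

def f(x):
    """The 3p = 6 crisp objective components of (CMLP), using (8)."""
    lo, mid, hi = x
    return (lo, mid, hi, -hi, -mid, -lo)

def feasible(x):
    lo, mid, hi = x
    return 0 <= lo <= mid <= hi and lo <= 2 and mid <= 3 and hi <= 4

grid = [x for x in product(range(5), repeat=3) if feasible(x)]

# Step 1: weight vectors, each positive and summing to one.
weights = [(0.3, 0.2, 0.2, 0.1, 0.1, 0.1),   # favors small values
           (0.1, 0.1, 0.1, 0.2, 0.2, 0.3)]   # favors large values
D = []
for w in weights:                            # Steps 2-4
    xs = min(grid, key=lambda x: sum(wi * fi for wi, fi in zip(w, f(x))))
    if xs not in D:                          # Step 3: collect solution
        D.append(xs)

assert D == [(0, 0, 0), (2, 3, 4)]           # two fuzzy Pareto points
```

Each triple in D encodes a fuzzy Pareto solution x̃_s; varying the weights traces out different points of the fuzzy Pareto set, mirroring Theorem 3.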
4 Conclusions

An equivalence between an (FFMLP) problem and a crisp multiobjective linear programming problem has been established, without loss of information and without ranking functions. As a result, an algorithm to obtain fuzzy Pareto solutions for an (FFMLP) problem has been provided. As future work, the presented techniques will be extended to obtain fuzzy Pareto solutions in interval and fuzzy fractional programming, with applications to economics, among others.
References
1. Alefeld, G., Herzberger, J.: Introduction to Interval Computations. Academic Press, New York (1983)
2. Arana-Jiménez, M. (ed.): Optimality Conditions in Vector Optimization. Bentham Science Publishers Ltd, Bussum (2010)
3. Arana-Jiménez, M., Rufián-Lizana, A., Chalco-Cano, Y., Román-Flores, H.: Generalized convexity in fuzzy vector optimization through a linear ordering. Inf. Sci. 312, 13–24 (2015)
4. Arana-Jiménez, M., Antczak, T.: The minimal criterion for the equivalence between local and global optimal solutions in nondifferentiable optimization problem. Math. Meth. Appl. Sci. 40, 6556–6564 (2017)
5. Arana-Jiménez, M.: Nondominated solutions in a fully fuzzy linear programming problem. Math. Meth. Appl. Sci. 41, 7421–7430 (2018)
6. Arana-Jiménez, M., Blanco, V.: On a fully fuzzy framework for minimax mixed integer linear programming. Comput. Ind. Eng. 128, 170–179 (2019)
7. Báez-Sánchez, A.D., Moretti, A.C., Rojas-Medar, M.A.: On polygonal fuzzy sets and numbers. Fuzzy Sets Syst. 209, 54–65 (2012)
8. Bellman, R.E., Zadeh, L.A.: Decision making in a fuzzy environment. Manag. Sci. 17, 141–164 (1970)
9. Bharati, S.K., Singh, S.R.: A computational algorithm for the solution of fully fuzzy multi-objective linear programming problem. Int. J. Dynam. Control. https://doi.org/10.1007/s40435-017-0355-1
10. Bhardwaj, B., Kumar, A.: A note on the paper "A simplified novel technique for solving fully fuzzy linear programming problems". J. Optim. Theory Appl. 163, 685–696 (2014)
11. Campos, L., Verdegay, J.L.: Linear programming problems and ranking of fuzzy numbers. Fuzzy Sets Syst. 32, 1–11 (1989)
12. Chakraborty, D., Jana, D.K., Roy, T.K.: A new approach to solve fully fuzzy transportation problem using triangular fuzzy number. Int. J. Oper. Res. 26, 153–179 (2016)
13. Dubois, D., Prade, H.: Operations on fuzzy numbers. Int. J. Syst. Sci. 9, 613–626 (1978)
14. Dubois, D., Prade, H.: Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York (1980)
15. Ebrahimnejad, A., Nasseri, S.H., Lotfi, F.H., Soltanifar, M.: A primal-dual method for linear programming problems with fuzzy variables. Eur. J. Ind. Eng. 4, 189–209 (2010)
16. Ezzati, R., Khorram, E., Enayati, R.: A new algorithm to solve fully fuzzy linear programming problems using the MOLP problem. Appl. Math. Model. 39, 3183–3193 (2015)
17. Ganesan, K., Veeramani, P.: Fuzzy linear programs with trapezoidal fuzzy numbers. Ann. Oper. Res. 143, 305–315 (2006)
18. Ghaznavi, M., Soleimani, F., Hoseinpoor, N.: Parametric analysis in fuzzy number linear programming problems. Int. J. Fuzzy Syst. 18(3), 463–477 (2016)
19. Goestschel, R., Voxman, W.: Elementary fuzzy calculus. Fuzzy Sets Syst. 18, 31–43 (1986)
20. Guerra, M.L., Stefanini, L.: A comparison index for interval based on generalized Hukuhara difference. Soft Comput. 16, 1931–1943 (2012)
21. Hanss, M.: Applied Fuzzy Arithmetic. Springer, Stuttgart (2005)
22. Jimenez, M., Bilbao, A.: Pareto-optimal solutions in fuzzy multiobjective linear programming. Fuzzy Sets Syst. 160, 2714–2721 (2009)
23. Kaufmann, A., Gupta, M.M.: Introduction to Fuzzy Arithmetic: Theory and Applications. Van Nostrand Reinhold, New York (1985)
24. Khan, I.U., Ahmad, T., Maan, N.: A simplified novel technique for solving fully fuzzy linear programming problems. J. Optim. Theory Appl. 159, 536–546 (2013)
Fuzzy Pareto Solutions in Fully Fuzzy Multiobjective Linear Programming
517
25. Khan, I.U., Ahmad, T., Maan, N.: A reply to a note on the paper “A simplified novel technique for solving fully fuzzy linear programming problems”. J. Optim. Theory Appl. 173, 353–356 (2017) 26. Kumar, A., Kaur, J., Singh, P.: A new method for solving fully fuzzy linear programming problems. Appl. Math. Model. 35, 817–823 (2011) 27. Mehlawat, M.K., Kumar, A., Yadav, S., Chen, W.: Data envelopment analysis based fuzzy multi-objective portfolio selection model involving higher moments. Inf. Sci. 460, 128–150 (2018) 28. Liu, B.: Uncertainty Theory. Springer-Verlag, Heidelberg (2015) 29. Liu, Q., Gao, X.: Fully fuzzy linear programming problem with triangular fuzzy numbers. J. Comput. Theor. Nanosci. 13, 4036–4041 (2016) 30. Lotfi, F.H., Allahviranloo, T., Jondabeha, M.A., Alizadeh, L.: Solving a fully fuzzy linear programming using lexicography method and fuzzy approximate solution. Appl. Math. Modell. 3, 3151–3156 (2009) 31. Maleki, H.R., Tata, M., Mashinchi, M.: Linear programming with fuzzy variables. Fuzzy Set. Syst. 109, 21–33 (2000) 32. Maleki, H.R.: Ranking functions and their applications to fuzzy linear programming. Far East J. Math. Sci. 4, 283–301 (2002) 33. Moore, R.E.: Interval Analysis. Prentice-Hall, Englewood Cliffs (1966) 34. Moore, R.E.: Method and Applications of Interval Analysis. SIAM, Philadelphia (1979) 35. Najafi, H.S., Edalatpanah, S.A.: A note on ”A new method for solving fully fuzzy linear programming problems”. Appl. Math. Model. 37, 7865–7867 (2013) 36. Stefanini, L., Sorini, L., Guerra, M.L.: Parametric representation of fuzzy numbers and application to fuzzy calculus. Fuzzy Sets Syst. 157(18), 2423–2455 (2006) 37. Stefanini, L., Arana-Jim´enez, M.: Karush-Kuhn–Tucker conditions for interval and fuzzy optimization in several variables under total and directional generalized differentiability. Fuzzy Sets Syst. 262, 1–34 (2019) 38. 
Wu, H.C.: The optimality conditions for optimization problems with convex constraints and multiple fuzzy-valued objective functions. Fuzzy Optim. Decis. Making 8, 295–321 (2009) 39. Yasin Ali Md., Sultana, A., Khodadad Kha, A.F.M.: Comparison of fuzzy multiplication operation on triangular fuzzy number. IOSR J. Math. 12(4), 35–41 (2016)
Minimax Inequalities and Variational Equations

Maria Isabel Berenguer, Domingo Gámez, A. I. Garralda–Guillem, and M. Ruiz Galán

Department of Applied Mathematics, University of Granada, E.T.S. Ingeniería de Edificación, Granada, Spain
{maribel,domingo,agarral,mruizg}@ugr.es
Abstract. In this paper we study some weak conditions guaranteeing the validity of several minimax inequalities and illustrate the possibilities of such a tool for characterizing the existence of solutions of certain variational equations.

Keywords: Minimax inequalities · Variational equations

1 Introduction
Minimax inequalities are normally associated with game theory. This was the original motivation of von Neumann's work in 1928, but in the mathematical literature, generalizations of von Neumann's results, called minimax theorems, became objects of study in their own right. These generalizations focus on various directions: some pay attention to topological conditions, others to the study of weak convexity conditions (see [23]). At the same time, minimax inequalities have turned out to be a powerful tool in other fields: see, for instance, [4,5,12,13,15,16,18,21,22].

In this work, in Sect. 2, we illustrate the applicability of a minimax inequality to analyse the existence of a solution for a quite general system of variational inequalities. After that, in Sect. 3, we explore some new generalizations of minimax theorems with weak convexity conditions.

For the first aim we analyse a class of systems which arises in many situations. To evoke one of them, let us recall that the study of variational equations with constraints emerges naturally, among others, in the context of elliptic boundary value problems, when their essential boundary conditions are treated as constraints in their standard variational formulation. This leads one to its variational formulation, which coincides with the system of variational equations:

$$\text{find } x_0 \in X \text{ such that } \begin{cases} z \in Z \Rightarrow f(z) = a(x_0, z), \\ y \in Y \Rightarrow g(y) = b(x_0, y), \end{cases}$$

Partially supported by project MTM2016-80676-P (AEI/FEDER, UE) and by Junta de Andalucía Grant FQM359.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 518–525, 2020. https://doi.org/10.1007/978-3-030-21803-4_52
for some Banach spaces $X$ and $Y$, a closed vector subspace $Z$ of $X$, some continuous bilinear forms $a : X \times X \to \mathbb{R}$ and $b : X \times Y \to \mathbb{R}$, and $f \in X^*$ and $g \in Y^*$ ("$*$" stands for "topological dual space"): see the details, for instance, in [10, Sect. 4.6.1].

In a more general way, we deal with the following problem: let $X$ be a real reflexive Banach space, $N \ge 1$, and suppose that for each $j = 1, \dots, N$, $Y_j$ is a real Banach space, $y_j^* \in Y_j^*$, $C_j$ is a convex subset of $Y_j$ with $0 \in C_j$, and $a_j : X \times Y_j \to \mathbb{R}$ is a bilinear form satisfying $y_j \in C_j \Rightarrow a_j(\cdot, y_j) \in X^*$; then

$$\text{find } x_0 \in X \text{ such that } \begin{cases} y_1 \in C_1 \Rightarrow y_1^*(y_1) \le a_1(x_0, y_1) \\ \quad \cdots \\ y_N \in C_N \Rightarrow y_N^*(y_N) \le a_N(x_0, y_N) \end{cases} \qquad (1)$$

This kind of variational system is so general that it includes certain mixed variational formulations associated with some elliptic problems, those in the so-called Babuška–Brezzi theory (see, for instance, [3,9]) and some of its generalizations [7].
2 Variational Equations for Reflexive Spaces
Now, we focus on deriving an extension of the Lax–Milgram theorem, as well as a characterization of the solvability of a system of variational equations, using a minimax inequality as a tool. To this aim we evoke the minimax inequality of von Neumann–Fan, a particular case of [6, Theorem 2]:

Theorem 1. Let us assume that $X$ and $Y$ are nonempty and convex subsets of two real vector spaces. If in addition $X$ is a compact topological space and $f : X \times Y \to \mathbb{R}$ is concave and upper semicontinuous on $X$ and convex on $Y$, then

$$\max_{x \in X} \inf_{y \in Y} f(x, y) = \inf_{y \in Y} \max_{x \in X} f(x, y).$$
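The equality can be checked numerically on a small example. In the sketch below, the bifunction, the grid, and the truncation of $Y$ to a bounded interval are our own illustrative choices, not taken from the paper:

```python
import numpy as np

# Toy check of the von Neumann–Fan equality: f(x, y) = x*y + y**2 is
# linear (hence concave and continuous) in x and convex in y, and
# X = [-1, 1] is compact, so Theorem 1 applies.  We compare both sides
# on a discretization grid; this illustrates the statement, it is not a proof.
xs = np.linspace(-1.0, 1.0, 401)        # compact convex X
ys = np.linspace(0.0, 2.0, 401)         # convex Y (truncated for the grid)
F = np.outer(xs, ys) + ys**2            # F[i, j] = f(xs[i], ys[j])

max_inf = F.min(axis=1).max()           # max over x of inf over y
inf_max = F.max(axis=0).min()           # inf over y of max over x

assert abs(max_inf - inf_max) < 1e-12   # equality holds on the grid
```

For this particular bifunction both sides equal 0, attained at $x = 0$, $y = 0$.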
As a first application of minimax inequalities to variational equations, we show this version of the Lax–Milgram lemma, which first appeared in [20]. As usual, $(\cdot)_+$ denotes "positive part".

Theorem 2. Let $E$ be a real reflexive Banach space, $F$ be a real normed space, $y_0^* \in F^*$, $a : E \times F \to \mathbb{R}$ be bilinear, and let $C$ be a nonempty convex subset of $F$ such that for all $y \in C$, $a(\cdot, y) \in E^*$. Then there exists

$$x_0 \in E : \quad y \in C \Rightarrow y_0^*(y) \le a(x_0, y) \qquad (2)$$

if, and only if, there exists

$$\alpha > 0 : \quad y \in C \Rightarrow y_0^*(y) \le \alpha \|a(\cdot, y)\|. \qquad (3)$$

Moreover, if one of these equivalent statements is satisfied and, for some $y \in C$, we have that $a(\cdot, y) \ne 0$, then

$$\min\{\|x_0\| : x_0 \in E \text{ such that } y \in C \Rightarrow y_0^*(y) \le a(x_0, y)\} = \sup_{y \in C,\; a(\cdot, y) \ne 0} \left( \frac{y_0^*(y)}{\|a(\cdot, y)\|} \right)_+. \qquad (4)$$
Proof. The fact that (2) ⇒ (3) is straightforward. On the other hand, let $\alpha > 0$ be such that (3) holds. Then we apply the minimax theorem, Theorem 1, to the convex sets $X := \alpha B_E$, $Y := C$ and the bifunction

$$f(x, y) := a(x, y) - y_0^*(y), \qquad ((x, y) \in X \times Y),$$

where $B_E$ stands for the closed unit ball of $E$. We arrive at

$$\max_{x \in \alpha B_E} \inf_{y \in C} \left( a(x, y) - y_0^*(y) \right) = \inf_{y \in C} \max_{x \in \alpha B_E} \left( a(x, y) - y_0^*(y) \right).$$

But the right-hand side of this equality is nonnegative, since

$$\inf_{y \in C} \max_{x \in \alpha B_E} \left( a(x, y) - y_0^*(y) \right) = \inf_{y \in C} \left( \alpha \|a(\cdot, y)\| - y_0^*(y) \right),$$

which is nonnegative according to (3). Therefore, the left-hand side is also nonnegative, i.e., there exists $x_0 \in E$ – in fact, $x_0 \in \alpha B_E$ – such that

$$y \in C \Rightarrow y_0^*(y) \le a(x_0, y).$$

To conclude, the fact that $x_0$ can be chosen in $\alpha B_E$ implies the stability condition (4).

Now we show a more sophisticated application of the minimax theorem to state a characterization of the solvability of the system of variational inequalities (1). Let us first note that if such a system admits a solution $x_0 \in X$, then, for all $(y_1, \dots, y_N) \in \prod_{j=1}^{N} C_j$, we have (add the $N$ inequalities and take $\gamma := \|x_0\|$)
$$\sum_{j=1}^{N} y_j^*(y_j) \le \gamma \left\| \sum_{j=1}^{N} a_j(\cdot, y_j) \right\|.$$

The next result establishes that this necessary condition is also sufficient (see [8, Theorem 2.2, Corollary 2.3]).

Theorem 3. Let $E$ be a real reflexive Banach space, $N \ge 1$, $F_1, \dots, F_N$ be real Banach spaces, and for each $j = 1, \dots, N$, let $C_j$ be a convex subset of $F_j$ with $0 \in C_j$ and $a_j : E \times F_j \to \mathbb{R}$ be a bilinear form such that $y_j \in C_j \Rightarrow a_j(\cdot, y_j) \in E^*$. Then, the following assertions are equivalent:

(i) For all $y_1^* \in F_1^*, \dots, y_N^* \in F_N^*$ there exists $x_0 \in E$ such that

$$\begin{cases} y_1 \in C_1 \Rightarrow y_1^*(y_1) \le a_1(x_0, y_1) \\ \quad \cdots \\ y_N \in C_N \Rightarrow y_N^*(y_N) \le a_N(x_0, y_N) \end{cases} \qquad (5)$$
(ii) There exists $\rho > 0$ such that

$$(y_1, \dots, y_N) \in \prod_{j=1}^{N} C_j \;\Rightarrow\; \rho \sum_{j=1}^{N} \|y_j\| \le \left\| \sum_{j=1}^{N} a_j(\cdot, y_j) \right\|.$$

Moreover, if one of these equivalent statements holds, then $\rho \|x_0\| \le \max_{j=1,\dots,N} \|y_j^*\|$.
The next example illustrates the applicability of Theorem 3:

Example 1. Given $\mu \in \mathbb{R}$ and $f \in L^p(0, 1)$ ($1 < p < \infty$), let us consider the boundary value problem

$$\begin{cases} -z'' + \mu z = f \text{ on } (0, 1), \\ z(0) = 0, \; z(1) = 0. \end{cases} \qquad (6)$$

It is not difficult to prove that its mixed variational formulation is given as follows: find $(x_0, z_0) \in X \times Z$ such that

$$\begin{cases} y \in Y \Rightarrow a(x_0, y) + b(y, z_0) = y_0^*(y), \\ w \in W \Rightarrow c(x_0, w) + d(z_0, w) = w_0^*(w), \end{cases}$$

where $X := W^{1,p}(0, 1)$, $Y := W^{1,q}(0, 1)$, $Z := L^p(0, 1)$, $W := L^q(0, 1)$, the continuous bilinear forms $a : X \times Y \to \mathbb{R}$, $b : Y \times Z \to \mathbb{R}$, $c : X \times W \to \mathbb{R}$ and $d : Z \times W \to \mathbb{R}$ are defined for each $x \in X$, $y \in Y$, $z \in Z$ and $w \in W$ as

$$a(x, y) := \int_0^1 xy, \quad b(y, z) := \int_0^1 y'z, \quad c(x, w) := \int_0^1 x'w, \quad d(z, w) := -\mu \int_0^1 zw,$$

and the continuous linear forms $y_0^* \in Y^*$ and $w_0^* \in W^*$ are given by

$$y_0^*(y) := 0 \quad (y \in Y), \qquad w_0^*(w) := -\int_0^1 fw \quad (w \in W).$$
Now Theorem 3 applies, since this system adopts the form of (5) with $N = 2$, the reflexive space $E := X \times Z$, the Banach spaces $F_1 := Y$, $F_2 := W$, the convex sets $C_1 := F_1$, $C_2 := F_2$, the continuous bilinear forms $a_1 : E \times F_1 \to \mathbb{R}$ and $a_2 : E \times F_2 \to \mathbb{R}$ defined at each $(x, z) \in E$, $y \in F_1$ and $w \in F_2$ as

$$a_1((x, z), y) := a(x, y) + b(y, z) \quad \text{and} \quad a_2((x, z), w) := c(x, w) + d(z, w),$$

and the continuous linear forms $y_1^* := y_0^*$ and $y_2^* := w_0^*$. This mixed variational formulation admits a unique solution $(x, z) \in E = X \times Z$ as soon as $|\mu| < 0.5$: see [8, Example 2.4] for the details.

Let us mention that the boundary value problem in the preceding example does not fall within the scope of the Babuška–Brezzi theory, or even the more general one of [7], where the analysis of Theorem 3 is carried out by means of independent conditions on the involved bilinear forms. The abstract uniformity in Theorem 3 allows us to state a Galerkin scheme for the system of inequalities under study, when the convex sets $C_j$ coincide with the spaces $F_j$ and the bilinear forms are continuous (see [8]). To conclude, let us also emphasize that the numerical treatment of some inverse problems related to the systems of variational equations under consideration has been developed in [14].
3 Minimax Inequality Under Weak Conditions
In the previous section we have shown some applications of the von Neumann–Fan minimax inequality to variational analysis. Now we focus on the study of minimax inequalities themselves. With the aim of deriving more general applications, a wide variety of results of this kind has appeared in the last decades. Most of them involve a certain concept of convexity and some topological conditions.

Let us first recall that a minimax inequality is a result guaranteeing that, under suitable hypotheses, a function $f : X \times Y \to \mathbb{R}$, with $X$ and $Y$ nonempty sets, satisfies the inequality

$$\inf_{y \in Y} \sup_{x \in X} f(x, y) \le \sup_{x \in X} \inf_{y \in Y} f(x, y), \qquad (7)$$

and therefore the equality also holds, since the opposite inequality is always true. Note that when $X$ is a compact topological space and $f$ is upper semicontinuous on $X$, the inequality (7) can be written as in Theorem 1.

Our starting point is the generalization of upper semicontinuity introduced in [1, Definition 8]: if $X$ is a nonempty topological space, $Y$ is a nonempty set, $x_0 \in X$ and $\inf_{y \in Y} \sup_{x \in X} f(x, y) \in \mathbb{R}$, let us recall that $f$ is infsup-transfer upper semicontinuous at $x_0$ if, for $(x_0, y_0) \in X \times Y$, $f(x_0, y_0)$
under the restrictions

$$G(x|U) = (G_1(x|U), \dots, G_p(x|U)) \le 0, \quad (x_l, x_u) \in \mathbb{R}^d \times \mathbb{R}^d \text{ and } x_l \le x \le x_u. \qquad (3)$$

This problem involves only inequality restrictions, but the approach applies also to the general situation involving mixed equality/inequality restrictions. In order to evaluate statistics of the Pareto front associated to this problem, we generate $n_s$ variates from the random vector $U$: $u_1, u_2, \dots, u_{n_s}$. For each variate $u_i$, we find the Pareto front associated to

$$\begin{cases} \operatorname{Minimize}_{x \in \mathbb{R}^d} \; F(x|U = u_i) = (F_1(x|U = u_i), F_2(x|U = u_i)) \\ G(x|U = u_i) = (G_1(x|U = u_i), \dots, G_p(x|U = u_i)) \le 0 \\ \text{u.r. } x_l \le x \le x_u \end{cases} \qquad (4)$$

In our experiments, we used the variational approach introduced in [6, 7] to determine the Pareto front $P_i$, but any available method may be used instead. A polynomial family $\Phi$ is used, but any total family may be considered. Each $u_i$ generates a Pareto front and we obtain $n_s$ fronts $\mathcal{P} = \{P_1, P_2, \dots, P_{n_s}\}$. Then, we determine the median front $P_m$, corresponding to the minimal sum of Hausdorff distances $d_H$ to the other fronts in the set $\mathcal{P}$: $P_m$ is considered as the most representative member of the family $\mathcal{P}$ – its median. The determination of $P_m$ allows us to organize the elements of $\mathcal{P}$ as a function of their distance to it, so that we may find an $\alpha$-quantile $P_\alpha$ of a given confidence level $\alpha$. Finally, we can define a confidence interval at the same level $\alpha$ by

$$I_\alpha = \{P_i \mid d_H(P_i, P_m) \le d_H(P_\alpha, P_m)\}. \qquad (5)$$

It is interesting to notice that our experiments with other distances, such as, for instance, $d_{L^2}$ and the modified Hausdorff distance $d_{HM}$ [6], led to the same results.
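A minimal sketch of this median-and-confidence-interval construction, assuming each Pareto front is stored as a finite array of points in objective space (the function names and the discrete-front representation are our own):

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two fronts, each an (n_points, n_obj) array."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def median_front_and_confidence(fronts, alpha=0.9):
    """Index of the median front (minimal sum of Hausdorff distances to the
    others) and the indices forming the confidence interval of level alpha,
    i.e. the fraction alpha of fronts closest to the median."""
    ns = len(fronts)
    D = np.zeros((ns, ns))
    for i in range(ns):
        for j in range(i + 1, ns):
            D[i, j] = D[j, i] = hausdorff(fronts[i], fronts[j])
    m = int(D.sum(axis=1).argmin())     # median front index
    order = np.argsort(D[m])            # fronts sorted by distance to the median
    keep = order[: int(np.ceil(alpha * ns))]
    return m, keep
```

Each front must be sampled densely enough for the discrete Hausdorff distance to approximate the distance between the underlying curves.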
550
M. Bassi et al.
As an example, let us consider $U = (u_1, u_2, u_3)$, with three independent variables uniformly distributed on $[0, 0.1]$, and the Fonseca–Fleming problem under uncertainty given by:

$$\begin{cases} \operatorname{Minimize}_{x \in \mathbb{R}^3} \begin{cases} f_1(x) = 1 - \exp\left( -\sum_{i=1}^{3} \left( x_i - \frac{1}{\sqrt{3}} + u_i \right)^2 \right) \\ f_2(x) = 1 - \exp\left( -\sum_{i=1}^{3} \left( x_i + \frac{1}{\sqrt{3}} - u_i \right)^2 \right) \end{cases} \\ \text{under the restrictions } -4 \le x_i \le 4, \; i \in \{1, 2, 3\} \end{cases} \qquad (6)$$
The results are exhibited in Fig. 2, where 200 Pareto fronts corresponding to a sample of size $n_s = 200$ are plotted. The median appears in red and the 90% confidence interval in cyan, while blue curves lie outside the confidence interval.

Fig. 2. Pareto fronts for the Fonseca–Fleming problem under uncertainty. The sample has size $n_s = 200$. The median appears in red, the confidence interval 90% in cyan. Blue curves lie outside the confidence interval.
As a second example, we consider the ZDT3 problem under uncertainty: $U = (u_1, u_2)$, with two independent variables uniformly distributed on $[0, 0.1]$ and $[-0.15, 0.15]$, respectively:
Statistics of Pareto Fronts
551
$$\begin{cases} \operatorname{Minimize}_{x \in \mathbb{R}^n} \; (f_1(x), f_2(x)) \\ \text{under the restrictions } 0 \le x_i \le 1, \; i \in \{1, 2\}, \text{ for } g(x) = 1 + 9(x_1 + x_2). \end{cases} \qquad (7)$$
In Fig. 3, the median Pareto front is the red curve and the 90% confidence interval is the set of cyan fronts, while the blue fronts are the remaining 10% of the set, those farthest from the median front in the sense of the Hausdorff distance.

Fig. 3. Pareto fronts for the ZDT3 problem under uncertainty. The sample has size $n_s = 200$. The median appears in red, the confidence interval 90% in cyan. Blue curves lie outside the confidence interval.
3 Statistics of Pareto Fronts by Using Generalized Fourier Series

The preceding examples show that the construction of the median and of the confidence interval may request a large number of Pareto fronts, namely if a high confidence level is requested. But each Pareto front results from an optimization procedure and the whole process may be expensive in terms of computational cost. To accelerate the procedure, we may consider the use of Generalized Fourier Series (GFS): given a relatively small sample of exact Pareto fronts, we may determine an expansion of the functions $F(t|U)$ corresponding to the Pareto fronts and use it to generate a much larger sample of approximate Pareto fronts.
The process involves two steps. The first one consists in approximating each exact Pareto front by a polynomial of degree $N - 1$, whose coefficients are exact too. In the second step, another approximation, of the random vectors of exact coefficients, is made by using GFS. In the sequel, "ex" refers to "exact" and "app" refers to "approximate". For instance, let us consider the Pareto front given by the equation $F(t|U) = (f_1^{ex}(t|U), f_2^{ex}(t|U))$, $t \in (0, 1)$. We may consider the expansion

$$f_i^{ex}(t|U) \approx \sum_{j=1}^{N} c_{ij}^{ex}(U)\, \Psi_j(t), \qquad \Psi_j(t) = t^{j-1}. \qquad (8)$$

The coefficients $c_{ij}^{ex}$ depend on the variate and so are random variables. We may determine a representation of $c_{ij}^{ex}$ by GFS (see Appendix A):

$$c_{ij}^{ex}(U) \approx c_{ij}^{app}(U) = \sum_{k=0}^{n_c} d_{ijk}\, \Phi_k(U), \qquad (9)$$
involving a degree of approximation $n_c$ and a polynomial family $\Phi$, and use this approximation to obtain more accurate statistics of the Pareto fronts.

Let us illustrate the approach by using the Fonseca–Fleming problem: we generate $n_s = 1000$ exact Pareto fronts and approximate each one as shown in Eq. (8) with $N = 7$; we then obtain 1000 instances of the random variables $c_{ij}^{ex}$, $1 \le i \le 2$, $1 \le j \le 7$. Let us denote by $e_m$ the mean error between the approximation of the Pareto fronts using $c_{ij}^{ex}$ and the exact ones. We have

$$e_m = \begin{pmatrix} \left\| E\!\left[ f_1^{ex}(t|U) - \sum_{j=1}^{N} c_{1j}^{ex}(U)\, t^{j-1} \right] \right\|_\infty \\ \left\| E\!\left[ f_2^{ex}(t|U) - \sum_{j=1}^{N} c_{2j}^{ex}(U)\, t^{j-1} \right] \right\|_\infty \end{pmatrix} = \begin{pmatrix} 6.4 \times 10^{-3} \\ 6.4 \times 10^{-3} \end{pmatrix},$$

where $\|\cdot\|_\infty$ refers to the norm defined on $\mathbb{R}^n$ by $\|x\|_\infty = \sup\{|x_i|,\ 1 \le i \le n\}$.

Now, we generate a sample of 1000 values of $c_{ij}^{app}$ resulting from the GFS expansion of $c_{ij}^{ex}$, and we construct 1000 approximate Pareto fronts given by

$$f_i^{app}(t|U) = \sum_{j=1}^{N} c_{ij}^{app}(U)\, t^{j-1}, \qquad t \in (0, 1). \qquad (10)$$

After having compared the two samples of the same size, according to the values in Tables 1 and 2, we generate a sample of $c_{ij}^{app}$ of size $10^5$, which allows us to build $10^5$ approximate Pareto fronts $(f_1^{app}(t|U), f_2^{app}(t|U))$, $t \in (0, 1)$. In Fig. 4, we present in cyan this new set of $10^5$ approximate Pareto fronts of the Fonseca–Fleming problem and, in black, the mean Pareto front $P_m^{app}$ resulting from the means $\bar{c}_{ij}^{app}$ of $c_{ij}^{app}$, $1 \le i \le 2$, $1 \le j \le 7$, such that

$$P_m^{app} = \left\{ \left( \sum_{j=1}^{N} \bar{c}_{1j}^{app}\, t^{j-1},\ \sum_{j=1}^{N} \bar{c}_{2j}^{app}\, t^{j-1} \right),\ t \in (0, 1) \right\}.$$
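The first approximation step, fitting each sampled front by the polynomial expansion of Eq. (8), can be sketched as a least-squares problem (the function below and its interface are our own illustrative assumptions):

```python
import numpy as np

def fit_front_coefficients(t, f_values, N=7):
    """Least-squares fit of one objective of a sampled Pareto front by a
    polynomial of degree N-1, as in Eq. (8): f(t) ~ sum_j c_j * t**(j-1).
    t: (n_points,) parameter values in (0, 1);
    f_values: (n_points,) objective values along the front."""
    Psi = np.vander(t, N, increasing=True)   # Psi[k, j] = t_k**j, j = 0..N-1
    coeffs, *_ = np.linalg.lstsq(Psi, f_values, rcond=None)
    return coeffs                            # c_1, ..., c_N of Eq. (8)
```

Applying this fit to each of the $n_s$ exact fronts yields the sample of random coefficients $c_{ij}^{ex}$ on which the GFS expansion (9) is built.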
Table 1. Relative errors between the correlation coefficients of 1000 values of $c_{ij}^{ex}$ and $c_{ij}^{app}$ (all entries are negative, with magnitudes between about $10^{-13}$ and $2.4 \times 10^{-7}$).
Table 2. Relative errors between the 4 first moments of 1000 values of $c_{ij}^{ex}$ and $c_{ij}^{app}$ (columns $j = 1, \dots, 7$; for each moment, the first row is $i = 1$ and the second $i = 2$):

Mean (i=1):     2.41e−15  2.78e−16  −3.4e−16  1.79e−15  −2.3e−15  −1.6e−15  −4.2e−16
Mean (i=2):     5.92e−16  1.83e−15  1.06e−15  −6.4e−16  1.0e−15   0         −1.4e−16
Variance (i=1): 8.86e−09  4.96e−09  5.22e−08  4.27e−08  2.14e−08  1.67e−08  1.65e−08
Variance (i=2): 6.55e−09  2.21e−08  9.64e−10  2.32e−07  2.4e−08   1.64e−08  1.65e−08
Kurtosis (i=1): −2.8e−05  −3.1e−06  −1.1e−04  −7.3e−05  −3.3e−05  −2.1e−05  −1.5e−05
Kurtosis (i=2): 3.03e−05  5.69e−05  5.67e−06  3.22e−04  1.13e−05  −7.2e−06  −1.5e−05
Skewness (i=1): −5.6e−06  4.97e−06  −2.9e−05  −1.3e−05  −2.8e−06  2.5e−06   9.66e−06
Skewness (i=2): 8.94e−06  1.67e−05  2.07e−06  11.3e−05  −4.2e−05  3.55e−05  9.66e−06
The method presented here allows computing very large samples of a given random object at a very low computational cost, and thus leads to a better estimation of their statistical characteristics. Note that the calculation of a sample of $10^5$ exact Fonseca–Fleming Pareto fronts lasts about 215 h, that is, about 9 days of computation, while the sample in Fig. 4 took a few seconds to generate.

Fig. 4. $10^5$ approximate Fonseca–Fleming Pareto fronts (cyan) and their mean (black), obtained by using GFS expansions.
4 Concluding Remarks

We considered multiobjective optimization problems involving random variables. In such a situation, the Pareto front is a random curve, an object belonging to an infinite-dimensional space, so that the evaluation of statistics faces some difficulties. At first, a precise definition of the mean and the variance involves the definition of probabilities in an infinite-dimensional space. In addition, the practical evaluation requires the numerical manipulation of those probabilities. In order to overcome these difficulties, we introduced a procedure based on the manipulation of samples, which is independent of the basis used to describe the space. The accuracy of the statistics requires large samples, so we introduced a method for the generation of large samples using a relatively smaller, medium-size sample and a representation of the Pareto fronts by Generalized Fourier Series. The tests performed with a large number of classical multiobjective test problems led to excellent results. Further developments will concern the use of small samples and robust multiobjective optimization.
Appendix A: Generalized Fourier Series

In the framework of UQ, we are interested in the representation of random variables: let us consider a couple of random variables $(U, X)$ such that $X = X(U)$, that is, $X$ is a function of $U$. If $X \in V$, where $V$ is a separable Hilbert space with scalar product $(\cdot\,, \cdot)$, we may consider a convenient Hilbert basis (or total family) $\Phi = \{\varphi_i\}_{i \in \mathbb{N}}$ and look for a representation of $X$ given by [2]:

$$X = X(U) = \sum_{i \in \mathbb{N}} x_i \varphi_i(U). \qquad (11)$$

If the family is orthonormal, $(\varphi_i, \varphi_j) = \delta_{ij}$ and the coefficients of the expansion are given by $x_i = (X, \varphi_i(U))$. Otherwise, we may consider the approximations of $X$ by finite sums:

$$X \approx P_n X = \sum_{1 \le i \le n} x_i \varphi_i(U). \qquad (12)$$

In this case, the coefficients $x_i$ are the solutions of the linear system $Ax = B$, where $A_{ij} = (\varphi_i, \varphi_j)$ and $B_i = (X, \varphi_i)$. We have

$$\lim_{n \to \infty} P_n X = X. \qquad (13)$$

In UQ, the Hilbert space $V$ is mainly $L^2(\Omega, P)$, where $\Omega \subset \mathbb{R}^n$ and $P$ is a probability measure, with $(Y, Z) = E(YZ)$. Classical families $\Phi$ are formed by polynomials, trigonometric functions, splines or finite element approximations. Examples of approximations may be found in the literature (see, for instance, [2, 3]). When $X$ is a function of a second variable – for instance, $t$ – we denote the function $X(t|U)$ and we have:
$$X(t|U) = \sum_{i \in \mathbb{N}} x_i(t) \varphi_i(U) \approx P_n X(t|U) = \sum_{1 \le i \le n} x_i(t) \varphi_i(U). \qquad (14)$$

The reader may refer to [4] for more information and MATLAB codes for the evaluation of the coefficients $x_i$, namely in multidimensional situations. In practice, we use a sample from $X(t|U)$: $X(t|U_1), \dots, X(t|U_{n_s})$, in order to evaluate the means forming $A$ and $B$.
References

1. Croquet, R., Souza de Cursi, E.: Statistics of uncertain dynamical systems. In: Topping, B.H.V., Adam, J.M., Pallarés, F.J., Bru, R., Romero, M.L. (eds.) Proceedings of the Tenth International Conference on Computational Structures Technology, Paper 173, Civil-Comp Press, Stirlingshire, UK (93), pp. 541–561 (2010). https://doi.org/10.4203/ccp.93.173
2. Bassi, M., Souza de Cursi, E., Ellaia, R.: Generalized Fourier series for representing random variables and application for quantifying uncertainties in optimization. In: 3rd International Symposium on Uncertainty Quantification and Stochastic Modeling, Maresias, SP, Brazil, 15–19 February (2016). http://www.swge.inf.br/PDF/USM-2016-0037_027656.PDF
3. Bassi, M.: Quantification d'Incertitudes et Objets en Dimension Infinie. Ph.D. Thesis, INSA Rouen Normandie, Normandie Université, Saint-Étienne-du-Rouvray (2019)
4. Souza de Cursi, E., Sampaio, R.: Uncertainty Quantification and Stochastic Modelling with Matlab. ISTE Press, London, UK (2015)
5. Bassi, M., Souza de Cursi, E., Pagnacco, E., Ellaia, R.: Statistics of the Pareto front in multiobjective optimization under uncertainties. Lat. Am. J. Solids Struct. 15(11), e130 (2018). https://doi.org/10.1590/1679-78255018
6. Dubuisson, M., Jain, A.K.: A modified Hausdorff distance for object matching. In: Proceedings of 12th International Conference on Pattern Recognition, 9–13 October, Jerusalem, pp. 566–568 (1994). https://doi.org/10.1109/icpr.1994.576361
Uncertainty Quantification in Optimization

Eduardo Souza de Cursi¹ and Rafael Holdorf Lopez²

¹ LMN, INSA Rouen Normandie, Normandie Université, 76801 St-Étienne-du-Rouvray, France
[email protected]
² UFSC, Campus Trindade, Florianópolis SC 88040-900, Brazil
[email protected]

Abstract. We consider constrained optimization problems affected by uncertainty, where the objective function or the restrictions involve random variables $u$. In this situation, the solution of the optimization problem is a random variable $x(u)$: we are interested in the determination of its distribution of probability. By using Uncertainty Quantification approaches, we may find an expansion of $x(u)$ in terms of a Hilbert basis $F = \{\varphi_i : i \in \mathbb{N}\}$. We present some methods for the determination of the coefficients of the expansion.

Keywords: Optimization under uncertainty · Uncertainty quantification · Constrained optimization
1 Introduction

Uncertainties are a key issue in engineering design: optimal solutions usually imply no safety margins, but real systems involve uncertainty, variability and errors. For instance, the geometry, the material parameters, the boundary conditions, or even the model itself include uncertainties. To provide safe designs, uncertainty must be considered in the design procedure – so, in optimization procedures. There are different ways to introduce uncertainty in design: the most popular ones are interval methods, fuzzy variables and probabilistic modeling. Each approach has its particularities, advantages and inconveniences. Here, we focus on the probabilistic approach, which is used in situations where quantitative statistical information about the variability is available – fuzzy approaches are often used when the information about uncertainty is qualitative, and interval approaches do not require information about the statistical properties of the uncertainties.

When using the probabilistic approach, the variability is modeled by random variables. In general, the only assumption about the distributions of the random parameters is the existence of a mean and a variance, id est, that the random variables are square summable. The distribution is generally calculated: it is one of the unknowns to be determined. For instance, let us consider the model problem

$$x = \operatorname{Arg\,Min}\, \{F(y, u) : y \in S(u)\}, \qquad (1)$$

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 557–566, 2020. https://doi.org/10.1007/978-3-030-21803-4_56
where $U$ is a random vector. Thus, the optimal solution $x$ may be sensitive to the variations of $U$. In the case of a significant variability of $u$, standard optimization procedures cannot ensure a requested safety level: for each possible value $u = u(\omega)$, the solution takes the value $x(\omega) = x(u(\omega))$, so that $x$ is a random variable. The determination of its statistical properties and its distribution is requested to control statistical properties or the probabilities of some crucial events, such as failure. The reader may find in the literature different approaches used to guarantee the safety of the solution: sensitivity analysis, robust optimization approaches, reliability optimization, chance-constrained optimization, value-at-risk analysis. None of these approaches furnishes the distribution of $x$: it is necessary to use Monte Carlo simulation or Uncertainty Quantification (UQ) approaches – in general, Monte Carlo requires a larger computational effort, while UQ approaches are more economical. In a preceding work [1], we considered the determination of the distribution of $x$ in unconstrained optimization. The results extend straightforwardly to the situation where $S$ is defined by inequalities and equalities:

$$S = \{y \in \mathbb{R}^n : g_i(y, u) \le 0,\; 1 \le i \le p;\; h_i(y, u) = 0,\; 1 \le i \le q\}. \qquad (2)$$
2 UQ Methods for the Determination of the Unknown Coefficients

Applying the UQ approach, we look for a representation of $x = x(u)$ given by

$$x = \sum_{i \in \mathbb{N}} x_i \varphi_i(u),$$

where $F = \{\varphi_i : i \in \mathbb{N}\}$ is a Hilbert basis or a total family. In practice, we consider the approximations of $x$ by finite sums:

$$x \approx Px = \sum_{1 \le i \le N_X} x_i \varphi_i(u).$$

Let us introduce

$$\varphi(u) = \left( \varphi_1(u), \dots, \varphi_{N_X}(u) \right)$$

and a matrix $\mathbf{X} = (X_{ij} : 1 \le i \le N_X,\; 1 \le j \le n)$ such that its line $i$ contains the components of $x_i$:

$$X_{ij} = (x_i)_j, \quad \text{i.e.,} \quad x_i = (X_{i1}, \dots, X_{in}).$$

Then $Px = \varphi(u)\mathbf{X}$. The unknowns to be determined are the elements of the matrix $\mathbf{X}$. In the sequel, we examine some methods for the determination of $\mathbf{X}$.
Uncertainty Quantification in Optimization
2.1
Collocation
When a sample (x^k, u^k : 1 ≤ k ≤ ns) of ns variates from the pair (x, u) is available, we may consider the system of linear equations given by:

φ(u^k)·X = x^k.

This system involves ns × n equations for N_X × n unknowns. For ns ≥ N_X, it is overdetermined and admits a generalized solution, such as a least-squares one, which may be determined in order to furnish X and, so, Px. We may illustrate this approach by using the simple situation where x is the solution of Rosenbrock's problem (assume that both u_1, u_2 are independent and uniformly distributed on (1, 2)) (see Figs. 1, 2, 3, 4 and 5):

x = Arg Min { (1 − u_1 y_1)^2 + 100((u_1 + u_2) y_2 − (u_1 y_1)^2)^2 : y ∈ R^2 }.
Fig. 1. Results for a random sample of 8 × 8 values of u.
The solution is x_1 = 1/u_1, x_2 = 1/(u_1 + u_2). Let us consider a polynomial approximation of order 2:

φ_1(u) = 1, φ_2(u) = u_1, φ_3(u) = u_2, φ_4(u) = u_1^2, φ_5(u) = u_1 u_2, φ_6(u) = u_2^2.

2.2
Variational Approximation
The variational approach solves the orthogonal projection equation
Fig. 2. Results for a uniform mesh of 8 × 8 values of u
Fig. 3. Results for an error of 5% in the values of x^k.
Fig. 4. Results for an error in the distribution of u (N(1.5, 0.25))
Fig. 5. Exact values
E(y^t Px) = E(y^t x), ∀ y = φ(u)·Y.

We have

E(y^t Px) = Y^t E(φ(u)^t φ(u)) X,  E(y^t x) = Y^t E(φ(u)^t x).

Thus, the coefficients X are the solution of the linear system (see Figs. 6, 7 and 8):

E(φ(u)^t φ(u)) X = E(φ(u)^t x).
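The expectations in this linear system can be estimated by Monte Carlo. A small stdlib-only sketch for the first solution component x_1 = 1/u_1 of the example; the sample size and helper names are ours.

```python
import random

random.seed(1)

def phi(u1, u2):
    # degree-2 basis used in the example
    return [1.0, u1, u2, u1 * u1, u1 * u2, u2 * u2]

def gauss_solve(A, b):
    # Gaussian elimination with partial pivoting
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Monte Carlo estimates of E[phi^t phi] and E[phi^t x] for x = 1/u1
N = 5000
A = [[0.0] * 6 for _ in range(6)]
b = [0.0] * 6
for _ in range(N):
    u1, u2 = random.uniform(1, 2), random.uniform(1, 2)
    p = phi(u1, u2)
    x = 1.0 / u1  # first component of the example's exact solution
    for i in range(6):
        b[i] += p[i] * x / N
        for j in range(6):
            A[i][j] += p[i] * p[j] / N

X1 = gauss_solve(A, b)  # coefficients of the variational (projection) solution
approx = sum(c * v for c, v in zip(X1, phi(1.5, 1.5)))
```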
Fig. 6. Results for a random sample of 8 × 8 values of u.
Fig. 7. Results for an error of 5% in the values of x^k.
Fig. 8. Results for an error in the distribution of u (N(1.5, 0.25))
2.3
Moment Matching
An alternative solution consists in determining the unknown coefficients in order to fit the empirical moments of the data. Let us denote by

M^e_{k_1...k_n} = (1/ns) Σ_{i=1}^{ns} x_{i1}^{k_1} ... x_{in}^{k_n} ≈ E(x_1^{k_1} ... x_n^{k_n})
the empirical moment of order k = (k_1, ..., k_n) associated to the sample. When considering moments with 0 ≤ k_i ≤ K_M, we set M^e = (M^e_{k_1...k_n} : 0 ≤ k_i ≤ K_M). In an analogous way, we may generate M(X) = (M_{k_1...k_n}(X) : 0 ≤ k_i ≤ K_M) such that

M_{k_1...k_n}(X) = (1/ns) Σ_{i=1}^{ns} Px_{i1}^{k_1} ... Px_{in}^{k_n} ≈ E(Px_1^{k_1} ... Px_n^{k_n}),  Px_i = φ(u^i)·X.
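For a scalar illustration of moment matching (our own toy case, not the paper's 2-D example): with x = 1/u, u uniform on (1, 2), and the affine representation Px = a + b u, fitting the first two moments reduces to matching mean and variance. Note that the sign of b is not fixed by the moments, a non-uniqueness that hints at why moment matching leads to global optimization problems in general.

```python
import math
import random

random.seed(2)

# sample of the optimal solution x(u) = 1/u with u uniform on (1, 2)
xs = [1.0 / random.uniform(1, 2) for _ in range(20000)]
m1e = sum(xs) / len(xs)                 # empirical first moment M^e_1
m2e = sum(x * x for x in xs) / len(xs)  # empirical second moment M^e_2

# For the affine family Px = a + b*u:
#   E[Px] = a + b*E[u],  Var[Px] = b^2 * Var[u],
# so matching M^e_1 and M^e_2 amounts to matching mean and variance.
Eu, Varu = 1.5, 1.0 / 12.0  # mean and variance of u ~ U(1, 2)
b = -math.sqrt(max(m2e - m1e * m1e, 0.0) / Varu)  # sign NOT fixed by the moments;
a = m1e - b * Eu                                  # negative root chosen since 1/u decreases

approx = a + b * 1.5  # fitted representation evaluated at u = 1.5
```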
Then, we may look for the coefficients verifying M(X) = M^e, i.e., M_{k_1...k_n}(X) = M^e_{k_1...k_n}, 0 ≤ k_i ≤ K_M. These equations form a nonlinear system of (K_M + 1)^n equations which must be solved for the n × N_x unknowns X by an appropriate method. If the number of equations exceeds the number of unknowns, an alternative consists in minimizing a pseudo-distance dist(M(X), M^e). The main difficulty in this approach is to obtain a good quality numerical solution: due to the lack of convexity, the minimization of dist(M(X), M^e) is a global optimization problem. Let us illustrate this approach by using Rosenbrock's function. Consider a sample of 64 values of u, corresponding to 8 random values of each variable u_i. We take K_M = 5 and we minimize the mean square norm ||M(X) − M^e||. For x^k exactly determined, we obtain a relative error of 1.0%. By using a uniform grid of 8 × 8 values of u, the relative error is 0.8%. When 5% errors are introduced in the values of x^k, the relative error is 1.0% for a sample of random values and 2.0% for a uniform grid. When considering u as a pair of independent normal variables having mean 1.5
Fig. 9. Results for an error in the distribution of u (N(1.5, 0.25))
and standard deviation 0.25, the relative error is 1.6%. An example of result is shown in Fig. 9.

2.4
Adaptation of an Iterative Method
Assume that the deterministic optimization problem (1) for a fixed u can be solved by an iterative numerical method having an iteration function Ψ: a sequence of values (x^(p) : p ≥ 0) is generated, starting from an initial guess x^(0), by the iterations

x^(p+1) = Ψ(x^(p)).

Introducing the finite approximations, we have Px^(p+1) ≈ Ψ(Px^(p)). Since Px^(p) = φ(u)·X^(p) and Px^(p+1) = φ(u)·X^(p+1), we solve the variational equations:

E(y^t Px^(p+1)) = E(y^t Ψ(Px^(p))), ∀ y = φ(u)·Y.

Thus

E(φ(u)^t φ(u)) X^(p+1) = E(φ(u)^t Ψ(φ(u)·X^(p))).
The solution of this linear system determines X^(p+1) and, thus, Px^(p+1). A particularly useful situation concerns iterations where Ψ(x) = x + Φ(x). In this case, the iterations read as

X^(p+1) = X^(p) + ΔX^(p), where E(φ(u)^t φ(u)) ΔX^(p) = E(φ(u)^t Φ(φ(u)·X^(p))).

This approach may be used, for instance, when an implementation of a descent method for problem (1) is available, such as a code implementing the projected gradient descent. Then, the code furnishes Ψ and we may adapt it to uncertainty quantification (see Figs. 10, 11 and 12).
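A stdlib-only sketch of this adapted iteration on a scalar toy problem (our construction): minimizing f(y; u) = (y − 1/u)^2 with the descent map Φ(y) = −2α(y − 1/u) and a degree-2 basis in u. Each step solves the linear system for ΔX^(p), with the expectations estimated on a fixed sample.

```python
import random

random.seed(3)

def phi(u):
    # degree-2 basis in a single random variable
    return [1.0, u, u * u]

def gauss_solve(A, b):
    # Gaussian elimination with partial pivoting
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

us = [random.uniform(1, 2) for _ in range(2000)]  # fixed sample for the expectations
G = [[sum(phi(u)[i] * phi(u)[j] for u in us) / len(us) for j in range(3)]
     for i in range(3)]  # estimate of E[phi(u)^t phi(u)]

alpha = 0.4           # descent step: Phi(y) = -2*alpha*(y - 1/u)
X = [0.0, 0.0, 0.0]   # initial expansion coefficients X^(0)
for _ in range(60):
    rhs = [0.0] * 3   # estimate of E[phi(u)^t Phi(Px(u))]
    for u in us:
        pu = phi(u)
        Px = sum(c * v for c, v in zip(X, pu))
        step = -2.0 * alpha * (Px - 1.0 / u)
        for i in range(3):
            rhs[i] += pu[i] * step / len(us)
    dX = gauss_solve(G, rhs)
    X = [X[i] + dX[i] for i in range(3)]  # X^(p+1) = X^(p) + delta X^(p)

approx = sum(c * v for c, v in zip(X, phi(1.5)))  # converges to the projection of 1/u
```

For this linear toy problem the iteration contracts with factor (1 − 2α) per step, so the coefficients converge to the variational projection of 1/u.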
Fig. 10. Results for stochastic descent with an error in the distribution of u (N(1.5, 0.25))
Fig. 11. Results for Robbins-Monro iterations with an error in the distribution of u (N(1.5, 0.25)), degree 4
2.5
Optimality Equations
If the solution x verifies optimality equations E(x) = 0, such as, for instance, ∇F(x) = 0, then we may use methods for the uncertainty quantification of algebraic equations (see, for instance, [2, 3]). This approach may be applied when q = 0 (only inequalities, no equalities). In this case, we may consider

m(t, u) = min_{y ∈ R^n} { r(y, t, u) },  r(y, t, u) = max { F(y, u) − t, g_1(y, u), ..., g_p(y, u) }.
Assume the continuity of F. Then, on the one hand, y ∉ S(u) ⇒ r(y, t, u) > 0; on the other hand, for y ∈ S(u):

t > F(x, u) ⇒ r(y, t, u) < 0,  t < F(x, u) ⇒ r(y, t, u) > 0.

It results from these inequalities that m has a zero at the point t = F(x, u). Thus, an alternative approach consists in determining a zero t* of m (i.e., m(t*, u) = 0). Then x = arg min_{y ∈ R^n} { r(y, t*, u) }.
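The zero-finding described here can be sketched on a deterministic toy instance of our own (u held fixed): F(y) = (y − 2)^2 with the single constraint g(y) = y − 1.5 ≤ 0, so the constrained minimizer is x = 1.5 and F(x) = 0.25. An outer bisection on t drives m(t) to zero; the inner minimization of the convex function r(·, t) is done by ternary search.

```python
def F(y):
    return (y - 2.0) ** 2

def g(y):
    return y - 1.5  # feasible set S = {y <= 1.5}

def r(y, t):
    return max(F(y) - t, g(y))

def m(t):
    # inner minimisation of the convex function r(., t) by ternary search
    lo, hi = -5.0, 5.0
    for _ in range(200):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if r(a, t) < r(b, t):
            hi = b
        else:
            lo = a
    return r(0.5 * (lo + hi), t)

# outer bisection: m is decreasing in t and vanishes at t* = F(x)
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if m(mid) > 0.0:
        lo = mid
    else:
        hi = mid
t_star = 0.5 * (lo + hi)  # approximates the optimal value F(x) = 0.25
```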
(a) Approximated values
(b) Exact values Fig. 12. Results furnished by adaptation of Newton’s iterations of optimality equations
3 Concluding Remarks

We have considered optimization under uncertainty modeled by random variables, which corresponds to situations where statistical models of the uncertain parameters are available. In such a situation, the distribution of the solution x may be determined by different methods, such as, for instance, collocation, moment matching, variational, adaptation or algebraic equation ones. The main difficulty in the practical implementation is the significant increase in the number of variables: when considering n unknowns, we must determine n × N_x coefficients. This is a limitation, notably for large n. The methods have been tested in a simple but significant situation involving a nonconvex objective function and numerical difficulties in the optimization itself. In this situation, the methods have proven effective. The simplest approach is furnished by collocation, but it is more sensitive to measurement errors and may request stabilization or regularization (such as Tikhonov's one). Moment matching leads to global optimization problems, but is less sensitive to measurement errors, including those on the distribution of the random variables. Variational and adaptation approaches are stable, but less efficient than collocation in situations where the data is of good quality. The adaptation of deterministic procedures has led to the best results, notably the algebraic approaches for the solution of the optimality equations. Questions for future work involve the development of methods for large values of n, the generation of samples for arbitrarily distributed u, notably for specified correlations between their components, and high-order expansions in a large number of variables.
References
1. Lopez, R.H., De Cursi, E.S., Lemosse, D.: Approximating the probability density function of the optimal point of an optimization problem. Eng. Optim. 43(3), 281–303 (2011). https://doi.org/10.1080/0305215x.2010.489607
2. Lopez, R.H., Miguel, L.F.F., De Cursi, E.S.: Uncertainty quantification for algebraic systems of equations. Comput. Struct. 128, 189–202 (2013). https://doi.org/10.1016/j.compstruc.2013.06.016
3. De Cursi, E.S., Sampaio, R.: Uncertainty Quantification and Stochastic Modelling with Matlab. ISTE Press, London (2015)
Uncertainty Quantification in Serviceability of Impacted Steel Pipe Renata Troian(B) , Didier Lemosse, Leila Khalij, Christophe Gautrelet, and Eduardo Souza de Cursi Normandie universite, LMN/INSA de ROUEN, 76000 Rouen, France [email protected] https://www.insa-rouen.fr/recherche/laboratoires/lmn
Abstract. The problem of the vulnerability of structures facing explosions has come to the front line of the scientific scene in the last decades. Structural debris usually presents a dangerous potential hazard, e.g. a domino accident. Deterministic models are not sufficient for the reliability analysis of structures impacted by debris: uncertainty of the environmental conditions and material properties has to be taken into account. The proposed research is devoted to the analysis of pipeline behavior under a variable impact loading. A Bernoulli beam model is used as the structural model of the pipeline for simplicity, while different formulations for the impact itself are studied to simulate the wide range of possible types of debris. Model sensitivity is studied first: the influence of the input parameters, that is, the impact force, duration and position, as well as the beam material, on the structural behavior is considered. Uncertainty analyses of several impacts are then presented. The obtained insights can provide guidelines for structure optimization under explosive loading, taking the uncertainties into account.
Keywords: Impact · Rigid · Soft · Sensitivity · Uncertainty

1
Introduction
To assure urban security in the case of an explosion, efforts have to be made in developing reliability analysis and design methods. Research efforts, stimulated by industrial needs, are still required to achieve this goal. Zhu et al. [21] analyzed an oil vapor explosion accident, the various causes that led to the explosion, and the resulting high casualties and severe damage. They mention that debris usually presents a dangerous potential hazard, e.g. a domino accident. Among possibly affected structures, pipelines can play a major role in the domino effect. This consideration defines the object of the present research. Prediction of a debris
This research is a part of a project AMED, that has been funded with the support from the European Union with the European Regional Development Fund (ERDF) and from the Regional Council of Normandie.
© Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 567–576, 2020. https://doi.org/10.1007/978-3-030-21803-4_57
impact on pipes needs a detailed understanding of the impact phenomena, of the structural and material response of pipes, and, of course, consideration of uncertainties. Structural design by safety factors using nominal values without considering uncertainties may lead to designs that are either unsafe, or too conservative and thus not efficient. The existing literature treats various aspects of the risks connected to the effects of explosions on structural integrity. For example, Kelliher and Sutton-Swaby [7] present a model for a stochastic representation of blast-induced damage in a concrete building. In the research of Hadianfard et al. [6], a steel column under blast load is investigated using the Monte Carlo method to take into account the uncertainties of the blast and the material. Nevertheless, works concerning variability, reliability or uncertainty remain rare. Regarding the specific analysis of pipes or, more generally, cylindrical structures destined to the transportation of fluids, we may find works on the same topics, analogously limited to deterministic situations. While modeling the pipe's dynamic response, one can no longer use the material models developed for isotropic materials, as pipe construction materials have evolved considerably in recent years. Economic studies have shown that the development of oil and gas transportation over long distances requires the use of high-strength grade steels, because their tensile properties allow to substantially increase the internal pressure for a given pipe thickness [15]. In order to obtain high strength, these materials are produced using complex Thermo-Mechanical-Control-Processing (TMCP), which introduces preferred orientations within the steel and leads to anisotropic plastic properties in higher grades. Rupture properties may also be anisotropic [14]. Fully understanding and describing the material behavior is needed to produce safe and cost-effective pipelines.
However, the pipe material properties can be controlled better compared to the possible debris characteristics; see, for instance, works on statistical debris distribution [19,21]. Thus, attention has to be paid to modeling variable impactors. Also, Villavicencio and Soares [18] showed that the dynamic response of clamped steel beams struck transversely at the center by a mass is very sensitive to the way in which the supports are modelled, so the boundary conditions of the structure are to be studied as well. Numerous experimental and numerical studies of an explosion or an impact on a cylinder have been carried out, but very few studies concern stochastic systems. Among them can be mentioned the recent study of Wagner et al. [20] concerning a robust design criterion for axially loaded cylindrical shells. Alizadeh et al. [2] studied a pipe conveying fluid with stochastic structural and fluid parameters. Speaking about impact problems in general, existing studies consider the structure geometry, material and impactor velocity. Li and Liu [8] studied the dynamics of an elastic-plastic beam, keeping the geometrical and material parameters of the beam fixed and applying uncertainty to the pressure amplitude, i.e. the impact characteristic. Riha et al. [13] proposed a model to predict the penetration depth of a projectile in a two- and three-layer target both deterministically and probabilistically. Material properties and
projectile velocity are considered in the probabilistic analysis. In [4], Antoine and Batra analyzed the influence of the impact speed, laminate material and geometrical parameters on the plate response during low-velocity impact by a rigid hemispherical-nosed cylinder. In [5], Fyllingen et al. conducted stochastic numerical and experimental tests of square aluminium tubes subjected to axial crushing; the considered uncertain parameters were the geometry of the tube (extrusion length and wall thickness) and the impact velocity of the impactor. Lönn et al. [9] presented an approach to robust optimization of an aluminium extrusion with quadratic cross-section subjected to axial crushing, taking the geometry uncertainty into account.
a) General view
b) Cross-section
Fig. 1. Beam subjected to a single impact.
The present research aims to demonstrate the influence of structure and impactor parameters on the structural response. The Young modulus is chosen for material characterization, the position of the impact on the structure characterizes the structure-impactor interaction, and special attention is given to the impactor characteristics: not only its velocity and mass are considered, but also its material. For this we propose a simplified modeling of a pipe under an impact loading suitable for stochastic simulations. The impact is introduced into the model as a pulse of sinusoidal shape. First, the sensitivity of the model with respect to the loading parameters and pipe material is studied. Then, the dynamic response of a pipe to several impacts is considered.
2
Numerical Model of a Pipe Under Variable Impactors
In this research we are interested in the response of a pipeline to an impact, taking uncertainties into account. We propose a simplified model that includes a structure and numerous random impactors and does not demand excessive computational time. The choice is to represent the impactors by a contact-force history acting on the pipe.
2.1
Pipe Modeling
For a pipe simulation an elastic Bernoulli beam finite-element model was developed in Matlab with Newmark time integration scheme.
The perfectly clamped hollow cylindrical steel beam is shown in Fig. 1(a, b). Its characteristics are: length L = 1 m, diameter d = 0.1 m, thickness r = 0.02 m, Young modulus E = 2.158e11 Pa, density ρ = 7966 kg/m3 and yield stress σy = 2.5e8 Pa.
2.2
Limit State Criterion
Depending on the failure cause, different limit states are associated with the structural elements of the linear pipeline parts [17]. Parameters such as the depletion of strength under a force impact and the depletion of pipe material plasticity can be taken into account for impacted pipeline risk assessment. In the present study, the elastic limit is chosen as a serviceability limit state. Although too conservative for the design of most pipelines, given the capacities of the elastic-plastic range, it can be reasonable for dangerous sites where a domino effect is highly possible. According to these considerations, only stresses in the elastic domain are calculated, and stresses with σ > σy are considered as a failure.
2.3
Impactor Modeling
The interest of the work is to study the response of a structure to a stochastic impact loading, including the impact of solid and soft debris, as both types of debris can be produced during an accident and can affect the structure's integrity. The characteristics of the loading force history, such as its shape, amplitude and duration, are thus crucial. To obtain an idea of a realistic contact force estimation for solid debris impact, the paper of Perera et al. [11] can be considered. The model presented in that paper enables the value of the peak contact force generated by the impact of a piece of debris to be predicted; results of calculations employing the derived relationships have been verified by comparison with experimental results across a wide range of impact scenarios. The observed contact load has a sinusoidal shape, which will be considered in the present article. Following [12], when the load shape is defined by its effective load (amplitude) h and duration l, the corresponding sinusoidal impulse shape that provides the same structural response according to the pulse approximation method is given by

load(t) = (π − 2) h sin(2ωt), for 0 < t < π/(2ω), with ω = (π − 2)/l.

Duration time is the main difference between rigid and soft contacts, as formulated in [3]. A comprehensive overview of soft impact models is given by Abrate [1] for three types of soft projectile: liquid, bird and hailstone. Thus, the rigid and soft impact impulses are introduced into the model by considering impulses of different time durations. The developed numerical model of a beam under an impact is valid for the case of elastic material, so the impactor's characteristics have to be in ranges which induce stresses in the beam that do not exceed the yield stress σy = 2.5e8 Pa. A parametric study was conducted with varying amplitude and duration. The obtained maximum stresses σx for each pair of parameters (amplitude, duration) = (h, l) are presented in Fig. 2.
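A minimal sketch of this sinusoidal pulse as a function, load(t) = (π − 2) h sin(2ωt) for 0 < t < π/(2ω) with ω = (π − 2)/l; the numerical values below are illustrative choices within the ranges of Table 1, not the paper's test cases.

```python
import math

def impulse(t, h, l):
    """Sinusoidal pulse equivalent (pulse approximation method) to an effective
    load of amplitude h and duration l; zero outside the pulse window."""
    omega = (math.pi - 2.0) / l
    if 0.0 < t < math.pi / (2.0 * omega):
        return (math.pi - 2.0) * h * math.sin(2.0 * omega * t)
    return 0.0

# the pulse peaks at t = pi / (4*omega), where sin(2*omega*t) = 1
h, l = 1.0e5, 0.001  # amplitude (N) and duration (s), within Table 1 ranges
omega = (math.pi - 2.0) / l
peak = impulse(math.pi / (4.0 * omega), h, l)
expected_peak = (math.pi - 2.0) * h
```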
Only values of stresses
in the elastic domain are marked with color. When the impact position is p = 0.1 m, the normal stresses are smaller than for p = 0.5 m, as expected. The input parameters of the system that are considered to keep it in the elastic domain are chosen following the values in Fig. 2 (colored stresses) and are given in Table 1.
a) Impact is applied in a mid span, p = 0.5 m.
b) Impact is applied at p = 0.1 m.
Fig. 2. Maximum stresses values. Stresses values in an elastic domain are given by a colorbar. The white area corresponds to a plasticity domain.
Table 1. Numerical values of parameters used in the simulation

Input parameters          Val. min   Val. max
Impact amplitude h (N)    0          1.5e5
Impact duration l (s)     0.0001     0.004
Impact position p (m)     0.05       0.5

3
Sensitivity Analysis of Impactor and Pipe Characteristic
The study concentrates on the stochastic nature of the impact. The characteristics of the impact that will be studied are the impact force, duration and position. The variability of the structure material will be considered through the variation of the Young modulus E. The parameters are supposed to be independent and to have uniform distributions within the limits given in Table 1 for the impulse characteristics. Concerning the Young modulus, the material of the pipe is assumed to be known, but it can vary slightly
due to manufacturing or aging. To take this into account, E has a uniform distribution in the range [1.95e11, 2.2e11] Pa. A sample of size N = 1400 is obtained with Latin hypercube sampling (LHS) [10]. The output parameters of the model that are influenced by the impact characteristics and are important for the evaluation of the structure's integrity are the maximum beam deflection wmax and the maximum stresses σmax, together with the deflection wp and the stresses σp at the impact position. While there are many methods available for analyzing the decomposition of variance as a sensitivity measure, the method of Sobol [16] is one of the most established and widely used, and is capable of computing the Total Sensitivity Indices (TSI), which measure the main effects of a given parameter and all the interactions (of any order) involving that parameter. Sobol's method uses the decomposition of variance to calculate the sensitivity indices. The basis of the method is the decomposition of the model output function y = f(x) into summands of variance using combinations of input parameters in increasing dimensionality. The first-order index Si represents the share of the output variance that is explained by the considered parameter alone. The most important parameters therefore have a high index, but a low one does not mean the parameter has no influence, as it can be involved in interactions. The total index SItot is a measure of the share of the variance that is removed from the total variance when the considered parameter is fixed to its reference value. Therefore, parameters with a low SItot can be considered as non-influential.
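A stdlib-only sketch of the two ingredients used here, LHS sampling and a first-order Sobol index, on a toy linear response: the model function, sample sizes and the pick-freeze estimator below are our illustrative choices, standing in for the beam model and the full Sobol machinery of [16].

```python
import random

random.seed(4)

def lhs(n, dims):
    """Latin hypercube sample of n points on [0,1)^dims (stdlib only)."""
    cols = []
    for _ in range(dims):
        col = [(k + random.random()) / n for k in range(n)]  # one point per stratum
        random.shuffle(col)
        cols.append(col)
    return list(zip(*cols))

def model(x):
    # toy linear response standing in for the beam model; analytic S1 = 9/10
    return 3.0 * x[0] + 1.0 * x[1]

# pick-freeze estimate of the first-order Sobol index of variable 0
N = 20000
A = lhs(N, 2)
B = lhs(N, 2)
AB = [(a[0], b[1]) for a, b in zip(A, B)]  # x1 frozen from A, x2 resampled from B
yA = [model(x) for x in A]
yAB = [model(x) for x in AB]
f0 = sum(yA) / N
var = sum(y * y for y in yA) / N - f0 * f0
S1 = (sum(p * q for p, q in zip(yA, yAB)) / N - f0 * f0) / var
```

For this linear model, Var(y) = 9 Var(x1) + Var(x2), so the exact first-order index of x1 is 0.9, which the estimate should approach.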
a) First order effects
b) Total order effects
Fig. 3. Indices using Sobol’s method for the maximum beam deflection wmax , maximum stresses σmax , deflection wp and stresses σp at the impact position.
3.1
Obtained Results
First-order effects and total effects of the stresses and beam deflection due to changes in the loading characteristics were calculated. Numerical values are presented in Fig. 3. It can be seen that the variation of the Young modulus does not
play a major role in the obtained values. Contrariwise, the position of the impact relative to the boundary conditions and the impact amplitude strongly influence the beam response, as does the impact duration.
4
Uncertainty Analysis of a Model
This section is devoted to the uncertainty analysis of the structural response of an impacted beam. A realistic situation that can lead to the domino effect is modeled: the values of the impactor properties are unknown and the aim is to compute the structural response. A uniform distribution of the impulse characteristics is used.
4.1
Stresses Variability Under Multiple Impacts
Situations of a single impact, two impacts with different intervals between them, and three impacts with variable timing were considered. The impulse characteristics, such as the impact amplitude, duration and position (see Table 1), are considered as random variables with uniform distributions, sampled by Latin hypercube sampling (LHS). The sample size for a single impact is N = 1000, for two impacts N = 2000 and for three N = 3000. The obtained stress distributions are presented in Figs. 4 and 5. The number of tests in which the stresses exceeded the plastic limit σy is marked in red.
a) single impact
b) three impacts with the interval of 0.001 s
Fig. 4. Probability of possible stresses under impact.
Figure 6 represents the cumulative distribution functions (CDF) of the stresses for one, two and three safe impactors. The same tendency can be noted: the more impactors fall on the pipe, the higher the damage probability.
a) 0.001 s
b) 0.0005 s
Fig. 5. Probability of possible stresses under two impacts with different intervals.
Fig. 6. The cumulative distribution functions (CDF’s) of stresses distribution. (1) corresponds to the case of one impact, (2.1) to the two impacts with an interval of 0.001 s, (2.2) to the two impacts with an interval of 0.0005 s and (3) corresponds to the three impacts.
According to Figs. 4, 5 and 6, when the interval between two impacts is 0.001 s, more than 5% of the impacts can provoke plastic deformations. If the impacts occur almost simultaneously, with an interval of 0.0005 s, this number increases to 8%, and in the case of three impacts it reaches almost 20%. Thus, even if the impactor's characteristics do not provoke plastic deformation in the case of a single impact, two or more impactors of the same size and velocity can damage the impacted structure.
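The exceedance percentages above come from counting sampled responses beyond the elastic limit. A stdlib-only sketch of this Monte Carlo post-processing; `max_stress` below is a hypothetical closed-form stand-in for the Bernoulli beam model, which is not reproduced here.

```python
import random

random.seed(5)
SIGMA_Y = 2.5e8  # yield stress from the text (Pa)

def max_stress(h, l, p):
    # hypothetical response surface standing in for the beam FE model
    return 1.2e3 * h * (0.5 + p) * (1.0 + 200.0 * l)

# uniform sampling of the impulse characteristics within the ranges of Table 1
N = 10000
stresses = []
for _ in range(N):
    h = random.uniform(0.0, 1.5e5)
    l = random.uniform(1e-4, 4e-3)
    p = random.uniform(0.05, 0.5)
    stresses.append(max_stress(h, l, p))

# empirical probability of exceeding the serviceability (elastic) limit
p_fail = sum(s > SIGMA_Y for s in stresses) / N

def ecdf(data, x):
    """Empirical cumulative distribution function evaluated at x."""
    return sum(d <= x for d in data) / len(data)
```

By construction, ecdf(stresses, SIGMA_Y) = 1 − p_fail, i.e. the CDF value at the yield stress is the complement of the exceedance probability, as read off Fig. 6.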
5
Conclusion
The paper proposes a stochastic analysis of the structural response under a random impact. A steel pipeline is simulated with a Bernoulli beam model, and the impactors are introduced into the system as impulses of rectangular or sinusoidal shape. The present research shows the need to consider not only big or heavy impactors, or impactors with high velocity: relatively small and slow impactors can cause plastic deformations and lead to the rupture of a pipe and a subsequent domino effect. The proposed analysis is conducted on a simplified model. Nevertheless, the conclusions on parameter sensitivity give insights into the modeling of the problem. It was shown that the impactor properties and the impact position are more important for the structural dynamic response than the variation of the structure material. Also, the proposed approach, in which the impactor is introduced into the system by its time-force history, can save time for more complex numerical models. Further studies with detailed 3D modeling are ongoing to detect rupture modes for different kinds of impactors. Acknowledgment. This research is a part of the project AMED, which has been funded with the support of the European Union through the European Regional Development Fund (ERDF) and of the Regional Council of Normandie.
References
1. Abrate, S.: Soft impacts on aerospace structures. Prog. Aerosp. Sci. 81, 1–17 (2016)
2. Alizadeh, A.A., Mirdamadi, H.R., Pishevar, A.: Reliability analysis of pipe conveying fluid with stochastic structural and fluid parameters. Eng. Struct. 122, 24–32 (2016)
3. Andreaus, U., Casini, P.: Dynamics of SDOF oscillators with hysteretic motion-limiting stop. Nonlinear Dyn. 22(2), 145–164 (2000)
4. Antoine, G., Batra, R.: Sensitivity analysis of low-velocity impact response of laminated plates. Int. J. Impact Eng. 78, 64–80 (2015)
5. Fyllingen, Ø., Hopperstad, O., Langseth, M.: Stochastic simulations of square aluminium tubes subjected to axial loading. Int. J. Impact Eng. 34(10), 1619–1636 (2007)
6. Hadianfard, M.A., Malekpour, S., Momeni, M.: Reliability analysis of H-section steel columns under blast loading. Struct. Saf. 75, 45–56 (2018)
7. Kelliher, D., Sutton-Swaby, K.: Stochastic representation of blast load damage in a reinforced concrete building. Struct. Saf. 34(1), 407–417 (2012)
8. Li, Q., Liu, Y.: Uncertain dynamic response of a deterministic elastic-plastic beam. Int. J. Impact Eng. 28(6), 643–651 (2003)
9. Lönn, D., Fyllingen, Ø., Nilsson, L.: An approach to robust optimization of impact problems using random samples and meta-modelling. Int. J. Impact Eng. 37(6), 723–734 (2010)
10. McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 42(1), 55–61 (2000)
11. Perera, S., Lam, N., Pathirana, M., Zhang, L., Ruan, D., Gad, E.: Deterministic solutions for contact force generated by impact of windborne debris. Int. J. Impact Eng. 91, 126–141 (2016)
12. Ren, Y., Qiu, X., Yu, T.: The sensitivity analysis of a geometrically unstable structure under various pulse loading. Int. J. Impact Eng. 70, 62–72 (2014)
13. Riha, D., Thacker, B., Pleming, J., Walker, J., Mullin, S., Weiss, C., Rodriguez, E., Leslie, P.: Verification and validation for a penetration model using a deterministic and probabilistic design tool. Int. J. Impact Eng. 33(1–12), 681–690 (2006)
14. Shinohara, Y., Madi, Y., Besson, J.: A combined phenomenological model for the representation of anisotropic hardening behavior in high strength steel line pipes. Eur. J. Mech. A Solids 29(6), 917–927 (2010)
15. Shinohara, Y., Madi, Y., Besson, J.: Anisotropic ductile failure of a high-strength line pipe steel. Int. J. Fract. 197(2), 127–145 (2016)
16. Sobol, I.M.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55(1), 271–280 (2001)
17. Timashev, S., Bushinskaya, A.: Methods of assessing integrity of pipeline systems with different types of defects. In: Diagnostics and Reliability of Pipeline Systems, pp. 9–43. Springer (2016)
18. Villavicencio, R., Soares, C.G.: Numerical modelling of the boundary conditions on beams stuck transversely by a mass. Int. J. Impact Eng. 38(5), 384–396 (2011)
19. Van der Voort, M., Weerheijm, J.: A statistical description of explosion produced debris dispersion. Int. J. Impact Eng. 59, 29–37 (2013)
20. Wagner, H., Hühne, C., Niemann, S., Khakimova, R.: Robust design criterion for axially loaded cylindrical shells: simulation and validation. Thin-Walled Struct. 115, 154–162 (2017)
21. Zhu, Y., Qian, X.M., Liu, Z.Y., Huang, P., Yuan, M.Q.: Analysis and assessment of the Qingdao crude oil vapor explosion accident: lessons learnt. J. Loss Prev. Process Ind. 33, 289–303 (2015)
Multiobjective Programming
A Global Optimization Algorithm for the Solution of Tri-Level Mixed-Integer Quadratic Programming Problems

Styliani Avraamidou and Efstratios N. Pistikopoulos(B)
Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843, USA {styliana,stratos}@tamu.edu
Abstract. A novel algorithm for the global solution of a class of tri-level mixed-integer quadratic optimization problems containing both integer and continuous variables at all three optimization levels is presented. The class of problems we consider assumes that the quadratic terms in the objective function of the second level optimization problem do not contain any third level variables. To our knowledge, no other solution algorithm can tackle the class of problems considered in this work. Based on multi-parametric theory and our earlier results for tri-level linear programming problems, the main idea of the presented algorithm is to recast the lower levels of the tri-level optimization problem as multi-parametric programming problems, in which the optimization variables (continuous and integer) of all the upper level problems are considered as parameters at the lower levels. The resulting parametric solutions are then substituted into the corresponding higher-level problems sequentially. Computational studies are presented to assess the efficiency and performance of the presented algorithm.

Keywords: Tri-level optimization · Multi-parametric programming · Mixed-integer optimization

1
Introduction
Optimization problems that involve three decision makers at three different decision levels are referred to as tri-level optimization problems. The first decision maker, also referred to as the leader, solves an optimization problem which includes in its constraint set another optimization problem solved by a second decision maker, which is in turn constrained by a third optimization problem solved by the third decision maker. A tri-level problem formulation can be applied to many different applications in different fields including operations research, process engineering, and management. Moreover, tri-level problems can involve both discrete and continuous
Supported by Texas A&M Energy Institute, RAPID SYNOPSIS Project (DEEE0007888-09-03) and National Science Foundation grant [1739977].
© Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 579–588, 2020. https://doi.org/10.1007/978-3-030-21803-4_58
580
S. Avraamidou and E. N. Pistikopoulos
decision variables, as they have been used to formulate supply chain management problems [23], safety and defense [1,6,24] or robust optimization [7,13,14] problems. Mixed-integer tri-level problems have the general form of (1), where x is a vector of continuous variables and y is a vector of discrete variables.

$$
\begin{aligned}
\min_{x_1,y_1}\; & F_1(x,y) \\
\text{s.t. } & G_1(x,y) \le 0 \\
& \min_{x_2,y_2}\; F_2(x,y) \\
& \quad \text{s.t. } G_2(x,y) \le 0 \\
& \qquad \min_{x_3,y_3}\; F_3(x,y) \\
& \qquad \quad \text{s.t. } G_3(x,y) \le 0 \\
& x = [x_1^T\, x_2^T\, x_3^T]^T,\quad y = [y_1^T\, y_2^T\, y_3^T]^T \\
& x \in \mathbb{R}^n,\quad y \in \mathbb{Z}^p
\end{aligned} \tag{1}
$$

This manuscript is organized as follows. The following sub-section presents previous work on solution algorithms for tri-level problems, Sect. 2 presents the class of problems considered and the proposed algorithm, Sect. 3 presents computational studies, and Sect. 4 concludes this manuscript.
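To make the nested semantics of (1) concrete, the following sketch solves a tiny, purely discrete tri-level instance by exhaustive enumeration: each level minimizes its own objective while anticipating the optimal reaction of the level below. The objective functions and three-point domains are illustrative choices, not an instance from this paper.

```python
# Brute-force solution of a tiny discrete tri-level problem: each level
# minimizes its own objective, anticipating the reaction of the level below.
# Objectives and domains are illustrative, not taken from the paper.

DOM = (0, 1, 2)  # common discrete domain for x1, x2, x3

def F3(x1, x2, x3): return (x3 - x2) ** 2            # third level objective
def F2(x1, x2, x3): return (x2 - x1) ** 2 + x3       # second level objective
def F1(x1, x2, x3): return (x1 - 2) ** 2 - 2 * x3    # leader objective

def react3(x1, x2):
    """Third level optimal reaction for fixed (x1, x2)."""
    return min(DOM, key=lambda x3: F3(x1, x2, x3))

def react2(x1):
    """Second level optimal reaction for fixed x1, anticipating level 3."""
    x2 = min(DOM, key=lambda v: F2(x1, v, react3(x1, v)))
    return x2, react3(x1, x2)

def solve():
    """Leader decision, anticipating the reactions of levels 2 and 3."""
    x1 = min(DOM, key=lambda v: F1(v, *react2(v)))
    x2, x3 = react2(x1)
    return x1, x2, x3, F1(x1, x2, x3)

print(solve())  # -> (2, 1, 1, -2)
```

Enumeration obviously does not scale, which is why the paper replaces the inner minimizations with explicit parametric solutions; ties here are broken by Python's `min`, which returns the first minimizer.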
1.1 Previous Work
Tri-level programming problems are very challenging to solve even when considering continuous linear problems [5]. Therefore, solution approaches presented in the literature for tri-level problems are sparse, address a very restricted class of problems, and most do not guarantee global optimality. Table 1 summarizes key solution methods for mixed-integer multi-level problems. It is worth noting that all the presented approaches are only applicable to linear tri-level problems. Table 2 presents an indicative list of solution methods for non-linear multi-level problems. None of the approaches presented there is able to tackle integer variables at any decision level. It is therefore evident that general strategies for the solution of mixed-integer non-linear tri-level problems are still lacking.

Table 1. Indicative list of previous work on mixed-integer multi-level optimization problems with three or more optimization levels.

| Class | Algorithm | Reference | Note |
|---|---|---|---|
| Integer linear | Tabu search | [20] | Sub-optimal solutions |
| Integer linear | Genetic algorithms | [21] | Sub-optimal solutions |
| Mixed-integer linear | Decomposition algorithm | [24] | Exact and global; applicable only for min-max-min problems |
| Mixed-integer linear | Multi-parametric programming | [4] | Exact and global |
A Global Optimization Algorithm for the Solution of T-MIQP Problems
Table 2. Indicative list of previous work on non-linear multi-level optimization problems with three or more optimization levels.

| Class | Algorithm | Reference | Note |
|---|---|---|---|
| Continuous quadratic | Multi-parametric programming | [9] | Exact and global |
| Continuous non-linear | Particle swarm optimization | [10] | Sub-optimal solutions |
| Continuous non-linear | Evolutionary algorithm | [22] | Sub-optimal solutions |
| Continuous non-linear | Multi-parametric programming (B&B) | [11] | Approximate global optimum |

2 Tri-Level Mixed-Integer Quadratic Optimization Algorithm
In this work, we consider the tri-level problem presented as (2). The problem contains linear constraints and quadratic objective functions in all three optimization levels. The quadratic term in the objective function of the second level problem does not contain any third level variables.

$$
\begin{aligned}
\min_{x_1,y_1}\; & \omega^T Q_1 \omega + c_1^T \omega + cc_1 \\
\text{s.t. } & A_1 x + E_1 y \le b_1 \\
& \min_{x_2,y_2}\; [x_1^T\, y_1^T\, x_2^T\, y_2^T]\, Q_2\, [x_1^T\, y_1^T\, x_2^T\, y_2^T]^T + c_2^T \omega + cc_2 \\
& \quad \text{s.t. } A_2 x + E_2 y \le b_2 \\
& \qquad \min_{x_3,y_3}\; \omega^T Q_3 \omega + c_3^T \omega + cc_3 \\
& \qquad \quad \text{s.t. } A_3 x + E_3 y \le b_3 \\
& x \in \mathbb{R}^n,\quad y \in \{0,1\}^m \\
& x = [x_1^T\, x_2^T\, x_3^T]^T,\quad y = [y_1^T\, y_2^T\, y_3^T]^T,\quad \omega = [x_1^T\, y_1^T\, x_2^T\, y_2^T\, x_3^T\, y_3^T]^T
\end{aligned} \tag{2}
$$
where ω is a vector of all decision variables of all decision levels, x_i are continuous bounded decision variables and y_i binary decision variables of optimization level i, Q_i ⪰ 0, c_i and cc_i are constant coefficient matrices in the objective function of optimization level i, A_i, E_i are constant coefficient matrices multiplying decision variables of level i in the constraint set, and b_i is a constant vector. Faisca et al. [9] presented an algorithm for the solution of continuous tri-level programming problems using multi-parametric programming [15]. Avraamidou and Pistikopoulos [4] expanded on that approach and presented an algorithm for the solution of mixed-integer linear tri-level problems. The approach presented here is an extension of these algorithms and tackles the more general mixed-integer quadratic tri-level problem. The proposed algorithm will be introduced through the general form of the tri-level mixed-integer quadratic programming problem (2) and then illustrated through a numerical example in Subsect. 2.1.
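The condition Q_i ⪰ 0 (positive semidefiniteness, which makes each quadratic objective convex) can be checked without any linear-algebra package: a symmetric matrix is positive semidefinite exactly when symmetric Gaussian elimination never produces a negative pivot and every zero pivot has a zero remaining row. A minimal stdlib sketch, independent of the toolchain used in the paper:

```python
# LDL^T-style positive-semidefiniteness test for a symmetric matrix,
# given as a list of row lists. Illustrative helper, not part of POP.

def is_psd(Q, tol=1e-9):
    n = len(Q)
    A = [row[:] for row in Q]  # work on a copy
    for k in range(n):
        if A[k][k] < -tol:
            return False  # negative pivot: an indefinite direction exists
        if A[k][k] <= tol:
            # (near-)zero pivot: for PSD the rest of this row must vanish
            if any(abs(A[k][j]) > tol for j in range(k, n)):
                return False
            continue
        for i in range(k + 1, n):  # symmetric elimination step
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return True

print(is_psd([[2.0, -1.0], [-1.0, 2.0]]))  # -> True
print(is_psd([[1.0, 2.0], [2.0, 1.0]]))    # -> False
```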
The first step of the proposed algorithm is to recast the third level optimization problem as a multi-parametric mixed-integer quadratic programming problem, in which the optimization variables of the second and first level problems are considered as parameters (3).

$$
\begin{aligned}
\min_{x_3,y_3}\; & \omega^T Q_3 \omega + c_3^T \omega + cc_3 \\
\text{s.t. } & A_3 x + E_3 y \le b_3 \\
& x^L \le x \le x^U
\end{aligned} \tag{3}
$$
Problem (3) is then solved using multi-parametric mixed-integer quadratic programming (mp-MIQP) algorithms in the POP toolbox [16], allowing for the solution to contain envelopes of solutions [8,17].

Remark 1. Envelopes of solutions contain multiple candidate parametric solutions and require a comparison procedure for the determination of the optimal solution. Envelopes of solutions can appear when solving mp-MIQP problems as a result of multiple parametric solutions for different realizations of the binary variables. The POP toolbox follows a comparison procedure to obtain the exact and optimal critical regions in the space of the parameters and discard sub-optimal solutions in envelopes. This comparison procedure is not performed at this step of the presented tri-level algorithm, as a comparison procedure at the end of the algorithm is more computationally efficient.

The solution of problem (3) results in the parametric solution (4), which consists of the complete profile of optimal solutions of the third level variables x_3 and y_3 as explicit functions of the decision variables of optimization levels one and two (x_1, y_1, x_2, y_2). Writing θ = [x_1^T y_1^T x_2^T y_2^T]^T,

$$
x_3 =
\begin{cases}
\xi_1 = p_1 + q_1 \theta & \text{if } H_1 \theta \le h_1,\; y_3 = r_1 \\
\xi_2 = p_2 + q_2 \theta & \text{if } H_2 \theta \le h_2,\; y_3 = r_2 \\
\quad\vdots & \quad\vdots \\
\xi_k = p_k + q_k \theta & \text{if } H_k \theta \le h_k,\; y_3 = r_k
\end{cases} \tag{4}
$$

where ξ_i is the affine function of the third level continuous variables in terms of the first and second level decision variables, {H_i θ ≤ h_i, y_3 = r_i} is referred to as critical region i, CR_i, and k denotes the number of computed critical regions.

The next step is to recast the second level optimization problem into k mp-MIQP problems, by considering the optimization variables of the first level problem, x_1, y_1, as parameters and substituting in the corresponding functions ξ_i for x_3 and the values r_i for y_3.
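Using a parametric solution of the form (4) later amounts to a point-location task: given a parameter vector θ, find a critical region with H_i θ ≤ h_i and evaluate that region's affine law x_3 = p_i + q_i θ together with the fixed binaries y_3 = r_i. The sketch below uses two made-up one-dimensional regions, not the paper's critical-region data:

```python
# Evaluate a piecewise-affine parametric solution: locate the critical
# region containing theta, then apply that region's affine law.
# The region data (H, h, p, q, r) are illustrative, not from the paper.

def dot(row, theta):
    return sum(a * t for a, t in zip(row, theta))

# Each region: (H, h, p, q, r) representing {H theta <= h},
# with x3 = p + q theta and y3 = r inside that region.
REGIONS = [
    ([[1.0]],  [0.5],  [1.0], [[2.0]],  0),   # CR1: theta <= 0.5
    ([[-1.0]], [-0.5], [3.0], [[-2.0]], 1),   # CR2: theta >= 0.5
]

def evaluate(theta):
    for H, h, p, q, r in REGIONS:
        if all(dot(row, theta) <= hk + 1e-9 for row, hk in zip(H, h)):
            x3 = [pk + dot(qk, theta) for pk, qk in zip(p, q)]
            return x3, r
    raise ValueError("theta lies outside all critical regions")

print(evaluate([0.25]))  # -> ([1.5], 0)
print(evaluate([1.0]))   # -> ([1.0], 1)
```

In an envelope, several regions may contain the same θ; the paper defers the comparison of such overlapping candidates to the final step of the algorithm.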
Also, the corresponding critical region definitions are added to the existing set of second level constraints, as an additional set of constraints for each problem. The k formulated problems are solved using the POP toolbox, providing the complete profile of optimal solutions of the second level problem (for an optimal
third level problem), as explicit functions of the decision variables of the first level problem, x_1 and y_1. The computed parametric solution is in turn used to formulate single-level reformulations of the upper level problem by substituting the derived critical region definitions and affine functions of the variables into the leader problem, forming a single-level mixed-integer quadratic programming (MIQP) problem for each critical region. The single-level MIQP problems are solved with appropriate algorithms (CPLEX® if convex, and either BARON® [19] or ANTIGONE® [12] if not convex). The final step of the algorithm is a comparison procedure to select the global optimum solution. This is done by solving the mixed-integer linear problem (5).

$$
\begin{aligned}
z^* = \min_{\alpha,\gamma}\; & \alpha \\
\text{s.t. } & \alpha = \textstyle\sum_{i,j} \gamma_{i,j} z_{i,j} \\
& \textstyle\sum_{i,j} \gamma_{i,j} = 1 \\
& \gamma_{i,j} u_{i,j} \le \gamma_{i,j} u_{p,q} \quad \forall i,j,\; p = i,\; q \ne j \\
& \gamma_{i,j} v_i \le \gamma_{i,j} v_r \quad \forall i,j,\; r \ne i \\
& \gamma_{i,j} \in \{0,1\}
\end{aligned} \tag{5}
$$

where z^* is the exact global optimum of problem (2), γ_{i,j} are binary variables corresponding to each CR_{i,j}, z_{i,j} are the objective function values obtained when solving the problems in Step 6, u_{i,j} are the objective function values obtained when solving the problems in Step 4, and v_i are the objective function values obtained when solving the problems in Step 2.
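Because exactly one γ_{i,j} equals 1, problem (5) can be read as an enumeration: a candidate region (i, j) survives only if it is second-level optimal among the candidates sharing the same third-level region i, and third-level optimal across regions; the global optimum is the smallest z among the survivors. A schematic stdlib sketch with toy objective values (not the paper's results; in practice regions overlap only partially, which POP's comparison procedure handles region by region):

```python
# Schematic comparison step in the spirit of problem (5): filter candidate
# regions by second- and third-level optimality, then minimize z.
# The z, u, v values below are toy numbers, not the paper's results.

def compare(z, u, v):
    survivors = []
    for (i, j), zij in z.items():
        second_ok = all(u[(i, j)] <= u[(p, q)] for (p, q) in u if p == i)
        third_ok = all(v[i] <= v[r] for r in v)
        if second_ok and third_ok:
            survivors.append(((i, j), zij))
    return min(survivors, key=lambda t: t[1])

z = {(1, 1): 35.7, (6, 1): -9.35, (6, 2): -4.0}   # first level values (Step 6)
u = {(1, 1): 30.5, (6, 1): 2.76, (6, 2): 5.0}     # second level values (Step 4)
v = {1: 0.0, 6: -43.0}                            # third level values (Step 2)

print(compare(z, u, v))  # -> ((6, 1), -9.35)
```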
2.1 Numerical Example
Consider the following tri-level mixed-integer quadratic problem (6).

$$
\begin{aligned}
\min_{x_1,y_1}\; & z = 5x_1^2 + 6x_2^2 + 3y_1 + 3y_2 - 3x_3 \\
& \min_{x_2,y_2}\; u = 4x_1^2 + 6y_1 - 2x_2 + 10y_2 - x_3 + 5y_3 \\
& \quad \min_{x_3,y_3}\; v = 4x_3^2 + y_3^2 + 5y_4^2 + x_2 y_3 + x_2 y_4 - 10x_3 - 15y_3 - 16y_4 \\
& \quad \text{s.t. } 6.4x_1 + 7.2x_2 + 2.5x_3 \le 11.5 \\
& \qquad -8x_1 - 4.9x_2 - 3.2x_3 \le 5 \\
& \qquad 3.3x_1 + 4.1x_2 + 0.02x_3 + 0.2y_1 + 0.8y_2 + 4y_3 + 4.5y_4 \le 1 \\
& \qquad y_1 + y_2 + y_3 + y_4 \ge 1 \\
& \qquad -10 \le x_1, x_2 \le 10 \\
& \qquad x_1, x_2, x_3 \in \mathbb{R},\quad y_1, y_2, y_3, y_4 \in \{0,1\}
\end{aligned} \tag{6}
$$

Step 1: The third level problem is reformulated as an mp-MIQP problem (7), in which all decision variables of the first and second level problems (x_1, y_1, x_2, y_2) are considered as parameters.
$$
\begin{aligned}
\min_{x_3,y_3}\; & v = 4x_3^2 + y_3^2 + 5y_4^2 + x_2 y_3 + x_2 y_4 - 10x_3 - 15y_3 - 16y_4 \\
\text{s.t. } & 6.4x_1 + 7.2x_2 + 2.5x_3 \le 11.5 \\
& -8x_1 - 4.9x_2 - 3.2x_3 \le 5 \\
& 3.3x_1 + 4.1x_2 + 0.02x_3 + 0.2y_1 + 0.8y_2 + 4y_3 + 4.5y_4 \le 1 \\
& y_1 + y_2 + y_3 + y_4 \ge 1 \\
& -10 \le x_1, x_2 \le 10 \\
& x_1, x_2, x_3 \in \mathbb{R},\quad y_1, y_2, y_3, y_4 \in \{0,1\}
\end{aligned} \tag{7}
$$

Step 2: Problem (7) is solved using the mp-MIQP solver in the POP® toolbox [16]. The multi-parametric solution of problem (7) consists of 14 critical regions. A subset of them is presented in Table 3.
Table 3. Numerical example: a subset of the multi-parametric solution of the third level problem

| CR | Definition | 3rd level objective | 3rd level variables |
|---|---|---|---|
| CR1 | 0.6236x1 + 0.7759x2 + 0.0960y2 ≤ 0.1743; −0.6644x1 − 0.7474x2 ≤ −0.2206; −y1 − y2 ≤ −1; x1 ≤ 10; y1, y2 ∈ {0,1} | v1 = 13.1072x1² + 16.5888x2² + 29.4912x1x2 − 8.7040x1 − 9.7920x2 − 26.6800 | x3 = −2.56x1 − 2.88x2 + 4.6; y3 = 0; y4 = 0 |
| CR6 | 0.62119x1 + 0.7778x2 + 0.0956y2 ≤ −0.5674; −0.6242x1 − 0.7755x2 − 0.0946y2 ≤ 0.5816; y1 + y2 ≤ 1; x1 ≤ 10; y1, y2 ∈ {0,1} | v6 = 54450x1² + 84050x2² + 1250y2² + 135300x1x2 + 16500x1y2 + 20500x2y2 + 101475x1 + 126076x2 + 15375y2 + 15375 | x3 = −165x1 − 205x2 − 25y2 − 150; y3 = 1; y4 = 0 |
| CR14 | 0.6212x1 + 0.7778x2 + 0.0956y2 ≤ 0.1743; 0.8528x1 + 0.5223x2 ≤ −0.2206; 0.0444x1 + 0.9990x2 ≤ −0.2206; y1 + y2 ≤ 1; −x1 ≤ 10; −x2 ≤ 10; y1, y2 ∈ {0,1} | v14 = 12.5x1² + 4.6895x2² + 15.3125x1x2 + 53.1250x1 + 32.5391x2 + 0.3203 | x3 = −2.5x1 − 1.5312x2 − 1.5625; y3 = 1; y4 = 1 |
Step 3: The multi-parametric solution of problem (7), partially presented in Table 3, is used to formulate 14 mp-MIQP second level problems. The critical region definitions are added to the reformulated second level problems as a new
set of constraints. The affine functions of the third level variables, x_3, are substituted into the problems, along with the values of the binary third level variables. Finally, decision variables of the first level problem are considered as parameters. The first mp-MIQP formulated corresponds to CR1 and is presented as (8). Similar problems are formulated for the rest of the critical regions.

$$
\begin{aligned}
\min_{x_2,y_2}\; & u_{1,1} = 4x_1^2 + 4y_2^2 + 6y_1 - 2x_2 + 6y_2 - (-2.56x_1 - 2.88x_2 + 4.6) + 5(0) \\
\text{s.t. } & 0.6236x_1 + 0.7759x_2 + 0.0960y_2 \le 0.1743 \\
& -0.6644x_1 - 0.7474x_2 \le -0.2206 \\
& -y_1 - y_2 \le -1 \\
& x_1 \le 10 \\
& y_1, y_2 \in \{0,1\}
\end{aligned} \tag{8}
$$

Step 4: All the problems formulated in Step 3 are solved using the mp-MIQP solvers in the POP® toolbox [16]. The resulting solutions consist of a total of 22 critical regions. The critical regions corresponding to CR1 and CR6 are presented in Table 4.
Table 4. Numerical example: a subset of the multi-parametric solution of the second level problems

| CR | Definition | 2nd level objective | 2nd level variables |
|---|---|---|---|
| CR1,1 | 2.2792 ≤ x1 ≤ 10; y1 ∈ {0,1} | u1,1 = 4x1² + 1.7778x1 + 6y1 + 3.6597 | x2 = −0.8889x1 + 0.2951; y2 = 1 |
| CR6,1 | −3.2852 ≤ x1 ≤ 10; y1 ∈ {0,1} | u6,1 = 4x1² + 1.6098x1 + 6y1 + 2.75 | x2 = −0.8049x1 − 0.75; y2 = 0 |
Step 5: The parametric solutions for the second level problem obtained in Step 4 are used to formulate 22 single-level deterministic MIQP problems, each corresponding to a critical region of the second level problem. Each critical region definition is added to the first level problem as a new set of constraints, and the affine functions of the second and third level decision variables are substituted into the objective, resulting in MIQP problems that involve only the first level variables x_1, y_1. The MIQPs formulated from CR1,1 and CR6,1 are presented below as (9) and (10) respectively.

$$
\begin{aligned}
\min_{x_1,y_1}\; & z_{1,1} = 5x_1^2 + 6(-0.8889x_1 + 0.2951)^2 + 3y_1 + 3(1) \\
& \qquad -3(-2.56x_1 - 2.88(-0.8889x_1 + 0.2951) + 4.6) \\
\text{s.t. } & 2.2792 \le x_1 \le 10 \\
& y_1 \in \{0,1\}
\end{aligned} \tag{9}
$$
$$
\begin{aligned}
\min_{x_1,y_1}\; & z_{6,1} = 5x_1^2 + 6(-0.8049x_1 - 0.75)^2 + 3y_1 + 3(0) \\
& \qquad -3(-165x_1 - 205(-0.8049x_1 - 0.75) - 150) \\
\text{s.t. } & -3.2852 \le x_1 \le 10 \\
& y_1 \in \{0,1\}
\end{aligned} \tag{10}
$$
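Problems such as (10) are simple enough to sanity-check numerically: substituting the CR6,1 and CR6 parametric solutions (with y_2 = 0) into the leader objective of (6) and scanning x_1 over [−3.2852, 10] for each y_1 ∈ {0, 1} reproduces the reported optimizer x_1 ≈ −0.407, y_1 = 0. (The optimal value differs from the paper's −9.3512 in the third decimal because the tabulated coefficients are rounded.) A sketch:

```python
# Grid sanity check of problem (10): substitute the second/third level
# parametric solutions (CR6,1 / CR6 with y2 = 0) into the leader objective
# of (6) and scan the first level variables.

def z61(x1, y1):
    x2 = -0.8049 * x1 - 0.75            # second level solution in CR6,1
    x3 = -165 * x1 - 205 * x2 - 150     # third level solution in CR6, y2 = 0
    return 5 * x1**2 + 6 * x2**2 + 3 * y1 + 3 * 0 - 3 * x3

best = min(
    (z61(k / 1e4, y1), k / 1e4, y1)
    for y1 in (0, 1)
    for k in range(-32852, 100001)      # x1 grid on [-3.2852, 10], step 1e-4
)
print(best)  # roughly (-9.346, -0.4068, 0)
```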
Step 6: The 22 single-level MIQP problems formulated in Step 5 are solved using the CPLEX® mixed-integer quadratic programming solver. The resulting solutions from problems (9) and (10) are presented in Table 5.

Table 5. Numerical example: a subset of the first level problem solutions

| CR | Objectives | Decision variables |
|---|---|---|
| CR1,1 | z1,1 = 35.6995; u1,1 = 30.4913; v = 0 | x1 = 2.2792, y1 = 0, x2 = −1.7308, y2 = 1, x3 = 3.7500, y3 = 0, y4 = 0 |
| CR6,1 | z6,1 = −9.3512; u6,1 = 2.7583; v = −43.0470 | x1 = −0.4076, y1 = 0, x2 = −0.4220, y2 = 0, x3 = 3.7500, y3 = 1, y4 = 0 |
Step 7: The comparison optimization problem (5) is then solved using the information in Tables 3, 4 and 5. The exact global optimum lies in CR6,1, with optimal decisions x1 = −0.4076, y1 = 0, x2 = −0.4220, y2 = 0, x3 = 3.7500, y3 = 1 and y4 = 0. The computational performance of the algorithm for this numerical example is presented in Table 6.
3 Computational Studies
A small set of tri-level mixed-integer quadratic problems of different sizes was solved to investigate the capabilities of the proposed algorithm. The randomly generated problems have the general mathematical form of (2), and all variables appear in all three optimization levels. Table 6 presents the studied problems, where XT denotes the total number of continuous variables of the tri-level problem, YT denotes the total number of binary variables, X1, X2 and X3 denote the number of continuous decision variables of the first, second and third optimization level respectively, Y1, Y2 and Y3 denote the number of binary decision variables of the first, second and third optimization level respectively, C denotes the total number of constraints in the first, second and third optimization levels, L1, L2 and L3 denote the time required to solve each optimization level, Com denotes the time required to solve the comparison problem, and CPU denotes the total computational time for each test problem in seconds. The computations were carried out on a 2-core machine with an Intel Core i7 at 3.1 GHz and 16 GB of RAM, MATLAB R2016a, and IBM ILOG CPLEX Optimization Studio 12.6.3. The test problems presented in Table 6 can be found on the parametric.tamu.edu website as 'BPOP TMIQP'.
Table 6. Computational results of the presented algorithm for tri-level MIQP problems of the general form (2)

| Problem | XT | YT | X1 | X2 | X3 | Y1 | Y2 | Y3 | C | L3(s) | L2(s) | L1(s) | Com(s) | CPU(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (6) | 3 | 4 | 1 | 1 | 1 | 1 | 1 | 2 | 4 | 4.4278 | 1.7434 | 0.1291 | 0.0125 | 6.3128 |
| TMIQP1 | 13 | 7 | 3 | 5 | 5 | 2 | 2 | 3 | 4 | 2.3567 | 0.5560 | 0.0042 | 0.0361 | 2.9530 |
| TMIQP2 | 15 | 6 | 5 | 5 | 5 | 2 | 2 | 2 | 5 | 1.2253 | 0.1884 | 0.0021 | 0.0004 | 1.4161 |
| TMIQP3 | 16 | 5 | 1 | 1 | 14 | 1 | 1 | 3 | 6 | 0.7164 | 0.8937 | 0.0018 | 0.0004 | 1.6123 |
| TMIQP4 | 18 | 7 | 1 | 2 | 15 | 1 | 1 | 5 | 3 | 6.3971 | 1.9648 | 0.0081 | 0.0026 | 8.3726 |
4 Conclusions
An algorithm for the solution of tri-level mixed-integer quadratic problems is introduced, as an extension of the global solution algorithms recently developed for different classes of mixed-integer multi-level problems using multi-parametric programming [2–4,18]. The problem under consideration involves both integer and continuous variables at all optimization levels and has linear constraints and quadratic objective functions, with the quadratic terms in the objective function of the second level problem not containing third level variables. To our knowledge, this is the only algorithm that can handle mixed-integer non-linear tri-level problems. The algorithm has been implemented in a MATLAB based toolbox, and its performance and efficiency were assessed through a set of randomly generated test problems. The limiting step of the proposed algorithm was shown to be the solution of the multi-parametric problem in Step 2. Future work will involve the use of the presented algorithm to solve a robust optimization case study. The presented procedure will also be expanded for the solution of more general classes of mixed-integer tri-level problems.
References

1. Alguacil, N., Delgadillo, A., Arroyo, J.: A trilevel programming approach for electric grid defense planning. Comput. Oper. Res. 41(1), 282–290 (2014)
2. Avraamidou, S., Pistikopoulos, E.N.: B-POP: Bi-level parametric optimization toolbox. Comput. Chem. Eng. 122, 193–202 (2018)
3. Avraamidou, S., Pistikopoulos, E.N.: A multi-parametric optimization approach for bilevel mixed-integer linear and quadratic programming problems. Comput. Chem. Eng. 122, 98–113 (2019)
4. Avraamidou, S., Pistikopoulos, E.N.: Multi-parametric global optimization approach for tri-level mixed-integer linear optimization problems. J. Global Optim. (2018)
5. Blair, C.: The computational complexity of multi-level linear programs. Ann. Oper. Res. 34(1), 13–19 (1992)
6. Brown, G., Carlyle, M., Salmerón, J., Wood, K.: Defending critical infrastructure. Interfaces 36(6), 530–544 (2006)
7. Chen, B., Wang, J., Wang, L., He, Y., Wang, Z.: Robust optimization for transmission expansion planning: minimax cost vs. minimax regret. IEEE Trans. Power Syst. 29(6), 3069–3077 (2014)
8. Dua, V., Bozinis, N., Pistikopoulos, E.: A multiparametric programming approach for mixed-integer quadratic engineering problems. Comput. Chem. Eng. 26(4–5), 715–733 (2002)
9. Faisca, N.P., Saraiva, P.M., Rustem, B., Pistikopoulos, E.N.: A multi-parametric programming approach for multilevel hierarchical and decentralised optimisation problems. Comput. Manag. Sci. 6, 377–397 (2009)
10. Han, J., Zhang, G., Hu, Y., Lu, J.: A solution to bi/tri-level programming problems using particle swarm optimization. Inf. Sci. 370–371, 519–537 (2016)
11. Kassa, A., Kassa, S.: A branch-and-bound multi-parametric programming approach for non-convex multilevel optimization with polyhedral constraints. J. Global Optim. 64(4), 745–764 (2016)
12. Misener, R., Floudas, C.: ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations. J. Global Optim. 59(2–3), 503–526 (2014)
13. Moreira, A., Street, A., Arroyo, J.: An adjustable robust optimization approach for contingency-constrained transmission expansion planning. IEEE Trans. Power Syst. 30(4), 2013–2022 (2015)
14. Ning, C., You, F.: Data-driven adaptive nested robust optimization: general modeling framework and efficient computational algorithm for decision making under uncertainty. AIChE J. 63, 3790–3817 (2017)
15. Oberdieck, R., Diangelakis, N., Nascu, I., Papathanasiou, M., Sun, M., Avraamidou, S., Pistikopoulos, E.: On multi-parametric programming and its applications in process systems engineering. Chem. Eng. Res. Design 116, 61–82 (2016)
16. Oberdieck, R., Diangelakis, N., Papathanasiou, M., Nascu, I., Pistikopoulos, E.: POP – parametric optimization toolbox. Ind. Eng. Chem. Res. 55(33), 8979–8991 (2016)
17. Oberdieck, R., Pistikopoulos, E.: Explicit hybrid model-predictive control: the exact solution. Automatica 58, 152–159 (2015)
18. Oberdieck, R., Diangelakis, N.A., Avraamidou, S., Pistikopoulos, E.N.: On unbounded and binary parameters in multi-parametric programming: applications to mixed-integer bilevel optimization and duality theory. J. Glob. Optim. 69(3), 587–606 (2017)
19. Sahinidis, N.: BARON 17.8.9: Global Optimization of Mixed-Integer Nonlinear Programs, User's Manual
20. Sakawa, M., Matsui, T.: Interactive fuzzy stochastic multi-level 0–1 programming using tabu search and probability maximization. Expert Syst. Appl. 41(6), 2957–2963 (2014)
21. Sakawa, M., Nishizaki, I., Hitaka, M.: Interactive fuzzy programming for multi-level 0–1 programming problems through genetic algorithms. Eur. J. Oper. Res. 114(3), 580–588 (1999)
22. Woldemariam, A., Kassa, S.: Systematic evolutionary algorithm for general multilevel Stackelberg problems with bounded decision variables (SEAMSP). Ann. Oper. Res. (2015)
23. Xu, X., Meng, Z., Shen, R.: A tri-level programming model based on conditional value-at-risk for three-stage supply chain management. Comput. Ind. Eng. 66(2), 470–475 (2013)
24. Yao, Y., Edmunds, T., Papageorgiou, D., Alvarez, R.: Trilevel optimization in power network defense. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 37(4), 712–718 (2007)
A Method for Solving Some Class of Multilevel Multi-leader Multi-follower Programming Problems

Addis Belete Zewde¹ and Semu Mitiku Kassa¹,²

¹ Department of Mathematics, Addis Ababa University, P.O. Box 1176, Addis Ababa, Ethiopia
[email protected]
² Department of Mathematics and Statistical Sciences, Botswana International University of Science and Technology, P/Bag 16, Palapye, Botswana
[email protected]
Abstract. Multiple-leader multiple-follower games serve as an important model in game theory with many applications in economics, engineering, operations research and other fields. In this paper, we reformulate a multilevel multi-leader multi-follower (MLMLMF) programming problem into an equivalent multilevel single-leader multi-follower (MLSLMF) programming problem by introducing a suppositional (or dummy) leader. If the resulting MLSLMF programming problem consists of separable terms and parameterized common terms across all the followers, then the problem is transformed into an equivalent multilevel program having a single leader and a single follower at each level of the hierarchy. The proposed solution approach can solve multilevel multi-leader multi-follower problems whose objective values in both levels have common nonseparable terms, possibly with different positive weights.

Keywords: Multilevel multi-leader multi-follower programming · Multilevel programming · Multi-parametric programming · Branch-and-bound

1 Introduction
Multi-leader-follower games are a class of hierarchical games in which a collection of leaders compete in a Nash game constrained by the equilibrium conditions of another Nash game amongst the followers. Generally, in a game, when several players take the position of leaders and the rest of the players take the position of followers, it becomes a multi-leader-follower game. The leader-follower Nash equilibrium, a solution concept for the multi-leader-follower game, can be defined as a set of leaders' and followers' strategies such that no player (leader or follower) can improve his status by changing his own current strategy unilaterally. The early study associated with the multi-leader-follower game and the equilibrium problem with equilibrium constraints (EPEC) dates back to 1984 by

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 589–599, 2020. https://doi.org/10.1007/978-3-030-21803-4_59
Sherali [14], where a multi-leader-follower game was called a multiple Stackelberg model. While multi-leader generalizations were touched upon by Okuguchi [11], Sherali [14] presented amongst the first models for multi-leader-follower games in a Cournot regime. Sherali [14] established existence of an equilibrium by assuming that each leader can exactly anticipate the aggregate follower reaction curve. He also showed the uniqueness of equilibrium for a special case where all leaders share an identical cost function and make identical decisions. As Ehrenmann [2,3] pointed out, the assumption that all leaders make identical decisions is essential for ensuring the uniqueness result. In addition, Su [16] considered a forward market equilibrium model that extended the existence result of Sherali [14] under some weaker assumptions. Pang and Fukushima [13] considered a class of remedial models for the multi-leader-follower game that can be formulated as a generalized Nash equilibrium problem (GNEP) with convexified strategy sets. Based on the strong stationarity conditions of each leader in a multi-leader-follower game, Leyffer and Munson [10] derived a family of nonlinear complementarity problem, nonlinear program, and MPEC formulations of the multi-leader multi-follower games. In [15], Su proposed a sequential nonlinear complementarity problem (NCP) approach for solving EPECs. There are several instances of EPECs for which equilibria have been shown to exist, but there are also fairly simple EPECs which admit no equilibria as shown in [12]. Definitive statements on the existence of equilibria have been obtained mainly for two level multi-leader-follower games with specific structure. In the majority of these settings, the uniqueness of the follower-level equilibrium is assumed to construct an implicit form (such as problems with convex strategy sets) which allows for the application of standard fixed-point theorems of Brouwer and Kakutani [1,4]. 
Indeed, when the feasible region of the EPEC is convex and compact, the multi-leader multi-follower game can be thought of as a conventional Nash game or a generalized Nash game, and the existence of a global equilibrium follows from classical results. But the equilibrium constraint in an EPEC is known for being nonconvex and for lacking the continuity properties required to apply fixed point theory. Consequently, most standard approaches fail to apply to EPECs, and there currently exists no general mathematical paradigm that could be built upon to make a theory for general EPECs. In [8], Kulkarni identified subclasses of the non-shared-constraint multi-leader multi-follower games for which the existence of equilibria can be guaranteed and showed that when the leader-level problem admits a potential function, the set of global minimizers of the potential function over the shared constraint are the equilibria of the multi-leader multi-follower game; in effect, this reduces the question of the existence of an equilibrium to that of the existence of a solution to an MPEC. This result was extended later in [9] to work for quasi-potential functions. In [17], Sun has reformulated the generalized Nash equilibrium problem into an equivalent bilevel programming problem with one leader and multiple followers. In particular, if the followers' problems are separable, then it has been shown that the generalized Nash equilibrium problem is equivalent to a bilevel programming problem having a single decision maker at both levels.
In [7], Kassa and Kassa reformulated the class of multilevel programs with a single leader and multiple followers, which consist of separable terms and parameterized common terms across all the followers, into equivalent multilevel programs having a single follower at each level. The resulting (non-convex) multilevel problem is then solved by a solution strategy, called a branch-and-bound multi-parametric programming approach, that they developed in [6]. In most of the literature reviewed above, the existence of equilibria has been established mainly for multi-leader-follower games with specific structure (such as the bilevel case or the single-leader case) and with constraint sets and/or objective functions assumed to have nice properties (such as linearity, convexity, differentiability, separability). In this paper we consider an equivalent reformulation of a multilevel multi-leader multi-follower programming problem into a multilevel single-leader multi-follower programming problem with one more level of hierarchy. Then, for some special classes of problems, the reformulated problem is transformed into an equivalent multilevel program having only a single follower at each level of the hierarchy, hence providing a solution approach for multilevel multi-leader-follower games.
2 Problem Formulation

Multilevel programs involving multiple decision makers at each level of the hierarchy are called multilevel multi-leader multi-follower (MLMLMF) programming problems. A general k-level multi-leader multi-follower programming problem involving N leaders and multiple followers at each level can be described by:

$$
\begin{aligned}
\min_{y_1^n \in Y_1^n}\; & F_1^n(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}), \quad n \in \{1,\ldots,N\} \\
\text{s.t. } & G_1^n(y_1^n, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}) \le 0 \\
& H_1(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}) \le 0 \\
& \min_{y_2^i \in Y_2^i}\; f_2^i(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}), \quad i \in \{1,\ldots,I\} \\
& \quad \text{s.t. } g_2^i(y_1^n, y_1^{-n}, y_2^i, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}) \le 0, \\
& \qquad h_2(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}) \le 0 \\
& \quad \min_{y_3^j \in Y_3^j}\; f_3^j(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}), \quad j \in \{1,\ldots,J\} \\
& \qquad \text{s.t. } g_3^j(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, \ldots, y_k^l, y_k^{-l}) \le 0, \\
& \qquad \quad h_3(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}) \le 0 \\
& \qquad \qquad \vdots \\
& \qquad \min_{y_k^l \in Y_k^l}\; f_k^l(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}), \quad l \in \{1,\ldots,L\} \\
& \qquad \quad \text{s.t. } g_k^l(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l) \le 0, \\
& \qquad \qquad h_k(y_1^n, y_1^{-n}, y_2^i, y_2^{-i}, y_3^j, y_3^{-j}, \ldots, y_k^l, y_k^{-l}) \le 0
\end{aligned} \tag{1}
$$
592
A. B. Zewde and S. M. Kassa
where y_1^n ∈ Y_1^n is the decision vector for the nth leader's optimization problem and y_1^{−n} is the vector of decision variables of all leaders except the decision variables y_1^n of the nth leader, i.e., y_1^{−n} = (y_1^1, …, y_1^{n−1}, y_1^{n+1}, …, y_1^N), where n = 1, 2, …, N. The shared constraint H_1 is the leaders' common constraint set, whereas the constraint G_1^n determines the constraints of the nth leader only. y_m^c ∈ Y_m^c is the decision vector for the cth follower at level m, and y_m^{−c} = (y_m^1, …, y_m^{c−1}, y_m^{c+1}, …, y_m^n), where c = i, j, …, l and m ∈ {2, 3, …, k}. The shared constraint h_m is the mth-level followers' common constraint set, whereas the constraint g_m^c determines the constraints of the cth follower in the mth-level optimization problem only.
3 Equivalent Formulation

In this section, we consider the equivalent reformulation of multilevel programs with multiple leaders and multiple followers into multilevel programs having a single leader and multiple followers. For the sake of clarity of presentation, the methodology is described using a bilevel program with multiple leaders and multiple followers; however, it can be extended to the general k-level case. Consider a bilevel multi-leader multi-follower (BLMLMF) programming problem involving N leaders in the upper level problem and M followers in the lower level problem, defined as:

$$
\begin{aligned}
\min_{x^i \in X^i}\; & F_i(x^i, x^{-i}, y^j, y^{-j}) \\
\text{s.t. } & G_i(x^i, y^j, y^{-j}) \le 0 \\
& H(x^i, x^{-i}, y^j, y^{-j}) \le 0 \\
& \min_{y^j \in Y^j}\; f_j(x^i, x^{-i}, y^j, y^{-j}) \\
& \quad \text{s.t. } g_j(x^i, x^{-i}, y^j) \le 0 \\
& \qquad h(x^i, x^{-i}, y^j, y^{-j}) \le 0
\end{aligned} \tag{2}
$$

Let us assume that F_i, G_i, H, h, f_j, g_j, i = 1, 2, …, N, j = 1, 2, …, M, are twice continuously differentiable functions and that the followers' constraint functions satisfy the Guignard constraint qualification, and let us define some relevant sets related to problem (2) as follows:

(i) The feasible set of problem (2) is given by:
A = {(x^i, x^{-i}, y^j, y^{-j}) : g_j(x^i, x^{-i}, y^j) ≤ 0, h(x^i, x^{-i}, y^j, y^{-j}) ≤ 0, G_i(x^i, y^j, y^{-j}) ≤ 0, H(x^i, x^{-i}, y^j, y^{-j}) ≤ 0, i = 1, …, N, j = 1, …, M}.
(ii) The feasible set of the jth follower (for any leaders' strategy x = (x^i, x^{-i})) can be defined as
A_j(x^i, x^{-i}, y^{-j}) = {y^j ∈ Y^j : g_j(x^i, x^{-i}, y^j) ≤ 0, h(x^i, x^{-i}, y^j, y^{-j}) ≤ 0}.
(iii) The Nash rational reaction set for the jth follower is defined by the set of parametric solutions,
B_j(x^i, x^{-i}, y^{-j}) = {ȳ^j ∈ Y^j : ȳ^j ∈ argmin{f_j(x^i, x^{-i}, y^j, y^{-j}) : y^j ∈ A_j(x^i, x^{-i}, y^{-j})}}, j = 1, . . . , M.
(iv) The feasible set for the ith leader is defined as
A_i(x^{-i}) = {(x^i, y^j, y^{-j}) ∈ X^i × Y^j × Y^{-j} : G_i(x^i, y^j, y^{-j}) ≤ 0, H(x^i, x^{-i}, y^j, y^{-j}) ≤ 0, g_j(x^i, x^{-i}, y^j) ≤ 0, h(x^i, x^{-i}, y^j, y^{-j}) ≤ 0, y^j ∈ B_j(x^i, x^{-i}, y^{-j}), j = 1, . . . , M}.
(v) The Nash rational reaction set for the ith leader is defined as
B_i(x^{-i}) = {(x^i, y^j, y^{-j}) ∈ X^i × Y^j × Y^{-j} : x^i ∈ argmin{F_i(x^i, x^{-i}, y^j, y^{-j}) : (x^i, y^j, y^{-j}) ∈ A_i(x^{-i})}}, i = 1, . . . , N.
(vi) The set of Nash equilibrium points (optimal solutions) of problem (2) is given by
S = {(x^i, x^{-i}, y^j, y^{-j}) : (x^i, x^{-i}, y^j, y^{-j}) ∈ A, (x^i, y^j, y^{-j}) ∈ B_i(x^{-i}), i = 1, . . . , N}.
Equivalent TLSLMF Form for BLMLMF. Now we will formulate an equivalent trilevel single-leader multi-follower (TLSLMF) programming problem for (2) and show their equivalence. Let us add an upper-level decision maker, a suppositional (or dummy) leader, to problem (2), with the corresponding decision variable z, where z = (x, y) = (x^1, x^2, . . . , x^N, y^1, y^2, . . . , y^M), and objective function 0. Then the multiple leaders in the upper-level problem of (2) become middle-level followers in the second level, the multiple followers in the lower-level problem of (2) become bottom-level followers in the third level, and we get the following TLSLMF program:

min_z 0
s.t. z = (x, y),
     min_{x^i} F_i(x^i, x^{-i}, y^j, y^{-j})
     s.t. G_i(x^i, y^j, y^{-j}) ≤ 0, i = 1, . . . , N
          H(x^i, x^{-i}, y^j, y^{-j}) ≤ 0                                  (3)
          min_{y^j} f_j(x^i, x^{-i}, y^j, y^{-j})
          s.t. g_j(x^i, x^{-i}, y^j) ≤ 0, j = 1, . . . , M
               h(x^i, x^{-i}, y^j, y^{-j}) ≤ 0.

Let us assume that each of the objective functions is convex with respect to its own decision variable for the second- and third-level followers, and that the Guignard constraint qualification holds for the followers' constraints. Moreover, related to problem (3) we shall denote
594
A. B. Zewde and S. M. Kassa
(i) the feasible set for the third-level followers' problem by Ω_3(x^i, x^{-i}, y^{-j});
(ii) the rational reaction set for the third-level followers' problem by Ψ_3(x^i, x^{-i}, y^{-j});
(iii) the feasible set for the second-level problem by Ω_2(x^{-i});
(iv) the rational reaction set for the second-level followers' problem by Ψ_2(x^{-i});
(v) the feasible set of problem (3), given by:
Φ = {(z, x^i, x^{-i}, y^j, y^{-j}) : z = (x, y), g_j(x^i, x^{-i}, y^j) ≤ 0, h(x^i, x^{-i}, y^j, y^{-j}) ≤ 0, G_i(x^i, y^j, y^{-j}) ≤ 0, H(x^i, x^{-i}, y^j, y^{-j}) ≤ 0, i = 1, . . . , N, j = 1, . . . , M};
(vi) and the inducible region of problem (3), given by:
IR = {(z, x^i, x^{-i}, y^j, y^{-j}) : (z, x^i, x^{-i}, y^j, y^{-j}) ∈ Φ, (x^i, y^j, y^{-j}) ∈ Ψ_2(x^{-i})}.
With these notations and definitions, problem (3) can be rewritten as:

min_z 0                                                                     (4)
s.t. (z, x^i, x^{-i}, y^j, y^{-j}) ∈ IR
Since every feasible point of (4) is an optimal point, the optimal set of (4) is given by
S* = IR = {(z, x^i, x^{-i}, y^j, y^{-j}) : (z, x^i, x^{-i}, y^j, y^{-j}) ∈ Φ, (x^i, y^j, y^{-j}) ∈ Ψ_2(x^{-i})}.
Having established the relations between the BLMLMF problem (2) and the TLSLMF problem (3), we now describe their equivalence with the following conclusions.

Theorem 3.1. A point (x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) is an optimal solution to (2) if and only if (z*, x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) is an optimal solution to (4).

Proof. Suppose that (x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) is an optimal solution to (2), i.e., (x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ∈ S, which implies that (x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ∈ A and (x^{*,i}, y^{*,j}, y^{*,-j}) ∈ B_i(x^{*,-i}), i = 1, . . . , N. This implies
(x^{*,i}, y^{*,j}, y^{*,-j}) ∈ Ψ_2(x^{*,-i}), g_j(x^{*,i}, x^{*,-i}, y^{*,j}) ≤ 0, h(x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ≤ 0, G_i(x^{*,i}, y^{*,j}, y^{*,-j}) ≤ 0, H(x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ≤ 0, i = 1, . . . , N, j = 1, . . . , M.
Then for any point (z*, x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) with z* = (x*, y*) and (x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ∈ S, we have

(x^{*,i}, y^{*,j}, y^{*,-j}) ∈ Ψ_2(x^{*,-i}), z* = (x*, y*), g_j(x^{*,i}, x^{*,-i}, y^{*,j}) ≤ 0, h(x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ≤ 0, G_i(x^{*,i}, y^{*,j}, y^{*,-j}) ≤ 0, H(x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ≤ 0, i = 1, . . . , N, j = 1, . . . , M.
This implies that (z ∗ , x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈ Φ and (x∗,i , y ∗,j , y ∗,−j ) ∈ Ψ2 (x∗,−i ). Therefore (z ∗ , x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈ IR = S ∗ and hence (z ∗ , x∗,i , x∗,−i , y ∗,j , y ∗,−j ) is an optimal solution to (4).
Conversely, suppose that (z*, x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) is an optimal solution to (4), i.e., (z*, x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ∈ S*. Then we have (z*, x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ∈ Φ and (x^{*,i}, y^{*,j}, y^{*,-j}) ∈ Ψ_2(x^{*,-i}). This implies the following:

(x^{*,i}, y^{*,j}, y^{*,-j}) ∈ B_i(x^{*,-i}), g_j(x^{*,i}, x^{*,-i}, y^{*,j}) ≤ 0, h(x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ≤ 0, G_i(x^{*,i}, y^{*,j}, y^{*,-j}) ≤ 0, H(x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ≤ 0, i = 1, . . . , N, j = 1, . . . , M.
This implies that (x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ∈ A and (x^{*,i}, y^{*,j}, y^{*,-j}) ∈ B_i(x^{*,-i}), i = 1, . . . , N. Therefore (x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) ∈ S, and hence (x^{*,i}, x^{*,-i}, y^{*,j}, y^{*,-j}) is an optimal solution to (2). □

Remark 1. The idea described above can be extended to any finite k-level multi-leader multi-follower programming problem. By adding an upper decision maker, problem (1) can be equivalently reformulated as a (k+1)-level MLSLMF program. As a result, the leaders in the upper-level problem of (1) become followers at the second level, and the followers in the mth-level problem of (1) become followers at the (m+1)th level, where m ∈ {2, . . . , k}.
4 Solution Approach for Special Problems
In this section we suggest an appropriate solution method for multilevel programs with multiple leaders and multiple followers at each decision level, and we introduce a pseudo-algorithmic approach to solve some classes of such problems. The basic steps of the proposed algorithm are as follows:
(1) Reformulate the given multilevel program with multiple leaders and multiple followers into an equivalent multilevel program with a single leader and multiple followers, as discussed in Sect. 3.
(2) If the resulting problem in step (1) has the property that, at all levels in the hierarchy, each follower's objective function consists of separable terms and parameterized common terms across all followers of the same level, then it can be reformulated into an equivalent multilevel program having a single follower over the hierarchy, as discussed in Ref. [7].
(3) To solve the resulting problem in step (2), we can apply the following approaches:
(i) Multi-parametric programming approach for the convex case: for multilevel programming problems having convex quadratic objective functions and affine constraints at each decision level, we can apply the multi-parametric programming (MPP) approach suggested in [5] for multilevel hierarchical optimization problems. The approach starts by rewriting the innermost optimization problem as a multi-parametric programming problem, in which the upper-level optimization variables are considered as parameters. The resulting problem can be solved globally and its solution can be substituted into the nearest upper-level optimization problem.
(ii) Branch-and-bound and MPP approach for the non-convex case: when a multilevel programming problem contains a special non-convexity formulation in the objectives at each decision level and the constraints at each level are polyhedral, we can apply the branch-and-bound multi-parametric programming approach proposed in [6]. The approach starts by convexifying the inner-level problems to underestimate them by convex functions, while the variables from upper-level problems are considered as parameters. Then, the resulting convex parametric under-estimator problem is solved using the multi-parametric programming approach.
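As a concrete illustration of the multi-parametric idea used in both approaches (a sketch, not code from [5] or [6]): a one-dimensional parametric convex QP of the kind that arises in the example of Sect. 5 can be solved in closed form for every parameter value via its KKT conditions, yielding two critical regions.

```python
# Illustrative multi-parametric step: solve min_y 0.5*y**2 + theta*y s.t. y >= 0
# in closed form for every value of the parameter theta; here theta plays the
# role of the upper-level variables (theta = x1 + x2 - 1 in the Sect. 5 example).
def y_star(theta):
    # KKT: y + theta >= 0, y >= 0, y*(y + theta) = 0  ->  two critical regions:
    # theta <= 0 gives y = -theta; theta >= 0 gives y = 0.
    return max(0.0, -theta)

def f(y, theta):
    return 0.5 * y * y + theta * y

# brute-force check of global optimality on a fine grid over y in [0, 2]
for theta in [-1.0, -0.3, 0.0, 0.7]:
    grid_best = min(f(k / 1000.0, theta) for k in range(2001))
    assert f(y_star(theta), theta) <= grid_best + 1e-9
```

The closed-form map theta -> y_star(theta) is exactly what gets substituted into the next level up.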
5 Example
Consider the following bilevel multi-leader multi-follower programming problem:

min_{x1} F_1(x1, x2, y1, y2) = (1/2)x1 − y1
min_{x2} F_2(x1, x2, y1, y2) = −(1/2)x2 − y2
s.t. 0 ≤ x1, x2 ≤ 1
     min_{y1} f_1(x1, x2, y1, y2) = y1(−1 + x1 + x2) + (1/2)y1²
     min_{y2} f_2(x1, x2, y1, y2) = y2(−1 + x1 + x2) + (1/2)y2²             (5)
     s.t. y1 ≥ 0, y2 ≥ 0

An equivalent trilevel single-leader multi-follower problem for (5) is given by:

min_z 0
s.t. z = (x, y)
     min_{x1} F_1(x1, x2, y1, y2) = (1/2)x1 − y1
     min_{x2} F_2(x1, x2, y1, y2) = −(1/2)x2 − y2
     s.t. 0 ≤ x1, x2 ≤ 1                                                    (6)
          min_{y1} f_1(x1, x2, y1, y2) = y1(−1 + x1 + x2) + (1/2)y1²
          min_{y2} f_2(x1, x2, y1, y2) = y2(−1 + x1 + x2) + (1/2)y2²
          s.t. y1 ≥ 0, y2 ≥ 0
Then (6) is transformed into the following trilevel programming problem with a single follower:

min_z 0
s.t. z = (x, y)
     min_{x1,x2} F(x1, x2, y1, y2) = (1/2)x1 − (1/2)x2 − y1 − y2
     s.t. 0 ≤ x1, x2 ≤ 1                                                    (7)
          min_{y1,y2} f(x1, x2, y) = (1/2)y1² + (1/2)y2² + y1(−1 + x1 + x2) + y2(−1 + x1 + x2)
          s.t. y1 ≥ 0, y2 ≥ 0

Then the third-level problem in (7) can be considered as an MPP problem with parameter x = (x1, x2):

min_{y1,y2} f(x1, x2, y) = (1/2)y1² + (1/2)y2² + y1(−1 + x1 + x2) + y2(−1 + x1 + x2)
s.t. 0 ≤ x1, x2 ≤ 1, y1 ≥ 0, y2 ≥ 0                                         (8)
The Lagrangian of the problem is given by L(x, y, λ) = (1/2)y1² + (1/2)y2² + y1(−1 + x1 + x2) + y2(−1 + x1 + x2), and the KKT points are given by

y1 ∂L/∂y1 = y1(y1 − 1 + x1 + x2) = 0,   ∂L/∂y1 = y1 − 1 + x1 + x2 ≥ 0,   y1 ≥ 0,
y2 ∂L/∂y2 = y2(y2 − 1 + x1 + x2) = 0,   ∂L/∂y2 = y2 − 1 + x1 + x2 ≥ 0,   y2 ≥ 0.

Therefore, the parametric solution with the corresponding critical regions is given by (Fig. 1):
Fig. 1. Critical regions for the second level problem of (7)

CR1 = { y*(x) = (1 − x1 − x2, 1 − x1 − x2)^T, x1 + x2 ≤ 1, 0 ≤ x1, x2 ≤ 1 }  and
CR2 = { y*(x) = (0, 0)^T, x1 + x2 ≥ 1, 0 ≤ x1, x2 ≤ 1 },

which can be incorporated into the second-level followers' problem of (7); after solving the resulting problems in each critical region we have the following solutions:
– In CR1, the optimal solution is (x1, x2, y1, y2) = (0, 0, 1, 1) with the corresponding second-level objective value F = −2.
– In CR2, the optimal solution is (x1, x2, y1, y2) = (0, 1, 0, 0) with the corresponding second-level objective value F = 0.
Since the objective value obtained in CR1 is better, we can take (x1, x2, y1, y2) = (0, 0, 1, 1) as the optimal solution to the second-level followers' problem of (7). Therefore, the optimal solution to the bilevel multi-leader multi-follower programming problem (5) is (x1, x2, y1, y2) = (0, 0, 1, 1) with the corresponding objective values F_1 = −1, F_2 = −1, f_1 = −0.5 and f_2 = −0.5.
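As a quick numerical sanity check of this example (not part of the original procedure), one can substitute the followers' parametric best response y1 = y2 = max(0, 1 − x1 − x2) into the merged second-level objective F = (1/2)x1 − (1/2)x2 − y1 − y2 and scan a grid over the leaders' box:

```python
def best_response(x1, x2):
    y = max(0.0, 1.0 - x1 - x2)   # parametric KKT solution of the third level
    return y, y

def F(x1, x2):
    y1, y2 = best_response(x1, x2)
    return 0.5 * x1 - 0.5 * x2 - y1 - y2

grid = [i / 100 for i in range(101)]
val, x1, x2 = min((F(a, b), a, b) for a in grid for b in grid)
y1, y2 = best_response(x1, x2)
print(x1, x2, y1, y2, val)   # -> 0.0 0.0 1.0 1.0 -2.0
```

The grid search reproduces the critical-region solution (0, 0, 1, 1) with F = −2.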
6 Conclusion
In a multilevel multi-leader-follower programming problem, the various relationships among the multiple leaders at the upper level and the multiple followers at the lower levels generate different decision processes. To support decision making in such problems, this work reformulated a class of multilevel multi-leader multi-follower programming problems, which consist of separable terms and parameterized common terms across all objective functions of the followers and leaders, into multilevel single-leader multi-follower programming problems. The reformulated problem is then transformed into an equivalent multilevel program having only a single follower at each level of the hierarchy. Finally, this single-leader hierarchical problem is solved using the solution procedures proposed in [5,7]. The proposed solution approach can solve multilevel multi-leader multi-follower problems whose objectives at all levels share common nonseparable terms, with different positive weights, and whose constraints at each level are polyhedral. However, much more research is needed in order to provide algorithmic tools that implement these procedures effectively. In this regard, we feel the topic deserves further investigation.
References
1. Başar, T., Olsder, G.: Dynamic Noncooperative Game Theory. Classics in Applied Mathematics. SIAM, Philadelphia (1999)
2. Ehrenmann, A.: Equilibrium problems with equilibrium constraints and their applications in electricity markets. Dissertation, Judge Institute of Management, Cambridge University, Cambridge, UK (2004)
3. Ehrenmann, A.: Manifolds of multi-leader Cournot equilibria. Oper. Res. Lett. 32, 121–125 (2004)
4. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer Series in Operations Research, vol. I, 1st edn. Springer, New York (2003)
5. Faísca, N.P., Saraiva, M.P., Rustem, B., Pistikopoulos, N.E.: A multi-parametric programming approach for multilevel hierarchical and decentralised optimisation problems. Comput. Manag. Sci. 6, 377–397 (2009)
6. Kassa, A.M., Kassa, S.M.: A branch-and-bound multi-parametric programming approach for general non-convex multilevel optimization with polyhedral constraints. J. Glob. Optim. 64(4), 745–764 (2016)
7. Kassa, A.M., Kassa, S.M.: Deterministic solution approach for some classes of nonlinear multilevel programs with multiple followers. J. Glob. Optim. 68(4), 729–747 (2017)
8. Kulkarni, A.A.: Generalized Nash games with shared constraints: existence, efficiency, refinement and equilibrium constraints. Ph.D. dissertation, Graduate College of the University of Illinois, Urbana, Illinois (2010)
9. Kulkarni, A.A., Shanbhag, U.V.: An existence result for hierarchical Stackelberg v/s Stackelberg games. IEEE Trans. Autom. Control 60(12), 3379–3384 (2015)
10. Leyffer, S., Munson, T.: Solving multi-leader-common-follower games. Optim. Methods Softw. 25(4), 601–623 (2010)
11. Okuguchi, K.: Expectations and Stability in Oligopoly Models. Lecture Notes in Economics and Mathematical Systems, vol. 138. Springer, Berlin (1976)
12. Pang, J.S., Fukushima, M.: Quasi-variational inequalities, generalized Nash equilibria, and multi-leader-follower games. Comput. Manag. Sci. 2(1), 21–56 (2005)
13. Pang, J.S., Fukushima, M.: Quasi-variational inequalities, generalized Nash equilibria, and multi-leader-follower games. Comput. Manag. Sci. 6, 373–375 (2009)
14. Sherali, H.D.: A multiple leader Stackelberg model and analysis. Oper. Res. 32(2), 390–404 (1984)
15. Su, C.L.: A sequential NCP algorithm for solving equilibrium problems with equilibrium constraints. Technical report, Department of Management Science and Engineering, Stanford University (2004)
16. Su, C.L.: Analysis on the forward market equilibrium model. Oper. Res. Lett. 35(1), 74–82 (2007)
17. Sun, L.: Equivalent bilevel programming form for the generalized Nash equilibrium problem. J. Math. Res. 2(1), 8–13 (2010)
A Mixture Design of Experiments Approach for Genetic Algorithm Tuning Applied to Multi-objective Optimization

Taynara Incerti de Paula1(B), Guilherme Ferreira Gomes2, José Henrique de Freitas Gomes1, and Anderson Paulo de Paiva1

1 Institute of Industrial Engineering, Federal University of Itajubá, Itajubá, Brazil
[email protected]
2 Mechanical Engineering Institute, Federal University of Itajubá, Itajubá, Brazil
Abstract. This study applies mixture design of experiments combined with process variables in order to assess the effect of the genetic algorithm parameters on the solution of a multi-objective problem with weighted objective functions. The proposed method allows defining which combination of parameters and weights should be assigned to the objective functions in order to achieve target results. A case study of a flux-cored arc welding process is presented. Four responses were optimized by using the global criterion method and three genetic algorithm parameters were analyzed. The method proved to be efficient, allowing the detection of significant interactions between the algorithm parameters and the weights for the objective functions, and also the analysis of the parameters' effect on the problem solution. The procedure also proved to be efficient for the definition of the optimal weights and parameters for the optimization of the welding process. Keywords: Genetic algorithm tuning · Mixture design of experiments · Multi-objective optimization · Global criterion method
1 Introduction
Several multi-objective optimization techniques perform the scalarization of different responses by applying weighting factors to each response in order to prioritize the most important ones, e.g., weighted sums, the global criterion method, and normal boundary intersection. A common practice among researchers is to define the multi-objective problem using one of these methods and then apply a metaheuristic as the search technique in order to find the optimal solution. When dealing with genetic algorithms (GA) as the search technique, an obstacle is the tuning of the several parameters responsible for the genetic operations, and there is no consensus in the literature on how to tune them.

Supported by PDSE-CAPES/Process No 88881.132477/2016-01.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 600–610, 2020. https://doi.org/10.1007/978-3-030-21803-4_60
An inadequate setup of these parameters can affect the performance of the algorithm, leading to unsatisfactory solutions. In order to bypass the configuration issues of the GA, many studies have proposed methods for the optimization of these parameters, which include the use of adaptive techniques, meta-heuristics, or the use of design of experiments. This study addresses not only the optimization of the GA parameters but also the optimization of the weights applied to the objective functions of a multi-objective problem (MOP) and the interactions which may exist between the weights and the parameters of the algorithm used to solve it. Therefore, this work proposes an experimental procedure that applies the design of experiments methodology, through a mixture design with process variables, in order to evaluate the influence of the genetic algorithm parameters on the results of a multi-objective optimization problem using weighted responses. By using this procedure, it is also possible to determine both the optimal weights to be used in the GCM function and the optimal parameters to be used for tuning the GA. To demonstrate the applicability of the proposed method, a case study of a flux-cored arc welding (FCAW) process is used. Four input parameters are used to configure the FCAW process and its optimization includes four responses that describe the weld bead geometry.
2 Theoretical Fundamentals

2.1 Global Criterion Method
Several methods for optimization of multiple objectives can be found in the literature [1]. For scalarization methods, the strategy is to combine individual objective functions into a single function, which becomes the global objective of the problem. In the global criterion method (GCM), the optimum solution x* is found by minimizing a pre-selected global criterion, F(x) [2]. In this study, the global criterion adopted is based on the normalization of the objectives, so they will have the same magnitude. The GCM equation is then defined as in Eq. 1, where f_i(x*) is the optimal value for the individual optimization (utopia point) of each response and f_i(x_max) is the most distant value from f_i(x*) (nadir point).

Min F(x) = Σ_{i=1}^{p} w_i [ (f_i(x*) − f_i(x)) / (f_i(x_max) − f_i(x*)) ]²          (1)
s.t.: g_j(x) ≤ 0, j = 1, 2, . . . , m
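For concreteness, the global criterion of Eq. 1 can be evaluated as follows (a minimal sketch; the function name and the utopia/nadir values are illustrative, not from the case study):

```python
# Global criterion of Eq. (1): weighted sum of squared normalized deviations
# of each response from its utopia point.
def global_criterion(f_x, utopia, nadir, weights):
    return sum(w * ((u - f) / (n - u)) ** 2
               for f, u, n, w in zip(f_x, utopia, nadir, weights))

# two responses with equal weights: the utopia point itself scores 0
print(global_criterion([1.0, 5.0], [1.0, 5.0], [3.0, 9.0], [0.5, 0.5]))  # -> 0.0
print(global_criterion([2.0, 7.0], [1.0, 5.0], [3.0, 9.0], [0.5, 0.5]))  # -> 0.25
```

Each response is normalized by its utopia-to-nadir span, so responses of very different magnitudes contribute on the same scale.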
2.2 Genetic Algorithms
Inspired by the mechanism of evolution, the genetic algorithm is based on the principles of selection and survival of the fittest, and its main premise is the idea that, by combining different pieces of information important to the problem, new and better solutions can be found [3,4]. Working with a population of solutions rather than a single solution, it is then capable of finding the global optimum for
constrained and unconstrained optimization problems as well as one or multiple objective functions [5]. The evolution process in the GA is performed by using a set of stochastic genetic operators that manipulate the genetic code [6]. The three main GA operators are selection, recombination (crossover) and mutation, which are controlled by different parameters (mainly functions and rates) that affect the proper functioning of the algorithm. The most common parameters cited in the literature are:
a. Population size - this parameter is a determining factor in the quality of the solution and the algorithm efficiency, since it specifies how many chromosomes will form a generation [7]. According to [8], the greater the population size, the greater the chance of obtaining satisfactory solutions, but a very large population may increase the algorithm search time. At the same time, if the population size is too small, it may lead the algorithm to a local optimum rather than a global one [9].
b. Selection type - it defines which chromosomes will be selected for the next generation. Different methods are mentioned in the literature, such as uniform, stochastic uniform, ranking, tournament and roulette selection, among others.
c. Crossover function - it determines how the exchange of genetic information will happen between two chromosomes. The most commonly used functions are the single-point, the double-point and the scattered crossover. When the string length is short, the single-point crossover can be very effective. But, if the string length is too large, it may be necessary to use a function capable of promoting the crossover at more points of the string.
d. Crossover rate - it refers to the percentage of the parent population that will undergo a crossover operation. Values for this parameter are always between 0 and 1, since it is a probability.
According to [10], while a high crossover rate can cause the disposal of good solutions, a very low rate can give too much attention to the parents and thus stagnate the search. Yet, according to [11], the higher the crossover rate, the more quickly new structures are introduced into the population.
e. Mutation function - as in the operations of selection and recombination, many mutation methods can be found in the literature. The most common are the uniform mutation, the Gaussian mutation and the adaptive feasible (AF) mutation. The choice of the mutation type depends largely on the constraints of the problem. The adaptive feasible mutation is indicated for solving restricted problems, while the Gaussian mutation is contraindicated for solving problems with linear constraints.
f. Mutation rate - it refers to the percentage of the population of parents that will undergo the mutation operation. Like the crossover rate, this parameter is a probability and corresponds to a value between 0 and 1. According to [10], a low level of mutation can prevent bits from converging to a single value in the entire population, while a very high rate makes the search essentially random.
g. Number of generations - it is used as an algorithm stopping criterion. If the algorithm reaches the specified maximum number of generations without arriving at an optimal point, it should terminate the search.
To avoid the problems regarding parameter tuning, some adaptive techniques have been developed, in which the parameters are adjusted during the algorithm's evolution. However, using such techniques, the problem ceases to be the configuration of the parameters and becomes their control while solving the problem [12]. For those who prefer to use conventional techniques, the choice of appropriate parameters is still a challenge. Some research has been done using different methods to identify the best parameter settings for each problem. Some studies used simple methods for comparing different combinations of parameters, like hypothesis tests [13], while others appealed to more advanced techniques such as the Meta-GA approach [14,15]. Design of experiments has also been used by some researchers to optimize GA parameters and to analyze the effects of the interactions between them [8,10,16]. Significant interactions were found in all of these studies, but none of them evaluated the possible interaction between the algorithm parameters and the weights for the objective functions, since none of the optimization problems involved scalarization methods.
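The roles of the parameters listed above (population size, crossover rate, mutation rate, number of generations) can be seen in a minimal, generic GA loop. This is an illustrative sketch, not the implementation used in this study; the selection scheme (truncation) and mutation step size are arbitrary choices.

```python
import random

def ga_minimize(f, dim, pop_size=40, crossover_rate=0.8, mutation_rate=0.05,
                generations=100, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)
        elite = pop[: pop_size // 2]                 # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            p1, p2 = rng.sample(elite, 2)
            if rng.random() < crossover_rate:        # single-point crossover
                cut = rng.randrange(1, dim) if dim > 1 else 0
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]                        # child copies one parent
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))  # Gaussian mutation
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=f)

best = ga_minimize(lambda v: sum((g - 0.5) ** 2 for g in v), dim=3)
```

Changing pop_size, crossover_rate or mutation_rate in the call directly changes the search behavior, which is exactly the kind of effect the mixture-design procedure of this paper is meant to quantify.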
2.3 Mixture Design of Experiments
A mixture design is a special type of response surface experiment in which the factors are the ingredients or components of a mixture, and the response is a function of the proportions of each ingredient [17]. The most common mixture designs are: simplex-lattice, simplex-centroid and extreme-vertices. In the simplex-lattice design the experiments are evenly distributed throughout the simplex region. A simplex-lattice for q components is associated with a polynomial model of degree m and can be referred to as a {q, m} simplex-lattice. The proportions assumed by each component take the m + 1 equally spaced values from 0 to 1, that is, x_i = 0, 1/m, 2/m, . . . , 1 [18]. Including process variables in a mixture experiment can greatly increase the scope of the experiment, and it entails setting up a design consisting of the different settings of the process variables in combination with the original mixture design [17,18]. The combination of the process variables can be done by setting up a mixture design at each point of a factorial design. The most complete form of the mixture design coupled with process variables model can be expressed as the product of the terms in the mixture component model and the terms in the process variable model. Equation 2 exemplifies a quadratic model for a mixture design for three components (x_i) combined with a full factorial design for two process variables (z_i) [17].
E(y) = Σ_{i=1}^{3} β_i x_i + Σ_{i<j}^{3} β_ij x_i x_j + Σ_{k=1}^{2} [ Σ_{i=1}^{3} α_ik x_i + Σ_{i<j}^{3} α_ijk x_i x_j ] z_k          (2)
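The {q, m} simplex-lattice underlying such models is easy to enumerate programmatically (a sketch; `simplex_lattice` is an illustrative helper name, not from the paper):

```python
from itertools import product

def simplex_lattice(q, m):
    """All {q, m} simplex-lattice points: component proportions i/m summing to 1."""
    return [tuple(c / m for c in combo)
            for combo in product(range(m + 1), repeat=q)
            if sum(combo) == m]

pts = simplex_lattice(3, 2)   # quadratic lattice for three mixture components
print(len(pts))               # -> 6 design points
```

These six points (the three pure blends and the three 50/50 binary blends) are exactly the runs needed to fit the quadratic mixture part of a model like Eq. 2.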
i 0, Bansal et al. [3] gave a ( 32 − )-approximation algorithm improving upon the natural barrier of 32 which follows from independent randomized rounding. In simplified terms, their result was obtained by an enhancement of independent randomized rounding via strong negative correlation properties. In 2017, Kalaitzis et al. [7] took a different approach and proposed to use the same elegant rounding scheme for the weighted completion time objective as devised by Shmoys and Tardos [8] for optimizing a linear function subject to makespan constraints. Their main result is a 1.21approximation algorithm for the natural special case where the weight of a job is proportional to its processing time (specifically, all jobs have the same Smith ratio), which expresses the notion that each unit of work has the same weight. For the problem Rm|ri,j | wj Cj , Skutella gave a 2-approximation algorithm in 2001 [10]. It has been a long standing open problem if one can improve upon this 2-approximation. Im and Li answered this question in the affirmative by giving a 1.8786-approximation [6]. Most of the parallel machine scheduling models assume that each machine has no capacity constraints, which means every machine can process an arbitrary number of jobs. In general, it is quite important to balance the number of jobs allocated to each single production facility in many flexible manufacturing systems, for example, in VLSI chip production. For the case of two identical parallel machines with capacity constraint, Yang, Ye and Zhang [12] presented a 1.1626-approximation algorithm which has the first non-trivial ratio to approximate this problem (m = 2) by semidefinite programming relaxation. In this paper, we extend the techniques in [9,12] to complex semidefinite programming and approximate the problem when m = 3 with performance ratio of 1.4446. The rest of the paper is organized as follows. In Sect. 
2 we introduce the approximation preserving reduction of P 3|q| wj Cj to Max-(q; q; n-2q) 3-Cut and present the translated guarantee. Then, in Sect. 3 we develop a CSDP-based approximation algorithm for Max-(q; q; n-2q) 3-Cut and present our main results.
2 Preliminaries
In this section, we show the translation of an approximation algorithm for the Max-(q; q; n-2q) 3-Cut problem into an approximation algorithm for P3|q|Σ w_j C_j. This translation was first given by Skutella [9]. We denote the third roots of unity by 1, ω and ω². Let G = (V, E) be an undirected graph with vertex set V = {1, 2, · · · , n} and non-negative weights w_ij = w_ji on the edges (i, j) ∈ E. The Max-(q; q; n-2q) 3-Cut problem is to find
Scheduling Three Identical Parallel Machines with Capacity Constraints
1091
a partition S = {S_1, S_2, S_3} of G maximizing the total weight of cut edges that satisfies the constraints n − 2q ≤ |S_i| ≤ q for i = 1, 2, 3, where n = |V|, n/3 ≤ q ≤ n and S_k = {i : y_i = ω^{k−1}}, k = 1, 2, 3. When q = n/3, it is just the Max 3-Section problem. In the Max-(q; q; n-2q) 3-Cut problem, we require a cut in which the cardinality of each part of the partition is not greater than q. We may suppose that |S_1| = x, |S_2| = y, and then |S_3| = n − x − y. Thus

|Σ_j y_j| = |x + yω + (n − x − y)ω²| = |(3x − n)/2 + (√3/2)(x + 2y − n)i| ≤ 3q − n.

Then Max-(q; q; n-2q) 3-Cut can be relaxed as follows (Mq3C):

w* := max (2/3) Σ_{i<j} w_ij (1 − Re(y_i · ȳ_j))

Each drone may travel a maximum span of E distance units per operation, where a drone operation is characterized by a triple (d, s, j) as follows: the drone d ∈ D is launched from a drone station s ∈ VS, fulfills a request at j ∈ VN, and returns to the same station from which it was launched. Figure 1 shows an example of a TSDSLP instance and potential TSP and TSDSLP solutions.
Fig. 1. A TSDSLP with a depot D, four customers VN = {1 . . . 4}, two drone stations VS = {s1 , s2 } that can accommodate two drones each, a TSP solution (middle figure) and a TSDSLP solution (right figure) in which a station is utilized for two deliveries
2.1 Minimal Makespan TSDSLP
In order to formulate the TSDSLP, we introduce the following decision variables:

τ ∈ R≥0 : continuous variable that defines the makespan.
x_ij ∈ {0, 1}, ∀i ∈ VL, j ∈ VR : equal to 1 iff arc (i, j) is part of the truck's route.
x^s_ij ∈ {0, 1}, ∀i ∈ VL, j ∈ VR, s ∈ VS : equal to 1 iff arc (i, j) is part of the truck's route to the drone station s.
y^d_sj ∈ {0, 1}, ∀s ∈ VS, j ∈ VN, d ∈ D : equal to 1 iff drone d serves request j from station s.
z_s ∈ {0, 1}, ∀s ∈ VS : equal to 1 iff drone station s is opened.
The Traveling Salesman Drone Station Location Problem
1133
Using this notation, we have the following MILP formulation of the TSDSLP:
min τ                                                                        (1)
s.t. Σ_{i∈VL} Σ_{j∈VR, j≠i} t_ij x_ij ≤ τ,                                   (2)
     Σ_{i∈VL} Σ_{j∈VR, j≠i} t_ij x^s_ij + Σ_{j∈VN} 2·t_sj·y^d_sj ≤ τ : ∀s ∈ VS, d ∈ D,   (3)
     Σ_{i∈VL, i≠j} x_ij + Σ_{s∈VS} Σ_{d∈D} y^d_sj = 1 : ∀j ∈ VN,             (4)
     Σ_{j∈VR} x_0j = Σ_{i∈VL} x_{i,n+1} = 1,                                 (5)
     Σ_{i∈VL, i≠k} x_ik − Σ_{j∈VR, j≠k} x_kj = 0 : ∀k ∈ VN ∪ VS,             (6)
     Σ_{i∈S} Σ_{j∈S, j≠i} x_ij ≤ |S| − 1 : ∀S ⊂ V, 0, n+1 ∉ S, |S| > 1,      (7)
     x^s_ij ≤ x_ij : ∀s ∈ VS, i ∈ VL, j ∈ VR, i ≠ j,                         (8)
     Σ_{j∈VR} x^s_0j = 1 : ∀s ∈ VS,                                          (9)
     Σ_{i∈VL, i≠k} x^s_ik − Σ_{j∈VR, j≠k} x^s_kj = 0 : ∀s ∈ VS, k ∈ VN ∪ VS, s ≠ k,   (10)
     Σ_{i∈VL, i≠s} x^s_is − Σ_{i∈VL, i≠s} x_is = 0 : ∀s ∈ VS,                (11)
     Σ_{i∈VL, i≠s} x_is ≥ z_s : ∀s ∈ VS,                                     (12)
     Σ_{s∈VS} z_s ≤ C,                                                       (13)
     Σ_{d∈D} Σ_{j∈VN} y^d_sj ≤ n·z_s : ∀s ∈ VS,                              (14)
     2·d_sj·y^d_sj ≤ E : ∀s ∈ VS, d ∈ D, j ∈ VN.                             (15)
In this model, the objective function (1) minimizes the makespan τ. Constraints (2) and (3) describe τ mathematically. More precisely, constraint (2) sets the time spent traveling by the truck (to serve customers and supply stations) as a lower bound on the objective value. In constraints (3), for each station s, we account for the time until the truck has reached the station and then, for each drone d located at the station, the time spent fulfilling requests. These values are summed up to define lower bounds on τ. Constraints (4) guarantee that each request j is served exactly once by either the truck or a drone. The flow of the truck is defined through constraints (5)–(6).
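To make bounds (2)–(3) concrete, the makespan of a fixed candidate solution can be evaluated directly. The coordinates, route and drone assignments below are invented for illustration only:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# assumed toy instance: the truck serves customer 2 and supplies station s1,
# where two drones serve customers 1 and 3
coords = {"depot": (0, 0), "s1": (4, 0), 1: (5, 2), 2: (8, 0), 3: (5, -2)}
truck_route = ["depot", "s1", 2, "depot"]
drone_ops = {("s1", "d1"): [1], ("s1", "d2"): [3]}   # (station, drone) -> requests

legs = list(zip(truck_route, truck_route[1:]))
truck_time = sum(dist(coords[a], coords[b]) for a, b in legs)

tau = truck_time                                      # lower bound (2)
for (s, d), reqs in drone_ops.items():
    reach = 0.0                                       # truck travel time up to station s
    for a, b in legs:
        reach += dist(coords[a], coords[b])
        if b == s:
            break
    drone_time = sum(2 * dist(coords[s], coords[j]) for j in reqs)
    tau = max(tau, reach + drone_time)                # lower bound (3)

print(round(tau, 3))   # -> 16.0
```

Here the truck's tour (length 16) dominates the drone completion times, so the makespan equals the truck time; with slower trucks or farther customers, a station's reach-plus-drone term can dominate instead.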
1134
D. Schermer et al.
More precisely, constraints (5) ensure that the truck starts and concludes its tour exactly once. For each customer or drone station, constraints (6) guarantee that the flow is preserved, i.e., the number of incoming arcs must equal the number of outgoing arcs. Moreover, constraints (7) serve as classical subtour elimination constraints, i.e., for each proper non-empty subset of vertices S (that does not contain the depot), no more than |S| − 1 arcs can be selected within this set. Constraints (8)–(11) specify the route of the truck that leads to each drone station s. To this end, constraints (8) ensure that this path must follow the path of the truck. Moreover, constraints (9) guarantee that the departure from the depot is always a part of each route to a station. Furthermore, for each vertex k that might be located in between the depot and the station, constraints (10) preserve the flow. In addition, for each station s that is visited by the truck, constraints (11) guarantee that there is exactly one arc leading to the station. As specified by constraints (12), a drone station is opened only if it is visited by the truck. Moreover, constraint (13) guarantees that at most C drone stations may be opened. Constraints (14) ensure that drone operations can only be performed at opened drone stations. Constraints (15) determine the drone stations' range of operation. Note that these constraints might be effectively handled during preprocessing. Finally, according to the definition of the decision variables, $\tau \in \mathbb{R}_{\ge 0}$ and the other decision variables are binary.

In place of constraints (7), it is possible to adapt the family of Miller-Tucker-Zemlin (MTZ) constraints, using auxiliary integer variables $u_i$, as follows [5]:

$$u_0 = 1, \qquad (16)$$

$$2 \le u_i \le n + m + 2 \quad \forall i \in V_R, \qquad (17)$$

$$u_i - u_j + 1 \le (n + m + 2)(1 - x_{ij}) \quad \forall i \in V_L,\ j \in V_R,\ i \neq j, \qquad (18)$$

$$u_i \in \mathbb{Z}_+ \quad \forall i \in V. \qquad (19)$$
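As an illustrative sketch (the helper name and the tour encoding are ours, not the paper's), the MTZ ordering variables can be derived from any feasible truck tour and checked against (16) and (18); on a tour arc, where $x_{ij} = 1$, constraint (18) collapses to $u_j \ge u_i + 1$, which is exactly why cycles avoiding the depot become infeasible:

```python
def mtz_values(tour, n, m):
    """Assign the auxiliary MTZ variable u_i = position of vertex i in the
    tour 0, v_1, ..., n+1 and check constraints (16) and (18) on tour arcs.
    n customers and m stations give n + m + 2 vertices in total."""
    u = {v: pos + 1 for pos, v in enumerate(tour)}
    big = n + m + 2
    assert u[tour[0]] == 1                   # constraint (16): u_0 = 1
    for i, j in zip(tour, tour[1:]):         # arcs with x_ij = 1
        # constraint (18) with x_ij = 1: u_i - u_j + 1 <= big * (1 - 1) = 0
        assert u[i] - u[j] + 1 <= big * (1 - 1)
    return u
```

For any tour visiting every vertex exactly once such values exist, whereas no valid assignment exists for a solution containing a subtour, so (16)–(19) replace the exponentially many constraints (7).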
2.2 Minimal Operational Cost TSDSLP
As an alternative to the makespan minimization, we might be interested in cost minimization instead. In this case, we consider only the variable cost-per-mile that might be associated with the truck and drones and the fixed cost of using a station. To this end, we might use the following objective function:

$$\min \ c_t \sum_{i \in V_L} \sum_{\substack{j \in V_R \\ j \neq i}} d_{ij} x_{ij} + c_d \sum_{s \in V_S} \sum_{d \in D} \sum_{j \in V_N} (d_{sj} + d_{js}) \, y^d_{sj} + \sum_{s \in V_S} f_s z_s \qquad (20)$$

where $f_s$ is the fixed cost of opening and using the station $s$, and the parameters $c_t, c_d \in \mathbb{R}_+$ determine the relative cost for each mile that the truck and drones are in operation. In this case, the model can remain unchanged, with the exception that it is not necessary to consider the variables $\tau$, $x^s_{ij}$ and the respective constraints associated with these variables.
3 Computational Experiments
We implemented the model (1)–(15) and solved it by the MILP solver Gurobi Optimizer 8.1.0. Throughout the solution process, the subtour elimination constraints (7) were treated as lazy constraints. More precisely, whenever the solver determines a new candidate incumbent integer-feasible solution, we examine if it contains any subtour. If no subtour is contained, we have a feasible solution. Otherwise, we calculate the set of vertices S that is implied by the shortest subtour contained in the current candidate solution. For this set S, constraint (7) is added as a cutting plane and the solver continues with its inbuilt branch-and-cut procedure. For comparative purposes, we solved also the alternative formulation of the TSDSLP in which the MTZ constraints (16)–(19), in place of (7), are used. We carried out all experiments on single compute nodes in an Intel Xeon E5-2670 CPU cluster where each node was allocated 8 GB of RAM. A time limit of 10 min was imposed on the solver. We generated the test instances according to the scheme shown in Fig. 2. More precisely, we considered a 32 × 32 km² square grid where the customer locations VN = {1, . . . , n}, n ∈ {10, 30, 50}, were generated uniformly at random. Furthermore, we assumed that the drone stations are located at the coordinates (x, y) ∈ (8, {8, 24}) ∪ (24, {8, 24}). Moreover, we considered a central depot at (x, y) = (16, 16). We investigated different cases; more precisely, the basic one follows the assumption of Murray and Chu [7], where drones have a maximum range of operation of E = 16 km. Therefore, the radius of operation associated with each station is Er = E/2 = 8 km. In order to broaden our experiments, we tested the model for two other values of Er.
Fig. 2. A visualization of the scheme according to which instances are generated
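The separation step used with the lazy constraints (extract the shortest subtour from a candidate incumbent and add the corresponding cut (7)) can be sketched in pure Python; the function name and the arc-list representation are our own illustration, not the authors' implementation:

```python
def find_shortest_subtour(arcs, depot):
    """Given the arcs selected in a candidate incumbent (each vertex has at
    most one outgoing arc), return the vertex set of the shortest cycle
    avoiding the depot vertices, or None if the candidate is subtour-free."""
    succ = dict(arcs)
    best = None
    unvisited = set(succ)
    while unvisited:
        start = unvisited.pop()
        cycle, v = [start], succ.get(start)
        # follow successors until the walk closes, dies, or merges elsewhere
        while v is not None and v != start and v in unvisited:
            unvisited.discard(v)
            cycle.append(v)
            v = succ.get(v)
        if v == start and not any(d in cycle for d in depot):
            if best is None or len(cycle) < len(best):
                best = cycle
    return set(best) if best is not None else None
```

For the returned set S, constraint (7) is then added as a cutting plane before the solver resumes branch-and-cut.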
In order to study the influence of problem parameters on the solver and the solutions, we considered their domains as follows. We tested three different possible values for C, i.e., C ∈ {1, 2, 3}, and also did experiments for three distinct numbers of identical drones that a drone station can hold, i.e., |D| ∈ {1, 2, 3}. Moreover, we let the relative velocity α be one of the values from {0.5, 1, 2, 3} and we assumed that the operational radius Er ∈ {8, 12, 16}.
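Under the scheme just described, an instance can be sampled as follows. This is a minimal sketch: `generate_instance` and the return layout are our own names, and the published instances [14] should be used for exact reproduction.

```python
import random

def generate_instance(n, seed=None):
    """Sample one instance: n customer locations uniform on the 32 x 32 km
    grid, the four fixed drone stations, and the central depot."""
    rng = random.Random(seed)
    customers = [(rng.uniform(0.0, 32.0), rng.uniform(0.0, 32.0))
                 for _ in range(n)]
    stations = [(8, 8), (8, 24), (24, 8), (24, 24)]
    depot = (16, 16)
    return depot, customers, stations
```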
For each value of n ∈ {10, 30, 50}, we generated 10 random instances, which (along with the drone stations and the location of the depot) specify the graph G (refer to [14] for the instances). Furthermore, based on our choice of parameters C, |D|, α and E, we have 3 · 3 · 4 · 3 = 108 parameter vectors. Therefore, we have a total of 30 · 108 = 3240 problems that are solved through both formulations. Table 1 contains the numerical results of our computational experiments using the two different formulations of the TSDSLP. In particular, for each number of customers n, the permitted number of stations C, and the operational radius Er, this table shows the average run-time t (in seconds) as well as the average MIP gap. Comparing the results of the two formulations, we observe that the differences on instances with n = 10 are negligible; however, on larger instances, the prohibition of subtours through lazy constraints improves the average run-times and MIP gaps significantly. Although medium-sized instances can be solved within reasonable time, the run-time depends strongly on n and the parameters.

Table 1. Influence of the instance size n, the number of permitted stations C, the radius of operation Er, and the formulation on the run-time (s.) as well as the MIP gap

            MILP (1)-(15) with lazy constraints      |  MILP (1)-(6), (8)-(19)
            Er = 8       Er = 12      Er = 16        |  Er = 8       Er = 12      Er = 16
 n   C      t     Gap    t     Gap    t     Gap      |  t     Gap    t     Gap    t     Gap
 10  1      1.1   0.0%   1.3   0.0%   1.4   0.0%     |  1.9   0.0%   1.8   0.0%   2.0   0.0%
 10  2      1.1   0.0%   1.7   0.0%   2.3   0.0%     |  1.6   0.0%   2.1   0.0%   2.9   0.0%
 10  3      1.0   0.0%   1.7   0.0%   3.1   0.0%     |  1.6   0.0%   2.0   0.0%   3.0   0.0%
 30  1     16.1   0.0%  28.0   0.0%  37.3   0.0%     | 83.2   0.0% 145.9   0.2% 198.8   0.3%
 30  2     43.9   0.0%  80.0   0.0% 178.8   0.6%     |187.6   0.9% 301.5   2.3% 423.8   5.4%
 30  3     67.2   0.0% 227.6   0.9% 331.6   3.3%     |228.9   1.6% 350.2   4.6% 442.0   8.7%
 50  1    113.5   0.0% 292.4   0.2% 379.6   1.1%     |430.8   2.1% 562.6   8.4% 591.0  13.2%
 50  2    289.1   0.5% 534.8   5.3% 583.9  11.4%     |505.4   4.9% 600.2  18.9% 600.3  24.0%
 50  3    385.5   2.7% 549.8  13.7% 591.2  23.0%     |526.6   7.2% 599.7  23.2% 597.0  28.4%
For the purpose of illustrating the benefits of utilizing the drone stations with regard to makespan reduction, we introduce the following metric, where τ* is the objective value returned by the solver and τTSP is the optimal objective value of the TSP (that does not visit or use any drone station):

$$\Delta = 100\% - \frac{\tau^*}{\tau_{\mathrm{TSP}}} \qquad (21)$$
Figure 3 highlights the numerical results. More precisely, this figure shows the average savings over the TSP, i.e., Δ, based on the number of permitted stations C, the number of drones located at each station |D|, as well as the drones’ relative velocity α and radius of operation Er . Overall, we can distinguish two cases. If the radius of operation is small (Er = 8, solid lines) and the number
of permitted stations C is fixed, the savings are nearly independent of the number of drones at each station and their velocity. In this case, the number of customers that can be served by the drones is limited (see Fig. 2). However, even a slow-moving drone can effectively serve most (or all) customers within its radius of operation. An increase in the number of drones (or their relative velocity) will in this case not improve the overall makespan. On the other hand, if the radius of operation is large (Er = 16, dashed lines), there is a significant impact of these parameters on the savings. In this case, the makespan can be reduced effectively by increasing the number of drones (or their relative velocity). Furthermore, it is worth highlighting that, in many cases, significant savings are already possible with few drones (per station) that have a relative velocity of α ∈ {0.5, 1} but a large operational range. This contrasts problems that follow the fundamental idea of the FSTSP, where drones with relatively small endurance but fast relative velocity are often preferred [1,10–13].
[Figure 3: three panels, one per number of permitted stations C ∈ {1, 2, 3}, plot the savings Δ [%] against the relative velocity α ∈ {0.5, 1, 2, 3}, with one curve per |D| ∈ {1, 2, 3}.]
Fig. 3. The savings Δ for different values of the problem parameters (averaged over all instances). Solid and dashed lines correspond to Er = 8 and Er = 16, respectively
4 Conclusion
In this work, we introduced the Traveling Salesman Drone Station Location Problem (TSDSLP), which combines the Traveling Salesman Problem and the Facility Location Problem, where the facilities are drone stations. After formulating the problem as a MILP, we presented the results of our computational experiments. According to the numerical results, using suitable drone stations can bring a significant reduction in the delivery time. Since the TSDSLP defines a new concept, the future research directions are numerous; e.g., one research idea might consist in studying the case of using multiple trucks in place of a single one. Another research direction might focus on the design of efficient solution methods. In fact, the standard solvers are able to solve
only small TSDSLP instances; hence, we might design effective heuristics, which can address large-scale instances. The research in this direction is in progress and the results will be reported in the future.
References

1. Agatz, N., Bouman, P., Schmidt, M.: Optimization approaches for the traveling salesman problem with drone. Transp. Sci. 52(4), 965–981 (2018)
2. Chauhan, D., Unnikrishnan, A., Figliozzi, M.: Maximum coverage capacitated facility location problem with range constrained drones. Transp. Res. Part C: Emerg. Technol. 1–18 (2019)
3. Dorling, K., Heinrichs, J., Messier, G.G., Magierowski, S.: Vehicle routing problems for drone delivery. IEEE Trans. Syst. Man Cybern. Syst. 47(1), 70–85 (2017)
4. Kim, S., Moon, I.: Traveling salesman problem with a drone station. IEEE Trans. Syst. Man Cybern. Syst. 49(1), 42–52 (2018)
5. Miller, C.E., Tucker, A.W., Zemlin, R.A.: Integer programming formulation of traveling salesman problems. J. ACM 7(4), 326–329 (1960)
6. Min, H., Jayaraman, V., Srivastava, R.: Combined location-routing problems: a synthesis and future research directions. Eur. J. Oper. Res. 108(1), 1–15 (1998)
7. Murray, C.C., Chu, A.G.: The flying sidekick traveling salesman problem: optimization of drone-assisted parcel delivery. Transp. Res. Part C: Emerg. Technol. 54, 86–109 (2015)
8. Nagy, G., Salhi, S.: Location-routing: issues, models and methods. Eur. J. Oper. Res. 177(2), 649–672 (2006)
9. Otto, A., Agatz, N., Campbell, J., Golden, B., Pesch, E.: Optimization approaches for civil applications of unmanned aerial vehicles (UAVs) or aerial drones: a survey. Networks 72(4), 411–458 (2018)
10. Schermer, D., Moeini, M., Wendt, O.: Algorithms for solving the vehicle routing problem with drones. In: LNCS, vol. 10751, pp. 352–361 (2018)
11. Schermer, D., Moeini, M., Wendt, O.: A variable neighborhood search algorithm for solving the vehicle routing problem with drones (Technical report), pp. 1–33. BISOR, Technische Universität Kaiserslautern (2018)
12. Schermer, D., Moeini, M., Wendt, O.: A hybrid VNS/Tabu search algorithm for solving the vehicle routing problem with drones and en route operations. Comput. Oper. Res. 109, 134–158 (2019). https://doi.org/10.1016/j.cor.2019.04.021
13. Schermer, D., Moeini, M., Wendt, O.: A matheuristic for the vehicle routing problem with drones and its variants (Technical report), pp. 1–37. BISOR, Technische Universität Kaiserslautern (2019)
14. Schermer, D., Moeini, M., Wendt, O.: Instances for the traveling salesman drone station location problem (TSDSLP) (2019). https://doi.org/10.5281/zenodo.2594795
15. Wang, X., Poikonen, S., Golden, B.: The vehicle routing problem with drones: several worst-case results. Optim. Lett. 11(4), 679–697 (2016)
Two-Machine Flow Shop with a Dynamic Storage Space and UET Operations

Joanna Berlińska¹, Alexander Kononov², and Yakov Zinder³

¹ Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland, [email protected]
² Sobolev Institute of Mathematics, Novosibirsk, Russia, [email protected]
³ School of Mathematical and Physical Sciences, University of Technology Sydney, Ultimo, Australia, [email protected]
Abstract. The paper establishes the NP-hardness in the strong sense of a two-machine flow shop scheduling problem with unit execution time (UET) operations, dynamic storage availability, job dependent storage requirements, and the objective to minimise the time required for the completion of all jobs, i.e. to minimise the makespan. Each job seizes the required storage space for the entire period from the start of its processing on the first machine till the completion of its processing on the second machine. The considered scheduling problem has several applications, including star data gathering networks and certain supply chains and manufacturing systems. The NP-hardness result is complemented by a polynomial-time approximation scheme (PTAS) and several heuristics. The presented heuristics are compared by means of computational experiments. Keywords: Two-machine flow shop · Makespan · Dynamic storage · Computational complexity · Polynomial-time approximation scheme
1 Introduction
This paper presents a proof of the NP-hardness in the strong sense and a polynomial-time approximation scheme (PTAS) for the two-machine flow shop, where the duration of each operation is one unit of time and where, in order to be processed, each job requires a certain amount of an additional resource, which will be referred to as a storage space (buffer). The storage requirement varies from job to job, and the availability of the storage space (buffer capacity) varies in time. The goal is to minimise the time needed to complete all jobs. The presented computational complexity results are complemented by several heuristics which are compared by means of computational experiments. The considered problem arises in star data gathering networks where datasets from the worker nodes are to be transferred to the base station for processing. © Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1139–1148, 2020. https://doi.org/10.1007/978-3-030-21803-4_112
1140
J. Berli´ nska et al.
Data transfer can commence only if the available memory of the base station is not less than the size of the dataset that is to be transferred. The amount of memory, occupied by a dataset, varies from worker node to worker node. Only one node can transfer data to the base station at a time, although during this process the base station can process one of the previously transferred datasets. The memory, consumed by this dataset, is released only at the completion of processing the dataset by the base station. The base station has a limited memory whose availability varies in time due to other processes. The existing publications on scheduling in the data gathering networks assume that the exact time, needed for transferring each dataset, and the exact time, required by the base station for its processing after this transfer, are known in advance (see, for example, [1–3,10]). In reality, the exact duration of transferring a dataset and the duration of its processing by the base station may be difficult to estimate, and only an upper bound may be known. In such a situation, the allocation of dataset independent time slots for transferring data and for processing a dataset by the base station, may be a more adequate option. This approach is analysed in this paper. The paper also relaxes the assumption, which is normally made in the literature (see, for example, [3]), that, during the planning horizon, the available amount of memory remains the same. Another area that is relevant to this paper is transportation and manufacturing systems where two consecutive operations use the same storage space that is allocated to a job at the beginning of its first operation and is released only at the completion of this job. For example, in supply chains, where goods are transported in containers or pallets with the consecutive use of two different types of vehicles, the unloading of one vehicle and the loading onto another normally require certain storage space. 
Although the storage requirements of different containers or pallets can vary significantly, the durations of loading and unloading by a crane practically remain the same regardless of their sizes. The considered scheduling problem can be stated as follows. The jobs, comprising the set N = {1, ..., n}, are to be processed on two machines, machine M1 and machine M2. Each job should be processed during one unit of time on machine M1 (the first operation of the job), and then during one unit of time on machine M2 (the second operation of the job). Each machine can process at most one job at a time, and each job can be processed on at most one machine at a time. If a machine starts processing a job, it continues its processing until the completion of the corresponding operation, i.e. no preemptions are allowed. A schedule σ specifies for each j ∈ N the point in time Sj(σ) when job j starts processing on machine M1 and the point in time Cj(σ) when job j completes processing on machine M2. In order to be processed, each job j requires ωj units of an additional resource. These ωj units are seized by job j during the time interval [Sj(σ), Cj(σ)). At any point in time t the available amount of the resource is specified by the function Ω(t), i.e. any schedule σ, at any point in time t, should satisfy the condition

$$\sum_{\{j:\ S_j(\sigma) \le t < C_j(\sigma)\}} \omega_j \le \Omega(t).$$

For a given ε > 0, denote $k = \lfloor n\varepsilon/2 \rfloor$ and $q = \lceil 2/\varepsilon \rceil$.
Theorem 2. For any given small ε > 0, a schedule σ such that

$$C_{\max}(\sigma) \le (1 + \varepsilon) C_{\max}(\sigma^*), \qquad (4)$$

where σ* is an optimal schedule, can be constructed in $O(q^2 n^q)$ time.

Proof. The proof is based on the idea in [6]. Assume that there are sufficiently many jobs and number them in a nondecreasing order of their storage requirements, i.e. ω1 ≤ ... ≤ ωn. For each job j, replace its storage requirement ωj by a new one (denoted αj) as follows: for each 1 ≤ e ≤ q − 1 and each (e − 1)k < j ≤ ke, let $\alpha_j = \omega_{ke}$ and, for each k(q − 1) < j ≤ n, let αj = ωn. Observe that any schedule for the problem with the new storage requirements is feasible for the problem with the original storage requirements. An optimal schedule for the new storage requirements can be constructed by dynamic programming as follows. For 1 ≤ e ≤ q, let

$$\pi(e) = \begin{cases} ke & \text{if } 1 \le e < q \\ n & \text{if } e = q \end{cases}$$

and consider (q + 1)-tuples (n1, ..., nq, i) such that (a) 0 ≤ ne ≤ k for all 1 ≤ e ≤ q − 1, and 0 ≤ nq ≤ n − k(q − 1); and (b) 1 ≤ i ≤ q and ni > 0. Each (q + 1)-tuple represents n1 + ... + nq jobs such that, for each 1 ≤ e ≤ q, this set contains ne jobs j for which αj is ωπ(e). For each (q + 1)-tuple (n1, ..., nq, i), let F(n1, ..., nq, i) be the minimal time needed for completion of all jobs corresponding to (n1, ..., nq, i), under the condition that the job with the largest completion time among these jobs is a job with the new storage requirement ωπ(i). Consequently, the optimal makespan is

$$C = \min_{1 \le i \le q} F(k, ..., k, n - (q - 1)k, i).$$

The (q + 1)-tuples satisfying the condition n1 + ... + nq = 1 will be referred to as boundary (q + 1)-tuples. Then, F(n1, ..., nq, i) = 2 for each boundary (n1, ..., nq, i). For any positive integer t, any 1 ≤ i ≤ q and any 1 ≤ e ≤ q, let

$$W_{i,e}(t) = \begin{cases} 1 & \text{if } \omega_{\pi(i)} + \omega_{\pi(e)} \le \Omega(t) \\ 2 & \text{if } \omega_{\pi(i)} + \omega_{\pi(e)} > \Omega(t) \end{cases}$$

Then, the values of F for all (q + 1)-tuples that are not boundary are computed using the following recursive equations:

$$F(n_1, ..., n_i + 1, ..., n_q, i) = \min_{\{e:\ n_e > 0\}} \left[ F(n_1, ..., n_q, e) + W_{i,e}(F(n_1, ..., n_q, e) - 1) \right].$$

The dynamic programming algorithm above constructs an optimal schedule σ in $O(q^2 n^q)$ time, and all that remains to show is that (4) holds. Let σ* be an optimal schedule for the problem with the original storage requirements. This schedule can be converted into a schedule η for the problem with the new storage requirements as follows. For each job j such that 1 ≤ j ≤ n − k, let Cj(η) = Cj+k(σ*) and, for each job j such that n − k < j, let Cj(η) = Cmax(σ*) + 2(j − n + k). Then, taking into account that n < Cmax(σ*),

$$C_{\max}(\sigma^*) \le C \le C_{\max}(\eta) \le C_{\max}(\sigma^*) + 2k \le C_{\max}(\sigma^*) + n\varepsilon \le (1 + \varepsilon) C_{\max}(\sigma^*),$$

which completes the proof.
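The grouping step of the proof, rounding each storage requirement up to the largest requirement in its group of k jobs, can be sketched as below. The function name is ours, and the floor/ceiling choices for k and q are our reading of the typeset-damaged definitions in the text:

```python
from math import ceil, floor

def round_requirements(omega, eps):
    """Return rounded requirements alpha with alpha[j] >= omega[j] for the
    sorted requirements: jobs are split into q groups of (at most) k, and
    every job takes the largest requirement of its group (Theorem 2)."""
    n = len(omega)
    w = sorted(omega)
    k = max(1, floor(n * eps / 2))   # group size, so that 2k <= n * eps
    q = ceil(2 / eps)                # number of groups
    alpha = []
    for e in range(1, q + 1):
        lo = (e - 1) * k
        hi = e * k if e < q else n   # the last group absorbs the remainder
        if lo >= n:
            break
        hi = min(hi, n)
        alpha.extend([w[hi - 1]] * (hi - lo))
    return alpha
```

Because every new requirement dominates the original one, a schedule for the rounded instance is feasible for the original instance, while only q distinct values remain, which is what makes the dynamic program polynomial for fixed ε.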
4 ILP Formulation
In this section, we formulate our problem as an integer linear program. Since we assumed that ωj ≤ Ω(t) for all j and t, the schedule length never exceeds 2n. Let T ≤ 2n be an upper bound on the optimum schedule length Cmax. As the available buffer size may change only at integer moments, for any nonnegative integer t the buffer size equals Ω(t) in the whole interval [t, t + 1). For each j = 1, . . . , n and t = 0, . . . , T − 1 we define binary variables xj,t such that xj,t = 1 if job j starts at time t, and xj,t = 0 in the opposite case. The minimum schedule length can be found by solving the following integer linear program:

$$\text{minimise } C_{\max} \qquad (5)$$

$$\sum_{j=1}^{n} \omega_j (x_{j,t} + x_{j,t-1}) \le \Omega(t) \quad \text{for } t = 1, \dots, T - 1, \qquad (6)$$

$$\sum_{j=1}^{n} x_{j,t} \le 1 \quad \text{for } t = 0, \dots, T - 1, \qquad (7)$$

$$\sum_{t=0}^{T-1} x_{j,t} = 1 \quad \text{for } j = 1, \dots, n, \qquad (8)$$

$$\sum_{t=0}^{T-1} t \, x_{j,t} + 2 \le C_{\max} \quad \text{for } j = 1, \dots, n, \qquad (9)$$

$$x_{j,t} \in \{0, 1\} \quad \text{for } j = 1, \dots, n,\ t = 0, \dots, T - 1. \qquad (10)$$
Constraints (6) guarantee that the jobs executed in interval [t, t + 1), where 1 ≤ t ≤ T − 1, fit in the buffer. Note that for t = 0 we have only one job running in interval [t, t + 1), and hence, the buffer limit is also observed. At most one job starts at time t by (7), and each job starts exactly once by (8). Inequalities (9) ensure that all jobs are completed by time Cmax .
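These constraints can be checked directly on a candidate assignment of start times; the helper below is an illustrative sketch of such a check (our own code, not the authors' implementation), assuming jobs are indexed 0, ..., n−1 and each job occupies its buffer for the two unit intervals of its processing:

```python
def is_feasible(starts, omega, Omega):
    """Check integer start times S_j: at most one job starts per time unit
    (constraints (7)-(8) via distinct starts), and the jobs running in each
    interval [t, t+1) fit in the buffer Omega(t) (constraint (6))."""
    n = len(starts)
    if len(set(starts)) != n:            # one start per slot, each job once
        return False
    for t in range(max(starts) + 2):     # the schedule ends at max start + 2
        load = sum(omega[j] for j in range(n)
                   if starts[j] <= t < starts[j] + 2)
        if load > Omega(t):              # buffer capacity in [t, t+1)
            return False
    return True
```

For example, two jobs with ω = (3, 3) started at times 0 and 1 overlap during [1, 2), so they are feasible exactly when Ω(1) ≥ 6.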
5 Heuristic Algorithms
Although the ILP formulation proposed in the previous section delivers optimum solutions for our problem, it is impractical for larger instances because of its high computational complexity. Therefore, in this section we propose heuristic algorithms. Each of them constructs a sequence in which the jobs should be processed. The jobs are started without unnecessary delay, as soon as the previous job completes on the first machine and a sufficient amount of buffer space is available. Algorithm LF constructs the job sequence using a greedy largest-fit rule. Every time unit, we start on the first machine the largest job which fits in the currently available buffer. If no such job can be found, the procedure continues after one time unit, when the buffer is released. This algorithm can be implemented to run in O(n log n) time, using a self-balancing binary search tree. However, if n is not very big, a simple O(n²) implementation that does not use advanced data structures may be more practical. Local search with neighborhoods based on job swaps proved to be a very good method for obtaining high quality solutions for flow shop scheduling with constant buffer space and non-unit operation execution times [3]. Therefore, we also analyse algorithm LFLocal that starts with a schedule generated by LF, and then applies the following local search procedure. For each pair of jobs, we check if swapping their positions in the current sequence shortens the schedule. The swap that results in the shortest schedule is executed, and the search is continued until no further improvement is possible. Algorithm Rnd constructs a random job sequence in O(n) time. This algorithm is used mainly to verify if the remaining heuristics perform well in comparison to what can be achieved without effort. Algorithm RndLocal starts with a random job sequence, and then improves it using the local search procedure described above.
Analysing this heuristic will let us know what can be achieved by local search if we start from a probably low quality solution.
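A minimal O(n²) sketch of LF under these rules (our own implementation, not the authors' code; it assumes ωj ≤ Ω(t) for all j and t, as stated earlier, so that the loop always terminates):

```python
def lf_schedule(omega, Omega):
    """Largest-fit rule: in every time unit, start on the first machine the
    largest unscheduled job that fits next to the buffer still held by the
    job started one unit earlier (each job holds its buffer for 2 units).
    Returns a dict job -> start time; the makespan is max(starts) + 2."""
    pending = sorted(range(len(omega)), key=lambda j: omega[j], reverse=True)
    starts, t, prev = {}, 0, None
    while pending:
        held = omega[prev] if prev is not None else 0
        pick = next((j for j in pending if held + omega[j] <= Omega(t)), None)
        if pick is not None:
            pending.remove(pick)
            starts[pick] = t
        prev = pick      # the job started at t occupies the buffer in [t, t+2)
        t += 1
    return starts
```

For instance, with ω = (3, 2, 1) and a constant buffer Ω(t) = 4, LF starts job 0 first, then must skip job 1 (3 + 2 > 4) in favour of job 2, and finishes with job 1.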
6 Experimental Results
In this section, we compare the quality of the obtained solutions and the computational costs of the proposed heuristics. The algorithms were implemented in C++ and run on an Intel Core i7-7820HK CPU @ 2.90 GHz with 32 GB RAM. Integer linear programs were solved using Gurobi. Due to limited space, we report here only on a small subset of the obtained results. The generated instances and solutions can be found at http://berlinska.home.amu.edu.pl/datasets/F2-UETbuffer.zip.
Fig. 1. Results for n = 100 vs. δ. (a) Average quality, (b) average execution time.
Not all analysed instances could be solved to optimality using the ILP in reasonable time. Therefore, we measure the schedule quality by the relative percentage error from the lower bound computed by Gurobi when solving the ILP under a 1 h time limit. In most cases, this limit was enough to reach an optimal solution. To illustrate this, in addition to the heuristic results, we also report on the quality of solutions returned by ILP after at most 1 h. For each analysed parameter combination, we present the average results over 30 instances. The test instances were generated as follows. In tests with n jobs, the buffer requirements ωj were chosen randomly from the range [n, 2n]. Due to such a choice of ωj range, the buffer requirements are diversified, but not very unbalanced. For a given δ, the available buffer space Ω(t) was chosen randomly from the range [maxnj=1 {ωj }, δ maxnj=1 {ωj }], for each t = 0, . . . , 2n − 1 independently. On the one hand, when δ is very small, the instances may be easy to solve because there are not many possibilities of job overlapping, and the optimum schedules are long. On the other hand, if δ is very big, there exist many pairs of jobs that fit in the buffer together, which can also make instances easier. Therefore, we tested δ ∈ {1.0, 1.1, . . . , 2.0} for n = 100. The obtained results are presented in Fig. 1. All instances with δ ∈ {1.0, 1.1, 2.0} were solved by ILP to optimality within the imposed time limit. For each of the remaining values of δ, there were some tests for which only suboptimal solutions could be found within an hour. The instances with δ ∈ [1.3, 1.6] seem the most difficult, as the average
running time of ILP for these values is above 1000 s. In this set of instances, the heuristic algorithms have the worst solution quality for δ = 1.6. Hence, in the next experiment we use δ = 1.6, in order to construct demanding instances. We analysed the performance of our heuristics for n = 10, 20, . . . , 100. The quality of solutions delivered by the respective algorithms is presented in Fig. 2a. All instances with n ≤ 40 were solved to optimality by the ILP algorithm in the 1 h time limit. In each remaining test group, there were several instances for which the optimum solution was not found within this time. Still, the largest average error of one-hour ILP, obtained for n = 90, is only 0.25%. As expected, the worst results are obtained by algorithm Rnd. Algorithm RndLocal delivers much better schedules, which shows that our local search procedure can improve a poor initial solution. On the contrary, the differences between the results delivered by LF and LFLocal are very small. For most instances, the schedules delivered by LF and LFLocal are identical, because they are local optima. The quality of results returned by all algorithms gets worse with growing n. The number of jobs has the strongest influence on algorithms Rnd and RndLocal. Its impact on LF and LFLocal is much smaller, and the changes in the quality of ILP results are barely visible. The quality of results produced by all algorithms seems to level off for n ≥ 50.
Fig. 2. Results for δ = 1.6 vs. n. (a) Average quality, (b) average execution time.
The average execution times of the algorithms are shown in Fig. 2b. Naturally, algorithms Rnd and LF, each of which generates only one job sequence, are the fastest. The impact of n on their running times is very small, because they have low computational complexity. Local search algorithms need more time, and are affected by the growth of n. RndLocal is significantly slower than LFLocal. This is caused by the fact that when we start from a random sequence, the local search can really do some work, while in the case of LFLocal, we usually have only 1 or 2 iterations of the search procedure. The ILP algorithm is the slowest, and its running time increases fast with growing n. All in all, the one-hour limited ILP returns optimum or near-optimum solutions, but at a relatively high computational cost. In our experiments with changing n, algorithm LF delivers schedules within 12% from the optimum on average. The worst results were obtained by LF for the tests with δ = 2.0 (see Fig. 1a), but the average error was still below 14%. The running time of LF is several orders of magnitude lower than that of ILP even for small instances, and the difference between them increases with the growth of n. Therefore, ILP should only be used when getting as close as possible to the optimum is more important than the algorithm's running time. For the cases when 10–15% error is acceptable, we recommend using algorithm LF.
7 Conclusions
To the authors’ knowledge, this article is the first paper attempting to explore the two-machine flow shop with a dynamic storage space and job dependent storage requirements. For the case of UET operations, the paper presents a proof of the NP-hardness in the strong sense and a polynomial-time approximation scheme, together with an integer linear program and several heuristics, characterised by the results of computational experiments. Future research should include a worst-case analysis of approximation algorithms.
References

1. Berlińska, J.: Scheduling for data gathering networks with data compression. Eur. J. Oper. Res. 246, 744–749 (2015)
2. Berlińska, J.: Scheduling data gathering with maximum lateness objective. In: Wyrzykowski, R., et al. (eds.) PPAM 2017, Part II. LNCS, vol. 10778, pp. 135–144. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78054-2_13
3. Berlińska, J.: Heuristics for scheduling data gathering with limited base station memory. Ann. Oper. Res. (2019). https://doi.org/10.1007/s10479-019-03185-3. In press
4. Błażewicz, J., Kubiak, W., Szwarcfiter, J.: Scheduling unit-time tasks on flow-shops under resource constraints. Ann. Oper. Res. 16, 255–266 (1988)
5. Błażewicz, J., Lenstra, J.K., Rinnooy Kan, A.H.G.: Scheduling subject to resource constraints: classification and complexity. Discret. Appl. Math. 5, 11–24 (1983)
6. Fernandez de la Vega, W., Lueker, G.S.: Bin packing can be solved within 1 + ε in linear time. Combinatorica 1(4), 349–355 (1981)
7. Fung, J., Zinder, Y.: Permutation schedules for a two-machine flow shop with storage. Oper. Res. Lett. 44(2), 153–157 (2016)
8. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco (1979)
9. Gu, H., Memar, J., Kononov, A., Zinder, Y.: Efficient Lagrangian heuristics for the two-stage flow shop with job dependent buffer requirements. J. Discret. Algorithms 52–53, 143–155 (2018)
10. Luo, W., Xu, Y., Gu, B., Tong, W., Goebel, R., Lin, G.: Algorithms for communication scheduling in data gathering network with data compression. Algorithmica 80(11), 3158–3176 (2018)
11. Röck, H.: Some new results in no-wait flow shop scheduling. Z. Oper. Res. 28(1), 1–16 (1984)
12. Süral, H., Kondakci, S., Erkip, N.: Scheduling unit-time tasks in renewable resource constrained flowshops. Z. Oper. Res. 36(6), 497–516 (1992)
Author Index
A
Abdallah, Lina, 228
Abu, Kuandykov, 861, 1119
Aigerim, Bolshibayeva, 861, 882
Aiman, Moldagulova, 761, 842
Akhtar, Taimoor, 672, 681
Alharbi, Mafawez, 949
Alsyouf, Imad, 1078
Altherr, Lena C., 916
Anton-Sanchez, Laura, 1013
Aoues, Younes, 991, 1001
Arana-Jiménez, Manuel, 509
Archetti, Francesco, 751
Ashimov, Abdykappar, 850
Avraamidou, Styliani, 579
Aygerim, Aitim, 842

B
Bai, Hao, 991
Ballo, Federico, 68
Barilla, David, 702
Barkalov, Konstantin, 48
Bassi, Mohamed, 547
Bednarczuk, Ewa M., 175
Bellahcene, Fatima, 279
Benammour, Faouzi Mohamed, 341
Bentobache, Mohand, 26
Berenguer, Maria Isabel, 518
Berlińska, Joanna, 1139
Bertozzi, Andrea L., 730
Bettayeb, Maamar, 1078
Bonvin, Gratien, 957
Borovskiy, Yuriy, 850
Borowska, Bożena, 537
Buchheim, Christoph, 267
Buisson, Martin, 981
Bujok, Petr, 202

C
Caillard, Simon, 1033
Candelieri, A., 751
Cao, Hung-Phi, 740, 769
Caristi, Giuseppe, 702
Cen, Xiaoli, 135
Cexus, Jean-Christophe, 1097
Cheaitou, Ali, 1078
Chen, Boyu, 1067
Climent, Laura, 617
Costa, M. Fernanda P., 16

D
Da Silva, Gabriel, 893
Dambreville, Frédéric, 1097
Dan, Pranab K, 906
de Cursi, Eduardo Souza, 3, 238, 547, 557, 567, 991
de Freitas Gomes, José Henrique, 600
de Oliveira, Welington, 957
de Paiva, Anderson Paulo, 600
de Paula, Taynara Incerti, 600
Degla, Guy, 831
Delfino, Adriano, 477
Demassey, Sophie, 957
Deussen, Jens, 78
Devendeville, Laure Brisoux, 1033
Diaz, David Alejandro Baez, 1109
Dinara, Kozhamzharova, 1119
Ding, Wei, 468
Doan, Xuan Vinh, 310
Dupin, Nicolas, 790
© Springer Nature Switzerland AG 2020 H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1149–1152, 2020. https://doi.org/10.1007/978-3-030-21803-4
E
Eddy, Foo Y. S., 247
Einšpiglová, Daniela, 202
Ellaia, Rachid, 3, 547

F
Fajemisin, Adejuyigbe, 617
Fakhari, Farnoosh, 926
Fampa, Marcia, 89, 267, 428
Fenner, Trevor, 779
Fernandes, Edite M. G. P., 16
Fernández, José, 1013
Ferrand, Pascal, 981
Foglino, Francesco, 720
Frolov, Dmitry, 779
Fuentes, Victor K., 89
Fukuba, Tomoki, 937

G
G.-Tóth, Boglárka, 1013
Galán, M. Ruiz, 518
Galuzzi, Bruno Giovanni, 751
Gamez, Domingo, 518
Gao, Runxuan, 135
Garmashov, Ilia, 398
Garralda–Guillem, A. I., 518
Gautrelet, Christophe, 567
Gawali, Deepak D., 58
Gergel, Victor, 638
Ghaderi, Seyed Farid, 926
Ghosh, Tamal, 906
Giordani, Ilaria, 751
Gobbi, Massimiliano, 68
Goldberg, Noam, 871
Gomes, Guilherme Ferreira, 600
Granvilliers, Laurent, 99

H
Haddou, Mounir, 228
Hamdan, Sadeque, 1078
Hartman, David, 119
Hennequin, Sophie, 1054, 1109
Henner, Manuel, 981
Hladík, Milan, 119
Ho, Vinh Thanh, 1054
Holdorf Lopez, Rafael, 238, 557
Homolya, Viktor, 109
Hu, Xi-Wei, 341
Huang, Yaoting, 1067
I
Imanzadeh, S., 971
Imasheva, Baktagul, 820

J
Jarno, Armelle, 971
Jemai, Zied, 1078
Jemmali, Mahdi, 949
Ji, Sai, 488
Jiang, Rujun, 145, 213
Jin, Zhong-Xiao, 1067
Jouini, Oualid, 1078

K
Kanzi, Nader, 702
Karpenko, Anatoly, 191
Kasri, Ramzi, 279
Kassa, Semu Mitiku, 589
Kaźmierczak, Anna, 128
Khalij, Leila, 567
Khenchaf, Ali, 1097
Koliechkina, Liudmyla, 355
Kononov, Alexander, 1139
Korolev, Alexei, 398
Koudi, Jean, 831
Kozinov, Evgeniy, 638
Krishnan, Ashok, 247
Kronqvist, Jan, 448
Kulanda, Duisebekova, 1119
Kulitškov, Aleksei, 365
Kumar, Deepak, 257
Kumlander, Deniss, 365, 458

L
Le Thi, Hoai An, 289, 299, 320, 893, 1054
Le, Hoai Minh, 893
Le, Thi-Hoang-Yen, 740
Lebedev, Ilya, 48
Lee, Jon, 89, 387, 438
Lefieux, Vincent, 893
Leise, Philipp, 916
Lemosse, Didier, 567, 991, 1001
Leonetti, Matteo, 720
Li, Duan, 145, 213
Li, Min, 488
Li, Yaohui, 627
Li, Zhijian, 730
Liu, Wen-Zhuo, 330
Liu, Zhengliang, 691
Lu, Kuan, 611
Lu, Wenlian, 1067
Lucet, Corinne, 1033
Lucet, Yves, 257
Lundell, Andreas, 448
Luo, Xiyang, 730

M
Ma, Ran, 1089
Martinsen, Kristian, 906
Melhim, Loai Kayed B., 949
Melo, Wendel, 428
Migot, Tangi, 228
Mikitiuk, Artur, 407
Mirkin, Boris, 779
Mishra, Priyanka, 660
Mishra, Shashi Kant, 182, 660
Mizuno, Shinji, 611
Moeini, Mahdi, 1023, 1129
Mohapatra, Ram N., 660
Mokhtari, Abdelkader, 26
Mukazhanov, Nurzhan K., 761
Mukhanov, S.B., 810
Muts, Pavlo, 498
Myradov, Bayrammyrat, 526

N
Nakispekov, Azamat, 820
Nascentes, Fábio, 238
Nascimento, Susana, 779
Nataraj, Paluri S.V., 58
Naumann, Uwe, 78
Ndiaye, Babacar Mbaye, 831
Nguyen, Duc Manh, 1097
Nguyen, Huu-Quang, 221
Nguyen, Phuong Anh, 289
Nguyen, Viet Anh, 320
Nielsen, Frank, 790
Niu, Yi-Shuai, 330, 341
Nouinou, Hajar, 1054
Nowak, Ivo, 498
Nowakowski, Andrzej, 128

O
Onalbekov, Mukhit, 850
Ortigosa, Pilar M., 1013

P
Pagnacco, Emmanuel, 547
Parra, Wilson Javier Veloz, 1001
Patil, Bhagyesh V., 58, 247
Pelz, Peter F., 916
Perego, Riccardo, 751
Phan, Anh-Cang, 740, 769
Phan, Thuong-Cang, 740, 769
Pichugina, Oksana, 355
Pistikopoulos, Efstratios N., 579
Porošin, Aleksandr, 458
Prestwich, Steven D., 417, 617
Previati, Giorgio, 68

Q
Qiu, Ke, 468

R
Raissa, Uskenbayeva, 761, 810, 842, 861, 882
Rakhmetulayeva, Sabina, 861, 882, 1119
Raupp, Fernanda, 428
Redondo, Juana L., 1013
Regis, Rommel G., 37
Rocha, Ana Maria A. C., 16
Rossi, Roberto, 417
Roy, Daniel, 1054, 1109
Ryoo, Hong Seo, 376
Ryskhan, Satybaldiyeva, 842

S
Sadeghieh, Ali, 702
Sagratella, Simone, 720
Sakharov, Maxim, 191
Salewski, Hagen, 1023
Samir, Sara, 299
Sampaio, Rubens, 238
Sarmiento, Orlando, 267
Sato, Tetsuya, 937
Schermer, Daniel, 1129
Seccia, Ruggiero, 720
Sergeev, Sergeĭ, 691
Shahi, Avanish, 182
Sheu, Ruey-Lin, 221
Shi, Jianming, 611
Shiina, Takayuki, 937
Shoemaker, Christine A., 672, 681
Sidelkovskaya, Andrey, 820
Sidelkovskiy, Ainur, 820
Simon, Nicolai, 916
Singh, Sanjeev Kumar, 182
Singh, Vinay, 649
Skipper, Daphne, 387
Speakman, Emily, 387
Subba, Mohan Bir, 649
Sun, Jian, 1089
Syga, Monika, 175

T
Taibi, S., 971
Talbi, El-Ghazali, 790
Tarim, Armagan, 417
Tavakkoli-Moghaddam, R., 926
Telli, Mohamed, 26
Tohidifard, M., 926
Tokoro, Ken-ichi, 937
Toumi, Abdelmalek, 1097
Tran, Bach, 893
Tran, Ho-Dat, 769
Treanţă, Savin, 164
Troian, Renata, 567
Trojanowski, Krzysztof, 407

U
Upadhyay, Balendu Bhooshan, 660

V
Vavasis, Stephen, 310
Vinkó, Tamás, 109
Visentin, Andrea, 417

W
Wang, Bao, 730
Wang, Shuting, 627
Wang, Wenyu, 681
Wang, Yishui, 488
Wang, Yong, 1043
Wendt, Oliver, 1129
Wu, Yizhong, 627

X
Xia, Yong, 135, 221
Xin, Jack, 730, 800
Xu, Dachuan, 488, 713, 1089
Xu, Luze, 438
Xu, Yicheng, 713
Xue, Fanghui, 800

Y
Yagouni, Mohammed, 299
Yan, Kedong, 376
Yang, Tianzhi, 135
You, Yu, 330, 341

Z
Zagdoun, Ishy, 871
Zălinescu, Constantin, 155
Zámečníková, Hana, 202
Zewde, Addis Belete, 589
Zhang, Dongmei, 488, 713
Zhang, Hongbo, 991
Zhang, Hu, 341
Zhang, Xiaoyan, 1089
Zhang, Yong, 713
Zhang, Yuanmin, 627
Zhang, Zebin, 981
Zheng, Ren, 1067
Zidani, Hafid, 3
Zidna, Ahmed, 58, 247
Zinder, Yakov, 1139
Zuldyz, Kalpeyeva, 842