Alexander Strekalovsky Yury Kochetov Tatiana Gruzdeva Andrei Orlov (Eds.)
Communications in Computer and Information Science
1476
Mathematical Optimization Theory and Operations Research: Recent Trends. 20th International Conference, MOTOR 2021, Irkutsk, Russia, July 5–10, 2021. Revised Selected Papers
Communications in Computer and Information Science
Editorial Board Members:
Joaquim Filipe, Polytechnic Institute of Setúbal, Setúbal, Portugal
Ashish Ghosh, Indian Statistical Institute, Kolkata, India
Raquel Oliveira Prates, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
Lizhu Zhou, Tsinghua University, Beijing, China
More information about this series at http://www.springer.com/series/7899
Editors:
Alexander Strekalovsky, Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
Yury Kochetov, Sobolev Institute of Mathematics SB RAS, Novosibirsk, Russia
Tatiana Gruzdeva, Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
Andrei Orlov, Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
ISSN 1865-0929 ISSN 1865-0937 (electronic) Communications in Computer and Information Science ISBN 978-3-030-86432-3 ISBN 978-3-030-86433-0 (eBook) https://doi.org/10.1007/978-3-030-86433-0 © Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume contains the refereed proceedings of the 20th International Conference on Mathematical Optimization Theory and Operations Research (MOTOR 2021)¹, held during July 5–10, 2021, at Lake Baikal, near Irkutsk, Russia.
MOTOR 2021 was the third joint scientific event unifying a number of well-known international and Russian conferences that have been held in the Urals, Siberia, and the Far East for a long time. The first two events of this series, MOTOR 2019² and MOTOR 2020³, were held in Ekaterinburg and Novosibirsk, Russia, respectively.
As per tradition, the main conference scope included, but was not limited to, mathematical programming, bi-level and global optimization, integer programming and combinatorial optimization, approximation algorithms with theoretical guarantees and approximation schemes, heuristics and meta-heuristics, game theory, optimal control, optimization in machine learning and data analysis, and their valuable applications in operations research and economics.
In response to the call for papers, MOTOR 2021 received 181 submissions. Out of 102 full papers considered for reviewing (79 abstracts and short communications were excluded for formal reasons), only 30 papers were selected by the Program Committee (PC) for publication in the first volume of proceedings (Springer, LNCS, Vol. 12755). The PC then selected 34 revised papers for publication in this volume. Each submission was reviewed by at least three PC members or invited reviewers, experts in their fields, in order to supply detailed and helpful comments.
The conference featured nine invited lectures:
– Christian Blum (Artificial Intelligence Research Institute, Spain), “On the Design of Matheuristics that make Use of Learning”
– Emilio Carrizosa (Institute of Mathematics, University of Seville, Spain), “Optimal Classification and Regression Trees”
– François Clautiaux (Université de Bordeaux, France), “Integer Programming Formulations Based on Exponentially Large Networks: Algorithms and Applications”
– Andreas Griewank (Institute of Mathematics, Humboldt University, Germany), “Potential-based Losses for Classification without Regularization”
– Klaus Jansen (Christian-Albrechts-Universität, Germany), “Integer Programming and Convolution, with Applications”
– Sergey Kabanikhin (Institute of Numerical Mathematics and Mathematical Geophysics, Russia), “Optimization and Inverse Problems”
– Nenad Mladenovic (Khalifa University, United Arab Emirates), “Minimum Sum of Squares Clustering for Big Data – Heuristic Approach”
¹ https://conference.icc.ru/event/3/
² http://motor2019.uran.ru
³ http://math.nsc.ru/conference/motor/2020/
– Claudia Sagastizábal (IMECC - University of Campinas, Brazil), “Exploiting Structure in Nonsmooth Optimization”
– Dmitri Shmelkin (Moscow Research Center of Huawei, Russia), “Novel Scenarios and Old Challenges: Discrete Optimization at Various Huawei Technologies”
– Mikhail Solodov (Institute for Pure and Applied Mathematics, Brazil), “State-of-the-art on Rates of Convergence and Cost of Iterations of Augmented Lagrangian Methods”.
The following tutorials were given by outstanding scientists:
– Alexander Krylatov (Saint-Petersburg State University, Russia), “Equilibrium Traffic Flow Assignment in a Multi-Subnet Urban Road Network”
– Alexander Strekalovsky (Matrosov Institute for System Dynamics and Control Theory, Irkutsk, Russia), “Modern Nonconvex Optimization: Theory, Methods, and Applications”.
We thank the authors for their submissions, and the members of the PC and external reviewers for their efforts in providing exhaustive reviews. We thank our sponsors and partners: the Mathematical Center in Akademgorodok, Huawei Technologies Co., Ltd., Sobolev Institute of Mathematics, Krasovsky Institute of Mathematics and Mechanics, Ural Mathematical Center, Center for Research and Education in Mathematics, Higher School of Economics (Campus Nizhny Novgorod), and Matrosov Institute for System Dynamics and Control Theory. We are grateful to the colleagues from the Springer LNCS and CCIS editorial boards for their kind and helpful support.
August 2021
Alexander Strekalovsky Yury Kochetov Tatiana Gruzdeva Andrei Orlov
Organization
Program Committee Chairs Panos Pardalos Michael Khachay Oleg Khamisov Yury Kochetov Alexander Strekalovsky
University of Florida, USA Krasovsky Institute of Mathematics and Mechanics, Russia Melentiev Energy Systems Institute, Russia Sobolev Institute of Mathematics, Russia Matrosov Institute for System Dynamics and Control Theory, Russia
Program Committee Anatoly Antipin Alexander Arguchintsev Pasquale Avella Evripidis Bampis Olga Battaïa René van Bevern Maurizio Boccia Sergiy Butenko Igor Bychkov Igor Bykadorov Tatjana Davidović Stephan Dempe Gianni Di Pillo Alexandre Dolgui Mirjam Duer Vladimir Dykhta Rentsen Enkhbat Anton Eremeev Adil Erzin Yuri Evtushenko Alexander Filatov Mikhail Falaleev Fedor Fomin Alexander Gasnikov
Dorodnicyn Computing Centre, FRC, CSC, RAS, Russia Irkutsk State University, Russia University of Sannio, Italy Sorbonne Université, France ISAE-Supaero, Toulouse, France Novosibirsk State University, Russia University of Naples Federico II, Italy Texas A&M University, USA Matrosov Institute for System Dynamics and Control Theory, Russia Sobolev Institute of Mathematics, Russia Mathematical Institute SANU, Serbia Freiberg University, Germany University of Rome “La Sapienza”, Italy IMT Atlantique, France University of Augsburg, Germany Matrosov Institute for System Dynamics and Control Theory, Russia Institute of Mathematics and Digital Technology, Mongolia Sobolev Institute of Mathematics, Russia Novosibirsk State University, Russia Dorodnicyn Computing Centre, FRC, CSC, RAS, Russia Far Eastern Federal University, Russia Irkutsk State University, Russia University of Bergen, Norway Moscow Institute of Physics and Technology, Russia
Victor Gergel Edward Gimadi Aleksander Gornov Alexander Grigoriev Feng-Jang Hwang Alexey Izmailov Milojica Jacimovic Klaus Jansen Sergey Kabanikhin Valeriy Kalyagin Vadim Kartak Alexander Kazakov Lev Kazakovtsev Andrey Kibzun Donghyun (David) Kim Igor Konnov Alexander Kononov Alexander Kruger Dmitri Kvasov Tatyana Levanova Vadim Levit Frank Lewis Leo Liberti Bertrand M. T. Lin Marko Makela Vittorio Maniezzo Pierre Marechal Vladimir Mazalov Boris Mordukhovich Yury Nikulin Ivo Nowak Evgeni Nurminski Leon Petrosyan Alex Petunin Boris Polyak Leonid Popov Mikhail Posypkin Oleg Prokopyev Artem Pyatkin Soumyendu Raha Alexander Razgulin Jie Ren
University of Nizhni Novgorod, Russia Sobolev Institute of Mathematics, Russia Matrosov Institute for System Dynamics and Control Theory, Russia Maastricht University, The Netherlands University of Technology Sydney, Australia Lomonosov Moscow State University, Russia University of Montenegro, Montenegro Kiel University, Germany Institute of Numerical Mathematics and Mathematical Geophysics, Russia Higher School of Economics, Russia Ufa State Aviation Technical University, Russia Matrosov Institute of System Dynamics and Control Theory, Russia Siberian State Aerospace University, Russia Moscow Aviation Institute, Russia Kennesaw State University, USA Kazan Federal University, Russia Sobolev Institute of Mathematics, Russia Federation University, Australia University of Calabria, Italy Dostoevsky Omsk State University, Russia Ariel University, Israel University of Texas at Arlington, USA CNRS, France National Chiao Tung University, Taiwan University of Turku, Finland University of Bologna, Italy University Paul Sabatier, France Institute of Applied Mathematical Research, Russia Wayne State University, USA University of Turku, Finland Hamburg University of Applied Sciences, Germany Far Eastern Federal University, Russia Saint Petersburg State University, Russia Ural Federal University, Russia Trapeznikov Institute of Control Science, Russia Krasovsky Institute of Mathematics and Mechanics, Russia Dorodnicyn Computing Centre, Russia University of Pittsburgh, USA Sobolev Institute of Mathematics, Russia Indian Institute of Science, India Lomonosov Moscow State University, Russia Huawei Russian Research Institute, Russia
Anna N. Rettieva Claudia Sagastizabal Yaroslav Sergeyev Natalia Shakhlevich Alexander Shananin Vladimir Shikhman Angelo Sifaleras Vladimir Skarin Vladimir Srochko Claudio Sterle Petro Stetsyuk Roman Strongin Nadia Sukhorukova Tatiana Tchemisova Alexander Tolstonogov Ider Tseveendorj Vladimir Ushakov Olga Vasilieva Alexander Vasin Vitaly Zhadan Dong Zhang Anatoly Zhigljavsky Yakov Zinder
Institute of Applied Mathematical Research, Russia Unicamp, Brazil University of Calabria, Italy University of Leeds, UK Moscow Institute of Physics and Technology, Russia Catholic University of Louvain, Belgium University of Macedonia, Greece Krasovsky Institute of Mathematics and Mechanics, Russia Irkutsk State University, Russia University of Naples Federico II, Italy Glushkov Institute of Cybernetics, Ukraine University of Nizhni Novgorod, Russia Swinburne University of Technology, Australia University of Aveiro, Portugal Matrosov Institute for System Dynamics and Control Theory, Russia University of Versailles, France Krasovsky Institute of Mathematics and Mechanics, Russia Universidad del Valle, Colombia Lomonosov Moscow State University, Russia Dorodnitsyn Computing Centre, Russia Huawei Technologies, Co., Ltd., China Cardiff University, UK University of Technology Sydney, Australia
Additional Reviewers Abbasov, Majid Berikov, Vladimir Berndt, Sebastian Brinkop, Hauke Buchem, Moritz Buldaev, Alexander Buzdalov, Maxim Chernykh, Ilya Dang, Duc-Cuong Davydov, Ivan Deineko, Vladimir Deppert, Max Gluschenko, Konstantin Golak, Julian Gonen, Rica Grage, Kilian
Gromova, Ekaterina Iljev, Victor Jaksic Kruger, Tatjana Khachay, Daniel Khoroshilova, Elena Khutoretskii, Alexandr Kononova, Polina Kovalenko, Yulia Kulachenko, Igor Kumacheva, Suriya Kuzyutin, Denis Lassota, Alexandra Lavlinskii, Sergey Lee, Hunmin Lempert, Anna Melnikov, Andrey
Morshinin, Alexander Neznakhina, Ekaterina Ogorodnikov, Yuri Orlov, Andrei Pinyagina, Olga Plotnikov, Roman Plyasunov, Alexander Rohwedder, Lars Sandomirskaya, Marina Semenov, Alexander Servakh, Vladimir Sevastyanov, Sergey Shenmaier, Vladimir Shkaberina, Guzel Simanchev, Ruslan Srochko, Vladimir Stanimirovic, Zorica
Stanovov, Vladimir Staritsyn, Maxim Sukhoroslov, Oleg Tovbis, Elena Tsidulko, Oxana Tsoy, Yury Tur, Anna Tyunin, Nikolay Urazova, Inna Urosevic, Dragan van Lent, Freija Vasin, Alexandr Veremchuk, Natalia Yanovskaya, Elena Zalyubovskiy, Vyacheslav Zolotykh, Nikolai
Industry Section Chair Vasilyev Igor
Matrosov Institute for System Dynamics and Control Theory, Russia
Organizing Committee Chair Alexander Kazakov
ISDCT SB RAS, Russia
Deputy Chair Andrei Orlov
ISDCT SB RAS, Russia
Scientific Secretary Tatiana Gruzdeva Vladimir Antonik Maria Barkova Oleg Khamisov Stepan Kochemazov Polina Kononova Alexey Kumachev Pavel Kuznetsov Anna Lempert Timur Medvedev Taras Madzhara Nadezhda Maltugueva
ISDCT SB RAS, Russia IMIT ISU, Russia ISDCT SB RAS, Russia ESI SB RAS, Russia ISDCT SB RAS, Russia IM SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia HSE, Nizhny Novgorod, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia
Ilya Minarchenko Ekaterina Neznakhina Yuri Ogorodnikov Nikolay Pogodaev Stepan Sorokin Pavel Sorokovikov Maxim Staritsyn Alexander Stolbov Anton Ushakov Igor Vasiliev Tatiana Zarodnyuk Maxim Zharkov
ESI SB RAS, Russia IMM UB RAS, Russia IMM UB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia ISDCT SB RAS, Russia
Organizers Matrosov Institute for System Dynamics and Control Theory, Russia Sobolev Institute of Mathematics, Russia Krasovsky Institute of Mathematics and Mechanics, Russia Higher School of Economics (Campus Nizhny Novgorod), Russia
Sponsors Center for Research and Education in Mathematics, Russia Huawei Technologies Co., Ltd. Mathematical Center in Akademgorodok, Russia Ural Mathematical Center, Russia
Contents
Continuous Optimization Optimal (in the Sense of the Minimum of the Polyhedral Norm) Matrix Correction of Inconsistent Systems of Linear Algebraic Equations and Improper Linear Programming Problems in Interval Constraints . . . . . . . Vladimir Erokhin, Alexander Krasnikov, Vladimir Volkov, and Mikhail Khvostov Solving Smooth Min-Min and Min-Max Problems by Mixed Oracle Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Egor Gladin, Abdurakhmon Sadiev, Alexander Gasnikov, Pavel Dvurechensky, Aleksandr Beznosikov, and Mohammad Alkousa A Subgradient Projection Method for Set-Valued Network Equilibrium Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Igor Konnov and Olga Pinyagina
3
19
41
Non-convex Optimization in Digital Pre-distortion of the Signal . . . . . . . . . . Alexander Maslovskiy, Dmitry Pasechnyuk, Alexander Gasnikov, Anton Anikin, Alexander Rogozin, Alexander Gornov, Lev Antonov, Roman Vlasov, Anna Nikolaeva, and Maria Begicheva
54
Zeroth-Order Algorithms for Smooth Saddle-Point Problems . . . . . . . . . . . . Abdurakhmon Sadiev, Aleksandr Beznosikov, Pavel Dvurechensky, and Alexander Gasnikov
71
Algorithms for Solving Variational Inequalities and Saddle Point Problems with Some Generalizations of Lipschitz Property for Operators . . . . . . . . . . . Alexander A. Titov, Fedor S. Stonyakin, Mohammad S. Alkousa, and Alexander V. Gasnikov
86
Application of Smooth Approximation in Stochastic Optimization Problems with a Polyhedral Loss Function and Probability Criterion . . . . . . . . . . . . . . Roman Torishnyi and Vitaliy Sobol
102
An Acceleration of Decentralized SGD Under General Assumptions with Low Stochastic Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ekaterina Trimbach and Alexander Rogozin
117
Integer Programming and Combinatorial Optimization A Feature Based Solution Approach for the Flying Sidekick Traveling Salesman Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maurizio Boccia, Andrea Mancuso, Adriano Masone, and Claudio Sterle Maximizing the Minimum Processor Load with Linear Externalities . . . . . . . Julia V. Chirkova
131
147
Analysis of Optimal Solutions to the Problem of a Single Machine with Preemption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . K. A. Chernykh and V. V. Servakh
163
Solving Irregular Polyomino Tiling Problem Using Simulated Annealing and Integer Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Aigul I. Fabarisova and Vadim M. Kartak
175
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover for Continuous p-Median Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lev Kazakovtsev, Ivan Rozhnov, Ilnar Nasyrov, and Viktor Orlov
184
Continuous Reformulation of Binary Variables, Revisited . . . . . . . . . . . . . . Leo Liberti
201
An Iterative ILP Approach for Constructing a Hamiltonian Decomposition of a Regular Multigraph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andrey Kostenko and Andrei Nikolaev
216
The Constrained Knapsack Problem: Models and the Polyhedral-Ellipsoid Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Oksana Pichugina and Liudmyla Koliechkina
233
NP-Hardness of 1-Mean and 1-Medoid 2-Clustering Problem with Arbitrary Clusters Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Artem V. Pyatkin
248
The Polytope of Schedules of Processing of Identical Requirements: The Properties of the Relaxation Polyhedron . . . . . . . . . . . . . . . . . . . . . . . R. Yu. Simanchev and I. V. Urazova
257
A Heuristic Approach in Solving the Optimal Seating Chart Problem . . . . . . Milan Tomić and Dragan Urošević
271
Fast Heuristic Algorithms for the Multiple Strip Packing Problem . . . . . . . . . Igor Vasilyev, Anton V. Ushakov, Maria V. Barkova, Dong Zhang, Jie Ren, and Juan Chen
284
Operational Research Applications The Research of Mathematical Models for Forecasting Covid-19 Cases . . . . . Mostafa Salaheldin Abdelsalam Abotaleb and Tatiana Makarovskikh Detecting Corruption in Single-Bidder Auctions via Positive-Unlabelled Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Natalya Goryunova, Artem Baklanov, and Egor Ianovski On the Speed-in-Action Problem for the Class of Linear Non-stationary Infinite-Dimensional Discrete-Time Systems with Bounded Control and Degenerate Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Danis N. Ibragimov and Nikita M. Novozhilkin Method for Calculating the Air Pollution Emission Quotas . . . . . . . . . . . . . . Vladimir Krutikov, Anatoly Bykov, Elena Tovbis, and Lev Kazakovtsev Bilevel Models for Socially Oriented Strategic Planning in the Natural Resources Sector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sergey Lavlinskii, Artem Panin, and Alexander Plyasunov Strong Stability in Finite Games with Perturbed Payoffs . . . . . . . . . . . . . . . Yury Nikulin and Vladimir Emelichev Inverse Optimal Control with Continuous Updating for a Steering Behavior Model with Reference Trajectory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ildus Kuchkarov, German Mitiai, Ovanes Petrosian, Timur Lepikhin, Jairo Inga, and Sören Hohmann
301
316
327 342
358 372
387
Dynamic Cooperative Games on Networks. . . . . . . . . . . . . . . . . . . . . . . . . Leon Petrosyan, David Yeung, and Yaroslavna Pankratova
403
Consumer Loan Demand Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. A. Shananin, M. V. Tarasenko, and N. V. Trusov
417
An Industry Maintenance Planning Optimization Problem Using CMA-VNS and Its Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anna Zholobova, Yefim Zholobov, Ivan Polyakov, Ovanes Petrosian, and Tatyana Vlasova Numerical Solution of the Inverse Problem for Diffusion-Logistic Model Arising in Online Social Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Olga Krivorotko, Tatiana Zvonareva, and Nikolay Zyatkov
429
444
Optimal Control On One Approach to the Optimization of Discrete-Continuous Controlled Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alexander Buldaev
463
On One Optimization Problem for the Age Structure of Power Plants Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Evgeniia Markova and Inna Sidler
478
Valid Implementation of the Fractional Order Model of Energy Supply-Demand System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Samad Noeiaghdam and Denis Sidorov
493
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
505
Continuous Optimization
Optimal (in the Sense of the Minimum of the Polyhedral Norm) Matrix Correction of Inconsistent Systems of Linear Algebraic Equations and Improper Linear Programming Problems in Interval Constraints
Vladimir Erokhin1(B), Alexander Krasnikov2, Vladimir Volkov3, and Mikhail Khvostov3
1 Mozhaisky Military Space Academy, 13 Zhdanovskaya Street, 197198 St. Petersburg, Russia [email protected]
2 Moscow Polytechnic University, 38 Bolshaya Semyonovskaya Street, 107023 Moscow, Russia
3 Borisoglebsk Branch of Voronezh State University, 43 Narodnaya Street, 397160 Borisoglebsk, Russia {volkov,hvostoff}@fizmat.net
Abstract. This paper considers problems of multiparameter (matrix) correction of inconsistent systems of linear algebraic equations and of improper linear programming problems written in canonical form. The optimality criteria used in these problems are a weighted minimax criterion and the minimum of a weighted sum of the absolute values of the elements of the correction matrix. The correction problems are supplemented with interval constraints imposed on the corrected elements of the system matrix and of its right-hand side. The systems of linear algebraic equations under study may contain fixed (not subject to correction) rows and columns. Computational schemes for the solution of the stated matrix correction problems are given and substantiated. These schemes require the minimization of a scalar positive parameter on the upper (external) level and reduce, on the lower (internal) level, to the solution of systems of linear algebraic equations and inequalities, as well as linear programming problems, that depend on this parameter.
Keywords: Inconsistent linear systems · Improper linear programming problems · Matrix correction · Polyhedral norms · Weighted minimax criterion · Interval constraints
1 Introduction
Consider a system of linear algebraic equations of the following type
$$\begin{pmatrix} A & S \\ T & U \end{pmatrix} \begin{pmatrix} x_A \\ x_S \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix}, \qquad (1)$$
where $A = (a_{ij}) \in \mathbb{R}^{m\times n}$, $S \in \mathbb{R}^{m\times k}$, $T \in \mathbb{R}^{l\times n}$, $U \in \mathbb{R}^{l\times k}$, $x_A \in \mathbb{R}^n$, $x_S \in \mathbb{R}^k$, $b = (b_i) \in \mathbb{R}^m$, $d \in \mathbb{R}^l$. In some specific cases, which will be shown further, the system (1) will be supplemented by the condition
$$x_A \geq 0. \qquad (2)$$
The set of solutions of the system (1) is denoted as X(A, S, T, U, b, d), of the system (1)–(2) as X+ (A, S, T, U, b, d). Subsystem T xA + U xS = d of the system (1) will be considered as compatible. Note that the system (1)–(2) can be interpreted as a constraint system of some linear programming problem written (under additional condition xS 0) in a canonical form. The most important assumptions of this paper are based on the fact that X(A, S, T, U, b, d) = ∅, X+ (A, S, T, U, b, d) = ∅, but matrix A and, probably, vector b can be (and should be) corrected. In other words, the task of the study
is to find such matrix A = (Aij ) ∈ Rm×n and such vector b = ( b i ) ∈ Rm
that X(A, S, T, U, b, d) = ∅, X(A, S, T, U, b , d) = ∅, X+ (A, S, T, U, b, d) = ∅,
X+ (A, S, T, U, b , d) = ∅. Thus, the object of analysis of this article is devoted to inconsistent systems of linear algebraic equations and improper linear programming problems with an incompatible system of constraints. Another possible object of research is the actual linear programming problem, in which the optimal value of the objective function does not reach the a priori directive value. For example, let the optimal criterion be (3) c A xA + cS xS → max, where cA ∈ Rn , cS ∈ Rk are any vectors, γ ∗ is the optimal value of the criterion (3), γ is a known a priori directive value of the criterion (3), but γ ∗ < γ. Having included the equation c A xA + cS xS = γ into subsystem T xA + U xS = d we again obtain an inconsistent system of the linear equations and inequalities of type (1)–(2). Note that the block form of the system (1), in fact, means that some columns and (or) rows of matrix (or extended matrix) of the system (1) are not allowed to be corrected. But in this article, the number of such rows and columns is not strictly stipulated, which allows in some special cases to consider systems (1) and (1)–(2) having only fixed rows or columns or no fixed elements at all. It is obvious that the natural requirement arising in the applied problems
is the requirement of the “proximity” of the matrix A to matrix A and vector
b to vector b. The above requirement can be formalized in different ways. We consider the following criteria:
$$\max_{i,j}\,\{\alpha_{ij}\,|\tilde a_{ij} - a_{ij}|\} \to \inf, \qquad (4)$$

$$\max_{i,j}\,\{\alpha_{ij}\,|\tilde a_{ij} - a_{ij}|,\ \beta_i\,|\tilde b_i - b_i|\} \to \inf, \qquad (5)$$

$$\sum_{i,j}\sigma_i\,\tau_j\,|\tilde a_{ij} - a_{ij}| \to \inf, \qquad (6)$$
where αij > 0, βi > 0, σi > 0, τj > 0 are some weight coefficients. In addition to criteria (4)–(6), we also consider interval constraints of the type
$$\underline{A} \leq \tilde A \leq \overline{A}, \qquad \underline{b} \leq \tilde b \leq \overline{b},$$
where $\underline{A} = (\underline{a}_{ij}) \in \mathbb{R}^{m\times n}$, $\overline{A} = (\overline{a}_{ij}) \in \mathbb{R}^{m\times n}$, $\underline{b} = (\underline{b}_i) \in \mathbb{R}^m$, $\overline{b} = (\overline{b}_i) \in \mathbb{R}^m$, and the natural conditions $\underline{A} \leq \overline{A}$ and $\underline{b} \leq \overline{b}$ are satisfied. Note that criteria (4)–(6) can be rewritten in terms of polyhedral generalized matrix norms. Recall (see, for example, [1]) that for an arbitrary matrix $A \in \mathbb{R}^{m\times n}$ the norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are defined as
$$\|A\|_1 = \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|, \qquad (7)$$

$$\|A\|_\infty = \max_{i=1,\dots,m;\ j=1,\dots,n} |a_{ij}|. \qquad (8)$$
Subject to formulas (7)–(8),
$$\max_{i,j}\,\{\alpha_{ij}\,|\tilde a_{ij} - a_{ij}|\} = \|\mathcal{A} \circ (\tilde A - A)\|_\infty, \qquad (9)$$

$$\max_{i,j}\,\{\alpha_{ij}\,|\tilde a_{ij} - a_{ij}|,\ \beta_i\,|\tilde b_i - b_i|\} = \big\|\,[\,\mathcal{A} \circ (\tilde A - A)\;\;\beta \circ (\tilde b - b)\,]\,\big\|_\infty, \qquad (10)$$

$$\sum_{i,j}\sigma_i\,\tau_j\,|\tilde a_{ij} - a_{ij}| = \|\operatorname{diag}(\sigma)\,(\tilde A - A)\,\operatorname{diag}(\tau)\|_1, \qquad (11)$$
where $\mathcal{A} = (\alpha_{ij})$, $\beta = (\beta_i)$, $\sigma = (\sigma_i)$, $\tau = (\tau_j)$, and the symbol “$\circ$” denotes the Hadamard (entrywise) matrix product. Before turning to the main material, it is necessary to say a few words about the background of the problems discussed. Historically, great interest in matrix correction of inconsistent systems of linear algebraic equations in the sense of the minimum Euclidean norm was shown earlier than in problems of matrix correction in polyhedral norms. The above problem is closely related to the so-called Total Least Squares (TLS) method. Today it
is a widespread and rapidly developing scientific field. The number of publications devoted to the study of this scientific field is very high. We only mention S. Van Haffel and J. Vandewalle’s classic monograph [2]. The usage of polyhedral norms in the matrix approximation problems is not a popular research topic such as TLS. Here, in this context, it is necessary to mention the research works of G.A. Watson [3,4], in which the mentioned problems are solved approximately. Monograph by I.I. Eremin, Vl. D. Mazurov, N.N. Astafiev [5] (the first systematized approach to the problem of correcting improper linear programming problems), research by A.A. Vatolin [6–8] (matrix correction of systems of linear equations, inequalities and linear programming problems in various norms, including polyhedral norms), V.A. Gorelik’s research paper [9] (systematized formulation of matrix correction of improper linear programming problems in the Euclidean norm) and the research from [10–15] (matrix correction of systems of linear equations and linear programming problems in ·α,β -norms) are the main scientific works used by the authors to create methods for correcting improper linear programming problems and inconsistent linear systems.
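For experimentation, criteria (4)–(6) and the identities (9)–(11) translate directly into a few lines of NumPy. The following is an illustrative sketch, not part of the paper; all array names are hypothetical.

```python
import numpy as np

def correction_criteria(A, A_tilde, b, b_tilde, alpha, beta, sigma, tau):
    """Values of the weighted criteria (4)-(6), computed via the identities (9)-(11)."""
    D = A_tilde - A                                         # correction of the matrix
    c4 = np.max(alpha * np.abs(D))                          # (4) = (9): weighted max-norm of D
    c5 = max(c4, np.max(beta * np.abs(b_tilde - b)))        # (5) = (10): matrix and right-hand side
    c6 = np.sum(np.abs(sigma[:, None] * D * tau[None, :]))  # (6) = (11): weighted l1-norm of D
    return c4, c5, c6
```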
2 A Set of Auxiliary Formulas and Theorems (“Tools”)
The results presented in this paper are based on several useful auxiliary relations, which are discussed in this section.

Theorem 1. Let the system $Ax = b$, $x \geq 0$, where $A \in \mathbb{R}^{m\times n}$, $x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, be inconsistent. Then for the consistency of the system
$$\tilde A x = b, \quad \underline{A} \leq \tilde A \leq \overline{A}, \quad x \geq 0, \qquad (12)$$
where $\underline{A}, \overline{A} \in \mathbb{R}^{m\times n}$ have rows $\underline{a}_1, \dots, \underline{a}_m$ and $\overline{a}_1, \dots, \overline{a}_m$, respectively, and $\underline{A} \leq \overline{A}$, it is necessary and sufficient that the system of linear inequalities
$$\underline{A}\,x \leq b \leq \overline{A}\,x, \quad x \geq 0 \qquad (13)$$
be consistent. If the system (13) is consistent and the vector $x$ is its solution, then the matrix $\tilde A$, which is a solution to the system (12), can be constructed by the formula
$$\tilde A = \underline{A} + \operatorname{diag}(g)\,(\overline{A} - \underline{A}), \qquad (14)$$
where $g = (g_i) \in \mathbb{R}^m$,
$$g_i = \begin{cases} (b_i - \underline{a}_i x)\,/\,(\overline{a}_i - \underline{a}_i)\,x, & \text{if } (\overline{a}_i - \underline{a}_i)\,x \neq 0, \\ \text{any number } \gamma \in [0,1], & \text{otherwise.} \end{cases} \qquad (15)$$
Proof. Sufficiency. Let the system of linear inequalities (13) be consistent, x 0 be some of its solutions, and matrix A˜ be constructed by the formulas (14)–(15). Let’s show that vector x and matrix A˜ are also the solutions of the system (12). Indeed, taking into consideration (14), ⎤ ˜b1 . ⎥ ˜ = Ax + diag(g) · A − A x = ˜b = ⎢ Ax ⎣ .. ⎦ , ˜bm ⎡
where ˜bi = a x + gi (ai − a )x, i = 1, 2, ..., m. i i Let condition (ai − ai )x = 0 be held for some i. Then, according to (15), ˜bi ≡ bi . Suppose now the opposite relation: let (ai − a )x = 0 for some i. Using i this inequality and the inequality ai x bi ai x, which is true due to the consistency of system (13) after simple calculations, we can say with confidence that ai x = ai x = bi . Obviously, according to (15), ˜bi ≡ bi . Thus, we have shown ˜ ≡ b is satisfied. At the same time, we can show that that the condition Ax 0 gi 1 ∀i = 1, 2, ..., m.
(16)
If (ai − ai )x = 0, then (16) follows directly from (15). Otherwise, it is necessary to additionally use the inequality ai x bi ai x. But using (14) and (16), we are sure that the inequality A A˜ A is true. Necessity. Suppose that system (12) is consistent, the vector x and the matrix A˜ are some of its solutions, but the system of linear inequalities (13) is inconsistent. Let us show that such assumptions lead to a contradiction. Indeed, multiplying ˜ A on the right by the vector x, as follows from the consistent the matrices A, A, system (12), we obtain the system of linear inequalities (13), which, according to our assumption, is incompatible (hence a contradiction). Definition 1 [1]. Let x, y ∈ Rn be any vectors, ϕ(x) be any vector norm. The function y x ∗ , ϕ (y) = max x=0 ϕ(x) is called the norm, dual to the norm ϕ(·) relative to the scalar product. Definition 2 [1]. A vector y ∈ Rn satisfying the requirement y x = ϕ∗ (y) · ϕ(x) = 1 for any vector x = 0, x ∈ Rn is called dual to the vector x with respect to the norm ϕ(·). It is very important that elements introduced by Definitions 1 and 2 do always exist [1].
Problem 1. Let x, τ ∈ Rn and b, σ ∈ Rm be some given vectors, x = 0, σ, τ > 0. It is necessary to find a matrix A ∈ Rm×n that will serve as a solution of the equation Ax = b and the value diag(σ) · A · diag(τ )1 must be minimal. Theorem 2. The solution to Problem 1 exists for any vectors x, τ ∈ Rn , b, σ ∈ Rm , x = 0, τ, σ > 0, and the following equalities are true diag(σ) · b1 , min diag(σ) · A · diag(τ )1 = diag−1 (τ ) · x Ax=b ∞ ˆ A = by ∈ Argmin diag(σ) · A · diag(τ ) ,
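Theorem 2 below gives a closed-form answer to Problem 1. As a preview, the following numerical sketch (ours, not from the paper; NumPy-based, all names hypothetical) constructs the rank-one minimizer $\hat A = b\,y^{\top}$ described there and checks the optimal-value formula (17):

```python
import numpy as np

def rank_one_minimizer(x, b, tau):
    """Rank-one matrix A_hat = b y^T with A_hat x = b; y is built as in Theorem 2 below.

    The minimizer does not depend on sigma; the weights sigma only enter the optimal value.
    """
    z = x / tau                            # z = diag(tau)^{-1} x
    k = np.argmax(np.abs(z))               # index attaining ||z||_inf
    w = np.zeros_like(x, dtype=float)
    w[k] = np.sign(z[k]) / np.abs(z[k])    # dual to z w.r.t. inf-norm: w^T z = 1, ||w||_1 = 1/||z||_inf
    y = w / tau                            # y = diag(tau)^{-1} w satisfies (19)-(20)
    return np.outer(b, y)

rng = np.random.default_rng(1)
m, n = 4, 5
x, b = rng.normal(size=n), rng.normal(size=m)
sigma, tau = rng.uniform(0.5, 2.0, size=m), rng.uniform(0.5, 2.0, size=n)

A_hat = rank_one_minimizer(x, b, tau)
weighted_l1 = np.sum(np.abs(sigma[:, None] * A_hat * tau[None, :]))
bound = np.sum(sigma * np.abs(b)) / np.max(np.abs(x / tau))   # right-hand side of (17)
assert np.allclose(A_hat @ x, b)
assert np.isclose(weighted_l1, bound)
```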
(17) (18)
1
Ax=b
where y ∈ Rn is a vector that meets the conditions y x = 1,
(19)
1 . diag(τ ) · y1 = diag−1 (τ ) · x ∞
(20)
Proof. First of all, let’s show that if the matrix A be the solution of the equation Ax = b with x = 0, then the following inequality is true diag(σ) · b1 . diag(σ) · A · diag(τ )1 diag−1 (τ ) · x ∞
(21)
Indeed, let matrix A is the solution of the equation Ax = b with x = 0. Then Ax ≡ b ⇔ (diag(σ) · A · diag(τ )) · diag−1 (τ ) · x ≡ (diag(σ) · b) ⇒ ⇒ diag(σ) · b = (diag(σ) · A · diag(τ )) · diag−1 (τ ) · x . 1
1
But as it follows from definitions of ·1 , ·∞ and ·1 norms, (diag(σ)A diag(τ )) diag−1 (τ )x diag(σ)A diag(τ ) diag−1 (τ )x , 1 1 ∞ hence we receive the inequality (21). Now let’s show that the lower bound of the expression diag(σ) · A· diag(τ )1 obtained in the formula (21) is true for matrix Aˆ from (18). First, we prove that the matrix Aˆ exists. For this purpose, let’s show that vector y exists and meets the equalities (19)–(20). Indeed, let diag−1 (τ ) · x = z ∈ Rn ,
(22)
and for τ > 0, vector z exists. Since τ = 0 and x = 0, it means that z = 0. Therefore, the vector w ∈ Rn , w = 0, exists; it is dual to the vector z with respect to the norm ·∞ . As known, the norm ·1 is dual to norm ·∞ . Therefore, according to Definition 2, equality w z = w1 · z∞ = 1
is true. Now let’s work with y = diag−1 (τ ) · w.
(23)
As τ > 0, vector y is exists. Therefore, the fulfillment of conditions (19)–(20) occurs from (22)–(23). ˆ given in (18) So, the vector y corresponding to (19)–(20) and the matrix A, exist. Further, according to the definition of the norm ·1 and equality (20), we obtain equality (17): diag(σ) · Aˆ · diag(τ )1 = diag(σ) · by · diag(τ )1 = |σi bi yj τj | =
σi |bi | ·
i
j
i,j
diag(σ) · b1 . τj |yj | = diag(σ) · b1 · diag(τ ) · y1 = diag−1 (τ ) · x ∞
Problem 2. Let xA ∈ Rn and xS ∈ Rk be some given vectors with xA = 0,
T xA + U xS = d. Our goal is to find such a matrix A ∈ Rm×n , that is a solution
to the Eq. (1), and the value diag(σ) · (A − A) · diag(τ )1 should be minimal. Theorem 3. The solution to Problem 2 exists for any vectors x, τ ∈ Rn , b, σ ∈ Rm , x = 0, τ, σ > 0, and the following equalities are true min
{ diag(σ) · (A − A) · diag(τ )1 } =
X( A,S,T,U,b,d)=∅
A = A + (b − AxA − SxS ) y ∈
ψ (b − AxA − SxS ) , (24) ϕ(xA )
Argmin
A − Aϕ,ψ ,
(25)
X( A,S,T,U,b,d)=∅
where ψ(z) = diag(σ) · z1 , ϕ(x) = diag−1 (τ ) · x∞ ,
(26) (27)
z ∈ Rm is any vector and y ∈ Rn is a vector that meets the conditions y xA = 1,
(28)
1 . diag(τ ) · y1 = diag−1 (τ ) · xA ∞
(29)
Proof. Let xA ∈ Rn and xS ∈ Rk with xA = 0, T xA + U xS = d. Let’s represent
A as A = A + H, where H ∈ Rm×n is some matrix and rewrite the subsystem A xA + SxS = b as (30) HxA = b − AxA − SxS .
Obviously, taking into account (30), Problem 2 turns to be Problem 1, in which the unknown matrix H needs to be determined from (30). According to Theorem 2, we obtain min
HxA =b−AxA −SxS , T xA +U xS =d
diag(σ) · H · diag(τ )1 =
ˆ = (b − AxA − SxS ) y ∈ H
Argmin
HxA =b−AxA −SxS , T xA +U xS =d
ψ (b − AxA − SxS ) , ϕ(xA )
(31)
diag(σ) · H · diag(τ )1 , (32)
where y ∈ Rn is a vector satisfying conditions (19)–(20). The existence of this vector can be easily shown using the arguments given in Theorem 2. To complete ˆ and compare formulas (31)–(32) the proof, it is necessary to perform A = A + H with formulas (24)–(29).
3 Correction for Weighted Minimax Criterion Within Interval Constraints
In this section, we consider two problems as the main ones. They are
X+ (A, S, T, U, b, d) = ∅, A A A,
(33)
A ◦ (A − A)∞ → inf A
and
X+ (A, S, T, U, b , d) = ∅, A A A, b b b, inf . [ A ◦ (A − A) β ◦ ( b − b) ]∞ →
(34)
A,b
Take notice that in both problems, there is a condition xA 0. This is not accidental. Below with the support of the stated condition and Theorem 1, we show that a simple calculating layout can be used for solving problems (33)–(34). This layout is based on the continuity of scalar-dependent systems of linear equations and inequalities. We also notice if the condition xA 0 is discarded then problems (33), (34) take the form X(A, S, T, U, b, d) = ∅, A A A, (35) A ◦ (A − A)∞ → inf A
and
X(A, S, T, U, b , d) = ∅, A A A, b b b, inf . A ◦ (A − A) β ◦ ( b − b) ∞ → A,b
(36)
In this article, problems (35)–(36) will be considered under the additional assumption that a priori information about the signs of the $x_A$ components is known. Therefore, this assumption allows us to move from problems (35)–(36) to problems (33)–(34).

3.1 Correction of the System (1)–(2)
Let δ > 0 be some parameter. Let us introduce matrices depending on the parameter δ ⎤ ⎤ ⎡ ⎡ w1 (δ) w1 (δ) ⎥ ⎥ ⎢ ⎢ W (δ) = ⎣ ... ⎦ = wij (δ) ∈ Rm×n , W (δ) = ⎣ ... ⎦ = (wij (δ)) ∈ Rm×n , wm (δ)
wm (δ)
and vectors z(δ) = (z i (δ)) ∈ Rm , z(δ) = (z i (δ)) ∈ Rm , as follows
δ δ wij (δ) = max aij , aij − , wij (δ) = min aij , aij + , αij αij
δ δ z i (δ) = max bi , bi − , z i (δ) = min bi , bi + . βi βi
(37) (38)
According to (9)–(10) and (37)–(38), it is safe to say that ⎧ ⎨ A A A, W (δ) A W (δ) ⇔ ⎩ A ◦ (A − A)∞ δ, ⎧ ⎧ ⎨ ⎨ W (δ) A W (δ) A A A, b b b, ⇔ ⎩ [ ⎩ z(δ) b z(δ) A ◦ (A − A) β ◦ ( b − b) ]∞ δ. Consider the whole complex of conditions ⎧ ⎨ AxA + SxS = b, T xA + U xS = d, ⎩ x 0, W (δ) A W (δ).
(39)
A
Suppose the representation S = [s1 , . . . , sm ] is true. Taking into account the arguments given in Theorem 1, it can be shown that the following proposition is true. Proposition 1. The system of linear equations and inequalities (39) is solved
for unknown vectors xA , xS and unknown matrix A if and only if the following system of linear equations and inequalities is solved for the vectors xA , xS
W (δ)xA + SxS b W (δ)xA + SxS , xA 0, (40) T xA + U xS = d.
˜ When the system (40) is solved with respect to vectors xA , xS , the matrix A, which is a solution to the system of linear equations and inequalities (39), can be constructed by the formula A = W (δ) + diag(g) · W (δ) − W (δ) ,
(41)
where g = (gi ) ∈ Rm , ⎧ ⎨ bi − wi (δ) xA − si xS , if (wi (δ) − wi (δ)) xA = 0, gi = (wi (δ) − wi (δ)) xA ⎩ any number γ ∈ [0, 1] , otherwise. Consider a set of conditions ⎧ ⎨ AxA + SxS = b , T xA + U xS = d, ⎩ xA 0, W (δ) A W (δ), z(δ) b z(δ),
(42)
(43)
where b = ( b i ) ∈ Rm . Taking into account the arguments presented in Theorem 1, it can be shown that the following proposition is true. Proposition 2. The system of linear equations and inequalities (43) is solved
for unknown vectors xA , xS , b and unknown matrix A if and only if the following
system of linear equations and inequalities is solved for the vectors xA , xS , b W (δ)xA + SxS b W (δ)xA + SxS , z(δ) b z(δ), xA 0, (44) T xA + U xS = d
When the system (44) is solved with respect to vectors xA , xS , b , the matrix ˜ which is the solution to the system of linear equations and inequalities (43), A, can be constructed by the formula (41), where ⎧ ⎪ ⎨ b i − wi (δ) xA − si xS , if (wi (δ) − wi (δ)) xA = 0, gi = (45) ⎪ (wi (δ) − wi (δ)) xA ⎩ any number γ ∈ [0, 1] , otherwise. Propositions 1 and 2 allow us to go from (33), (34) to problems W (δ)xA + SxS b W (δ)xA + SxS , T xA + U xS = d, δ 0, δ → inf
(46)
and
W (δ)xA + SxS b W (δ)xA + SxS ,
z(δ)xA b z(δ)xA , T xA + U xS = d, δ 0, δ → inf .
(47)
The possible computational layout of the solution to the problems (46),(47) has a two-level form. The low (internal) level is to analyze the consistency of the systems (40) and (43) with the fixed value of the parameter δ. As a result, either the inconsistency of the corresponding systems is determined, or the cor
responding vectors xA ,xS and the vector b are determined, which are solutions for the system (43) and the problem (47). The choice of the most efficient numerical algorithm for solving systems (40) and (43) can be the subject of a separate study. Note that such algorithms are part of a whole trove of linear programming algorithms and their number is currently large. The essence of the upper (external) level of the problems (46), (47) solution is to find the smallest value of the parameter δ ∗ when systems (40) and (43) are still consistent. Using the dichotomy method, one can find on the segment [0, δmax ], where δmax > 0 is any, probably known in advance, “sufficiently large” value of the parameter δ, when systems (40) and (43) are really consistent. Suppose now that problem (46) is already solved and we know the elements δ ∗ ,x∗A ,x∗S ,W (δ ∗ ),W (δ ∗ ). Based on this information, it is easy to obtain a solution to the problem (33): ∗ xA inf A ◦ (A − A)∞ = δ ∗ , ∈ X+ (A∗ , S, T, U, b, d), ∗ xS X+ ( A,S,T,U,b,d)=∅,
A AA
where A∗ is the matrix constructed by the formulas (41)–(42) using the values x∗A , x∗S , W (δ ∗ ), W (δ ∗ ). We see that almost all elements needed to solve the problem (33) are already known after solving the problem (46). It remains only to calculate the optimal
correction matrix A∗ . Thus we know the elements δ ∗ , x∗A , x∗S , W (δ ∗ ), W (δ ∗ ), b∗ . Taking these elements into account, it is easy to obtain a solution to the problem (34): [ A ◦ (A − A) β ◦ ( b − b) ]∞ = δ ∗ ,
inf
X+ ( A,S,T,U, b ,d)=∅,
A A A, b b b
x∗A ∈ X+ (A∗ , S, T, U, b∗ , d), x∗S
where A∗ is the matrix built by the formulas (41), (45) using the values x∗A , x∗S ,
W (δ ∗ ), W (δ ∗ ), and b∗ . So, to complete the analysis of the problem (34), it is
necessary to additionally calculate only the matrix A∗ . In conclusion, summarizing the material in this section, let us discuss the issue of achieving the corresponding lower bounds in the problems (33)–(34).
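To make the two-level scheme concrete, here is a sketch (ours, not the authors' implementation; it assumes SciPy's linprog is available, all names are hypothetical, and it assumes system (40) is consistent at δ_max). The inner level checks the consistency of system (40) for a fixed δ with a zero-objective linear program; the outer level bisects over δ on [0, δ_max]:

```python
import numpy as np
from scipy.optimize import linprog

def w_bounds(A, A_low, A_up, alpha, delta):
    """Matrices W_low(delta), W_up(delta) from formulas (37)."""
    return np.maximum(A_low, A - delta / alpha), np.minimum(A_up, A + delta / alpha)

def system_40_feasible(A, A_low, A_up, alpha, S, T, U, b, d, delta):
    """Check consistency of system (40) for a fixed delta via a zero-objective LP."""
    W_low, W_up = w_bounds(A, A_low, A_up, alpha, delta)
    n, k = A.shape[1], S.shape[1]
    # variables z = (x_A, x_S); x_A >= 0, x_S free
    A_ub = np.block([[W_low, S], [-W_up, -S]])   # W_low x_A + S x_S <= b and b <= W_up x_A + S x_S
    b_ub = np.concatenate([b, -b])
    A_eq = np.hstack([T, U])                     # T x_A + U x_S = d
    res = linprog(c=np.zeros(n + k), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=d,
                  bounds=[(0, None)] * n + [(None, None)] * k, method="highs")
    if not res.success:
        return False, None
    return True, (res.x[:n], res.x[n:])

def smallest_delta(A, A_low, A_up, alpha, S, T, U, b, d, delta_max, tol=1e-6):
    """Outer level: bisection for the smallest delta that keeps (40) consistent."""
    lo, hi = 0.0, delta_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        feasible, _ = system_40_feasible(A, A_low, A_up, alpha, S, T, U, b, d, mid)
        lo, hi = (lo, mid) if feasible else (mid, hi)
    return hi
```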
Note that, on the one hand, the lower constraint for special-purpose functions declared in the problems is clearly visible, since this follows from the nonnegativity property of the matrix norm. It is also obvious that the upper and lower
constraints on the elements of the matrix A∗ and the vector b∗ are directly related to the problem in problems (33)–(34). At the same time, the boundedness of the components x∗A , x∗S does not follow either from the corresponding problems, or from the considered methods of their solution. Therefore, it is impossible to exclude cases when the norm of the solution ∗ xA x∗ of the corrected system of linear equations and inequalities (1)–(2) S
with an “optimal” matrix $A^*$ and vector $\tilde b^*$ turns out to be infinite. The way out of this situation can be found by including additional constraints of the following type:
$$\left\| \begin{pmatrix} x_A \\ x_S \end{pmatrix} \right\| \leq \theta, \qquad (48)$$
where $\|\cdot\|$ is some vector norm and $\theta > 0$ is a value that is probably known a priori. If the vector norm in (48) is polyhedral, then the computational scheme for solving problems (33)–(34) does not undergo principal changes; the inequalities based on (48) are simply added to the systems of linear equations and inequalities (39) and (43).

3.2 Correction of the System (1) Using a Priori Information About the Signs of the $x_A$ Components
Let u = (ui ) ∈ Rn be a vector composed of the numbers {−1, 1}, |xA | ∈ Rn be a vector composed of the absolute values of the xA components, the representation xA = diag(u) · |xA |
(49)
is true. So, the vector u is a formal realization of a priori information about the signs of the components of the vector xA .
We perform some elementary transformations. Let’s replace xA by |xA | , A ˘ where by A, (50) A˘ = A · diag(u), 1 and A, A by A˘ = a ˘n ∈ Rm×n and A˘ = a ˘ ... a ˘1 . . . a ˘n ∈ Rm×n which are ¯i of the matrices A and A by the following composed of the columns ai and a rule
a ˘i = ai , a ˘i = a ¯i , if ui = 1, i i i a ˘ = −¯ a, a ˘ = −ai , if ui = −1. Then it is obvious that the problems (35) and (36) are reduced to (33) and (34), respectively. Suppose that the modified problem (33) or (34) is solved and we know the elements of its optimal solution |x∗A | and A˘∗ . Then the elements of the optimal
solution of problem (33) or (34), defined by formulas (49) and (50), can be calculated as follows: $x_A^* = \operatorname{diag}(u)\,|x_A^*|$, $A^* = \breve A^* \operatorname{diag}^{-1}(u)$. It should also be noted that the remarks about the unboundedness of the vector norm $\left\|\begin{pmatrix} x_A^* \\ x_S^* \end{pmatrix}\right\|$ in problems (33)–(34), made at the end of the previous paragraph, are also valid for problems (35)–(36).
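A small illustration (ours, not from the paper; all names are hypothetical) of the change of variables (49)–(50) and its inversion:

```python
import numpy as np

u = np.array([1, -1, 1])                 # a priori signs of the components of x_A
A = np.arange(12.0).reshape(4, 3)

A_breve = A * u                          # (50): columns of A are flipped where u_i = -1
x_abs = np.array([2.0, 3.0, 1.0])        # |x_A|, a solution of the modified problem
x_A = u * x_abs                          # (49): recover x_A
A_star = A_breve * u                     # A* = A_breve diag(u)^{-1}; note diag(u)^{-1} = diag(u)
assert np.allclose(A_star, A)
assert np.allclose(A_breve @ x_abs, A @ x_A)
```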
4 The Problem of Correction of the System (1)–(2) Based on a Criterion of the Minimum of Deviations Within Interval Constraints
This section addresses the following problems
inf , diag(σ) · (A − A) · diag(τ )1 →
(51)
A,b
X(A, S, T, U, b , d) = ∅, b b b, and
inf , diag(σ) · (A − A) · diag(τ )1 →
(52)
A,b
X+ (A, S, T, U, b , d) = ∅, b b b. Solutions to these problems will be formed taking into account their similarity with Problem 2, considered in the previous section. Let’s start with the problem (51). It differs from Problem 2 in some conditions: 1) the vectors xA , xS are unknown, 2) the vector b is corrected within the interval constraints. Taking into account additional conditions and using Theorem 3, we find out that problem (51) is equivalent to the problem
ψ( b − AxA − SxS ) → ϕ(xA )
inf
T xA +U xS =d, xA ,xS , b
,
(53)
b b b
where the functions ϕ(·) and ψ(·) are defined as in Theorem 2. If we assume that problem (53) is solved and its optimal vectors x∗A , x∗S and
b∗ are found, then the missing element of the solution of the problem (51) is
the matrix A∗ in accordance with Theorem 3. It can be found by formula (25). Similar considerations can be applied to the problem (52). For this, instead of problem (53), it is necessary to solve an additional problem:
ψ( b − AxA − SxS ) → ϕ(xA )
inf
T xA +U xS =d,
xA ,xS , b b b b, xA 0
.
Now let’s solve the problem (53). Let xA = t · x ¯A , where t > 0 is any value, xA ) = 1. Then problem (53) can be rewritten as x ¯A is a vector such that ϕ(¯
xA − t−1 SxS ) → inf, ψ(t−1 b − A¯
ϕ(¯ xA ) = 1, T x ¯A + t−1 U xS = t−1 d, b b b, t > 0.
(54)
Minimization of the expression ψ(t−1 b − A¯ xA − t−1 SxS ) can be performed by the following standard method. Let
xA − t−1 SxS = p − q, t−1 b − A¯ where p = (pi ) ∈ Rm , q = (qi ) ∈ Rm , p, q 0, p q = 0. Then by the formula (26) we get m xA − t−1 SxS ) = σi · (pi + qi ), (55) ψ(t−1 b − A¯ i=1
and as a result we have
ψ(t
−1
b − A¯ xA − t
−1
SxS ) → inf ⇔
⎧ ⎪ ⎨ t ⎪ ⎩
m −1
i=1
σi · (pi + qi ) → inf,
b − A¯ xA − t−1 SxS = p − q, p 0, q 0, p q = 0.
(56)
Note that in the system (56) condition p q = 0 is redundant since m it will be automatically fulfilled in the process of minimizing the expression i=1 σi · (pi + qi ). Obviously, the following transformations of the problem (54) must concern the condition ϕ(¯ xA ) = 1. According to (27) and the definition of ·∞ we obtain ϕ(¯ xA ) = 1 ⇔ −τ x ¯A τ.
(57)
So, taking into account our remarks and (55)–(57), we find that problem (54) is equivalent to the problem m inf , i=1 σi · (pi + qi ) →
p,q,¯ xA , b ,t
t−1 b − A¯ xA − t−1 SxS = p − q, T x ¯A + t−1 SxS = t−1 d,
(58)
−τ x ¯A τ, b b b, p 0, q 0, t > 0. A possible calculation scheme for solving the problem (58) is two-level. The low (internal) level is contained in the solution to the linear programming problem arising from the problem (58) with a fixed value of t. The upper (external) level is contained in the minimization of the optimal values of the objective function of the lower level over t. Note that using “inf” instead of “min” in the problem (58) is correct because there is a case when t∗ → ∞.
In this case, we can add to the problem (58) the additional condition t t, where t > 0 is some, probably known a priori, upper value of ϕ (xA ). Problem (52), for reasons similar to those stated above, is reduced to the problem m inf , i=1 σi · (pi + qi ) → p,q,¯ xA , b ,t
t
−1
b − A¯ xA − t
−1
SxS = p − q, T x ¯A + t−1 SxS = t−1 d,
0x ¯A τ, b b b, p 0, q 0, t > 0.
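The two-level scheme of this section can be sketched as follows. This is our illustration, not the authors' code; it is SciPy-based with hypothetical names, the second equality constraint follows (54) and is written with U (the display of (58) above prints S there), and the one-dimensional search over t is a simple bounded scalar minimization that may need a more careful strategy in general:

```python
import numpy as np
from scipy.optimize import linprog, minimize_scalar

def inner_lp_value(t, A, S, T, U, d, b_low, b_up, sigma, tau):
    """Optimal value of the LP obtained from problem (58) for a fixed t > 0."""
    m, n = A.shape
    k, l = S.shape[1], T.shape[0]
    # variable vector v = (p, q, x_bar, x_S, b_tilde)
    c = np.concatenate([sigma, sigma, np.zeros(n + k + m)])
    Im = np.eye(m)
    eq1 = np.hstack([-Im, Im, -A, -S / t, Im / t])   # t^-1 b~ - A x_bar - t^-1 S x_S = p - q
    eq2 = np.hstack([np.zeros((l, 2 * m)), T, U / t, np.zeros((l, m))])  # T x_bar + t^-1 U x_S = t^-1 d
    A_eq = np.vstack([eq1, eq2])
    b_eq = np.concatenate([np.zeros(m), d / t])
    bounds = ([(0, None)] * (2 * m)                  # p, q >= 0
              + [(-ti, ti) for ti in tau]            # -tau <= x_bar <= tau
              + [(None, None)] * k                   # x_S free
              + list(zip(b_low, b_up)))              # b_low <= b_tilde <= b_up
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.fun if res.success else np.inf

def solve_58(A, S, T, U, d, b_low, b_up, sigma, tau, t_max=1e3):
    """Outer level: scalar minimization of the inner LP value over t in (0, t_max]."""
    res = minimize_scalar(lambda t: inner_lp_value(t, A, S, T, U, d, b_low, b_up, sigma, tau),
                          bounds=(1e-8, t_max), method="bounded")
    return res.x, res.fun
```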
5 Conclusion
It is known that systems of linear algebraic equations and inequalities, as well as linear programming problems, are widely used in modern applied mathematics. At the same time, the inconsistency (improperness) of the declared objects, except for gross miscalculations (errors) when constructing models in some applied problems, has long been perceived by mathematicians as an essential feature of the model. It is due to the influence of errors (noise) and inaccuracies in the initial information. The authors hope that the methods of multiparameter (matrix) correction of inconsistent systems of linear algebraic equations and inequalities, as well as improper linear programming problems, will serve as a flexible analysis tool. This will allow taking into account possible a priori information about the applied problem using interval constraints and weight coefficients. Thus, the use of multifaceted norms as a qualitative indicator of correction may turn out to be an important alternative to the Euclidean norm in cases where the hypothesis of the normal minimum of the distribution of errors is incorrect.
References 1. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1986) 2. Van Huffel, S., Vandewalle, J.: The Total Least Squares Problem: Computational Aspects and Analysis. Frontiers in Applied Mathematics series 9, SIAM, Philadelphia, Pennsylvania (1991) 3. Watson, G.A.: On a class of matrix nearness problems. Const. Approx. 7, 299–314 (1991) 4. Watson, G.A.: Data fitting problems with bounded uncertainties in the data. SIAM J. Matrix Anal. Appl. 22(4), 1274–1293 (2001) 5. Eremin, I.I., Mazurov, N.N., Astafiev, N.N.: Improper Linear and Convex Programming Problems. Nauka, Moscow (1983). (in Russian) 6. Vatolin, A.A.: About Problems of Linear Programming with Interval Coefficients. U.S.S.R. Comput. Math. Math. Phys. 24(6), 18–23 (1984) 7. Vatolin, A.A.: Approximation of improper linear programming problems by a criterion of euclidean norm. Zh. Vychisl. Mat. Mat. Fiz. 24(12), 1907–1908 (1984) (in Russian)
8. Vatolin, A.A.: Improper Mathematical Programming Problems and their Correction Methods. Doctoral Thesis. Institute for Mathematics and Mechanics, Ural Department of Russian Academy of Sciences, Ekaterinburg (1992) (in Russian) 9. Gorelik, V.A.: Matrix correction of a linear programming problem with inconsistent constraints. Comput. Math. Math. Phys. 41(11), 1615–1622 (2001) 10. Erokhin, V.I.: Optimal Matrix Correction and Regularization of Inconsistent Linear Models. Diskretn. Anal. Issled. Oper. Ser. 29(2), 41–77 (2002) (in Russian) 11. Volkov, V.V., Erokhin, V.I., Kakaev, V.V., Onufrei, A.Y.: Generalizations of tikhonov’s regularized method of least squares to non-euclidean vector norms. Comput. Math. Math. Phys. 57(9), 1416–1426 (2017) 12. Erokhin, V.I., Volkov, V.V., Khvostov, M.N.: Modification of the A. N. Tikhonov’s Method for Solving Approximate Systems of Linear Algebraic Equations for NonEuclidean Norms. Vestn. Voronezh. Gos. Univ. Ser. Fiz. Mat. 4, 84–101 (2018) (in Russian) 13. Erokhin, V.I., Volkov, V.V.: Solution of the approximate system of linear algebraic equations minimal with respect to the 1 -norm. In: Proceedings of IX Moscow International Conference on Operations Research (ORM2018) Moscow, pp. 137– 141 (2018) 14. Volkov, V.V., Erokhin, V.I., Onufrei, A. Yu., Kakaev, V.V., Kadochnikov, A.P.: Solution of the Approximate System of Linear Algebraic Equations Minimal with Respect to the ∞ -norm. In: Proceedings of IX Moscow International Conference on Operations Research (ORM2018) Moscow, pp. 156–161 (2018) 15. Erokhin, V.I., Volkov, V.V., Krasnikov, A.S., Khvostov, M.N.: Solving Approximate Systems of Linear Algebraic Equations Using Polyhedral Norms. Recent Advances of the Russian Operations Research Society. Cambridge Scholars Publishing, pp. 271–282 (2020)
Solving Smooth Min-Min and Min-Max Problems by Mixed Oracle Algorithms
Egor Gladin1,2,5, Abdurakhmon Sadiev1(B), Alexander Gasnikov1,3,5, Pavel Dvurechensky3,4,5, Aleksandr Beznosikov1,3, and Mohammad Alkousa1,3
1 Moscow Institute of Physics and Technology, 1 “A” Kerchenskaya st., Moscow 117303, Russia {gladin.el,sadiev.aa,gasnikov.av,beznosikov.an}@phystech.edu
2 Skolkovo Institute of Science and Technology, 3 bld. 1 Bolshoy Boulevard, Moscow 121205, Russia
3 HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
4 Weierstrass Institute for Applied Analysis and Stochastics, 39, Mohrenstr, Berlin, Germany [email protected]
5 Institute for Information Transmission Problems RAS, 11, Pokrovsky boulevard, Moscow 109028, Russia
Abstract. In this paper, we consider two types of problems that have some similarity in their structure, namely, min-min problems and min-max saddle-point problems. Our approach is based on considering the outer minimization problem as a minimization problem with an inexact oracle. This inexact oracle is calculated via an inexact solution of the inner problem, which is either a minimization or a maximization problem. Our main assumption is that the available oracle is mixed: it is only possible to evaluate the gradient w.r.t. the outer block of variables which corresponds to the outer minimization problem, whereas for the inner problem, only a zeroth-order oracle is available. To solve the inner problem, we use the accelerated gradient-free method with zeroth-order oracle. To solve the outer problem, we use either an inexact variant of Vaidya’s cutting-plane method or a variant of the accelerated gradient method. As a result, we propose a framework that leads to non-asymptotic complexity bounds for both min-min and min-max problems. Moreover, we estimate separately the number of first- and zeroth-order oracle calls, which are sufficient to reach any desired accuracy.
Keywords: First-order methods · Zeroth-order methods · Cutting-plane methods · Saddle-point problems
The research of A. Gasnikov and P. Dvurechensky was supported by Russian Science Foundation (project No. 21-71-30005). The research of E. Gladin, A. Sadiev and A. Beznosikov was partially supported by Andrei Raigorodskii scholarship.
1 Introduction
In this paper, we consider optimization problems in which the decision variable is decomposed into two blocks with minimization with respect to (w.r.t.) one block, which we call the outer block, and two types of operations w.r.t. the second block, which we call the inner block: minimization or maximization. In other words, we consider smooth min-min problems and min-max problems. The main difference between our setting and existing in the literature is that we assume that it is possible to evaluate the gradient w.r.t. the outer block of the variables, i.e., first-order oracle, and only function values, i.e., zeroth-order oracle, when we deal with the inner block of variables. Thus, we operate with a mixed type of oracle: first-order in one block of variables and zeroth-order in the second block of variables. Our motivation, firstly, comes from min-max saddle-point problems, which have recently become of an increased interest in machine learning community in application to training Generative Adversarial Networks [10], and other adversarial models [15], as well as to robust reinforcement learning [21]. The standard process is to simultaneously train neural networks, find adversarial examples and make the network distinguish the true examples from the artificially generated. In the training process, the gradient is available through the backpropagation, whereas for the generating adversarial examples, the network is sometimes available as a black box, and only zeroth-order oracle is available. Another close application area is Adversarial Attacks [11,26] on neural networks, in particular the Black-Box Adversarial Attacks [16]. Here the goal for a trained network is to find a perturbation of the data in such a way that the network outputs the wrong prediction. Then the training is repeated to make the network robust to such attacks. Since the attacking model does not have access to the architecture of the main network but only to the input and output of the network, the only available oracle for the attacker is the zeroth-order oracle for the loss function. The motivation for min-min problems comes from simulation optimization [7,24], where some parts of the optimized system can be given as a black box with unavailable or computationally expensive gradients, and other parts of the objective are differentiable. Separately zeroth-order [4] and first-order [19] are very well-developed areas of modern numerical optimization. There are also plenty of works on first-order [3,12,17,18] and zeroth-order methods [2,23,29] for saddle-point problems. Our main idea and contribution in this paper are to consider mixed oracles, which seems to be an underdeveloped area of optimization and saddle-point problems. Methods with mixed oracles were considered in [23] and [14], but, unlike this work, only in the context of saddle-point problems and without acceleration techniques. Notably, in this paper, we develop a generic approach that is suitable for both types of problems: min-min and min-max, and is based on the same idea for both problems: we consider the minimization problem w.r.t. the outer group of variables as a minimization problem with inexact oracle. This inexact oracle is evaluated via an inexact solution of the inner problem, which is either min-
Min-Min and Min-Max Problems with Mixed Oracles
21
imization or maximization problem. We carefully estimate with what accuracy one needs to solve the inner problem to be able to solve the outer problem with the desired accuracy. Moreover, we have to account for the random nature of the solution to the inner problem since we use randomized gradient-free methods with zeroth-order oracle to solve the inner problem. In our approach, we consider two settings for the outer problem. If the dimension of the outer problem is small, we use Vaidya’s cutting-plane method [27,28], for which we extend the analysis to the case of approximate subgradients. The drawback of this method is that it scales quite badly with the dimension. Thus, if the dimension of the outer problem is large, we exploit the accelerated gradient method, for which we develop an analysis in the case when an inexact oracle is available only with some probability, which may be of independent interest. Our approach based on inner-outer loops also allows to separate complexities, i.e., the number of calls to each of the oracles: first-order oracle for the outer block of variables and zeroth-order oracle for the inner block of variables. The rest of the paper is organized as follows. First, we consider min-max problems in two settings: small and large dimensions of the outer problem. In the first case, we develop an inexact variant of Vaidya’s method to use it in the outer loop in combination with the accelerated random gradient-free method in the inner loop. In the second case, we apply the accelerated gradient method in the outer loop combined with the same method in the inner loop. After that, we consider saddle-point min-min problems again in two settings. When the dimension of the outer problem is small, we use the same scheme with inexact Vaidya’s method and the accelerated random gradient-free method. The situation is more complicated when the dimension of the outer problem is large. In this case, we use a three-loop structure with the Catalyst acceleration scheme [13] combined with the accelerated gradient method with inexact oracle and the accelerated random gradient-free method (Table 1).
Table 1. The main results about the oracle complexities of the proposed approaches. Notation is introduced in subsequent paragraphs. Problem Dimension 0-th order nx ny Lyy min-min Small-scale O μy ny LLyy Large-scale O μμy nx ny Lyy min-max Small-scale O μy Lxx Lyy large-scale O ny + μx μy
1-st order (nx ) O L O μ
(nx ) O 2L2 Lxx Lyy xy O + μx μy μx μy
2L2 xy μx μy
22
2
E. Gladin et al.
Solving Min-Max Saddle-Point Problems
In this section, we consider the following problem: min max f (x, y),
(1)
x∈X y∈Rny
where X ⊆ Rnx is a closed convex set, f (x, y) is a convex-concave function (i.e. convex in x and concave in y) equipped with a mixed oracle, i.e. we have access to a first-order oracle for the outer problem (minimization w.r.t. x) and a zeroth-order oracle for the inner problem (maximization w.r.t. y). In the subsections below we describe two approaches for solving such problems together with additional assumptions they require. The general idea of the proposed approaches is as follows. Let us introduce the function f (x, y), ∀x ∈ X , (2) g(x) = max ny y∈R
and rewrite the initial problem (1) as min g(x).
(3)
x∈X
Using an iterative method for the outer problem (3) requires solving the inner problem (2) numerically on each iteration. An error of the solution of the inner problem results in an inexact oracle for the outer problem. 2.1
Small Dimension of the Outer Problem
The approach described in the present section requires the following assumptions about the problem (1): 1. X ⊂ Rnx is a compact convex set with nonempty interior; 2. nx is relatively small (up to a hundred); 3. f (x, y) is a continuous function which is convex in x and μy -strongly concave in y; 4. for all x ∈ X the function f (x, ·) is Lyy -smooth, i.e. ∇y f (x, y) − ∇y f (x, y )2 Lyy y − y 2 ,
∀y, y ∈ Rny .
5. for any x ∈ X the maximization problem (2) has a solution y(x). The algorithms used in the proposed approach and related convergence theorems are given in the subsequent paragraphs. Our proposed approach goes as follows:
Min-Min and Min-Max Problems with Mixed Oracles
23
Approach 1. The outer problem (3) is solved via Vaidya’s cutting plane method [27, 28]. The inner problem (31) is solved via Accelerated Randomized Directional Derivative method for strongly convex functions (ARDDsc) [6], see Algorithm 2.1 The complexity of the proposed Approach 1 is given in the following theorem. 2 Theorem 1. Approach 1 arrives at of the problem (3) after O(nx ) ε-solution nx ny Lyy calls to the zeroth-order oracle. calls to the first-order oracle and O μy
Remark 1. As far as the arithmetic complexity of the iteration is concerned, Vaidya’s cutting plane method involves inversions of nx × nx matrices, hence the assumption that nx is relatively small. The complexity bounds from Theorem 1 are derived in a subsequent paragraph. Vaidya’s Cutting Plane Method. Vaidya proposed a cutting plane method [27,28] for solving problems of the form min g(x),
(4)
x∈X
where X ⊆ Rn is a compact convex set with non-empty interior, and g : X → R is a continuous convex function. We will now introduce the notation and describe the algorithm. Let P = {x ∈ Rn : Ax b} be the bounded full-dimensional polytope, where A ∈ Rm×n and b ∈ Rm . The logarithmic barrier for P is defined as L(x) := −
m
ln a i x − bi ,
i=1 th where a row of A. The Hessian of L(x) is given by i is the i
H(x) =
m i=1
ai a i a i x − bi
2
(5)
and is positive definite for all x in the interior of P . The volumetric barrier for P is defined as 1 F (x) = ln (det H(x)) , 2 1
2
Here and below instead of ARDDsc we can use Accelerated coordinate descent methods [9, 20] with replacing partial derivatives by finite differences. In this case we √ lost opportunity to play on the choice of the norm (that could save ny -factor in gradient-free oracle complexity estimate [6]), but, we gain a possibility to replace the wort case Lyy to the average one (that could be ny -times smaller [20]). At the √ end this could also save ny -factor in gradient-free oracle complexity estimate [20]. = O(·) up to a small power of logarithmic factor. O(·)
24
E. Gladin et al.
where det H(x) denotes the determinant of H(x). The point ω that minimizes F (x) over P will be called the volumetric center of P . Let σi (x) be defined as −1
σi (x) =
ai a i (H(x)) 2 , ai x − bi
1 i m.
(6)
Now, let R be a radius of some Euclidean ball BR that contains X . Without loss of generality we will assume that BR is centered at the origin. The parameters of the method η > 0 and γ > 0 are small constants such that η 10−4 , and γ 10−3 η. The algorithm starts out with the simplex ⎧ ⎫ n ⎨ ⎬ P0 = x ∈ Rn : xj −R, j = 1, n, xj nR ⊇ BR ⊇ X (7) ⎩ ⎭ j=1
and produces a sequence of pairs (Ak , bk ) ∈ Rmk ×n × Rmk , such that the corresponding polytope Pk = {x ∈ Rn : Ak x bk } always contains a solution of the problem (4). At the beginning of each iteration k we have an approximation zk to the volumetric center of Pk (for more details on computing the approximation see [27,28]). In particular, on the 0-th iteration we can compute the volumetric center explicitly. Proposition 1. The volumetric center for P0 is ω = ω1n , where ω := and 1n denotes the vector (1, . . . , 1) ∈ Rn .
n−1 n+1 R
For k 0, the next polytope (Ak+1 , bk+1 ) is defined by either adding or removing a constraint to the current polytope, depending on the values m {σi (zk )}i=1 associated to Pk : 1. If for some i ∈ {1, . . . , m} one has σi (zk ) =
min σj (zk ) < γ, then
1jm
(Ak+1 , bk+1 ) is defined by removing the ith row from (Ak , bk ). 2. Otherwise, i.e. if min σj (zk ) γ, the oracle is called with the current point 1jm
zk as input. If zk ∈ X , it returns a vector ck , such that −ck ∈ ∂g(zk ), i.e. −ck is a subgradient of g at zk . Otherwise, it returns a vector ck such that c k x ck zk , ∀x ∈ X . We choose βk ∈ R such that ck zk βk and −1
c ck 1√ k (H(zk )) ηγ. 2 = 2 ck zk − βk Then we define (Ak+1 , bk+1 ) by adding the row given by c k , βk to (Ak , bk ). After N iterations, the method returns a point xN := arg min g(zk ). 1kN
Min-Min and Min-Max Problems with Mixed Oracles
25
Now, let us introduce the concept of inexact subgradient. Definition 1. The vector c ∈ Rn is called a δ-subgradient of a convex function g at z ∈ dom f (we denote c ∈ ∂δ g(z)), if g(x) g(z) + c (x − z) − δ,
∀x ∈ dom f.
In fact, a δ-subgradient can be used in Vaidya’s method instead of the exact subgradient. In this case, we will call the algorithm Vaidya’s method with δsubgradient. We will now present a theorem that justifies this claim. Theorem 2. Let Bρ and BR be some Euclidean balls of radii ρ and R, respectively, such that Bρ ⊆ X ⊆ BR , and let a number 1.5 B > 0 be such that 2n |g(x) − g(x )| B ∀x, x ∈ X . After N γ ln n γρR + γ1 ln π iterations Vaidya’s method with δ-subgradient for the problem (4) returns a point xN such that ln π − γN Bn1.5 R g(xN ) − g(x∗ ) exp + δ, (8) γρ 2n where γ > 0 is the parameter of the algorithm and x∗ is a solution of the problem (4). Accelerated Randomized Directional Derivative Method. We refer to the work [6]. For convenience, we present algorithms from this paper, taking into account that the problem will be a classical optimization problem: min f (x).
x∈Rn
(9)
For some τ > 0 and x ∈ Rn , the gradient approximation of f is defined as follows: n (10) gradf (x, τ, e) = (f (x + τ e) − f (x)) e, τ where e ∈ RS n2 (1), i.e. a random vector uniformly distributed on the surface of the unit Euclidean sphere in Rn . Definition 2. Let p ∈ [1, 2] and xp be the p-norm of x ∈ Rn , which is defined n p p n → R is called proxas xp = i=1 |xi | . The continuous function d : R n function if d is differentiable on R and 1-strongly convex w.r.t. · p -norm, i.e. 1 d(x ) − d(x) − ∇d(x), x − x x − x2p , ∀x, x ∈ Rn . 2 It is worth noting that in the case of p = 2, the prox-function d(x) looks like the squared Euclidean norm d(x) =
1 x22 , 2
∀x ∈ Rn .
(11)
26
E. Gladin et al.
Definition 3. Let d : Rn → R be a prox-function. For any two points x, x ∈ Rn we define the Bregman divergence Vx (x ) associated with prox-function d as follows: Vx (x ) = d(x ) − d(x) − ∇d(x), x − x . For the case of p = 2, the Bregman divergence Vx (x ) has the following form Vx (x ) =
1 x − x22 . 2
Let x∗ be a fixed point and x be a random vector such that Ex x−x∗ 2p Rp2 , then x − x∗ Ωp , (12) Ex d Rp 2 where Ex denotes the expectation w.r.t. random vector x, Ωp = 1 for p = 2 and the prox-function (11) [6]. Algorithm 1. Accelerated Randomized Directional Derivative (ARDD) method [6]. 1: Input: starting point x0 , number of iterations N , smoothness parameter L, τ . 2: w0 := x0 , z0 = x0 . 3: for k = 0, 1, 2, . . . , N − 1 do 4: Sample ek+1 ∈ RS n 2 (1). 5: Set 2 tk := , xk+1 := tk zk + (1 − tk )wk . k+2 6: Calculate gradf (xk+1 , τ, ek+1 ) using (10). 7: Compute 1 gradf (xk+1 , τ, ek+1 ). wk+1 := xk+1 − 2L 8: Set k+1 αk := . 96n2 L 9: Compute
zk+1 := argmin αk+1 gradf (xk+1 , τk , ek+1 ), z − zk + Vzk (z) . z∈Rn
10: end for 11: Output: wN
Min-Min and Min-Max Problems with Mixed Oracles
27
Algorithm 2. Accelerated Randomized Directional Derivative method for strongly convex functions (ARDDsc) [6]. 1: Input: starting point x0 s.t. x0 − x∗ 2p Rp2 , number of iterations N , strong convexity parameter μp . 2: u0 := x0 3: Set
8aL2 Ωp N0 = , μp 2 −1
where a = 384n2 ρn , ρn = min {q − 1, 16 ln n − 8} n q 4: for k = 0, 1, 2, . . . , N − 1 do 5: Set Rk2 = Rp2 2−k . 6:
Set dk (x) = Rk2 d
x−uk Rk
.
.
7: Run Algorithm 1 with starting point uk and prox-function dk (x) for N0 steps. 8: Set uk+1 = wN0 , k = k + 1. 9: end for 10: Output: uN .
Theorem 3 (see [6]). Let p ∈ [1, 2] and q ∈ [2, +∞] be defined such that p1 + 1q = 1. Let f be a μp -strongly convex and L2 -smooth w.r.t. ·p and ·2 , respectively. Applying Algorithm 2 to the problem (9) results in the following inequality: Ef (uN ) − minn f (x) x∈R
μp Rp2 −N 2 . 2
Moreover, the oracle complexity to achieve ε-accuracy of the solution is μp Rp2 1 1 L Ω 2 p + nq 2 · O . · log2 μp ε Analysis of Approach 1. Fix a point x ∈ X . The following lemma gives the recipe to obtaining the δ-subgradient c ∈ ∂δ g(x ) for the outer problem (3). Lemma 1 (see [22]). Let δ > 0 and y˜ ∈ Rny satisfy g (x ) − f (x , y˜) δ, ∀x ∈ X , then ∂x f (x , y˜) ∈ ∂δ g (x ). According to Lemma 1, we need to solve the inner problem (2) with accuracy δ to obtain the δ-subgradient. Now, to derive the complexity of Approach 1, we will use Theorem 2 as follows: ln π − γN Bnx1.5 R N g(x ) − g(x∗ ) exp + δ , γρ 2nx ε/2 ε/2
28
E. Gladin et al.
i.e. Vaidya’s method will perform 1.5 nx BR (nx ) , Nx = O nx ln =O ερ steps (first-order oracle calls), and at each of them ARDDsc will perform L yy ny Ny = O μy iterations (see Theorem 3). Thus, the number of zeroth-order oracle calls is Lyy , Nx · Ny = O nx ny μy which finishes the analysis of Approach 1. 2.2
Large Dimension of the Outer Problem
For a detailed study of the convergence of the methods, we introduce some assumptions about the objective function f (x, y). Assumption 1. f (x, y) is convex-concave. It means that f (·, y) is convex for all y and f (x, ·) is concave for all x. Assumption 1(s). f (x, y) is strongly-convex-strongly-concave. It means that f (·, y) is μx -strongly convex for all y and f (x, ·) is μy -strongly concave for all x w.r.t. · 2 , i.e. for all x1 , x2 ∈ X and for all y1 , y2 ∈ Rny we have μx x1 − x2 22 , 2 μy y1 − y2 22 . −f (x2 , y1 ) −f (x2 , y2 ) − ∇y f (x2 , y2 ), y1 − y2 + 2 f (x1 , y2 ) f (x2 , y2 ) + ∇x f (x2 , y2 ), x1 − x2 +
(13)
Assumption 2. f (x, y) is (Lxx , Lxy , Lyy )-smooth w.r.t · 2 , i.e. for all x, x ∈ X , y, y ∈ Rny ∇x f (x, y) − ∇x f (x , y)2 Lxx x − x 2 ; ∇x f (x, y) − ∇x f (x, y )2 Lxy y − y 2 ; ∇y f (x, y) − ∇y f (x , y)2 Lxy x − x 2 ; ∇y f (x, y) − ∇y f (x, y )2 Lyy y − y 2 .
(14)
As mentioned above, we have access to a first-order oracle ∇x f (x, y) for the outer problem (minimization problem with variables x) and a zeroth-order oracle f (x, y) for the inner problem (maximization problem with variables y). Since we do not have access to the values of the gradient ∇y f (x, y), it is logical
Min-Min and Min-Max Problems with Mixed Oracles
29
to approximate it using finite differences using the value of the function f (x, y) at two close points as follows gradf (x, y, τ, e) = −
n (f (x, y + τ e) − f (x, y)) e, τ
(15)
n
where e ∈ RS 2 y (1), i.e. is a random vector uniformly distributed on the surface of the unit Euclidean sphere in Rny . So we get a mixed oracle ∇x f (x, y) G(x, y, τ, e) = . (16) gradf (x, y, τ, e) Using the mixed oracle (16), we provide our approach for solving the initial saddle-point problem (1). First, we can use the following trick with the help of Sion’s theorem: min f (x, y) = max h(y), where h(y) = min f (x, y). min max f (x, y) = max ny ny
x∈X y∈Rny
y∈R
x∈X
x∈X
y∈R
For the new problem, we apply the Catalyst algorithm [13] to the outer maximization problem: max f (x, y) . (17) h(y) = min ny x∈X
y∈R
Algorithm 3. Catalyst [13] 1: Input: starting point x0 , parameters H1 and α0 , accuracy of solution to subproblem ε˜, optimization method M. 2: Initialize μy q= (μy + H1 ) 3: while the desired stopping criterion is not satisfied do 4: Find an approximate solution of the following problem using M: H1 (18) y − zk−1 22 , yk ≈ argmax ϕk (y) = h(y) − 2 y∈Rny such that ϕ∗k − ϕk (yk ) ε˜ 5:
Compute αk ∈ (0, 1) from equation 2 + qαk αk2 = (1 − αi )αk−1
6:
Compute zk = yk + βk (yk − yk−1 ) , where βk =
7: end while 8: Output: yf inal .
αk−1 (1 − αk−1 ) 2 αk−1 + αk
30
E. Gladin et al.
Now the question arises how to solve the auxiliary problem (18). This subproblem is equivalent to solving the following problem H1 H1 2 2 y − z y − z {f (x, y)} − max h(y) − = max min k−1 k−1 2 2 , y∈Rny y∈Rny x∈X 2 2 for which we again use the Sion’s theorem and equivalently rewrite the problem as H1 H1 2 2 y − zk−1 2 = min max y − zk−1 2 . f (x, y) − max min f (x, y) − x∈X y∈Rny y∈Rny x∈X 2 2 For convenience, we denote ψ(x, y) = f (x, y) −
H1 y − zk 22 . 2
(19)
Thus, to solve the auxiliary problem (18), we first solve the following saddle-point problem H1 2 y − zk−1 2 . min max ψ(x, y) = min max f (x, y) − (20) x∈X y∈Rny x∈X y∈Rny 2 This saddle-point problem (20) can be considered as an optimization problem for a certain function. Indeed, let us introduce a function ψ(x, y), ξ(x) = max ny y∈R
(21)
and rewrite the initial problem (20) as follows: min ξ(x).
x∈X
(22)
To solve problem (20), we solve the outer minimization problem w.r.t. the variable x by the fast adaptive gradient method with inexact oracle. In each iteration of this method, to find the inexact first-order oracle for the outer problem, we solve the inner problem (21). Since for this inner problem we have access only to the zeroth-order oracle, we use accelerated gradient-free method ARDDsc [6]. Our approach is summarized as follows. Approach 2. The outer problem (3) is solved via Catalyst Algorithm 3. The subproblem (18) is solved as the saddle-point problem (20). The outer problem (22) is solved via Fast Adaptive Gradient Method (Algorithm 4). At each iteration of Algorithm 4 the inner problem (21) is solved via ARDDsc, see Algorithm 2, for case p = 2 (that is, prox-function d(x) = 12 x22 , see Definition 2, Bregman divergence Vx (x ) = 12 x − x22 , see Definition 3, and Ωp = Ω2 = 2). Analysis of Fast Adaptive Gradient Method with (δ, σ, L, μ)-Oracle (Algorithm 4). For our analysis, due to the fact that Algorithm 2 is randomized, we need not just a fast gradient method for solving the outer problem (22), but a
Min-Min and Min-Max Problems with Mixed Oracles
31
fast adaptive gradient method for (δ, σ, L, μ)-oracle. This is the extension of the fast adaptive gradient method from [25]. To understand this problem in depth and in detail, we need to carefully consider the concept of (δ, σ, L, μ)- oracle and perform a deep and thorough convergence analysis of the fast gradient method using such a seemingly unusual oracle. To that end, we consider the following general minimization problem: min f (x).
x∈X
(23)
Definition 4 ((δ, σ, L, μ)-oracle). Let function f be convex on convex set X . We say that it is equipped with a first-order (δ, σ, L, μ)-oracle if, for any x ∈ X , we can compute a pair (fδ,L,μ (x ), gδ,L,μ (x )) ∈ R×Rnx such that with probability at least 1 − σ μ L x − x2 f (x) − (fδ,L,μ (x ) + gδ,L,μ (x ), x − x ) x − x 2 + δ. (24) 2 2 Definition 5 ((ε, σ)-solution). Let ε > 0 be the target accuracy of the solution and σ ∈ (0, 1) be the target confidence level. We say that a random point x ˆ∈X is (ε, σ)-solution to problem (23) if P f (ˆ x) − min f (x) ε 1 − σ. (25) x∈X
If σ = 0, we say that x ˆ ∈ X is an ε-solution to problem (23).
Algorithm 4. Fast adaptive gradient method with (δ, σ, L, μ)-oracle [25] 1: Input: starting point x0 , L0 > 0, μ 0, sequence {δ}k0 . 2: y0 := x0 , u0 := x0 , α0 := 0, A0 := α0 3: for k 0 do 4: Find the smallest integer ik 0 such that Lk+1 xk+1 −yk+1 22 +δk , fδk ,L,μ (xk+1 ) fδk ,L,μ (yk+1 )+ gδ,L,μ (yk+1 ), xk+1 − yk+1 + 2
5:
where Lk+1 = 2ik −1 Lk . Compute αk+1 such that αk+1 is the largest root of Ak+1 (1 + Ak μ) = Lk+1 α2k+1 , where Ak+1 := Ak + αk+1
6:
yk+1 =
αk+1 uk +Ak xk Ak+1
(1 + Ak μ) αk+1 μ φk+1 (x) = αk+1 gδ,L,μ (yk+1 ), x − yk+1 + x − uk 22 + x − yk+1 22 2 2
7:
uk+1 := argmin φk+1 (x) α
x∈X uk+1 +Ak xk Ak+1
8: xk+1 = k+1 9: end for 10: Output: xk+1 .
32
E. Gladin et al.
Note that the problem argmin φk+1 (x) is solved exactly in each iteration. x∈X
Theorem 4. Let function f be convex on convex set X and be equipped with a first-order (δ, σ, L, μ)-oracle. Then, after N iterations of Algorithm 4 applied to problem (23), we have that with probability at least (1 − N σ):
N −1 f (xN ) − f (x ) 2L exp − 2 ∗
N −1 2 k=0 Ak+1 δk μ 2 , R2 + L AN
(26)
where R2 is such that 12 x0 − x∗ 22 R22 and x0 is the starting point. Corollary 1. Let function f be convex on convex set X and be equipped with σ , L, μ)-oracle. If the sequence {δ}k0 is bounded by δ, we have a first-order (δ, N with probability at least (1 − σ): L N −1 μ ∗ 2 δ (27) f (xN ) − f (x ) 2L exp − R2 + 1 + 2 L μ where R2 is such that 12 x0 − x∗ 22 R22 and x0 is the starting point. To prove this statement, we give an auxiliary lemma. Lemma 2 (see [5]). The sequence {Ak }k0 satisfies k L i=0 Ai 1+ Ak μ
(28)
We get the result of Corollary 1 immediately using (26) and Lemma 2. Analysis of Approach 2. For further analysis, we present the main lemma of this subsection Lemma 3 (see [1]). We denote yf∗ (x) = argmax f (x, y), x∗f (y) = argmin f (x, y). y∈Rny
x∈X
Under Assumption 1(s), 2 we have – Function x∗f (y) is yf∗ (x)
Lxy μx Lxy μy
-Lipschitz continuous w.r.t. the norm · 2 .
-Lipschitz continuous w.r.t. the norm · 2 2L2 – Function g(x) (see (2)) is Lg := Lxx + μyxy -smooth w.r.t. the norm · 2 . f (x, y). Then, for any – Let yδ (x) be a (δ, σ)-solution to the problem max ny – Function
is
y∈R
x , x ∈ X , with probability at least 1 − σ we have:
μx 2Lg x−x 22 g(x )−f (x, yδ (x))−∇x f (x, yδ (x)), x − x x −x22 +2δ. 2 2
Min-Min and Min-Max Problems with Mixed Oracles
33
– We define f (x, y), h(y) = min f (x, y). g(x) = max ny x∈X
y∈R
Let x ˆ be (εx , σx )-solution of the problem min g(x), let yεy (ˆ x) be (εy , σy )x∈X
f (ˆ x, y). Then yεy (ˆ x) is (˜ ε, 1 − σx − σy )-solution solution of the problem max ny y∈R
to the problem (1), where 2L2xy L2xy Lyy 2L4xy Lyy ε˜ = εy + + + 2 2 εx . μy μx μy μx μ2y μx μy
(29)
Now we are ready to present the main result of this section Theorem 5. Let ε > 0 be the target accuracy of the solution to the problem (1) and σ ∈ (0, 1) be the target confidence level. Let the auxiliary problems (2), (3) be solved with accuracies ⎛ −1 ⎞ L2xy Lyy 2L4xy ⎝ε ⎠; + 2 εx = O μx (μy + Lyy )2 μx (μy + Lyy )2 ⎛ ⎝ε εy = O
Lyy μy + Lyy
2L2xy + μx (μy + Lyy )
−1
Lxx μx
2L2xy + μx (μy + Lyy )
−1/2 ⎞ ⎠
and confidence levels
μy σx = O σ ; Lyy ⎛ −1/2 ⎞ 2 2L xy ⎝σ Lxx Lyy + ⎠, σy = O μx μy μx μy
that is, a (εx , σx )−solution to the problem (3) and a (εy , σy )−solution to the problem (2) are found (see Definition 5). Then, under assumptions 1(s), 2, the proposed Approach 2 guarantees to find an (ε, σ)-solution to the problem (1). Moreover, the required number of calls to the first-order oracle ∇x f (x, y) and the zeroth-order oracle f (x, y) satisfy the following bounds 2L2xy Lxx Lyy + , Total Number of Calls for ∇x f (x, y) is O μx μy μx μy 2L2xy L L xx yy ny . + Total Number of Calls for f (x, y) is O μx μy μx μy
34
E. Gladin et al.
3
Solving Min-Min Problems
Consider the problem min min f (x, y),
(30)
x∈X y∈Rny
where X ⊆ Rnx is a closed convex set, f (x, y) is a convex function equipped with a mixed oracle, i.e. we have access to a first-order oracle for the outer problem (minimization w.r.t. x) and a zeroth-order oracle for the inner problem (minimization w.r.t. y). In the sections below we will describe the two approaches for solving such problems together with additional assumptions they require. The general idea of the proposed approaches is as follows. Let us introduce the function f (x, y) (31) g(x) = min ny y∈R
and rewrite the initial problem (30) as min g(x).
(32)
x∈X
Using an iterative method for the outer problem (32) requires solving the inner problem (31) numerically on each iteration. An error of the solution of the inner problem results in an inexact oracle for the outer problem. 3.1
Small Dimension of the Outer Problem
The approach described in the present subsection requires the following assumptions about the problem (30): 1. 2. 3. 4.
X ⊂ Rnx is a compact convex set with nonempty interior; nx is relatively small (up to a hundred); f (x, y) is a continuous convex function which is also μy -strongly convex in y; for all x ∈ X the function f (x, ·) is Lyy -smooth, i.e. ∇y f (x, y) − ∇y f (x, y )2 Lyy y − y 2 ,
∀y, y ∈ Rny .
5. for any x ∈ X the minimization problem (31) has solution y(x), and the mapping y(x) is continuous. The algorithms used in the Approach and related convergence theorems were given in the previous section. The proposed Approach goes as follows. Approach 3. The outer problem (32) is solved via Vaidya’s cutting plane method. The inner problem (31) is solved via ARDDsc, see Algorithm 2. The complexity of the proposed Approach 3 is given in the following theorem. Theorem 6. Approach 3 arrives at ε-solution of the problem (32) after O(nx ) Lyy calls to the first-order oracle and O nx ny μy calls to the zeroth-order oracle.
Min-Min and Min-Max Problems with Mixed Oracles
35
Remark 2. As far as the arithmetic complexity of the iteration is concerned, Vaidya’s cutting plane method involves inversions of nx × nx matrices, hence the assumption that nx is relatively small. The complexity bounds from Theorem 6 are derived in the paragraph Analysis of Approach which follows the description of algorithms. Analysis of Approach 3. Fix a point x ∈ X . The following theorem gives the recipe to obtaining the δ-subgradient c ∈ ∂δ g(x ) for the outer problem (32): ˜ then Theorem 7. Let δ˜ > 0 and y˜ ∈ Rny satisfy f (x , y˜) − g(x ) δ, ∂x f (x , y˜) ∈ ∂δ g(x ) with Lyy Dδ˜ δ=2 , (33) μy where D := max (f (x, y˜) − g(x)) < +∞. x∈X
Theorem 7 is based on the two following lemmas. Lemma 4. Let h : Rny → R be an L-smooth convex function, and let the point y ) − h(y∗ ) δ˜ for some δ˜ > 0, where y∗ ∈ Argmin h(y). Then y˜ ∈ Rny satisfy h(˜ y∈Rny
∇h(˜ y ), y˜ − y ˜ y − y2
˜ 2Lδ,
∀y ∈ Rny .
Lemma 5 (see [8], p.12). Let δ > 0 and y˜ ∈ Rny satisfy ∇y f (x , y˜), y˜ − y(x) δ,
∀x ∈ X ,
(34)
then ∂x f (x , y˜) ∈ ∂δ g(x ). According to Theorem 7, we need to solve the inner problem (31) with sufficient accuracy to obtain the δ-subgradient for g. Now, to derive the complexity of Approach 3, we will use Theorem 2 as follows: ln π − γN Bnx1.5 R exp g(xN ) − g(x∗ ) + δ , γρ 2nx ε/2 ε/2
i.e. Vaidya’s method will perform
1.5 nx BR Nx = O nx ln ερ
steps (first-order oracle calls), and at each of them ARDDsc will perform Lyy Ny = O ny μy iterations (see Theorem 3). Thus, the number of zeroth-order oracle calls is L yy nx ny Nx · N y = O , μy which finishes the analysis of Approach 3.
36
3.2
E. Gladin et al.
Large Dimension of the Outer Problem
The approach described in the present subsection requires the following assumptions about the problem (30): 1. f (x, y) is twice continuously differentiable, L-smooth and μ-strongly convex as a function of both variables; 2. for all x ∈ X the function f (x, ·) is μy -strongly convex and Lyy -smooth; 3. the solution y(x) to the minimization problem (31) is a continuously differentiable mapping. Our proposed approach goes as follows. Approach 4. The outer problem (32) is solved via the Fast Adaptive Gradient Method (Algorithm 4). The inner problem (31) is solved via Accelerated Randomized Directional Derivative method for strongly convex functions (ARDDsc), see Algorithm 2. The complexity of the proposed approach 4 is given in the following theorem. Theorem of the problem (32) after 8. Approach 4 provides an ε-solution L ny LLyy calls to the zeroth calls to the first-order oracle and O O μ μμy order oracle. Analysis of Approach 4. Fix a point x ∈ X . The following theorem gives the recipe for obtaining a (δ, L, μ)–oracle (see Definition 4 with σ = 0) for g at x . ˜ then Theorem 9. Let δ˜ > 0 and y˜ ∈ Rny satisfy f (x , y˜) − g(x ) δ, (f (x , y˜) − 2δ, ∇x f (x , y˜)) is a (3δ, 2L − μ, μ)–oracle for g at x , where LDδ˜ δ=2 , D := max (f (x, y˜) − g(x)) < +∞. x∈X μ
(35)
Theorem 9 is based on the two following lemmas. Lemma 6. Let δ > 0 and y˜ ∈ Rny satisfy ∇y f (x , y˜), y˜ − y(x) δ,
∀x ∈ X ,
(36)
then f (x , y˜) − g(x ) δ and g(x) g(x ) + ∇x f (x , y˜), x − x + Lemma 7. The following statements hold:
μ x − x 22 − δ, 2
∀x ∈ X .
Min-Min and Min-Max Problems with Mixed Oracles
37
1. g is L-smooth, where L is the smoothness parameter of f (x, y) as a function of both variables; 2. Let δ > 0 and y˜ ∈ Rny satisfy ∇y f (x , y˜), y˜ − y (x) δ,
∀x ∈ X ,
(37)
then (f (x , y˜) − 2δ, ∇x f (x , y˜)) is a (3δ, 2L − μ, μ)–oracle for g at x . According to Theorem 9, we need to solve the inner problem (31) with sufficient accuracy to obtain a (3δ, 2L − μ, μ)–oracle for g. Now, to derive the complexity of Approach 4, we will use Corollary 1 as follows μ 2L − μ N −1 N 2 3δ , + 1+ g(x ) − g(x∗ ) 2(2L − μ)R2 exp − 2 2L − μ μ ε/2
ε/2
i.e. the Fast Adaptive Gradient Method will perform L Nx = O μ steps (first-order oracle calls), and at each of them ARDDsc will perform L yy ny Ny = O μy iterations (see Theorem 3). Thus, the number of zeroth-order oracle calls is LL yy ny Nx · N y = O , μμy which finishes the analysis of Approach 4.
4
Experiments
Adversarial attack aims at creating an adversarial example y = y0 + δ ∈ Rny that fools a machine learning (ML) system, where y0 denotes the original example with the true label t0 , and δ is an adversarial perturbation. This goal can be formulated as the following optimization problem max (δ; y0 , t0 ) −
δ ∈Rny
γ 2 δ2 , 2
(38)
where (δ; y0 , t0 ) is some objective function the attack seeks to maximize (e.g. loss of the attacked model), and γ > 0 is a parameter that controls amount of perturbation. Large values of γ ensure similarity between y and y0 .
38
E. Gladin et al. Table 2. Accuracy on original and adversarial examples Model
Accorig Accadvers
SVM #1
0.94
SVM #2
0.34
0.94
0.36
LogReg #1 0.96
0.44
LogReg #2 0.96
0.44
Fig. 1. Original (top) and adversarial (bottom) examples K
Given K Machine Learning models {Mi }i=1 , the goal of finding robust adversarial examples that can fool all K models simultaneously leads to the following reformulation of the problem (38) max min ny
δ ∈R
w∈P
K
wi (δ; y0 , t0 , Mi ) −
i=1
γ 2 δ2 , 2
(39)
where (δ; y0!, t0 , Mi ) is the loss of model Mi , and P denotes the probability " simplex P = w | 1 w = 1, wi 0, ∀i . Elements of w have the meaning of the difficulty level of attacking each model. Using Sion’s theorem, we rewrite the problem (39) as follows: min max ny
w∈P δ ∈R
K
wi (δ; y0 , t0 , Mi ) −
i=1
γ 2 δ2 . 2
(40)
If the number of models K is not too large (up to a few dozens), and the loss is smooth as a function of δ, then such a problem satisfies the requirements of the proposed approach for min-max problems that have a small dimension of the outer block. 4.1
Experimental Setup
We considered classification problem on MNIST (http://yann.lecun.com/exdb/ mnist/). The training set was divided into two parts of equal size. Two models were trained on each part: a support-vector machine (SVM) and a logistic regression model, adding up to K = 4 models in total. Cross entropy loss was chosen as the objective function for adversarial attacks: ⎞ ⎛ 9 (δ; y0 , t0 , Mi ) = ln ⎝ evij y ⎠ − vit y, 0 j=0
Min-Min and Min-Max Problems with Mixed Oracles
39
where y = y0 + δ, t0 ∈ {0, . . . , 9}, vij is a vector of parameters of the i-th model corresponding to the j-th class. The experiment consisted of 50 adversarial attacks. In each of them, a separate image from the test set was used to create an adversarial example by solving the problem (40) via the Approach 1. The number of outer iterations (steps performed by Vaidya’s cutting plane method) was set to 200. During each outer step, the number of iterations of the inner loop (performed by ARDDsc method) equals 87. Despite the fact that linear classification models are known to be robust, the attacks caused a drastic drop in accuracy, as depicted on Table 2. Figure 1 depicts some of the examples that fooled all 4 models simultaneously. The source code is available at https://github.com/egorgladin/mixed oracle.
References 1. Alkousa, M., Dvinskikh, D., Stonyakin, F., Gasnikov, A., Kovalev, D.: Accelerated methods for composite non-bilinear saddle point problem (2020). https://doi.org/ 10.1134/S0965542520110020 2. Beznosikov, A., Sadiev, A., Gasnikov, A.: Gradient-free methods for saddle-point problem. arXiv preprint arXiv:2005.05913 (2020).https://doi.org/10.1007/978-3030-58657-7 11 3. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40(1), 120–145 (2011) 4. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to derivative-free optimization. Soc. Ind. Appl. Math. (2009). https://doi.org/10.1137/1.9780898718768 5. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case (2013). http://hdl.handle.net/2078.1/128723 6. Dvurechensky, P., Gorbunov, E., Gasnikov, A.: An accelerated directional derivative method for smooth stochastic convex optimization. Eur. J. Oper. Res. 290(2), 601–621 (2021). https://doi.org/10.1016/j.ejor.2020.08.027 7. Fu, M.C. (ed.): Handbook of Simulation Optimization. ISORMS, vol. 216. Springer, New York (2015). https://doi.org/10.1007/978-1-4939-1384-8 8. Gasnikov, A., et al.: Universal method with inexact oracle and its applications for searching equillibriums in multistage transport problems. arXiv preprint arXiv:1506.00292 (2015) 9. Gasnikov, A., Dvurechensky, P., Usmanova, I.: On accelerated randomized methods. Proceedings of Moscow Institute of Physics and Technology 8, pp. 67–100. Russian (2016) 10. Goodfellow, I.J., et al.: Generative adversarial networks (2014) 11. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples (2014) 12. Korpelevich, G.M.: The extragradient method for finding saddle points and other problems. Eknomika i Matematicheskie Metody 12, 747–756 (1976) 13. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015). https://proceedings.neurips.cc/paper/2015/file/ c164bbc9d6c72a52c599bbb43d8db8e1-Paper.pdf
40
E. Gladin et al.
14. Liu, S., et al.: Min-max optimization without gradients: convergence and applications to adversarial ml (2019). http://proceedings.mlr.press/v119/liu20j.html 15. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Proceedings (2018) 16. Narodytska, N., Kasiviswanathan, S.P.: Simple black-box adversarial attacks on deep neural networks. In: CVPR Workshops. pp. 1310–1318. IEEE Computer Society (2017). http://doi.ieeecomputersociety.org/10.1109/CVPRW.2017.172 17. Nedi´c, A., Ozdaglar, A.: Subgradient methods for saddle-point problems. J. Optim. Theory Appl. 142(1), 205–228 (2009) 18. Nemirovski, A.: Prox-method with rate of convergence o (1/ t ) for variational inequalities with lipschitz continuous monotone operators and smooth convexconcave saddle point problems. SIAM J. Optim. 15, 229–251 (2004). https://doi. org/10.1137/S1052623403425629 19. Nesterov, Y.: Lectures on Convex Optimization. SOIA, vol. 137. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91578-4 20. Nesterov, Y., Stich, S.U.: Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM J. Optim. 27(1), 110–123 (2017) 21. Pinto, L., Davidson, J., Sukthankar, R., Gupta, A.: Robust adversarial reinforcement learning. Proceedings of Machine Learning Research, 06–11 August 2017, vol. 70, pp. 2817–2826. PMLR, International Convention Centre, Sydney (2017). http://proceedings.mlr.press/v70/pinto17a.html 22. Polyak, B.T.: Introduction to Optimization. Publications Division, Inc., New York (1987) 23. Sadiev, A., Beznosikov, A., Dvurechensky, P., Gasnikov, A.: Zeroth-order algorithms for smooth saddle-point problems. arXiv:2009.09908 (2020) 24. Shashaani, S., Hashemi, F.S., Pasupathy, R.: Astro-df: a class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization. SIAM J. Optim. 28(4), 3145–3176 (2018). https://doi.org/10.1137/15M1042425 25. Stonyakin, F., et al.: Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model (2020). https://doi.org/10. 1080/10556788.2021.1924714 26. Tram`er, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., McDaniel, P.: Ensemble adversarial training: attacks and defenses (2017). https://openreview. net/forum?id=rkZvSe-RZ 27. Vaidya, P.M.: A new algorithm for minimizing convex functions over convex sets. In: 30th Annual Symposium on Foundations of Computer Science, pp. 338–343. IEEE Computer Society (1989) 28. Vaidya, P.M.: A new algorithm for minimizing convex functions over convex sets. Math. Program. 73(3), 291–341 (1996) 29. Wang, Z., Balasubramanian, K., Ma, S., Razaviyayn, M.: Zeroth-order algorithms for nonconvex minimax problems with improved complexities (2020)
A Subgradient Projection Method for Set-Valued Network Equilibrium Problems Igor Konnov1
and Olga Pinyagina2(B)
1
Institute of Computational Mathematics and Information Technologies, Department of System Analysis and Information Technologies, Kazan Federal University, Kazan, Russia [email protected] 2 Institute of Computational Mathematics and Information Technologies, Department of Data Mining and Operations Research, Kazan Federal University, Kazan, Russia [email protected]
Abstract. In the present work, we describe a general set-valued variant of the network equilibrium problem with fixed demand. This problem is equivalent to a set-valued variational inequality. Under certain additional assumptions, it can be replaced with a nonsmooth convex optimization problem. We propose to apply the subgradient projection method with a special two-speed step-size choice procedure to this problem. Computational experiments on model networks showed that the proposed approach is rather efficient. It gives a more flexible procedure for the choice of parameters. Keywords: Set-valued network equilibrium problem · Nonsmooth optimization problem · Subgradient projection method. · Two-speed step-size choice
1
Introduction
The network equilibrium problems are destined for modeling complex distributed systems and have various applications, in particular, in communication and transportation. Their theory and methods are developed rather well; see e.g. [1, Chapter IV], [2] and references therein. The models are usually determined on an oriented graph, each of its arc being associated with some flow (for instance, traffic) and some cost (or dis-utility, for instance, time of delay, shipping expenses, etc.), which depends on the values of arc flows. As a rule, arc costs functions are supposed to be single-valued that reflects a rather smooth adjustment behavior of the system. At the same time, it is natural to suppose that arc costs may have several different working regimes, hence these functions have switching points In this work, the authors were supported by the RFBR grant, project No. 19-01-00431. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 41–53, 2021. https://doi.org/10.1007/978-3-030-86433-0_3
42
I. Konnov and O. Pinyagina
with a set of values at each of them, hence, arc costs may be represented by set-valued mappings. But at the present moment, set-valued variants of network equilibrium problems are not sufficiently investigated. In particular, a network equilibrium problem with set-valued elastic demand, which is reduced to a mixed variational inequality (MVI for short), was considered in [3]. This MVI was proposed to be solved with a partial linearization method, which showed rather suitable convergence properties. In the present work, we intend to consider the generalized network equilibrium problems with set-valued cost mappings and fixed demand. This problem reduces to a set-valued or generalized variational inequality (GVI for short). In the case where arc cost mappings are potential, it can be replaced by a nonsmooth optimization problem. For solving this problem, we propose to apply the subgradient projection method with a special two-speed step-size choice procedure. This procedure falls into the general scheme of the known divergent stepsize series rule, but it contains auxiliary subseries of linearly decreased step-sizes and returns to the next step value of the main series. Computational experiments on model networks showed that the proposed approach is rather efficient. It gives a more flexible procedure for the choice of parameters. The paper is organized as follows. In Sect. 2, we recall the classic network equilibrium problem with fixed demand. In Sect. 3, we describe a set-valued extension of this problem and discuss its main properties. In Sect. 4, we describe a modification of the subgradient projection method applied to the proposed problem. In Sect. 5, we present the results of some computational experiments on test network models.
2
Network Equilibrium Problems
We first describe the classical network equilibrium problems with fixed demand [4,5]. Let V be a set of network nodes, A be a set of directed arcs (links). A set W ⊆ V × V of the so-called origin-destination (O/D) pairs (i, j), i, j ∈ V is also given. For each O/D-pair w ∈ W , a fixed demand value dw is supposed to be known. It presents a flow outgoing from the origin and ingoing to the destination. For each O/D-pair w ∈ W , there is a set of joining paths Pw , each path is a simple chain of arcs starting at the origin node and ending at the destination node of the O/D-pair. We denote by xp a variable flow value passing along path p, for all p ∈ Pw , w ∈ W . The network equilibrium problem is to find a distribution of given demands among the sets of paths of all O/D pairs subordinated to a certain equilibrium criterion. The feasible set of path flows has the form: ⎫ ⎧ ⎬ ⎨ xp = dw , xp ≥ 0, p ∈ Pw , w ∈ W . (1) X= x ⎭ ⎩ p∈Pw
Next, we need also the paths-arcs incidence matrix A with the elements
A Subgradient Projection Method
αpa
1, = 0,
43
if arc a belongs to path p ; otherwise,
a ∈ A, p ∈ Pw , w ∈ W . Then the arc flow value is calculated as the sum of the corresponding path flows for each arc a ∈ A: αpa xp . (2) fa = w∈W p∈Pw
In this model, a continuous cost function ca is attributed to each a ∈ A, its value can depend on all the arc flows in general. Then the path cost function is defined as follows: αpa ca (f ), gp (x) = a∈A
for each path p, where f is the vector of all arc flows fa , a ∈ A. We denote by c(f ) the arc cost vector with the components ca (f ) for all a ∈ A. Similarly, we denote by g(x) the path cost vector with the components gp (x) for all p ∈ Pw and w ∈ W , then we can write f = A x and g(x) = Ac(f ) = Ac(A x).
(3)
A feasible flow vector x∗ ∈ X is said to be an equilibrium point if it satisfies the following conditions: ∀w ∈ W, ∀p ∈ Pw ,
x∗p > 0 =⇒ gp (x∗ ) = min gq (x∗ ). q∈Pw
(4)
In other words, positive values of path flow for any (O/D) pair correspond to paths with minimal cost at the current flow distribution and all the costs along used paths (with nonzero path flows) are equal. It is known that the conditions in (4) can be equivalently rewritten in the form of VI: Find a vector x∗ ∈ X such that gp (x∗ )(xp − x∗p ) = g(x∗ ), x − x∗ ≥ 0 ∀x ∈ X. (5) w∈W ∀p∈Pw
It is easy to see that the feasible set X defined in (1) is clearly bounded, hence VI (5) and the equivalent network equilibrium problem with fixed demands are solvable if all the cost mappings ca , a ∈ A are continuous; see e.g. [1, p.162]. If we assume that each arc cost function ca depends on fa only, ∀a ∈ A, then the mapping g is potential, and there exist the functions
fa μa (fa ) =
ca (t)dt 0
∀a ∈ A.
44
I. Konnov and O. Pinyagina
Then VI (5) presents the optimality condition for the following optimization problem: (6) min −→ μ(x), x∈X
where μ(x) =
μa (fa ),
a∈A
the arc flows fa , a ∈ A are defined in (2). Therefore, problem (6) implies VI (5). The reverse assertion is true if for example the mappings ca are monotone for all a ∈ A.
3
Set-Valued Network Equilibrium Problems
The main difference of the proposed model from the classical network equilibrium problem is that the arc cost mapping f → c(f ) can be set-valued at some flow distributions. That is, we suppose that network arc costs may have several different working regimes on different parts of their domain, hence they can be set-valued at switching points. For this reason, we now replace the single-valued arc cost mapping f → c(f ) with a set-valued arc cost mapping f → C(f ) so that C(f ) is a set and c(f ) = (ca (f ))a∈A if c(f ) ∈ C(f ). By analogy with (3) we can define (7) G(x) = AC(f ) = AC(A x), so that g(x) = (gp (x))p∈Pw ,w∈W if g(x) ∈ G(x). A feasible flow vector x∗ ∈ X is said to be an equilibrium point for this model if it satisfies the following conditions: ∃g(x∗ ) ∈ G(x∗ ), ∀w ∈ W, ∀p ∈ Pw ,
x∗p > 0 =⇒ gp (x∗ ) = min gq (x∗ ). q∈Pw
(8)
By analogy with the single-valued case we can conclude that the conditions in (8) can be equivalently rewritten in the form of GVI: Find a vector x∗ ∈ X such that ∃g(x∗ ) ∈ G(x∗ ), gp (x∗ )(xp −x∗p ) = g(x∗ ), x−x∗ ≥ 0 ∀x ∈ X. (9) w∈W ∀p∈Pw
This formulation enables us to obtain existence results. In the following, we will suppose that the mapping f → C(f ) is upper semicontinuous and have nonempty, convex, and compact values for non-negative flows. Then clearly so is the mapping x → G(x) on X. Since the feasible set X is bounded, then GVI (9) has a solution; see e.g. [6, Theorem 12.7]. Next, we can also select a suitable iterative method for problem (8) under certain additional assumptions. Namely, we take the simple potential monotone case where each separate arc cost mapping fa → Ca (fa ) is monotone and the cost Ca only depends on the flow fa of this arc. Then C(f ) = (Ca (f ))a∈A and Ca (fa ) is a segment in the real line. Moreover, each fa → Ca (fa ) is a potential monotone mapping, hence
A Subgradient Projection Method
45
it is the subdifferential of a continuous convex function ηa : R+ → R that is non-smooth is general. By analogy with the single-valued case we can conclude that GVI (9) is equivalent to the non-smooth convex optimization problem min −→ η(x),
x∈X
where η(x) =
(10)
ηa (fa );
a∈A
cf. (5) and (6). Clearly, problem (10) has a solution, which is not necessarily unique. We denote by η ∗ the optimal value of the objective function. Therefore, we can now find network equilibrium points as solutions of the equivalent nonsmooth convex optimization problem (10). Due to large dimensionality of this problem it seems more suitable to take simple subgradient projection methods whose properties are discussed in the next section.
4
Subgradient Projection Method and Its Modifications
Let us consider the general constrained optimization problem min −→ h(u),
u∈D
(11)
where D ⊂ Rn , h : Rn → R. We denote by D∗ its solution set and by h∗ the optimal value of the objective function. In the following we will suppose that D is a convex and compact set, h is a convex continuous function, which can be nonsmooth. Then (11) is a convex optimization problem. We observe that the above optimization problem (10) falls into these conditions. Next, the feasible set X is there rather simple in the sense that the projection onto this set is not very expensive. The general subgradient projection method can be described as follows. At the kth iteration, k = 0, 1, . . . , we have a current point uk ∈ D. Choose any direction dk ∈ ∂h(uk ) and a step θk from the sequence satisfying the following conditions: ∞ ∞ θk > 0, θk = ∞, θk 2 < ∞. (12) k=0
k=0
The next iterative point is uk+1 = πD [uk − θk dk ].
(13)
Here πD is the projection operator onto the feasible set D. This process stops, if d = 0, then we get the exact solution u∗ ∈ D∗ . Otherwise, this method generates an infinite sequence. We recall the basic convergence property of this method; see e.g. [7, Ch. 3, Theorem 4.5] and [8, Ch. 5, §2]).
46
I. Konnov and O. Pinyagina
Proposition 1. Let the sequence {uk } be generated in accordance with (12)– (13). Then lim uk = u∗ ∈ D∗ . k→∞
Observe that this assertion also holds true for the normed variant of the method where the subgradient dk in (13) is replaced with the normed subgradient q k = (1/dk )dk . Besides, we can apply the method in the unbounded case with proper corrections of the assumptions if necessary. An essential disadvantage of this method is its slow convergence for applied problems, then different modifications can be utilized; see e.g. [6,7,9,10]). We will apply some other approach of improving convergence of the subgradient projection method within (12), which was proposed in [11]. We believe that the sequences generated by the scheme (12)–(13) converge slowly because the ordinary divergent series condition is inflexible and nonadaptive to changing properties of the goal function, since it may have different behavior on diverse parts of the feasible set. Usually, the sequence of step-sizes θk has a uniform rate of decreasing. Remind that the subgradient projection method is not monotone. Also, the iteration sequence may contain many rather large step-sizes that do not in fact decrease the distance to a solution and cause the slow convergence. In order to reduce the number of these improper step-sizes, we propose to apply the two-speed step-size procedure below. Choose a sequence of indices {is } such that i0 = 0, 0 < is+1 − is ≤ m < ∞, s = 0, 1, . . . ,
(14)
and a sequence of basic steps {βk }, corresponding to the custom divergent series rule ∞ ∞ βk > 0, βk = ∞, βk 2 < ∞. (15) k=0
k=0
Then we give the sequence of iteration step-sizes for the subgradient projection method (13) as follows: βs , if k = is ; ∀s = 0, 1, . . . , ν ∈ (0, 1). (16) θk = νθk−1 , if is < k < is+1 Therefore, we have two levels of step-sizes. The linear decreasing rate of stepsizes at the iterates between is and is+1 allows us to find more suitable steps, but the divergent series condition in (15) prevents from the very small steps. Theorem 1. The sequence of step-sizes {θk } in (14)–(16) falls into the classical rule (12). Proof. Indeed, on the one hand, we consider the sum of the step-size series ∞ k=0
θk ≥
∞ s=0
βs = ∞.
A Subgradient Projection Method
47
On the other hand, we consider the sum of the squared step-size series ∞
∞
θk2 ≤
k=0
(1 − ν 2m ) 2 β < ∞. (1 − ν 2 ) s=0 s
Hence, we obtain that all the conditions in (12) hold true, as desired. Now we describe this modification of the subgradient projection method (13)– (16) applied to the convex optimization problem (10), i.e. to the equivalent setvalued network equilibrium problem (8). Modified subgradient projection method (MGPM). Step 0. Choose an initial point x0 ∈ X, integer m > 0, number ν ∈ (0, 1), and the stopping criterion. Set k = 0. Step 1. If the stopping criterion is fulfilled, finish the iterative process. Step 2. Calculate f k in accordance with (2). For all components fak , choose arbitrary ca (fak ) ∈ Ca (fak ), a ∈ A. Calculate g k (xk ) ∈ G(xk ) in accordance with (7). Step 3. Choose the step-size value θk in accordance with conditions (14)–(16). Calculate the next iteration point xk+1 = πX [xk − θk g k (xk )],
(17)
set k = k + 1 and go to Step 1.
5
Computational Experiments on Model Networks
We compared MGPM proposed above and the custom subgradient projection method with harmonic step-size series (GPM for short) on several test network models. The next iterative point in GPM is also calculated as in (17), its step-size sequence was defined as follows θk = γ/(1 + k), where γ ∈ (0, 1]. The sequence {βk } in (15) was given similarly. We used the stopping criterion η(x) < η ∗ + Δ. The error Δ was set to be equal to 0.1. The computational results are presented in tables, they contain the numbers of iterations for different values of parameters m and ν. Example 1. First, we considered the well-known network structure from [12] (Fig. 1). This network contains 25 nodes, 40 arcs, 5 Ω/D-pairs. We assumed that all arcs of the network were bypass [12] and had the same cost mapping
48
I. Konnov and O. Pinyagina
⎧ 1, ⎪ ⎪ ⎪ ⎪ ⎪ [1, 2] ⎪ ⎪ ⎪ ⎪ ⎪ 2, ⎨ Ca (fa ) = [2, 3] ⎪ ⎪ ⎪ 3, ⎪ ⎪ ⎪ ⎪ ⎪[3, 4] ⎪ ⎪ ⎩ 4,
if fa if fa if fa if fa if fa if fa if fa
< 1, = 1, ∈ (1, 2), ∀a ∈ A. = 2, ∈ (2, 3), = 3, > 3,
which is presented on Fig. 2. The demand values were set to be (11, 9, 7, 5, 3). Then, the corresponding functions ηa had the form: ηa (fa ) = max{fa , 2fa − 1, 3fa − 3, 4fa − 6}
Fig. 1. Network of 25 nodes, 5 Ω/D pairs (1–4), (2–5), (3–1), (4–2), (5–3).
Fig. 2. Cost mappings for Examples 1–3
A Subgradient Projection Method
49
Table 1. Example 1, the first series of experiments, the results for MGPM. ν m 0.2 0.3 0.5 0.7 0.9 1
139
79
54
53
55
2
200
4
91
61
49
87
317 148
80
78
96
9
623 283 155 133 151
for all a ∈ A. In the first series of experiments, the parameter γ was set to be equal to 1. In this case, GPM solved the problem in 118 iterations. The results for MGPM for different values of m and ν are presented in Table 1. We see that MGPM showed better results for the indicated values of parameters. In the second series of experiments, the parameter γ was set to be equal to 0.1. In this case, GPM solved the problem in 4595 iterations. The results for MGPM are presented in Table 2. Table 2. Example 1, the second series of experiments, the results for MGPM. ν m 0.2
0.3
0.5
0.7 0.9
1
4673
1951 853
458 278
2
4965
2136 586
329 124
4
7459
2839 618
225 83
9
14923 5713 1153 250 67
Example 2. We considered the network structure presented at Fig. 3. The set of O/D-pairs contains 3 elements, W = {(1, 40), (3, 38), (5, 36)}. The demand values equal (9, 11, 13). The parameter γ was set to be equal to 1. GPM solved the problem in 26 iterations. The results for MGPM are presented in Table 3.
50
I. Konnov and O. Pinyagina
Fig. 3. Network for Examples 2.
Table 3. Example 2, the first series of experiments, the results for MGPM. ν m 0.5 0.7 0.9 1
31 37 24
4
30 35 42
9
48 30 38
In the second series of experiments, the parameter γ was set to be equal to 0.5. In this case, GPM solved the problem in 1393 iterations. The results for MGPM are presented in Table 4.
Table 4. Example 2, the second series of experiments, the results for MGPM. ν m 0.5 0.7 0.9 1
367 188 128
4
300 105 55
9
638 165 55
Example 3. We considered the network structure presented at Fig. 4. The network structure for this example were generated randomly. The network involved 20 Ω/D pairs: (10, 19), (2, 9), (3, 11), (1, 7), (12, 4), (15, 20), (1, 13), (9, 5), (7, 10), (13, 14), (3, 16), (12, 9), (20, 4), (6, 3), (11 ,2), (5, 10), (12, 6), (14, 1), (12, 13), (5, 11). The demand values were the following: (2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7). The cost mappings were the same form as in Example 1. γ was equal to 1.
A Subgradient Projection Method
51
Fig. 4. Network for Example 3, 5.
In this example, GPM solved the problem in 237 iterations. The results of MGPM for different values of m and ν are presented in Table 5. Table 5. Example 3. Results for MGPM. ν m 0.2 0.3
0.5 0.7 0.9
1
193 157
167 139 230
4
461 234
101 120 290
9
923 1564 170 81
251
Example 4. We considered the network structure from Example 1, but we assumed that the cost mapping increases more rapidly after a certain flow threshold value. The cost mappings had the form ⎧ 1, if fa < 1, ⎪ ⎪ ⎪ ⎪ ⎪ [1, 2] if fa = 1, ⎪ ⎪ ⎪ ⎨ 2, if fa ∈ (1, 2), Ca (fa ) = ⎪ [2, 3] if fa = 2, ⎪ ⎪ ⎪ ⎪ ⎪ 3, if fa ∈ (2, 3), ⎪ ⎪ ⎩ 2fa − 3 if fa ≥ 3. Then, the corresponding functions ηa had the form: max{fa , 2fa − 1, 3fa − 3}, if fa < 3, ηa (fa ) = if fa ≥ 3 fa2 − 3fa + 6,
52
I. Konnov and O. Pinyagina
for all a ∈ A. γ was equal to 1. In this example, GPM solved the problem in 26 iterations. The results of MGPM for different values of m and ν are presented in Table 6.
Table 6. Example 4. Results for MGPM. ν m 0.2 0.3 0.5 0.7 1
14 21 26 33
4
32 38 20 35
9
62 63 42 22
Example 5. We considered the network structure from Example 3 and the cost mappings from Example 4. GPM was unable to solve this problem in 100,000 iterations. The results for MGPM are presented in Table 7.
Table 7. Example 5. Results for MGPM. ν m 0.2
0.3
1
55083
>100 000 >100 000 13779
4
>100 000 12350
9
>100 000 >100 000 2005
0.5 2661
0.7
0.9 9459
>100 000 >100 000 670
>100 000
In this section, various parameter values were tested to analyze the behavior of the proposed method. For the problems under solution, the proposed method showed better results for sufficiently wide ranges of parameters than the conventional subgradient method. This property enables one to select some narrow segments for suitable choice of parameters. A preliminary recommendation is to set m closer to 1, ν closer to 0.9. Nevertheless, this question needs further investigations.
6
Conclusions
In the present work, we proposed a set-valued variant of the network equilibrium model with fixed demands. Under certain additional assumptions, it appears equivalent to a nonsmooth convex optimization problem. We proposed to solve this problem with a modification of the subgradient projection method, using a special two-speed step-size choice procedure. This
A Subgradient Projection Method
53
procedure falls into the general divergent step-size series rule, but it uses auxiliary subseries of decreased step-sizes and contains returns to the next step value of the main series. Computational experiments on model networks showed the efficiency of the proposed approach to this class of network equilibrium problems in comparison with the custom subgradient projection method. This approach is more flexible in the choice of parameters and seems to be promising for further investigations.
References 1. Nagurney, A.: Network Economics: A Variational Inequality Approach. Kluwer, Dordrecht (1999) 2. Patriksson, M.: The Traffic Assignment Problem: Models and Methods. Dover, Mineola (2015) 3. Konnov, I., Pinyagina, O.: Partial linearization method for network equilibrium problems with elastic demands. In: Kochetov, Y., Khachay, M., Beresnev, V., Nurminski, E., Pardalos, P. (eds.) DOOR 2016. LNCS, vol. 9869, pp. 418–429. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44914-2 33 4. Smith, M.J.: Existence, uniqueness, and stability of traffic equilibria. Transp. Sci. 13B, 259–304 (1979) 5. Dafermos, S.: Traffic equilibrium and variational inequalities. Transp. Sci. 14(1), 42–54 (1980) 6. Konnov, I.V.: Nonlinear Optimization and Variational Inequalities. Kazan University Press, Kazan (2013).(in Russian) 7. Dem’yanov, V.F., Vasil’yev, L.V.: Nondifferentiable Optimization. Nauka, Moscow (1981). (In Russian, Engl. transl. in Optimization Software, New York (1985)) 8. Gol’shtein, E.G., Tret’yakov, N.V.: Modified Lagrange Functions. Nauka, Moscow (1989). (In Russian, Engl. transl. in John Wiley and Sons, New York (1996)) 9. Shor, N.Z.: Minimization Methods for Non-Differentiable Functions. Naukova Dumka, Kiev (1979). (In Russian, Engl. transl. in Springer-Verlag, Berlin (1985)) 10. Polyak, B.T.: Introduction to Optimization. Nauka, Moscow (1983). (In Russian, Engl. transl. in Optimization Software, New York (1987)) 11. Konnov, I.: Exact penalties for decomposable optimization problems. arXiv preprint arXiv:2010.00630 (2020) 12. Bertsekas, D.P., Gafni, E.M.: Projection methods for variational inequalities with application to the traffic assignment problem. In: Nondifferential and Variational Techniques in Optimization, pp. 139–159. Springer, Berlin, Heidelberg (1982). https://doi.org/10.1007/BFb0120965
Non-convex Optimization in Digital Pre-distortion of the Signal Alexander Maslovskiy1(B) , Dmitry Pasechnyuk1 , Alexander Gasnikov1,4,5(B) , Anton Anikin2 , Alexander Rogozin1 , Alexander Gornov2 , Lev Antonov3 , Roman Vlasov3 , Anna Nikolaeva6 , and Maria Begicheva6 1
4
Moscow Institute of Physics and Technology, 9 Institutskiy per., 141701 Dolgoprudny, Russian Federation {aleksandr.maslovskiy,pasechniuk.da,aleksandr.rogozin}@phystech.edu 2 Matrosov Institute for System Dynamics and Control Theory, 134 Lermontov Street, 664033 Irkutsk, Russian Federation {anikin,gornov}@icc.ru 3 Russian Research Institute, Huawei, Moscow, Russia {antonov.lev,vlasov.roman}@huawei.com Institute for Information Transmission Problems RAS, Bolshoy Karetny per. 19, build.1, 127051 Moscow, Russian Federation 5 Caucasus Mathematical Center, Adyghe State University, 208, Pervomayskaya Street, 385000 Maykop, Russian Federation 6 Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, 121205 Moscow, Russian Federation {anna.nikolaeva,maria.begicheva}@skoltech.ru
Abstract. This paper reviews application of modern optimization methods for functionals describing Digital Pre-distortion (DPD) of signals with orthogonal frequency division multiplexing (OFDM) modulation. The considered family of model functionals is determined by the class of cascade Wiener–Hammerstein models, which can be represented as a computational graph consisting of various nonlinear blocks. To assess optimization methods with the best convergence depth and rate as a properties of this models family, we multilaterally consider modern techniques used in optimizing neural networks and numerous numerical methods used to optimize non-convex multimodal functions. The research emphasizes the most effective of the considered techniques and describes several useful observations about the model properties and optimization methods behavior.
Keywords: Digital Pre-distortion Wiener–Hammerstein models
· Non-convex optimization ·
The work was supported by the Russian Science Foundation (project 21-71-30005) and IRF Algorithm Competence Center of Huawei Moscow Research Center. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 54–70, 2021. https://doi.org/10.1007/978-3-030-86433-0_4
Non-convex Optimization in Digital Pre-distortion of the Signal
1
55
Introduction
Today, base stations, which perform in the capacity of radio signal transceivers, are widely used for the implementation and organization of wireless communication between remote devices. Modern base stations have a complex technical structure and include many technical components allowing organize accurate and efficient data transmission. One of the most important of these components is the analog power amplifier (PA). Its role is to amplify the signal from the base station, reduce the noise effect on the signal and increase the transmission range. The impact of some ideal amplifiers can be characterized with functional PA(x) = a · x, where a 1 and x is an input signal. However, real amplifiers are complex non-linear analog devices that couldn’t be described by analytical function due to the influence of many external and internal obstructing factors. Power amplifiers can change phase, cut amplitude of the original signal, and generate parasitic harmonics outside the carrier frequency range. These influences cause significant distortions of the high-frequency and high-bandwidth signal. The spectrum plot from (see Fig. 1) shows that described problem is relevant in the conditions of operation of modern devices: when the signal goes through the power amplifier its spectrum range becomes wider than the spectrum range of the original signal and as a result generates noise for other signals.
Fig. 1. Power spectral density plot of original signal, out of PA signal, and result of pre-distorting signal
One possible solution to this problem is employing the digital baseband predistortion (DPD) technique to compensate for non-linear effects, that influence
Fig. 2. DPD principle of suppression spread spectrum [7]
56
A. Maslovskiy et al.
the input signal. In this case, DPD acts upon the input signal with the aim of offsetting the non-linear impact of the power amplifier. To compensate errors, generated during the amplification, there is a special feedback connection, through which the difference between input and output of power amplifier signals sent to optimize digital pre-distorter parameters (see Fig. 2). In this paradigm, the pre-distorter model can be presented as a parametric function transforming signal in accordance with the real digital pre-distorter operation. Thus, the parametrization of the model is being optimized as parameters of real adaptive filters. Results of working pre-distortion model could be presented on idealized AM-AM diagram (see Fig. 3).
Fig. 3. Idealized AM-AM diagram of DPD, PA and result signal
According to all the above, from a mathematical point of view, pre-distortion consists of applying to the input signal model DPD, which could be described by mathematical function DPD, that approximates the inverse to the real PA function describing the effect of an analog power amplifier [7]. This problem can be expressed in the following optimization form1 1 PA(y) − ax22 → min . 2 y=:DPD(x) A more practical approach is to choose a certain parametric family of functions {DPD(x, θ)}θ∈Θ (in particular, defined by a computational graph of a specific type). Taking into account that non-linear transformation of the signal can be obtained as a result of passing it through a number of non-linear functions and thus presented as the composition of some additive changes, the optimization problem reformulated in the following form 1 DPDθ (x) − e22 → min . θ∈Θ 2 If we have large enough training set (x, e), it is possible to optimize the model parameters on it, thereby choosing a good approximation for the function that acts as a DPD for the sample signal. This work is devoted to a wide range of issues related to the numerical solution of problems of this kind for one fairly wide class of models — Wiener– Hammerstein models [5,17]. Based on the results of numerous computational 1
Note that even in this formulation, the problem can be solved by a classical gradient descent scheme y k+1 = y k −h(PA(y k )−ax), assuming that DPD can model arbitrary function, and that the Jacobian ∂PA(y)/∂y ≈ aI.
Non-convex Optimization in Digital Pre-distortion of the Signal
57
experiments, there were identified and are now described the methods that demonstrate the most successful results in terms of the convergence depth, its rate, and method’s susceptibility to overfitting. Approaches to online and offline training of DPD models, methods of initializing models, and also some directions for possible further development of methods for solving the problems of the category under consideration are proposed.
2
Problem Formulation
2.1
Model Description
In this paper, we consider block-oriented models describing dynamic nonlinear effects of PA. These models have a tend to reduce number of coefficients unlike the Volterra series. Considered Wiener–Hammerstein models can account for static non-linear behavior of PA and deal with linear memory effects in the modeled system. To refine model robustness and enhance its performance it was chosen an instance of cascade Wiener–Hammerstein model [5], those structure is presented in Fig. 4.
Fig. 4. Two-layer block model of the Wiener–Hammerstein type
The following is a formal mathematical model for the case of two layers: the outputs of the first z-layer and the second y-layer are the sum of the results returned from Rz and Ry , respectively, identical blocks forming the layer. Each of these blocks is described as a combination of convolution, polynomials and lookup table functions applied to the input signal P Cp · φp (|x|) · x − x , dk,H lut,HCS (x) := convHCS convk,H lut p=1
blockH,HCS ,H lut ,C,k (x) := convk,H lut
P
Cp · φp (|x|) · x
p=1
+
Bk l=0
convk,Hl dk,H lut ,HCS (x) · |dk,H lut ,HCS (x)|l , (1)
58
A. Maslovskiy et al.
where H lut ∈ CM , H ∈ CN , HCS ∈ CK×L = Hl ∈ CK l ∈ {1, ..., L} denote weights of convolutions, C ∈ CP are weighting coefficients of gains in lookup table functions, φp is a polynomial function of arbitrary order applied to the input vector to activate special gain for quantized amplitude of the complex input, convk,H is the convolution of the input vector with a vector of weights H and shift k ∈ N [5] N convk,H (x) := Hn xk−n+1 . n=1
Thus, the presented two-layer model is described as follows zk (x) =
Rz r=1
yk (x) =
Ry r=1
blockHz ,HCS,z ,Hzlut ,Cz ,k (x), blockHy ,HCS,y ,Hylut ,Cy ,k (z(x)).
(2)
In this work, we also study the following modification of the described model, obtained by utilizing skip connections technique (widely used in residual neural networks [8]): zk (x) =
Rz r=1
yk (x) =
Ry r=1
blockHz ,HCS,z ,Hzlut ,Cz ,k (x), blockHy ,HCS,y ,Hylut ,Cy ,k (z(x)) + zk (x).
(3)
As a result, we get a computational graph, characterized by the following hyperparameters (such a set for each layer): 1. N , M , K — width of applied convolution, 2. P — number of polynomial functions, 3. R — number of blocks in a layer; and having the following set of training parameters: θ := (Hz , Hzlut , HCS,z , Hy , Hylut , HCS,y , Cz , Cy ), where Hz ∈ CRz ×Kz ×L , HCS,z ∈ CRz ×Nz , Hzlut ∈ CRz ×Mz Hy ∈ CRy ×Ky ×L , HCS,y ∈ CRy ×Ny , Hylut ∈ CRy ×My Cz ∈ CRz ×Pz , Cy ∈ CRy ×Py polynomial weights.
convolution weights,
The total number of model parameters can be calculated as follows: n = Rz (Nz + Mz + Kz × L + Pz ) + Ry (Ny + My + Ky × L + Py ). In the numerical experiments presented in this paper, the tuning of the model was such that the number of model parameters n ∼ 103 . Note that additional experiments on considering various graph configurations and hyperparameter settings are presented in the full version of this paper [15] in Model Tuning part.
Non-convex Optimization in Digital Pre-distortion of the Signal
2.2
59
Optimization Problem Statement
The demonstration of the result, as returned by the used model, is parameterized by the vector θ at the input x as Mθ (x) := y(x) (3). The main considered problem of restoring the function PA−1 using the described model can be formulated as a problem of supervised learning in the form of regression. Let then (x, y) be a training sample, where x ∈ Cm is a signal input to the DPD, y ∈ Cm is the desired modulated output signal. In this setting, the problem of restoring the DPD function can be formulated as minimizing the empirical risk (in this case, with a quadratic loss function): m
1 f (θ) := ([Mθ (x)]k − y k )2 → min . θ m
(4)
k=1
To assess the quality of the solution obtained as a result of the optimization of this loss functional, we will further use the normalized mean square error quality metric, measured in decibels: m
(yk − y k )2 k=1 m NMSE(y, y) := 10 log10 dB. 2 k=1 xk
3
Optimization Methods
In this section of the article, we consider three wide classes of optimization methods: full-gradient methods, Gauss–Newton methods, and stochastic (SGDlike) methods. Descriptions of some used methods and additional related experiments are provided in the full version of this article [15]. 3.1
Long Memory L-BFGS
The consideration of the quasi-Newton method class should be the first step in this section of the paper. Unlike the classical Newton’s method, which uses the Hessian to find the quadratic approximation of a function at a certain point, the quasi-Newton methods are based on the principle of finding a quadratic approximation that is tangent to the graph of the function at the current point and has the same gradient value as the original function at the previous point of the trajectory. More specifically, these methods have iterations of form xk+1 = xk −hk Hk ∇f (xk ), where Hk is an approximation of inverse Hessian [∇2 f (xk )]−1 and hk is the step-size. The choice of matrices Hk is constrained to the following quasi-Newton condition: Hk+1 (∇f (xk+1 ) − ∇f (xk )) = xk+1 − xk , which is inspired by Taylor series at point xk+1 , ∇f (xk ) − ∇f (xk+1 ) = ∇2 f (xk+1 )(xk − xk+1 ) + o(xk − xk+1 2 ) xk+1 − xk ≈ [∇2 f (xk+1 )]−1 (∇f (xk+1 ) − ∇f (xk )).
(5)
60
A. Maslovskiy et al.
There are several methods that use different rules to satisfy condition (5)— some of them (viz. DFP) are also presented in the method comparison Table 1. However, one of the most practically efficient quasi-Newton methods is BFGS (the results of which are also presented in Table 1): xk+1 = xk − hk · Hk ∇f (xk ), where hk = arg min f (xk − h · Hk ∇f (xk )), h>0
Hk+1 = Hk + where βk = 1 +
Hk γk δk
δk γk Hk
+ Hk γk , γk
− βk
Hk γk γk Hk , Hk γk , γk
γk , δk
, γk = ∇f (xk+1 ) − ∇f (xk ), δk = xk+1 − xk , H0 = I. Hk γk , γk
One of its practical benefits is the stability of calculations and line-search accuracy. The most effective and economical method of quadratic interpolations tuned for a certain fixed number of iterations. However, due to the large amount of memory required to store the matrix Hk , this method is unsuitable for largescale problems. Therefore, in practice, the method of recalculating the Hk matrix using only r vectors γk and δk from the last iterations [13], in this case Hk−r is assumed to be equal to I. The described principle underlies the L-BFGS(r) methods class with a r memory depth. Theoretically, it is known about quasi-Newton methods that the global rate of their convergence in the case of smooth convex problems does not exceed estimates for the classical gradient method [4]. In practical terms, the L-BFGS method is one of the most universal and effective methods of convex and even unimodal optimization [9,18].
Table 1. Full-gradient methods convergence, no time limit, residual model, all results in full version of article [15] Method
Time to reach dB (s)
Method
Time to reach dB (s) −30 dB
−35 dB
−37 dB
SDM
25.83
925.92
5604.51
1344.80
Polyak(orig.)
7.39
123.93
345.10
944.52
Polyak(v2)
19.09
782.29
−30 dB
−35 dB
−37 dB
−39 dB
DFP(100)
11.02
50.22
83.46
2474.27
DFP(400)
10.29
42.07
70.27
DFP(inf)
11.10
43.96
72.56
BFGS(100)
7.41
34.77
53.89
695.28
BFGS(400)
10.02
44.95
211.28
4104.48
BFGS(inf)
10.65
47.96
LBFGS(3)
4.75
22.47
LBFGS(10)
4.31
19.45
48.82
LBFGS(100)
4.04
16.81
36.09
LBFGS(700)
4.00
16.77
34.62
LBFGS(900)
4.01
16.75
34.49
BB(v1)
9.82
148.09
Raider(0.1)
40.15
2602.57
188.55
Raider(0.3)
266.83
57.34
CG(PRP)
3.88
26.13
60.10
CG(PRP+)
4.16
24.91
60.29
513.34
CG(CD)
4.29
25.15
59.90
410.86
CG(LS)
4.17
26.58
399.52
CG(Nesterov)
28.86
114.86
777.82
386.93
60.86 277.88
3.1.1 Numerical Experiments Table 1 shows the results and efficiency of using the L-BFGS method with various settings r = 3, ..., 900. Table 1 also includes the experimental results for various
Non-convex Optimization in Digital Pre-distortion of the Signal
61
versions of Polyak method (Polyak) [16], Barzilai–Borwein method (BB) [3], conjugate gradient method (CG) [11], and steepest descent method with zeroed small gradient components (Raider). As you can see from the presented data, the L-BFGS method actually demonstrates better performance compared to other methods (see Table 4 and Fig. 11). One of the unexpected results of these experiments is the special efficiency of the L-BFGS method in the case of using a large amount of information from past iterations. Classically, limited memory variants of the BFGS method have small optimal values of the history size, and are not so dependent on it, however, in this case, the best convergence rate of the L-BFGS method is achieved for value r = 900, and with a further increase in this parameter, the result does not improve. It can be regarded as one of the unique and noteworthy characteristics of the particular problem under consideration. 3.2
Flexible Gauss–Newton Method
Another possible approach to solving the described problem is using the ideas underlying the Gauss–Newton method for solving the nonlinear least squares problem. The approach described in this section was proposed by Yu.E. Nesterov in work2 [12]. Let us reformulate the original problem (4). Consider a mapping F : Rn → Rm of the following form F (x) := (F1 (x), . . . , Fm (x)), where each component represents the discrepancy between the approximation obtained by the model and the exact solution for each of the objects of the training set: Fi (x) := [Mθ (x)]i − y i . Then the original problem can be reduced to solving the following least squares problem: min {f1 (x) := F (x)2 }.
x∈Rn
(6)
We additionally require only the Lipschitz smoothness of the functional F (note that throughout the analysis of the method, the requirement of convexity will not be imposed on the functional, that is, the presented convergence estimates are valid in non-convex generality): F (x) − F (y)2 ≤ LF x − y2 , x, y ∈ Rn , where F (x) = ∂F∂xi (x) is a Jacobian. Under these assumptions, one can prove j
i,j
the following lemma on the majorant for the initial function f1 . 2
This paper is in print. The result of Nesterov’s paper and our paper make up the core of the joint Huawei project. The described below Method of Three Squares [12] was developed as an attempt to beat L-BFGS (see Fig. 5). We repeat in this paper the main results of [12] since they were developed for considered problem formulation and for the moment there is no possibility to read about these results somewhere else. Note, that recently some results of the paper [12] were generalized [19]. In particular, in [19] one can find more information about the Method of Three Squares.
62
A. Maslovskiy et al.
Lemma 1. [12] Let x and y be some points from Rn , L ≥ LF , and f1 (x) > 0. Then L 1 2 f1 (x) + F (x) + F (x)(y − x)22 + y − x22 . (7) f1 (y) ≤ ψˆx,L (y) := 2f1 (x) 2 Let us assume for a moment that we know an upper bound L for the Lipschitz constant LF . Then the last inequality in (7) leads to the following method:
xk+1 = arg minn
Method of Three Squares [12] 1 y − xk 22 + f1 (xk ) f12 (xk ) + F (xk ) + F (xk )(y − xk )22 2 2
L
y∈R
(8) The global convergence of this method is characterized by the following theorem. Theorem 1. [12] Let us assume that the function F (·) is uniformly nondegenerate: F (x)F (x) μIm for all x ∈ F0 := {x ∈ Rn : f1 (x) ≤ f1 (x0 )}. If in the method (8), we choose L ≥ LF , then it converges linearly to the solution of equation F (x) = 0:
μk f1 (xk ) ≤ f1 (x0 ) · exp − , k ≥ 0. 2(Lf1 (x0 ) + μ) At the same time, for any k ≥ 0 we have f1 (xk+1 ) ≤
1 L 2 f1 (xk ) + f (xk ). 2 2μ
Thus, the coefficient for asymptotic local linear rate of convergence for this method is 12 . If we relax the assumptions of Theorem 1, then we can estimate the rate of convergence of this method to a stationary point of problem (6). Denote f2 (x) := f12 (x) = F (x)22 .
Theorem 2. [12] Suppose that the function F has uniformly bounded derivative: F (x)2 ≤ MF for all x ∈ F0 . If in the method (8) L ≥ Lf , then for any k ≥ 0 we have f2 (xk ) − f2 (xk+1 ) ≥
1 ∇f2 (xk )22 . 8(Lf1 (x0 ) + MF2 )
Thus, under very mild assumption (bounded derivative), we can prove that the measure of non-stationarity ∇f2 (·)22 is decreasing as follows [12]: min ∇f2 (xk )22 ≤
0≤i≤k
8f2 (x0 )(Lf1 (x0 ) + MF2 ) , k+1
k ≥ 0.
Non-convex Optimization in Digital Pre-distortion of the Signal
63
We will also consider the following enhanced version of method (8).
xk+1
Non-Smooth Gauss-Newton Method [12] L = arg minn ψxk ,L (y) := y − xk 22 + F (xk ) + F (xk )(y − xk )2 y∈R 2 (9)
Let us describe its convergence properties. Theorem 3. [12] Let us choose in the method (9) L ≥ Lf . 1. If function F is uniformly non-degenerate: F (x)F (x) μIm , x ∈ F0 , then method (9) converges linearly to the solution of equation F (x) = 0:
μk f1 (xk ) ≤ f1 (x0 ) · exp − , k ≥ 0. 2(Lf1 (x0 ) + μ) At the same time, it has local quadratic convergence: f1 (xk+1 ) ≤
L 2 f (xk ), 2μ 1
k ≥ 0.
2. Suppose that function F has uniformly bounded derivative: F (x)2 ≤ MF , x ∈ F0 , then for any k ≥ 0 we have f2 (xk ) − f2 (xk+1 ) ≥
1 ∇f2 (xk )22 . 8(Lf1 (x0 ) + MF2 )
Normalized versions of introduced objective functions are as follows: fˆ1 (x) := 1 1/2 1 √1 f1 (x) = , fˆ2 (x) := m f2 (x). This normalization allows us to m f2 (x) m consider m → ∞. Moreover, the objective function in this form admits stochastic approximation. Therefore, let us describe a stochastic variant of method (8). Method of Stochastic Squares [12] a) Choose L0 > 0 and fix the batch size p ∈ {0, . . . , m}. b) Form Ik ⊆ {1, . . . , m} with |Ik | = p and define Gk := {Fi (xk ), i ∈ Ik }. c) Define ϕk (y) := fˆ1 (xk ) + fˆ1 (xk ), y − xk + 2fˆ 1(x ) p1 GTk (y − xk )22 . 1
k
d) Find the smallest ik ≥ 0 such that for the point
2ik Lk Tik = arg min ψik (y) := ϕk (y) + y − xk 22 y 2 ˆ we have f1 (Ti ) ≤ ψi (Ti ) k
k
k
e) Set xk+1 = Tik and Lk+1 = 2ik −1 Lk .
(10)
64
A. Maslovskiy et al.
3.2.1 Numerical Experiments In a series of numerical experiments, it was tested the practical efficiency of two described full-gradient methods (8) (3SM) and (9) (NsGNM), and the stochastic method (10) (SSM) for various batch sizes p. The results are presented in Table 2, along with the results of the Gauss–Newton method in the Levenberg– Marquardt version (LM) [10]. It can be seen from the presented results that the Three Squares method demonstrates better performance than the Levenberg– Marquardt method, and, moreover, batching technique significantly accelerates the convergence of the proposed scheme. The best setting of the method with3 p = 6n demonstrates a result that exceeds the performance of the L-BFGS method, starting from the mark of −39 dB of the quality metric (see Fig. 5). Table 2. Gauss-Newton methods convergence, residual model (all results in full version of article [15])
Method
Time to reach dB level (s) −30 dB
−35 dB
−37 dB
−39 dB
LM(1)
919.28
1665.08
3165.61
14270.56
LM(3)
550.06
1494.13
3584.74
3SM
633.88
1586.69
1747.28
4616.80
NsGNM
762.99
1626.18
2265.74
9172.11
186.37
12469.39
SSM(1n)
48.79
104.31
SSM(5n)
42.65
88.14
SSM(6n)
47.32
72.30
89.08
SSM(7n)
44.16
109.33
137.58
3.3
110.00
315.28 314.95
Fig. 5. Convergence of Stochastic Squares Method (SSM) and L-BFGS method, residual model
382.58
Stochastic Methods
The next class of methods that are especially widely used for problems related to training models represented in the form of large computational graphs (in particular, neural networks) — is stochastic gradient methods. In addition to the repeatedly confirmed practical efficiency, the motivation for applying stochastic methods to the considering problem is a significant saving of time when evaluating the function for only one of the terms of the sum-type functional at our disposal. Indeed, consider the calculation complexity for the one term — it requires not more than 4 · (Ry + 1) · (Ry Nz + 2Mz Pz ) 4 · 103 arithmetical operations (a.o.), whereas the complexity for the full sum is 4·2m·(Ry Nz +2Mz Pz ) 3·108 a.o., that almost in ∼m times more. Further, according to theory of automatic 3
Note, that p ∼ n can be easily explained by the following observation. In this regime Jacobian calculation ∼pn2 has the same complexity as Jacobian inversion ∼n3 . It means that there is no reason to choose p large, but p n. If p is large we can consider p to be greater than n since the complexity of each iteration include n3 term anyway.
Non-convex Optimization in Digital Pre-distortion of the Signal
65
differentiation, for the particular computational graph the calculation of gradient is not more than 4 times more expensive than calculation of the function value [6,14], although it is necessary to store the entire computational graph in RAM. In our experiments, we apply stochastic methods to the problem under consideration. Along with classical stochastic gradient method (SGD), there weretested various modifications of adaptive momentum stochastic methods (Adam, Adagrad, Adadelta, Adamax). Adaptivity of these methods lies in the absence of the need to know the smoothness constants of the objective function, which is especially effective in deep learning problems [20]. Moreover, a number of variance reduction methods were applied (SVRG, SpiderBoost). Note that the code of most of these methods is free available at GitHub: https://github.com/ jettify/pytorch-optimizer. As one can see from the figures below, adaptive stochastic methods, in particular Adam, show the most effective (among stochastic methods) convergence for the considered model (see Fig. 6) (Table 3). At the same time, the use of variance reduction methods does not allow achieving any acceleration of the convergence (see Fig. 7, 8). Note also that variance reduction methods are inferior in convergence rate to the standard SGD method also in terms of the number of passes through the dataset. Experiments also show that the efficiency of stochastic algorithms (in comparison with full-gradient methods) significantly depends on the dimension of the model’s parameter space. Moreover, for models with a large number of blocks, the rate of convergence of Adam-type algorithms is slower than for methods of the L-BFGS type, due to the slowdown in convergence with an increase in the number of iterations of the method. It is important to note, however, that losing in the considered setting in terms of the depth and rate characteristics of convergence, stochastic methods show the advantage of being more resistant to overfitting [1] (due to their randomized nature). At the same time, the stochastic methods in the current version are especially valuable for the possibility of using them for online training of the model. Indeed, the main application of the solution to the problem posed at the beginning of the article is to optimize the DPD function, however, it is quite natural that with a change in the characteristics of the input signal over time, the optimal parametrization of the model can continuously change, so it is necessary to adjust the model to the new data. Stochastic methods allow for the modification of full-gradient methods right out of the box and are more convenient for hardware implementation.
66
A. Maslovskiy et al.
Table 3. Stochastic methods convergence, residual model dB
Method
Setup
ASGD
(128,
10.0)
−28.976
Adadelta
(2048,
10.0)
−32.697
t = 0 (s)
t = 300 (s)
−36.608
Adagrad
(2048,
0.01)
Adam
(2048,
0.001)
Adamax
(2048,
0.01)
RMSprop
(2048,
0.001)
−36.499
SGD
(128,
10.0)
−34.061
FastAdaptive
−15.616
−38.129 −37.934
−36.273
Fig. 7. Comparison of various sampling strategies for SVRG and SGD in terms of running time (in seconds) to reach the predefined threshold (−30, −35, −37, −38 dB).
4
Fig. 6. Stochastic methods convergence, residual model
Fig. 8. Comparison of various sampling strategies for SVRG and SpiderBoost in terms of running time (in seconds) to reach the predefined threshold (−30, −35, −37, −38 dB).
Overfitting
Overfitting is a common problem in computational graphs parameters training, when the tuned model corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably. In other words, in such a situation, the generalizing ability of the model is sacrificed to the quality of optimization of a specific function of empirical risk, due to which the performance of methods will decrease drastically on the data, which differ from those used for training. In order to control overfitting, in the experiments presented in the work, the original dataset was divided into two parts — training (75%) and validation (25%), and during training, the loss function was calculated both on training and validation datasets, in order to be able to compare the quality of the methods and detect overfitting to data. Figure 9 presents the results of experiments demonstrating how train and validation errors differ for several methods. As one can see, in the case of the used partition, the error difference is 0.05 dB, whereas the error itself at the given time interval is −37 dB. Note that at this scale of
Non-convex Optimization in Digital Pre-distortion of the Signal
67
training time and data quantity the least overfitting is achieved when using the L-BFGS method. 4.1
Different Training Set Size
From the point of view of studying the specific properties of the data generated by the signal arriving at the input of the pre-distorter, it is interesting to consider the dependence of the model’s susceptibility to overfitting on the size of the training dataset. The dataset of 245 760 pairs of complex numbers (x, y), used to set up all experiments in this work, was divided into two parts: a training set and a validation set. Moreover, the data was split in a sequential form, without random shuffling (which, on the contrary, is usually done in the case of training, for example, neural networks), so that a solid opening signal segment is used for training, and the entire remaining signal segment is taken as a validation set. After training the model on the selected training signal, the model with the resulting parameters was used to obtain a solution for the validation signal, and the performance of the method was thus assessed in parallel for two sets. The L-BFGS(900) method was used as the optimization method. It is clearly seen from the results of the experiment (see Fig. 10) that there is a discrepancy between the results of the model for different volumes of training and validation sets. With a small amount of training set, we get a model that describes very well a small number of objects (overtrained), but returns an irrelevant result when processing a new signal with naturally changed characteristics. Note that the specificity of this model is a rather small size of the required training dataset: even when using 20% of the training signal segment (which in this case has a size of ≈ 200 000), the difference between the quality metrics for the training and validation samples does not exceed 0.5 dB. At the same time, when
Fig. 9. Difference between train and validation errors
Fig. 10. Residual model overfitting, 5%, 20% of original data used as training set. Red line—convergence while training on sample signal, blue line— quality of the solution for validation signal (Color figure online)
68
A. Maslovskiy et al.
assessing the effects of overfitting, it is important that the samples used after splitting have a sufficiently large size, since when choosing a too small training or validation set, because of data in this parts can be really different and as a result naturally occurring approximation errors begin to strongly influence the result.
5
Conclusion
This article discusses various approaches for optimizing the parameters of computational graphs simulating the behavior of a digital pre-distorter for the modulated signal. In the numerous experiments, it was tested different full-gradient methods, and stochastic algorithms. Among the many randomized (mini-batch) algorithms that significantly use the sum-type structure of the objective functional, the Adam algorithm, which is most relevant for use in the online-training regime, demonstrates the best efficiency. However, it should be noted that for effective adjustment, that is, selecting the optimal step length and batch size, rather time-consuming precalculations are required. Of all the considered methods, the L-BFGS algorithm turned out to be the undisputed leader. It may be somewhat unexpected that the optimal memory depth of the L-BFGS method for this problem is in the range of 800–1000. Note that the idea of using the L-BFGS method for DPD optimization was proposed earlier in [2], for models based on the Volterra series. Experiments described in this paper thus confirm the L-BFGS method’s particular effectiveness for the DPD problem in terms of relative independence from a specific model and dataset. At the same time, the idea of deep memory is original, and perhaps it is specific to the used model class. In addition to the best convergence rate, the L-BFGS method as a training procedure also leads to the least error on the validation set: the discrepancy in the quality metric, that characterizes the overfitting susceptibility, for the dataset used is approximately 0.05 dB. There were described a number of new modifications of the Gauss–Newton method proposed by Yu.E. Nesterov, including the Method of Stochastic Squares. The practical efficiency of the proposed approaches is not only significantly higher than that of other Gauss–Newton methods, but is also the best among all the local methods considered in the experiments (see Fig. 5). Many experiments have been carried out evaluating the specifics of the dataset used, generated by the samples of the modulated signal. Experiments on the use of different sizes of the training sample have shown that it is enough to use 20% of the original dataset to obtain a sufficiently good quality on the validation set (−38 dB). Moreover, in this case, it is possible to reach the −38 dB threshold much faster than using the full training dataset. It should be noted that even 5% of the data is enough to reach the −37 dB threshold, and also in a much shorter time.
Non-convex Optimization in Digital Pre-distortion of the Signal
69
Table 4. Best methods performance, residual model
Method
Time to reach dB level (s) −30 dB
−35 dB
−37 dB
SDM
25.83
925.92
5604.51
CG(DY)
12.38
47.21
80.74
DFP(inf)
11.10
43.96
72.56
944.52
7.41
34.77
53.89
695.28
4.01
BFGS(100) LBFGS(900)
16.75
34.49
3SM
633.88
1586.69
1747.28
SSM(6n)
47.32
72.30
89.08
−39 dB 3721.43
399.52 4616.80
Fig. 11. Best methods convergence, residual model
314.95
Thus, despite the fact that the considered class of models has significant specificity, following the classical way of studying large computational graphs from the point of view of their parameters optimizing, formed mainly around the problem of training neural networks, makes it possible to collect a set of algorithms and approaches that are most effective for the problem under consideration. Moreover, many of the solutions developed specifically for neural networks turned out to be relevant for Wiener–Hammerstein type models. In particular, adaptive stochastic methods remain just as effective. At the same time, it is possible to significantly and drastically improve the results of classical approaches taking into account the specifics of the problem, such as the use of deep memory for the L-BFGS method, small width of network layers or a small training sample. Apparently, the dependencies found in the course of the described study are quite universal for this family of models, because it was tested on different models and achieve the best result in most cases, and therefore the presented observations can be useful not only for efficiently solving related practical problems, but also for further exploration of the problem of digital-predistortion of the signals.
References 1. Amir, I., Koren, T., Livni, R.: SGD generalizes better than GD (and regularization doesn’t help). arXiv preprint arXiv:2102.01117 (2021) 2. Bao, J.L., Zhu, R.X., Yuan, H.X.: Restarted LBFGS algorithm for power amplifier predistortion. In: Applied Mechanics and Materials, vol. 336, pp. 1871–1876. Trans Tech Publ (2013) 3. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988) 4. Dennis, J.E., Jr., Mor´e, J.J.: Quasi-newton methods, motivation and theory. SIAM Rev. 19(1), 46–89 (1977) 5. Ghannouchi, F.M., Hammi, O., Helaoui, M.: Behavioral Modeling and Predistortion of Wideband Wireless Transmitters. John Wiley & Sons, Hoboken (2015) 6. Griewank, A., et al.: On automatic differentiation. Math. Program. Recent Dev. Appl. 6(6), 83–107 (1989) 7. Haykin, S.S.: Adaptive filter theory. Pearson Education India (2008)
70
A. Maslovskiy et al.
8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016) 9. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989) 10. Marquardt, D.W.: An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11(2), 431–441 (1963) 11. Neculai, A.: Conjugate gradient algorithms for unconstrained optimization. a survey on their definition. ICI Technical report 13, 1–13 (2008) 12. Nesterov, Y.: Flexible modification of gauss-newton method. CORE Discussion paper (2021) 13. Nocedal, J.: Updating quasi-newton matrices with limited storage. Math. Comput. 35(151), 773–782 (1980) 14. Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media (2006). https://doi.org/10.1007/978-0-387-40065-5 15. Pasechnyuk, D., Maslovskiy, A., Gasnikov, A.: Non-convex optimization in digital pre-distortion of the signal. arXiv:2103.10552 (2021) 16. Polyak, B.T.: Minimization of unsmooth functionals. USSR Comput. Math. Math. Phys. 9, 14–29 (1969) 17. Schreurs, D., O’Droma, M., Goacher, A.A., Gadringer, M.: RF Power Amplifier Behavioral Modeling. Cambridge University Press, New York (2008) 18. Skajaa, A.: Limited memory BFGS for nonsmooth optimization. Master’s thesis (2010) 19. Yudin, N., Gasnikov, A.: Flexible modification of gauss-newton method and its stochastic extension. arXiv preprint arXiv:2102.00810 (2021) 20. Zhang, J., et al.: Why ADAM beats SGD for attention models. arXiv preprint arXiv:1912.03194 (2019)
Zeroth-Order Algorithms for Smooth Saddle-Point Problems Abdurakhmon Sadiev1(B) , Aleksandr Beznosikov1,2 , Pavel Dvurechensky3,4 , and Alexander Gasnikov1,4,5 1
Moscow Institute of Physics and Technology, Dolgoprudny, Russia 2 Higher School of Economics, Moscow, Russia 3 Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany 4 Institute for Information Transmission Problems RAS, Moscow, Russia 5 Caucasus Mathematical Center, Adyghe State University, Maykop, Russia
Abstract. Saddle-point problems have recently gained an increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zeroth-order oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convex-concave saddle-point problems using zerothorder oracles, and estimate their convergence rate and its dependence on the dimension n of the variable. In particular, our analysis shows that in the case when the feasible set is a direct product of two simplices, our convergence rate for the stochastic term is only by a log n factor worse than for the first-order methods. Finally, we demonstrate the practical performance of our zeroth-order methods on practical problems.
Keywords: Zeroth-order optimization Stochastic optimization
1
· Saddle-point problems ·
Introduction
Zeroth-order or derivative-free methods [6,11,16,37,41] are well known in optimization in application to problems with unavailable or computationally expensive gradients. In particular, the framework of derivative-free methods turned out to be very fruitful in application to different learning problems such as online learning in the bandit setup [7] and reinforcement learning [10,18,38], which can be considered as a particular case of simulation optimization [19,40]. We study stochastic derivative-free methods in a two-point feedback situation, meaning that two observations of the objective per iteration are available. This setting The research of A. Sadiev, A. Beznosikov and P. Dvurechensky in Sect. 4 was supported by the Russian Science Foundation (project 21-71-30005). The research of A. Gasnikov in Sect. 5 was partially supported by RFBR, project number 18-29-03071 mk. This work was partially conducted while A. Sadiev and A. Beznosikov were on the project internship in Sirius University of Science and Technology. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 71–85, 2021. https://doi.org/10.1007/978-3-030-86433-0_5
72
A. Sadiev et al.
was considered for optimization problems by [1,14,39] in the learning community and by [15,20–22,35,42] in the optimization community. In this paper we go beyond the setting of optimization problems and consider convex-concave saddle-point problems for which partial derivatives of the objective are not available, which forces to use derivative-free methods. Saddlepoint problems are tightly connected with equilibrium [17] and game problems [2] in many applications, e.g., economics [31], with tractable reformulations of non-smooth optimization problems [34], and with variational inequalities [25]. Gradient methods for saddle-point problems are an area of intensive study in the machine learning community in application to training of Generative Adversarial Networks [23], and other adversarial models [30], as well as to robust reinforcement learning [36]. In the latter two applications, gradients are often unavailable, which motivates the application of zeroth-order methods to the respective saddlepoint problems. Moreover, this also motivates 1/2th-order methods, when the training of the network is made via stochastic gradient method with backpropagation, and adversarial examples, which are generated to force the network to give incorrect prediction, are generated by zeroth-order methods. Another application area for zeroth-order methods are Adversarial Attacks [24,43], in particular the Black-Box Adversarial Attacks [32]. The goal is not only to train the network, but to find also a perturbation of the data in such a way that the network outputs wrong prediction. Then the training is repeated to make the network robust to such adversarial examples. Since the attacking model does not have access to the architecture of the main network, but only to the input and output of the network, the only available oracle for the attacker is the zerothorder oracle for the loss function. As it is shown in [12,13,45], this approach allows to obtain the same quality of robust training as the more laborious methods of Adversarial Attacks, but faster in up to a factor of three in terms of the training time [9]. Gradient methods for saddle-point problems are a well studied area with the classical algorithm being the extra-gradient method [28]. It was later generalized to the non-Euclidean geometry in the form of Mirror Descent [3] and MirrorProx [34]. These methods are designed for a more general problem of solving variational inequalities. There are also direct methods for saddle-point problems such as gradient descent ascent [33] or primal-dual hybrid gradient method [8] for saddle-point problems with bilinear structure. On the contrary, the theory of zeroth-order methods for saddle-point problems seems to be underdeveloped in the literature. We give a more detailed overview of such methods and explain our contribution in comparison with the literature below. 1.1
Our Contribution and Related Works
In the first part of the work, we present zeroth-order variants of Mirror-Descent [3] and Mirror-Prox [27] methods for stochastic saddle-point problems in convexconcave and strongly convex-concave cases. We consider various concepts of zeroth-order oracles and various concepts of noise. Also we introduce a new class of smooth saddle-point problems – firmly smooth.
Zeroth-Order Algorithms for Smooth Saddle-Point Problems
73
In the particular case of deterministic problems, our methods have a linear rate in the smooth strongly-convex-strongly-concave case, and sublinear rate O(1/N ) in the convex-concave case, where N is the number of iterations. One can note that in some estimates, there is a factor of the problem’s dimension n, but somewhere n2/q . This factor q depends on geometric setup of our problem and gives a benefit when we work in the H¨ older, but non-Euclidean case (use non-Euclidean prox), i.e. · = · p and p ∈ [1; 2], then · ∗ = · q , where 1/p + 1/q = 1. Then q takes values from 2 to ∞, in particular, in the Euclidean case q = 2, but when the optimization set is a simplex, q = ∞. (see Table 1 for a comparison of the oracle complexity with zeroth-order methods for saddle-point problems in the literature and provided by our methods). Table 1. Comparison of oracle complexity in deterministic setup of different zerothorder methods with different assumptions on target function f (x, y): C-C – convexconcave, SC-SC – strongly-convex-strongly-concave, NC-SC – nonconvex-stronglyconcave; Cst – optimization set is constrained, UCst – unconstrained; S - smooth, FS - firmly smooth (see (9)), BG - bounded gradients. Here ε means the accuracy of the solution, D – the diameter of the optimization set, μ – strong convexity constant (see (7)), L – smoothness constant (see (8)), κ = L/μ, M – bound of the gradient (∇x f (x, y)2 ≤ M , ∇y f (x, y)2 ≤ M ), n – the sum of the dimensions of the variables x and y, q = 2for the Euclidean case and q = ∞ for setup of · 1 -norm. *convergence N ∗ ∗ 2 on N1 k=1 E F (xk , yk ) − F (x , y )2 , where F (x, y) = (∇x f (x, y), −∇y f (x, y)). Method
Assumptions
Complexity in deterministic setup ˜ nκ22 ZO-GDMSA [44] NC-SC, UCst-Cst, S O ε ˜ n6 ZO-Min-Max [29] NC-SC, Cst-Cst, S O ε 2 2 2 zoSPA [5] C-C, Cst-Cst, BG O n /q Mε2D
˜ min n2/q κ2 , nκ · log 1 O [Alg 1 and 3] SC-SC, Cst-Cst, S ε 2 LD ˜ n O [Alg 2] C-C, Cst-Cst, S ∗ ε ˜ n2/q L2 D2 O [Alg 1] C-C, Cst-Cst, FS ε
Our theoretical analysis shows that the zeroth-order methods has the same sublinear √ convergence rate in the stochastic part as the first-order method: O(1/ N ) in convex-concave case and O(1/N ) in strongly-convex-strongly-concave case. (see Table 2 for a comparison of the oracle complexity in the stochastic part for first-order methods and available zeroth-order methods for stochastic saddle-point problems). The second part of the work is devoted to the use of a mixed order oracle, i.e. a zeroth-order oracle in one variable and a first-order oracle for the other. First, we analyze a special case when such an approach is appropriate - the Lagrange multiplier method. Then we also present a general approach for this setup. The
74
A. Sadiev et al.
Table 2. Comparison of oracle complexity for stochastic part of different first- and zeroth-order methods with different assumptions on f (x, y): see notation in Table 1. Here σ 2 – the bound of variance (see (3)). Method
Order Assumptions
EGMP [27]
1st
PEG [26]
1st
ZO-SGDMSA [44] 0th [Alg 1]
0th
[Alg 2]
0th
[Alg 1]
0th
Complexity for stochastic part 2 2 C-C, Cst-Cst, S O σ εD 2 2 σ SC-SC, Cst-Cst, S O μ2 ε 2 ˜ κ2 nσ NC-SC, UCst-Cst, S O 4 ε 2/q 2 SC-SC, Cst-Cst, S O n μ2 εσ 2 2 C-C, Cst-Cst, S O nσε2D 2/q 2 2 C-C, Cst-Cst, FS O n εσ2 D
idea of using such an oracle is found in the in literature [4], but for the composite optimization problem. As mentioned above, all theoretical results are tested in practice on a classical bilinear problem.
2
Problem Setup and Assumptions
We consider a saddle-point problem: min max f (x, y), x∈X y∈Y
(1)
where X ⊂ Rnx and Y ⊂ Rny are convex compact sets. For simplicity, we introduce the set Z = X × Y, z = (x, y) and the operator F : ∇x f (x, y) . (2) F (z) = F (x, y) = −∇y f (x, y) We focus on the case when we do not have access to the values of ∇x f (x, y) and ∇y f (x, y), but we have access to the inexact zeroth-order oracle, i.e. inexact values of the objective f (x, y). The inexactness in the zeroth-order oracle includes stochastic noise and unknown bounded noise, which can be of an adversarial nature. More precisely, we have access to the values f˜(z, ξ) such that f˜(z, ξ) = f (z, ξ) + δ(z) and E[f (z, ξ)] = f (z), E[F (z, ξ)] = F (z), E[F (z, ξ) − F (z)22 ] ≤ σ 2 , |δ(z)| ≤ Δ.
(3)
We consider two types of approximations for F (z) based on the available observations of f˜(z, ξ).
Zeroth-Order Algorithms for Smooth Saddle-Point Problems
75
Random Direction Oracle. In this strategy, the vectors ex , ey are generated uniformly on the unit Euclidean sphere, i.e. ex ∈ RS 2nx (1) and ey ∈ RS 2ny (1). And ⎛ ⎞ ˜ ˜ , y, ξ) − f (x, y, ξ) ex f (x + τ e x n ⎠, (4) gd (z, e, τ, ξ) = ⎝ τ f˜(x, y, ξ) − f˜(x, y + τ ey , ξ) ey where τ > 0 is called smoothed parameter and n = nx + ny + 1. Full Coordinates Oracle. Here we consider a standard orthonormal basis {h1 , . . . , hnx +ny } and construct an approximation for the operator F in the following form: gf (z, h, τ, ξ) =
nx 1
f˜(z + τ hi , ξ) − f˜(z, ξ) hi τ i=1
+
1 τ
nx +ny
f˜(z, ξ) − f˜(z + τ hi , ξ) hi .
(5)
i=nx +1
In this concept, we need to call f˜ oracle nx +ny +1 times, whereas in the previous case only 3 times.
3
Notation and Definitions
def n n We use x, y = i=1 xi yi to define inner product of x, y ∈ R where xi is n the i-th component of x in the standard basis in R . Hence we get the defdef inition of 2 -norm in Rn in the following way x2 = x, x . We define def n p 1/p p -norms as xp = ( i=1 |xi | ) for p ∈ (1, ∞) and for p = ∞ we use def
x∞ = max1≤i≤n |xi |. The dual norm · q for the norm · p is defined in the def
following way: yq = max { x, y | xp ≤ 1}. Operator E[·] is full mathematical expectation and operator Eξ [·] express conditional mathematical expectation. As stated above, during the course of the paper we will work in an arbitrary norm · = · p , where p ∈ [1; 2]. And its conjugate · ∗ = · q with q ∈ [2; +∞) and 1/p + 1/q = 1. Some assumptions will be made later in the Euclidean norm - we will write this explicitly · 2 . Definition 1. Function d(z) : Z → R is called a prox-function if d(z) is 1strongly convex w.r.t. · -norm and differentiable on Z function. Definition 2. Let d(z) : Z → R is a prox-function. For any two points z, w ∈ Z we define Bregman divergence Vz (w) associated with d(z) as follows: Vz (w) = d(z) − d(w) − ∇d(w), z − w .
76
A. Sadiev et al.
Definition 3. Let Vz (w) Bregman divergence. For all x ∈ Z define proxoperator of ξ: proxx (ξ) = arg min (Vx (y) + ξ, y ) . y∈Z
Next we present the assumptions that we will use in the convergence analysis. Assumption 1. The set Z is bounded w.r.t · by constant Dp , i.e. Vz1 (z2 ) ≤ Dp2 , ∀z1 , z2 ∈ Z.
(6)
Assumption 2. f (x, y) is convex-concave. It means that f (·, y) is convex for all y and f (x, ·) is concave for all x. Assumption 2(s). f (x, y) is strongly-convex-strongly-concave. It means that f (·, y) is strongly-convex for all y and f (x, ·) is strongly-concave for all x w.r.t. V· (·), i.e. for all x1 , x2 ∈ X and for all y1 , y2 ∈ Y we have f (x1 , y2 ) ≥ f (x2 , y2 ) + ∇x f (x2 , y2 ), x1 − x2
μ V(x2 ,y2 ) (x1 , y2 ) + V(x1 ,y2 ) (x2 , y2 ) , + 2 −f (x2 , y1 ) ≥ −f (x2 , y2 ) + −∇y f (x2 , y2 ), y1 − y2
μ V(x2 ,y2 ) (x2 , y1 ) + V(x1 ,y1 ) (x2 , y2 ) . + 2
(7)
Assumption 3. f (x, y, ξ) is L(ξ)-Lipschitz continuous w.r.t · 2 , i.e. for all x1 , x2 ∈ X , y1 , y2 ∈ Y and ξ ∇x f (x1 , y1 , ξ) ∇x f (x2 , y2 , ξ) ≤ L(ξ) x1 − x2 . (8) − −∇y f (x1 , y1 , ξ) y1 y2 2 −∇y f (x2 , y2 , ξ) 2 Assumption 3(f ). f (x, y) s L-firmly Lipschitz continuous w.r.t · 2 , i.e. for all x1 , x2 ∈ X , y1 , y2 ∈ Y 2 ∇x f (x1 , y1 , ξ) ∇x f (x2 , y2 , ξ) − −∇y f (x1 , y1 , ξ) −∇y f (x2 , y2 , ξ) 2 ∇x f (x1 , y1 , ξ) ∇x f (x2 , y2 , ξ) x x2 ≤ L(ξ) − , 1 − . (9) y1 y2 −∇y f (x1 , y1 , ξ) −∇y f (x2 , y2 , ξ) For (8) and (9) we assume that exists L2 such that E[L2 (ξ)] ≤ L22 . For deterministic case L2 is equal to deterministic constant L (without ξ). By Cauchy-Schwarz, (8) follows from (9). It is easy to see that the assumptions 3 and 3(f) above can be easily rewritten in a more compact form using F (z). For assumption 2(s) it is more complicated: Lemma 1. If f (x, y) is μ-strongly convex on x and μ-strongly concave on y w.r.t V· (·), then for F (z) we have F (z1 ) − F (z2 ), z1 − z2 ≥
μ (Vz1 (z2 ) + Vz2 (z1 )) , ∀z1 , z2 ∈ Z. 2
Zeroth-Order Algorithms for Smooth Saddle-Point Problems
77
And we can present some properties of oracles (4), (5): Lemma 2. Let e ∈ RS 2 (1), i.e. uniformly distributed on the unit Euclidean sphere. Randomness comes from independent variables e, ξ and a point z. Norm · ∗ = · q satisfies q ∈ [2; +∞). We introduce the constant ρn : ρn = min{q − 1, 16 log(n) − 8}. Then under Assumption 3 or 3(f ) the following statements hold: – for Random direction oracle E gd (z, e, τ, ξ)2q ≤ 48n2/q ρn E F (z) − F (z ∗ )22 + 48n2/q ρn F (z ∗ )22 + 48n2/q ρn σ 2 + 8n2/q+1 ρn L2 τ 2 n2/q+1 ρn Δ2 + 16 , τ2 √ √ Δ E[gd (z, e, τ, ξ)] − F (z)q ≤ 2n1/q+1/2 ρn Lτ + 4n1/q+1/2 ρn ; τ – for Full coordinates oracle 6nΔ2 E gf (z, τ, ξ) − F (z)2q ≤ 3σ 2 + 3nL22 τ 2 + , τ2 √ √ 2 nΔ . E [gf (z, τ, ξ)] − F (z)q ≤ nLτ + τ
4
Zeroth-Order Methods
In this part, we present methods for solving problem (1), which use only the zeroth-order oracle. First of all, Input: z0 , N , γ, τ . we want to consider the classic verChoose grad to be either gd or gf . for k = 0, 1, 2, . . . , N do sion of the Mirror-Descent algorithm. Sample indep. ek , ξk . For theoretical and practical analysis dk = grad(zk , ek , τ, ξk ). of this algorithm in the non-smooth zk+1 = proxzk (γ · dk ). case, but with a bounded gradient, see end for [3] (first order), [5] (zero order). The Output: zN +1 or z¯N +1 . main problem of this approach is that it is difficult to analyze in the case when f is convex-concave and Lipschitz continuous (Assumptions 2 and 3). But in practice, this algorithm does not differ much from its counterparts, which will be given below. Let us analyze this algorithm in convex-concave and strongly-convex-strongly-concave cases with Random direction oracle:
Algorithm 1. zoVIA
Theorem 1. By Algorithm 1 with Random direction oracle
78
A. Sadiev et al.
– under Assumptions 1, 2, 3(f ) and with γ ≤
1 , 48n2/q ρn L
we get
N 2LDp2
1 + 48γn2/q ρn L F (z ∗ )22 + σ 2 E F (zk ) − F (z ∗ )22 ≤ N γN k=1 Δ2 + 8γn2/q+1 ρn L L22 τ 2 + 2 2 τ 2Δ √ + 8n1/q+1/2 ρn LDp Lτ + ; τ
– under Assumptions 1, 2(s), 3 and with γ ≤
μ : 96n2/q ρn L2
∗ ∗ E VzN +1 (z ) ≤ Vz0 (z ) exp −
μ2 N 400n2/q ρn L2
24n2/q ρn F (z ∗ )22 + σ 2 2 μ N Δ2 4n2/q+1 ρn 2 2 τ + 2 + L 2 μ2 N τ2 √ 4n1/q+1/2 ρn Dp 2Δ + Lτ + . γμ2 N τ
+
Remark. In the first statement of the Theorem, we used an unusual convergence criterion, it can be interpreted as follows: let as the output z˜N of the algorithm we choose a random point from z0 to zN . Then E F (˜ zN )22 =
1 E F (zk )22 . N +1 N
k=0
In this theorem and below, we draw attention to the fact that in the main part of the convergence there is a deterministic constant L, and in the parts that are responsible for noise – L2 (see (8),(9)). Corollary 1. For Algorithm 1
– under Assumptions 1, 2, 3(f ) and with γ = min τ = Θ min
ε , max √ n1/q+1/2 ρn L2 Dp
1
48n2/q ρn L
ε σ ,√ nL22 nL2
D
, n1/q √ρp σ√N n
,
,
Δ = O L2 τ 2 ,
the oracle complexity (coincides with the number of iterations) to find εsolution (in terms of the convergence criterion from Theorem 1) is 2 n /q ρn L2 Dp2 n2/q ρn σ 2 Dp2 N = O max , . ε ε2
Zeroth-Order Algorithms for Smooth Saddle-Point Problems
79
– under Assumptions 1, 2(s), 3 and with γ = 96n2/qμρ L2 , n √ εμ εL σ σ2 μ τ = Θ min max ,√ , , max , L2 an,q LDp an,q L3 Dp nL2
√ where an,q = n1/q+1/2 ρn , Δ = O L2 τ 2 , the oracle complexity (coincides with the number of iterations) to find ε-solution (in terms of the convergence criterion from Theorem 1) can be bounded by 2/q 2/q 2 2 max n ρn L log 1 , n ρn σ N =O . μ2 ε μ2 ε Remark. We analyze only Random direction oracle. The estimate of the oracle complexity with Full coordinate oracle has the same form with q = 2. Next, we consider a standard algorithm for working with smooth saddle-point problem. It builds on the extra-gradient method [28]. The idea of using this approach for saddle-point problems is not new [27]. It has both heuristic advantages (we forestall the properties of the gradient) as well as purely mathematical ones (a more clear theoretical analysis). We use two versions of this approach: classic and single call version from [26]. Algorithm 2. zoESVIA
Algorithm 3. zoscESVIA
Input: z0 , N , γ, τ . Choose oracle grad from gd , gf . for k = 0, 1, 2, . . . , N do Sample indep. ek , ek+1/2 , ξk , ξk+1/2 . dk = grad(zk , ek , τ, ξk ). zk+1/2 = proxzk (γ · dk ). dk+1/2 = grad(zk+1/2 , ek+1/2 , τ, ξk+1/2 ). zk+1 = proxzk (γ · dk+1/2 ). end for Output: zN +1 or z¯N +1 .
Input: z0 , N , γ, τ . Choose oracle grad from gd , gf . for k = 0, 1, 2, . . . , N do Sample independent ek , ξk . Take dk−1 from previous step. zk+1/2 = proxzk (γ · dk−1 ). dk = grad(zk+1/2 , ek+1/2 , τ, ξk ). zk+1 = proxzk (γ · dk ). end for Output: zN +1 or z¯N +1 .
N Here z¯N +1 = N1+1 i=0 zi+1/2 . Next, we will deal with the theoretical analysis of convergence: Theorem 2. – By Algorithm 2 with Full coordinates oracle under Assumptions 1, 2, 3 and with γ ≤ 1/2L, we have 2Dp2 nΔ2 2 2 2 + 11γ nL2 τ + σ + 2 2 zN +1 )] ≤ E [εsad (¯ γN τ √ √ 2 nΔ + 2Dp nLτ + , τ where
zN +1 ) = max f (¯ xN +1 , y ) − min f (x , y¯N +1 ), εsad (¯ y ∈Y
x ∈X
x ¯N +1 , y¯N +1 are defined the same way as z¯N +1 .
80
A. Sadiev et al.
– By Algorithm 3 with Full coordinates oracle under Assumptions 1, 2(s), 3 and with p = 2 (Vx (y) = 1/2x − y22 ), γ ≤ 1/6L: μN ∗ 2 E zN +1 − z 2 ≤ exp − z0 − z ∗ 22 12 L μN + exp − gf (z0 , τ, ξ0 ) − gf (z0 , τ, ξ0 )22 12 L 1 2nΔ2 + 2 12 σ 2 + nL22 τ 2 + μ N τ2 √ 1 4D2 √ 2 nΔ + 2 nLτ + . μ N γ τ Corollary 2. Let ε be an accuracy of the solution (in terms of the convergence criterion from Theorem 2). – For Algorithm 2 with Full coordinates oracle under Assumptions 1, 2, 3 with √ γ = min {1/2L, Dp/(σ N )} and additionally ! "
ε εL σ τ = O min √ , max ,√ , Δ = O L2 τ 2 , 2 nL2 nLD2 nL2 we have the number of iterations to find ε-solution LD22 σ 2 Dp2 N = O max , 2 . ε ε – For Algorithm 3 with Full coordinates oracle under Assumptions 1, 2(s), 3, with p = 2 (Vx (y) = 1/2x − y22 ), γ = 1/6L and additionally ! " με εμL σ σ2 τ = O min max , max √ ,√ ,√ 2 , L22 nLD2 nL D2 nL2
Δ = O L2 τ 2 , the number of iterations to find ε-solution: L 1 σ2 N = O max log , 2 . μ ε μ ε Remark. The oracle complexity for the Full coordinate oracle is n times greater than the number of iterations. The analysis is carried out only for the Full coordinate oracle. The main problem of using Random Direction is that their variance is tied to the norm of the gradient; therefore, using an extra step does not give any advantages over Algorithm 1. A possible way out of this situation is to use the same direction e within one iteration of Algorithm 2 – this idea is implemented in Practice part. It is interesting how it work in practice, because in the non-smooth case [5] the gain by the factor n2/q can be obtained.
Zeroth-Order Algorithms for Smooth Saddle-Point Problems
5
81
Practice Part
The main goal of our experiments is to compare the Algorithms 1,2,3 and 4 described in this paper with Full coordinate and Random direction oracles. We consider the classical bilinear saddle-point problem on a probability simplex: min max y T Cx , (10) x∈Δn y∈Δk
This problem is often referred to as a matrix game (see Part 5 in [3]). Two players X and Y are playing. The goal of player Y is to win as much as possible by correctly choosing an action from 1 to k, the goal of player X is to minimize the gain of player X using his actions from 1 to n. Each element of the matrix cij are interpreted as a winning, provided that player X has chosen the i-th strategy and player Y has chosen the j-th strategy. n Let consider the stepof algorithm. The prox-function is d(x) = i=1 xi log xi n (entropy) and Vx (y) = i=1 xi log xi/yi (KL divergence). The result of the proximal operator is u = proxzk (γk grad(zk , ek , τ, ξk )) = zk exp(−γk grad(zk , ek , τ, ξk )), by this entry we mean: ui = [zk ]i exp(−γk [grad(zk , ek , τ, ξk )]i ). Using the Bregman projection onto the simplex in following way P (x) = x/ x 1 , we have [xk ]i exp(−γk [gradx (zk , ek , τ, ξk )]i ) [xk+1 ]i = , n [xk ]j exp(−γk [gradx (zk , ek , τ, ξk )]j ) j=1
[yk ]i exp(γk [grady (zk , ek , τ, ξk )]i ) [yk+1 ]i = , n [yk ]j exp(γk [grady (zk , ek , τ, ξk )]j ) j=1
where under gx , gy we mean parts of g which are responsible for x and for y. In the first part of the experiment, we take matrix 200 × 200. All elements of the matrix are generated from the uniform distribution from 0 to 1. Next, we select one row of the matrix and generate its elements from the uniform from 5 to 10. Finally, we take one element from this row and generate it uniformly from 1 to 5. The results of the experiment is on Fig. 1. From the experiment results, one can easily see the best approach in terms of oracle complexity.
82
A. Sadiev et al.
Fig. 1. Different algorithms with Full coordinate and Random direction oracles applied to solve saddle-problem (10).
6
Conclusion
In this paper, we presented various algorithms for optimizing smooth stochastic saddle point problems using zero-order oracles. For some oracles, we provide a theoretical analysis. We also compare the approaches covered in the work on a practical matrix game. As a continuation of the work, we can distinguish the following areas: the study of gradient-free methods for saddle point problems already with a onepoint approximation (in this work, we used a two-point one). We also highlight the acceleration of these methods.
References 1. Agarwal, A., Dekel, O., Xiao, L.: Optimal algorithms for online convex optimization with multi-point bandit feedback. In: COLT 2010 - The 23rd Conference on Learning Theory (2010) 2. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd Edition. Society for Industrial and Applied Mathematics, Philadelphia (1998). https://doi.org/10.1137/1.9781611971132, https://epubs.siam.org/doi/ abs/10.1137/1.9781611971132 3. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Society for Industrial and Applied Mathematics, Philadelphia (2019)
Zeroth-Order Algorithms for Smooth Saddle-Point Problems
83
4. Beznosikov, A., Gorbunov, E., Gasnikov, A.: Derivative-free method for decentralized distributed non-smooth optimization. arXiv preprint arXiv:1911.10645 (2019) 5. Beznosikov, A., Sadiev, A., Gasnikov, A.: Gradient-free methods for saddle-point problem. arXiv preprint arXiv:2005.05913 (2020) 6. Brent, R.: Algorithms for Minimization Without Derivatives. Dover Books on Mathematics, Dover Publications (1973) 7. Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multiR Mach. Learn. 5(1), 1–122 (2012). https:// armed bandit problems. Found. Trends doi.org/10.1561/2200000024 8. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011) 9. Chen, P.Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.J.: Zoo. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security AISec 2017 (2017). https://doi.org/10.1145/3128572.3140448, http://dx.doi.org/ 10.1145/3128572.3140448 10. Choromanski, K., Rowland, M., Sindhwani, V., Turner, R., Weller, A.: Structured evolution with compact architectures for scalable policy optimization. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 970–978. PMLR, Stockholmsm¨ assan, Stockholm Sweden, 10–15 July 2018 11. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2009). https://doi.org/10.1137/1.9780898718768 12. Croce, F., Hein, M.: A randomized gradient-free attack on ReLU networks. arXiv preprint arXiv:1811.11493 (2018) 13. Croce, F., Rauber, J., Hein, M.: Scaling up the randomized gradient-free adversarial attack reveals overestimation of robustness using established attacks. arXiv preprint arXiv:1903.11359 (2019) 14. Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zeroorder convex optimization: the power of two function evaluations. IEEE Trans. Inf. Theory 61(5), 2788–2806 (2015). arXiv:1312.2139 15. Dvurechensky, P., Gorbunov, E., Gasnikov, A.: An accelerated directional derivative method for smooth stochastic convex optimization. Eur. J. Oper. Res. (2020). https://doi.org/10.1016/j.ejor.2020.08.027 16. Fabian, V.: Stochastic approximation of minima with improved asymptotic speed. Ann. Math. Statist. 38(1), 191–200 (1967). https://doi.org/10.1214/aoms/ 1177699070 17. Facchinei, F., Pang, J.S.: Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media (2007). https://doi.org/ 10.1007/b97543 18. Fazel, M., Ge, R., Kakade, S., Mesbahi, M.: Global convergence of policy gradient methods for the linear quadratic regulator. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1467–1476. PMLR, Stockholmsm¨ assan, Stockholm Sweden, 10–15 July 2018 19. Fu, M.C. (ed.): Handbook of Simulation Optimization. ISORMS, vol. 216. Springer, New York (2015). https://doi.org/10.1007/978-1-4939-1384-8 20. Gasnikov, A.V., Lagunovskaya, A.A., Usmanova, I.N., Fedorenko, F.A.: Gradientfree proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex. Automation and Remote Control 77(11), 2018– 2034 (2016). https://doi.org/10.1134/S0005117916110114
84
A. Sadiev et al.
21. Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013). arXiv:1309.5549 22. Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Prog. 155(1), 267–305 (2016). https://doi.org/10.1007/s10107-014-0846-1, arXiv:1308.6594 23. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., et al.: Generative adversarial networks. arXiv preprint arXiv:1406.2661 (2014) 24. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014) 25. Harker, P.T., Pang, J.S.: Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Math. Prog. 48(1–3), 161–220 (1990) 26. Hsieh, Y.G., Iutzeler, F., Malick, J., Mertikopoulos, P.: On the convergence of single-call stochastic extra-gradient methods. arXiv preprint arXiv:1908.08465 (2019) 27. Juditsky, A., Nemirovskii, A.S., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. arXiv preprint arXiv:0809.0815 (2008) 28. Korpelevich, G.M.: The extragradient method for finding saddle points and other problems. Ekon. Mat. Metody 12(4), 747–756 (1976) 29. Liu, S., Lu, S., Chen, X., Feng, Y., et al.: Min-max optimization without gradients: convergence and applications to adversarial ML. arXiv preprint arXiv:1909.13806 (2019) 30. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Proceedings (2018) 31. Morgenstern, O., Von Neumann, J.: Theory of Games and Economic Behavior. Princeton University Press, Princeton (1953) 32. Narodytska, N., Kasiviswanathan, S.P.: Simple black-box adversarial attacks on deep neural networks. In: CVPR Workshops. pp. 1310–1318. IEEE Computer Society (2017). http://doi.ieeecomputersociety.org/10.1109/CVPRW.2017.172 33. Nedi´c, A., Ozdaglar, A.: Subgradient methods for saddle-point problems. J. Optim. Theory Appl. 142(1), 205–228 (2009) 34. Nemirovski, A.: PROX-method with rate of convergence o (1/ t ) for variational inequalities with Lipschitz continuous monotone operators and smooth convexconcave saddle point problems. SIAM J. Optim. 15, 229–251 (2004). https://doi. org/10.1137/S1052623403425629 35. Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2015). https://doi.org/10.1007/ s10208-015-9296-2 36. Pinto, L., Davidson, J., Sukthankar, R., Gupta, A.: Robust adversarial reinforcement learning. In: Proceedings of Machine Learning Research, vol. 70, pp. 2817– 2826. PMLR, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. http://proceedings.mlr.press/v70/pinto17a.html 37. Rosenbrock, H.H.: An automatic method for finding the greatest or least value of a function. Comput. J. 3(3), 175–184 (1960). https://doi.org/10.1093/comjnl/3.3. 175 38. Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864 (2017)
Zeroth-Order Algorithms for Smooth Saddle-Point Problems
85
39. Shamir, O.: An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18, 52:1–52:11 (2017) 40. Shashaani, S., Hashemi, F.S., Pasupathy, R.: AsTRO-DF: a class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization. SIAM J. Optim. 28(4), 3145–3176 (2018). https://doi.org/10.1137/15M1042425 41. Spall, J.C.: Introduction to Stochastic Search and Optimization, 1st edn. John Wiley & Sons Inc, New York (2003) 42. Stich, S.U., Muller, C.L., Gartner, B.: Optimization of convex functions with random pursuit. SIAM J. Optim. 23(2), 1284–1309 (2013) 43. Tram`er, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., McDaniel, P.: Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204 (2017) 44. Wang, Z., Balasubramanian, K., Ma, S., Razaviyayn, M.: Zeroth-order algorithms for nonconvex minimax problems with improved complexities. arXiv preprint arXiv:2001.07819 (2020) 45. Ye, H., Huang, Z., Fang, C., Li, C.J., Zhang, T.: Hessian-aware zeroth-order optimization for black-box adversarial attack. arXiv preprint arXiv:1812.11377 (2018)
Algorithms for Solving Variational Inequalities and Saddle Point Problems with Some Generalizations of Lipschitz Property for Operators Alexander A. Titov1,3(B) , Fedor S. Stonyakin1,2 , Mohammad S. Alkousa1,3 , and Alexander V. Gasnikov1,3,4,5 1 Moscow Institute of Physics and Technology, Moscow, Russia {a.a.titov,mohammad.alkousa}@phystech.edu 2 V. I. Vernadsky Crimean Federal University, Simferopol, Russia 3 HSE University, Moscow, Russia 4 Institute for Information Transmission Problems RAS, Moscow, Russia 5 Caucasus Mathematical Center, Adyghe State University, Maikop, Russia
Abstract. The article is devoted to the development of numerical methods for solving saddle point problems and variational inequalities with simplified requirements for the smoothness conditions of functionals. Recently, some notable methods for optimization problems with strongly monotone operators were proposed. Our focus here is on newly proposed techniques for solving strongly convex-concave saddle point problems. One of the goals of the article is to improve the obtained estimates of the complexity of introduced algorithms by using accelerated methods for solving auxiliary problems. The second focus of the article is introducing an analogue of the boundedness condition for the operator in the case of arbitrary (not necessarily Euclidean) prox structure. We propose an analogue of the Mirror Descent method for solving variational inequalities with such operators, which is optimal in the considered class of problems. Keywords: Strongly convex programming problem · Relative boundedness · Inexact model · Variational inequality · Saddle point problem
The research in Introduction and Sects. 3,4 is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye in MIPT, project 07500337-20-03). The research in Algorithm 3 was partially supported by the grant of the President of Russian Federation for young candidates of sciences (project MK15.2020.1). The research in Theorem 1 and partially in Sect. 3.1 was supported by the Russian Science Foundation (project 18-71-10044). c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 86–101, 2021. https://doi.org/10.1007/978-3-030-86433-0_6
Algorithms for Variational Inequalities and Saddle Point Problems
1
87
Introduction
Modern numerical optimization methods are widely used in solving problems in various fields of science. The problem of finding the optimal value in the investigated mathematical model naturally arises in machine learning, data analysis, economics, market equilibrium problems, electric power systems, control theory, optimal transport, molecular modeling, etc. This paper is devoted to the development and analysis of numerical methods for solving saddle point problems and variational inequalities. Both saddle point problems and variational inequalities play a critical role in structural analysis, resistive networks, image processing, zero-sum game, and Nash equilibrium problems [4,6,8,13]. Despite variational inequalities and saddle point problems are closely related (it will be discussed later), the article consists of two conditionally independent parts, each devoted to the special type of generalization of smoothness conditions and the corresponding problem. Firstly, the paper focuses on recently proposed numerical methods for solving (μx , μy )-strongly convex-concave saddle point problems of the following form: min max f (x, y). x
(1)
y
In particular, in [1] authors considered a modification of some well-known scheme for speeding up methods for solving smooth saddle point problems. The smoothness, in such a setting, means the Lipschitz continuity of all partial gradients of the objective function f . We improve the obtained estimates of the complexity for the case of a non-smooth objective. In more details, we consider the saddle point problem under the assumption, that one of the partial gradients still satisfies the Lipschitz condition, while the other three satisfy simplified smoothness condition, namely, the H¨ older continuity. Note, it is the Lipschitz continuity, which makes it possible to use accelerated methods for solving auxiliary problems. The H¨ older continuity is an important generalization of the Lipschitz condition, it appears in a large number of applications [9,14]. In particular, if a function is uniformly convex, then its conjugated will necessarily have the H¨ older-continuous gradient [9]. We prove, that the strongly convex-concave saddle point problem, modified in a functional form, admits an inexact (δ, L, μx )–model, where δ is comparable to ε, and apply the Fast Gradient Method to achieve the ε–solution of the considered problem. The total number of iterations is estimated as follows: L Lyy 2LD2 2Lyy R2 · log , O · · log μx μy ε ε ˜ where L = L
˜ (1−ν)(2−ν) L 2ε 2−ν
(1−ν)(1+ν) 2−ν
˜ = ,L
Lxy
2Lxy μy
ν 2−ν
+ Lxx D
ν−ν 2 2−ν
,
Lxx , Lxy , Lyy > 0, ν is the H¨older exponent of ∇f , D is a diameter of the domain of f (x, ·), R denotes the distance between the initial point of the algorithm y 0 and the point y ∗ , where (x∗ , y ∗ ) is the exact solution of the problem (1).
88
A. A. Titov et al.
Remind that (˜ x, y˜) is called an ε–solution of the saddle point problem, if x, y) − min f (x, y˜) ≤ ε. max f (˜ y
x
(2)
Let us note, that due to the strong convexity-concavity of the considered saddle point problem, the convergence in argument takes place with the similar asymptotic behavior. Thus, we can consider the introduced definition of the ε-solution (2) for the strongly convex-concave saddle point problem (1). The second part of the article continues with the study of accelerated methods and is devoted to the numerical experiments for some recently proposed [14] universal algorithms. In particular, we consider the restarted version of the algorithm and apply it to the analogue of the covering circle problem with non-smooth functional constraints. We show, that the method can work faster, than O( 1ε ). Finally, the third part of the article explores Minty variational inequalities with the Relatively bounded operator. Recently Y. Nesterov proposed [10] a generalization of the Lipschitz condition, which consists in abandoning the classical boundedness of the norm of the objective‘s gradient ∇f in favor of a more complex structure. This structure allows one to take into account the peculiarities of the domain of the optimization problem, which can be effectively used in solving support vector machine (SVM) problem and intersection of n ellipsoids problem [7,15]. Remind the formulation of the Minty variational inequality. For a given operator g(x) : X → R, where X is a closed convex subset of some finite-dimensional vector space, we need to find a vector x∗ ∈ X, such that g(x), x∗ − x ≤ 0 ∀x ∈ X.
(3)
We consider the variational inequality problem under the assumption of Relative boundedness of the operator g, which is the modification of the aforementioned Relative Lipschitz condition for functionals. We introduce the modification of the Mirror Descent method to solve such variational inequality problems. The proposed method guarantees an (ε + σ)–solution of the problem after no more than 2RM 2 ε2 iterations, where M is the Relative boundedness constant, which depends on the characteristics of the operator g and R can be understood as the distance between the initial point of the algorithm and the exact solution of the variational inequality in some generalized sense. The constant σ reflects certain features of the monotonicity of the operator g. An (ε + σ)–solution of the variational inequality is understood as the point x ˜, such that ˜ − x ≤ ε + σ. maxg(x), x x∈X
(4)
The paper consists of the introduction and three main sections. In Sect. 2, we consider the strongly convex-concave saddle point problem in a non-smooth set-
Algorithms for Variational Inequalities and Saddle Point Problems
89
ting. Section 3, is devoted to some numerical experiments concerning the methods, recently proposed in [14] and the analysis of their asymptotics in comparison with the method proposed in Sect. 2. In Sect. 4, we consider the Minty variational inequality problem with Relatively Bounded operator. To sum it up, the contributions of the paper can be formulated as follows: – We consider the strongly convex-concave saddle point problem and its modified functional form. We prove, that the functional form admits an inexact (δ, L, μx )–model and apply the Fast Gradient Method to achieve the ε– solution of the considered problem. Moreover, we show, that δ = O(ε). – We show, that the total number of iterations of the proposed method does not exceed 2LD2 L Lyy 2Lyy R2 · log · · log O , μx μy ε ε (1−ν)(1+ν) ν ν−ν 2 2−ν ˜ (1−ν)(2−ν) 2Lxy 2−ν L ˜ ˜ 2−ν , L = Lxy μy + Lxx D . where L = L 2ε 2−ν – We introduce the modification of the Mirror Descent method to solve Minty variational inequalities with Relatively Bounded and σ–monotone operators. – We show, that the proposed method can be applied to obtain an 2 iterations. (ε + σ)–solution after no more than 2RM ε2
2
Accelerated Method for Saddle Point Problems with Generalized Smoothness Condition
Let Qx ⊂ Rn and Qy ⊂ Rm be nonempty, convex, compact sets and there exist D > 0 and R > 0, such that
x1 − x2 2 ≤ D
∀x1 , x2 ∈ Qx ,
y1 − y2 2 ≤ R
∀y1 , y2 ∈ Qy ,
Let f : Qx × Qy → R be a μx -strongly convex function for fixed y ∈ Qy and μy -strongly concave for fixed x ∈ Qx . Remind, that differentiable function h(x) : Qx → R is called μ-strongly convex, if ∇h(x1 ) − ∇h(x2 ), x1 − x2 ≥ μ x1 − x2 22
∀x1 , x2 ∈ Qx .
Consider (μx , μy )-strongly convex–concave saddle point problem (1) under assumptions, that one of the partial gradients of f satisfies the Lipschitz condition, while other three gradients satisfy the H¨ older condition. More formally, for any x, x ∈ Qx , y, y ∈ Qy and for some ν ∈ [0, 1], the following inequalities hold:
∇x f (x, y) − ∇x f (x , y) 2 ≤ Lxx x − x ν2 ,
(5)
∇x f (x, y) − ∇x f (x, y ) 2 ≤ Lxy y − y ν2 ,
(6)
90
A. A. Titov et al.
∇y f (x, y) − ∇y f (x , y) 2 ≤ Lxy x − x ν2 ,
(7)
∇y f (x, y) − ∇y f (x, y ) 2 ≤ Lyy y − y 2 .
(8)
Define the following function: x ∈ Qx .
g(x) = max f (x, y), y∈Qy
(9)
It is obvious, that the considered saddle point problem (1) can be rewritten in the following more simple way: g(x) = max f (x, y) → min . y∈Qy
(10)
x∈Qx
Since f (x, ·) is μy -strongly concave on Qy , the maximization problem (9) has the unique solution y ∗ (x) = arg max f (x, y) y∈Qy
∀x ∈ Qx ,
so g(x) = f (x, y ∗ (x)). Moreover, 2 ∗ ∗ 2 ∗ ∗
y (x1 ) − y (x2 ) 2 ≤ f x1 , y (x1 ) − f x1 , y (x2 ) ∀x1 , x2 ∈ Qx . (11) μy Lemma 1. Consider the problem 1 under assumptions (5)–(8). Define the function g(x) : Qx → R according H¨ older continuous gra to (9). Then g(x) has the ν 2−ν 2 ν−ν 2L dient with H¨ older constant Lxy μyxy + Lxx D 2−ν and H¨ older exponent ν 2−ν .
Proof. Similarly to [1], for any x1 , x2 ∈ Qx , let us estimate the following difference: ∗ ∗ ∗ ∗ f x1 , y (x1 ) − f x1 , y (x2 ) − f x2 , y (x1 ) − f x2 , y (x2 ) ≤ ∇x f x1 + t(x2 − x1 ), y ∗ (x1 ) − ∇x f x1 + t(x2 − x1 ), y ∗ (x2 ) · x2 − x1 2 2
∗
≤ Lxy y (x1 ) − y
∗
(x2 ) ν2 · x2
− x1 2 ,
t ∈ [0, 1].
Using (11), one can get
y ∗ (x1 ) − y ∗ (x2 ) 2 ≤
2Lxy μy
1 2−ν
1
x2 − x1 22−ν ,
which means, that y ∗ (x) satisfies the H¨older condition on Qx with H¨ older con1 2−ν 2L 1 and H¨ older exponent 2−ν ∈ [ 12 , 1]. stant μyxy
Algorithms for Variational Inequalities and Saddle Point Problems
91
Further,
∇g(x1 ) − ∇g(x2 ) 2 = ∇x f (x1 , y ∗ (x1 )) − ∇x f (x2 , y ∗ (x2 )) 2 ≤ ∇x f (x1 , y ∗ (x1 )) − ∇x f (x1 , y ∗ (x2 )) 2 + ∇x f (x1 , y ∗ (x2 )) − ∇x f (x2 , y ∗ (x2 )) 2 ≤ Lxy y ∗ (x1 ) − y ∗ (x2 ) ν2 +Lxx x2 − x1 ν2 = ν ν 2Lxy 2−ν
x2 − x1 22−ν +Lxx x2 − x1 ν2 ≤ Lxy μy ν 2−ν ν−ν 2 ν ν 2Lxy = Lxy
x2 − x1 22−ν +Lxx x2 − x1 22−ν · x2 − x1 22−ν . μy Since Qx is bounded, we have ν ν ν−ν 2 2Lxy 2−ν 2−ν + Lxx D
x2 − x1 22−ν ,
∇g(x1 ) − ∇g(x2 ) 2 ≤ Lxy μy
(12)
which means, that g(x) has the H¨ older continuous gradient. Definition 1. A function h(x) : Qx → R admits (δ, L, μ)-model, if, for any x1 , x2 ∈ Qx , the following inequalities hold: μ
x2 − x1 22 +∇h(x1 ), x2 − x1 + h(x1 ) − δ ≤ h(x2 ) ≤ h(x1 ) 2 L + ∇h(x1 ), x2 − x1 + x2 − x1 22 +δ 2
(13)
Remark 1 [5]. Note, that if a function h(x) has the H¨ older-continuous gradient ˜ ν˜ and H¨ with H¨ older constant L older exponent ν˜, then h(x) admits (δ, L, μ) 1−˜ν ˜ ν˜ L˜ ν˜ 1−˜ν 1+˜ν . model. More precisely, inequalities (13) hold with L = L 2δ 1+˜ ν Remark 2. According to Lemma 1, g(x) has the H¨ older continuous gradient, so g(x) admits (δ0 , L, μx )-model (δ was replaced by δ0 to simplify notation) with ˜ L=L ˜= where L
Lxy
2Lxy μy
ν 2−ν
˜ (1 − ν)(2 − ν) L 2δ0 2−ν
+ Lxx D
ν−ν 2 2−ν
(1−ν)(1+ν) 2−ν ,
.
Let us now assume, that instead of the ordinary gradient ∇g(x) we are given
:= ∇x f (x, y ), such that an inexact one ∇g(x)
y1 − y
2 ≤ Δ,
= const. > 0. where f (x, y1 ) = max f (x, y), Δ y∈Qy
(14)
92
A. A. Titov et al.
Theorem 1. Consider the strongly convex-concave saddle point problem (1) under assumptions (5)-(8). Define the function g(x) according to (10). Then g(x) admits an inexact (δ, L, μx )-model with δ = (DΔ + δ0 ) and L defined in Remark 2. Applying k steps of the Fast Gradient Method to the ”outer” problem (10) and solving the ”inner” problem (14) in linear time, we obtain an ε-solution (2) to the problem (1), where δ = O(ε). The total number of iterations does not exceed L Lyy 2LD2 2Lyy R2 · log , · · log O μx μy ε ε (1−ν)(1+ν) ν 2 2−ν ˜ L˜ (1−ν)(2−ν) ˜ = Lxy 2Lxy 2−ν + Lxx D ν−ν 2−ν where L = L , L . 2ε 2−ν μy Proof. Since the problem max f (x1 , y) is smooth ((8) holds) and μy -strongly y∈Qy
concave, one can achieve any arbitrary accuracy in (14), furthermore, in linear 1
> 0, in particular, Δ
= Δ ν , there exists time [5]. More formally, for any Δ Lxy y , such that 1 Δ ν
y − y1 2 ≤ . Lxy Then
∇g(x
) − ∇x f (x1 , y1 ) 2 ≤ Lxy
y − y1 ν2 ≤ Δ. (15) 1 ) − ∇g(x1 ) 2 = ∇x f (x1 , y Thus, taking into account that
∇g(x1 ), x2 − x1 = ∇g(x1 ) − ∇g(x 1 ), x2 − x1 + ∇g(x1 ), x2 − x1 , we get: μx
x2 − x1 22 +∇g(x 1 ), x2 − x1 + g(x1 ) − DΔ − δ0 2 L
x2 − x1 22 +DΔ + δ0 . (16) ≤ g(x2 ) ≤ g(x1 ) + ∇g(x 1 ), x2 − x1 + 2 The inequality (16) means, that g(x) admits the inexact (δ, L, μx )-model with δ = (DΔ + δ0 ) and L defined in Remark 2. It is well known [5], that using Fast Gradient Method for the described construction, after k iterations, one can obtain the following accuracy of the solution: L k μx k ∗ 2 g(x ) − g(x ) ≤ LR exp − . +δ 1+ 2 L μx
more As noted above, it is possible to achieve an arbitrarily small value of Δ,
precisely, such Δ, that
Algorithms for Variational Inequalities and Saddle Point Problems
ν + δ0 = DΔ + δ0 = δ ≤ DLxy Δ
93
ε . 2 1 + μLx
such that δ will be comparable In other words, we can obtain the appropriate Δ, to ε. For example, we can put ε ε , Δ . δ0 L 4 1 + μx 4D 1 + μLx Thus, after no more than k = 2 μLx log
2LR2 ε
iterations we obtain an ε-
solution (2) of the considered saddle point problem (1). The total number of iterations (including achieving the appropriate precision
is expressed as follows: of Δ) 2LD2 L Lyy 2Lyy R2 O · log , · · log μx μy ε ε ˜ where L = L
3
˜ (1−ν)(2−ν) L 2ε 2−ν
(1−ν)(1+ν) 2−ν
˜= ,L
Lxy
2Lxy μy
ν 2−ν
+ Lxx D
ν−ν 2 2−ν
.
Comparison of Theoretical Results for Accelerated Method and the Universal Proximal Method for Saddle-Point Problems with Generalized Smoothness
Let us investigate the effectiveness of the proposed method for solving strongly convex–concave saddle point problems in comparison with the Universal Algorithm, which was recently proposed in [14] to solve variational inequalities. We have to start with the problem statement and define all basics concerning variational inequalities and Proximal Setup. Note, that in Sect. 2 we used exclusively the Euclidean norm, while results of Sect. 3 and Sect. 4 hold for an arbitrary norm. Let E be some finite-dimensional vector space, E ∗ be its dual. Let us choose some norm · on E. Define the dual norm · ∗ as follows:
φ ∗ = max {φ, x}, x≤1
where φ, x denotes the value of the linear function φ ∈ E ∗ at the point x ∈ E. Let X ⊂ E be a closed convex set and g(x) : X → E ∗ be a monotone operator, i.e. g(x) − g(y), x − y ≥ 0 ∀x, y ∈ X. (17) We also need to choose a so-called prox-function d(x), which is continuously differentiable and convex on X, and the corresponding Bregman divergence, which can be understood as an analogue of the distance, defined as follows: V (y, x) = Vd (y, x) = d(y) − d(x) − ∇d(x), y − x ∀x, y ∈ X.
(18)
94
A. A. Titov et al.
In [14], the authors proposed the Universal Proximal method (UMP, this method is listed as Algorithm 1, below) for solving the problem (3) with inexactly given operator. In details, suppose that there exist some δ > 0, L(δ) > 0, such that, for any points x, y, z ∈ X, we are able to calculate g˜(x, δ), g˜(y, δ) ∈ E ∗ , satisfying ˜ g (y, δ) − g˜(x, δ), y − z ≤
L(δ)
y − x 2 + y − z 2 + δ 2
∀z ∈ X.
(19)
In addition, there was considered the possibility of using the restarted version of the method (Restarted UMP, see Algorithm 2 below) in the case of μ–strongly monotone operator g: g(x) − g(y), x − y ≥ μ x − y 2
∀x, y ∈ X.
We additionally assume that arg minx∈X d(x) = 0 and d(·) is bounded on the unit ball in the chosen norm · , more precisely d(x) ≤
Ω 2
∀x ∈ X : x ≤ 1,
where Ω is a known constant. Algorithm 1. Universal Mirror Prox (UMP) Require: ε > 0, δ > 0, x0 ∈ X, initial guess L0 > 0, prox-setup: d(x), V (x, z). 1: Set k = 0, z0 = arg minu∈Q d(u). 2: for k = 0, 1, ... do 3: Set Mk = Lk /2. 4: Set δ = 2ε . 5: repeat 6: Set Mk = 2Mk . 7: Calculate g˜(zk , δ) and g (zk , δ), x + Mk V (x, zk )} . wk = arg min {˜ x∈Q
8:
Calculate g˜(wk , δ) and g (wk , δ), x + Mk V (x, zk )} . zk+1 = arg min {˜ x∈Q
9:
(20)
(21)
until ˜ g (wk , δ)− g˜(zk , δ), wk −zk+1 ≤
10: Set Lk+1 = Mk /2, k = k + 1. 11: end for −1 Ensure: zk = k−11 −1 k−1 i=0 Mi wi . i=0
Mi
ε Mk wk − zk 2 +wk − zk+1 2 + +δ. (22) 2 2
Algorithms for Variational Inequalities and Saddle Point Problems
95
Algorithm 2. Restarted Universal Mirror Prox (Restarted UMP). 2 2 Ω Require: ε > 0, μ > 0, Ω : d(x) ≤ 2 ∀x ∈ Q : x≤ 1; x0 , R0 : x0 − x∗ ≤ R0 . 0 . 1: Set p = 0, d0 (x) = R02 d x−x R0 2: repeat of UMP for monotone case with prox-function dp (·) and 3: Set xp+1 as the output −1 ≥ Ω . stopping criterion k−1 i=0 Mi μ
2 ε Set Rp+1 = R02 · 2−(p+1) + 2(1 − 2−(p+1) ) 4μ . x−x 2 p+1 5: Set dp+1 (x) ← Rp+1 d Rp+1 . 6: Set p = p + 1. 2 2R0 7: until p > log2 . ε Ensure: xp .
4:
Remark 3 (Connection between saddle point problem and VI). Any saddle point problem min max f (x, y) x∈Qx y∈Qy
can be reduced to a variational inequality problem by considering the following operator: ∇x f (x, y) (23) , z = (x, y) ∈ Q := Qx × Qy . g(z) = −∇y f (x, y) Remark 4. Restarted UMP method returns a point xp such that
xp − x∗ 2 ≤ ε +
2δ . μ
Moreover, the number of calls to Algorithm 1 (UMP) while running Algorithm 2 (Restarted UMP) does not exceed 2 2 Lν 1+ν 2 1+ν Ω 2R02 inf . · 1−ν · log2 μ ε ν∈[0,1] ε 1+ν Note, that due to the adaptivity of UMP, in practice the number of calls may decrease (it will be shown experimentally later). Remark 5. Note, that for ν = 0 the convergence rate of the Restarted UMP (for ν = 0) and the accelerated method, introduced in Sect. 2, coincides, while for ν > 0 the asymptotic of the proposed accelerated method is better. 3.1
Numerical Experiments for Restarted Universal Mirror Prox Method
The problem of constrained minimization of convex functionals arises and attracts widespread interest in many areas of modern large-scale optimization and its applications [3,12].
96
A. A. Titov et al.
In this subsection, in order to demonstrate the performance of the Restarted UMP, we consider an example of the Lagrange saddle point problem induced by a problem with geometrical nature, namely, an analogue of the well-known smallest covering ball problem with non-smooth functional constraints. This example is equivalent to the following non-smooth convex optimization problem with functional constraints 2 (24) min f (x) := max x − Ak 2 ; ϕp (x) ≤ 0, p = 1, ..., m , x∈Q
1≤k≤N
where Ak ∈ Rn , k = 1, ..., N are given points and Q is a convex compact set. Functional constraints ϕp , for p = 1, ..., m, have the following form: ϕp (x) :=
n
αpi x2i − 5, p = 1, ..., m.
(25)
i=1
For solving such problems there are various first-order methods, which guar antee achieving an acceptable precision ε by function with complexity O ε−1 . We present the results of some numerical experiments, which demonstrate the effectiveness of the Restarted UMP. Remind, that we analyze how the restart technique can be used to improve the convergence rate of the UMP, and show that in practice it works with a convergence rate smaller than O ε−1 , for some different randomly generated data associated with functional constraints (25). The corresponding Lagrange saddle point problem of the problem (24) is defined as follows m
1 2 max L(x, λ) := f (x) + λ ϕ (x) − λ . min − p p x∈Q → 2 p=1 p λ =(λ1 ,λ2 ,...,λm )T ∈Rm + p=1 m
This problem is satisfied to (5) – (8) for ν = 0 and equivalent to the variational inequality with the monotone bounded operator ⎞ ⎛ m λp ∇ϕp (x), ∇f (x) + ⎠, G(x, λ) = ⎝ p=1 T (−ϕ1 (x) + λ1 , −ϕ2 (x) + λ2 , . . . , −ϕm (x) + λm ) where ∇f and ∇ϕp are subgradients of f and ϕp . For simplicity, let us assume that there exists (potentially very large) bound for the optimal Lagrange mul− → − →∗ tiplier λ . Thus, we are able to compactify the feasible set for the pair (x, λ ) to be an Euclidean ball of some radius. To demonstrate the independence on the choice of experimental data, the coefficients αpi in (25) are drawn randomly from four different distributions. – Case 1: the standard exponential distribution. – Case 2: the Gumbel distribution with mode and scale equal to zero and 1, respectively.
Algorithms for Variational Inequalities and Saddle Point Problems
97
– Case 3: the inverse Gaussian distribution with mean and scale equal to 1 and 2, respectively. – Case 4: from the discrete uniform distribution in the half open interval [1, 6). We run Restarted UMP for different values of n and m with standard − →0 1 Euclidean prox-structure and the starting point (x0 , λ ) = √m+n 1 ∈ Rn+m , where 1 is the vector of all ones. Points Ak , k = 1, ..., N , are chosen randomly from the uniform distribution over [0, 1). For each value of the parameters the random data was drawn 5 times and the results were averaged. The results of the work of Restarted UMP are presented in Table 1. These results demonstrate the number of iterations produced by Restarted UMP to reach the ε-solution of the problem (24) with (25), the running time of the algorithm in seconds, qualities of the solution with respect to the objective function f (f best := f (xout )) and the functional constraints g (g out := g(xout )), where xout denotes the output of the compared algorithms, with different values of ε ∈ {1/2i , i = 1, 2, 3, 4, 5, 6}. All experiments were implemented in Python 3.4, on a computer fitted with Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz, 1992 MHz, 4 Core(s), 8 Logical Processor(s). RAM of the computer is 8 GB. Table 1. The results of Restarted UMP, for the problem (24) with constraints (25) for Cases 1 – 4. Case 1: n = 1000, m = 50, N = 10 g out
Case 2: n = 1000, m = 50, N = 10
1 ε
Iter.
Time (sec.) f best
2
9
0.392
324.066325 −4.349744 12
0.517
324.889649 −3.965223
4
12
0.519
324.066312 −4.349732 16
0.630
324.889634 −3.965213
8
15
0.599
324.066305 −4.349692 20
0.807
324.889621 −3.965197
16 18
0.972
324.066295 −4.349680 24
0.977
324.889598 −3.965165
32 21
0.984
324.066291 −4.349653 28
1.224
324.889540 −3.965223
64 24
1.357
324.066286 −4.349598 32
1.317
324.889527 −3.965111
Case 3: n = 500, m = 25, N = 10
Iter.
Time (sec.) f best
g out
Case 4: n = 500, m = 25, N = 10
2
624
20.321
153.846214 −4.398782 832
26.629
158.210875 −2.610276
4
1319
39.045
153.842306 −4.398106 1702
51.258
158.201645 −2.601072
8
2158
58.919
153.830012 −4.397387 4144
96.136
158.190455 −2.599223
16 4298
117.383
153.827731 −4.397271 6145
174.713
158.188211 −2.598865
32 8523
264.777
153.826829 −4.397226 12081 351.520
158.187255 −2.598713
153.826382 -4.397204
158.186744 -2.598628
64 17584 554.480
30186 768.861
From the results in Table 1 we can see that the work of Restarted UMP, does not only dependent on the dimension n of the problem, the number of the constraints ϕp , p = 1, ..., m and the number of the points Ak , k = 1, ..., N , but also depends on the shape of the data, that generated to the coefficients αpi in (25). Also, these results demonstrate how the Restarted UMP, due to its adaptivity to the level of smoothness of the problem, in practice works with a
convergence rate smaller than O ε−1 , for all different cases 1–4.
98
4
A. A. Titov et al.
Mirror Descent for Variational Inequalities with Relatively Bounded Operator
Let g(x) : X → E ∗ , be an operator, given on some convex compact set X ⊂ E. Recall that g(x) is bounded on X, if there exists M > 0, such that
g(x) ∗ ≤ M
∀x ∈ X.
We can replace the classical concept of the boundedness of an operator by the so-called Relative boundedness condition as following. Definition 2 (The Relative boundedness). An operator g(x) : X → E ∗ is Relatively bounded on X, if there exists M > 0, such that (26) g(x), y − x ≤ M 2V (y, x) ∀x, y ∈ X, where V (y, x) is the Bregmann divergence, defined in (18). Remark 6. Let us note the following special case for (26): M 2V (y, x)
g(x) ∗ ≤ y = x.
y − x
In addition to the Relative boundedness condition, suppose that the operator g(x) is σ-monotone. Definition 3 (σ-monotonicity). Let σ > 0. The operator g(x) : X → E ∗ is σ-monotone, if the following inequality holds: g(y) − g(x), y − x ≥ −σ
∀x, y ∈ X.
(27)
For example, we can consider g = ∇σ f for σ-subgradient ∇σ f (x) of the convex function f at the point x ∈ X: f (y) − f (x) ∇σ f (x), y − x − σ for each y ∈ X (see e.g. [11], Chap. 5). Let us propose an analogue of the Mirror Descent algorithm for variational inequalities with Relatively bounded and σ-monotone operator. For any x ∈ X and p ∈ E ∗ , we define the Mirror Descent step M irrx (p) as follows: M irrx (p) = arg min p, y + V (y, x) . y∈X
The following theorem describes the effectiveness of the proposed Algorithm 3. Theorem 2 Let g : X → E ∗ be Relatively bounded and σ–monotone operator, i.e. (26) and (27) hold. Then after no more than N=
2RM 2 ε2
iterations of Algorithm 3, one can obtain an (ε + σ)–solution (4) of the problem (3), i.e. ˜ − x ≤ ε + σ. maxg(x), x x∈X
Algorithms for Variational Inequalities and Saddle Point Problems
99
Algorithm 3. Mirror Descent method for variational inequalities. Require: ε > 0, M > 0; x0 and R such that maxx∈X V (x, x0 ) ≤ R2 . 1: Set h = ε 2 . M 2: Initialization k = 0. 3: repeat 4: xk+1 = M irrxk hg(xk ) . 5: Set k = k + 1. 2 6: until k ≥ N = 2RM . 2 ε N −1 1 Ensure: x ˜ = N k=0 xk .
Proof According to the listing of the Algorithm 3, the following inequality holds: hg(xk ), xk − x ≤
h2 M 2 + V (x, xk ) − V (x, xk+1 ). 2
Taking summation over k = 0, 1, . . . , N − 1, we get h
N −1
g(xk ), xk − x ≤
k=0
N −1 k=0
h2 M 2 + V (x, x0 ) − V (x, xN ) 2
N 2 2 h M + V (x, x0 ). 2 Due to the σ-monotonicity of g, we have ≤
g(xk ), xk − x ≥ g(x), xk − x − σ. Whence, h
N −1
(g(xk ), xk − x + σ) ≥ h g(x),
k=0
where x ˜=
N −1
(xk − x)
= h g(x), N (˜ x − x) ,
k=0 1 N
N −1 k=0
xk . Since
N hg(x), x ˜ − x ≤
N 2 2 h M + V (x, x0 ) + N hσ, 2
we get g(x), x ˜ − x ≤
M 2 h V (x, x0 ) + + σ. 2 Nh
As h = Mε 2 , after the stopping criterion N ≥ (ε + σ)-solution of the problem (3).
2RM 2 ε2
is satisfied, we obtain an
100
5
A. A. Titov et al.
Conclusions
In the first part of the article we considered strongly convex-concave saddle point problems; using the concept of (δ, L, μx )–model we applied the Fast Gradient Method to obtain an ε-solution of the problem. We proved, that the proposed method can generate an ε-solution of the saddle point problem after no more than L Lyy 2LD2 2Lyy R2 · log iterations, · · log O μx μy ε ε (1−ν)(1+ν) ν 2 2−ν ˜ L˜ (1−ν)(2−ν) ˜ = Lxy 2Lxy 2−ν + Lxx D ν−ν 2−ν where L = L , L . 2ε 2−ν μy We conducted some numerical experiments for the Universal Proximal algorithm and its restarted version and analyzed their asymptotics in comparison with the proposed method. Also, we considered Minty variational inequalities with Relatively bounded and σ–monotone operator. We introduced the modification of the Mirror Descent method and proved, that obtaining an (ε + σ)-solution will take no more than 2RM 2 iterations. ε2
References 1. Alkousa, M.S., Gasnikov, A.V., Dvinskikh, D.M., Kovalev, D.A., Stonyakin, F.S.: Accelerated methods for saddle-point problems. Comput. Math. Math. Phys. 60(11), 1843–1866 (2020) 2. Antonakopoulos, K., Belmega, V., Mertikopoulos, P.: An adaptive Mirror-Prox method for variational inequalities with singular operators. In: Advances in Neural Information Processing Systems, pp. 8455–8465 (2019) 3. Ben-Tal, A., Nemirovski, A.: Robust truss topology design via semidefinite programming. SIAM J. Optim. 7(4), 991–1016 (1997) 4. Benzi, M., Golub, G.H., Liesen, J.: Numerical solution of saddle point problems. Acta numer. 14(1), 1–137 (2005) 5. Gasnikov, A.V.: Modern numerical optimization methods. Methods of universal gradient descent MCCME (2020) 6. Juditsky, A., Nemirovski, A., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst. 1(1), 17–58 (2011) 7. Lu, H.: Relative continuity for non-lipschitz nonsmooth convex optimization using stochastic (or deterministic) mirror descent. Inf. J. Optim. 1(4), 288–303 (2019) 8. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convexconcave saddle point problems. SIAM J. Optim 15(1), 229–251 (2004) 9. Nesterov, Yu.: Universal gradient methods for convex optimization problems. Math. Program. 152(1), 381–404 (2015) 10. Nesterov, Yu.: Relative smoothness: new paradigm in convex optimization. In: Conference Report, EUSIPCO-2019, A Coruna, Spain, 4 September 2019. http://eusipco2019.org/wp-content/uploads/2019/10/Relative-Smoothness-NewParadigm-in-Convex.pdf
Algorithms for Variational Inequalities and Saddle Point Problems
101
11. Polyak, B.T.: Introduction to Optimization. Optimization Software Inc., New York (1987) 12. Shpirko, S., Nesterov, Yu.: Primal-dual subgradient methods for huge-scale linear conic problem. SIAM J. Optim. 24(3), 1444–1457 (2014) 13. Scutari, G., Palomar, D.P., Facchinei, F., Pang, J.S.: Convex optimization, game theory, and variational inequality theory. IEEE Signal Process. Mag. 27(3), 35–49 (2010) 14. Stonyakin, F., Gasnikov, A., Dvurechensky, P., Alkousa, M., Titov, A.: Generalized mirror prox for monotone variational inequalities: universality and inexact oracle (2018). https://arxiv.org/pdf/1806.05140.pdf 15. Titov, A.A., Stonyakin, F.S., Alkousa, M.S., Ablaev, S.S., Gasnikov, A.V.: Analogues of switching subgradient schemes for relatively Lipschitz-continuous convex programming problems. In: Kochetov, Y., Bykadorov, I., Gruzdeva, T. (eds.) MOTOR 2020. CCIS, vol. 1275, pp. 133–149. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-58657-7 13
Application of Smooth Approximation in Stochastic Optimization Problems with a Polyhedral Loss Function and Probability Criterion Roman Torishnyi(B)
and Vitaliy Sobol
Moscow Aviation Institute (National Research University), 4 Volokolamskoye Shosse, 125993 Moscow, Russia [email protected], [email protected]
Abstract. In this paper, we consider a stochastic optimization problem with a convex piecewise linear (polyhedral) loss function, polyhedral constraints, and continuous random vector distribution. The objective is to maximize the probability that the loss function value does not exceed the specified level when constraints are satisfied. We use an approximation of polyhedral function and probability criterion, replacing maximum with smooth maximum transform and Heaviside function with sigmoid function. This replacement leads to the approximation of their gradients. Next, the problem solved using modified gradient descent. The accuracy and effectiveness of this method are shown in two examples. Also, we discuss some aspects of such approximations in the case of discrete random distribution. Keywords: Stochastic optimization · Polyhedral loss function · Probability criterion · Smooth approximation · Sigmoid function
1 1.1
Introduction General Introduction
Stochastic optimization is generally more complex than determined optimization. That is especially true for the problems which have a probability function as an optimization criterion, and often the latter type of problem appears to be highly practical. For example, consider the problem of calculating an optimal area of an airstrip. A large airstrip area demands more time, money, and resources to build, but it will guarantee a higher chance of successful airplane landing compared to a small airstrip. How much area is enough to guarantee a certain probability of landing? This particular problem can be represented as an optimization problem with probabilistic constraints, or a quantile optimization Supported by RFBR according to the research project № 20-31-90035. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 102–116, 2021. https://doi.org/10.1007/978-3-030-86433-0_7
Smooth Approximation in Stochastic Optimizations Problems
103
problem, and by now there are several solution methods already developed. But the fact is that almost every one of stochastic optimization problems requires a unique approach to solve it directly or some of them can be solved approximately under certain circumstances. A stochastic optimization problem with a polyhedral loss function, considered in this article, is the generalized version of many other practical problems that were reviewed in other works [1–3]. Solution methods to the problem of this type are related to stochastic approximation. Quasigradient methods [3,4] are applicable to this problem but have a low rate of convergence. A more effective way of solving is based on a generalized minimax approach and its modifications [3,5,6]. This method is aimed to find a so-called guaranteeing solution based on chosen optimal confidence subset. Picking this optimal confidence subset, however, can be a difficult task on its own, and guaranteeing solution only provides the evaluation of the upper bound criteria value. It is necessary to note that there are many other solution methods and techniques already developed for specific problems. A quantile optimization problem with discrete distribution can be in some cases reduced to a mixed integer programming problem [7]. Some of the two-step optimization problems can be reduced to a mixed integer linear programming problem [8]. A two-stage optimization problem with quantile criterion can be solved with the use of sample average estimates [9]. All these and other methods solve the particular problems, but unfortunately can not be adapted to every kind of stochastic optimization problem. We consider a different approach to the solution technique involving probability function gradients. 1.2
Information on Gradient Approach
In theory, calculating the probability function gradient allow us to use gradient optimization methods as a unified and simple way to handle these problems. Furthermore, it can help not only to solve the problems with probability functions as a criterion but also to solve optimization problems with probabilistic constraints. However, the direct calculation of the considered gradient in the general case is very difficult. The probability function gradient can be represented as a surface Riemann integral [10], as a Lebesgue integral over the surface [11]. Probability function derivatives also can be formulated as a result of Lebesgue integral transformation [12,13]. In some cases, the surface integrals in the expression of probability function derivative can be replaced with volume integrals [14], or at least as the sum of volume and surface integrals [15]. Generally, probability function gradient calculation involves integration over the surface which already is quite complex, and in many cases, this surface can not be found easily. The other approach to the problem above is finding the approximation or boundaries for probability function derivatives values. There are many works dedicated to this solution technique. For example, expressions for the lower boundary of probability function gradient norm [16] and the upper boundary of probability function subdifferentials [17] were obtained in the case of Gaussian distribution. Another estimation for probability function derivative was obtained
104
R. Torishnyi and V. Sobol
with weak derivatives and sampling [18]. With the use of Taylor series extensions approximation of quantile value and an estimate for probability function derivatives were obtained for some of the well-known distributions [19]. Other methods were used for obtaining an estimate of gradient value in the case of Weibull distribution [20] and for asymptotic formulas for first and second order derivatives [21]. Results of these works provide an approximation of some sort for probability function gradient values, but they are restrained to specific distributions or relate to other stochastic procedures. We consider a new approximation technique of quantile and probability functions and their derivatives. This technique for the case of a one-dimensional absolutely continuous stochastic variable was described in detail previously by authors [22]. The main idea of approximation is the replacement of the indicator function with its continuous differentiable approximation. This method can be used in many applied problems due to its universality and simplicity. In this paper, we consider the application of the approximation method in a quantile criterion optimization problem with polyhedral loss function and probabilistic constraints. We describe the solution algorithm and compare results obtained by the proposed method and results of other works [5,6].
2
Problem Statement and Approximations
In this section, the problem statement is given along with a brief description of the approximation method, its area of application, and obtained results. Also, this section provides other results needed to consider the solution algorithm of the stated optimization problem. 2.1
Problem Statement
Consider complete probability space (Ω, F, P) and a random vector X on this space with values x ∈ IRm . Let the convex and polyhedral loss function be defined as T x + b1i , Φ(u, x) = max AT1i u + B1i i=1,k1
and let the constraints be defined as T Q(u, x) = max AT2i u + B2i x + b2i , i=1,k2
where u ∈ U is the control vector, U ⊂ IRm is a set of feasible control vecT T tors, AT1i , AT2i , B1i , B2i are the rows of determined matrices A1 ∈ IRk1 ×n , A2 ∈ k2 ×n k1 ×m , B1 ∈ IR , B2 ∈ IRk2 ×m respectively, b1i , i = 1, k1 and b2i , i = 1, k2 IR are the components of determined vectors b1 ∈ IRk1 and b2 ∈ IRk2 respectively. Consider the probability that the loss function does not exceed the specified level of loss ϕ while constraints are satisfied: Pϕ (u) = P {Φ(u, X) ≤ ϕ, Q(u, X) ≤ 0} .
Smooth Approximation in Stochastic Optimizations Problems
105
The problem itself is described as finding the optimal control vector uϕ , maximizing considered probability: uϕ = arg max Pϕ (u). u∈U
(1)
The equivalent quantile minimization problem was considered earlier [5,6] alongside with solution method based on optimization on confidence sets, which is complex both algorithmically and computationally. To solve the problem (1), we are going to use the approximation of probability function gradient, which was first described in [22], and the gradient projection method. There are several issues we need to resolve before using this approach: – The probability approximation technique requires the existence of the loss function gradient and the constraints function gradient. A polyhedral function has derivatives at almost all points, but the derivative could not be defined at inflection points, where two or more functions with maximal values become equal; – Approximation of the probability function and its gradient should consider both the loss and constraints functions. The first issue is addressed in Subsect. 2.2, where we replace maximum with smooth maximum transform. The second issue is addressed in Subsect. 2.3, where we provide an approximation of probability function and its derivatives, and also show the correctness of such approximations for arbitrary random vector size. 2.2
Approximation of the Maximum
The smooth maximum function for a set of linear functions f1 (x), f2 (x), ..., fz (x) is defined as z f (x)eγfi (x) z i γf (x) . SMγ (f1 (x), ..., fz (x)) = i=1 i i=1 e where γ > 0 is a parameter of smoothness, represented as a big positive number. We can show that lim SMγ (f1 (x), ..., fz (x)) = max(f1 (x), ..., fz (x)),
γ→+∞
lim
γ→+∞
d d SMγ (f1 (x), ..., fz (x)) = max(f1 (x), ..., fz (x)), dx dx
(2) (3)
for any point x where the right-hand derivative is defined. The statement (2) is trivial. To prove (3), we consider the maximum of two functions and its smooth approximation:
$$\max(f_1(x), f_2(x)) \approx \mathrm{SM}_\gamma(f_1(x), f_2(x)) = \frac{f_1(x)e^{\gamma f_1(x)} + f_2(x)e^{\gamma f_2(x)}}{e^{\gamma f_1(x)} + e^{\gamma f_2(x)}}.$$
The derivative of the smooth maximum function can be calculated directly:
$$\frac{d}{dx}\mathrm{SM}_\gamma(f_1(x), f_2(x)) = \frac{f_1'(x)e^{\gamma f_1(x)} + f_2'(x)e^{\gamma f_2(x)}}{e^{\gamma f_1(x)} + e^{\gamma f_2(x)}} + G(\gamma, f_1(x), f_2(x)),$$
where
$$G(\gamma, f_1(x), f_2(x)) = \gamma e^{\gamma f_1(x)} e^{\gamma f_2(x)} \frac{(f_1(x) - f_2(x))(f_1'(x) - f_2'(x))}{(e^{\gamma f_1(x)} + e^{\gamma f_2(x)})^2}.$$
The value of G(·) at an inflection point of the original maximum function, i.e., at $x^*$ with $f_1(x^*) = f_2(x^*)$, is equal to 0. At points different from $x^*$ we consider the behavior of the function G(·) as γ → +∞. The factor $(f_1(x) - f_2(x))(f_1'(x) - f_2'(x))$ of G(·) does not depend on γ and may be omitted. Without loss of generality, assume that $f_2(x) > f_1(x)$ and divide both the numerator and the denominator by $e^{2\gamma \min(f_1(x), f_2(x))}$, i.e., by $e^{2\gamma f_1(x)}$:
$$\lim_{\gamma\to+\infty} \frac{\gamma e^{\gamma f_1(x)} e^{\gamma f_2(x)}}{(e^{\gamma f_1(x)} + e^{\gamma f_2(x)})^2} = \lim_{\gamma\to+\infty} \frac{\gamma e^{\gamma(f_2(x)-f_1(x))}}{(1 + e^{\gamma(f_2(x)-f_1(x))})^2}.$$
For brevity, substitute $t = f_2(x) - f_1(x)$, $t > 0$. The summand 1 in the denominator is negligible compared to the exponential, so it can be omitted. Next, we apply L'Hôpital's rule:
$$\lim_{\gamma\to+\infty} \frac{\gamma e^{\gamma(f_2(x)-f_1(x))}}{(1+e^{\gamma(f_2(x)-f_1(x))})^2} = \lim_{\gamma\to+\infty} \frac{\gamma e^{\gamma t}}{(e^{\gamma t})^2} = \lim_{\gamma\to+\infty} \frac{\gamma}{e^{\gamma t}} = \lim_{\gamma\to+\infty} \frac{1}{t\, e^{\gamma t}} = 0.$$
Then the approximation of the smooth maximum function derivative in the case of two functions can be represented as
$$\frac{d}{dx}\mathrm{SM}_\gamma(f_1(x), f_2(x)) \approx \frac{f_1'(x)e^{\gamma f_1(x)} + f_2'(x)e^{\gamma f_2(x)}}{e^{\gamma f_1(x)} + e^{\gamma f_2(x)}}. \tag{4}$$
Applying the same approach that was used for the function G(·), we can show that
$$\lim_{\gamma\to+\infty} \frac{f_1'(x)e^{\gamma f_1(x)} + f_2'(x)e^{\gamma f_2(x)}}{e^{\gamma f_1(x)} + e^{\gamma f_2(x)}} = f_1'(x)$$
if $f_1(x) > f_2(x)$. This means that the derivative of the smooth maximum function can be used as an approximation of the maximum function derivative. Higher-dimensional cases can be validated using the same approach. In the case of multiple functions, formula (4) becomes
$$\frac{d}{dx}\mathrm{SM}_\gamma(f_1(x), \ldots, f_z(x)) \approx \frac{\sum_{i=1}^{z} f_i'(x)e^{\gamma f_i(x)}}{\sum_{i=1}^{z} e^{\gamma f_i(x)}}.$$
Considering the problem (1), we use the following approximations of the functions Φ(u, x) and Q(u, x):
$$\Phi^*_\gamma(u, x) = \frac{\sum_{i=1}^{k_1} \left(A_{1i}^T u + B_{1i}^T x + b_{1i}\right) \exp\left(\gamma \left(A_{1i}^T u + B_{1i}^T x + b_{1i}\right)\right)}{\sum_{i=1}^{k_1} \exp\left(\gamma \left(A_{1i}^T u + B_{1i}^T x + b_{1i}\right)\right)},$$
$$Q^*_\gamma(u, x) = \frac{\sum_{i=1}^{k_2} \left(A_{2i}^T u + B_{2i}^T x + b_{2i}\right) \exp\left(\gamma \left(A_{2i}^T u + B_{2i}^T x + b_{2i}\right)\right)}{\sum_{i=1}^{k_2} \exp\left(\gamma \left(A_{2i}^T u + B_{2i}^T x + b_{2i}\right)\right)}.$$
The partial derivatives of these approximated functions with respect to the control vector components $u_j$, $j=\overline{1,m}$, are defined as
$$\frac{\partial \Phi^*_\gamma(u,x)}{\partial u_j} = \frac{\sum_{i=1}^{k_1} A_{1ij} \exp\left(\gamma\left(A_{1i}^T u + B_{1i}^T x + b_{1i}\right)\right)}{\sum_{i=1}^{k_1} \exp\left(\gamma\left(A_{1i}^T u + B_{1i}^T x + b_{1i}\right)\right)},$$
$$\frac{\partial Q^*_\gamma(u,x)}{\partial u_j} = \frac{\sum_{i=1}^{k_2} A_{2ij} \exp\left(\gamma\left(A_{2i}^T u + B_{2i}^T x + b_{2i}\right)\right)}{\sum_{i=1}^{k_2} \exp\left(\gamma\left(A_{2i}^T u + B_{2i}^T x + b_{2i}\right)\right)}.$$
Considering the previous claims, we state that
$$\lim_{\gamma\to+\infty} P\left\{\Phi^*_\gamma(u,X) \le \varphi,\ Q^*_\gamma(u,X) \le 0\right\} = P\{\Phi(u,X) \le \varphi,\ Q(u,X) \le 0\} = P_\varphi(u).$$
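To illustrate the smooth maximum transform, the following minimal NumPy sketch (our own illustration; the function names and the test data are assumptions, not part of the original paper) evaluates SM_γ and the derivative approximation (4) for a small set of linear functions:

```python
import numpy as np

def smooth_max(values, gamma):
    """Smooth maximum SM_gamma of the values f_1(x), ..., f_z(x)."""
    w = np.exp(gamma * (values - values.max()))  # shift for numerical stability
    return np.sum(values * w) / np.sum(w)

def smooth_max_grad(values, grads, gamma):
    """Approximation (4): softmax-weighted combination of the derivatives f_i'(x)."""
    w = np.exp(gamma * (values - values.max()))
    return np.sum(grads * w) / np.sum(w)

# Example: linear pieces f_i(x) = a_i * x + b_i evaluated at a point x
a = np.array([2.0, -1.0, 0.5])
b = np.array([0.0, 3.0, 1.0])
x, gamma = 1.5, 12.0
vals = a * x + b
print(smooth_max(vals, gamma), np.max(vals))   # close to each other for large gamma
print(smooth_max_grad(vals, a, gamma))         # approximates the derivative of the max
```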
2.3 Approximation of Probability Function
This subsection addresses the gradient approximation of the probability function $P_\varphi(u)$, which will be used later to solve the problem (1). First, we briefly recall the basic results from [22]. Consider a smooth, strictly piecewise monotonic function g(u, X) depending on a control vector $u \in \mathbb{R}^l$ and an absolutely continuous random variable X with probability density function f(x). The probability that the random value g(u, X) does not exceed a specified level ϕ is defined as
$$P\{g(u,X) \le \varphi\} = \int_{-\infty}^{+\infty} I\{g(u,x) \le \varphi\}\, f(x)\,dx = \int_{-\infty}^{+\infty} \Theta(\varphi - g(u,x))\, f(x)\,dx,$$
where I(·) is the indicator function and Θ(·) is the Heaviside function. The main idea of [22] is to replace the discontinuous Heaviside function with a differentiable approximation and thus obtain a differentiable expression for the desired probability. So we replace the Heaviside function with the sigmoid function
$$S_\theta(t) = \frac{1}{1+e^{-\theta t}},$$
where the parameter θ corresponds to the steepness of the sigmoid function and is usually a large positive number. We previously showed in [22] that
$$\lim_{\theta\to+\infty} \int_{-\infty}^{+\infty} S_\theta(\varphi - g(u,x))\, f(x)\,dx = P\{g(u,X) \le \varphi\}. \tag{5}$$
Furthermore, it was shown that
$$\lim_{\theta\to+\infty} \frac{\partial}{\partial\varphi} \int_{-\infty}^{+\infty} S_\theta(\varphi - g(u,x)) f(x)\,dx = \lim_{\theta\to+\infty} \int_{-\infty}^{+\infty} \theta \left[1 - S_\theta(\varphi - g(u,x))\right] S_\theta(\varphi - g(u,x)) f(x)\,dx = \frac{\partial}{\partial\varphi} P\{g(u,X) \le \varphi\}, \tag{6}$$
$$\lim_{\theta\to+\infty} \frac{\partial}{\partial u_i} \int_{-\infty}^{+\infty} S_\theta(\varphi - g(u,x)) f(x)\,dx = \lim_{\theta\to+\infty} \int_{-\infty}^{+\infty} \theta \left[S_\theta(\varphi - g(u,x)) - 1\right] S_\theta(\varphi - g(u,x)) \frac{\partial g(u,x)}{\partial u_i} f(x)\,dx = \frac{\partial}{\partial u_i} P\{g(u,X) \le \varphi\}, \tag{7}$$
where the partial derivatives are calculated with respect to the control vector components $u_i$, $i=\overline{1,l}$. With some additional requirements, these statements can be generalized to higher-dimensional cases. Consider a random vector $X = [X_1, X_2]^T$ with joint probability density function $f(x_1, x_2)$ and a smooth function $g(u, X_1, X_2)$. Similarly to the one-dimensional case, the probability function approximation is represented as a volume integral:
$$P\{g(u,X_1,X_2) \le \varphi\} \approx \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} S_\theta(\varphi - g(u,x_1,x_2))\, f(x_1,x_2)\,dx_1\,dx_2,$$
and the expressions approximating the probability function derivatives are represented as volume integrals similar to (6), (7). Convergence of the approximation to the original functions can be proved by considering the conditional probability density function of $X_1$ given a fixed value $x_2^*$ of $X_2$:
$$f_{X_1|X_2}(x_1|x_2^*) = \frac{f(x_1, x_2^*)}{f_{X_2}(x_2^*)},$$
where $f_{X_2}(\cdot)$ is the marginal probability density function of $X_2$. The conditional probability function and its approximation are defined similarly to the one-dimensional case:
$$P\{g(u,X_1,X_2) \le \varphi \mid X_2 = x_2^*\} = \int_{-\infty}^{+\infty} \Theta(\varphi - g(u,x_1,x_2^*))\, f_{X_1|X_2}(x_1|x_2^*)\,dx_1,$$
$$P\{g(u,X_1,X_2) \le \varphi \mid X_2 = x_2^*\} \approx \int_{-\infty}^{+\infty} S_\theta(\varphi - g(u,x_1,x_2^*))\, f_{X_1|X_2}(x_1|x_2^*)\,dx_1.$$
The convergence statements (6), (7) are applicable to these probability functions since they are effectively one-dimensional. We can represent the original and approximate two-dimensional probability functions according to the law of total probability:
$$P\{g(u,X_1,X_2) \le \varphi\} = \int_{-\infty}^{+\infty} P\{g(u,X_1,X_2) \le \varphi \mid X_2 = x_2^*\}\, f_{X_2}(x_2^*)\,dx_2^*.$$
Using the convergence of the conditional probability functions and the dominance of the integrand factors, we meet the requirements of Lebesgue's dominated convergence theorem, so we state that
$$\lim_{\theta\to+\infty} \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} S_\theta(\varphi - g(u,x_1,x_2))\, f(x_1,x_2)\,dx_1\,dx_2 = P\{g(u,X_1,X_2) \le \varphi\},$$
which is the two-dimensional analog of (5). To prove two-dimensional analogs of the convergence statements (6), (7), we need the additional requirement that the partial derivatives of the function $g(u, X_1, X_2)$ with respect to the control vector components $u_i$, $i=\overline{1,l}$, must be bounded:
$$\forall i=\overline{1,l}\ \ \exists K_i \in \mathbb{R},\ K_i < \infty:\quad \left|\frac{\partial g(u,x_1,x_2)}{\partial u_i}\right| < K_i;$$
with this requirement, the considered statements are proven similarly. This concept of approximation using lower-dimensional convergence can be used to obtain approximation formulas and prove convergence statements for higher dimensions by induction.
Now we return to the original problem. We point out that in the problem (1) the desired probability includes two inequalities. To use the provided approximations, we consider the indicator of the two inequalities:
$$P_\varphi(u) = P\{\Phi(u,X) \le \varphi,\ Q(u,X) \le 0\} = \int_{-\infty}^{+\infty}\!\!\cdots\!\!\int_{-\infty}^{+\infty} I\{\Phi(u,x) \le \varphi,\ Q(u,x) \le 0\}\, f(x)\,dx_1 \ldots dx_m$$
$$= \int_{-\infty}^{+\infty}\!\!\cdots\!\!\int_{-\infty}^{+\infty} \Theta(\varphi - \Phi(u,x))\,\Theta(-Q(u,x))\, f(x)\,dx_1 \ldots dx_m,$$
where f (x) = f (x1 , ..., xm ) is the probability density function of X. Using a similar approach, we can replace each Heaviside function with the sigmoid function.
At the same time, replacing the maximum with the smooth maximum transform, we obtain the differentiable approximation $P_\varphi^{\theta,\gamma}(u)$ of the probability function $P_\varphi(u)$:
$$P_\varphi^{\theta,\gamma}(u) = \int_{-\infty}^{+\infty}\!\!\cdots\!\!\int_{-\infty}^{+\infty} S_\theta\!\left(\varphi - \Phi^*_\gamma(u,x)\right) S_\theta\!\left(-Q^*_\gamma(u,x)\right) f(x)\,dx_1 \ldots dx_m. \tag{8}$$
The partial derivatives of the function $P_\varphi^{\theta,\gamma}(u)$ with respect to the control vector components $u_j$, $j=\overline{1,m}$, are calculated similarly to (7):
$$\frac{\partial}{\partial u_j} P_\varphi^{\theta,\gamma}(u) = \int_{-\infty}^{+\infty}\!\!\cdots\!\!\int_{-\infty}^{+\infty} \theta\, S_\theta(\varphi - \Phi^*_\gamma(u,x))\, S_\theta(-Q^*_\gamma(u,x)) \cdot$$
$$\cdot \left[ \left(S_\theta(\varphi - \Phi^*_\gamma(u,x)) - 1\right) \frac{\partial\Phi^*_\gamma(u,x)}{\partial u_j} + \left(S_\theta(-Q^*_\gamma(u,x)) - 1\right) \frac{\partial Q^*_\gamma(u,x)}{\partial u_j} \right] f(x)\,dx_1 \ldots dx_m. \tag{9}$$
Considering the previous statements, we show that
$$\lim_{\theta\to+\infty,\,\gamma\to+\infty} P_\varphi^{\theta,\gamma}(u) = P_\varphi(u), \qquad \lim_{\theta\to+\infty,\,\gamma\to+\infty} \frac{\partial}{\partial u_j} P_\varphi^{\theta,\gamma}(u) = \frac{\partial}{\partial u_j} P_\varphi(u).$$
So we obtain a new optimization problem, similar to (1):
$$u_\varphi^{\theta,\gamma} = \arg\max_{u\in U} P_\varphi^{\theta,\gamma}(u). \tag{10}$$
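Since (8) and (9) are expectations with respect to the density f(x), they can be evaluated in practice by Monte Carlo sampling. The following hedged Python/NumPy sketch (the sampler, sample shapes and helper names are our own assumptions, not the authors' implementation) estimates $P_\varphi^{\theta,\gamma}(u)$ and its gradient from a sample X of the random vector:

```python
import numpy as np

def smooth_poly(u, X, A, B, b, gamma):
    """Smooth maximum of the affine pieces A_i^T u + B_i^T x + b_i over a sample X."""
    vals = u @ A.T + X @ B.T + b                        # shape (n_samples, k)
    w = np.exp(gamma * (vals - vals.max(axis=1, keepdims=True)))
    sm = (vals * w).sum(axis=1) / w.sum(axis=1)
    grad = (w[:, :, None] * A[None, :, :]).sum(axis=1) / w.sum(axis=1)[:, None]
    return sm, grad                                     # per-sample value and d/du

def prob_and_grad(u, X, A1, B1, b1, A2, B2, b2, phi, theta, gamma):
    """Monte Carlo estimates of (8) and (9) from a sample X drawn from f(x)."""
    S = lambda t: 1.0 / (1.0 + np.exp(-theta * t))      # sigmoid (may underflow to 0/1)
    f_val, f_grad = smooth_poly(u, X, A1, B1, b1, gamma)
    q_val, q_grad = smooth_poly(u, X, A2, B2, b2, gamma)
    s1, s2 = S(phi - f_val), S(-q_val)
    p = np.mean(s1 * s2)                                # estimate of P^{theta,gamma}_phi(u)
    inner = (s1 - 1.0)[:, None] * f_grad + (s2 - 1.0)[:, None] * q_grad
    grad = np.mean(theta * (s1 * s2)[:, None] * inner, axis=0)  # estimate of (9)
    return p, grad
```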
3 Optimization Algorithm
Now we propose the modified gradient descent method using these approximations. We set the initial point $u^0$, the desired loss level ϕ, the basic step value h, the number of steps N, and the basic boundaries $u_i^{min}$, $u_i^{max}$ for each control vector component $u_i$. We also calculate the value $p^0$ of the probability function at the initial point $u^0$ using formula (8).

Modified Gradient Descent Algorithm
– Calculate the approximate gradient value $\Delta p = [\Delta p_1, \Delta p_2, \ldots, \Delta p_n]^T$ of the probability function at the current point $u^k$ using formula (9);
– Correct the gradient vector based on the component values $u_i^k$ of the current control vector $u^k$. For each $i=\overline{1,n}$:
  • if $u_i^k < u_i^{min}$ and $\Delta p_i < 0$, then set $\Delta p_i = 0$;
  • if $u_i^k > u_i^{max}$ and $\Delta p_i > 0$, then set $\Delta p_i = 0$;
– Calculate the current step value
$$h^k = h \cdot \left(\max_{i=\overline{1,n}} |\Delta p_i|\right)^{-1};$$
– Correct the current step value based on the supposed next control vector $u^{k+1} = u^k + h^k \cdot \Delta p$. For each $i=\overline{1,n}$:
  • if $u_i^{k+1} < u_i^{min}$, then set $h^k = \min\{h^k, (u_i^{min} - u_i^k)/\Delta p_i\}$;
  • if $u_i^{k+1} > u_i^{max}$, then set $h^k = \min\{h^k, (u_i^{max} - u_i^k)/\Delta p_i\}$;
– Calculate the supposed next control vector $u^{k+1} = u^k + h^k \cdot \Delta p$;
– Calculate the probability function value $p^{k+1}$ at the point $u^{k+1}$ using formula (8);
– Check the conditions of the algorithm:
  • if $p^{k+1} \le p^k$, then modify the basic step as $h = h \cdot L$, $L \in [0,1]$, and repeat the algorithm, assuming $k = k+1$;
  • if $p^{k+1} > p^k$, then repeat the algorithm, assuming $u^k = u^{k+1}$, $k = k+1$;
  • if $k \ge N$, exit the algorithm. The optimal solution is $u^{opt} = u^k$.
The considered algorithm was implemented in the Python programming language.
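For illustration, a minimal Python sketch of the iteration described above is given next; it assumes a function prob_and_grad(u) returning the estimates (8) and (9), and its names and exact stopping logic are our own simplifications rather than the authors' code:

```python
import numpy as np

def modified_gradient_method(prob_and_grad, u0, u_min, u_max, h=0.1, N=100, L=0.5):
    """Sketch of the modified gradient method: project the gradient and the step
    onto the box [u_min, u_max] and shrink the basic step h when no progress is made."""
    u = np.array(u0, dtype=float)
    p, _ = prob_and_grad(u)
    for k in range(N):
        _, dp = prob_and_grad(u)
        dp[(u < u_min) & (dp < 0)] = 0.0     # drop components pushing below the box
        dp[(u > u_max) & (dp > 0)] = 0.0     # drop components pushing above the box
        if np.max(np.abs(dp)) == 0.0:
            break
        hk = h / np.max(np.abs(dp))
        u_next = u + hk * dp
        for i in range(len(u)):              # shrink the step to stay inside the box
            if dp[i] != 0.0 and u_next[i] < u_min[i]:
                hk = min(hk, (u_min[i] - u[i]) / dp[i])
            if dp[i] != 0.0 and u_next[i] > u_max[i]:
                hk = min(hk, (u_max[i] - u[i]) / dp[i])
        u_next = u + hk * dp
        p_next, _ = prob_and_grad(u_next)
        if p_next > p:
            u, p = u_next, p_next            # accept the step
        else:
            h *= L                           # reject and shrink the basic step
    return u, p
```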
4 Experimental Part

4.1 Optimization Problem with Continuous Distribution
Consider the problem stated in Sect. 2.1 with the following parameters:
$$A_1 = \begin{pmatrix} 6 & -6 \\ 10 & 1 \end{pmatrix}, \quad B_1 = \begin{pmatrix} 1 & 2 \\ 1 & -10 \end{pmatrix}, \quad b_1 = \begin{pmatrix} 1 \\ -5 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 10 & -10 \end{pmatrix}, \quad B_2 = \begin{pmatrix} -3 & -1 \end{pmatrix}, \quad b_2 = -2.$$
Let the control vector component boundaries be $u_1^{min} = u_2^{min} = 0$, $u_1^{max} = u_2^{max} = 10$.
Let the random vector X with values x ∈ ℝ² have a Gaussian distribution:
$$X \sim \mathcal{N}(0, I), \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
This problem is equal to the problem considered in [5]. In [5], each algorithm assumes a transition from the probability maximization problem to an equivalent quantile minimization problem with a fixed desired probability called the confidence level. To show the advantages of the new algorithm, for each example from [5] we take the obtained optimal quantile value as the parameter ϕ and then solve the problem (10). We set the algorithm parameters and the parameters of the sigmoid and smooth maximum functions:
$$u^0 = [5, 5], \quad h = 0.1, \quad N = 100, \quad \theta = \gamma = 12.$$
Table 1. Comparison of optimization algorithms at confidence level 0.9 Other algorithms u∗
Algorithm
ϕ
Algorithm 1 [5]
18.656
Algorithm 2 [5]
14.674
0.9
0.3218 0
Quasigradient algorithm
0
13.906
Our algorithm
Pϕ (u∗ ) uopt
0.808
0.9
1.44 −0.0025
0.9
1.337
0
0.841 0 0
Pϕθ,γ (uopt ) Pϕ (uopt ) 0.9504
0.951
0.9106
0.9105
0.9017
0.9022
0.751
Table 2. Comparison of optimization algorithms at confidence level 0.8 Other algorithms Algorithm
ϕ
Algorithm 1 [5]
13.234
Algorithm 2 [5]
8.511
u∗
0.8
0.2032 0
Quasigradient algorithm 7.943
0
1.48 −0.005 1.084
Our algorithm
Pϕ (u∗ ) uopt
0.8
0.7695 0
0.8
0
0.737 0 0.6985
Pϕθ,γ (uopt ) Pϕ (uopt ) 0.892
0.8939
0.8163
0.8165
0.8072
0.8076
After finding the optimal solution $u^{opt}$ with our algorithm, we additionally calculate the optimal value $P_\varphi(u^{opt})$ of the probability function without approximation. The results of the algorithms are presented in Tables 1 and 2. The optimal control vector obtained by the other methods is denoted in the tables as $u^*$. In every case the approximate probability value at the optimal point is greater than the confidence level used in the corresponding problem, which confirms the correctness of the obtained solutions and of the overall algorithm. The probability level obtained by our method is significantly greater than the levels achieved by algorithm 1 [5] and algorithm 2 [5], which means that our solution improves on these solutions. Also, the probability level obtained by our method is very close to that of the solution obtained by the quasigradient method, which is the closest to the exact solution. This approach can be used with other optimization algorithms relying on gradient calculation to obtain a more accurate solution; we have chosen gradient descent mainly because of its simplicity and comprehensibility.
These results also show that the approximation of the probability function itself has a good level of accuracy. In the case of the problem with confidence level 0.9, the deviation between the approximated and true values of the probability function does not exceed 0.07%. In the case of the problem with confidence level 0.8, this deviation does not exceed 0.22%. We showed in [22] that the expressions for the deviation between the approximated and true values of the probability function, even in the one-dimensional case, are quite complex and bulky, and we also proved that the deviation tends to zero as the parameter of the sigmoid function tends to infinity. In the case of the considered problem, we expect a similar behavior of the deviation, but this issue requires further investigation.

4.2 High Dimension Optimization Problem
Consider the higher-dimension problem stated in Sect. 2.1 with the following parameters:
$$A_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & -5 & 6 & 7 & 8 & 4 & 5\\ 2 & 1 & 4 & 3 & 6 & 5 & 8 & -7 & 5 & 4\\ -5 & 7 & -4 & 1 & 3 & 8 & 6 & 2 & 3 & 2\\ 3 & 5 & 2 & -7 & 8 & -1 & 6 & 4 & 2 & 3\\ 4 & 3 & -2 & 1 & 5 & 7 & 8 & 6 & 1 & 2\\ -2 & 3 & -5 & -1 & -6 & -4 & -7 & -8 & -2 & 1\\ -3 & -2 & -4 & -5 & -1 & -8 & -6 & 7 & -5 & 4\\ -2 & -3 & -5 & 4 & -8 & -1 & 7 & 6 & -4 & 5\\ 6 & -4 & 2 & -1 & 3 & -5 & -8 & 7 & -6 & 5 \end{pmatrix}, \quad B_1 = \begin{pmatrix} 1 & 2\\ 2 & 1\\ 3 & 4\\ 2 & 3\\ 1 & 1\\ 3 & 2\\ 2 & 2\\ 1 & 3\\ 3 & 1 \end{pmatrix},$$
$$A_2 = \begin{pmatrix} 5 & 6 & 5 & 6 & 4 & 4 & 5 & 6 & 5 & 6 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 1 & 2 \end{pmatrix}, \quad b_1 = \begin{pmatrix} 2 & 3 & 2 & 4 & 3 & 4 & 5 & 5 & 2 \end{pmatrix}^T, \quad b_2 = -10.$$
Let the control vector component boundaries be $u_i^{min} = 0$, $u_i^{max} = +\infty$ for all $i=\overline{1,10}$.
This problem is equal to the one considered in [6] except for the type of the random variable distribution. Let X be a random variable with uniform distribution:
$$X \sim U([0,2] \times [0,2]).$$
We set the desired loss level according to the results of [6], the algorithm parameters, and the parameters of the sigmoid and smooth maximum functions as follows:
$$\varphi = 10.2778, \quad u_i^0 = 0.2 \ \ \forall i=\overline{1,10}, \quad h = 0.1, \quad N = 100, \quad \theta = \gamma = 12.$$
The result of the algorithm is $u^{opt} = [0.0946, 0, 0.384, 0.2745, 0, 0, 0, 0, 0, 0]^T$, $P_\varphi^{\theta,\gamma}(u^{opt}) = 0.818$, which corresponds to the solution obtained in [6] in the sense of matching non-zero control vector components. The deviation between the approximated and true values of the probability function does not exceed 0.68% in this case.
The previously considered problem [6] has a random variable X with a discrete distribution. The formulas for the approximation of the probability function and its gradient are represented in the form of expectations, so technically we can compute these expectations in the discrete case using sums. Let X be a discrete random variable with values $\{x_i\}_{i=1}^{+\infty}$, $P(x_i) = p_i$. The discrete analog of the probability function is represented as
$$P_\varphi(u) = \sum_{i=1}^{+\infty} I\{\Phi(u,x_i) \le \varphi\}\, p_i = \sum_{i=1}^{+\infty} \Theta(\varphi - \Phi(u,x_i))\, p_i.$$
The core of the issue lies in the replacement of the Heaviside function with a sigmoid. For values $\hat{x}_j$ of the random variable X such that $\Phi(u,\hat{x}_j) = \varphi$, the values of the Heaviside function and the sigmoid function are not equal:
$$\Theta(\varphi - \Phi(u,\hat{x}_j)) = 1, \qquad S_\theta(\varphi - \Phi(u,\hat{x}_j)) = \frac{1}{1+e^{-\theta\cdot 0}} = \frac{1}{2}.$$
Thus the usage of smooth approximation in the case of discrete variables leads to an underestimation of the probability function value.
5 Conclusion
The approximation method proposed in this paper is a powerful mathematical tool for the approximate solution of many stochastic optimization problems. Using the gradient value approximation, the solution algorithm can be reduced to simple and clear gradient methods. The gradient approximation itself has a low level of complexity, since it reduces to volume integration. The solution of the considered problem with a polyhedral loss function and probabilistic constraints remains effective even in higher dimensions and corresponds with the results of other works. Further work in this subject area is aimed at applying the proposed algorithm to other stochastic optimization problems and at obtaining a similar solution algorithm for problems with a quantile criterion. Funding Information. This work was funded by RFBR according to the research project № 20-31-90035.
References 1. Gartska, S.J.: The economic equivalence of several stochastic programming models. In: Dempster, M.A.H. (ed.) Stochastic Programming, pp. 83–91. Academic Press, New York (1980) 2. Prekopa, A., Szantai, T.: Flood control reservoir system design. Math. Program. Study 9, 138–151 (1978) 3. Kibzun, A., Kan, Y.: Stochastic Programming Problems with Probability and Quantile Functions. Wiley, Chichester, New York, Brisbane (1996)
4. Kibzun, A., Matveev, E.: Stochastic quasigradient algorithm to minimize the quantile function. Autom. Remote Control 71, 1034–1047 (2010). https://doi.org/10. 1134/S0005117910060056 5. Naumov, A., Ivanov, S.: On stochastic linear programming problems with the quantile criterion. Autom. Remote Control 72, 353–369 (2011). https://doi.org/ 10.1134/S0005117911020123 6. Ivanov, S., Naumov, A.: Algorithm to optimize the quantile criterion for the polyhedral loss function and discrete distribution of random parameters. Autom. Remote Control 73, 105–117 (2012). https://doi.org/10.1134/S0005117912010080 7. Kibzun, A., Naumov, A., Norkin. V.: On reducing a quantile optimization problem with discrete distribution to a mixed integer programming problem. Autom. Remote Control 74, 951–967 (2013). https://doi.org/10.1134/S0005117913060064 8. Kibzun, A.I., Ignatov, A.N.: Reduction of the two-step problem of stochastic optimal control with bilinear model to the problem of mixed integer linear programming. Autom. Remote Control 77(12), 2175–2192 (2016). https://doi.org/10.1134/ S0005117916120079 9. Ivanov, S.V., Kibzun, A.I.: Sample average approximation in a two-stage stochastic linear program with quantile criterion. Proc. Steklov Inst. Math. 303(1), 115–123 (2018). https://doi.org/10.1134/S0081543818090122 10. Raik, E.: The differentiability in the parameter of the probability function and optimization of the probability function via the stochastic pseudogradient method. Proc. Acad. Sci. Est. SSR Phys. Math. 24(1), 3–9 (1975) 11. Kibzun, A., Tretyakov, G.: On the smoothness of criteria function in quantile optimization. Autom. Remote Control 58(9), 1459–1468 (1997) 12. Marti, K.: Approximations and Derivatives of Probability Functions. In: Anastassiou, G., Rachev, S.T. (eds.) Approximation, Probability, and Related Fields, pp. 367–377, Springer, Boston (1994). https://doi.org/10.1007/978-1-4615-2494-6 28 13. Marti, K.: Differentiation formulas for probability functions: the transformation method. Math. Program. 75, 201–220 (1996). https://doi.org/10.1007/ BF02592152 14. Uryas’ev, S.: Derivatives of probability functions and some applications. Ann. Oper. Res. 56, 287–311 (1995). https://doi.org/10.1007/BF02031712 15. Uryas’ev, S.: Derivatives of probability functions and integrals over sets given by inequalities. J. Comput. Appl. Math. 56(1–2), 197–223 (1994). https://doi.org/10. 1016/0377-0427(94)90388-3 16. Henrion, R.: Gradient estimates for Gaussian distribution functions: application to probabilistically constrained optimization problems. Numer. Algebra Control Optim. 2(4), 655–668 (2012). https://doi.org/10.3934/naco.2012.2.655 17. van Ackooij, W., Henrion, R.: (Sub-)Gradient formulae for probability functions of random inequality systems under Gaussian distribution. SIAM J. Uncertain. Quantif. 5(3), 63–87 (2017). https://doi.org/10.1137/16M1061308 18. Pflug, G., Weisshaupt, H.: Probability gradient estimation by set-valued calculus and applications in network design. SIAM J. Optimiz. 15(3), 898–914 (2005). https://doi.org/10.1137/S1052623403431639 19. Yu, C., Zelterman, D.: A general approximation to quantiles. Commun. Stat. Theor. Methods 46(19), 9834–9841 (2017) 20. Okagbue, H., Adamu, M., Anake, T.: Ordinary differential equations of the probability functions of the Weibull distribution and their application in ecology. Int. J. Eng. Future Technol. 15, 57–78 (2018)
21. Garnier, J., Omrane, A., Rouchdy, Y.: Asymptotic formulas for the derivatives of probability functions and their Monte Carlo estimations. Eur. J. Oper. Res. 198(3), 848–858 (2009). https://doi.org/10.1016/j.ejor.2008.09.026 22. Sobol, V., Torishnyi, R.: On smooth approximation of probabilistic criteria in stochastic programming problems. SPIIRAS Proc. 19(1), 181–217 (2020). https://doi.org/10.15622/sp.2020.19.1.7
An Acceleration of Decentralized SGD Under General Assumptions with Low Stochastic Noise Ekaterina Trimbach(B) and Alexander Rogozin Moscow Institute of Physics and Technology, 9 Institutskiy Per., 141701 Dolgoprudny, Moscow Region, Russia {trimbach.ea,aleksandr.rogozin}@phystech.edu
Abstract. Distributed optimization methods are actively researched by optimization community. Due to applications in distributed machine learning, modern research directions include stochastic objectives, reducing communication frequency and time-varying communication network topology. Recently, an analysis unifying several centralized and decentralized approaches to stochastic distributed optimization was developed in Koloskova et al. (2020). In this work, we employ a Catalyst framework and accelerate the rates of Koloskova et al. (2020) in the case of low stochastic noise.
Keywords: Decentralized optimization · Decentralized SGD · Catalyst

1 Introduction
In this paper, we consider an optimization problem with sum-type functional n 1 fi (x) . (1) f = min f (x) := n i=1 x∈Rd Each fi is defined in stochastic form fi (x) := Eξi ∼Di Fi (x, ξi ) , n
where ξi is a random variable with distribution Di . Random variables {ξi }i=1 are independent and do not depend on x, as well. We seek to solve the problem (1) in a decentralized distributed environment, where each of n agents locally holds The work of E. Trimbach and A. Rogozin was supported by Andrei M. Raigorodskii Scholarship in Optimization. The research of A. Rogozin is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) No-07500337-20-03, project No. 0714-2020-0005. This work started during Summer school at Sirius Institute. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 117–128, 2021. https://doi.org/10.1007/978-3-030-86433-0_8
fi and has access to stochastic gradient ∇Fi (x, ξ). The agents are connected to each other via a communication network. In this paper, we accelerate decentralized stochastic gradient descent under low noise conditions using the Catalyst shell. The Catalyst approach was originally developed in [12] for deterministic problems and generalized to a stochastic setting in [7]. The Catalyst envelope allows accelerating the convergence rate of a deterministic or stochastic optimization method in strongly convex problems. We apply the Catalyst acceleration algorithm described in the article [7] while extending it to the decentralized case. As a result, we manage to achieve acceleration under the condition of low stochastic noise. In article [4], it is proved that complexity of Decentralized Stochastic Gradi2 ent Descent (DSGD) for computing such X that μE ¯ x − x 2 ≤ ε is √ √ 2 ¯ +σ L( ζτ ¯ pτ ) 1 σ ¯ Lτ ˜ √ log + + O . μp ε μnε μp ε In this article, we propose an accelerated version of DSGD that achieves accuracy ε after √ √ √ 2 √ ¯ +σ L( ζτ ¯ pτ ) τ L L¯ σ 1 ˜ √ O √ log + √ + p μ ε nμ με μp ε iterations. In the case of low stochastic noise, i.e. when σ ¯ is small and the first summand is dominant, the total DSGD complexity decreases. This paper is organized as follows. The rest of Sect. 1 is devoted to related work, notation and assumptions. After that, in Sect. 2 we overview the DSGD algorithm of [4] and provide its Catalyst-accelerated version in Sect. 3. Finally, we cover the proofs of main theorems in Sect. 4. 1.1
Related Work
Decentralized optimization methods are based on applying gradient updates and communication procedures. Schemes based on the direct distribution of gradient descent [26] and sub-gradient descent [17] have a simple implementation but only converge to a neighbourhood of the solution in case of constant step-sizes. Algorithms which converge to an exact solution are well developed in the literature, as well. Exact methods include EXTRA [21], DIGing [16] and NIDS [10]. In many analyses the performance of a decentralized method depends on function conditioning κ and graph condition number χ, (the term χ typically characterizes graph connectivity). In order to enhance the dependence on κ and χ, accelerated schemes are applied. Accelerated methods can use either direct Nesterov acceleration (DNGD [19], Mudag [25], OPAPC [6], Accelerated Penalty Method [2,8]) or employ a Catalyst framework [9]. Accelerated methods SSDA and MSDA [20] use dual conjugates to local functions fi held by the nodes. Moreover, decentralized schemes such as DIGing [16] and Push-Pull Gradient Method [18] are capable of working on a time-varying network. For more references and performance comparison see Table 1 in [24].
The methods mentioned above are deterministic, i.e. using gradient of local functions fi . This paper is devoted to stochastic decentralized methods, which are particularly interesting due to their applications in distributed machine learning and federated learning [5,14,15]. Several distributed SGD schemes were proposed and analyzed in [1,3,11,23]. A Local-SGD framework was studied in [22,27]. Finally, a recent paper [4] proposed a unified analysis covering many cases and variants of decentralized SGD. 1.2
Notation and Assumptions
Throughout the paper, we use small letters for vectors and capital letters for matrices. We also denote $\mathbf{1} = (1 \ldots 1)^T \in \mathbb{R}^n$. For a matrix $X \in \mathbb{R}^{n\times d}$, we denote its rows $X = (x_1 \ldots x_n)^T$, the average of its rows $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and $\bar{X} = \frac{1}{n}\mathbf{1}\mathbf{1}^T X = (\bar{x} \ldots \bar{x})^T$. We also introduce standard assumptions for convex optimization.

Assumption 1 (L-smoothness). Each function $F_i(x, \xi_i)$ is differentiable for every $\xi_i \in \mathrm{supp}(D_i)$ and L-smooth. In other words, there exists a constant $L \ge 0$ such that for every $x_1, x_2 \in \mathbb{R}^d$ it holds
$$\|\nabla F_i(x_1, \xi_i) - \nabla F_i(x_2, \xi_i)\|_2 \le L \|x_1 - x_2\|_2. \tag{2}$$

Assumption 2 (μ-strong convexity). Every function $f_i: \mathbb{R}^d \to \mathbb{R}$ is strongly convex with constant $\mu \ge 0$. In other words, for every $x_1, x_2 \in \mathbb{R}^d$ it holds
$$f_i(x_1) - f_i(x_2) + \frac{\mu}{2}\|x_1 - x_2\|_2^2 \le \langle \nabla f_i(x_1), x_1 - x_2 \rangle. \tag{3}$$
Assumption 3 (Bounded noise). Let $x^* = \arg\min_{x\in\mathbb{R}^d} f(x)$. Define
$$\zeta_i^2 := \|\nabla f_i(x^*)\|_2^2, \qquad \sigma_i^2 := \mathbb{E}_{\xi_i}\|\nabla F_i(x^*, \xi_i) - \nabla f_i(x^*)\|_2^2,$$
and also introduce
$$\bar{\zeta}^2 := \frac{1}{n}\sum_{i=1}^{n} \zeta_i^2, \qquad \bar{\sigma}^2 := \frac{1}{n}\sum_{i=1}^{n} \sigma_i^2.$$
We assume that σ ¯ 2 and ζ¯2 are finite. In the decentralized case, computational nodes communicate to each other via a network graph. The communication protocol can be written using a matrix W, which elements [W]ij characterize the communication weights between individual nodes. Moreover, the network changes over time, and at each communication round a new instance of W is sampled from a random distribution W (for a detailed discussion of various cases of distribution W, see [4]). For further analysis, we impose an assumption onto the mixing matrix, which is similar to that of [4].
Assumption 4 (Expected Consensus Rate). The entries $w_{ij}^{(t)}$ of the matrix $W^{(t)}$ are positive if and only if nodes i and j are connected at time t. Moreover, there exist two constants $\tau \in \mathbb{Z}$, $\tau \ge 1$, and $p \in (0, 1]$ such that for all integers $\ell \in \{0, \ldots, T/\tau\}$ and all matrices $X \in \mathbb{R}^{d\times n}$
$$\mathbb{E}_W \left\|X W_{\ell,\tau} - \bar{X}\right\|_2^2 \le (1-p)\left\|X - \bar{X}\right\|_2^2,$$
where $W_{\ell,\tau} = W^{((\ell+1)\tau-1)} \cdots W^{(\ell\tau)}$, $\bar{X} := X\frac{\mathbf{1}\mathbf{1}^T}{n}$, and the expectation is taken over the distribution $W^t \sim \mathcal{W}^t$.
2
Decentralized SGD Algorithm
Decentralized stochastic gradient descent makes it possible to search for the minimum of a sum-type function by performing calculations in parallel on each machine. Decentralization increases the fault tolerance of the algorithm and ensures data security. Special cases of DSGD are SGD (one machine is used) and Local SGD (a central node is allocated that has access to all other nodes). Below we recall the DSGD algorithm and its main convergence result from [4].

Algorithm 1. Decentralized SGD
Require: Initial guess $X_0$, functions $f_i$, number of iterations T, step-sizes $\{\eta_t\}_{t=0}^{T-1}$, mixing matrix distributions $\mathcal{W}^t$ for $t \in [0, T]$; for each i-th node initialize $x_i^0 \in \mathbb{R}^d$ from $X_0$
1: for t = 0, . . . , T do
2:   Sample $W_t \sim \mathcal{W}^t$
3:   In parallel for every worker $i \in [n]$
4:   Sample $\xi_i^t$ and compute the stochastic gradient $g_i^t := \nabla F_i(x_i^t, \xi_i^t)$
5:   $x_i^{t+1/2} = x_i^t - \eta_t g_i^t$
6:   $x_i^{t+1} := \sum_{j \in \mathcal{N}_i^t} w_{ij}^t x_j^{t+1/2}$
7: end for
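As an illustration only (not taken from [4]), the following compact NumPy sketch implements one run of Algorithm 1, where grad_oracle(i, x) returns a stochastic gradient of $f_i$ and sample_mixing_matrix(t) returns a mixing matrix; all names are our own assumptions:

```python
import numpy as np

def dsgd(X0, grad_oracle, sample_mixing_matrix, steps, eta):
    """Decentralized SGD: a local stochastic gradient step followed by gossip averaging."""
    X = X0.copy()                       # shape (n, d): row i is the iterate of node i
    n = X.shape[0]
    for t in range(steps):
        W = sample_mixing_matrix(t)     # shape (n, n), W[i, j] > 0 iff i and j are linked
        G = np.stack([grad_oracle(i, X[i]) for i in range(n)])
        X_half = X - eta[t] * G         # local SGD step
        X = W @ X_half                  # x_i^{t+1} = sum_j w_ij x_j^{t+1/2}
    return X
```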
Theorem 5 (Theorem 2 in [4]). Let Assumptions 1, 2, 3, 4 hold. Then for any ε > 0 there exists a step-size $\eta_t$ (potentially depending on ε) such that after running Algorithm 1 for T iterations ε-accuracy is attained in the following sense:
$$\sum_{t=0}^{T} \frac{w_t}{W_T}\, \mathbb{E}\left[f(\bar{x}^t) - f^*\right] + \mu\, \mathbb{E}\left\|\bar{x}^{T+1} - x^*\right\|_2^2 \le \varepsilon, \tag{4}$$
where $w_t = (1 - \frac{\mu}{2}\eta_t)^{-(t+1)}$, $W_T := \sum_{t=0}^{T} w_t$, $\bar{x}^t := \frac{1}{n}\sum_{i=1}^{n} x_i^t$. The number of iterations T is bounded as
$$\tilde{O}\left(\frac{\bar{\sigma}^2}{\mu n \varepsilon} + \frac{\sqrt{L}\left(\bar{\zeta}\sqrt{\tau} + \bar{\sigma}\sqrt{p\tau}\right)}{\mu p \sqrt{\varepsilon}} + \frac{L\tau}{\mu p}\log\frac{1}{\varepsilon}\right), \tag{5}$$
where the $\tilde{O}$ notation hides constants and polylogarithmic factors.
3 Accelerated DSGD

3.1 Overview of Catalyst Framework
The Catalyst shell has gained a lot of attention recently, mainly due to its wide range of applications. The method speeds up an algorithm by wrapping it in an envelope in which, at each step, some surrogate function has to be minimized. It was originally described in [12] and [13], and a large number of adaptations of this algorithm have been proposed recently. This paper uses the implementation of the Catalyst envelope proposed in [7]. Suppose we have some algorithm M which is able to solve the problem
$$F^* = \min_{x\in\mathbb{R}^d} F(x) \tag{6}$$
with some accuracy ε, where F is a μ-strongly convex function. Paper [7] suggests choosing a surrogate function $h_k$ satisfying the following properties:
(H1) $h_k$ is $(\kappa + \mu)$-strongly convex.
(H2) $\mathbb{E}[h_k(x)] \le F(x) + \frac{\kappa}{2}\|x - y_{k-1}\|^2$ for $x = \alpha_{k-1} x^* + (1 - \alpha_{k-1})\, x_{k-1}$.
(H3) $\forall \varepsilon_k \ge 0$, M can provide a point $x_k$: $\mathbb{E}\left[h_k(x_k) - h_k^*\right] \le \varepsilon_k$.
We recall the Catalyst acceleration algorithm from [7].

Algorithm 2. Generic Acceleration Framework with Inexact Minimization of $h_k$
Require: $x_0$ (initial guess); M (optimization method); μ (strong convexity constant); κ (parameter for $h_k$); K (number of iterations); $\{\varepsilon_k\}_{k=1}^{\infty}$ (sequence of approximation errors).
Define $y_0 = x_0$; $q = \frac{\mu}{\mu+\kappa}$; $\alpha_0 = \sqrt{q}$ if $\mu \ne 0$.
1: for k = 1, . . . , K do
2:   Choose a surrogate $h_k$ satisfying (H1), (H2) and calculate $x_k$ satisfying (H3) for $\varepsilon_k$;
3:   Compute $\alpha_k \in (0,1)$ by solving the equation $\alpha_k^2 = (1 - \alpha_k)\,\alpha_{k-1}^2 + q\,\alpha_k$;
4:   Update the extrapolated sequence $y_k = x_k + \beta_k (x_k - x_{k-1})$ with $\beta_k = \frac{\alpha_{k-1}(1-\alpha_{k-1})}{\alpha_{k-1}^2 + \alpha_k}$;
5: end for
6: return $x_K$ (final estimate).
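To make the extrapolation step explicit, here is a hedged Python sketch of the outer loop of Algorithm 2; inner_solver stands for the method M applied to the surrogate $h_k$ built around $y_{k-1}$, and is an assumption of ours rather than part of [7]:

```python
import numpy as np

def catalyst_outer_loop(x0, inner_solver, mu, kappa, K):
    """Generic Catalyst acceleration: solve a surrogate around y_{k-1}, then extrapolate."""
    q = mu / (mu + kappa)
    alpha = np.sqrt(q)
    x_prev, y = np.array(x0, float), np.array(x0, float)
    for k in range(1, K + 1):
        x = inner_solver(y, k)          # approximate argmin of h_k centered at y_{k-1}
        # positive root of alpha_new^2 + (alpha^2 - q) * alpha_new - alpha^2 = 0
        b, c = alpha**2 - q, -alpha**2
        alpha_new = (-b + np.sqrt(b**2 - 4 * c)) / 2
        beta = alpha * (1 - alpha) / (alpha**2 + alpha_new)
        y = x + beta * (x - x_prev)     # Nesterov-style extrapolation
        x_prev, alpha = x, alpha_new
    return x_prev
```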
The authors of [7] provide the following result for Algorithm 2.
Theorem 6. After running Algorithm 2 for k iterations, the following inequality holds:
$$\mathbb{E}\left[F(x_k) - F^*\right] \le \left(1 - \frac{\sqrt{q}}{2}\right)^{k} \left( 2\left(F(x_0) - F^*\right) + 4\sum_{j=1}^{k} \left(1 - \frac{\sqrt{q}}{2}\right)^{-j} \left(\varepsilon_j + \frac{\varepsilon_j}{\sqrt{q}}\right) \right).$$
3.2 Catalyst Application
In this section, we combine Algorithms 1 and 2 and present a Catalyst-accelerated version of DSGD. Consider a sequence of matrices $\{Y^k\}_{k=0}^{\infty}$, $Y^k \in \mathbb{R}^{d\times n}$, and a matrix $X \in \mathbb{R}^{d\times n}$. Let $X = (x_1 \ldots x_n)$, $Y^k = (y_1^k \ldots y_n^k)$ and define a sequence of functions
$$H^k(X) = \frac{1}{n}\sum_{i=1}^{n} \left( f_i(x_i) + \frac{\kappa}{2}\left\|x_i - y_i^{k-1}\right\|_2^2 \right) - \frac{\kappa}{2}\,\sigma_{y^{k-1}}, \tag{7}$$
where
$$\sigma_{y^{k-1}} = \frac{1}{n}\sum_{i=1}^{n} \left\|y_i^{k-1}\right\|_2^2 - \frac{1}{n^2}\left\|\sum_{i=1}^{n} y_i^{k-1}\right\|_2^2.$$
¯ = X 11 yields Note that substituting X = X n ¯ = H k (X) Also define
n 2 2 1 κ κ ¯ − y¯k−1 2 = f (¯ ¯ − y¯k−1 2 . fi (¯ x) + x x) + x n i=1 2 2
n 2 κ k−1 1 κ k−1 fi (x) + x − yi 2 − σy = min 2 2 x∈Rd n i=1 2 κ = min f (x) + x − y¯k−1 2 . 2 x∈Rd 2 ¯ = hk (¯ Introduce hk (x) = f (x) + κ2 x − y¯k−1 2 and note that H k (X) x). Hk
Algorithm 3. Catalyst-accelerated decentralized SGD Require: Number of outer iterations K. Tk −1 . For every k = 1, . . . , K, a step-size sequence {ηt }t=0 √ √ μ Define q = μ+κ , ρ = q/3, α0 = q. Choose {εk }∞ k=1 (approximation errors) For each other i-th node initialize x0i = yi0 ∈ Rd . 1: for k = 1, . . . , K do Tk −1 such 2: Compute required number of iterations Tk and step-size sequence {ηt }t=0 that DSGD provides εk accuracy in Tk iterations. at node i as initial 3: Run DSGD on functions hk (x) for Tk iterations using xk−1 i k −1 . guesses and setting step-size sequence to {ηt }Tt=0 2 + qαk . 4: Compute αk ∈ (0, 1) by solving the equation αk2 = (1 − αk )αk−1 α k−1 k k k k−1 (1−αk−1 ) . 5: For each node, update yi = xi + βk (xi − xi ) with βk = α2 +α 6: end for K K 7: return matrix X = xK 1 , x2 , . . . , xn .
k−1
k
Catalyst Acceleration of Decentralized SGD
3.3
123
Convergence of Algorithm 3
Algorithm 3 can be written in matrix form: Algorithm 4. DSGD acceleration in matrix form.
√ μ Require: Number of outer iterations K, and constants q = μ+κ , α0 = q, sequence K 0 of approximation errors {εk }k=1 . For each i-th node initialize yi = x0i ∈ Rd . 1: for k = 1, . . . , K do Tk −1 2: Compute required number Tk and step size {ηtk }t=0 such that of iterations ¯ DSGD can return Xk : E Hk (X) − Hk ≤ εk after Tk iterations. k ). 3: Compute Xk = DSGD (Xk−1 , Hk , Tk , {ηtk }Tt=0 2 + qαk . 4: Compute αk in (0, 1) by solving the equation αk2 = (1 − αk )αk−1 αk−1 (1−αk−1 ) . 5: Update Yk = Xk + βk (Xk − Xk−1 ) with βk = α2 +α 6: end for 7: return matrix XK
k−1
k
Theorem 7. Under Assumptions 1, 2, 3, 4, there exist a number of outer and step-size sequences iterations K, inner iteration numbers {Tk }K−1 k=0
K−1 k −1 {ηt }Tt=0 , such that Algorithm 3 attains ε-accuracy in the following sense: k=0 T 2 n T d×n ¯ − x 2 ≤ ε, where x it yields X ∈ R such that μ x ¯T := n1 i=1 xTi . The total number of iterations T is bounded as √ √ √ 2 √ ¯ +σ pτ ) L( ζτ ¯ τ L¯ σ 1 L ˜ √ O , √ log + √ + p μ ε nμ με μp ε ˜ notation hides logarithmic factors not dependent on ε. where the O Under low noise conditions, when σ ¯ is small, the dominant contribution is made by the first term, which results in an accelerated rate in comparison with Algorithm 1.
4 Proofs of Theorems

4.1 Proof of Theorem 5
Proof. Theorem 5 is initially presented in the article [4] and proved in Lemma 15 of the Appendix of the same paper. To estimate the accuracy of the acceleration, ˜ so the following is we need to know exactly which variables are hidden under O, a proof of Lemma 15, but analysed it in O notation. The proof of Lemma 15 of [4] shows that T 1 τ r0 exp[−aη(T + 1)] + cη + 64BA η 2 , wt et + arT +1 ≤ 2WT t=0 η p
(t) 2 (t)
¯ − x , et = f x − f (x ) , a = μ2 , b = 1, c = where rt = E x ¯ √ 96 3τ L 18τ ¯2 2 . B = 3L, A = σ ¯ + p ζ , d= p After that, the lemma considers several different values of step-size η. 1) η =
ln(max{2,a 2 r 0 T 2 /c }) aT
In the case of η =
≤
σ ¯2 n ,
1 . d
ln(max{2,a2 r0 T 2 /c}) , aT
the right-hand side of (8) writes as
τ r0 exp[−aη(T + 1)] + cη + 64BA η 2 η p
2 r0 aT exp[− ln(max 2, a r0 T 2 /c )] ≤ ln(2)
c τ + ln max 2, a2 r0 T 2 /c + 64BA ln2 max 2, a2 r0 T 2 /c aT p
2
64BA τ
c c + ln max 2, a r0 T 2 /c + 2 2 ln2 max 2, a2 r0 T 2 /c . ≤ ln(2)aT aT a T p T therefore to achieve accuracy 2W1 T t=0 wt et + arT +1 ≤ ε the number of iterations of the algorithm will be equal to 1 BAτ c + . O aT a pε
2) η =
1 d
≤
In case η = as follows.
ln(max{2,a 2 r 0 T 2 /c }) . aT 1 d
≤
ln(max{2,a2 r0 T 2 /c}) , aT
the right-hand side of (8) is estimated
τ r0 exp[−aη(T + 1)] + cη + 64BA η 2 η p
a(T + 1) c ln max 2, a2 r0 T 2 /c ≤ r0 d exp − + d aT
τ + 64BA ln2 max 2, a2 r0 T 2 /c . p T Then to achieve accuracy 2W1 T t=0 wt et + arT +1 ≤ ε the number of iterations of the algorithm will be equal to d r0 d c 1 BAτ O ln + + . a ε aT a pε
(T +1) 2
T wt (t)
¯ ¯ Ef x − f + μE x Hence to achieve accuracy t=0 W − x ≤ ε T it is required to perform the following number of iterations: √ √ ¯ +σ L(ζτ ¯ pτ ) Lτ r0 τ L σ ¯2 √ + log + O . (8) μnε μp εp μp ε 4.2
Proof of Theorem 7
Let us introduce a new function h(x) : Rd → R: hk (x) = f (x) +
κ x − y¯k−1 2 2 2
(9)
¯ = h(¯ and note that H(X) x). Number of outer iterations Note that hk introduced in (7) satisfies properties (H1 ) and (H2 ) in [7]. Indeed, 2 hk is (μ + κ)-strongly convex and E [hk (x)] ≤ f (x) + κ2 x − yk−1 2 . Therefore, Algorithm 3 is similar to Algorithm 2 in the article [7].
√ x0 ) − f ) . The We choose the accuracy at step k as εk = O (1 − q/3)k (f (¯ properties of hk allow to get an estimate on the number of outer iterations: f (¯ x0 ) − f 1 K = O √ log , (10) q qε where total accuracy is ε = εK /q. For a detailed proof, see Section B.3 of [7]. Number of Inner Iterations When solving the inner problem it is necessary to find such Xk that xk ) − hk ≤ εk by starting the algorithm DSGD from the point Xk−1 . For hk (¯ further analysis, we define Lh = L + κ, μh = μ + κ. The DSGD algorithm applied to minimizing Hk for T iterations guarantees accuracy T +1 2
T wt ¯t ¯ − x 2 ≤ εk . t=0 WT E Hk (X ) − Hk + μH E x ¯ = hk (¯ Using that Hk (X) x), we can rewrite the previous statement as T
wt t=0 WT
T +1 2 ¯ E (hk (¯ xt ) − hk ) + μH E x − x 2 ≤ εk .
T +1 ¯ This means that E x − x 2 ≤
εk μH
and
Lh Lh εk x ¯T +1 − x 2 ≤ xT +1 ) − hk ≤ E . E hk (¯ 2 2μh
T +1 2 We conclude that if DSGD achieves accuracy μH E x ¯ − x 2 ≤ εk after √ √ ¯ +σ Lh (ζτ ¯ pτ ) Lh τ rk−1 τ Lh √ log O + μh p εp μh p ε
iterations, then DSGD achieves accuracy E hk (¯ xT +1 ) − hk ≤ εk after
σ ¯2 + μh nε
O
√ ¯ +σ ¯ pτ ) Lh τ Lh (ζτ Lh σ ¯2 rk−1 τ L2h log + √ + √ μ2h nεk μh μh p εk μh p μh εk p
iterations. According to Proposition 5 in article [7], for Algorithm 3 it holds
E [hk (xk−1 ) − hk ] = O εk−1 /q 2 . As a result, the inner complexity for each k-th step will be
√ ¯ +σ ¯ pτ ) Lh τ Lh (ζτ Lh σ rk−1 τ L2h ¯2 log + + √ √ μ2h nεk μh μh p εk μh p μh εk p √ 2 ¯ ¯ pτ ) Lh τ Lh (ζτ + σ Lh σ (hk (xk−1 ) − hk )τ L2h ¯ log + √ + =O √ μ2h nεk μh μh p εk μh p μ2h εk p √ ¯ +σ ¯ pτ ) Lh τ Lh (ζτ Lh σ εk−1 τ L2h ¯2 log 2 + √ + =O √ μ2h nεk μh μh p εk μh p μh εk pq 2 √ ¯ +σ ¯ pτ ) Lh τ Lh (ζτ Lh σ ¯2 τ L2h =O log + + . √ √ μ2h nεk μh μh p εk μh p μ2h pq 2
O
Total Complexity Let us sum the number of internal iterations by the number of external iterations. This yields total complexity: K K √ ¯ +σ Lh σ ¯ pτ ) Lh τ Lh (ζτ τ L2h ¯2 log 2 2 Tk = O + √ + T =O √ μ2h nεk μh μh p εk μh p μh pq k=1 k=1 K √ K K ¯ +σ Lh σ Lh (ζτ ¯ pτ ) Lh τ τ L2h ¯2 log 2 2 + + =O √ √ μ2 nεk μh μh p εk μh p μh pq k=1 h k=1 k=1 K √ K ¯ +σ Lh σ Lh (ζτ ¯ pτ ) τ L2h ¯2 Lh τ log 2 2 . + +K =O √ √ μ2h nεk μh μh p εk μh p μh pq k=1
k=1
√ x0 ) − f ) and therefore After that, we recall that εk = O (1 − q/3)K (f (¯ K 1 K √1 are geometric progressions. Moreover, note that q ≤ 1 k=1 εk and k=1 εk √ √
q and hence O 1 − 1 − 3 ≥ O q .
√ ¯ +σ ¯ pτ ) Lh τ 1 Lh (ζτ Lh σ f (¯ x0 ) − f ¯2 τ L2h T =O + √ + √ log 2 2 log √ √ μ2h n qεK μh μh p qεK μh p q μh pq qε √ 2 2 ¯ ( ζτ + σ ¯ pτ ) L f (¯ x0 ) − f Lh σ τ Lh ¯ Lh τ 1 h √ log + log =O + . √ √ μh p q μ2h pq 2 qε μh μh pq ε μ2h nq 3/2 ε μ and total Let us choose κ = L − μ, then μh = L, Lh = 2L − μ, q = L T complexity for accuracy E f (¯ x ) − f ≤ ε will be √ √ √ √ ¯ +σ L(ζτ ¯ pτ ) L(f (¯ x0 ) − f ) L¯ σ2 Lτ L2 τ √ + + √ log 2 log T =O √ μ μnε μp pμ με μp ε √ √ √ √ ¯ +σ L(ζτ ¯ pτ ) 1 L¯ σ2 Lτ ˜ √ + + √ log =O , √ μ μnε μp ε μp ε
˜ - notation hides constants and polylogarithmic factors. where O
References 1. Assran, M., Loizou, N., Ballas, N., Rabbat, M.: Stochastic gradient push for distributed deep learning. In: International Conference on Machine Learning, pp. 344–353. PMLR (2019) 2. Dvinskikh, D., Gasnikov, A.: Decentralized and parallelized primal and dual accelerated methods for stochastic convex programming problems. arXiv preprint arXiv:1904.09015 (2019) 3. Koloskova, A., Lin, T., Stich, S.U., Jaggi, M.: Decentralized deep learning with arbitrary communication compression. In: ICLR 2020 Conference Blind Submission (2020) 4. Koloskova, A., Loizou, N., Boreiri, S., Jaggi, M., Stich, S.U.: A unified theory of decentralized sgd with changing topology and local updates. In: International Conference on Machine Learning (2020) 5. Konecn` y, J., McMahan, H.B., Ramage, D., Richt´ arik, P.: Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527 (2016) 6. Kovalev, D., Salim, A., Richt´ arik, P.: Optimal and practical algorithms for smooth and strongly convex decentralized optimization. arXiv preprint arXiv:2006.11773 (2020) 7. Kulunchakov, A., Mairal, J.: A generic acceleration framework for stochastic composite optimization. In: Advanced Neural Information Processing System (NeurIPS 2019), vol. 32 (2019) 8. Li, H., Fang, C., Yin, W., Lin, Z.: A sharp convergence rate analysis for distributed accelerated gradient methods. arXiv:1810.01053 (2018) 9. Li, H., Lin, Z.: Revisiting extra for smooth distributed optimization. arXiv preprint arXiv:2002.10110 (2020) 10. Li, Z., Shi, W., Yan, M.: A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates. IEEE Trans. Sign. Process. 67(17), 4494–4506 (2019)
11. Lian, X., Zhang, C., Zhang, H., Hsieh, C.J., Zhang, W., Liu, J.: Can decentralized algorithms outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent. arXiv preprint arXiv:1705.09056 (2017) 12. Lin H., Mairal J., H.Z.: A universal catalyst for first-order optimization. arXiv preprint arXiv:1506.02186 (2015) 13. Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. J. Mach. Learn. Res. 18(1), 7854–7907 (2018) 14. McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282. PMLR (2017) 15. McMahan, H.B., Moore, E., Ramage, D., y Arcas, B.A.: Federated learning of deep networks using model averaging. arXiv preprint arXiv:1602.05629 (2016) 16. Nedi´c, A., Olshevsky, A., Shi, W.: Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 27(4), 2597–2633 (2017) 17. Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Cont. 54(1), 48–61 (2009) 18. Pu, S., Shi, W., Xu, J., Nedich, A.: A push-pull gradient method for distributed optimization in networks. In: 2018 IEEE Conference Decision Control (CDC), pp. 3385–3390 (2018) 19. Qu, G., Li, N.: Accelerated distributed nesterov gradient descent. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (2016) 20. Scaman, K., Bach, F., Bubeck, S., Lee, Y.T., Massouli´e, L.: Optimal algorithms for smooth and strongly convex distributed optimization in networks. In: International Conference on Machine Learning, pp. 3027–3036 (2017) 21. Shi, W., Ling, Q., Wu, G., Yin, W.: Extra: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015) 22. Stich, S.U.: Local sgd converges fast and communicates little. arXiv preprint arXiv:1805.09767 (2018) 23. Tang, H., Lian, X., Yan, M., Zhang, C., Liu, J.: D2: decentralized training over decentralized data. In: International Conference on Machine Learning, pp. 4848– 4856. PMLR (2018) 24. Xu, J., Tian, Y., Sun, Y., Scutari, G.: Distributed algorithms for composite optimization: Unified framework and convergence analysis. arXiv e-prints pp. arXiv2002 (2020) 25. Ye, H., Luo, L., Zhou, Z., Zhang, T.: Multi-consensus decentralized accelerated gradient descent. arXiv preprint arXiv:2005.00797 (2020) 26. Yuan, K., Ling, Q., Yin, W.: On the convergence of decentralized gradient descent. SIAM J. Optim. 26(3), 1835–1854 (2016) 27. Zinkevich, M., Weimer, M., Smola, A.J., Li, L.: Parallelized stochastic gradient descent. In: NIPS, vol. 4, p. 4. Citeseer (2010)
Integer Programming and Combinatorial Optimization
A Feature Based Solution Approach for the Flying Sidekick Traveling Salesman Problem Maurizio Boccia, Andrea Mancuso, Adriano Masone(B) , and Claudio Sterle Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, Via Claudio 21, 80125 Naples, Italy {maurizio.bocca,andrea.mancuso,adriano.masone,claudio.sterle}@unina.it
Abstract. The integration of new distribution technologies in the delivery systems, specifically drones, has been investigated by several companies to reduce the last mile logistic costs. The most promising delivery system, in terms of emissions and completion time reduction, consists of a truck and a drone operating in tandem for the parcel delivery to the customers. This has led to the definition of new and complex routing problems that have received great attention by the operations research community. Several contributions have appeared in the last years providing ILP and MILP formulation for these kinds of problems. Nevertheless, due to complexity, their solutions is impracticable even on small instances. In particular, the synchronization and the coordination of the two vehicles represent critical issues. In this work, we investigate the possibility of using customer characterizing features which allow to determine a priori promising customer-to-vehicle assignments. This information can be used to perform several variable fixings in the FSTSP formulation, so reducing the size and the complexity of the instances to be solved. Thus, the final aim is to determine optimal or sub-optimal solutions using state-of-the-art solver on a reduced FSTSP. To this aim, we provide a computational study proving the quality of the chosen features and their effectiveness in the solution of new and literature instances.
Keywords: Truck-and-drone · MILP · Customer features · Computational study

1 Introduction
Last mile logistics (LML) is one of the most important and expensive part of the freight distribution process in a supply chain. Moreover, its role is not going to change in the future since we are observing an always increasing shift away from physical stores to digital shopping. In this context, the integration of new distribution technologies in the delivery systems, specifically drones, has been investigated by several companies to reduce the LML costs [6]. It is widely recognized that one of the most promising c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 131–146, 2021. https://doi.org/10.1007/978-3-030-86433-0_9
delivery system, in terms of emissions and completion time reduction, consists of a truck and a drone operating in tandem for the parcel delivery to the customers. This innovative delivery system has received great attention by the operations research community leading to the definition of new and complex vehicle routing problems. Truck-and-drone routing problems currently represent a cutting edge research topic and related literature significantly increased in the last years. For the sake of the brevity, here we just address the interested reader to the survey works by [7] and [25] for a complete review of the literature. In this work, we focus on the case with a single-truck and a single-drone, introduced in [19]. This problem is referred to as the Flying Sidekick Traveling Salesman Problem (FSTSP ) and its assumptions and operations can be schematized as follows: the truck-and-drone tandem starts from and returns to the depot; all the customers can be visited only once either by the truck or by the drone; the truck acts as a mobile depot for the drone providing it parcels and changing its battery (since the drone has a limited endurance); the drone is launched from the truck to serve one customer at a time and then it either comes back to the truck or to the depot (drone sortie); the launching and the rendezvous locations of each drone sortie must be different; service times for preparing the drone at launch and rendezvous are given and the rendezvous time contributes to the endurance computation; the drone can serve only eligible customers (i.e., customers demanding parcels with a weight lower than drone maximum payload); the drone and truck speeds are different. The objective is to minimize the completion time, i.e., the time when both vehicles reach the depot after having served all the customers. Slightly different variants of FSTSP have been proposed in literature [1,5,14,15] by modifying/relaxing the basic assumptions or introducing flexibility elements, such as: the possibility of visiting a node more than once, the truck ability of launching and retrieving a drone while waiting stationary in the same positions, different objective functions, the management of no-fly zones. FSTSP and its variants have been modeled by Integer Linear Programming (ILP ) and the Mixed Integer Linear Programming (MILP ) formulations. However, most of these formulations suffer from dimensional drawbacks which make their solution impracticable even on small instances. Therefore, we are observing an increasing number of contributions dealing with the development of heuristic methods able to determine optimal or sub-optimal solutions with an acceptable computational burden. The main aim of this work is the investigation of the benefits arising from the combination of data science and machine learning techniques with combinatorial optimization. Several contributions have already outlined the most promising research directions in this field [3,26], while the great potentiality of these hybrid approaches has been already proven in [12,16–18]. To the best of our knowledge, this is the first work addressing a truckand-drone problem through this kind of hybrid approach. Our idea is to use classification methods to determine a priori promising/good customer-to-vehicle
assignments. This information allows us to perform several variable fixing in the FSTSP formulation, so reducing the size and the complexity of the instances to be solved. Thus, in other words, the final aim is to effectively and efficiently solve the FSTSP through the solution of a reduced ILP formulation. In this context, we provide a computational study which include and compare different classifiers, with the aim of investigating features characterizing good/optimal FSTSP solutions. Finally, we point out that the proposed approach can be conceived as a general framework for the solution of FSTSP instances. Indeed, once good customer-to-vehicle assignments are determined through a classification method, this information can be effectively used by any exact/heuristic solution method allowing it to tackle larger instances. The paper is organized as follows: in Sect. 2, the detailed description of the FSTSP and the related ILP formulation are reviewed; Sect. 3 is devoted to the presentation of the proposed solution approach; Sect. 4 presents the experiment design and the obtained computational results; finally, conclusions and perspectives on future works are given in Sect. 5.
2 FSTSP: Description and Formulation
In this section, we describe the FSTSP. Then, we review the ILP formulation proposed in [4] to solve the problem by a Branch-and-Cut algorithm. The reduction by variable fixing of this ILP formulation forms the basis for the proposed solution approach further discussed in Sect. 3.

2.1 Problem Description
The FSTSP is a delivery system which considers a tandem between a single truck and a single drone to serve a set of customers, C, where each customer can be visited only once. Since the drone cannot serve parcels exceeding its maximum payload, we define as D the set of customers that are eligible to be served by the drone (D ⊆ C). Moreover, we define as T the set of customers that can be served by the truck (generally, T = C). For the sake of clarity, we will denote a customer as c and c , when we refer to a customer in the T set and in the D set, respectively. On the basis of this distinction, we define AT and AD as the set of truck arcs and of drone arcs, respectively. The set of drone E arcs can be conceived as union of three subsets: AT , AL D , AD . The drone will move on an arc of the subset AT when it is carried by the truck. On the other E hand, it will move on an arc of the subset AL D (AD ) when it is launched from (retrieved by) the truck before (after) a drone sortie. The launch node and the rendezvous node of each drone sortie must be different. The drone can serve one customer per sortie and its flight time cannot exceed a certain threshold (denoted as Dtl) defined by its endurance. An overhead related to the time for launching (recovering) operations at launch (rendezvous) nodes is given and it is denoted as SL(SR). The overhead related to the recovery operation contributes to the drone flight time. The tandem is initially located at a warehouse, called starting
depot (indicated in the following as s). The two vehicles can be characterized by different speeds, and so we will indicate as $d_{ij}^T$ ($d_{ij}^D$) the time required by the truck (drone) to move from node i to node j. The objective of the FSTSP is to minimize the completion time, i.e., the time needed for the tandem to serve all the customers and return to the terminal depot (indicated in the following as t).

2.2 Problem Formulation
Different formulations have been proposed in the literature specifically for the FSTSP [4,8,9,19,24]. In this subsection, we will review the formulation proposed in [4] that will be used for the computational study reported in Sect. 4. To the best of our knowledge, this formulation represents the state of the art of exact approaches for the FSTSP together with those presented in [9,24]. However, we point out that the proposed solution approach can be used for all the formulations presented in the literature, as will be clarified in the following. The formulation proposed in [4] is based on the concept of truck path. A truck path, $P_{ij}$, is a sequence of nodes traversed by the truck starting from node i and ending at node j. Let $\Pi_{ij}$ be the set of all the paths from i to j on the truck sub-graph $G_T(\{s,t\} \cup T; A_T)$. The idea behind this formulation is to express an FSTSP solution in terms of a set of truck paths. Indeed, while the truck serves all the customers along its path, the drone may ride atop the truck or be detached for a drone sortie starting from node i and ending at node j. This latter case can be expressed with the couple $(k', P_{ij})$, where $k'$ is the customer that is served by the drone during its sortie. A sortie $(k', P_{ij})$ is feasible if the drone path duration $(d_{ik'}^D + d_{k'j}^D)$ and the truck path duration $D(P_{ij}) = \sum_{(u,v)\in P_{ij}} d_{uv}^T$ are lower than the endurance of the drone Dtl. Moreover, we indicate with $W_{P_{ij}}^{k'}$ the waiting time of the truck for the couple $(k', P_{ij})$, where $W_{P_{ij}}^{k'} = \max\{0,\, d_{ik'}^D + d_{k'j}^D - D(P_{ij})\}$. Being S the set of all the possible drone customer and truck path couples ($S = \{(k', P_{ij}) \mid k' \in D,\ i,j \in T \cup \{s,t\}\ \text{and}\ P_{ij} \in \Pi_{ij}\}$), it is possible to define the following binary variables:
– $y_{ij}$, $(i,j) \in A_T$, equal to 1 if the arc (i, j) belongs to the truck path, 0 otherwise.
– $x_{ij}$, $(i,j) \in A_D$, equal to 1 if the arc (i, j) belongs to the drone path, 0 otherwise.
– $z_{P_{ij}}^{k'}$, $(k', P_{ij}) \in S$, equal to 1 if the drone travels the path $i - k' - j$ and the truck travels the path $P_{ij}$, 0 otherwise.
On the basis of this notation, the FSTSP can be modeled as follows:
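As a small illustration of the feasibility and waiting-time quantities used above (our own sketch with assumed data structures, following the endurance handling of constraints (12)–(13) where the retrieval overhead SR counts towards the drone flight time):

```python
def sortie_is_feasible(k, path, d_drone, d_truck, Dtl, SR):
    """Check the endurance conditions for a sortie (k', P_ij): both the drone leg
    i -> k' -> j and the truck path duration must fit within Dtl minus the overhead SR."""
    i, j = path[0], path[-1]
    drone_time = d_drone[i][k] + d_drone[k][j] + SR
    truck_time = sum(d_truck[u][v] for u, v in zip(path, path[1:])) + SR
    return drone_time <= Dtl and truck_time <= Dtl

def truck_waiting_time(k, path, d_drone, d_truck):
    """W^{k'}_{P_ij} = max(0, d^D_{ik'} + d^D_{k'j} - D(P_ij))."""
    i, j = path[0], path[-1]
    D_path = sum(d_truck[u][v] for u, v in zip(path, path[1:]))
    return max(0.0, d_drone[i][k] + d_drone[k][j] - D_path)
```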
min  Σ_{(i,j)∈A_T} d^T_ij y_ij + Σ_{(s,j')∈A^L_D} SR x_sj' + Σ_{(i,j')∈A^L_D, i≠s} (SL + SR) x_ij' + Σ_{(k',P_ij)∈S: W^{k'}_{P_ij}>0} W^{k'}_{P_ij} z^{k'}_{P_ij}    (1)

s.t.
Σ_{i:(i,j)∈A_T} y_ij + Σ_{i:(i,j')∈A^L_D} x_ij' = 1    ∀j ∈ T    (2)
Σ_{j':(i,j')∈A^L_D} x_ij' ≤ Σ_{j:(i,j)∈A_T} y_ij    ∀i ∈ T    (3)
x_ij ≤ y_ij    ∀(i,j) ∈ A_T    (4)
Σ_{j:(s,j)∈A_T} y_sj = Σ_{i:(i,t)∈A_T} y_it = 1    (5)
Σ_{j:(i,j)∈A_T} y_ij = Σ_{j:(j,i)∈A_T} y_ji    ∀i ∈ T    (6)
Σ_{i,j∈N:(i,j)∈A_T} y_ij ≤ |N| − 1    ∀N ⊆ T    (7)
Σ_{j:(s,j)∈A_D} x_sj = Σ_{i:(i,t)∈A_D} x_it = 1    (8)
Σ_{j:(i,j)∈A_D} x_ij = Σ_{j:(j,i)∈A_D} x_ji    ∀i ∈ T    (9)
Σ_{i,j∈N:(i,j)∈A_T} x_ij ≤ |N| − 1    ∀N ⊆ T ∪ D    (10)
Σ_{(u,v)∈P_ij} y_uv + x_jk' + x_k'i ≤ |P_ij| + 1    ∀k' ∈ D, ∀i,j ∈ T ∪ {s,t}, P_ij ∈ Π_ij    (11)
x_ik' + x_k'j ≤ 1    ∀i ∈ {s} ∪ T, j ∈ T ∪ {t}, k' ∈ D: d^D_ik' + d^D_k'j > Dtl − SR    (12)
Σ_{(u,v)∈P_ij} y_uv + x_ik' + x_k'j ≤ |P_ij| + 1    ∀(k',P_ij) ∈ S: D(P_ij) > Dtl − SR    (13)
z^{k'}_{P_ij} ≥ Σ_{(u,v)∈P_ij} y_uv + x_ik' + x_k'j − |P_ij| − 1    ∀(k',P_ij) ∈ S: W^{k'}_{P_ij} > 0    (14)
The objective function (1) minimizes the total completion time, expressed as the sum of the duration of the truck paths, the total launch and retrieval overhead time, and the total truck waiting time. Constraints (2) ensure that each customer is visited exactly once, either by the truck or by the drone. Constraints (3) and (4) guarantee the consistency between drone and truck arc variables. Constraints (5)–(7) ensure the feasibility of the whole truck path. Constraints (8)–(10) guarantee the feasibility of the drone path. Constraints (11) forbid drone sorties going in the opposite direction of the truck path (|P_ij| indicates the number of arcs of the path P_ij). Constraints (12) and (13) forbid sorties where the drone flight time is greater than the drone time limit. Finally, Constraints (14) are consistency constraints guaranteeing that if the couple (k', P_ij) is part of the solution then the corresponding variable z^{k'}_{P_ij} must be equal to 1.
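The couples (k', P_ij) of the set S and the waiting-time coefficients W^{k'}_{P_ij} in the objective can be obtained directly from the travel-time matrices. The following is a minimal sketch of such an enumeration, not the authors' implementation; the dictionary representation of the travel times, the exclusion of k' from the nodes of the truck path, and the helper names are assumptions made for illustration only.

    def truck_path_duration(path, dT):
        # path is a sequence of nodes; dT[i][j] is the truck travel time on arc (i, j)
        return sum(dT[u][v] for u, v in zip(path, path[1:]))

    def enumerate_sorties(paths, D, dT, dD, Dtl, SR):
        """Build the couples (k', P_ij) of S together with the truck waiting times W."""
        S = {}
        for path in paths:
            i, j = path[0], path[-1]
            D_P = truck_path_duration(path, dT)
            for k in D:
                if k in path:
                    continue                      # assumed: the drone customer is not a node of the truck path
                flight = dD[i][k] + dD[k][j]
                # feasibility: drone flight and truck path must fit the endurance net of the recovery overhead,
                # mirroring the conditions of constraints (12) and (13)
                if flight > Dtl - SR or D_P > Dtl - SR:
                    continue
                waiting = max(0.0, flight - D_P)  # W^{k'}_{P_ij}: truck waiting time for the couple
                S[(k, tuple(path))] = waiting
        return S

Since the number of truck paths, and hence of couples, is exponential, an exhaustive enumeration like this is only viable for very small instances; the paper itself stresses that the z-variables are exponential in number.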
3 Proposed Solution Approach
The solution of an FSTSP instance requires deciding: the subsets of customers assigned to the truck and to the drone, respectively; the nodes where the drone will be launched and retrieved for each drone sortie; the order according to which the customers will be visited by the two vehicles. It is straightforward that vehicle synchronization and coordination requirements represent the main complexity issues in the solution of the FSTSP. As previously described, the customers can be classified into two groups: customers 'eligible for drones' (ED) and customers 'not eligible for drones' (NED). We recall that ED customers can be served either by the truck or by the drone, whereas NED customers can be served only by the truck, since either their parcels exceed the drone payload capacity or their distance from all the other nodes exceeds the drone endurance. It is easy to see that NED customers decrease the flexibility of the delivery system and restrict the solution space of the corresponding FSTSP. In other words, the higher the number of NED customers, the lower the effect of the synchronization and coordination requirements on the complexity of the FSTSP to be solved. Obviously, in the extreme case where all the customers are NED, the FSTSP reduces to the TSP. The idea behind the proposed solution approach is to further exploit the classification of ED and NED customers. More precisely, we want to classify as NED not only the customers characterized by the previous obvious payload and endurance features, but also other customers characterized by less apparent features. This enlargement of the NED customer set allows us to further restrict the solution space of the FSTSP by fixing several variables in the FSTSP formulation. This, in turn, produces a reduced FSTSP which can be solved to optimality either by off-the-shelf optimization software or by exact solution methods from the literature, which have already proved to be efficient on small-size instances.
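A minimal sketch of this pipeline is given below, assuming a hypothetical instance object with a fix_truck_only method, a trained binary classifier with the Scikit-learn predict interface, and an external exact solver; all of these names are illustrative and not part of the original implementation.

    def solve_with_classification(instance, classifier, exact_solver):
        """Enlarge the NED set via a trained classifier, fix the corresponding
        drone variables, and solve the reduced FSTSP with an exact method."""
        features = [customer_features(c, instance) for c in instance.customers]   # hypothetical helper
        predicted_ned = [c for c, f in zip(instance.customers, features)
                         if classifier.predict([f])[0] == 1]                      # 1 = NED, 0 = ED
        reduced = instance.fix_truck_only(predicted_ned)   # drop the drone (x, z) variables for NED customers
        return exact_solver(reduced)                       # e.g. a branch-and-cut such as [4], under a time limit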
With specific reference to the previous formulation, the enlargement of the NED customer set provides a significant reduction of the number of z-variables, which, we recall, are exponential in number. To this aim, we set up a classification experiment which uses a binary variable assuming values 0 and 1 for the classification of ED and NED customers, respectively. The classification experiment is run considering a set of seven original customer features specifically proposed and defined for the FSTSP. The results of the classification are used as input for the ILP formulation of the FSTSP previously described, which is then solved through the branch-and-cut algorithm proposed in [4].
3.1 Features for Customer Classification in the FSTSP
The seven features used for the FSTSP customer classification have been defined according to the problem assumptions and system operations:
1. Feature 1: it corresponds to the first straightforward feature described in the previous section and is related to the parcel weight. It is modeled through a binary variable equal to 1 if the parcel exceeds the drone maximum payload, 0 otherwise.
2. Feature 2: given a customer c*, it corresponds to the saving achievable in case c* is unserved. This feature is modeled through a continuous variable equal to (TSP_C − TSP_{C\{c*}})/TSP_C · 100, where TSP_X is the length of the TSP optimal solution over the set X. The idea behind this feature is that customers corresponding to large savings could be served by the drone to parallelize the delivery process. Obviously, the computation of this feature can be time consuming or even impossible. In these cases, the length of the TSP can be replaced by a proxy whose computation requires a lower computational effort.
3. Feature 3: it is a measure of customer centrality. It is modelled through a continuous variable equal to Σ_{j∈C} d^T_{c*j} / Σ_{i,j∈C} d^T_{ij}. Customers with a relatively small centrality value are good candidates to be visited by the truck.
4. Feature 4 and Feature 5: they are designed to identify customers that could be promising launch and rendezvous locations for drone sorties. Let us define a direct feasible drone sortie as a sortie in which the drone travels the path i − k' − j, the truck travels the arc (i,j), and the duration of both travel times is lower than Dtl. Then, for each customer we compute the number of direct feasible drone sorties having the considered customer as launch/rendezvous location. On this basis, the fourth feature is an integer variable counting the number of times the customer is involved in direct feasible drone sorties. The fifth feature counts the number of feasible drone sorties where at least one of the launch and rendezvous locations is not drone eligible. The motivation behind these two features is that a customer occurring in a large number of direct feasible drone sorties is more likely to be used as a launch/rendezvous location in the FSTSP solution.
5. Feature 6 and Feature 7: the sixth and seventh features are related to the duration of the shortest and largest drone path traversing a customer c*, respectively. In particular, they are computed as min_{i,j∈C\{c*}, i≠j} (d^D_{ic*} + d^D_{c*j}) and max_{i,j∈C\{c*}, i≠j} (d^D_{ic*} + d^D_{c*j}), respectively. We point out that if the shortest and/or the largest drone path exceeds Dtl, the continuous variable associated with the corresponding feature is set to a sufficiently large value.
Finally, we highlight that all the feature values were centered and scaled on the basis of the corresponding instance, since these values are of different types and orders of magnitude. A sketch of some of these feature computations is reported below.
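The following minimal sketch shows how such features could be derived from the travel-time matrices; the NumPy representation, the helper names, and the use of a generic tour-length proxy for Feature 2 are assumptions made for illustration, not the authors' code.

    import numpy as np

    def centrality(c, dT):
        # Feature 3: share of the total truck travel time involving customer c
        return dT[c, :].sum() / dT.sum()

    def shortest_longest_drone_path(c, dD, Dtl, big=1e6):
        # Features 6 and 7: min / max duration of a drone path i - c - j with i != j, both different from c
        n = dD.shape[0]
        durations = [dD[i, c] + dD[c, j]
                     for i in range(n) for j in range(n)
                     if i != j and i != c and j != c]
        lo, hi = min(durations), max(durations)
        return (lo if lo <= Dtl else big), (hi if hi <= Dtl else big)

    def saving_with_proxy(c, dT, tour_length_proxy):
        # Feature 2 with a proxy for the TSP length (e.g. a nearest-neighbour tour), as allowed in the text
        n = dT.shape[0]
        full = tour_length_proxy(range(n), dT)
        without = tour_length_proxy([v for v in range(n) if v != c], dT)
        return 100.0 * (full - without) / full

Per-instance centering and scaling of the resulting feature vectors could then be done, for example, with a standard scaler fitted separately on each instance.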
3.2 Data Set and Classification Methods
The literature instances for the FSTSP are represented by the two test beds proposed in [19]. The first set was specifically designed for the FSTSP and consists of 72 instances with 10 customers. The second set was originally defined for another truck-and-drone routing problem, but it has been adapted to the FSTSP. It consists of 120 instances with 20 customers. For both test sets, two values of drone endurance are considered (20 and 40 min). The second test set contains the largest instances which can be solved by the FSTSP exact solution methods present in the literature. Indeed, the optimal solutions of a few 20-customer instances with Dtl = 20 and of several instances with Dtl = 40 were still unknown at the time the present study was devised. In our classification experiments, each customer represents a single observation. On the basis of the available optimal solutions, we know the resulting classification of each customer for all the solved instances with 10 and 20 customers, but we do not know the classification of the customers for the unsolved 20-customer instances. The idea is to train and test the classifiers with the data provided by the instances with 10 customers and then to classify the customers of the instances with 20 customers. However, in this way the data set size is quite limited, since it contains only 720 observations (72 instances with 10 customers). Thus, to enlarge the data set, we generated 2376 instances with 10 customers, so obtaining a data set with 23760 observations. These instances were generated coherently with those proposed in [19]. In particular, the 10 customers are distributed across an 8 by 8 mile square area. The depot location can be: the center of gravity of the customers (the average of their x- and y-coordinates); the average of the customers' x-coordinates with a y-coordinate of zero; the origin (x- and y-coordinates of zero). The number of customers whose parcel weight does not exceed the drone maximum payload is 80% or 90% of the total number of customers. The truck travels according to a Manhattan metric with a speed equal to 25 miles/h. The drone travels according to a Euclidean metric with three possible speeds: 15, 25, and 35 miles/h. The drone maximum flight time was chosen to be either 20 or 40 min.
The overhead parameters related to launching and rendezvous operations (SL and SR) were set to one minute each. For each combination of these parameters, we generated 66 different instances. These instances were all solved through the branch-and-cut algorithm proposed in [4] to determine which vehicle serves each customer in the optimal solution. As explained above, the main focus of this contribution is the investigation of the benefit arising from the combination of data science and machine learning methods with combinatorial optimization. Therefore, we did not develop an ad-hoc classifier but used the Scikit-learn package [21] for Python to build classification models on the defined data set. In particular, we used seven different classifiers, namely: k-nearest neighbors (KNN), linear support vector machine (LSV), kernel support vector machine (RSV), neural networks (NNE), decision tree (DET), random forests (RAF), and the adaptive boosting algorithm (ADB). For the sake of brevity, we do not provide a description of these known methods, but we address the interested reader to [2,10,11,13,20,23,27].
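The seven models can be instantiated directly from Scikit-learn with their default settings, as described above. The concrete estimator classes chosen in the sketch below (e.g., SVC with an RBF kernel for RSV and MLPClassifier for NNE) are our assumptions about the natural Scikit-learn counterparts, not a specification taken from the paper.

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC, LinearSVC
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

    # one estimator per acronym used in the paper, all with default parameters
    classifiers = {
        "KNN": KNeighborsClassifier(),
        "LSV": LinearSVC(),
        "RSV": SVC(kernel="rbf"),
        "NNE": MLPClassifier(),
        "DET": DecisionTreeClassifier(),
        "RAF": RandomForestClassifier(),
        "ADB": AdaBoostClassifier(),
    }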
4 Computational Results
In this section we present and discuss the computational results of the experiments performed to evaluate and validate the proposed solution approach. The experiments have been performed on an Intel(R) Core(TM) i7-6500U, 2.50 GHz, with 16.00 GB of RAM. As previously discussed, we used the classification models of the Scikit-learn package in our experimentation. We point out that each tested classification model has one or more parameters that can be tuned to improve the classification performance but, for the sake of simplicity, we used them with their default settings. To estimate the skill of the off-the-shelf Scikit-learn classifiers on unseen data, we used a 10-fold cross-validation scheme on the generated data set with 23760 observations. The resulting average accuracy of each classifier is reported in Table 1, where the accuracy value is computed as the number of customers that are correctly classified divided by the total number of classified customers.

Table 1. Accuracy of the tested Scikit-learn package classifiers

KNN    LSV    RSV    DET    RAF    NNE    ADB
0.858  0.862  0.860  0.864  0.882  0.861  0.861
We can observe that, even without a specific tuning of the parameters, the lowest accuracy value is equal to 0.858. Moreover, we highlight that the two classifiers showing the highest accuracy values (DET and RAF) both belong to the family of decision tree classifiers.
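A minimal sketch of the 10-fold cross-validation estimate described above is given next, assuming that X and y hold the 23760 standardized feature vectors and the corresponding ED/NED labels, and reusing the classifiers dictionary from the previous sketch.

    from sklearn.model_selection import cross_val_score

    # X: (23760, 7) matrix of per-instance standardized features, y: 0 = ED, 1 = NED
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
        print(f"{name}: mean accuracy {scores.mean():.3f}")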
4.1 Main Results
The results of the classification experiment showed that, on average, 14% of the customers are misclassified. However, it is not possible to use this information alone to evaluate the proposed method in terms of quality of the resulting FSTSP solution. Therefore, the main experimentation of our work consists in solving FSTSP instances using the results of the classification. The objective is to investigate the benefits of the classification on the FSTSP solution in terms of objective function value and computation time. We considered for this experimentation the instances with 20 customers proposed in [19]. In particular, for each instance and for each classifier, we generated a modified instance where each customer is set as ED or NED according to the results of the considered classification method. We recall that the considered literature test set with 20 customers is composed of 120 instances, where each instance can be characterized by two different Dtl values (20 and 40). Therefore, for each instance and for each Dtl value we generated seven different modified versions, one for each classifier, so obtaining 1680 instances. As for the instances of the training and test sets, the reduced literature instances were solved through the branch-and-cut algorithm proposed in [4] with a time limit of 1 h. Then, we compared the results with the best known upper bounds for these instances reported in [4,9]. The results of this comparison are reported in Tables 2, 3, and 4. Tables 2 and 3 show the detailed comparison between the best known upper bounds in the literature and the best solutions obtained among those determined using the seven classifiers, for the instances with endurance equal to 20 and 40, respectively. For each instance, we report: the instance id (Id); the best known upper bound (BUB), marked with an asterisk if the BUB is the optimal solution; the percentage difference between the BUB and the best solution obtained through the classifiers (BD%); the classifier that determined the best solution (Class); the running time, in seconds, needed by the branch-and-cut algorithm to solve the corresponding modified instance (Time). We point out that the best percentage difference is computed as BD% = (BSol − BUB)/BUB · 100, where BSol is the objective value of the best solution among those determined on the modified instances. If BSol is determined by more than one classifier, we report the classifier corresponding to the lowest running time in the column Class.
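For completeness, the comparison metric can be computed as follows; the input format (a mapping from classifier name to objective value and running time) is an illustrative assumption.

    def best_difference(bub, solutions):
        # solutions: {classifier_name: (objective_value, running_time_seconds)}
        name, (bsol, _) = min(solutions.items(), key=lambda kv: (kv[1][0], kv[1][1]))
        return 100.0 * (bsol - bub) / bub, name   # BD% and the reported classifier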
Concerning the instances with Dtl equal to 20 min, the average percentage difference over all the instances is equal to 1.12, with an average running time of 109.13 s. Moreover, we can observe that on 28 instances the best solution obtained is equal to or even better than the best known solution. Considering the classifiers, RAF is the one determining the greatest number of best solutions (38 out of 120), confirming the results obtained on the test set. Concerning instead the instances with Dtl equal to 40 min, the average percentage difference over all the instances is equal to 0.56, with an average running time of 611.56 s. We can also observe that the number of instances for which the proposed methodology is able to obtain a solution better than or equal to the best known in the literature is 53. These results confirm the benefit of the proposed approach in terms of both solution quality and running time. Considering the classifiers, the best one on these instances is KNN, which obtains the best solution 39 times out of 120. A comparison between the average results obtained using the different classifiers is shown in Table 4, where the instances are grouped into three sets on the basis of the depot location (center, edge, and origin). For each classifier, we report: the percentage difference (Diff%); the number of solutions with an objective value equal to or better than the best known in the literature (BES); the running time in seconds (Time); the number of instances not solved to optimality within the time limit (ETL). We can observe that on average the best classifier in terms of quality of the solution is KNN, with an average percentage difference equal to 2.45. KNN is also the classifier showing the lowest average Diff% on the instances with Dtl equal to 40 min. However, we can observe that KNN is also the classifier that generated the greatest number of instances that were not solved within the time limit (24 out of 120). Considering instead the instances with Dtl equal to 20 min, RAF confirms the results reported in Table 2, showing a Diff% value equal to 2.01. For the sake of completeness, we also report in Table 4 the results obtained by determining the NED customer set through a random approach (RDM). These results are used to compare the proposed methodology with one that is not based on machine learning techniques. In particular, for each instance we randomly generated 5 different NED customer sets and then, for each of them, we solved the corresponding FSTSP formulation. We can observe that almost all the classification methods (except ADB) perform better than the random approach in terms of Diff%, so confirming the competitiveness of an approach based on classification methods. Moreover, we also highlight that the random approach provides no benefits in terms of computation time, since its gain with respect to the classification methods is negligible.
Table 2. Results on literature instances with 20 customers and Dtl = 20. Columns: Id; BUB (best known upper bound, * marks a proven optimum); BD%; Class; Time (s). [The per-instance rows of this table are not reproduced here.]
Table 3. Results on literature instances with 20 customers and Dtl = 40. Columns: Id; BUB (best known upper bound, * marks a proven optimum); BD%; Class; Time (s). [The per-instance rows of this table are not reproduced here.]
Table 4. Comparison of the solutions obtained with the different classifiers

                      Dtl = 20                          Dtl = 40
        Depot    Diff%   BES   Time     ETL       Diff%   BES   Time      ETL
KNN     Center   3.13    2     26.38    0         2.34    5     692.26    6
        Edge     1.48    5     151.48   1         3.86    19    774.04    8
        Origin   2.00    0     5.15     0         1.87    12    1018.28   10
        Average  2.20          61.00              2.69          828.20
SVM     Center   3.66    1     6.44     0         4.29    2     69.95     0
        Edge     2.19    2     87.36    0         2.01    7     113.28    1
        Origin   2.30    2     3.84     0         1.85    11    651.53    6
        Average  2.71          32.55              2.72          278.25
RSV     Center   3.46    0     9.55     0         4.16    2     309.08    2
        Edge     2.01    4     128.67   0         2.40    13    408.42    4
        Origin   2.50    0     6.13     0         3.52    6     710.40    6
        Average  2.65          48.12              3.36          475.97
DET     Center   4.10    3     11.24    0         5.28    3     428.38    4
        Edge     2.05    4     226.91   2         2.89    10    374.47    4
        Origin   1.83    7     24.24    0         4.26    6     789.33    6
        Average  2.66          87.47              4.14          530.73
RAF     Center   2.49    5     15.29    0         4.85    6     394.91    3
        Edge     1.74    7     44.68    0         2.22    10    518.49    5
        Origin   1.79    5     9.45     0         5.57    6     728.97    7
        Average  2.01          23.14              4.22          547.45
NNE     Center   3.01    2     12.56    0         3.09    4     451.03    3
        Edge     1.67    6     131.53   0         2.02    13    333.28    3
        Origin   2.17    1     7.26     0         3.04    8     855.83    8
        Average  2.28          50.45              2.71          546.71
ADB     Center   5.50    0     3.18     0         5.21    3     413.31    3
        Edge     2.96    3     192.56   1         3.65    10    509.78    4
        Origin   4.07    0     1.61     0         8.08    6     409.52    4
        Average  4.17          65.78              5.65          444.21
RDM     Center   6.25    0     10.32    0         5.41    5     599.47    4
        Edge     3.68    2     22.66    0         4.05    8     385.85    3
        Origin   2.34    4     16.19    0         8.08    8     431.92    3
        Average  4.09          16.39              4.87          472.41
5 Conclusions
In this paper, we investigated the benefits for the FSTSP arising from the combination of data science and machine learning methods with combinatorial optimization. In particular, we proposed a solution approach based on the classification of customers into ED and NED. This classification reduces the FSTSP solution space, allowing us to tackle the resulting problem through exact approaches that otherwise could not be used. We provided an extensive experimentation on newly generated and literature instances. The computational results confirmed the methodological contribution of this work. Indeed, the proposed approach is able to determine 52 new best upper bound values for FSTSP literature instances. Future research directions naturally include further refinement of the proposed solution approach through additional testing of the classifier parameter settings. In addition, it may be worth exploring other features to determine the customer eligibility. Finally, it would be interesting to investigate the benefit of this approach for slightly different truck-and-drone problems.
References

1. Agatz, N., Bouman, P., Schmidt, M.: Optimization approaches for the traveling salesman problem with drone. Transp. Sci. 52(4), 965–981 (2018)
2. Amari, S., Wu, S.: Improving support vector machine classifiers by modifying kernel functions. Neural Netw. 12(6), 783–789 (1999)
3. Bengio, Y., Lodi, A., Prouvost, A.: Machine learning for combinatorial optimization: a methodological tour d'horizon. Eur. J. Oper. Res. 290(2), 405–421 (2021)
4. Boccia, M., Masone, A., Sforza, A., Sterle, C.: A column-and-row generation approach for the flying sidekick travelling salesman problem. Transp. Res. Part C Emerg. Technol. 124, 102913 (2021)
5. Boccia, M., Masone, A., Sforza, A., Sterle, C.: An exact approach for a variant of the FS-TSP. Transp. Res. Procedia 52C, 51–58 (2021)
6. Cary, N., Bose, N.: UPS, FedEx and Amazon gather flight data to prove drone safety, 24 September 2016. https://venturebeat.com/2016/09/24/ups-fedex-andamazon-gather-flightdata-to-prove-drone-safety/
7. Chung, S.H., Sah, B., Lee, J.: Optimization for drone and drone-truck combined operations: a review of the state of the art and future directions. Comput. Oper. Res. 123, 105004 (2020)
8. Dell'Amico, M., Montemanni, M., Novellani, S.: Drone-assisted deliveries: new formulations for the flying sidekick traveling salesman problem. Optim. Lett., 1862–4480 (2019)
9. Dell'Amico, M., Montemanni, M., Novellani, S.: Models and algorithms for the flying sidekick traveling salesman problem. arXiv (2019). https://arxiv.org/pdf/1910.02559v2.pdf
10. Freund, Y., Schapire, R.: A short introduction to boosting. J. Japan. Soc. Artif. Intell. 14, 771–780 (1999)
11. Friedl, M.A., Brodley, C.E.: Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ. 61(3), 399–409 (1997)
12. Guerine, M., Rosseti, I., Plastino, A.: Extending the hybridization of metaheuristics with data mining: dealing with sequences. Intell. Data Anal. 20, 1133–1156 (2016)
13. Goldberger, J., Roweis, S., Hinton, G., Salakhutdinov, R.: Neighbourhood components analysis. Adv. Neural. Inf. Process. Syst. 17, 513–520 (2005)
14. Ha, Q.M., Deville, Y., Pham, Q.D., Há, M.H.: On the min-cost traveling salesman problem with drone. Transp. Res. Part C Emerg. Technol. 86, 597–621 (2018)
15. Jeong, H.Y., Song, B.D., Lee, S.: Truck-drone hybrid delivery routing: payload-energy dependency and no-fly zones. Int. J. Prod. Econ. 214, 220–233 (2019)
16. Maia, M.R.H., Plastino, A., Penna, P.H.V.: Hybrid data mining heuristics for the heterogeneous fleet vehicle routing problem. RAIRO Oper. Res. 52, 661–690 (2018)
17. Maia, M.R.H., Plastino, A., Vaz Penna, P.H.: MineReduce: an approach based on data mining for problem size reduction. Comput. Oper. Res. 122, 104995 (2020)
18. Martins, D., Vianna, G.M., Rosseti, I., Martins, S.L., Plastino, A.: Making a state-of-the-art heuristic faster with data mining. Ann. Oper. Res. 263, 141–162 (2018)
19. Murray, C.C., Chu, A.G.: The flying sidekick traveling salesman problem: optimization of drone-assisted parcel delivery. Transp. Res. Part C Emerg. Technol. 54, 86–109 (2015)
20. Pal, M.: Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26(1), 217–222 (2005)
21. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
22. Otto, A., Agatz, N., Campbell, J., Golden, B., Pesch, E.: Optimization approaches for civil applications of unmanned aerial vehicles (UAVs) or aerial drones: a survey. Networks 72(4), 411–458 (2018)
23. Richard, M.D., Lippmann, R.P.: Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Comput. 3(4), 461–483 (1991)
24. Roberti, R., Ruthmair, M.: Exact methods for the traveling salesman problem with drone. Transp. Sci. 55(2), 315–335 (2021)
25. Viloria, D.R., Solano-Charris, E.L., Munoz-Villamizar, A., Montoya-Torres, J.R.: Unmanned aerial vehicles/drones in vehicle routing problems: a literature review. Int. Trans. Oper. Res. 28, 1626–1657 (2020)
26. Vesselinova, N., Steinert, R., Perez-Ramirez, D.F., Boman, M.: Learning combinatorial optimization on graphs: a survey with applications to networking. IEEE Access 8, 120388–120416 (2020)
27. Wang, L.: Support Vector Machines: Theory and Applications. Springer, New York (2005). https://doi.org/10.1007/b95439
Maximizing the Minimum Processor Load with Linear Externalities

Julia V. Chirkova
Institute of Applied Mathematical Research of Karelian Research Centre of RAS, Pushkinskaya St. 11, Petrozavodsk, Russia
[email protected]
http://www.krc.karelia.ru/HP/julia
Abstract. We consider the game of maximizing the minimum processor delay on uniformly related processors with linear externalities. Externalities allow taking into account the influence of all loaded processors on the performance of each processor; unlike the model without externalities, this problem has not been considered before. A set of jobs is to be assigned to a set of processors with different delay functions depending on their own loads and also on the loads of the other processors. Jobs choose processors to minimize their own delays, while for the system the social payoff of a schedule is the minimum delay among all processors, i.e., the cover. For the case of two processors in this model, the existence of a Nash equilibrium is proven and an expression for the Price of Anarchy is obtained. We also show that the Price of Anarchy is bounded, in contrast to the initial model without externalities.
Keywords: Nash equilibrium · Cover · Maximizing the minimum load · Price of anarchy · Linear externalities
1 Introduction
Our paper is devoted to the scheduling problem viewed as the maximizing-the-minimum-processor-delay game (or covering game) [5,20,21] with linear externalities. Here several egoistic players distribute their jobs of various volumes among processors of nonidentical speeds. Each player tries to minimize the completion time (delay) of his own job. The system payoff is the minimum processor delay [5,20,21]. The game is derived from the KP-model (Koutsoupias and Papadimitriou, see [12,15]) with parallel different-capacity channels. The problem setup where a system seeks to maximize the minimal delay among all nodes originated from the concept of fair sharing and efficient utilization of network resources. Epstein et al. [2,5] were the first to analyze the efficiency of equilibria in this model and also illustrated the motivation for such optimality criteria using several examples. The main idea is that all elements of the system must be loaded as much as possible with the minimum possible downtime. For example, if each player pays the amount of his delay to the system for the execution of his job, then (1) there must
be no "privileged" players paying considerably less than the others owing to a successful choice of processor, and (2) there must be no processors with small profit or without any profit at all. In our model we add linear externalities to the delay functions depending on processor loads and speeds. Externalities allow taking into account the influence of all loaded processors on the performance of each processor; unlike the model without externalities, this problem has not been considered before. A processor load (its own load) is the total volume of jobs executed by a given processor. The ratio of a processor load and speed defines its delay, i.e., the job completion time at this processor. We suppose that for each processor its delay depends not only on its own load: the loads of the other processors also contribute to the delay through the externalities. We can interpret this fact as follows. Each processor shares some part of its resources for the execution of control operations together with other processors. It takes part in organizing, controlling, and supporting the job execution process, as well as in the data exchange associated with these operations. Therefore, even if a processor is not loaded with jobs, it has some delay because it is involved in organizing the work of the loaded processors. The paper [10] is one of the first where externalities were introduced as external effects caused by neighbouring players in the network. Different externalities are also defined in routing models. The works [4,9,13] consider routing games with positive cost-sharing and negative congestion externalities. In [18] the author shows that congestion externalities may lead to a Pareto inefficient choice of routes and to the occurrence of Braess's paradox [1,11,17,19]. In [16] externalities of mixed type are considered; they contain positive and negative components which influence Braess's paradox and its properties. We demonstrate the possibility of adding linear externalities to the delay functions of the KP-model similarly to the papers [8,14], which investigate externalities in the Wardrop model of a transportation system with parallel links. In our game, as in the initial model [2,5], egoistic players choose processors minimizing their jobs' delays and reach a Nash equilibrium, viz., a job distribution such that none of them benefits from a unilateral change of the chosen processor. In this paper we study pure strategy Nash equilibria only; we show that, similarly to the initial KP-models [6,7], a pure equilibrium always exists in the described class of games with 2 processors, in contrast to the general case. The system payoff (also called the social payoff) is the minimum delay over all processors for the obtained job distribution. The Price of Anarchy [5] is defined as the maximum ratio of the optimal social payoff and the social payoff in the worst-case pure Nash equilibrium. We prove the existence of a pure Nash equilibrium and find an analytical expression for the Price of Anarchy for the case of two processors. We also show that in the model with linear externalities the Price of Anarchy is a finite value, in contrast to the initial models [5,20,21] where the Price of Anarchy is infinite when the fastest processor's speed is at least 2.
2 The Model
Consider a system S = S(N, v, e) composed of a set N of processors operating with speeds v_1 ≤ ... ≤ v_n, where n = |N|, and externalities e_ik, where i, k ∈ N, i ≠ k, and each coefficient e_ik ≥ 0 reflects the contribution of the load on processor k ≠ i to the delay of processor i. The system is used by a set of players with their jobs U = U(M, w): each player from the set M chooses an appropriate processor for the execution of his job. For player j, the volume of the job is w_j, j = 1, ..., m, where m = |M|. Denote by W = Σ_{j=1}^{m} w_j the total volume of all jobs. A free processor i with speed v_i executes a job of volume w_j in w_j/v_i time, when all other processors are idle. Each player can choose any processor. The strategy of player j is to select a processor l_j for the execution of his job. Then a strategy profile in the game Γ is the vector L = (l_1, ..., l_m). The load of processor i is defined by δ_i(L) = Σ_{j∈M: l_j=i} w_j. The delay of processor i takes the form

λ_i(L) = Σ_{j∈M: l_j=i} w_j / v_i + Σ_{k≠i} e_ik Σ_{j∈M: l_j=k} w_j = δ_i(L)/v_i + Σ_{k≠i} e_ik δ_k(L).
In fact, this value is the same for all players selecting a given processor. So, we define the pure strategy game Γ = (S(N, v, e), U(M, w), λ). In the present paper we consider only pure strategies. Suppose that the system objective is to maximize the delay of the least delayed processor, that is, to maximize its job completion time. The social payoff SC(L) = min_{i∈N} λ_i(L) is the minimal delay over all processors. Define the optimal payoff (the social payoff in the optimal case) as

OPT = OPT(S, U) = max{SC(L) : L is a profile in Γ(S, U, λ)},    (1)
where maximization runs over all admissible strategy profiles in the game Γ(S, U, λ). A strategy profile L such that none of the players benefits from a unilateral deviation (a change of the processor chosen in L for his job execution) is a pure strategy Nash equilibrium (NE). To provide a formal definition, let L(j → i) = (l_1, ..., l_{j−1}, i, l_{j+1}, ..., l_m) be the profile where the job j migrates from the processor l_j chosen in the profile L to some other processor i, whereas the remaining players keep their strategies the same as in L.
Definition 1. A strategy profile L is said to be a pure strategy Nash equilibrium if and only if each player chooses a processor with the minimum delay, that is, for each player j ∈ M we have the inequality λ_{l_j}(L) ≤ λ_i(L(j → i)) for all processors i ∈ N.
Let us introduce assumptions that provide an adequate system behavior. Consider an arbitrary profile L. First, we suppose that the externalities are such
that a migration of any job w_j from a processor l_j to another processor k ≠ l_j strictly decreases the delay of the processor l_j and strictly increases the delay of the processor k. It means the following.
Assumption 1. For each processor pair i ≠ k the inequality e_ik < 1/v_i holds.
Then λ_{l_j}(L(j → k)) = λ_{l_j}(L) − w_j(1/v_{l_j} − e_{l_j k}) < λ_{l_j}(L) and λ_k(L(j → k)) = λ_k(L) + w_j(1/v_k − e_{k l_j}) > λ_k(L).
Second, it is natural to assume that the load of a processor i contributes to its delay λ_i(L) more than to the delays of the other processors. When a job w_j chooses some processor i, it increases its delay by the value w_j/v_i, which is more than the value e_ki w_j added to the delays of the other processors k. It means the following.
Assumption 2. For each processor pair i ≠ k it holds that e_ki < 1/v_i.
Theorem 1. Under Assumptions 1 and 2, the game Γ with two processors admits a pure strategy Nash equilibrium.
Proof. Consider a profile L0 whose vector of processor delays λ0, sorted in non-increasing order, is a lexicographic minimum over all profiles. Suppose that L0 is not an equilibrium; then some job j benefits from migrating from its processor l_j to the other processor k, that is, λ_{l_j}(L0) > λ_k(L0(j → k)) >
λ_k(L0), that is, the processor l_j is the most delayed. Also, moving the job j makes the delays of both processors less than λ_{l_j}(L0), since λ_{l_j}(L0(j → k)) = λ_{l_j}(L0) − w_j(1/v_{l_j} − e_{l_j k}) < λ_{l_j}(L0) and λ_k(L0(j → k)) < λ_{l_j}(L0). That is, λ0 is not a lexicographic minimum, a contradiction.
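The delay functions and the equilibrium condition of Definition 1 are easy to check numerically. The following is a minimal sketch for small instances; the example data at the end are illustrative and satisfy Assumptions 1 and 2.

    def delays(loads, v, e):
        # loads[i]: total job volume on processor i; e[i][k]: externality of k's load on i
        n = len(v)
        return [loads[i] / v[i] + sum(e[i][k] * loads[k] for k in range(n) if k != i)
                for i in range(n)]

    def is_nash(profile, w, v, e):
        n = len(v)
        loads = [sum(wj for wj, lj in zip(w, profile) if lj == i) for i in range(n)]
        lam = delays(loads, v, e)
        for j, lj in enumerate(profile):
            for i in range(n):
                if i == lj:
                    continue
                # delays after job j migrates from processor lj to processor i
                new_loads = loads[:]
                new_loads[lj] -= w[j]
                new_loads[i] += w[j]
                if lam[lj] > delays(new_loads, v, e)[i]:
                    return False     # player j benefits from the deviation
        return True

    # illustrative data: two processors with speeds 1 and s = 2 and small externalities
    v, e = [1.0, 2.0], [[0.0, 0.1], [0.05, 0.0]]
    w = [1.0, 1.5, 0.5]
    print(is_nash([1, 0, 1], w, v, e))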
A set of related papers [2,3,5,20,21] presents exact values and estimations of the Price of Anarchy for KP-model based covering games without externalities. All of them show that the Price of Anarchy is infinite if s ≥ 2. In particular, in the case of two uniformly related processors with speeds 1 ≤ s, [20] shows that the POA value equals

POA_0(s) = (2 + s)/((1 + s)(2 − s))   for 1 ≤ s ≤ √2,
POA_0(s) = 2/(s(2 − s))               for √2 < s < 2,    (2)
POA_0(s) = ∞                          for 2 ≤ s.

We try to estimate the POA value for the presented model with externalities and compare it with (2). Recall that the processor speeds are v_1 = 1, v_2 = s ≥ 1.
We also denote η(s) = 1 + s − s(e_12 + e_21) and ζ(s) = 1 − s e_12 e_21 for convenience and to make the entries shorter. Here we present the following estimations, which are satisfied in our model; we use them in the further analysis.

Lemma 1. The optimal system payoff value has the following upper bound:

OPT ≤ W ζ(s)/η(s).    (3)

Proof. Consider the situation where jobs are arbitrarily splittable. Then they could be shared between the two processors so that their delays are the same: λ = (W − Δ)/v_1 + e_12 Δ = Δ/v_2 + e_21(W − Δ), where Δ is the part of the total job volume W processed on the processor 2. From this equation we obtain Δ = W(v_2 − e_21 s)/η(s) and λ = W ζ(s)/η(s). The optimal system payoff OPT in our model with unsplittable jobs is not greater than λ in the described situation.
the total volume of jobs on the processor k equals to W − a − w . Lemma 2. Let L be an arbitrary NE and λi (L) ≥ λk (L). Then if the processor i executes more than one job and the total volume of jobs on i equals to w + a, where w is a minimal volume job on the processor i, then a≤
W (vi − seik ) − w (vk − seik ) . η(s)
(4)
Proof. Since L is the NE, we have λi (L) = wv+a +eik (W −a−w ) ≤ i It immediately implies the inequality for a.
W −a vk +eki a.
Lemma 3. Let L be an arbitrary NE and λ_i(L) ≥ λ_k(L). If the processor i executes more than one job and the total volume of jobs on i equals w' + a, where w' is a minimal volume job on the processor i, then the system payoff SC(L) is at least

W ζ(s)/η(s) − w'(1 − v_k e_ki)(η(s) − v_k + s e_ik) / (v_k η(s)).    (5)

Proof. The system payoff equals SC(L) = λ_k(L) = (W − w' − a)/v_k + e_ki(w' + a) = W/v_k − (w' + a)(1/v_k − e_ki). Substituting here the estimation (4) for a, we obtain the estimation for SC(L).
Lemma 4. Let L be an arbitrary NE and λ_i(L) ≥ λ_k(L). If the processor i executes more than one job, then the system payoff SC(L) is at least

W (s e_ki − v_i + 2 v_k ζ(s)) / (v_k (η(s) + v_k − s e_ik)).    (6)
Proof. We use the system payoff estimation (5) and the estimation (4) for a. Since the processor i executes more than one job and w' is minimal, we have w' ≤ a ≤ (W(v_i − s e_ik) − w'(v_k − s e_ik))/η(s), which implies w' ≤ W(v_i − s e_ik)/(η(s) + v_k − s e_ik). Then SC(L) ≥ W(s e_ki − v_i + 2 v_k ζ(s))/(v_k(η(s) + v_k − s e_ik)).
It is clear that Lemma 4 is applicable only if s e_ki − v_i + 2 v_k ζ(s) > 0.

Lemma 5. Let L be an arbitrary NE which is not an optimal profile, λ_i(L) ≥ λ_k(L), and let w' be a minimal volume job on the processor i. Then λ_i(L_OPT) ≤ (W − w')/v_i + e_ik w'.

Proof. Let a be the total volume of all jobs excluding the job of volume w' on the processor i. Suppose first that a > 0; then a ≥ w'. Then in the profile L the load of the processor i equals w' + a. In the optimal profile L_OPT the load of i cannot exceed w' + a, because otherwise λ_i(L_OPT) > λ_i(L) > λ_k(L) > λ_k(L_OPT), which contradicts the optimality of L_OPT. If in the optimal profile the load of i is equal to w' + a, then the optimal payoff OPT coincides with the equilibrium payoff SC(L). Therefore the optimal load on the processor i is strictly less than w' + a, which means that it can be equal either to a + Δ_1 < a + w', or to w' + Δ_2 < a + w', or to at most W − a − w'. Here Δ_1 + Δ_2 = W − a − w', Δ_1, Δ_2 ≥ 0. Note that a + Δ_1 = W − w' − Δ_2 ≤ W − w', w' + Δ_2 = W − a − Δ_1 ≤ W − a ≤ W − w', and W − a − w' < W − w'. So, the optimal load of the processor i does not exceed W − w'. Consider now the case where a = 0. Then in the optimal profile L_OPT the load of i cannot exceed w', because otherwise λ_i(L_OPT) > λ_i(L) > λ_k(L) > λ_k(L_OPT), which contradicts the optimality of L_OPT. If in the optimal profile the load of i equals w', then the optimal payoff OPT coincides with the equilibrium payoff SC(L). Then the optimal load of the processor i is strictly less than w', which means that the job w' is executed on the processor k in the optimal profile, and therefore the load of the processor i does not exceed W − w'.

Lemma 6. Let L be an arbitrary NE which is not an optimal profile, λ_i(L) ≥ λ_k(L), and let w' be a minimal volume job on the processor i. Then OPT ≤ (W − w')/v_i + e_ik w'.

Proof. We use Lemma 5. If λ_i(L_OPT) ≤ λ_k(L_OPT), then OPT = λ_i(L_OPT) ≤ (W − w')/v_i + e_ik w'. Otherwise OPT = λ_k(L_OPT) ≤ λ_i(L_OPT) ≤ (W − w')/v_i + e_ik w'.
We use the following denotations to make the entries shorter:
f(s) = s(1 − e_12)^2 − ζ(s)(1 − s e_21),
h(s) = s e_12 − s + 2ζ(s),
g(s) = (1 − s e_12)(1 + η(s) − s e_21) − s(1 − e_21 η(s)),
q(s) = s^2(1 − e_12)(1 − e_21) − η(s)(1 − s e_12),
est1(s) = ζ(s)(η(s) + 1 − s e_21) / (η(s)(s e_12 − s + 2ζ(s))),
est2(s) = (s − 1 + s e_21(1 − s e_12)) / (s^2 e_12(1 − e_12)),
η(s)+1−s+s2 e21 (1−e21 ) , s(2ζ(s)−s+se12 ) η(s)ζ(s) η(s)2 −s(1−e12 )2 (1−se12 )−sη(s)(1−e21 )(1−e12 ) , s2 ζ(s)(1−e12 ) 12 )(s−1+se12 ) est6 (s) = s(1−e η(s)(s−1+se21 (1−se12 )) , s−1+se21 (1−se12 ) , sζ(s) 1−se12 +s2 e21 (1−e21 ) .
Suppose that A is a set of some functions αi (·) and define a function α(·) which is a combination of min and max operators applied to subsets {αi (·) ∈ A} ⊆ A. We say that the component α∗ (s) ∈ A is active for given s if α∗ (s) ≡ α(s). The domain of s where this identity is true we call an active area for the function α∗ (s). Theorem 2. For the system S with two processors the POA value is at most est(s) = max{upper1 (s), upper2 (s)},
(7)
where the functions min{est1 (s), max{min{est2 (s), est3 (s)}, est4 (s)}} if h(s) > 0, upper1 (s) = otherwise . max{est2 (s), est4 (s)} (8) and upper2 (s) = min{est5 (s), max{est6 (s), est7 (s)}}. (9) Proof. I. Consider first an arbitrary game with an arbitrary NE L where λ2 (L) ≥ λ1 (L). Let w be a minimal volume job on the processor 2, a is the total volume of remaining jobs on the processor 2. Then the total volume of jobs on the processor 1 equals to W − a − w and the system payoff is SC(L) = λ1 (L) = W − a − w + e12 (w + a).
(10)
Suppose first that only one job is executed on the processor 2 in the NE L, that is a = 0. We show that it implies that L is an optimal profile. Suppose the opposite. Then two cases are possible. The first case is when the job with volume w remains on the processor 2 in the optimal profile LOP T . But it yields that OP T = λ1 (LOP T ) ≤ λ1 (L), and so the profile LOP T is not optimal. If in LOP T the job w moves to the processor 1, then the processor 2 obtains the load + e21 w ≤ W − w + e12 w = λ1 (L) and at most W − w . Then OP T ≤ W −w s OP T is not optimal. L Suppose now that the processor 2 executes more than one job in NE L, that OP T ≤ est1 (s), which is a ≥ w . From lemmas 1 and 4 we obtain the estimation SC(L) is applicable when h(s) ≥ 0. To obtain the further estimations we need more precise estimation of OP T .
Maximizing the Minimum Processor Load with Linear Externalities
155
12 )−SC(L) 1. Let λ2 (LOP T ) ≤ λ1 (LOP T ). By (10) we get a = W −w (1−e . Substi1−e12 tute this expression for a to the inequality (4) and expressing w we obtain −η(s)SC(L) . From Lemma 6 we have OP T ≤ W −w + e21 w . Subw ≤ ζ(s)W s(1−e12 )2 s
21 )+W f (s) . stitute here the estimation for w and obtain OP T ≤ SC(L)η(s)(1−se s2 (1−e12 )2 The two further estimations can be applied only if f (s) ≥ 0. If it holds, we can obtain the following estimations. a) The system payoff is SC(L) = W −w −a+e12 (w +a) = W −(w +a)(1−e12 ) ≥ OP T ≤ est2 (s). e12 W , then SC(L) OP T b) By the estimation (6) we obtain the estimation SC(L) ≤ est3 (s), which also needs h(s) > 0. 2. Now consider the case when in the optimal profile we have λ1 (LOP T ) ≤ λ2 (LOP T ), then by Lemma 5 we get w + e12 (W − w ) ≤ λ1 (LOP T ) ≤ 21 ) + e21 w . We can express w ≤ W (1−se and subλ2 (LOP T ) ≤ W −w s η(s) stitute it to the system payoff estimation (5). Then the system payoff 2 2 12 )−sη(s)(1−e21 )(1−e12 )) . By (3) for OP T we SC(L) ≥ W ((η(s) −s(1−e12 ) (1−se η(s)2 OP T obtain SC(L) ≤ est4 (s). The denominator of est4 (s) equals to (1 − se21 )(1 − se12 ) + e12 (η(s)(s − 1) + s(1 − se21 )(1 − e21 ) + η(s)(1 − se12 )) > 0.
OP T ≤ upper1 (s). Joining the obtained estimations we get the upper bound SC(L) We show now that the components of the obtained upper bound are defined and feasible in their active areas. We verify domains and active areas for the obtained estimations. The estimations est1 (s) and est2 (s) are defined if h(s) > 0. The estimations est2 (s) and est3 (s) are applicable if f (s) ≥ 0. Active areas for the estimations depend on the following inequalities. If h(s) > 0, then
est1(s) ≥ est3(s) is equivalent to g(s) ≤ 0,    (11)
est1(s) ≥ est4(s) is equivalent to g(s) ≤ 0,    (12)
est2(s) ≥ est3(s) is equivalent to (s − 2 + s e_12) f(s)/h(s) ≥ 0,    (13)
est3(s) ≥ est4(s) is equivalent to f(s) g(s) ≤ 0,    (14)
est2(s) ≥ est4(s) is equivalent to f(s) ≥ 0.    (15)
Also without paying attention to the value of h(s). Suppose first that h(s) > 0. The estimation est∗ (s) is active for given s if est∗ (s) ≡ upper1 (s). The domain of s where this identity is true is an active area for the estimation est∗ (s). This means for est1 (s) that est1 (s) ≤ max{min{est2 (s), est3 (s)}, est4 (s)}. This inequality can be satisfied in four cases: a) est1 (s) ≤ est2 (s) and est4 (s) ≤ est2 (s) and est3 (s) ≥ est2 (s). From the first and third inequalities we have est1 (s) ≤ est3 (s), which with (11) yields g(s) ≥ 0. The second inequality and (15) imply f (s) ≥ 0. The second and
156
J. V. Chirkova
third inequalities imply est4 (s) ≤ est3 (s), which with (14) yields f (s)g(s) ≤ 0, and with f (s) ≥ 0 we obtain g(s) ≤ 0. So, in our conditions this case is possible only if inequalities are equalities. b) est1 (s) ≤ est3 (s) and est4 (s) ≤ est3 (s) and est3 (s) ≤ est2 (s). From the first inequality with (11) we have g(s) ≥ 0. This together with the second inequality and (14) implies f (s) ≤ 0. The second and third inequalities provide est4 (s) ≤ est2 (s) which with (15) yields f (s) ≥ 0. So, this case is possible only with equalities. c) est1 (s) ≤ est4 (s) and est4 (s) ≥ est2 (s). It is possible if g(s) ≥ 0 and f (s) ≤ 0 which follows from (12) and (15). d) est1 (s) ≤ est4 (s) and est4 (s) ≥ est3 (s). It is possible if g(s) ≥ 0 and f (s) ≥ 0 which follows from (12) and (14). Similarly, the estimation est2 (s) is active when est2 (s) ≤ est3 (s) and est2 (s) ≥ est4 (s) and est2 (s) ≤ est1 (s). The second and third inequalities yield est4 (s) ≤ est1 (s), which with (refe14) implies g(s) ≤ 0. The second inequality with (15) yields f (s) ≥ 0, which provides an applicability for the estimation est2 (s). This and the first inequality with (13) imply s − 2 + se12 ≥ 0. The estimation est3 (s) is active when est3 (s) ≤ est2 (s) and est3 (s) ≥ est4 (s) and est3 (s) ≤ est1 (s). The third inequality with (13) provides g(s) ≤ 0. This and the second inequality with (14) imply f (s) ≥ 0, which provides an applicability for the estimation est3 (s). This and the first inequality with (13) imply s − 2 + se12 ≤ 0. The estimation est4 (s) can be active in two cases: a) est4 (s) ≤ est1 (s) and est4 (s) ≤ est2 (s). The first inequality with (12) yields g(s) ≤ 0, from the second inequality with (15) we have f (s) ≤ 0. b) est4 (s) ≤ est1 (s) and est4 (s) ≤ est3 (s). The first inequality with (12) implies g(s) ≤ 0, this together with the second inequality with (14) yields f (s) ≤ 0. Note that in both cases a) and b) the inequality f (s) ≤ 0 means that the estimation est4 (s) is active in domains where the estimations est2 (s) and est3 (s) are not applicable. Consider the case where h(s) < 0, that is the estimations est1 (s) and est3 (s) are not defined. The estimation est2 (s) is active, when est2 (s) ≥ est4 (s), that is f (s) ≥ 0 since (15). Otherwise the estimation est4 (s) is active. So, an active upper estimation cases can be clearly presented in the Table 1. Table 1. Active upper estimation cases if λ2 (L) ≥ λ1 (L) in the NE L. h(s) g(s)
f (s)
>0
>0
Any value Any value
s − 2 + se12 Active estimation est1 (s)
>0
≤0
>0
>0
est2 (s)
>0
≤0
>0
≤0
est3 (s)
>0
≤0
≤0
Any value
est4 (s)
≤0
Any value >0
Any value
est2 (s)
≤0
Any value ≤0
Any value
est4 (s)
Maximizing the Minimum Processor Load with Linear Externalities
157
II. Consider now an arbitrary game with an arbitrary NE L where λ1 (L) ≥ λ2 (L). Let w be a minimal volume job on the processor 1, a is the total volume of remaining jobs on the processor 1. Then the total volume of jobs on the processor 2 equals to W − a − w and the system payoff is SC(L) = λ2 (L) =
W − a − w + e21 (w + a). s
(16)
1. Suppose first that a = 0, that is the processor 1 executes only one job with the volume w . Then the delay on the processor 1 equals to λ1 (L) = w + W (1−se12 ) e12 (W − w ) ≤ W s = λ2 (L). Expressing w we obtain w ≤ s(1−e12 ) and, so, SC(L) ≥ W
s − 1 + se21 (1 − se12 ) . s2 (1 − e12 )
(17)
OP T With the estimation (3) for OP T we get SC(L) ≤ est5 (s). Now we need more precise estimation for OP T to obtain the further estimations. a) Let first λ1 (LOP T ) ≤ λ2 (LOP T ). By lemma 6 OP T ≤ W − w + e12 w . −sSC(L) )+sSC(L)(1−e12 ) . Then OP T ≤ W (e12 −se211−se . If From (16) we have w = W 1−se 21 21 OP T e12 − se21 ≥ 0, then we can use (17) and SC(L) ≤ est6 (s). b) Next consider the case where in the optimal profile λ2 (LOP T ) ≤ λ1 (LOP T ), which with lemma 5 implies ws + e21 (W − w ) ≤ λ2 (LOP T ) ≤ λ1 (LOP T ) ≤ 21 ) W − w + e12 w . Express w ≤ sW (1−e and substitute this to (16). Then η(s) 2
e21 (1−e21 )) . Using the estimation (3) for OP T we obtain SC(L) ≥ W (1−se12 +s sη(s) OP T ≤ est (s). 7 SC(L) OP T ≤ 2. Let now a > 0. From the estimations (6) and (3) we obtain SC(L) sζ(s)(η(s)+s−se12 ) η(s)(se21 −1+2sζ(s))
≤ est5 (s). Note that se21 − 1 + 2sζ(s) = s − 1 + sζ(s) + se21 (1 − se12 ) > 0, therefore the estimation (6) is applied without restrictions. OP T ≤ Joining the obtained estimations we get the upper estimation SC(L) upper2 (s). We show now that the components of the obtained upper bound are defined and feasible in their active areas. We verify domains and active areas for the obtained estimations. The estimation est6 (s) is applicable when e12 − se21 ≥ 0. An active areas of the estimations depend on the following inequalities. est5 (s) ≥ est6 (s) is equivalent to q(s) ≤ 0,
(18)
est5 (s) ≥ est7 (s) is equivalent to q(s) ≤ 0,
(19)
est6 (s) ≥ est7 (s) is equivalent to (e12 − se21 )q(s) ≥ 0,
(20)
We say that the estimation est∗ (s) for given s is active if est∗ (s) ≡ upper2 (s). This means for est5 (s) that est5 (s) ≤ est6 (s) or est5 (s) ≤ est7 (s), which implies with (18) and (19) that q(s) ≥ 0.
158
J. V. Chirkova
The estimation est6 (s) is active when est6 (s) ≥ est7 (s) and est5 (s) ≥ est6 (s). From the second inequality with (18) we have q(s) ≤ 0, which together with (20) implies e1 2 − se21 ≥ 0, which provides an applicability of the estimation est6 (s). The estimation est7 (s) is active when est7 (s) ≥ est6 (s) and est5 (s) ≥ est7 (s). From the second inequality with (19) we have q(s) ≤ 0, which together with (20) implies e12 − se21 ≤ 0, that means that the estimation est7 (s) is active in the area where the estimation est6 (s) is not applicable. OP T So, if q(s) ≥ 0 then the ratio SC(L) ≤ est5 (s), otherwise when e12 − se21 ≥ 0 OP T OP T ≤ est7 (s).
the ratio SC(L) ≤ est6 (s), otherwise the ratio SC(L) Theorem 3. For the system S with two processors the POA value is at least est(s), defined by (7). Proof. To obtain this lower estimate for the price of anarchy, it suffices to give examples of player sets U yielding the ratios of the optimal and the equilibrium payoffs stated by the theorem and show that their properties are feasible and fall into active estimation domains. OP T = est1 (s) as a game with 4 jobs w1 = w2 = 1. Consider an example for SC(L) sη(s)(1 − e21 ), w3 = s(1 − e21 )(1 − se21 ) and w4 = g(s) ≥ 0 in the active area of est1 (s). In the NE L the jobs w1 and w2 are executed on the processor 2, and the jobs w3 and w4 on the processor 1, providing the system payoff to be equal to λ1 (L) = η(s)(2ζ(s) − s(1 − e12 )) > 0 when h(s) > 0. If the job w1 migrates to the processor 1, then λ1 (L(1 → 1)) = λ1 (L)+w1 (1−e12 ) = λ2 (L). In the optimal profile LOP T two jobs w1 w3 are executed on the processor 2, and two jobs w2 and w4 are on the processor 1. The delay on both processors equals to ζ(s)(1 + η(s) − se21 ). OP T 2. For SC(L) = est2 (s) we consider the game with 2 jobs w1 = s(s − 1) ≥ w2 = s(1 − se12 ) if s − 2 + se12 ≥ 0, which yields from the activity of est2 (s). In the NE L both jobs are executed on the processor 2, then the equilibrium system payoff is equal to λ1 (L) = s2 e12 (1 − e12 ). If the job w2 moves to the processor 1, then λ1 (L(2 → 1)) = λ1 (L) + w2 (1 − e12 ) = λ2 (L). However such new profile coincides with an optimal profile LOP T , where the job w1 is the only which remains on the processor 2. The optimal delay on the processor 2 is λ2 (LOP T ) = s − 1 + se21 (1 − se12 ) ≤ λ1 (LOP T ) = s(1 − e12 ). OP T 3. For SC(L) = est3 (s) we present the game with 3 jobs w1 = w2 = s2 (1 − e21 ), w3 = s(2 − s − se12 ) ≥ 0 in the active area of est3 (s). In the NE L the jobs w1 w2 are executed on the processor 2, and one job w3 is on the processor 1, providing the system payoff to be equal to λ1 (L) = s(2ζ(s) − s(1 − e12 )) > 0 if h(s) > 0, which is true in the active area of est3 (s). If the job w1 migrates to the processor 1, then λ1 (L(1 → 1)) = λ1 (L) + w1 (1 − e12 ) = λ2 (L). In the optimal profile LOP T two jobs w1 and w3 are executed on the processor 2, and one job w2 is on the processor 1. The processor 2 has the smallest delay which is equal to λ2 (LOP T ) = η(s) + 1 − s + s2 e21 (1 − e21 ) ≤ λ1 (LOP T ) = s2 (1 − e21 )(1 + e12 ) + se12 (2 − s − se12 ) if g(s) ≤ 0, which is true in the active area of est3 (s).
4. An example for OPT/SC(L) = est4(s) is the game with 3 jobs w1 = η(s)(s − 1) + s(1 − se21)(1 − e21) ≥ w2 = 1 − se12 if g(s) ≤ 0, which holds in the active area of est4(s), and w3 = (1 − se12)(1 − se21). In the NE L two jobs w1 and w2 are executed on processor 2, and the job w3 is on processor 1. The system payoff is λ1(L) = η(s)² − s(1 − e12)²(1 − se12) − sη(s)(1 − e21)(1 − e12). If the job w2 moves to processor 1, then λ1(L(2 → 1)) = λ1(L) + w2(1 − e12) = λ2(L). In the optimal profile the jobs w1 and w3 are executed on processor 2, and the job w2 is on processor 1. Both processors have the same delay, which equals η(s)ζ(s).

5. An example for OPT/SC(L) = est5(s) is the game with 3 jobs w1 = sη(s)(1 − se12), w2 = s²(1 − e12)(1 − se12), and w3 = sq(s) ≥ 0 in the active area of est5(s). In the NE L the job w1 is executed on processor 1, and two jobs w2 and w3 are on processor 2. The system payoff is λ2(L) = η(s)(s − 1 + se21(1 − se12)). If the job w1 moves to processor 2, then λ2(L(1 → 2)) = λ2(L) + w1(1/s − e21) = λ1(L). In the optimal profile L^OPT two jobs w1 and w3 are executed on processor 2, and w2 is on processor 1. Both processors have the same delay, which is equal to s²ζ(s)(1 − e12).

6. An example for OPT/SC(L) = est6(s) is the game with 2 jobs w1 = s(1 − se12) and w2 = s(s − 1). In the NE L the job w1 is executed on processor 1, and the job w2 is on processor 2. The system payoff is λ2(L) = s − 1 + se21(1 − se12). If the job w1 moves to processor 2, then λ2(L(1 → 2)) = λ2(L) + w1(1/s − e21) = λ1(L). In the optimal profile L^OPT both jobs change their positions: the job w1 moves to processor 2 and the job w2 moves to processor 1. The system payoff equals λ1(L^OPT) = s(s − 1 + e12(1 − se12)) ≤ λ2(L^OPT) = 1 − se12 + se21(s − 1) when q(s) ≤ 0, which holds in the active area of est6(s).

7. An example for OPT/SC(L) = est7(s) is the game with 2 jobs w1 = s²(1 − e21) and w2 = s(1 − se12). In the NE L the job w1 is executed on processor 1, and the job w2 is on processor 2. The system payoff is λ2(L) = 1 − se12 + s²e21(1 − e21). If the job w1 migrates to processor 2, then λ2(L(1 → 2)) = λ2(L) + w1(1/s − e21) = η(s) ≥ λ1(L) = s(s − se21 + e12(1 − se12)) if q(s) ≤ 0, which is true in the active area of est7(s). In the optimal profile L^OPT the job w1 is executed on processor 2, and the job w2 is on processor 1. Both processors have the same delay sζ(s).

Theorems 2 and 3 imply that the lower and the upper bounds for the POA value for a system S with two processors coincide. Therefore the obtained estimation is the exact value of the POA.

Theorem 4. For the system S with two processors the POA value equals exactly est(s), defined by (7).

In the model without externalities, the POA value is infinite even in the case of two processors [5] when s ≥ 2. As we can see, adding externalities close to zero solves this problem. Theorem 4 implies that, for any s ≥ 1, the system can set externalities e12 < 1/s, e21 < 1/s and e12 ≥ e21 such that the value of the POA equals est(s) < ∞.
4
Numerical Examples
Figure 1 presents examples of evaluated POA values in the model with two processors for different values of s. Figure 1a shows the POA for the externality values e12 = 0.1 and e21 = 0.01. Figure 1b presents the case when the externalities are e12 = 0.3 and e21 = 0.1. The curves PoA(s) (solid lines) are given for the model with externalities, and the curves PoA0(s) (dashed lines) show the POA values for the initial game without externalities. One can see that in the initial game the POA increases rapidly, in contrast to the game with externalities.
Fig. 1. The POA for the system S with a) e12 = 0.1, e21 = 0.01, b) e12 = 0.3, e21 = 0.1.
5
Conclusion
This paper has explored the game of maximizing the minimum processor delay with uniformly related processors and linear externalities. We defined the assumptions providing an adequate system behaviour. We proved that a pure NE does not necessarily exist in the general case with n processors. It seems that some additional assumptions are required for its existence. Exploring such conditions and estimating the POA for the general case are possible directions for future work. For the case of two processors in this model, we proved the existence of a Nash equilibrium. We also obtained an analytical expression for the upper bound of the POA, which is also the lower bound and, hence, expresses the POA value exactly. We showed that the Price of Anarchy is bounded, in contrast to the initial KP cover model without externalities. The numerical results visually demonstrate the dependence of the POA value on s and the externalities.
References 1. Braess, D.: Uber ein Paradoxon der Verkehrsplanung. Unternehmensforschung 12, 258–268 (1968) 2. Chen, X., Epstein, L., Kleiman, E., et al.: Maximizing the minimum load: the cost of selfishness. Theor. Comput. Sci. 482, 9–19 (2013). https://doi.org/10.1016/j. tcs.2013.02.033
3. Chirkova, Yu.V.: Price of anarchy for maximizing the minimum machine load. Adv. Syst. Sci. Appl. 17(4), 61–77 (2017). https://doi.org/10.25728/assa.2017.17.4.518 4. Easley, D., Kleinberg J.: Networks, Crowds, and Markets: Reasoning about Highly Connected World. Cambridge University Press, Cambridge (2010). https://doi. org/10.1017/CBO9780511761942 5. Epstein, L., Kleiman, E., van Stee, R.: Maximizing the minimum load: the cost of selfishness. In: Leonardi, S. (ed.) WINE 2009. LNCS, vol. 5929, pp. 232–243. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10841-9 22 6. Even-Dar, E., Kesselman, A., Mansour, Y.: Convergence time to nash equilibria. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 502–513. Springer, Heidelberg (2003). https://doi.org/10. 1007/3-540-45061-0 41 7. Fotakis, D., Kontogiannis, S., Koutsoupias, E., Mavronicolas, M., Spirakis, P.: The structure and complexity of nash equilibria for a selfish routing game. In: Widmayer, P., Eidenbenz, S., Triguero, F., Morales, R., Conejo, R., Hennessy, M. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 123–134. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45465-9 12 8. Gao, H., Mazalov, V.V., Xue, J.: Optimal parameters of service in a public transportation market with pricing. J. Adv. Transp. (2020). 2020 Safety, Behavior, and Sustainability under the Mixed Traffic Flow Environment. https://doi.org/ 10.1155/2020/6326953 9. Holzman, R., Monderer, D.: Strong equilibrium in network congestion games: increasing versus decreasing costs. Int. J. Game Theory 44(3), 647–666 (2014). https://doi.org/10.1007/s00182-014-0448-4 10. Jacobs, J.: The Economy of Cities. Random House, New York (1969) 11. Korilis, Y., Lazar, A., Orda, A.: Avoiding the Braess paradox in non-cooperative networks. J. Appl. Probab. 36(1), 211–222 (1999). https://doi.org/10.1239/jap/ 1032374242 12. Koutsoupias, E., Papadimitriou, C.: Worst-case equilibria. In: Meinel, C., Tison, S. (eds.) STACS 1999. LNCS, vol. 1563, pp. 404–413. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49116-3 38 13. Kuang, Z., Lian, Z., Lien, J.W., Zheng, J.: Serial and parallel duopoly competition in multi-segment transportation routes. Transp. Res. Part E Logistics Transp. Rev. 133(6), 101821 (2020). https://doi.org/10.1016/j.tre.2019.101821 14. Kuang, Z., Mazalov, V.V., Tang, X., Zheng, J.: Transportation network with externalities. J. Comput. Appl. Math. 382, 113091 (2021). https://doi.org/10.1016/j. cam.2020.113091 15. L¨ ucking, T., Mavronicolas, M., Monien, B., Rode, M., Spirakis, P., Vrto, I.: Which is the worst-case nash equilibrium? In: Rovan, B., Vojt´ aˇs, P. (eds.) MFCS 2003. LNCS, vol. 2747, pp. 551–561. Springer, Heidelberg (2003). https://doi.org/10. 1007/978-3-540-45138-9 49 16. Mak, V., Seale, D.A., Gishces, E.J., et al.: The Braess Paradox and coordination failure in directed networks with mixed externalities. Prod. Oper. Manag. 27(4), 717–733 (2018). https://doi.org/10.1111/poms.12827 17. Mazalov, V., Chirkova, J.: Networking Games. Network Forming Games and Games on Networks. Academic Press (2019). https://doi.org/10.1016/C2017-004296-9 18. Milchtaich, I.: Network topology and the efficiency of equilibrium. Games Econ. Behav. 57(2), 321–346 (2006). https://doi.org/10.1016/j.geb.2005.09.005 ´ How bad is selfish routing? J. ACM 49(2), 236–259 19. Roughgarden, T., Tardos, E.: (2002). https://doi.org/10.1145/506147.506153
20. Tan, Z., Wan, L., Zhang, Q., et al.: Inefficiency of equilibria for the machine covering game on uniform machines. Acta Inform. 49(6), 361–379 (2012). https://doi. org/10.1007/s00236-012-0163-1 21. Wu, Y., Cheng, T.C.E., Ji, M.: Inefficiency of the Nash equilibrium for selfish machine covering on two hierarchical uniform machines. Inf. Process. Lett. 115(11), 838–844 (2015). https://doi.org/10.1016/j.ipl.2015.06.005
Analysis of Optimal Solutions to the Problem of a Single Machine with Preemption K. A. Chernykh(B) and V. V. Servakh(B) Sobolev Institute of Mathematics, Novosibirsk, Russia
Abstract. This is an analysis of the problem of minimizing the total weighted completion time for jobs of the same processing time on a single machine, given the arrival times of the jobs and the possibility of preemption. At present, the computational complexity of this problem is unknown. Based on the newly revealed properties of the problem, an algorithm has been developed for constructing a finite subset of solutions that contains an optimal schedule. Having conducted a parametric analysis of the schedules in this subset, we identified a subclass of optimal schedules for certain weight values. We came up with an algorithm for preprocessing the input data, which makes it possible to reduce the problem to a narrower class of examples. We also propose an approach to solving the special case of the problem with job durations equal to two.
Keywords: Scheduling theory · Single machine · Release dates · Preemption · Identical processing times
1
The Statement of the Problem
The task is to perform n jobs on a single machine. We assume that for each job i = 1, 2, . . . , n, its release date ri ∈ Z+, processing time pi ∈ Z+, and weight ωi ∈ R+ are given. A machine can perform only one job at a time. The objective is to find a job schedule which minimizes the total weighted job completion time Σ_{i=1}^{n} ωi Ci, where Ci denotes the completion time of job i. In common notation [1], this problem is denoted as 1|ri| Σωi Ci. The problem is NP-hard in the strong sense [2]. The preemptive version of the problem (1|ri, pmtn| Σωi Ci) is also strongly NP-hard [3]. In [4], Baptiste proves the polynomial solvability of the problem 1|ri, pi = p| Σωi Ci with equal job processing times. The question of the computational complexity of the similar problem with preemptions, 1|ri, pi = p, pmtn| Σωi Ci, remains open. Single machine problems are simple in formulation, but they are mathematically interesting as combinatorial objects. In [5], some of the key properties of the problems with preemption were obtained.
The study was carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0019).
Similar problems were considered in
[6–8]. In [9,10], the authors use the fact that, to find the optimum, it is enough to analyze schedules where a job is interrupted only at the arrival time of a job with a larger weight. The problem with job release dates ri has been studied in detail in [11]. In [12,13], we propose a parametric approach to analyzing the combinatorial structure of possible optimal solutions and identifying some interesting polynomially solvable subcases of the problem 1|ri, pi = p, pmtn| Σωi Ci, in particular when ri ∈ [0, p] or ri = ri−1 + p/2 for all i. See also studies [14,15] concerned with researching the linear integer programming model for this problem. The paper is organized as follows. The second section describes the known simple properties of optimal solutions of problem 1|ri, pi = p| Σωi Ci. In the third section, we propose an algorithm for preprocessing the input data of the problem, which allows us to reduce the problem to instances of a simpler and more regular structure. The fourth section substantiates certain properties that allow us to consider a finite subset of solutions including the optimal schedule. The fifth section presents an algorithm for constructing all possible schedules of the selected subset. Section 6 proposes an approach to conducting a parametric analysis of the subset schedules. Such an analysis yields a subclass of schedules that are optimal for certain weight values. Section 7 proposes an approach to solving the special case of the problem with job durations equal to two.
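To fix notation, a preemptive schedule can be represented as a list of executed fragments; the following Python sketch (our own illustration, not part of the paper) computes the completion times Ci and the objective Σ ωi Ci from such a representation.

def total_weighted_completion_time(fragments, weights):
    """fragments -- list of (job, start, end) pieces of a preemptive schedule;
    weights   -- mapping from job to its weight.
    Returns (completion_times, objective value sum_i w_i * C_i)."""
    completion = {}
    for job, start, end in fragments:
        completion[job] = max(completion.get(job, 0), end)
    objective = sum(weights[job] * c for job, c in completion.items())
    return completion, objective

# Two jobs with p = 2: job 1 is preempted by job 2 at time 1 and resumed later.
fragments = [(1, 0, 1), (2, 1, 3), (1, 3, 4)]
weights = {1: 1.0, 2: 5.0}
print(total_weighted_completion_time(fragments, weights))
# -> ({1: 4, 2: 3}, 19.0)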
2
Known Properties
In this section, we list several known properties of problem 1|ri, pi = p| Σωi Ci, which can be found in [1].
Statement 1. We have an optimal schedule in which all job preemption times belong to the set {r1, r2, . . . , rn} and the total number of all job preemptions does not exceed n − 1.

By virtue of Statement 1, it is sufficient to consider only those schedules that satisfy the above property. All subsequent Statements should also begin with the words "We have an optimal schedule in which..." We will omit these words but keep them in mind. All the properties given below do not contradict each other, and their combined use allows us to construct a finite subset of all schedules containing the optimal solution. We will elaborate on this in the next sections.

Statement 2. Switching the machine to the performance of another job occurs only at a time ri or at the completion time Cj of a certain job.

What follows is a set of properties related to the order in which the jobs are performed. By Si we denote the start time of job i, i = 1, 2, . . . , n.

Statement 3. If ωi < ωj and ri ≥ rj, then in any optimal schedule job j is completed before job i, in other words, Cj ≤ Si.

Thus, under this condition, jobs i and j will not be competing in the schedule. The order of execution can be different only when a job with a larger weight arrives later. Let us call such jobs competitive. Note that both inequalities are strict,
because, if at least one equality is satisfied, the jobs will no longer be competitive. If ri = rj and ωi = ωj, then the jobs are identical. In such circumstances, the job with the higher number will be considered a higher priority. The non-competitive job pairs will be discussed in more detail below.

Statement 4. Suppose jobs i and j are competitive, that is, ri < rj and ωi < ωj. If Si ≥ rj in an optimal schedule, then Si ≥ Cj, i.e., job j is completed before job i.

Suppose pi ≤ p and pj ≤ p are the lengths of two continuous fragments completing jobs i and j, respectively, in the optimal schedule. If the fragment pj is executed immediately after pi and no jobs arrive in the interval (Cj − pj − pi, Cj), then pi/ωi ≤ pj/ωj. This is the rule for solving the problem 1|| Σωi Ci, in which the optimal permutation α = (α1, α2, ..., αn) satisfies the condition pα1/ωα1 ≤ pα2/ωα2 ≤ . . . ≤ pαn/ωαn.
3
Preprocessing Algorithm for Input Data
Let us describe some preprocessing procedures that allow reducing the original problem 1|ri, pi = p, pmtn| Σωi Ci to solving a series of problems with a simpler and more regular structure. Let us arrange the jobs in non-decreasing order of weights ω1 ≤ ω2 ≤ . . . ≤ ωn. Consider two non-competitive jobs i and j for which ωi ≤ ωj and ri ≥ rj. According to Statement 3, job j is completed before job i, so Cj ≤ Si. If the intervals [rj, rj + p] and [ri, ri + p] overlap, that is, rj ≤ ri < rj + p, then job i cannot start before rj + p. Therefore, we can reassign ri = rj + p. If the jobs are identical, the priority is given to the job with the higher number. By doing this for each pair of non-competitive jobs with overlapping intervals [rj, rj + p] and [ri, ri + p] (possibly repeatedly), we arrive at a point where all ri are different. Keep in mind that if ri = rj, then jobs i and j are non-competitive, and this procedure is applicable to them. As a consequence, it is enough to consider instances in which all values ri are different. The results of preprocessing are shown in Figs. 1 and 2. We assign a time axis to each job and place these axes one below the other, beginning with the job with the greatest weight and then going down in descending order. We mark the job arrival time with a bold line. Figure 1 shows an example of the input data. The jobs are shown in intervals [ri, ri + p]. Figure 2 shows the input data after preprocessing.
Fig. 1. Input data
Fig. 2. Input data after preprocessing
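The release-date reassignment described above is straightforward to implement. The following sketch (an illustration only, not the authors' code) repeatedly pushes the release date of the lighter job of every non-competitive overlapping pair to rj + p until all release dates are distinct; job indices are assumed to be sorted by non-decreasing weight, so a higher index means a higher priority.

def preprocess_release_dates(r, p):
    """Reassign release dates so that all r[i] are distinct.

    Jobs are indexed by non-decreasing weight (index 0 has the smallest weight).
    For every non-competitive pair (i lighter, j heavier) with r[i] >= r[j] and
    overlapping intervals [r[j], r[j]+p) and [r[i], r[i]+p), job i cannot start
    before r[j] + p, so we set r[i] = r[j] + p.  Repeated until stable."""
    r = list(r)
    n = len(r)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # j > i means job j has a weight no smaller than job i
                non_competitive = j > i and r[j] <= r[i]
                overlap = r[i] < r[j] + p
                if non_competitive and overlap:
                    r[i] = r[j] + p
                    changed = True
    return r

# Example: p = 2, release dates of jobs listed in order of non-decreasing weight
print(preprocess_release_dates([0, 0, 1, 2], p=2))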
Next, we consider the problem decomposition procedure. Keep in mind that the jobs are indexed by non-decreasing weights, and job 1 has the smallest weight. Let us select the set of jobs that arrive no later than job 1. Suppose there are k such jobs. According to Statement 3, all those jobs are completed before the start of job 1. Then we can assume that r1 = pk. We continue the procedure until job 1 cannot be shifted any further. If job 1 moves to the very end, then its position is already determined, and we can solve the problem with n − 1 jobs (without job 1). If it has moved to the moment pk, then its execution explicitly starts at the moment S1 = pk, since there are no other candidates. If such jobs existed, then job 1 would move further. In other words, for all other jobs, ri > pk. In this case, the problem can be decomposed into two subproblems. The first subproblem contains k jobs with moments ri ≤ p(k − 1). The second one includes all other jobs, including job 1. Moreover, if in the second subproblem all ri > pk + p, then job 1 is completed in the interval [pk, pk + p], and we can analyze the problem without it. Therefore, we will assume that for the next job, based on the order of arrival times, ri < pk + p, i.e. the intervals [r1, r1 + p] and [ri, ri + p] overlap. Note that in the first subproblem, which is solved in the interval [0, kp], it is also possible to use this approach with a shift of the lower-weight job unless it arrives first. Thus, the original problem is decomposed into smaller problems, each of which receives the job with the smallest weight first, and the next job arrives before the first one is completed. Developing the decomposition approach further, we can argue that the j-th job in the order of its arrival into the system must arrive no later than p(j − 1) − 1. Otherwise, the problem is decomposed again. An instance that cannot be simplified by the above procedures shall be called reduced. In the reduced instances, the optimal schedule has length pn.
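As a minimal illustration of the last condition, the following sketch (our own illustration with a hypothetical helper name) checks whether an instance with equal processing times p satisfies the necessary arrival condition of a reduced instance.

def is_reduced(release, p):
    """Necessary condition for a 'reduced' instance: the j-th job in the
    order of arrival (j = 2, 3, ...) must arrive no later than p*(j-1) - 1;
    otherwise the instance decomposes.  We also assume the time origin is
    shifted so that the first job arrives at time 0."""
    arrivals = sorted(release)
    if arrivals[0] != 0:
        return False
    for j, r in enumerate(arrivals[1:], start=2):
        if r > p * (j - 1) - 1:
            return False
    return True

print(is_reduced([0, 1, 3, 4], p=2))   # True
print(is_reduced([0, 1, 5, 6], p=2))   # False: the third job arrives too late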
4
Constructing a Finite Set of Schedules
We have a list of properties and constraints that arise when setting the priority for executing competitive jobs. According to Statement 2, the alternative for choosing the next job arises at the moments rj and Cj . Consider the possible alternatives.
Statement 5. Interval [Sj, Cj] cannot contain fragments of a job with a smaller weight.

If there is a job that interrupts a lighter one, it gets completed before the lighter job resumes its execution. This means that the job i with a smaller weight is either completed in its entirety before or after the start of the job j with a heavier weight, or it encompasses it. This fact is further studied in [4]. In the given problem, job 1 has the minimal weight, and the following statement is true.

Theorem 1. In the reduced instance, there exists an optimal schedule in which the completion time C1 of the job with the minimal weight is a multiple of p.

The proof is obvious, since, according to Statement 5, all the jobs that interrupted job 1 must be completed by the time C1.

Theorem 2. If in the optimal solution of the reduced instance C1 ≥ max_{j=1,2,...,n} rj,
then in the interval [C1, pn] all jobs are executed without interruption, and all fragments have processing time p.

The proof follows directly from Statements 1 and 5. Indeed, job 1 has the smallest weight. Any other job is either completed before C1 or not started.

Statement 6. Interruption of the current job i at the time rj is possible only when the incoming job has the largest weight among all the jobs unfinished and available for execution. Then job j starts at rj.

Thus, at the time rj, there is a choice between the current job i and job j. Besides, according to Statement 5, if the choice is in favor of job j, it will be completed before job i resumes. Now consider the variants of continuing the schedule at the completion time Cj of a job. Consider the possible scenarios:

1. All the jobs have been completed. The schedule has been constructed.
2. Not all the jobs have been completed, but at time Cj there are no unfinished jobs ready for execution. Then the machine will idle until the next job arrives. Such situations do not occur if we are dealing with a reduced instance.
3. There are only whole (unstarted) jobs available, but there are no incomplete jobs. Execution of the available job with the largest weight will start.
4. If there are unfinished jobs but no unstarted jobs available, the last interrupted job will be continued.
5. There are both interrupted and available jobs. Let i be the last interrupted job, and let j be an available job with the largest weight. If i > j, then we will continue executing job i.
6. Finally, there are both interrupted and available jobs, and i < j. Then there is a choice between i and j.

Let us formulate it as a Statement.

Statement 7. If there are alternatives available at the time Cj, either the last interrupted job will resume, or an available job with the maximum weight will start.
5
Algorithm for Constructing a Finite Set of Schedules
Let us describe the order of generating all schedules that satisfy the above properties. We enumerate the jobs in the lexicographically increasing order of vectors (ωi, ri). The algorithm considers the times ri and Ci in ascending order and branches if a choice of jobs is available. Keep in mind that at the time ri, according to Statement 5, one possible choice is between the current job and job j, and at the time Ci, according to Statement 7, the choice is between the last interrupted job and the heaviest available job. The order for ri is given by the permutation α so that rα1 < rα2 < . . . < rαn. Let us organize the enumeration. To do this, we form a stack T of jobs that are in progress, and a set D of jobs that are available for execution. Stack T is organized according to the standard rule "last in, first out". Initially, we assume that T = (α1) and D = ∅. Starting from the moment rα1, we execute job α1 until the time Cα1 or rα2 occurs. This is how we obtain a partial schedule. Let us assume that at a certain moment we have a set of partial schedules. In each of these schedules, some of the jobs have already been completed, some have been interrupted, and job k, the last in stack T, is in progress. We keep executing this job until the time Ck or the nearest rm comes. If Ck < rm, then we exclude job k from T. With rm < Ck, we add job m to D; with Ck = rm we do both. Further, in T we take the last element – let it be the number i, and in D the element with the largest weight – let it be j. If i > j, then, according to case 5, we continue job i. Otherwise, we branch. With one variant, we execute job i without changing T and D. With the other, we execute job j, which is removed from D and placed in the stack T. We have now obtained two new versions of partial schedules. If either D or T is empty, then there will be no branching, and the schedule continues until the next arrival time or the completion of a certain job. If both sets are empty, then the machine will be idle until the arrival of the next job. (This is impossible in a reduced instance.) If there is no such moment, then a variant of the schedule is constructed. The choice is made only at times rj, apart from the minimal time, and at times Cj except for the last two. Consequently, no more than 2^(2n−3) different schedules can be generated. This is, however, a rough estimate. In reality, the number of variants is significantly lower because branching occurs only when both the stack and the set are not empty at the same time.
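The branching rule at a decision point can be written compactly. The sketch below (an illustration, not the authors' implementation) returns the candidate jobs to execute next, given the stack T of interrupted or in-progress jobs and the set D of available unstarted jobs, with jobs identified by their weight-ordered indices.

def next_job_candidates(T, D):
    """Candidates for the next job at a completion time.

    T -- stack (list) of interrupted / in-progress jobs, last element on top;
    D -- set of available, not yet started jobs.
    Jobs are numbered in non-decreasing order of weight, so a larger index
    means a larger weight.  Returns the list of branches to explore."""
    if not T and not D:
        return []                    # nothing to do: wait for the next arrival
    if T and not D:
        return [T[-1]]               # continue the last interrupted job
    if D and not T:
        return [max(D)]              # start the heaviest available job
    i, j = T[-1], max(D)
    if i > j:
        return [i]                   # the interrupted job is heavier: no branching
    return [i, j]                    # branch: resume i or start the heavier job j

print(next_job_candidates(T=[2, 5], D={3, 7}))   # [5, 7] -> two branches
print(next_job_candidates(T=[2, 6], D={3, 4}))   # [6]    -> no branching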
6
Parametric Analysis
Let us describe the way to identify those schedules that can become the optimal solutions to the problem. It is important to define the properties that allow the formation of such schedules and to understand why certain schedules cannot be the optimal. For this purpose, we conduct a parametric study of specific examples. Let us fix the values p and ri , i = 1, 2, . . . , n and the linear order for a
set of jobs in lexicographically increasing order of vectors (ωi, ri). The coefficients ωi, i = 1, 2, . . . , n, will be seen as parameters with the corresponding condition ω1 ≤ ω2 ≤ . . . ≤ ωn. Using the algorithm described above, we construct the finite set of schedules. Suppose the set of all constructed solutions contains Q schedules, and let C_i^q be the completion time of job i in schedule q = 1, 2, . . . , Q. These moments do not depend on any specific values of ωi, but only on the order in which they occur. For schedule q0 to be optimal, the following system of inequalities should be compatible:

Σ_{i=1}^{n} C_i^{q0} ωi ≤ Σ_{i=1}^{n} C_i^{q} ωi,  q = 1, 2, . . . , Q;  ω1 ≤ ω2 ≤ . . . ≤ ωn.
ω5 . Here we get a contradiction.
Fig. 6. Dominance transitivity of job fragments
The schedule in Fig. 7 is very important. Here, job 4 is executed last, after job 1. It follows from the non-optimality of this schedule that all jobs with weight values less than ω4 are executed continuously. In the case p = 2 this property is formulated as Statement 8. It is exactly this example that prompts the idea of the algorithm for solving the problem.
Fig. 7. Non-optimal schedule
In the next section, we formulate such a property as a hypothesis, but at this point only for the case p = 2. The full parametric analysis of all examples has been conducted at n = 8 and p = 2.
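The compatibility check described in this section is a linear feasibility problem and can be automated. The sketch below (our illustration, not the authors' code) uses scipy to test whether a given schedule q0 can be optimal for some admissible weights: it searches for weights ω with ω1 ≤ ω2 ≤ . . . ≤ ωn, ω1 ≥ 1, satisfying Σ C_i^{q0} ωi ≤ Σ C_i^{q} ωi for all q.

import numpy as np
from scipy.optimize import linprog

def can_be_optimal(C, q0):
    """C[q][i] -- completion time of job i in schedule q.
    Returns True if there exist weights w (w1 <= ... <= wn, all >= 1) for which
    schedule q0 is no worse than every other constructed schedule."""
    C = np.asarray(C, dtype=float)
    Q, n = C.shape
    A_ub, b_ub = [], []
    # sum_i (C[q0][i] - C[q][i]) * w_i <= 0 for every other schedule q
    for q in range(Q):
        if q != q0:
            A_ub.append(C[q0] - C[q])
            b_ub.append(0.0)
    # monotonicity of weights: w_i - w_{i+1} <= 0
    for i in range(n - 1):
        row = np.zeros(n)
        row[i], row[i + 1] = 1.0, -1.0
        A_ub.append(row)
        b_ub.append(0.0)
    res = linprog(c=np.zeros(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(1, None)] * n, method="highs")
    return res.status == 0

# Toy example with 3 schedules of 2 jobs (hypothetical completion times)
C = [[2, 4], [4, 2], [5, 5]]
print([can_be_optimal(C, q0) for q0 in range(3)])   # -> [True, True, False]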
7
An Approach to Solving the Problem 1|ri, pi = 2, pmtn| Σωi Ci
In this section, we consider the case p = 2 and reformulate the properties described above accordingly. Using parametric analysis, we were able to identify a subset of schedules, each of which can be optimal for certain weight values. This enabled us to propose a hypothesis and construct a solution algorithm based on the dynamic programming scheme. However, elaborating on the algorithm and its justification is beyond the scope of this paper. As before, we assume that ω1 ≤ ω2 ≤ . . . ≤ ωn, and let (α1, α2, . . . , αn) be a permutation for which rα1 ≤ rα2 ≤ . . . ≤ rαn. For the reduced instance at p = 2, the following conditions are satisfied: α1 = 1, rα1 = r1 = 0, rα2 = 1, and then j − 1 ≤ rαj < 2(j − 1), j = 2, 3, . . . , n. It follows from Theorem 1 that C1 is even. Theorem 2, in its turn, implies that, if C1 ≥ rαn in the optimal solution, then all jobs are executed without preemption in the interval [C1, 2n], and all the fragments have processing time 2. Also, exploring the results of the parametric analysis allows us to propose the following hypothesis:

Hypothesis. If C1 < rαn in the optimal solution of a reduced instance, then all jobs are executed without preemption in the interval [1, C1 − 1] and all the fragments have processing time 2.

We do not currently have sufficient proof of this hypothesis. Therefore, the following approach for solving the problem can only be considered as a theoretical idea. There are several other properties for p = 2. The job with the maximum weight starts execution either at the time of arrival or at the next moment and is executed continuously. Hence, the interval [rαn + 1, rαn + 2] is always occupied by job αn and C1 = rαn + 2.

Statement 8. Suppose that in the optimal solution rαn + 3 ≤ C1 ≤ 2n − 2, and let jm be the largest number of a job executed after job 1. Then, all jobs 2, 3, . . . , jm are executed without preemption and will encompass the jobs of the set {jm+1, jm+2, . . . , n}.
The idea is that the algorithm sequentially constructs optimal schedules for subsets of jobs with the largest weight values. At each iteration, we add the job with the lowest weight among the jobs considered. For that job, we enumerate the possible completion times. The expectation is that the jobs within a subset 2, 3, . . . , jm are executed continuously and will encompass the jobs of the subset jm+1, jm+2, . . . , n with larger weight values. Moreover, the schedule for the jobs of the first subset is uniquely constructed, while the schedule for the jobs of the second subset is based on the previous steps of the algorithm. Of course, it is not all that simple. Much remains unclear; however, some examples have already been implemented. Let us illustrate the approach with an example where all jobs are competitive, ω1 ≤ ω2 ≤ . . . ≤ ωn, and ri = i − 1, i = 1, 2, . . . , n. There are 8 jobs in this case, and each has its own row. The greater the weight, the higher the position of the job. The values ri = i − 1 in the figure are marked by bold vertical lines. The initial data is shown in Fig. 8. In Fig. 9 and Fig. 10, we consider the possible variants of schedules with different values of C1. The highlighted rectangle defines the set of encompassed jobs for which the schedule was obtained at previous stages. The algorithm for this example has complexity O(n²).
Fig. 8. Example input data, n = 8, ri = i − 1
Fig. 9. Case analysis for various values of C1 ≤ rn
Fig. 10. Case analysis for various values of C1 > rn
8
Conclusion
This paper explores the problem of minimizing the total weighted completion time of equal-processing-time jobs on a single machine, given the job arrival times and the possibility of preemption. We propose an algorithm for preprocessing the input data, which reduces the problem to a narrower class of examples. An algorithm has been developed for constructing a finite subset of solutions containing the optimal schedule. Using the constructed linear programming model, we have conducted a parametric analysis of this subset's schedules. We have identified a subclass of schedules that are optimal for certain weight values, as well as properties that prevent optimality. Based on these properties, we propose an approach to solving the problem when all job processing times equal two. The proposed approach has been implemented for a particular class of examples. Our further plans include the full implementation of the described approach and the construction of a polynomial algorithm for solving the problem 1|ri, pi = p, pmtn| Σωi Ci.
References 1. Pinedo, M.L.: Scheduling. Theory, Algorithms, and Systems, 3rd edn. Springer, Heidelberg (2008). 671p 2. Lenstra, J.K., Rinnooy Kan, A.H.G., Brucker, P.: Complexity of machine scheduling problems. Ann. Discrete Math. 1, 343–362 (1977) 3. Labetoulle, J., Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Preemptive scheduling of uniform machines subject to release dates. In: Progress in combinatorial optimization (Waterloo, Ont., 1982), pp. 245–261. Academic Press, Toronto, Ont. (1984) 4. Baptiste, P.: Scheduling equal-length jobs on identical parallel machines. Discrete Appl. Math. 103(1), 21–32 (2000)
5. Baptiste, Ph., Carlier, J., Kononov, A., Queyranne, M., Sevastianov, S., Sviridenko, M.: Structural properties of preemptive schedules. Discrete Anal. Oper. Res. 16(1), 3–36 (2009). (in Russian) 6. Jaramillo, F., Erkoc, M.: Minimizing total weighted tardiness and overtime costs for single machine preemptive scheduling. Comput. Ind. Eng. 107, 109–119 (2017) 7. Jaramillo, F., Keles, B., Erkoc, M.: Modeling single machine preemptive scheduling problems for computational efficiency. Ann. Oper. Res. 197–222 (2019). https:// doi.org/10.1007/s10479-019-03298-9 8. Lazarev, A.A., Kvaratskhelia, A.G.: Properties of optimal schedules for the minimization total weighted completion time in preemptive equal-length job with release dates scheduling problem on a single machine. Autom. Remote Control 71, 2085–2092 (2010) 9. Batsyn, M., Goldengorin, B., Pardalos, P.M., Sukhov, P.: Online heuristic for the preemptive single machine scheduling problem of minimizing the total weighted completion time. Optim. Methods Softw. 29, 955–963 (2014) 10. Batsyn, M., Goldengorin, B., Sukhov, P., Pardalos, P.M.: Lower and upper bounds for the preemptive single machine scheduling problem with equal processing times. In: Goldengorin, B., Kalyagin, V., Pardalos, P. (eds.) Models, Algorithms, and Technologies for Network Analysis. Springer Proceedings in Mathematics and Statistics, vol. 59, pp. 11–27. Springer, New York (2013). https://doi.org/10.1007/ 978-1-4614-8588-9 2 11. Goemans, M.-X., Queyranne, M., Schulz, A.-S., Skutella, M., Wang, Y.: Single machine scheduling with release dates. SIAM J. Discrete Math. 15(2), 165–192 (2002) 12. Servakh, V.V., Chernykh, K.A.: The structure of the optimal solution of the problem of one machine with the possibility of interruptions of jobs. In: Proceedings of the XIV International School-Seminar “Problems of Optimization of Complex Systems”, pp. 312–321. Institute of Electrical and Electronics Engineers Inc, Almaty (2018) 13. Chernykh, K.A.: Servakh, V.V.: Combinatorial structure of optimal solutions to the problem of a single machine with preemption. In: 2019 15th International Asian School-Seminar Optimization Problems of Complex Systems. OPCS 2019, pp. 21–26. Institute of Electrical and Electronics Engineers Inc, Novosibirsk (2019) 14. Kravchenko, S.A., Werner, F.: Scheduling jobs with equal processing times. In: Proceedings of the 13th IFAC Symposium on Information Control Problems in Manufacturing, INCOM 2009, pp. 1262–1267. IFAC PROCEEDINGS VOLUMES (IFAC-PAPERSONLINE) (2009) 15. Fomin, A., Goldengorin, B.: An efficient model for the preemptive single machine scheduling of equal-length jobs (2020)
Solving Irregular Polyomino Tiling Problem Using Simulated Annealing and Integer Programming Aigul I. Fabarisova(B)
and Vadim M. Kartak
Ufa State Aviation Technical University, Karl Marx str. 12, 450008 Ufa, Russia
Abstract. This paper addresses the problem of irregularity in polyomino tiling. An integer programming model for tiling with L-tromino and L-tetromino and a heuristic approach based on the Simulated Annealing are introduced. To implement irregularity measurement, the evaluation function based on the entropy concept is proposed. The model is tested on the practical problem arising in phased array design. A set of results is reported to evaluate the performance of the proposed approach.
Keywords: Integer programming · Irregular polyomino tiling · Phased array antenna
1
Introduction
Let us consider the tiling of a finite, square, simply connected region using a given set of polyominoes. A tiling T is a countable family of closed sets T = {T1, T2, ...} which cover the plane without gaps or overlaps. So the tiling problem is both a packing and a covering problem: a condition where T has no overlaps is called packing, and a condition where T covers the plane without gaps is called covering [2]. In this paper, a case of the irregular polyomino tiling problem, i.e., tiling with two kinds of shapes, the L-tromino and the L-tetromino, is addressed. There are four orientations of the L-tromino and eight orientations of the L-tetromino (Fig. 1) [1]. Tiling theory has various fields of application, for example, in pattern and texture generation and in sampling theory [5]. Aperiodicity is one of the properties of tilings that is essential for some practical applications. Aperiodic prototile sets are finite sets of tiles that can tile the plane, but only non-periodically. The most well-known example of aperiodic tilings is the Penrose tiles. Polyominoes in tilings can be used in tiling substitution: the construction of infinite tilings using a finite number of tile types based on a special rule. This rule tells how to "substitute" each tile type in a way that can be repeated [6]. An example of the tiling substitution is shown in Fig. 2, which is called "Chair tiling". It is known that if a set of polyominoes can be assembled into any rectangle, then they can serve as prototiles in a substitution system [5]. That is the reason why
Fig. 1. Four orientations of L-tromino and eight orientations of L-tetromino
many polyominoes can serve as substitution tilings. Practically this substitution approach can be used in sampling with blue-noise properties. Each polyomino is recursively subdivided until the desired local density of samples is reached. For example, this research [3] showed that using polyominoes instead of Penrose tiles [4] helped to reduce strong artifacts of frequencies.
Fig. 2. L-tromino substitution or “Chair tiling”
Another application field of irregular polyomino tiling is phased array antenna design. Phased array antennas are antennas with phase controls and time-delay devices that make it possible to control the beam direction and to keep the array pattern stationary [7]. In order to reduce the cost of phased array antennas, array elements are grouped into subarrays, and phase controls are introduced at the subarray level. But using rectangular subarrays causes periodicity and radiation of undesirable discrete sidelobes. Some research shows that using aperiodically located subarrays in the shape of polyominoes can result in a decrease of the sidelobes [8]. So there is a problem of generating tilings that can reduce the sidelobe level. It is said that the more irregular the tiling, the lower the sidelobe level (SLL). But the dependency between the SLL and the subarray tiling is implicit, so there is no relation or function to describe this dependency. Thus, the mathematical formulation of the objective function for polyomino tiling optimization in this field is difficult. There are some approaches in this field, for example, using the Genetic Algorithm [9,10]. In that research, the periodicity problem is solved using the autocorrelation function. Another study formulated
it as a set covering problem; the irregularity of a tiling is incorporated into the objective function using the information-theoretic entropy concept [11]. An interesting approach was implemented in [12], based on exact cover theory using Algorithm X proposed by Donald Knuth. The problem of irregular polyomino tiling was discussed in [13], where an integer linear programming (ILP) model for tiling with trominoes was proposed. The current paper shows an improvement of the proposed method, i.e., an ILP model for tiling with two kinds of polyomino shapes, L-tromino and L-tetromino, and a heuristic approach based on Simulated Annealing. The remainder of the paper is organized as follows. In Sect. 2 the ILP formulation of the case study is outlined. In Sect. 3 the heuristic approach is described. In Sect. 4 the computational results of the implemented ILP model and algorithm are presented. Some conclusions are drawn in Sect. 5.
2
Integer Linear Programming Model
The tiling of the finite, square N × N-sized structure with L-trominoes and L-tetrominoes is considered (Fig. 1). Let us introduce the following variables:
– Let an N × N-element structure be represented as the board G with squares G(i, j), i ∈ {0, . . . , n}, j ∈ {0, . . . , n}, of monomino size, where (i, j) is the coordinate of each square.
– Let P = {p1, . . . , pL} be the set of polyominoes including all possible orientations and turns. Each polyomino pl ∈ P can be presented as a set of R squares with coordinates {(i1, j1), (i2, j2), . . . , (iR, jR)}.
– Let C be the center of a polyomino, which is the corner element of the polyomino.
The variable zt, t ∈ T(l, i, j), equals 1 if the square G(i, j) contains the center C of polyomino pl and equals 0 otherwise:

zt = 1 if C ∈ G(i, j), and zt = 0 if C ∉ G(i, j), t ∈ T(l, i, j). (1)

The objective of the model is to fill the whole structure with polyominoes, so the sum of the variables should be maximized:

Σ_{t∈T(l,i,j)} zt → max. (2)

The objective function ensures full cover with polyominoes, but special constraints should be added to keep the shape of the figures and to avoid overlapping:

Σ_{f∈F(l,w,v)} zf ≤ 1. (3)
Fig. 3. Tiling the board G(i, j) with L-tetromino
Each square of the board G(i, j) can contain R squares of polyominoes pl ∈ P. Let F(l, w, v) be the set of pairs (w, v) that correspond to the shift of the central square of the polyomino when placing the r-th square of polyomino pl on the square G(i, j). To show an example of the constraint, let us consider tiling the board with the L-tetromino shape shown in Fig. 3. The constraint for this illustration is as follows:

z_{i,j} + z_{i−1,j} + z_{i,j−1} + z_{i−2,j} ≤ 1  ∀ i, j. (4)
Compared to the model in [13], the described model also requires separate constraints for the corners because of tetrominoes overlapping at the corners of the tiling. The optimal solution of this model is a structure covered with L-trominoes only. It happens because the sum of the variables is maximized when the structure is covered with the smallest shapes. To deal with this problem, special constraints were added:

SUMmin ≤ Σ_{t∈T(k,i,j)} zt ≤ SUMmax, (5)

where {p1, . . . , pk} are the L-trominoes.
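To make the model concrete, the sketch below builds a small version of constraints (3) and (5) for L-trominoes on an N × N board with the open-source PuLP solver; it is an illustration under our own simplifying assumptions (placements are enumerated by their occupied cells), not the authors' CPLEX implementation.

import pulp

N = 8                      # board size (small illustration)
SUM_MIN, SUM_MAX = 5, 20   # bounds on the number of trominoes, cf. constraint (5)

# Four orientations of the L-tromino, given as offsets of their 3 squares.
TROMINO = [[(0, 0), (1, 0), (0, 1)], [(0, 0), (0, 1), (1, 1)],
           [(0, 0), (1, 0), (1, 1)], [(1, 0), (0, 1), (1, 1)]]

# Enumerate all feasible placements; each placement is the set of covered cells.
placements = []
for cells in TROMINO:
    for i in range(N):
        for j in range(N):
            covered = [(i + di, j + dj) for di, dj in cells]
            if all(0 <= a < N and 0 <= b < N for a, b in covered):
                placements.append(covered)

model = pulp.LpProblem("tromino_tiling", pulp.LpMaximize)
z = [pulp.LpVariable(f"z_{t}", cat="Binary") for t in range(len(placements))]

# Objective (2): maximize the number of placed pieces.
model += pulp.lpSum(z)

# Constraint (3): every cell is covered by at most one piece (no overlaps).
for cell in [(i, j) for i in range(N) for j in range(N)]:
    model += pulp.lpSum(z[t] for t, cov in enumerate(placements) if cell in cov) <= 1

# Constraint (5): bounds on the total number of trominoes.
model += pulp.lpSum(z) >= SUM_MIN
model += pulp.lpSum(z) <= SUM_MAX

model.solve(pulp.PULP_CBC_CMD(msg=False))
print("placed pieces:", int(pulp.value(model.objective)))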
3
Heuristic Approach
Within the given time limits, the presented ILP model is able to solve only small-size instances, i.e., structures of size N ≤ 48 [13], but real instances require finding the optimal tiling of structures of larger sizes. In order to solve large instances with low computational time and to address the irregularity problem, we developed a heuristic based on Simulated Annealing.
3.1
Evaluation Function and Upper Bound
As mentioned in Sect. 1, there is a problem arising in the field of phased array design related to the irregularity of polyomino tiling. One of the significant parameters of an antenna array is the sidelobe level (SLL), and it is known that the SLL depends on the irregularity of the elements of the array structure [8]. But this dependence is implicit and complex and cannot be implemented in the integer programming model directly. So there is a problem of irregularity measurement, which can be dealt with using the concept of information-theoretic entropy [11]. Let us consider that every polyomino has a gravity center located in the corner square of each given shape. When covering the structure regularly, the centers of the polyominoes are perfectly aligned. Irregularity of the tiling can be obtained by a uniform distribution of the gravity centers. For a structure of size (m, n) the maximum entropy can be computed as follows:

H(X) ≤ H(uniform) = − Σ_{i=1}^{m+n} (1/(m+n)) log2(1/(m+n)) = log2(m + n). (6)
The evaluation function was implemented. It is the gap between the upper bound H(2N) and the entropy of the current structure X of size N:

GAP = (H(2N) − H(X)) / H(2N) × 100%. (7)
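A possible reading of (6) and (7) in code is given below; it is our own hedged interpretation, assuming that H(X) is the entropy of the empirical distribution of polyomino gravity centers over the 2N row and column positions of an N × N structure, which is one way to obtain the m + n terms in (6).

import math
from collections import Counter

def entropy_of_centers(centers, N):
    """Empirical entropy (in bits) of gravity-center positions.
    Assumption: positions are pooled over rows and columns, giving
    2N possible outcomes for an N x N structure (m = n = N in (6))."""
    counts = Counter()
    for i, j in centers:
        counts[("row", i)] += 1
        counts[("col", j)] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gap_percent(centers, N):
    """Evaluation function (7): deviation from the maximum entropy H(2N)."""
    h_max = math.log2(2 * N)
    return (h_max - entropy_of_centers(centers, N)) / h_max * 100.0

# A toy 4 x 4 example with four hypothetical gravity centers
print(round(gap_percent([(0, 0), (1, 2), (2, 1), (3, 3)], N=4), 2))   # -> 0.0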
3.2
Simulated Annealing
Algorithm 1 describes the pseudo-code of Simulated Annealing (SA) for solving the irregular tiling problem. The initial solution is generated by CPLEX without any random polyominoes added to the structure; it is considered the best solution at the beginning. While the minimum temperature is not reached, the algorithm randomly selects a segment of the structure and fills it with Mrand trominoes with random coordinates. It then computes the gap between the entropy of the whole structure with the updated segment and the upper bound H(2N), and accepts the new structure according to a probability function. The temperature is updated by a cooling rate α ∈ [0, 1].
Algorithm 1: Simulated Annealing with ILP
  BestSolution ← find initial tiling optimized with CPLEX
  t ← Temp_max
  while t > Temp_min do
    iter ← 0
    while iter < Iter_max do
      Gr ← pick a random segment of the structure
      while there is no feasible solution do
        fill Gr with Mrand trominoes with random coordinates
        optimize segment tiling Gr with CPLEX
      end while
      s′ ← update the segment Gr in BestSolution
      δ ← GAP(s′) − GAP(BestSolution)
      if δ < 0 then
        BestSolution ← s′
      else if rand(0, 1) ≤ e^(−δ/t) then
        BestSolution ← s′
      end if
      iter ← iter + 1
    end while
    t ← t × (1 − α)
  end while
  return BestSolution
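For readers who prefer runnable code, the loop structure of Algorithm 1 can be expressed as follows; random_segment, refill_segment, and gap are hypothetical helpers standing in for the CPLEX-based segment re-optimization and the evaluation function (7), so this is a structural sketch rather than the authors' implementation.

import math
import random

def simulated_annealing(initial, temp_max=20.0, temp_min=0.05,
                        iter_max=1, alpha=0.1, m_rand=48,
                        random_segment=None, refill_segment=None, gap=None):
    """Skeleton of Algorithm 1. `initial` is a tiling produced by the ILP model;
    the three callbacks are problem-specific (hypothetical names)."""
    best = initial
    t = temp_max
    while t > temp_min:
        for _ in range(iter_max):
            segment = random_segment(best)
            candidate = refill_segment(best, segment, m_rand)   # locally re-optimized
            delta = gap(candidate) - gap(best)
            # accept improvements always, worse solutions with SA probability
            if delta < 0 or random.random() <= math.exp(-delta / t):
                best = candidate
        t *= (1 - alpha)
    return best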
4
Computational Results
Testing of the model proposed in the previous section was carried out using a PC of the following configuration: Intel(R) Core(TM) i5-2450M CPU @2.50 GHz with 6 Gb RAM. The ILP model and the described algorithm were implemented in Python using the API of IBM ILOG CPLEX 12.6 [14]. Table 1 shows computational results for the described model (D) compared with the model for tiling with L-trominoes only [13] and the results of the research [11]. The values of SUMmin and SUMmax were chosen empirically and are as follows: (20, 40) for N = {16, 20}, (50, 100) for N = {24, 28, 30}, (100, 200) for N = {32, 40} and (300, 600) for N = 48. These values provided 100% fullness of the tiling. We can also observe that adding eight polyomino variables did not increase the computational complexity. The running time of model D is comparable with the state-of-the-art results and is even significantly lower for some N sizes.
Table 1. Computational results for the ILP model of tiling with L-tromino and L-tetromino

Size N | B & B nodes created/processed                       | Total running time (s)
       | T [13]          | D     | K [11]                     | T [13] | D     | K [11]
16     | 0/0             | 0/0   | 235/118                    | 0.19   | 1.09  | 5
20     | 225/127         | 0/0   | 977/496                    | 0.95   | 2.03  | 219
24     | 2284/1296       | 0/0   | 789/400                    | 1.63   | 5.33  | 58
28     | 46566/24828     | 0/0   | NA/NA                      | 20.11  | 13.50 | NA
30     | 48036/24782     | 0/0   | 1731/880                   | 25.84  | 12.83 | 198
32     | 47336/24327     | 0/0   | 3075/1556                  | 28.8   | 8.17  | 510
40     | 48202/24893     | 0/0   | 4259/2165                  | 46.88  | 41.55 | 1198
48     | 896182/481331   | 0/0   | NA/NA                      | 800.5  | 33.64 | NA
Table 2 shows a comparative analysis of the ILP and the SA algorithm in terms of phased antenna array simulation. After extensive experiments, the following parameter values were selected for simulated annealing: Temp_max = 20; Temp_min = 0.05; Iter_max = 1; α = 0.1. The SA evaluation function (GAP column) gives a relatively good representation of structure irregularity and thus of the SLL of an antenna array. It is noticeable that the ILP model itself gives the best result in reducing the sidelobe level (–27.11 dB) for the current size of the structure N = 32. But in the case of a restriction on the number of tromino shapes (SUMmin; SUMmax), the SA can be useful in order to obtain a structure with much higher irregularity (the GAP improves from 0.43% to 0.26%).

Table 2. Comparative analysis of the ILP and the SA algorithm

Size N | SUMmin | SUMmax | Mrand | Model  | SLL (dB), r = 1.300 | SLL (dB), r = 1.818 | GAP % | Holes | t (s)
32     | 20     | 80     | 0     | ILP    | –25.12              | –17.34              | 0.43  | 0     | 146
32     | 20     | 80     | 0     | SA&ILP | –25.80              | –18.19              | 0.49  | 0     | 1
32     | 20     | 80     | 20    | SA&ILP | –24.19              | –16.71              | 0.25  | 4     | 40
32     | 20     | 80     | 48    | SA&ILP | –25.51              | –19.09              | 0.38  | 3     | 60.2
32     | 20     | 80     | 72    | SA&ILP | –26.25              | –19.26              | 0.27  | 8     | 72.3
32     | 20     | 80     | 100   | SA&ILP | –26.70              | –18.70              | 0.26  | 3     | 100
32     | 160    | 240    | 0     | ILP    | –27.11              | –20.31              | 0.39  | 0     | 13.77
32     | 160    | 240    | 0     | SA&ILP | –24.79              | –17.29              | 0.67  | 0     | 67.2
32     | 160    | 240    | 20    | SA&ILP | –25.31              | –17.96              | 0.45  | 4     | 47.7
32     | 160    | 240    | 48    | SA&ILP | –26.85              | –19.60              | 0.38  | 3     | 56.7
32     | 160    | 240    | 72    | SA&ILP | –26.61              | –19.52              | 0.40  | 3     | 76
32     | 160    | 240    | 100   | SA&ILP | –26.53              | –18.94              | 0.45  | 4     | 100.9
5
Conclusions
In this paper, an integer linear programming model for the problem of irregular polyomino tiling with L-tetrominoes and L-trominoes is presented. The ILP model was programmed in Python using CPLEX. It is shown that our model gives comparable and even lower running times for some N sizes. Another significant conclusion is the following: implementing two types of polyominoes instead of one type (despite the resulting increase in the number of variables) does not increase the computational complexity of the model and even gives faster solutions. The developed heuristic based on Simulated Annealing is useful in cases when there is a certain restriction on the number of trominoes used in the tiling. Since this problem has a lot of applications, like sampling with blue-noise properties and the design of phased array antennas, it would be interesting to continue the research, implement more polyomino shapes, and test the model on different practical problems.
Acknowledgments. This research was supported by the Russian Foundation for Basic Research, project No. 19-07-00895.
References 1. Golomb, S.: Polyominoes: Puzzles, Patterns, Problems and Packings, 2nd edn. Princeton University Press, Princeton (1996) 2. Gr¨ unbaum, B., Shephard, G.C.: Tilings and patterns. W. H. Freeman and Company, New York (1987) 3. Ostromoukhov, V.: Sampling with polyominoes. ACM Trans. Graph. (SIGGRAPPH) 26(3), 78:1–78:6 (2007) 4. Ostromoukhov, V, Donohue, C., Jodoin, P.-M. Fast Hierarchical Importance Sampling with Blue Noise Properties. ACM Trans. Graph. 23(3), 488–495 (2004) 5. Kaplan, C.: Introductory Tiling Theory for Computer Graphics. Synthesis Lectures on Computer Graphics and Animation, Morgan & Claypool Publishers (2009) https://doi.org/10.2200/S00207ED1V01Y200907CGR011 6. Frank, N.: A primer of substitution tilings of the euclidean plane. Expositiones Mathematicae 26, 295–326 (2008). https://doi.org/10.1016/j.exmath.2008.02.001 7. Mailloux, R.: Phased Array Antenna Handbook, 2nd edn. Artech House, Norwood (2005) 8. Mailloux, R., Santarelli, S., Roberts, T., Luu, D.: Irregular polyomino-shaped subarrays for space-based active arrays. Int. J. Antennas Propag. 2009, 9 (2009). https://doi.org/10.1155/2009/956524 9. Chirikov, R., Rocca, P., Bagmanov, V., et al.: Algorithm for phased antenna array design for satellite communications. Vestnik UGATU 17(4), 159–166 (2013) 10. Rocca, P., Chirikov, R., Mailloux, R.J.: Polyomino subarraying through genetic algorithms. In: Antennas and Propagation Society International Symposium (APSURSI), pp. 1–2. IEEE (2012) 11. Karademir, S., Prokopyev, O., Mailloux, R.: Irregular polyomino tiling via integer programming with application in phased array antenna design. J. Global Optim. 65(2), 137–173 (2015)
12. Xiong, Z.Y., Xu, Z.H., Chen, S.W., Xiao, S.P.: Subarray partition in array antenna based on the algorithm X. Antennas Wirel. Propag. Lett. 12, 906–909 (2013). https://doi.org/10.1109/LAWP.2013.2272793 13. Kartak, V.M., Fabarisova, A.: An integer programming approach to the irregular polyomino tiling problem. In: Bykadorov, I., Strusevich, V., Tchemisova, T. (eds.) MOTOR 2019. CCIS, vol. 1090, pp. 235–243. Springer, Cham (2019). https://doi. org/10.1007/978-3-030-33394-2 18 14. IBM: IBM ILOG CPLEX. http://www.ibm.com/products/ilog-cplexoptimization-studio. Accessed 25 Feb 2021
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover for Continuous p-Median Problems Lev Kazakovtsev1,2(B) , Ivan Rozhnov1 , Ilnar Nasyrov1 , and Viktor Orlov1 1
2
Reshetnev Siberian State University of Science and Technology, prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660031, Russia [email protected] Siberian Federal University, 79 Svobodny pr., Krasnoyarsk 660041, Russia
Abstract. The continuous p-median problem is a classical global optimization model used both for finding optimal locations of facilities on a plane, and as a clustering model. The problem is to find p points (medians) such that the sum of the distances from known demand points to the nearest median is minimal. Such a clustering model is less sensitive to "outliers" (separately located demand points that are not included in any cluster), compared with k-means models. Many heuristic approaches were proposed for this NP-hard problem. The genetic algorithms with the greedy agglomerative crossover operator are some of the most accurate heuristic methods. However, a parameter of this operator (parameter r) determines the efficiency of the whole genetic algorithm. The optimal value of this parameter is hardly predictable based on numerical parameters of the problem such as the numbers of demand points and medians. We investigate its influence on the result of the algorithm, and also propose the use of an exploratory search procedure for adjusting it. The advantages of the self-adjusting algorithm are shown for problems with a data volume of up to 2,075,259 objects.
Keywords: p-median · Facility location · Genetic algorithm · Cluster analysis · Agglomerative clustering
1
Introduction and Problem Statement
The continuous p-median problem is one of the classical models of location theory which can also be used as an unsupervised learning (cluster analysis) model. Since this problem of continuous unconstrained global optimization has been proved to be NP-hard, exact algorithms of polynomial complexity are not known, and the efforts of algorithm developers are aimed at obtaining fast algorithms resulting in appropriate solutions that compromise between calculation time and solution accuracy.
Supported by the Ministry of Science and Higher Education of the Russian Federation (Project FEFE-2020-0013).
The goal of a location problem [1] is to find the locations of p points (centers, centroids, medians). The goal of the continuous p-median problem is to find p points (centers, medians) such that the sum of the distances from N known points to the nearest of the p centers reaches its minimum. Let A1, ..., AN be known points in a continuous space, N be given in advance, Ai = (ai,1, ..., ai,d), Ai ∈ R^d, and let S = {X1, ..., Xp} ⊂ R^d be the set of sought points (medians, centers). The objective function (sum of distances) of the p-median problem is:

F(X1, ..., Xp) = Σ_{i=1}^{N} min_{j=1,...,p} L(Xj, Ai) → min_{X1,...,Xp ∈ R^d}. (1)
Here, the integer p must be known in advance. By default, the Euclidean metric is used in location problems: L(Xj, Ai) = (Σ_{k=1}^{d} (xj,k − ai,k)²)^{1/2}. Here, Xj = (xj,1, ..., xj,d), j = 1, ..., p, are the sought centers, also called medians, and Ai = (ai,1, ..., ai,d), i = 1, ..., N, are the known points called demand points or data vectors. Local search methods for optimization problems are the most natural and intuitive [2]. If p = 1, we deal with the Weber problem [3], which can be solved by an iterative Weiszfeld descent procedure [4] or its improved modifications [5,6] with a predefined accuracy. However, a local descent cannot find the global optimum for our problem if p > 1. Although we are dealing with a formally continuous problem, a large number of local optima makes local search in the ε-neighborhood unproductive. In the case of squared Euclidean distances L(Xj, Ai) = Σ_{k=1}^{d} (xj,k − ai,k)², we deal with the similar k-means problem, which is the most popular for clustering [7–9]. Its disadvantage is its high sensitivity to outliers (separately located data points), which is important for such clustering problems as grouping highly reliable electronic devices into homogeneous production batches [10]. The p-median model is less sensitive and provides a higher quality clustering [11]. One of the most powerful and commonly used algorithms is the Alternate Location-Allocation (ALA) procedure, also called Lloyd's procedure [9,12,13], which alternates the solution of the 1-median problems for groups of demand points (clusters) having the same nearest median with the redistribution (allocation) of the demand points among the medians. The first p-median problems were formulated on networks [14] with the goal to find p nodes of a network which minimize the sum of the weighted distances from all nodes to the nearest of the p nodes selected as medians. Both network and continuous p-median problems are NP-hard [15,16]. In the early attempts to solve the p-median problem by exact methods, the authors proposed a branch and bound algorithm [17–19] capable of solving very small problems. The development of exact methods resulted in algorithms capable of solving problems of up to 4000 demand points and 100 centers [20]. Nevertheless, the scope of the exact methods is still very limited, and algorithm developers focus on heuristic approaches [21–25]. For larger problems, the simplest heuristic approaches consist in local search [21,22]. Using Lagrangian relaxations enables us to obtain an
186
L. Kazakovtsev et al.
approximate solution to problems of average size [24,26,27], up to N = 90000 with a tight bound on solution quality. Techniques for p-median problems were summarized in many reviews [28–30]. Genetic algorithms (GA) are popular for solving the p-median (both continuous and network), k-means as well as other location problems. Drezner et al. [31] proposed heuristic procedures including the genetic algorithm, for rather small data sets. Genetic and other evolutionary approaches are used to improve the results of the local search [32–37]. Such algorithms recombine the initial solution obtained by the ALA procedure. GAs operate with a certain set (population) of solutions and include special genetic operators (algorithms) of initialization, selection, crossover and mutation. Standard one-point and two-point crossover operators have been proved to present drawbacks when applied to problems such as k-means and p-median problems [38]: the offspring solutions are too far from those of their parents. The GAs with the greedy agglomerative crossover operator also called “merging procedure” demonstrate better results [35,39–41]. In this crossover operator, as well as in the Information Bottleneck Clustering algorithms [42,43], the number of centers is successively reduced down to p. Being originally developed for the network p-median problems [39], such algorithms were adapted for the p-medoids, p-median and k-means problems [35,40]. Their adaptation for continuous problems was carried out in two different ways: (a) solving the corresponding discrete (p-medoids) problem with the GA and improving its final solution with the ALA procedure or a local search algorithms for the continuous problem [41], and (b) embeddding the ALA procedure into the greedy agglomerative crossover operator [40,44]. The latest research [41] shows that the best solution of the discrete problem is not always a good initial solution of the continuous problem. The greedy agglomerative procedure is a computationally expensive algorithm, especially in the case of continuous problems when it includes multiple execution of a local search algorithm. However, this procedure allows us to find solutions that are difficult to improve by other methods without significantly increasing the calculation time. This feature of the greedy agglomerative crossover operator allows one to obtain highly precise GAs that traditionally use the simplest selection genetic operators and no mutation operator. The development of massive parallel processing systems such as graphics processing units (GPUs) makes the repeated launch of adapted versions of local search algorithms very cheap which allows the most advanced algorithms to solve problems with several millions of demand points. A greedy agglomerative procedure composes an interim solution with an excessive number K = p + r > p of medians and successively removes of the medians until obtaining a solution with the desired number of medians. Such procedures are used in the greedy crossover operators of the genetic algorithms, and r (a number of added medians which have to be eliminated) is an important parameter of such a genetic operator. In this paper, we demonstrate that a correct choice of this parameter r determines the efficiency of the whole genetic algorithm, and the optimal value of
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover
187
this parameter is hardly predictable based on the numeric parameters of the p-median problem such as numbers of demand points and medians. We add an exploratory search procedure for tuning this parameter to the initialization step of the genetic algorithm. Our experiments demonstrate the advantage of genetic algorithms with such a procedure in comparison with known genetic algorithms. The rest of this paper is organized as follows. In Sect. 2, we give an overview and description of known algorithms used in our research: the genetic algorithms and its components including the greedy agglomerative procedure. In Sect. 3, we investigate the dependence of the results on the parameter r of the greedy agglomerative procedure and introduce the exploratory search procedure. In Sect. 4, we present the results of comparative computational experiments. In Sect. 5, we give a short conclusion.
2
Known Algorithms
For finding a locally optimal solution of our problem in the ε-neighborhood, one of the most powerful and commonly used algorithms is the ALA procedure also called Lloyd’s procedure [9,12], described as follows (Algorithm 1). Algorithm 1. ALA(S) Require: Set of initial centers S = {X1 , ..., Xp }. If S is not given, then the initial centers are selected randomly from the set of data vectors {A1 , ..., AN }. repeat 1. For each center Xj , j = 1, p, a subset Gj of data vectors for which Xj is the nearest center; 2. For each subset Gj , j = 1, p, calculate its new center having solved the Weber problem with Weiszfeld algorithm [2] on Gj ; until all centers stay unchanged.
Weiszfeld algorithm [4] (not given in this paper for brevity) approximately solves the Weber problem for each group Gj . Algorithm 1 finds a local optimum. The essence of various genetic algorithms is a recombination of elements in a set (“population”) of candidate solutions (“individuals”) encoded by “chromosomes”. A chromosome is a vector of bits, integers or real numbers. Initially, genetic algorithms were designed for solving discrete problems. The first pmedian genetic algorithms [45] as well as the first GAs with the greedy crossover [39] used a simple binary chromosome encoding (1 for the network nodes selected as the medians and 0 for those not selected). If the medians are searched in a continuous space, GAs can also use binary encoding of solutions [36,41,46]. In the ALA algorithm, its initial solutions are usually subsets of the demand point set {A1 , ..., AN }. In binary chromosome code, 1 means that the corresponding demand point has been selected as an initial median for the ALA procedure or some local search algorithm. Solutions (chromosomes) in the GAs for p-median
188
L. Kazakovtsev et al.
can also be encoded directly [33] by the coordinates of medians (vectors of real numbers). The GAs with the greedy agglomerative crossover use such a direct encoding [40,47,48]. The SWAP neighborhood search [2,13] is a popular method for solving pmedian and k-means problems. The j-means algorithm [49] using these neighborhoods is one of the most accurate methods. This algorithm, after finding a locally optimal solution in an ε-neighborhood, makes an attempt to replace one of the medians X1 , ..., Xp with one of the demand points A1 , ..., AN . If the value of the objective function has improved, the search continues in the ε-neighborhood of the new solution. The greedy agglomerative procedures can be used as independent algorithms or embedded in genetic and other algorithms [35,39,44,48]. They can be described as follows (Algorithm 2) [48]: Algorithm 2. BasicAggl(S) Require: Set of initial medians S = {X1 , . . . , XK }, K > p, required number p. S ← ALA(S); while |S| > p do for i=1, K do Fi ← F (S\{Xi }); end for Select a subset S ⊂ S of relim medians with the minimum values of the corresponding variables F i; // By default, relim = 1. S ← ALA(S\S ); end while.
To improve the performance of such a procedure, the number of simultaneously eliminated medians can be calculated as relim = max{1, (|S| − p) · rcoef }. In [48,50], the authors used the elimination coefficient rcoef = 0.2. This means that at each iteration, up to 1/5 of the excessive medians are eliminated, and such values are proved to make the algorithm faster. An agglomerative procedure which combines two given solutions can be described as Algorithm 3 [48]: Algorithm 3. Agglr (S, S2 ) Require: Two sets of medians S, S2 , | S |=| S2 |= p, the number of medians r of the solution S2 which are used to obtain the resulting solution, r ∈ {1, p}. For i = 1, nrepeats do 1. Select a subset S ⊂ S2 :| S |= r. 2. S ← BasicAggl(S ∪ S ); 3. if F (S ) < F (S) then S ← S end if ; end for return S.
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover
189
Such procedures with r = 1, p can be used as a crossover operator in genetic algorithms [39–41]. Value nrepeats depends on r: nrepeats = max{1, [p/r]}. Alp et al. [39], Neema et al. [35], as well as Kazakovtsev et al. [40,47] use this procedure as a genetic crossover operator with minimum or maximum value of r : r = 1 or r = p only. A commonly used genetic algorithm for the p-median and k-means problems can be described as follows (Algorithm 4) [40,48]: Algorithm 4. GA with real-number chromosome encoding for the k − means and p − median problems Require: Initial population size NP OP (in our experiments, initial NP OP = 5). 1: Niter ← 0; Generate a population of NP OP initial solutions S1 , . . . , SNP OP ⊂ {A1 , . . . , AN } where | Si |= p∀i = 1, NP OP . For each initial solution, run the ALAalgorithm and save corresponding obtained values of the objective function (1): Si ← ALA(Si ), fi = F (Si )∀i = 1, NP OP ; loop 2: Niter ← Niter + 1; if the stop condition (time limitation) is satisfied then STOP. Return solution Si∗ from the population with minimal value of fi end if ; 3: Randomly choose two indexes k1 , k2 ∈ {1, NP OP }, k1 = k2 ; 4: Run chosen crossover procedure to create a new solution: SC ← Crossover(Sk1 , Sk2 ); 5: Run chosen procedure to replace a solution in the population; end loop.
We used the tournament replacement for Step 5 of Algorithm 4: 5: Randomly choose two indexes k4 , k5 ∈ {1, NP OP }, k4 = k5 ; If fk4 > fk5 then k3 ← k4 else k3 ← k5 end if ; Sk3 ← Sc ; fk3 ← F (Sc ). Such algorithms usually operate with a very small population, and other selection procedures do not improve the results significantly [39,40,51]. The crossover operator (Algorithm 3) is computationally expensive due to multiple runs of the ALA algorithm. For large-scale problems and very strict time limitation, GAs with greedy heuristic crossover operator perform only few iterations for large-scale problems. The population size is usually small, up to 10–25 chromosomes. Dynamically growing populations [40,47,48] are able to improve the results. In this case, Step 2 of Algorithm 4 is replaced by the following procedure: √ 2: Niter ← Niter + 1; NP OP ← max{NP OP , 1 + Niter }; if NP OP has changed then initialize randomly a new individual SNP OP ⊂ {A1 , . . . , AN }, | SNP OP |= p; SNP OP ← ALA(SNP OP ) end if. The mutation operator can slightly improve the result [47,48]. We focus on the crossover operator adjustment and thus leave the mutation operator empty.
190
3
L. Kazakovtsev et al.
Parameter r Adjustment
In papers [35,39,40,47], the procedure Agglr (S, S2 ) as a Crossover(S, S2 ) operator is used with r = 1 or r = p in the genetic algorithms for discrete and continuous p-median and k-means problems. We investigated the influence of parameter r on the result of genetic algorithms. The obtained results are summarized in Fig. 1, 2. A description of the used datasets is given in Sect. 4.
Fig. 1. Dependence of the of GA result with greedy agglomerative crossover operator on its parameter r for summarized after 30 runs of each algorithm. BIRCH3 data set, 105 demand points in R2 , time limitation 10 s
As can be seen from Fig. 1, 2, parameter r determines the efficiency of the algorithm. Procedures similar with Algorithm 3 were used in Variable Neighborhood Search (VNS) algorithms for the k-means problem [44], and the results were also determined by the correct tuning of the parameter r. For tuning the parameter r, we propose an exploratory search algorithm (Algorithm 5). After running it, the genetic algorithm runs Algorithm 4 with the dynamically growing population, and Agglr () as the Crossover() operator where r is chosen randomly r ∈ {max{1, [r0 /1.5]}, min{p, 1.5r0 }}. Here, r0 is the value found by Algorithm 5. Thus, in the proposed algorithm, the r value fluctuates within certain limits at each crossover operation. Starting from the same initial solution S, Algorithm 5 tries to implement the Agglr () procedure to this solution NP OP times with the same set (population) of second solutions Si , i = 1, nexplor . The population remains unchanged. This experiment is repeated with various values of r. Then, an option r0 with the best final result is selected.
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover
191
(a)
(b)
(c)
Fig. 2. Dependence of the of GA result with greedy agglomerative crossover operator on its parameter r for summarized after 30 runs of each algorithm. Individual Household Electric Power Consumption (IHEPC) data set, 2,075,259 demand points in R7 , time limitation 5 min. (a–c): 30, 100 and 300 medians, respectively.
192
L. Kazakovtsev et al.
Algorithm 5. AgglAdapt( ): Initialization step of the GA with the exploratory search (implementation of Step 1 in Algorithm 4) Niter ← 0; Generate a population of NP OP initial solutions S1 , . . . , SNP OP ⊂ {A1 , . . . , AN } where | Si | = p∀i = 1, NP OP ; For each initial solution, run the ALA-algorithm and save corresponding obtained values of the objective function (1): Si ← ALA(Si ), fi = F (Si )∀i = 1, NP OP ; Select randomly S ⊂ {A1 , . . . , AN }, | S |= k; S ← ALA(S); r ← p; repeat Sr ← S; for i = 1, NP OP do S ← Agglr (Sr , Si ); if F (S ) < F (Sr ) then Sr ← S end if ; end for; r ← max{1, [ r2 ] − 1}; until r = 1 ; select the value r0 with minimum value of F (Sr ).
4
Computational Experiments
For our computational experiments, we used the following test system: Intel Core 2 Duo E8400 CPU, 16 GB RAM, NVIDIA GeForce GTX1050ti GPU with 4096 MB RAM, floating-point performance 2138 GFLOPS. This choice of the GPU hardware was made due to its prevalence, and also one of the best values of the price/performance ratio. The program code was written in C++. We used Visual C++ 2017 compiler embedded into Visual Studio v.15.9.5, NVIDIA CUDA 10.0 Wizards, and NVIDIA Nsight Visual Studio Edition CUDA Support v.6.0.0. For all datasets, 30 attempts were made to run each of algorithms (Tables 1, 2, 3, 4 and 5). In all our experiments, we used the classic data sets from the UCI Machine Learning and Clustering basic benchmark repositories [52,53]: (a) Individual Household Electric Power Consumption (IHEPC) – energy consumption data of households during several years, more than 2 million demand points (data vectors) in R2 , 0-1 normalized data, “date” and “time” columns removed; (b) BIRCH3: 100 groups of points of random size, 100000 demand points in R2 ; (c) S1 data set: Gaussian clusters with cluster overlap (5000 demand points in R2 ). As can be seen from Table 3, the result of each algorithm depends on the elapsed time. Nevertheless, an advantage of the new algorithm is evident regardless of the chosen time limit.
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover
193
Table 1. Comparative results of computational experiments with data set BIRCH3 (100000 data vectors in R2 , 100 medians, time limitation 10 s) Algorithm
Objective function value (sum of distances) min (the best attempt)
Average among 30 attempts
Median
Standard deviation
ALA
2.27163E+09
2.44280E+09
2.46163E+09 0.74233E+08
j-means
1.52130E+09
1.69226E+09
1.67712E+09 0.803684E+08
GA+Aggl1
1.67353E+09
2.24229E+09
2.29725E+09 2.01981E+08
GA+Aggl3
1.49601E+09
1.56265E+09
1.55306E+09 0.47064E+08
GA+Aggl5
1.49290E+09
1.56637E+09
1.55676E+09 0.52833E+08
GA+Aggl7
1.50184E+09
1.57234E+09
1.56084E+09 0.59410E+08
GA+Aggl10
1.49413E+09
1.51849E+09
1.51692E+09 0.19989E+08
GA+Aggl12
1.49331E+09
1.51040E+09
1.50533E+09 0.20911E+08
GA+Aggl15
1.49503E+09
1.51858E+09
1.51985E+09 0.18655E+08
GA+Aggl20
1.49417E+09
2.10975E+09
2.19089E+09 3.73064E+08
GA+Aggl25
1.49893E+09
1.99754E+09
2.10804E+09 3.32115E+08
GA+Aggl30
1.50960E+09
2.06125E+09
2.16929E+09 3.95151E+08
GA+Aggl50
1.50985E+09
1.89144E+09
1.92886E+09 2.87190E+08
GA+Aggl75
1.50491E+09
1.77066E+09
1.80732E+09 2.22571E+08
GA+Aggl100
1.49318E+09
1.71780E+09
1.73917E+09 1.83084E+08
GH-VNS1
1.49362E+09
1.54540E+09
1.51469E+09 0.50746E+08
GH-VNS2
1.49413E+09
1.50692E+09
1.50235E+09 0.18750E+08
GH-VNS3
1.49247E+09
1.50561E+09
1.49497E+09 0.18442E+08
SWAP1
1.70412E+09
1.83218E+09
1.83832E+09 0.60441E+08
GA-1POINT
1.61068E+09
1.69042E+09
1.65346E+09 0.72393E+08
GA-UNIFORM
1.67990E+09
1.77471E+09
1.76783E+09 0.69545E+08
GA-AgglAdapt ⇔ 1.49251E+09
1.49486E+09
1.49468E+09 0.02601E+08
Note (for all tables): ⇑: the advantage of the GA-AgglAdapt algorithm in comparison with the best of other algorithms is statistically significant (Wilcoxon test, significance 0.05); ⇔: the advantage/disadvantage is insignificant
We examined the following algorithms: (a) ALA and j-means (a combination of SWAP1 search and ALA) in the multi-start mode; (b) GAs with Agglr () crossover, r = 1, p; (c) VNS1-3: Variable Neighborhood Search algorithms with neighborhoods formed by application of Aggl() procedure, see [51]; (d) local search in SWAPr neighborhoods, r = 1, p (only the best result is given); (e) GA-1POINT: GA with a standard 1-point crossover; (f) GA-UNIFORM: GA with a standard uniform crossover and uniform random mutation [33]; (g) GAAgglAdapt: the proposed GA with the AgglAdapt() initialization procedure. Each algorithm was launched 30 times. As can be seen from Tables 1, 2, 3, 4 and 5, the results of our new algorithm are not inferior to those of the well-known algorithms including known GAs.
194
L. Kazakovtsev et al.
Table 2. Comparative results of computational experiments with S1 data set (5000 data vectors in R2 , 50 medians, 1 s) Algorithm
Objective function value (sum of distances) min (the best attempt)
Average among 30 attempts
Median
Standard deviation
ALA
1.14205E+08
1.15594E+08
1.15666E+08 6.33573E+05
j-means
1.13500E+08
1.15035E+08
1.14954E+08 9.38353E+05
GA+Aggl1
1.13450E+08
1.15816E+08
1.15735E+08 12.5332E+05
GA+Aggl2
1.12587E+08
1.13707E+08
1.13619E+08 5.79217E+05
GA+Aggl3
1.12627E+08
1.13299E+08
1.13319E+08 3.93695E+05
GA+Aggl5
1.12580E+08
1.13086E+08
1.13030E+08 4.01422E+05
GA+Aggl7
1.12537E+08
1.12921E+08
1.12850E+08 3.89649E+05
GA+Aggl10
1.12499E+08
1.12854E+08
1.12812E+08 2.22318E+05
GA+Aggl15
1.12556E+08
1.12787E+08
1.12696E+08 3.15382E+05
GA+Aggl20
1.12547E+08
1.12767E+08
1.12717E+08 1.83210E+05
GA+Aggl30
1.12509E+08
1.12717E+08
1.12695E+08 1.21577E+05
GA+Aggl50
1.12536E+08
1.12703E+08
1.12685E+08 0.94139E+05
GH-VNS1
1.12419E+08
1.12467E+08
1.12446E+08 0.74779E+05
GH-VNS2
1.12472E+08
1.12525E+08
1.12519E+08 0.33722E+05
GH-VNS3
1.12531E+08
1.12712E+08
1.12708E+08 0.96093E+05
SWAP1
1.13142E+08
1.14430E+08
1.14412E+08 8.48529E+05
GA-1POINT
1.14271E+08
1.15443E+08
1.15343E+08 7.16204E+05
GA-UNIFORM
1.13119E+08
1.14384E+08
1.14405E+08 6.75227E+05
GA-AgglAdapt⇔ 1.12420E+08
1.12524E+08
1.12527E+08 0.30970E+05
Table 3. Comparative results of computational experiments with data set BIRCH3 data set (100000 data vectors in R2 , 300 medians, 10 s) Algorithm
Objective function value (sum of distances) min (the best attempt)
Average among 30 attempts
Median
Standard deviation
ALA
1.54171E+09
1.59859E+09
1.60045E+09 2.62443E+07
j-means
9.95926E+08
1.05927E+09
1.04561E+09 4.63788E+07
GA+Aggl1
1.35521E+09
1.63084E+09
1.62734E+09 11.6506E+07
GA+Aggl3
1.31486E+09
1.43469E+09
1.42866E+09 8.64632E+07
GA+Aggl5
1.30397E+09
1.40745E+09
1.41231E+09 6.38373E+07
GA+Aggl7
1.23973E+09
1.38283E+09
1.37158E+09 9.31228E+07
GA+Aggl10
1.22794E+09
1.35611E+09
1.34417E+09 8.28254E+07
GA+Aggl15
1.08778E+09
1.27035E+09
1.27625E+09 8.20298E+07
GA+Aggl20
1.07003E+09
1.19707E+09
1.20986E+09 6.48123E+07
GA+Aggl30
1.01450E+09
1.17255E+09
1.15807E+09 7.97245E+07
GA+Aggl50
0.95070E+09
1.06361E+09
1.05669E+09 5.22946E+07
GA+Aggl75
0.92843E+09
1.02781E+09
1.03396E+09 3.99725E+07
GA+Aggl100 0.92027E+09
0.98427E+09
0.98234E+09 3.98169E+07 (continued)
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover
195
Table 3. (continued) Algorithm
Objective function value (sum of distances) min (the best attempt)
Average among 30 attempts
Median
Standard deviation
GA+Aggl150
0.91604E+09
0.95030E+09
0.94126E+09 2.75138E+07
GA+Aggl200
0.90437E+09
0.91968E+09
0.91611E+09 1.45980E+07
GA+Aggl250
0.90407E+09
0.91195E+09
0.90879E+09 0.95858E+07
GA+Aggl300
0.90446E+09
0.91059E+09
0.90691E+09 0.73890E+07
GH-VNS1
1.00289E+09
1.03856E+09
1.03485E+09 2.20294E+07
GH-VNS2
9.68045E+08
1.01461E+09
1.00404E+09 3.80607E+07
GH-VNS3
9.12455E+08
9.31225E+08
9.32001E+08 0.82565E+07
SWAP2
1.23379E+09
1.33987E+09
1.35305E+09 5.64094E+07
SWAP3
1.25432E+09
1.34388E+09
1.34424E+09 5.01104E+07
GA-1POINT
1.11630E+09
1.23404E+09
1.22828E+09 7.05936E+07
GA-UNIFORM
1.17534E+09
1.26758E+09
1.25424E+09 5.20153E+07
GA-AgglAdapt⇑ 0.90430E+09
0.90508E+09
0.90509E+09 0.04096E+07
Table 4. Comparative results of computational experiments with IHEPC data set (2,075,259 data vectors in R7 , 100 medians, 5 min) Algorithm
Objective function value (sum of distances) min (the best attempt)
Average among 30 attempts
Median
Standard deviation
65037.8281 1431.8760
ALA
61803.3047
64612.3371
j-means
61374.5742
63433.8962
62227.1172 3308.9649
GA+Aggl1
61141.4961
69402.0446
63167.8867 11833.2850
GA+Aggl2
59631.1250
74008.5285
83785.7813 12633.6120
GA+Aggl3
59753.6328
77777.4185
84373.5938 11827.4247
GA+Aggl5
60150.8203
75511.2667
84605.6328 11960.2879
GA+Aggl7
59959.9141
67826.1825
61301.6953 11597.3982
GA+Aggl10
59422.7188
75260.1027
83845.0625 11599.3057
GA+Aggl15
59123.1016
73404.8850
83470.1719 12902.8288
GA+Aggl20
59511.8008
66767.9057
60956.7578 11382.0521
GA+Aggl30
58601.1758
66177.5603
59535.6719 11737.7756
GA+Aggl50
58520.6680
76114.5608
82994.4922 11839.6797
GA+Aggl75
58236.3867
62222.4408
58776.9961 9174.5316
GA+Aggl100
58008.5039
62885.9381
58439.5742 9170.8225
GH-VNS1
59228.6055
73899.8482
83708.7031 13239.8092
GH-VNS2
82966.8750
83061.5056
83033.9844 128.9717
GH-VNS3
59417.9375
78905.5815
82124.6094 8593.7433
SWAP10
61196.4414
70051.5938
62676.3125 10652.0083
GA-1POINT
62192.1719
63051.7500
63028.9922 733.3966
GA-UNIFORM
60873.5859
63656.8555
64155.6875 2084.6202
GA-AgglAdapt ⇔ 58215.0000
63617.3129
59109.6660 10341.5159
196
L. Kazakovtsev et al.
Table 5. Comparative results of computational experiments with IHEPC data set (2,075,259 data vectors in R7 , 300 medians, 5 min) Algorithm
Objective function value (sum of distances) min (the best attempt)
Average among 30 attempts
Median
Standard deviation
ALA
49901.8086
53113.4275
52599.0273 2781.6693
j-means
44074.6445
45375.1657
44608.1953 1505.4071
GA+Aggl1
47175.1563
65307.6384
68348.0234 8005.5044
GA+Aggl3
45082.1719
61217.7020
67169.8906 10881.5804
GA+Aggl5
44595.0078
62004.2550
67909.3906 10675.4870
GA+Aggl7
45099.3906
64485.9453
67556.2500 8554.3128
GA+Aggl10
45076.5664
61484.3778
67869.3906 11066.1223
GA+Aggl15
43546.7109
60578.9693
66986.4922 10980.7584
GA+Aggl20
43725.7617
60444.0061
66336.1797 11363.5646
GA+Aggl30
43211.1719
57039.3717
66722.7500 12488.6231
GA+Aggl50
42579.3242
56551.1702
66176.4688 12403.4062
GA+Aggl75
42148.0586
59433.9648
66021.9531 11647.4117
GA+Aggl100
43580.9844
59436.0932
65480.7891 10809.9188
GA+Aggl150
41808.8594
58554.0748
65192.6328 11423.9338
GA+Aggl200
41848.1445
55088.4537
64537.2422 12227.0877
GA+Aggl250
41346.4219
48687.8661
42683.8828 10936.4143
GA+Aggl300
41242.1875
50661.3628
41926.5293 12483.8988
GH-VNS1
67290.1406
67645.6563
67467.4375 478.1909
GH-VNS2
43395.0273
63081.6233
66222.0313 8689.9374
GH-VNS3
42124.9375
61204.4487
64358.4844 8413.3478
SWAP3
46137.9570
63618.6362
68361.8672 9467.2699
SWAP100
51026.5859
65787.6618
67617.5781 6667.2423
GA-1POINT
45618.3828
49198.6964
47304.4453 3129.2994
GA-UNIFORM
46171.7148
52218.1077
50642.4961 5712.0544
GA-AgglAdapt⇔ 41283.0859
51559.9151
41660.5293 11745.5515
5
Conclusion
Having investigated the dependence of the results achieved by the genetic algorithms with greedy agglomerative crossover operator in its parameter r, we found that the nature of this dependence is specific for each problem, while the value of r, which ensures the fastest convergence of the algorithm cannot be preestimated based on the most important numerical parameters of the problem such as the number of demand points and the number of required medians. We added an exploratory search procedure for tuning this parameter. The advantage of the algorithm with this procedure is statistically significant in several cases. In all other cases, the advantages or disadvantages are indistinguishable in
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover
197
comparison with the other genetic algorithms for the continuous p-median problem. Thus, the self-configuring ability of the genetic algorithm with the proposed exploratory search procedure makes it a more versatile computational tool in comparison with known algorithms.
References 1. Drezner, Z., Hamacher, H.: Facility Location: Applications and Theory. Springer, Heidelberg (2004) 2. Kochetov, Yu., Mladenovic, N., Hansen, P.: Local search with alternating neighborhoods. Discrete Anal. Oper. Res. Ser. 2(10), 11–43 (2003) 3. Wesolowsky, G.: The Weber problem: history and perspectives. Locat. Sci. 1, 5–23 (1993) 4. Weiszfeld, E., Plastria, F.: On the point for which the sum of the distances to n given points is minimum. Ann. Oper. Res. 167, 7–41 (2009). https://doi.org/10. 1007/s10479-008-0352-z 5. Vardi, Y., Zhang, C.H.: A modified weiszfeld algorithm for the Fermat-Weber location problem. Math. Program 90(3), 559–566 (2001). https://doi.org/10.1007/ s101070100222 6. Drezner, Z.: The fortified Weiszfeld algorithm for solving the weber problem. IMA J. Manage. Math. 26, 1–9 (2013). https://doi.org/10.1093/imaman/dpt019 7. Bailey, K.: Typologies and Taxonomies: An Introduction to Classification techniques. Sage University Paper series on Quantiative Applications in the Social Sciences, CA, USA (1994). https://doi.org/10.4135/9781412986397 8. Tan, P.N., Steinbach, M., Kumar, V.: Cluster Analysis: Basic Concepts and Algorithms. Introduction to Data Mining. Addison-Wesley (2006). https://www-users. cs.umn.edu/ku-mar001/dmbook/ch8.pdf 9. MacQueen, J.B.: Some methods of classification and analysis of multivariate observations. In Proceedings of the 5th Berkley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June-18 July 1965 and 27 December 1965–7 January 1966, vol. 1, pp. 281–297 (1965) 10. Rozhnov, I., Orlov, V., Kazakovtsev, L.: Ensembles of clustering algorithms for problem of detection of homogeneous production batches of semiconductor devices. In: CEUR Workshop Proceedings, vol. 2098, pp. 338–348 (2018) 11. Ushakov, A., Vasilyev, I., Gruzdeva, T.: A computational comparison of the pmedian clustering and k-means. Int. J. Artif. Intell. 13(1), 229–242 (2015) 12. Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982). https://doi.org/10.1109/TIT.1982.1056489 13. Kochetov, Yu.A.: Local search methods for discrete placement problems. Dissertion Doctor of Physical and Mathematical Sciences, Novosibirsk (2010) 14. Hakimi, S.L.: Optimum locations of switching centers and the absolute centers and medians of a graph. Oper. Res. 12, 450–459 (1964) 15. Masuyama, S., Ibaraki, T., Hasegawa, T.: TE computational complexity of the m-center problems on the plane. Trans. Inst. Electron. Commun. Eng. Jpn. 64E, 57–64 (1981) 16. Kariv, O., Hakimi, S.L.: An algorithmic approach to network location problems. I: the p-centers. SIAM J. Appl. Math. 37(3), 513–538 (1979)
198
L. Kazakovtsev et al.
17. Kuenne, R.E., Soland, R.M.: Exact and approximate solutions to the multisource Weber problem. Math. Program 3, 193–209 (1972). https://doi.org/10. 1007/BF01584989 18. Ostresh, L.M.J.: The stepwise location-allocation problem: exact solutions in continuous and discrete spaces. Geogr. Anal. 10, 174–185 (1978). https://doi.org/10. 1111/j.1538-4632.1978.tb00006.x 19. Rosing. K.E.: An optimal method for solving the (generalized) multi-Weber problem. Eur. J. Oper. Res. 58, 414–426 (1992). https://doi.org/10.1016/03772217(92)90072-H 20. Piccialli, V., Sudoso, A. M., Wiegele, A.: SOS-SDP: an exact solver for minimum sum-of-squares clustering. arXiv:2104.11542v1 21. Resende, M.G.C.: Metaheuristic hybridization with greedy randomized adaptive search procedures. In: INFORMS TutORials in Operations Research, pp. 295–319 (2008). https://doi.org/10.1287/educ.1080.0045 22. Resende, M.G.C., Ribeiro, C.C., Glover, F., Marti, R.: Scatter search and pathrelinking: fundamentals, advances, and applications. In: Gendreau. M., Potvin. J.-Y. (eds.) Handbook of Metaheuristics, pp. 87–107 (2010). https://doi.org/10.1007/0306-48056-51 23. Fathali, J., Rad, N.J., Sherbaf, S.R.: The p-median and p-center problems on bipartite graphs. Iran. J. Math. Sci. Inform 9(2), 37–43 (2014). https://doi.org/ 10.7508/ijmsi.2014.02.004 24. Avella, P., Sassano, A., Vasil’ev, I.: Computational study of large-scale pmedian problems. Math. Program. 109(1), 89–114 (2007). https://doi.org/10. 1007/s10107-005-0700-6 25. Bernabe-Loranca, M., Gonzalez-VelAzquez, R., Granillo-Martinez, E., RomeroMontoya, M., Barrera-Camara, R.: P-median problem: a real case application. In: Advances in Intelligent Systems and Computing, p. 1181 (2021). https://doi. org/10.1007/978-3-030-49342-4 18 26. Avella, P., Boccia, M., Salerno, S., Vasilyev, I.: An aggregation heuristic for large scale p-median problem. Comput. Oper. Res. 39(7), 1625–1632 (2012). https:// doi.org/10.1016/j.cor.2011.09.016 27. Ushakov, A.V., Vasilyev, I.: Near-optimal large-scale k-medoids clustering. Inf. Sci. 545, 344–362 (2021). https://doi.org/10.1016/j.ins.2020.08.121 28. Farahani, R.Z., Hekmatfar, M.: Facility Location Concepts. Models, Algorithms and Case Studies. Springer, Heidelberg (2009). https://doi.org/10.1007/978-37908-2151-2 29. Mladenovic, N., Brimberg, J., Hansen, P., Moreno-Perez, J.: The p-median problem: a survey of metaheuristic approaches. Eur. J. Oper. Res. 179, 927–939 (2007) 30. Reese, J.: Solution methods for the p-median problem: an annotated bibliography. Networks 48, 125–142 (2006) 31. Drezner, Z., Brimberg, J., Mladenovic, N., Salhi, S.: New heuristic algorithms for solving the planar p-median problem. Comput. Oper. Res. 62, 296–304 (2015). https://doi.org/10.1016/j.cor.2014.05.010 32. Houck, C.R., Joines, J.A., Kay, M.G.: Comparison of genetic algorithms, random restart and two-opt switching for solving large location-allocation problems. Comput. Oper. Res. 23(6), 587–596 (1996). https://doi.org/10.1016/03050548(95)00063-1 33. Maulik, U., Bandyopadhyay, S.: Genetic algorithm-based clustering technique. Pattern Recogn. 33(9), 1455–1465 (2000). https://doi.org/10.1016/S00313203(99)00137-5
Self-adjusting Genetic Algorithm with Greedy Agglomerative Crossover
199
34. Krishna, K., Murty, M.M.: Genetic k-means algorithm. IEEE Trans. Syst. Man Cybern. Part B Cybern. 29(3), 433–439 (1999). https://doi.org/10.1109/3477. 764879 35. Neema, M.N., Maniruzzaman, K.M., Ohgai, A.: New genetic algorithms based approaches to continuous p-median problem. Netw. Spat. Econ. 11, 83–99 (2011) 36. Tuba, E., Strumberger, I., Tuba, I., Bacanin, N., Tuba, M.: Water cycle algorithm for solving continuous p-median problem. In: proceedings of the 12th IEEE SACI 2018, Timiuoara, Romania, pp. 351–354 (2018). https://doi.org/10.1109/ SACI.2018.8441019 37. Levanova, T. V., Gnusarev. A.Y.: Simulated annealing for competitive p-median facility location problem. J. Phys. Conf. Ser. 1050, 012044 (2018) 38. Falkenauer, E.: Genetic Algorithms and Grouping Problems. Wiley, New York (1998) 39. Alp, O., Erkut, E., Drezner, Z.: An efficient genetic algorithm for the p-median problem. Ann. Oper. Res. 122, 21–42 (2003) 40. Kazakovtsev, L.A., Stupina, A.A.: Fast genetic algorithm with greedy heuristic for p-median and k-means problems. In: proceedings of 6th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp. 602–606 (2014). https://doi.org/10.1109/ICUMT.2014.7002169 41. Kalczynski, P., Brimberg, J., Drezner, Z.: Less is more: discrete starting solutions in the planar p-median problem. TOP 2021 (2021). https://doi.org/10.1007/s11750021-00599-w 42. Still, S., Bialek, W., Bottou, L.: Geometric clustering using the information bottleneck method. In: Advances In Neural Information Processing Systems, vol. 16. MIT Press (2004) 43. Sun, Z., Fox, G., Gu, W., Li, Z.: A parallel clustering method combined information bottleneck theory and centroid-based clustering. J. Supercomput. 69, 452–467 (2014) 44. Kazakovtsev, L., Rozhnov, I., Popov, A., Tovbis, E.: Self-adjusting variable neighborhood search algorithm for near-optimal k-means clustering. Computation 8(4), 90 (2020). https://doi.org/10.3390/computation8040090 45. Hosage, C.M., Goodchild, M.F.: Discrete space location-allocation solutions from genetic algorithms. Ann. Oper. Res. 6(2), 35–46 (1986) 46. Kwedlo, W., Iwanowicz, P.: Using genetic algorithm for selection of initial cluster centers for the k-means method. In: ICAISC 2010: Artifical Intelligence and Soft Computing, pp. 165–172 (2010). https://doi.org/10.1007/978-3-642-13232-220 47. Kazakovtsev, L., Shkaberina, G., Rozhnov, I., Li, R.: Kazakovtsev, V.: Genetic algorithms with the crossover-like mutation operator for the k-means problem. CCIS 1275, 350–362 (2020). https://doi.org/10.1007/978-3-030-58657-7 28 48. Kazakovtsev, L., Rozhnov, I., Shkaberina, G., Orlov, V.: K-means genetic algorithms with greedy genetic operators. Math. Prob. Eng. 2020, 8839763 (2020). https://doi.org/10.1155/2020/8839763 49. Nikolaev, A., Mladenovic, N., Todosijevic, R.: J-means and I-means for minimum sum-of-squares clustering on networks. Optim. Lett. 11, 359–376 (2017). https:// doi.org/10.1007/s11590-015-0974-4 50. Kazakovtsev, L., Rozhnov, I.: Application of algorithms with variable greedy heuristics for k-medoids problems. Informatica 44, 55–61 (2020). https://doi.org/ 10.31449/inf.v44i1.2737
200
L. Kazakovtsev et al.
51. Pizzuti, C., Procopio, N.: A k-means based genetic algorithm for data clustering. In: ´ Quinti´ Gra˜ na, M., L´ opez-Guede, J.M., Etxaniz, O., Herrero, A., an, H., Corchado, E. (eds.) SOCO/CISIS/ICEUTE -2016. AISC, vol. 527, pp. 211–222. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-47364-2 21 52. Clustering Basic Benchmark. http://cs.joensuu.fi/sipu/datasets. Accessed 15 Feb 2021 53. Dua, D., Graff. C.: UCI Machine Learning Repository 2019. http://archive.ics.uci. edu/ml. Accessed 15 Feb 2021
Continuous Reformulation of Binary Variables, Revisited Leo Liberti(B) ´ LIX CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France [email protected]
Abstract. We discuss a class of tightly feasible MILP for which branchand-bound is ineffective. We consider its hardness, evaluate the probability that randomly generated instances are feasible, and introduce a heuristic solution method based on the old idea of reformulating binary variables to continuous while introducing a linear complementarity constraint. We show the extent of the computational advantage, under a time limit, of our heuristic with respect to a state-of-the-art branch-andbound implementation. Keywords: Reformulation Hard instances
1
· Infeasible · Market split · Market share ·
Introduction
In this paper we consider the following Mixed-Integer Linear Program (MILP): ⎫ min yh ⎪ ⎪ ⎪ h≤p ⎪ ⎪ ⎬ Qy = x (1) Ax ≤ b ⎪ ⎪ L U ⎪ ⎪ x ≤ x ≤x ⎪ ⎭ y ∈ {0, 1}p , where Q = (qjh ) is an n × p real matrix, A = (aij ) is an m × n real matrix, b ∈ Rm , x is a vector of n continuous decision variables, and y is a vector of p binary decision variables. Equation (1) was brought to my attention by a colleague, as an interesting “core” of a much more complicated formulation concerning a Hydro Unit Com´ mitment (HUC) problem arising at Electricit´ e de France (EDF) [6]. Although Eq. (1) is not directly related to HUC, my colleague and her co-workers at EDF This work was partially supported by: the PGMO grant “Projet 2016-1751H”; a Siebel Energy Institute Seed Grant; the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement n. 764759 ETN “MINOA”. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 201–215, 2021. https://doi.org/10.1007/978-3-030-86433-0_14
202
L. Liberti
have identified Eq. (1) as the source of frequent feasibility issues they experienced while solving their HUC, which they discuss in [21], but along different lines than the present paper. The binary variables y in Eq. (1) model some on/off controls along a discretized time line. The controls influence (through the equations Qy = x) some physical quantities x that are constrained to lie in [xL , xU ]. The decision maker seeks the smallest number of controls that need to be switched on in order for the physical constraints on x to be feasible. According to [21], however, it is difficult to satisfy the constraints of Eq. (1), at least in the EDF instances. In particular, the authors detail the efforts of solving Eq. (1) with common MILP solution techniques, such as the Branch-and-Bound (BB) solver CPLEX [12], which would normally be considered the best solution method for such problems. At first sight, the MILP in Eq. (1) may not strike the reader as a particularly “nasty” problem, insofar as structure goes. The infeasibility issues arise because the instances solved at EDF enforce very tight bounds [xL , xU ] on x—sometimes requiring that xL = xU (which occurs in run-of-the-river reservoirs). Note that the constraints Ax ≤ b in Eq. (1) are supposed to encode “the rest of the problem” (which is quite extensive, and may involve more decision variables than just x). In a private communication [6], I was told that the infeasibilities were mostly related to the problem in Eq. (2) below, obtained as a relaxation of Eq. (1) by shedding the technical constraints. Accordingly, Eq. (2) will be our problem of interest in the rest of this paper. ⎫ yh min ⎪ ⎪ ⎪ h≤p ⎬ Qy = x (2) ⎪ xL ≤ x ≤ xU ⎪ ⎪ ⎭ y ∈ {0, 1}p . Although in this paper we focus exclusively on feasibility, the objective function is discussed in [23]. The choice of limiting our attention to Eq. (2) is also due to confidentiality issues: the EDF instances for the original problem could not be made available to me. I therefore worked with instances generated randomly from Eq. (2). This paper makes two contributions: a theoretical one about the probability of generating feasible/infeasible random instances of the problem in Eq. (2); and a computational one about a heuristic method for solving it. From the methodological point of view, we leverage an observation made in [7] about Linear Complementarity Programming (LCP) reformulations of tightly constrained MILPs in binary variables: they often (heuristically) lead to exact, or almost exact, solutions. We discuss the computational and empirical hardness of Eq. (2) and present some solution methodologies. We focus on a specific one based on an LCP reformulation, which we also test computationally. The rest of this paper is organized as follows. Section 2 provides a very short literature review. In Sect. 3 we discuss the computational complexity of our problem. In Sect. 4 we consider the difficulty of the problem in terms of the probability
Continuous Reformulation of Binary Variables, Revisited
203
of a random instance being feasible in function of the bounds [xL , xU ]. In Sect. 5 we discuss some unusual solution methods that are not based on BB.
2
Short Literature Review
As mentioned above, the original inspiration for this paper was the HUC problem, although the paper itself is not about HUC (see [23] for more information about HUC). This paper is actually about Eq. (2), specifically in the case where xL −xU 2 is small, or even zero, and the consequent infeasibility issues observed in popular MILP solvers even when instances are feasibile. As already noted, the feasibility issues we study in this paper (quite independently of HUC) have been addressed within the HUC context in [21, Sect. 5], mostly through bound constraint relaxation by means of variables whose sum is minimized. When xL = xU , and without an objective function, Eq. (2) reduces to Eq. (3) (see below), which is a well-known case of difficult Binary Linear Program (BLP), i.e. the so-called market split [5] (a.k.a. “market share”). In the literature, pure feasibility BLPs such as Eq. (3) have been solved by means of a basis reduction algorithm proposed in [1], which also targets its “natural” optimization version minimizing a sum of slack variables added to the equations. Solving Eq. (3) with an arbitrary objective function is outside of the scope of the basis reduction algorithm of [2] (used in [1]), although of course bisection search of the feasibility algorithm of [2] is always available. We noted that our purpose in this paper is to solve Eq. (2) by means of an LCP reformulation. This is more unusual than the converse (i.e. solving LCPs by means of a MILP reformulations), since state-of-the-art solver technology is better for MILP than for nonconvex Nonlinear Programming (NLP), which includes LCP. A MILP reformulation of the LCP was presented in [19, Thm. 3.1]. This reformulation has been used many times in the literature, and is now part of the Mathematical Programming (MP) “folklore”. Reformulating MILPs to LCPs is more unusual but of course not unheard of: as already stated, we took our methodological inspiration from [7], which p reformulates a binary variable vector y ∈ {0, 1} exactly by means of the addition of a nonlinear constraint h≤p yh (1 − yh ) = 0 to the formulation. This is one of the oldest tricks in MP: to name a few citations, [18,20] add h≤p yh (1 − yh ) as apenalty to the objective, while [17] proposes an alternating heuristic based on h≤p yh (1 − yh ) ≤ δ, where δ is reduced at each iteration.
3
Hardness
Just how hard is the problem in Eq. (2)? We consider the following set of linear equations in binary variables y ∈ {0, 1}p : qjh yh = x ¯j , (3) ∀j ≤ n h≤p
204
L. Liberti
where x ¯ = xL = xU . We assume Q, x ¯ are rational. We stress that fixing the variables x to a fixed constant x ¯ appears to be an important practical case [6] in the motivating application. First, we consider the case n = 1, i.e. (3) consists of just one equation. Then Eq. (3) is a rational equivalent of the Subset Sum problem [9, §SP13], with instance given by (Q, x ¯). The fact that Q, x ¯ are rational obviously does not change the problem: it suffices to rescale all data by the minimum common multiple of all the denominators. This problem is weakly NP-complete: it can be solved in pseudopolynomial time by a dynamic programming algorithm [9]. The Wikipedia entry for Subset Sum also states that the difficulty of solving this problem depends on the number of variables p and the number of bits necessary to encode Q: if either is fixed, the problem becomes tractable. It is well known that Integer Linear Programs (ILP) with a fixed number of variables p can be solved in polynomial time [15]. In this paper, we are interested in the case where p is not fixed, whereas n might be fixed or not. The case with n fixed is relevant for the motivating U application, as the operational points xj on which the constraints xj ∈ [xL j , xj ] are verified could be given by the (fixed) physical properties of the considered hydro valley. We are not aware of results in ILP complexity with a fixed number of constraints but non-fixed number of variables, however. If neither n nor p is fixed, we note that there is a natural reduction from sat to a version of Eq. (3) where Q has entries in {−1, 0, 1}, which shows that solving Eq. (3) is strongly NP-complete (by inclusion with respect to Q). Again by inclusion (with respect to xL , xU ), Eq. (2) is also NP-complete. Equation (2) is one of those cases when complexity proofs by inclusion are not quite satisfactory. The empirical hardness of solving Eq. (2) obviously decreases as the bounds [xL , xU ] grow farther apart (if xL = −∞ and xU = ∞ any solution of Eq. (2) is trivially feasible). A more convincing complexity proof should take L the width parameter W = maxj≤n (xU j − xj ) into account, too. In Sect. 4 below we attempt to provide a more appropriate hardness measure in terms of the probability of achieving feasibility in a randomly generated instance.
4
Likelihood of Approximate Feasibility
In this section we consider the probability that uniformly sampled instances of Eq. (3) and Eq. (2) are (almost) feasible. 4.1
The Irwin-Hall Distribution
Consider Eq. (3). For each j ≤ n and h ≤ p we assume that qjh is sampled from ˜ jh uniformly distributed in [0, 1]. For a given y ∈ {0, 1}p , a random variable Q let ˆj = ˜ jh . Q Q h≤p yh =1
Continuous Reformulation of Binary Variables, Revisited
205
We intend to derive an expression, which depends on xL and xU , of the probability that a uniformly sampled instance of Eq. (2) is feasible. To do that, we first look at the case x ¯ = xL = xU and tackle instances where the cardinality of the support of y is fixed to a given integer K, i.e. h≤p yh = K. For all j ≤ n, the corresponding random variable is ˜ jh , ˆK Q Q j = h≤p h yh =K
i.e. the sum of K i.i.d. uniform random variables on [0, 1]. ˆ K is known as the Irwin-Hall For any given j ≤ n, the distribution of Q j distribution [11]. Its mean is K/2 and its variance is K/12. It can also be shown ˆ K attains a that for K > 1 the probability distribution function (PDF) of Q j strict (local) maximum at the mean K/2. The cumulative distribution function (CDF) is x K 1 FK (x) = (−1)k (4) (x − k)K . k K! k=0
ˆ K taking values between xL and By Eq. (4), for any j ≤ n the probability of Q j j U L K L U K xU j is FK (xj ) − FK (xj ), a quantity we shall denote by γj (x , x ), or γj when no ambiguity can ensue. 4.2
Feasibility for n = 1
ˆ K would allow us to glance If we fix n = 1, understanding the distribution of Q 1 at some uniformly generated input data, (Q, x ¯), and get an idea of the likelihood of a given binary vector y with K nonzeroes yielding the given value x ¯1 . We first want to make a qualitative statement to the effect that instances U where [xL 1 , x1 ] contains the mean K/2 are likely to be easier than those that do not. Lemma 4.1. There exists a value 0 < r < K such that, from r units away from ˆ K converge to the mean, the tails of the probability density function (PDF) of Q 1 zero exponentially fast. ˜ 1h is uniformly Proof. First, we argue about the right tail. Trivially, since Q ˆ K ≥ K) = 0. We now have to argue sampled in [0, 1] for each h ≤ p, we have P(Q 1 the negative exponential convergence in some sub-range of [K/2, K] including the right end-point K. By [14, Thm. 7.5], we have K 2r KK ˆ + r ≤ e− 2 g( K ) P Q≥ (5) 2 for any r > 0, where g(u) = (1 + u) ln(1 + u) − u for u ≥ 0. The exponent in the RHS of Eq. (5) is negative as long as: 2r 2r 2r 1+ ln 1 + > . K K K
206
L. Liberti
Trivially, we have that, for all v > e, v ln v > v. So, if 1 + 2r K > e, it follows that (1 + 2r/K) ln(1 + 2r/K) > 1 + 2r/K, which is obviously strictly greater than 2r/K. We therefore need r > K(e − 1)/2 for the statement to hold, as claimed. The argument for the left tail follows by symmetry of the PDF, which can be established by considering the distribution of the sum of K uniform random variables over [−1, 0]. ˆ K is concentrated around K . This By Lemma 4.1, the measure of the PDF of Q 1 2 makes it reasonable to expect that instances will be easier as x ¯ moves towards K 2 . Quantitative statements about the probability of generating feasibile U instances can be obtained for given values of K and xL 1 , x1 by evaluating K L U γ1 (x , x ). 4.3
Feasibility in the General Case
We now move back to the general case with n > 1 and bounds xL , xU ∈ Rn on x as in Eq. (2). Recall that for all j ≤ n we defined L U K ˆK P(∃y ∈ {0, 1}p Q j ∈ [xj , xj ] | 1y = K) = γj .
(6)
Proposition 4.2. The probability that a uniformly sampled instance of Eq. (2) is feasible is: p
K ˜ ∈ [xL , xU ] ≤ 1 P ∃y ∈ {0, 1}p s.t.Qy (1 − γ ) . (7) 1 − j K 2p K≤p
j≤n
Proof. The probability that there exists y with support cardinality K that satisfies all of the constraints is ˆ K ∈ [xL , xU ] | 1y = K) P(∃y ∈ {0, 1}p Q ˆ K ∈ [xL , xU ] | 1y = K) = 1 − P(∀y ∈ {0, 1}p Q p ˆ K ∈ [xL , xU ] | 1y = K) = 1 − P(∀y ∈ {0, 1} ∃j ≤ n Q j j j p ˆK L ≤ 1 − P(∃j ≤ n ∀y ∈ {0, 1} Q ∈ [x , xU ] | 1y = K) j
j
j
p
U ˆK [xL ≤ 1 − P(∀j ≤ n ∀y ∈ {0, 1} Q j ∈ j , xj ] | 1y = K)
ˆ K ∈ [xL , xU ] | 1y = K ∀y ∈ {0, 1}p Q =1−P j j j j≤n
=1−
ˆ K ∈ [xL , xU ] | 1y = K) P(∀y ∈ {0, 1}p Q j j j
j≤n
=1−
L U ˆK (1 − P(¬∀y ∈ {0, 1}p Q j ∈ [xj , xj ] | 1y = K))
j≤n
=1−
L U ˆK (1 − P(∃y ∈ {0, 1}p Q j ∈ [xj , xj ] | 1y = K))
j≤n
=1−
(1 − γjK ),
j≤n
Continuous Reformulation of Binary Variables, Revisited
207
where the last equality follows by Eq. (6). We can take the union over all possible values of K ∈ {0, . . . p} by weighing each probability by the probability that a uniformly sampled binary vector y should have support cardinality K, p /2p , which yields Eq. (7), as claimed. i.e. K For example, if xL = 0 and xU =p, then obviously γK (j) = 1 for each K ≤ p p 1 and j ≤ n, which yields 2p k = 1, as expected. Setting different values of K≤p
xL , xU , Proposition 4.2 allows the computation of the probability of feasibility of the corresponding instance.
5
Solution Methods
Since Eq. (2) is a MILP, the solution method of choice is the Branch-and-Bound, implemented for example using a state-of-the-art solver such as CPLEX [12]. As mentioned in Sect. 1, however, given the difficulty in finding feasible solutions, CPLEX can rarely prune by bound, which means that it shows the brunt of its exponential behaviour. In this section we discuss three heuristic methods, and argue that the method in Sect. 5.1 is the most promising. 5.1
Relaxation of Integrality Constraints
Every BLP
min{c y | By ≤ β ∧ y ∈ {0, 1}p }
(8)
can be exactly reformulated to an LCP by replacing the integrality constraints y ∈ {0, 1}p by the following:
0≤y≤1 z =1−y
(9) (10)
yh zh = 0.
(11)
h≤p
By Eq. (9)–(10) we have 0 ≤ z ≤ 1, which implies that every product in Eq. (11) is non-negative, which in turn means that the sum is non-negative. Thus, the sum is zero if and only if every term is zero, so either yh = 0 or zh = 1 − yh = 0, i.e. yh ∈ {0, 1} for each h ≤ p. An equivalent NLP removes the z variables altogether: min c y | By ≤ β ∧ yh (1 − yh ) = 0 . (12) h≤p
This provides the following empirically efficient heuristic algorithm for our problem P in Eq. (2): 1. consider the continuous relaxation P¯ of P ; 2. solve P¯ in polynomial time, e.g. by the interior point method;
208
L. Liberti
3. use the solution x , y as a starting point for a local NLP solver on the problem yh | Qy = x ∧ x ∈ [xL , xU ] ∧ yh (1 − yh ) = 0 . (13) min h≤p
h≤p
We remark that only Step 3 is crucial: randomly sampling the starting point is also a valid choice, as evidenced by the reasonable success of Multi-Start (MS) methods for global optimization [16]. In practice, we employ a slack variable s ≥ 0 on the linear complementarity constraint Eq. (11) in Eq. (13): ⎫ yh + ηs min ⎪ ⎪ ⎪ h≤p ⎪ ⎪ ⎪ ⎪ ∀j ≤ n qjh yh = x ⎪ ⎬ h≤p (14) yh (1 − yh ) = s ⎪ ⎪ ⎪ h≤p ⎪ ⎪ x ∈ [xL , xU ] ⎪ ⎪ ⎪ ⎭ s ≥ 0, where η > 0 is a scaling coefficient chosen empirically. The interest of Eq. (14) is that it always provides a solution: even when s > 0 (and hence Eq. (11) is not satisfied), we can always hope that the (fractional) y variables will be close enough to integrality that a rounding will yield a feasible solution. As shown in Sect. 6, we consider the solution (x∗ , y ∗ ) where y ∗ is obtained by rounding the y variables, and x∗ are re-computed as Qy ∗ according to Eq. (3). 5.2
The Case of Fixed n
When n is fixed, we can exploit Barvinok’s polynomial-time algorithm for systems with a fixed number of homogeneous quadratic equations [3]. We consider the case where x ¯ = xL = xU (Eq. (3)), together with the constraint h yh (yh − 1) = 0, which we homogenize by adding a new variable z and noting that yh2 = yh for each h ≤ p (since y ∈ {0, 1}p ): qjh yh2 = x ¯j z 2 (15) ∀j ≤ n h≤p
h≤p
yh2 =
yh z.
(16)
h≤p
We remark that all variables y, z are now continuous and unconstrained, but if we achieve a feasible solution with y ≥ 0 then Eq. (16) willnecessarily imply y ∈ {0, 1} and z = 1. We deal with the objective function min h yh by replacing each yh with yh2 , to obtain 2 yh | Eq. (15) − (16) , min h≤p
Continuous Reformulation of Binary Variables, Revisited
209
which we can solve by a bisection method on the objective function value using Barvinok’s algorithm as a feasibility oracle. This requires the homogenization of the equation h yh2 = c, where c is a constant varied by the bisection method. The system passed to Barvinok’s algorithm is therefore yh2 = cz 2 ∧ Eq. (15) − (16). h≤p
After each call to Barvinok’s algorithm, the condition y ≥ 0 must be verified. If it does not hold, then the heuristic stops inconclusively. Otherwise it identifies the optimum of Eq. (2) up to a given > 0 in time O(log(p/)pn ), polynomial when n is fixed. We chose not to implement this heuristic, for three reasons: (i) as given in [3], Barvinok’s algorithm determines whether system of homogeneous quadratic equations have a common root or not, but will not find the root; (ii) it might not be very efficient in case n is fixed but not extremely small; (iii) the derived heuristic does not appear to provide any useful information in case of failure. 5.3
Relaxing the [0, 1] Bounds
¯ have We again assume x ¯ = xL = xU as in Eq. (3). We also assume that Q and x integer components, and relax the y variables to take any integer value rather than just binary. This setting opens up algorithmic opportunities from the field of linear Diophantine equations. There exist appropriately sized unimodular matrices L, R such that LQR is a diagonal matrix D where each nonzero diagonal entry divides the next nonzero diagonal entry [13, Thm. 1]. The system Qy = x ¯ (with y ∈ Zp ) can therefore be decomposed into Dz = L¯ x and y = Rz. Now, assuming one can find L, R, solving Dz = L¯ x is easy since D is diagonal, and then y can be computed from Rz. There are algorithms for computing L, R that work in a cubic number of steps in function of n, p [4]. If we step back to the specificities of the real application, Q consists of floating point numbers given to a precision of at least 10−6 : rescaling would likely yield large integer coefficients, which would probably make the application of solution algorithms for linear Diophantine equations unwieldy. On the other hand, a systematic treatment of classical results about linear Diophantine equations with rational coefficients and unbounded variables is given in [22, Ch. 4]. Imposing bounds on the variables, however, makes the problem hard again [2]. We therefore decided not to implement this method.
6
Computational Results
We implemented and tested a MS heuristic defined as follows: 1. sample a starting point x ∈ Rn , y ∈ {0, 1}p ; 2. deploy a local NLP solver from the initial point (x , y ) on Eq. (14), get y ∗ ;
210
L. Liberti
3. round the solution y ∗ and compute x∗ = Qy ∗ ; 4. if the result (x∗ , y ∗ ) has a better objective function value than the incumbent (i.e. the best result found so far), update it; 5. repeat until a termination condition holds. Our tests aim at verifying whether the above algorithm is competitive compared with a BB-based solver. To this goal, we randomly generated two sets of instances ¯ ∈ Rn , over the following parameters: I1 , I2 of Eq. (2), depending on a vector x – n ∈ {1, 2, 5, 10, 50, 99}; – p ∈ {10, 50, 100, 500, 999}; – all components of Q uniformly sampled from either [0, 1] (if distr = 0) or [−1, 1] (if distr = 1); ¯ − θ and xU = x ¯ + θ for θ ∈ {0, 0.05, 0.1}, and x ¯ chosen as detailed – xL = x below depending on I1 or I2 . The first set I1 contains instances that are feasible by construction. This is achieved by defining x ¯ as follows: 1. sample y ∈ {0, 1}p from p Bernoulli distributions with probability p implies that the average value of K is 2 ); 2. for all j ≤ n compute x ¯j = h≤p Qjh yh .
1 2
(which
The second set I2 contains instances that are infeasible with some probability: this is achieved by sampling each component of x ¯ from a uniform distribution on [0, Q] (if distr = 0) or [−Q, Q] (if distr = 1), where Q = h≤p |Qjh |. We attempted to compute the probability of infeasibility of the instances in I2 by means of Proposition 4.2, but our naive implementation (based on the AMPL modelling language [8]) was only able to compute the probability of the smallest instances in the set, i.e. those with p = 10 (for any n, distr, θ). Those instances with θ = 0 obviously yield zero probability by definition of γjK . The rest are reported in Table 1. We then solved the instances in I1 , I2 by running the BB solver CPLEX 12.6.3 [12] and the MS heuristic above on a 4-CPU Intel Xeon X3220 at 2.4 GHz with 8 GB RAM running Linux. We chose SNOPT 7.2 as local NLP solver [10] in the MS heuristic. We gave CPLEX the default parameters but set the time limit to 180s of “wall-clock” time (we recall that CPLEX is a parallel solver by default, so the CPU time measured in the experiments is not limited by 180s but by the actual user CPU time spent, as reported by the operating system). The performance of BB solvers on possibly infeasible instances is severely impaired by a lack of feasible solution because no node is ever pruned by bound; the overall effect is more similar to a complete enumeration than to the typically “smart” enumeration carried out by such solvers. To avoid penalizing CPLEX for this reason, we employed a reformulation of Eq. (2) which shifts all the
Table 1. The smallest (probably) “infeasible instances” of set I2 have low probability of being feasible (computed using Proposition 4.2).

n    distr  θ     probability
1    0      0.05  0.0000000
1    0      0.10  0.0000000
1    1      0.05  0.0000000
1    1      0.10  0.0002403
2    0      0.05  0.0002937
2    0      0.10  0.0033951
2    1      0.05  0.0001573
2    1      0.10  0.0001573
5    0      0.05  0.0024276
5    0      0.10  0.0044601
5    1      0.05  0.0020171
5    1      0.10  0.0007822
10   0      0.05  0.0030972
10   0      0.10  0.0060456
10   1      0.05  0.0015705
10   1      0.10  0.0038300
50   0      0.05  0.0034620
50   0      0.10  0.0070301
50   1      0.05  0.0023307
50   1      0.10  0.0034504
99   0      0.05  0.0038649
99   0      0.10  0.0069426
99   1      0.05  0.0014419
99   1      0.10  0.0029586
Average            0.0024981
infeasibility into slack variables:

  min_{x,y,s}  Σ_{j≤n} (s_j^+ + s_j^-)
  s.t.   Σ_{h≤p} q_{jh} y_h = x_j + s_j^+ − s_j^-    ∀ j ≤ n          (17)
         x^L ≤ x ≤ x^U
         s^+, s^- ≥ 0
         y ∈ {0, 1}^p,

and carried out the same modification on Eq. (14).
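To make the role of the slack variables concrete, here is a minimal sketch of model (17) in PuLP. It is not the authors' implementation (the experiments use AMPL, CPLEX and SNOPT); the function name and data layout are ours, and the y variables are kept binary as in (17).

```python
import pulp

def build_slack_model(Q, xL, xU):
    """Sketch of the slack reformulation (17): minimise total infeasibility."""
    n, p = len(Q), len(Q[0])
    prob = pulp.LpProblem("slack_reformulation", pulp.LpMinimize)
    y = [pulp.LpVariable(f"y_{h}", cat="Binary") for h in range(p)]
    x = [pulp.LpVariable(f"x_{j}", lowBound=xL[j], upBound=xU[j]) for j in range(n)]
    sp = [pulp.LpVariable(f"sp_{j}", lowBound=0) for j in range(n)]
    sm = [pulp.LpVariable(f"sm_{j}", lowBound=0) for j in range(n)]
    # objective: sum of the positive and negative slacks
    prob += pulp.lpSum(sp) + pulp.lpSum(sm)
    for j in range(n):
        # row j of Qy must match x_j up to the slack pair
        prob += pulp.lpSum(Q[j][h] * y[h] for h in range(p)) == x[j] + sp[j] - sm[j]
    return prob, x, y
```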
The only configurable parameter of the MS heuristic is the maximum allowed number T of iterations, which we set to p (the same as the number of binary variables) in order to have the effort depend on the size of the instance. The results for the feasible instance set I1 are presented in Table 2. We report the instance details (n, p, the distribution type distr, the [x^L, x^U] range half-width θ), whether each method found a feasible solution (feas ∈ {0, 1}), the sum of the infeasibilities w.r.t. Eq. (3), conerr, computed as
  conerr = Σ_{j≤n} [ max(0, x_j^* − x_j^U) + max(0, x_j^L − x_j^*) ],
and the CPU times (CPU). An instance is classified as feasible (feas = 1) if conerr < 10^−8 (with also, obviously, y^* ∈ {0, 1}^p).

Table 2. Comparative detailed results on the feasible instance set I1 with T = p MS iterations (per-instance columns: n, p, distr, θ, and feas, conerr, CPU for both BB and MS).
Table 3. Sums, averages and standard deviations for the instance set I1 (Table 2).

        feas             conerr             CPU
        BB      MS       BB        MS       BB      MS
sum     47      59       2270.01   934.47   23254   32600
avg     0.261   0.328    12.61     5.19     129.2   181.1
stdev   0.44    0.47     36.95     14.78    209.4   610.3
We remark that we report 6 decimal digits in Table 2 but the computations have been carried out at the machine floating point precision. We report sum, averages and standard deviations for feas, conerr, CPU in Table 3. For lack of space we only report sum, average and standard deviations (rather than complete results) for the results on the possibly infeasible instance set I2. It is clear from Tables 3–4 that continuous optimization techniques are extremely beneficial in finding feasible solutions to tightly constrained MILPs.

Table 4. Sums, averages and standard deviations for the instance set I2.

        feas             conerr                    CPU
        BB      MS       BB          MS            BB       MS
sum     0       4        275542.20   274858.03     211.44   5985.9
avg     0       0.022    1530.79     1526.99       1.17     33.26
stdev   0       0.148    3827.90     3831.38       8.93     73.71
One might question whether the superior performance of the MS heuristic is due to the higher CPU time effort of MS w.r.t. CPLEX. To ascertain this, we re-ran the experiments with the CPU time of the MS heuristic capped at 3 min. Table 5 reports on the sums, averages and variances of results obtained on both I1 and I2 this way. We remark that the results on I1 in Table 5 are actually better than those in Table 3. This is just due to the stochastic nature of the MS heuristic, and to the fact that good solutions are identified early on in the search. Overall, we conclude that the CPU time taken by MS is not the reason for its advantage w.r.t. BB.
Table 5. Sums, averages and std. dev. for MS on I1, I2 capped at 180 s CPU time.

        Feasible (set I1)              Possibly infeas. (set I2)
        feas    conerr    CPU          feas    conerr       CPU
sum     81      916.08    8395.9       6       274834.36    3916.4
avg     0.45    5.09      46.64        0.03    1526.86      21.758
stdev   0.50    15.33     93.22        0.18    3831.52      37.88

7
Conclusion
We explored the old idea of replacing binary variables by continuous ones bounded by [0, 1] and a linear complementarity constraint. For tightly constrained MILPs similar to market share instances, we show that a simple multistart approach is superior (under a time limit) to a state-of-the-art branch-and-bound solver.
References
1. Aardal, K., Bixby, R.E., Hurkens, C.A.J., Lenstra, A.K., Smeltink, J.W.: Market split and basis reduction: towards a solution of the Cornuéjols-Dawande instances. In: Cornuéjols, G., Burkard, R.E., Woeginger, G.J. (eds.) IPCO 1999. LNCS, vol. 1610, pp. 1–16. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48777-8_1
2. Aardal, K., Hurkens, C., Lenstra, A.: Solving a system of linear Diophantine equations with lower and upper bounds on the variables. Math. Oper. Res. 25(3), 427–442 (2000)
3. Barvinok, A.I.: Feasibility testing for systems of real quadratic equations. Discrete Comput. Geom. 10(1), 1–13 (1993). https://doi.org/10.1007/BF02573959
4. Bradley, G.: Algorithms for Hermite and Smith normal matrices and linear Diophantine equations. Math. Comput. 25(116), 897–907 (1971)
5. Cornuéjols, G., Dawande, M.: A class of hard small 0–1 programs. In: Bixby, R.E., Boyd, E.A., Ríos-Mercado, R.Z. (eds.) IPCO 1998. LNCS, vol. 1412, pp. 284–293. Springer, Heidelberg (1998). https://doi.org/10.1007/3-540-69346-7_22
6. D'Ambrosio, C.: Personal communication (2017)
7. Di Giacomo, L., Patrizi, G., Argento, E.: Linear complementarity as a general solution method to combinatorial problems. INFORMS J. Comput. 19(1), 73–79 (2007)
8. Fourer, R., Gay, D.: The AMPL Book. Duxbury Press, Pacific Grove (2002)
9. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman and Company, New York (1979)
10. Gill, P.: User's guide for SNOPT version 7.2. Systems Optimization Laboratory, Stanford University, California (2006)
11. Hall, P.: The distribution of means for samples of size n drawn from a population in which the variate takes values between 0 and 1, all such values being equally probable. Biometrika 19(3/4), 240–245 (1927)
12. IBM: ILOG CPLEX 12.10 User's Manual. IBM (2020)
13. Lazebnik, F.: On systems of linear Diophantine equations. Math. Mag. 69(4), 261–266 (1996)
14. Ledoux, M.: The Concentration of Measure Phenomenon. Number 89 in Mathematical Surveys and Monographs. AMS, Providence (2005)
15. Lenstra, H.: Integer programming with a fixed number of variables. Math. Oper. Res. 8(4), 538–548 (1983)
16. Locatelli, M., Schoen, F.: Random linkage: a family of acceptance/rejection algorithms for global optimization. Math. Program. 85(2), 379–396 (1999)
17. Lopez, C., Beasley, J.: A note on solving MINLP's using formulation space search. Optim. Lett. 8, 1167–1182 (2014)
18. Murray, W., Ng, K.-M.: An algorithm for nonlinear optimization problems with binary variables. Comput. Optim. Appl. 47, 257–288 (2010)
19. Pardalos, P., Rosen, J.: Global optimization approach to the linear complementarity problem. SIAM J. Sci. Stat. Comput. 9(2), 341–353 (1988)
20. Raghavachari, M.: On connections between zero-one integer programming and concave programming under linear constraints. Oper. Res. 17(4), 680–684 (1969)
21. Sahraoui, Y., Bendotti, P., D'Ambrosio, C.: Real-world hydro-power unit-commitment: dealing with numerical errors and feasibility issues. Energy 184, 91–104 (2019)
22. Schrijver, A.: Theory of Linear and Integer Programming. Wiley, Chichester (1986)
23. Taktak, R., D'Ambrosio, C.: An overview on mathematical programming approaches for the deterministic unit commitment problem in hydro valleys. Energy Syst. 8(1), 57–79 (2016). https://doi.org/10.1007/s12667-015-0189-x
An Iterative ILP Approach for Constructing a Hamiltonian Decomposition of a Regular Multigraph

Andrey Kostenko and Andrei Nikolaev(B)
P. G. Demidov Yaroslavl State University, 14 Sovetskaya Street, 150003 Yaroslavl, Russia

Abstract. A Hamiltonian decomposition of a regular graph is a partition of its edge set into Hamiltonian cycles. The problem of finding edge-disjoint Hamiltonian cycles in a given regular graph has many applications in combinatorial optimization and operations research. Our motivation for this problem comes from the field of polyhedral combinatorics, as a sufficient condition for vertex nonadjacency in the 1-skeleton of the traveling salesperson polytope can be formulated as the Hamiltonian decomposition problem in a 4-regular multigraph with one forbidden decomposition. In our approach, the algorithm starts by solving the relaxed 2-matching problem, then iteratively generates subtour elimination constraints for all subtours in the solution and solves the corresponding ILP model to optimality. The procedure is enhanced by a local search heuristic based on chain edge fixing and cycle merging operations. In the computational experiments, the iterative ILP algorithm showed results comparable with the previously known heuristics on undirected multigraphs and significantly better performance on directed multigraphs.

Keywords: Hamiltonian decomposition · Traveling salesperson polytope · 1-skeleton · Vertex adjacency · Integer linear programming · Subtour elimination constraints · Local search
1
Introduction
A Hamiltonian decomposition of a regular graph is a partition of its edge set into Hamiltonian cycles. The problem of finding edge-disjoint Hamiltonian cycles in a given regular graph has different applications in combinatorial optimization [21], coding theory [4,5], privacy-preserving distributed mining algorithms [10], analysis of interconnection networks [17] and other areas. Our motivation for this problem comes from the field of polyhedral combinatorics. (This research was supported by P.G. Demidov Yaroslavl State University Project VIP016.) We consider a classic traveling salesperson problem: given a complete weighted graph (or digraph) Kn = (V, E), it is required to find a Hamiltonian cycle of minimum weight. We denote by HCn the set of all Hamiltonian cycles in
Kn. With each Hamiltonian cycle x ∈ HCn we associate a characteristic vector x^v ∈ R^E by the following rule:

  x^v_e = 1, if the cycle x contains an edge e ∈ E,
  x^v_e = 0, otherwise.

The polytope
TSP(n) = conv{xv | x ∈ HCn }
is called the symmetric traveling salesperson polytope. The asymmetric traveling salesperson polytope ATSP(n) is defined similarly as the convex hull of characteristic vectors of all possible Hamiltonian cycles in the complete digraph Kn . The 1-skeleton of a polytope P is the graph whose vertex set is the vertex set of P and edge set is the set of one-dimensional faces of P . The study of 1-skeleton is of interest, since, on the one hand, some algorithms for perfect matching, set covering, independent set, a ranking of objects, and problems with fuzzy measures are based on the vertex adjacency relation in 1-skeleton and the local search technique (see, for example, [1,6,11,12]). On the other hand, such characteristics of 1-skeleton as the diameter and clique number, estimate the time complexity for different computation models and classes of algorithms [7,9,16]. Unfortunately, the classic result by Papadimitriou stands in the way of studying the 1-skeleton of the traveling salesperson polytope. Theorem 1 (Papadimitriou [24]). The question of whether two vertices of the polytopes TSP(n) or ATSP(n) are nonadjacent is NP-complete.
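A characteristic vector as defined above is straightforward to materialize. The sketch below is our own helper (not from the paper); it builds the 0/1 edge-incidence vector of a tour over a fixed ordering of the edges of Kn.

```python
from itertools import combinations

def characteristic_vector(tour):
    """0/1 edge-incidence vector of a Hamiltonian cycle given as a vertex order."""
    n = len(tour)
    cycle_edges = {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}
    edges = [frozenset(e) for e in combinations(range(n), 2)]  # fixed edge order of K_n
    return [1 if e in cycle_edges else 0 for e in edges]

xv = characteristic_vector([0, 2, 4, 1, 3, 5])   # a tour on 6 vertices
```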
2
Formulation of the Problem
We consider a sufficient condition for vertex nonadjacency in 1-skeleton of the traveling salesperson polytope by Rao [27]. Let x = (V, E(x)) and y = (V, E(y)) be two Hamiltonian cycles on the vertex set V . We denote by x ∪ y a multigraph (V, E(x) ∪ E(y)) that contains all edges of both cycles x and y. Lemma 1 (Rao [27]). Given two Hamiltonian cycles x and y, if the multigraph x ∪ y contains a Hamiltonian decomposition into edge-disjoint cycles z and w different from x and y, then the corresponding vertices xv and y v of the polytope TSP(n) (or ATSP(n)) are not adjacent. From a geometric point of view, the sufficient condition means that the segment connecting two vertices xv and y v intersects with the segment connecting two other vertices z v and wv of the polytope TSP(n) (or ATSP(n) correspondingly). Thus, the vertices xv and y v cannot be adjacent. An example of a satisfied sufficient condition is shown in Fig. 1 (see also [23]). We formulate the sufficient condition for vertex nonadjacency of the traveling salesperson polytope as a combinatorial problem.
Fig. 1. The multigraph x ∪ y has two different Hamiltonian decompositions
Hamiltonian Decomposition with One Forbidden Decomposition Instance. Let x and y be two Hamiltonian cycles. Question. Does the multigraph x∪y contain a pair of edge-disjoint Hamiltonian cycles z and w different from x and y? Thus, we consider a version of a Hamiltonian decomposition problem of a special form. By construction, the union multigraph x ∪ y always contains the Hamiltonian decomposition into x and y, but in our case, this decomposition is forbidden, and we need to find another decomposition if it exists. Note that testing of whether a graph has a Hamiltonian decomposition is NP-complete, even for 4-regular undirected graphs and 2-regular directed graphs [25]. Therefore, instead of Rao’s sufficient condition, various polynomially solvable special cases of the vertex nonadjacency problem have been studied in the literature. In particular, the polynomial sufficient conditions for the pyramidal tours [8], pyramidal tours with step-backs [22], and pedigrees [2,3] are known. However, all of them are weaker than the sufficient condition by Rao. The Hamiltonian decomposition problem of the considered form was introduced in [20] and later studied in [23], where two heuristic algorithms were proposed. The set of feasible solutions in both algorithms consists of all possible decompositions of the multigraph x ∪ y into edge-disjoint 2-factors z and w. Recall that a 2-factor (or a perfect 2-matching) of a graph G is a subset of edges of G such that every vertex is incident with exactly two edges. The differences are as follows: – the simulated annealing algorithm from [20] repeatedly finds 2-factors z and w through the reduction to random perfect matching [28] until it obtains a pair of Hamiltonian cycles different from x and y; – the general variable neighborhood search algorithm from [23] adds to the previous algorithm several neighborhood structures and cycle merging operations combined in the basic variable neighborhood descent approach [14].
Heuristic algorithms have proven to be very efficient on instances with an existing solution, especially on undirected graphs. However, on instances without a solution, the heuristics face significant difficulties. In this paper, we propose two exact algorithms for solving the Hamiltonian decomposition problem with a forbidden decomposition and verifying vertex nonadjacency of the traveling salesperson polytope. The first algorithm iteratively generates ILP models for the problem. The second algorithm combines the first one and the modified local search heuristic from [23].
3
Iterative Integer Linear Programming
Let x = (V, E(x)), y = (V, E(y)), x ∪ y = (V, E = E(x) ∪ E(y)). With each edge e ∈ E we associate the variable

  x_e = 1, if e ∈ z,
  x_e = 0, if e ∈ w.

We adapt the classic ILP formulation of the traveling salesperson problem by Dantzig, Fulkerson and Johnson [13] into the following ILP model for the considered Hamiltonian decomposition problem:

  Σ_{e∈E} x_e = |V|,                                        (1)
  Σ_{e∈E_v} x_e = 2,                  ∀v ∈ V,               (2)
  Σ_{e∈E(x)\E(y)} x_e ≤ |V| − |E(x) ∩ E(y)| − 2,            (3)
  Σ_{e∈E(y)\E(x)} x_e ≤ |V| − |E(x) ∩ E(y)| − 2,            (4)
  Σ_{e∈E_S} x_e ≤ |S| − 1,            ∀S ⊂ V,               (5)
  Σ_{e∈E_S} x_e ≥ |E_S| − |S| + 1,    ∀S ⊂ V,               (6)
  x_e ∈ {0, 1},                       ∀e ∈ E.               (7)
In the following, we elaborate on the model. The multigraph x ∪ y contains 2|V| edges. The constraint (1) guarantees that both components z (x_e = 1) and w (x_e = 0) receive exactly |V| edges. We denote by E_v the set of all edges incident to the vertex v in x ∪ y. By the vertex degree constraint (2), the degree of each vertex in z and w is equal to 2. The vertex degree constraint for directed graphs is slightly different. For some vertex v ∈ V, let e1 and e2 be its two incoming edges, and u1 and u2 its two outgoing edges. Then the constraints (2) take the form:
  x_e1 + x_e2 = 1,
  x_u1 + x_u2 = 1,

with exactly one incoming edge and one outgoing edge for each vertex in the solution.
The constraints (3)–(4) forbid the Hamiltonian cycles x and y as a solution. If we consider a general Hamiltonian decomposition problem of a 4-regular multigraph without reference to the vertex adjacency in the traveling salesperson polytope, then these constraints can be omitted.
Finally, the inequalities (5)–(6) are known as the subtour elimination constraints (SEC), which forbid solutions consisting of several disconnected tours. Here S is a subset of V, and E_S is the set of all edges from E with both vertices belonging to S:

  E_S = {(u, v) ∈ E : u, v ∈ S}.

The main problem with the subtour elimination constraints is that there are exponentially many of them: two for each subset S ⊂ V, i.e. Ω(2^|V|). Therefore, the idea of the first algorithm is as follows. We start with the relaxed model (1)–(4), (7) of the basic 2-matching problem with O(V) constraints. By an ILP-solver we obtain an integer point that corresponds to the pair of 2-factors z and w. Then we find all subtours in z and w, add the corresponding subtour elimination constraints (5) and (6) into the model, and iteratively repeat this procedure. The algorithm stops either by finding the Hamiltonian decomposition into cycles z and w or by obtaining an infeasible model that does not contain any integer points. This approach is inspired by the algorithm for the traveling salesperson problem from [26] and is summarized in Algorithm 1.

Algorithm 1. Iterative integer linear programming algorithm
procedure IterativeILP(x ∪ y)
  Define current model as (1)–(4), (7)            ▷ relaxed 2-matching problem
  while the model is feasible do
    z, w ← an integer point of the current model by an ILP-solver
    if z and w is a Hamiltonian decomposition then
      return Hamiltonian decomposition z and w
    end if
    For all subtours in z and w add the SEC (5) and (6) into the model
  end while
  return Hamiltonian decomposition does not exist
end procedure
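The following Python sketch illustrates the iterative scheme of Algorithm 1 on an undirected 4-regular multigraph. It is a schematic re-implementation, not the authors' C++/SCIP code: it uses PuLP with the default CBC solver, omits the forbidden-decomposition constraints (3)–(4) for brevity, and the helper names are ours.

```python
import pulp
from collections import defaultdict

def components(vertices, edges):
    """Connected components of the graph (vertices, edges); edges = [(u, v), ...]."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def iterative_ilp(V, E):
    """V: vertex list; E: edge list (u, v) of the 4-regular multigraph x ∪ y."""
    prob = pulp.LpProblem("hamiltonian_decomposition", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(len(E))]
    prob += pulp.lpSum([])                                  # pure feasibility problem
    prob += pulp.lpSum(x) == len(V)                         # constraint (1)
    for v in V:                                             # constraint (2)
        prob += pulp.lpSum(x[i] for i, (a, b) in enumerate(E) if v in (a, b)) == 2
    while prob.solve(pulp.PULP_CBC_CMD(msg=False)) == 1:    # 1 = feasible/optimal
        z = [E[i] for i in range(len(E)) if x[i].value() > 0.5]
        w = [E[i] for i in range(len(E)) if x[i].value() < 0.5]
        subtours = [c for part in (z, w) for c in components(V, part) if len(c) < len(V)]
        if not subtours:
            return z, w                                     # Hamiltonian decomposition
        for S in subtours:                                  # add SECs (5) and (6)
            ES = [i for i, (a, b) in enumerate(E) if a in S and b in S]
            prob += pulp.lpSum(x[i] for i in ES) <= len(S) - 1
            prob += pulp.lpSum(x[i] for i in ES) >= len(ES) - len(S) + 1
    return None                                             # no decomposition exists
```

Each round adds the cuts (5)–(6) only for the subtours actually observed in the current 2-factors, which keeps the model small and is why only a few solver calls are typically needed on directed instances.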
4
Local Search
To improve the performance, we enhance Algorithm 1 with the local search heuristic. The neighborhood structure is a significantly modified version of the first neighborhood in the GVNS algorithm [23] with a new recursive chain edge fixing procedure.
4.1
Feasible Set
Every solution of the ILP model (1)–(4), (7) with partial subtour elimination constraints corresponds to the pair z and w of edge-disjoint 2-factors of the multigraph x ∪ y (Fig. 2). We compose a set of feasible solutions for the local search algorithm from all possible pairs of edge-disjoint 2-factors of the multigraph x ∪ y.

4.2
Objective Function
As the objective function to minimize, we choose the total number of connected components in the 2-factors z and w. If it equals 2, then z and w are Hamiltonian cycles.

4.3
Neighborhood Structure for Directed Graphs
The main difference between the neighborhood structure presented in this section and those described in [23] is the chain edge fixing procedure. We divide the edges of z and w into two classes:
– unfixed edges that can be moved between z and w to get a neighboring solution;
– edges that are fixed in z or w and cannot be moved.
The idea is that one fixed edge starts a recursive chain of fixing other edges. As an example, consider a directed 2-regular multigraph x ∪ y in which all indegrees and outdegrees are equal to 2. Let us choose some edge (i, j) and fix it in the component z; then the second edge (i, k) outgoing from i and the second edge (h, j) incoming into j obviously cannot get into z. We fix these edges in w (Fig. 3). In turn, the edges (i, k) and (h, j), fixed in w, start the recursive chains of fixing edges in z, etc. At the initial step, we fix in z and w a copy of each multiple edge of x ∪ y, since both copies obviously cannot end up in the same Hamiltonian cycle.
We construct a neighboring solution by choosing an edge of z, moving it to w, and running the chain edge fixing procedure to restore the correct 2-factors. If the number of connected components in z and w has decreased, we find all subtours in z and w, add the corresponding subtour elimination constraints (5) and (6) into the model, proceed to a new solution and restart the local search. This procedure is summarized in Algorithm 2.
Note that although the chain edge fixing procedure at each step can call up to two of its recursive copies, the total complexity is linear (O(V)), since each edge can only be fixed once, and |E| = 2|V|. Thus, the size of the neighborhood is equal to the number of edges in z, i.e. O(V), and the total complexity of exploring the neighborhood is quadratic, O(V^2).
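A compact sketch of the chain edge fixing step for the directed case is given below. The arc representation (integer arc ids with tail/head maps) is our own choice, not the paper's; parallel arcs must carry distinct ids so that the two copies of a multiple edge can be told apart.

```python
def chain_fix(e, part, fixed, tail, head, out_edges, in_edges):
    """Fix arc e into 'z' or 'w' and recursively propagate the forced assignments.

    Arcs are integer ids; tail[e]/head[e] give endpoints, out_edges[v]/in_edges[v]
    list the two arcs leaving/entering v in the 2-regular directed multigraph.
    """
    if e in fixed:
        return
    fixed[e] = part
    other = 'w' if part == 'z' else 'z'
    # the other arc leaving tail[e] and the other arc entering head[e]
    # cannot share a part with e, so they are forced into the other part
    siblings = [a for a in out_edges[tail[e]] + in_edges[head[e]] if a != e]
    for a in siblings:
        chain_fix(a, other, fixed, tail, head, out_edges, in_edges)
```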
Fig. 2. The multigraph x ∪ y and its two edge-disjoint 2-factors
Fig. 3. Fixing the edge (i, j) in z
Algorithm 2. Local search for directed graphs
procedure Chain Edge Fixing Directed((i, j) in z)
  Fix the edge (i, j) in z and mark it as checked to avoid double checking
  if the edge (i, k) is not fixed then
    Chain Edge Fixing Directed((i, k) in w)
  end if
  if the edge (h, j) is not fixed then
    Chain Edge Fixing Directed((h, j) in w)
  end if
end procedure

procedure Local Search Directed(z, w)
  Fix the multiple edges in z and w
  repeat
    Shuffle the edges of z in random order
    for each unchecked and unfixed edge (i, j) in z do
      Chain Edge Fixing Directed((i, j) in w)
      Move (i, j) from z to w
      if the number of connected components in z and w has decreased then
        For all subtours in z and w add the SEC (5) and (6) into the model
        Proceed to a new solution and restart the local search
      end if
      Restore the original z and w and unfix all non-multiple edges
    end for
  until all edges of z are checked and no improvement found    ▷ a local minimum
  return z and w
end procedure
4.4
Neighborhood Structure for Undirected Graphs
The neighborhood structure for undirected graphs is similar: we choose an edge of z, move it to w, and run the chain edge fixing procedure. The key difference is that after the exchange of edges and the chain edge fixing procedure, some broken vertices with a degree not equal to 2 remain in z and w. We restore the degree of each broken vertex by moving random unfixed incident edges between the components z and w (Fig. 4). This procedure is summarized in Algorithm 3. Since at each step we pick a random edge to restore a broken vertex, the local search for undirected graphs is a randomized algorithm. Therefore, we run several attempts (parameter attemptLimit) while constructing each neighboring solution, i.e. we explore several random branches in the search tree. Thus, the size of the neighborhood is O(V · attemptLimit), and the total complexity of exploring the neighborhood is O(V^2 · attemptLimit).

4.5
Iterative ILP Algorithm with Local Search
We add the local search heuristic into the iterative ILP algorithm between iterations to improve the performance on instances with an existing Hamiltonian decomposition. If the ILP-solver returns a pair of edge-disjoint 2-factors z and w that are not a Hamiltonian decomposition, then we call the local search to minimize the number of connected components. Note that every time the local search improves the solution, we modify the model by adding the corresponding subtour elimination constraints for all subtours in z and w. Thus, we implement the memory structure and prohibit the algorithm from returning to feasible solutions that have already been explored. If the heuristic also fails, we restart the ILP-solver on the modified model, and repeat these steps until a Hamiltonian decomposition is found, or the resulting model is infeasible. This procedure is summarized in Algorithm 4.
5
Computational Results
For comparison, we chose two algorithms presented in this paper and two known heuristic algorithms:
– Iterative ILP algorithm from Sect. 3 (Algorithm 1);
– Iterative ILP + LS algorithm from Sect. 4 (Algorithm 4);
– SA: the simulated annealing algorithm that repeatedly finds 2-factors through the reduction to random perfect matching from [20];
– GVNS: the general variable neighborhood search algorithm from [23].
The ILP algorithms are implemented in C++; for the SA and GVNS algorithms, the existing Node.js implementation [23] is used. Computational experiments were performed on an Intel Core i5-4460 machine with a 3.20 GHz CPU and 16 GB RAM. As the ILP-solver we used SCIP 7.0.2 [15].
Fig. 4. Restoring the broken vertex 8 (blue colored) by moving the random incident edge (8, 7) from w (dashed edges) to z (solid edges). (Color figure online)
Algorithm 3. Local search for undirected graphs
procedure Chain Edge Fixing Undirected((i, j) in z)
  Fix the edge (i, j) in z
  if vertex i in z has two incident fixed edges then
    Chain Edge Fixing Undirected((i, k) and (i, h) in w)
  end if
  if vertex j in z has two incident fixed edges then
    Chain Edge Fixing Undirected((j, k) and (j, h) in w)
  end if
end procedure

procedure Local Search Undirected(z, w, attemptLimit)
  Fix the multiple edges in z and w
  repeat
    Shuffle the edges of z in random order
    for each unchecked and unfixed edge (i, j) in z do
      Chain Edge Fixing Undirected((i, j) in w)
      Move (i, j) from z to w
      for i ← 1 to attemptLimit do
        while z contains a broken vertex i with degree not equal to 2 do
          if vertex degree of i is equal to 1 then           ▷ one missing edge
            Pick a random unfixed edge (i, k) of w
            Chain Edge Fixing Undirected((i, k) in z)
          end if
          if vertex degree of i is equal to 3 then           ▷ one extra edge
            Pick a random unfixed edge (i, k) of z
            Chain Edge Fixing Undirected((i, k) in w)
          end if
        end while
        if the number of connected components has decreased then
          For all subtours in z and w add the SEC (5) and (6) into the model
          Proceed to a new solution and restart the local search
        end if
        Restore z and w and unfix all non-multiple edges except (i, j)
      end for
      Unfix the edge (i, j) and mark it as checked
    end for
  until all edges of z are checked and no improvement found    ▷ a local minimum
  return z and w
end procedure
Algorithm 4. Iterative ILP algorithm with local search
procedure IterativeILP+LS(x ∪ y, attemptLimit)
  Define the current model as (1)–(4), (7)          ▷ relaxed 2-matching problem
  while the model is feasible do
    z, w ← an integer point of the current model by ILP-solver
    if z and w is a Hamiltonian decomposition then
      return Hamiltonian decomposition z and w
    end if
    For all subtours in z and w add the SEC (5) and (6) into the model
    if the graph is directed then
      z, w ← Local Search Directed(z, w)
    else
      z, w ← Local Search Undirected(z, w, attemptLimit)
    end if
    if z and w is a Hamiltonian decomposition then
      return Hamiltonian decomposition z and w
    end if
  end while
  return Hamiltonian decomposition does not exist
end procedure
The algorithms were tested on random directed and undirected Hamiltonian cycles. For each graph size, 100 pairs of random permutations with a uniform probability distribution were generated by the Fisher-Yates shuffle algorithm [19]. For two ILP algorithms, a limit of 2 h was set for each set of 100 instances. Therefore, the tables indicate how many instances out of 100 the algorithms managed to solve in 2 h. For both heuristic algorithms, a limit of 60 s per test was set, as well as a limit on the number of iterations: 2 500 for SA and 250 for GVNS. The reason is that the heuristic algorithms have a one-sided error. If the algorithm finds a solution, then the solution exists. However, the heuristic algorithms cannot guarantee that the solution to the problem does not exist, only that the solution has not been found in a given time or number of iterations. For each set of 100 instances, the tables show the average running time in seconds and the average number of iterations separately for feasible and infeasible problems. The computational results for random directed multigraphs are presented in Table 1 and Fig. 5. On the considered test set, only 194 instances out of 1 000 had a solution. Three algorithms: iterative ILP, iterative ILP + LS, and GVNS correctly solved all instances in the given time, while SA found only 101 Hamiltonian decompositions of 194. It can be seen that directed multigraphs contain not many subtours. Thus, the iterative ILP algorithm requires on average only 6.3 iterations to find a solution, and 6.2 iterations to prove that a solution does not exist. The addition of the local search heuristic to the algorithm makes it possible to reduce the number of iterations by an average of 2.4 times for problems with a solution and 1.3 times for problems without a solution. In some cases, as for graphs on 192 and 256 vertices, this speeds up the algorithm. However, in most cases, the heuristic does not give an improvement in runtime. On average, ILP + LS is 3 times slower on problems with a solution and 5 times slower on
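For reference, test instances of this kind can be generated along the following lines. This is our own sketch (Python's random.shuffle implements the Fisher-Yates shuffle); the exact instance format used in the experiments is not shown in the paper.

```python
import random

def random_cycle_pair(n, directed=False, seed=None):
    """Two random Hamiltonian cycles on n vertices and their union multigraph x ∪ y."""
    rnd = random.Random(seed)
    cycles = []
    for _ in range(2):
        perm = list(range(n))
        rnd.shuffle(perm)                       # Fisher-Yates shuffle
        edges = [(perm[i], perm[(i + 1) % n]) for i in range(n)]
        if not directed:
            edges = [tuple(sorted(e)) for e in edges]
        cycles.append(edges)
    x, y = cycles
    return x, y, x + y                          # the union keeps parallel edges
```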
Table 1. Computational results for pairs of random directed Hamiltonian cycles Iterative ILP Feasible
Iterative ILP + LS Feasible Infeasible
Infeasible
|V |
N Time (s) Iter
N
Time (s) Iter
N Time (s) Iter
N Time (s) Iter
192
21 0.028
4.23
79
0.029
4.22
21 0.022
2.00
79 0.037
3.44
256
25 0.111
7.04
75
0.060
5.74
25 0.075
3.12
75 0.090
4.64
384
20 0.054
4.65
80
0.082
5.75
20 0.080
2.60
80 0.156
4.33
512
22 0.105
5.45
78
0.114
5.58
22 0.125
2.36
78 0.248
4.33
768
19 0.148
6.05
81
0.125
5.43
19 0.226
2.26
81 0.439
4.27
1024 17 0.182
5.70
83
0.222
6.21
17 0.277
1.88
83 0.925
4.80
1536 16 0.325
6.43
84
0.404
6.83
16 1.099
2.50
84 2.312
5.34
2048 15 0.568
7.33
85
0.503
6.70
15 3.137
3.13
85 3.829
5.09
3072 21 1.130
7.95
79
1.009
7.65
21 4.661
2.42
79 10.722
5.91
4096 18 1.681
8.16
82
1.522
7.95
18 18.560
4.44
82 21.283
5.98
SA (perfect matching) Solved Not solved
GVNS Solved
|V |
N Time (s) Iter
Time (s) Iter
N Time (s) Iter
N Time (s) Iter
192
21 0.594
183.95 71
8.196
2500
21 0.015
3.76
79 1.618
256
20 2.557
466.75 80
13.776
2500
25 0.084
10.72 75 2.168
250
384
17 5.824
500.70 83
28.136
2500
20 0.110
8.00
80 5.958
250
512
16 6.525
315.68 84
47.791
2500
22 0.136
5.45
78 10.903
250
768
13 13.384
292.15 87
60.000
1420
19 0.643
10.52 81 25.673
250
127.88 91
60.000
749.81 17 1.746
16.94 83 42.272
250
1024 9
11.576
N
Not solved 250
1536 − −
−
100 60.000
361.40 16 1.072
5.81
84 60.000
196.64
2048 1
8.717
38
99
60.000
249.98 15 3.201
12.00 85 60.000
151.94
3072 3
13.719
42.33
97
60.000
185.57 21 5.554
12.28 79 60.000
115.62
4096 1
16.713
37
99
60.000
141.15 18 9.395
17.38 82 60.000
98.73
problems without a solution, and the gap only increases with the growth of the graph size. We can conclude that a few extra iterations of the ILP-solver turn out to be cheaper in runtime than using an additional heuristic. Regarding the heuristic algorithms, the performance of GVNS on instances with the existing solution is on average 3.8 times slower than ILP and is comparable to ILP + LS. While SA completely dropped out of the competition, finding only 101 solutions out of 194, and being an order of magnitude slower. As for instances without a solution, both heuristic algorithms are not able to determine this scenario and exit only when the limit on the running time or the number of iterations is reached. In this case, GVNS turns out to be on average 100 times slower than ILP. However, it is difficult to compare performance here since the time and iteration limits in both heuristic algorithms are set as parameters. Note that, in the GVNS, the limit was 250 iterations, while the algorithm found a solution, if it exists, on average in 10 iterations. This means that the
An Iterative ILP Approach for Constructing a Hamiltonian Decomposition
227
Average runtime in ms.
214 212 210 28 26
ILP ILP+LS SA GVNS
24 22 192
256
384
512
768
1024 1536 2048 3072 4096
Problem size |V |
Fig. 5. Computational results for feasible problems on directed graphs
limit can potentially be lowered to speed up the algorithm. However, this will increase the danger of losing the existing solution. The situation for undirected multigraphs is fundamentally different. It is known that random undirected regular graphs have a Hamiltonian decomposition with a very high probability, which allows finding the decomposition asymptotically almost surely by random matchings in polynomial time [18]. This approach is in some way similar to the considered SA algorithm. In our case, the problem is slightly different, since the multigraph x ∪ y always has a decomposition into cycles x and y, and we need to find another decomposition into cycles z and w. Nevertheless, for all 1 000 instances on undirected cycles (Table 2 and Fig. 6), there was a second Hamiltonian decomposition, and the vertices of the TSP(n) polytope were not adjacent. From a geometric point of view, this means that the degrees of vertices in 1-skeleton are much less than the total number of vertices, so two random vertices are not adjacent with a very high probability. Summary for random undirected multigraphs: both iterative ILP + LS and GVNS solved all 1 000 instances, SA solved 558 instances, and ILP solved only 531 instances in a given time. It can be concluded that the iterative ILP algorithm was not very successful for undirected graphs and showed similar results to the SA algorithm. On instances up to 768 vertices, where all tests were solved, the ILP was on average 2.3 times slower than the SA. The problem is that undirected multigraphs contain a large number of subtours that have to be forbidden. On average, the ILP algorithm took about 86 iterations to find a solution. On the other hand, the addition of the local search heuristic to the ILP algorithm reduced the running time by an average of 200 times, and the number of iterations by 65 times. The ILP + LS algorithm showed results similar to GVNS,
228
A. Kostenko and A. Nikolaev
Table 2. Computational results for pairs of random undirected Hamiltonian cycles Iterative ILP Feasible
Infeasible
Iterative ILP + LS Feasible Infeasible
|V |
N
N
Time (s) Iter
N
192
100 2.029
23.28
−
−
−
100 0.052
1.24 − −
256
100 4.800
30.53
−
−
−
100 0.094
1.30 − −
−
384
100 12.168
34.13
−
−
−
100 0.150
1.27 − −
−
512
100 24.914
44.22
−
−
−
100 0.217
1.28 − −
−
768
100 67.382
54.41
−
−
−
100 0.488
1.29 − −
− −
Time (s) Iter
Time (s) Iter N Time (s) Iter −
1024 20
396.215
95.40
−
−
−
100 0.721
1.21 − −
1536 1
30.598
33
−
−
−
100 1.518
1.34 − −
−
2048 4
1618.87
235.6
−
−
−
100 3.281
1.32 − −
−
3072 4
1772.42
143.25 −
−
−
100 6.746
1.34 − −
−
4096 2
3506.19
168.50 −
−
−
100 14.447
1.38 − −
−
SA (perfect matching) Solved Not solved
GVNS Solved
|V |
N
Time (s) Iter
N
192
100 0.884
105.84 −
−
−
100 0.023
1.00 − −
−
256
100 1.904
124.75 −
−
−
100 0.035
1.00 − −
−
384
100 7.734
228.22 −
−
−
100 0.073
1.00 − −
−
512
99
236.39 1
60.000
1016
100 0.133
1.00 − −
−
768
Time (s) Iter
12.880
N
Not solved
Time (s) Iter N Time (s) Iter
70
21.223
194.74 30
60.000
498.96 100 0.291
1.00 − −
−
1024 46
21.548
124.23 54
60.000
313.14 100 0.511
1.00 − −
−
1536 25
26.140
70.24
75
60.000
157.05 100 1.085
1.00 − −
−
2048 12
35.540
54.33
88
60.000
91.71
100 1.824
1.00 − −
−
3072 6
29.225
19.50
94
60.000
41.11
100 4.235
1.00 − −
−
4096 −
−
−
100 60.000
22.22
100 7.593
1.00 − −
−
solving all test instances and being on average only 1.8 times slower. This time loss is due to two factors. Firstly, the GVNS has a more complex heuristic with several neighborhood structures, which made it possible to find all solutions in just 1 iteration. Secondly, one iteration of the ILP-solver is much more expensive than constructing 2-factors through the reduction to perfect matching. It should be noted that although all 1 000 random instances on undirected graphs had a solution, in the general case, the traveling salesperson polytope contains adjacent vertices for which, accordingly, the Hamiltonian decomposition does not exist. Moreover, the 1-skeleton of the traveling salesperson polytope has cliques with an exponential number of vertices [7]. Thus, the ILP + LS algorithm may turn out to be more promising, since it will be able to prove that there is no Hamiltonian decomposition for the given problem.
An Iterative ILP Approach for Constructing a Hamiltonian Decomposition
229
218 Average runtime in ms.
216 214 212 210 28
ILP ILP+LS SA GVNS
26 24 192
256
384
512
768
1024 1536 2048 3072 4096
Problem size |V |
Fig. 6. Computational results for undirected graphs Table 3. Computational results for infeasible problems on undirected pyramidal tours
Iterative ILP |V | Time (s) Iter
Iterative ILP+LS Time (s) Iter
SA
GVNS
Time (s) Iter
Time (s) Iter
128 0.177
10.90 0.239
8.94
3.439
2500 1.495
250
192 0.418
14.78 0.645
11.26
6.901
2500 2.278
250
256 0.822
19.40 1.575
14.84
11.687
2500 3.253
250
384 2.425
28.50 5.549
20.59
25.215
2500 5.609
250
512 4.586
36.08 13.488
26.24
43.904
2500 9.868
250
768 17.740
56.41 56.824
41.25
60.000
1713 17.313
250
We ran additional tests to investigate this scenario (Table 3). Using the vertex adjacency criterion for the pyramidal tours polytope [8], we generated 6 groups of 50 pairs of such undirected pyramidal tours x and y that the multigraph x ∪ y is guaranteed not to contain a Hamiltonian decomposition into cycles z and w. It can be seen that although the additional local search heuristic reduced the number of iterations by an average of 1.3 times, the total running time increased by an average of 2.2 times. Indeed, the local search takes extra time to find a solution that does not exist. However, this slight slowdown is acceptable, given that on undirected multigraphs with an existing solution (Table 2), the local search heuristic gives an average speed up of 200 times. Note that for undirected graphs the number of iterations grows significantly faster than for
directed multigraphs (Table 1), since the undirected multigraphs contain a large number of subtours that have to be forbidden. Nevertheless, the ILP algorithms have the advantage over the SA and GVNS here, since the heuristic algorithms cannot guarantee that the problem is infeasible.
6
Conclusion
We introduced two iterative ILP algorithms to find a Hamiltonian decomposition of the 4-regular multigraph. On random undirected multigraphs, the version enhanced by the local search heuristic turned out to be much more efficient than the basic ILP algorithm, showing results comparable to the known general variable neighborhood search heuristic. While for random directed multigraphs the iterative ILP algorithm significantly surpassed in speed the previously known algorithms. The key feature that distinguishes the ILP algorithms from previously known heuristics is that they can prove that the Hamiltonian decomposition in the graph does not exist. The directions for further development are as follows. Firstly, we can consider a more complex heuristic with several neighborhood structures, as in [23], to speed up the algorithm on problems with an existing solution. Secondly, it is of great interest to add to the model other classes of facet inequalities of the traveling salesperson polytope, like 2-matching and clique-tree inequalities [16], that can significantly reduce the number of expensive calls of an ILP-solver. Acknowledgements. We are very grateful to the anonymous reviewers for their comments and suggestions which helped to improve the presentation of the results in this paper.
References 1. Aguilera, N.E., Katz, R.D., Tolomei, P.B.: Vertex adjacencies in the set covering polyhedron. Discrete Appl. Math. 218, 40–56 (2017). https://doi.org/10.1016/j. dam.2016.10.024 2. Arthanari, T.S.: On pedigree polytopes and Hamiltonian cycles. Discrete Math. 306, 1474–1492 (2006). https://doi.org/10.1016/j.disc.2005.11.030 3. Arthanari, T.S.: Study of the pedigree polytope and a sufficiency condition for nonadjacency in the tour polytope. Discrete Optim. 10, 224–232 (2013). https:// doi.org/10.1016/j.disopt.2013.07.001 4. Bae, M.M., Bose, B.: Edge disjoint Hamiltonian cycles in k-ary n-cubes and hypercubes. IEEE Trans. Comput. 52, 1271–1284 (2003). https://doi.org/10.1109/TC. 2003.1234525 5. Bailey, R.F.: Error-correcting codes from permutation groups. Discrete Math. 309, 4253–4265 (2009). https://doi.org/10.1016/j.disc.2008.12.027 6. Balinski, M.L.: Signature methods for the assignment problem. Oper. Res. 33, 527–536 (1985). https://doi.org/10.1287/opre.33.3.527 7. Bondarenko, V.A.: Nonpolynomial lower bounds for the complexity of the traveling salesman problem in a class of algorithms. Autom. Rem. Contr. 44, 1137–1142 (1983)
8. Bondarenko, V.A., Nikolaev, A.V.: On the skeleton of the polytope of pyramidal tours. J. Appl. Ind. Math. 12, 9–18 (2018). https://doi.org/10.1134/ S1990478918010027 9. Bondarenko, V.A., Nikolaev, A.V.: Combinatorial and geometric properties of the Max-Cut and Min-Cut problems. Doklady Math. 88(2), 516–517 (2013). https:// doi.org/10.1134/S1064562413050062 10. Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., Zhu, M.Y.: Tools for privacy preserving distributed data mining. SIGKDD Explor. Newsl. 4, 28–34 (2002). https:// doi.org/10.1145/772862.772867 11. Chegireddy, C.R., Hamacher, H.W.: Algorithms for finding k-best perfect matchings. Discrete Appl. Math. 18, 155–165 (1987). https://doi.org/10.1016/0166218X(87)900175 12. Combarro, E.F., Miranda, P.: Adjacency on the order polytope with applications to the theory of fuzzy measures. Fuzzy Set. Syst. 161, 619–641 (2010). https:// doi.org/10.1016/j.fss.2009.05.004 13. Dantzig, G., Fulkerson, R., Johnson, S.: Solution of a large-scale traveling-salesman problem. J. Oper. Res. Soc. Am. 2(4), 393–410 (1954). https://doi.org/10.1287/ opre.2.4.393 14. Duarte, A., S´ anchez-Oro, J., Mladenovi´c, N., Todosijevi´c, R.: Variable neighborhood descent. In: Mart´ı R., Pardalos P., Resende M. (eds) Handbook of Heuristics, pp. 341–367, Springer, Cham. (2018). https://doi.org/10.1007/978-3-319-0712449 15. Gamrath, G., et al.: The SCIP Optimization Suite 7.0. ZIB-Report, Zuse Institute Berlin, pp. 20–10 (2020). http://nbn-resolving.de/urn:nbn:de:0297-zib-78023 16. Gr¨ otschel, M., Padberg, M.: Polyhedral theory. In: Lawler, E., Lenstra, J.K., Kan, A.R., Shmoys, D. (eds.) The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization, pp. 251–305. John Wiley, Chichester (1985) 17. Hung, R.-W.: Embedding two edge-disjoint Hamiltonian cycles into locally twisted cubes, Theor. Comput. Sci. 412, 4747–4753 (2011). https://doi.org/10.1016/j.tcs. 2011.05.004 18. Kim, J.H., Wormald, N.C.: Random matchings which induce Hamilton Cycles and Hamiltonian decompositions of random regular graphs. J. Comb. Theory B 81(1), 20–44 (2001). https://doi.org/10.1006/jctb.2000.1991 19. Knuth, D.E.: The Art of Computer Programming, Volume 2 (3rd Ed.): Seminumerical Algorithms. Addison-Wesley Longman Publishing Co., Inc. (1997). https:// doi.org/10.5555/270146 20. Kozlova, A., Nikolaev, A.: Simulated annealing approach to verify vertex adjacencies in the traveling salesperson polytope. In: Khachay, M., Kochetov, Y., Pardalos, P. (eds.) MOTOR 2019. LNCS, vol. 11548, pp. 374–389. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22629-9 26 21. Krarup, J.: The peripatetic salesman and some related unsolved problems. In: Roy B. (eds) Combinatorial Programming: Methods and Applications. NATO Advanced Study Institutes Series (Series C - Mathematical and Physical Sciences), vol. 19, pp. 173–178 (1995). https://doi.org/10.1007/978-94-011-7557-9 8 22. Nikolaev, A.: On vertex adjacencies in the polytope of pyramidal tours with stepbacks. In: Khachay, M., Kochetov, Y., Pardalos, P. (eds.) MOTOR 2019. LNCS, vol. 11548, pp. 247–263. Springer, Cham (2019). https://doi.org/10.1007/978-3030-22629-9 18
23. Nikolaev, A., Kozlova, A.: Hamiltonian decomposition and verifying vertex adjacency in 1-skeleton of the traveling salesperson polytope by variable neighborhood search. J. Comb. Optim. 42(2), 212–230 (2021). https://doi.org/10.1007/s10878020-00652-7 24. Papadimitriou, C.H.: The adjacency relation on the traveling salesman polytope is NP-Complete. Math. Program. 14, 312–324 (1978). https://doi.org/10.1007/ BF01588973 25. P´eroche, B.: NP-completeness of some problems of partitioning and covering in graphs. Discrete Appl. Math. 8, 195–208 (1984). https://doi.org/10.1016/0166218X(84)90101-X 26. Pferschy, U., Stanˇek, R.: Generating subtour elimination constraints for the TSP from pure integer solutions. CEJOR 25(1), 231–260 (2016). https://doi.org/10. 1007/s10100-016-0437-8 27. Rao, M.R.: Adjacency of the traveling salesman tours and 0–1 vertices. SIAM J. Appl. Math. 30, 191–198 (1976). https://doi.org/10.1137/0130021 28. Tutte, W.T.: A short proof of the factor theorem for finite graphs. Can. J. Math. 6, 347–352 (1954). https://doi.org/10.4153/CJM-1954-033-3
The Constrained Knapsack Problem: Models and the Polyhedral-Ellipsoid Method Oksana Pichugina1(B)
and Liudmyla Koliechkina2
1
2
National Aerospace University “Kharkiv Aviation Institute”, 17 Chkalova Street, 61070 Kharkiv, Ukraine [email protected] University of Lodz, 3 Uniwersytecka Street, 90-137 Lodz, Poland [email protected]
Abstract. The paper focuses on studying a class KCFG of a Constrained Knapsack Problem (CKP), where conflict and forcing constraints are present. Four KCFG-formulations as quadratically constrained programs are introduced that utilize geometric properties of a feasible domain such as inscribability in an ellipsoid and coverability by two parallel planes. The new models are applied in deriving new upper bounds that can be effectively found by semi-definite programming and the r-algorithm. Another introduced application area is the Polyhedralellipsoid method (PEM) for linear optimization on two-level sets in a polytope P (P -2LSs) illustrated by a numerical example. Besides KCFG, the new modelling and solution approaches can be applied to any CKP reducible to a polynomial number of CKPs on P -2LSs. Keywords: Knapsack problem · Conflict graph · Forcing graph R-algorithm · Semi-definite programming · Quadratic constraint program · Two-level set
1
·
Introduction
The knapsack problem (KP) is one of the most famous NP-hard combinatorial optimization problems [15]. In the standard KP, there is given a set P = {Pi }i of n items and a knapsack of a capacity b. Every item Pi ∈ P has a profit ai and a weight wi . All the values are assumed to be positive integers, i.e., c = (ci )i ∈ Nn , a = (ai )i ∈ Nn , b ∈ N.
(1)
The task is to choose a subset P ∗ ⊆ P, such that the total profit of P ∗ is maximized and the total weight does not exceed b. KP is known since 1972 [14]. Plenty of applications are found for this problem, and a variety of highly effective algorithms are available [16], such as Dynamic Programming (DP) methods and Branch & Bound (B&B) techniques [15], graphtheoretic approaches [18], metaheuristic schemes [3] etc. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 233–247, 2021. https://doi.org/10.1007/978-3-030-86433-0_16
234
O. Pichugina and L. Koliechkina
Various KP extensions are known in the literature, such as multi-objective KP, multi-dimensional KP, multiple KP, quadratic KP, and many others [15]. Some of them can be represented as KP with constraints (constraint KP, CKP). The constraints naturally complicate the problem, solution methods, and even the possibility of obtaining a feasible solution. This leads to the need to search for both new approaches to solving the general CKP and to single outing subclasses of the problems with further developing special solution methods for them. Some CKPs form a class of integer programming formulations (further the special Constrained KP, SCKP) (SCKP): max c x, s.t. a x ≤ b, B − e ≤ Ax ≤ B, x ∈ Bn = {0, 1}n , where A is a matrix with entries 1, −1, 0, B ∈ ZM , e is a vector of units. Examples of SCKP are a knapsack problem with conflict graph (KCG) [30] and a knapsack problem with forcing graph (KFG) [19]. In KCG, conflict constraints (edge conflict constraints, ECC) expressing an incompatibility between pairs of items are imposed. In KFG, forcing constraints (edge forcing constraints, EFC) enforce at least one element from the underlying pairs of items to be included in a feasible solution. These conflict and forcing constraints can be represented with the help of undirected graphs G = (V, E) (a conflict graph) and G = (V, E ) (a forcing graph) with |V | = n, |E| = m, and |E | = m [19]. Each vertex v ∈ V corresponds to a knapsack item, an edge {i, j} ∈ E ({i, j} ∈ E ) indicates that items i and j are in a conflict (forcing) relation. KCG and KFG itself and their aggregation (further referred to as knapsack problem with conflict-forcing graph, KCFG) can be written as SCKP n
ci xi ,
(2)
ai xi ≤ b,
(3)
(ECC) xi + xj ≤ 1, {i, j} ∈ E, (EF C) xi + xj ≥ 1, {i, j} ∈ E ,
(4) (5)
max s.t.
i=1 n i=1
xi ∈ {0, 1}, i ∈ Jn ,
(6)
where Jn = {1, ..., n}, E ∪ E = ∅, a, b, c satisfy (1). To the model (2)–(6) we will refer to as an edge KCG-formulation (KCG.EF). KCG and KFG are strongly NP-hard [19], thus KCFG is strongly NP-hard. (2)–(6) is an ILP formulation of KCFG. We will show that it allows reformulating as a quadratic constraint problems with linear objective functions (QP). The advantage of QP-models is that it enables to apply polynomial optimization tools and relation with semi-definite programs for its solutions and evaluation bounds on optimal value z ∗ of z = c x. The paper is organized as follows. In Sect. 2, after a survey of KCG and KFG solution methods, necessary information about finite point configurations
The Constrained Knapsack Problem
235
(FPCs), their geometric properties and analytic descriptions is given, and the polyhedral-ellipsoid method (PEM) is outlined. Thereafter, in Sect. 3, a notion of two-level sets in a polytope P (P -2LSs) is introduced, and several polynomial KCFG formulations are built. Section 4 is dedicated to applications of quadratic KCFG-formulations. These are semi-definite and dual quadratic upper bounds on z ∗ , and B&B for linear optimization on P -2LSs, called the Polyhedralellipsoid method (PEM), is described. Finally, PEM is illustrated by an example of a KCG solution.
2 2.1
Prerequisites KCG, KFG: Solution Techniques and Formulations
KCG solution methods: exact (B&B [1,6,13,30], Branch and Price (B&P) [8], DP [11]), heuristic such as greedy algorithms [30], a reactive local search based algorithm [13], local branching techniques, scatter search metaheuristics, iterative rounding search (see [1] and references therein). KFG was first considered in [19] as a generalization of the maximum weight vertex cover problem. In [11,19], several conflict and forcing graphs’ families were found, when KCG and KFG are polynomially solvable, and some of their pseudo-polynomial solutions were devised. Clique formulation of KCG [1] (KCG.CF) is another ILP-model of KCG, where ECC is written with the help of clique conflict constraints xi ≤ 1, C ∈ C(V ), (CCC) i∈C
where C(V ) is a clique set of the conflict graph G. The advantage of KCG.CF over KCG.EF is that its continuous relaxation is at least as tight as the one of KFG.EF. However, it requires solving the NP-complete Clique problem to find C(V ). B&P method [8] utilizes KCG.CF demonstrating that CCC-based cuts much stronger than ECC-based ones. 2.2
FPC Classes and F-Representations
Let E be a finite point configuration (FPC) in Rn (a collection of finitely many points in Rn ) and P be a polytope P = conv E. An FPC is called: a) full-dimensional if dim P = n [10]; b) vertex-located (a vertex-located set, VLS) [27] if E = vert P . Remark 1. Further, we will assume that E is a full-dimensional FPC. An example of a VLS is a set of Boolean vectors Bn , which convex hull is a unit hypercube P Bn = [0, 1]n . Definition 1. [10] An FPC E is k-level if for every P -facet-defining hyperplane H there are k parallel hyperplanes H1 , H2 , ..., Hk with E ⊂ H1 ∪ H2 ∪ ... ∪ Hk .
236
O. Pichugina and L. Koliechkina
Remark 2. An FPC E is called two-level (a two-level set, 2LS ) if, in the direction of an arbitrary facet of P , it is decomposed exactly into two parallel planes. Equivalently, E is a 2LS if every facet-defining affine function takes exactly two distinct values on E [10]. Definition 2. [4] A polytope P is said to be a 2LP if for each P -facet-defining hyperplane H, the vertices of P can be covered with H and exactly one other translation of H. Every 2LP is affinely isomorphic to a 0/1-polytope [10], wherefrom it follows that any 2LS is vertex-located. This means that a vertex set of any 2LP is a 2LS. Some examples of 2LPs in the literature are hypercubes, cross-polytopes, hypersimplices, Birkhoff polytopes [2], Hanner polytopes [12], order polytopes [24], stable set polytopes of perfect graphs [5] etc. [9,10,21]. Definition 3. [22] An analytic representation of E as a solution of relations: fi (x) = 0, i ∈ Jm ; fi (x) ≤ 0, i ∈ Jm \Jm ,
(7) (8)
where fi (x) : E → R1 , i ∈ Jm are continuous functions on E ⊇ E, is called a continuous functional representation (an f-representation) of E on E . If E = Rn , the (7), (8) is called an f-representation of E. Such representation that involves the equality part (7) only is a strict f-representation; the one with only the inequalities (8) is a non-strict f-representation; if (7) and (8) are present, the f-representation is mixed [22]. If fi (x), i ∈ Jm are polynomials, then the f-representation is polynomial. For instance, if the highest degree of the polynomials is two, the f-representation is quadratic. If E is convex, and fi (x), i ∈ Jm are convex on E , then an f-representation is called convex. Theorem 1. [29] Any VLS E is representable as the following: E = P ∩ S,
(9)
where S is a strictly convex surface (i.e., S is given by f (x) = 0, where f (x) is strictly convex [23]), and P ⊂ conv S is a polytope. The f-representation (9) is called a polyhedral-surfaced (PSR) [29]. Particularly if S is an ellipsoid, the representation is polyhedral-ellipsoid (PER). Note, for a certain S, a E-PSR is not unique. One of them is E = P ∩ S, and others involve a relaxation polytope P ⊃ P satisfying (9). For a VLS E, an analytic form of the representation (9) involves equality of S and a linear inequality system describing P . In the same manner, it is called polyhedral-surfaced (f-PSR). It belongs to convex mixed f-representations. If S is an ellipsoid, the f-representation is polyhedral-ellipsoid (f-PER).
The Constrained Knapsack Problem
2.3
237
Polyhedral-Spherical Method
The polyhedral-spherical method (PSM) [20] is a Branch and Bound technique developed for the exact solution of unconstrained Boolean quadratic problems (UBQP) min f (x), where f (x) is a quadratic function. x∈Bn
An underlying idea of PSM is to construct a convex extension F (x) of f (x) from E = Bn onto Rn [27,28] and switch from UBQP to (UBQP’) min F (x). x∈Bn
Next, polyhedral sphericity of set Bn [28] is utilized, which means representability of Bn as an intersection of a hypersphere S and a unit hypercube P = P Bn . A polyhedral relaxation of UBQP’ (PR) min F (x) is a convex optimization probx∈P
lem over a hypercube, while its spherical relaxation (SR) min F (x) is a problem x∈S
of optimization of a convex function on a hypersphere. They are both effectively solvable [7,28]. During a PSM implementation [20], solutions xP , z P of PR and xS , z S of SR are combined in order to tighten the lower bound on z ∗ . Projections of xP and xS onto Bn are feasible and yield an upper z ∗∗ on the optimal value. Process terminates if the lower bound z l = max z P , z S = z ∗∗ resulting in x∗ = x∗∗ . Otherwise, a binary search tree is designed, where branching utilizes the 2-levelness of Bn ; thus, it decomposes into two parallel hyperplanes. First, an auxiliary problem of finding the nearest hyperface α of the polytope P to the point xP in the direction xS − xP is solved. It results in a partition of E into E α = {x ∈ E : x ∈ α}, E α¯ = {x ∈ E : x ∈ / α} = {x ∈ E : x ∈ α}
(10)
lying in parallel planes α, α inducing two nodes of the tree. Then, the node associated with E α is examined, next – with E α¯ . The corresponding subproblems minα F (x), minα¯ F (x) are lower dimensional UBQP’-type problems. The Branch
x∈E
x∈E
and Bound process continues similarly until all prospective nodes of the tree have been viewed. Finally, set x∗ = x∗∗ .
3
Theoretic Part
This section is dedicated to modelling P -2LSs such as a feasible set of KCFG. 3.1
Polyhedral-Ellipsoid f-Representations
Theorem 2. If a full-dimensional FPC E ⊂ Rn has a strict quadratic frepresentation (ai x − bi )2 = c2i , i ∈ JM ,
(11)
where ai ∈ Rn , bi ∈ R1 , ci ∈ R1>0 for each i ∈ JM , then E has a f-PER E = P ∩ El,
(12)
238
O. Pichugina and L. Koliechkina
where P : −ci ≤ a i x − bi ≤ ci , i ∈ JM ; El :
M
2 (a i x − bi ) =
i=1
M
c2i .
(13) (14)
i=1
Proof. Let E be a set given by (13), (14). In this notation, the relation (12) becomes E = E . The inclusion E ⊆ E is obvious because (11) holds for any x ∈ E that entails (13), (14). The reverse inclusion E ⊇ E will be proved /E by contradiction. Assume that this inclusion is false. This means that ∃x ∈ such that x ∈ El, x ∈ P . By assumption, E is full-dimensional, wherefrom there exists x0 ∈ Int P , where Int P is an interior of P . From the inclusion P ⊂ El = conv El follows x0 ∈ Int El, thus in the left-hand side of (14) is a strictly quadratic convex function; hence El is a n − 1-dimensional ellipsoid. x ∈ El implies that x is an extreme point of El. Together with x ∈ P , it means / E and E is a VLS, we have x ∈ vert P \E. At that x ∈ vertP . Now, from x ∈ most M − 1 of the constraints (13) are active at x ; otherwise, x satisfies (11) and belongs to E. Let I = ∅ (15) 2 2 be a set of inactive constraints at x , then (a i x − bi ) = ci , i ∈ I; (ai x − 2 2 bi ) = ci , i ∈ JM \I. Since x ∈ P , then, by (13), this condition can written as 2 2 2 = c2i , i ∈ JM \I. Adding all the relations, (a i x − bi ) < c i , i ∈ I; (ai x − bi ) M M by (15), we get i=1 (ai x − bi )2 < i=1 c2i ⇒ x ∈ / El. Contradiction. Hence E = E holds.
(11) can be written equivalently as a i x − bi ∈ {±ci }, i ∈ JM that makes E, given by the f-representation (11), similar to a 2LS (see Remark 2). The only difference is that, instead of P = conv E, any relaxation polytope P satisfying (9) can participate in (13). With this regard, we introduce the following 2LS generalization. Definition 4. An FPC E is called two-level in a polytope P (P -2LS) if, in the direction of an arbitrary facet of polytope P , it is decomposed exactly into two parallel planes. Equivalently, E is a P -2LS if every P -facet-defining affine function takes exactly two distinct values on E. Note that E in the formulation of Theorem 2 is a P -2LS. Corollary 1. Any full-dimensional P -2LS E allows a PER (12). 3.2
KCFG Modelling
The capacity constraint (3) can always be written as a two-sided inequality (CC) : b ≤
n i=1
ai xi ≤ b, where b ∈ Z1+ .
(16)
The Constrained Knapsack Problem
239
Generalizing the CCC-constraints from conflict to forcing ones, we get a model of KCFG, further referred to as its Clique formulation (KCFG.CF) involving CCC and clique forcing constraints (CFC): find a solution of (2) subject to constraints (6), (16), xi ≤ 1, C ∈ C(V ); (17) (CCC) i∈C
(CF C)
xi ≥ |C | − 1, C ∈ C (V ),
(18)
i∈C
where C(V ) and C (V ) are sets of cliques in G and G , respectively. It is seen, the models KCFG.EF and KCFG.CF are of SCKP-type. Introduce another model of KCFG titled the mixed KCFG formulation (KCFG.MF): find a solution of (2) subject to constraints (6), (16), (19) i∈Ej xi ≤ 1, j ∈ Jm , (20) i∈E xi ≥ |Ej | − 1, j ∈ Jm , j
where Ej is a set of conflict (incompatible) vertices in G (j ∈ Jm ), and Ej is a set of conflict nodes in G (j ∈ Jm ). This means that, on the one hand, each Ej ⊆ C and Ej ⊆ C . On the other hand, edges of the cliques {Ej }j∈Jm covers the edge set E and {Ej }j ∈Jm covers E . The advantage KCFG.CF is its ability to utilize an approximate solution of the Clique problem, which can be found by various heuristics (see for instance [1]). Another plus of this formulation is that it generalities KCFG.EF and KCFG.CF, as well as ILP formulations of KCG and KFG. Therefore, in the study, we will focus on modeling and solving KCFG where KCFG.MF is utilized. Let E be a feasible domain of KCFG, i.e., E is an FPC given by (6), (16), (19), (20). Now, KCFG becomes (2), x ∈ E.
(21)
We reduce KCFG to a series of ILP KCFG.MF(k) having a form of (2), x ∈ Ek ,
(22)
where k ∈ JK ,K = {K, ..., K }. Here, the range JK ,K depends on b , b, every Ek is a feasible domain of KCFG.MF(k), which is a Pk -2LS for a polytope Pk . Construct KCFG.MF(k). A polyhedral relaxation of KCFG.MF is formed from it by replacing (6) by 0 ≤ xi ≤ 1, i ∈ Jn .
(23)
The feasible domain of the relaxation problem is a polytope P given by the H-representation (16), (19), (20), (23). Let us analyze the levelness of E with
240
O. Pichugina and L. Koliechkina
respect to P -faces-support hyperplanes corresponding to inequalities in it: (24) i∈Ej xi ∈ {0, 1}, j ∈ Jm ; (25) i∈E xi ∈ { Ej − 1, Ej }, j ∈ Jm ; j
xi ∈ {0, 1}, i ∈ Jn ; n 1 i=1 ai xi ∈ [b , b] ∩ Z+ .
(26) (27)
From (24)–(26), the affine functions presented in these relations take exactly two distinct values on E. However, the affine function a x can take on E any integer value in the range [b , b]. Now, we replace the condition (21) by (22), k ∈ JK ,K . For that, we rewrite (27) depending on the parity of b , namely, if b is even (further, Case 1), then 2k ≤ a x ≤ 2k + 1, k ∈ JK ,K , K =
b 2 ,K
=
b 2
(28) ;
(29)
if b is odd (further, Case 2), then 2k − 1 ≤ a x ≤ 2k,
k ∈ JK ,K , K =
b −1 2 ,K
=
b−1 2
(30) .
(31)
In the same manner as (24)–(26), we combine (28) and (30): a x ∈ {2k, 2k ± 1},
(32)
where 2k + 1 corresponds to Case 1 and 2k − 1 – to Case 2. For k ∈ JK ,K , introducing FPCs: in Case 1, Ek = {x ∈ E : 2k ≤ a x ≤ 2k + 1}, in Case 2, Ek = {x ∈ E : 2k − 1 ≤ a x ≤ 2k},
(33) (34)
we get that a set {Ek }k∈JK ,K forms a partition of E. Combining the above discussion with (33), (34) we can conclude that each Ek is a Pk -2LS for Pk = {x ∈ P : 2k ≤ a x ≤ 2k + 1} in Case 1 and Pk = {x ∈ P : 2k − 1 ≤ a x ≤ 2k} in Case 2. Thus we came to the following result. Let x∗ , z ∗ be an optimal solution of KCFG, and zk∗ be an optimal value of KCFG.MF(k). Let x∗ , z ∗ be an optimizer and an optimal value in KCFG, respectively. Proposition 1. KCFG is reducible to solution of problems KCFG.MF(k) for k ∈ JK ,K on the Pk -2LS Ek defined by (33) in Case 1 and by (34) in Case 2. Namely, z ∗ = max zk∗ , x∗ = argmax z ∗ , where zk∗ = max c x if Ek = ∅, k
otherwise, zk∗ = −∞ (k ∈ JK ,K ).
x∈Ek
Model KCFG(k).QF Fix k ∈ JK ,K (see (29), (31)). Let us build a KCFG(k)formulation with all constraints in the form of quadratic equality constraints (further referred to as a quadratic KCFG(k)-formulation, KCFG(k).QF).
The Constrained Knapsack Problem
241
Let us use the following observation: ∀a ∈ Rn \{0}, ∀b ∈ R1 , a condition a x ∈ {b−1, b} can be written as a x − b + 0.5 = 0.5 or (a x−b+0.5)2 = 0.25. Applying this relation to conditions (24)–(26), (32), respectively, we get: ( i∈Ej xi − 0.5)2 = 0.25, j ∈ Jm ; (35) 2 ( (36) xi − E − 0.5) = 0.25, j ∈ Jm ; j
i∈Ej
x2 − xi = 0, i ∈ Jn ; n i ( i=1 ai xi − 2k ∓ 0.5)2 = 0.25,
(37) (38)
where sign “-” corresponds to Case 1 and sign “+” – to Case 2. We came to a strict quadratic f -representation (35)–(38) of Ek and to KCFG(k).QF in the form of (2), (35)–(38). Respectively, KCFG(k).QF is given by (35)–(38), where k ∈ JK ,K . Model KCFG(k).PEF KCFG(k).PEF is a model of type (11), where the number of relations is M = m+m +n+1 and every ci = 0.5. That is why, Theorem 2 can be applied to constructing a f-PER of Ek and a new ILP formulation of KCFG based on its utilization (further referred to as a polyhedral-ellipsoid formulation of KCFG(k), KCFG(k).PEF). We combine all the equalities (35)–(38) in the ellipsoid equation (14) obtaining the following. Corollary 2. (from Theorem 2) A set Ek (see (33), (34)) enables an f-PER (19), (20), (23), (28) in Case 1 ( (30) in Case 2), n
(x2i − xi ) +
i=1
j=1
+
m (( xi − 0.5)2 − 0.25)+
m j =1
((
i∈Ej
xi − Ej − 0.5)2 − 0.25) + (ai xi − 2k ∓ 0.5)2 − 0.25 = 0,
i∈Ej
i∈Jn
where sign “-” in ∓ is associated with Case 1 and sign “+” – with Case 2. Equivalently, the ellipsoid El is given by equation ⎛⎛ ⎞ ⎞2 m ⎜⎝ ⎟ xi ⎠ − xi ⎠ x Ix − e x + ⎝ j=1
i∈Ej
i∈Ej
⎞ m 2 ⎟ ⎜⎜ ⎟ + xi ⎠ − (2 Ej + 1) xi Ej + Ej ⎠ + ⎝⎝ ⎛⎛
j =1
⎞2
i∈Ej
(39)
i∈Ej
+ (x Ax − (4k ± 1)a x + 4k 2 ± 2k) = 0, where I is an identity matrix, A = aa , e is a vector of ones. The corresponding KCFG-model KCFG(k).PEF is described by (19), (20), (23), (28) in Case 1 ((30) in Case 2), where k ∈ JK ,K .
242
O. Pichugina and L. Koliechkina
Model KCFG.PF. In this model titled the KCFG polynomial formulation (KCFG.PF) we replace the inequalities CCC and CFC in KCFG.MF by polynomial equations xi = 0, C ∈ C(V ); (1 − xi ) = 0, C ∈ C (V ), (40) i∈C
i∈C
that generalize an equivalent representation xi xj = 1 of an edge constraint xi + xj ≤ 1 [25] to CCC and CFC. Models KCFG.MF’, KCFG.PF’. A quadratic model KCFG.MF’ is formed from the ILP model KCFG.MF by replacing (6) by x2i − xi = 0, i ∈ Jn . From KCFG.PF, another quadratic model is formed by the same replacement of (6) and iterative decreasing the polynomial’s degree in (40) by substitutions x2i = xi , xi xj = yij ∈ {0, 1}.
4 4.1
Applications of Quadratic KCFG-formulations New Upper Bounds on z ∗
In Sect. 3.2, four quadratic models (KCFG.QF, KCFG.PEF, KCFG.MF’, KCFG.PF’) were built. Optimal values of z for semi-definite relaxations [17] of u u u u (QF ), zsd (P EF ),zsd (M F ), zsd (P F ). these problems yield the upper bounds zsd The Quadratic dual bounds (N.Z.Shor) [25] are related to solving convex nonsmooth programs effectively solvable by Shor’s r-algorithm. They form another u u u u (QF ), zqd (P EF ), zqd (M F ), zqd (P F ). They are all tighter set of the bounds zqd than the optimal value of the polyhedral relaxation z P = max c x [26]. One x∈P more bound z El can be found by solution xElk , z Elk of ellipsoidal relaxations of KCFG.PEF(k), k ∈ JK ,K with further choice z El = max z Elk . Every z Elk k
can be found explicitly. Indeed, let an ellipsoid El is given in the form of (x − x0 ) D(x − x0 ) = 1,
(41)
where D 0, x0 is a center of El.
Theorem 3. A solution to an optimization problem maxn c x subject to (41) is given by the formula:
x∈R
√ D−1 c xmax = x0 + √ , z max = c x0 + c D−1 c. c D−1 c 4.2
(42)
Polyhedral-Ellipsoid Method
The bound z El is weaker than all other bounds listed above. However, it can easily found by (3), and the point Theorem is can be xmax can be used further used in Branch and Bound and cutting plane methods. In this section, we present such a Branch and Bound methods that utilizes z El -type upper bounds only.
The Constrained Knapsack Problem
243
PEM Outline. In this paper, we offer a generalization of the PSM called the polyhedral-ellipsoid method (PEM) for solving linear optimization problems on a P -2LS. It is intended to solve a problem (2), x ∈ E, where E is a P -2LS, E ⊆ Bn , dim P = n.
(43) (44)
Step 0. Let z [.] = f (x[.] ) = c x. Apply a heuristic to find an initial feasible solution x∗∗ , z ∗∗ , otherwise, set z ∗∗ = −∞. Step 1. Build an E-PER (12) with El given by (41) and move to consideration of an equivalent problem (19), x ∈ P , x ∈ El.
(45)
S
Step 2. Find xS , z – a solution of an ellipsoid relaxation (ER) (19), (45) by Theorem 3. Verify a condition xS ∈ P . If it holds, then x∗ , z ∗ = xS , z S , terminate. Otherwise, go to Step 3. The current upper bound is z u = z S . Step 3. Project xS onto Bn getting y S . If y S ∈ E, update z ∗∗ = max{z ∗∗ , c y S }, x∗∗ = argmaxz ∗∗ , set the current lower bound z l = z ∗∗ + 1. The process terminates if z l > z u . Step 4. By assumption, E satisfies (43), (44), i.e., P is given by (13), where ci > 0, i ∈ JM thus P : −ci + bi ≤ a i x ≤ ci + bi , i ∈ JM . Determine values a xs −b −c
a xs −b +c
hi = { i |ai |i i }+ , hi = { i |ai |i i }+ , i ∈ JM , where {x}+ = max{0, x}. / P , there exists a violated constraint Evaluate Hi∗ = max {hi , hi }. Since, xS ∈ i∈JM
thus Hi∗ > 0. If Hi∗ = hi∗ , then the planes α and α ¯ in (10) are selected by ¯ : a rule: α : a i∗ x = bi∗ + ci∗ ; α i∗ x = bi∗ − ci∗ . If Hi∗ = hi∗ , then α : ai∗ x = α α ¯ ¯ : ai∗ x = bi∗ + ci∗ . Form the E-partition E = E ∪ E into sets (10). bi∗ − ci∗ ; α Step 5. Proceed to an examination of the search-tree node induced by E α . Now, a problem under consideration is: (2), x ∈ E α , where E α is a P α -2LS, P α = P ∩ α, dim P α ≤ n − 1, while P α = conv E α . For reducing it to optimization on a full-dimensional FPC, a projection into the corresponding lower dimensional space is conducted, then Steps 1–4 are repeated for the problem. The same is performed for the E α¯ -induced tree node, when a problem (2), x ∈ E α¯ of type (43) is solved. This process continues iteratively until termination. PEM Illustration. We will demonstrate the PEM by the example of the KFG instance with 7 vertices, 5 conflicts from [6]. Its parameters are: n = 7, m = 5, a profit vector c = (3, 2, 3, 4, 3, 5, 4), a weights’ vector a = (1, 1, 2, 3, 3, 6, 5), a knapsack capacity b = 8. Conflict constraints are x1 + x2 ≤ 1, x2 + x4 ≤ 1, x3 + x4 ≤ 1, x4 + x6 ≤ 1, x5 + x6 ≤ 1.
(46)
The corresponding conflict graph G has a size m = 5 and an order n = 7, where there are no cliques of order grater than two. Thus, we have KCG in the form of KCFG.EF, where the forcing constraints (20) are absent and E1 = {1, 2}, E2 = {2, 4}, E3 = {3, 4}, E4 = {4, 6}, E5 = {5, 6}. Note that (ci /ai )i =
244
O. Pichugina and L. Koliechkina
(3, 2, 3/2, 4/3, 1, 5/6, 4/5), thus the elements of P are sorted in nonincreasing order of profit-over-weight ratio. Let us build the model KCFG.QF of this problem. For this purpose, we find b in CC (16) estimating a x from below. P is given by (46), 7 (47) i=1 ai xi ≤ 8, 0 ≤ xi ≤ 1, i ∈ J7 . (48) First, we attempt to find a feasible solution x∗∗ of the KGF applying the greedy heuristic, where the items are examined in the given order. At the same time, the absence of conflicts is monitored [6]. Let us put the item P1 into the knapsack P ∗∗ , thus x∗∗ 1 = 1, and the remained space is b = 7. Given (46), the next item can be selected from P3 − P7 . Placing the next available item P3 we get x∗∗ 3 = 1 and the remained room of 5. Given (48), for the next item, the choice is restricted to the items P5 − P7 . When P5 is chosen, x∗∗ 5 = 1, then there is still available space of 2. Now, given (46), the item P7 can be selected without causing conflicts but its weight is a7 = 5 > 2. Thus, an admissible knapsack P ∗∗ = {P1 , P3 , P5 }, it corresponds to a feasible KFG-solution x∗∗ = (1, 0, 1, 0, 1, 0, 0). The P ∗∗ -profit is z ∗∗ = 9, and the weight is a x∗∗ = 6 < 8. Taking into account the integrity of the problem parameters, we set a new lower bound z l = z ∗∗ + 1 = 10. Consider a problem min a y, s.t. c y ≥ z l , y ∈ P . Its solution is y ∗ , g ∗ , where y ∗ = (1, 0, 0, 1, 1, 0, 0), g ∗ = a y ∗ = 7, so (47) can be refined to CC 7 ≤ a x ≤ 8.
(49)
Thus, in (47), b = 7, b = 8, hence the range JK ,K = {4}. So, we deal with KCG=KCFG(4). The polytope P = P4 is given by the inequalities (46)–(49), and the feasible domain E = E4 . Let us build the model KCFG(4).PEF. The constraint (39) takes the form of 7 2 2 2 2 2 i=1 (xi − xi ) + (x1 + x2 ) + (x2 + x4 ) + (x3 + x4 )+ (x4 + x6 ) 7 2 2 +(x5 + x6 ) + x aa x − (4k − 1)a x + 4k + 2k − i=1 xi −(x1 + x2 ) − (x2 + x4 ) − (x3 + x4 ) − (x4 + x6 ) − (x5 + x6 ) = 0, where k = 4. Rewriting it as x A x+b x+c = 0, A 0 and find a center of El x0 = − 21 A−1 b = (0.35, 0.30, 0.37, 0.26, 0.35, 0.30, 0.50). Now, substituting the feasible point x∗∗ into equation (x − x0 ) A (x − x0 ) = Δ, we get a normalization coefficient Δ = 2.98 and come too the equation (41) with D = A /Δ. ∗t ∗t Two sequences will be formed: {x∗t , z ∗t }t , {y ∗t , z ∗∗t }t . Here, z S = Sc x ∗t, ∗t ∗0 ∗0 = { x ,z ; y x are optimal ER-solutions at step t, in particular, { x , z is the closest point of B7 to x∗t (y ∗t = PrB7 x∗t ), z ∗∗t = c y ∗∗t . Each point y ∗t is validated on ability to improve the current approximate solution x∗∗ . {z ∗t }t , {z ∗∗t : y ∗t ∈ E}t form sequences of upper and lower bounds on z ∗ , respectively. Introduce a notation N t = N (cdt , ztu ) for a node examined on an iteration t. Here, cdt is a node code, which is a list of planes to the P -description, ztu is an upper bound for this node.
The Constrained Knapsack Problem
245
Step 0. The root node corresponds to the KCFG and iteration t = 0, the node code cd(0) = ∅, thus the node notation is N 0 = N (cd0 , z0u ). By the formula (42) we get x∗0 = (1.28, 0.21, 0.76, 0.53, 0.64, 0.07, 0.26) . z 0∗ = 11.95 is the node upper bound, thus z0u = 11.95. We try to refine the lower bound. For that, a closest Boolean point y ∗0 to x∗0 is derived and checked / E. A feasible on feasibility for the KFG: y ∗0 = P rB7 x∗0 = (1, 0, 1, 1, 1, 0, 0) ∈ solution is not found. At x∗0 , 4 constraint of P are violated. Hi∗ = h1 = 0.35 for branching is achieved for the the node, constraint x1 + x2≤ 1. Therefore, ¯ = x ∈ R7 : x1 + x2 = 0 . In queue, we choose α = x ∈ R7 : x1 + x2 = 1 , α N (∅, 11.95) is replaced by a pair of nodes N ({α} , 11.95), N ({α} , 11.95) viewed in the indicated order. Step 1. t = t + 1 = 1. Explore N 1 = N ({α} , 11.95). The constraint x1 + x2 = 1 is added to (46)–(49), and the resulting problem on ellipsoid of dimension n − 2 = 5 is solved getting x∗1 = (1.07, −0.07, 0.83, 0.68, 0.74, 0.01, 0.28) , z ∗1 = u 11.65 = z1 . The upper bound 11.95 was refined, thus this node gets parameters / E. There was no N 1 = N ({α} , 11.65). y ∗1 = PrB7 x∗1 = (1, 0, 1, 1, 1, 0, 0) ∈ improvement of the current solution x∗∗ . Step 2. t = t + 1 = 2. The queue contains two nodes with upper bounds 11.65 Similarly and 11.95, therefore, next, we analyze the later – N 2 = N ({α} , 11.95). to Step 1, on the plane x1 + x2 = 0, we obtain a ER-solution x∗2 , z ∗2 with z ∗2 = 9.72. Since 9.72 < z l = 10, this node is discarded. Step 3. Now, we branch N 1 . At x∗1 , 3 constraints of P are violated, of which the maximum value Hi∗ = h3 = 0.36 is achieved on separating hyperplane an E-decomposition into parallel planes x3 +x4 = 1. This leads toconsidering + x = 0 . They induce two child β = x ∈ R7 : x3 + x4 = 1 , β¯ = x ∈ R7 : x 3 4 nodes of the tree – N ({α, β} , 11.65) and N ( α, β¯ , 11.65), analyzed one by one. Step 4. t = t + 1 = 3, N 3 = N ({α, β} , 11.65). We solve an ER with additional constraints x ∈ α, β. On the resulting ellipsoid of dimension 4, an optimal solu tion is x∗3 = (1.17, −0.17, 0.41, 0.59, 0.86, 0.07, 0.41) , z ∗3 = 11.33 = z3u , i.e., ∗3 ∗3 this node becomes N ({α, β} , 11.33); y = PrB7 x = (1, 0, 0, 1, 1, 0, 0) ∈ E, z ∗∗3 = 10 > z ∗∗ = 9. Since an improvement of the objective function has been achieved, we refine the current KCFG-solution and the lower bound. The solution is x∗∗ = y ∗3 , z ∗∗ = max {z ∗∗ , 10} = 10, and the lower bound z l = z ∗∗ + 1 = 11. Continuing in the same manner, we add two more pairs of parallel planes in P constraints and explore the induced perspective nodes. No improvement was found, hence x∗ = x∗∗ = (1, 0, 0, 1, 1, 0, 0) , z ∗ = c x∗ = 10, P ∗ = {P1 , P4 , P5 } is an optimal solution of the KCG. Compared this solution with P ∗ = {P1 , P3 , P7 } presented in [6], the same number of examined nodes (eight) and constraints (four) is the same. In [6], all additional constraints fix coordinates of solution, the constraints represent all three groups of (46)–(48).
Conclusion A generalization of the classical Knapsack Problem is attacked, where incompatibility of including and excluding some items must be considered (KCFG). A
246
O. Pichugina and L. Koliechkina
concept of a two-level set in a polytope P (P -2LS) is introduced and applied in modelling KCFG as a quadratically constrained program and in the PolyhedralEllipsoid Method (PEM) of linear optimization over P -2LSs. PEM is illustrated by an example. Shor’s dual quadratic bounds and semi-definite relaxation solutions are offered as upper bounds on KCFG optimal value. The presented KCFG formulations and applications can be directly generalized on all P -2LSs and CKPs reducible to optimization on such sets.
References 1. Bettinelli, A., Cacchiani, V., Malaguti, E.: A branch-and-bound algorithm for the knapsack problem with conflict graph. INFORMS J. Comput. 29, 457–473 (2017). https://doi.org/10.1287/ijoc.2016.0742 2. Birkhoff, G.: Tres observaciones sobre el algebra lineal. Rev. Univ. Nac. Tucum´ an 5(A), 147–151 (1946) 3. Blum, C., Roli, A.: Metaheuristics in combinatorial optimization: overview and conceptual comparison. ACM Comput. Surv. 35, 268–308 (2003). https://doi.org/ 10.1145/937503.937505 4. Bohn, A., Faenza, Y., Fiorini, S., Fisikopoulos, V., Macchia, M., Pashkovich, K.: Enumeration of 2-level polytopes. Math. Program. Comput. 11(1), 173–210 (2018). https://doi.org/10.1007/s12532-018-0145-6 5. Chvatal, V.: On certain polytopes associated with graphs. J. Comb. Theor. Ser. B. 18, 138–154 (1975). https://doi.org/10.1016/0095-8956(75)900416 6. Coniglio, S., Furini, F., San Segundo, P.: A new combinatorial branch-and-bound algorithm for the knapsack problem with conflicts. Eur. J. Oper. Res. 289, 435–455 (2021). https://doi.org/10.1016/j.ejor.2020.07.023 7. Dahl, J.: Convex optimization in signal processing and communications (2003) 8. Elhedhli, S., Li, L., Gzara, M., Naoum-Sawaya, J.: A branch-and-price algorithm for the bin packing problem with conflicts. INFORMS J. Comput. 23, 404–415 (2010). https://doi.org/10.1287/ijoc.1100.0406 9. Fiorini, S., Macchia, M., Pashkovich, K.: Bounds on the number of 2-level polytopes, cones, and configurations. Discrete Comput. Geom. 65(3), 587–600 (2021). https://doi.org/10.1007/s00454-020-00181-4 10. Grande, F., Sanyal, R.: Theta rank, levelness, and matroid minors. J. Comb. Theor. Ser. B. 123, 1–31 (2017). https://doi.org/10.1016/j.jctb.2016.11.002 11. Gurski, F., Rehs, C.: Solutions for the knapsack problem with conflict and forcing graphs of bounded clique-width. Math. Methods Oper. Res. 89(3), 411–432 (2019). https://doi.org/10.1007/s00186-019-00664-y 12. Hanner, O.: Intersections of translates of convex bodies. Math. Scand. 4, 65–87 (1956). https://doi.org/10.7146/math.scand.a-10456 13. Hifi, M., Saleh, S., Wu, L.: A fast large neighborhood search for disjunctively constrained knapsack problems. In: Fouilhoux, P., Gouveia, L.E.N., Mahjoub, A.R., Paschos, V.T. (eds.) ISCO 2014. LNCS, vol. 8596, pp. 396–407. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09174-7 34 14. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W., Bohlinger, J.D. (eds.) Complexity of Computer Computations, pp. 85–103. Springer, Boston (1972). https://doi.org/10.1007/978-1-4684-2001-2 9 15. Kellerer, H., Pferschy, U., Pisinger, D.: Knapsack Problems. Springer, Berlin (2010)
The Constrained Knapsack Problem
247
16. Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. Wiley, Chichester (1990) 17. Klerk, E. de: Aspects of Semidefinite Programming: Interior Point Algorithms and Selected Applications. Springer, Dordrecht (2002) https://doi.org/10.1007/ b105286 18. Lazarev, A., Salnikov, A., Baranov, A.: Graphical algorithm for the knapsack problems. In: Malyshkin, V. (ed.) PaCT 2011. LNCS, vol. 6873, pp. 459–466. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23178-0 41 19. Pferschy, U., Schauer, J.: Approximation of knapsack problems with conflict and forcing graphs. J. Comb. Optim. 33(4), 1300–1323 (2016). https://doi.org/10. 1007/s10878-016-0035-7 20. Pichugina, O., Yakovlev, S.: Continuous approaches to the unconstrained binary quadratic problems. In: B´elair, J., Frigaard, I.A., Kunze, H., Makarov, R., Melnik, R., Spiteri, R.J. (eds.) Mathematical and Computational Approaches in Advancing Modern Science and Engineering, pp. 689–700. Springer, Cham (2016). https://doi. org/10.1007/978-3-319-30379-6 62 21. Pichugina, O., Yakovlev, S.: Euclidean combinatorial configurations: typology and applications. In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON 2019) Conference Proceedings, pp. 1065–1070. Lviv, Ukraine (2019). https://doi.org/10.1109/UKRCON.2019.8879912 22. Pichugina, O., Yakovlev, S.: Euclidean combinatorial configurations: continuous representations and convex extensions. In: Lytvynenko, V., Babichev, S., W´ ojcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making. pp. 65–80. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26474-1 5 23. Pogorelov, A.V.: Extrinsic Geometry of Convex Surfaces. American Mathematical Society, Providence (1973) 24. Stanley, R.P.: Two poset polytopes. Discrete Comput Geom. 1, 9–23 (1986). https://doi.org/10.1007/BF02187680 25. Shor, N.Z.: Nondifferentiable optimization and polynomial problems. Kluwer Academic Publishers, Dordrecht (1998) 26. Stetsyuk, P.I.: Dual bounds in quadratic extremal problems. Eureka, Chisinau (2018) 27. Yakovlev, S.: Convex extensions in combinatorial optimization and their applications. In: Butenko, S., Pardalos, P.M., Shylo, V. (eds.) Optimization Methods and Applications. SOIA, vol. 130, pp. 567–584. Springer, Cham (2017). https://doi. org/10.1007/978-3-319-68640-0 27 28. Yakovlev, S., Pichugina, O.: On constrained optimization of polynomials on permutation set. In: Proceedings of the Second International Workshop on Computer Modeling and Intelligent Systems (CMIS-2019), pp. 570–580. CEUR Vol2353 urn:nbn:de:0074–2353-0, Zaporizhzhia, Ukraine (2019) 29. Yakovlev, S., Pichugina, O., Koliechkina, L.: A lower bound for optimization of arbitrary function on permutations. In: Babichev, S., Lytvynenko, V., W´ ojcik, W., Vyshemyrskaya, S. (eds.) ISDMCI 2020. AISC, vol. 1246, pp. 195–212. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-54215-3 13 30. Yamada, T., Kataoka, S., Watanabe, K.: Heuristic and exact algorithms for the disjunctively constrained knapsack problem. Inf. Process. Soc. Jpn. J. 8, 191–205 (2002)
NP-Hardness of 1-Mean and 1-Medoid 2-Clustering Problem with Arbitrary Clusters Sizes Artem V. Pyatkin1,2(B) 1
Sobolev Institute of Mathematics, Koptyug Avenue 4, Novosibirsk 630090, Russia 2 Novosibirsk State University, Pirogova Street 2, Novosibirsk 630090, Russia
Abstract. We consider the following 2-clustering problem. Given n points in Euclidean space, partition it into two subsets (clusters) so that the sum of squared distances between the elements of the clusters and their centers would be minimum. The center of the first cluster coincides with its centroid (mean) while the center of the second cluster should be chosen from the set of the initial points (medoid). It is known that this problem is NP-hard if the cardinalities of the clusters are given as a part of the input. In this paper we prove that the peoblem remains NP-hard in the case of arbitrary clusters sizes.
Keywords: Euclidean space NP-hardness
1
· Mean · Medoid · 2-clustering · Strong
Introduction
The object of study in this paper is 2-clustering, i.e. partition a set of points in Euclidean space into two non-empty clusters according to some similarity criteria. The aim of the paper is to prove NP-hardness of one particular 2-clustering problem in the case of non-fixed cardinalities of the clusters. The research is motivated by the fact that the computaional complexity of the problem remained unknown up to date. Note that clustering (partitioning a set of some objects into non-empty subsets containing similar objects) is one of the most actual problems in data analysis, data mining, computational geometry, mathematical statistics and discrete optimization [3,4,8]. It is indeed a wide class of problems differing by the number of clusters, similarity criteria, cluster cardinalities constraints, etc. One of the most usual similarity criteria is the minimum of the squared distances from the elements of the cluster to some point called a center of the cluster. There could be following constraints on the choice of the center: The research was supported by the program of fundamental scientific researches of the SB RAS, project 0314-2019-0014 and by the Russian Foundation for Basic Research, project 19-01-00308. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 248–256, 2021. https://doi.org/10.1007/978-3-030-86433-0_17
NP-Hardness of 2-Clustering Problem
249
– An arbitrary point (no restrictions) – A point from the given set – A given (fixed) point of the space. This choice of centers types can be motivated by the following problem. Asssume that several sensors should be placed for monitoring the situation in towns of a given area. Some sensors are autonomous and could be placed anywhere; others need regular service and so they should be put in towns. There could be also sensors that had been put earlier and cannot be moved (fixed). Since the energy consumption is proportional to the square of the distances, the clustering problem with different centers types can be interpreted as location problem for the sensors of two types taking into account existing sensors and minimizing the total energy consumption. If all points of a cluster C are known and the center can be chosen arbitrarily then it is easy to prove by taking partial derivations that the optimal center coincides with the so-called centroid defined as y∈C y . y(C) = |C| If this constraint must hold for all cluster centers then one get a classical problem MSSC (minimum sum-squared clustering) [6,18] also known as k-means where k is the number of clusters. So, we call such centers means. If the center of the cluster must coincide with one of the points from the initial set then we call it medoid1 . Finally, the third type of centers (a fixed point) is called given. In 2-clustering, if both clusters have the same requirement on the center (mean, medoid or given), then we denote the problem, respectively, 2-mean, 2-medoid or 2-given. Note that 2-mean 2-clustering problem is the same as the classical 2-means. However, it is possible that different clusters have different constraints on their centers. In this case, the corresponding problems are denoted, respectively, 1-mean and 1-medoid, 1-mean and 1-given, or 1-medoid and 1given. Also, we distinguish whether the cluster cardinalities are fixed (given as a part of input) or can be chosen arbitrarily. Clearly, if a problem with fixed clusters cardinalities is polynomially solvable then the problem with arbitrary clusters sizes is polynomially solvable as well—just consider all n − 1 possible sizes of the first cluster (where n is the number of points) and choose the best solution. And visa versa, the NP-hardness of a probem with arbitrary clusters cardinalities implies the NP-hardness of the one with fixed sizes of the clusters. It is easy to see that 2-given 2-clustering problem is polynomially solvable. Indeed, it is sufficient to show this only in case of fixed cardinalities. So, if the size of the first cluster is M then just consider for each point the difference between the squared distances from the first and the second center, and choose 1
The term medoid was introduced in [12] as a representative object of a cluster within a data set whose average dissimilarity to all the objects in the cluster is minimal. Although by dissimilarity usually the distance is meant, applying it for the square of the distance does not contradict the initial definition.
250
A. V. Pyatkin
among them M minimal ones. This implies the polynomial solvability (both for fixed or arbitrary clusters cardinalities) of 1-medoid and 1-given 2-clustering problem that can be reduced to n instances of 2-given 2-clustering problems, and of 2-medoid 2-clustering problem that can be reduced to n2 instances of 2-given 2-clustering problems. If at least one cluster center must be a mean then the problem becomes much harder. It is known [1] that the problem k-means in the case of the arbitrary cluster sizes in NP-hard even for k = 2. Then by the remark above, the fixed cardinalities version of 2-mean 2-clustering problem is also NP-hard. If both the space dimension d and the number of clusters k are fixed then k-means is polynomially solvable [11]; however, if k is a part of the input, then it remains NP-hard even in the planar case [19], i.e. for d = 2. Note also that a PTAS is known [17] O(1) ). for 2-means finding an (1 + ε)-approximate solution in time O(dn2(1/ε) The 2-clustering problem 1-mean and 1-given was proved to be NP-hard both for the cases of fixed [2,9] and arbitrary [14,15] cardinalities. Note that both these variants admit polynomial 2-approximation algorithms of complexity O(dN 2 ); for the fixed cardinalities such algorithm can be found in [5], while for the arbitrary arbitrary cardinalities—in [13]. For any fixed space dimension d this problem is polynomially solvable in case of fixed cluster cardinalities. The first algorithm of complexity O(dN 2d+2 ) was suggested in [10]; the best known algorithm of complexity O(dN d+1 ) can be found in [21]. Finally, the fixed cardinalities version of 1-mean and 1-medoid problem was studied in [16] where its NP-hardness was proved (medoid was erraneously called median there). So, the only case of unknown computation complexity up to date was the 1-mean and 1-medoid 2-clustering problem in the case of arbitrary cluster cardinalities. This paper closes this final open case by showing that it is NP-hard. Note that the reduction used in the proof is similar to the one used in [20] for a subset choice problem. For the convenience, the review of all cases is given in Table 1, where the contribution of the current paper is shown in bold. Table 1. Complexity of various 2-clustering problems Centers types
Clusters cardinalities Fixed Arbitrary
2-given
Polynomially solvable Polynomially solvable
2-medoid
Polynomially solvable Polynomially solvable
2-mean
NP-hard [1]
NP-hard [1]
1-medoid and 1-given Polynomially solvable Polynomially solvable 1-mean and 1-given
NP-hard [2, 9]
1-mean and 1-medoid NP-hard [16]
NP-hard [14, 15] NP-hard (this paper)
NP-Hardness of 2-Clustering Problem
251
The paper is organized as follows. In the next section the strict formulation of the considered problem is given and some preliminary results are proved. In Sect. 3 the main complexity result is presented. The last section contains concluding remarks.
2
Preliminaries
Call a cluster trivial if it contains only one element. Clearly, for a trivial cluster both mean and medoid center coincides with the cluster element, and the contribution of such cluster into the objective function is zero. Therefore, it looks reasonable to require in 1-mean and 1-medoid 2-clustering problem that the clusters are non-trivial because, otherwise, the specifics of the cluster can be lost. Note also that there are only n possible trivial clusters, so their excludance does not narrow the problem much. We make use of the following well-known folklore identity (the proof can be found, for instance, in [15]): 2 y∈C z∈C y − z 2 . (1) y − y(C) = 2|C| y∈C
Using (1), we may formulate 1-mean and 1-medoid 2-clustering problem as follows: Problem 1. Given a set of points Y = {y1 , . . . , yN } in Eulidean space Rd , find a subset C ⊆ Y of cardinality t ∈ [2, N − 2] and a point x ∈ Y minimizing the objective function 1 y − z2 + y − x2 . 2|C|
f (C, x) =
y∈C z∈C
y∈Y\C
The following property of the optimal solution is easy but useful. Proposition 1. If t ∈ [3, N − 2] then the center x of the second cluster does not lie in C. Proof. If t ∈ [3, N − 2] and x ∈ C then consider the cluster C = C \ {x}. Clearly, the second addend in f (C, x) does not change, and we have y,z∈C
f (C, x) − f (C , x) =
2t
y∈C
=
y∈C
z − y2 + 2 y − x2 t
y − x − 2
y∈C
y − x2
−
y∈C
y,z∈C
−
y,z∈C
z − y2
2(t − 1)
z − y2
2t(t − 1) y − y(C )2
≥ 0. t follows from the well-known fact that the function g(x) = The last inequality 2 y∈C y − x reaches its global minimum at the centroid of the cluster C , i.e. at x = y(C ). =
252
3
A. V. Pyatkin
Main Result
Reformulate Problem 1 as a decision problem: Problem 1’. Given a set of points Y = {y1 , . . . , yN } in Eulidean space Rd and a number K > 0, are there a subset C ⊆ Y of cardinality t ∈ [2, N − 2] and a point x ∈ Y so that f (C, x) ≤ K? We need the following known NP-complete variant of the exact cover by 3-sets problem [7] where each vertex belongs to at most three subsets: Problem X3C3. Given a 3-uniform hypergraph of maximum degree 3 on n = 3q vertices, is there a subset of q edges covering all its vertices? In other words, there is a set of vertices V = {v1 , . . . , vn } where n = 3q and a collection of edges (subsets) E = {e1 , . . . , em } such that each ei ⊆ V, |ei | = 3 and every vj lies in at most three edges; the question is whether there is a subset E0 ⊆ E of cardinality q such that ∪e∈E0 e = V ? Note that we may assume m > q + 2 since otherwise the problem X3C3 can be solved by brute force in time O(m2 ). Theorem 1. Problem 1 is NP-complete in a strong sense. Proof. Consider an arbitrary instance of X3C3 problem and reduce it to an instance of Problem 1’ in a following way. Put d = 3n+1 = 9q+1 and N = m+1. Choose an integer a so that a2 > max{(m − q − 1)(m − 1)/6, m/18} and let K = 18a2 (m − 1) + m − q. Each hyperedge ei corresponds to a point yi ∈ Y, i = 1, . . . , m and each vertex vj corresponds to three coordinates 3j, 3j − 1, 3j − 2 that are referred to as j-th coordinate triple, j = 1, . . . , n. Denote by yi (k) the k-th coordinate of the point yi . If vj ∈ ei then put yi (3j − 2) = yi (3j − 1) = yi (3j) = 0. Otherwise, define s as the number of hyperedges with lesser indices than i, containing the vertex vj , i.e. s = |{l < i | vj ∈ el }|. Note that s ∈ {0, 1, 2} since the maximum degree of the hypergraph is 3. Put yi (3j − 2) = 2a, yi (3j − 1) = yi (3j) = −a, if s = 0; yi (3j − 1) = 2a, yi (3j − 2) = yi (3j) = −a, if s = 1; yi (3j) = 2a, yi (3j − 2) = yi (3j − 1) = −a, if s = 2. Also, put yi (d) = 1 for all i ∈ {1, . . . , m} and yN (k) = 0 for all k ∈ {1, . . . , d}. Since the hypergraph is 3-uniform, we have yi 2 = yi − yN 2 = 18a2 + 1 and yi − yj 2 ≥ 36a2 for all i, j ∈ {1, . . . , N − 1}, i = j. Note also that the equality yi − yj 2 = 36a2 holds if and only if ei ∩ ej = ∅. If a subset E0 of cardinality q covering all vertices of the hypergraph exists then let C = {yi | ei ∈ E0 } and x = yN . Clearly, f (C, x) = as required.
(q 2 − q)36a2 + (m − q)(18a2 + 1) = 18a2 (m − 1) + m − q = K, 2q
NP-Hardness of 2-Clustering Problem
253
Suppose now that there is a cluster C ⊂ Y of cardinality t ∈ [2, N − 2] and a point x such that f (C, x) ≤ K. Consider two cases. Case 1. Assume yN ∈ Y \ C. In this case, clearly, x = yN . So, the second addend in f (C, x) is (2) (m − t)(18a2 + 1). To calculate the first addend, for i = 0, 1, 2, 3 denote by ai the number of coordinate triples that are non-zero in exactly i points from C. Since the total number of coordinate triples is n and each point from C has exactly three non-zero coordinate triples, we have a0 + a1 + a2 + a3 = n = 3q and a1 + 2a2 + 3a3 = 3t. The contribution of the a0 zero coordinate triples into the first addend of the objective function is 0. Each of the a1 coordinate triples that is non-zero in one point from C contirbutes (t − 1)6a2 /t; so, their total contribution is 6(t − 1)a2 a1 . t
(3)
If a coordinate triple is non-zero in two points from C, it contributes (18a2 + 2(t − 2)6a2 )/t = 6(2t − 1)a2 /t. The total contribution of such triples is 6(2t − 1)a2 a2 . t
(4)
Finally, the total contribution of the triples that are non-zero in three points from C equals (3 · 18a2 + 3(t − 3)6a2 )a3 = 18a2 a3 . (5) t Summing (2)–(5) we get f (C, x) =
6(t − 1)a2 a1 + 6(2t − 1)a2 a2 + 18a2 a3 + (m − t)(18a2 + 1) t
= 6(a1 + 2a2 + 3a3 )a2 + 18(m − t)a2 −
= 18ma2 −
6(a1 + a2 )a2 +m−t t
6(a1 + a2 )a2 6(a1 + a2 )a2 + m − t = K + 18a2 − + q − t. t t
Note that a1 + a2 = 3t − a2 − 3a3 ≤ 3t; hence f (C, x) > K if q > t. Therefore, q ≤ t. If a1 + a2 ≤ 3t − 1 then using t ≤ m − 1 we have f (C, x) ≥ K + 18a2 −
6a2 6(3t − 1)a2 +q−t≥K + +q−m+1>K t m−1
by the choice of a. So, a1 + a2 = 3t = a1 + 2a2 + 3a3 , i.e., a2 = a3 = 0 and a1 = 3t. On the other hand, a0 +a1 = 3q ≤ 3t, giving a0 = 0 and q = t. But then the set E0 = {ei | yi ∈ C} induces a cover of cardinality q in the hypergraph.
254
A. V. Pyatkin
Case 2. Assume yN ∈ C. If x = yN then y∈Y\C x − y2 = (18a2 + 1)(N − t) while if x = yi for some yi ∈ Y \ C then y∈Y\C x − y2 ≥ 36a2 (N − t − 1). Clearly, (18a2 +1)(N −t) < 36a2 (N −t−1) whenever N −t ≥ 3. So, two subcases are available: either x = yN and t = N − 2 or, by Proposition 1, x = yN and t = 2. Consider them separately. Subcase 2a. Let t = N − 2 = m − 1 and x = yN . Then the second addend in the objective function is at least 36a2 . Introduce a0 , a1 , a2 , a3 in the same way as in the Case 1. Note, however, that now a1 + 2a2 + 3a3 = 3t − 3 = 3m − 6 since yN ∈ C. Note also that the last coordinate contributes (m − 2)/(m − 1) into the first addend of the objective function. Using (3)–(5) with t = m − 1 and a1 + a2 ≤ 3q, we have f (C, x) ≥
6(m − 2)a2 a1 + 6(2m − 3)a2 a2 m−2 + 18a2 a3 + + 36a2 m−1 m−1
= 6(a1 + 2a2 + 3a3 )a2 + 36a2 + 1 − ≥ 18ma2 + 1 −
6(a1 + a2 )a2 + 1 m−1
18qa2 + 1 18qa2 + 1 = K + 18a2 − m + q + 1 − m−1 m−1
=K +q+1+
18a2 (m − q − 1) − m2 + m − 1 m−1
m−q−2 mq + 1 =K+ >K m−1 m−1 because a2 > m/18 and m > q + 2. A contradiction. >K +q+1−
Subcase 2b. Let t = 2 and x = yN . Then C = {yi , yN } for some i. So, f (C, x) = (18a2 + 1)/2 + (m − 1)(18a2 + 1) = 9a2 + 1/2 + K + q − 1 > K. Since in both subcases we have a contradiction, Case 2 is impossible. Note that K and all coordinates of points from Y are bounded by a polynomial of m and q. Hence, Problem 1’ is NP-complete in a strong sense.
4
Conclusions
In this paper we have proved that 1-mean and 1-medoid 2-clustering problem remains NP-hard in the case of arbitrary clusters cardinalities. This finishes the classification of the complexity of 2-clustering problems where the centers of the clusters can be either means, or medoids or given points. The question of existence of an approximation algorithm with a guaranteed performance for 1-mean and 1-medoid 2-clustering problem remains open. Anknowledgements. The author is grateful to the unknown referees for their valuable comments.
NP-Hardness of 2-Clustering Problem
255
References 1. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sumof-squares clustering. Mach. Learn. 75(2), 245–248 (2009) 2. Baburin, A.E., Gimadi, E.K., Glebov, N.I., Pyatkin, A.V.: The problem of finding a subset of vectors with the maximum total weight. J. Appl. Ind. Math. 2(1), 32–38 (2008). https://doi.org/10.1134/S1990478908010043 3. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-28349-8 2 4. Dubes, R. C., Jain, A. K.: Algorithms for Clustering Data. Prentice Hall (1988) 5. Dolgushev, A.V., Kel’manov, A.V.: An approximation algorithm for solving a problem of cluster analysis. J. Appl. Ind. Math. 5(4), 551–558 (2011) 6. Fisher, W.D.: On grouping for maximum homogeneity. J. Amer. Statist. Assoc. 53(284), 789–798 (1958) 7. Garey, M.R., Johnson D.S.: Computers and Intractability. The Guide to the Theory of NP-Completeness. W. H. Freeman and Company, San Francisco (1979) 8. Ghoreyshi, S., Hosseinkhani, J.: Developing a clustering model based on K-means algorithm in order to creating different policies for policyholders. Int. J. Adv. Compt. Sci. Inf. Technol. 4(2), 46–53 (2015) 9. Gimadi, E.K., Kel’manov, A.V., Kel’manova, M.A., Khamidullin, S.A.: A posteriori detection of a quasi periodic fragment in numerical sequences with given number of recurrences. Sib. Zh. Ind. Mat. 9(1), 55–74 (2006). (in Russian) 10. Gimadi, E.K., Pyatkin, A.V., Rykov, I.A.: On polynomial solvability of some problems of a vector subset choice in a Euclidean space of fixed dimension. J. Appl. Ind. Math. 4(1), 48–53 (2010) 11. Inaba, M., Katoh, N., Imai H.: Applications of weighted Voronoi diagrams and randomization to variance-based clustering. In: Proceedings of the Annual Symposium on Computational Geometry, pp. 332–339 (1994) 12. Kaufman, L., Rousseeuw, P.J.: Clustering by means of medoids. In: Dodge, Y. (ed.) Statistical Data Analysis based on the L1 Norm, pp. 405–416. North-Holland, Amsterdam (1987) 13. Kel’manov, A.V., Khandeev, V.I.: A 2-approximation polynomial algorithm for a clustering problem. J. Appl. Ind. Math. 7(4), 515–521 (2013). https://doi.org/10. 1134/S1990478913040066 14. Kelmanov, A.V., Pyatkin, A.V.: On the complexity of a search for a subset of “similar” vectors. Dokl. Math. 78(1), 574–575 (2008) 15. Kel’manov, A.V., Pyatkin, A.V.: On a version of the problem of choosing a vector subset. J. Appl. Ind. Math. 3(4), 447–455 (2009) 16. Kel’manov, A.V., Pyatkin, A.V., Khandeev, V.I.: NP-hardness of quadratic Euclidean 1-mean and 1-median 2-clustering problem with constraints on the cluster sizes. Dokl. Math. 100(3), 545–548 (2019) 17. Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for geometric k-means clustering in any dimensions. In: Proceedings of the Annual Symposium on Foundations of Computer Science, pp. 454–462 (2004) 18. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematics, Statistics and Probability, vol. 1, pp. 281–297 (1967) 19. Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. Theor. Comput. Sci. 442, 13–21 (2012)
256
A. V. Pyatkin
20. Pyatkin, A.V.: Easy NP-hardness proofs of some subset choice problems. Comm. Comput. Inf. Sci. 1275, 70–79 (2020) 21. Shenmaier, V.V.: Solving some vector subset problems by Voronoi diagrams. J. Appl. Ind. Math. 10(4), 560–566 (2016). https://doi.org/10.1134/ S199047891604013X
The Polytope of Schedules of Processing of Identical Requirements: The Properties of the Relaxation Polyhedron R. Yu. Simanchev1,2(B) 2
and I. V. Urazova1(B)
1 Omsk State University, Mira Avenue, 55a, Omsk 644077, Russia Omsk Scientific Center of SB RAS, Marksa Avenue, 15, Omsk 644024, Russia
Abstract. The paper deals with a set of identical requirements processing schedules for parallel machines. The precedence constraints are set, interrupts are prohibited, time is discrete. This set of schedules is the set of feasible solutions to a number of optimization problems which are determined by various objective functions. The paper proposes the set of all schedules as a family of subsets of a finite set and defines the schedule polytope as the convex hull of incidence vectors. The report deals with the affine hull and polyhedral relaxation of the scheduling polytope. Polyhedral relaxation includes nonnegativity constraints, constraints as to the number of machines, and precedence constraints. Using the bH-basis technique, the work shows that constraints for nonnegativity generate facets of the scheduling polytope. Necessary conditions for facet constraints to the number of machines and precedence constraints are found.
Keywords: Scheduling theory problems Polyhedral relaxation · Facet
1
· Scheduling polytope ·
Introduction
The paper discusses the identical requirements processing schedules for parallel machines. These schedules are described through the following conditions. Requirements of the set V , |V | = n, are to be processed by m ≥ 3 parallel identical machines. All requirements to be processed arrive simultaneously (at time k = 0) and have the same (equal to 1) processing times. Interrupts in processing requirements are prohibited. A partial order on V is defined. This partial order relation determines the precedence conditions in processing of the requirements. Any order of processing all requirements of the set V , admissible with respect to the partial order and to the number of machines, is called a feasible schedule or simply a schedule. We consider schedules in which all processes are completed This work was carried out within the governmental order for Omsk Scientific Center SB RAS (project registration number 121022000112-2). c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 257–270, 2021. https://doi.org/10.1007/978-3-030-86433-0_18
258
R. Y. Simanchev and I. V. Urazova
by a priori given time d. Moreover, it is obvious that if d is too small, then the set of schedules defined above may turn out to be empty. This set of schedules admits the following formalization [1]. A schedule is a function σ : V −→ D = {1, 2, . . . , d} such that (i) the relation i j implies the inequality σ(i) < σ(j); (ii) for any k ∈ D, there are at most m requirements i ∈ V such that σ(i) = k. The described set of schedules is the set of feasible solutions to a number of optimization problems which are determined by various objective functions such as minimizing the total processing time for all the requirements, minimizing the total weighted processing time for all the requirements. Any partial order on a finite set can be described using a directed acyclic graph. An acyclic digraph defining a partial order on the set of requirements V will be denoted by the letter G. By V and A we will denote the sets of its vertices and arcs, respectively. The vertices of the digraph G will be denoted by the letters i or j. An arc starting at i and ending at j will be denoted by ij. A path in a digraph G is a subgraph generated by the set of vertices i1 , i2 , . . . , it satisfying the condition is is+1 ∈ A for all s = 1, 2, . . . , t − 1. An arc ij is called transitive if in the digraph G there is a path from i to j that is different from the arc ij. Relying on the definition of a schedule, we will further assume that the digraph G does not contain transitive arcs. To exclude a trivial case from our consideration, we will assume that G is not a path. If no two vertices of the set U ⊂ V are connected by precedence conditions, then such a set U will be called independent. The structure of the article is as follows. In Sect. 2 the set of schedules is presented as a family of subsets of a finite set, which makes it possible to apply a polyhedral approach to the analysis of the problem. In Sect. 3 the influence of independent sets of the precedence graph on the combinatorial structure of schedules is considered. These results open up some technical possibilities needed to prove the properties of support inequalities for the scheduling polytope, which are discussed in Sect. 4.
2
The Schedule Polytope and Polyhedral Relaxation
We will formalize the schedules as follows. Let us define the table E = V × D as a set of cells (i, k), assuming that the rows of the table one-to-one correspond to the vertices V of the precedence graph G, and the columns correspond to the times D. The order of the rows does not matter, but the columns are strictly ordered starting with one. Let us assume that V ⊂ V and D ⊂ D. The subtable E(V , D ) formed by the intersection of rows V and columns D of the table E will be called a fragment. The fragment of the set S ⊂ E is the set S(V , D ) = S ∩ E(V , D ). If the column k does not contain cells from the set S ⊂ E (the fragment S(V , D )), then we say that the column k is empty with respect to the set S (the fragment S(V , D ), respectively). According to the above definition a schedule is a subset of cells of the table E of the type {(i, k) ∈ E | σ(i) = k}, where σ is a function satisfying the
The Polytope of Schedules of Processing of Identical Requirements
259
conditions (i )–(ii). This subset of cells will also be denoted by σ ⊆ E. With this approach and the notation introduced above, the condition (ii) can be written as |σ(V, {k})| ≤ m for each k ∈ D. The family of all schedules in E is denoted by Σd . If (i, k) ∈ σ, then we will say that in the schedule σ the vertex i is located in the column k; if the inequality k < l holds for two cells (i, k), (j, l) ∈ σ, then we will say that in the schedule σ the vertex i is located to the left of the column l (vertex j is to the right of the column k). A column of the table E will be called empty with respect to the schedule σ if it contains no vertices. It follows from the precedence conditions given by the digraph G that for any i ∈ V that having incoming arcs there are k ∈ D such that the cells (i, k) certainly do not belong to any schedule. Let us formalize this situation as follows. By pi , i ∈ V , we denote the characteristics of the problem determined by the conditions: – for any schedule σ ∈ Σd the condition {(i, 1), (i, 2), . . . , (i, pi )} ∩ σ = ∅ holds true; – there is a schedule σ ∈ Σd such that (i, pi + 1) ∈ σ . By qi , i ∈ V , we denote the characteristics of the problem determined by the conditions: – for any schedule σ ∈ Σd the condition {(i, d−qi +1), (i, d−qi +2), . . . , (i, d)}∩ σ = ∅ holds true; – there is a schedule σ inΣd such that (i, d − qi ) ∈ σ . The problem of finding the values of the parameters pi and qi is equivalent to one of the well-known problems of scheduling theory, namely, minimizing the total processing time for all requirements: max σ(i) −→ min . i∈V
σ∈Σd
(1)
Indeed, if one more vertex i is added to the precedence graph G and arcs are drawn to it from all vertices having a zero outdegree (contrabase, see [2]), then the search for pi is equivalent to solving the problem of minimizing the total processing time on the graph G. It is clear that a similar reasoning is valid for the parameter qi . We note straight away that problem (1), which resulted in this paper, is N P -hard for an arbitrary number of machines m. If m is fixed, then the complexity status of problem (1) is currently unknown [3]. This fact, to a certain extent, justifies the introduction of the parameters pi and qi . Let us define the set Di = {pi + 1, pi + 2, . . . , d − qi } ⊂ D for each vertex i ∈ V . It follows from the definitions of the parameters pi and qi that the vertex i ∈ V cannot be located outside this set in any schedule. This means that the set of cells in the table E can be reduced to the set Ed = {(i, k) | i ∈ V, k ∈ Di }. It is clear that σ ⊂ Ed for any σ ∈ Σd . The set Ed will be called basic. Having the sets Di , we will define the set Vk = {i ∈ V | k ∈ Di } for each k ∈ D . Let us associate the base set Ed with the Euclidean space REd of dimension |Ed | = i∈V |Di | by means of a one-to-one correspondence between the set Ed and the set of coordinate axes of the space REd . In other words, REd is the
260
R. Y. Simanchev and I. V. Urazova
space of column vectors whose coordinates are indexed by elements (cells) of the set Ed . With each S ⊆ Ed we will associate its incidence vector xS ∈ REd with coordinates xSik = 1 for (i, k) ∈ S and xSik = 0 for (i, k) ∈ / S. The schedule polytope is the set P (Σd ) = conv{xσ ∈ REd | σ ∈ Σd }, where “conv ” is the designation of the convex hull of points in REd . The papers [4,5] show that (0, 1) - vector x ∈ REd is a vector of incidences of the schedule if and only if it satisfies the following system of linear equations and inequalities: xik = 1, i ∈ V, (2) k∈Di
xik ≤ m,
k ∈ D,
(3)
i∈Vk
xik ≤
xjl ,
ij ∈ A, k ∈ Di ,
(4)
l∈Dj , l>k
xik ≥ 0,
i ∈ V, k ∈ Di .
(5)
The polytope defined by constraints (2)–(5) will be denoted by Md . It follows from the above result that Md is a polyhedral relaxation of the polytope P (Σd ) and does not contain integer points other than the schedule incidence vectors, or, which is the same, vertP (Σd ) = Md ∩ Z Ed , where “vert” is the notation for the set of vertices of the polytope, Z Ed is an integer lattice in the space REd . Moreover, since P (Σd ) and Md lie in the unit hypercube of the space REd , then vertP (Σd ) ⊆ vertMd . The family of schedules Σd on a fixed precedence graph G can be considered for different values of the parameter d. It is easy to see that for different d the families Σd satisfy the monotonicity condition in the sense that the inclusion Σd ⊂ Σd holds true for d < d . In this regard, below we will everywhere assume that d = n since the set Σn is certainly not empty for any precedence graph on n vertices. ¯ = {k + 1, k + 2, . . . , k + t} and the Let us assume that S ⊂ E, V¯ ⊆ V , D columns {k +t+1, k +t+2, . . . , k +t+s} are empty with respect to the fragment S(V¯ , D). We will define the set of cells S ⊂ E by the conditions
xSil = 0,
xSil = xSi l ,
xSil = xSil ,
i ∈ V¯ , l = k + 1, k + 2, . . . , k + s,
i ∈ V¯ , l = k + s + 1, k + s + 2, . . . , k + t + s, (i, l) ∈ / E(V¯ , {k + 1, k + 2, . . . k + t + s}).
This transition from S to S will be denoted as S = RVs¯ ,D¯ (S) and called a ¯ of the set S by s columns to the right. A similar shift of the fragment S(V¯ , D) shift by s columns to the left will be denoted as S = LsV¯ ,D¯ (S). It is clear that RVs¯ ,D¯ (LsV¯ ,D¯ (S)) = LsV¯ ,D¯ (RVs¯ ,D¯ (S)) = S. Note an important property of the
The Polytope of Schedules of Processing of Identical Requirements
261
s s shift: if σ ∈ Σd and V¯ = V , then the sets RV, ¯ (σ) and LV,D ¯ (σ) will also be the D s schedules from Σd . And finally, since Σd ⊂ Σd+s , the operation σ = RV, ¯ (σ) for D k + t = d will also be considered possible since the columns d + 1, d + 2, . . . , d + s can be considered empty with respect to the schedule σ in the basic set Ed+s .
3
Independent Sets and Empty Columns
This section presents some technical results that are the basis for proving the key polyhedral properties of a set of schedules, such as the validity, support and facetness of linear inequalities in the space REd with respect to the polyhedron P (Σd ). For subsets U1 , U2 ⊂ V we will write U1 U2 if the following conditions are håld: 1) U1 ∩ U2 = ∅, 2) for any vertex i ∈ U1 there is a vertex j ∈ U2 such that i j, 3) there is no vertex j ∈ U2 such that j i for some i ∈ U1 . In addition, we need the following notation. For i ∈ V and U ⊂ V , we will define the following subsets of vertices: N− (i) = {j ∈ V | j i}, N− (U ) = ∪i∈U N− (i),
N+ (i) = {j ∈ V | i j}, N+ (U ) = ∪i∈U N+ (i),
W (U ) = V \ (U ∪ N− (U ) ∪ N+ (U )). It is easy to see that if U is an independent set, then the introduced sets have the following properties: a) N− (U ) U, U N+ (U ); b) any two vertices i ∈ N− (U ) (i ∈ N+ (U )) and j ∈ W (U ) are either not connected by a precedence relation, or i j (j i, respectively); c) any two vertices i ∈ U and j ∈ W (U ) are not connected by a precedence relation. Let us extend the definitions of the characteristics pi and qi to subsets of the set σV . Let us assume that σ ∈ Σn , U ⊂ V . Suppose Tσ (U ) = {k ∈ D | i∈U xik > 0} is the set of columns in which the schedule σ contains the vertices from the set U . We will denote pU = min{k | exists σ ∈ Σd such that Tσ (N− (U )) ⊆ {1, 2, . . . , k}}, qU = min{k | exists σ ∈ Σd such that Tσ (N+ (U )) ⊆ {d − k + 1, d − k + 2, . . . , d}}. Similarly to how it was above, we will define the set DU = {pU + 1, pU + 2, . . . , d−qU } ⊂ D. Note that, generally speaking, the set DU may turn out to be empty. However, as it will be shown below, if U is independent and d = n, then DU = ∅. It is easy to verify that if U consists of one vertex i, then the introduced characteristics pU , qU and DU for d = n coincide with the characteristics pi , qi and Di .
262
R. Y. Simanchev and I. V. Urazova
Lemma 1. [11] Let U ⊂ V be an independent set and d = n. Then there is a schedule σ ∈ Σn with the following properties: 1) Tσ (N− (U )) = {1, 2, . . . , pU }, 2) Tσ (U ∪ W (U )) ⊆ DU , 3) Tσ (N+ (U )) = {n − qU + 1, n − qU + 2, . . . , n}. In this case, the vertices of the set U can be located in the columns of DU arbitrarily, taking into account the constraints (3) on the number of vertices in the column. This lemma implies that for d = n for an independent set U ⊂ V the inequality |U | + |W (U )| ≤ |DU | is true. The possibility of using empty columns plays an important role in the construction of schedules Ed . It is easy to see that if in the table E there is an empty column relative to the schedule σ, then by using an appropriate sequence of shifts we can move it to any place. The resulting set σ ⊂ E will also be a schedule. The fact that the precedence graph G is not a path implies the following Lemma 2. Let U ⊂ V be an independent set and k ∈ DU . Among the schedules satisfying Lemma 1, there is σ ∈ Σn such that the column k is empty with respect to σ and the cardinality of each of the fragments σ(W (U ), l), l ∈ DU does not exceed m − 1. Indeed, as noted above, |U | + |W (U )| ≤ |DU |. If each column from the set DU contains a vertex from the set U ∪ W (U ), then |U | + |W (U )| = |DU |. Then it follows from Lemma 1 that the table E has no columns empty with respect to σ. Since |D| = |V | = n, then each column contains exactly one vertex. Now from the definition of the characteristics pU and qU it follows that the graph G is a path. Thus, among the columns of the set DU there is certainly an empty one. Since k ∈ DU , then with the help of a sequence of corresponding shifts it is easy to achieve that exactly the column k is empty. The inequality |σ(W (U ), l)| ≤ m − 1 for l ∈ DU follows from |W (U )| < |DU | and properties b) and c). In further constructions, we will need schedules for which intentionally there are empty columns in the table E. The number of empty columns relative to the schedule σ ∈ Σn can be understood as the difference of n − |Tσ (V )|. We will introduce the value δG = min{m, max{|U |, U ⊆ V − independent}}. The next statement strengthens Lemma 2. Lemma 3. Let U ⊂ V be an independent set of vertices, d = n. There is a schedule σ ∈ Σn such that n − |Tσ (V )| ≥ δG − 1. Proof. Let the schedule σ satisfy the properties 1) -3) from Lemma 1. Moreover, the columns 1, 2, . . . , pU , n − qU + 1, n − qU + 2, . . . , n are not empty with respect
The Polytope of Schedules of Processing of Identical Requirements
263
to σ. This means that for the location of the vertices of the set U ∪ W (U ) we can use the columns DU . According to the property c), the vertices from U and from W (U ) are not connected by precedence conditions. This means that they can be arbitrarily placed relative to each other. We will place the vertices of the set W (U ) in the fragment E(W (U ), DU ) one by one in any |W (U )| columns taking into account the precedence conditions between them. As a result, there will be exactly |DU | − |W (U )| = n − pU − qu − |W (U )| empty columns for the placement of the vertices from U in the table E. Note that pU = |Tσ (N− (U ))| ≤ |N+ (U )| and qU = |Tσ (N+ (U ))| ≤ |N+ (U )|. Then |U | = n − |N− (U )| − |N+ (U )| − |W (U )| ≤ n − pU − qu − |W (U )| = |DU | − |W (U )|, that is, there are enough empty columns to place the vertices of the set U . Moreover, due to the independence of the set U and the property c), we can place them in the indicated columns arbitrarily, taking into account only constraints (3) on the number of machines. By the definition of the value δG , we can place the number δG vertices of the set U in one of the empty columns. If δG = |U |, then the entire set U will placed into this column. If δG < |U |, then we can place the remaining vertices one at a time in any |U | − δG columns from the remaining empty ones. It is easy to calculate that after that empty will remain n − |Tσ (V )| = |DU | − |W (U )| − (1 + |U | − δG ) = (|DU | − |W (U )| − |U |) − 1 + δG ≥ δG − 1
columns. The lemma is proved. Remark 1. Since, by convention, m ≥ 3 and the graph G are not a path, the maximum of the cardinalities of the independent sets in G is at least 2. Therefore, in this paper, we always have δG ≥ 2. In particular, this means that Σn always contains a schedule for which at least one column of the table E is empty. In addition, the equality δG = 2 means that any independent set in G has cardinality at most 2 and therefore no more than two machines may be needed to process all requirements. Therefore, the condition m ≥ 3 turns out to be redundant. In this regard, we can assume that δG ≥ 3. This ensures that Σn has a schedule with at least two empty columns. Remark 2. The following result is known: if the cardinality of any of the independent sets of the graph G does not exceed m, then min{d | Σd = ∅} = |Pmax | + 1, where |Pmax | is the length of the maximum path in the number of arcs in the graph G [4,5]. Since the maximum path problem in an acyclic digraph is polynomially solvable [3,5], the problem (1) is also polynomially solvable. Therefore, if we consider only the graphs with δG = m, then, by virtue of Lemma 3, this will mean that Σn has a schedule with at least m − 1 empty columns.
4
Faces and Facets of Polytope P (Σn ) Among the Constraints of Polyhedron Mn
Let us give some general definitions necessary for describing the polyhedral structure of convex hulls of a finite number of points in Euclidean space. Let Rn be
264
R. Y. Simanchev and I. V. Urazova
the n -dimensional space of column vectors and P = conv{x1 , x2 , . . . , xt } ⊂ Rn be a polytope. A linear inequality aT x ≤ a0 is called valid with respect to P if it holds true for every point from P . A valid inequality is called support to P if there is a point x ∈ P such that aT x = a0 . Any support inequality generates the set {x ∈ P | aT x = a0 }, which is called the face of the polytope P . The proper faces of a polytope that are maximal by inclusion are called the facets of polytope. The support inequalities that generate facets are called facet inequalities. From the point of view of solving combinatorial optimization problems, the most relevant are facet inequalities. They have worked well in cutting plane processes for solving high-dimensional problems (see [6–10], etc.). In addition, any linear system, the set of solutions of which coincides with a given polytope, necessarily contains all the facet inequalities of the polytope. Theorem 1 [11]. The following statements are true. (a) (constraints on the number of machines) Let us assume that k ∈ D and |Vk | ≥ m. The inequality i∈Vk xik ≤ m is support to the polytope P (Σn ) if and only if the graph G contains such an independent set U ⊆ Vk that |U | = m and k ∈ DU . (b) (precedence constraints) The inequality xik ≤ l∈Dj , l>k xjl , ij ∈ A, k ∈ Di , is support to the polytope P (Σn ). (c) (non-negativity constraints) The inequality xik ≥ 0, i ∈ V , k ∈ Di is support to the polytope P (Σn ). Corollary 1. A constraint of the type (3) of the polyhedron Mn is redundant if and only if δG < m. As we have already mentioned, facet inequalities play a special role not only in the construction of convex hulls but also in the cutting plane processes. In the paper [12], a technique for proving the facetness of an inequality that is support for a given polytope was developed. For further use of this technique, we will give its description. Let H ⊆ 2E be a family of subsets of some basic set E and P (H) ⊂ RE be the convex hull of incidence vectors of sets of the family H. To describe a sufficient facet condition for an inequality that is support to P (H), we will need the following notation and definitions. Let us assume that aff P (H) = {x ∈ RE | ΦT x = α}, and the matrix Φ has a full rank. Each row of the matrix Φ corresponds to exactly one element of e ∈ E and vice versa. Therefore, the set of rows of the matrix Φ will be denoted by E. The set of columns is denoted by the letter V , and suppose that |V | = n. It is clear that rank Φ = |V | ≤ |E|. If c ∈ RE , then by (c|Φ) (respectively, (Φ|c)) we will denote the matrix obtained by adding to we the matrix Φ the column c on the left (respectively, on the right). By Φ(c, E) will denote the submatrix of the matrix (c|Φ) formed by the rows E ⊆ E.
The Polytope of Schedules of Processing of Identical Requirements
265
Let bT x ≤ b0 be a support inequality f orP (H). A non-empty set S ⊂ E will be called a bH -switching if there exists H1 , H2 ∈ H such that S = H1 H2 and bT xH1 = bT xH2 = b0 . ⊂ E is called a bH -basis if the following conditions are met: A subset E = n + 1, (b1) |E| has a full rank, (b2) the matrix Φ(b, E) there is an ordered sequence e1 , e2 , . . . , et = e of elements (b3) for every e ∈ E \ E from E such that for any i ∈ {1, 2, . . . , t} the element ei belongs to some bH ∪ {e1 , e2 , . . . , ei }. -switching lying in E Theorem 2 [12]. The support inequality bT x ≤ b0 for P (H) is a facet inequality, ⊂ E. if there exists the bH -basis E In our case, the base set is the set En , the family of subsets H is the set of all schedules Σn . As the paper [11] shows, the affine hull of the polytope P (Σn ) is determined by system (2), that is xik = 1, i ∈ V }. aff P (Σn ) = {x ∈ REn | k∈Di
4.1
Trivial Facets
First, we will consider the non-negativity constraints for xik ≥ 0. The vector of coefficients on the left-hand side of this inequality is the incidence vector of the set {(i, k)} ⊂ Ed , that is, this inequality can be rewritten as (x{(i,k)} )T x ≥ 0. Therefore, when constructing switchings and bases for this inequality, we will talk about x{(i,k)} Σn -switching and x{(i,k)} Σn -basis. Theorem 3. Let us assume that i ∈ V , k ∈ Di . The inequality xik ≥ 0 is the facet inequality for the polytope P (Σn ). Proof. Let us consider the vertex j ∈ V \ {i}. First, let us assume that k ∈ Dj . By Lemma 1 and Corollary 2, there exists σ ∈ Σn such that Tσ (N− (j)) = {1, 2, . . . , pj }, Tσ (N+ (j)) = {n − qj + 1, d − qj + 2, . . . , n} and the column k is empty relative to σ. In addition, since |W (i)| < |Dj |, we can require that the cardinality of each of the fragments σ(W (j), l), l ∈ Dj is at most m − 1. Take arbitrarily two neighboring cells (j, l), (j, l + 1) ∈ E({j}, Dj ). Since the vertex j can be located in any of the columns of Dj , we assume that (j, l) ∈ σ. Moreover, due to constraints (2), (j, l + 1) ∈ / σ. Now, again, due to the arbitrariness of the location of the vertex j in the columns of Dj , the set σ = (σ\{(j, l)})∪{(j, l+1)} will be the schedule from Σn . As a result, we get σ σ = {(j, l), (j, l + 1)} and xσik = xσik = 0. In other words, we have shown that the set of cells {(j, l), (j, l + 1)} ∈ E({j}, Dj ) for j = i is x{(i,k)} Σn -switching. Let us again, as before, assume that j ∈ V \ {i}, but k ∈ / Dj . To be more specific, suppose that k ≤ pj , that is, the columns Dj are to the right of the column k. Again, using Lemma 1 and Corollary 2, we will choose a schedule σ
266
R. Y. Simanchev and I. V. Urazova
that has an empty column s ∈ Dj and has the property |σ(W (j), l)| ≤ m − 1 1 (σ) for all l ∈ Dj . From the σ schedule, we will go to the σ1 = RV,{k,k+1,...,s−1} schedule. As a result of this shift, column k becomes empty relative to σ1 (column s, generally speaking, ceases to be empty). Now, repeating the reasoning from the previous case, we come to the conclusion that the set of cells {(j, l), (j, l + 1)} ∈ E({j}, Dj ) for j = i is a x{(i,k)} Σn -switching. The case k > n − qj differs from the one considered in that σ1 = L1V,{s+1,s+2,...,k} (σ). Now, let us describe the x{(i,k)} Σn -switching for the vertex i. If among the cells (i, l), (i, l + 1) ∈ E({i}, Di ) there is no cell (i, k), then using constructions similar to the case j ∈ V \ {i}, k ∈ Dj , we come to the conclusion that the pair {(i, l), (i, l + 1)} is a x{(i,k)} Σn -switching. These constructions will not work if l or l + 1 is equal to k since both schedules participating in the symmetric difference must not contain the cell (i, k). What follows is that when constructing the x{(i,k)} Σn -basis next, we will only need the case pi + 1 < k < n − qi . It is easy to see that the pair of cells (i, k − 1) and (i, k + 1) is a x{(i,k)} Σn switching. Indeed, let us consider the schedules σ1 , σ2 inΣn satisfying Lemma 1. Since k − 1, k + 1 ∈ Di , we can require that (i, k − 1) ∈ σ1 and (i, k + 1) ∈ σ2 . Then we have σ1 σ2 = {(i, k − 1), (i, k + 1)}. In addition, by virtue of (2), we have (i, k) ∈ / σ1 ∩σ2 , that is, the schedules σ1 and σ2 satisfy the equality xik = 0. Thus, {(i, k − 1), (i, k + 1)} is a x{(i,k)} Σn -switching. The summary of this part of the proof is as follows: – for (i, k) ∈ / {(j, l), (j, l + 1)} the set {(j, l), (j, l + 1)} ⊂ En is a x{(i,k)} Σn switching for any j ∈ V (–switching of the first type); – for pi + 1 < k < n − qi the set {(i, k − 1), (i, k + 1)} is a x{(i,k)} Σn -switching (– switching of the second type). Let us proceed to the construction of the x{(i,k)} Σn -basis. We will consider three cases, each of which will have its own x{(i,k)} Σn -basis. n = {(j, pj + 1), j ∈ V ; (i, pi + Case 1 (k = pi + 1). Let us show that E {(i,k)} Σn -basis. The condition (b1) from the definition of the 2)} ⊂ En is a x n ) is square bH-basis is satisfied. It is easy to see that the matrix Φ(x{(i,k)} , E and nondegenerate. This matrix is shown in the Fig. 1. The rows of the matrix n . The first column corresponds correspond to the variables included in the set E {(i,k)} (denoted by b in the Fig. 1), the next columns correspond to the vector x to the constraints (2), numbered from 1 to n. (see Fig. 1). In particular, the two rows in the center of the matrix correspond to the variables xi pi +1 and xi pi +2 , the column in the center of the matrix corresponds to the vertex i ∈ V . Thus, the condition (b2) is satisfied. The verification of the condition (b3) implies that from the cells included in n , we must “add on” using the x{(i,k)} Σn -switchings all the cells of the table E En . Let us consider the row j ∈ V of the table En . If j = i, then (j, pj + 1) ∈ n . Accordingly, in the line j we need to complete the cells (j, pj + 2), (j, pj + E 3), . . . , (j, n − qj ). These cells are completed sequentially by using the switchings of the first type: with the switching {(j, pj + 1), (j, pj + 2)}, the cell (j, pj + 2) is
The Polytope of Schedules of Processing of Identical Requirements
267
added on, then with the switching {(j, pj + 2), (j, pj + 3)} the cell (j, pj + 3) is added, and so on up to the last cell (j, n − qj ). Thus, all the cells of the table En are added except for the cells of the fragment E({i}, {pi + 3, pi + 4, . . . n − qi }). It is easy to see that the remaining cells can be completed in the same way as above starting with (i, pi + 3). Thus, the condition (b3) is also satisfied.
b 0 ⎜0 ⎜ ⎜ .. ⎜. ⎜ (i, pi + 1) ⎜ ⎜1 (i, pi + 2) ⎜ ⎜0 .. .. . . (1, p1 + 1) (1, p2 + 1) .. .
(n, pn + 1)
⎛
1 1 0 .. .
0 0 .. .
0 0
2 ... 0 ... 1 ... .. . . . . 0 ... 0 ... .. . . . .
i 0 0 .. .
1 1 .. .
0 ... 0
... n ⎞ ... 0 ... 0⎟ ⎟ .. ⎟ .. . .⎟ ⎟ ... 0⎟ ⎟ ... 0⎟ ⎟ . .. . .. ...
1
n ). Fig. 1. The matrix Φ(x{(i,k)} , E
n = {(j, n−qj ), j inV ; (i, n−qi −1)} ⊂ En . Case 2 (k = n−qi ). Let us define E {(i,k)} Σn -basis is proved in exactly the same way as in The fact that this set is a x the previous case, with the difference that when considering the condition (b3), the table En is completed not in left to right but right to left order. n = {(j, pj + 1), j ∈ Case 3 (pi + 1 < k < n − qi ). Let us show that E V \ {i}; (i, k), (i, k + 1)} ⊂ En is a x{(i,k)} Σn -basis. The conditions (b1) and (b2) are easily verified. Let us prove (b3). The completion of the rows j ∈ V \ {i} is performed in the same way as in case 1. Then, with the second type switching {(i, k−1), (i, k+1)} we complete the cell (i, k−1) (that is feasible since (i, k+1) ∈ n ). Further, moving to the left of the cell (i, k − 1), we complete, with the help E of the first type switchings, all the cells (i, k − 2), (i, k − 3), . . . , (i, pi + 1), and moving to the right of the cell (i, k + 1), we complete all the cells (i, k + 2), (i, k + 3), . . . , (i, n − qi ). The theorem is proved. 4.2
Constraints on the Number of the Machines
Let us consider the constraints (3) on the number of the machines. According to the statement (a) from Theorem 1, the inequality i∈Vk xik ≤ m is a support inequality if and only if there exists an independent set U ⊆ Vk such that |U | = m and k ∈ DU . Generally speaking, such a set U may not be the only one. Let U(k) = {U1 , U2 , . . . , Ut } be all independent sets from Vk satisfying the conditions |Uα | = m and k ∈ DUα , α = 1, 2, . . . , t; Uα = U beta for α = β. Each set Uα we will associate with the set of schedules Σn (Uα ) ⊆ Σn according to the rule: σ ∈ Σn (Uα ) if and only if E(Vk , k) ∩ σ = E(Uα , k).
268
R. Y. Simanchev and I. V. Urazova
Lemma 4. The family of sets Σn (Uα ), α = 1, 2, . . . , t is a regular partition of the set of schedules whose incidence vectors form the set of all vertices of the face of the polytope P (Σn ), the face generated by the support inequality i∈Vk xik ≤ m. Proof. The lemma is true if: 1) Σn (Uα ) = ∅ for all α = 1, 2, . . . , t; 2) Σn (Uα ) ∩ Σn (Uβ ) = ∅ for α = β; 3) 3) {σ ∈ Σn | i∈Vk xσik = m} = ∪tα=1 Σn (Uα ). The proof of the condition 1) coincides with the proof of sufficiency in Theorem 1(a). To prove the condition 2), let us assume that σ ∈ Σn (Uα ) ∩ Σn (Uβ ), α = β. Then we have E(Uα , k) ∪ E(Uβ , k) = E(Uα ∪ Uβ , k) ⊂ σ. Since |Uα ∪ Uβ | > m and all vertices from Uα ∪ Uβ are located in the column k in the schedule σ, then i∈Vk xσik > m. This contradicts the validity of the inequality. Condition 3). Let us assume that α ∈ {1, 2, . . . , t} and σ ∈ Σn (Uα ). By the definition of the set Σn (Uα ) in the schedule σ only the vertices of the set Uα are placed in the column k. Then, by virtue of the fact that our inequality is σ support, we have i∈Vk xik = m. Since α was chosen arbitrarily, we obtain the t Σn (Uα ). inclusion {σ inΣn | i∈Vk xσik = m} ⊇ ∪α=1 Now let σ ∈ Σn be such that i∈Vk xσik = m. This means that among the terms xσik , i ∈ Vk there are exactly m pieces equal to 1. Since the vertices {i1 , i2 , . . . , im } = U are located in the same column relative to the schedule σ, the set U is independent. Let us show that k ∈ DU . It is clear that Tσ (U ) = {k}. Suppose that k ≤ pU . Since all vertices from U are located in this column, the inequality k ≤ pU means that Tσ (N− (U )) ⊆ {1, 2, . . . k − 1} ⊆ {1, 2, . . . pU − 1}, which contradicts the minimal value of pU . Similarly, the assumption k > n − qU leads to a contradiction with the definition of the characteristic qU . Thus, U ∈ U(k). Hence, the inclusion {σ ∈ Σn | i∈Vk xσik = m} ⊆ ∪tα=1 Σn (Uα ). As a result, we get the required equality. The lemma is proved. Lemma 5. Let k ∈ D and the i∈Vk xik ≤ m be a support inequality to the polytope P (Σn ). If ∩tα=1 Uα = ∅ or Vk \ ∪tα=1 Uα = ∅, then the i∈Vk xik ≤ m is not a facet inequality. Proof. Let F denote the face of the polytope P (Σn ) generated by the inequality x ≤ m, that is, F = {x ∈ P (Σ ) | n i∈Vk ik i∈Vk xik = m}. First, let us assume that i0 ∈ ∩tα=1 Uα . Suppose F0 = {x ∈ P (Σn ) | xi0 k = 1} is the face generated by the inequality xi0 k ≤ 1. Let us show that the facet F0 is of its own. Since |Di0 | > 1, then there is a schedule that does not contain the cell (i0 , k). Indeed, using Lemma 2, we will construct a schedule σ in which one of the columns adjacent to the column k is empty. Now we can construct the schedule σ , which contains the cell (i0 , k − 1) (or (i0 , k + 1)) and does not contain the cell (i0 , k). Then xσi0 k = 0 < 1 and, therefore, F0 is its own face of the polytope P (Σn ).
The Polytope of Schedules of Processing of Identical Requirements
269
By virtue of Lemma 4 and the fact that i0 ∈ ∩tα=1 Uα , the inclusion F ⊆ F0 holds true. Let us show that F = F0 . We will choose arbitrarily Uα ∈ U(k). Using Lemma 1 and Lemma 2, we will take a schedule σ containing a fragment E(Uα , k) and, moreover, the column l that is adjacent to k is empty. Supposing i ∈ Uα \ {i0 }, we will define a new schedule σ = (σ \ {(i0 , k)}) ∪ {(i0 , l)}. It is / F . Hence, F ⊂ F0 . easy to see that x sigma ∈ F0 and xσ ∈ Since F and F0 are faces, then we have a chain of equalities dimF < dimF0 ≤ dimP (Σn ) − 1. Hence, F is not a facet. Now let i0 ∈ Vk \ ∪tα=1 Uα and F0 = {x ∈ P (Σn ) | xi0 k = 0} be the face generated by the inequality xi0 k ≤ 0. Then, for any schedule σ such that xσ ∈ F / ∪tα=1 Uα , we get xσ ∈ F0 and, therefore, there is certainly Uα ∈ U(k). Since i0 ∈ F ⊆ F0 . Similarly to how it was done in the previous case, we can prove that the face F0 is of its own and F ⊂ F0 . Thus, the face F is again not a facet of the polytope P (Σn ). The lemma is proved. 4.3
Precedence Constraints
Let us move on to the precedence constraints (4). These constraints are as follows xik ≤ xjl , ij ∈ A, k ∈ Di . l∈Dj , l>k
According to Theorem 1(b), this constraint is a support inequality to the polytope P (Σn ) for any ij ∈ A and k ∈ Di . We will formulate and prove a necessary condition for such an inequality to be faceted. Lemma 6. If the inequality (4) generates a facet of the polytope P (Σn ), then k ∈ Di ∩ Dj . Proof. Let F = {x ∈ P (Σn ) | xik = l∈Dj , l>k xjl } be the face generated by the inequality (4). Suppose k ∈ / Di ∩Dj . Since k ∈ Di by hypothesis, our assumption means that k ∈ / Dj . Since i j, then pi < pj and n − pi < n − pj . Therefore, the assumption is equivalent to the inequality k ≤ pj . Then the right-hand side of the inequality (4) coincides with the left-hand side of the corresponding constraint (2) and, therefore, xjl = xjl = 1. l∈Dj , l>k
l∈Dj
It follows that for any vertex x ¯ of the face F , the equality x ¯ik = 1 holds true. Then, by virtue of (2), x ¯is = 0 for all s ∈ Di \ {k}. Then, denoting by Fs , s ∈ Di \ {k} the facets of the polytope P (Σn ) generated by the inequalities xis ≥ 0, we obtain the inclusion F ⊆ ∩s∈Di \{k} Fs . Since the faces Fs are of its own and pairwise distinct, we have dimF < dimP (Σn ) − 1. We get a contradiction with the facets of inequality. The lemma is proved.
270
5
R. Y. Simanchev and I. V. Urazova
Conclusion
The presented paper is theoretical. The main idea of the work is to represent the set of schedules as a family of subsets of a finite set. This allows polyhedral methods to be used to analyze the properties of schedules set. The incidence vectors of schedules, schedule polytope, faces of the polytope are defined in a natural way. Classical and new techniques for proving that the support inequalities are facet inequalities of the schedule polytope are applied. An essential point of the work is the use of special characteristics pi and qi , the calculation of which for a fixed number of machines is an open problem in the sense of complexity theory. The nearest perspective of this work is the use of the results obtained to construct inequalities that generate facets of the scheduling polytope, and their application in branch and cut procedures.
References 1. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co, New York (1979) 2. Christofides, N.: Graph Theory. An Algorithmic Approach. Academic Press, New York (1975) 3. https://www.mathematik.uni-osnabrueck.de/research/OR/class. Accessed 23 Jan 2021 4. Simanchev, R.Y., Urazova, I.V.: An integer-valued model for the problem of minimizing the total servicing time of unit claims with parallel devices with precedences. Autom. Remote Control 71(10), 2102–2108 (2010). https://doi.org/10. 1134/S0005117910100097 5. Simanchev, R.Y., Urazova, I.V.: The polytope of schedules of identical jobs on parallel processors (Russian). Diskretn. Anal. Issled. Oper. 18(11), 85–97 (2011) 6. Simanchev, R.Yu., Urazova, I.V.: Kochetov, Yu.A.: The branch and cut method for the clique partitioning problem. J. Appl. Ind. Math. 13(3), 539–556 (2019) 7. Applegate, D.L., et al.: Certification of an optimal TSP tour through 85,900 cities. Oper. Res. Lett. 37, 11–15 (2009) 8. Grotschel, M., Holland, O.: Solution of large-scale symmetric travelling salesman problems. Math. Program. 2(51), 141–202 (1991) 9. Mokotoff, E.: An exact algorithm for the identical parallel machine scheduling problem. Eur. J. Oper. Res. 152, 758–769 (2004) 10. Nemhauser, G.L., Savelsbergh, M.W.: A cutting plane algorithm of single machine scheduling problem with release times. In: Akgül, M., Hamacher, H.W., Tüfekçi, S. (eds.) Combinatorial Optimization: New Frontiers in the Theory and Practice. NATO ASI Series F: Computer and System Science, vol. 82, pp. 63–84. Springer, Heidelberg (1992). https://doi.org/10.1007/978-3-642-77489-8_4 11. Simanchev, R.Y., Solov’eva, P.V., Urazova, I.V.: The affine hull of the schedule polytope for servicing identical requests by parallel devices. J. Appl. Ind. Math. 15(1), 146–157 (2021) 12. Simanchev, R.Y.: On facet-inducing inequalities for combinatorial polytopes. J. Appl. Ind. Math. 11(4), 564–571 (2017). https://doi.org/10.1134/ S1990478917040147
A Heuristic Approach in Solving the Optimal Seating Chart Problem Milan Tomi´c1(B) 1
and Dragan Uroˇsevi´c1,2
School of Computing, Union University, Kneza Mihaila 6, 11000 Beograd, Serbia {mtomic,durosevic}@raf.rs 2 Mathematical Institute of the Serbian Academy of Sciences and Arts, Kneza Mihaila 36, p.p. 367, 11000 Beograd, Serbia [email protected]
Abstract. The optimal seating chart problem is a graph partitioning problem with very practical use. It is the problem of finding an optimal seating arrangement for a wedding or a gala dinner. Given the number of tables available and the number of seats per table, the optimal seating arrangement is determined based on the guests’ preferences to seat or not to seat together with other guests at the same table. The problem is easily translated to finding an m-partitioning of a graph of n nodes, such that the number of nodes in each partition is less than or equal to the given upper limit c, and that the sum of edge weights in all partitions is maximized. Although it can be easily formulated as MILP, the problem is extremely difficult to solve even with relatively small instances, which makes it perfect to apply a heuristic method. In this paper, we present research of a novel descent-ascent method for the solution, with comparison to some of the already proposed techniques from the earlier works. Keywords: Seating optimization annealing
1
· Graph partitioning · Simulated
Introduction
People are social beings, and they usually tend to group by their ages, social status, common interests, and many other factors. At a big event, like a wedding dinner, for example, it is most likely that people from all age groups (starting from children to grandparents) can be found, and each one of them has some specific traits and needs that should be respected. There is also a common, but also ungrateful practice for the hosts to have to make the seating plan for their event, to make it easier for their guests to find good company, and to make sure that there will be enough room for everyone. That way the groups, which would probably be formed eventually, have to be artificially formed in advance so that every guest feels equally comfortable and amused. While making a seating chart, one should take many things into account. The members of the same family (like parents with small children) usually want c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 271–283, 2021. https://doi.org/10.1007/978-3-030-86433-0_19
272
M. Tomi´c and D. Uroˇsevi´c
to seat together. Teenagers and young adults usually tend to seat apart from the elders, because of the age gap and different topics of interest. The friends from work or school sometimes want to seat together because they don’t know anyone else. There is also a negative side – some people shouldn’t be allowed to seat at the same table because of their broken relationships from the past. At last, there are additional constraints on the number of available seats at each table. All in all, finding an optimal seating chart is a complex problem. 1.1
Problem Statement
The described problem can be stated in the following way. Let the number of guests at the event be n, and the number of tables m. For the sake of simplicity, let all the tables have an equal number of available seats - table capacity, c. Then, let’s say that the hosts know in advance some seating preferences of their guests, which can be described numerically by assigning a satisfaction factor to each pair of guests. Then the goal for the hosts is to find an appropriate assignment of the tables for the guests, such that the sum of all satisfaction factors of pairs of people assigned to the same table is maximized. In order to understand the problem better, let us introduce a complete weighted undirected graph Kn = (V, E), with the node set V = 1, 2, . . . , n. Let the weight w(i, j) of edge (i, j) ∈ E be the satisfaction factor described above. We should then choose an m-partitioning such that the sum of edge weights of all partitions is maximal, and that the number of nodes in each partition is not greater than the given upper limit c. The other way of looking at the problem is through graph coloring perspective: choose a coloring of the graph Kn with at most m colors, such that each color is used at most c times and the sum of weights of the edges between the nodes of the same color is maximized. The described coloring is also known as c-improper m-coloring [2]. The problem description can be given in the following mixed integer programming representation. Given the weight matrix W ∈ Zn×n , solve the following problem: max A
subject to
m n−1 n
Wj,k Ai,j Ai,k ,
i=1 j=1 k=j+1 m i=1 n
(1)
Ai,j = 1, ∀j (1 ≤ j ≤ n),
(2)
Ai,j ≤ c, ∀i (1 ≤ i ≤ m),
(3)
j=1
Ai,j ∈ {0, 1}, ∀i, j (1 ≤ i ≤ m, 1 ≤ j ≤ n), where the matrix A ∈ {0, 1}m×n is defined by 1 if the guest j is at the table i, Ai,j = 0 otherwise.
(4)
A Heuristic Approach in Solving the Optimal Seating Chart Problem
273
An even more general version of the problem would be that each of the tables has its own arbitrary capacity, which would replace the constant c with an mdimensional vector. Another constraint can be introduced, for each guest to have an arbitrary minimum satisfaction b:
Ai,k
n
Ai,j Wj,k ≥ bAi,k , ∀i, k (1 ≤ i ≤ m, 1 ≤ k ≤ n).
(5)
j=1
This constraint was used in the original proposal, but thrown out from consideration in the final work. 1.2
Complexity
The number of possible colorings of the graph G of n nodes with m colors is mn . But, for the specific problem, we can take into account that all tables are equal in importance, i.e. it is not important if the same subset V of set V is colored by color α or color β, as long as all the other nodes are colored with the other mn equivalence classes, colors. That is, we can categorize all those colorings into m! one (or more) of which represent(s) the optimal solution. However, this search space has many unfeasible solutions, because of the Inequality (3). As can be seen from [13], given n = m · c (equally sized tables with no place left), this can be reduced to: m−2 n − ic n! , (6) = m c (c!) i=0 and even more, when the order of tables is not important, to: m−2 n−ic n! i=0 c = . m! m!(c!)m
(7)
However, the Eqs. (6) and (7) show that the number of feasible solutions in a very simplified problem still has a factorial growth, which means that even relatively small-sized examples are unthinkable to be solved by simple enumerating even with the most sophisticated computers. Any additional constraint change (e.g. unequally sized tables, adding table order importance, etc.) could additionally increase the search space. As stated earlier, the goal is to determine a c-improper m-coloring of an nnode weighted graph (described by the adjacency matrix W ), such that the sum of the weights between every two same-colored nodes is maximized. Determining whether any such coloring exists is N P-complete for all c ≥ 1 when m ≥ 2, as proved in [9]. Consequently, determining the optimal coloring based on the given criteria is at least N P-hard. This is the primary reason to consider using some heuristic approach to try and approximate the optimal solution. The complexity of some similar problems was discussed in more detail in [13].
274
2
M. Tomi´c and D. Uroˇsevi´c
Related Work
Although the problem is not new, to the best of our knowledge, the general version of this problem has not been studied very thoroughly, although various special cases of the optimal graph partitioning problem were studied. In particular, the partitioning into two sub-graphs constrained in size such that the sum of weights of the cut edges is minimal was studied by Christofides and Brooker [8]. A more general version of the same idea, with p partitions of a given capacity b, where each node also has its own weight that counts into the partition capacity, was considered by Holm and Sørensen [10]. Both works were based on the branch and bound method, with MILP formulation in mind. The latter one is directly related to our problem, but with a different goal. Another interesting problem formulation is the one where the number of nodes in a single partition is not limited, but the goal is to minimize the weights of the edges inside the partitions. That variation was considered by Carlson and Nemhauser as a “scheduling problem in which several activities are competing for a limited number of facilities” [7]. The goal was to minimize interaction between the activities scheduled in the same facility. This variation is directly related to our problem in a way that we need to maximize the interaction between the people on the same table. However, the problem cannot be taken as an equivalent, because of the table capacity constraint. Also related to our problem is a graph coloring problem called the maximum happy vertices (MHV) problem, which arose increased interest recently [3,12,14,15]. The goal is to maximize the number of happy vertices (nodes) in a graph, taking the given partial coloring and coloring all the other (“free”) vertices. A happy vertex is the one that has all its neighbors colored with the same color as itself. We should see that the MHV problem is another variant of our problem, where edge weights are binary and the number of people seating at the same table is unlimited, but there exists some starting distribution of taken seats and only people who have all their friends at the same table are considered satisfied. At last, the design of seating plans for large social events described earlier was also the topic of some recent researches. Bellows and Luc Peterson tried to implement a practical solution for their own wedding, so they wrote a MILP model in GAMS and executed it using the commercial IBM ILOG CPLEX solver. For the real application, their model was executing for 36 h before it was finally stopped and the best solution found was used as a very good starting point [5]. On the other side, Lewis and Carroll created a heuristic algorithm, which was used on a commercial seating planner website. Their two-stage solution is based on tabu search and introduces two neighborhood operators: Kempe-chain interchange and Swaps. The solution showed superior performance in comparison with the IP solvers [13].
A Heuristic Approach in Solving the Optimal Seating Chart Problem
3
275
Proposal
In the search for a more robust method, several experiments were conducted, in an attempt to find a well-balanced method between exhaustive search and random search. The goal was to find a method with better, or at least similar, efficacy as the commercial solution proposed in [13], using a more general model. The algorithm is based on classical simulated annealing and modified to fulfill our needs as much as possible. Some of the good ideas from the earlier works were also implemented where we found them useful. In particular, Lewis’ and Carroll’s way of making the problem smaller by creating indivisible groups before starting local search was found to be very practical. Some of the traditional operators for generating neighborhoods were also applied, similar to the ones that could be found elsewhere, e.g. in [12,13].
4
Proposal Description
As exposed in Kirkpatrick et al. [11], the basic steps to define a simulated annealing procedure are to describe the configuration of the system, to create a random generator of moves between the states, to define the objective function, and to define the cooling schedule. An instance of the problem was already defined above, but for the sake of readability, a short review of variables used in the problem statement is given in the Table 1. Each of them represents an input for the algorithm. Table 1. Variables of an instance of the problem. Variable Type
Size
Description
n
Integer
1
Node count = |V | (number of guests)
m
Integer
1
Partition count (number of tables)
c
Integer
1
Partition capacity (table size)
W
Matrix of integers n × n Connectivity matrix (mutual guest satisfaction)
Pre-processing Step. In order to make the solving easier, we used the same principle used by Lewis and Carroll in [13]: the “must-seat-together” groups are identified in the pre-processing step and grouped together. The grouping step requires that multiple nodes are combined into a hyper-node, which has its own size (defined by the number of elements in the group). As the nodes are combined into hyper-nodes, a new connectivity matrix is made, by summing up all the appropriate connections from the starting matrix. Essentially, we are redefining the starting graph. The added variables are introduced in the Table 2.
276
M. Tomi´c and D. Uroˇsevi´c
Table 2. Variables added to an instance of the problem after the pre-processing step. Variable Type
Size
Description
ng
Integer
1
Group count
Gsize
Array of ints.
ng
Group size array,
Gset
2D array of ints. ng × max (Gsize ) group sets,
Wg
Matrix of ints.
Gsize [i] = number of guests in group i Gset [i] = array of guests in group i ng × ng
Connectivity matrix (group satisfaction)
Although the original graph had no loops (edge weight on the main diagonal was 0), the new one introduces non-zero elements on the main diagonal, because of the positive association between the people in the group. The main diagonal can be safely ignored in the objective function anyway, but it was left for the sake of completeness, to be directly comparable to the original objective function. A state in the annealing process is described by a complete mapping between the hyper-nodes (groups) and their corresponding partitions (tables). The algorithm was implemented in C, so the indexing starts from 0 up to ng − 1 for nodes, or m − 1 for partitions. The basic structure (enough for the solution encoding) is an integer array of ng elements which takes values from the set 0, . . . , m − 1. Some alternative encoding procedures useful for graph partitioning optimization were described in [6]. Since the objective calculation must be fast, we also created several helper structures to accelerate the execution. A complete state is described in the Table 3. Table 3. Data structure used for the simulated annealing algorithm. Variable Type
Size
Description
Gt
Array of ints.
ng
Gt [i] = j when group i is at the table j
Tg
2D array of ints. m × ng Tg [j] contains i when Gt [i] = j
Ng
Array of ints.
m
To
Array of ints.
m
S
Array of ints.
m
Ng [j] = number of groups at the table j = |Tg [j]| Ng [j]−1 To [j] = k=0 Gsize [Tg [j][k]] S[j] = total satisfaction at the table j Ng [j]−1 Ng [j]−1 Wg [Tg [k]][Tg [l]] k=0 l=0
=
Since we look at all tables as equal, we added another constraint regarding the tables: group i can only be placed at the table with index not greater than i, i.e. (∀i)(Gt [i] ≤ i). That constraint notably reduces the search space, by the factor mm−1 , and it is very easy to implement inside the state random generator. of (m−1)!
A Heuristic Approach in Solving the Optimal Seating Chart Problem
277
State Transition Moves. A neighbor state is generated by sequentially applying one or more of the following operators: Move, and Swap. The Move operator takes a random group and moves it to the randomly selected table, respecting the aforementioned rules. The Swap operator takes two random groups from different tables and exchanges their table assignments. Each time a group changes its table, all values regarding its previous table, its new table, and the group itself are updated. Those partial calculations reduce the total number of operations needed for the objective function. We also added a neighborhood size parameter ν, which is used to control the number of applied operators in the transition. This parameter can take values from 1 to m, and is controlled by the success rate of the applied transitions. The number of iterations without an acceptance is tracked, and the neighborhood size is increased by 1 if the maximum number of stagnating iterations is reached. It gets reset to 1 when we accept the generated state, or when the maximum value of neighborhood size is reached. The strategy for applying the operators in the next state creation is following: – When the neighborhood size is odd, apply one Move, and – Otherwise, apply only ν−1 2 Swaps.
ν−1 2
Swaps;
The Swaps are favored in regard to the Moves in larger neighborhoods because they are somewhat better at keeping the number of filled places at the tables balanced if there are large groups. However, since the neighborhood size is often reset to 1, Moves are also relatively frequent. The experiments show different Moves: Swaps ratios on various test instances. A larger ratio for the same m (which is taken as maximum neighborhood size) means that the generated states were accepted more often. The Objective Function.
m−1 The objective is clear – to maximize the total satisfaction on all tables, j=0 S[j]. That is the exact double of the Eq. (1), because each connection is calculated twice (see Table 3), so the objectives are easily comparable. However, the table capacity is not taken into account, so we introduce a penalty for the exceeded number of seats. The penalty should be large enough to often exceed the total satisfaction, it should be proportional to the number of seats exceeded, and it should not be too large in order not to drop the acceptance probability to zero very early. We experimentally took the penalty of 100n · (number of seats exceeded). The factor 100n is taken naturally, because the maximum value of a cell in W is arbitrarily taken to be 100, and that is also the “must-seat-together” constant which is taken as a sign to join the two nodes into a hyper-node.
278
M. Tomi´c and D. Uroˇsevi´c
The Cooling Schedule and the Transition Probability Function. The optimal cooling schedule is a rather complex topic and it is often chosen experimentally. We defined the cooling schedule as: T0 = 125, Ti+1 = γTi ,
(8) (9)
and the experiments show that the most efficient cooling coefficient values are γ ∈ [0.9, 0.995]. The starting temperature T0 is also the result of several experiments with different cooling schedules for each test case. The starting temperature should be adjusted to the running time limit of the algorithm, which we defined to be 1 s. It could also be defined based on the properties of the input, like the connectivity matrix (W, Wg ) or input size (n, ng , m), but we didn’t find any concrete formulation which could be favored among the others. The transition probability function is defined as: ⎧ ⎪ 1 if δ > 0, ⎪ ⎨ 1 p(xi ) = (10) otherwise, ⎪ −δ ⎪ ⎩ 1 + exp Ti where δ = f (xi ) − f (xi−1 ) is the difference between the values of the objective function of the previous state xi−1 and the new state xi . If the new state has larger value of the objective function, it is accepted as the next state. If it hasn’t, the logistic function is used to calculate the acceptance probability. In the end, we introduced a short list of forbidden moves, which is used in the random move generator not to repeat itself for a short while. Its size can be variable, but the best maximum size for our larger instances is 10. Greater values slow the algorithm down, and lesser values make it more prone to cycling through the same states. A move is written in the list as a (guest, table) pair. A pair is added to the list for each Move operation, and two pairs are added for each Swap operation. The main procedure consists of an inner and an outer loop. The outer loop controls the cooling schedule, and the inner loop controls the state transitions. Algorithm 1 presents the complete simulated annealing procedure.
A Heuristic Approach in Solving the Optimal Seating Chart Problem
279
Algorithm 1: Simulated annealing procedure initialize problem instance and parameters; generate the random starting state x; initialize the best state best ← x; repeat for il ← 1 to il limit do apply operators on x to generate new state y; if f (y) > f (x) then x ← y; stagnation ← 0; ν ← 1; if f (y) > f (best) then best ← y; else generate random r ∈ [0, 1]; if p(y) > r then x ← y; stagnation ← 0; ν ← 1; else stagnation ← stagnation + 1; if stagnation = max stagnation then stagnation ← 0; ν ← ν + 1; if ν > m then ν ← 1; break the loop if time limit exceeded; decrease temperature by cooling schedule; until time limit exceeded or T = 0; return best
5
Experimental Results
The algorithm was tested on generated problem instances of various sizes. Two small instances were taken from [5], and the algorithm was quite consistent in solving them optimally inside the specified time limit of 1 s. Another generated large instance, which was also tested on the MILP model described in the same work using Gurobi Optimizer [1], was used as a reference. The branch and bound procedure was running on the model for over 100 h before it was stopped, and the final solution (without the certificate of optimality) was 2391, with a gap of 7.48%. All experiments were conducted on a personal laptop, with Intel(R) Core(TM) i3 2365M CPU @ 1.40 GHz, 8.00 GB RAM. Another idea was to compare the results with the results from the commercial solver described in [13], but since that online solution was written in Adobe Flash, which was already discontinued at the time of writing, we didn’t manage to do it in time.
280
M. Tomi´c and D. Uroˇsevi´c
The instance generator was written in Python, with the wedding planning task in mind, and it uses the following parameters to generate an instance: number of guests, number of tables, table capacity, weight constants for different sitting preferences (must sit together = 100, know each other = 10, don’t want to sit together = -10, neutral = 1, don’t know each other = 0 ), chances that the guest is from groom’s family, bride’s family, or anyone else, how many people are positive/negative/neutral to the rest of their group, chance that some people come as a pair, chances that the people from the different groups know each other and that they think positive of each other. Each test was conducted 10 times, and the results are summarized in the Table 4. The Max column represents the maximum solution found during the experiment, the Avg. column represents the average solution in 10 runs, the Times achieved Max is the total number of runs when the algorithm achieved the M ax solution for the given test case, the Avg./M ax ratio is taken as a measure of the algorithm performance in multiple runs for the given test case, the Std.dev. column is the standard deviation of the solutions in 10 runs, and the St.ddev/Avg. is taken as a measure of relative dispersion of the achieved results from the average solution. The first two instances were the ones described in [5], and the Max solutions were actually optimal. For the large generated test cases we don’t have proofs of optimal solutions, so only the relative comparison was given, to show the consistency of the algorithm. We didn’t use the forbidden moves list for the instances with a small number of groups, because it tends to lock the algorithm, making it unable to choose a valid move. However, it does help in boosting the results for many instances, which is very important. Table 4. Summary results of the conducted tests. n
Table 4. Summary results of the conducted tests.

n    m   c   Max    Avg.     Times achieved Max  Avg./Max  Std. dev.  Std. dev/Avg.
17   2   10  1153   1147.4   8                   0.995     12.522     1.091%
17   5   4   1069   1069     10                  1.000     0.000      0.000%
20   3   8   1440   1440     10                  1.000     0.000      0.000%
20   2   10  2142   2136.8   8                   0.998     10.400     0.487%
30   5   6   2432   2424.8   5                   0.997     9.516      0.392%
30   2   15  2820   2807.8   7                   0.996     22.969     0.818%
40   4   10  4188   4181     6                   0.998     10.479     0.251%
50   10  5   4784   4734.4   1                   0.990     31.213     0.659%
50   4   15  5832   5821     6                   0.998     15.186     0.261%
80   5   16  10288  10136.8  1                   0.985     82.961     0.818%
80   10  8   8718   8616.6   1                   0.988     52.983     0.615%
107  11  10  12586  12526    1                   0.995     44.686     0.357%
The starting parameters for most of the conducted tests were T0 = 125, γ = 0.9995, il limit = 20, and max stagnation = 10. The forbidden move list was turned off or on depending on the instance (according to whether the program would hang). We also tried running the algorithm for a longer time, in which case the starting temperature was calculated as T0 = 125 · time limit (where time limit is in seconds). The obtained results showed no particular difference, most probably because of the fast cooling down to zero, so there is nothing important to compare.
6 Discussion
Table 4 shows that most of the solutions are not far from the maximal value found within the time limit, and the standard deviation is also relatively small compared to the instance size. The allowed execution time is extremely short on purpose, so that we can see whether the solution quality is satisfying. The aforementioned problem instance from the eighth row of the table is the one tested with the Gurobi Optimizer, and the best solution found using simulated annealing is greater by 1 when compared as explained before in the proposal description (4784/2 = 2392, compared to 2391 from the branch and bound algorithm). However, the simulated annealing procedure does not guarantee the quality of solutions because of its random nature, so many runs may be needed before the same solution is obtained again. Still, the average solution, which would be ≈2367, is also not far from the one found after many hours of running. Taking the size of the search space and the short execution time into account, the method is very promising. There are many ways to extend this research. One of them would be to try to derive the optimal cooling schedule for the problem. It would be beneficial to have proof of a schedule that could be calculated directly from the instance setup and that would generate near-optimal solutions more consistently. The cooling schedule could also be extended or shrunk based on the given time limit, so that the time could be used more effectively. Another possible research topic could be the fine-tuning of the algorithm parameters. Alongside the cooling schedule, there are several other factors that could be improved: the length of the forbidden moves list, the operator choice for the random state generator, the maximum neighborhood size, etc. It could be useful to implement some more robust adaptive parameter control, or ideally self-adaptive parameter control, as explained in [4]. That would move the algorithm towards a more intelligent exploration of the search space. The implemented algorithm was not tested on many edge cases. The generated problem instances had a consistent structure in terms of the guest relations distribution, so that only the problem size could be a stumbling block. We have already mentioned the variable parameters of the instance generator, and it would be interesting to see whether some problem instances would make the algorithm unusable, or what is the nature of the instances which the algorithm can always solve optimally.
The hybrid nature of the proposed algorithm suggests the need for even more research on the ways that could make it better. It can probably be extended with some kind of fast local search after reaching the time limit, to quickly improve the best solution found. Even further, the local search could be used to look for the best neighbor before trying to accept it, applied only in some iterations. If the balance was found between the fast local search with great advances and the completely random choice of neighbors in simulated annealing, that would make the algorithm results even more satisfying. Last but not least, the method could be extended to work with various table sizes and the minimal guest satisfaction constraint. The minimal guest satisfaction would introduce a new penalty for the objective function. The variable table capacity would introduce larger solution spaces because the tables cannot be seen as equal when they have different capacities. That would also mean we would need to remove the i ≤ Gt [i] constraint. It would also introduce redundancy when some of the tables have the same capacity.
7 Conclusion
We developed a hybrid method based on simulated annealing in order to solve the problem of placing guests at large events based on their social preferences. Knowing the computational complexity of the problem, we created a fast method for obtaining approximate solutions for large problem instances. The comparison with the results from the MILP solver and related works shows that the obtained results are acceptable. The statistics obtained from running the tests ten times on each of the test instances show that the results over multiple runs are quite consistent. The generated solutions can be used as good starting points for local search or manual adjustment. Acknowledgement. This work was partially supported by the Serbian Ministry of Education, Science and Technological Development through the Mathematical Institute of the Serbian Academy of Sciences and Arts.
References 1. Gurobi optimizer. https://www.gurobi.com/products/gurobi-optimizer/. Accessed 05 Feb 2021 2. Araujo, J., Bermond, J.C., Giroire, F., Havet, F., Mazauric, D., Modrzejewski, R.: Weighted improper colouring. J. Discrete Algorithms 16, 53– 66 (2012). https://doi.org/10.1016/j.jda.2012.07.001. https://www.sciencedirect. com/science/article/pii/S1570866712001049, selected papers from the 22nd International Workshop on Combinatorial Algorithms (IWOCA 2011) 3. Aravind, N.R., Kalyanasundaram, S., Kare, A.S., Lauri, J.: Algorithms and hardness results for happy coloring problems. CoRR abs/1705.08282 (2017). http:// arxiv.org/abs/1705.08282 4. B¨ ack, T., Fogel, D., Michalewicz, Z.: Evolutionary Computation 2–Advanced Algorithms and Operators, January 2000. https://doi.org/10.1201/9781420034349
5. Bellows, M.L., Peterson, J.D.L.: Finding an optimal seating chart. Ann. Improbable Res. 18, 3 (2012) 6. Boulif, M.: Genetic algorithm encoding representations for graph partitioning problems. In: 2010 International Conference on Machine and Web Intelligence, pp. 288– 291 (2010). https://doi.org/10.1109/ICMWI.2010.5648133 7. Carlson, R.C., Nemhauser, G.L.: Scheduling to minimize interaction cost. Oper. Res. 14, 52–58 (1966). https://doi.org/10.1287/opre.14.1.52 8. Christofides, N., Brooker, P.: The optimal partitioning of graphs. SIAM J. Appl. Math. 30, 55–69 (1976). https://doi.org/10.1137/0130006 9. Cowen, L., Goddard, W., Jesurum, C.: Defective coloring revisited. J. Graph Theory 24, 205–219 (1995). https://doi.org/10.1002/(SICI)1097-0118(199703)24: 3205::AID-JGT23.0.CO;2-T 10. Holm, S., Sørensen, M.M.: The optimal graph partitioning problem. Oper. Res. Spektrum 15(1), 1–8 (1993). https://doi.org/10.1007/BF01783411 11. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983). https://doi.org/10.1126/science.220.4598.671 12. Lewis, R., Thiruvady, D., Morgan, K.: Finding happiness: an analysis of the maximum happy vertices problem. Comput. Oper. Res. 103 (2018). https://doi.org/10. 1016/j.cor.2018.11.015 13. Lewis, R., Carroll, F.: Creating seating plans: a practical application. J. Oper. Res. Soc. 67(11), 1353–1362 (2016). https://doi.org/10.1057/jors.2016.34 14. Thiruvady, D., Lewis, R., Morgan, K.: Tackling the maximum happy vertices problem in large networks. 4OR-Q. J. Oper. Res. 18, 507–527 (2020). https://doi.org/ 10.1007/s10288-020-00431-4 15. Zhang, P., Jiang, T., Li, A.: Improved approximation algorithms for the maximum happy vertices and edges problems. In: Xu, D., Du, D., Du, D. (eds.) COCOON 2015. LNCS, vol. 9198, pp. 159–170. Springer, Cham (2015). https://doi.org/10. 1007/978-3-319-21398-9 13
Fast Heuristic Algorithms for the Multiple Strip Packing Problem

Igor Vasilyev1,3, Anton V. Ushakov1(B), Maria V. Barkova1, Dong Zhang2, Jie Ren3, and Juan Chen2

1 Matrosov Institute for System Dynamics and Control Theory of SB RAS, 134 Lermontov Street, 664033 Irkutsk, Russia {vil,aushakov,mbarkova}@icc.ru
2 Algorithm and Technology Development Department, Global Technical Service Department, Huawei Technologies, Co., Ltd., Dongguan, China {zhangdong48,chenjuan35}@huawei.com
3 Moscow Advanced Software Technology Lab, Huawei Russian Research Institute, Moscow, Russia {vasilyev.igor,renjie21}@huawei.com
Abstract. In this paper we address the so-called two-dimensional Multiple Strip Packing Problem (MSPP). There is a set of rectangles and a set of strips, where each strip has a pre-defined width. The problem is to find a feasible packing of all the rectangles into the strips so that the maximal height of the packing over all the strips is minimized. A packing is feasible if all the rectangles are placed inside the strips and do not overlap. In the case of a single strip, the problem is widely known as the Strip Packing Problem (SPP). Many effective constructive heuristics for SPP are based on the so-called skyline representation of a packing pattern. We generalized this approach to the case of multiple strips and developed a randomized local search, which embeds the proposed skyline heuristic to compute the objective value of a packing. We also introduced the two-dimensional level multiple strip packing problem and developed some level-based constructive heuristics. We report computational results on real-life problem instances arising from an application of MSPP to scheduling computing jobs on multiple Spark computer clusters.

Keywords: 2-D Strip Packing Problem · Multiple Strip Packing Problem · Multiple cluster scheduling problem · Skyline heuristics · Spark

1 Introduction
The two-dimensional strip packing problem (SPP) is a widely studied combinatorial optimization problem related to the vast field of research on cutting and packing problems (for a recent survey see [16]). We are given a set J = {1, . . . , n} of rectangles (rectangular items), each of which has width wj and height hj, and a strip (an open-ended rectangle) having width W and infinite height.
The problem is to place all the rectangles in the strip so that the height of the obtained packing is minimized. The problem has many natural real-life industrial applications in manufacturing, e.g. when one needs to cut rolls of material into rectangular items so that the waste is minimized, or when one has to pack boxes that cannot be stacked on top of each other. One of the first papers addressing SPP was [1], where one of the first solution algorithms was proposed. The problem is known to be NP-hard in the strong sense, since it is a generalization of the one-dimensional bin packing problem. Being one of the basic packing problems, SPP has been widely studied in the approximation algorithms and operations research communities. Besides solving SPP directly, another approach is to solve its particular case where the rectangles are to be placed on so-called levels (shelves). The first level is the bottom of the strip; each next level is defined as the horizontal line drawn through the top of the tallest rectangle placed on the previous level. This problem is known as the two-dimensional level strip packing problem (LSPP), and it is NP-hard in the strong sense. Nevertheless, its properties allow one to obtain effective ILP models with a polynomial number of variables and constraints (for a survey see [2]). Actually, many approximation algorithms for SPP place rectangles according to the level approach [13]. The best-known level-based algorithms are the First-Fit Decreasing Height (FFDH) and Next-Fit Decreasing Height (NFDH) heuristics proposed in [10]. Exact methods for SPP include, among others, combinatorial branch and bound methods [4] based on the one-dimensional contiguous bin-packing problem [22] and the perfect packing problem [19]; a Benders decomposition approach based on combinatorial cuts [11]; methods based on iterative solution of MILP programs [9], etc. In this paper we consider an extension of SPP, in which a given set of rectangles is supposed to be packed into m strips, each of width Wi, so that the maximum packing height over all the strips is minimized. This problem is often referred to as the two-dimensional multiple strip packing problem (MSPP). Note that there is a very closely related problem, known as the multiple cluster scheduling problem (MCSP), concerning scheduling computing tasks on some set of machines or computer nodes. Instead of rectangles, it addresses jobs j ∈ J that have processing time hj and require wj processors to be executed. Suppose that there is a computer grid or cluster consisting of m platforms or machines, each of which involves Wi processors or executors. The problem is to schedule all the jobs so as to minimize the total makespan. The only essential difference between MSPP and MCSP is that in the latter problem jobs may be assigned to non-contiguous processors, i.e. rectangles may be vertically split in a feasible schedule. One also distinguishes between heterogeneous and identical machines. In the former case, machines are supposed to have different numbers of processors Wi, whereas identical machines have equal W. As noted in [12], though some algorithms developed for MSPP can also be adapted to MCSP, they in general use different techniques that make them suitable only to one of the problems. The optimal value of MSPP is an upper bound for the optimal value of MCSP.
Our main motivation for addressing MSPP is its application to the problem of scheduling and assignment of computing resources in Spark systems. Apache Spark is one of the most popular frameworks for distributed processing of huge amounts of data [27]. It decomposes a computation task into several sub-tasks and assigns them to multiple machines to process them simultaneously. If multiple tasks are assigned to Spark systems, it is not easy to optimally control resource utilization. Indeed, as the amount of resources and the number of assigned tasks are uncertain beforehand, a system must be designed with built-in redundancy to accommodate all potential tasks. This substantially increases its installation and operation costs. For example, it is observed that the CPU and memory usage of a typical Spark system does not exceed 20%, which is obviously very far from optimal. A useful strategy to overcome this difficulty is to consider a forecast of tasks and optimally decide the resource assignment plan based on the forecast. However, the success of this approach highly depends on the precision of the forecast, which is usually very challenging in practice. An alternative approach is to assign resources dynamically, so that we do not rely on the precision of a forecast. At each time point, we collect the updated information on assigned tasks and dynamically define a new schedule by finding a quality solution of MSPP. Thus, the computational efficiency of algorithms for MSPP plays a very important role, as they must respond in near real-time.

As far as we know, unlike SPP, all the results on MSPP presented in the literature relate to the development of approximation algorithms. The problem seems to have been first considered in [28], where it was proved that there is no approximation algorithm with absolute ratio less than 2 unless P = NP. In [26] the authors addressed the online case of the problem but also proposed an offline algorithm with ratio 2 + ε. This was then improved in [5], where an approximation algorithm with ratio 2 was developed, which is the best possible one unless P = NP. The authors proposed an AFPTAS for MSPP and also demonstrated how to adapt the level-based algorithms (FFDH and NFDH) to MSPP preserving the same approximation ratio as for SPP. As the worst-case running time of the 2-approximation algorithm is Ω(n^256), further research was devoted to improving the running time at the expense of the ratio. For example, one should mention the approximation algorithms with ratios 5/2 [6] and 9/4 [18] for MCSP, or a linear time algorithm for MSPP (involving, however, a large hidden constant) [18].

The goal of this paper is to devise fast heuristic algorithms for MSPP. In particular, we introduce the two-dimensional level multiple strip packing problem (LMSPP) and propose its mathematical model. Further, we develop and implement variants of the FFDH and NFDH heuristics for LMSPP. Moreover, we also devise an iterative heuristic for MSPP based on the idea of local search. In order to compute the value of the objective function for a given sequence of rectangles (jobs), we develop a variant of the so-called skyline heuristic, which is then embedded in our local search procedure. We test the proposed approaches on a set of real-life problem instances arising in the field of scheduling computing jobs on Spark computer clusters.
2 Level Multiple Strip Packing Problem and Level-Based Heuristics
Here we demonstrate that the aforementioned level strip packing problem can be extended to the case of multiple strips. We call this problem the Level Multiple Strip Packing Problem (LMSPP). Note that any feasible solution of LMSPP provides an upper bound for the optimal value of MSPP. Recall that we are given a set J of rectangles; each rectangle j ∈ J has width wj and height hj. Let us also introduce a set I of strips, each of which has width Wi. Assume that the rectangles are ordered by non-increasing height: h1 ≥ h2 ≥ · · · ≥ hn. Following [21], we say that a rectangle initializes a level if it is located at the first place in the level. There are n potential levels associated with the rectangles that can initialize them; thus, the potential level initialized by rectangle j has height hj. Let us introduce the following decision variables:

uij = 1 if rectangle j initializes a level on strip i, and 0 otherwise, for i ∈ I, j ∈ J;
vikj = 1 if rectangle j is placed on level k on strip i, and 0 otherwise, for i ∈ I, k ∈ J, j > k.

Note that the rectangle that initializes a level is the tallest one in the level, thus variables vikj are introduced only for the rectangles that are lower than the item initializing level k. Using these notations, LMSPP can be cast as an integer program:

min H,                                                                  (1)
subject to
  Σ_{j∈J} hj uij ≤ H                         ∀i ∈ I,                    (2)
  Σ_{i∈I} ( Σ_{k=1}^{j−1} vikj + uij ) = 1   ∀j ∈ J,                    (3)
  Σ_{j=k+1}^{n} wj vikj ≤ (Wi − wk) uik      ∀i ∈ I, k = 1, . . . , n−1, (4)
  uij ∈ {0, 1}                               ∀i ∈ I, j ∈ J,              (5)
  vikj ∈ {0, 1}                              ∀i ∈ I, k ∈ J, j > k.       (6)
The objective function (1) minimizes the maximum sum of heights of the rectangles that initialize levels over all the strips. Constraints (2) define the maximal height of packing among all the strips. Equations (3) guarantee that each item is packed exactly once. It can either initialize a new level or be located at a preceding level. Constraints (4) impose that the total width of all items placed at a level must fit the width of the corresponding strip.
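To make the formulation concrete, the sketch below states model (1)-(6) with the open-source PuLP modeller, which is assumed here purely for illustration (the paper itself reports results obtained with a commercial solver). Indices are 0-based in the code, and the widths, heights, and strip widths are plain Python lists.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

def solve_lmspp(widths, heights, strip_widths):
    """LMSPP model (1)-(6); rectangles are sorted by non-increasing height."""
    order = sorted(range(len(widths)), key=lambda j: -heights[j])
    w = [widths[j] for j in order]
    h = [heights[j] for j in order]
    n, m = len(w), len(strip_widths)
    J, I = range(n), range(m)

    prob = LpProblem("LMSPP", LpMinimize)
    H = LpVariable("H", lowBound=0)
    u = LpVariable.dicts("u", (I, J), cat=LpBinary)
    v = LpVariable.dicts("v", (I, J, J), cat=LpBinary)

    prob += H                                                      # objective (1)
    for i in I:                                                    # constraints (2)
        prob += lpSum(h[j] * u[i][j] for j in J) <= H
    for j in J:                                                    # constraints (3)
        prob += lpSum(v[i][k][j] for i in I for k in range(j)) \
                + lpSum(u[i][j] for i in I) == 1
    for i in I:                                                    # constraints (4)
        for k in range(n - 1):
            prob += lpSum(w[j] * v[i][k][j] for j in range(k + 1, n)) \
                    <= (strip_widths[i] - w[k]) * u[i][k]
    prob.solve()                     # uses PuLP's default bundled solver
    return H.value()

This sketch creates more binary variables than strictly necessary (entries with j ≤ k are never used), which is acceptable for small illustrative instances.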
The above problem involves a polynomial number of variables and constraints and can be solved with any general MIP solver. Though even relatively large problem instances can be solved exactly, finding solutions of real-life problem instances may take a long time or become intractable. Thus, one can devise heuristics to quickly solve LMSPP and find a feasible upper bound for MSPP. As noted above, the heuristics based on the level strip packing problem are among the most popular ones for SPP due to their simplicity, speed, and relatively high effectiveness.

2.1 Heuristics for LMSPP
We developed and implemented two heuristics for LMSPP which are extensions of FFDH and NFDH to the case of several strips. Note that the online versions of level-based packing algorithms are often referred to as shelf algorithms. NFDH is the most basic such heuristic. It packs rectangles, ordered by non-increasing height, one by one on a sequence of levels. The first level is the bottom of the strip. The algorithm places the rectangles left-justified on the current level until the next rectangle does not fit the remaining width of the strip. Once this happens, a new level is defined by a horizontal line drawn through the top edge of the first rectangle located on the previous level. The next items are then packed left-justified on this new level. FFDH operates in a similar way, but it places the next rectangle on the lowest level where it fits instead of the current one. If none of the existing levels can accommodate it, a new level is defined and the packing proceeds. NFDH can easily be extended to MSPP as follows. Note that the variant of NFDH we implemented is very similar to the one proposed in [5]. The algorithm first packs the rectangles on the bottoms of all strips so that the next rectangle that does not fit the current strip is placed on the next strip. When the first levels of all strips are filled, the algorithm creates a new level on the strip with the minimum packing height and proceeds packing the rectangles into that strip. The outline of our implementation is given in Algorithm 1. The algorithm takes as input a set of rectangles and a set of strips and returns the height of the obtained packing as well as the coordinates of the bottom left corner of each rectangle on the corresponding strip. We also developed a variant of FFDH for the case of multiple strips. Here, the initial level is the bottom of the first strip. When there are no rectangles fitting the strip width, the algorithm defines a new level on the bottom of the next strip. Then, it tries to place the next rectangle not on the lowest level (as in FFDH for one strip) but on the earliest previous level where it fits. If no previous level can accommodate it, a new level is defined on the strip with minimum height (see Algorithm 2).
3 Skyline Heuristic and Its Generalization for MSPP
Fitness-based algorithms are among the most effective and popular strip packing approaches.
Algorithm 1. Next-Fit Decreasing Height heuristic (NFDH) for LMSPP
Input: A set J of rectangles (|J| = n), each of which has width wj and height hj, and a set I of strips (|I| = m) having widths Wi.
1: Sort the rectangles by non-increasing height.
2: Set the heights of the strips Hj ← 0, j = 1, . . . , m.
3: Set the width and height of a current level wc ← 0; hc ← h1.
4: Define i as the index of the first strip such that w1 ≤ Wi.
5: for k = 1 to n do
6:   if Wi − wc ≥ wk then
7:     wc ← wc + wk, (xk ← wc, yk ← Hi, sk ← i).
8:   else
9:     Set Hi ← Hi + hc.
10:    Find the strip i with the minimum packing height such that Wi ≥ wk.
11:    Set wc ← wk, hc ← hk, (xk ← 0, yk ← Hi, sk ← i).
12:  end if
13: end for
14: return H∗ = max_{i∈I} Hi, and (xk, yk, sk), k ∈ J.
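For illustration, a direct Python transcription of Algorithm 1 could look as follows. It is only a sketch (no input validation, and the last open level is closed explicitly), with rectangles given as (width, height) pairs and strips given as a list of widths.

def nfdh_multi_strip(rects, strip_widths):
    """Next-Fit Decreasing Height for multiple strips (sketch of Algorithm 1)."""
    order = sorted(range(len(rects)), key=lambda k: -rects[k][1])
    H = [0.0] * len(strip_widths)            # current packing height per strip
    placement = [None] * len(rects)          # (x, y, strip) per rectangle
    # open the first level on the first strip wide enough for the tallest item
    i = next(s for s, W in enumerate(strip_widths) if rects[order[0]][0] <= W)
    level_w, level_h = 0.0, rects[order[0]][1]
    for k in order:
        w, h = rects[k]
        if strip_widths[i] - level_w >= w:   # item fits on the current level
            placement[k] = (level_w, H[i], i)
            level_w += w
        else:                                # close the level, open a new one
            H[i] += level_h
            i = min((s for s, W in enumerate(strip_widths) if W >= w),
                    key=lambda s: H[s])      # strip with minimum packing height
            placement[k] = (0.0, H[i], i)
            level_w, level_h = w, h
    H[i] += level_h                          # close the last open level
    return max(H), placement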
Starting with the prominent paper by Burke [7], many research efforts have been dedicated to developing various strategies to improve the Best-Fit algorithm from [7]. The Best-Fit is a greedy approach that always chooses the lowest available space in the strip and places the rectangle that best fits it. If there is no rectangle that perfectly fits the niche, then the algorithm places the item that maximally fills it, following a special placement policy. As rectangles usually do not perfectly fit a current space, various strategies to select the most appropriate rectangle were devised. Most of such approaches resort to the so-called skyline to represent a packing pattern. The goal is to keep the skyline as flat as possible. To achieve that, the concept of so-called fitness values is introduced [20]. In general, a fitness value (or score) indicates how much free space is filled by a rectangle, e.g. how many corner points of the space are fit. There are actually different scoring systems proposed in the literature that vary in the number of scoring values and their order. For example, in [20] the authors proposed to use 4 fitness values varying from 3 to 0, while in [25] 8 values, varying from 6 to −1, were used. In this paper we develop a fast best fit heuristic based on the skyline concept for MSPP. Actually, our approach is an extension of the skyline algorithm proposed in [23], which is one of the best state-of-the-art solution approaches to SPP. First, we give a description of the skyline algorithm and then discuss how it can be adapted for MSPP. The algorithm assesses the current packing pattern and determines available free spaces by keeping a rectilinear skyline. The latter is a sequence of horizontal (parallel to the x-axis) line segments corresponding to the top edges of the already placed rectangles or the bottom of the strip. Thus, all the skyline segments satisfy the following properties: any adjacent segments always have different y-coordinates and one common x-coordinate. Note that the initial skyline related to the empty packing consists of only one segment corresponding to the bottom edge of a strip.
Algorithm 2. First-Fit Decreasing Height heuristic (FFDH) for LMSPP
Input: A set J of rectangles (|J| = n), each of which has width wj and height hj, and a set I of strips (|I| = m) having widths Wi.
1: Sort the rectangles by non-increasing height.
2: Set the heights of the strips Hj ← 0, j = 1, . . . , m.
3: Set the initial level l ← 1.
4: Set the width and height of the initial level wcl ← 0; hlc ← h1.
5: Define sl as the index of the first strip such that w1 ≤ Wsl.
6: for k = 1 to n do
7:   for each level i preceding level l do
8:     if Wsi − wci ≥ wk then
9:       wci ← wci + wk, (xk ← wci, yk ← Hsi, sk ← si).
10:    else
11:      Set Hsl ← Hsl + hlc.
12:      Set l ← l + 1, find the strip sl with the minimum packing height such that Wsl ≥ wk.
13:      Set wcl ← wk, hlc ← hk, (xk ← 0, yk ← Hsl, sk ← sl).
14:    end if
15:  end for
16: end for
17: return H∗ = max_{i∈I} Hi, and (xk, yk, sk), k ∈ J.
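The only essential difference from the NFDH sketch above is that a list of open levels is kept and scanned first-fit before a new level is created. The Python fragment below is again an illustrative sketch rather than the authors' implementation; here the height of a level is charged to its strip as soon as the level is opened, which is equivalent because the first item of a level is its tallest one.

def ffdh_multi_strip(rects, strip_widths):
    """First-Fit Decreasing Height for multiple strips (sketch of Algorithm 2)."""
    order = sorted(range(len(rects)), key=lambda k: -rects[k][1])
    H = [0.0] * len(strip_widths)
    placement = [None] * len(rects)
    levels = []                    # each level: [strip, y, used_width, height]
    for k in order:
        w, h = rects[k]
        for lev in levels:         # first previously opened level where it fits
            s, y, used, lh = lev
            if strip_widths[s] - used >= w:
                placement[k] = (used, y, s)
                lev[2] += w
                break
        else:                      # no open level fits: open a new level
            s = min((i for i, W in enumerate(strip_widths) if W >= w),
                    key=lambda i: H[i])
            levels.append([s, H[s], w, h])
            placement[k] = (0.0, H[s], s)
            H[s] += h              # level height equals its first (tallest) item
    return max(H), placement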
Each segment s is characterized by the coordinates of its left end point and its width. Moreover, for a segment s one computes its so-called left (right) height, which is the difference between the y-coordinate of segment s and the y-coordinate of the preceding segment s − 1 (the subsequent segment s + 1). Note that the left height of the first skyline segment and the right height of the last segment are supposed to be infinite. Starting from the empty packing, the algorithm selects the most bottom-left segment to pack a rectangle on top of it. Then, the algorithm determines which rectangle to place taking into account its fitness value. In our case, the fitness value is the number of rectangle edges that fit the width or heights of the segment (see Fig. 1). For example, if the width of a rectangle equals the segment width, its fitness value is at least 1. Furthermore, if the height of the rectangle is equal to one or both heights of the segment, its fitness value is 2 or 3, respectively. Obviously, the fitness value can vary from 0 to 3. Note that if the width of a rectangle is greater than the segment width, it cannot be located at this place and its fitness value is −1. Thus, if all the remaining unpacked rectangles have the fitness value of −1, the area above the segment cannot be fully utilized. In this case, the segment is raised up and joined with the lowest neighbor segment; the area below it is considered as wastage. If there are two or more rectangles with equal fitness values, the algorithm selects the first such rectangle in the sequence. After a rectangle is placed, the skyline is updated and the next most bottom-left segment is evaluated.
Fig. 1. Fitness values
Since we have m strips instead of one, we suppose that the initial skyline consists of m segments corresponding to the bottom edges of the strips. Given strips {1, . . . , m}, the x-coordinate of the corresponding initial skyline segment s = 1, . . . , m is equal to x_s = Σ_{l=2}^{s} W_{l−1}, with x_1 = 0 (see Fig. 2). Thus, the algorithm begins packing rectangles into the first strip, as the corresponding segment is the most bottom-left one. In some variants of MCSP, a job that must be executed is allowed to be divided between several machines. However, in our case a rectangle must be packed only in one strip. Thus, the neighbors of a segment must always belong to the same strip. To achieve that, we set the left and right heights of the initial segments to infinity (see Fig. 2).
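The skyline bookkeeping just described can be captured by a small data structure. The Python fragment below is a simplified illustration only (it omits wastage handling and segment merging) of the fitness scoring and of the multi-strip initialization with infinite border heights; the interpretation of the −1 score as "wider than the segment" follows the discussion above.

import math
from dataclasses import dataclass

@dataclass
class Segment:
    x: float          # x-coordinate of the left end point
    y: float          # height of the segment
    width: float
    left_h: float     # difference to the preceding segment (inf at a strip border)
    right_h: float    # difference to the subsequent segment (inf at a strip border)

def initial_skyline(strip_widths):
    """One segment per strip bottom; strip borders get infinite left/right heights."""
    segments, x = [], 0.0
    for W in strip_widths:
        segments.append(Segment(x, 0.0, W, math.inf, math.inf))
        x += W
    return segments

def fitness(rect_w, rect_h, seg):
    """Score in {-1, 0, 1, 2, 3}: how many segment edges the rectangle matches."""
    if rect_w > seg.width:
        return -1                 # wider than the segment: cannot be placed here
    score = 0
    if rect_w == seg.width:
        score += 1
    if rect_h == seg.left_h:
        score += 1
    if rect_h == seg.right_h:
        score += 1
    return score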
4 Randomized Local Search
An important feature of many constructive heuristics (like Bottom-Left, BLF, or skyline heuristics) is that the solution found actually depends on the input sequence of rectangles. In other words, the algorithms follow the same set of rules, hence the solution produced will be the same if the heuristics have the same input.
Fig. 2. Initial skyline segments corresponding to the empty packing (left) and some feasible packing pattern for multiple strips (right)
Thus, a natural idea is to employ metaheuristic approaches to find quality packings by changing initial sequences of rectangles. The objective value is then computed by a constructive heuristic that packs rectangles according to a new sequence. As far as we know, the first such approach was proposed in [17], where a genetic algorithm was used to modify a sequence of rectangles which are then placed with the BL heuristic. Other widespread approaches are simulated annealing [8,14], tabu search [24], and a random local search [23]. Another possible approach is to apply a metaheuristic directly to solution layouts, e.g. using a genetic algorithm [3]. For MSPP, we developed a relatively simple randomized local search algorithm aimed at quickly finding quality solutions. Note that our solution strategy is motivated by the practical application of MSPP, in which the time to find a solution is strictly limited. Recall that local search heuristics start from some arbitrary initial feasible solution (incumbent) and then try to find a better solution in a neighborhood of the incumbent. If such a solution is found, it replaces the incumbent, and the procedure is repeated. Otherwise, the incumbent is said to be locally optimal with respect to the neighborhood and the algorithm halts. We suppose that our search space consists of all permutations of the set of rectangles J = {1, . . . , n}. The permutations represent input sequences interpreted by the skyline heuristic. Our approach starts from a random permutation of J and tries to improve it by searching among its neighbors. The neighborhood N consists of all permutations for which the Hamming distance from the current one is equal to 2, i.e. N comprises all permutations that differ from the current one in two positions. Note that the cardinality of N is |N| = C(n, 2) = n(n − 1)/2. If a neighbor with a better objective value (lower height) is found, it is accepted as the new incumbent, and the local search proceeds.
There are a lot of variations of local search depending on which neighbors are examined and accepted. The common local search heuristic follows the steepest descent strategy, i.e. all neighbors are examined and the best one is chosen as the next incumbent. In our particular implementation, we adopted a randomized first improvement local search strategy, which assumes assessing only a subset of neighbors picked randomly. Thus, our algorithm chooses a random neighbor of the incumbent. If it is better than the incumbent, the incumbent is replaced with it. Otherwise, another random neighbor is examined. The procedure halts after a certain number of neighbors has been examined; in our implementation we set this number equal to 100n. To compute the objective value for a permutation of J, we run the skyline heuristic described above. Note that its naive implementation requires O(n²) time. However, a revised version may run much faster, taking only O(n log n) time (see e.g. [15]). Thus, our main motivation for employing the randomized variant of local search instead of the steepest descent one is that the full examination of the neighborhood N in case of large numbers of rectangles becomes too slow and, at the same time, does not result in finding significantly better solutions.
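The following Python fragment sketches this randomized first-improvement scheme over permutations. It assumes that a packing-evaluation function pack_height (for example, the skyline heuristic) is supplied by the caller; the names and the 100n trial budget mirror the description above, but the code is an illustration rather than the authors' implementation.

import random

def randomized_local_search(rects, strip_widths, pack_height, trials_per_item=100):
    """First-improvement local search over rectangle permutations (sketch).

    pack_height(sequence, rects, strip_widths) is assumed to return the maximal
    packing height produced by a constructive heuristic for that sequence.
    """
    n = len(rects)
    seq = list(range(n))
    random.shuffle(seq)                      # random initial permutation
    best_val = pack_height(seq, rects, strip_widths)
    for _ in range(trials_per_item * n):     # examine at most 100n random neighbors
        i, j = random.sample(range(n), 2)    # neighbor at Hamming distance 2
        seq[i], seq[j] = seq[j], seq[i]
        val = pack_height(seq, rects, strip_widths)
        if val < best_val:                   # first improvement: keep the swap
            best_val = val
        else:                                # otherwise undo the swap
            seq[i], seq[j] = seq[j], seq[i]
    return best_val, seq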
5 Computational Results
Here we report some computational results to test the efficiency and effectiveness of the proposed approaches to MSPP. Our test bed consists of 10 problem instances derived from real-life scheduling problems for Spark systems provided by Huawei's Global Technical Service Team. Instances 1-5 are small, while 6-10 are large ones. All the small problems contain 500 rectangles, whereas the large ones assume packing 1000 rectangles. For each problem we tested three cases for the number of strips: 2, 3, and 5. Since our test instances are derived from applications to scheduling computing jobs, the widths of the rectangles are integer (equal to the number of processors required) and vary from 1 to 15, while the heights are relatively large and vary from 1 to 13197. The strip widths are not all the same and may be 5, 10, or 15. We implemented the proposed heuristics in the C++ programming language and tested them on a PC with an Intel Core i7 10700 CPU 3.8 GHz and 64 GB of RAM. Our test results are presented in Tables 1, 2 and 3, where the first column gives the problem name, while the next columns contain the results obtained with NFDH, FFDH, and our randomized local search algorithm (RLS). The last column (LMSPP) presents the results obtained with a commercial solver for the two-dimensional level strip packing problem. For each solution algorithm we report the maximal height found over all strips. For our randomized local search algorithm we also report the run time in seconds, given in the column Time. In the case of LMSPP we limited the MIP solver to 1 h and 2 h for the small and large problems, respectively, and report the upper bounds found. Note that the run times of both NFDH and FFDH are negligible, thus we do not report them in the tables. For RLS we present the best results found over 10 runs. As was expected, NFDH provides the worst results among the competing algorithms for all test instances.
Table 1. Computational results on real-life problem instances. The number of strips m = 2. We report the maximum height of the strips and the run time of RLS in seconds.

Problem       NFDH    FFDH    RLS     Time    LMSPP
test-prob1    19456   16899   15584   19.6    16698
test-prob2    25514   22742   21225   19.4    22795
test-prob3    19136   16830   15693   18.7    16641
test-prob4    20975   18107   16923   18.9    17741
test-prob5    26984   23584   22414   19.5    23685
test-prob6    40713   34535   33391   146.7   35471
test-prob7    31813   29670   28367   144.2   30001
test-prob8    43246   39538   36834   147.8   39418
test-prob9    40167   35491   33722   148.6   36335
test-prob10   30529   28821   27368   140.6   28755
Table 2. Computational results on real-life problem instances. The number of strips m = 3. We report the maximum height of the strips and the run time of RLS in seconds.

Problem       NFDH    FFDH    RLS     Time    LMSPP
test-prob1    15476   13881   13041   21.7    13707
test-prob2    15630   14268   13359   21.9    14514
test-prob3    12279   11449   10500   21.3    11299
test-prob4    13902   11863   11497   21.4    12059
test-prob5    16434   14613   14046   22.5    15017
test-prob6    23271   20705   20187   168.6   21563
test-prob7    24716   22192   21302   165.1   22459
test-prob8    29019   26034   24879   169.1   26846
test-prob9    25971   23595   22794   162.5   23669
test-prob10   20426   19087   18334   163.8   20053
On the other hand, the ability of FFDH to better utilize free spaces on levels allows it to find solutions that are very close to or even better than the ones obtained with LMSPP solved by a MIP solver (e.g. see the large instances in Table 3). Taking into account the simplicity of FFDH and its high speed, it might be a quite reasonable option for real-life applications in cluster scheduling. We can see that when the number of strips is relatively small (2 or 3), our randomized local search heuristic outperforms all the other competing level-based approaches. Note that it takes more than 3 minutes to provide the best solutions. However, in the case of 5 strips, RLS found the best solutions for only 6 out of 10 problems. In all other instances it is slightly inferior to the FFDH heuristic.
Table 3. Computational results on real-life problem instances. The number of strips m = 5. We report the maximum height of the strips and the run time of RLS in seconds.

Problem       NFDH    FFDH    RLS     Time     LMSPP
test-prob1    9708    8354    8052    25.79    8232
test-prob2    9701    8684    8546    25.55    8870
test-prob3    9731    8538    8164    25.74    8505
test-prob4    10509   8886    8904    25.73    8870
test-prob5    9129    8301    8340    25.65    8886
test-prob6    16976   14743   14550   198.02   15025
test-prob7    16642   14730   14317   191.72   14905
test-prob8    15951   14539   14585   198.29   15177
test-prob9    15826   14195   14303   192.96   15029
test-prob10   15085   14191   13911   193.76   14655
This can be explained by the fact that the nature of test instances (containing large number of small rectangles) together with the large number of strips allow FFDH to better fill spaces on levels. At the same time, RLS relying on different sequences of rectangles requires much more time to search the neighborhoods of incumbent solutions. Nevertheless, the run time of RLS is relatively small and appropriate for practical use. Thus, it can be especially useful when a high utilization of computing resources is the primary goal.
6 Conclusion
In this paper we addressed a variant of the well-known two-dimensional strip packing problem in which rectangles are to be packed into several strips instead of just one. Motivated by a real-life application of this problem to scheduling multiple computer clusters, we developed several solution approaches. In particular, we addressed a special case of the problem in which rectangles are to be located on so-called levels. We introduced its mathematical model, cast as an integer program, that can be solved with a commercial solver. We also proposed and implemented two heuristics for the proposed level problem which are extensions of the NFDH and FFDH heuristics. Finally, we developed a skyline heuristic for the multiple strip packing problem and embedded it into a simple randomized local search algorithm that tries to improve the packing by searching over sequences of rectangles. We compared the proposed approaches in a series of computational experiments on real-life problem instances concerning the scheduling of Spark computer clusters. Our future research will be focused on devising more sophisticated solution strategies, e.g. two-stage solution approaches involving local and global scheduling,
various metaheuristics, or exact methods. Our research will also aim at finding quality lower bounds for the objective value of the problem. Due to the application of the problem to scheduling, the main requirement for solution approaches is their low run time. Thus, we are going to develop strategies to speed up the developed constructive heuristics, especially for large-scale problem instances.
References 1. Baker, B.S., Coffman, E., Jr., Rivest, R.L.: Orthogonal packings in two dimensions. SIAM J. Comput. 9(4), 846–855 (1980). https://doi.org/10.1137/0209064 2. Bezerra, V.M.R., Leao, A.A.S., Oliveira, J.F., Santos, M.O.: Models for the twodimensional level strip packing problem - a review and a computational evaluation. J. Oper. Res. Soc. 71(4), 606–627 (2020). https://doi.org/10.1080/01605682.2019. 1578914 3. Bortfeldt, A.: A genetic algorithm for the two-dimensional strip packing problem with rectangular pieces. Eur. J. Oper. Res. 172(3), 814–837 (2006). https://doi. org/10.1016/j.ejor.2004.11.016 4. Boschetti, M.A., Montaletti, L.: An exact algorithm for the two-dimensional strippacking problem. Oper. Res. 58(6), 1774–1791 (2010). https://doi.org/10.1287/ opre.1100.0833 5. Bougeret, M., Dutot, P.F., Jansen, K., Robenek, C., Trystram, D.: Approximation algorithms for multiple strip packing and scheduling parallel jobs in platforms. Discrete Math. Algorithms Appl. 3(4), 553–586 (2011). https://doi.org/10.1142/ S1793830911001413 6. Bougeret, M., Dutot, P.F., Trystram, D., Jansen, K., Robenek, C.: Improved approximation algorithms for scheduling parallel jobs on identical clusters. Theory Comput. Syst. 600, 70–85 (2015). https://doi.org/10.1016/j.tcs.2015.07.003 7. Burke, E., Kendall, G., Whitwell, G.: A new placement heuristic for the orthogonal stock-cutting problem. Oper. Res. 52(4), 655–671 (2004). https://doi.org/10.1287/ opre.1040.0109 8. Burke, E.K., Kendall, G., Whitwell, G.: A simulated annealing enhancement of the best-fit heuristic for the orthogonal stock-cutting problem. INFORMS J. Comput. 21(3), 505–516 (2009). https://doi.org/10.1287/ijoc.1080.0306 9. Castro, P.M., Oliveira, J.F.: Scheduling inspired models for two-dimensional packing problems. Eur. J. Oper. Res. 215(1), 45–56 (2011). https://doi.org/10.1016/j. ejor.2011.06.001 10. Coffman, E.G., Jr., Garey, M.R., Johnson, D.S., Tarjan, R.E.: Performance bounds for level-oriented two-dimensional packing algorithms. SIAM J. Comput. 9(4), 808– 826 (1980). https://doi.org/10.1137/0209062 11. Cˆ ot´e, J., Dell’Amico, M., Iori, M.: Combinatorial benders’ cuts for the strip packing problem. Oper. Res. 62(3), 643–661 (2014). https://doi.org/10.1287/opre.2013. 1248 12. Dutot, P.-F., Jansen, K., Robenek, C., Trystram, D.: A (2 + ε)-approximation for scheduling parallel jobs in platforms. In: Wolf, F., Mohr, B., an Mey, D. (eds.) Euro-Par 2013. LNCS, vol. 8097, pp. 78–89. Springer, Heidelberg (2013). https:// doi.org/10.1007/978-3-642-40047-6 11 13. Henning, S., Jansen, K., Rau, M., Schmarje, L.: Complexity and inapproximability results for parallel task scheduling and strip packing. Theory Comput. Syst. 64(1), 120–140 (2019). https://doi.org/10.1007/s00224-019-09910-6
14. Hopper, E., Turton, B.C.H.: An empirical investigation of meta-heuristic and heuristic algorithms for a 2D packing problem. Eur. J. Oper. Res. 128(1), 34– 57 (2001). https://doi.org/10.1016/S0377-2217(99)00357-4 15. Imahori, S., Yagiura, M.: The best-fit heuristic for the rectangular strip packing problem: an efficient implementation and the worst-case approximation ratio. Comput. Oper. Res. 37(2), 325–333 (2010). https://doi.org/10.1016/j.cor.2009.05. 008 16. Iori, M., de Lima, V.L., Martello, S., Miyazawa, F.K., Monaci, M.: Exact solution techniques for two-dimensional cutting and packing. Eur. J. Oper. Res. 289(2), 399–415 (2021). https://doi.org/10.1016/j.ejor.2020.06.050 17. Jakobs, S.: On genetic algorithms for the packing of polygons. Eur. J. Oper. Res. 88(1), 165–181 (1996). https://doi.org/10.1016/0377-2217(94)00166-9 18. Jansen, K., Rau, M.: Linear time algorithms for multiple cluster scheduling and multiple strip packing. In: Yahyapour, R. (ed.) Euro-Par 2019. LNCS, vol. 11725, pp. 103–116. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29400-7 8 19. Kenmochi, M., Imamichi, T., Nonobe, K., Yagiura, M., Nagamochi, H.: Exact algorithms for the two-dimensional strip packing problem with and without rotations. Eur. J. Oper. Res. 198(1), 73–83 (2009). https://doi.org/10.1016/j.ejor.2008.08. 020 20. Leung, S.C., Zhang, D.: A fast layer-based heuristic for non-guillotine strip packing. Expert. Syst. Appl. 38(10) (2011). https://doi.org/10.1016/j.eswa.2011.04.105 21. Lodi, A., Martello, S., Vigo, D.: Models and bounds for two-dimensional level packing problems. J. Comb. Optim. 8, 363–379 (2004). https://doi.org/10.1023/ B:JOCO.0000038915.62826.79 22. Martello, S., Monaci, M., Vigo, D.: An exact approach to the strip-packing problem. INFORMS J. Comput. 15(3), 310–319 (2003). https://doi.org/10.1287/ijoc.15.3. 310.16082 23. Wei, L., Hu, Q., Leung, S., Zhang, N.: An improved skyline based heuristic for the 2D strip packing problem and its efficient implementation. Comput. Oper. Res. 80, 113–127 (2017). https://doi.org/10.1016/j.cor.2016.11.024 24. Wei, L., Oon, W.C., Zhu, W., Lim, A.: A skyline heuristic for the 2D rectangular packing and strip packing problems. Eur. J. Oper. Res. 215(1), 337–346 (2011). https://doi.org/10.1016/j.ejor.2011.06.022 25. Yang, S., Han, S., Ye, W.: A simple randomized algorithm for two-dimensional strip packing. Comput. Oper. Res. 40(1), 1–8 (2013). https://doi.org/10.1016/j. cor.2012.05.001 26. Ye, D., Han, X., Zhang, G.: Online multiple-strip packing. Theor. Comput. Sci. 412(3), 233–239 (2011). https://doi.org/10.1016/j.tcs.2009.09.029 27. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings the 2nd USENIX Conference on Hot Topics in Cloud Computing, p. 10. USENIX Association, USA (2010) 28. Zhuk, S.N.: Approximate algorithms to pack rectangles into several strips. Discrete Math. Appl. 16(1), 73–85 (2006). https://doi.org/10.1515/156939206776241264
Operational Research Applications
The Research of Mathematical Models for Forecasting Covid-19 Cases

Mostafa Salaheldin Abdelsalam Abotaleb(B) and Tatiana Makarovskikh
South Ural State University, Chelyabinsk, Russia [email protected] http://www.susu.ru
Abstract. The world is currently facing the Covid-19 pandemic. The virus spreads rapidly among people, which leads to an increase in both the number of infection cases and the number of deaths. This is a huge challenge, as the pandemic has affected all sectors, and it is therefore important for mathematicians to model the spread of this epidemic in the world, to reduce the damage caused by the pandemic, and to discover the pattern of the virus spreading. In our report, time series models are used to obtain estimates of the numbers of infection cases and deaths using the ARIMA, Holt's Linear Trend, BATS, TBATS, and SIR models. We have developed a new algorithm that applies these models and chooses the best one for forecasting the numbers of infections and deaths, using the smallest MAPE as the selection criterion. For most of the data used with this algorithm, we have observed that the models achieving the smallest forecast errors are BATS, TBATS, and ARIMA, respectively. The experiment was carried out for the ten countries most affected by Covid-19; the algorithm was able to detect the pattern of the virus spreading in each country, and it motivates further research and studies on other models.

Keywords: ARIMA · Holt's model · SIR · BATS · TBATS · MAPE · Forecasting · Covid-19 · Algorithm

1 Introduction
The Covid-19 pandemic has become a global challenge for all fields of knowledge, not only for medicine, where treatments, drugs, vaccines, and test systems are developed, but also for mathematics and statistics, which play an important role in modelling the spread of infection, discovering its patterns, and forecasting Covid-19 infection cases and deaths. Since the first days of the pandemic, various methods for modelling and forecasting the spread of infection cases, both in the world and in local regions, have appeared. The work was supported by Act 211 of the Government of the Russian Federation, contract No. 02.A03.21.0011, and by the Ministry of Science and Higher Education of the Russian Federation (government order FENU-2020-0022).
Various articles have been devoted to the use of known models, the identification of their parameters, and testing on known data. Of course, the modelling of the Covid-19 spread began with the application of modifications of the classical epidemiological SIR (Susceptible-Infected-Recovered) model, but it soon became clear that it was not possible to select model parameters that would ensure acceptable forecasting accuracy. In parallel, statisticians carried out research on autoregressive integrated moving average (ARIMA) models, which are powerful tools for short-term forecasting. Many articles have been published, mainly by scientific schools of South-East Asia, devoted to the selection of this model's parameters for individual territories and time intervals. We have shown that the parameters of the ARIMA model strongly depend on the region of study (for example, models with completely different parameters were used to predict the number of cases of infection in different regions of the Russian Federation for the same period). It should be noted that the ARIMA model allows one to abstract from the nature of the process and operate only with numerical information. On the other hand, there is no way to use exogenous variables to improve the forecast quality for such a model. Another disadvantage of this model is the need for additional training of the model and verification of its parameters. Another tool for forecasting the spread of Covid-19 infection is the Holt-Winters adaptive model, which also does not actually explain the essence of the epidemic development process and is solely focused on the data itself, but it provides some means of dealing with seasonality and trend. At the very beginning of the epidemic, information on long-term forecasts of the epidemic's progress in the world as a whole and in each country individually was updated daily. For example, experts from the Singapore University of Technology and Design (SUTD) predicted that the epidemic in Russia would end by June 20, 2020, and in the world by December 9, 2020; they then postponed the date several times and, in the end, cancelled their forecasts. Nevertheless, after some time it turned out that the time series with data on the Covid-19 infection spreading are still too short to make any adequate long-term forecast. Generating high-precision short-term forecasts for the spread of confirmed cases of the disease, as well as analysing the numbers of deaths and recoveries, is an equally urgent task today. Thus, the development of new approaches to modelling and forecasting time series is an urgent task at the time of the Covid-19 pandemic. At present, when in some countries the peak of the second wave of the infection spreading has been passed, and articles on the seasonality of the disease have appeared in medical journals, prerequisites have emerged for the use of models that take the seasonal component into account. Authors of studies using the ARIMA model have actively switched to the study of the SARIMA model. However, it should be understood that in the case of coronavirus infection spreading, it makes sense to talk about complex seasonal patterns. For such cases, the sophisticated BATS and TBATS models were developed in [13]. This is a general method that iterates over a large number of possible models. The models were first presented by Australian scientists in
2010–2012 and tested on classic forecasting problems such as electricity consumption, fuel sales, call centre calls, etc. These models have not previously been tested for predicting the spread of the Covid-19 epidemic. The purpose of our work is to create an algorithm that, given the available initial data on the spread of coronavirus infection in a certain region over a given period, determines the best model for making a forecast for a given horizon. The algorithm analyses forecasts from the SIR, ARIMA, Holt's linear model, BATS, and TBATS models and selects the model that produces a forecast with the minimum mean absolute percentage error (MAPE). The article describes a program in the R language that allows one to obtain a forecast using the models described above. The first section discusses each of the studied models in detail and presents some results of using these models to predict the spread of coronavirus infection. The second section describes the structure of the forecasting model selection algorithm. In the third section, the results of computational experiments are presented. In the conclusion, the obtained results are summarized and directions for further research are outlined.
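Although the program described in the paper is written in R, the model selection idea can be illustrated by the following short Python sketch, in which the candidate forecasting functions are placeholders supplied by the caller; it only shows how the MAPE criterion picks the best model on a hold-out part of the series.

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumes positive observations)."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def select_best_model(series, forecasters, horizon=7):
    """Fit every candidate on the series head and score it on the last `horizon` points.

    `forecasters` maps a model name to a function (train, horizon) -> list of forecasts.
    """
    train, test = series[:-horizon], series[-horizon:]
    scores = {name: mape(test, fit(train, horizon)) for name, fit in forecasters.items()}
    best = min(scores, key=scores.get)
    return best, scores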
2 Mathematical and Statistical Models for Forecasting of Covid-19 Cases

2.1 Conventional SIR Model
Conventional epidemiological models like SIR (Susceptible, Infected, Recovered) and their numerous variants describe the density of infected individuals I by the typical equation [7]

dI/dt = λIS − νI,   (1)

where S and I are the densities of susceptible and infected individuals, and λ, ν are positive constants. The first term on the right-hand side of this equation characterizes the appearance of newly infected individuals due to their contact with susceptible ones. The second term corresponds to the decrease of I due to the recovery or death of infected individuals. At the beginning of the infection spreading, the numbers of infected and recovered individuals are much less than the number of susceptible ones, so that we can approximate S by a constant, S ≈ S0. Using this approximation, we obtain a linear differential equation with constant coefficients. Its solution is given by the function I(t) = I0 exp(αt), where I0 is the number of infected individuals at the initial moment of time, and α = λS0 − ν. If α > 0, then I(t) grows exponentially. The same condition can be written as R0 > 1, where R0 = λS0/ν is the basic reproduction number. Thus, according to Eq. (1), the growth of the number of infected individuals is exponential at the beginning of the epidemic, and it slows down later when S decreases. However, this does not describe accelerated growth during the second stage of the epidemic. A possible explanation of this acceleration is that at the beginning of the epidemic the disease spreads mainly among people with a weak immune response, while at the second stage people with a strong immune response also become exposed.
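The early-stage behaviour of Eq. (1) with S frozen at S0 can be checked numerically with a few lines of Python. The sketch below is only an illustration of the linearized regime (parameter values are arbitrary placeholders), not the model used later in the paper.

def early_sir_infected(i0, s0, lam, nu, days, steps_per_day=100):
    """Euler integration of dI/dt = lam*I*S0 - nu*I with S fixed at S0 (sketch)."""
    alpha = lam * s0 - nu          # growth rate; the epidemic grows if alpha > 0
    dt = 1.0 / steps_per_day
    infected, i = [i0], i0
    for _ in range(days):
        for _ in range(steps_per_day):
            i += dt * alpha * i    # linearized equation at the start of the epidemic
        infected.append(i)
    return infected                # numerically approximates I0 * exp(alpha * t)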
The classical SIR model does not provide high-quality forecasts [8,10,20], partly because of differences in the algorithms for choosing its parameters. In [10] the recently released R package covid19.analytics [18] uses this model and allows getting a complete picture of the spread of Covid-19 anywhere in the world. The author of this package claims that it does this by accessing and retrieving the data publicly published by two main sources: the "COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University" [2] for the worldwide and US data, and the city of Toronto for the Toronto data [3]. The package also provides basic analysis and visualization tools and functions to investigate these datasets and other ones structured similarly. The main disadvantage of this package at the moment is that it relies exclusively on the classical SIR model for forecasting, which gives a very large error. At the very beginning of the pandemic, researchers started using more sophisticated SIR models, taking into account both the removal of infected people due to mortality (the SIR(D) model in China) and the incubation period of the disease (the SEIR model) for short-term forecasting of the epidemic in Mexico [6]. In [7] the authors develop a SEIR model under the assumption that epidemic progression begins within a subpopulation characterized by a weak immune response and, at a later stage, continues in the whole population. Though this assumption seems plausible, the authors have no direct confirmation that this heterogeneity of the population plays a significant role during epidemic spread. The model discussed in that article does not take into account the involvement of many different cell types, intracellular regulation, time delay in virus production, clonal expansion of immune cells, and some other relevant aspects. However, it captures the main features of the interaction of viral infection with the immune response. These models give better results for longer-term forecasting (over 7 days). A peculiarity of the SIR(D) epidemiological model is that it takes constant values of the coefficients responsible for the probability of infection, the probability of recovery, and the probability of death. However, in reality, measures of varying effectiveness to constrain the epidemic (quarantines, restrictions on events and traffic, mask usage, etc.) are being developed and applied everywhere, which changes the trajectory of the epidemic and, as a result, makes the coefficients of such a model time-varying. In [12], the authors retrain the coefficients of the model on newly received data, which is justified for obtaining short-term forecasts (up to 10 days) of high accuracy.

2.2 The Autoregressive Models
Models based on time series analysis, in particular ARIMA models, are difficult to customize when conducting a full analysis, but they almost always give good results where a high-quality forecast for the medium and short term is required [12]. In articles [11,17] as well as in many others that are easy to find in the Internet and Scientific Portals, the authors manually select the parameters for
the time series available at the time of their publication and carry out calculations in programs widely used for statistical analysis (for example, R or Gretl) for individual states and regions on fixed dates. Especially many similar works can be found among analysts from India and South-East Asia. Article [12] also cites several results of scientists from South-East Asia and Europe obtained in a similar way. In that work, the parameters of the ARIMA model obtained in early May for the territory of Russia are given, and it is noted that these parameters vary for different regions; moreover, the authors show that these parameters may vary for different periods of the epidemic spreading. The work does not provide any algorithms for determining the parameters automatically. We would also like to acknowledge the article [9], which proposes to consider the relations between countries located in the same geographic area to predict the spread of the virus. Countries in the same geographic region have variables with similar values (both quantitative and non-quantitative) that affect the spread of the virus. This approach can be used in the future to predict the spread of the virus in the Russian Federation, since the nature of the epidemic spreading differs between regions. The ARIMA model consists of three components:
– AR (autoregressive term) refers to the past values used to predict the next value; it is defined by the parameter p in the autoregressive model Y_t = Θ_1 Y_{t−1} + Θ_2 Y_{t−2} + ... + Θ_p Y_{t−p} + ε_t, where p is determined by the PACF (partial auto-correlation function) between Y_t and Y_{t−k}, excluding the influence of Y_{t−1}, ..., Y_{t−k+1}.
– MA (moving average) determines the number of past forecast errors used to predict future values; it is defined by the parameter q obtained from the ACF (auto-correlation function) ρ_k = cov{Y_t, Y_{t−k}} / var{Y_t}, and the model is y_t = ε_t + α_1 ε_{t−1} + ... + α_q ε_{t−q}, where ε_t is white noise (a process with constant mathematical expectation, constant variance, and zero autocovariance at all non-zero lags), which is always a stationary process. The moving average shows the presence of fluctuations in the series: the higher the moving-average value, the higher the chance of fluctuations.
– I (integrating term): if the series is not stationary, its difference of order d is taken, which allows getting a stationary series. To check the stationarity of the obtained series, the augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test are used; both are standard statistical criteria for checking the stationarity of an observed time series. The same tests allow determining the parameter d of the model.
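As a minimal sketch of automatic (p, d, q) selection of the kind discussed above (the actual algorithm from [14] is more involved), the following Python code searches a small grid of orders by AIC with statsmodels and produces a 7-day forecast; the synthetic series stands in for a cumulative-cases vector.

```python
import itertools
import warnings
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

warnings.filterwarnings("ignore")

# Synthetic stand-in for a cumulative Covid-19 cases series.
rng = np.random.default_rng(0)
y = np.cumsum(np.cumsum(rng.poisson(50, 120)))  # non-stationary, trend-dominated

best_aic, best_order, best_fit = np.inf, None, None
for p, d, q in itertools.product(range(4), range(3), range(4)):
    try:
        fit = ARIMA(y, order=(p, d, q)).fit()
    except Exception:
        continue
    if fit.aic < best_aic:
        best_aic, best_order, best_fit = fit.aic, (p, d, q), fit

print("selected order:", best_order)
print("7-day forecast:", best_fit.forecast(steps=7))
```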
The need for regular "additional training" of the ARIMA model at different stages of the epidemic development is an obvious disadvantage of its use, since it is often necessary to change not only the estimated parameters of the model but also its hyperparameters; nevertheless, this tool is well suited for short-term forecasting (up to 7 days). In our paper [14], we investigated the process of selecting the parameters of the ARIMA model for data on the spread of Covid-19 infection in different regions of the Russian Federation and in different periods. We showed that the model requires constant additional training, and it is also necessary to determine the factors affecting the change in the model parameters. That paper presents an algorithm for the automatic selection of model parameters, which is used in the current work for forecasting with the ARIMA model.

2.3 Adaptive Smoothing Models
Adaptive exponential smoothing models are a fairly popular tool for forecasting the spread of coronavirus infection. These models have also served as a common tool for making forecasts for time series describing the development of the epidemic in various countries [4,15,19]. As in the case of the ARIMA model, the main drawback of most of the presented studies is the lack of an explanation for the choice of the corresponding model specification, as well as the lack of an explanation for the selection of the hyperparameters of the forecasting models [12]. We also note the article [4], which shows that, for the time series under consideration, the exponential smoothing model gives more accurate results than the ARIMA model. The Holt-Winters model does not explain the essence of the epidemic's development in any way and focuses exclusively on the data itself. Thus, this model can capture the phenomenon of an insignificant seven-day cyclicality associated primarily not with the true development of the infection process but with the work schedule of individual health services (testing laboratories, as well as administrative services) [12]. The most commonly employed seasonal models in the innovations state space framework include the well-known Holt-Winters additive and multiplicative methods. The linear version of the Holt-Winters method incorporating a second seasonal component is as follows:

y_t = a_{t−1} + b_{t−1} + s_t^{(1)} + s_t^{(2)} + d_t,

where

a_t = a_{t−1} + b_{t−1} + α d_t,   (2)
b_t = b_{t−1} + β d_t,   (3)
s_t^{(1)} = s_{t−m1}^{(1)} + γ_1 d_t,   (4)
s_t^{(2)} = s_{t−m2}^{(2)} + γ_2 d_t,   (5)
where m1 and m2 are the periods of the seasonal cycles and d_t is a white-noise random variable representing the prediction error (or disturbance). The components a_t and b_t represent the level and trend components of the series at time t, respectively, and s_t^{(i)} represents the i-th seasonal component at time t. The coefficients α, β, γ_1 and γ_2 are the so-called smoothing parameters, and a_0, b_0, {s_{1−m1}^{(1)}, ..., s_0^{(1)}} and {s_{1−m2}^{(2)}, ..., s_0^{(2)}} are the initial state variables (or "seeds").
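Recursions (2)–(5) can be turned into code directly. The sketch below runs the additive double-seasonal scheme in plain NumPy with hypothetical smoothing parameters, zero seasonal seeds, and a hypothetical input file name; it only illustrates how the level, trend, and the two seasonal components are updated from the one-step-ahead error d_t.

```python
import numpy as np

def double_seasonal_holt_winters(y, m1=7, m2=28, alpha=0.3, beta=0.05,
                                 gamma1=0.2, gamma2=0.1):
    """Additive Holt-Winters with two seasonal components, Eqs. (2)-(5).

    Seasonal seeds are set to zero for simplicity; returns one-step-ahead
    fitted values so the update mechanics are visible.
    """
    a, b = y[0], 0.0                      # level and trend seeds
    s1 = np.zeros(m1)                     # first seasonal component
    s2 = np.zeros(m2)                     # second seasonal component
    fitted = np.empty(len(y))
    for t in range(len(y)):
        yhat = a + b + s1[t % m1] + s2[t % m2]   # forecast from previous state
        d = y[t] - yhat                          # prediction error d_t
        a, b = a + b + alpha * d, b + beta * d   # Eqs. (2)-(3)
        s1[t % m1] += gamma1 * d                 # Eq. (4)
        s2[t % m2] += gamma2 * d                 # Eq. (5)
        fitted[t] = yhat
    return fitted

y = np.loadtxt("daily_cases.csv")  # hypothetical input series
print(double_seasonal_holt_winters(y)[-7:])
```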
2.4 BATS and TBATS Models
BATS models, based on the Box-Cox transformation and an ARMA model and taking into account Trend and Seasonality, together with their trigonometric modification TBATS, were developed by De Livera, Hyndman, and Snyder [13]. They considered various modifications of state space models for exponential smoothing to handle a wider variety of seasonal patterns. To avoid the problems associated with non-linear models, the authors of [13] restrict their attention to linear homoscedastic models but allow some types of non-linearity using Box-Cox transformations. This limits their approach to positive time series only, but most of the considered series are positive; the same is true for the time series of Covid-19 infection cases. The TBATS model uses a combination of Fourier terms, and the selection of its parameters is carried out in a fully automated manner. This allows the TBATS model, unlike other models, to simulate gradually changing seasonality by introducing combinations of Fourier terms into the model. One of the disadvantages of this model is its computation speed. Let us consider this model. The Box-Cox transformation applied to the time series y_t is the following:

y_t^{(ω)} = (y_t^ω − 1)/ω  if ω ≠ 0,   y_t^{(ω)} = log(y_t)  if ω = 0.

Observations are modelled as the series

y_t^{(ω)} = l_{t−1} + φ b_{t−1} + Σ_{i=1}^{M} s_{t−m_i}^{(i)} + d_t,   (6)

where
– b_t = (1 − φ)b + φ b_{t−1} + β d_t reflects a global trend;
– l_t = l_{t−1} + φ b_{t−1} + α d_t reflects a local trend;
– d_t = Σ_{i=1}^{p} φ_i d_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t is the ARMA error term.
Equation (6) has M seasonal periods, each of which, in turn, consists of s_t^{(i)} = Σ_{j=1}^{k_i} s_{j,t}^{(i)}. Each term is modelled using a Fourier series:

s_{j,t}^{(i)} = s_{j,t−1}^{(i)} cos λ_j^{(i)} + s_{j,t−1}^{*(i)} sin λ_j^{(i)} + Γ_1^{(i)} d_t;
s_{j,t}^{*(i)} = −s_{j,t−1}^{(i)} sin λ_j^{(i)} + s_{j,t−1}^{*(i)} cos λ_j^{(i)} + Γ_2^{(i)} d_t.

The BATS model is the most obvious generalization of the traditional seasonal innovation models to allow for multiple seasonal periods. However, it cannot accommodate non-integer seasonality, and it can have a very large number
of states; the initial seasonal component alone contains m_t non-zero states. This becomes a huge number of values for seasonal patterns with long periods. The advantages of the TBATS model are the following:
1. it admits a larger effective parameter space with the possibility of better forecasts;
2. it allows for the accommodation of nested and non-nested multiple seasonal components;
3. it handles typical non-linear features that are often seen in real time series;
4. it allows for any autocorrelation in the residuals to be taken into account;
5. it involves a much simpler, yet efficient, estimation procedure.
The BATS and TBATS models combine Fourier terms with an exponential smoothing state-space model and a Box-Cox transformation, and both deal with seasonality. BATS differs from TBATS only in the way it models seasonal effects, and BATS can only model integer period lengths.
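To illustrate only the Box-Cox step shared by BATS and TBATS, the following sketch estimates ω by maximum likelihood with SciPy and inverts the transform after forecasting on the transformed scale; it is not the full BATS/TBATS estimation procedure, and the input file name is hypothetical.

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

y = np.loadtxt("daily_cases.csv")          # hypothetical strictly positive series
y_transformed, omega = boxcox(y)           # y^(omega), MLE estimate of omega
print("estimated Box-Cox parameter:", omega)

# ... fit any linear model on y_transformed and produce forecasts f_transformed ...
f_transformed = y_transformed[-7:]         # placeholder "forecast" for illustration
forecast = inv_boxcox(f_transformed, omega)  # back to the original scale
```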
3 Algorithm for Selecting a Model for Forecasting
Let us consider the architecture of the model-selection algorithm for forecasting a time series. The input data of the algorithm are the following parameters:
– X, the analyzed time series;
– Ylab, the reference time series;
– AD, the date interval for the analyzed time series;
– FD, the forecasting period;
– ValD, the number of days during which the result is supposed to be tested;
– Freq, the timeline units (days, weeks, months, years);
– Pop, the population size (used for the SIR model only).
The algorithm calculates forecasts using five models: ARIMA, BATS, TBATS, Holt's model, and SIR. The scheme of the algorithm is shown in Fig. 1. All models are launched according to the same scheme. An important stage in forecasting various phenomena is assessing the accuracy and reliability of the forecasts. An empirical measure of forecast accuracy is the value of its error, which is defined as the difference between the predicted and actual values of the explored indicator. Such an approach is possible only in two cases: when the lead period is known and the researcher has the necessary actual values of the forecast indicator, or when a retrospective forecast is made to test the developed forecasting methodology. All indicators for assessing the accuracy of statistical forecasts may be divided into three groups:
– analytical: absolute and relative forecast errors, average forecast accuracy, mean square forecast error, average approximation error;
Fig. 1. The scheme of the algorithm for selecting the best forecasting model
– comparative: the coefficient of correlation between the predicted and actual values, the coefficients of discrepancy;
– quality.

In the considered scheme for choosing the best model for forecasting the time series provided as input data, the mean absolute percentage error (MAPE) is computed for each model, and the model with the minimum error value is selected as the best model for the current input data. The advantage of the proposed algorithm is that it allows choosing a suitable model for time series of different lengths corresponding to one object of research. Moreover, different models can be selected for the same object of study over different time intervals.
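A minimal sketch of this selection step is given below: each candidate model is fit on all but the last ValD observations, its forecast of length ValD is compared to the held-out values by MAPE, and the model with the smallest error wins. Only two candidates (ARIMA with a fixed order and Holt's method from statsmodels) are shown, and the input file name is hypothetical; the real package also runs BATS, TBATS, and SIR.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import Holt

def mape(actual, predicted):
    return 100 * np.mean(np.abs((actual - predicted) / actual))

def select_best_model(y, val_days=7):
    train, val = y[:-val_days], y[-val_days:]
    candidates = {
        "ARIMA(1,2,1)": lambda: ARIMA(train, order=(1, 2, 1)).fit(),
        "Holt": lambda: Holt(train).fit(),
    }
    scores = {}
    for name, build in candidates.items():
        fit = build()
        scores[name] = mape(val, fit.forecast(val_days))
    best = min(scores, key=scores.get)   # smallest MAPE wins
    return best, scores

y = np.loadtxt("daily_cases.csv")        # hypothetical cumulative cases vector
print(select_best_model(y))
```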
4 Forecasting Covid-19 Cases Using the Developed Algorithm
In our work, we used the data sets [2,3], as well as the official data on the regions of the Russian Federation provided on the portal [1]. The package epidemic.ta has been developed in the R language; it allows selecting the parameters of the models under consideration for the input data on Covid-19 infection cases in different states, making a forecast, calculating the forecast error, and determining the best model for predicting infection cases for each data vector. Let us consider the MAPE error for the weekly forecast made by the five considered models for the top-10 countries. The MAPE values are shown in Table 1, where we also indicate the best model for each country. It is easy to see the following. Different models forecast better for different countries. Complex models like BATS and TBATS, which require more computational time, may
Table 1. Analysis of MAPE of Covid-19 cases forecasting for 7 days using different models

Country            BATS     TBATS    Holt's   ARIMA    SIR       The best model
USA                0.972    0.961    0.811    1.102    7.415     Holt's
India              0.073    0.082    0.118    0.049    1.205     ARIMA (5,2,0)
Brazil             0.412    0.407    0.492    0.262    3.284     ARIMA (5,2,0)
Russia             0.086    0.089    0.224    0.063    5.191     ARIMA (1,2,4)
UK                 0.454    0.319    0.080    0.272    13.949    Holt's
France             0.737    0.711    0.504    0.550    3.848     Holt's
Turkey             0.953    0.421    0.148    0.364    6.101     Holt's
Italy              0.320    0.290    0.158    0.406    5.012     Holt's
Spain              0.725    0.310    0.827    0.398    3.772     TBATS
Germany            0.656    0.711    0.816    0.623    6.808     ARIMA (2,2,2)
Average MAPE (%)   0.5388   0.4301   0.4178   0.4089   5.6585
give good results, but these results are comparable with those obtained by classical linear models like ARIMA and Holt's model. So, the use of complicated models, especially those taking floating seasonality into account, is not advisable. As for the SIR model, it allows getting short-term forecasts, but with a greater error than the linear models, and it needs a lot of computation to calculate its correct and constantly changing parameters. This is because the parameters of the model are selected based on expert assessments and calculations that take into account many factors affecting the spread of the epidemic. At the moment, it is still difficult to conclude whether it is advisable to discard meaningful information about the spread of the epidemic and use a purely numerical vector to search for a pattern. But according to the available data, it can be argued that the use of such anonymized vectors makes it possible to obtain short-term forecasts with an error of less than 1%. For long-term forecasting, such methods are not applicable, and a search for other forecasting methods is required. All the detailed results of our computational experiments, with source code, tables, and graphs, are published at RPubs [5]. After the best model has been chosen, there are external factors that can make it perform poorly even if it showed good forecasting results in the past (see Fig. 2). These factors lead to errors that we can measure with several methods. They may vary from country to country; many questions about these factors remain open, and they are a topic of future research.
5 Further Research
The topic of further research is discovering the structure of the initial data and the factors influencing it. Our software allows getting forecasts using all five models for each country in the world. Now we have to determine whether it is necessary to use all the models to get a forecast for any country, or whether we can use an expert
Fig. 2. How external factors influence the process
system to define the appropriate forecasting method using some information about a country (its situation, climate, economic and medical development, etc.). This approach may be used for selecting the appropriate model if the number of implemented methods is much greater than five. Since the Covid-19 spreading process is not well studied, the urgent problem is to test different existing models, carry out different experiments, and develop new methods for forecasting time series. There are many factors influencing the spread of Covid-19. Some of them are well studied, and some are not yet discovered. So, to understand the process we are forecasting, we need to investigate not only the process itself but also the background against which it develops. The components of the process may depend on each other or be independent. The most complicated research deals with events in the outer world and random factors that cannot be predicted. One of the modern methods of increasing the accuracy of forecasts lies in the superposition of time series analysis and cognitive modelling. We give one example of a cognitive map for Covid-19 spreading in Fig. 3. The whole cognitive map for Covid-19 is too complicated, and the discovery of each of its influencing quantitative parameters is a topic of large medical research that may take years. Hence, we will consider only the following modification of the SIR model to describe the random process of Covid-19 infection spreading. Let a part of the population be in one of the following six states:
– S0 – healthy and immune;
– S1 – contacted (susceptible);
– S2 – infected (with symptoms);
– S3 – recovered (with immunity);
Fig. 3. Cognitive map of Covid-19 spreading
– S4 – died;
– S5 – vaccinated (with immunity).

We will consider that at each moment in time the probability of any future state depends only on the state of the system in the present and does not depend on when and how the system came to this state. The system moves from one state to another at any time under the influence of certain streams of events. We will consider these flows to be the simplest ones. The probability of transition from state Si to state Sj in a short time interval Δt is pij = λij · Δt, where pij is the transition probability, and λij is the density of the flow of events that transfers the system from state Si to state Sj. Suppose that all λij are known. If each arc of the state graph is associated with the density λij, then such a graph is called a labelled state graph. Knowing this labelled graph, we can determine the probabilities of the states P1(t), P2(t), ..., Pn(t) as functions of time. To describe processes with continuous time, we use a model in the form of a so-called Markov chain with discrete states, assuming time to be continuous. In this sense, we will say that we are dealing with a continuous Markov chain (Fig. 4). Let us construct the system of Kolmogorov differential equations. On the left side of each equation is the derivative of the probability of the state under consideration, and the right side contains as many terms as there are transitions (arcs) associated with this state. For outgoing arcs, the terms are written with
Fig. 4. Markov process for Covid-19 spreading
a minus sign, and for incoming arcs with a plus sign. Each term is equal to the product of the density of the corresponding transition and the probability of the state from which the transition occurs. The system of Kolmogorov differential equations is the following:

dP0(t)/dt = −λ01 P0(t) − λ05 P0(t) + λ30 P3(t);
dP1(t)/dt = λ01 P0(t) + λ51 P5(t) − λ12 P1(t);
dP2(t)/dt = λ12 P1(t) − λ23 P2(t) − λ24 P2(t);
dP3(t)/dt = λ23 P2(t) + λ53 P5(t) − λ30 P3(t);        (7)
dP4(t)/dt = λ24 P2(t) + λ54 P5(t);
dP5(t)/dt = λ05 P0(t) − λ51 P5(t) − λ53 P5(t) − λ54 P5(t).

In addition, the equality

Σ_{i=0}^{5} Pi(t) = 1

holds. Initial conditions must also be added to these equations. For the case under consideration, at the moment t = 0 the system is in state S0. Thus, the initial conditions are: P0(0) = 1; P1(0) = 0; P2(0) = 0; P3(0) = 0; P4(0) = 0; P5(0) = 0. The study and solution of the system of Eqs. (7) is a topic for further research. But a Markov process is also a first-order autoregressive model; hence, it is planned to consider nonlinear models and quasilinear methods [16] for forecasting time series.
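Once the intensities λij are specified, system (7) with the initial conditions above can be integrated numerically. The sketch below does this with SciPy for purely hypothetical λij values; estimating realistic intensities is exactly the open question discussed in the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical transition intensities lam[(i, j)] for the arcs of the state graph.
lam = {(0, 1): 0.10, (0, 5): 0.02, (1, 2): 0.30, (2, 3): 0.12,
       (2, 4): 0.01, (3, 0): 0.005, (5, 1): 0.01, (5, 3): 0.05, (5, 4): 0.001}

def kolmogorov(t, P):
    P0, P1, P2, P3, P4, P5 = P
    # Right-hand sides of system (7).
    return [
        -lam[(0, 1)] * P0 - lam[(0, 5)] * P0 + lam[(3, 0)] * P3,
        lam[(0, 1)] * P0 + lam[(5, 1)] * P5 - lam[(1, 2)] * P1,
        lam[(1, 2)] * P1 - lam[(2, 3)] * P2 - lam[(2, 4)] * P2,
        lam[(2, 3)] * P2 + lam[(5, 3)] * P5 - lam[(3, 0)] * P3,
        lam[(2, 4)] * P2 + lam[(5, 4)] * P5,
        lam[(0, 5)] * P0 - lam[(5, 1)] * P5 - lam[(5, 3)] * P5 - lam[(5, 4)] * P5,
    ]

P_init = [1, 0, 0, 0, 0, 0]           # the system starts in state S0
sol = solve_ivp(kolmogorov, (0, 365), P_init, t_eval=np.linspace(0, 365, 366))
print("P(t=365):", sol.y[:, -1], " sum =", sol.y[:, -1].sum())  # sum stays 1
```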
6 Conclusions
The article discusses an algorithm that allows obtaining a forecast using several linear short-term forecasting models and the epidemiological SIR model. Note that the considered algorithm is extensible, and various modules can be connected to it, providing the construction of forecasts by various methods. Thus, using the considered algorithm scheme, it is possible to create a flexible calling function that chooses, from the set of implemented methods, the one giving the best result by a given criterion. Currently, this criterion is MAPE. In the future, we shall modify the calling function so that the user can choose the criteria for determining the best method. Note that this scheme, which we considered on the example of forecasting the time series of coronavirus infection spreading, is also applicable to other time series, for example, for forecasting sales and production volumes, changes in prices and exchange rates, etc. Let us note that we have considered models for short-term forecasting of time series. As for directions of further research, it is planned to provide the possibility of medium-term and long-term forecasting of time series in weakly structured situations, to develop mechanisms for correcting long-term forecasts, to form a set of forecasting models taking into account the forecasting quality in previous periods, and also to consider the possibility of using non-linear forecasting models for weakly structured data. All this, as well as the use of other criteria for selecting the best models, can extend and improve the algorithm considered in this paper (since the algorithm is built in a way that allows extension).
References

1. Coronavirus: Statistics. https://yandex.ru/covid19/stat
2. COVID-19 data repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. https://github.com/CSSEGISandData/COVID-19
3. Covid-19: Status of cases in Toronto. https://www.toronto.ca/home/covid-19/covid-19-latest-city-of-toronto-news/covid-19-status-of-cases-in-toronto/
4. Abotaleb, M.S.A.: Predicting Covid-19 cases using some statistical models: an application to the cases reported in China, Italy and USA. Acad. J. Appl. Math. Sci. 6(4), 32–40 (2020). https://doi.org/10.32861/ajams.64.32.40
5. Abotaleb, M.S.A., Makarovskikh, T.A.: https://rpubs.com/abotalebmostafa/
6. Avila, E., Canto, F.J.A.: Fitting parameters of SEIR and SIRD models of COVID-19 pandemic in Mexico. https://www.researchgate.net/publication/341165247
7. Banerjee, M., Tokarev, A., V.V.: Immuno-epidemiological model of two-stage epidemic growth. Mathematical Modelling of Natural Phenomena (15) (2020). https://doi.org/10.1051/mmnp/2020012
8. Barzon, G., Rugel, W., Manjuna, K.K.H., Orlandini, E., Baiesi, M.: Modelling the deceleration of Covid-19 spreading. https://www.researchgate.net/publication/344530056
9. Hernandez-Matamoros, A., Fujita, H., Hayashi, T., Perez-Meana, H.: Forecasting of Covid-19 per regions using ARIMA models and polynomial functions. Appl. Soft Comput. J. (96), 106610 (2020). https://doi.org/10.1016/j.asoc.2020.106610
10. Hussain, N., L.B.: Using R-Studio to examine the Covid-19 patients in Pakistan: implementation of SIR model on cases. Int. J. Sci. Res. Multidisciplinary Stud. 6(8), 54–59 (2020). https://doi.org/10.13140/RG.2.2.32580.04482
11. Kumar, M., Gupta, S., K.K.S.M.: Spreading of Covid-19 in India, Italy, Japan, Spain, UK, USA: prediction using ARIMA and LSTM model. Digital Government: Res. Practice 1(4), 24 (2020). https://doi.org/10.1145/3411760
12. Lakman, I.A., Agapitov, A.A., S.L., et al.: Possibilities of mathematical forecasting of coronavirus infection in the Russian Federation. Arterialnaya Gipertenzia 26(3), 288–294 (2020)
13. Livera, A.D., Hyndman, R.J., Snyder, R.D.: Forecasting time series with complex seasonal patterns using exponential smoothing. J. Am. Stat. Assoc. 106, 1513–1527 (2011). https://doi.org/10.1198/jasa.2011.tm09771
14. Makarovskikh, T.A., Abotaleb, M.: Automatic selection of ARIMA model parameters to forecast Covid-19 infection and death cases. Bull. South Ural State Univ. Series: Comput. Math. Softw. Eng. 12(3), Z1–Z2 (2021)
15. Panda, M.: Application of ARIMA and Holt-Winters forecasting model to predict the spreading of Covid-19 for India and its states. https://doi.org/10.1101/2020.07.14.20153908
16. Panyukov, A., Mezal, Y.: Parametric identification of quasilinear difference equation. Bulletin of the South Ural State University Series "Mathematics. Mechanics. Physics" 11(4), 32–38 (2019). https://doi.org/10.14529/mmph190404
17. Perone, G.: ARIMA forecasting of Covid-19 incidence in Italy, Russia, and the USA. https://doi.org/10.2139/ssrn.3612402
18. Ponce, M.: covid19.analytics: An R package to obtain, analyze and visualize data from the corona virus disease pandemic (2020)
19. Shokeralla, A.A.A., Sameeh, F.R.T., Musa, A.G., Zahrani, S.: Prediction of the daily number of confirmed cases of Covid-19 in Sudan with ARIMA and Holt-Winters exponential smoothing. Int. J. Dev. Res. 10(8), 39408–39413 (2020). https://doi.org/10.37118/ijdr.19811.08.2020
20. Sun, D., Duan, L., Xiong, J., Wang, D.: Modelling and forecasting the spread tendency of the Covid-19 in China. BMC Infectious Diseases (2020). https://doi.org/10.21203/rs.3.rs-26772/v1
Detecting Corruption in Single-Bidder Auctions via Positive-Unlabelled Learning

Natalya Goryunova, Artem Baklanov(B), and Egor Ianovski

HSE University, 3A Kantemirovskaya Street, St Petersburg 194100, Russian Federation
[email protected]
Abstract. In research and policy-making guidelines, the single-bidder rate is a commonly used proxy of corruption in public procurement, but ipso facto it is evidence not of a corrupt auction but of an uncompetitive one. An uncompetitive auction could arise due to a corrupt procurer attempting to conceal the transaction, but it could also be a result of geographic isolation, monopolist presence, or other structural factors. In this paper, we use positive-unlabelled classification to attempt to separate public procurement auctions in the Russian Federation into auctions that are probably fair and those that are suspicious.
Keywords: Positive-unlabelled classification · Public procurement · Single-bidder auctions

1 Introduction
Public procurement is the process by which government entities purchase goods or services from the private sector. Given that a public official is spending public money on public interests, moral hazard is at play – if a school director is given full freedom in how to purchase milk for a canteen, what is to stop him from buying the most expensive milk possible from a supplier he happens to own stock in? As such, strict regulations are needed to govern public procurement, typically specifying admissible auction formats and rules on advertisement and disclosure of tenders. However, legislating a competitive and transparent procedure does not guarantee that the procedure will indeed be competitive and transparent. Corruption imposes a cost on society, and scholars seeking to understand this cost need to devise means to measure how prevalent corruption is [5,10,12,27]. A common proxy of corruption in public procurement is the single-bidder rate – the proportion of auctions that attracted only a single firm – which is both prominent in the literature [6,9,17,25] and used by the European Commission to assess the effectiveness of public procurement in EU member states [8].

The study was supported by a grant from the Russian Science Foundation (project No. 20-71-00034).
The single-bidder rate in Russia is high: in the class of auctions studied in this paper, 48% attracted a single bidder; in the EU, only Poland and Czechia fare worse, with 51% [8]. However, while the problem of corruption in Russia is well known [1,24], there are many reasons why the procurement process could be uncompetitive – the country is vast and sparsely populated, and large sectors of the economy are dominated by monopolies and oligopolies; it is conceivable that in many parts of the country, if an official wishes to purchase a product, one supplier is the best he can hope for. Learning to separate the two is important: while addressing the problems of monopolies requires long-term structural change, corruption can be dealt with by immediate regulatory action.

1.1 Related Work
The idea of this work originated in a series of works on bid leakage in Russian procurement auctions [2,14,15]. Bid leakage is a form of corruption in a firstprice, sealed-bid auction where the procurer reveals the contents of other firms’ bids to a favoured firm, allowing the favoured firm to submit a marginally lower bid at the end of the auction and take the contract for the highest price possible. The work of [2] is based on the assumption that in a fair auction the order of the bids should be independent of the winner; if it turns out that the last bidder is more likely to win than the others, there is reason to suspect bid-leakage has taken place. This was followed up by [14,15] who relaxed the independence assumption, as they demonstrated that in a game theoretic model there are legitimate reasons for a serious competitor to delay bidding – in particular, if a firm believes the procurer is corrupt and bid leakage could take place, by bidding near the deadline, there would be no time to leak the bid to the favoured firm. Their approach was based on the DEDPUL positive-unlabelled classifier [13]: auctions, where the last bidder did not win, were labelled as “fair” (even if bid leakage did take place, it was not successful), and the classifier separated the remaining auctions into “fair” or “suspicious”. Their approach found signs of bid leakage in 9% of auctions with three or more participants, and 16% of auctions with two or more. The single-bidder rate is a common proxy of corruption in the literature, but while the authors acknowledge that a single-bidder auction is not necessarily evidence of corruption, there has been no attempt to distinguish the two. To our knowledge, the closest is the works of [6] and [10], who find some correlation between the single-bidder rate and features suggestive of corruption such as a short advertisement period or subjective award criteria. We are not aware of any work that attempts to assign a posteriori probabilities to a single-bidder auction being corrupt. A related body of work is the detection of collusion in procurement auctions. Collusion differs from corruption in that it is an agreement between firms to limit competition, i.e. a cartel, rather than an agreement between a firm and the procurer. Classic work in the field relies on having access to a dataset with court records identifying confirmed cases of cartel activity, which allows training a
classifier on statistical screens [11,20,26]. In the absence of such data, researchers look for behaviour that is inconsistent with a theoretical model of competition [16,18,22].

1.2 Our Contribution
We use the DEDPUL positive-unlabelled classifier [13] to separate a dataset of auctions held in the Russian Federation in the years 2014–2018 into a class that is "fair" and a class that is "suspicious", based on a selection of features indicative of corruption. This approach labels just over half (53.86%) of single-bidder auctions as "suspicious". The distribution of posterior probabilities reveals a cluster of auctions with a posterior probability of being labelled "suspicious" close to 1. A decision tree for this cluster reveals two patterns that resemble the "one-day firm" mode of corruption – a firm created on paper to snap up government contracts rather than do legitimate business – and a third pattern that could equally describe a legitimate monopolist or a corrupt relationship.
2 Methodology

2.1 Positive-Unlabelled Learning
In this paper, we employ the method of positive-unlabelled (PU) learning that, similar to general binary classification, provides a classifier that can separate positive and negative instances based on the features, but with less information available. Namely, the training (labelled) data constitute only a fraction of the positives; labels for negative examples are not provided. In the context of this paper, the positive class will represent the bids believed to be fair, and the negative class the bids that are suspicious. In the initial dataset, all bids in multi-bidder auctions are labelled positive, and bids in single-bidder auctions are unlabelled. The task of the classifier is to decide which of these unlabelled bids should be classed as positive, i.e. probably fair, or negative, possibly corrupt. A PU dataset can be mathematically abstracted as a collection of (x, y, s) with x a vector of features of an instance, y the class, and s a binary variable indicating whether the triplet was labelled. The true class y may not be known. By definition of a PU learning problem, every instance with s = 1 is labelled as positive: Pr(y = 1|s = 1) = 1. Following the general PU learning setting, we assume that the data x is an independent and identically distributed sample from f, the probability density function (pdf) that we want to estimate, such that x ∼ f(x) = fu(x) = αf+(x) + (1 − α)f−(x); here α stands for the prior of the positive class (i.e., Pr(y = 1)) and f+, f−, fu are the pdfs of the positive, negative, and unlabelled examples, respectively. Unfortunately, the true value of α is unidentifiable [13]. Namely, even under precise knowledge of fu and f+, estimation of α is an ill-posed problem since
any α̂ such that fu(·) ≥ α̂ f+(·) is a valid guess. Thus, following [13] we only compute α*, the upper bound for the valid estimates of the true α,

α* = inf_x fu(x)/f+(x),   (1)
hence α ∈ [0, α*]. We make the following assumption, a rather strong one but common for PU learning (see [7] and [3, Definition 1]): the labelled examples were Selected Completely At Random (SCAR), independently of their attributes. Thus, we treat the probability of absence of corruption as independent of the attributes of auctions and equal to the estimate of the prior α*. This assumption delivers nice properties for classification problems (see [7]):
– The probability of an instance being labelled is proportional to the probability of an instance being positive.
– Non-traditional classifiers (classifiers that treat unlabelled instances as negative) preserve the ranking order, i.e., if we only wish to rank instances with respect to the chance that they belong to the positive class, then non-traditional classifiers rank instances similarly to the estimate of f obtained by a traditional probabilistic classifier.

Robust estimation of α* can be a very complex task due to the challenges of finding the best fit for multidimensional empirical distributions and the inf operation in (1). To tackle this issue, we employ a state-of-the-art PU learning algorithm (DEDPUL [13]) that uses multiple regularisation techniques. At the first step, a non-traditional classifier is trained. In general, this can be any classifier separating positive from unlabelled. At this step, using cross-validation, we applied CatBoost [21], an algorithm based on gradient boosting of decision trees that achieves state-of-the-art performance. At the second step, DEDPUL corrects the bias caused by treating all unlabelled instances as negatives, taking care of the challenges we mentioned.
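A minimal sketch of the first step only – a "non-traditional" classifier separating labelled-positive from unlabelled bids – is shown below; it assumes a prepared feature matrix X and a label vector s (both file names are hypothetical) and does not reproduce the subsequent DEDPUL density-based correction of the scores.

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import cross_val_predict

# X: feature matrix (the Table 3 features); s: 1 for bids in multi-bidder
# auctions (labelled positive), 0 for bids in single-bidder auctions (unlabelled).
X = np.load("features.npy")     # hypothetical prepared arrays
s = np.load("labels.npy")

clf = CatBoostClassifier(iterations=500, depth=6, verbose=False)

# Out-of-fold predicted probabilities of being labelled, g(x) = Pr(s = 1 | x).
g = cross_val_predict(clf, X, s, cv=5, method="predict_proba")[:, 1]

# Under SCAR these scores rank the unlabelled bids by how "fair-like" they are;
# DEDPUL would next turn them into posterior probabilities and estimate alpha*.
suspicious_rank = np.argsort(g[s == 0])
```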
2.2 The Data
Public procurement in Russia is governed by Federal Law No. 44-FZ [23], which specifies admissible procurement formats and requires that the data be publicly available on the official website (https://zakupki.gov.ru/). We focus on the “requests for quotation” format which is a first-price, sealed-bid auction for low value transactions (the maximum reserve price is 500,000 roubles, approximately 6,600 USD). These are frequent auctions with an objective award criterion (i.e., the contract is awarded to the lowest bid, with no considerations of quality or reputation) and are thus amenable to machine learning techniques. The dataset used in this paper was extracted by [15]. It covers the years 2014–2018 and consists of 3,081,719 bids from 1,372,307 auctions and 363,009 firms. An observation in the dataset is a bid and is labelled by the identification of the procurer, firm, auction, and region; the reserve price of the auction and
the actual bid of the firm; the start and end dates of the auction; and the date the bid was actually placed. (All the data and code used are available on request.) After preliminary processing we removed 10.6% of the bids. About 3% were removed due to errors in the data, consisting in one or more of:
1. Missing values in the bid description.
2. The start date of the auction being later than the end date.
3. The bid amount being less than zero or higher than the reserve price.
4. The reserve price being higher than the maximum allowed price of 500,000 roubles.
The rest were removed for one of three reasons:
1. The auction took place in Baikonur, which is administered by Russia but is part of Kazakhstan.
2. The reserve price was under 3,440 roubles (lowest 0.5%).
3. The firm placing the bid appears once in the dataset.

Baikonur was excluded for its peculiar status. The minimum reserve price of 3,440 (about 45 USD) is an ad hoc approach to remove potential data errors – a price of 0 or 1 rouble should probably be classed as an error, but it is not clear where to draw the line, so we opted to drop the bottom 0.5%. Firms that bid once were removed because one of our features is the length of time the firm is active in the system. This is a potential issue since our dataset covers the years 2014–2018 and could capture a firm that was active before this period and stopped in 2014, or a firm that began activity in 2018. We do not wish to misidentify an established firm with a long history of bids that ceased operations in 2014 with a firm that only placed one bid, ever. This left us with 2,787,136 bids. Every bid consists of four identifiers and five values. The ranges of these values are summarised in Table 1 and Table 2.

Table 1. Categorical variables

Variable      Description                          Number of values
procurer_id   Identification of the procurer       43,311
firm_id       Identification of the firm           255,650
auction_id    Identification of the auction        1,358,369
region_id     The auction location (RF subject)    85
The single-bidder rate was high, and increasing over the time period:

Year                2014   2015   2016   2017   2018
Single-bidder rate  0.40   0.47   0.51   0.52   0.51
Table 2. Numeric variables

Variable       Description                               Min        Median    Max
reserve_price  Reserve price set by procurer (roubles)   3,440      134,637   500,000
price          Bid price set by firm (roubles)           0.01       106,500   500,000
start_date     Start date of the auction                 28.01.14             26.03.18
end_date       End date of the auction                   31.01.14             30.03.18
date           Time of bid                               29.01.14             26.03.18

2.3 Feature Engineering
We train the classifier on the features in Table 3. These were chosen to reflect potential signs of corruption. The intuition behind the features is as follows:

Table 3. Features engineered for the model

Variable                                                         ID           Type    Min   Median  Max
Is the auction a single-bid auction?                             single       Binary  0     0       1
Time from bid to the end date (seconds)                          bid.date     Int     0     72 000  783 840
Ratio of bid to reserve price                                    bid.price    Float   0     0.90    1
Has the firm dealt with the procurer before?                     con.met      Binary  0     0       1
Ratio of firm's victories with procurer to total victories       con.win      Float   0     0.06    1
How many auctions did the firm bid in?                           sel.num      Int     2     27      8 319
How long the firm is active in the data (days)                   sel.period   Int     0     980     1 498
Ratio of reserve price to maximum (500,000)                      au.reserve   Float   0     0.27    1
Auction duration (days)                                          au.duration  Int     0     7       26
Is auction in Moscow or Moscow Oblast?                           au.moscow    Binary  0     0       1
Ratio of unique winners to number of auctions held by procurer   buy.unique   Float   0.02  0.5     1

– bid.date: If a firm knows in advance there will be no competition, they have no need to delay their bid.
– bid.price: If a firm knows there will be no competition, they will ask for the highest possible price.
– con.met: If a firm has a corrupt relationship with a procurer, they are likely to have met before.
– con.win: If a firm wins a disproportionate number of its tenders from a single procurer, the question arises why this firm is unable to win an auction anywhere else.
– sel.num: A monopolist would likely participate in a vast amount of auctions; a firm owned by the procurer is likely smaller.
– sel.period: A monopolist would be an established firm.
– au.reserve: If you are out to fleece the taxpayer, why ask for less than the maximum price?
– au.duration: A short auction is less likely to be noticed.
– au.moscow: The "geographic isolation" argument does not apply to the heart of the Russian economy.

The main feature, single, is whether or not the auction attracted a single firm. We label auctions with more than one participant (single = 0) as positive – the "fair" class – and the remaining auctions (single = 1) are the unlabelled set that the classifier will attempt to separate into positive ("fair") and negative ("suspicious") instances. As shown in Table 4, these classes behave differently with respect to our features.

Table 4. Statistics of single-bidder and competitive auctions

             Mean                    Median                  Std. Dev.
ID           single=1   single=0     single=1   single=0     single=1   single=0
bid.date     146 910    133 815      76 800     71 220       172 297    167 765
bid.price    0.94       0.81         0.99       0.86         0.12       0.19
con.met      0.58       0.43         1          0            0.49       0.50
con.win      0.42       0.20         0.29       0.02         0.37       0.32
sel.num      288        181          22         29           1 070      683
sel.period   927        855          1064       947          422        439
au.reserve   0.30       0.38         0.20       0.30         0.29       0.31
au.duration  7.83       8.26         7          7            2.83       2.94
au.moscow    0.06       0.14         0          0            0.25       0.35
buy.unique   0.49       0.54         0.48       0.52         0.18       0.19
3 Results
The DEDPUL classifier was trained on the features in Table 3. The positive class is all auctions with single = 0. The unlabelled auctions (single = 1) were separated into positive and negative. The algorithm found an α∗ of 46.14%. Thus 1 − α∗ – the probability of a single-bidder auction being labelled negative – is 53.86% (13% of all bids). The distribution of posterior probabilities has an interesting cluster near 1. 26% of all bids in single-bidder auctions (about 170,000) have a posterior probability of being negative higher than 0.96. These bids differ strongly from those in multi-bidder auctions, but it is too early to conclude that they are corrupt – it is conceivable that an auction in a monopolistic market, where both the procurer and the monopolist know the auction will only attract a single bidder, will look very different from multi-bidder auctions. For example, the monopolist might always bid the reserve price, knowing the bid will be unchallenged, and
an honest procurer might counter this by using a lower reserve price than they would in a competitive market. To understand the nature of this cluster better, we fit a classification tree, depicted in Fig. 1, that separates two classes of single-bid auctions: one consisting of all highly suspicious auctions (class 1), with a posterior probability of being negative higher than 0.96, and the complementary class that includes the remaining auctions (class 0). The tree shows a high overall accuracy of 0.9.
Fig. 1. Decision tree that detects highly suspicious auctions (class 1) within the set of all single-bid auctions.
There are three paths by which the tree places an auction in the cluster:
1. The bid price equals the reserve price and the firm has not met the procurer before.
2. The bid price equals the reserve price, the firm has met the procurer, the auction is not in Moscow or Moscow Oblast, and the firm has been active in the system for more than 707 days.
3. The bid price equals the reserve price, the firm has met the procurer, the auction is not in Moscow or Moscow Oblast, and the firm has been active in the system for less than 707 days and participated in fewer than 5 auctions.

The fact that the firms bid the reserve price exactly is nearly a prerequisite for an auction being in the cluster. It demonstrates that a firm, for whatever reason, knows that it will be the sole bidder and does not even attempt to compete. But as we argued, this is not necessarily corruption since a monopolist would have no reason to compete either. The first path, then, is interesting as this is a firm that expects no competition but has not dealt with the procurer before. A monopolist would likely have had dealings with all counterparties in its area; this path seems to better resemble a one-day firm created for the sole task of snapping up a contract. The third path, representing a short-lived firm that did not participate in many auctions,
also resembles this form of corruption; such a firm certainly does not resemble a monopolist. There is little we can say about the second path – an established firm that has dealt with the procurer before. This could be a monopolist, or a firm with an established corrupt relationship with the procurer. It is curious that both the second and third paths require that the auction does not take place in Moscow or Moscow Oblast. It is not obvious why this is the case – ceteris paribus, we would consider a single-bidder auction in a highly competitive region like Moscow to be more suspicious than one elsewhere. Perhaps, due to greater policing, procurers are more careful to mask signs of corruption, and single-bidder auctions there more closely resemble the multi-bidder case.
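A sketch of how such a tree can be fit with scikit-learn is given below; it assumes prepared arrays of single-bidder features and their DEDPUL posteriors (both file names are hypothetical), uses the 0.96 threshold from the text, and the depth limit is our own illustrative choice rather than a value from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X_single = np.load("single_bid_features.npy")      # hypothetical prepared array
posterior = np.load("single_bid_posteriors.npy")   # P(suspicious | x) from DEDPUL
feature_names = ["bid.date", "bid.price", "con.met", "con.win", "sel.num",
                 "sel.period", "au.reserve", "au.duration", "au.moscow",
                 "buy.unique"]

cluster = (posterior > 0.96).astype(int)            # class 1: highly suspicious

tree = DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=0)
tree.fit(X_single, cluster)

print("accuracy:", tree.score(X_single, cluster))
print(export_text(tree, feature_names=feature_names))  # readable rules, cf. Fig. 1
```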
4 Discussion
Our main result challenges the common assumption in the literature and policy-making that the single-bidder rate can serve as a good proxy of corruption in public procurement. Using a state-of-the-art semi-supervised learning algorithm, we demonstrate that multi-bidder and single-bidder auctions may belong to the same latent class of competitive auctions. Using PU learning, we obtained an α* – the upper bound on the probability of a single-bidder auction belonging to the same category as multi-bidder auctions – of 46.14%. In other words, almost half of all single-bidder auctions are very similar to multi-bidder auctions and could very well be fair. Conversely, this means that at least half are different and could represent corruption. By ranking the single-bidder auctions based on their posterior probability of belonging to the suspicious class, we identified a cluster of auctions with very high posterior probabilities – over 0.96. We built a decision tree for auctions belonging to this class and found that at least two of the patterns identified by the tree resemble a form of corruption – a one-day firm. We acknowledge that though the above threshold of 0.96 for highly suspicious single-bid auctions is to some extent arbitrary, the ranking of single-bid auctions' posterior probabilities of being corrupt is not (by the SCAR assumption). Thus, as an immediate implication for the state regulator, examination of the top-ranked auctions should be prioritised and become routine. The positive-unlabelled method used in this paper relies on two strong assumptions that should be relaxed in future work. First, we assumed that all multi-bidder auctions are fair. This is clearly not the case: not only will there be cases where the procurer attempted, but failed, to restrict participation to a single firm, there could also be the situation where the procurer intentionally held a multi-bidder auction – perhaps registering several one-day firms owned by himself to create the illusion of competition (compare with the two strategies a cartel may use to implement bid rotation: the other firms could refrain from bidding in an auction a certain cartel member is supposed to win, or they could place intentionally non-competitive bids to fool the regulators). A natural future direction is then
to try learning with noisy labels [19] – the single-bidder auctions are labelled as "suspicious", the multi-bidder ones as "fair", but both categories are assumed to contain mislabelled elements. The second is the SCAR assumption required by DEDPUL, which treats the probability of corruption as independent of the attributes of the auctions and equal to 1 − α*, one minus the estimated prior probability of the positive class. This assumption can be relaxed to the Selected At Random (SAR) assumption (see [3, Definition 2] and [4]), in which the probability of a positive example being labelled is conditional on its features.
References

1. Accounts Chamber of the Russian Federation: Report on results of the analytical event «Monitoring of public and corporate procurement development in Russian Federation in 2018» (2018). https://ach.gov.ru/promo/goszakupki-2018/index.html. Accessed 18 Jan 2021
2. Andreyanov, P., Davidson, A., Korovkin, V.: Detecting auctioneer corruption: evidence from Russian procurement auctions (2018). https://www.researchgate.net/publication/333755312
3. Bekker, J., Davis, J.: Learning from positive and unlabeled data: a survey. Mach. Learn. 109(4), 719–760 (2020). https://doi.org/10.1007/s10994-020-05877-5
4. Bekker, J., Robberechts, P., Davis, J.: Beyond the selected completely at random assumption for learning from positive and unlabeled data. arXiv:1809.03207 (2019)
5. Cai, H., Henderson, J.V., Zhang, Q.: China's land market auctions: evidence of corruption? RAND J. Econ. 44(3), 488–521 (2013)
6. Charron, N., Dahlström, C., Lapuente, V., Fazekas, M.: Careers, connections, and corruption risks: investigating the impact of bureaucratic meritocracy on public procurement processes. J. Pol. 79 (2016). https://doi.org/10.1086/687209
7. Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD 2008, pp. 213–220. ACM, New York (2008). https://doi.org/10.1145/1401890.1401920
8. European Commission: Performance per policy area: Public procurement. https://ec.europa.eu/internal_market/scoreboard/performance_per_policy_area/public_procurement/index_en.htm (2019). Accessed 22 Jan 2021
9. Fazekas, M., János, T., King, L.: Anatomy of grand corruption: a composite corruption risk index based on objective data. SSRN Electron. J. (2013). https://doi.org/10.2139/ssrn.2331980
10. Fazekas, M., Kocsis, G.: Uncovering high-level corruption: cross-national objective corruption risk indicators using public procurement data. Br. J. Polit. Sci. 50(1), 155–164 (2020). https://doi.org/10.1017/S0007123417000461
11. Huber, M., Imhof, D.: Machine learning with screens for detecting bid-rigging cartels. Int. J. Ind. Org. 65 (2019). https://doi.org/10.1016/j.ijindorg.2019.04.002
12. Ingraham, A.: A test for collusion between a bidder and an auctioneer in sealed-bid auctions. Contrib. Econ. Anal. Policy (4) (2005). https://doi.org/10.2202/1538-0645.1448
13. Ivanov, D.: DEDPUL: method for mixture proportion estimation and positive-unlabeled classification based on density estimation. arXiv:1902.06965 (2019)
14. Ivanov, D., Nesterov, A.: Identifying bid leakage in procurement auctions: machine learning approach. In: Proceedings of the 2019 ACM Conference on Economics and Computation. EC 2019, pp. 69–70 (2019). https://doi.org/10.1145/3328526.3329642
15. Ivanov, D.I., Nesterov, A.S.: Stealed-bid auctions: detecting bid leakage via semi-supervised learning. arXiv:1902.06965 (2020)
16. Kawai, K., Nakabayashi, J.: Detecting large-scale collusion in procurement auctions. Available at SSRN 2467175 (2014). https://doi.org/10.2139/ssrn.2467175
17. Klasnja, M.: Corruption and the incumbency disadvantage: theory and evidence. J. Polit. 77 (2015). https://doi.org/10.1086/682913
18. Molchanova, G.O., Rey, A.I., Shagarov, D.Y.: Detecting indicators of horizontal collusion in public procurement with machine learning methods. Econ. Contemporary Russia (2020). https://doi.org/10.33293/1609-1442-2020-1(88)-109-127
19. Northcutt, C., Jiang, L., Chuang, I.: Confident learning: estimating uncertainty in dataset labels. J. Artif. Intell. Res. 70, 1373–1411 (2021)
20. Porter, R., Zona, J.: Detection of bid rigging in procurement auctions. J. Political Econ. 101, 518–538 (1993). https://doi.org/10.1086/261885
21. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31. Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf
22. Rey, A., Shagarov, D., Andronova, E., Molchanova, G.: Collusion detection on public procurement in Russia. Available at SSRN 3634005 (2020). https://doi.org/10.2139/ssrn.3634005
23. Single Information System in the Sphere of Procurement: Federal Law No. 44-FZ of 1 January 2014 "On the contract system in state and municipal procurement of goods, works and service" (2014). https://zakupki.gov.ru/epz/main/public/download/downloadDocument.html?id=33991. Accessed 18 Jan 2021
24. Transparency International: Corruption perceptions index (2019). https://www.transparency.org/en/cpi. Accessed 18 Jan 2021
25. Wachs, J., Fazekas, M., Kertész, J.: Corruption risk in contracting markets: a network science perspective. Int. J. Data Sci. Anal. 12, 1–16 (2020)
26. Wallimann, H., Imhof, D., Huber, M.: A machine learning approach for flagging incomplete bid-rigging cartels. arXiv preprint arXiv:2004.05629 (2020)
27. Yakovlev, A., et al.: Incentives for repeated contracts in public sector: empirical study of gasoline procurement in Russia. Int. J. Procurement Manage. 9, 2640–2647 (2016)
On the Speed-in-Action Problem for the Class of Linear Non-stationary Infinite-Dimensional Discrete-Time Systems with Bounded Control and Degenerate Operator

Danis N. Ibragimov(B) and Nikita M. Novozhilkin
Moscow Aviation Institute, Moscow, Russia
Abstract. An infinite-dimensional non-stationary linear control system with discrete time and bounded control is considered. It is assumed that the set of admissible values of the control is convex and weakly compact, the phase space is normed, and the system operator is linear and bounded. For the given system, the speed-in-action problem is solved. In this paper, we propose an approach based on the apparatus of reachable sets, which allows us to generalize the known results to the case of a degenerate operator of the system. An analytical description of the reachability sets is presented, the optimality criterion for the boundary points of the reachability set is formulated and proved in the form of a maximum principle, its degeneracy is demonstrated for interior points, and a method for reducing the case of interior points to the case of boundary points is developed. An example is considered.

Keywords: Linear discrete-time control system · Speed-in-action problem · Set of reachability · Discrete maximum principle
1 Introduction
The discrete maximum principle as a tool for solving optimal control problems for discrete-time systems has been known for a relatively long time. This method of optimal control synthesis is presented in detail in [3,6,16] and in the papers [7–10]. However, despite the fact that most of the problems arising in practice can be solved using the maximum principle, its efficiency turns out to be lower than that of its continuous analogue [2,14,15]. This is primarily due to the fundamentally different nature of the problems under consideration. While in continuous time the optimal control problem is a problem of the calculus of variations, in the discrete case it is a convex programming problem. In connection with this fact, a class of problems that pose no computational difficulty for continuous systems acquire, when passing to discrete time, a number of essential features that complicate the use of standard tools such as the discrete maximum principle [16] and the dynamic programming method [1]. One of these problems is the speed-in-action problem. In particular, the following difficulties arise for the speed-in-action problem. In the Lagrange multiplier method, all the multipliers can simultaneously be equal to zero, which leads to an irregularity of the extremum. The quality functional, which is the operating time of the system, can take values only from the set of non-negative integers, i.e., it is in fact discrete, which means it is not continuous with respect to the control and, as a consequence, the Lagrange function is not continuous either. The optimal control, unlike in linear-quadratic optimal control problems, is not unique [12]. On the basis of the apparatus of 0-controllability sets, necessary and sufficient conditions for the applicability of the maximum principle to the speed-in-action problem for a linear discrete-time system were constructed in [12] for the case of a non-degenerate system operator. These results are generalized to an arbitrary system operator in [11]. However, a significant limitation of the methods from [11,12] is the requirement that the system under consideration be stationary and that the dimensions of the control vector and the state vector coincide, which usually does not hold in practice. In this case, it is possible to achieve equality of the dimensions of the state vector and the control by changing the quantization step, but the resulting system turns out to be non-stationary. Also, the investigation of infinite-dimensional systems with a degenerate operator has various applications. For example, some systems described by delay differential equations [4] can be reduced to discrete infinite-dimensional systems with a differentiation operator, which is degenerate [13]. In this paper, we propose an approach that generalizes the results obtained in [11,12] to the case of non-stationary discrete-time systems. Another essential feature of the work is the consideration of systems of arbitrary dimension, including infinite. In particular, necessary and sufficient conditions for the applicability of the maximum principle to the speed-in-action problem for a linear discrete-time non-stationary system are formulated, the degenerate nature of the maximum principle in the general case is demonstrated, and an effective algorithm for constructing the optimal control in the speed-in-action problem is proposed. The proofs are based on statements from convex analysis [17], functional analysis [13], and the theory of linear operators [5].

Supported by RFFI 18-08-00128-a.
2
Formulation of the Problem
An infinite-dimensional non-stationary linear control system with discrete time and a bounded set of admissible control values (A, U) is considered: x(k + 1) = A(k)x(k) + u(k), x(0) = x0 , u(k) ∈ U(k), k = 0, 1, 2, . . . ,
(1)
where x(k) ∈ L is the system state vector, u(k) ∈ U(k) ⊂ L is the control, L is the normed space, A = {A(k)}∞ k=0 – sequence of system’s operators. It is
The Discrete Maximum Principle in the Speed-in-Action Problem
329
assumed that for each k ∈ IN ∪ {0} set of admissible values of controls U(k) is convex and weakly compact, 0 ∈ int U(k), A(k) is linear and bounded. For the system (A, U) the speed-in-action problem is being solved i.e. it is needed to calculate the minimum number of steps Nmin , for which the system can be transferred from a given initial state x0 ∈ L to the origin, and also build ∗ min a process {x∗ (k), u∗ (k − 1), x0 }N k=1 , satisfied condition x (Nmin ) = 0, which we will call optimal. We define a family of sets of reachability {Y(x, N, k)}∞ N,k=0 , where Y(x, N, k) represents a set of states into which the system (A, U) can be transferred, started from the step k ∈ IN ∪ {0}, in N steps from the state x ∈ L through permissible control actions: ⎧ ⎪ ⎨ {y ∈ L : ∃ u(k) ∈ U(k), . . . , u(N + k − 1) ∈ U(N + k − 1) : (2) Y(x, N, k) = if x(k) = x, then x(N + k) = y}, N ∈ N, ⎪ ⎩ {x0 }, N = 0. Also, we will assume that 0∈
∞
Y(x0 , N, 0),
N =0
i.e. the speed-in-action problem is solvable for a given initial state. It is possible to calculate the optimal value of the criterion for the control problem by using the class of reachability sets: Nmin = min{N ∈ IN ∪ {0} : 0 ∈ Y(x0 , N, 0)}.
(3)
For arbitrary X, U ⊂ L the Minkowski sum of sets is denoted by X + U. We formulate in the form of the following lemma an analytic representation of the reachability set in N steps. Lemma 1. Let {Y(x, N, k)}∞ N,k=0 be determined by the relations (2). Then for each N ∈ IN the following representation is valid: Y(x, N, k) = A(N + k − 1) . . . A(k)x+ N +k−2
A(N + k − 1) . . . A(i + 1)U(i) + U(N + k − 1).
i=k
Proof. Let x(N + k) ∈ Y(x, N, k), which, by the definition of the reachability set, be equivalent to the existence of vectors u(k) ∈ U(k), . . . , u(N + k − 1) ∈ U(N + k − 1) such that x(N + k) = A(N + k − 1)x(N + k − 1) + u(N + k − 1) = A(N +k−1)A(N +k−2)x(N +k−2)+A(N +k−1)u(N +k−2)+u(N +k−1) = . . .
330
D. N. Ibragimov and N. M. Novozhilkin
= A(N + k − 1) . . . A(k)x + A(N + k − 1) . . . A(k + 1)u(k) +A(N + k − 1) . . . A(k + 2)u(k + 1) + . . . + u(N + k − 1), which is equivalent to the assertion of the lemma.
Also, reachability sets admit a more convenient recurrent representation. Corollary 1. Let {Y(x, N, k)}∞ N,k=0 be determined by the relations (2). Then for each N ∈ IN the following representation is valid: Y(x, N + 1, k) = A(N + k)Y(x, N, k) + U(N + k). Lemma 2. Let {Y(x, N, k)}∞ N,k=0 be determined by the relations (2), for some ˜ ∈ IN ∪ {0} the following inclusion is correct: N ˜ , 0). 0 ∈ Y(x, N ˜ Then for each N N
0 ∈ Y(x, N, 0)
is correct too. Proof. Let us prove this statement by induction. We will assume that for all k ∈ IN ˜ + k) = 0 ∈ U(N ˜ + k). u(N As an induction base, consider the inclusion ˜ , 0). 0 ∈ Y(x, N Suppose that for some k ∈ IN ˜ + k) = 0 ∈ Y(x, N ˜ + k, 0). x(N Then ˜ + k + 1) = A(N ˜ + k)x(N ˜ + k) + u(N ˜ + k) = 0 ∈ Y(x, N ˜ + k + 1, 0). x(N According to the method of mathematical induction, the lemma is proved.
Lemma 2 guarantees that once having got to the origin, the non-stationary system (1) can remain in it for as long as necessary. On the other hand, this fact justifies the formula (3) for calculating the optimal value of the control quality criterion in the speed-in-action problem.
The Discrete Maximum Principle in the Speed-in-Action Problem
3
331
Additional Constructions
Let us define the U2 and U3 in the following way: U2 = {U ⊂ L : U − weakly compact and strictly convex, 0 ∈ int U}, U3 = {X ⊂ L : X − weakly compact and convex, 0 ∈ X}. Everywhere below we will assume that each element of the sequence of sets of admissible values of controls U of system (1) is a member of the U2 : U(k) ∈ U2 , k = 0, 1, 2, . . . Let us denote by L∗ conjugate space to L. The result of the action of the linear and bounded functional p ∈ L∗ to vector x is denoted by (p, x). Functional p ∈ L∗ \ {0} is called support to the set X ⊂ U3 in point x ∈ ∂X, if X ⊂ {˜ x ∈ L : (p, x ˜) (p, x)} or ˜). (p, x) = max(p, x x ˜∈X
Normal cone N(x, X) of set X ∈ U3 in point x ∈ ∂X is the set of all support functionals X in x: We will also assume that for an arbitrary interior point x ∈ int X normal cone is the empty set: N(x, X) = ∅. The properties of the U2 and U3 necessary for the subsequent reasoning are given below. Lemma 3 ([11]). Let U ∈ U2 . Then for any different u1 , u2 ∈ U, N(u1 , U) ∩ N(u2 , U) = ∅ is correct. Lemma 4 ([11]). Let X ∈ U3 , A : L → L be the linear and bounded operator. Then for each x ∈ X inclusion (ker A∗ \ {0}) ⊂ N(Ax, AX) is correct. Lemma 5 ([11]). Let X ∈ U3 , A : L → L be the linear and bounded operator. Then for all x ∈ X N(Ax, AX) = (A∗ )−1 (N(x, X)) ∪ (ker A∗ \ {0}). Lemma 6 ([11]). Let X1 , X2 ∈ U3 , x1 ∈ X1 , x2 ∈ X2 . Тhen N(x1 + x2 , X1 + X2 ) = N(x1 , X1 ) ∩ N(x2 , X2 ).
332
D. N. Ibragimov and N. M. Novozhilkin
Lemma 3 admits to assert that each boundary point of a set from U2 is one-to-one determined by its normal cone. Lemmas 4 and 5 define the transformation of the normal cone of a set from U3 in case of using continuous linear transformation. Lemma 6 defines the transformation of the normal cone in the case of addition of sets from U3 . Note that due to the Lemma’s 1 representation of the set of reachability Lemma 3 actually defines the uniqueness of the optimal control for boundary points and proposes a way to calculate it by means of support functionals, while the Lemmas 5 6 connect these support functionals to each other.
4
Optimality Criterion in the Speed-in-Action Problem
Formulated in the Sect. 3 lemmas are essential, since on their basis it is possible to formulate a criterion for the optimality of control in the speed-in-action problem. In fact, due to the Lemma 1 the problem of constructing an optimal control is reduced to the expansion of the vector A(Nmin − 1) · . . . · A(0)x0 by elements of sets U(0), . . . , U(Nmin − 1). Moreover, if N
A(Nmin − 1) · . . . · A(0)x0 ∈
min −2
∂
A(Nmin − 1) · . . . · A(i + 1)U(i) + U(Nmin − 1) ,
i=0
which is equivalent to inclusion 0 ∈ ∂Y(x0 , Nmin , 0),
(4)
then the considered linear transformation of the initial state due to the Lemma 3 can be uniquely determined by the functional from its normal cone. On the other hand, according to the Lemma 6 each element of the required decomposition can also be found by using this functional. Lemma 5 allows recovering the values of the optimal control based on the calculated expansions. These arguments can be formulated in the form of the maximum principle, which will be demonstrated below, where the support functionals of the optimal control represent the trajectory of the adjoint system. The only potential difficulty in case (4) is that, according to the Lemma 4 the initial state of the adjoint system can theoretically turn out to be an element of the kernel of one of the operators A(0), . . . , A(Nmin − 1), which will lead to the degenerate nature of the maximum principle, i.e., it will be impossible to calculate the optimal control from the condition of maximizing the Hamiltonian, since it will be identically equal to zero. Nevertheless, it is possible to prove that the degenerate situation described above is impossible. This fact is formulated as a lemma. Lemma 7. Let {Y(x, N, k)}∞ N,k=0 be determined by the relations (2), inclusion (4) is correct. Then, for each p ∈ N(0, Y(x0 , Nmin , 0)) p ∈ ker(A∗ (0) · . . . · A∗ (Nmin − 1)).
The Discrete Maximum Principle in the Speed-in-Action Problem
333
Proof. Due to the Lemma 1 N(0, Y(x0 , Nmin , 0)) = N (−A(Nmin − 1) · . . . · A(0)x0 ,
Nmin −2
A(Nmin − 1) · . . . · A(i + 1)U(i) + U(Nmin − 1) ,
i=0
is correct. Also taking into account the definition of the normal cone p = 0, and therefore the relations
∗ (A (0) · . . . · A∗ (Nmin − 1)p, −x0 = (p, −A(Nmin − 1) · . . . · A(0)x0 ) > 0 are true too. Then, by the definition of the kernel of the operator p ∈ ker(A∗ (0) · . . . · A∗ (Nmin − 1)). Now let us formulate a criterion for the optimality of the process in the speed-in-action problem for the inclusion (4). As it was noted earlier, this case, although it is rather rare, is of significant interest, since the sufficient conditions for optimality of the control are also necessary, and the optimal trajectory itself turns out to be unique. Theorem 1 (Maximum principle). Let for the system (1) condition (4) Nmin −1 ∗ min be satisfied, vector sets {x∗ (k)}N ⊂ L and functionals k=0 , {u (k)}k=0 Nmin ∗ {ψ(k)}k=0 ⊂ L \ {0} for each k = 0, Nmin − 1 be determined by the relations x∗ (k + 1) = A(k)x∗ (k) + u∗ (k), ψ(k) = A∗ (k)ψ(k + 1), x∗ (0) = x0 ,
(5)
ψ(Nmin ) ∈ N(0, Y(x0 , Nmin , 0)), u∗ (k) = arg max (ψ(k + 1), u). u∈U(k)
Then min i) {x∗ (k), u∗ (k − 1), x0 }N k=1 is the only optimal process of the system (1); ∗ ii) x (k) ∈ ∂Y(x0 , k, 0), k = 0, Nmin ; iii) ψ(k) ∈ N(x∗ (k), Y(x0 , k, 0)), k = 0, Nmin . min Proof. Let’s construct a set of vectors {y ∗ (k)}N k=0 according to recurrence relations ⎧ ∗ ⎪ ⎨ y (Nmin ) = 0, y ∗ (k) ∈ Y(x0 , k, 0), ⎪ ⎩ A(k)y ∗ (k) = y ∗ (k + 1) − u∗ (k), k = 0, Nmin − 1.
334
D. N. Ibragimov and N. M. Novozhilkin
Then, in accordance with the conditions of the theorem y ∗ (Nmin ) ∈ ∂Y(x0 , Nmin , 0), ψ(Nmin ) ∈ N(y ∗ (Nmin ), Y(x0 , Nmin , 0)). Suppose that for some k = 0, Nmin − 1 conditions y ∗ (k + 1) = ∂Y(x0 , k + 1, 0), ψ(k + 1) ∈ N(y ∗ (k + 1), Y(x0 , k + 1, 0)) are correct. Due to the corollary 1 representation Y(x0 , k + 1, 0) = A(k)Y(x0 , k, 0) + U(k), is correct. Since U3 is closed under the Minkowski addition and linear transformations, it is true that A(k)Y(x0 , k, 0) ∈ U3 , U(k) ∈ U2 . Then according to Lemmas 3 and 6, there is a unique decomposition y ∗ (k + 1) = z(k) + u∗ ,
(6)
where z(k) ∈ ∂A(k)Y(x0 , k + 1, 0), u∗ ∈ ∂U(k). According to the Lemma 6 N(y ∗ (k + 1), Y(x0 , k + 1, 0)) = N(z ∗ (k), A(k)Y(x0 , k, 0)) ∩ N(u∗ , U). Then, by Lemma 3, the vector u∗ is uniquely determined by the relation u∗ = arg max (ψ(k + 1), u) = u∗ (k). u∈U(k)
Because the z(k) ∈ ∂A(k)Y(x0 , k, 0), then there y ∗ (k) ∈ Y(x0 , k, 0) such that z(k) = A(k)y ∗ (k). Since by the Lemma 7 ψ(k + 1) = A∗ (k + 1) · . . . · A∗ (Nmin − 1)ψ(Nmin ) ∈ ker A∗ (k), then in view of the Lemma 5
ψ(k + 1) ∈ (A∗ (k))−1 N(y ∗ (k), Y(x0 , k, 0) . Then
ψ(k) = A∗ (k)ψ(k + 1) = 0, ψ(k) ∈ N(y ∗ (k), Y(x0 , k, 0)).
(7)
Because the N(y ∗ (k), Y(x0 , k, 0)) = ∅, then y ∗ (k) ∈ ∂Y(x0 , k, 0).
(8)
Then by the method of mathematical induction (8) is correct for all k = 0, Nmin .
The Discrete Maximum Principle in the Speed-in-Action Problem
In case k = 0
335
y ∗ (0) ∈ ∂Y(x0 , 0, 0) = {x0 }.
By construction, the equality y ∗ (k + 1) = A(k)y ∗ (k) + u∗ (k), k = 0, Nmin − 1 is correct. Then for all k = 0, Nmin − 1 y ∗ (k) = x∗ (k). min i) Because the x∗ (Nmin ) = y ∗ (Nmin ) = 0, then {x∗ (k), u∗ (k − 1), x0 }N k=1 is optimal process of system (1) in the speed-in-action problem. The uniqueness follows from Lemma 3 and the uniqueness of the decomposition (6). ii) Due to (8) inclusion
x∗ (k) ∈ ∂Y(x0 , k, 0), k = 0, Nmin − 1 is correct. iii) Due to (7) ψ(k) ∈ N(x∗ (k), Y(x0 , k, 0)), k = 0, Nmin − 1 is correct. Let us consider separately the case of the inclusion 0 ∈ int Y(x0 , Nmin , 0).
(9)
Problem of the case (9) lies in the fact that the maximum principle acquires a degenerate character. On the one hand, the condition for calculating the initial state of the conjugate system, proposed in the relations (5), turns out to be inconsistent due to the fact that the normal cone to any interior point is an empty set. On the other hand, it is possible to prove that for any nondegenerate trajectory of the adjoint system, a control process satisfying (5), turns out to be nonoptimal. min Theorem 2. Let the inclusion (9) be correct, process {x (k), u (k − 1), x0 }N k=1 is determined by the relations
x (k + 1) = A(k)x (k) + u (k), ψ(k) = A∗ (k)ψ(k + 1), x (0) = x0 , ψ(Nmin ) = ψNmin , u (k) = arg max (ψ(k + 1), u). u∈U(k)
Then for any ψNmin ∈ ker(A∗ (1) · . . . · A∗ (Nmin − 1)) x (Nmin ) = 0.
336
D. N. Ibragimov and N. M. Novozhilkin
min Proof. Suppose the opposite, process {x (k), u (k − 1), x0 }N k=1 is optimal in the speed-in-action problem for the system (1), but wherein ψNmin ∈ ker(A∗ (1) · . . . · A∗ (Nmin − 1)). Then for each k = 1, Nmin − 1
(A∗ (k) · . . . · A∗ (Nmin − 1))ψNmin = 0, ψ(k) = (A∗ (k) · . . . · A∗ (Nmin − 1))ψNmin ∈ N(u (k − 1), U(k − 1)),
ψNmin ∈ (A∗ (k) · . . . · A∗ (Nmin − 1))−1 N(u (k − 1)), U(k − 1)) . According to Lemma 5
ψNmin ∈ N A(Nmin − 1) · . . . · A(k)u (k − 1), A(Nmin − 1) · . . . · A(k)U(k − 1) . Then by Lemma 6 ψNmin ∈ N
Nmin −1
A(Nmin − 1) · . . . · A(k)u (k − 1) + u (Nmin − 1),
k=1 Nmin −1
A(Nmin − 1) · . . . · A(k)U(k − 1) + U(Nmin − 1) .
k=1
So, in accordance with Lemma 1 ψNmin ∈ N(A(Nmin − 1) · . . . · A(0)x0 +
Nmin −1
A(Nmin − 1) · . . . · A(k)u (k − 1) + u (Nmin − 1), Y(x0 , Nmin , 0)).
k=1
Because the 0 ∈ int Y(x0 , Nmin , 0), then N(0, Y(x0 , Nmin , 0)) = ∅, 0 = A(Nmin − 1) · . . . · A(0)x0 +
Nmin −1
A(Nmin − 1) · . . . · A(k)u (k − 1) + u (Nmin − 1) = x (Nmin ).
k=1 min Then, by definition, the process {x (k), u (k − 1), x0 }N k=1 is not optimal in the speed-in-action problem. We get a contradiction.
As follows from the Theorem 2, in case (9) optimal process of system (1) satisfies the relations of the maximum principle (5) if and only if there exists k0 ∈ 1, Nmin such that for each k = 0, k0 ψ(k) = 0. This fact makes it impossible to determine the optimal control from the condition u∗ (k) = arg max (ψ(k + 1), u), k = 0, k0 − 1. u∈U(k)
The Discrete Maximum Principle in the Speed-in-Action Problem
337
Consider a way to reduce the case (9) to the case (4) and formulate an optimality criterion for the process for an arbitrary initial state. We denote: α = μ(−A(Nmin − 1) · . . . · A(0)x0 , Y(x0 , Nmin , 0) − A(Nmin − 1) · . . . · A(0)x0 ), where μ(x, X) is the Minkowski functional [13]. For the additional system (A, Uα ), where Uα = {αU(k)}∞ k=0 , denote the family of sets of reachability as . {Yα (x, N, k)}∞ N,k=0 Lemma 8. Let inclusion (9) be correct. Then for the system (A, Uα ) relations i)Yα (x, N, 0) ⊂ Y(x, N, 0), N ∈ IN ∪ {0}, ii) 0 ∈ ∂Yα (x0 , Nmin , 0) are correct. Proof. Due to (9) and the definition of the Minkowski functional it is correct that α < 1. Because 0 ∈ U(Nmin − 1) and 0 ∈ A(Nmin − 1) · . . . · A(k + 1)U(k), k = 0, Nmin − 2, then Nmin −2
αA(Nmin − 1) · . . . · A(k + 1)U(k) + αU(Nmin − 1) ⊂
k=0 Nmin −2
A(Nmin − 1) · . . . · A(k + 1)U(k) + U(Nmin − 1).
k=0
Then, according to the Lemma 1 and the definition of the Minkowski functional i)Yα (x, N, 0) ⊂ Y(x, N, 0), N ∈ IN ∪ {0},
ii) − A(Nmin − 1) · . . . · A(0)x0 ∈ ∂ α(Y(x0 , Nmin , 0) − A(Nmin − 1) · . . . · A(0)x0 ) N −2 min = ∂ αA(Nmin − 1) · . . . · A(k + 1)U(k) + αU(Nmin − 1) , k=0
N
min −2
0∈∂
αA(Nmin − 1) · . . . · A(k + 1)U(k) + αU(Nmin − 1)
k=0
+ A(Nmin − 1) · . . . · A(0)x0
= ∂Yα (x0 , Nmin , 0).
In fact Lemma 8 allows us to restrict ourselves to considering the case (9), going from the original system (A, U) to the system (A, Uα ).
338
D. N. Ibragimov and N. M. Novozhilkin
Theorem 3. Let for the system (1) condition (4) be correct, vector sets Nmin −1 ∗ ∗ min min ⊂ L and functionals {ψ(k)}N {x∗ (k)}N k=0 , {u (k)}k=0 k=0 ⊂ L \ {0} for each k = 0, Nmin − 1 are determined by the following relations: x∗ (k + 1) = A(k)x∗ (k) + u∗ (k), ψ(k) = A∗ (k)ψ(k + 1), x∗ (0) = x0 , ψ(Nmin ) ∈ N(0, Yα (x0 , Nmin , 0)), u∗ (k) = arg max (ψ(k + 1), u). u∈αU(k)
min Тhen {x∗ (k), u∗ (k − 1), x0 }N k=1 is the optimal process for system (A, U) in the speed-in-action problem.
Proof. The assertion follows from the fact that the system (A, αU) by the Lemma 8 satisfies the conditions of the Theorem 1, and according to the inclusion αU(k) ⊂ U(k) for all k ∈ IN ∪ {0} admissible control of the system (A, αU) is also valid for (A, U).
5
Example
Consider the system (1) with a state vector from L = l2 , where the sequence of system operators has the form Lx = (x2 , x3 , x4 , x5 , . . .), k = 0, 2, 4, . . . , A(k)x = Cx = (x2 , x1 , x4 , x3 , . . .), k = 1, 3, 5, . . . Consider that set U(k) is a closed ball at each step: U(k) = B1 (0), k ∈ IN ∪ {0}. For each x ∈ LB1 (0) there is y ∈ B1 (0). Such that x = Ly. So inequality x y 1 is correct. Therefore, x ∈ B1 (0) and LB1 (0) ⊂ B1 (0). Also for each y ∈ B1 (0) there is x = (0, y1 , y2 , . . .) such that Lx = y. Then x = y . Therefore, x ∈ B1 (0) and y ∈ LB1 (0). Thus, B1 (0) ⊂ LB1 (0). Finally, we get equality B1 (0) = LB1 (0). Also, because it is true that C −1 = C, equality Cx = x is correct for each x ∈ l2 , then the equality CB1 (0) = B1 (0)
The Discrete Maximum Principle in the Speed-in-Action Problem
339
is true. Finally, for all N ∈ IN ∪ {0} and k = 0, N A(N ) · . . . · A(N − k)U = B1 (0). Since by Riesz’s theorem l2∗ is isomorphic to l2 , we will assume that A∗ (k) acts on l2 : A∗ (k) : l2 → l2 . Then equality A∗ (k)x =
L∗ x = Rx = (0, x1 , x2 , x3 , x4 , x5 , . . .), k = 0, 2, 4, . . . , C ∗ x = Cx = (x2 , x1 , x4 , x3 , x6 , x5 , . . .), k = 1, 3, 5, . . .
is correct [13]. Let 0 ∈ ∂Y(x0 , Nmin , 0). Due to Lemma 1 for each N ∈ IN Y(x0 , N, 0) = BN (A(N − 1) · . . . · A(0)x0 ). Consider the case 0 ∈ ∂BNmin (A(Nmin − 1) · . . . · A(0)x0 ), which is equivalent to the condition A(Nmin − 1) · . . . · A(0)x0 l2 = Nmin . Also for ball Y(0, Nmin , 0) in a Hilbert space, the equality N(0, Y(x0 , Nmin , 0)) = cone{−A(Nmin − 1) · . . . · A(0)x0 } \ {0} min is correct. Due to Theorem 1 optimal process {x∗ (k), u∗ (k−1), x0 }N k=1 of system Nmin (A, U) and the trajectory of the adjoint system {ψ(k)}k=1 have the following form: ψ(Nmin ) = −A∗ (Nmin − 1) · . . . · A∗ (0)x0 ,
ψ(k) = −A∗ (Nmin − k) · . . . · A∗ (Nmin − 1)A(Nmin − 1) · . . . · A(0)x0 , u∗ (k) = arg max (ψ(k + 1), u) = − ⎧ ⎪ ⎪ ⎨−
u∈B1 (0)
1 Nmin
A(k) · . . . · A(0)x0
1 (x2 , xk+3 , x4 , xk+5 , x6 , xk+7 . . .), k = 0, 2, 4, . . . , Nmin = 1 ⎪ ⎪ ⎩− (xk+2 , x2 , xk+4 , x4 , xk+6 , x6 , . . .), k = 1, 3, 5, . . . Nmin x∗ (k) =
Nmin − k A(k) · . . . · A(0)x0 , Nmin x∗ (Nmin ) = 0.
The speed-in-action problem is solved. Let us demonstrate the resulting relation by using the following example: √ 1 1 1 1 x0 = 8 3 β1 , , β2 , , , , . . . . 2 8 16 32
340
D. N. Ibragimov and N. M. Novozhilkin
Note that for any β1 , β2 ∈ IR A(3)A(2)A(1)A(0)x0 = 4, From where Nmin = 4. Then the trajectory of the control system has the following form: √ 1 1 1 1 ∗ , β2 , , , , . . . , x (1) = 6 3 2 8 16 32 √ 1 1 1 1 x∗ (2) = 4 3 β2 , , , , , . . . , 2 16 8 64 √ 1 1 1 1 , , , ,... , x∗ (3) = 2 3 2 16 8 64 x∗ (4) = 0.
6
Conclusion
The paper proposes a method for constructing the optimal process in the speedin-action problem for a linear non-stationary infinite-dimensional discrete-time system with bounded control and a degenerate operator. The solution to this problem is based on the apparatus of sets of reachability, the properties of which are given in the Sects. 2 and 3. In particular, it is proved that for a sequence of strictly convex and weakly compact sets {U(k)}∞ k=0 each set of reachability can be represented as a Minkowski sum of a convex set and a strictly convex set. Due to Lemmas 3 and 6 this fact guarantees the uniqueness of the decomposition of any boundary point of the set of reachability. Moreover, according to the Lemma 3, each element of the decomposition can be uniquely determined by its support functional. On the basis of the obtained properties in the Sect. 4 the necessary and sufficient conditions for the optimality of the process in the speed-in-action problem are formulated for the case when the origin belongs to the boundary of the set of reachability. Lemma 7 guarantees the nondegeneracy of the trajectory of the adjoint system. Theorem 2 demonstrates the degenerate nature of the maximum principle for the case when the origin belongs to the interior of the set of reachability. The results obtained earlier are generalized for an arbitrary initial state in Theorem 3, although in the general case the optimal process is not unique.
References 1. Bellman R., Dreyfus S.: Applied dynamic programming, United states air force project rand (1962)
The Discrete Maximum Principle in the Speed-in-Action Problem
341
2. Boltianskiy, V.G.: Mathematical Methods of Optimal Control. Nauka, Russia (1969). (in Russian) 3. Boltianskiy, V.G.: Optimal Control of Discrete-Time Systems. Nauka, Russia (1973). (in Russian) 4. Dolgiy, Y.F., Surkov, P.G.: Mathematical Models of Dynamical Systems with Delay. Ural University Publishing House, Russia (2012). (in Russian) 5. Dunford, N., Schwartz, J.T.: Linear Operators, Part 2: Spectral Theory, Self Adjoint Operators in Hilbert Space. Wiley-Interscience, USA (1988) 6. Evtushenko, U.G.: Methods for Solving Extreme Problems and Their Applications in Optimization Systems. Nauka, Russia (1982). (in Russian) 7. Fisher, M.E., Gayek, J.E.: Estimating reachable sets for two-dimensional linear discrete systems. J. Optim. Theory Appl. 56, 67–88 (1988) 8. Halkin, H.: A maximum principle of the Pontryagin type for systems described by nonlinear difference equations. SIAM J. Control 4(1), 90–111 (1966). https://doi. org/10.1137/0304009 9. Holtzman, J.M., Halkin, H.: Directional convexity and the maximum principle for discrete systems. J. SIAM Control 4(2), 263–275 (1966). https://doi.org/10.1137/ 0304023 10. Horn, F., Jackson, R.: On discrete analogues of Pontryagin’s maximum principle. Int. J. Control 1(4), 389–395 (1965). https://doi.org/10.1080/00207176508905489 11. Ibragimov, D.N.: On the optimal speed problem for the class of linear autonomous infinite-dimensional discrete-time systems with bounded control and degenerate operator. Autom. Remote Control 80(3), 393–412 (2019). https://doi.org/10.1134/ S0005117919030019 12. Ibragimov, D.N., Sirotin, A.N.: On the problem of operation speed for the class of linear infinite-dimensional discrete-time systems with bounded control. Autom. Remote Control 78(10), 1731–1756 (2017). https://doi.org/10.1134/ S0005117917100010 13. Kolmogorov A.N., Fomin S.V.: Elements of Function Theory and Functional Analysis. Fizmalit, Russia (2012). (in Russian) 14. Moiseev, N.N.: Elements of the Theory of Optimal Systems. Nauka, Russia (1975). (in Russian) 15. Pontriagin, L.S., Boltianskiy, V.G., Gamerkilidze, R.V., Mischenko, B.F.: Mathematical Theory of Optimal Processes. Nauka, Russia (1969). (in Russian) 16. Propoy A.I.: Elements of the Theory of Optimal Discrete Processes. Nauka, Russia (1973). (in Russian) 17. Rockafellar R.T.: Convex Analysis. Princeton University Press, USA (1970). https://doi.org/10.1515/9781400873173
Method for Calculating the Air Pollution Emission Quotas Vladimir Krutikov1 , Anatoly Bykov2 , Elena Tovbis3 , and Lev Kazakovtsev3,4(B) 1
Kemerovo State University, 6 Krasnaya Street, Kemerovo 650043, Russia Institute of Computational Technologies SB RAS, 21 Rukavishnikov Street, Kemerovo 650025, Russia 3 Reshetnev Siberian State University of Science and Technology, prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660031, Russia 4 Siberian Federal University, 79 Svobodny pr., Krasnoyarsk 660041, Russia [email protected]
2
Abstract. The main purpose of quotas is to limit emissions for facilities that have a negative impact on the environment. When calculating emission quotas, it is necessary to solve a nonlinear programming problem with a nonlinear objective function and linear constraints. Variables of the problem are emission reduction coefficients for objects that have a negative impact on the environment. Constraints of the problem are determined by the admissibility of the emission contributions from objects to the concentration of pollutants in the air at the locations of quotas. As a rule, the number of constraints of this problem is significantly less than the number of variables (the problem can include up to 10,000 variables and 1,000 linear constraints). In this regard, it seems relevant to use the theory of duality for the purposes of substantive analysis and simplification of computational methods for problem solving. We suggest the transition from primal to dual nonlinear programming. As a result, we gain a nonsmooth problem of unconstrained minimization of a much smaller order, and the solution can be obtained by effective subgradient minimization methods with an alteration in the space metric. We propose an effective method for solving the problem of emission quotas to be determined, and confirm its efficiency by a computational experiment on both test and applied data. The explicit form of the dependence of primal and dual variables is useful for the analysis of the solution and the selection of the object priority parameters by an expert. Keywords: Subgradient methods pollution · Emission quotas
1
· Space extension · Relaxation · Air
Introduction
Nowadays, one of the priority goals of global community is air quality control [1]. Air pollution refers to the contamination of the atmosphere by harmful chemicals Supported by the Ministry of Science and Higher Education of the Russian Federation (Project FEFE-2020-0013). c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 342–357, 2021. https://doi.org/10.1007/978-3-030-86433-0_24
Method for Calculating the Air Pollution Emission Quotas
343
or biological materials. The main purpose of quotas is to limit emissions for facilities that have a negative impact on the environment in accordance with their contribution to the concentration of air pollutants in the living sector [2]. In order to control the air pollution, the experiment is to be conducted on setting quotas for emissions of atmospheric pollutants in major urban industrial areas. For example, in Russia, such experiments were conducted in 12 areas, and the new regulations for air pollution emissions quotas calculating [3] were approved in 2019. The quotas established are to be based on aggregate calculations on permissible pollutants. In accordance with the new regulations, to calculate the allowable increments in pollution at quota locations, we have to solve a nonlinear programming problem with a given convex nonlinear objective function and linear constraints under two-sided constraints on the variables. The variables of such a problem are the emission reduction coefficients, their number can be up to about 10,000. The constraints of the problem determine the allowable increments of emissions in the atmospheric air at the quota locations. The abovementioned problem is a nonlinear programming problem with a convex nonlinear objective function and linear constraints. For an industrial city, the number of constraints of this problem can be an order of magnitude less than the number of variables. In this regard, it seems relevant to use the theory of duality of convex programming for the purposes of meaningful analysis and simplification of computational methods for solving the problem. In related literature, the authors present different types of models for quota determining. Regression model considered by Yu et al. [4] describe the emission reduction per million tons in each region during the five years in the study of the SO2 emission problem. In the study of water pollution regulation [5], the change of the concentration of each pollutant including such variables as regulation stringency and city characteristics described by regression model. Qin et al. [6] proposed a multi-criteria decision analysis model considering both efficiency and fairness principles to formulate a regional quota allocation scheme. In [7], single-object and multi-object carbon emission allocation optimization models were presented. Tang et al. [8] designed carbon allowance auction from a multi-agent perspective. An approach to determine carbon emission reduction target allocation based on the particle swarm optimization algorithm, fuzzy c-means clustering algorithm, and Shapley decomposition was proposed in [9]. Game theoretic approach applied to allocation of carbon emissions problem by Ren et al. [10]. In [11], the authors combine the multi-index method with the optimization idea of zero-sum gains–data envelopment analysis model to allocate carbon emission quotas. The article [12] develops an integrated cooperative game data envelopment analysis approach. Zero-sum gains data envelopment analysis was used in [13]. Optimization model of air pollution distribution in terms of linear programming presented in [14]. To reduce inequality between 25 types of environmental burdens encompassing natural resources, Pozo et al. [15] propose a framework for an optimal allocation of burdens quotas. The nonlinear programming model seeks for regional quotas deviating the least from the current consumption-based burdens that, in turn, satisfy the following constraints: (i)
344
V. Krutikov et al.
an upper bound on inequality (quantified via the Gini coefficient); (ii) a maximum disparity in per-capita burdens across regions; and (iii) a limit on the total burden that can be generated globally. A nonlinear programming approach was proposed [16] to obtain an optimal carbon emission quota allocation in a leastcost way. A hybrid nonlinear grey-prediction and quota allocation model was developed for supporting optimal planning of carbon intensity reduction [17]. In this work, we propose a nonlinear model, which, in contrast to the linear programming model, is more resistant to data inaccuracies and provides for proportional distribution of emissions for enterprises with identical air pollution influence. We formulate a dual problem, the number of variables of which is equal to the number of constraints, and establish relationships between primal and dual variables. As a result, we obtain a problem of unconstrained optimization of a significantly lower dimension, the solution of which can be obtained by relaxation subgradient minimization methods [18–26]. Thus, it was possible to avoid solving a complex large-scale nonlinear programming problem with many linear constraints and two-sided constraints on variables. Another important features of using the dual approach are that, firstly, it enables us to use the explicit form of the dependence of primal and dual variables for the analysis of the solution and the expert selection of the parameters of the objects priority and, secondly, it enables us to obtain estimates of the solution accuracy by function and by its parameters. The contribution of this paper is the further development of the approach to calculating the air pollution emission quotas proposed in [27]. Based on the research carried out in this study, we propose an algorithm for solving the nonlinear optimization problem and implemented it for solving applied and test problems. The results show a high rate of convergence of the proposed method and a good accuracy of the obtained solution. The rest of this paper organized as follows. Section 2 provides formulation of the problem in terms of nonlinear optimization. Section 3 describes the transition from primal to dual nonlinear programming. Section 4 declares method for reducing the problem to the problem of unconditional minimization, method for calculating emission quotas, subgradient minimization methods. Section 5 includes computational experiment on both test and applied data, and in Sect. 6 conclusions were made.
2
Problem Statement
Maximum permissible concentrations (MPC) of air pollutants in populated areas are regulated by sanitary and hygienic requirements [2]. The assessment of air pollution, depending on the totality of the enterprise’s emissions, is carried out using a model for calculating the dispersion of the impurities in the atmosphere [2,28]. Ensuring MPC limits is achieved by setting maximum permissible emissions (MPE) for enterprises (sources) [3]. Let us denote by x the vector, whose components xi , i = 1, . . . , n, are emissions of the sources. MPE for enterprises is a set of vector x components, for which on some set of m locations of quotas the constraints are:
Method for Calculating the Air Pollution Emission Quotas n
Aji xi ≤ bj , j = 1, ..., m,
345
(1)
i=1
where the left part is total emission concentration for all sources, bj is allowable concentration in j -th location (MPC minus background concentration), Aji is concentration per volume of emission from source i in quota location j. It is assumed that the vector x components satisfy two-sided constraints, which can be written in a vector form as x ∈ Q = {x|0 < a ≤ x ≤ d}, a, d, x ∈ IRn .
(2)
Here, vector d is current state of MPE limitations for sources, a is technical capabilitiy to reduce emissions. Before the introduction of the new regulations [3], it was enough to solve the system of inequalities (1) under constraints (2) in any way. It was used optimization formulations of both linear programming and quadratic programming, with the latter being interpreted as the problem of minimizing the cost of reducing emissions. According to the new requirements, the emission quota problem is further defined by the introduction of the objective function. Denote by qi , i = 1, 2, . . . , n, current volume reduction coefficients of MPC, which is set in (2) by vector d components. Then xi = di qi , i = 1, 2, . . . , n. The new formulation of the problem in terms of the reduction coefficients of the current MPE volumes is: n min pi /qi , (3) i=1 n
Aji di qi ≤ bj , j = 1, ..., m,
(4)
i=1
0 < ai /di ≡ Qi ≤ qi < 1, i = 1, ..., n,
(5)
where constraints (4), (5) are derived from (1) and (2) after changing variables. Coefficients pi , i = 1, 2, . . . , n, are designed for possible differentiation of sources according to one or another priority. In the problem (3)–(5), amount of variables n may exceed the number of constraints m by an order of magnitude. Additional difficulties in solving this problem arise due to the presence of the bilateral constraints (5). In the next section, we consider a method for obtaining a solution to the presented problem based on duality theory.
3
Primal and Dual Problems
In terms of the variables x, constraints (4), (5) are equivalent to constraints (1), (2), and the objective function takes the form: minf (x) = min
n pi d i i=1
xi
.
(6)
346
V. Krutikov et al.
The matrix of second derivatives of the objective function (6) is diagonal, and its diagonal elements under condition (2) are strictly positive: ∇2ii f (x) =
2di pi > 0, i = 1, ..., n. x3i
(7)
The matrix of second derivatives under condition (2) is bounded, i.e. lI ≤ ∇2 f (x) ≤ LI, x ∈ Q,
(8)
where constants l and L are the minimum and maximum values of the diagonal elements (7), respectively. The left inequality in (8) means the strong convexity of the function on the set Q. A strongly convex function satisfies the inequality [18] f (x + z) ≥ f (x) + (∇f (x), z) + lz2 /2.
(9)
Problem constraints are linear. Therefore, the admissible domain X defined by these constraints is convex. Taking into account the strong convexity of the function, we have a convex programming problem of the following form: minf (x), x ∈ IRn ,
(10)
gi (x) = [Ax − b]i ≤ 0, i = 1, ..., m, x ∈ Q.
(11) ∗
Due to the strong convexity of the function, the minimum point x of the problem (10)–(11), under the condition of a non-empty admissible set X exists and is unique. For an arbitrary point x ∈ X, inequality (9) implies the relation: f (x) ≥ f (x∗ ) + (∇f (x∗ ), x − x∗ ) + lx − x∗ 2 /2.
(12)
Since x∗ is a minimum point and the function does not decrease along the ray x − x∗ , the following inequality holds: (∇f (x∗ ), x − x∗ ) ≥ 0.
(13)
From (12), taking into account (13), we obtain the estimate x − x∗ 2 ≤ 2(f (x) − f (x∗ ))/l.
(14)
This estimate enables us to evaluate the deviation from the minimum if we know the error of the solution. The algorithms we use for finding a solution allow us to obtain an upper bound for the right side of (14). The primal problem (10)–(11) is a convex nonlinear program. We will assume that in the primal problem (10)–(11) the Slater condition is satisfied: there exists an interior point x0 of the set Q such that gj (x0 ) < 0, j = 1, . . . , m. A convex programming problem for which the Slater conditions are satisfied is called regular [18]. The conditions for the extremum of the regular problem are formulated by the Kuhn–Tucker theorems (see, for example, [18]) using
Method for Calculating the Air Pollution Emission Quotas
347
the Lagrange function L(x, y) = f (x) + (y, g(x)), y ≥ 0, y ∈ IRm , x ∈ Q, where y = (y1 , y2 , . . . , ym )T is the vector of Lagrange multipliers, g(x) = (g1 (x), g2 (x), . . . , g(x))T . According to the Kuhn-Tucker theorem, in the case of a regular minimum point x∗ , there is a vector y ∗ of non-negative Lagrange multipliers satisfying the complementary slackness conditions yi∗ gi (x∗ ) = 0, i = 1, ..., m, such that the Lagrange function at y = y ∗ attains a minimum on the set Q at point x∗ . Thus, we need to find a way to determine the Lagrange multipliers yi∗ and a method to find x∗ when y ∗ is known. Since the number of constraints (11) is significantly less than the number of variables of the primal problem, the transition to the dual problem allows to reduce the number of variables. Let us describe a method for solving the primal minimization problem (10) under constraints (11), based on the duality theory. Following [18], we introduce the dual function n pi d i + (y, Ax − b)], ψ(y) = inf L(x, y) = min[ x∈Q x∈Q xi i=1
(15)
and formulate the dual problem as max ψ(y). y≥0
(16)
Let the primal problem (10)–(11) be regular. Denote its solution as x∗ and nonnegative Lagrange multipliers satisfying the extremum conditions of the KuhnTucker theorem as yi∗ . According to the duality theorem [18], in the case of a regular primal problem, the Lagrange multipliers yi∗ are also a solution to the dual problem (16). Moreover, for any admissible vectors x and y (i.e., x is under constraints (11) and y ≥ 0), f (x) ≥ ψ(y) and f (x∗ ) = ψ(y ∗ ).
(17)
Due to the specifics of the problem, it is not difficult to carry out the operation of minimization in (15). Let’s solve the system ∇x L(x, y) = 0 without constraints on the variables x : ∂L(x, y)/∂xi = −di pi /x2i + [AT y]i = 0, y ≥ 0, i = 1..n. As a result, for arbitrary vector y ≥ 0 we get: d i pi , y ≥ 0, i = 1, 2, ..., n. (18) x ˆi = [AT y]i Taking into account constraints takes the form: ⎧ ⎪ ⎨ai , if xi (y) = di , if ⎪ ⎩ x ˆi , if
(11), the solution x(y) of the system L(x, y) x ˆi ≤ ai , x ˆ i ≥ di , i = 1, 2, ..., n. ai ≤ x∗i ≤ di ,
(19)
The results of the duality theorem make it possible to obtain a solution to the primal problem by Formula (19), where y is the solution of the dual problem.
348
V. Krutikov et al.
Denote θ(y) = −ψ(y). Instead of (16), we obtain the minimization problem min θ(y). y≥0
(20)
Under the earlier assumptions about the primal problem, the function θ(y) is convex [18]. Nonsmooth minimization problem (20) can be solved by subgradient methods. Taking into n account (19), the dual function at the point yk takes the form: θ(yk ) = − i=1 pi di /xk,i − (yk , Axk − b), xk = x(yk ). Its subgradient is θ(yk ) = −(Axk − b), xk = x(yk ). Based on the studies carried out, it was revealed that the solution to the primal problem (10)–(11) consists in solving the problem of minimizing the dual function θ(y) with the subsequent finding of the solution x(y). Expression (18) gives us an explicit form of the dependence of the emissions volumes on the priorities of plants specified by the vector p. This simplifies the analysis of the impact of priority values on emissions and facilitates their rational selection.
4 4.1
Methods for Solving the Problem Reducing to the Problem of Unconstrained Minimization
For m 0). Parameters α, β must satisfy the conditions: α > 1, 0 < β ≤ 1, α · β > 1. We used parameter values α2 = 30, β 2 = 0.2, this provides the resulting extension ratio (α · β)2 = 6. The paper [26] substantiates the accelerating properties of the method (25), (28)–(30), similar to the properties of quasi-Newtonian methods.
5
Analysis of the Algorithms
To analyze the algorithms for solving problem (10)–(11), we used a set of test problems in which the optimal solution x∗ and the vector of optimal Lagrange multipliers y ∗ were known. Table 1 uses the following notation: nx is n from (10), m is the total number of constraints in (11), mA is the number of constraints in (11), which hold as equalities at the minimum point (gi (x) = 0). For the solution errors presented in (24), the notation is following: Δ∗ = (fD − f ∗ )/fD , Δ = (fD − ψθ )/fD . In all tests, for the constraints (11), we assumed that 10 variables of the optimal solution x∗ are on the lower bounds (x∗i = ai ), and 10 more variables are on the upper bounds (x∗i = di ). The test problems were solved by the multistep subgradient method (MM) and the subgradient method with space extension (MH). To solve the dual problem, the transformations of variables (21) or (22) were used, which are denoted |v| and v 2 ,
Method for Calculating the Air Pollution Emission Quotas
351
respectively. The number of equality constraints (active inequality constraints) ranged from 10 to 50. At the same time, the complexity of the problem did not increase. As it turned out, the number of constraints significantly affects the quality of the solution. The solution of test problems on a full set of tests by the MH method with the ratio n = 10 m, where the transformation of variables (21) was used, is given in Table 1. Table 1. Transformation (21) with n = 10 m. Δ∗
Method nx
m (mA)
MH
10000
1000 (10) 3.187 · 10−4
3.198 · 10−4
5000
1.490 ·
10−4
1.493 · 10−4
1.637 ·
10−9
1.677 · 10−9
1000 (20) 5.091 · 10−4
5.149 · 10−4
4.606 · 10−4
4.669 · 10−4
MH MH
1000
MH
10000
MH
5000
MH MH MH MH
1000 10000 5000 1000
500 (10) 100 (10) 500 (20)
Δ
10−10
2.882 · 10−10
1000 (50) 5.259 ·
10−3
5.259 · 10−3
2.334 ·
10−4
2.350 · 10−4
2.926 ·
10−11
2.302 · 10−10
100 (20) 500 (50) 100 (50)
2.079 ·
Here, we did not get a solution of high accuracy, although for a practical problem, this accuracy is quite enough. The analysis of the obtained solutions shows that similar accuracy is typical for the Lagrange multipliers. Zero factors Table 2. Transformation (21) with m = 100. Method nx
m (mA) Δ∗
Δ
MM
10000
100 (10) 9.173 · 10−5
1.470 · 10−4
MM
5000
100 (10) 1.243 · 10−4
1.327 · 10−4
MM
1000
100 (10) 4.563 · 10−4
4.683 · 10−4
MM
10000
100 (20) 3.971 · 10−4
4.112 · 10−4
5000
100 (20) 1.261 ·
10−4
1.341 · 10−4
100 (20) 6.655 ·
10−5
7.323 · 10−5
100 (50) 3.752 ·
10−5
3.840 · 10−5
100 (50) 4.595 ·
10−5
4.653 · 10−5
100 (50) 1.320 ·
10−4
1.333 · 10−4
MM MM MM MM
1000 10000 5000
MM
1000
MH
10000
100 (10) 3.893 · 10−9
3.897 · 10−9
MH
5000
100 (10) 1.826 · 10−9
1.834 · 10−9
1000
100 (10) 1.637 ·
10−9
1.677 · 10−9
100 (20) 2.052 ·
10−10
2.138 · 10−10
100 (20) 5.240 ·
10−10
5.410 · 10−10
100 (20) 2.079 ·
10−10
2.882 · 10−10
100 (50) 2.958 ·
10−11
5.094 · 10−11
MH MH MH MH
10000 5000 1000
MH
10000
MH
5000
100 (50) 2.527 · 10−11 6.761 · 10−11
MH
1000
100 (50) 2.926 · 10−11 2.302 · 10−10
352
V. Krutikov et al.
are easily distinguished from non-zero factors by visual analysis. This enables us to construct the formal procedures for reducing the set of constraints by removing inactive constraints from consideration. For the MM method, the solution quality turned out to be much worse (Δ ≈ 0.1) and we do not present these results. The solution of test problems on a full set of tests by the MM and MH methods with m = 100, where the transformation (21) was used, is shown in Table 2. The MH method enables us to solve problems with much higher accuracy than MM. The solution with the ratio n = 10 m and the transformation of variables (22) was used, is given in Table 3. Here, both methods managed to obtain a solution with high accuracy. Table 3. Transformation (22) with n = 10m. Method nx MM MM MM MM
m (mA)
Δ∗
Δ
10000 1000 (10) 1.432 · 10−10 1.441 · 10−10 1.195 · 10−9
1.197 · 10−9
−9
2.682 · 10−9
10000 1000 (20) 9.608 · 10−9
9.610 · 10−9
−9
4.687 · 10−9
5000 500 (10) 1000 100 (10)
2.674 · 10
MM
5000 500 (20)
4.684 · 10
MM
1000 100 (20)
2.339 · 10−11 3.947 · 10−11
MM
10000 1000 (50) 1.946 · 10−9 −11
1.950 · 10−9 3.778 · 10−11
MM
5000 500 (50)
2.931 · 10
MM
1000 100 (50)
8.011 · 10−11 1.203 · 10−10
MH MH MH MH
10000 1000 (10) 5.999 · 10−10 6.008 · 10−10 5000 500 (10) 1000 100 (10)
1.342 · 10−9
1.344 · 10−9
−9
2.857 · 10−9
2.849 · 10
10000 1000 (20) 6.273 · 10−10 6.290 · 10−10
MH
5000 500 (20)
7.909 · 10−10 7.943 · 10−10
MH
1000 100 (20)
7.831 · 10−10 7.992 · 10−10
MH
10000 1000 (50) 1.000 · 10−10 1.043 · 10−10
MH
5000 500 (50)
2.363 · 10−10 2.448 · 10−10
MH
1000 100 (50)
1.248 · 10−10 1.650 · 10−10
In order to test the influence of the linear dependence of the constraints, some active constraints at the point x∗ of the test problem were duplicated. There were no significant changes in the nature of the convergence of iterative processes on such tests. At the same time, the achieved accuracy has not changed. The dual problem solution required from 400 to 2000 iterations of subgradient methods. Moreover, the number of iterations depended to a greater extent on the stopping criteria, i.e. when the criteria were weakened, we got practically equivalent results. However, to improve reliability, the stopping criteria have been strengthened.
Method for Calculating the Air Pollution Emission Quotas
353
Based on the research conducted, the next conclusions can be drawn: 1. The number of active constraints in the optimal solution practically does not affect the accuracy of the resulting solution. 2. The solution of the problem using transformation (21) are less efficient compared to transformation (22). 3. The multistep MM method is unable to solve the problems with transformation (21). With transformation (22), all problems are solved with high accuracy. 4. The method with space extension MH in the case of transformation (21) allows to obtain a solution to the problem with high accuracy only for m = 100. In the case of transformation (22), all problems are solved with high accuracy. 5. A linear dependence of constraints, leading to a set of solutions to the dual problem, did not affect the solution efficiency by subgradient methods. 6. The dual approach enables us to estimate the accuracy of the solution, both by function and by variables. 7. The proposed simple method for transforming an infeasible solution into a feasible one, despite its non-optimality, is effective in terms of accuracy.
Fig. 1. Isolines of the calculated air pollution from emission sources in the industrial zone (left) and the selection of points with exceeding the MPC (quota points) on the border of the sanitary protection zone and the nearest living sector (right). (Color figure online)
In addition, we give a practical example of the problem (3)–(5) for a set of 251 sources of atmospheric pollution (160 spot, 14 linear and 77 areal) of a several nearby enterprises with a single sanitary protection zone (Fig. 1). Sources
354
V. Krutikov et al.
emit nitrogen dioxide into the atmosphere (MPC = 0.2 mg/m3 ). To calculate the quotas (vector x ), we used the ERA-AIR software complex [29]. The isolines of the maximum one-time surface concentrations obtained when calculating by the method [28] in the mode of searching for the maximum by wind speed and direction shown on the left side of Fig. 1. The calculation was carried out on a regular grid of 8 by 10 km with a step of 250 m. It can be seen that the pollution zone with an excess of the MPC (red line) covers a part of the sanitary protection zone and the living sector closest to the industrial site. Calculated points along the boundaries of the sanitary protection zone and the inner regions of the living sector shown on the right side of Fig. 1. All 270 points with excessive MPC (or other specified level) are automatically highlighted in red, this is a set of quota locations. For solving problem (3)–(5) in the selected quota locations, the influence coefficients Aji are calculated, forming the matrix A. All elements bj of the vector in this example are equal to the MPC, however, it is allowed to set their individual values. The vectors a and d of constraints on the search area for the solution x are specified as follows: the vector d is a set of outliers from all 251 sources for which the original calculation was carried out, vector a = 0.1d. Thus, the emission quota for each source cannot exceed its existing emission, and the reduction coefficient qi at the source cannot be less than 0.1. The priority indicators of enterprises pi , the determination of which is at the discretion of the municipality, are assumed to be the same: pi = 1 for all i. Thus, the initial data of the problem (3)–(5) are completely determined. The ERA-AIR software package included modules for calculating MPE using the previously mentioned methods [14], which allow finding a single particular solution to the system, or the optimal one based on the quadratic programming problem. After the release of the new regulations [3], a module has been added to this software that implements the corresponding method for solving problem (3)–(5), described in this work, for a set of emission sources of city plants. The solution to the problem for 251 sources and 270 constraints takes about 5 s on a personal computer with a 3.00 GHz processor. Table 4 shows a reduction for 10 sources, emissions from the remaining 241 sources do not require reduction. Due to the low dimension of the applied problem, for each of the methods minimizing the dual function in the case of transformation (21), Δ < 10−10 . According to the computational experiment, for solving such applied problems, the MH method can be recommended. This method is less dependent on the degree of problem degeneracy when using duplicate transformations (21) and (22) if one of them is inoperable. In all our experiments, transformation (22) enabled us to obtain a highly accurate solution.
Method for Calculating the Air Pollution Emission Quotas
355
Table 4. Sources for emission reduction, the existing total emission and the total value of MPE.
6
Source number Source code Existing total emission
Smallest possible emission
Reduction coefficient
MPE
i
di
ai
qi
xi
16
82
0.1936
0.0194
0.611
0.1183
155
26
0.1711
0.0171
0.139
0.0237
225
84
15.603
1.5603
0.478
7.4552
226
85
15.0961
1.5096
0.463
6.9936
227
86
4.4992
0.4499
0.908
4.0855
228
87
3.7946
0.3795
0.991
3.7621
229
88
5.8836
0.5884
0.787
4.6295
230
89
18.7785
1.8779
0.442
8.2951
231
90
3.8119
0.3812
0.886
3.3771
232
91
7.3953
0.7395
0.665
4.9174
Conclusion
Having formulated a dual problem with constraints on the positivity of variables, and the number of variables equal to the number of linear constraints of the primal problem, we proposed a method for solving the problem of air pollution emissions quotas. Such an approach enabled us to reduce the primal problem to a problem of lower dimension and obtain an estimate of the accuracy both by function and by variables of the primal problem. Then, the dual problem was then reduced to the problem of unconstrained minimization of a non-smooth function with the use of two types of transformations and was solved with the use of relaxation subgradient methods for non-smooth non-convex functions. We substantiated the applicability of the proposed constructions by an extensive computational experiment on test and applied problems with the number of variables up to 10000 and the number of linear constraints up to 1000. Our experiments demonstrated the ability of the proposed algorithms to obtain a solution with a high degree of accuracy.
References 1. Air quality in Europe – 2020 report (2020). https://www.eea.europa.eu/ publications/air-quality-in-europe-2020-report. Accessed 2 Feb 2021 2. Berlyand, M. E.: Prediction and Regulation of Air Pollution. Springer, Netherlands (1991). https://doi.org/10.1007/978-94-011-3768-3 3. The order of the Ministry of Natural Resources and the Environment of the Russian Federation No814 “On approval of the regulations for air pollution emissions quotas determining (with the exception of radioactive substances)”, Moscow, 29 Nov 2019
356
V. Krutikov et al.
4. Yu, B., Lee, W. S., Rafiq, S.: Air pollution quotas and the dynamics of internal skilled migration in Chinese cities. IZA discussion paper series. https://www.iza.org/publications/dp/13479/air-pollution-quotas-and-thedynamics-of-internal-skilled-migration-in-chinese-cities. Accessed 2 Feb 2021 5. Zhao, C., Kahn, M.E., Liu, Y., Wang, Z.: The consequences of spatially differentiated water pollution regulation in China. J. Environ. Econ. Manage. 88, 468–485 (2018). https://doi.org/10.1016/j.jeem.2018.01.010 6. Qin, Q., Liu, Y., Li, X., Li, H.: A multi-criteria decision analysis model for carbon emission quota allocation in China’s east coastal areas: efficiency and equity. J. Clean. Prod. 168, 410–419 (2017). https://doi.org/10.1016/j.jclepro.2017.08.220 7. Zhu, B., Jiang, M., He, K., Chevallier, J., Xie, R.: Allocating CO2 allowances to emitters in China: a multi-objective decision approach. Energy Policy 121, 441–451 (2018). https://doi.org/10.1016/j.enpol.2018.07.002 8. Tang, L., Wu, J., Yu, L., Bao, Q.: Carbon allowance auction design of China’s emissions trading scheme: a multi-agent-based approach. Energy Policy 102, 30– 40 (2017). https://doi.org/10.1016/j.enpol.2016.11.041 9. Yu, S., Wei, Y.M., Wang, K.: Provincial allocation of carbon emission reduction targets in China: an approach based on improved fuzzy cluster and Shapley value decomposition. Energy Policy 66, 630–644 (2014). https://doi.org/10.1016/j.enpol. 2013.11.025 10. Ren, J., Bian, Y., Xu, X., He, P.: Allocation of product-related carbon emission abatement target in a make-to-order supply chain. Comput. Ind. Eng. 80, 181–194 (2015). https://doi.org/10.1016/j.cie.2014.12.007 11. Tan, Q., Zheng, J., Ding, Y., Zhang, Y.: Provincial carbon emission quota allocation study in China from the perspective of abatement cost and regional cooperation. Sustainability 12, 8457 (2020). https://doi.org/10.3390/su12208457 12. Li, F., Emrouznejad, A., Yang, G., Li, Y.: Carbon emission abatement quota allocation in Chinese manufacturing industries: an integrated cooperative game data envelopment analysis approach. J. Oper. Res. Soc. 71(8), 1259–1288 (2020). https://doi.org/10.1080/01605682.2019.1609892 13. Mi, Z., et al.: Consumption-based emission accounting for Chinese cities. Appl. Energy 184, 1073–1081 (2016). https://doi.org/10.1016/j.apenergy.2016.06.094 14. The order of the Ministry of Natural Resources and the Environment of the Russian Federation No 66 “Regulations on using the system of aggregate calculations of atmospheric pollution for finding admissible quotas of industrial and motor transport emissions”, Moscow (1999) 15. Pozo, C., et al.: Reducing global environmental inequality: determining regional quotas for environmental burdens through systems optimization. J. Clean. Prod. 270, 121828 (2020). https://doi.org/10.1016/j.jclepro.2020.121828 16. Liu, H., Lin, B.: Cost-based modeling of optimal emission quota allocation. J. Clean. Prod. 149, 472e484 (2017). https://doi.org/10.1016/j.jclepro.2017.02.079 17. Wang, X., Cai, Y., Xu, Y., Zhao, H., Chen., J.: Optimal strategies for carbon reduction at dual levels in China based on a hybrid nonlinear grey-prediction and quota-allocation model. J. Clean. Prod. 83, 185e193 (2014). https://doi.org/10. 1016/j.jclepro.2014.07.015 18. Polyak, B. T.: Introduction to Optimization, New York (1987) 19. Chong, E.K.P., Zak, S.H.: An Introduction to Optimization. John Wiley & Sons, Inc., New York (2013) 20. 
Wolfe, P.: Note on a method of conjugate subgradients for minimizing nondifferentiable functions. Math. Program. 7(1), 380—383 (1974). https://doi.org/10.1007/ BF01585533
Method for Calculating the Air Pollution Emission Quotas
357
21. Nesterov, Yu.: Primal-dual subgradient methods for convex problems. Math. Program. 120, 221–259 (2009). https://doi.org/10.1007/s10107-007-0149-x 22. Lemarechal, C.: An extension of Davidon methods to non-differentiable problems. In: Balinski, M.L., Wolfe, P. (eds.) Nondifferentiable Optimization. Mathematical Programming Studies, vol 3. pp. 95–109. Springer, Berlin, Heidelberg (1975). https://doi.org/10.1007/BFb0120700 23. Bertsekas, D.P.: Convex Optimization Algorithms. MA., Athena Scientific, Belmont (2015) 24. Krutikov, V.N., Gorskaya, T.A.: Family of relaxation subgradient methods with two-rank correction of metric matrices. Econ. Math. Methods 45(4), 37–80 (2009) 25. Krutikov, V.N., Vershinin, Y.N.: The subgradient multistep minimization method for nonsmooth high-dimensional problems. Vestn. Tomsk. Gos. Univ. Mat. Mekh. 3(29), 5–19 (2014) 26. Krutikov, V.N., Samoilenko, N.S., Meshechkin, V.V.: On the properties of the method of minimization for convex functions with relaxation on the distance to extremum. Autom. Remote Control 80(1), 102—111 (2019) 27. Meshechkin, V.V., Bykov, A.A., Krutikov, V.N., Kagan, E.S.: Distributive model of maximum permissible emissions of enterprises into the atmosphere and its application. In: IOP Conference Series: Earth and Environmental Science, vol. 224(1), 5 February 2019, article ID 012019 (2019) 28. The order of the Ministry of Natural Resources and the Environment of the Russian Federation No273 “On the approval of methods for calculating the dispersion of emissions of harmful (polluting) substances in the air”, Moscow (2017) 29. The ERA-AIR software complex. https://lpp.ru/. Accessed 18 Feb 2021
Bilevel Models for Socially Oriented Strategic Planning in the Natural Resources Sector
Sergey Lavlinskii1,2(B), Artem Panin1,2, and Alexander Plyasunov1,2
1 Sobolev Institute of Mathematics, Novosibirsk, Russia
2 Novosibirsk State University, Novosibirsk, Russia
{lavlin,apljas}@math.nsc.ru
Abstract. This article continues the authors' research into cooperation between public and private investors in a resource region. Unlike previous works, here an attempt is made to take the interests of the local population into account explicitly. This work aims to analyze the efficiency of partnership mechanisms using a game-theoretic Stackelberg model. Such mechanisms determine the economic policy of the state and play an important role in addressing a whole range of issues related to the strategic management of the natural resource sector in Russia. The models are formulated as bilevel mathematical programming problems with two optimization criteria at the upper level. Effective solution algorithms based on metaheuristics, capable of solving problems of large dimension, are developed. This opens up the possibility of studying, on real data, the properties of the Stackelberg equilibrium, which determines the design of the mechanism for forming an economic policy that explicitly accounts for the interests of the local population. The simulation results allow us not only to assess the impact of various factors on the effectiveness of the generated subsoil development program but also to formulate the basic principles that should guide the state in the strategic management process.
Keywords: Stochastic local search · Stackelberg game · Bilevel mathematical programming problems · Strategic planning · Subsoil development program
Introduction
The designing of mechanisms to reconcile the long-term interests of the government, the private investor, and the population in the development of the mineral raw materials base (MRMB) is one of the key tasks of strategic planning in a resource-rich region. This task needs to be addressed to ensure investment attractiveness, budget revenues, compliance with environmental restrictions, and an adequate increase in the living standards of the population. What would that take?
The investor needs support to overcome the barriers posed by the lack of necessary infrastructure and the high environmental costs, which are very typical of most of the Siberian and Far Eastern regions of Russia. The government seeks to obtain as much rent as possible in the form of tax payments, exclusive of costs, and to increase budget revenues, using them as a base for developing the territory. The population concerns itself primarily with their income levels and with environmental pollution from the development of mineral resources. How can a compromise be reached between the three stakeholders in the strategic planning process? In our previous works [3,8,9,17], we examined models for reconciling the interests of the government and the private investor whereby the interests of the local population were taken into account only indirectly. The present paper is an attempt at building a socially oriented strategic planning model that pays explicit attention to the interests of the population. The model is formulated as a bilevel mathematical programming problem with two criteria at the upper level, one of which is the objective function of the local population. This approach addresses the features of the hierarchy of interactions between the government and the private investor in the mineral raw materials sector [1,2] and allows for reaching a compromise between the interests of the budget, the population, and the private investor and for designing a natural resources development program that would be effective in terms of sustainable development prospects.
1 Model Toolkit
In order to develop an effective mechanism of economic policy development that ensures a compromise between the interests of the government, the population, and the private investor, we propose to use the Stackelberg equilibrium search procedure. In the process of interaction (two periods, sequential choice), the government acts as a leader by launching some of the environmental and infrastructure projects and creating incentives for the investor to participate in infrastructure construction within the classical public-private partnership model [10–12]. The private investor thus plays the role of a follower, rationally choosing an MRMB development program in response to the actions of the government, which addresses the issue of additional investor costs associated with environmental protection and spatial referencing of the projects. This conceptual model, which assumes that the government makes the first move by investing public finances into infrastructure development and the implementation of some of the environmental protection projects, is quite consistent with the existing practice. In the case of regions with a low level of infrastructure development in general and housing and communal services in particular (these are almost all the regions in the east of Russia), there are convincing grounds for the government, which represents the interests of society, to also participate in private projects to pursue environmental protection goals. Firstly, new environmental protection facilities can perform not only production-related but also important social functions (treatment facilities, landfills for waste disposal, etc.). Most settlements in remote regions of Siberia and the Far East have
never had any such facilities. Their construction allows for the systemic development of comfortable housing and socially relevant facilities and helps resolve the long-standing issue of solid household waste. The second reason is that environmental protection projects often need to be launched from scratch. This is why they may objectively require an initial investment on such a scale that the private investor would lose motivation for production development.
The economic mathematical toolkit for solving the thus formulated task is a bilevel mathematical programming problem, the solution of which defines a Stackelberg equilibrium [7]. It is this solution that can be used as the basis for a socially oriented development strategy that harmonizes the long-term interests of the government, the population, and the private investor in the process of acquiring the natural wealth of a resource-rich region.
A formal description of the model can be presented as follows. We use the following notation:
T is a planning horizon; I is a set of investment projects; J is a set of infrastructure development projects; K is a set of environmental projects.
Investment project i in year t: CFP_it is the cash flow (the difference between the incomes and expenses of all kinds); EPP_it is the environmental damage from the implementation of the project; ZPP_it is the salary received by the population during the implementation of the project; DBP_it is the government revenue from the implementation of the project.
Infrastructure development project j in year t: ZI_jt is the cost of implementation of the project; ZPI_jt is the salary received by the population during the implementation of the project; EPI_jt is the environmental damage from the implementation of the project; VDI_jt is the government revenue from local economic development as a result of the implementation of the project.
Environmental project k in year t: ZPE_kt is the salary received by the population during the implementation of the project; EDE_kt is the income from the implementation of the project; ZE_kt is the cost of implementation of the project.
The discounts of the government, the population, and the investor are DG, DP, and DI, respectively.
The budget constraints: b^G_t is the government budget in year t; b^I_t is the investor budget in year t.
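As an illustration only, the sketch below shows one way this notation could be carried in code. The field names mirror the symbols above, while the dataclass layout and the default discount values are assumptions for illustration (the defaults echo the DG = 5%, DP = 5%, DI = 15% settings used in the numerical experiment later in the paper), not a data format prescribed by the authors.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InvestmentProject:
    """Investment (production) project i: one value per year t = 0..T-1."""
    CFP: List[float]  # cash flow
    EPP: List[float]  # environmental damage
    ZPP: List[float]  # salary received by the population
    DBP: List[float]  # government revenue

@dataclass
class InfrastructureProject:
    """Infrastructure development project j."""
    ZI: List[float]   # implementation costs
    ZPI: List[float]  # salary received by the population
    EPI: List[float]  # environmental damage
    VDI: List[float]  # government revenue from local economic development

@dataclass
class EnvironmentalProject:
    """Environmental project k."""
    ZE: List[float]   # implementation costs
    ZPE: List[float]  # salary received by the population
    EDE: List[float]  # incomes from the implementation of the project

@dataclass
class ModelData:
    T: int
    invest: List[InvestmentProject]
    infra: List[InfrastructureProject]
    env: List[EnvironmentalProject]
    bG: List[float]   # government budget by year
    bI: List[float]   # investor budget by year
    DG: float = 0.05  # government discount
    DP: float = 0.05  # population discount
    DI: float = 0.15  # investor discount
```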
The matrices μ and ν define the relationship between the projects, where μ_ij is a coherence indicator for the infrastructure and investment projects, i ∈ I, j ∈ J, and ν_ik is a coherence indicator for the environmental and investment projects, i ∈ I, k ∈ K:
μ_ij = 1 if the implementation of investment project i requires the implementation of infrastructure development project j, and μ_ij = 0 otherwise;
ν_ik = 1 if the implementation of investment project i requires the implementation of environmental project k, and ν_ik = 0 otherwise.
We use the following integer variables:
x̄_j = 1 if the government is prepared to launch infrastructure development project j (the government has included it into the budget expenses), and x̄_j = 0 otherwise;
x_j = 1 if the government launches infrastructure development project j, and x_j = 0 otherwise;
ȳ_k = 1 if the government is prepared to launch environmental project k (the government has included it into the budget expenses), and ȳ_k = 0 otherwise;
y_k = 1 if the government launches environmental project k as agreed with the investor, and y_k = 0 otherwise;
v_j = 1 if the investor launches infrastructure development project j, and v_j = 0 otherwise;
z_i = 1 if the investor launches investment project i, and z_i = 0 otherwise;
u_k = 1 if the investor launches environmental project k, and u_k = 0 otherwise.
W̄_t, W_t is the schedule of compensation payments for infrastructure development in year t proposed by the government and used by the investor, respectively.
The upper-level problem PG can be formulated as follows:
\[ \sum_{t\in T}\Big[\sum_{i\in I}(ZPP_{it}-EPP_{it})z_i+\sum_{j\in J}(ZPI_{jt}-EPI_{jt})(x_j+v_j)+\sum_{k\in K}(ZPE_{kt}+EDE_{kt})(y_k+u_k)\Big]\big/(1+DP)^t \;\to\; \max_{x,y,W,v,u,z} \quad (1) \]
\[ \sum_{t\in T}\Big[\sum_{i\in I}DBP_{it}z_i-\sum_{j\in J}ZI_{jt}x_j-\sum_{k\in K}ZE_{kt}y_k+\sum_{j\in J}VDI_{jt}(x_j+v_j)-W_t\Big]\big/(1+DG)^t \;\to\; \max_{x,y,W,v,u,z} \quad (2) \]
subject to:
\[ \sum_{1\le t\le\omega}\Big[\sum_{j\in J}ZI_{jt}\bar{x}_j+\sum_{k\in K}ZE_{kt}\bar{y}_k+\bar{W}_t\Big] \le \sum_{1\le t\le\omega} b^G_t; \quad \omega\in T; \quad (3) \]
\[ \bar{W}_t \ge 0; \quad t\in T; \quad (4) \]
\[ \bar{W}_t = 0; \quad 0\le t\le T_0; \quad (5) \]
\[ (x,y,W,z,u,v)\in F^*(\bar{x},\bar{y},\bar{W}). \quad (6) \]
The set F^* is the set of optimal solutions of the following low-level parametric investor problem PI(x̄, ȳ, W̄):
\[ \sum_{t\in T}\Big[\sum_{i\in I}CFP_{it}z_i-\sum_{k\in K}ZE_{kt}u_k-\sum_{j\in J}ZI_{jt}v_j+W_t\Big]\big/(1+DI)^t \;\to\; \max_{x,y,W,z,u,v} \quad (7) \]
subject to:
\[ \sum_{t\in T}\Big[W_t-\sum_{k\in K}ZE_{kt}u_k-\sum_{j\in J}ZI_{jt}v_j\Big]\big/(1+DI)^t \ge 0; \quad (8) \]
\[ \sum_{1\le t\le\omega}\Big[\sum_{k\in K}ZE_{kt}u_k+\sum_{j\in J}ZI_{jt}v_j-\sum_{i\in I}CFP_{it}z_i-W_t\Big] \le \sum_{1\le t\le\omega} b^I_t; \quad \omega\in T; \quad (9) \]
\[ \sum_{t\in T}\Big[\sum_{i\in I}(ZPP_{it}-EPP_{it})z_i+\sum_{j\in J}(ZPI_{jt}-EPI_{jt})(x_j+v_j)+\sum_{k\in K}(ZPE_{kt}+EDE_{kt})(y_k+u_k)\Big]\big/(1+DP)^t \ge 0; \quad (10) \]
\[ x_j+v_j \ge \mu_{ij}z_i; \quad i\in I,\ j\in J; \quad (11) \]
\[ x_j+v_j \le 1; \quad j\in J; \quad (12) \]
\[ y_k+u_k \ge \nu_{ik}z_i; \quad i\in I,\ k\in K; \quad (13) \]
\[ y_k+u_k \le 1; \quad k\in K; \quad (14) \]
\[ \sum_{i\in I}\nu_{ik}z_i \ge y_k+u_k; \quad k\in K; \quad (15) \]
\[ \sum_{t\in T}\Big[\sum_{i\in I}DBP_{it}z_i-W_t\Big]\big/(1+DG)^t \ge 0; \quad (16) \]
\[ x_j \le \bar{x}_j; \quad j\in J; \quad (17) \]
\[ y_k \le \bar{y}_k; \quad k\in K; \quad (18) \]
\[ W_t \le \bar{W}_t; \quad t\in T; \quad (19) \]
\[ x_j, y_k, v_j, z_i, u_k \in \{0,1\}; \quad i\in I,\ k\in K,\ j\in J. \quad (20) \]
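To make the structure of the lower-level problem concrete, here is a brute-force sketch in Python that evaluates the investor's best response on a tiny hypothetical instance. It fixes the government variables, keeps only simplified versions of constraints (9), (11) and (13), and enumerates the Boolean choices; it is not the CPLEX-based procedure used by the authors, and all numbers are invented for illustration.

```python
from itertools import product
import numpy as np

def npv(stream, rate):
    """Discounted sum of a yearly stream: sum_t stream[t] / (1 + rate)^t."""
    s = np.asarray(stream, dtype=float)
    return float(np.sum(s / (1.0 + rate) ** np.arange(len(s))))

def investor_best_response(CFP, ZI, ZE, W, mu, nu, bI, DI):
    """Enumerate the investor's Boolean choices (z, v, u) and return the one
    maximizing criterion (7).  Only the coupling constraints (11), (13) and a
    cumulative budget check in the spirit of (9) are enforced; the remaining
    constraints of the model are omitted in this sketch."""
    nI, nJ, nK, T = len(CFP), len(ZI), len(ZE), len(W)
    best, best_val = None, -np.inf
    for z in product((0, 1), repeat=nI):
        for v in product((0, 1), repeat=nJ):
            for u in product((0, 1), repeat=nK):
                # (11), (13): a production project needs all its prerequisites
                if any(z[i] and mu[i][j] and not v[j] for i in range(nI) for j in range(nJ)):
                    continue
                if any(z[i] and nu[i][k] and not u[k] for i in range(nI) for k in range(nK)):
                    continue
                # yearly cash flow of the investor under this choice
                flow = [sum(CFP[i][t] * z[i] for i in range(nI))
                        - sum(ZI[j][t] * v[j] for j in range(nJ))
                        - sum(ZE[k][t] * u[k] for k in range(nK))
                        + W[t] for t in range(T)]
                # (9): cumulative expenses never exceed the cumulative budget
                if any(sum(-flow[s] for s in range(t + 1)) > sum(bI[:t + 1]) for t in range(T)):
                    continue
                val = npv(flow, DI)          # criterion (7)
                if val > best_val:
                    best, best_val = (z, v, u), val
    return best, best_val

# Tiny hypothetical instance: 2 deposits, 1 road, 1 treatment facility, T = 3.
CFP = [[-40, 60, 60], [-20, 25, 25]]
ZI  = [[30, 0, 0]]
ZE  = [[10, 0, 0]]
mu  = [[1], [1]]      # both deposits need the road
nu  = [[1], [0]]      # only deposit 0 needs the environmental project
W   = [0, 20, 20]     # compensation schedule offered by the government
bI  = [80, 80, 80]
print(investor_best_response(CFP, ZI, ZE, W, mu, nu, bI, DI=0.15))
```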
In the formulated model, the first objective function (1) expresses the interests of the population and takes into account the wages earned and the incomes from the implementation of environmental projects minus a cost estimate for environmental pollution damage. This approach is consistent with the population being primarily concerned with their income levels and with environmental pollution from mineral resources development. The second objective function (2) expresses the government’s desire to obtain as much rent as possible in the form of tax payments, exclusive of costs, and to increase budget revenues, using them as a base for developing the territory. The government starts the infrastructure compensation payments to the investor after a lapse of T0 years (e.g., since the time of receipt of the first tax payments from the investor) (4), (5). The schedule of the compensation payments should ensure: (i) for the investor, a compensation of his infrastructure expenses with a discount factor (8), and (ii) for the government, a balance between the budget revenues and the compensation payments to the investor (16). Constraints (11)–(15) formalize the relationships between the industrial, infrastructural, and environmental projects. Each infrastructural and environmental project can only be launched by one of the partners and must be necessary for the realization of some industrial project. An infrastructural or environmental project can likewise be assigned to the government only under the condition that the government has put the respective project onto its list (17), (18). How does the setting of the proposed model compare with the previous works of the authors? The whole range of mechanisms for the formation of investment policy (from the classical PPP model to its “Russian” modification policy) has been investigated in works [3,8,9,17,18]. Here, reaching a compromise is based on the procedure for coordinating the interests of the budget and the private investor. In this work, the main part of the relationships describing the mechanism of the partnership between the government and the private investor is the same, but a step has been taken towards the ideas of a “green economy”, and the population is becoming a full-fledged participant in the formation of investment policy. It gets a voice - an objective function (1) at the top level, and the ability to block “dirty” mining technologies (10). In the objective function of the government (2) and the ratio of the balance of tax revenues and compensation to the investor (16), purely budgetary components remain. As a result, the model output provides economic policy parameters: {x, y, W, v, u, z}, which realize a compromise between the interests of the budget, the population, and the private investor.
2 Computational Complexity and Solution Algorithm
The investigated two-criterion problem (1)–(20) is a generalization of the problem with the sum of criteria, which was considered in [18]. Therefore, the investigated problem of public-private partnership is not just NP-hard, but belongs to the class of Σ2P-hard problems. It is a much more difficult problem than any NP (NPO) problem [18]. Therefore, it is not necessary to develop an efficient exact algorithm for solving the problem even on relatively small dimensions. An efficient approximate solution algorithm is proposed below. In previous works on public-private partnership [8,9,14,15,17,18], a hybrid algorithm based on stochastic local search and the CPLEX solver in various modifications was proposed for the solution. The main idea of the algorithm is a sequential solution of the one-level problem of the investor with a constraint on the value of the objective function of the government (including the interests of the population) for calculating a "good" starting solution for the stochastic local search. To solve problem (1)–(20) and successfully conduct the numerical experiment described below, a new modification of the algorithm is proposed. The main idea of the numerical experiment is to move away from the search for the Pareto frontier to the search for the economically feasible part of its approximation. The government, as a person making decisions for itself (filling the budget) and for the population, must find a balance, a compromise between the short-term interests of the population of the planning and implementation regions of public-private partnership and the long-term interests of the government itself (the growth of the welfare of the entire population). To make the appropriate correct (effective) management decisions, it is reasonable to analyze the total revenues of the government (budget) and the population when the level of income of the population changes. What does it mean? We solve the bilevel problem (1), (3)–(20) with the population objective function at the upper level and determine POF_max, i.e., the maximum possible value of (1). We choose the number of levels NPI to take into account the population interests. Then, we solve NPI bilevel problems (21), (22), (3)–(20) with the aggregate criterion (21) and an additional constraint (22) at the upper level:
\[ \sum_{t\in T}\Big[\sum_{i\in I}(DBP_{it}+ZPP_{it}-EPP_{it})z_i+\sum_{j\in J}(VDI_{jt}+ZPI_{jt}-EPI_{jt})(x_j+v_j)-\sum_{j\in J}ZI_{jt}x_j-\sum_{k\in K}ZE_{kt}y_k-W_t+\sum_{k\in K}(ZPE_{kt}+EDE_{kt})(y_k+u_k)\Big]\big/(1+DP)^t \;\to\; \max_{x,y,W,v,u,z} \quad (21) \]
\[ \sum_{t\in T}\Big[\sum_{i\in I}(ZPP_{it}-EPP_{it})z_i+\sum_{j\in J}(ZPI_{jt}-EPI_{jt})(x_j+v_j)+\sum_{k\in K}(ZPE_{kt}+EDE_{kt})(y_k+u_k)\Big]\big/(1+DP)^t \ge POF_{max}\cdot PI, \quad (22) \]
PI = l/NPI, l = 0, ..., NPI. For each l we obtain the solution {x, y, W, v, u, z}_l. The set {{x, y, W, v, u, z}_l, l = 0, ..., NPI} is the economically feasible part of the Pareto frontier of problem (1)–(20).
The main difficulty in developing an effective solution algorithm is that the number of experiments is too large (it is proportional to NPI). Therefore, the following solution algorithm scheme was proposed. As before [18], by the solution of the problem we mean the value of the government variables (x̄, ȳ, W̄). To calculate the value of the objective function, we solve the mixed-integer linear investor problem using the solver (software package) CPLEX. We describe the algorithm scheme as follows (a code sketch is given after the parameter settings below):
Step 1: Find the starting solution to the problem by solving the one-level problem of the government with the investor constraints. In other words, let the government make decisions for all players. As the experiment shows, this starting solution is quite effective.
Step 2: The government variables can be divided into two subsets: Boolean (x̄, ȳ) and continuous (W̄). Apply the following greedy procedure. First, fix the Boolean variables and find the values of the continuous variables:
Step 2.1: Randomly generate a set (of a given size) of continuous variable values as follows.
Step 2.2: First, calculate the budget expenditures on environmental and infrastructure projects in each year according to the values of the Boolean variables and subtract them from the budget. Further, in each year, starting from the first, randomly (with a uniform distribution) choose W̄_t from the interval from 0 to the maximum balance of the budget in year t, taking into account the transfer of the balance from previous years.
Step 2.3: From the set of continuous variable values, choose the one that gives the maximum income of the upper-level problem.
Step 3: Fix the continuous variables and choose the values of the Boolean variables:
Step 3.1: Randomly generate a set (of a given size) of Boolean variable assignments as follows (based on the current assignment).
Step 3.2: With probability 1/(|J| + |K|), change each component of the Boolean vectors to the inverse (1 to 0 and 0 to 1).
Step 3.3: From the set of Boolean variable assignments, choose the one that gives the maximum income of the upper-level problem.
Step 4: If the stopping criterion is not met (the limit on the number of iterations or on the counting time), then go to Step 2.
Such an algorithm combines the ideas of stochastic local search and alternating heuristics and allows finding an effective solution in a relatively short
period of time. To manage the experiment, the following values of the algorithm parameters were empirically selected: limitation on the counting time is 500 s; the size of the sets of Boolean and continuous variables is 1000.
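The following Python sketch mirrors Steps 1–4 above. The evaluation of the upper-level objective is abstracted into a user-supplied function `evaluate`, which is assumed to solve the investor's mixed-integer problem internally (with CPLEX or any other solver) and to return a very small value for infeasible schedules; the helper `budget_balance` stands in for the budget carry-over computation of Step 2.2. Both names are placeholders introduced here, not functions from the paper, and the "launch nothing" start is a simplification of Step 1.

```python
import random
import time

def stochastic_alternating_search(n_bool, T, budget_balance, evaluate,
                                  set_size=1000, time_limit=500.0):
    """Alternating stochastic local search over the government variables:
    a Boolean vector (x_bar, y_bar) of length n_bool = |J| + |K| and a
    continuous compensation schedule W_bar of length T."""
    start = time.time()

    # Step 1: starting solution; "launch nothing" is used here as a stand-in
    # for the one-level government problem solved in the paper.
    boolean = [0] * n_bool
    W = [0.0] * T
    best_val = evaluate(boolean, W)

    while time.time() - start < time_limit:          # Step 4: stopping criterion
        # Step 2: Boolean variables fixed, sample a set of continuous schedules
        for _ in range(set_size):
            cand_W = [random.uniform(0.0, budget_balance(boolean, t)) for t in range(T)]
            val = evaluate(boolean, cand_W)
            if val > best_val:                        # Step 2.3: keep the best one
                best_val, W = val, cand_W
        # Step 3: continuous variables fixed, flip each Boolean coordinate
        # with probability 1/(|J| + |K|)              # Step 3.2
        for _ in range(set_size):
            cand_b = [b ^ (random.random() < 1.0 / n_bool) for b in boolean]
            val = evaluate(cand_b, W)
            if val > best_val:                        # Step 3.3
                best_val, boolean = val, cand_b
    return boolean, W, best_val
```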
3 Numerical Experiment
The database of the model (1)–(20) builds upon special forecasting models, which describe in detail the processes of realization of all the three types of projects [17]. The actual data describe a fragment of the Zabaykalsky Krai MRMB, which consists of 20 deposits of polymetallic ores. The experiment considers the implementation of 20 environmental and 10 infrastructural projects (railroad, powerlines, auto roads), combined in such a way that the realization of the entire infrastructural and environmental program would enable the launching of all the MRMB development (i.e., industrial) projects. What did we get from directly accounting for the population's interests in choosing a development strategy? First of all, we can now try to answer the question about the economic policy levers that benefit the population. And here it is important to take into account the ecological part of the development strategy. Let us designate as ECL and ELL, respectively, the levels of environmental costs and environmental losses in the development program {x, y, W, v, u, z}. The ECL parameter is determined by the ratio between the total costs of implementing environmental and production projects. The ELL parameter characterizes the ratio between the environmental damage and the positive effects of implementing the program:
\[ ELL = \sum_{t\in T}\Big[\sum_{i\in I}EPP_{it}z_i+\sum_{j\in J}EPI_{jt}(x_j+v_j)\Big]\big/(1+DG)^t \Big/ \sum_{t\in T}\Big[\sum_{i\in I}(DBP_{it}+ZPP_{it})z_i+\sum_{j\in J}(VDI_{jt}+ZPI_{jt})(x_j+v_j)+\sum_{k\in K}(ZPE_{kt}+EDE_{kt})(y_k+u_k)\Big]\big/(1+DG)^t. \quad (23) \]
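A direct transcription of the ELL indicator (23) and of the verbal definition of ECL into code might look as follows; the yearly aggregate streams are assumed to be precomputed from a given program {x, y, W, v, u, z}, and the exact normalization of ECL is an assumption, since the paper only describes it as a ratio of cost totals.

```python
import numpy as np

def npv(stream, rate):
    """Discounted sum: sum_t stream[t] / (1 + rate)^t."""
    s = np.asarray(stream, dtype=float)
    return float(np.sum(s / (1.0 + rate) ** np.arange(len(s))))

def ELL(damage_by_year, effect_by_year, DG):
    """Environmental loss level (23): discounted environmental damage of the
    chosen program divided by its discounted positive effects.
    damage_by_year[t] = sum_i EPP_it z_i + sum_j EPI_jt (x_j + v_j),
    effect_by_year[t] = sum_i (DBP_it + ZPP_it) z_i
                      + sum_j (VDI_jt + ZPI_jt)(x_j + v_j)
                      + sum_k (ZPE_kt + EDE_kt)(y_k + u_k)."""
    return npv(damage_by_year, DG) / npv(effect_by_year, DG)

def ECL(env_cost_by_year, prod_cost_by_year):
    """Environmental cost level: total costs of the environmental projects in
    the program relative to the total costs of the production projects."""
    return float(np.sum(env_cost_by_year)) / float(np.sum(prod_cost_by_year))

# hypothetical program over three planning years
print(ELL([5.0, 8.0, 8.0], [20.0, 60.0, 70.0], DG=0.05))
print(ECL([12.0, 0.0, 0.0], [50.0, 10.0, 10.0]))
```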
Figures 1 and 2 show for N P I = 5 the results of calculations based on actual data using model (1)–(20) with a varying level of environmental losses of the mining technologies used ELP . This parameter multiplies the initial environmental damage EP P and thus directly affects the population objective function (1). A distinctive feature of today is the use of ecologically faulty technology in the mineral resources sector. How does the government respond to the growing pollution? At P I = 0, the government takes no direct account of the population interests in the decision-making process. Consequently, as ELP increases, the government provides no support for the implementation of environmental projects (Fig. 1) and retracts from infrastructure projects. As a result, fewer deposits are developed, and the functionals of the government and the investor decrease. At P I > 0 with increasing levels of environmental pollution, the government adapts its economic policy following P I, i.e., the degree of taking into account the population interests. At high environmental losses, the government builds
Fig. 1. The number of environmental projects implemented by the government and the number of production projects launched by the investor.
more infrastructure as PI increases, and it also implements more environmental projects, which collectively open up a larger number of deposit development projects for the investor (Fig. 2). Thus, an increase in the degree of taking into account the population interests leads to an increase in the value of the population objective function (1), which means that the government chooses greener yet less budget-friendly projects, which together reduce functional (2). This behavior of the government is significantly different from what we have seen in previous models [9,17,18]: in those models, the government did not undertake any environmental projects, focusing its assistance to the investor on the compensation mechanism.
Fig. 2. Objective functions of the population (1) and the government (2).
The discount of the population, which spends its human capital, is substantively justified in the model, and it largely affects the equilibrium solution. At DG = 5% and DI = 15%, the government chooses a policy of supporting the investor in the implementation of environmental projects, depending on the discount of the population. The DP range within which the government covers a part of the environmental costs, increases with an increasing degree of taking into account the population interests P I. Small values of the population discount, coupled with a small P I, lead to the government pursuing a development program with high environmental losses (Fig. 3).
Fig. 3. The number of environmental projects implemented by the government and the level of environmental losses in the development program ELL.
The discount of the government is a factor determining the architecture of cooperation in the process of extracting natural resources. It exerts a significant influence on the government functional (2) only at small values of P I. As soon as the government takes into account the population interests, it is forced to choose small values of the discount despite the higher environmental protection costs (Fig. 4, DI = 15%, DP = 5%).
Fig. 4. The government objective function and the level of environmental costs ECL.
The discount of the investor, DI, best shows how favorable the investment climate is in the Russian natural resources sector. Figures 5 and 6 show the results of calculations for fixed values of the discounts of the government and the population: DG = 5% and DP = 5%. The policy of ignoring the population interests in an uncomfortable investment climate leads to a detrimental environmental situation with the maximum ECL (environmental cost level) and ELL (environmental loss level). The investor also loses from such a policy and benefits from an increase in PI, which yields an increase in the investor functional at any value of its discount. When the investment climate deteriorates, accompanied by an increase in DI, the government objective function decreases at a rate that depends on PI; i.e.,
Fig. 5. The levels of environmental costs and environmental losses in the development program.
the greater the degree of taking into account the population interests, the faster the decrease in the functional. However, at small values of DI, which correspond to comfortable conditions for the investor in mineral resources development, the value of the government objective function (2) grows with an increase in the degree of taking into account the population interests (Fig. 6).
Fig. 6. Objective functions of the investor and the government.
What is new, in comparison with the authors' previous works, about taking the interests of the population into account in the model of investment program formation? In the optimal solutions of the models [17,18], the government only took upon itself the construction of a part of the infrastructure. In model (1)–(20), the state expands its toolkit for helping the investor and also implements some environmental projects. At the same time, the scope of "state" environmental construction expands as the level of pollution caused by the mining technologies in use grows. This is consistent with the general theory of sustainable development and makes it possible to find a reasonable compromise between efficiency and environmental impact, the quality of which largely determines the standard of living of the region's population.
4 Results and Discussion
The bilevel mathematical programming model was proposed for reconciling the long-term interests of the government, the private investor, and the population in the process of developing the mineral raw materials base. To this end, algorithms were constructed for solving problems of higher dimensions and formulating real-life spatial development plans. This approach sets a foundation for a practical methodology of socially oriented strategic planning in the natural resources sector. Numerical experiments were carried out using actual data, which allowed us to formulate the following conclusions, which may be useful in mineral resources sector management. 1. When working with investors that may potentially be using faulty technology to extract natural resources, the government should pay direct attention to the interests of the local population. Otherwise, it will be forced to retract from its infrastructure projects thereby reducing the number of deposits to be developed and blunting the effectiveness of the program both for the budget and the investor. 2. When the government pays considerable attention to the population interests, it tends to choose greener yet less budget-friendly projects thereby reducing the objective function of the government. This not only stabilizes the social climate in the region but also makes it more attractive for private investors. 3. In resource-rich regions with a comfortable investment climate, consideration of the population interests in the strategic planning process may not only create an increase in the investor’s efficiency but also improve the value of the government objective function. It is here that one should look for a development program that meets the principles of a “green” economy, which, in turn, opens up prospects for sustainable development. Acknowledgements. The study was carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0014). This work was financially supported by the Russian Foundation for Basic Research (projects numbers 20-010-00151 and 19-410-240003).
References 1. Glazyrina, I.P., Kalgina, I.S., Lavlinskii, S.M.: Problems in the development of the mineral and raw-material base of Russia’s Far East and prospects for the modernization of the region’s economy in the framework of Russian-Chinese cooperation. Reg. Res. Russ. 3(4), 405–413 (2013). https://doi.org/10.1134/S2079970514010055 2. Glazyrina, I.P., Lavlinskii, S.M., Kalgina, I.S.: Public-private partnership in the mineral resources complex of Zabaikalskii krai: Problems and prospects. Geogr. Nat. Resour. 35(4), 359–364 (2014). https://doi.org/10.1134/S1875372814040088 3. Glazyrina, I., Lavlinskii, S.: Transaction costs and problems in the development of the mineral and raw-material base of the resource region. J. New Econ. Assoc. New Econ. Assoc. 38(2), 121–143 (2018)
4. Weisbrod, G., Lynch, T., Meyer, M.: Extending monetary values to broader performance and impact measures: transportation applications and lessons for other fields. Eval. Program Plann. 32, 332–341 (2009) 5. Lakshmanan, T.R.: The broader economic consequences of transport Infrastructure investments. J. Transp. Geogr. 19(1), 1–12 (2011) 6. Mackie, P., Worsley, T., Eliasson, J.: Transport appraisal revisited. Res. Transp. Econ. 47, 3–18 (2014) 7. Dempe, S.J.: Foundations of Bilevel Programming. Kluwer Academ. Publishers, Dordrecht (2002) 8. Lavlinskii, S.M., Panin, A.A., Plyasunov, A.V.: A bilevel planning model for public–private partnership. Autom. Remote. Control. 76(11), 1976–1987 (2015). https://doi.org/10.1134/S0005117915110077 9. Lavlinskii, S., Panin, A., Pliasunov, A.: Stackelberg model and public-private partnerships in the natural resources sector of Russia. In: Lecture Notes in Computer Sciences, vol. 11548, pp. 158–171 (2019). https://doi.org/10.1007/978-3-03022629-9-12 10. Reznichenko, N. V.: Public-private partnership models. Bulletin of St. Petersburg Univ., Series 8: Management. 4, 58–83 (2010). (in Russian) 11. Quiggin, J.: Risk, PPPs and the public sector comparator. Aust. Account. Rev. 14(33), 51–61 (2004) 12. Grimsey, D., Levis, M.K.: Public Private Partnerships: The Worldwide Revolution in Infrastructure Provision and Project Finance. Edward Elgar, Cheltenham (2004) 13. Lavlinskii, S.M.: Public-private partnership in a natural resource region: ecological problems, models, and prospects. Stud. Russ. Econ. Dev. 21(1), 71–79 (2010). https://doi.org/10.1134/S1075700710010089 14. Bondarenko, A.N., Bugueva, T.V., Dedok, V.A.: Inverse problems of anomalous diffusion theory: an artificial neural network approach. J. Appl. Ind. Math. 10(3), 311–321 (2016). https://doi.org/10.1134/S1990478916030017 15. Lavlinskii, S., Panin, A., Pliasunov, A.: Public-private partnership models with tax incentives: numerical analysis of solutions. CCIS 871, 220–234 (2018). https://doi. org/10.1007/978-3-319-93800-4-18 16. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti- Spaccamela, A., Protasi, M.: Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties. Springer, Berlin (1999) 17. Lavlinskii, S., Panin, A., Plyasunov, A.: The Stackelberg model in territotial planning. Autom. Remote. Control. 80(2), 286–296 (2019) 18. Lavlinskii, S., Panin, A., Plyasunov, A.: Bilevel models for investment policy in resource-rich regions. In: Kochetov, Y., Bykadorov, I., Gruzdeva, T. (eds.) MOTOR 2020. CCIS, vol. 1275, pp. 36–50. Springer, Cham (2020). https://doi.org/10.1007/ 978-3-030-58657-7 5
Strong Stability in Finite Games with Perturbed Payoffs
Yury Nikulin1(B) and Vladimir Emelichev2
1 University of Turku, Vesilinnantie 5, 20014 Turku, Finland
[email protected]
2 Belarusian State University, Nezavisimosti 4, 220030 Minsk, Belarus
[email protected]
Abstract. We consider a finite game of several players in normal form with perturbed linear payoffs, where the perturbations are formed by a set of additive matrices, with two arbitrary Hölder norms specified independently in the outcome and criterion spaces. The concept of equilibrium is generalized using the coalitional profile, i.e. by partitioning the players of the game into coalitions. In this setting, the two extreme cases of this partitioning correspond to the Pareto optimal outcome and the Nash equilibrium outcome, respectively. We analyze the type of stability, called strong stability, under which the efficiency of at least one optimal outcome of the game is preserved under any small admissible perturbations. Attainable upper and lower bounds on such perturbations are specified. The obtained result generalizes some previously known facts and sheds more light on the combinatorial specifics of the problem considered. Some numerical examples illustrating the main result are given.
Keywords: Post-optimal analysis · Multiple criteria · Strong stability radius · Parametric optimality
1 Introduction
The stability of the problem is usually regarded in optimization context as one of classical properties of the continuity or semi-continuity (for example, by F. Hausdorff or G. Berge) of the optimal mapping, i.e. a set-valued mapping which puts in correspondence the set of optimal (efficient) solutions to each set of problem parameters. By virtue of discreteness of the feasible solution set, such a definition of stability can be easily reformulated in terms of existence of a stability sphere. It is such a neighborhood of the initial data in the space of problem parameters that any "perturbed" problem with parameters from this neighborhood possesses some property of invariance with respect to the initial problem. For example, the upper semi-continuity by Hausdorff of the optimal mapping of a discrete optimization problem turns into the property of non-appearance of new optimal solutions under any changes of the problem parameters within a "small" neighborhood of the initial data.
Game-theoretic models target finding classes of outcomes that are rationally coordinated in terms of possible actions and interests of participants (players) or groups of participants (coalitions). For each game in normal form, coalitional and non-coalitional equilibrium concepts (principles of optimality) are used, which usually lead to different game outcomes. In the theory of non-antagonistic games there is no single approach to the development of such concepts. The most famous one is the concept of the Nash equilibrium [6,7], as well as its various generalizations related to the problems of group choice, which is understood as the reduction of various individual preferences into a single collective preference. This paper continues investigation started in [4] where a parametrization of the equilibrium concept of a finite game in normal form was introduced. The parameter of this parameterizations is the method of dividing players into coalitions, in which the two extreme cases (a single coalition of players and a set of single-player coalitions) correspond to the Pareto optimal outcome and the Nash equilibrium outcome. In there, a type of stability of the game to perturbations of the parameters of the player payoff functions was considered, which is a discrete analog of the Hausdorff upper semicontinuity property [1] of a multi-valued mapping that maps any set of game parameters to the corresponding set of all generalized equilibrium outcomes. As a result of the parametric analysis, the formula for the radius of quasistability of the coalition game was found under the assumption that arbitrary norms are specified in the two-dimensional space of game parameters in [4]. In this paper we consider another variant of stability, the so-called strong stability. It is defined in a way that is under any small admissible perturbations the efficiency of at least one optimal outcome of the game is preserved. The attainable upper and lower bounds of such perturbations have been specified. The paper is organized as follows. In Sect. 2, we formulate parametric optimality and introduce basic definitions along with the notation. Section 3 contains some auxiliary properties and four lemmas used later for the proof of the main result. In Sect. 4, we formulate and prove the main result regarding the strong stability radius. Section 5 provides a list of important corollaries. Brief concluding remarks appear in Sect. 6.
2 Definitions and Notation
Consider a game of several players in normal form, where every player j ∈ N_n = {1, 2, . . . , n}, n ≥ 2, is choosing an action (antagonistic strategy) x_j to play from the finite set X_j = E = {0, 1}. The outcome of the game is a realization of the strategies chosen by all the players. The set of all possible outcomes of the game is
\[ X = \prod_{j\in N_n} X_j = E^n = \{0,1\}^n. \]
For each player i ∈ N_n we define a linear payoff function
\[ f_i(x) = C_i x, \quad i \in N_n, \]
where C_i is the i-th row of a square matrix C = [c_ij] ∈ R^{n×n}, x = (x_1, x_2, . . . , x_n)^T, x_j ∈ X_j, j ∈ N_n. We assume all players try to maximize own payoffs simultaneously:
\[ Cx = (C_1 x, C_2 x, \dots, C_n x)^T \;\to\; \max_{x\in X}. \quad (1) \]
Since individual objectives are usually conflicting, a certain parameterized optimality principle will be introduced later.
A non-empty subset J ⊆ N_n is called a coalition of players. For a coalition J and a game outcome x^0 = (x^0_1, x^0_2, . . . , x^0_n) we introduce the set
\[ V(x^0, J) = \prod_{j\in N_n} V_j(x^0, J), \]
where
\[ V_j(x^0, J) = \begin{cases} X_j & \text{if } j \in J, \\ \{x^0_j\} & \text{if } j \in N_n\setminus J. \end{cases} \]
Thus, V(x^0, J) is the set of outcomes that are reachable by coalition J from the outcome x^0. It is clear that V(x^0, N_n) = X for any x^0.
In the space R^k of arbitrary dimension k ∈ N we introduce a binary relation that generates the Pareto optimality principle:
\[ y' \prec y \;\Leftrightarrow\; y' \le y \;\&\; y' \ne y, \]
where y = (y_1, y_2, ..., y_k)^T ∈ R^k, y' = (y'_1, y'_2, ..., y'_k)^T ∈ R^k. The symbol ⊀, as usual, denotes the negation of the relation ≺.
Definition 1. Let s ∈ N_n, and let N_n = ⋃_{k∈N_s} J_k be a partition of the set N_n into s nonempty sets (coalitions), i.e. J_k ≠ ∅, k ∈ N_s, and p ≠ q ⇒ J_p ∩ J_q = ∅. The set of (J_1, J_2, ..., J_s)-efficient outcomes is introduced according to the formula
\[ G^n(C, J_1, J_2, \dots, J_s) = \big\{ x \in X : \forall k \in N_s \;\; \forall x' \in V(x, J_k) \;\; C_{J_k} x \not\prec C_{J_k} x' \big\}, \quad (2) \]
where C_{J_k} is the submatrix of the matrix C consisting of the rows that correspond to the players in coalition J_k. Sometimes, for brevity, we denote this set by G^n(C). Thus, the preference relation between players within the same coalition is based on Pareto dominance.
Obviously, any N_n-efficient outcome x ∈ G^n(C, N_n) (s = 1, i.e. all players are united in one coalition) is Pareto optimal, i.e. an efficient outcome of game (1). Therefore, the set G^n(C, N_n) is the Pareto set P^n(C) defined below.
Definition 2. The Pareto set of the game Z^n(C) is defined as
\[ P^n(C) = \{ x \in X : X(x, C) = \emptyset \}, \]
where
\[ X(x, C) = \{ x' \in X : Cx \prec Cx' \}. \]
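For small n, the Pareto set P^n(C) of Definition 2 can be computed by direct enumeration of X = {0,1}^n; a minimal sketch, assuming numpy, is given below.

```python
from itertools import product
import numpy as np

def pareto_set(C):
    """Pareto set P^n(C) of the problem Cx -> max over X = {0,1}^n:
    outcomes x for which no x' satisfies Cx <= Cx' componentwise with Cx != Cx'."""
    C = np.asarray(C, dtype=float)
    outcomes = [np.array(x) for x in product((0, 1), repeat=C.shape[0])]
    payoffs = [C @ x for x in outcomes]
    keep = []
    for i, p in enumerate(payoffs):
        if not any(np.all(p <= other) and np.any(p < other) for other in payoffs):
            keep.append(tuple(int(v) for v in outcomes[i]))
    return keep

print(pareto_set([[2, 3], [5, 1]]))   # [(1, 1)] -- the matrix reused in Example 1, Sect. 5
```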
In the other extreme case, when s = n, G^n(C, {1}, {2}, ..., {n}) becomes the set of the Nash equilibria [6,7]. This set is denoted by NE^n(C) and defined as follows.
Definition 3. The Nash set of the game Z^n(C) is defined as
\[ NE^n(C) = \big\{ x \in X : \nexists k \in N_n \;\exists x' \in X \; \big( C_k x < C_k x' \;\&\; x'_{N_n\setminus\{k\}} = x_{N_n\setminus\{k\}} \big) \big\}, \]
where x_{N_n\{k}} is the projection of the vector x ∈ X onto the coordinate axes with numbers from the set N_n\{k}. We assume that the game is such that it has at least one Nash equilibrium. It is easy to see that the rationality of the Nash equilibrium is that no player can individually benefit by deviating from the own equilibrium strategy choice while the others keep playing their equilibrium strategies. Strict axioms regarding perfect and common (shared) knowledge are assumed to be fulfilled [9].
Thus, we have just introduced a parametrization of the equilibrium concept for a finite game in normal form. The parameter s of this parametrization is the partitioning of all the players into coalitions J = (J_1, J_2, ..., J_s), in which the two extreme cases (a single coalition of players and a set of n single-player coalitions) correspond to finding the Pareto optimal outcomes P^n(C) and the Nash equilibrium outcomes NE^n(C), respectively. Denoted by Z^n(C, J_1, J_2, . . . , J_s), the game consists in finding the set G^n(C, J_1, J_2, . . . , J_s). Sometimes, for brevity, we use the notation Z^n(C) for this problem. Without loss of generality, we assume that the elements of the partitioning N_n = ⋃_{k∈N_s} J_k are defined as follows:
\[ J_1 = \{1, 2, \dots, t_1\},\quad J_2 = \{t_1+1, t_1+2, \dots, t_2\},\quad \dots,\quad J_s = \{t_{s-1}+1, t_{s-1}+2, \dots, n\}. \]
For any k ∈ N_s, let C^k denote the square submatrix of size |J_k| × |J_k| consisting of those elements of the matrix C located at the crossings of the rows and columns with numbers in J_k, and let P(C^k) be the Pareto set of the |J_k|-criteria problem
\[ C^k z \;\to\; \max_{z\in X_{J_k}}, \]
where z = (z_1, z_2, . . . , z_{|J_k|})^T and X_{J_k} is the projection of X onto J_k, i.e.
\[ X_{J_k} = \prod_{j\in J_k} X_j. \]
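The Nash set NE^n(C) of Definition 3 admits the same enumerative treatment: since X_k = {0,1}, the only unilateral deviation of player k is flipping x_k. A minimal sketch, again assuming numpy:

```python
from itertools import product
import numpy as np

def nash_set(C):
    """Nash set NE^n(C): outcomes from which no player k can strictly increase
    the own payoff C_k x by unilaterally flipping the binary strategy x_k."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    equilibria = []
    for x in product((0, 1), repeat=n):
        x = np.array(x)
        profitable_deviation = False
        for k in range(n):
            dev = x.copy()
            dev[k] = 1 - dev[k]                  # the only alternative strategy of player k
            if C[k] @ dev > C[k] @ x:
                profitable_deviation = True
                break
        if not profitable_deviation:
            equilibria.append(tuple(int(v) for v in x))
    return equilibria

print(nash_set([[2, 3], [5, 1]]))   # [(1, 1)] -- coincides with the Pareto set for this matrix
```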
In the particular case s = 1, we have P(C) = G^n(C, N_n). It is evident that all matrices C^k, k ∈ N_s, form a diagonal block matrix C. Due to the fact that the payoff linear functions C_i x, i ∈ N_n, are separable, the following equality is valid:
\[ G^n(C, J_1, J_2, \dots, J_s) = \prod_{k=1}^{s} P(C^k). \quad (3) \]
In the definition of (J_1, J_2, ..., J_s)-efficiency in the game Z^n(C, J_1, J_2, ..., J_s) with matrix C ∈ R^{n×n}, due to its separable structure only the block-diagonal elements matter. Thus, we can say that C ∈ R^{n×n} induces a matrix bundle C̃ = {C^1, C^2, . . . , C^s}, i.e. a diagonal structure with blocks C^k ∈ R^{|J_k|×|J_k|}, k ∈ N_s. Thus, the set of (J_1, J_2, ..., J_s)-efficient outcomes of the game Z^n(C, J_1, J_2, ..., J_s) here and after will be denoted G^n(C̃, J_1, J_2, . . . , J_s), or, shortly, G^n(C̃).
In the space of game outcomes R^k, k ≥ 2, we define an arbitrary Hölder norm l_p, p ∈ [1, ∞], i.e. by the norm of a vector a = (a_1, a_2, ..., a_k)^T ∈ R^k we mean the number
\[ \|a\|_p = \begin{cases} \big( \sum_{j\in N_k} |a_j|^p \big)^{1/p} & \text{if } 1 \le p < \infty, \\ \max\{ |a_j| : j \in N_k \} & \text{if } p = \infty. \end{cases} \]
The norm of a matrix C ∈ R^{k×k} with rows C_i, i ∈ N_k, is defined as the norm of the vector whose components are the norms of the rows of the matrix C. By that, we have
\[ \|C\|_{pq} = \big\| \big( \|C_1\|_p, \|C_2\|_p, \dots, \|C_k\|_p \big) \big\|_q, \]
where l_q, q ∈ [1, ∞], is another Hölder norm, i.e. l_p may differ from l_q in the general case. It is easy to see that for any p, q ∈ [1, ∞] and any i ∈ N_k we have
\[ \|C_i\|_p \le \|C\|_{pq}. \quad (4) \]
The norm of the matrix bundle is defined as follows:
\[ \|\tilde{C}\|_{\max} = \max\{ \|C^k\|_{pq} : k \in N_s \}. \]
Perturbation of the elements of the matrix bundle C̃ = {C^1, C^2, . . . , C^s} is imposed by adding a perturbing matrix bundle
\[ \tilde{B} = \{B^1, B^2, \dots, B^s\}, \]
where B^k ∈ R^{|J_k|×|J_k|} are matrices with rows B^k_i, i ∈ N_n, k ∈ N_s. Thus, the set of (J_1, J_2, ..., J_s)-efficient outcomes of the perturbed game here and after will be denoted as G^n(C̃ + B̃, J_1, J_2, . . . , J_s), or, shortly, G^n(C̃ + B̃).
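The norms just introduced are straightforward to compute; the following sketch (assuming numpy, with np.inf standing for p = ∞) implements ‖·‖_p, ‖C‖_pq and ‖C̃‖_max.

```python
import numpy as np

def holder_norm(a, p):
    """Hölder norm l_p of a vector; use p = np.inf for the Chebyshev norm."""
    a = np.abs(np.asarray(a, dtype=float))
    return float(a.max()) if np.isinf(p) else float((a ** p).sum() ** (1.0 / p))

def matrix_norm_pq(C, p, q):
    """||C||_pq: the l_q norm of the vector of l_p norms of the rows of C."""
    return holder_norm([holder_norm(row, p) for row in np.asarray(C, dtype=float)], q)

def bundle_norm_max(blocks, p, q):
    """||C_tilde||_max = max_k ||C^k||_pq over the diagonal blocks of the bundle."""
    return max(matrix_norm_pq(Ck, p, q) for Ck in blocks)

C1 = [[2.0, 3.0], [5.0, 1.0]]
C2 = [[-4.0]]
print(matrix_norm_pq(C1, p=np.inf, q=np.inf))    # 5.0: the largest row maximum
print(bundle_norm_max([C1, C2], p=1, q=np.inf))  # 6.0: row sums (5, 6) and (4), then the max
```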
For an arbitrary number ε > 0, we define a bundle of perturbing matrices
\[ \Omega(\varepsilon) = \Big\{ \tilde{B} \in \prod_{k=1}^{s} \mathbf{R}^{|J_k|\times|J_k|} : \|\tilde{B}\|_{\max} < \varepsilon \Big\}, \]
where
\[ \|\tilde{B}\|_{\max} = \max\{ \|B^k\|_{pq} : k \in N_s \}. \]
Following [5], we introduce the concept of the strong stability radius as follows.
Definition 4. The strong stability radius of the game Z^n(C, J_1, J_2, . . . , J_s), n ∈ N (called the T1-stability radius in the terminology of [3,10]), is the number
\[ \rho = \rho^n_{pq}(J_1, J_2, \dots, J_s) = \begin{cases} \sup \Xi & \text{if } \Xi \ne \emptyset, \\ 0 & \text{if } \Xi = \emptyset, \end{cases} \]
where
\[ \Xi = \big\{ \varepsilon > 0 : \forall \tilde{B} \in \Omega(\varepsilon) \;\; G^n(\tilde{C}) \cap G^n(\tilde{C}+\tilde{B}) \ne \emptyset \big\}. \]
Thus, the strong stability radius of the game Z^n(C) determines the limit level of perturbations of the elements of the matrix bundle C̃ (induced by the matrix C) that preserve the (J_1, J_2, . . . , J_s)-optimality of at least one (not necessarily the same) outcome of the original game. For any B̃ ∈ Ω(ε) and ε > 0, it is obvious that G^n(C̃) ∩ G^n(C̃ + B̃) ≠ ∅ if G^n(C̃) = X, and then ρ = ∞. Therefore, the game Z^n(C) with Ḡ^n(C̃) = X \ G^n(C̃) ≠ ∅ is called non-trivial.
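Definition 4 can be probed numerically: sample perturbing bundles with ‖B̃‖_max < ε and check whether the intersection G^n(C̃) ∩ G^n(C̃ + B̃) stays non-empty. The sketch below (assuming numpy) does this for the single-coalition case with p = q = ∞; it is a Monte-Carlo check of the defining condition, not a way to compute ρ exactly, and the example matrix anticipates Example 1 from Sect. 5.

```python
from itertools import product
import numpy as np

def pareto(C):
    X = [np.array(x) for x in product((0, 1), repeat=C.shape[0])]
    pay = [C @ x for x in X]
    return {tuple(int(v) for v in X[i]) for i in range(len(X))
            if not any(np.all(pay[i] <= other) and np.any(pay[i] < other) for other in pay)}

def probe_strong_stability(C, eps, trials=2000, seed=0):
    """Monte-Carlo probe of the condition in Definition 4 for the single
    coalition J_1 = N_n with p = q = infinity (so ||B||_max is just the largest
    absolute entry of B).  Returns False as soon as a sampled perturbation with
    norm < eps destroys every originally Pareto-optimal outcome; a run of True
    does not prove eps <= rho, but a single False disproves it."""
    rng = np.random.default_rng(seed)
    C = np.asarray(C, dtype=float)
    original = pareto(C)
    for _ in range(trials):
        B = rng.uniform(-1.0, 1.0, size=C.shape)
        B *= (0.999 * eps) / max(np.abs(B).max(), 1e-12)   # force ||B||_{inf,inf} < eps
        if not (original & pareto(C + B)):
            return False
    return True

C = np.array([[2.0, 3.0], [5.0, 1.0]])      # the matrix used in Example 1 (Sect. 5)
print(probe_strong_stability(C, eps=2.9))   # True: 2.9 is below the radius rho = 3
print(probe_strong_stability(C, eps=3.5))   # likely False: some sampled B destroys x = (1,1)
```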
3 Lemmas and Properties
In the outcome space R^n, along with the norm l_p, p ∈ [1, ∞], we will use the conjugate norm l_{p*}, where the numbers p and p* are connected, as usual, by the equality
\[ \frac{1}{p} + \frac{1}{p^*} = 1, \]
assuming p* = 1 if p = ∞, and p* = ∞ if p = 1. Therefore, we further suppose that the range of variation of the numbers p and p* is the closed interval [1, ∞], and the numbers themselves are connected by the above conditions. Further we use the well-known Hölder inequality
\[ |a^T b| \le \|a\|_p \, \|b\|_{p^*}, \quad (5) \]
that is true for any two vectors a = (a_1, a_2, . . . , a_n)^T ∈ R^n and b = (b_1, b_2, . . . , b_n)^T ∈ R^n.
Directly from (3), the following lemma, similar to the lemma in [4], follows.
Lemma 1. The outcome x = (x_1, x_2, . . . , x_n)^T ∈ X is (J_1, J_2, . . . , J_s)-efficient, i.e. x ∈ G^n(C̃, J_1, J_2, . . . , J_s), if and only if for any index k ∈ N_s we have x_{J_k} ∈ P(C^k).
Hereinafter, x_{J_k} is the projection of the vector x = (x_1, x_2, . . . , x_n)^T on the coordinate axes of X with coalition numbers J_k. Denote
\[ K^n(\tilde{C}) = K^n(\tilde{C}, J_1, J_2, \dots, J_s) = \{ k \in N_s : P(C^k) \ne X_{J_k} \}. \]
It is easy to see that the following propositions are valid.
Proposition 1. The game Z^n(C) is non-trivial if and only if the set K^n(C̃) is non-empty.
Proposition 2. The outcome x^0 ∈ X is not (J_1, J_2, . . . , J_s)-efficient in the game Z^n(C), i.e. x^0 ∉ G^n(C̃, J_1, J_2, . . . , J_s), if and only if there exists an index k ∈ K^n(C̃) such that x^0_{J_k} ∉ P(C^k).
Proposition 3. If the game Z^n(C) is non-trivial, then for any k ∉ K^n(C̃) we have P(C^k) = X_{J_k}.
Hereinafter, a^+ is the projection of a vector a = (a_1, a_2, . . . , a_k) ∈ R^k on the positive orthant, i.e.
\[ a^+ = [a]^+ = (a_1^+, a_2^+, \dots, a_k^+), \]
where + implies the positive cut of the vector a, i.e. a_i^+ = [a_i]^+ = max{0, a_i}.
Lemma 2. Let p, q ∈ [1, ∞], C^k ∈ R^{|J_k|×|J_k|}. Assume that a number ϕ > 0 and vectors z, z' ∈ X_{J_k} are such that the inequality
\[ \big\|[C^k(z - z')]^+\big\|_q \ge \varphi \, \|z - z'\|_{p^*} > 0 \quad (6) \]
holds. Then for any perturbing matrix bundle B̃ ∈ Ω(ϕ) we have z' ∉ X(z, C^k + B^k).
Proof. The proof is given by contradiction. Assume that there exists a perturbing matrix bundle B̃ ∈ Ω(ϕ) such that z' ∈ X(z, C^k + B^k). Then for any index i ∈ J_k we derive
\[ (C^k_i + B^k_i) z \le (C^k_i + B^k_i) z', \]
and hence
\[ C^k_i (z - z') \le B^k_i (z' - z). \]
From the last inequality, we continue
\[ [C^k_i (z - z')]^+ \le |B^k_i (z - z')|. \]
Taking into account Hölder's inequality (5), we get
\[ [C^k_i (z - z')]^+ \le \|B^k_i\|_p \, \|z - z'\|_{p^*}. \]
Thus, we conclude
\[ \big\|[C^k (z - z')]^+\big\|_q \le \|B^k\|_{pq} \|z - z'\|_{p^*} \le \|\tilde{B}\|_{\max} \|z - z'\|_{p^*} < \varphi \|z - z'\|_{p^*}. \]
The last inequality contradicts condition (6) of the lemma. Lemma 2 is proven.
For any x^0 ∉ G^n(C̃, J_1, J_2, . . . , J_s), we denote
\[ K(C, x^0) = \{ k \in K^n(\tilde{C}) : x^0_{J_k} \notin P(C^k) \}. \]
Lemma 3. If x^0 ∉ G^n(C̃), then K(C, x^0) is non-empty.
Proof. Indeed, if x^0 ∉ G^n(C̃), then the game Z^n(C) is non-trivial and, due to Proposition 1, K^n(C̃) is non-empty. Now assume that K(C, x^0) = ∅. Then x^0_{J_k} ∈ P(C^k) when k ∈ K^n(C̃). If k ∉ K^n(C̃), then due to Proposition 3, P(C^k) = X_{J_k}, i.e. again we have x^0_{J_k} ∈ P(C^k). Hence, according to Lemma 1, x^0 ∈ G^n(C̃). The obtained contradiction ends the proof of Lemma 3.
4 Main Result
For the non-trivial game Z^n(C, J_1, J_2, . . . , J_s), n ≥ 2, s ∈ N_n, and any p, q ∈ [1, ∞], we define
\[ \varphi = \varphi^n_{pq}(J_1, J_2, \dots, J_s) = \max_{x\in G^n(\tilde{C})} \; \min_{k\in K^n(\tilde{C})} \; \min_{z\notin P(C^k)} \; \frac{\big\|[C^k(x_{J_k}-z)]^+\big\|_q}{\|x_{J_k}-z\|_{p^*}}, \]
\[ \psi = \psi^n_{pq}(J_1, J_2, \dots, J_s) = \min_{k\in K^n(\tilde{C})} \; \min_{z\notin P(C^k)} \; \max_{x\in G^n(\tilde{C})} \; \max_{i\in J_k} \; \frac{|J_k|^{\frac{1}{p}+\frac{1}{q}}\, C^k_i(x_{J_k}-z)}{\|x_{J_k}-z\|_1}. \]
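Both bounds are finite minimax expressions over finite sets, so for small games they can be evaluated literally by enumeration. The sketch below (assuming numpy) follows the definitions of φ and ψ word for word and is therefore exponential in the block sizes; the printed values match Example 1 in Sect. 5.

```python
from itertools import product
import numpy as np

def _norm(a, p):
    a = np.abs(np.asarray(a, dtype=float))
    return float(a.max()) if np.isinf(p) else float((a ** p).sum() ** (1.0 / p))

def _pareto_indices(Ck):
    pts = [np.array(z) for z in product((0, 1), repeat=Ck.shape[1])]
    pay = [Ck @ z for z in pts]
    idx = {i for i in range(len(pts))
           if not any(np.all(pay[i] <= other) and np.any(pay[i] < other) for other in pay)}
    return idx, pts

def stability_bounds(blocks, p, q):
    """Brute-force evaluation of the bounds (phi, psi) for the matrix bundle
    `blocks` = [C^1, ..., C^s]; exponential in the block sizes |J_k|."""
    pstar = np.inf if p == 1 else (1.0 if np.isinf(p) else p / (p - 1.0))
    blocks = [np.asarray(Ck, dtype=float) for Ck in blocks]
    info = [_pareto_indices(Ck) for Ck in blocks]                 # (Pareto indices, points)
    K = [k for k, (idx, pts) in enumerate(info) if len(idx) < len(pts)]
    if not K:
        return np.inf, np.inf                                     # trivial game
    # efficient outcomes, block-wise, via equality (3)
    G = list(product(*[[pts[i] for i in sorted(idx)] for idx, pts in info]))
    def phi_inner(xk, k):
        idx, pts = info[k]
        Ck = blocks[k]
        return min(_norm(np.maximum(Ck @ (xk - pts[i]), 0.0), q) / _norm(xk - pts[i], pstar)
                   for i in range(len(pts)) if i not in idx)
    def psi_inner(k, zi):
        idx, pts = info[k]
        Ck, z, m = blocks[k], pts[zi], blocks[k].shape[0]
        return max(m ** (1.0 / p + 1.0 / q) * max(Ck[i] @ (x[k] - z) for i in range(m))
                   / _norm(x[k] - z, 1.0) for x in G)
    phi = max(min(phi_inner(x[k], k) for k in K) for x in G)
    psi = min(psi_inner(k, zi) for k in K for zi in range(len(info[k][1])) if zi not in info[k][0])
    return phi, psi

print(stability_bounds([[[2.0, 3.0], [5.0, 1.0]]], p=np.inf, q=np.inf))   # (3.0, 3.0)
print(stability_bounds([[[2.0]], [[1.0]]], p=np.inf, q=np.inf))           # (1.0, 1.0)
```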
Here we formulate the main result of this work. The analytical bounds ϕ and ψ specified in the main theorem below provide an enumerative way of calculating bounds for the strong stability radius.
Theorem 1. For any p, q ∈ [1, ∞], C ∈ R^{n×n}, n ≥ 2, and any coalition partition (J_1, J_2, . . . , J_s), s ∈ N_n, the strong stability radius of the non-trivial game Z^n(C, J_1, J_2, . . . , J_s) has the following lower and upper bounds:
\[ 0 < \varphi^n_{pq}(J_1, J_2, \dots, J_s) \le \rho^n_{pq}(J_1, J_2, \dots, J_s) \le \psi^n_{pq}(J_1, J_2, \dots, J_s). \]
˜ is nonProof. First of all, we notice that due to Proposition 1, the set K n (C) empty. It is easy to see that formula k ˜ ∀x ∈ Gn (C) ˜ ∀z ∈ P (C k ) ∃i ∈ Jk ∀x ∈ K n (C) Ci (xJk − z) > 0 is true, i.e. ϕ, ψ > 0. First, we prove the inequality ρ ≥ ϕ. Consider perturbing matrix bundle ˜ = {B 1 , B 2 , . . . , B s } ∈ Ω(ϕ). In order to show that ρ ≥ ϕ it suffices to find B ˜ ∩ Gn (C˜ + B). ˜ x∗ ∈ Gn (C) Then according to the definition of the positive number ϕ, we have ˜ ∀k ∈ K n (C) ˜ ∀z ∈ P (C k ) ∃x0 ∈ Gn (C) [C r (xJr − z)]+ q ≥ ϕxJr − zp∗ > 0.
(7)
Therefore according to Lemma 2, we get ˜ ∀z ∈ P (C k ) ∀B ˜ ∈ Ω(ϕ) ∀k ∈ K n (C) z ∈ X(x0Jk , C k + B k ).
(8)
˜ ∩ Gn (C˜ + B), ˜ where Further, we specify a way of selecting x∗ ∈ Gn (C) 0 n ˜ ∗ 0 0 ˜ ˜ ˜ B ∈ Ω(ϕ). If x ∈ G (C + B), then we set x = x . Assume now x ∈ Gn (C˜ + B). 0 Due to Lemma 3, we get K(C + B, x ) = ∅. Thus,
and we set
x0Jk ∈ P (C k + B k ),
k ∈ K(C + B, x0 ),
x0Jk ∈ P (C k + B k ),
k ∈ K(C + B, x0 ),
x∗Jk = x0Jk , k ∈ K(C + B, x0 ).
(9)
Further due to external stability (see e.g. [8]) of each of the Pareto sets P (C k + B k ), k ∈ K(C + B, x0 ) one can select a vector x∗Jk , k ∈ K(C + B, x0 ) such that
x∗Jk ∈ X(x0Jk , C k + B k ).
Taking into account proven earlier formula (8), it is easy to see ˜ x∗Jk ∈ P (C k ), k ∈ K n (C).
(10)
Moreover, due to Proposition 3, for any x∗Jk ∈ XJk we have x∗Jk ∈ P (C k ),
˜ k ∈ K n (C).
˜ In addition to that due to (9) and (10), x∗ ∈ Gn (C˜ + B). ˜ So, x∗ ∈ Gn (C). ∗ ˜ ∩ Gn (C˜ + B) ˜ for any B ˜ ∈ Ω(ϕ), i.e. ρ ≥ ϕ. Therefore, x ∈ Gn (C) Further, we prove that ρ ≤ ψ. According to the definition of the positive number ψ, we have ˜ ∃r ∈ K n (C)
˜ ∀i ∈ Jr ∃z 0 ∈ P (C r ) ∀x ∈ Gn (C) 1
1
ψxJr − z 0 1 ≥ |Jr | p + q Cir (xJr − z 0 ).
(11)
Let ε > ψ. The elements bij of the perturbing matrix bundle ˜ = {B 1 , B 2 , . . . , B s } ∈ Rn×n are defined as follows: B ⎧ ⎨ −δ if (i, j) ∈ Jr × Jr , zj0 = 0, bij = δ if (i, j) ∈ Jr × Jr , zj0 = 1, ⎩ 0 if (i, j) ∈ Jr × Jr , 1
(12)
1
where ψ < δ|Jr | p + q < ε. Taking into consideration (5) we get 1
Bir p = δ|Jr | p , i ∈ Jr , 1
1
˜ max = B r pq = δ|Jr | p + q , B ˜ ∈ Ω(ε). B Recall that here and after Bir , i ∈ Jr are the rows of matrix B r . Moreover, we have Bir (xJr − z 0 ) = −δxJr − z 0 1 , i ∈ Jr . Using (11) and (12), we conclude that for any index i ∈ Jr the following inequalities are true: (Cir + Bir )(xJr − z 0 ) ≤
ψ |Jr |
1 1 p+q
− δ xJr − z 0 1 < 0.
˜ if x ∈ Gn (C). ˜ Thus, xJr ∈ P (C r +B r ), and due to Proposition 2, x ∈ Gn (C˜ + B) Summarizing, for any ε > ψ there exists a perturbing matrix bundle ˜ ∈ Ω(ε) such that Gn (C) ˜ ∩ Gn (C˜ + B) ˜ = ∅, i.e. ρ < ε for any ε > ψ. Thus, B ρ ≤ ψ. The last concludes the proof of Theorem 1.
5 Corollaries
Corollary 1. Given p, q ∈ [1, ∞], C ∈ Rn×n , n ≥ 2 and any coalition partition (J1 , J2 , . . . , Js ), s ∈ Nn , assume the equality P (C k ) = {x0Jk } ˜ J1 , J2 , . . . , Js ). Then the strong stability radius of the be true for any k ∈ K n (C, n non-trivial game Z (C, J1 , J2 , . . . , Js ) is expressed by the formula: ρ = ρnpq (J1 , J2 , . . . , Js ) =
min
˜ k∈K n (C)
min
z∈P (C k )
[C k (x0Jk − z)]+ q . x0Jk − zp∗
(13)
Proof. For the sake of brevity, we denote ϕ0 the right-hand side of (13). It is easy to see that ϕ0 > 0. The inequality ρ ≥ ϕ0 follows from Theorem 1. We prove ˜ J1 , J2 , . . . , Js ), that ρ ≤ ϕ0 . Since P (C k ) = {x0Jk } is true for any k ∈ K n (C, then [C k (x0Jk − z)]+ q ϕ0 = min min 0 . ˜ x0Jk − zp∗ z∈XJk \{xJ } k∈K n (C) k ˜ be an index and z 0 ∈ XJ \{x0 } be a vector Let ε > ϕ0 , and let r ∈ K n (C) r Jr such that (14) ϕ0 x0Jr − z 0 p∗ = [C r (x0Jr − z 0 )]+ q . Let α be a number satisfying αϕ0 < ε. We define a vector γ ∈ R|Jr | according to the formula below γx0Jr − z 0 p∗ = α[C r (x0Jr − z 0 )]+ . Then due to (14), we get γx0Jr − z 0 p∗ [C r (x0Jr − z 0 )]+ ,
(15)
γq = αϕ0 < ε. Therefore, for any index i ∈ Jr according to Lemma 1 there exists a perˆ r ∈ R|Jr | , i ∈ Jr such that for any index ˆ r = [bˆij ] with rows B turbing matrix B i i ∈ Jr the equalities below hold: ˆ r (x0 − z 0 ) = −γi x0 − z 0 p∗ , B i Jr Jr ˆir |p = γi . B From the above using (15) we derive ˆ r )(x0J − z 0 ) ≺ [C r (x0J − z 0 )]+ − γx0J − z 0 p∗ ≺ 0, (C r + B r r r ˆ r pq = γq < ε. B
where
0 = (0, 0, . . . , 0)T ∈ XJr ,
As a result, we get
ˆ r ), z 0 ∈ X(x0Jr , C r + B
˜ 0 , J1 , J2 , . . . , Js ) if i.e. according to Proposition 2, we have x0 ∈ Gn (C˜ + B 0 n ˜ 0 1 2 r ˜ x ∈ G (C, J1 , J2 , . . . , Js ). Here B = {B , B , . . . , B }, where r ˆ B if k = r, k B = 0|Jk |×|Jk | if k ∈ Ns \{r}. ˜ 0 max < ε. So, we conclude that for any number ε > ϕ0 there exists a Thus, B ˜ J1 , J2 , . . . , Js ) ∩ Gn (C˜ + ˜ 0 ∈ Ω(ε) such that Gn (C, perturbing matrix bundle B 0 0 ˜ , J1 , J2 , . . . , Js ) = ∅, i.e. ρ ≤ ε for any ε > ϕ - Hence, ρ ≤ ϕ0 . Taking into B consideration proven earlier inequality ρ ≥ ϕ0 we get formula (13). The Corollary 1 has now been proven. Notice that the formula (13) from Corollary 1 basically implies the attainability of the lower bound specified in Theorem 1, i.e. for any p, q ∈ [1, ∞] the equality ρnpq (J1 , J2 , . . . , Js ) = ϕnpq (J1 , J2 , . . . , Js ) holds if
˜ J1 , J2 , . . . , Js ) = {x0 }. Gn (C,
Corollary 2. Given p, q ∈ [1, ∞], C ∈ Rn×n , n ≥ 2, the strong stability radius of the non-trivial 1-coalitional game Z n (C, Nn ) of finding the Pareto set P n (C), has the following lower and upper bounds: n (Nn ), 0 < ϕnpq (Nn ) ≤ ρnpq (Nn ) ≤ ψpq
where ϕnpq (Nn ) = 1
max n
x∈P (C) 1
n ψpq (Nn ) = n p + q
min n
x ∈P (C)
min
x ∈P n (C)
[C(x − x )]+ q , x − x p∗
max n
max
i∈Nn
x∈P (C)
Ci (x − x ) . x − x 1
Corollary 3. [2] Given p = q = ∞, the strong stability radius of the non-trivial 1-coalitional game Z n (C, Nn ) of finding the Pareto set P n (C), has the following lower and upper bounds: 0
0, ⎨1 if cii < 0, x0i = 0 ⎩ xi ∈ Xi if cii = 0. Therefore, it is obvious that ˜ {1}, {2}, . . . , {n}) = {k ∈ Nn : ckk = 0}. K n (C, ˆ n . The game Z n (C, {1}, {2}, . . . , {n}) is For the sake of brevity, we denote it K n ˆ non-trivial if and only if K = ∅. Corollary 7. For any p, q ∈ [1, ∞], C ∈ Rn×n , n ≥ 2, the strong stability radius of the non-trivial game Z n (C, {1}, {2}, . . . , {n}) of finding the Nash set N E n (C) is expressed by the formula: ˆ n }. ρ∗ = ρnpq ({1}, {2}, . . . , {n}) = min{|ckk | : k ∈ K Proof. From Theorem 1, we have ϕ∗ ≤ ρ∗ ≤ ψ ∗ , where ϕ∗ = ϕnpq ({1}, {2}, . . . , {n}) =
\[ \max_{x\in NE^n(C)} \; \min_{k\in \hat{K}^n} \; \frac{\big\|[c_{kk}(x_k-\bar{x}_k)]^+\big\|_q}{\|x_k-\bar{x}_k\|_{p^*}}, \]
\[ \psi^* = \psi^n_{pq}(\{1\},\{2\},\dots,\{n\}) = \min_{k\in \hat{K}^n} \; \max_{x\in NE^n(C)} \; \frac{c_{kk}(x_k-\bar{x}_k)}{\|x_k-\bar{x}_k\|_1}. \]
Here
\[ \bar{x}_k = \begin{cases} 0 & \text{if } x_k = 1, \\ 1 & \text{if } x_k = 0. \end{cases} \]
Therefore, taking into account Corollary 6, for any k ∈ K̂^n and x ∈ NE^n(C) the following equalities hold:
\[ \frac{\big\|[c_{kk}(x_k-\bar{x}_k)]^+\big\|_q}{\|x_k-\bar{x}_k\|_{p^*}} = \frac{\|c_{kk}(x_k-\bar{x}_k)\|_q}{\|x_k-\bar{x}_k\|_{p^*}} = |c_{kk}|, \qquad \frac{c_{kk}(x_k-\bar{x}_k)}{\|x_k-\bar{x}_k\|_1} = |c_{kk}|. \]
Hence, ρ* = min{|c_kk| : k ∈ K̂^n}.
Notice that the formula from Corollary 7 technically implies the attainability of both the lower and the upper bound specified in Theorem 1 for the case of the n-coalitional game.
Consider an example of a bi-matrix game with two players.
Example 1. Let C ∈ R^{2×2} be a matrix with rows C_1 and C_2, and let X_i = {0, 1}, i ∈ N_2, x^(1) = (0, 0)^T, x^(2) = (0, 1)^T, x^(3) = (1, 0)^T, x^(4) = (1, 1)^T. Set p = q = ∞. The payoff functions are written as
\[ \begin{pmatrix} (C_1x^{(1)}, C_2x^{(1)}) & (C_1x^{(2)}, C_2x^{(2)}) \\ (C_1x^{(3)}, C_2x^{(3)}) & (C_1x^{(4)}, C_2x^{(4)}) \end{pmatrix}. \]
Let C numerically be defined as follows:
\[ C = \begin{pmatrix} 2 & 3 \\ 5 & 1 \end{pmatrix}. \]
Then we have the bi-matrix game Z^2(C) with payoffs
\[ \begin{pmatrix} (0,0) & (3,1) \\ (2,5) & (5,6) \end{pmatrix}. \]
Therefore, P^2(C) = NE^2(C) = {x^(4)}. According to Corollary 5, we have
\[ \rho^n_{\infty\infty}(\{1,2\}) = \varphi^n_{\infty\infty}(\{1,2\}) = \psi^n_{\infty\infty}(\{1,2\}) = 3, \]
and according to Corollary 7, we get ρn∞∞ ({1}, {2}) = 1. So, forming a coalition between two players leads to Pareto optimal outcome x(4) . If both players stay independent, the game converges again to x(4) , which is the Nash equilibrium in this case. From stability point of view, forming coalition is preferable since the strong stability radius is larger in such situation.
386
6
Y. Nikulin and V. Emelichev
Conclusions
As a summary, it is worth mentioning that the bounds and formulas, proven in Theorem 1 and Corollaries 1–7, are mostly theoretical due to their analytical and enumerative structures. In practical applications, one can try to get reasonable approximation of the bounds using some meta-heuristics, e.g. evolutionary algorithms or Monte-Carlo simulation. This could become a subject for future investigations. Another possibility to continue research in this direction is to specify some particular classes of games where computational burden can be drastically reduced due to a unique structure of the set of efficient outcomes.
References 1. Aubin, J.-P., Frankowska, H.: Set-Valued Analysis. Birkh¨ auser, Basel (1990) 2. Emelichev, V., Girlich, E., Nikulin, Yu., Podkopaev, D.: Stability and regularization of vector problem of integer linear programming. J. Optim. 51, 645–676 (2002). https://doi.org/10.1080/0233193021000030760 3. Emelichev, V., Kotov, V., Kuzmin, K., Lebedeva, N., Semenova, N.: Stability and effective algorithms for solving multiobjective discrete optimization problems with incomplete information. J. Autom. Inf. Sci. 46, 27–41 (2014). https://doi.org/10. 1615/JAutomatInfScien.v46.i2.30 4. Emelichev, V., Nikulin Y.: Finite Games with Perturbed Payoffs. In: Olenev N., Evtushenko Y., Khachay M., Malkova V. (eds): Advances in Optimization and Applications, 11th International Conference, OPTIMA 2020, Moscow, Russia, 28 September–2 October 2020, Revised Selected Papers. CCIS, vol. 1340, pp. 158–169. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-65739-0 5. Emelichev, V., Nikulin, Y.: Strong stability measures for multicriteria quadratic integer programming problem of finding extremum solutions. Comput. Sci. J. Moldova 26, 115–125 (2018) 6. Nash, J.: Equilibrium points in n-person games. Proc. Natl. Acad. Sci. U.S.A. 36, 48–49 (1950) 7. Nash, J.: Non-cooperative games. Ann. Math. 54, 286–295 (1951) 8. Noghin, V.: Reduction of the Pareto Set: An Axiomatic Approach. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-67873-3 9. Osborne, M., Rubinstein, A.: A Course in Game Theory. MIT Press (1994) 10. Sergienko, I., Shilo, V.: Discrete Optimization Problems. Research. Naukova dumka, Kiev, Problems, Methods (2003)
Inverse Optimal Control with Continuous Updating for a Steering Behavior Model with Reference Trajectory Ildus Kuchkarov1(B) , German Mitiai1 , Ovanes Petrosian1 , oren Hohmann2 Timur Lepikhin1 , Jairo Inga2 , and S¨ 1
2
St.Petersburg State University, 7/9, Universitetskaya nab., Saint-Petersburg 199034, Russia kuchkarov [email protected], [email protected], [email protected], [email protected] Institute of Control Systems, Karlsruhe Institute of Technology (KIT), 12, Kaiserstrasse, Karlsruhe 76131, Germany {jairo.inga,soeren.hohmann}@kit.edu
Abstract. Most real control processes continuously evolve in time and a participant may not have all the information about the process at the time of its initiation. For example, a driver only has local information about the curvature of a road or any obstacles that might necessitate a lane change. The continuous updating approach allows us to arrive at models accounting for the limited information available to subjects during the decision making process. Previously, authors have considered many variations and methods for applying the continuous information updating approach: optimality conditions for equilibrium and cooperative strategies were constructed for the linear-quadratic case [18, 20], the Hamilton-Jacobi-Belman equation [29, 30], Pontryagin’s maximum principle [31, 43]. Also an application of the continuous updating approach was introduced for the general inverse optimal control problem with continuous updating in the paper [27], where the continuous updating was used for identifying cost function parameters from measured data and also the value of the information horizon. In this paper, we apply a continuous updating approach to a special and practical case of an inverse optimal control problem of determining the behavior of a driver while driving along a reference trajectory. Here the inverse optimal control problem becomes nonautonomous since the reference trajectory is included in the objective function of the driver as a function of time. The real motion data from the steering wheel driving simulator is used and the conclusion is drawn.
Keywords: Continuous updating control
· Optimal control · Inverse optimal
The article was carried out under the auspices of a grant from the President of the Russian Federation for state support of young Russian scientists - candidates of science, project number MK-4674.2021.1.1. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 387–402, 2021. https://doi.org/10.1007/978-3-030-86433-0_27
388
1
I. Kuchkarov et al.
Introduction
Understanding how humans move or control a system is an important issue in human-machine interaction scenarios. These include, among others, situations of shared control where a human and an automatic controller influence a dynamic system simultaneously. In this context, having knowledge of human behavior in the form of mathematical models is essential in order to adapt the controller according to a particular human partner. In the neuroscience community, optimal feedback control arose as a promising approach to model human motor behavior [37]. The last two decades have therefore seen a growing interest in optimal control models to describe all kinds of human movement including reach-to-grasp [5], saccadic eye movements [6], and locomotion [25]. In addition, optimal control has been applied to model how a human controls or manipulates a dynamic system alone [11,33] or in cooperation with an assistance system [9]. For this purpose, the model parameters, i.e. the cost function parameters, are determined out of the analyzed data by solving inverse optimal control problems. This problem has also attracted much attention from the control engineering community (see e.g. [13,15,26]). In all the aforementioned modeling approaches, it is assumed that the human participant has all information about the motion equations and objective function at the beginning of the process and for the complete time interval which may be finite (e.g. [1]) or infinite (e.g. [5,33]). Nevertheless, most real-life control processes evolve continuously in time and the human subject may not have all information about the process at the initial time instant. For example, in an automotive application, the driver only has local information about the road curvature or any obstacles which may force a lane change. Hence, it is questionable whether approaches based on classical optimal control theory are an adequate reflection of human decision making. We conjecture that a human subject continuously receives updated information and adapts their behavior accordingly to the new situation and, hence, it is important to include this characteristic in a procedure that seeks to identify a model of human behavior. This type of behavior we try to model using the continuous updating approach presented in this paper. The continuous updating approach has been studied in the papers [18–20,28– 31,42,43], in these cases it is supposed that the updating process about motion equations and payoff functions evolves continuously over time. Previously, the authors considered many variations and methods of applying the continuous information updating approach: optimality conditions for equilibrium and cooperative strategies were constructed for the linear-quadratic case [18–20], the Hamilton-Jacobi-Belman equation [28–30,42], Pontryagin’s maximum principle [31,43]. In the continuous updating approach, it is assumed that the decision maker – has information about motion equations and objective function only on [t, t + T ], where T is the information horizon and t is the current time instant. – receives updated information as time t ∈ [t0 , +∞) evolves.
IOC with CU for a Steering Behavior Model with Reference Trajectory
389
The same or related assumptions can be made for human behavior. Therefore, in the article [27], a general optimal control with continuous updating was introduced as a human behavior model. In this work, the explicit solution of a linearquadratic optimal control problem with continuous updating is used to solve a corresponding inverse optimal control problem, where not only the human cost functions parameters are identified, but also the length of the information horizon T . The approach of dynamic and continuous updating has some similarities with Model Predictive Control (MPC) theory which is worked out within the framework of numerical optimal control, [23,34,41]. In the MPC approach, the current control action is achieved by solving a finite-horizon open-loop optimal control problem at each sampling instant. For linear systems, there exists a solution in explicit form, [2,8]. However, in general, the MPC approach demands the solution of several optimization problems. Another related series of papers corresponds to the class of stabilizing control, [21,22,24,36], here similar approaches were considered for the class of linear quadratic optimal control problems. However, in the current work and papers about the continuous updating approach, the main goal is different: to model players’ behavior when information about the game updates continuously in time. Also, there exist a list of papers related to the class of differential games with dynamic and continuous updating, [7,14,35,38–40]. This paper is the prolongation of our previous work on inverse optimal control [27]. It is devoted to an application of the continuous updating approach to an inverse optimal tracking control problem with reference trajectory. Classical optimal tracking control problem has been considered in the papers [1,3,4,12]. In the paper [27] the inverse optimal control problem is stated for a class of optimal control problems with continuous updating and the closed form of optimal control for continuous updating setting is obtained. In this paper, the essential difference is the introduction of the reference trajectory in the model. This makes the continuous updating model nonautonomous itself because the reference trajectory explicitly depends on the time. The extension to handle trajectory tracking control is essential towards real-life applications, e.g. for advanced driver assistant systems. Therefore, in this paper, we consider a simplified steering wheel driving simulator (cf. [10]) for a first evaluation of the methods with real motion data towards an application and use of the human model for advanced driving assistance systems. The paper is structured as follows. In Sect. 2, a description of the initial optimal control problem and corresponding optimal control problem with continuous updating is presented. In Sect. 3, the explicit form of the solution of the optimal control problem with continuous updating is given for the class of linear-quadratic control problems. Then Sect. 4 presents the inverse optimal control approach with continuous updating, where both the cost function parameters and the length of the information horizon are identified. Afterwards, we show in Sect. 5 simulation results of the proposed modeling approach based on continuous updating with parameter identification. Finally, we draw conclusions in Sect. 6.
390
2
I. Kuchkarov et al.
Optimal Tracking Control with Continuous Updating
In this section, we present our results concerning optimal tracking control with continuous updating and reference trajectory. We first present the classical optimal tracking control problem before showing our continuous updating approach. 2.1
Initial Optimal Tracking Control Problem
Consider a linear system x(t) ˙ = Ax(t) + Bu(t, x(t)), x(t0 ) = x0 ,
(1)
where x(t) ∈ Rl is state, x(t0 ) is the initial state of the system, u(.) ∈ U ⊂ Rm is the control, A ∈ Rl×l and B ∈ Rl×m are constant matrices, t ∈ [t0 , T ]. Consider the optimal tracking control problem defined on the interval [t0 , T ] T J(x0 , t0 ; u) =
(z(t) − x(t)) Q(z(t) − x(t))
t0
+ u (t, x(t))Ru(t, x(t)) dt → min
(2)
u∈U
subject to (1), where z(t) ∈ C[t0 , T ] is reference trajectory (z ∈ Rl ), Q ∈ Rl×l , R ∈ Rm×m are assumed to be constant and symmetric matrices, Q is positive semi-definite, R is positive defined, ( · ) means transpose here and hereafter. We consider the class of control functions u(t, x) ∈ U ⊂ Rm which are continuous in (t, x) (this assumption is important further in (5)). 2.2
Problem Formulation for Optimal Tracking Control with Continuous Updating
Using the initial optimal tracking control problem defined on the closed time interval [t0 , T ], we construct the corresponding optimal control problem with continuous updating. Consider the following optimal tracking control problem defined on the interval [t, t + T ], where 0 < T < +∞, t+T
J(x, t; ut ) =
(z(s) − xt (s)) Q(z(s) − xt (s))
t
+ (ut (s, xt (s))) Rut (s, xt (s)) ds → min ut
(3)
IOC with CU for a Steering Behavior Model with Reference Trajectory
391
subject to x˙ t (s) = Axt (s) + But (s, xt (s)), xt (t) = x,
(4)
where xt ∈ Rl is state, ut = ut (s, xt (s)) ∈ U ⊂ Rm is control, s ∈ [t, t + T ]. The main characteristic of the optimal control problem with continuous updating is the following: the current time t ∈ [t0 , +∞) evolves continuously and as a result the human continuously obtains new information about the motion equation and objective function on the interval [t, t + T ]. The control u(t, x) in the optimal control problem with continuous updating has the form (5) u(t, x) = ut (s, x)|s=t , t ∈ [t0 , +∞), where ut (s, x), s ∈ [t, t + T ] is the control in the problem defined on the interval [t, t + T ] and ut (s, x)|s=t is the part of that control in the first instant s = t. The main idea of (5) is that as the current time t evolves information updates, therefore in order to model the behavior of the human subject it is necessary to consider the control ut (s, x) only in the points where s = t. The trajectory x(t) in the optimal control problem with continuous updating is determined in accordance with the system dynamics in (1) where u = u(t, x) is the control in (5). We assume that the control with continuous updating obtained using (5) is admissible. The essential difference between a control problem with continuous updating and classic optimal control problem defined on the closed interval is that the decision maker in the initial problem is guided by the objective that payoff will eventually be received for the interval [t0 , T ]. In the case of a control problem with continuous updating, at time instant t the system is oriented to the expected objective (3), which is calculated using the information on the interval [t, t + T ] or the information that the system has at the instant t. 2.3
Optimal Tracking Control with Continuous Updating
For the framework of continuously updated information, we use the concept of optimal control in feedback form u∗ (t, x). Furthermore, we require that, for any fixed current time t ∈ [t0 , +∞), u∗ (t, x) coincides with the optimal control in the problem specified by (3) and (4), defined on the interval [t, t + T ] in the instant t. However, direct application of classical approaches for optimal control in feedback form is not possible. Consider another current time instant t+, 0, S˜0 > 0, W = W (t) is a standard Wiener process and dW is its stochastic differential, t ≥ 0. Let θ > 0 be an inverse to money velocity. Let 1 − α, α ∈ (0, 1) determine a risk aversion. To support the consumption level p(t)C(t), where p(t) is the consumer price index, a household needs cash M (t) = θp(t)C(t), so the liquidity constraint is completed [6,7]. We assume that the consumer price index obeys the exponential growth p(t) = p0 ejt , where j is an inflation rate. Denote by D(t) the savings in the form of deposits at the interest rate rD and by L(t) the consumer loan debt at the interest rate rL . The lack of arbitration in the savings and consumer credit market follows the inequalities rL > rD > 0. Let x(t) = M (t) + D(t) − L(t) determine the household welfare balance. The problem of the household incomes distribution poses as: ⎛ +∞ ⎞ α M ˆ J = ⎝E e−Δt dt⎠ → max , (3) ˜ ≥0 p0 ejt M ≥M 0
dx 1 = S − M + rD (x − M )+ − rL (M − x)+ , dt θ x(0) = x ˜0 ,
(4) (5)
420
A. A. Shananin et al.
x≥−
S . rL − γ
(6)
The salary dynamic in the problem (3)–(6) satisfies the SDE (1) and (2), M = ˜ =M ˜ (t) M (t) is a control function and determines the cash of the household, M ˆ ˆ determines the minimal cash the household can have, Δ = Δ(t) > 0 is a discount coefficient. Let Δ = Δˆ + αj. We assume that the households take loans and are able to pay off their loan obligations (6) (for a more detailed explanation of the problem considered, we refer to [10,11]). To solve the optimal control problem (1)–(6), we introduce the Hamilton–Jacobi–Bellman equation: ∂v ∂v σ 2 2 ∂ 2 v ∂v + S +S + Sγ 2 ∂t 2 ∂S ∂S ∂x ∂v 1 α −Δt + max = 0, rD (x − M )+ − rL (M − x)+ − M + M e ˜ ≥0 ∂x θ M ≥M where v = v(t, x, S). The optimal control problem has an almost analytic solution in case σ = 0 (see [8–11]). From the optimal control analysis, we identify three classes of household behavior: those who take consumer loans, those who save money, and those who do not take consumer loans and do not save money. In the next section, we consider these classes in detail.
3
Statistic Data Reproduction
To classify the households, we used the data from RLMS-HSE [12]. We considered the data during the period 2015–2018 that included 11,352 observations of the behavior of the households living in 38 Russian regions. According to the Russian Federal State Statistics Service, the regions were clustered into two groups and presented in Appendix. The distribution of the debt burden was determined for each group of regions defined as the ratio of the monthly loan payment to the monthly income per one household member. For each group of regions, we have obtained the data presented in the following Tables 1 and 2. The data in two tables show that the incomes and the consumption differ in two groups of regions. It is clear that the households in group 1 have higher incomes and consumptions for each class of the households. Denote the class of households that take consumer loans to support their life needs as x1 (we say that these households take unsecured credits); households that take consumer loans as a lifestyle as x2 ; households that do not take consumer loans and do not save money as x3 ; households that save money as x4 . Thus, we obtain 4 types of households’ behavior. For each class, we solve the problem (1)–(6) considering σ = 0. We choose such functions Δ1 (t), Δ2 (t), Δ4 (t) for classes x1 , x2 , x4 respectively, that are based on the regressions presented in Appendix. The functions Δ1 (t), Δ2 (t) and the parameters α1 , α2 are chosen to reproduce the consumer loan debt statistics L(t) = L1 (t) + L2 (t), where L1 (t) and L2 (t) determine the consumer loan debt of classes of the households x1 and x2 (all the parameters
Consumer Loan Demand Modeling
421
are presented in Appendix). The same is for the class x4 : the function Δ4 (t) and the parameter α4 are chosen to reproduce the deposits since April 2009. To improve the data quality, we denote the function v(t) that represents the part of the incomes relying on the class x4 between classes x3 and x4 . For the problems that describe consumer loan debts and savings, we set parameters θ1 , θ2 , θ4 as constants. For the class x3 , we take θ3 (t), C3 (t) as the functions with a given regressions (see Appendix) that in total reproduce the consumptions and cash statistics. Table 1. Group of regions 1. L > 0 (59.6%)
L > 0 (40.4%)
L = 0, D = 0
L = 0, D > 0
70
30
159
147
406
(17.24%)
(7.39%)
(39.16%)
(36.21%)
(100%)
241
76
364
290
971
(24.82%)
(7.83%)
(37.49%)
(29.86%)
(100%)
Average persons in a household
3.45
2.55
2.29
1.97
2.39
Average income per person, rubles
19, 118
41, 787
20, 224
27, 185
23, 708
Average consumption per person, rubles
22, 181
56, 994
15, 643
22, 724
22, 636
Households Population
Total incomes, rubles Total consumptions, rubles Denotation
Total
4.61 · 106
3.18 · 106
7.35 · 106
7.88 · 106
23.02 · 106
(20.03%)
(13.81%)
(31.93%)
(34.23%)
(100%)
5.35 · 106
4.35 · 106
5.69 · 106
6.59 · 106
21.98 · 106
(24.34%)
(19.79%)
(25.89%)
(29.98%)
(100%)
x1
x2
x3
x4
Table 2. Group of regions 2. L > 0 (52.3%)
L > 0 (47.7%)
L = 0, D = 0
L = 0, D > 0
553
237
952
444
2186
(25.3%)
(10.84%)
(43.55%)
(20.31%)
(100%)
1816
601
2194
920
5531
(32.83%)
(10.87%)
(39.67%)
(16.63%)
(100%)
Average persons in a household
3.28
2.54
2.3
2.07
2.53
Average income per person, rubles
12, 497
31, 583
14, 221
19, 599
16, 440
Average consumption per person, rubles
15, 099
33, 391
12, 138
17, 139
16, 254
Households Population
Total incomes, rubles Total consumptions, rubles Denotation
Total
22.7 · 106
18.99 · 106
31.2 · 106
18.04 · 106
90.93 · 106
(20.03%)
(20.88%)
(34.31%)
(19.85%)
(100%)
27.42 · 106
20.08 · 106
26.63 · 106
15.77 · 106
89.9 · 106
(30.05%)
(22.34%)
(29.62%)
(17.54%)
(100%)
x1
x2
x3
x4
422
A. A. Shananin et al.
Remark 1. All statistic data was smoothed by a Hodrick-Prescott filter [13] to avoid short-term fluctuations. We next show how the modified Ramsey model has reproduced data since April 2009.
Fig. 2. Group of regions 1 (left) and group of regions 2 (right). The red line represents deposits, the black line represents cash, the pink line represents consumer loans, and the blue line represents consumptions. (Color figure online)
In Fig. 2 the dots represent statistical data, the solid lines are the computations of the modified Ramsey model for the period from 2009 to 2019, and the dash lines—continuation of the estimated data for 2019–2020. Thus, it is possible to understand the quality of reproducing statistical indicators not only in the model training period, but in the test period as well.
4
Forecasts
Nowadays, in the face of a difficult epidemiological situation, which affects the economy of the country, welfare, and the employment of the households, the most relevant scenarios are based on the loss of incomes of the population. The self-isolation regime led to unemployment growth. The incomes drop in April 2020 was about 10%. After that, the government made payments to the poor households that found themselves in a bad financial situation. These payments motivated us to split the dynamics of the income in different proportions between the borrowers with secured and unsecured loans. The dynamic of the consumer loan debt depends on the loan interest rate rL that, in its turn, depends on the dynamic of the arrears and the Central Bank key interest rate. Nowadays, the key interest rate is 5%. In Fig. 3 we present
Consumer Loan Demand Modeling
423
the regressions of the arrears and the loan interest rate. The regressions of the arrears are constructed due to the interest rate on loan with the 3 month delay and the amount of unsecured loans L1 (the total consumer loan debt of class x1 ). 1.2
0.21
Actual data Regression data
1.1
0.2 0.19
Arrears, trillions, RUB
1
0.18
0.9
0.17
0.8 0.16
0.7 0.15
0.6
0.14
0.5
0.13
0.4
0.12
0.3 2010
0.11
2012
2014
2016
2018
2020
2015.5 2016 2016.5 2017 2017.5 2018 2018.5 2019 2019.5 2020
Year
Year
Fig. 3. Arrears regression (left) and interest rate on loans (right).
We consider the 4 possible scenarios of the key interest rate dynamics: holding the current key interest rate at 5%, freezing the key interest rate at 4.5%, a gradual decrease down to 3%, and an abrupt decrease down to 3%. In Fig. 4 we present these scenarios of the key interest rate dynamics and the corresponding dynamics of the loan interest rate. Key interest rate
8
0.22 0.2
7
0.18 Interest rate
Percent
6
5
0.16 0.14
r train period L
r test period L
4 0.12
Freezing the KIR at 4.5%. r (2023) = 12.6% L
Abrupt drop of the KIR to 3%. r (2023) = 9.3%
3
2 2019
L
Freezing the KIR at 4.5% Abrupt drop of the KIR to 3% Gradual decrease of the KIR to 3% Current increase of the KIR to 5% 2019.5
2020
2020.5
2021
Year
0.1
Gradual decrease of the KIR to 3%. r (2023) = 10% L
Current increase of the KIR to 5%. r (2023) = 14% L
real data
2021.5
2022
2022.5
2023
0.08
2010
2012
2014
2016
2018
2020
2022
Year
Fig. 4. The Central Bank key interest rate scenarios (left); corresponding loan interest rate (right). The dash dotted line represents the actual dynamic of the loan interest rate.
In Fig. 5 and 6 we present the dynamics of the consumer loan debt in the group of regions 1 and 2, depending on the key interest rate scenario.
424
A. A. Shananin et al. Group of regions 1
3.5
Trillions, RUB
2.5
L1 train period L test period 1
3
Gradual decrease of the KIR to 3%. L (2023) = 2.43
2.5
2 1.5
2
2
0.5
2014
2016
2018
2020
0
2022
Freezing the KIR at 4.5%. L2 (2023) = 0.69 Gradual decrease of the KIR to 3%. L 2 (2023) = 0.73 Current increase of the KIR to 5%. L 2 (2023) = 0.67
0.5
2012
L2 test period Abrupt drop of the KIR to 3%. L 2 (2023) = 0.72
1.5 1
2010
1
Current increase of the KIR to 5%. L 1 (2023) = 2.27 L train period
1
0
Freezing the KIR at 4.5%. L1 (2023) = 2.32 Abrupt drop of the KIR to 3%. L 1 (2023) = 2.44
Trillions, RUB
3
Group of regions 1. Secured and unsecured loans.
3.5
L train period L test period Freezing the KIR at 4.5%. L(2023) = 3.01 Abrupt drop of the KIR to 3%. L(2023) = 3.16 Gradual decrease of the KIR to 3%. L(2023) = 3.17 Current increase of the KIR to 5%. L(2023) = 2.94 Real data
2010
2012
2014
2016
Year
2018
2020
2022
Year
Fig. 5. Total loan debt (left); unsecured (red line) and secured (blue line) loan debts (right). (Color figure online) Group of regions 2
18
14
Trillions, RUB
12
L train period 1
14
1
1
12
Abrupt drop of the KIR to 3%. L (2023) = 8.13 1
Gradual decrease of the KIR to 3%. L 1(2023) = 9.78
10 8
10
L2 test period Freezing the KIR at 4.5%. L2(2023) = 1.29
6 4
4
Current increase of the KIR to 5%. L 1(2023) = 15.43 L2 train period
8
6
Abrupt drop of the KIR to 3%. L 2(2023) = 1.13 Gradual decrease of the KIR to 3%. L 2(2023) = 1.32 Current increase of the KIR to 5%. L (2023) = 1.33 2
2
2 0
L test period Freezing the KIR at 4.5%. L (2023) = 12.73
Trillions, RUB
16
Group of regions 2. Secured and unsecured loans.
16
L train period L test period Freezing the KIR at 4.5%. L(2023) = 14.02 Abrupt drop of the KIR to 3%. L(2023) = 9.26 Gradual decrease of the KIR to 3%. L(2023) = 11.11 Current increase of the KIR to 5%. L(2023) = 16.76 Real data
2010
2012
2014
2016
Year
2018
2020
2022
0
2010
2012
2014
2016
2018
2020
2022
Year
Fig. 6. Total loan debt (left); unsecured (red line) and secured (blue line) loan debts (right). (Color figure online)
As we can see from the Fig. 5 and 6 above, there is no problem of growth of the consumer loan debt in the group of regions 1. But, there is a big growth of the consumer loan debt in the group of regions 2 in non-decreasing key interest rate scenarios. The bold dots in Fig. 6 represent the time periods when the credits become bad. The question arises if the decreasing scenarios of the key interest rate are the only way to control the exponential growth of the consumer loan debt in the group of regions 2. One of the possible ways is to continue subsidizing the households with unsecured loans. In Fig. 7 we present the scenario when the gov-
Consumer Loan Demand Modeling
425
ernment continues making payments to the poor households (about 2.1 trillion rubles till the end of 2022) and, at the same time, the Central Bank holds the key interest rate at 5%. Group of regions 2
12
10
Group of regions 2. Secured and unsecured loans.
12
L train period L test period Subsidizing the poor. L(2023) = 11.74 Real data
L1 train period L1 test period
10
Subsidizing the poor. L 1(2023) = 10.64 L2 train period
8 Trillions, RUB
Trillions, RUB
8
6
6
4
2
2
2010
2012
2014
2016
Year
2018
2020
2022
2
Subsidizing the poor. L (2023) = 1.1 2
4
0
L test period
0
2010
2012
2014
2016
2018
2020
2022
Year
Fig. 7. Total loan debt (left); unsecured (red line) and secured (blue line) loan debts (right). (Color figure online)
As we can see from Fig. 7, the growth of the consumer loan debt is still huge, but the credits will become bad only in December 2022. Under the subsidizing politics, the consumer loan debt decreases around 5 trillion rubles, in accordance with the scenario with the same key interest rate dynamic, presented in Fig. 6. Thus, subsidizing the poor borrowers is not a solution of the general problem of consumer loan debt growth. This scenario only delays the explosive growth of the consumer loan debt. More scenarios are investigated in [10].
5
Conclusion
In our research we evaluated over-indebtedness in the context of different deciles by using the data from Russian Longitudinal Monitoring Survey (RLMS). This survey has annually been conducted by Higher School of Economics since 2000. In order to analyze longer and relevant time period, the data of household financial behavior from 2015 till 2018 was examined. In 2015–2018 there were 11352 observations or 2838 households at disposal, which participated in the survey 4 times. These households were decomposed into 4 groups: poor borrowers, medium borrowers, savers and financially passive households. The behavior of these four groups was modeled by the modified Ramsey model [8,9]. The behavior of two borrower groups was simulated using the credit-demand model. The behavior of the savers was simulated using the model with the demand for cash. The intertemporal dynamic of the variables of the third group was reconstructed as
426
A. A. Shananin et al.
the difference between the real values of variables and the sum of values of modeled variables for three groups of borrowers and savers. Such a methodology has led to a qualitative fit of the data of consumer loans, deposits, consumption and consumer cash.
Appendix Regions Considered Group 1 : Moscow; Moscow region; St. Petersburg; Kazan; New Moscow region. Group 2 : Komi Republic, Syktyvkar; Tomsk; Chelyabinsk; Chelyabinsk region, Krasnoarmeysky district; Komi Republic, Usinsk and Usinsky district; Rostov Region, Bataysk; Amur Region, Tambov District; Saratov; Stavropol Territory, Georgievsk and Georgievsky District; Novosibirsk region, Berdsk and Berdsky district; Perm Territory, Solikamsk and Solikamsky District; Orenburg Region, Orsk; Volgograd region, Rudnyansky district; Smolensk; Saratov Region, Volsk and Volsky District; Vladivostok; Penza region, Zemetchinsky district; Leningrad Region, Volosovsky District; Krasnodar; Lipetsk; Nizhny Novgorod; Tula; Kaluga Region, Kuibyshevsky District; Udmurt Republic, Glazov and Glazovsky district; Tver region, Rzhev and Rzhevsky district; Krasnodar Territory, Kushchevsky District; Tambov region, Uvarovo and Uvarovsky district; Krasnoyarsk; Kurgan; Altai Territory; Krasnoyarsk Territory, Nazarovo and Nazarovsky District; Republic of Chuvashia, Shumerlya and Shumerlinsky district; The Republic of Kabardino-Balkaria, Zalukokoazhe and Zolsky district. Regressions Arrears(t) ≈ −1.78736 + 0.0936723rL,% (t − 3) + 0.253206L1 (t − 3);
rL (t) ≈ 0.0128 + 0.007213KIR% (t) + 0.0118Arrears(t) + 0.4259rL (t − 1). The following predicted data are used as regressors (previously smoothed using Hodrick-Prescott filter [13]): rL (t)—loan rate; rD (t)—deposit rate; rD,curr (t)—foreign exchange deposit rate; jm (t)—monthly inflation; jq (t)— quarterly inflation; jy (t)—annual inflation; γm (t)—monthly income growth rate; γq (t)—quarterly income growth rate; γy (t)—annual income growth rate. Group 1 : 1. Δ1 (t) ≈ 0.0278 + 0.8177rL (t) + 0.0383jq (t) − 0.0666jy (t) + 0.0059γy (t) + 0.213Δ1 (t − 1); 2. Δ2 (t) ≈ 0.0054 + 0.786rL (t) + 0.0408jq (t) − 0.0477jy (t) + 0.1114γq (t) − 0.0108γy (t) + 0.233Δ2 (t − 1); 3. Δ4 (t) ≈ −0.0296 + 0.6658rD (t) − 0.0995rD,curr (t) + 0.0313jq (t) + 0.2921Δ4 (t − 1);
Consumer Loan Demand Modeling
427
4. v(t) ≈ 0.2013 + 0.5303rD (t) + 0.7147γq (t) − 0.3343γy (t) − 0.1845jq (t) + 0.67rD,curr (t) + 0.8249v(t − 1); 5. θ3 (t) ≈ 3.2387 − 11.9275rD (t) + 0.1845γq (t) − 2.9587γy (t) − 11.9042jq (t) + 9.4019jy (t) + 0.9798θ3 (t − 1); 6. C3 (t) ≈ −0.3108 − 0.0492γq (t) + 0.1119γy (t) + 0.2841jm (t) + 0.3977jq (t) − 0.3448jy (t) + 0.9374C3 (t − 1); Group 2 : 1. Δ1 (t) ≈ 0.0282 + 0.7956rL (t) + 0.0374jq (t) − 0.0658jy (t) + 0.016γy (t) + 0.2253Δ1 (t − 1); 2. Δ2 (t) ≈ 0.0524 + 0.7324rL (t) + 0.0113jq (t) − 0.0656jy (t) + 0.1128γq (t) + 0.2936Δ2 (t − 1); 3. Δ4 (t) ≈ −0.0392 + 0.7181rD (t) − 0.0993rD,curr (t) + 0.0405jq (t) + 0.2433Δ4 (t − 1); 4. v(t) ≈ −0.1848 + 0.3674rD (t) − 0.1095γq (t) + 0.0168γy (t) + 0.1787jq (t) + 0.0235jy (t) + 0.723v(t − 1); 5. θ3 (t) ≈ 1.9163 − 3.0933rD (t) − 0.4745γq (t) − 1.2085jq (t) − 0.3759jy (t) + 0.9348θ3 (t − 1); 6. C3 (t) ≈ −0.7029 + 0.4254γq (t) − 0.0048γy (t) + 0.2154jm (t) + 0.9445jq (t) − 0.4237jy (t) + 0.9844C3 (t − 1); 7. C1,add (t) ≈ −9.921 + 11.7407jm (t) − 1.8498jq (t) + 0.0177jy (t) − 4.4094γm (t) + 1.07γq (t) − 0.0264γy (t) + 0.7834C1,add (t − 1). This regression implies taking part of the consumptions of the class x1 by the class x4 . Model Parameters. Group 1 : θ1 = 1, θ2 = 2, θ4 = 4, α1 = 0.88, α2 = 0.75, α4 = 0.765. Group 2 : θ1 = 0.7, θ2 = 2, θ4 = 3, α1 = 0.9, α2 = 0.8, α4 = 0.775.
References 1. Makri, V., Tsaganos, A., Bellas, A.: Determinants of non-performing loans: the case of Eurozone. Panoeconomicues 61(2), 193–206 (2014) 2. Ari, M., Chen S., Ratnovski, M.: The dynamics of non-performing loans during banking crises: a new database. International Monetary Fund (2019) 3. Louzis, D., Vouldis, A., Metaxas, V.: Macroeconomic and bank-specific determinants of non-performing loans in Greece: a comparative study of mortgage, business and consumer loan portfolios. J. Bank. Finance 36(4), 1012–1027 (2012) 4. Kuzina, O., Krupenskiy, N.: Over-indebtedness of Russians: myth or reality? Voprosy Ekonomiki 11, 85–104 (2018)
428
A. A. Shananin et al.
5. Tikhnov, A.: Dynamics of financial and consumer behavior of Russians in 2003– 2018. J. Inst. Stud. 11(3), 153–169 (2019) 6. World Bank: Household Over-Indebtedness in Russia. World Bank Group, Washington, D.C. http://documents.worldbank.org/curated/en/518651584539590418/ Household-Over-Indebtedness-in-Russia. Accessed 1 May 2021 7. Ramsey, F.: A mathematical theory of savings. Econ. J. 152(38), 543–559 (1928) 8. Gimaltdinov, I.: Research of the demand for consumer loans and money. J. Math. Model 24(2), 84–98 (2012) 9. Rudeva, A., Shananin, A.: Control synthesis in a modified Ramsey model with a liquidity constraint. J. Differ. Equ. 45(12), 1835–1839 (2009) 10. Shananin, A.A., Tarasenko, M.V., Trusov, N.V.: Mathematical Modeling of household economy in Russia. Comput. Math. Math. Phys. 61(6), 1030–1051 (2021) 11. Shananin, A.A., Trusov, N.V.: The household behavior modeling based on Mean Field Games approach. Lobachevskii J. Math. 42(7), 1738–1752 (2021) 12. Russia Longitudinal Monitoring Survey, “RLMS-HSE”, conducted by the National Research University Higher School of Economics and ZAO “Demoscope” together with Carolina Population Center, University of North Carolina at Chapel Hill and the Institute of Sociology RAS 13. Hodrick R., Prescott E.: Post-war U.S. business cycles: an empirical investigation. Discussion Papers 451, Northwestern University, Center for Mathematical Studies in Economics and Management Science (1981)
An Industry Maintenance Planning Optimization Problem Using CMA-VNS and Its Variations Anna Zholobova(B) , Yefim Zholobov , Ivan Polyakov, Ovanes Petrosian , and Tatyana Vlasova St Petersburg State University, Peterhof 198504, Russia {st062241,st062280,st062550}@student.spbu.ru, [email protected], [email protected]
Abstract. In this article, we consider the statement of a problem described in the competition ROADEF/EURO challenge 2020 [4] dedicated to a maintenance planning optimization problem in collaboration with RTE [5]. First of all, the main task is to build an optimal maintenance schedule for a high voltage transmission network to ensure the delivery and supply of electricity. To solve this problem we use several optimization methods to find the best possible maintenance schedule that would be consistent with all work-related constraints and would take into account the risk assessment. There are many existing heuristic, metaheuristic, and exact algorithms that can solve this optimization problem; yet it is highly feasible that the state-of-the-art methods currently available may offer some improvement. For this reason, taking into account the task at hand, we decided to use the CMA-VNS [27] algorithm after examining the results of the Combinatorial Black Box Optimization Competition [2]. The algorithm is hybrid and consists of Bipop CMA-ES and VNS algorithms that make it possible to obtain solutions with a zero penalty function and close to a minimal objective function in an admissible time. Other popular algorithms were carried out in comparison with our solution to prove the efficiency of this approach. The effectiveness of this approach to solving the problem is demonstrated. Keywords: Global optimization · Maintenance planning combinatorial optimization · Operations research
1
· Large scale
Introduction
A discrete optimization problem occurs in many practical daily situations, including planning, delivery, routing, and others. Accurate methods for solving such problems require large time and computation outputs. This is unacceptable The work of the penultimate author was carried out under the auspices of a grant of the President of the Russian Federation for state support of young Russian scientists candidates of science, project number MK-4674.2021.1.1. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 429–443, 2021. https://doi.org/10.1007/978-3-030-86433-0_30
430
A. Zholobova et al.
for real-life problems where solutions often need to be obtained dynamically. Thus, approaches for solving large scale problems that are close to optimal in a relatively short time are particularly important and relevant. The French Society of Operations Research and Decision Analysis (ROADEF) [4] organized a challenge dedicated to a maintenance planning optimization problem in collaboration with France’s Transmission System Operator (RTE) [1]. The main goal of this competition is to find a plan for the maintenance of the power grid that meets the specified conditions and does not increase the risk of interruptions in the supply of electricity. Also, in addition to the established function of evaluating the quality of the received solutions, there are several types of restrictions that must be met in the presented planning, these are schedule restrictions, resource restrictions, and so-called exceptions. As a review article on the problems of maintenance scheduling, we have made use of paper [12]. This offers a full description of approaches and known methods of solving such problems. The task of scheduling in a different application has been given ample consideration by many researchers. For example, Roy et al. proposed a distributed pool architecture for genetic algorithms [24] and paper [25] shows the development of distributed cooperative algorithms with a grouping strategy. Furthermore, many popular algorithms have been modified and improved for the scheduling optimization problem. They are Genetic algorithms [20], Swarm Intelligence algorithms (Ant Colony Optimization (ACO) [11,28], Particle Swarm Optimization (PSO) [18,23]), random search (Simulated Annealing (SA) [9]) and local search methods, (Variable Neighborhood Search (VNS) [16]). Various Exact Integer algorithms (Branch and Bound, Branch and Cut) are also used, but they are not considered in this article because of the dimensions of the problem and calculation speed. In order to choose from the whole variety of methods that would obviously give better results in relation to our problem, we have taken the following approach. We first undertook a review of the literature as well as other recent comparable competitions. During the research in this direction, the Combinatorial Black-Box Optimization competition was found, according to the results of which the CMA-VNS algorithm was selected as a potentially good method for stating our problem.
2
Competition
The owner of the ROADEF/EURO challenge problem is the company RTE, which is responsible for the operation, maintenance of the power grid, as well as for the supply of energy. To reach an objective, RTE decided to implement a three-step approach for scheduling maintenance [10]: 1. Computing risk values corresponding to different future scenarios. 2. To find a good schedule for the computed values involving several optimisation approaches. 3. Validating the obtained plans.
IMP Optimization Problem Using CMA-VNS and Its Variations
431
This challenge focuses on the second step of this approach: within the given risk values to find an optimal planning regarding a risk-based objective. 2.1
Description of Parameters
In this section we will show the main notations provided by the organizers [10]. Planning Horizon. In this task, the number of time steps is determined by T ∈ N and the discrete time horizon is H = {1, . . . , T }. Resources. Let C be a set of teams (or resources) with a distinct size needed to carry out the different tasks (or interventions). There are some parameters for each resource c ∈ C: Maximum resource (uct ) defines the upper limit that cannot be exceeded for every time moment t ∈ H and every resource c ∈ C, and Minimum resource (ltc ) specifies the minimum demanded value of resources consumed. Interventions. The set of interventions I defines tasks distinct in terms of duration and resource demand that have to be scheduled in the coming year. Each intervention i ∈ I has Time duration (Δi,t ) that determines the actual duration of intervention i ∈ I if it starts at time t ∈ H. Resource workload c,t + (ri,t ∈ R ) required for resource c ∈ C at time t ∈ H by intervention i ∈ I if i begins at time t ∈ H. This depends on time as more resources are mainly needed at the beginning and the end of the intervention for certain reasons (bringing and removing equipment, finishing touches). s,t Risk. The risk (riski,t ) value depends on the intervention considered and on time, as it is often much less risky to perform interventions in the summer (when there is less burden on the electricity network) rather than in winter. So the risk s,t value (in euros) is determined by riski,t ∈ R for time period t ∈ H, scenario s ∈ St , and intervention i ∈ I when i starts at time t ∈ H.
2.2
Schedule Definition
A schedule (or solution) is a list L of pairs (i, t) ∈ I × H, where t is the starting time of intervention i. Let starti denote the starting time of intervention i ∈ I, and It ⊆ I the set of interventions in process at time t ∈ H. A plan is said to be feasible if it adheres to all constraints mentioned below. 2.3
Definition of Constraints
Schedule Restrictions. There are several restrictions regarding the arranging of schedules: • Non-preemptive scheduling. Interventions have to start at the beginning of a period. Moreover, as interventions require shutting down particular lines of the electricity network, an intervention once initiated cannot be interrupted
432
A. Zholobova et al.
(except for non-working days). More precisely, if intervention i ∈ I starts at time t ∈ H, then it has to end at t + Δi,t . • Interventions are scheduled once. All interventions have to be executed. • No work left. All interventions must be completed no later than the end of the horizon. If intervention i ∈ I starts at time t ∈ H, then t + Δi,t ≤ T + 1. Resource Restrictions. The solution corresponding to the resource constraints leads to the following statements. Let the workload due to intervention i c,t for resource c at time t be ri,start . Then the total resource workload for c at time i c,t c,t t is r = ri,starti . The resource demand cannot exceed the resource capaci∈It
ity, but must be at least equal to the minimum workload. Thus, the resource constraints are ltc ≤ rc,t ≤ uct ; ∀c ∈ C, t ∈ H. Disjunctive Restrictions. Some lines where maintenance operations have to be completed are at times too close to one another to carry out the corresponding interventions simultaneously. So the exclusions help to prevent weakness in the network by allowing for consideration of dependencies between the risk values of interventions. Thus, the probability of scenarios where another nearby line is disconnected during interventions will be reduced. The set of exclusions is denoted by Exc. It is a set of triplets (i1 , i2 , t) where i1 , i2 ∈ I and t ∈ H. The exclusion constraints can formally be written as / It ; ∀(i1 , i2 , t) ∈ Exc. i1 ∈ It ⇒ i2 ∈ 2.4
Objective Function Structure
After studies driven by the current scheduling methods, RTE decided to quantify the quality of a given plan by looking at two criteria: the mean cost and the expected excess. Both are risk-related and quantified in euros. 1. Mean cost. The total planning risk considering solution at t ∈ H for scenario s ∈ St , named by risk s,t , is thesum of risks in scenario s over the intermediate s,t riski,start . As the risk values are assumed to interventions at t : risk s,t = i i∈It risk s,t be independent, we have a first order approximation. risk t = |S1t | is the average aggregate planning risk for t ∈ H. Thus, the overall planning risk (or mean cost) is obj1 =
s∈St
1 T
risk t .
t∈H
As a result, the mean cost is average in two ways: relative to the time horizon and regarding to scenarios. 2. Expected excess. To increase the sensitivity of the objective function to critical scenarios, including those with extremely high costs, a metric to quantify the variability of the scenarios has been included. Definition 1. Let E ⊂ R be a non-empty finite set and τ ∈ [0, 1]. The τ quantile of E is: Qτ (E) = min{q ∈ R : ∃X ⊆ E : |X| ≥ τ × |E| and ∀x ∈ X, x ≤ q}.
(1)
IMP Optimization Problem Using CMA-VNS and Its Variations
433
The expected excess indicator depends on τ quantile values. For every time period t, the quantile value Qtτ is Qtτ = Qτ (risk s,t : s ∈ St ). The expected excess at time t ∈ H is then denoted as Excessτ (t) = max (0, Qtτ − risk t ). Excessτ (t). The expected excess of a plan is described as obj2 (τ ) = T1 t∈H
Both metrics are in euros. However, owing to policies of risk mitigation, they cannot be compared directly. In this regard a scaling factor α ∈ [0, 1] has been included in the objective function: obj(τ ) = α × obj1 + (1 − α) × obj2 (τ ). The parameters τ and α are provided by the organizers for each specific case along with the rest of the initial data. Their values are constant as each individual example is solved, but may differ in a number of cases. The corresponding values are given in Table 1. So the aim is to find a feasible plan with the minimum score of the objective function.
3 3.1
Mathematical Optimization Problem Problem Formulation
Combining all expressions from previous section, we get an overall model corresponding to the formulation of the competition task: obj(τ ) = α × obj1 + (1 − α) × obj2 (τ ) ⇒ min; 1 (risk t ); obj1 (τ ) = T
(2) (3)
t∈H
1 Excessτ (t); T t∈H s,t = (riski,start ); i
obj2 (τ ) = risk s,t
(4) (5)
i∈It
risk t =
1 (risk s,t ); |St |
(6)
s∈St
Qτ (E) = min{q ∈ R : ∃X ⊆ E : |X| ≥ τ × |E| and ∀x ∈ X, x ≤ q}; Qtτ
= Qτ (risk
s,t
: s ∈ St );
Excessτ (t) = max (0, Qtτ − risk t ).
(7) (8) (9)
The planning constraints for defining the feasible solution are: t + Δi,t ≤ T + 1; ltc ≤ rtot (c, t) ≤ uct ∀c ∈ C, t ∈ H;
(10) (11)
i1 ∈ It ⇒ i2 ∈ / It
(12)
∀(i1 , i2 , t) ∈ Exc.
434
A. Zholobova et al.
where (10) is schedule constraint, (11) is resource constraint and (12) is disjunctive constraint. We consider that our planning is feasible if all given constraints (exp. (10) to (12)) are carried out. In this regard, when searching for an optimal solution it is expedient to introduce into the objective function additional penalties for violation of constraints. When initiating the task solving process, some calculations are needed in order to get the initial solution and corresponding penalty value. We first use the VNS algorithm with work duration of 0.02 · time limit to get a solution, which then use in the following penalty formula: penalty init init − obj 2 ; θmax = |I| + |Exc|. (13) penalty = 50 · θmax where obj init is the objective function value and penalty init is the penalty for the corresponding initial solution. Finally, we get a discrete optimization problem (exp. (2) to (12)) with the penalty addition (13). Taking into account the complexity and types of equations included in the objective function, we will consider our problem as a nonlinear discrete optimization problem. 3.2
Case Samples for Testing Algorithms
The competition organizers also provided a set of test cases with different parameters and sizes to test possible solutions. A description of some of them is shown below. Table 1. Test cases sample Parameter
Case 02 Case 03 Case 07 Case 10
Resources
9
10
9
9
Interventions
89
91
36
108
Exclusions
32
12
3
40
T
90
90
17
53
Average scenarios 120,0
1,0
5,65
5,68
τ
0,95
0,95
0,5
0,5
α
0,5
0,5
0,5
0,5
Dimension
89
91
36
108
The dimension of the task will be considered as the number of the task that needs to be inserted into the schedule.
IMP Optimization Problem Using CMA-VNS and Its Variations
4 4.1
435
Solution Approach Known Methods
For a high-quality solution [26] of this problem, we decided to analyze various competitions in the field of global, combinatorial, and black-box optimization to choose from the best algorithms at the moment [22]. It is often the case that the competitive process brings about new ideas that then develop into effective solutions. In particular the 1st and 2nd Combinatorial Black-Box Optimization Competitions (CBBOC) [2] were considered, according to the results of which CMA-VNS became the leading algorithm and also LTGA, SAHH, LaguerreHH took the next position on some tracks. We also took into account the Special Session and Competition on Large-Scale Global Optimization [6] dedicated to highlighting the latest achievements in the field of global optimization with one or more goals and with different mathematical approaches. In it, the winning algorithms were SHADE-ILS [21], ML SHADE-SPA [13], and MOS [19], but in our formulation they were not particularly easy to use, as they were more focused on continuous optimization. Since it was decided to treat our problem as a nonlinear combinatorial optimization problem, we decided to use the winner of the CBBOC competition. In this competition, class of BlackBox functions was considered, which is a more general class than non-linear functions, and therefore covers our problem. The dimensions of the verification data in the CBBOC ranged from 50 to 300, which also corresponds to our dimensions in most cases. An additional argument is that the CMA-VNS algorithm became the winner both on the short training track and on the long training track, since the calculation of the final result in the ROADEF/EURO competition takes into account the results both after 15 min and after an hour and a half of the algorithm’s operation. Thus, as the winner of one of the considered problems, the Bipop CMA-VNS algorithm was chosen as the algorithm for solving our problem. For comparison, we also decided to use the SA and PSO methods that are classical solutions [9]. 4.2
Bipop CMA-VNS
The Bipop CMA-VNS is an algorithm that first implements a two-population covariance matrix adaptation with fairly flexible configuration options (9–11 lines of pseudocode), followed by a local search method VNS (12–14 lines of pseudocode) with relatively inexpensive iteration costs as an intensification of obtained Bipop CMA-ES solution:
436
A. Zholobova et al.
1 : procedure Bipop CMA − VNS 2:
time vns short – The execution time of the initial algorithm
3:
to determine the values of the penalty parameters
4:
time bipop cmaes – Specifies the execution time of the first block of code
5:
time vns – Specifies the execution time of the second block of code
6:
while (time < time vns short) do
7:
Initial execution of the algorithm VNS;
8:
end while
9:
while (time < time bipop cmaes) do
10 :
Executing the algorithm Bipop CMA − ES;
11 :
end while
12 :
while (time < time vns) do
13 : 14 :
Executing the algorithm VNS; end while
15 : end procedure CMA − VNS
Below you can see the description of algorithm components. Bipop CMA-ES. The two-population mode of operation of the algorithm (Bipop CMA-ES ) allows to generalize the principles used in CMA-ES. In this mode, the algorithm is used under restart conditions with a change in the size of the initial population, as well as with an updated generation of the values of the initial parameters. The criterion for stopping the algorithm is the maximum number of restarts (max n restarts) of the evolutionary processes set before the start of optimization and the maximum time allocated for the process. We can use this combination of heuristics for considering the discrete optimization problem since the solution elements obtained in the Bipop CMA-ES populations are rounded to the nearest integer values, and the variable neighborhood search is arranged in such a way that it generates integer solutions entirely from the range of acceptable values. CMA-ES. This algorithm works by generating the parameters of the multivariate normal distribution over a solution space, iteratively selecting a population of solution candidates from the parameterized search distribution (line 9 of the Pseudocode) [7,8,14,15]. These candidates are then evaluated by the target function in order for the CMA-ES to update the search distribution, namely its mean value (line 11) and covariance matrix (line 17). Also CMA-ES belongs to the class of Evolutionary Strategies (ES), therefore, it also operates in three main stages: recombination, mutation and selection.
IMP Optimization Problem Using CMA-VNS and Its Variations
437
1 : procedure CMA − ES 2 : Assigning initial parameters : (0)
4:C 5:σ
(0)
= 0;
3 : pσ
(0)
(0)
pc
= 0; - Evolutionary paths
= I; - Covariative Matrix ∈ R+ ;
m
(0)
∈R
n
– Step size and mean distribution
6 : g = 0; - generation 7 : while(t < tmax ) do 8: 9:
A new population of the desired values: (g+1) (g) (g) (g) f or ∼m + σ N 0, C xk
k = 1...λ
10 :
After selection and recombination get new mean:
11 :
m
(g+1)
=
μ
(g+1)
ωi · xi:λ
i=1
12 : 13 : 14 : 15 : 16 :
The step size σ
(g+1)
is recalculated −1 (g+1) (g) (g) 2 = (1 − cσ ) pσ + cσ (2 − cσ ) μ C pσ p(g+1) cσ (g+1) (g) σ −1 σ = σ exp dσ EN (0, I)
Recalculation of the covariance matrix: m(g+1) − m(g) (g+1) (g) pc = (1 − cc )pc + cc (2 − cc )μ σ (g) (g+1)
= (1 − ccov )C
17 :
C
18 :
t = time;
(g)
(g+1)
+ ccov pc
(g+1)T
pc
19 : end while
The objective of the CMA-ES is to match the search distribution with the level lines of the multivariate objective function, which should be minimized. The essence of the algorithm is laid down in the geometric sense of the covariance matrix, which can be used to obtain a scattering ellipsoid. This, in turn, obeys the law of normal distribution. Thus at each step, by adapting the covariance matrix, it is possible to obtain an ellipsoid that is maximally similar in shape to the level line of the objective function. This greatly simplifies the search for an extremum. VNS. VNS [17] is a method that builds on the initial available solution and then advances to the local optimum by local changes. A new potential solution is generated randomly to avoid looping, and then a local search is applied to it over the neighborhoods of the k-th order up to the first improvement of the solution. We use the following basic variant:
438
A. Zholobova et al.
1 : procedure VNS algorithm
2 :   t = 0;
3 :   repeat
4 :     k = 1;
5 :     repeat
6 :       x' = Shake(x);
7 :       x'' = FirstImprovement(x', k);
8 :       if (f(x'') < f(x)) then
9 :         x = x'';
10 :        k = 1;
11 :      else
12 :        k = k + 1;
13 :      end
14 :    until (k > kmax)
15 :    t = time;
16 : until (t ≥ tmax)

In this case, the neighborhoods of the k-th order of the current solution are the solutions that differ from the current one in k positions. A large number of schemes operating in this way have already been presented. Here the function Shake(x) randomly changes the solution to avoid cycling (line 6), and the function FirstImprovement(x', k) applies a k-th order neighborhood search (line 7) until it meets the first improvement of the solution x' or the maximum number of iterations for the search is reached.
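A minimal Python sketch of this basic VNS loop, with illustrative Shake and FirstImprovement routines for integer vectors, might look as follows; the value range {0, ..., 6}, the single-position local search, and all names are assumptions made only for the example.

import random, time

def shake(x, k):
    # Flip k random positions to a random admissible value in {0, ..., 6}.
    y = list(x)
    for i in random.sample(range(len(y)), k):
        y[i] = random.randint(0, 6)
    return y

def first_improvement(x, k, f):
    # A simple single-position scan, independent of k, used for brevity.
    y, fy = list(x), f(x)
    for i in range(len(y)):
        for v in range(0, 7):
            cand = y[:i] + [v] + y[i + 1:]
            if f(cand) < fy:
                return cand          # stop at the first improving move
    return y

def vns(x0, f, shake_fn, local_search, k_max, t_max):
    """Basic VNS loop mirroring the pseudocode above."""
    x, deadline = x0, time.time() + t_max
    while time.time() < deadline:
        k = 1
        while k <= k_max:
            x1 = shake_fn(x, k)              # random point in the k-th neighbourhood
            x2 = local_search(x1, k, f)      # local search until first improvement
            if f(x2) < f(x):
                x, k = x2, 1                 # improvement found: restart from k = 1
            else:
                k += 1                       # otherwise enlarge the neighbourhood
    return x

best = vns([3] * 10, lambda s: sum((v - 1) ** 2 for v in s),
           shake, first_improvement, k_max=3, t_max=1.0)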
4.3 Particle Swarm Optimization
Particle Swarm Optimization (PSO) is an optimization principle based on the behavior of a population. Each particle of the population has its current velocity and position and adjusts its flight depending on several factors: social and exploratory components. The social component is determined by a set of neighbors, either defined at the beginning of the simulation (the society) or defined at each iteration (the geographical type of neighborhood). Thus, at each time moment the velocities of all particles are recalculated and the particles move to new positions until the stop criterion is met. For the positions of the particles we have X_{i,t+1} = X_{i,t} + V_{i,t+1}, and the velocities are updated as

  V_{i,t+1} = c1 · V_{i,t} + c2 · r2 · (P_{i,t} − X_{i,t}) + c3 · r3 · (G_{i,t} − X_{i,t}),

where
• r2, r3 are random values in the range (0, 1);
• P_{i,t} is the particle's best position found up to this time moment;
• G_{i,t} is the best position of the particle's social neighborhood up to this time moment;
• c1, c2, c3 are parameters, accepted as

  c1 = 2 / |2 − (c2 + c3) − sqrt((c2 + c3)² − 4 · (c2 + c3))|,  c2 = 2.5,  c3 = 1.5.
As in the previous algorithm, at each iteration, after obtaining a new particle position we round the values to get integer solutions.
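A possible Python sketch of one PSO iteration with the velocity update and the rounding step described above is given below; the array shapes, the swarm initialization, and the way the coefficients are computed are our assumptions, not the authors' code.

import numpy as np

def pso_step(X, V, P, G, c1, c2, c3, lower, upper):
    """One PSO iteration for integer solutions: update velocities and positions,
    then round positions to the admissible integer range."""
    r2 = np.random.rand(*X.shape)
    r3 = np.random.rand(*X.shape)
    V = c1 * V + c2 * r2 * (P - X) + c3 * r3 * (G - X)
    X = np.clip(np.rint(X + V), lower, upper)      # rounding step described above
    return X, V

# Constriction-type coefficients from the text (c2 = 2.5, c3 = 1.5 give c1 = 1).
c2, c3 = 2.5, 1.5
phi = c2 + c3
c1 = 2.0 / abs(2.0 - phi - np.sqrt(phi**2 - 4.0 * phi))

S, n = 200, 10
X = np.random.randint(0, 7, size=(S, n)).astype(float)
V = np.zeros((S, n))
X, V = pso_step(X, V, P=X.copy(), G=X[0], c1=c1, c2=c2, c3=c3, lower=0, upper=6)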
4.4 Simulated Annealing
The Simulated Annealing (SA) method is a stochastic global optimization algorithm. It is based on the process of crystallization of a substance. In metallurgy,
to obtain stronger compounds, the metal is first heated, and then controlled cooling occurs to obtain a stronger crystal lattice and reduce the number of defects. The most stable state corresponds to the minimum energy. At high temperatures, the atoms can move freely, allowing the lattice to be rearranged, and controlled cooling makes it possible to find low-level solutions that can be used for further rearrangements. Thus, we have a continuous process at a gradually decreasing temperature. At each iteration we form a new solution in some way. If it provides a lower energy level, then we move on to this solution. However, even if the new solution leads to a higher energy level, we can still move on to it with some probability. Let’s describe the algorithm in more detail: In this pseudo-code, on line 2 the neighborhood operator denotes a new solution that differs randomly from the previous one in k elements, where k is also randomly selected from the interval {0, |I|}.
1 : while t_i > t_k do
2 :   x' ← apply neighbourhood operator to x;
3 :   calculate f(x');
4 :   if random[0, 1] < exp(−(f(x') − f(x)) / t_i) then
5 :     x ← x';
6 :   end
7 :   t_{i+1} ← α · t_i;
8 :   i ← i + 1;
9 : end
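The pseudo-code above can be rendered, for instance, as the following Python sketch with geometric cooling; the neighbour routine, the value range, and the parameter values are illustrative assumptions only.

import math, random

def neighbour(x):
    # Differs from x in k randomly chosen positions, k drawn at random.
    y = list(x)
    k = random.randint(1, len(y))
    for i in random.sample(range(len(y)), k):
        y[i] = random.randint(0, 6)
    return y

def simulated_annealing(x0, f, neighbour_fn, t0, t_end, alpha):
    """Geometric-cooling SA: accept a worse neighbour with probability
    exp(-(f(x') - f(x)) / t), as in the pseudo-code above."""
    x, t = x0, t0
    while t > t_end:
        x_new = neighbour_fn(x)
        delta = f(x_new) - f(x)
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = x_new
        t *= alpha                      # cooling schedule t_{i+1} = alpha * t_i
    return x

best = simulated_annealing([3] * 10, lambda s: sum((v - 1) ** 2 for v in s),
                           neighbour, t0=10.0, t_end=0.01, alpha=0.95)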
5 Simulation Results
For the calculations we used a computer with an Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz, 16 GB of RAM, and Windows 10. The algorithms were implemented in Python 3.7 [3]. In our Bipop CMA-VNS program we used the following parameters:
• penalty_init = 70 · 300 / |I| is the initial penalty value,
• max n restart = 25 is the maximum number of CMA-ES restarts,
• max generation size = 10 + floor(30 · n_dim / popsize),
• min step = 0.0001 is the minimum step size,
• max step = 1000 is the maximum step size,
• inc popsize = 2 is the coefficient that affects the population size change at restart,
• λ = popsize is the population size,
• μ = popsize/2 is the offspring size,
• ω_i = log((popsize + 1)/2) − log(i + 1) are positive weights for recombination,
• c_σ = (μ + 2)/(n_dim + μ + 5) is the time horizon of the evolutionary path,
• d_σ = 1 + 2 · max(0, sqrt((μ − 1)/(n_dim + 1)) − 1) + c_σ is the step damping parameter,
• c_c = (4 + μ/n_dim)/(n_dim + 4 + 2μ/n_dim) is the time horizon of the evolutionary path for the covariance matrix,
• c_cov = (1/μ) · 2/((n_dim + √2)²) + (1 − 1/μ) · min(1, (2μ − 1)/((n_dim + 2)² + μ)) is the update rate of the covariance matrix,
where the lower and upper bounds are the limits of acceptable values of the solution variables, dim is the dimension of Interventions, Population size is the initial population size for the two-population approach, and σ and μ are probabilistic parameters used during the optimization. The graphs below show a comparison of the convergence of the solution for a 15-minute computation and for a 1.5-hour computation. In the combined Bipop CMA-VNS algorithm, the execution time was distributed as follows: 2% of tmax was given to the preliminary calculation of the penalty coefficient, 60% to the Bipop CMA-ES operation, and 38% of the total execution time to the VNS operation. We can notice that the main improvements of the solution occur in the initial stages of the optimization, while the further work of the program leads to small local changes (Figs. 1, 2, 3, 4).
Fig. 1. Case 10, 15 min optimization value = 3029,972
Fig. 2. Case 10, 1.5 h optimization value = 3025,934
Fig. 3. Case 2, 15 min optimization value = 4800,231
Fig. 4. Case 2, 1.5 h optimization value = 4795,7948
Table 2 below presents a comparison of the solutions we have obtained and the best-known solutions. It contains the following columns: instance name; objective function values for SA, PSO, and Bipop CMA-VNS; the best-known solution among all competition participants; and, in the last column, the percentage ratio of the best-known solution to our solution obtained by Bipop CMA-VNS. The values presented in the table are the best values over 10 optimization replications. In some cases our comparison algorithms (SA and PSO) were not able to arrive at a feasible solution (they are marked in the table with a dash). Also,
the graphics below present the comparison of solutions by different algorithms for cases 3 and 7 (Figs. 5 and 6).

Table 2. The table of obtained solutions.

Instance | SA        | PSO       | CMA-VNS   | Best known solution | %
A 01     | –         | –         | 2220,6812 | 1767,8156           | 79,6 %
A 02     | 4840,7309 | 10125,199 | 4795,7948 | 4671,3766           | 97,4 %
A 03     | 1033,925  | 1010,6039 | 896,5415  | 848,17861           | 94,6 %
A 04     | –         | –         | 3235,1643 | 2085,87605          | 64,4 %
A 05     | 948,8967  | 954,877   | 832,5875  | 635,28288           | 76,3 %
A 06     | –         | –         | 591,4927  | 590,6236            | 99,8 %
A 07     | 2296,2340 | 2285,6404 | 2277,3603 | 2272,7822           | 99,8 %
A 08     | 786,918   | 745,5888  | 744,4778  | 744,293             | 99,9 %
A 09     | 1599,6249 | 1507,935  | 1507,5418 | 1507,2848           | 99,9 %
A 10     | 3305,6344 | 3104,201  | 3025,4691 | 2994,8487           | 98,9 %
A 11     | 548,5897  | 572,7169  | 496,9632  | 495,2558            | 99,6 %
A 12     | 822,3408  | 845,4398  | 808,9876  | 789,635             | 97,6 %
A 13     | –         | –         | 2211,1601 | 1998,66216          | 90,4 %
A 14     | –         | –         | 2682,8308 | 2264,12432          | 84,4 %
A 15     | –         | –         | 2760,5039 | 2268,5926           | 82,2 %

Fig. 5. Case 3, dimension: 91. CMA-PSO: 974,1301; SA: 1033,925; CMA-VNS: 965,2488; PSO: 1010,6039
Fig. 6. Case 7, dimension: 36. CMA-PSO: 2294,7313; SA: 2296,2340; CMA-VNS: 2283,6995; PSO: 2285,6404
Our solutions for CMA-VNS were obtained for 15 min and 1.5 h of computation. The results reveal a 95% accuracy for 95% of the cases.
6 Conclusion
The main result of our work is the approach based on the search for a solution algorithm among state-of-the-art algorithms that are winners of international
competitions for the same set of problems. This involved analyzing the task and studying modern leading algorithms, as well as other related competitions, in order to choose the most suitable approach for solving this problem. We chose an algorithm that solves a similar class of problems and adjusted it to our conditions, obtaining, on average, quite good solutions. This approach allowed us to obtain solutions that are not only ahead of the classical algorithms in the field, but also quite close to the currently known best ones for most examples. Further development and modification of the selected algorithm, its application to other problems, and possible combinations with algorithms from the same class, in order to obtain more accurate solutions without losing speed or efficiency, can be considered as directions of further work.
References
1. Challenge ROADEF/EURO 2020: Maintenance planning problem. https://www.roadef.org/challenge/2020/en/
2. Combinatorial black-box optimization competition. https://web.mst.edu/~tauritzd/CBBOC/GECCO2015
3. Link to the GitHub repository with the implementation of the described solution. https://github.com/chabann/ROADEF-RTE-maintenance-planning.git
4. ROADEF | Association française de recherche opérationnelle et d'aide à la décision. https://www.roadef.org
5. Réseau de Transport d'Électricité. https://www.rte-france.com/en
6. Special session and competition on large-scale global optimization. http://www.tflsgo.org/special_sessions/cec2018.html
7. Varelas, K., et al.: A comparative study of large-scale variants of CMA-ES. In: Auger, A., Fonseca, C.M., Lourenço, N., Machado, P., Paquete, L., Whitley, D. (eds.) PPSN 2018. LNCS, vol. 11101, pp. 3–15. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99253-2_1
8. Beyer, H.G., Sendhoff, B.: Simplify your covariance matrix adaptation evolution strategy. IEEE Trans. Evol. Comput. (2017). https://doi.org/10.1109/TEVC.2017.2680320
9. Brownlee, J.: Clever Algorithms: Nature-Inspired Programming Recipes (2011)
10. Crognier, G., Tournebise, P., Ruiz, M., Panciatici, P.: Grid operation-based outage maintenance planning. Electric Power Syst. Res. 190 (2021). https://doi.org/10.1016/j.epsr.2020.106682
11. Deng, W., Xu, J., Zhao, H.: An improved ant colony optimization algorithm based on hybrid strategies for scheduling problem. IEEE Access 7, 20281–20292 (2019). https://doi.org/10.1109/ACCESS.2019.2897580
12. Froger, A., Gendreau, M., Mendoza, J.E., Pinson, É., Rousseau, L.M.: Maintenance scheduling in the electricity industry: a literature review. Eur. J. Oper. Res. 251(3), 695–706 (2016). https://doi.org/10.1016/j.ejor.2015.08.045
13. Hadi, A.A., Mohamed, A.W., Jambi, K.M.: LSHADE-SPA memetic framework for solving large-scale optimization problems. Complex Intell. Syst. 5(1), 25–40 (2018). https://doi.org/10.1007/s40747-018-0086-8
14. Hansen, N.: The CMA evolution strategy: a comparing review, vol. 192, pp. 75–102 (2007). https://doi.org/10.1007/3-540-32494-1_4
15. Hansen, N.: The CMA evolution strategy: a tutorial (2010)
16. Hansen, P., Mladenovic, N.: Variable neighborhood search: principles and applications. Eur. J. Oper. Res. 130, 449–467 (2001). https://doi.org/10.1016/S0377-2217(00)00100-4
17. Hansen, P., Mladenovic, N., Moreno-Pérez, J.: Variable neighbourhood search: methods and applications. 4OR 175, 367–407 (2010). https://doi.org/10.1007/s10479-009-0657-6
18. Kemmoé Tchomté, S., Gourgand, M.: Particle swarm optimization: a study of particle displacement for solving continuous and combinatorial optimization problems. Int. J. Prod. Econ. 121(1), 57–67 (2009). https://doi.org/10.1016/j.ijpe.2008.03.015
19. LaTorre, A., Muelas, S., Peña, J.-M.: Evaluating the multiple offspring sampling framework on complex continuous optimization functions. Memetic Comput. 5(4), 295–309 (2013). https://doi.org/10.1007/s12293-013-0120-8
20. Mirjalili, S.: Genetic algorithm. In: Evolutionary Algorithms and Neural Networks. SCI, vol. 780, pp. 43–55. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-93025-1_4
21. Molina, D., Latorre, A., Herrera, F.: SHADE with iterative local search for large-scale global optimization, pp. 1–8 (2018). https://doi.org/10.1109/CEC.2018.8477755
22. Muñoz, M.A., Sun, Y., Kirley, M., Halgamuge, S.K.: Algorithm selection for black-box continuous optimization problems: a survey on methods and challenges. Inf. Sci. 317, 224–245 (2015). https://doi.org/10.1016/j.ins.2015.05.010
23. Niar, S., Bekrar, A., Ammari, A.: An effective and distributed particle swarm optimization algorithm for flexible job-shop scheduling problem. J. Intell. Manuf. 29, 603–615 (2015)
24. Roy, G., Lee, H., Welch, J.L., Zhao, Y., Pandey, V., Thurston, D.: A distributed pool architecture for genetic algorithms, pp. 1177–1184 (2009). https://doi.org/10.1109/CEC.2009.4983079
25. Sun, L., Lin, L., Li, H., Gen, M.: Large scale flexible scheduling optimization by a distributed evolutionary algorithm. Comput. Ind. Eng. 128, 894–904 (2019). https://doi.org/10.1016/j.cie.2018.09.025
26. Weise, T.: Global Optimization Algorithm: Theory and Application (2009)
27. Xue, F., Shen, G.: Design of an efficient hyper-heuristic algorithm CMA-VNS for combinatorial black-box optimization problems, pp. 1157–1162 (2017). https://doi.org/10.1145/3067695.3082054
28. Yang, J., Zhuang, Y.: An improved ant colony optimization algorithm for solving a complex combinatorial optimization problem. Appl. Soft Comput. 10, 653–660 (2010). https://doi.org/10.1016/j.asoc.2009.08.040
Numerical Solution of the Inverse Problem for Diffusion-Logistic Model Arising in Online Social Networks

Olga Krivorotko1,2, Tatiana Zvonareva1,2(B), and Nikolay Zyatkov1

1 Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Academician Lavrentyev Avenue 6, 630090 Novosibirsk, Russia
2 Novosibirsk State University, Pirogova Street 1, 630090 Novosibirsk, Russia
[email protected]
Abstract. The information propagation in online social networks is characterized by a nonlinear partial differential equation with the Neumann boundary conditions and initial condition (source) that depends on the type of information and social network. The problem of source identification using additional measurements of the number of influenced users with a discrete distance at fixed times is numerically investigated. One way to solve the source problem for the diffusive-logistic model is to reduce it to the minimization least squares problem that is solved by a combination of the global particle swarm optimization and the local Nelder-Mead methods. Another way is to construct the function of the density of influenced users in space and time that describes additional measurements with high accuracy using a machine learning method named artificial neural networks. The results of numerical calculations for synthetic data show the accuracy of 99% of the source reconstruction. The neural networks approximate additional measurements with lower accuracy, but the approximation function satisfies the diffusive-logistic mathematical model. The novelty lies in the comparative analysis of the stochastic method for minimizing the misfit function based on the structure of the model, and the machine learning approach, which does not use the mathematical model while learning. Keywords: Inverse problem · Ill-posed problem · Source problem Social networks · Diffusion-logistic model · Singular value decomposition · Stability analysis · Particle swarm optimization · Regularization · Artificial neural networks
This work is supported by the Russian Science Foundation (project No. 18-71-10044) and the Council for Grants of the President of the Russian Federation (project no. MK-4994.2021.1.1).
1 Introduction
The development of modern computing technologies, supercomputer modeling and cloud computing makes it possible to use modern mathematical methods for the analysis of social processes. The mechanism of information dissemination in social media is sufficiently complex. Most of the previous works in the field of information dissemination are devoted to the characteristics of information dissemination in various social networks using empirical approaches [7,12]. In papers [8,17] the information diffusion techniques are studied. J. Yang and S. Counts [17] investigated in 2010 the impact of the structure of social networks on the dissemination of information and found that social networks are systematically different in structure, methods of navigation on the Internet and methods of social interactions. The original methods that were combined into a technology TALISMAN [14,18] were developed at the Ivannikov Institute for System Programming of the RAS for social media analysis. This technology is aimed at solving a wide range of typical tasks, including multi-purpose monitoring of the situation in the media space. In paper [8], two different diffusion processes, internal and external, were discussed. The internal process is the result of the influence of the basic network structure; the external one is the impact that comes from a variety of non-network sources. However, such models are largely based on systems of ordinary differential equations, the structure of which does not fully describe the process of dissemination of information in online social networks (it does not take into account the heterogeneity of the degree of dissemination of information and the status of users). F. Wang, H. Wang, and K. Xu in 2012 [16] proposed the mathematical model based on partial differential equations (PDEs) which described the diffusion of information in time and space. Using a real data set from the social news aggregation site Digg.com it is shown that the constructed model can characterize and predict the process of diffusion of information with the probability of 95%. These authors introduced the distance x based on friendships and formulated the law of conservation of information flow. The constructed mathematical model is named the diffusion-logistic model (DLM). The coefficients of DLM as well as the initial condition (source) are unknown or could be estimated from the statistical information. Each information in online social networks has its own unique initial source function. It is necessary to identify the initial distribution of influenced users (source function) for developing an optimal plan of the information control in social networks. The diffusion-logistic equation can be considered as a heat conduction equation with a special right-hand side or as a special case of the Fokker-Planck equation [1]. A. Hasanov [4] proved the existence of a quasi-solution of the inverse coefficient problem for the heat equation and Tikhonov functional. B. Kaltenbacher [10] developed a method based on a Maximum Likelihood approach for solving the problem of reconstructing the parameters for the FokkerPlanck equation, which is supplemented by an adjoint method to efficiently compute the gradient of the loss function with respect to the parameters.
The purpose of this work is to identify the initial data (source) from the initial boundary value problem for the mathematical model of the specific information propagation in the online social network using additional measurements of the density of influenced users at fixed times and friendship distances (inverse problem). Such models are described by partial differential equations. Coefficients of that characterize the features of information distribution. There are some ways to solve the problem of identification of the initial density function in the mathematical model. The classic way in inverse problem theory is to reduce the inverse problem to the problem of minimizing the misfit function describing the quadratic deviation of the model data from the experimental ones. For the numerical solution of the optimization problem, the combination of the stochastic particle swarm optimization and the deterministic Nelder-Mead method are used. As a result, the initial data of the DLM and the distribution of influenced users in space and time are determined. Another way is to construct the mathematical model that described the statistical information about social media with high accuracy. The machine learning methods in particular artificial neural networks (ANN) [15] are applied to the construction of the distribution function of influenced users in space and time. ANN can identify complex patterns with the available set of historical observations and known outcomes of these data. The novelty lies in the comparative analysis of the stochastic method for minimizing the misfit function based on the structure of the model, and the machine learning approach, which does not use the mathematical model while learning. The paper is organized as follows. The direct and inverse problems for the mathematical model of the diffusion logistic model is formulated in Sect. 2. The analysis of singular values of the matrix of linearized inverse problems is conducted in Sect. 2.1. In Sect. 3 the problem of determining the initial density function is reduced to the problem of minimizing the misfit function. The particle swarm optimization is described in Sect. 3.1, the local Nelder-Mead method is in Sect. 3.2 and the ANN algorithm is presented in Sect. 3.3. The results of numerical calculations and comparative analysis are presented in Sect. 4. Conclusions and future work plans are described in Sect. 5.
2 Inverse Problem for the Diffusion-Logistic Model
The process of information propagation in online social networks is divided into two separate processes named content-oriented and structurally-oriented (corresponding to the first and second terms of the right part (1) respectively). A structurally-oriented process consists of information diffusion between users placed the same distance away from source and this diffusion is based on users’ direct links to those who contribute to the propagation. A content-oriented process describes the information diffusion through non-structural actions (such as search) that arise as a result of the popularity of information. The distance x is an integer value that described the minimum number of friendships between the user and the source. Distance x is measured in units. Time t is measured in hours, the density of influenced users I(x, t) is measured in the number of people per unit distance.
The initial-boundary value problem for DLM is defined as follows:

  I_t = d I_xx + r(t) I(x, t) (1 − I(x, t) K^{−1}),  t ≥ 1, l ≤ x ≤ L,
  I(x, 1) = q(x),  l ≤ x ≤ L,                                             (1)
  I_x(l, t) = I_x(L, t) = 0,  t ≥ 1.
Here q(x) ≥ 0 is an initial density function that satisfies the following conditions for well-posedness of problem (1) [16]: 1) q(x) ∈ C²(l, L); 2) q'(l) = q'(L) = 0; 3) dq'' + rq(1 − q/K) ≥ 0. For example, the function q(x) can be described by the conditions in Table 2 (second column); see Sect. 4 for more details. The function r(t) is the growth rate of the number of influenced users placed at the same distance away from the source. It is measured in quantity per hour and has the form
  r(t) = β/α − (β/α − γ) e^{−α(t−1)},

and the values and properties of the parameters are presented in Table 1.

Table 1. Parameters of the model (1).

Parameter | Characteristic | Value | Range
α | Rate of declining interest in information over time | 1.5 | [1.1; 2.2]
β | Residual speed | 0.375 | [0.105; 0.6]
γ | Initial growth rate of influenced users | 1.65 | [1.2; 2.1]
d | Popularity of information which promotes the spread of the information through non-structure based activities such as search | 0.01 | [0.001; 0.04]
K | Carrying capacity, which is the maximum possible density of influenced users at a given distance | 25 | [25; 1000]
Table 2. Discrete initial density function qi = q(xi), x1 = 1, . . . , x6 = 6.

Parameter      | q1     | q2         | q3       | q4         | q5         | q6
Measured value | 5.8    | 1.7        | 1.9      | 1          | 0.95       | 0.7
Range for ANN  | [4; 8] | [0.9; 2.5] | [1; 2.8] | [0.8; 1.2] | [0.4; 1.5] | [0.3; 1.2]
The Neumann boundary conditions in problem (1) mean that the network is closed to external information in the considered period of time and the information spreads only within it. For q(x) ≥ 0 and q(x) not identically zero the initial-boundary value problem (1) has a unique positive solution I(x, t) ∈ C^{2,1}((l, L) × (1, +∞)) ∩ C^{1,0}([l, L] × [1, +∞)).
In the inverse problem, in contrast to the direct one, besides the function I(x, t) the unknown quantity in (1) is q(x). Assume that additional statistical information about the process is given in the following form: I(xi, tk; q) = fik,
i = 1, . . . , N1 , k = 1, . . . , N2 .
(2)
Here and below, I(x, t; q) indicates the dependence of the solution on q(x). The inverse problem (1)–(2) consists in the identification of the initial density q(x) of the model (1) using the additional data fik (2). Since the data of the inverse problem are not complete, the inverse problem is ill-posed [5] (the solution of the inverse problem can be non-unique and/or unstable).

2.1 Singular Value Analysis of the Linearized Inverse Problem
For the investigation of the stability of the linearized inverse problem (1)–(2) solution we reduce it to the form Aq = f as follows.

1. Linearization. Let Ĩ(x, t; q) = I0(x, t; q0) + δI(x, t; δq) and q = q0 + δq, where I0 is a known solution and q0 is a known initial density function. Substitute these expressions in (1) and obtain a linearized initial-boundary value problem with respect to the unknown variations δI and δq:

  δI_t = d δI_xx + r(t) δI (1 − 2 I0 K^{−1}),  t ≥ 1, l ≤ x ≤ L,
  δI(x, 1) = δq(x),  l ≤ x ≤ L,                                            (3)
  δI_x(l, t) = δI_x(L, t) = 0,  t ≥ 1.

2. Discretization. Substitute for simplicity the entries δI and δq by I and q, respectively, and apply a finite difference scheme of the approximation order O(τ + h²) to problem (3). Construct a uniform grid ω = {(x_j, t_n) | x_j = l + jh, t_n = 1 + nτ, j = 0, . . . , Nx, n = 0, . . . , Nt, h = (L − l)/Nx, τ = (T − 1)/Nt} in the closed domain D = {(x, t) | l ≤ x ≤ L, 1 ≤ t ≤ T}:

  (I_j^{n+1} − I_j^n)/τ = d (I_{j+1}^n − 2 I_j^n + I_{j−1}^n)/h² + r^n I_j^n (1 − 2 I_{0,j}^n / K),  j = 1, . . . , Nx − 1,  n = 0, . . . , Nt − 1,
  I_j^0 = q_j,  j = 0, . . . , Nx,
  (−3 I_0^{n+1} + 4 I_1^{n+1} − I_2^{n+1})/(2h) = (I_{Nx−2}^{n+1} − 4 I_{Nx−1}^{n+1} + 3 I_{Nx}^{n+1})/(2h) = 0,  n = 0, . . . , Nt.
Then we express I_j^n through q and write the corresponding matrix row by row, separating the elements of a row with vertical lines:

  A(m0) = (0 . . . 1 . . . 0),  m = 0, . . . , Nx,
  A(m1) = (0 . . . c | B_m^0 | c . . . 0),  m = 1, . . . , Nx − 1,

and the remaining interior rows are obtained recursively,

  A(mk) = c A((m − 1)(k − 1)) + B_m^{k−1} A(m(k − 1)) + c A((m + 1)(k − 1)),  k = 2, . . . , Nt,  m = 1, . . . , Nx − 1,

while the boundary rows are

  A(0k) = (4 A(1k) − A(2k))/3,  A(Nx k) = (4 A((Nx − 1)k) − A((Nx − 2)k))/3,  k = 1, . . . , Nt,

where A(mk) is the mk-th row of the matrix, c = dτ/h², B_m^k = 1 − 2c + τ r^k (1 − 2 I_{0,m}^k / K), and E_m^{k,s} denotes the element in the s-th column and k-th row at the point m. The resulting matrix has dimension (N1 + 1)·(N2 + 1) × (Nx + 1), with rows corresponding to the measured data times and distances.
Analyze the condition number of the matrix A using the singular value decomposition: A = UΣVᵀ, Σ = diag(σ1, . . . , σp).
Here p = min{(N1 + 1)·(N2 + 1), Nx + 1}, Σ is a matrix of size (N1 + 1)·(N2 + 1) × (Nx + 1), the singular values satisfy σ1 ≥ σ2 ≥ · · · ≥ σp ≥ 0, and U (of order (N1 + 1)·(N2 + 1)) and V (of order (Nx + 1)) are orthogonal matrices consisting of the left and right singular vectors, respectively (Vᵀ is the transpose of V) [3]. In numerical experiments for the inverse problem (1)–(2) we obtained the set of singular values of the matrix A shown in Fig. 1. The condition number of the matrix A is determined by the formula cond(A) = σmax/σmin. The singular decomposition of the matrix A was carried out with the parameters and mesh described in Sect. 4.1. The resulting condition number is of the order of 10^16. It means that the solution
of the linearized inverse problem is unstable. We can conclude that the solution of the nonlinear inverse problem (1)–(2) will be unstable. Thus, the considered inverse problem is ill-posed, so it is necessary to develop a numerical algorithm for its regularization.
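For reference, the singular values and the condition number cond(A) = σmax/σmin can be computed, for example, with NumPy as in the following sketch; the random matrix in the usage line is only a placeholder for the assembled matrix A described above.

import numpy as np

def condition_number(A):
    """Singular values and condition number, used to assess the stability
    of the linearized inverse problem."""
    sigma = np.linalg.svd(A, compute_uv=False)   # sorted in descending order
    return sigma, sigma[0] / sigma[-1]

# Placeholder of size (N1+1)*(N2+1) x (Nx+1) = 96 x 51 for the grid used here;
# for the actual matrix A the condition number is of the order 1e16.
sigma, cond = condition_number(np.random.rand(96, 51))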
Fig. 1. The logarithmic scale of the singular values of the matrix A. In numerical experiments we set N1 = 5, N2 = 15 and Nx = 50.

Fig. 2. The absolute difference between the "exact" I(x, 1) = q(x) and the initial density of active users recovered by the combination of PSO and the Nelder-Mead method.

3 Variational Formulation of the Inverse Problem and Optimization Methods
The inverse problem (1)–(2) can be reduced to the problem of minimizing the misfit function

  J(q) = ||Aq − f||² := ((L − l)(T − 1) / (N1 · N2)) Σ_{i=1}^{N1} Σ_{k=1}^{N2} |I(xi, tk; q) − fik|²,   (4)
where I(x, t; q) is the solution of the direct problem for a given set of parameters q. Minimization of the function (4) consists of two major steps: a global optimization, such as the stochastic particle swarm optimization (PSO) [6], to locate the domain of the global minimum, and a local optimization (the Nelder-Mead method [9]) to refine the solution of the inverse problem.
3.1 Particle Swarm Optimization (PSO)
PSO consists of a population called a swarm of possible solutions called particles and an algorithm that moves these particles in the solution space according to some formula. Displacements satisfy the principle of the best position found in this space which constantly changes when the particles find more favorable positions.
Let S be the number of particles in the swarm, xi be the position of the i-th particle, vi be the velocity of the i-th particle, pi be the best known position of the i-th particle, and g be the best known position of the swarm. Then the movement of the swarm is governed by the formula vi ← ωvi + ϕp rp (pi − xi) + ϕg rg (g − xi). The algorithm parameters S = 200, ω = 0.01, ϕp = 3.3 and ϕg = 1.3 were obtained empirically, and the pseudo-code of the PSO algorithm is as follows:

for i = 0, . . . , S do
  xi ∼ U(blo, bup)                               – generate the initial position of the particle
  pi ← xi                                        – assign the best position of the particle
  if J(pi) < J(g) then
    g ← pi                                       – update the best position of the swarm
  vi ∼ U(−|bup − blo|, |bup − blo|)              – generate the velocity of the particle
while a termination criterion is not met do
  for i = 0, . . . , S do
    generate random values rp, rg ∼ U(0, 1)
    vi ← ωvi + ϕp rp (pi − xi) + ϕg rg (g − xi)  – update the velocity of the particle
    xi ← xi + vi                                 – update the particle position
    if J(xi) < J(pi) then
      pi ← xi                                    – update the best particle position
      if J(pi) < J(g) then
        g ← pi                                   – update the best position of the swarm
Here the notation xi ∼ U(blo, bup) means that the random vector xi has a multivariate uniform distribution in the region [blo, bup]^6. The criterion for stopping the PSO is based on the change in the value of the functional J: if the difference between the values of the functional on the previous and current iterations is less than some constant ε (in numerical calculations ε = 0.01), the iterations stop. For each particle the lower blo = 0 and upper bup = 6 boundaries of the search space are chosen.
3.2 Nelder-Mead Method
The Nelder-Mead method (simplex method) is a method of local optimization of a function of several variables. The method forms a simplex and moves it in the direction of the minimum using three operations on the simplex: reflection, compression, and stretching. The method uses the corresponding operation coefficients α = 1, β = 0.5 and γ = 2. A simplex is a geometric figure that is an n-dimensional generalization of a triangle, and an n-dimensional simplex has n + 1 vertices.
The method is run after PSO, using the coordinates of the vector g = (g^(1), . . . , g^(e)) (the best found position of the swarm) as the vertices of the simplex. The algorithm of the Nelder-Mead method is as follows:

for i = 0, . . . , e + 1 do
  for k = 0, . . . , e do
    g_i^(k) ∼ U(0.7 g^(k), 1.3 g^(k))      – generate the initial simplex
  J_i ← J(g_i)                             – the value of the functional is calculated
do
  g_h ← g_i | J_i ≥ J_j ∀j                 – three points are selected that correspond to
  g_n ← g_i | J_i ≥ J_j ∀j ≠ h             – the highest value of the function, the next
  g_l ← g_i | J_i ≤ J_j ∀j                 – largest, and the lowest value of the function
  g_c ← (1/e) Σ_{i ≠ h} g_i                – the center of gravity of all points except g_h
  g_r ← (1 + α) g_c − α g_h                – the point g_h is reflected relative to g_c
  J_r ← J(g_r)                             – the value of the functional is calculated
  if J_r < J_l then
    g_f ← (1 − γ) g_c + γ g_r              – the direction is correct, stretching is performed
    J_f ← J(g_f)
    if J_f < J_r then g_h ← g_f            – assign the point g_h the value g_f
    if J_r < J_f then g_h ← g_r            – assign the point g_h the value g_r
  if J_l < J_r < J_n then g_h ← g_r        – assign the point g_h the value g_r
  if J_n < J_r < J_h then
    g_h ↔ g_r,  J_h ↔ J_r                  – swap the values g_r with g_h and J_r with J_h
    g_s ← β g_h + (1 − β) g_c              – the point g_s is constructed and the value
    J_s ← J(g_s)                           – of the functional is calculated at it
    if J_s < J_h then g_h ← g_s            – assign the point g_h the value g_s
    if J_h < J_s then g_i ← g_l + (g_i − g_l)/2, i ≠ l   – global compression towards g_l
  if J_h < J_r then
    g_s ← β g_h + (1 − β) g_c              – the point g_s is constructed and the value
    J_s ← J(g_s)                           – of the functional is calculated at it
    if J_s < J_h then g_h ← g_s            – assign the point g_h the value g_s
    if J_h < J_s then g_i ← g_l + (g_i − g_l)/2, i ≠ l   – global compression towards g_l
while required accuracy not achieved

If the greatest distance between vertices is less than some ε (in numerical calculations ε = 1.5), the necessary solution is obtained and the iterations are stopped.
3.3 Algorithm of Artificial Neural Network
Artificial neural networks (ANN) are computing systems inspired by biological neural networks that constitute animal brains. ANN allows modeling or approximate non-linear function with some input and output data. The standard neural network has: – input layer that contains of input parameters associated with the state of each neuron of the input layer; – output layer in which the output parameters associated with the state of each neuron of the output layer are calculated. If ANN has additional layers between the input and output layers then they are called hidden layers, and the training of such a network are called deep learning. Each layer is connected to neighboring layers using weights and bias coefficients. The transfer of information from the previous layer to the next is carried out according to the following rule: z = σ(W y + b). Here y ∈ RK2 is the vector of data on the previous layer, z ∈ RK1 is vector of data on the next layer, W ∈ RK1 × RK2 is the weight matrix of the transition from the previous layer to the next, b ∈ RK1 is the vector of bias coefficients, σ is an activation function for elimination of linearity. In numerical calculation a sigmoid function is used: σ(y) =
1/(1 + e^{−y}).
Supervised learning of the neural network means that for a given set of previously known input and output data (training data) it is necessary to select the optimal W and b coefficients so that the quadratic error between the "exact" output value z_m^true and the output value z_m obtained by propagating the input values through the neural network is minimized, i.e.

  (W*, b*) = argmin_{W,b} E,   E = (1/2) Σ_{m=1}^{M} (z_m^true − z_m)².   (5)

Here M is the number of samples in the training data. The optimal coefficients W* and b* are identified by applying the gradient descent method with back propagation of errors to the minimization problem (5) [11]:

  w_ij^{k+1} = w_ij^k − η ∂E/∂w_ij^k,   b_j^{k+1} = b_j^k − η ∂E/∂b_j^k,   i ∈ K2, j ∈ K1,
where k is the iteration step. The stopping condition can be the maximum number of iterations. The gradient of the functional E is expressed as follows:

  ∂E/∂w_ij = (∂E/∂z_j)(∂z_j/∂S_j)(∂S_j/∂w_ij),   i ∈ K2, j ∈ K1,

where

  ∂E/∂z_j = Σ_{m=1}^{M} (z_m − z_m^true),   ∂z_j/∂S_j = σ(y)(1 − σ(y)),
  ∂S_j/∂w_ij = y_i,   S_j = Σ_{i=1}^{K2} w_ij y_i + b_j,   j ∈ K1.

And similarly for the bias coefficients b:

  ∂E/∂b_j = (∂E/∂z_j)(∂z_j/∂S_j)(∂S_j/∂b_j),   ∂S_j/∂b_j = 1.
The algorithm is as follows:
1. Prepare data {(x_m, z_m^true)}, m = 1, ..., M.
2. Select the structure of a fully connected neural network:
   (a) Determine the number of neurons in the input layer equal to the dimension of the vectors x_m from the dataset {(x_m, z_m^true)}, m = 1, ..., M.
   (b) Determine the number of neurons in the output layer equal to the dimension of the vectors z_m^true from the dataset {(x_m, z_m^true)}, m = 1, ..., M.
   (c) Define an arbitrary number of hidden layers and an arbitrary number of neurons for each hidden layer.
   (d) Determine activation functions for each layer of the neural network except the input.
3. Find the optimal coefficients W* and b* for each transition between the layers of the neural network, minimizing the functional E (5), where z_m(W, b) is calculated by transforming x_m by the neural network.
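A minimal NumPy sketch of steps 2–3 for a network with two hidden sigmoid layers trained by gradient descent with backpropagation is given below; the layer sizes, learning rate, number of epochs, and the sigmoid output layer are illustrative assumptions and do not reproduce the authors' configuration or optimizer.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train(X, Z_true, sizes=(110, 512, 512, 300), eta=1e-3, epochs=200):
    """Fully connected network with two hidden sigmoid layers; the weights are
    updated by gradient descent with backpropagation of the quadratic error E."""
    rng = np.random.default_rng(0)
    W = [rng.normal(0, 0.1, (sizes[i], sizes[i + 1])) for i in range(3)]
    b = [np.zeros(sizes[i + 1]) for i in range(3)]
    for _ in range(epochs):
        # Forward pass: z = sigma(W y + b) on every layer.
        a = [X]
        for l in range(3):
            a.append(sigmoid(a[-1] @ W[l] + b[l]))
        # Backward pass: propagate dE/dS through the layers.
        delta = (a[-1] - Z_true) * a[-1] * (1 - a[-1])
        for l in range(2, -1, -1):
            grad_W = a[l].T @ delta
            grad_b = delta.sum(axis=0)
            if l > 0:
                delta = (delta @ W[l].T) * a[l] * (1 - a[l])
            W[l] -= eta * grad_W
            b[l] -= eta * grad_b
    return W, b

# Toy usage with random data of matching shapes.
rng = np.random.default_rng(1)
W, b = train(rng.random((64, 110)), rng.random((64, 300)), epochs=10)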
4 Numerical Experiments
This section contains the initial datasets and the numerical results of solving the source problem for the diffusive-logistic mathematical model by PSO, by its combination with the Nelder-Mead method, and by the artificial neural network. At the end of the section a comparative analysis of the information trajectory reconstruction by the PSO and ANN methods is presented.
4.1 Initial Datasets
The grid described in Subsect. 2.1 is used. The direct problem (1) is solved by an explicit finite difference scheme with an order of approximation O(τ + h2 ):
  (I_j^{n+1} − I_j^n)/τ = d (I_{j+1}^n − 2 I_j^n + I_{j−1}^n)/h² + r^n I_j^n (1 − I_j^n / K),  j = 1, . . . , Nx − 1,  n = 0, . . . , Nt − 1,
  I_j^0 = q(x_j),  j = 0, . . . , Nx,
  I_0^{n+1} = (4 I_1^{n+1} − I_2^{n+1})/3,   I_{Nx}^{n+1} = (4 I_{Nx−1}^{n+1} − I_{Nx−2}^{n+1})/3,  n = 0, . . . , Nt − 1.
Here Ijn = I(xj , tn ). Put in the numerical calculation l = 1, L = 6, T = 24, Nx = 50 and Nt = 575. Such values of l, L and T were chosen in accordance with the data presented in the paper [16], which illustrate that the interest in information appears to users with a distance of x in the range from 1 to 6 friendships and in the time interval t from 1 to 24 h. The grid split values Nx and Nt are selected so that satisfied the Courant-Friedrichs-Lewy condition: τ≤
2h² / (4d + r_c h²),
given the fact that r_c = max_t r(t) = 0.44, h = 0.1 and τ = 0.04. In numerical experiments we use the synthetic data fik for solving the inverse problem (1)–(2) in order to control the accuracy of the reconstruction of the inverse problem solution. For this we set the "exact" inverse problem solution q(x) using the values from Table 2 for the known coefficients presented in Table 1 and using cubic spline interpolation on the gaps (xi, xi + 1), where xi = i, i = 1, . . . , 5. The solution of the direct problem (1) is calculated for the chosen "exact" parameters at the points xi = 1, 2, 3, 4, 5 and tk = 3, . . . , 24 given every hour, i.e. N1 = 5 and N2 = 22 (synthetic measured data). To control the accuracy of the calculated inverse problem solutions we introduce the relative error

  δ(q) = ||q_ex − q̃|| / ||q_ex||,   ||q||² = ∫_l^L |q|² dx.

Here q̃ is the calculated solution that gives the minimum value of the function J(q).
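A possible NumPy implementation of this explicit scheme is sketched below; the growth-rate function r(t) is passed in as an argument, and all names and the usage line are ours rather than the authors'.

import numpy as np

def solve_direct(q, r, d=0.01, K=25.0, l=1.0, L=6.0, T=24.0, Nx=50, Nt=575):
    """Explicit finite-difference scheme for the direct problem (1).
    q: initial density at the Nx+1 grid nodes; r: callable growth rate r(t).
    The default grid satisfies the CFL condition tau <= 2h^2/(4d + r_c h^2)."""
    h, tau = (L - l) / Nx, (T - 1.0) / Nt
    I = np.empty((Nt + 1, Nx + 1))
    I[0] = q
    for n in range(Nt):
        t_n = 1.0 + n * tau
        In = I[n]
        # Interior nodes: diffusion plus logistic growth.
        I[n + 1, 1:-1] = In[1:-1] + tau * (
            d * (In[2:] - 2 * In[1:-1] + In[:-2]) / h**2
            + r(t_n) * In[1:-1] * (1.0 - In[1:-1] / K))
        # Second-order one-sided approximations of the Neumann conditions.
        I[n + 1, 0] = (4 * I[n + 1, 1] - I[n + 1, 2]) / 3
        I[n + 1, -1] = (4 * I[n + 1, -2] - I[n + 1, -3]) / 3
    return I

# Illustrative call with a constant growth rate and a decaying initial density.
x = np.linspace(1.0, 6.0, 51)
I = solve_direct(q=5.8 * np.exp(-(x - 1.0)), r=lambda t: 0.44)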
4.2 PSO and Nelder-Mead Approaches
A combined method is used that works as follows. First, the PSO determines a solution using the stopping criterion |J_n − J_{n−1}| < ε (see Sect. 3.1). Second, using the obtained solution as the initial approximation for the simplex, the Nelder-Mead method refines the inverse problem solution. The results of the reconstruction of the initial function q(x) are presented in Table 3 (the second column) and Fig. 2. The values of the variable x are along the Ox axis.

Table 3. The misfit function J(q) and the relative error δ(I) of the direct problem solution for the combination of PSO and the Nelder-Mead method and for the ANN approach.

Calculated value | PSO+Nelder-Mead | ANN
J(q)             | 0.00089925      | 79.135
δ(I)             | 0.00000117      | 0.00276
O. Krivorotko et al.
The parameters of swarm and velocities are regularized using conditions of q(x) for well-posedness of the direct problem (1). The behaviour of q(x) is smooth when x becomes bigger and the reconstruction of its has high accuracy. Figure 2 shows that the greatest deviation from the “exact” solution q(x) is observed at a distance from 1 to 2, where q(x) has the greatest value of the gradient. The relative error of inverse problem solution is δ = 0.00153279. 4.3
Artificial Neural Network
To solve the problem of recovering I(x, t) function from synthetic data fik a fully connected neural network with two hidden layers containing 512 neurons with sigmoid activation function is trained. The 15552 different fik and I(x, t) pairs are generated by solving the direct problem (1) for various parameter values α, β, γ, d, K from the range indicated in Table 1 and the initial data range q indicated in Table 2. The resulting datasets are randomly divided into training data (80% of the entire dataset) and test data (20% of the entire dataset). The neural network is learned on the training data for approximation of I(x, t) according to fik data. The root mean square propagation algorithm was used as an optimizer [13]. This is a variation of the stochastic gradient descent algorithm in which the learning rate is adapted for each parameter. train = 116. The minimum value of the quadratic error for the train data is Emin A trained neural network is verified on test data. For the test data the quadratic test = 46 that means the neural network is not overfitted. error is Emin Figures 3a-d show the result of applying a trained neural network to fik data (blue dashed lines) generated by solving a direct problem for parameters which trial values are given in Tables 1 and 2 as well as the comparison with I(x, tk ) calculated for reconstructed q by combined PSO and Nelder-Mead methods (red triangle lines). Note that ANNs reconstruct function I(x, t) that is close to model behaviour (1) and do not use the mathematical model directly. The accuracy of ANN method depends on number of training data and its quality. A method that uses a neural network to reconstruct I(x, t) function requires the storage of large amounts of data. Since in this work 15552 samples fik and I(x, t) were used to train and test the neural network, it took about 3 GB of RAM to store these samples.
Numerical Solution of the Inverse Problem for DLM 14
18
12
16
457
14
10
12 8 10 6
8
4
6
2
4 1
2
3
4
5
6
1
2
3
a)
4
5
6
5
6
b)
25
25 24.8
24
24.6 23
24.4
22
24.2 24
21
23.8
20
23.6 19
23.4
18
23.2 1
2
3
4
5
6
1
2
c)
3
4
d)
Fig. 3. The density of influenced users I(x, ∗) for “exact” parameter values (solid black line), for reconstructed parameters by combination of PSO and Nelder-Mead methods (dashed red line with triangles) and approximate function I(x, ∗) by ANN (dash-dotted blue line with two dots and circles) for a) I(x, 3), t = 3 hours, b) I(x, 6), t = 6 hours, c) I(x, 15), t = 15 hours and d) I(x, 23), t = 23 hours. The abscissa is the distance x.
5
Conclusion and Plans for Further Work
The source identification problem for the diffusive-logistic mathematical model described information propagation in online social networks using additional measurements about the number of influenced users with a discrete distance at fixed times is numerically investigated. The classic way in ill-posed theory is to reduce source problem to the minimization least-squares problem that solved by a combination of the global particle swarm optimization and the local NelderMead method. The alternative method is to construct the function of the density of influenced users in space and time that described additional measurements with high accuracy using artificial neural networks. ANNs use only a set of data and do not apply the structure of the mathematical model. The results of numerical calculations for synthetic data show that usage of mathematical model and optimization of the misfit function approaches gets the high accuracy in source reconstruction (approximately 99%). The ANN reconstruction of the
458
O. Krivorotko et al.
direct problem solution is 103 times worse than the classical method. The above methods and model are applicable in cases of discrete problem setting. Real data, in contrast to the synthetic data that were used in this article, may not be complete or presented in another form, for example as a sum of x (Twitter and Reddit ones). And in such cases, the artificial neural networks, processing real data will receive a rough approximation of the density of influenced users in space and time. And using ANN approximation as the initial condition the optimization techniques (PSO, Nelder-Mead, gradient methods, etc.) identify the source condition with high accuracy for get good prediction maps. However, with an increase in the number of parameters, due to the absence of the theory of convergence, the Nelder-Mead method may not lead to the correct solution. Therefore, it is necessary to use gradient methods, for example, the fast gradient method [2] to solve an optimization problem with a large number of parameters. The novelty lies in the comparative analysis of the stochastic method for minimizing the misfit function based on the structure of the model, and the machine learning approach, which does not use the mathematical model while learning.
References 1. Dunker, F., Hohage, T.: On parameter identification in stochastic differential equations by penalized maximum likelihood. Inverse Prob. 30(9), 095001 (2014) 2. Gasnikov, A.: Universal gradient descent (2017). https://arxiv.org/abs/1711.00394 3. Golub, G.H., Reinsch, C.: Singular value decomposition and least squares solutions. Numer. Math. 14(5), 403–420 (1970). https://doi.org/10.1007/BF02163027 4. Hasanov, A.: Simultaneously identifying the thermal conductivity and radiative coefficient in heat equation from dirichlet and neumann boundary measured outputs. J. Inverse Ill-Posed Prob. 29(1), 81–91 (2021). https://doi.org/10.1515/jiip2020-0047 5. Kabanikhin, S.: Definitions and examples of inverse and ill-posed problems. J. Inverse Ill-Posed Prob. 16(4), 317–357 (2009). https://doi.org/10.1515/JIIP.2008. 019 6. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN 1995 - International Conference on Neural Networks, pp. 1942–1948. IEEE (1995). https://doi.org/10.1109/ICNN.1995.488968 7. Lerman, K., Ghosh, R.: Information contagion: an empirical study of spread of news on digg and twitter social networks. In: Proceedings of 4th International Conference on Weblogs and Social Media. The AAAI Press (2010) 8. Myers, S., Zhu, C., Leskovec, J.: Information diffusion and external influence in networks. In: Proceedings of the 18th ACM, SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 33–41. Association for Computing Machinery (2012) 9. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965). https://doi.org/10.1093/comjnl/7.4.308 10. Pedretscher, B., Nelhiebel, M., Kaltenbacher, B.: Applying a statistical model to the observed texture evolution of fatigued metal films. IEEE Trans. Device Mater. Reliab. 20(3), 517–523 (2020). https://doi.org/10.1109/TDMR.2020.3004044 11. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning Internal Representations by Error Propagation, vol. 1, pp. 318–362. MIT Press (1986)
Numerical Solution of the Inverse Problem for DLM
459
12. Steeg, G.V., Ghosh, R., Lerman, K.: What stops social epidemics? In: Proceedings of 5th International Conference on Weblogs and Social Media. The AAAI Press (2011) 13. Tieleman, T., Hinton, G.: Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4(2), 26–31 (2012) 14. Trofimovich, Y., Kozlov, I., Turdakov, D.: User location detection based on social graph analysis. In: Proceedings of ISPRAS open conference (2016) 15. Villarrubia, G., Paz, J.F.D., Chamoso, P., la Prieta, F.D.: Artificial neural networks used in optimization problems. Neurocomputing 272, 10–16 (2018). https://doi. org/10.1016/j.neucom.2017.04.075 16. Wang, H., Wang, F., Xu, K.: Diffusive logistic model towards predicting information diffusion in online social networks. In: Proceedings of 32nd IEEE International Conference on Distributed Computing Systems Workshops, pp. 133–139. IEEE (2012). https://doi.org/10.1109/ICDCSW.2012.16 17. Yang, J., Counts, S.: Comparing information diffusion structure in weblogs and microblogs. In: Proceedings of 4th International Conference on Weblogs and Social Media. The AAAI Press (2010) 18. Yatskov, A., Varlamov, M., Turdakov, D.: Extraction of data from mass media web sites. Program. Comput. Softw. 44(5), 344–352 (2018). https://doi.org/10.1134/ s0361768818050092
Optimal Control
On One Approach to the Optimization of Discrete-Continuous Controlled Systems Alexander Buldaev(B) Buryat State University, 24a Smolin Street, Ulan-Ude 670000, Russia
Abstract. A class of discrete-continuous control systems described by differential equations with piecewise constant controls is considered. Conditions for nonlocal improvement and control optimality are constructed in the form of fixed point problems in the control space. This representation of conditions makes it possible to apply and modify the well-known theory and methods of fixed points for constructing iterative algorithms for solving the considered discrete-continuous optimal control problems. The proposed iterative algorithms have the property of nonlocality of successive control approximations and the absence of a procedure for the parametric search for an improving approximation at each iteration, which is characteristic of known standard gradient-type methods. Based on the proposed approach, new necessary conditions for control optimality are constructed, which strengthen the known conditions of optimality. Conditions for the convergence of relaxation sequences of controls obtained based on the constructed sufficient conditions for nonlocal improvement of control are derived. Illustrative examples of improving admissible controls and obtaining optimal controls by fixed-point methods are given. Keywords: Discrete-continuous system · Piecewise constant control · Conditions for improving control · Optimality conditions · Fixed point problem · Iterative algorithm
1
Introduction
The relevance and interest in the development of discrete-continuous models of controlled processes are dictated by several important circumstances. On the one hand, at present, many models of controlled processes in current ecological-economic, medico-biological, technical applications cannot be adequately represented by classical differential equations that do not change their structure during the period under consideration. These include, for example, systems described by differential equations with discontinuous right-hand sides, This work was supported by the Russian Foundation for Basic Research, project 1841-030005, and Buryat State University, the project of the 2021 year. c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 463–477, 2021. https://doi.org/10.1007/978-3-030-86433-0_32
464
A. Buldaev
differential equations of different orders at different time intervals, or containing, in addition to differential equations, objects of a different nature. In this case, one of the common modeling approaches is the full or partial discretization of the modeled process in terms of control and state. In this case, various models of a heterogeneous structure arise [1–5]. On the other hand, such discretization acts as an effective tool for studying continuous problems and searching for approximately optimal solutions to such problems, which can be used as good initial approximations for the subsequent refinement of solutions to the original problems. Such techniques are well developed and tested in many applications [6,7]. In addition, there are many problems, in particular, in the field of modeling control of aircraft and space vehicles [8] or control of electric trains [9], in which control can technically be realized only with discrete control of the traction force. A discrete-continuous approach to modeling controlled systems based on partial discretization only for control has been considered in many works [10,11]. In papers [12,13], discrete-continuous models of optimal control problems are based on piecewise-linear approximations of control with the representation of the dynamics of the state of the system on intervals of approximation of control in the form of differential equations are considered. The analysis of the solution of problems in [12] is carried out based on gradient methods used for equivalent finite-dimensional problems in the space of control parameters. In [13], in the class of quadratic problems, equivalent finite-dimensional quadratic problems in the space of control parameters are constructed, for the solution of which the methods of the maximum principle are modified. In this paper, we propose a new approach for solving discrete-continuous problems based on the representation of necessary optimality conditions and improvement of control in equivalent finite-dimensional models in the form of fixed point problems in the space of control parameters. The approach is illustrated in the framework of optimal control problems with piecewise constant controls.
2
Discrete-Continuous Optimal Control Problem
We consider the following optimal control problem Φ(u) = ϕ(x(t1 )) + F (x(t), u(t), t)dt → inf , T
u∈V
x(t) ˙ = f (x(t), u(t), t), x(t0 ) = x0 , u(t) ∈ U ⊂ Rm , t ∈ T = [t0 , t1 ], n
(1) (2)
where the function ϕ(x) is continuously differentiable on R , functions f (x, u, t), F (x, u, t) and their partial derivatives for variables x, u are continuous on the set Rn ×U ×T . Function f (x, u, t) satisfies the Lipschitz condition for x on Rn ×U ×T with a constant L > 0: f (x, u, t) − f (y, u, t) ≤ Lx − y. As admissible controls u(t) = (u1 (t), ..., um (t)) the set V of piecewise constant vector functions is considered, defined on an interval T with a given partition into disjoint intervals
On One Approach to the Optimization of Discrete-Continuous
465
by grid nodes: t0 = Θ0 < Θ1 < ... < Θn = t1 . At each interval Tk = [Θk−1 , Θk ), k = 1, N , the control u(t) takes on a constant value uk = (u1k , ..., umk ) ⊂ U . The set U ⊂ Rm is compact and convex. The initial state x0 and interval T are fixed. Denote by u(·) the admissible control in problem (1), (2) with values u(t) = uk , t ∈ Tk , k = 1, N . Each admissible control u(·) ∈ V is in one-to-one correspondence with a N -dimensional set of m-dimensional vectors u = {u1 , ..., uN }, uk ∈ U, k = 1, N . Let Ω be the set of admissible sets. Thus, problem (1), (2) can be considered as a special problem of mathematical programming, in which it is required to determine a set of vectors u = {u1 , ..., uN } on a given partition of an interval T into N disjoint intervals so that the minimum of the objective function is achieved N F (x(t), uk , t)dt → inf . (3) I(u) = ϕ(x(t1 )) + k=1
u={u1 ,...,uN }∈Ω
Tk
The values x(t), t ∈ Tk , k = 1, N , are determined by successive integration of system (2) on the partition intervals Tk at u(t) = uk , t ∈ Tk , k = 1, N . The Pontryagin function with adjoint variable ψ and the standard adjoint system in problem (1), (2) have the following forms H(ψ, x, u, t) = ψ, f (x, u, t) − F (x, u, t), ψ ∈ Rn , ψ˙ = −Hx (ψ(t), x(t), u(t), t), t ∈ T, ψ(t1 ) = −ϕx (x(t1 )).
(4)
For admissible control v(·) ∈ V we denote by x(t, v), t ∈ T , the solution of system (1); ψ(t, v), t ∈ T is the solution to the standard adjoint system (4) at x(t) = x(t, v), u(t) = v(t). We use the following notation for the partial increment of an arbitrary vector function g(y1 , ..., yl ) for variables ys1 , ys2 : Δzs1 ,zs g(y1 , ..., yl ) = g(y1 , ..., zs1 , ..., zs2 , ..., yl ) − g(y1 , ..., ys1 , ..., ys2 , ..., yl ). 2
Let PY be the operator of projection onto a set Y ⊂ Rk in the Euclidean norm: PY (z) = arg min(y − z), z ∈ Rk . y∈Y
Based on the well-known standard formula for the increment of the objective functional, constructed in the class of piecewise continuous controls [14,15], as a consequence for piecewise constant controls in problem (1), (2) we can obtain an increment formula with the remainder of the expansion in the following form Δv Φ(u) = − Δv(t) H(ψ(t, u), x(t, u), u(t), t)dt + o( v(t) − u(t)dt). T
T
For the problem in the finite-dimensional form (2), (3) the formula for the increment, respectively, takes the form: N N Δv I(u) = − Δvk H(ψ(t, u), x(t, u), uk , t)dt + o( vk − uk ), (5) k=1
Tk
k=1
466
A. Buldaev
where u = {u1 , ..., uN }, v = {v1 , ..., vN } are admissible sets of vectors corresponding to controls u(·), v(·), respectively. From formula (5) it follows the standard formula with the main part of the objective function increment linear in the control increment: N N Hu (ψ(t, u), x(t, u), uk , t), vk − uk dt + o( vk − uk ). (6) Δv I(u) = − k=1
Tk
k=1
Note that in the linear in control problem (2), (3) (functions f , F are linear for u) formulas (5) and (6) coincide. To obtain a non-standard formula for the increment of the objective function, following [16] we consider a modified differential-algebraic conjugate system, including an additional phase variable y(t) = y1 (t), ..., yn (t)), p˙ = −Hx (p(t), x(t), ω(t), t) − r(t), Hx (p(t), x(t), ω(t), t) + r(t), y(t) − x(t) = Δy(t) H(p(t), x(t), ω(t), t)
(7) (8)
with boundary conditions p(t1 ) = −ϕx (x(t1 )) − q,
(9)
ϕx (x(t1 )) + q, y(t1 ) − x(t1 ) = Δy(t1 ) ϕ(x(t1 )),
(10)
in which, by definition, we assume r(t) = 0, q = 0 in the case of linearity of functions f , F for x (linear in state problem (1), (2)), as well as in the case y(t) = x(t) with corresponding t ∈ T . In the state-linear problem (1), (2) the modified adjoint system (7)–(10), by definition, coincides with the standard adjoint system (4). In the nonlinear problem (1), (2) algebraic equations (8) and (10) can always be analytically resolved for quantities r(t) and q in the form of explicit or conditional formulas (perhaps not uniquely). Thus, the differential-algebraic conjugate system (7)–(10) can always be reduced (perhaps not in a unique way) to a differential conjugate system with uniquely determined values r(t) and q. For admissible controls u(·) ∈ V , v(·) ∈ V , let p(t, u, v), t ∈ T be the solution of the modified adjoint system (7)–(10) at x(t) = x(t, u), y(t) = x(t, v), ω(t) = u(t). The definition implies obvious equality p(t, u, u) = ψ(t, u), t ∈ T . Following the formula for the increment of the objective functional in the class of piecewise continuous controls [16], for piecewise constant controls in problem (1), (2) we obtain the following formula for the increment of the objective functional, which does not contain the remainder of the expansions: Δv(t) H(p(t, u, v), x(t, v), u(t), t)dt. Δv Φ(u) = − T
For the problem in the finite-dimensional form (2), (3) the formula for the increment, respectively, takes the form: N Δvk H(p(t, u, v), x(t, v), uk , t)dt. (11) Δv I(u) = − k=1
Tk
On One Approach to the Optimization of Discrete-Continuous
3
467
Optimality and Control Improvement Conditions
Based on the increment formula (6), one can easily obtain the necessary optimality condition in problem (2), (3) for control u ∈ Ω in the form of inequality: N k=1
Tk
Hu (ψ(t, u), x(t, u), uk , t), wk − uk dt ≤ 0,
w = {w1 , ..., wN }, wk ∈ U, k = 1, N . This inequality implies an obvious equivalent necessary optimality condition in the form of a system of inequalities: Hu (ψ(t, u), x(t, u), uk , t), w − uk dt ≤ 0, w ∈ U, k = 1, N . (12) Tk
The system of inequalities (12) can be written as a system of equations: uk = arg max Hu (ψ(t, u), x(t, u), uk , t), wdt, k = 1, N . w∈U
(13)
Tk
Using the projection operator, the system of inequalities (12) can also be represented in the projection form with the parameter α > 0: uk = PU (uk + α Hu (ψ(t, u), x(t, u), uk , t)dt), k = 1, N . (14) Tk
Note that to satisfy the necessary optimality condition (12), it suffices to verify condition (14) for some α > 0. Conversely, condition (12) implies that condition (14) is satisfied for all α > 0. To search for extreme controls, i.e. satisfying the necessary optimality conditions, one can apply and modify the well-known gradient methods and their modifications based on the increment formula (6). In this paper, we consider a new approach to the search for extremal controls, based on the representation of the necessary optimality conditions (13) and (14) in the form of fixed point problems for special control operators in a finite-dimensional space of admissible sets of vectors u = {u1 , ..., uN }. Such a representation makes it possible to apply and modify the well-known theory and methods of fixed points for constructing new iterative algorithms for finding extremal controls as solutions of the fixed point problems under consideration. Consider the problem of improving an admissible set u = {u1 , ..., uN }: find an admissible set v = {v1 , ..., vN } with a condition Δv I(u) ≤ 0. To improve control u = {u1 , ..., uN } one can apply well-known gradient methods based on the increment formula (6), which are related to local improvement methods. In this paper we consider new methods of non-local improvement, based on the construction, using a non-standard increment formula (11), of conditions for non-local improvement of control in the form of fixed point problems in the control space.
468
A. Buldaev
Let us show that to improve the control in problem (2), (3), it suffices to solve the following system of equations: vk = arg max H(p(t, u, v), x(t, v), w, t)dt, k = 1, N . (15) w∈U
Tk
Indeed, the solution to system (15) satisfies the inequality: H(p(t, u, v), x(t, v), vk , t)dt ≥ H(p(t, u, v), x(t, v), uk , t)dt, k = 1, N . Tk
Tk
Summing these relations over the index k based on the formula (11), we obtain the required relation Δv I(u) ≤ 0. We construct another condition for the nonlocal improvement of control in the linear in control problem (2), (3), in which formula (11) takes the following form N Hu (p(t, u, v), x(t, v), uk , t), vk − uk dt. (16) Δv I(u) = − k=1
Tk
Consider the system of equations: vk = PU (uk + α Hu (p(t, u, v), x(t, v), uk , t)dt), k = 1, N .
(17)
Tk
For the solution v = {v1 , ..., vN } of the system (17), based on the known property of the projection operation, we obtain the inequality: 1 2 Hu (p(t, u, v), x(t, v), uk , t), vk − uk dt ≥ vk − uk . α Tk As a result, from formula (16), the improvement of control u = {u1 , ..., uN } follows with the estimate: N
Δv I(u) ≤ −
1 2 vk − uk . α
(18)
k=1
In this paper, systems (15) and (17) are considered as fixed point problems for special control operators in a finite-dimensional space of admissible sets of vectors v = {v1 , ..., vN }. This approach makes it possible to construct new iterative algorithms for finding improving controls. We represent the conditions for improving the control in the standard operator form of fixed point problems. For admissible controls u = {u1 , ..., uN } , v = {v1 , ..., vN } we introduce maps using the relations: ∗ Wk (u, v) = arg max H(p(t, u, v), x(t, v), w, t)dt, k = 1, N . (19) w∈U
Tk
On One Approach to the Optimization of Discrete-Continuous
469
Using the mappings introduced, the system of equations (15) can be written in the following operator form v = W ∗ (u, v) = G∗ (v), W ∗ = (W1∗ , ..., WN∗ ),
(20)
which for a given control u = {u1 , ..., uN } takes the standard form of a fixed point problem with an operator G∗ . Let Ω(u) ∈ Ω be the set of fixed points of the problem (15). We define the following mappings with a parameter α > 0: Wkα (u, v) = PU (uk + α Hu (p(t, u, v), x(t, v), uk , t)dt), k = 1, N . (21) Tk
To improve control u = {u1 , ..., uN } in the linear in control problem (2), (3), the system of equations (17) can be written in the operator form using the mappings (21) (22) v = W α (u, v) = Gα (v), W α = (W1α , ..., WNα ). Let Ω α (u) ∈ Ω be the set of fixed points of the problem (17). Let us show that the considered fixed-point approach for representing the conditions for improving the control makes it possible to formulate the necessary optimality conditions (13) and (14) in the linear in control problem (2), (3) in terms of the constructed fixed-point problems. In this case, the improvement estimate (18) makes it possible to strengthen and obtain a new necessary optimality condition in comparison with the necessary condition (14). In the linear in control problem (2), (3), maps (19) can be represented as: Hu (p(t, u, v), x(t, v), uk , t), wdt, k = 1, N . Wk∗ (u, v) = arg max w∈U
Tk
Based on this representation in the linear in control problem (2), (3), we obtain the following statement. Lemma 1. u ∈ Ω(u) if and only if u = {u1 , ..., uN } satisfies the necessary optimality condition (13). Thus, in the linear in control problem (2), (3), the necessary optimality condition (13) can be formulated in terms of the fixed point problem (15) in the form of the following statement. Theorem 1. Let the control u = {u1 , ..., uN } be optimal in the linear in control problem (2), (3). Then u ∈ Ω(u). Corollary 1. The fixed point problem (15) is always solvable for a control u = {u1 , ..., uN } satisfying the necessary optimality condition (13). Corollary 2. The absence of fixed points in problem (15) or non-fulfillment of the condition u ∈ Ω(u) indicates that the control u = {u1 , ..., uN } is not optimal.
470
A. Buldaev
Corollary 3. In the case of non-uniqueness of the solution to the fixed point problem (15), there appears a fundamental possibility of strictly improving the control u = {u1 , ..., uN } that satisfies the necessary optimality condition (13). Based on the fixed point problem (17) in the linear in control problem (2), (3), we obtain similar statements. Lemma 2. u ∈ Ω α (u) if and only if u = {u1 , ..., uN } satisfies the necessary optimality condition (14). Thus, in the linear in control problem (2), (3), the necessary optimality condition (14) can be formulated in the form of the following statement. Theorem 2. Let the control u = {u1 , ..., uN } be optimal in the linear in control problem (2), (3). Then u ∈ Ω α (u) for some α > 0. Corollary 4. The fixed point problem (17) is always solvable for control u = {u1 , ..., uN } that satisfies the necessary optimality condition (14). Corollary 5. The absence of fixed points in problem (17) or non-fulfillment of the condition u ∈ Ω α (u) for all α > 0 indicates that the control u = {u1 , ..., uN } is not optimal. Corollary 6. In the case of non-uniqueness of the solution to the fixed point problem (17), the control u = {u1 , ..., uN }, satisfying the necessary condition (14) is strictly improved on the control v ∈ Ω α (u), v = u according to estimate (18). Estimate (18) makes it possible to strengthen the necessary optimality condition (14) in the linear in control problem (2), (3) in terms of the fixed point problem (17). Theorem 3. (A strengthened necessary optimality condition). Let the control u = {u1 , ..., uN } be optimal in the linear in control problem (2), (3). Then, for all α > 0, the control u = {u1 , ..., uN } is the only solution to the fixed point problem (17), i.e. Ω α (u) = {u}, α > 0. Indeed, in the case of existence at some α > 0 fixed point v ∈ Ω α (u), v = u, by estimate (18), we obtain a strict improvement Δv I(u) < 0, which contradicts the optimality of the control u = {u1 , ..., uN }. The projection problem on a fixed point (17), in contrast to the problem on a fixed point based on the maximization operation (15), in the case of the existence of a solution v = u, v ∈ Ω α (u) allows, based on an estimate (18), to get a conclusion about the strict improvement of control without calculating the objective function. Hence, in the case of the existence of fixed points v = u, v ∈ Ω α (u), we conclude that the extremal control u ∈ Ω is non-optimal without calculating the objective function.
On One Approach to the Optimization of Discrete-Continuous
4
471
Iterative Algorithms
To numerically solve the problem of a fixed point of an operator G : VE → VE , acting on a set VE in a complete normed space E with a norm · E , v = G(v), v ∈ VE ,
(23)
one can use the method of successive approximations and its modifications known in computational mathematics. In particular, an explicit simple iteration method with an index s ≥ 0, which has the form: v s+1 = G(v s ), v 0 ∈ VE .
(24)
To improve the convergence of the iterative process, the fixed point problem (23) can be transformed to an equivalent problem with the parameter δ = 0: v = v + δ(v − G(v)), v ∈ VE , based on which we obtain a modification of the iterative process: v s+1 = v s + δ(v s − G(v s )), v 0 ∈ VE . By choosing a parameter δ = 0, it is possible to regulate the convergence of the considered modification of the simple iteration method. The convergence of this iterative process can be analyzed using the wellknown contraction mapping principle. To solve the fixed point problem of the necessary optimality conditions (13) and (14), simple iteration methods with given initial approximations u0 ∈ Ω take the following forms, respectively s+1 uk = arg max Hu (ψ(t, us ), x(t, us ), usk , t), wdt, k = 1, N , s ≥ 0. (25) w∈U
Tk
us+1 = PU (usk + α k
Tk
Hu (ψ(t, us ), x(t, us ), usk , t)dt), k = 1, N , s ≥ 0.
(26)
Unlike the well-known gradient methods, the proposed fixed-point methods do not guarantee relaxation for the objective function at each iteration of the methods. The relaxation property is compensated by the nonlocality of successive control approximations and the absence at each iteration of a rather laborious operation of convex or needle-like control variation in the vicinity of the current control approximation. The convergence results of iterative processes depend on the choice of the initial approximation of the processes. In particular, in the case of a non-unique solution of Eq. (14), the convergence of the iterative process (26) to one or another extremal control is determined by the choice of the initial approximation.
472
A. Buldaev
Nonlocal improvement conditions (15) and (17) make it possible to construct iterative algorithms for constructing control sequences relaxation for the objective function and obtaining approximate solutions to the optimal control problem (2), (3). Simple iteration methods for solving fixed point problems (15) and (17) with initial approximations v 0 ∈ Ω have the following forms, respectively s+1 H(p(t, u, v s ), x(t, v s ), w, t)dt, k = 1, N , s ≥ 0. (27) vk = arg max w∈U
vks+1 = PU (usk + α
Tk
Tk
H(p(t, u, v s ), x(t, v s ), uk , t)dt), k = 1, N , s ≥ 0.
(28)
Iterations over the index s ≥ 0 are carried out until the first strict improvement of the control u ∈ Ω over the objective function: I(v s+1 ) < I(u). Next, a new fixed point problem is constructed to improve the obtained computational control and the iterative process is repeated. If there is no strict improvement in control, then the iterative process is carried out until the following condition is met v s+1 − v s ≤ ε, where ε > 0 is the given accuracy of calculating the fixed point problem. At this iteration of the calculation of sequential problems the improvement of control by the proposed algorithms ends. Let us analyze the convergence of the relaxation sequence of controls ul , l ≥ 0, formed as a result of the sequential calculation of control improvement problems (17) in the linear in control problem (2), (3). For each l ≥ 0 consider the value δ(ul ) = I(ul ) − I(ul+1 ) ≥ 0. If δ(ul ) = 0, then by estimate (18) we obtain that ul = ul+1 , i.e. the control ul satisfies the necessary optimality condition (14). Thus, the quantity δ(ul ) characterizes the discrepancy (measure) of the fulfillment of the necessary optimality condition (14) on the control ul . Theorem 4. Let the family of phase trajectories of system (2) in the total boundedness in the linear in control problem (2), (3): x(t, u) ∈ X, t ∈ T, u ∈ Ω, n
where X ∈ R is a convex compact set. Then the relaxation sequence of admissible controls v l , l ≥ 0, converges in the residual of the necessary optimality condition (14): δ(ul ) → 0, l → ∞. Proof. Due to the boundedness of the family of phase trajectories, the sequence I(ul ), l ≥ 1, limited from below. Therefore, taking into account relaxation, this sequence is convergent, i.e. δ(ul ) = I(ul ) − I(ul+1 ) → 0, l → ∞.
On One Approach to the Optimization of Discrete-Continuous
5
473
Examples
Examples of determining extremal controls and improving control based on the proposed fixed-point approach with an illustration of characteristic properties, in particular, the possibility of strictly improving non-optimal extremal control, are considered. Gradient methods do not have this opportunity. Example 1 (extreme control). The problem is considered in the class of piecewise constant controls on an interval T = [0, 2] with a point splitting into two non-intersecting intervals Θ1 = 1 : 0 = Θ0 < Θ1 < Θ2 = 2: 1 Φ(u) = 2
2 x2 (t)dt → inf , u∈V
0
x(t) ˙ = u(t), x(0) = 1, u(t) = uk ∈ U = [−1, 1], t ∈ Tk = [Θk−1 , Θk ], k = 1, 2. In this example, the optimal control is easily determined from the following physical considerations. First, you need to reduce the value of the variable x(t) as much as possible (to zero) on the interval t ∈ T1 = [0, 1] using the control u(t) = −1, t ∈ T1 = [0, 1]. Then, the reached minimum (zero) value x(t) should be maintained over the interval t ∈ T2 = [1, 2] with the help of control u(t) = 0, t ∈ T2 = [1, 2]. The purpose of this example is to demonstrate the possibility of finding an optimal control as a fixed point in the control space. The proposed fixed-point approach for finding extreme controls, in contrast to gradient methods, does not require calculating the values of the objective function. The Pontryagin function and the standard conjugate system have the form: 1 ˙ = x(t), ψ(2) = 0. H(ψ, x, u, t) = ψu − x2 , ψ(t) 2 For admissible u = {u1 , u2 } we define x(t, u) = u1 t + 1, t ∈ T1 ; x(t, u) = u2 t + u1 − u2 + 1, t ∈ T2 . 2 From here we get ψ(t, u) = u2 t2 + (u1 − u2 + 1)t − 2u1 − 2, t ∈ T2 ; ψ(t, u) = 2 u1 t2 + t − 32 u1 − 12 u2 − 2, t ∈ T1 . Using a function sign(z), defined by the following way ⎧ ⎨ −1, z < 0, sign(z) = +1, z > 0, ⎩ w ∈ [−1, 1], z = 0, the fixed point problem of the necessary optimality condition (13) takes the following form u1 = sign( ψ(t, u)dt), u2 = sign( ψ(t, u)dt). T1
T2
474
A. Buldaev
Calculating the integrals, we obtain a system of equations: 4 1 3 1 1 1 u1 = sign(− u1 − u2 − ), u2 = sign(− u1 − u2 − ). 3 2 2 2 3 2 By enumerating nine possible cases that determine the values of the righthand sides of the system, it is easy to find the only admissible extremal solution to the system u = {−1, 0}, which is the optimal solution to the problem under consideration. Thus, the example demonstrates the possibility of searching for extreme controls by the proposed fixed-point methods in the considered class of finitedimensional problems without calculating the values of the objective function necessary to implement the gradient methods. Example 2 (strict improvement of extreme control). The problem is considered in the class of piecewise constant controls on an interval T = [0, 2] with a point splitting into two non-intersecting intervals Θ1 = 1 : 0 = Θ0 < Θ1 < Θ2 = 2: Φ(u) =
1 2
2 u(t)x2 (t)dt → inf , u∈V
0
x(t) ˙ = u(t), x(0) = 0, u(t) = uk ∈ U = [−1, 1], t ∈ Tk = [Θk−1 , Θk ], k = 1, 2. The Pontryagin function and the modified differential-algebraic conjugate system have the form: 1 H(p, x, u, t) = pu − ux2 , p(t) ˙ = u(t)x(t) − r(t), p(2) = 0, 2 1 1 (y(t) − x(t))(r(t) + u(t)y(t) − u(t)x(t)) = 0. 2 2 If y(t) = x(t), then by definition we have r(t) = 0. If y(t) = x(t), then r(t) = 12 u(t)(x(t) − y(t)). From here we get a single general formula r(t) = 1 2 u(t)(x(t) − y(t)). Therefore, the modified conjugate system is reduced to a differential form: 1 p(t) ˙ = u(t)(x(t) + y(t)), p(2) = 0. 2 Let us pose the problem of improving control u = {0, 0} in the finitedimensional version of the problem with the corresponding phase trajectory x(t, u) = 0, t ∈ Tk , k = 1, 2 and the value of the objective function I(u) = 0. For the admissible v = {v1 , v2 } we get x(t, v) = v1 t, t ∈ T1 ; x(t, v) = v2 t + v1 − v2 , t ∈ T2 . It’s obvious that p(t, u, v) = 0, t ∈ Tk , k = 1, 2. As a result, the fixed point problem for improving control u = {0, 0} according to (15) is represented in the form: 1 v1 = sign 0
1 (− v12 t2 )dt, v2 = sign 2
2 1
1 2 (− (v2 t + v1 − v2 ) )dt. 2
On One Approach to the Optimization of Discrete-Continuous
475
After calculating the integrals, the problem takes the form: 1 1 1 1 v1 = sign(− v12 ), v2 = sign(− v12 − v1 v2 − v22 ). 6 2 2 6 The first equation of the system admits two admissible solutions v1 = 0 and v1 = −1. Substituting these solutions into the second equation, we finally obtain three feasible solutions of the fixed point problem: Ω(u) = {v = {0, 0}, v = {0, −1}, v = {−1, −1}}. Hence, we conclude that the control u = {0, 0} satisfies the necessary optimality condition (12), since u ∈ Ω(u). It is easy to check by calculating the objective function that Δv I(u) < 0 for fixed points v = {0, −1}, v = {−1, −1}. Thus, the example demonstrates the possibility of rigorously improving the extreme non-optimal control due to the non-uniqueness of the solution to the fixed point problem based on the maximization operation. Example 3 (extreme control that does not satisfy the strengthened necessary condition). The optimal control problem from the previous example is considered. The problem is posed to improve control u = {0, 0} using the fixed point problem based on the operation of the projection. The projection problem of a fixed point for improving control u = {0, 0} with a parameter α > 0 according to (17) is represented in the form: 1 v1 = PU (−α
1 (− v12 t2 )dt), v2 = PU (−α 2
0
2
1 2 (− (v2 t + v1 − v2 ) )dt). 2
1
Using the expressions for the integrals, we get the following problem: 1 1 1 1 v1 = PU (−α v12 ), v2 = PU (α(− v12 − v1 v2 − v22 )). 6 2 2 6 The first equation of the system admits a solution v1 = 0 for any α > 0. At α ≥ 6, the first equation admits a second admissible solution v1 = − α6 . Substituting the obtained solutions into the second equation, it is easy to determine the set of solutions of the projection problem about a fixed point for different values of the parameter α > 0: Ω α (u) = {v = {0, 0}}, 0 < α < 6, Ω α (u) =
6 v = {0, 0}, v = {0, −1}, v = {− , −1} , α ≥ 6. α
Hence, as in the previous example, we conclude that the control u = {0, 0} satisfies the necessary optimality condition (12), since u ∈ Ω α (u). In this case, we
476
A. Buldaev
additionally conclude that the control u = {0, 0} does not satisfy the strengthened necessary optimality condition since u = {0, 0} it is not the only fixed point for values α ≥ 6. Thus, the example demonstrates the possibility of extracting, without calculating the objective function, such extreme controls that do not satisfy the strengthened necessary optimality condition, and, therefore, are not optimal. The example also illustrates the following more efficient potential possibilities of the problem of improving control based on the design operation as compared to the problem of improving control based on the operation of maximizing the Pontryagin function. The set of fixed points Ω α (u) for values α ≥ 6 is infinite and includes a finite set of fixed points Ω(u) from the previous example. By estimate (18) without calculating the objective function, we conclude that all fixed points v = u, v ∈ Ω α (u) strictly improve the extremal control u = {0, 0}. Moreover, from these fixed points based on estimate (18) without calculating the objective function, the best incremental improving control v = {−1, −1} is easily determined.
6
Conclusion
In the class of discrete-continuous systems with piecewise constant controls, the following main results are obtained: 1. new necessary conditions for optimality and conditions for non-local improvement of control are constructed in the form of fixed point problems in the space of control parameters; 2. based on the obtained conditions, iterative optimization methods in the considered class of problems are designed and analyzed for convergence; 3. the characteristic properties of the proposed optimization methods are illustrated using test examples. The proposed iterative fixed-point methods are characterized by the following features: 1. nonlocality of successive control approximations at each iteration of the methods; 2. the absence of a procedure for convex or needle-like variation of the control with the calculation of the value of the functional of the control at each iteration of the methods, which is typical for gradient methods; 3. the possibility of strictly improving the control that satisfies the necessary optimality condition, which is absent in gradient methods. The specified features of the proposed approach determine a promising direction in the development of effective methods for solving discrete-continuous optimal control problems.
On One Approach to the Optimization of Discrete-Continuous
477
References 1. Emelyanov, S., Korovin, S., Mamedov, I.: Variable Structure Control Systems. Discrete and Digital. CRC Press, Boca Raton (1994) 2. Levine, W. (eds): The Control Handbook: Control System Advanced Methods. CRC Press, London (2010) 3. Van der Schaft, A., Schumacher, H.: An Introduction to Hybrid Dynamical Systems. Springer, London (2000). https://doi.org/10.1007/BFb0109998 4. Gurman, V., Rasina, I.: Discrete-continuous representations of impulsive processes in the controllable systems. Autom. Remote Control. 8(73), 1290–1300 (2012) 5. Mastaliyev, R.: Necessary optimality conditions in optimal control problems by discrete-continuous systems. Tomsk State Univ. J. Control Comput. Sci. 1(30), 4–10 (2015) 6. Evtushenko, Y.: Numerical Optimization Techniques. Publications Division, New York (1985) 7. Tabak, D., Kuo, B.: Optimal Control and Mathematical Programming. Nauka, Moscow (1975) 8. Gurman, V., Kang, N.M.: Degenerate problems of optimal control. I. Autom. Remote Control 4(72), 727–739 (2011) 9. Moiseev, A.: Optimal control under discrete control actions. Autom. Remote Control 9(52), 1274–1280 (1991) 10. Teo, K., Goh, C., Wong, K.: A Unified Computational Approach to Optimal Control Problem. Longman Group Limited (1991) 11. Rahimov, A.: On an approach to solution to optimal control problems on the classes of piecewise constant, piecewise linear, and piecewise given functions. Tomsk State Univ. J. Control Comput. Sci. 2(19), 20–30 (2012) 12. Gorbunov, V.: A method for the parametrization of optimal control problems. USSR Comput. Math. Math. Phys. 2(19), 292–303 (1979) 13. Srochko, V., Aksenyushkina, E.: Parametrization of some control problems for linear systems. Bull. Irkutsk State Univ. Series Math. (30), 83–98 (2019) 14. Srochko, V.: Iterative Methods for Solving Optimal Control Problems. Fizmatlit, Moscow (2000) 15. Vasiliev, O.: Optimization Methods. World Federation Publishers Company, Atlanta (1996) 16. Buldaev, A., Khishektueva, I.-Kh.: The fixed point method for the problems of nonlinear systems optimization on the managing functions and parameters. Bull. Irkutsk State Univ. Series Math. (19), 89–104 (2017)
On One Optimization Problem for the Age Structure of Power Plants Equipment Evgeniia Markova(B)
and Inna Sidler
Melentiev Energy Systems Institute SB RAS, Lermontov Street 130, 664033 Irkutsk, Russia {markova,krlv}@isem.irk.ru
Abstract. In this paper, we consider an approach to modeling strategies for the development of electric power systems. It is based on an integral model of developing systems. The model includes the non-classical Volterra equation of the first kind, which describes the balance between the desired level of electricity consumption and the input of generating capacities. The capacities are divided into several age groups with different indices of their functioning efficiency. We study the case when the moment of the system’s origin coincides with the beginning of the modeling. The prospect of using two methods of describing age boundaries for solving the problem of forecasting the commissioning of capacities of the electric power system for a long period is considered. Based on one of the methods, the problem of optimizing the lifetime of the main equipment of real-life objects of electric power systems is studied. Keywords: Developing system · Age groups · Volterra equation of the first kind · Optimization problem · Electric power system
1
Introduction
Present-day problems of the development and analysis of the dynamics of systems include the study of qualitative changes in the structure of the system, namely: replacing outdated technologies with new ones, taking into account the aging processes of elements, which is associated with changes in the age structure and the efficiency of the system elements in dynamics. The problems of managing aging equipment of enterprises include modeling of developing systems, which allows displaying the performance of age groups of fixed assets [1–3]. Models of this type are mainly used to study the possible consequences of the processes of replacing obsolete equipment in dynamics and are described by the Volterra type integral equations with variable limits of integration, which allow taking into account technological changes over time. The research was carried out within the state assignment of Ministry of Science and Higher Education of the Russian Federation (project FWEU-2021-0006, reg. no. AAAA-A21-121012090034-3). c Springer Nature Switzerland AG 2021 A. Strekalovsky et al. (Eds.): MOTOR 2021, CCIS 1476, pp. 478–492, 2021. https://doi.org/10.1007/978-3-030-86433-0_33
On One Optimization Problem for the Age Structure
479
Such models are often referred to as models with memory, meaning that the production capacities introduced earlier influence the current input dynamics. Such integral models are widely used in economics, ecology, and the study of population dynamics [4–10]. Based on these models, various optimal control problems [10–14] are set. Most researchers consider developing systems in the case when the upper and lower limits of integration do not coincide at the initial point and there is so-called prehistory. In our paper, we consider a developing system from the moment of its origin, when the limits of integration coincide at the initial point. In this case, there is no prehistory. These cases are principally different both in theory and in the application of numerical methods. We refer readers to the monograph [15] for a more detailed analysis. In [16], a model of a developing system was proposed, in which production facilities are divided into several age groups with different efficiency. In there, three types of models are proposed that describe different assumptions about the mechanisms of system elements aging. This paper considers the prospect of applying two of them to modeling long-term strategies for the development of a large electric power system, studies the corresponding problem of optimizing the timing of equipment decommissioning, and considers test examples. The remainder of this paper is structured as follows. Section 2 provides a general statement of the problem and touches upon the correctness of the solution of the corresponding equation. Sections 3 and 4 study two types of models with different assumptions about the behavior of the decommissioning function. In Sect. 5, the problem of optimizing the age structure of a developing system is considered and the results of numerical calculations using model examples as applied to the electric power system are presented. Section 6 contains conclusions on the paper.
2
General Problem Statement
Consider a model of a developing system consisting of n age groups [16]: ai−1 (t) n Ki (t, s)x(s)ds = y(t), t ∈ [0, T ], i=1
(1)
ai (t)
where x(s) is the number of elements of the system that have age t − s at time t, Ki (t, s) is the efficiency coefficient of the functioning of elements x(s) in the age group i at time t, Ki is continuous in both variables and continuously differentiable with respect to t in the domain Δi = {t ∈ [0, T ], s ∈ [ai (t), ai−1 (t)]}, a0 (t) ≡ t > a1 (t) > . . . > an (t) ≡ 0; t − ai (t) is the upper age boundary of the group i. The right-hand side y(t) is interpreted as the required level of system development, the function y(t) is continuously differentiable on [0, T ]. Various problems are set on the basis of this model. For example, the identification problem involves searching for the parameters Ki (t, s) of the system
480
E. Markova and I. Sidler
with known input (x(t)) and output (y(t)) data. We are dealing with a forecasting problem, when, using the known parameters of the system and the given (desired) right-hand side of (1), we need to find the number of input elements of the system x(t). We consider the system since its origin, when ai (0) = 0, i = 1, n. In this case, there is no prehistory, all age groups for t = 0 are empty and y(0) = 0, and therefore x(0) = 0 (when modeling a developing system, this condition is important a priori information). For t > 0, there are n groups whose efficiency differs throughout the entire segment [0, T ]. A series of works [16–19] is devoted to a theoretical study of the Eq. (1), provided that the beginning of modeling and moment of the system’s origin coincide. In [19], sufficient conditions for the correctness of the problem (1) on ◦
(1)
◦
(1)
the pair (C[0,T ] , C [0,T ] ) are given. Here C [0,T ] is the space of functions y(t) continuously differentiable on [0, T ] and y(0) = 0. As for the numerical solution, the implementation of the quadrature methods developed for the numerical solution of the classical Volterra equations as applied to the Eq. (1) in the case ai (0) = 0, i = 0, n, leads to the fact that an equation with n unknowns can arise even at the first node of the grid. The reason for this is the possible discrepancy between the values of the integration limits ai (tj ) and the nodes of the uniform grid tj = jh, j = 1, n, nh = T . In [20], a modification of the right rectangles method is proposed, which gives the first order of convergence in the grid step. In [21], a modification of the left rectangle method is proposed, based on the transformation of the original equation to an equivalent one in which only the upper limits of integration are variables. The modified middle rectangle method is applied to the solution to (1) in [22]. The constructed numerical schemes have the same order of convergence as in the classical case.
3
Model 1
Let the elements of the system x(s) be divided into n age groups Mi , i = 1, n, so that x(s) ∈ Mi , if t − s ∈ Ωi = [t − ai−1 (t), t − ai (t)). The first model assumes that from the beginning of the system origin until the moment T1 , all elements function as efficiently as possible and belong to the same age group, the rest of the groups are empty. Moreover, the age boundaries Ωi defining the groups do not depend on t and ai (Ti ) = 0. The transition functions from one age group to another have the form 0, t ∈ [0, Ti ), (2) i = 1, n − 1, ai (t) = t − Ti , t ∈ Ωi , so that every time at the moment Ti the age group i + 1 appears. An example of the lower limit functions for this model is shown in Fig. 1.
On One Optimization Problem for the Age Structure
481
Fig. 1. Age boudnaries ai (t) for model 1
The function x ¯(t), t ∈ [0, T ], describing the dynamics of the system development, is defined by a set of n equations k−1
t−T i−1
i=1 t−T i
t−T k−1
Ki (t, s)x(s)ds+
Kk (t, s)x(s)ds = y(t), 0
t ∈ Ωk = [Tk−1 , Tk ), T0 = 0, k = 1, n.
(3)
In [16], the theorem on the existence and uniqueness of a solution to (3) in the class of piecewise continuous functions is proved, and the algorithm for determining the discontinuity points of a solution on any finite segment is given. Here is an important example from [16]. Example 1. Let in (3) n = 4, K1 (t, s) = 1, K2 (t, s) = 1/2, K3 (t, s) = 1/4, K4 (t, s) = 0, T1 = 3, T2 = 5, T3 = 6, T = 12, y(t) = t. The work [16] provides an analytical solution to this equation using a chain of functional equations. It was also determined there that the number of discontinuity points of the solution is 7 and their values are found. Here we present the results of a numerical solution using the modified left rectangle method. Step h is chosen such that all Ti , i = 1, 3, are divisible by h. Figure 2 shows the numerical solution of this example. Note that for this example, the exact solution is a piecewise constant function, so the numerical method gives an exact result that coincides with the analytical solution obtained in [16]. One of the characteristic features of a developing system is the interchangeability of the elements that make it up. This property is possessed by electric power systems (EPS), which include different types of power plants. In this regard, let us consider a model example, which is designed to represent a production (electric power) system that includes three types of operating equipment.
482
E. Markova and I. Sidler
Fig. 2. Numerical solution with step h = 0.1 for Example 1
Example 2. Let each type of equipment consist of two age groups, then the mathematical model of the EPS has the form t
t−T 11
t−T12
(4)
t t t x1 (s)ds = γ1 (t) x1 (s)ds + x2 (s)ds + x3 (s)ds ,
(5)
x3 (s)ds +
t−T31
t 0
t−T22
t−T 31
(1 − δ3 s)x3 (s)ds = y(t), t ∈ [0, T ],
+
t−T12
(1 − δ2 s)x2 (s)ds+
x2 (s)ds +
t−T21
t
t
t−T 21
(1 − δ1 s)x1 (s)ds +
x1 (s)ds + t−T11
t
0
t−T12
t−T22
0
t t t x3 (s)ds = γ3 (t) x1 (s)ds + x2 (s)ds + x3 (s)ds t−T12
t−T22
0 ≤ γ1 (t) + γ3 (t) ≤ 1.
(6)
0
(7)
Here xi (t) are generating capacities of the corresponding type of power plants; Tij are the moments when the generating capacities of the i -th type leave the group j; γi (t) are proportions of the corresponding type of power plants in the total composition of generating equipment; 0 < δi < 1, i = 1, 3, are numeric parameters. From (4) it can be seen that the elements of the first and second types of power plants operate with an efficiency 100% from the commissioning moment until they reach the age T11 (T21 ), after that the efficiency of the elements decreases at the rate δ1 (δ2 ). Upon reaching the age T12 (T22 ), elements of the corresponding type are removed from service. Objects of the first type simulate the operation of thermal power plants (TPP), the second—nuclear power plants
On One Optimization Problem for the Age Structure
483
(NPP). Elements of the third type operate with efficiency 100% until reaching the age of T31 , after which their efficiency decreases, while elements of this type are not decommissioned within the forecast period. This mode is typical for the operation of hydroelectric power plants (HPP), the lifetime of which is unlimited. Equations (5), (6) define the quantitative structure of system elements. Practical problems imply that the commissioning capacities are nonnegative: xi (t) ≥ 0, i = 1, 3. Solving the equation system, we can get a negative value at some step. In this case, we replace it with zero (nothing needs to be entered). In fact, we obtain an inequality instead of the balance Eq. (4) at this step. Let the following values be the initial data: the moments of transition from age group to the next one T11 = 12, T12 = 15, T21 = 8, T22 = 17, T31 = 18; length of the forecast period T = 30; parameters of the efficiency coefficients of the corresponding groups δ1 = 0.01, δ2 = 0.07, δ3 = 0.05; proportions of the corresponding type of power plants in the total composition of generating equipment γ1 (t) = 0.5, γ3 (t) = 0.2. Let us set the required total available capacity of power plants y(t) = t. The solution x(t) cannot be found analytically. Figure 3 shows the graphs of the numerical solution to Eqs. (4)–(7). A modified left rectangle method was used to solve them. The step of the numerical scheme is chosen so that all Tij , i = 1, 3, j = 1, 2 are divisible by h. It can be seen from the figure that the solution undergoes jumps at the points of the segment whose values are multiples of T11 , T12 , T21 , T22 , T31 , T11 + T12 , T11 + T21 , T11 + T22 , T12 + T21 , T12 + T22 , T12 + T31 , T22 + T31 , which is consistent with the results of [16].
Fig. 3. Numerical solution with step h = 0.5 for Example 2
484
4
E. Markova and I. Sidler
Model 2
The difference of this type of model from the previous one is that the continuous functions ai (t) satisfy the condition t ≡ a0 (t) > a1 (t) > ... > an (t) ≡ 0 ∀t > 0; ai (0) = 0, i = 1, n,
(8)
which guarantees that all age groups Mi are nonempty for t > 0. Thus, it is assumed that the process of dividing system elements into groups with different efficiency indices begins from the moment the system emerged. In addition to (1) (8), it is assumed that ai (t) ∈ C[0,T ] , ai (t) ≥ 0, an (0) < · · · < a1 (0) < 1. The question of stability of the solution to the Eqs. (1), (8) was researched in [17,19,23–25]. The following main results from that papers are important for EPS modeling. Let in (1) the lower limits be ai (t) = αi t, 1 = α0 > α1 ... > αn = 0 (see Fig. 4), and the efficiency coefficients be constant: Ki (t, s) = βi , β1 = 0, so (1) has the form t β1
x(s)ds +
α1 t
n i=2
αi−1 t
x(s)ds = y(t), t ∈ [0, T ].
βi
(9)
αi t
Fig. 4. Age boundaries ai (t) = αi t
Then the condition
n
|βi−1 − βi |αi−1 < 1
(10)
i=2 ◦
(1)
guarantees the correctness of the Eq. (9) on the pair (C[0,T ] , C [0,T ] ). In the case when the kernels Ki , i = 2, N are monotonically increasing (modulo) functions, there is some threshold value T ∗ that guarantees, according
On One Optimization Problem for the Age Structure
485
to the principle of contraction maps, the existence, uniqueness, and stability of a continuous solution to (1) on [0, T ] for T < T ∗ . For T > T ∗ , the compression condition is violated and the solution becomes unstable. It should be noted that in order to study the aging processes of the elements of a developing system and replace them with new ones, it is necessary to consider the modeling segment rather large. Consider an example built for the purpose of an approach to modeling EPS. Example 3. Suppose that the EPS has the same structure as in the first model, while the age boundaries correspond to the model of the second type: 3 t i=1
t
α i1 t
(1 − δi s)xi (s)ds = y(t), t ∈ [0, T ],
xi (s)ds +
αi1 t
αi2 t
t t t x1 (s)ds = γ1 (t) x1 (s)ds + x2 (s)ds + x3 (s)ds ,
α12 t
α12 t
α22 t
α32 t
t
t
t
t
α32 t
(11)
x3 (s)ds = γ3 (t)
x1 (s)ds +
α12 t
x2 (s)ds +
α22 t
x3 (s)ds ,
(12)
(13)
α32 t
0 ≤ γ1 (t) + γ3 (t) ≤ 1,
(14)
xi (t) ≥ 0, i = 1, 3.
(15)
Let the parameters of the age limits be α11 = 0.8, α12 = 0.1, α21 = 0.8, α22 = 0.2, α31 = 0.9, α32 = 0, forecast period be T = 30, parameters of efficiency coefficients of the corresponding age groups be δ1 = 0.01, δ2 = 0.07, δ3 = 0.05, the proportions of TPP and HPP capacities, correspondingly, in the total composition of generating equipment be γ1 (t) = 0.5, γ3 (t) = 0.2. From (11) it can be seen that the young elements of the system (ranging from αi1 t to t) work with an efficiency 100%, the elements from the older groups (ranging from αi2 t to αi1 t) work with decreasing (modulo) efficiency (1 − δi t). It is important that all elements reaching at the time t the age αi2 t, i = 1, 2, is decommissioned (TPP and NPP), and elements of the third type (HPP) is not decommissioned during the forecast period. Let the required total available capacity be y(t) = t as in the Example 2. Figure 5 shows the numerical solution. Examples 2 and 3 are built to compare two approaches to modeling the age boundaries of the functioning of capacities in the task of developing the electric power industry. Two models describe the operation of one object, but in the first model, the age boundaries are constructed in the form ai (t) = t − Ti , where Ti is the age at which the element leaves the i-th age group. As a result, we obtain a numerical solution with jumps. In the second model, the age limits are ai (t) = αi t, and the age at which an element leaves the group is a variable value. In this case, the obtained numerical solution is close to smooth.
486
E. Markova and I. Sidler
Fig. 5. Numerical solution for Example 3 (h = 0.5).
5
Optimization Problem
Based on (1), we formulate the problem of optimizing the age structure and the moment of decommissioning the power plants equipment. At the same time, the specified demand for electricity y(t) should be provided and the total costs for commissioning and operation of generating capacities should be minimized. As the objective functional, we will take the cost functional:
T
I an (t) =
q 0
t
n i=1
ai−1 (t)
T
u1 (t − s)u2 (s)x(s)dsdt +
βi ai (t)
q t k(t)x(t)dt.
(16)
0
The following functions are supposed to be known: u1 (t − s), at time t, the coefficients of increase in the operating cost for capacities commissioned at time s; u2 (t), the annual cost of operating a capacity unit commissioned at time t; k(t), the cost of commissioning a capacity unit at time t; q t , the cost discount coefficient, 0 < q < 1. Thus, the two terms represent the operating and commissioning costs correspondingly. The control parameter an (t) belongs to the feasible set A = {an (t) : 0 ≤ an (t) < an−1 (t)} .
(17)
Then the optimization problem is to find the minimum of the objective functional: (18) I an (t) → min an (t)∈A
under constraints (3) for the first model or (9) for the second one. For the case of several types of power plant equipment, we obtain a vector analogue of the problem (16)–(18), (3) (or (9)).
On One Optimization Problem for the Age Structure
487
Problems (16)–(18), (3) and (16)–(18), (9) essentially nonlinear, since the control parameter (lifetime) is in the lower limit of integration both in the objective functional and in the constraints. An analytical solution can be found only for special cases. Therefore, we are looking for a numerical solution by a heuristic algorithm based on discretizing all elements of the problem on a grid and replacing the feasible set (17). A detailed description of the algorithm for solving the problem (16)–(18), (3) for the case of three types of power plants, divided into three age groups, can be found in the work [26]. The algorithm for solving the problem (16)–(18), (9) in the scalar case (without dividing plants into types) is given in [27]. Let us give a solution to (16)–(18), (9) for the vector case using the example of the problem of optimizing the lifetime of power plant equipment. Let the generating capacities be divided into components according to the types of energy resources used: TPP, HPP, and NPP. The power plants of the same type are divided into three age groups: 3 i=1
t βi1
α i1 t
xi (s)ds + βi2
αi1 t
t
xi (s)ds + βi3
αi2 t
xi (s)ds = y(t), t ∈ [0, T ], (19)
αi3 t
t t t x1 (s)ds = γ1 (t) x1 (s)ds + x2 (s)ds + x3 (s)ds ,
α13 t
t
α i2 t
α13 t
α23 t
t t t x3 (s)ds = γ3 (t) x1 (s)ds + x2 (s)ds + x3 (s)ds ,
α33 t
α13 t
(20)
α33 t
α23 t
(21)
α33 t
under conditions (14), (15), γ1 (t) and γ3 (t) are given functions describing the change in the proportions of the TPP and HPP capacities, respectively, in the total composition of generating equipment. In the case of aj (t) ≡ (a1j (t), a2j (t), a3j (t)) = (α1j t, α2j t, α3j t), j = 1, 3, the feasible set (17) is reduced to the feasible set of the following form: Ac = {αi3 : 0 αi3 < αi2 , i = 1, 3} .
(22)
It is assumed that the HPP equipment is not decommissioned during the forecast period (parameter α33 = 0). Thus, the optimal control problem is to find the parameters α13 and α23 , characterizing the moment of decommissioning the TPP and NPP equipment and minimizing the objective functional ⎫ ⎧ αi,j−1 ⎪ ⎪ t 3 T 3 ⎬ ⎨ qt βij ui1 (t − s)ui2 (s)xi (s)ds dt I α13 , α23 = ⎪ ⎪ ⎭ ⎩j=1 i=1 αij t
0
3
T
+
i=1 0
q t ki (t)xi (t)dt →
min
α13 ,α23 ∈Ac
.
(23)
488
E. Markova and I. Sidler
As in the scalar case [27], we proceed to discretize all elements on the grid and replace the feasible set Ac with the set Ah = α13 , α23 : α13 = ih, i = 0, N1 − 1, α23 = jh, j = 0, N2 − 1 , (24) where N1 = αh12 , N2 = αh22 . Choose the pair (α13 , α23 ) from the feasible set Ah , substitute it into the discrete analogue of (19)–(21) and find a numerical solution with respect to xi (t). Then we substitute the found solution into the discrete analogue of (23) and find the value of the objective functional for this pair. Thus, looking over all possible options α13 , α23 from (24), we find the ∗ ∗ , α23 ). optimal values (α13 Example 4. Suppose βi1 = 1, βi2 = 0.97, βi3 = 0.9, βi4 = 0, i = 1, 3, α11 = 1/2, α12 = 1/4, α13 = 1/8 (age limits coefficients for TPP); α21 = 1/2, α22 = 1/3, α23 = 1/6 (for NPP); α31 = 1/2, α32 = 1/3, α33 = 0 (for HPP); γ1 (t) = 0.69, γ3 (t) = 0.19; T = 60 in (19)–(21). The functions of increasing the operating costs ui1 (t − s) ≡ ui1 (τ ), i = 1, 3, have the form: 1, τ 30, 1, τ 30, 2 u11 (τ ) = u31 (τ ) = (τ ) = u 1 1.03τ −30 , τ > 30, 1.05τ −30 , τ > 30, (the operating costs increase with the growth rate 3% or 5% per year after 30 years of operation). The functions ki (t) and ui2 (t) are assumed to be constant (USD/MW): k1 (t) = 1300, u12 (t) = 189, k2 (t) = 2500, u22 (t) = 170, k3 (t) = 3000, u32 (t) = 200, t ∈ [0, 60]. Cost discount coefficient is q = 0.97. The parameters of the model were chosen as close as possible to the indices of the real-life EPS. All economic indices were given by experts. The results of applying the numerical algorithm for solving the problem for the right-hand side y(t) = t2 /2 are shown in Fig. 6 and Fig. 7. Figure 6 shows the dynamics of capacity commissioning with base values of age boundaries. Figure 7 shows the graphs of the objective function in the case of optimization of the equipment lifetime for TPP (NPP) with the remaining parameters fixed. The blue point indicates the base value of the corresponding lifetime parameter, and the red one indicates the obtained optimum value. It can be seen that, for these economic parameters, it is recommended to increase the service of the TPP equipment to the maximum age and to reduce the rate of disposal of the NPP equipment. By managing both lifetimes at once, the downward trend in equipment retirement rates continues for these economic indices. In the future, it is planned to carry out calculations with other economic indices to analyze the behavior of the dynamics of capacity commissioning. In addition, it is possible to optimize the age composition not only by the moments of equipment retirement but also by the moments of its transition from one age group to another.
On One Optimization Problem for the Age Structure
489
Y, MW
X, MW 120
2 000 1 800
100
1 600 1 400
80
1 200 1 000
60
800 40
600 400
20
200 0
0 5
7
9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59
TPP
NPP
HPP
y(t)
Fig. 6. Commissioning of capacities for y(t) = t2 /2 (base variant). I, USD 3 068 500 3 066 500 3 064 500 3 062 500 3 060 500
3 058 500 0 00 0 01 0 02 0 03 0 04 0 05 0 06 0 07 0 08 0 09 0 10 0 11 0 12 0 13 0 14 0 15 0 16 0 17 0 18 0 19 0 20 0 21 0 22 0 23 0 24
3 056 500
I, USD 3 070 000 3 068 000 3 066 000 3 064 000 3 062 000 3 060 000 3 058 000
Fig. 7. Total costs I(α13 ) and I(α23 ).
0 30
0 32
0 27
0 29
0 24
0 26
0 21
0 23
0 20
0 18
0 15
0 17
0 12
0 14
0 11
0 08
0 09
0 06
0 03
0 05
3 056 000 0 00
3
0 02
1
490
6
E. Markova and I. Sidler
Conclusion
In this work, we considered the application of integral models of developing systems to modeling the development strategies of electric power systems. In the case when the moment of the system origin coincides with the beginning of modeling, the prospect of applying two approaches to describing the age boundaries of the functioning of EPS equipment is considered. On test examples, they showed qualitatively different results. Based on this model, the problem of optimizing the age structure of the EPS is posed, namely, the problem of finding the optimal lifetime of the EPS power. The optimal strategy for the commissioning of capacities should ensure the given available capacity and minimize the total costs of commissioning new and operating generating capacities over the forecast period. In this case, the control parameter (lifetime) enters nonlinearly. Numerical examples of the operation of a heuristic algorithm for a problem with real-life data are given. In further studies, it is planned to consider the possibility of applying methods related to the use of metaheuristics (see, for example, [28,29]), to study the problem of optimizing the age composition of power plant equipment.
Valid Implementation of the Fractional Order Model of Energy Supply-Demand System

Samad Noeiaghdam^{1,2}(B) and Denis Sidorov^{1,3}

1 Industrial Mathematics Laboratory, Baikal School of BRICS, Irkutsk National Research Technical University, 83 Lermontov Street, 664074 Irkutsk, Russia
{snoei,sidorovdn}@istu.edu
2 South Ural State University, 76 Lenin Prospect, 454080 Chelyabinsk, Russia
3 Energy Systems Institute of Russian Academy of Science, 130 Lermontov Street, 664033 Irkutsk, Russia

Abstract. The aim of this study is to present the fractional model of the energy supply-demand system (ES-DS) based on the Caputo-Fabrizio derivative. The existence and uniqueness of the solution of the fractional ES-DS model are proved. Fractional-order models are known to describe such systems more accurately than the corresponding integer-order models. The model is based on four functions: the energy resources demand (ERD) x1, the energy resource supply (ERS) x2, the energy resource import (ERI) x3, and the renewable energy resources (RER) x4. Combining the homotopy analysis method with the Laplace transform, we apply the homotopy analysis transform method (HATM) to solve the nonlinear fractional model. The method leaves us free to choose its auxiliary parameters and functions, which is one of its main advantages over other methods. To validate the numerical results, the CESTAC method, which is based on stochastic arithmetic, is applied. Moreover, instead of the usual mathematical software, the CADNA library is used; it runs on the Linux operating system, and the CADNA codes are written in C, C++, Fortran or ADA.

Keywords: Fractional differential equations · Energy supply-demand system · Caputo-Fabrizio derivative · CESTAC method · CADNA library
1 Introduction
Modelling, solving, and applying mathematical models to forecast the behaviour of a phenomenon have been among the main interests of applied mathematicians in recent years. Mathematical models of HIV infection, of the smoking habit, of computer viruses, of tuberculosis infection, and many others are only some of these applications [1–3].
The mathematical model of the ES-DS is among such applicable models because of its importance for controlling energy resources. The model was introduced by Sun et al. [4–7], who analysed the stability of the three-dimensional ES-DS model; numerical illustrations of the four-dimensional model were presented in [8]. This paper focuses on the nonlinear ES-DS model

\[
\begin{cases}
\dot{x}_1 = \gamma_1 x_1 (1 - x_1/W) - \gamma_2 (x_2 + x_3) - \gamma_{11} x_4,\\
\dot{x}_2 = -\gamma_3 x_2 - \gamma_4 x_3 + \gamma_5 x_1 [N - (x_1 - x_3)],\\
\dot{x}_3 = \gamma_6 x_3 (\gamma_7 x_1 - \gamma_8),\\
\dot{x}_4 = \gamma_9 x_1 - \gamma_{10} x_4,
\end{cases} \tag{1}
\]

where the parameters γ1, ..., γ11, W, N > 0 are positive scalars and N < W. Region A is the Russian Far East, including the Buryat Republic, and region B is the Eastern Siberia region, which has abundant energy resources but high transportation costs and a severe climate. Here x1 denotes the ERD of region A, x2 the ERS from region B to A, x3 the rate of energy resource import in region A, and x4 the renewable energy resources of region A. The coefficient γ1 = 0.09 is the elasticity factor of the ERD of region A; γ2 = 0.5 is the energy supply factor of region B, which affects the energy demand of region A; W = 1.8 is the maximum value of the ERD of region A; the valve value is N = 1. The coefficients γ3 = 0.06, γ4 = 0.082 and γ5 = 0.07 are, respectively, the coefficient of the energy supply from region B to A, the coefficient of the energy import of region A, and the influence of the ERD of A on the rate of energy resource supply of region B. Further, γ6 = 0.2 is the velocity factor of the energy import of region A, γ7 = 0.5 is the benefit of imported energy per unit, and γ8 = 0.4 is the cost of imported energy. The influence factor of the ERD of region A on the rate of applying the RER is γ9 = 0.1, the influence factor of the RER on the rate of applying the RER is γ10 = 0.06, and the influence factor of the RER on the energy resources demand of region A is γ11 = 0.07. According to [6,7], with these parameter values system (1) is in the chaotic regime.

Recently, fractional models and fractional calculus have become challenging topics in various fields of engineering and applied science, such as image denoising, modelling of the supply chain financial system, modelling of the open circuit voltage of lithium-ion batteries for electric vehicles, and the solution of fractional integral and differential equations [8–12]. Among the efficient semi-analytical methods is the homotopy analysis method (HAM), first presented by Liao; its many applications include the solution of singular integral equations and of ill-posed problems [3,15]. Generally, the mentioned studies are based on floating point arithmetic (FPA) [15], and the accuracy of a method is discussed via the absolute error,

\[
|f(x) - f_n(x)| < \varepsilon, \qquad \text{or} \qquad |f_n(x) - f_{n-1}(x)| < \varepsilon, \tag{2}
\]
where f(x) and fn(x) are the exact and approximate solutions of the problem and ε is a small positive value. The main drawback of these conditions is their dependence on the exact solution and on the user-chosen tolerance ε on the right-hand side. Why should we solve a problem whose exact solution is already available? Moreover, the optimal value of ε is generally unknown: if ε is chosen too small, the iteration count grows without any real improvement in accuracy and many redundant iterations are performed; if ε is chosen too large, the algorithm stops at the first step without producing accurate results. We therefore propose the CESTAC method and the CADNA library, which are based on stochastic arithmetic (SA); instead of the stopping conditions (2) we use the termination criterion

\[
|f_n(x) - f_{n-1}(x)| = @.0, \tag{3}
\]

where fn(x) and fn−1(x) are two successive approximations and @.0 denotes the informatical zero [16–19]. This sign can be produced only by the CESTAC method via the CADNA library, and it indicates that the number of common significant digits between two successive approximations is almost zero. The CADNA library runs on the Linux operating system, and all CADNA codes are written in C, C++, Fortran or ADA. Recently, many methods have been validated by means of the CESTAC method and the CADNA library for various problems: the numerical validation of quadrature rules was discussed in [17,18], and the validation of the homotopy analysis method, the Adomian decomposition method and the homotopy perturbation method for solving integral equations was illustrated in [15]; see [15] for further details.

The aim of this study is to present the nonlinear fractional ES-DS model, based on the ERD x1, the ERS x2, the ERI x3 and the RER x4, and to solve it by the HATM. Instead of the FPA, the CESTAC method and the CADNA library are used, so all numerical results are based on discrete stochastic arithmetic (DSA). The main theorem of the CESTAC method is proved to show the equality between the number of common significant digits of two successive approximations and that of the exact and approximate solutions. Several ℏ-curves are plotted to find the convergence region of the method. Applying the CESTAC method and the CADNA library, the optimal iteration, the optimal error, the optimal approximation, and some numerical instabilities are obtained; these are the main novelties of this research.
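For readers who want to reproduce the chaotic behaviour of the classical system (1) reported in [6,7], the following C++ sketch integrates (1) with the parameter values listed above by the classical fourth-order Runge-Kutta method. It is an editorial illustration only, not the authors' code: the initial state, the step size h and the horizon T are assumptions, not values taken from this paper.

```cpp
// Illustrative only: RK4 integration of the classical (integer-order) system (1)
// with the parameter values listed in the text.  Step size, horizon and the
// initial state are assumed for demonstration.
#include <array>
#include <cstdio>

using State = std::array<double, 4>;

// Right-hand side of system (1): x = (x1, x2, x3, x4).
State f(const State& x) {
    const double g1 = 0.09, g2 = 0.5,  g3 = 0.06, g4 = 0.082, g5 = 0.07,
                 g6 = 0.2,  g7 = 0.5,  g8 = 0.4,  g9 = 0.1,   g10 = 0.06,
                 g11 = 0.07, W = 1.8,  N = 1.0;
    return {
        g1 * x[0] * (1.0 - x[0] / W) - g2 * (x[1] + x[2]) - g11 * x[3],
        -g3 * x[1] - g4 * x[2] + g5 * x[0] * (N - (x[0] - x[2])),
        g6 * x[2] * (g7 * x[0] - g8),
        g9 * x[0] - g10 * x[3]
    };
}

int main() {
    State x = {0.2335, 0.1844, 0.285, 0.218};   // assumed initial state
    const double h = 0.01, T = 100.0;           // assumed step size and horizon
    for (double t = 0.0; t < T; t += h) {
        State k1 = f(x), k2, k3, k4, tmp;
        for (int i = 0; i < 4; ++i) tmp[i] = x[i] + 0.5 * h * k1[i];
        k2 = f(tmp);
        for (int i = 0; i < 4; ++i) tmp[i] = x[i] + 0.5 * h * k2[i];
        k3 = f(tmp);
        for (int i = 0; i < 4; ++i) tmp[i] = x[i] + h * k3[i];
        k4 = f(tmp);
        for (int i = 0; i < 4; ++i)
            x[i] += h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    }
    std::printf("x1=%g x2=%g x3=%g x4=%g\n", x[0], x[1], x[2], x[3]);
}
```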
2 Model Description
The ES-DS (1) can be written in the fractional form

\[
\begin{cases}
{}^{CF}_{0}D_t^{\beta} x_1 = \gamma_1 x_1 (1 - x_1/W) - \gamma_2 (x_2 + x_3) - \gamma_{11} x_4,\\[2pt]
{}^{CF}_{0}D_t^{\beta} x_2 = -\gamma_3 x_2 - \gamma_4 x_3 + \gamma_5 x_1 [N - (x_1 - x_3)],\\[2pt]
{}^{CF}_{0}D_t^{\beta} x_3 = \gamma_6 x_3 (\gamma_7 x_1 - \gamma_8),\\[2pt]
{}^{CF}_{0}D_t^{\beta} x_4 = \gamma_9 x_1 - \gamma_{10} x_4,
\end{cases} \tag{4}
\]

where ${}^{CF}_{0}D_t^{\beta}$ is the Caputo-Fabrizio derivative and

\[
x_1(0) = \alpha_1, \quad x_2(0) = \alpha_2, \quad x_3(0) = \alpha_3, \quad x_4(0) = \alpha_4. \tag{5}
\]
The existence and uniqueness of the solution of the fractional-order ES-DS (4) were discussed by Noeiaghdam and Sidorov in [8]. To this end, model (4) is written using the fractional operator as

\[
\begin{cases}
x_1(t) - x_1(0) = {}^{CF}_{0}D_t^{\beta}\big[\gamma_1 x_1 (1 - x_1/W) - \gamma_2 (x_2 + x_3) - \gamma_{11} x_4\big],\\[2pt]
x_2(t) - x_2(0) = {}^{CF}_{0}D_t^{\beta}\big[-\gamma_3 x_2 - \gamma_4 x_3 + \gamma_5 x_1 [N - (x_1 - x_3)]\big],\\[2pt]
x_3(t) - x_3(0) = {}^{CF}_{0}D_t^{\beta}\big[\gamma_6 x_3 (\gamma_7 x_1 - \gamma_8)\big],\\[2pt]
x_4(t) - x_4(0) = {}^{CF}_{0}D_t^{\beta}\big[\gamma_9 x_1 - \gamma_{10} x_4\big],
\end{cases} \tag{6}
\]

and applying the definition of the Caputo-Fabrizio derivative [13,14] we get

\[
\begin{aligned}
x_1(t) - x_1(0) &= \frac{2(1-\beta)}{(2-\beta)M(\beta)}\big[\gamma_1 x_1(t)(1 - x_1(t)/W) - \gamma_2 (x_2(t)+x_3(t)) - \gamma_{11} x_4(t)\big]\\
&\quad + \frac{2\beta}{(2-\beta)M(\beta)}\int_0^t \big[\gamma_1 x_1(s)(1 - x_1(s)/W) - \gamma_2 (x_2(s)+x_3(s)) - \gamma_{11} x_4(s)\big]\,ds,\\[4pt]
x_2(t) - x_2(0) &= \frac{2(1-\beta)}{(2-\beta)M(\beta)}\big[-\gamma_3 x_2(t) - \gamma_4 x_3(t) + \gamma_5 x_1(t)[N-(x_1(t)-x_3(t))]\big]\\
&\quad + \frac{2\beta}{(2-\beta)M(\beta)}\int_0^t \big[-\gamma_3 x_2(s) - \gamma_4 x_3(s) + \gamma_5 x_1(s)[N-(x_1(s)-x_3(s))]\big]\,ds,\\[4pt]
x_3(t) - x_3(0) &= \frac{2(1-\beta)}{(2-\beta)M(\beta)}\,\gamma_6 x_3(t)(\gamma_7 x_1(t) - \gamma_8) + \frac{2\beta}{(2-\beta)M(\beta)}\int_0^t \gamma_6 x_3(s)(\gamma_7 x_1(s) - \gamma_8)\,ds,\\[4pt]
x_4(t) - x_4(0) &= \frac{2(1-\beta)}{(2-\beta)M(\beta)}\big[\gamma_9 x_1(t) - \gamma_{10} x_4(t)\big] + \frac{2\beta}{(2-\beta)M(\beta)}\int_0^t \big[\gamma_9 x_1(s) - \gamma_{10} x_4(s)\big]\,ds,
\end{aligned}
\]
where we can define

\[
\begin{aligned}
K_1(t, x_1) &= \gamma_1 x_1(t)(1 - x_1(t)/W) - \gamma_2 (x_2(t) + x_3(t)) - \gamma_{11} x_4(t),\\
K_2(t, x_2) &= -\gamma_3 x_2(t) - \gamma_4 x_3(t) + \gamma_5 x_1(t)[N - (x_1(t) - x_3(t))],\\
K_3(t, x_3) &= \gamma_6 x_3(t)\,[\gamma_7 x_1(t) - \gamma_8],\\
K_4(t, x_4) &= \gamma_9 x_1(t) - \gamma_{10} x_4(t).
\end{aligned} \tag{7}
\]

Theorem 1. [8] Let K1, K2, K3 and K4 be the kernels defined in (7). They satisfy the Lipschitz condition if 0 ≤ γ1, γ3, γ6, γ7, γ8, γ10 < 1.

Theorem 2. [8] The nonlinear fractional ES-DS (4) has exact coupled solutions if we can find t0 such that

\[
\frac{2(1-\beta)}{(2-\beta)M(\beta)}\,\gamma_1 + \frac{2\beta}{(2-\beta)M(\beta)}\,\gamma_2\, t_0 < 1.
\]

Theorem 3. [8] Model (4) has a unique solution if

\[
1 - \frac{2(1-\beta)}{(2-\beta)M(\beta)}\,\gamma_1 - \frac{2\beta}{(2-\beta)M(\beta)}\,\gamma_1\, t > 0. \tag{8}
\]
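The integral form derived above also suggests a direct numerical treatment. The following C++ sketch applies Picard-type successive approximations to the Caputo-Fabrizio integral equations, using the kernels K1–K4 of (7) and a trapezoidal quadrature on a uniform grid. It is not the HATM used in this paper; the normalization M(β) = 1, the fractional order β, the grid on [0, T], the number of sweeps and the initial values are assumptions made only for illustration.

```cpp
// Sketch of Picard-type successive approximations for the Caputo-Fabrizio
// integral form of model (4).  All numerical settings below are assumptions.
#include <array>
#include <cstdio>
#include <vector>

using State = std::array<double, 4>;

State K(const State& x) {                       // kernels K1..K4 from (7)
    const double g1 = 0.09, g2 = 0.5, g3 = 0.06, g4 = 0.082, g5 = 0.07,
                 g6 = 0.2, g7 = 0.5, g8 = 0.4, g9 = 0.1, g10 = 0.06,
                 g11 = 0.07, W = 1.8, N = 1.0;
    return {g1 * x[0] * (1 - x[0] / W) - g2 * (x[1] + x[2]) - g11 * x[3],
            -g3 * x[1] - g4 * x[2] + g5 * x[0] * (N - (x[0] - x[2])),
            g6 * x[2] * (g7 * x[0] - g8),
            g9 * x[0] - g10 * x[3]};
}

int main() {
    const double beta = 0.9, M = 1.0;                      // assumed order and normalization
    const double a = 2 * (1 - beta) / ((2 - beta) * M);    // weight of the pointwise term
    const double b = 2 * beta / ((2 - beta) * M);          // weight of the integral term
    const int n = 200; const double T = 1.0, h = T / n;    // assumed uniform grid on [0, T]
    const State x0 = {0.2335, 0.1844, 0.285, 0.218};       // assumed initial values

    std::vector<State> x(n + 1, x0);                       // current approximation on the grid
    for (int sweep = 0; sweep < 20; ++sweep) {             // fixed number of Picard sweeps
        std::vector<State> xn(n + 1);
        for (int j = 0; j <= n; ++j) {
            State I = {0, 0, 0, 0};                        // trapezoidal integral over [0, t_j]
            for (int m = 1; m <= j; ++m) {
                State kl = K(x[m - 1]), kr = K(x[m]);
                for (int i = 0; i < 4; ++i) I[i] += 0.5 * h * (kl[i] + kr[i]);
            }
            State kj = K(x[j]);
            for (int i = 0; i < 4; ++i) xn[j][i] = x0[i] + a * kj[i] + b * I[i];
        }
        x = xn;
    }
    std::printf("x(T) ~ %g %g %g %g\n", x[n][0], x[n][1], x[n][2], x[n][3]);
}
```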
3 CESTAC Method-CADNA Library
Let B denote the set of all values representable on a computer. Any S* ∈ B, corresponding to a real value s* ∈ R and stored with α mantissa bits in binary FPA, can be written as

\[
S^{*} = s^{*} - \beta\, 2^{E-\alpha}\varphi, \tag{9}
\]

where β is the sign, 2^{−α}φ is the missing segment of the mantissa, and E is the binary exponent of the result [15,18,19]. Choosing α = 24 or α = 53, we obtain the numerical results in single or double precision, respectively. Assuming that φ is a uniformly distributed random variable on [−1, 1] and perturbing the last mantissa bit of s*, we can compute the mean value μ and the standard deviation σ of S*. Repeating this process k times, and owing to the quasi-Gaussian distribution of the samples S*_i, i = 1, ..., k, the mean μ can be identified with the exact value s*. Algorithm 1 describes the method step by step, where τδ is the value of the Student T distribution with k − 1 degrees of freedom [16,17].
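As an illustration of the perturbation used in Step 1 of Algorithm 1 below, the last bit of the mantissa of a double can be flipped up or down at random. The helper below is an editorial sketch and is not part of the CADNA library; the function name and the seed are arbitrary choices.

```cpp
// Illustrative helper: random perturbation of the last mantissa bit of a double,
// as used to generate the k samples of S* in the CESTAC method.
#include <cmath>
#include <cstdio>
#include <limits>
#include <random>

double perturb_last_bit(double s, std::mt19937& gen) {
    std::bernoulli_distribution up(0.5);                   // random rounding direction
    return up(gen) ? std::nextafter(s,  std::numeric_limits<double>::infinity())
                   : std::nextafter(s, -std::numeric_limits<double>::infinity());
}

int main() {
    std::mt19937 gen(2021);
    double s = 0.233246321053811;                           // sample value for demonstration
    std::printf("%.17g -> %.17g\n", s, perturb_last_bit(s, gen));
}
```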
Algorithm 1.
Step 1. Make k samples of S*, Φ = {S*_1, S*_2, ..., S*_k}, by perturbing the last bit of the mantissa.
Step 2. Compute the mean $\widetilde{S}^{*} = \frac{1}{k}\sum_{i=1}^{k} S_i^{*}$.
Step 3. Compute $\sigma^{2} = \frac{1}{k-1}\sum_{i=1}^{k} \big(S_i^{*} - \widetilde{S}^{*}\big)^{2}$.
Step 4. Compute the number of common significant digits between S* and $\widetilde{S}^{*}$ as $C_{\widetilde{S}^{*},S^{*}} = \log_{10}\dfrac{\sqrt{k}\,|\widetilde{S}^{*}|}{\tau_{\delta}\,\sigma}$.
Step 5. Report S* = @.0 if $\widetilde{S}^{*} = 0$ or $C_{\widetilde{S}^{*},S^{*}} \le 0$.

In practice we do not need to apply the CESTAC method by hand: the CADNA library implements it exactly, so the algorithm is carried out automatically. The library has to be used on the Linux operating system, and, unlike Mathematica, Maple or MATLAB, ready-made mathematical commands are not available in it: all CADNA codes are written in C, C++, Fortran or ADA. Using the CESTAC method and the CADNA library we can find the optimal approximation, the optimal error and the optimal step of a numerical method; moreover, numerical instabilities can be detected. With the CADNA library we are able to produce the informatical zero sign @.0, which cannot be produced in the FPA; it indicates that the number of common significant digits of two successive approximations is zero. Using the following definition, we prove a theorem on the equality between the number of common significant digits of two successive approximations and of the exact and approximate solutions; this theorem justifies applying the termination criterion (3) instead of conditions (2).

Definition 1. [15] Let p1 and p2 be two real numbers. The number of common significant digits of p1 and p2 is defined as

\[
C_{p_1,p_2} = \log_{10}\left|\frac{p_1 + p_2}{2(p_1 - p_2)}\right| = \log_{10}\left|\frac{p_1}{p_1 - p_2} - \frac{1}{2}\right|, \quad p_1 \neq p_2, \qquad C_{p_1,p_1} = +\infty. \tag{10}
\]
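For illustration (an editorial calculation using the first two x1 entries of Table 1 below): for p1 = 0.2335 and p2 = 0.23317, formula (10) gives C_{p1,p2} = log10 |0.46667/(2 · 0.00033)| ≈ 2.85, i.e. the two approximations share roughly three significant digits.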
Theorem 4. Assume that

\[
x_{1,m}(t) = \sum_{j=0}^{m} x_{1,j}(t), \quad x_{2,m}(t) = \sum_{j=0}^{m} x_{2,j}(t), \quad x_{3,m}(t) = \sum_{j=0}^{m} x_{3,j}(t), \quad x_{4,m}(t) = \sum_{j=0}^{m} x_{4,j}(t)
\]

are the approximate solutions of the fractional model (4) produced by the HATM. Then

\[
C_{x_{i,m},\,x_{i,m+1}} = C_{x_{i,m},\,x_i} + O\!\left(\frac{1}{m}\right), \quad i = 1, 2, 3, 4. \tag{11}
\]
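To make Algorithm 1 concrete, the following self-contained C++ sketch implements Steps 1–5 for the difference of two successive approximations. It does not use the CADNA library, and, unlike CADNA, it only perturbs the final values rather than every intermediate operation; the sample count k = 3, the Student value τδ ≈ 4.303 (95 %, two degrees of freedom) and the two sample numbers (taken from Table 1 purely for illustration) are assumptions.

```cpp
// Editorial sketch of Algorithm 1 (Steps 1-5); it does not rely on CADNA.
#include <cmath>
#include <cstdio>
#include <limits>
#include <random>
#include <vector>

// Number of common significant digits of the perturbed samples (Steps 2-4);
// a non-positive value corresponds to the informatical zero @.0 (Step 5).
double cestac_digits(const std::vector<double>& S, double tau) {
    const int k = static_cast<int>(S.size());
    double mean = 0.0;
    for (double s : S) mean += s;
    mean /= k;                                              // Step 2: sample mean
    double var = 0.0;
    for (double s : S) var += (s - mean) * (s - mean);
    var /= (k - 1);                                         // Step 3: sample variance
    if (mean == 0.0) return 0.0;                            // treated as @.0
    if (var == 0.0) return std::numeric_limits<double>::infinity();
    return std::log10(std::sqrt(static_cast<double>(k)) * std::fabs(mean)
                      / (tau * std::sqrt(var)));            // Step 4
}

int main() {
    std::mt19937 gen(42);
    std::bernoulli_distribution up(0.5);
    auto perturb = [&](double s) {                          // Step 1: last-bit perturbation
        return up(gen) ? std::nextafter(s,  std::numeric_limits<double>::infinity())
                       : std::nextafter(s, -std::numeric_limits<double>::infinity());
    };
    // Toy data: difference of two (hypothetical) successive approximations of x1.
    double x_prev = 0.233169999999999, x_curr = 0.233246321053811;
    std::vector<double> samples;
    for (int i = 0; i < 3; ++i)
        samples.push_back(perturb(x_curr) - perturb(x_prev));
    double C = cestac_digits(samples, 4.303);               // Student value, k - 1 = 2 dof
    std::printf("common significant digits: %g%s\n", C, C <= 0 ? "  ->  @.0" : "");
}
```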
4 Numerical Discussion
In this section the numerical results obtained by the HATM are presented. The approximate solutions produced by the HATM depend on t and on the parameter ℏ, which controls the rate of convergence of the method and is therefore called the convergence-control parameter. For fixed values of t, plotting the approximate solutions against ℏ and locating the part of each curve that is parallel to the horizontal axis yields the convergence region. In Fig. 1 the convergence regions for xi, i = 1, 2, 3, 4, are obtained as

\[
-1.5 \le \hbar_{x_i} \le -0.5, \quad i = 1, 2, 3, 4,
\]

for t = 1 and β = 1. The graphs of the error functions in terms of ℏ and t are shown in Fig. 2 for m = 5. Table 1 reports the numerical results obtained with the CESTAC method and the CADNA library for ℏ = −1. In this way we find the optimal iteration of the HATM for solving the fractional model together with the optimal approximations, so the traditional absolute error is not needed to demonstrate the accuracy of the method. The results show that the optimal iteration is mopt = 10 and the optimal approximations are

\[
\begin{aligned}
x_{1,\mathrm{opt}} &= 0.233246321053811\mathrm{E}{+}000,\\
x_{2,\mathrm{opt}} &= 0.183637921237464\mathrm{E}{+}000,\\
x_{3,\mathrm{opt}} &= 0.284417954388135\mathrm{E}{+}000,\\
x_{4,\mathrm{opt}} &= 0.214202210593050\mathrm{E}{+}000.
\end{aligned}
\]
Figure 3 shows the approximate solutions for the optimal iteration m = 10 and the optimal approximations over the time interval [0, 3]. The figure indicates that while the energy resource supply x2 decreases, the energy resources demand x1 remains almost constant and keeps its stability; as the supply decreases, the energy resource import x3 also decreases. When the region no longer imports energy, it has to rely on the renewable energy resources, and we can see that at t = 3 the renewable energy resources are exhausted. The CADNA library also provides several detections: self-validation detection, mathematical instability detection, branching instability detection, intrinsic instability detection, and cancellation instability detection. According to the CADNA report, the presented algorithm for solving the nonlinear fractional model (4) produces 59 numerical instabilities, comprising 3 unstable intrinsic functions and 56 losses of accuracy due to cancellation.
Table 1. The numerical results based on the DSA and using the CESTAC method for t = 1, β = 1 and ℏ = −1 (the four rows of each iteration m correspond to x1, x2, x3, x4).

m    Approximate solutions       Difference between two iterations
1    0.233499999999999E+000     0.233499999999999E+000
     0.184400000000000E+000     0.184400000000000E+000
     0.284999999999999E+000     0.284999999999999E+000
     0.218000000000000E+000     0.218000000000000E+000
2    0.233169999999999E+000     0.3300000000000E-003
     0.183696250000000E+000     0.7037499999999E-003
     0.284377499999999E+000     0.622499999999E-003
     0.214135000000000E+000     0.38649999999999E-002
...  ...                        ...
9    0.233246321053811E+000     0.985E-013
     0.183637921237464E+000     0.138E-012
     0.284417954388135E+000     0.186E-012
     0.214202210593050E+000     0.43E-013
10   0.233246321053811E+000     @.0
     0.183637921237464E+000     @.0
     0.284417954388135E+000     @.0
     0.214202210593050E+000     0.5E-015
Fig. 1. The ℏ-curves of x1(t), x2(t), x3(t) for t = 1, β = 1 and m = 5.
Fig. 2. The graph of the error functions in terms of ℏ and t.
Fig. 3. The approximate solutions for mopt = 10 and ℏ = −1.
5 Conclusion
Motivated by the important role of fractional calculus in different fields, we have focused on the fractional model of the nonlinear ES-DS for the Russian Far East, including the Buryat Republic, and the Eastern Siberia region. With this model the supply and demand of energy in these regions can be adjusted and controlled, so the model can also be exploited for countries with limited energy resources. We applied the Caputo-Fabrizio derivative and its properties, proved several theorems on the existence and uniqueness of the solution, and used the combination of the HAM and the Laplace transform to solve the model. Instead of the FPA with the absolute or residual error, we applied the DSA and a novel accuracy criterion based on two successive approximations; to this end, the CESTAC method and the CADNA library were used. In this way we found the optimal iteration of the method and the optimal approximations, which are the main novelties of this study.

Acknowledgment. The work was financially supported by the Ministry of Education and Science of the Russian Federation.
References

1. Noeiaghdam, S.: Numerical approximation of modified non-linear SIR model of computer viruses. Contemp. Math. 1(1), 34–48 (2019). https://doi.org/10.37256/cm.11201959.34-48
2. Noeiaghdam, S.: A novel technique to solve the modified epidemiological model of computer viruses. SeMA J. 76(1), 97–108 (2018). https://doi.org/10.1007/s40324-018-0163-3
3. Noeiaghdam, S., Suleman, M., Budak, H.: Solving a modified nonlinear epidemiological model of computer viruses by homotopy analysis method. Math. Sci. 12(3), 211–222 (2018). https://doi.org/10.1007/s40096-018-0261-5
4. Sun, M., Tian, L., Xu, J.: Time-delayed feedback control of the energy resource chaotic system. Int. J. Nonlinear Sci. 1(3), 172–177 (2006)
5. Sun, M., Tian, L., Fu, Y.: An energy resources demand-supply system and its dynamical analysis. Chaos Solitons Fractals 32(1), 168–180 (2007). https://doi.org/10.1016/j.chaos.2005.10.085
6. Sun, M., Tao, Y., Wang, X., Tian, L.: The model reference control for the four-dimensional energy supply-demand system. Appl. Math. Model. 35, 5165–5172 (2011). https://doi.org/10.1016/j.apm.2011.04.016
7. Sun, M., Jia, Q., Tian, L.: A new four-dimensional energy resources system and its linear feedback control. Chaos Solitons Fractals 39, 101–108 (2009). https://doi.org/10.1016/j.chaos.2007.01.125
8. Noeiaghdam, S., Sidorov, D.: Caputo-Fabrizio fractional derivative to solve the fractional model of energy supply-demand system. Math. Model. Eng. Probl. 7(3), 359–367 (2020). https://doi.org/10.18280/mmep.070305
9. Sidorov, D., et al.: A dynamic analysis of energy storage with renewable and diesel generation using Volterra equations. IEEE Trans. Ind. Informat. 16(5), 3451–3459 (2020). https://doi.org/10.1109/TII.2019.2932453
10. Zhang, Q., Cui, N., Li, Y., Duan, B., Zhang, C.: Fractional calculus based modeling of open circuit voltage of lithium-ion batteries for electric vehicles. J. Energy Storage 27, 100945 (2020). https://doi.org/10.1016/j.est.2019.100945
11. Odibat, Z.: Approximations of fractional integrals and Caputo fractional derivatives. Appl. Math. Comput. 178(215), 527–533 (2006). https://doi.org/10.1016/j.amc.2005.11.072
12. Suleman, M., Lu, D., He, J.H., Farooq, U., Noeiaghdam, S., Chandio, F.A.: Elzaki projected differential transform method for fractional order system of linear and nonlinear fractional partial differential equation. Fractals 26(3), 1850041 (2018). https://doi.org/10.1142/S0218348X1850041X
13. Caputo, M., Fabrizio, M.: A new definition of fractional derivative without singular kernel. Progr. Fract. Differ. Appl. 1, 73–85 (2015). https://doi.org/10.12785/pfda/010201
14. Losada, J., Nieto, J.J.: Properties of the new fractional derivative without singular kernel. Progr. Fract. Differ. Appl. 1, 87–92 (2015). https://doi.org/10.12785/pfda/010202
15. Fariborzi Araghi, M.A., Noeiaghdam, S.: Validation of Numerical Algorithms: Stochastic Arithmetic. Entekhab Bartar Publisher, Iran (2021). ISBN 978-6226498-09-8
16. Noeiaghdam, S., Sidorov, D., Wazwaz, A.M., Sidorov, N., Sizikov, V.: The numerical validation of the Adomian decomposition method for solving Volterra integral equation with discontinuous kernel using the CESTAC method. Mathematics 9, 260 (2021). https://doi.org/10.3390/math9030260
17. Noeiaghdam, S., Sidorov, D., Zamyshlyaeva, A., Tynda, A., Dreglea, A.: A valid dynamical control on the reverse osmosis system using the CESTAC method. Mathematics 9, 48 (2021). https://doi.org/10.3390/math9010048
18. Noeiaghdam, S., Fariborzi Araghi, M.A.: A novel algorithm to evaluate definite integrals by the Gauss-Legendre integration rule based on the stochastic arithmetic: application in the model of osmosis system. Math. Model. Eng. Probl. 7(4), 577–586 (2020). https://doi.org/10.18280/mmep.070410
19. Noeiaghdam, S., Dreglea, A., He, J.H., Avazzadeh, Z., Suleman, M., Fariborzi Araghi, M.A., Sidorov, D., Sidorov, N.: Error estimation of the homotopy perturbation method to solve second kind Volterra integral equations with piecewise smooth kernels: application of the CADNA library. Symmetry 12, 1730 (2020). https://doi.org/10.3390/sym12101730
Author Index
Abotaleb, Mostafa Salaheldin Abdelsalam 301
Alkousa, Mohammad 19
Alkousa, Mohammad S. 86
Anikin, Anton 54
Antonov, Lev 54
Baklanov, Artem 316
Barkova, Maria V. 284
Begicheva, Maria 54
Beznosikov, Aleksandr 19, 71
Boccia, Maurizio 131
Buldaev, Alexander 463
Bykov, Anatoly 342
Chen, Juan 284
Chernykh, K. A. 163
Chirkova, Julia V. 147
Dvurechensky, Pavel 19, 71
Emelichev, Vladimir 372
Erokhin, Vladimir 3
Fabarisova, Aigul I. 175
Gasnikov, Alexander 19, 54, 71
Gasnikov, Alexander V. 86
Gladin, Egor 19
Gornov, Alexander 54
Goryunova, Natalya 316
Hohmann, Sören 387
Ianovski, Egor 316
Ibragimov, Danis N. 327
Inga, Jairo 387
Kartak, Vadim M. 175
Kazakovtsev, Lev 184, 342
Khvostov, Mikhail 3
Koliechkina, Liudmyla 233
Konnov, Igor 41
Kostenko, Andrey 216
Krasnikov, Alexander 3
Krivorotko, Olga 444
Krutikov, Vladimir 342
Kuchkarov, Ildus 387
Lavlinskii, Sergey 358
Lepikhin, Timur 387
Liberti, Leo 201
Makarovskikh, Tatiana 301
Mancuso, Andrea 131
Markova, Evgeniia 478
Maslovskiy, Alexander 54
Masone, Adriano 131
Mitiai, German 387
Nasyrov, Ilnar 184
Nikolaev, Andrei 216
Nikolaeva, Anna 54
Nikulin, Yury 372
Noeiaghdam, Samad 493
Novozhilkin, Nikita M. 327
Orlov, Viktor 184
Panin, Artem 358
Pankratova, Yaroslavna 403
Pasechnyuk, Dmitry 54
Petrosian, Ovanes 387, 429
Petrosyan, Leon 403
Pichugina, Oksana 233
Pinyagina, Olga 41
Plyasunov, Alexander 358
Polyakov, Ivan 429
Pyatkin, Artem V. 248
Ren, Jie 284
Rogozin, Alexander 54, 117
Rozhnov, Ivan 184
Sadiev, Abdurakhmon 19, 71
Servakh, V. V. 163
Shananin, A. A. 417
Sidler, Inna 478
Sidorov, Denis 493
Simanchev, R. Yu. 257
Sobol, Vitaliy 102
Sterle, Claudio 131
Stonyakin, Fedor S. 86
Tarasenko, M. V. 417
Titov, Alexander A. 86
Tomić, Milan 271
Torishnyi, Roman 102
Tovbis, Elena 342
Trimbach, Ekaterina 117
Trusov, N. V. 417
Urazova, I. V. 257
Urošević, Dragan 271
Ushakov, Anton V. 284
Vasilyev, Igor 284
Vlasov, Roman 54
Vlasova, Tatyana 429
Volkov, Vladimir 3
Yeung, David 403
Zhang, Dong 284
Zholobov, Yefim 429
Zholobova, Anna 429
Zvonareva, Tatiana 444
Zyatkov, Nikolay 444