Arc-Search Techniques for Interior-Point Methods Yaguang Yang
Office of Research US Nuclear Regulatory Commission Rockville, Maryland, USA
A Science Publishers Book
CRC Press, Taylor & Francis Group, Boca Raton, FL
© 2021 by Taylor & Francis Group, LLC
International Standard Book Number-13: 978-0-367-48728-7 (Hardback)
To my wife, Rong Cai
my son, Minghui Yang
and my daughter, Maria M Yang
Preface
My interest in optimization dates back to the time of my graduate studies between 1982 and 1985 in China. My master's thesis was on the solution of linear programming problems with all parameters (A, b, c) lying in given intervals. A paper based on my thesis, entitled "A new approach to uncertain parameter linear programming", was published in the European Journal of Operational Research in 1991, volume 54, pp. 95–114. After I came to the US in 1991, I studied control in the Electrical Engineering Department at the University of Maryland and worked as a control engineer after graduation. However, my interest in optimization never faded. I used optimization techniques to solve my engineering problems, and studied optimization books and read papers in my spare time. In 2007, while reading a paper by Megiddo ("Pathways to the Optimal Set in Linear Programming") published in Progress in Mathematical Programming, I realized that using arc-search to estimate the central path (the pathway suggested by Megiddo) might be a better method than line search, although the latter has been widely used in the optimization community. This thought inspired me to study interior-point methods systematically, drawing on some excellent books such as Wright's "Primal-Dual Interior-Point Methods", Ye's "Interior Point Algorithms: Theory and Analysis", and Roos, Terlaky and Vial's "Interior Point Methods for Linear Optimization". At first, I was amazed by the beautiful convergence analysis which proved that many interior-point algorithms converge in polynomial time, while no simplex algorithm has been proved to converge in polynomial time (this is still a fact today). This means that, as the problem size becomes larger, the time to find a solution may become exponentially longer if a simplex algorithm is used; however, if a convergent interior-point algorithm is used, the time to find a solution increases only moderately as the problem size increases. Computational experience shows that Mehrotra's algorithm, one of the most successful interior-point algorithms, which has been implemented in state-of-the-art interior-point software packages, is competitive with the simplex algorithms for large linear programming problems. Then, to my surprise, behind the huge success of the interior-point method, there were also noticeable gaps between the theoretical analysis of the interior-point
algorithms and the computational experience with these algorithms. Researchers noted that, in order to make an interior-point algorithm efficient, three strategies are very important: (a) use higher-order derivatives to approximate the central path, (b) search for the optimizer in a wide neighborhood of the central path, and (c) start from an infeasible initial point. However, convergence analysis shows that using any of these proven strategies increases the polynomial bound of the algorithms, which is clearly at odds with the computational experience. In addition, using a combination of these proven strategies increases the polynomial bound even more (making things worse). To further exacerbate the problem, Mehrotra's predictor-corrector (MPC) algorithm (the most popular and most efficient interior-point algorithm around 2010) uses all of the strategies listed above but does not have a convergence proof. Without a convergence proof, there is no polynomiality guarantee; the lack of polynomiality was precisely the criticism leveled at the simplex method (see Klee and Minty, "How good is the simplex algorithm?", in Inequalities, O. Shisha, ed., Academic Press, 1972, pp. 159–175), it motivated Khachiyan's ellipsoid method for linear programming, and it was the main argument for the development of interior-point algorithms. This observation motivated me to work on the interior-point method using the arc-search technique. I hoped that, by using some kind of arc-search technique, I would be able to resolve the dilemma. My intuition was to use part of an ellipse arc to approximate the central path because, by changing the major and minor axes and the center, the shape and location of the ellipse change. I figured out some necessary details and posted my first result on the website Optimization Online in 2009, http://www.optimization-online.org/DB_HTML/2009/08/2375.html. The early support I received was from my Ph.D. adviser, Professor Andre L. Tits, and from Professor Tamas Terlaky. They both read my posted article and provided valuable feedback. It took some time for journals to accept my first two papers on polynomial arc-search interior-point algorithms for linear programming and convex quadratic programming. Due to the different paths in the review processes, the paper on arc-search interior-point algorithms for convex quadratic programming was published before the paper on arc-search interior-point algorithms for linear programming. These two papers showed that, by using the arc-search technique, the higher-order interior-point method achieves the best polynomial bound among interior-point algorithms for linear programming. This solved part of the dilemma discussed earlier. The main criticism from the reviewers of these two papers was the lack of extensive numerical testing to support the aforementioned theoretical analysis. Therefore, I decided to test the arc-search technique using the Netlib benchmark linear programming problems. Since the majority of the Netlib benchmark problems do not have an interior point, the infeasible-starting-point method is the only choice. I decided, therefore, to write two codes: one implements Mehrotra's method and the other implements a similar algorithm. The only difference between the two algorithms is that the former uses line search while the latter uses arc-search. I got support from Dr. Chris Hoxie, at the Office of Research in the U.S. NRC, who provided the
computational environment for this research. The extensive testing on the Netlib benchmark problems was very promising: it showed that the arc-search algorithm is more efficient and more robust than Mehrotra's predictor-corrector (MPC) method. This result motivated me to devise arc-search algorithms that use all three proven strategies and to show their convergence. Not long after I finished the tests on the Netlib benchmark problems, I developed two algorithms, both of which are more efficient and more robust than Mehrotra's method, and both of which converge in polynomial time. In particular, one of the algorithms achieves the best polynomial bound among all existing interior-point algorithms for linear programming (feasible or infeasible). This resolves the remaining parts of the dilemma mentioned above. I did not submit these results until the benchmark test paper was accepted. While I was focused on arc-search techniques for linear and convex quadratic programming, researchers in several groups extended the arc-search technique to more general problems, such as the linear complementarity problem (LCP), the symmetric cone optimization problem (SCP), and the positive semidefinite programming problem (SDP). Unsurprisingly, all algorithms in these extensions obtained better polynomial bounds than their counterparts that use the traditional line-search method. However, the algorithms in these extensions have not considered all three proven strategies at the same time and have not achieved the best polynomial bound that the best algorithm for linear programming has achieved (in my opinion, they should be able to achieve it). In addition, these algorithms for LCP, SCP, and SDP have not been tested as extensively as similar algorithms designed for linear programming. I also believe that there will be more meaningful extensions of the arc-search techniques to other optimization areas, such as general convex programming and nonlinear programming. This motivated me to write a book about the arc-search technique: to discuss the merits of arc-search in linear programming, LCP, SCP, and SDP; to introduce some of the latest developments in applying the arc-search technique to other optimization problems; and to encourage more researchers to identify new applications based on existing results. If some readers decide to work on this new area after reading this book, I will have achieved my goal. Like other similar projects, I have received support from people at different stages of development. First, I am indebted to Professor Tits at the University of Maryland and Professor Terlaky at Lehigh University for reading my very first result and providing useful comments. I would like to thank Dr. Hoxie of the U.S. NRC for providing the computational environment for this research. I also would like to thank Professor Yamashita at the Tokyo Institute of Technology, who co-authored two papers on arc-search interior-point algorithms, one of which is included in this book. I am grateful to Professor Zhang at China Three Gorges University, who kindly sent me his very recent papers, which are included in this book as well.
Contents

Preface

PART I: LINE SEARCH INTERIOR-POINT METHODS FOR LINEAR PROGRAMMING

1. Introduction
   1.1 About the Notations
   1.2 Linear Programming
   1.3 Convex Optimization
       1.3.1 Convex sets, functions, and optimization
       1.3.2 Convex quadratic programming
       1.3.3 Monotone and mixed linear complementarity problem
       1.3.4 Positive semidefinite programming
   1.4 Nonlinear Programming
       1.4.1 Problem description
       1.4.2 Karush-Kuhn-Tucker conditions

2. A Potential-Reduction Algorithm for LP
   2.1 Preliminaries
   2.2 A Potential-Reduction Algorithm
   2.3 Convergence Analysis
   2.4 Concluding Remarks

3. Feasible Path-Following Algorithms for LP
   3.1 A Short-Step Path-Following Algorithm
   3.2 A Long-Step Path-Following Algorithm
   3.3 MTY Predictor-Corrector Algorithm
   3.4 A Modified Short Step Path-Following Algorithm
       3.4.1 A modified interior point algorithm for LP
       3.4.2 Implementation and numerical test
           3.4.2.1 Implementation
           3.4.2.2 Test on Netlib problems
   3.5 A Higher-Order Feasible Interior-Point Algorithm
   3.6 Concluding Remarks

4. Infeasible Interior-Point Method Algorithms for LP
   4.1 A Short Step Infeasible Interior-Point Algorithm
   4.2 A Long Step Infeasible Interior-Point Algorithm
   4.3 Mehrotra's Infeasible Interior-Point Algorithm
   4.4 Concluding Remarks

PART II: ARC-SEARCH INTERIOR-POINT METHODS FOR LINEAR PROGRAMMING

5. A Feasible Arc-Search Algorithm for LP
   5.1 A Polynomial Arc-Search Algorithm for LP
       5.1.1 Ellipsoidal approximation of the central path
       5.1.2 Search along the approximate central path
       5.1.3 A polynomial arc-search algorithm
   5.2 Numerical Examples
       5.2.1 A simple illustrative example
       5.2.2 Some Netlib test examples
   5.3 Concluding Remarks

6. A MTY-Type Infeasible Arc-Search Algorithm for LP
   6.1 An Infeasible Predictor-Corrector Algorithm
   6.2 Polynomiality
   6.3 Concluding Remarks

7. A Mehrotra-Type Infeasible Arc-Search Algorithm for LP
   7.1 Installation and Basic Structure of CurveLP
   7.2 An Infeasible Long-Step Arc-Search Algorithm for Linear Programming
   7.3 Implementation Details
       7.3.1 Initial point selection
       7.3.2 Pre-process
       7.3.3 Post-process
       7.3.4 Matrix scaling
       7.3.5 Removing row dependency from A
       7.3.6 Linear algebra for sparse Cholesky matrix
       7.3.7 Handling degenerate solutions
       7.3.8 Analytic solution of αx and αs
       7.3.9 Step scaling parameter
       7.3.10 Terminate criteria
   7.4 Numerical Tests
       7.4.1 A simple illustrative example
       7.4.2 Netlib test problems
   7.5 Concluding Remarks

8. An O(√n L) Infeasible Arc-Search Algorithm for LP
   8.1 Preliminaries
   8.2 A Basic Algorithm
   8.3 The O(√n L) Algorithm
   8.4 Implementation Details
       8.4.1 Default parameters
       8.4.2 Initial point selection
       8.4.3 Pre-process and post-process
       8.4.4 Matrix scaling
       8.4.5 Removing row dependency from A
       8.4.6 Linear algebra for sparse Cholesky matrix
       8.4.7 Handling degenerate solutions
       8.4.8 Analytic solution of αk
       8.4.9 Selection of centering parameter σk
       8.4.10 Rescaling αk
       8.4.11 Terminate criteria
   8.5 Numerical Tests
   8.6 Concluding Remarks

PART III: ARC-SEARCH INTERIOR-POINT METHODS: EXTENSIONS

9. An Arc-Search Algorithm for Convex Quadratic Programming
   9.1 Problem Descriptions
   9.2 An Arc-search Algorithm for Convex Quadratic Programming
       9.2.1 Predictor step
       9.2.2 Corrector step
   9.3 Convergence Analysis
   9.4 Implementation Details
       9.4.1 Termination criterion
       9.4.2 Finding initial (x0, λ0, s0) ∈ N2(θ)
       9.4.3 Solving linear systems of equations
       9.4.4 Duality measure reduction
   9.5 Numerical Examples
       9.5.1 A simple example
       9.5.2 Test on problems in [53]
   9.6 Concluding Remarks

10. An Arc-Search Algorithm for QP with Box Constraints
    10.1 Problem Descriptions
    10.2 An Interior-Point Algorithm for Convex QP with Box Constraints
    10.3 Convergence Analysis
    10.4 Implementation Issues
        10.4.1 Termination criterion
        10.4.2 A feasible initial point
        10.4.3 Step size
        10.4.4 The practical implementation
    10.5 A Design Example
    10.6 Concluding Remarks

11. An Arc-Search Algorithm for LCP
    11.1 Linear Complementarity Programming
    11.2 An Arc-search Interior-point Algorithm
    11.3 Convergence Analysis
    11.4 Concluding Remarks

12. An Arc-Search Algorithm for Semidefinite Programming
    12.1 Preliminary
    12.2 A Long Step Feasible SDP Arc-Search Algorithm
    12.3 Concluding Remarks

References

Index
PART I: LINE SEARCH INTERIOR-POINT METHODS FOR LINEAR PROGRAMMING
Chapter 1
Introduction
Most numerical optimization algorithms are composed of two major steps in every iteration: (a) from the current iterate, find a direction that will improve the objective function, and (b) determine a step size in this direction to obtain the next iterate that actually improves the objective function. However, people later realized that searching along a straight line may not be a good idea in some applications, because many applications are nonlinear. Therefore, arc-search methods were developed, analyzed, and tested. For example, G. P. McCormick [87] proposed a method that searches for the optimizer along an arc for unconstrained optimization problems. For equality constrained optimization problems, if both the objective function and the equality constraints are smooth, then, noticing that the smooth equality constraints form a Riemannian manifold, Luenberger realized in 1972 [79] that the constraint set is analogous to a surface in three-dimensional space and that the search for the optimizer might be carried out along geodesics on the Riemannian manifold. This idea was then fully developed by Gabay [39] and extended by Smith [127] and Yang [150], among others. Some good resources on this topic are [1, 31]. However, in this book, the proposed arc-search uses a different technique, which is specifically developed for the interior-point method in numerical optimization.

The interior-point method was originally developed for linear programming (LP) problems after Karmarkar published his seminal work in 1984 [57].¹ The tremendous success of the interior-point method is twofold: many interior-point algorithms are proved to be polynomial for linear programming problems, an attractive feature that simplex algorithms do not have; and some interior-point algorithms, such as Mehrotra's predictor-corrector (MPC) method, are computationally competitive with the simplex algorithms, at least for large linear programming problems.

¹ The interior-point method is related to the log barrier function, which was proposed probably as early as 1955 by Frisch [38], and to the affine scaling used by Dikin [27], but their works were not widely noticed by researchers until the interior-point method became a popular tool in numerical optimization in the 1980s.
By the middle of the 1990s, the interior-point method was viewed as a mature discipline in optimization, when three monographs were published [10, 144, 164]. However, for the interior-point method there was a big gap between the convergence theory and the efficiency of practical computation. In theory, the efficiency of an optimization algorithm can be measured by its worst-case iteration bound. Although the majority of interior-point algorithms have polynomial bounds, these bounds are of different orders. The lowest polynomial bounds increase (become worse) in the following order²: (1) the first-order method with a feasible starting point in a narrow neighborhood, O(√n L); (2a) the second-order method with a feasible starting point in a narrow neighborhood, O(n^{3/4} L^{3/2}), which may be comparable to (2b) the first-order method with a feasible starting point in a wide neighborhood, O(nL), which has the same complexity as (2c) the first-order method with an infeasible starting point in a narrow neighborhood, O(nL); (3) the first-order method with an infeasible starting point in a wide neighborhood, O(n² L); and (4) the second-order method with an infeasible starting point in a wide neighborhood (Mehrotra's method), with no polynomial bound (no convergence result). However, the observed computational efficiency is exactly in the opposite direction, i.e., Mehrotra's method is the most efficient, and the efficiency decreases from (4) to (3) to (2) and to (1). This dilemma has been known since the very beginning of the development of the interior-point methods [144]. Todd, in 2002 [133], raised two questions to the researchers in the optimization community: "Can we give a theoretical explanation for the difference between worst-case bounds and observed practical performance? Can we find a theoretically and practically efficient way to reoptimize?" However, this gap was not closed until the 2010s, when this author published a series of papers to resolve the dilemma. The main idea involved in these papers is a new arc-search technique.

² This complexity order will be discussed in Chapters 2, 3, and 4, and summarized in the last section of Chapter 4.

This book has three major parts. The first part, including Chapters 2, 3, and 4, introduces some important algorithms from the development of the interior-point method around the 1990s, most of them widely known. The main purpose of this part is (a) to explain the dilemma described above by analyzing these algorithms' polynomial bounds and reporting the computational experience associated with these algorithms, and (b) to summarize three techniques that have been shown to enhance the computational efficiency of interior-point algorithms. These techniques are (1) using higher-order derivatives to approximate the central path, (2) starting from an infeasible initial point, and (3) searching for the optimizer in a wide neighborhood. The second part, including Chapters 5, 6, 7, and 8, discusses how to resolve the dilemma step by step. Chapter 5 considers a higher-order interior-point algorithm that uses the arc-search technique and shows that a higher-order interior-point algorithm can achieve the same polynomial complexity as the first-order interior-point algorithms if an arc-search technique is used. Chapter 6 goes further by considering an algorithm that both uses higher-order derivatives and starts from an infeasible initial point, and proves that the algorithm can still achieve a good polynomial bound if the
arc-search technique is used. To verify that the arc-search strategy can indeed enhance the computational efficiency, Chapter 7 describes an algorithm that is almost identical to Mehrotra's algorithm, except that one uses arc-search and the other uses line search. These two algorithms are tested against each other using the widely accepted Netlib benchmark problems. To be fair, both algorithms start with the same infeasible initial point, use the same pre-process and post-process methods, use the same parameters, and terminate with the same stopping criterion. Extensive numerical results show that the arc-search algorithm is indeed more efficient and robust than Mehrotra's algorithm. Chapter 8 introduces two new arc-search infeasible interior-point algorithms, and their convergence is analyzed. These algorithms have all the features that have been demonstrated to be good for computational efficiency: (a) they use second-order derivatives, (b) they start from an infeasible starting point, and (c) they search in the widest neighborhood. Yet, we show that these algorithms converge in polynomial time and are more robust and efficient than Mehrotra's method. In addition, one algorithm achieves the best polynomial bound among all interior-point algorithms, feasible or infeasible, for the linear programming problem.

Because of the success of the interior-point method in linear programming, there are many extensions of the interior-point method to solve convex quadratic programming (CQP), the linear complementarity problem (LCP), the semidefinite programming problem (SDP), the general convex programming problem (CP), and the nonlinear programming problem. Part three of this book, including Chapters 9, 10, 11, and 12, extends arc-search techniques to these problems. Convergence analysis shows that these algorithms are superior in complexity bounds to their counterparts, the interior-point algorithms that use traditional line search. Unlike the interior-point algorithms for linear programming, these arc-search algorithms have not been extensively tested, and their computational superiority over their counterparts has not been demonstrated. The author believes that there is a lot of room for researchers to work on arc-search algorithms in these areas. Part of the purpose of this book is to provide readers with the latest developments in this direction so that they will become familiar with this area and will be able to contribute to it.

To get a big picture of optimization research, we provide some fundamentals in the next few sections and point out where the interior-point method may apply.
1.1 About the Notations
Throughout the book, normal letters are used for scalars, bold capital letters are used for matrices, and bold small letters are used for vectors. For a real scalar a, ⌈a⌉ is used to represent the smallest integer that is larger than a, and ⌊a⌋ is used to represent the largest integer that is smaller than a. The set of all real numbers is denoted by R, the set of all vectors in n-dimensional space is denoted by R^n, and the set of all m × n matrices is denoted by R^{m×n}. For two vectors x and s, their Hadamard (element-wise) product is denoted by x ∘ s; the ith component of x is denoted by x_i; the element-wise inverse of x is denoted by x^{-1} if min |x_i| > 0; the element-wise division of the two vectors is denoted by s^{-1} ∘ x, or x ∘ s^{-1}, or x/s if min |s_i| > 0; the Euclidean norm of x is denoted by ‖x‖ or explicitly by ‖x‖₂; the 1-norm of x is denoted by ‖x‖₁; and the ∞-norm of x is denoted by ‖x‖_∞. For p ≥ 1, ‖x‖₁, ‖x‖_p, and ‖x‖_∞ are defined as
\[
\|x\|_1 = \sum_{i=1}^{n} |x_i|, \tag{1.1a}
\]
\[
\|x\|_p = \big( |x_1|^p + |x_2|^p + \cdots + |x_n|^p \big)^{1/p}, \tag{1.1b}
\]
\[
\|x\|_\infty = \max_i |x_i|. \tag{1.1c}
\]
The identity matrix of any dimension is denoted by I, the vector of all ones with appropriate dimension is denoted by e, the transpose of matrix A is denoted by A^T, a basis for the null space of A is denoted by Â, the ith row of A is denoted by A_{i,·}, and the jth column of A is denoted by A_{·,j}. The matrix norms ‖A‖₁, ‖A‖₂, and ‖A‖_∞ are all induced norms, which are defined as
\[
\|A\|_p = \sup_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p}, \tag{1.2}
\]
where 1 ≤ p ≤ ∞. It is straightforward to see from the definition that
\[
\|Ax\|_p \le \|A\|_p \, \|x\|_p, \tag{1.3}
\]
where A ∈ R^{m×n}. For a matrix H ∈ R^{n×n}, Tr(H) = Σ_{i=1}^{n} H_{ii} is named the trace of H; if H is positive semidefinite, we write H ⪰ 0, and if H is positive definite, we write H ≻ 0. To simplify the notation for block column vectors, [x^T, s^T]^T is denoted by (x, s). For x ∈ R^n, the related diagonal matrix whose diagonal elements are the components of the vector x is denoted by X ∈ R^{n×n}. For any vector x in an iterative algorithm, its initial point is denoted by x⁰ and its value at the kth iteration is denoted by x^k; for any scalar a in an iterative algorithm, its initial value is denoted by a⁰ and its value at the kth iteration is denoted by a^k. Finally, the empty set is denoted by ∅.
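As a quick illustration of this notation (a numerical sketch added here, not part of the original text; it simply mirrors the definitions above using NumPy), the Hadamard product, the norms in (1.1), and the diagonal matrix X can be computed as follows:

```python
import numpy as np

x = np.array([3.0, -1.0, 2.0])
s = np.array([0.5, 4.0, 1.0])

hadamard = x * s                      # x o s, element-wise product
x_inv = 1.0 / x                       # element-wise inverse (valid since min |x_i| > 0)
ratio = x / s                         # element-wise division x / s

norm_1 = np.sum(np.abs(x))            # ||x||_1, see (1.1a)
norm_2 = np.linalg.norm(x)            # Euclidean norm ||x|| = ||x||_2
norm_inf = np.max(np.abs(x))          # ||x||_inf, see (1.1c)

X = np.diag(x)                        # diagonal matrix X built from the vector x
e = np.ones_like(x)                   # vector of all ones
assert np.allclose(X @ e, x)          # X e recovers x
```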
1.2 Linear Programming
We consider the linear programming (LP) problem in the standard form:
\[
\min \; c^T x, \quad \text{subject to } Ax = b, \; x \ge 0, \tag{1.4}
\]
where A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n are given, and x ∈ R^n is the vector to be optimized. The dual programming (DP) problem of (1.4) is:
\[
\max \; b^T \lambda, \quad \text{subject to } A^T \lambda + s = c, \; s \ge 0, \tag{1.5}
\]
with dual variable vector λ ∈ R^m and dual slack vector s ∈ R^n. A different presentation of linear programming is the canonical form, which can be stated as follows:
\[
\min \; c^T x, \quad \text{subject to } Ax \ge b, \; x \ge 0, \tag{1.6}
\]
where A ∈ R^{m×d}, b ∈ R^m, c ∈ R^d are given, and x ∈ R^d is the vector to be optimized. The dual of (1.6) is:
\[
\max \; b^T y, \quad \text{subject to } A^T y \le c, \; y \ge 0, \tag{1.7}
\]
with dual variable vector y ∈ R^m. By introducing slack variables, the canonical form can easily be converted into the standard form. While most computational algorithms are based on the standard form, the canonical form is convenient for discrete mathematics and computational geometry, such as the study of polytopes, which is closely related to the complexity of the simplex method.

Linear programming has been one of the most extensively studied problems in contemporary mathematics since it was formulated and solved by Dantzig in the 1940s [24]. Dantzig's simplex method is not only mathematically elegant but also computationally very efficient in practice. Noticing that the simplex method searches for optimizers from vertex to vertex along the edges of the polytope, Hirsch conjectured that the upper bound on the diameter of the polytope, which is also an upper bound on the number of iterations a simplex method needs to find the optimizer in the worst case³, is m − d for m > d ≥ 2 [25]. After a 50-year effort by many researchers, Santos [121] showed that this conjecture is incorrect. Most experts now believe that the best upper bound for the simplex method to find the optimizer in the worst case may be bounded by a polynomial. However, Klee and Minty in 1972 [65] showed that Dantzig's pivot rule (which governs how to move from a vertex to the next vertex) needs exponentially many iterations to find the optimizer in the worst case. Different pivot rules, such as Zadeh's rule [169], were proposed with the hope of achieving a polynomial upper bound on the number of iterations in the worst case. Unfortunately, it has been shown that all popular pivot rules in the simplex method cannot find the optimizer in polynomially many iterations in the worst case [108], including Zadeh's rule [37]. Researchers realized that resolving this dilemma can be a very difficult problem (see [126]).

Another direction is to find other methods that are good in theory, meaning that their worst-case computational complexity is bounded above by some polynomial. There was a surprising breakthrough in 1979: Khachiyan proved that the ellipsoid method for linear programming finds the optimizers in polynomial time [59]. However, researchers quickly realized that Khachiyan's ellipsoid algorithm is not computationally competitive in practice [11]. Finding an algorithm that is polynomial in theory and competitive in practice became important. A different method, the interior-point method, was then developed by Karmarkar [57]. Karmarkar's work inspired many researchers, and the interior-point method has achieved tremendous success. However, as was pointed out earlier, there was still a major gap that had to be closed. A major task of this book is to present some recent algorithms that are not only polynomial but also computationally efficient.

³ Hirsch's conjecture actually states that the diameter of the polytope described by (1.6) is less than m − d.
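To make the slack-variable conversion mentioned above concrete, the following sketch (an illustration added here, not taken from the book) turns a canonical-form problem (1.6) into an equivalent standard-form problem (1.4) by introducing surplus variables for the inequalities Ax ≥ b:

```python
import numpy as np

def canonical_to_standard(A, b, c):
    """Convert min c^T x s.t. Ax >= b, x >= 0 (canonical form, 1.6)
    into min c_s^T z s.t. A_s z = b, z >= 0 (standard form, 1.4),
    where z = (x, w) and w = Ax - b >= 0 are surplus variables."""
    m, d = A.shape
    A_s = np.hstack([A, -np.eye(m)])        # [A, -I] z = b  <=>  Ax - w = b
    c_s = np.concatenate([c, np.zeros(m)])  # surplus variables carry zero cost
    return A_s, b, c_s

# Tiny example: min x1 + 2 x2  s.t.  x1 + x2 >= 1, x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
A_s, b_s, c_s = canonical_to_standard(A, b, c)
print(A_s)   # [[ 1.  1. -1.]]
```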
We start with some basic and important results. Let (x*, λ*, s*) be a primal-dual optimal solution of (1.4) and (1.5). By applying the Karush-Kuhn-Tucker (KKT) conditions⁴ to LP, we have
\[
Ax^* = b, \tag{1.8a}
\]
\[
A^T \lambda^* + s^* = c, \tag{1.8b}
\]
\[
x_i^* s_i^* = 0, \quad i = 1, \ldots, n, \tag{1.8c}
\]
\[
(x^*, s^*) \ge 0. \tag{1.8d}
\]
Since (1.4) is a convex problem, (x*, λ*, s*) is an optimal solution of (1.4) if and only if all the KKT conditions (1.8) hold. Let P* be the set of all optimal primal solutions x* and D* be the set of all optimal dual solutions (λ*, s*). Denote the feasible set F as the collection of all points that meet the constraints of LP and DP,
\[
\mathcal{F} = \{ (x, \lambda, s) \mid Ax = b, \; A^T \lambda + s = c, \; (x, s) \ge 0 \}, \tag{1.9}
\]
and the strictly feasible set F° as the collection of all points that meet the constraints of LP and DP and are strictly positive,
\[
\mathcal{F}^{o} = \{ (x, \lambda, s) \mid Ax = b, \; A^T \lambda + s = c, \; (x, s) > 0 \}. \tag{1.10}
\]
Similarly, x ≥ 0 is said to be a primal feasible solution if Ax = b; x > 0 is said to be a strictly primal feasible solution if Ax = b; s ≥ 0 is said to be a dual feasible solution if A^T λ + s = c for some λ; and s > 0 is said to be a strictly dual feasible solution if A^T λ + s = c for some λ. The following result concerns the existence and boundedness of the solution sets P* and D*.

Theorem 1.1
Suppose that the primal and dual problems are feasible, that is, F ≠ ∅. If the dual problem has a strictly feasible point, then the primal optimal solution set P* is not empty and bounded. Similarly, if the primal problem has a strictly feasible point, then the set {s* | (λ*, s*) ∈ D* for some λ* ∈ R^m} is not empty and bounded.

Proof 1.1
The first claim is proved in [144]. The second part is proved as follows. Let (s̄, λ̄) with s̄ ≥ 0 be a feasible dual solution and x̂ > 0 be a strictly feasible primal solution. Then,
\[
0 \le \bar{s}^T \hat{x} = (c - A^T \bar{\lambda})^T \hat{x} = c^T \hat{x} - \bar{\lambda}^T b. \tag{1.11}
\]
Consider the set T defined by
\[
T = \{ s \mid A^T \lambda + s = c, \; s \ge 0, \; \lambda^T b \ge \bar{\lambda}^T b \}. \tag{1.12}
\]

⁴ KKT conditions are named after Karush, Kuhn, and Tucker. This result will be presented in Section 1.4.
The set T is not empty because s̄ ∈ T, and T is closed by definition. For any s ∈ T, using (1.11), we have
\[
\sum_{i=1}^{n} \hat{x}_i s_i = \hat{x}^T s = \hat{x}^T (c - A^T \lambda) = c^T \hat{x} - \lambda^T b \le c^T \hat{x} - \bar{\lambda}^T b = \bar{s}^T \hat{x}. \tag{1.13}
\]
Therefore,
\[
s_i \le \frac{1}{\hat{x}_i} \, \bar{s}^T \hat{x}, \quad i = 1, 2, \ldots, n,
\]
which means that
\[
\|s\|_\infty \le \max_i \frac{1}{\hat{x}_i} \, \bar{s}^T \hat{x}.
\]
Since s is an arbitrary element of T, it can be concluded that T is bounded. By definition, it is clear that (λ*, s*) ∈ T. Since T is bounded and closed, the set {s* | (λ*, s*) ∈ D* for some λ* ∈ R^m} is not empty and bounded. This completes the proof.

Corollary 1.1
If F° ≠ ∅, then the set {(x*, s*) | x* ∈ P*, (λ*, s*) ∈ D* for some λ* ∈ R^m} is bounded.

In view of condition (1.8c), it follows that
\[
x^{*T} s^* = 0. \tag{1.14}
\]
This condition is called the complementarity condition. If all conditions of (1.8) except (1.8c) hold, then we have 0 ≤ x^T s = x^T(c − A^T λ) = x^T c − b^T λ. Hence, x^T c ≥ b^T λ, and x^T s is called the duality gap. In the discussion in the rest of the book, along with the duality gap, we will use the duality measure, which is defined as
\[
\mu = \frac{x^T s}{n} \tag{1.15}
\]
for the linear programming problem (1.4). When conditions (1.8a), (1.8b), and (1.8d) hold and the duality gap is zero, an optimal solution is found.
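As a small numerical illustration of the duality gap and the duality measure (1.15) (an added sketch, not from the original text), consider a feasible primal-dual pair for a tiny problem:

```python
import numpy as np

# Tiny standard-form LP: min x1 + 2 x2  s.t.  x1 + x2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

# A (non-optimal) feasible primal-dual point.
x = np.array([0.5, 0.5])          # Ax = b, x > 0
lam = np.array([0.5])             # dual variable
s = c - A.T @ lam                 # dual slack, here s = (0.5, 1.5) > 0

duality_gap = x @ s               # x^T s = c^T x - b^T lam
mu = duality_gap / len(x)         # duality measure (1.15)
print(duality_gap, mu)            # 1.0  0.5
```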
The key idea of the "feasible interior-point method" is to maintain the conditions (1.8a), (1.8b), and (1.8d) and to reduce the duality gap in every iteration until an optimizer is found. The term "feasible" is used because the conditions (1.8a) and (1.8b) are maintained in all iterations, while the term "interior-point" is used because the strict inequality of (1.8d) is maintained in all iterations. If the conditions (1.8a) and (1.8b) are not enforced, but only the strict inequality is maintained in all iterations before the optimal solution is found, then the method is called an "infeasible interior-point method". Both strategies will be discussed in this book.

It will be seen later that the convergence properties of the interior-point algorithms are closely related to a classical result, the Goldman-Tucker theorem. Before we state and prove the Goldman-Tucker theorem, we introduce the separating hyperplane theorem and the Farkas lemma.

Theorem 1.2
Let C and D be two closed convex sets in R^n with at least one of them bounded, and assume that C ∩ D = ∅. Then, there exist a ∈ R^n and b ∈ R such that
\[
a^T x > b \;\; \forall x \in D \quad \text{and} \quad a^T x < b \;\; \forall x \in C. \tag{1.16}
\]

Proof 1.2
Define
\[
\mathrm{Dist}(C, D) = \inf \|u - v\|, \quad \text{s.t. } u \in C, \; v \in D.
\]
The infimum is achievable and is positive. Let c ∈ C, d ∈ D be the points that achieve the infimum, and let
\[
a = d - c, \qquad b = \frac{\|d\|^2 - \|c\|^2}{2}.
\]
Note that a ≠ 0 because C ∩ D = ∅. We show that the separating hyperplane is given by the function f(x) = a^T x − b, that is,
\[
f(x) > 0 \;\; \forall x \in D \quad \text{and} \quad f(x) < 0 \;\; \forall x \in C. \tag{1.17}
\]
Note that
\[
f\Big(\frac{c + d}{2}\Big) = (d - c)^T \frac{c + d}{2} - \frac{\|d\|^2 - \|c\|^2}{2} = 0.
\]
We show only that f(x) > 0 for all x ∈ D because the proof of f(x) < 0 for all x ∈ C is identical. Denoting g(x) = ‖x − c‖², we have ∇g(x) = 2(x − c). Suppose, by contradiction, that there exists a d̄ ∈ D with f(d̄) ≤ 0. Then,
\[
(d - c)^T \bar{d} - \frac{\|d\|^2 - \|c\|^2}{2} \le 0.
\]
Using this formula, we have
\[
\nabla g(d)^T (\bar{d} - d) = 2(d - c)^T (\bar{d} - d)
= 2\big(-\|d\|^2 + (d - c)^T \bar{d} + c^T d\big)
\le 2\Big(-\|d\|^2 + \frac{\|d\|^2 - \|c\|^2}{2} + c^T d\Big)
= -\|d - c\|^2 < 0,
\]
where the last inequality holds since d ≠ c. This means that d̄ − d is a strict descent direction for g at d. Hence, there is an ᾱ > 0 such that for all α ∈ (0, ᾱ),
\[
g(d + \alpha(\bar{d} - d)) < g(d),
\]
i.e., ‖d + α(d̄ − d) − c‖² < ‖d − c‖². This contradicts the assumption that d is the closest point of D to c.

The following corollary will be used in the proof of the Farkas lemma.

Corollary 1.2
Let C be a closed convex set and x ∈ R^n be a point not in C. Then, x and C can be separated by a hyperplane.

Lemma 1.1
Let A be an m × n matrix and b ∈ R^m. Then exactly one of the following statements holds, but not both:
\[
\text{(I)} \;\; \exists y \in \mathbb{R}^m : \; y^T A \ge 0, \; y^T b < 0, \quad \text{or} \quad
\text{(II)} \;\; \exists x \in \mathbb{R}^n : \; Ax = b, \; x \ge 0. \tag{1.18}
\]

Proof 1.3
We first show that if (II) is true, then (I) is necessarily false. Assume Ax = b for some x ≥ 0. If y^T A ≥ 0, then for x ≥ 0 we have y^T Ax ≥ 0. Since Ax = b, this implies that y^T b ≥ 0, and thus it cannot be that both y^T A ≥ 0 and y^T b < 0 hold. Now we prove that if (II) is false, then (I) is necessarily true. Define
\[
C = \{ q \in \mathbb{R}^m \mid \exists x \ge 0, \; Ax = q \}.
\]
Notice that C is a convex set: for q₁, q₂ ∈ C, there exist x₁, x₂ ≥ 0 such that q₁ = Ax₁ and q₂ = Ax₂, and for any λ ∈ [0, 1] we have λq₁ + (1 − λ)q₂ = λAx₁ + (1 − λ)Ax₂ = A(λx₁ + (1 − λ)x₂); hence, λq₁ + (1 − λ)q₂ ∈ C. Since (II) is false, b ∉ C. From the separating hyperplane theorem, there exists y ∈ R^m \ {0} such that y^T q ≥ 0 for all q ∈ C and y^T b < 0. Since every q ∈ C has the form q = Ax with x ≥ 0, this implies that y^T A ≥ 0 and y^T b < 0, as required.

A simple extension of this lemma is provided in [144].

Corollary 1.3
For each pair of matrices G ∈ R^{p×n} and H ∈ R^{q×n} and each vector d ∈ R^n, either
\[
\text{(I)} \;\; \exists x \in \mathbb{R}^n : \; Gx \ge 0, \; Hx = 0, \; d^T x < 0, \quad \text{or} \quad
\text{(II)} \;\; \exists y \in \mathbb{R}^p, \, z \in \mathbb{R}^q : \; G^T y + H^T z = d, \; y \ge 0. \tag{1.19}
\]

Let (x*, λ*, s*) be any solution of (1.8). Let the index sets B, N be defined as
\[
\mathcal{B} = \{ j \in \{1, \ldots, n\} \mid x_j^* \ne 0 \}, \tag{1.20}
\]
\[
\mathcal{N} = \{ j \in \{1, \ldots, n\} \mid s_j^* \ne 0 \}. \tag{1.21}
\]
Goldman and Tucker [40] proved the following result.

Theorem 1.3
There exist at least one primal solution x* and one dual solution (λ*, s*) such that, for the corresponding index sets B and N, the following conditions hold simultaneously: B ∪ N = {1, …, n} and B ∩ N = ∅; i.e., x* + s* > 0 and x* ∘ s* = 0.

Proof 1.4
It is easy to see that B ∩ N = ∅; otherwise, there is a pair x_i* > 0 and s_i* > 0, which contradicts condition (1.8c). Let T be the set of indices in {1, 2, …, n} that belong to neither B nor N. The proof is to show that T = ∅. Since B ∩ N = ∅, B ∪ N ∪ T is a partition of {1, 2, …, n}. Let A_B and A_T denote the submatrices of columns of A that correspond to B and T. Select any index i ∈ T. The goal is to show that i must be in either B or N, depending on whether there exists a vector w satisfying the following relations:
\[
A_{\cdot,i}^T w < 0, \qquad -A_{\cdot,j}^T w \ge 0 \;\; \text{for } j \in \mathcal{T} \setminus \{i\}, \qquad A_{\mathcal{B}}^T w = 0, \tag{1.22}
\]
where A_{·,i} denotes the ith column of A. Suppose that a w satisfying (1.22) exists. Let (x*, λ*, s*) be a primal-dual solution for which s*_N > 0, define the vector (λ̄, s̄) as
\[
\bar{\lambda} = \lambda^* + \varepsilon w, \qquad \bar{s} = c - A^T \bar{\lambda} = s^* - \varepsilon A^T w,
\]
and choose ε > 0 small enough that
\[
\bar{s}_i = s_i^* - \varepsilon A_{\cdot,i}^T w > 0, \qquad
\bar{s}_j = s_j^* - \varepsilon A_{\cdot,j}^T w \ge 0 \;\; \text{for } j \in \mathcal{T} \setminus \{i\}, \qquad
\bar{s}_{\mathcal{B}} = s_{\mathcal{B}}^* = 0, \qquad
\bar{s}_{\mathcal{N}} = s_{\mathcal{N}}^* - \varepsilon A_{\mathcal{N}}^T w > 0.
\]
It follows from these relations that (λ̄, s̄) is feasible for the dual problem. In fact, it is also optimal, since any primal solution vector x* must have x_i* = 0 for all i ∉ B, and therefore s̄^T x* = 0. Therefore, by the definition of N, we must have i ∈ N.

Suppose, alternatively, that no vector w satisfies (1.22). Denote G = [−A_{·,j}^T]_{j ∈ T\{i}}, H = A_B^T, d = A_{·,i}, and x = w. In view of Corollary 1.3, we may claim that the following system must have a solution:
\[
-\sum_{j \in \mathcal{T} \setminus \{i\}} A_{\cdot,j} \, y_j + A_{\mathcal{B}} z = A_{\cdot,i}, \qquad y_j \ge 0 \;\; \text{for all } j \in \mathcal{T} \setminus \{i\}. \tag{1.23}
\]
Let a vector v ∈ R^{|T|} be defined as
\[
v_i = 1, \qquad v_j = y_j \;\; \text{for all } j \in \mathcal{T} \setminus \{i\};
\]
then formula (1.23) can be rewritten as
\[
A_{\mathcal{T}} v = A_{\mathcal{B}} z, \qquad v \ge 0, \; v_i > 0. \tag{1.24}
\]
Let x* be a primal solution for which x*_B > 0 and define x̄ by
\[
\bar{x}_{\mathcal{B}} = x_{\mathcal{B}}^* - \varepsilon z, \qquad \bar{x}_{\mathcal{T}} = \varepsilon v, \qquad \bar{x}_{\mathcal{N}} = 0. \tag{1.25}
\]
Using (1.24) and (1.25), we have
\[
A\bar{x} = A_{\mathcal{B}} \bar{x}_{\mathcal{B}} + A_{\mathcal{T}} \bar{x}_{\mathcal{T}} + A_{\mathcal{N}} \bar{x}_{\mathcal{N}}
= A_{\mathcal{B}} (x_{\mathcal{B}}^* - \varepsilon z) + \varepsilon A_{\mathcal{T}} v = A_{\mathcal{B}} x_{\mathcal{B}}^* = b. \tag{1.26}
\]
Since x̄ ≥ 0 for ε small enough, x̄ is a feasible solution and, in fact, an optimal solution, because x̄_N = 0. Since v_i = 1 and x̄_i = ε > 0, we must have i ∈ B by (1.20). Therefore, by the definition of T, it must be that T = ∅.

An optimal solution with the properties described in the Goldman-Tucker theorem is called strictly complementary. The Goldman-Tucker theorem asserts that there is always a strictly complementary optimal solution for linear programming, but this is not the case for general optimization problems, not even for the quadratic programming problem [49]. Before we state an important theorem, we provide a simple lemma.

Lemma 1.2
For all β > −1, we have
\[
\log(1 + \beta) \le \beta,
\]
with equality if and only if β = 0.

Proof 1.5
For β ≥ 1, the inequality clearly holds. For −1 < β < 1, it follows from the Taylor expansion that
\[
\log(1 + \beta) = \beta - \frac{\beta^2}{2} + \frac{\beta^3}{3} - \frac{\beta^4}{4} + \cdots \le \beta,
\]
with equality if and only if β = 0.

The following theorem will be used repeatedly to prove the polynomiality of many algorithms in this book.

Theorem 1.4
Let ε ∈ (0, 1) be given. Suppose that an algorithm for solving (1.8) generates a sequence of iterates that satisfies
\[
\mu_{k+1} \le \Big(1 - \frac{\delta}{n^{\omega}}\Big) \mu_k, \qquad k = 0, 1, 2, \ldots, \tag{1.27}
\]
for some positive constants δ and ω. Suppose that the starting point (x⁰, λ⁰, s⁰) satisfies
\[
\mu_0 \le 1/\varepsilon. \tag{1.28}
\]
Then there exists an index K with K = O(n^ω log(1/ε)) such that μ_k ≤ ε for all k ≥ K.

Proof 1.6
By taking logarithms on both sides of (1.27), we have
\[
\log(\mu_{k+1}) \le \log\Big(1 - \frac{\delta}{n^{\omega}}\Big) + \log(\mu_k).
\]
Repeatedly using this formula and using (1.28) yields
\[
\log \mu_k \le k \log\Big(1 - \frac{\delta}{n^{\omega}}\Big) + \log \mu_0 \le k \log\Big(1 - \frac{\delta}{n^{\omega}}\Big) + \log(1/\varepsilon).
\]
Using Lemma 1.2 yields
\[
\log \mu_k \le k \Big(-\frac{\delta}{n^{\omega}}\Big) + \log(1/\varepsilon).
\]
Therefore, μ_k ≤ ε is met if we have
\[
k \Big(-\frac{\delta}{n^{\omega}}\Big) + \log(1/\varepsilon) \le \log(\varepsilon) = -\log(1/\varepsilon).
\]
This inequality holds for all k that satisfy
\[
k \ge K = \frac{2}{\delta} \, n^{\omega} \log \frac{1}{\varepsilon}.
\]
This finishes the proof.
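The following small numerical check (added here for illustration; it is not part of the original text) simulates the worst-case recursion (1.27) and confirms that the iteration count K = (2/δ) n^ω log(1/ε) obtained in Proof 1.6 indeed drives the duality measure below ε:

```python
import math

def iteration_bound(n, delta, omega, eps):
    """K from Proof 1.6: enough iterations so that mu_K <= eps."""
    return math.ceil((2.0 / delta) * n**omega * math.log(1.0 / eps))

n, delta, omega, eps = 100, 0.4, 0.5, 1e-8
K = iteration_bound(n, delta, omega, eps)

mu = 1.0 / eps                      # worst starting value allowed by (1.28)
for _ in range(K):
    mu *= 1.0 - delta / n**omega    # contraction (1.27) taken with equality
print(K, mu <= eps)                 # mu has been reduced below eps after K steps
```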
1.3 Convex Optimization
Convex optimization was studied at least as early as the 1950s, when Markowitz [86] formulated the portfolio selection problem as a convex quadratic optimization problem. In the 1960s, Rosen considered pattern separation by using convex programming [119]. A very nice book on convex analysis was published in the 1970s by Rockafellar [115]. The discipline became popular in the 1990s, when Boyd, El Ghaoui, Feron, and Balakrishnan showed [14] that many control system design problems are related to convex optimization problems, and Alizadeh, Nesterov, and Nemirovski [2, 101] found that the very successful interior-point method for linear programming can
be a powerful tool for convex optimization problems. This book, however, does not discuss general convex optimization problems, as some excellent books [15, 100] covering this topic are already available. This book will discuss some special convex optimization problems, including convex quadratic programming, the monotone linear complementarity problem, and positive semidefinite programming, which are closely related to the linear programming problem and for which very efficient algorithms are available.
1.3.1 Convex sets, functions, and optimization

The set U ⊆ R^n is convex if
\[
u, v \in U \;\Rightarrow\; tu + (1 - t)v \in U \quad \text{for all } t \in [0, 1]. \tag{1.29}
\]
Given a convex set U and a function f, we say that f is convex if
\[
u, v \in U \;\Rightarrow\; f(\alpha u + (1 - \alpha)v) \le \alpha f(u) + (1 - \alpha) f(v) \quad \text{for all } \alpha \in [0, 1]. \tag{1.30}
\]
We say that f is strictly convex if the inequality in (1.30) is strict whenever u ≠ v and α ∈ (0, 1). Two special examples are: (a) linear functions are convex; (b) if U is an open convex set and f : U → R is twice continuously differentiable on U with a positive semidefinite Hessian for all u ∈ U, then the function f is convex; if the Hessian is positive definite for all u ∈ U, then the function f is strictly convex.

In this book, we will consider a linearly constrained convex optimization problem:
\[
\min_{x} \; f(x) \quad \text{subject to} \quad Gx = c, \;\; Hx \ge d, \;\; x \in U, \tag{1.31}
\]
where f is a smooth convex function, U ⊆ R^n is a convex set, G and H are matrices, and c and d are vectors. Denote the convex constraint set
\[
\Omega = \{ x \mid Gx = c, \; Hx \ge d, \; x \in U \}.
\]
A vector x* is a local solution of the problem (1.31) if x* ∈ Ω and there is a neighborhood N of x* such that f(x) ≥ f(x*) for x ∈ N ∩ Ω. A vector x* is a strict local solution of the problem (1.31) if x* ∈ Ω and there is a neighborhood N of x* such that f(x) > f(x*) for x ∈ N ∩ Ω with x ≠ x*. A vector x* is a global solution of the problem (1.31) if x* ∈ Ω and f(x) ≥ f(x*) for all x ∈ Ω. A vector x* is a strict global solution of the problem (1.31) if x* ∈ Ω and f(x) > f(x*) for all x ∈ Ω with x ≠ x*.

Applying the KKT conditions (to be discussed at the end of the chapter) to the problem (1.31) gives the necessary conditions for an optimal solution x* of the problem (1.31):
\[
\nabla f(x^*) - G^T y - H^T z = 0, \tag{1.32a}
\]
\[
Gx^* = c, \tag{1.32b}
\]
\[
Hx^* \ge d, \tag{1.32c}
\]
\[
z \ge 0, \tag{1.32d}
\]
\[
z^T (Hx^* - d) = 0. \tag{1.32e}
\]
One of the main results for the convex optimization problem (1.31) is the following theorem.

Theorem 1.5
For the convex programming problem (1.31), the KKT conditions (1.32) are sufficient for x* to be a global solution. That is, if there are vectors y and z such that (1.32) hold, then x* is a global solution of (1.31). If, in addition, the function f is strictly convex on its convex region, then any local solution is the uniquely defined global solution.

Proof 1.7
We prove the first claim and leave the second claim to the reader. Given x* together with vectors y and z that satisfy the KKT conditions, we show that
\[
f(x^* + v) \ge f(x^*) \tag{1.33}
\]
for all v such that x* + v is feasible. Because f is convex, we have for any vector v with x* + v ∈ U that
\[
f(x^* + \alpha v) = f((1 - \alpha)x^* + \alpha(x^* + v)) \le (1 - \alpha) f(x^*) + \alpha f(x^* + v),
\]
which is equivalent to
\[
\frac{1}{\alpha}\big[ f(x^* + \alpha v) - f(x^*) \big] \le f(x^* + v) - f(x^*).
\]
By taking the limit as α → 0⁺ and using the differentiability of f, we have
\[
f(x^* + v) \ge f(x^*) + v^T \nabla f(x^*). \tag{1.34}
\]
Since the feasible set is convex, we can express any feasible point as x* + v. Since Gx* = c and G(x* + v) = c, the vector v must satisfy Gv = 0. Consider the active components of the inequality constraints Hx ≥ d, that is, the indices i for which H_{i,·} x* = d_i; because x* + v is feasible, we must have H_{i,·} v ≥ 0. For the inactive components, the rows i for which H_{i,·} x* > d_i, we have from conditions (1.32d) and (1.32e) that z_i = 0. Inserting the vector v into formula (1.34), from (1.32a), we obtain
\[
v^T \nabla f(x^*) = v^T (G^T y + H^T z)
= y^T G v + \sum_{i \;\text{active}} z_i H_{i,\cdot} v + \sum_{i \;\text{inactive}} z_i H_{i,\cdot} v.
\]
Since Gv = 0, and z_i = 0 for the inactive components, the first and the third terms on the right-hand side are zero. The middle term is non-negative, since z_i and H_{i,·} v are both non-negative when i is an active index. Hence, v^T ∇f(x*) ≥ 0, and from (1.34) we conclude that (1.33) holds. This shows that x* is a global solution, as claimed.
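As a quick illustration of criterion (b) above (this sketch is an addition for illustration, not part of the book), one can check convexity of a twice-differentiable function numerically by testing whether its Hessian is positive semidefinite on the region of interest:

```python
import numpy as np

def hessian_psd(hess, points, tol=1e-10):
    """Return True if the Hessian matrix hess(u) is positive semidefinite
    at every sampled point u (a numerical convexity check, criterion (b))."""
    return all(np.all(np.linalg.eigvalsh(hess(u)) >= -tol) for u in points)

# f(u) = u1^2 + u1*u2 + u2^2 has constant Hessian [[2, 1], [1, 2]] > 0,
# so f is (strictly) convex on all of R^2.
hess_f = lambda u: np.array([[2.0, 1.0], [1.0, 2.0]])
samples = [np.random.randn(2) for _ in range(100)]
print(hessian_psd(hess_f, samples))   # True
```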
1.3.2 Convex quadratic programming

In Chapters 9 and 10, we will consider the convex quadratic programming (QP) problem in the standard form:
\[
\text{(QP)} \qquad \min \; \frac{1}{2} x^T H x + c^T x, \quad \text{subject to} \quad Ax = b, \; x \ge 0, \tag{1.35}
\]
where H ∈ R^{n×n} with H ⪰ 0 is a positive semidefinite matrix, A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n are given, and x ∈ R^n is the vector to be optimized. Associated with the quadratic programming problem is the dual programming problem (DQP), which is also presented in the standard form:
\[
\text{(DQP)} \qquad \max \; -\frac{1}{2} x^T H x + b^T \lambda, \quad \text{subject to} \quad -Hx + A^T \lambda + s = c, \; s \ge 0, \; x \ge 0, \tag{1.36}
\]
where λ ∈ R^m is the dual variable vector and s ∈ R^n is the dual slack vector. It is well known that this problem can be solved by the simplex method [143] and other alternatives [106]. However, in this book, we will apply the arc-search interior-point method to this problem and show that arc-search interior-point algorithms can achieve the best polynomial bound that interior-point algorithms achieve for the linear programming problem.

Again, the interior-point algorithms for quadratic programming are based on the KKT conditions (1.49) applied to the QP, which are given as follows:
\[
Ax = b, \tag{1.37a}
\]
\[
A^T \lambda + s - Hx = c, \tag{1.37b}
\]
\[
x_i s_i = 0, \quad i = 1, \ldots, n, \tag{1.37c}
\]
\[
(x, s) \ge 0. \tag{1.37d}
\]
It is worthwhile to point out that linear programming is a special case of convex quadratic programming. Indeed, setting H = 0 in (1.35) and (1.36), the QP and its dual DQP reduce to LP and its dual DP. In addition, setting H = 0 in (1.37), the KKT conditions for the QP problem reduce to the KKT conditions for the LP problem.
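The reduction just described is easy to verify numerically. The sketch below (an added illustration, not from the book) evaluates the KKT residuals (1.37) for given (x, λ, s); with H = 0 the same function returns exactly the LP KKT residuals (1.8):

```python
import numpy as np

def qp_kkt_residuals(H, A, b, c, x, lam, s):
    """Residuals of the QP KKT system (1.37); with H = 0 these are the LP
    KKT residuals (1.8)."""
    r_primal = A @ x - b                     # (1.37a)
    r_dual = A.T @ lam + s - H @ x - c       # (1.37b)
    r_comp = x * s                           # (1.37c), should vanish at a solution
    return r_primal, r_dual, r_comp

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
H = np.zeros((2, 2))                         # H = 0: the QP collapses to an LP

# x* = (1, 0), lambda* = 1, s* = (0, 1) solves the LP; all residuals are zero.
print(qp_kkt_residuals(H, A, b, c, np.array([1.0, 0.0]),
                       np.array([1.0]), np.array([0.0, 1.0])))
```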
1.3.3 Monotone and mixed linear complementarity problem

The monotone linear complementarity problem (LCP) [21] is to solve the following system:
\[
s = Mx + q, \qquad (x, s) \ge 0, \qquad x^T s = 0, \tag{1.38}
\]
where M ∈ R^{n×n} with M ⪰ 0 and q ∈ R^n are given, and x ∈ R^n and s ∈ R^n are the variables to be determined. A more general LCP is the mixed LCP [69], which is given as
\[
\begin{bmatrix} s \\ 0 \end{bmatrix}
=
\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}
\begin{bmatrix} x \\ \lambda \end{bmatrix}
+
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix},
\qquad (x, s) \ge 0, \qquad x^T s = 0, \tag{1.39}
\]
where x, s ∈ R^n and λ ∈ R^m are the vectors to be determined. It is easy to see that the monotone LCP is a special case of the mixed LCP. From the KKT conditions (1.37) for QP, one can rewrite them as follows:
\[
\begin{bmatrix} s \\ 0 \end{bmatrix}
=
\begin{bmatrix} H & -A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} x \\ \lambda \end{bmatrix}
+
\begin{bmatrix} c \\ -b \end{bmatrix},
\qquad (x, s) \ge 0, \qquad x^T s = 0, \tag{1.40}
\]
which is a mixed LCP problem. Therefore, convex quadratic programming is also a special case of the mixed LCP problem.
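To make the correspondence in (1.40) concrete, the following sketch (added for illustration, not part of the book) assembles the mixed-LCP data from the QP data (H, A, b, c):

```python
import numpy as np

def qp_as_mixed_lcp(H, A, b, c):
    """Build the block matrix and the vector of the mixed LCP (1.40)
    from the QP data (H, A, b, c)."""
    m, n = A.shape
    M = np.block([[H, -A.T],
                  [A, np.zeros((m, m))]])
    q = np.concatenate([c, -b])
    return M, q

H = np.array([[2.0, 0.0], [0.0, 2.0]])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([-1.0, -1.0])
M, q = qp_as_mixed_lcp(H, A, b, c)
print(M.shape, q.shape)   # (3, 3) (3,)
```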
1.3.4 Positive semidefinite programming

Another special case of convex programming is positive semidefinite programming (SDP), which has many interesting applications to physical problems, to control system design problems, and to other areas of mathematical programming [3, 15, 107]. Let S^n be the set of real symmetric n × n matrices. For G ∈ S^n and H ∈ S^n, their inner product on S^n is defined by
\[
G \bullet H = \mathrm{Tr}(G^T H), \tag{1.41}
\]
where Tr(M) is the trace of the matrix M. Using this notation, we can define the SDP as follows:
\[
\min_{X \in \mathcal{S}^n} \; C \bullet X, \quad \text{s.t.} \quad A_i \bullet X = b_i, \;\; i = 1, \ldots, m, \quad X \succeq 0, \tag{1.42}
\]
where C ∈ S^n, A_i ∈ S^n for i = 1, …, m, and b = (b₁, …, b_m) ∈ R^m are given, and X ∈ S^n with X ⪰ 0 is the primal variable to be optimized. Due to the symmetry restriction, there are n(n + 1)/2 variables (not n² variables) in X. The dual problem of (1.42) is given by
\[
\max_{y \in \mathbb{R}^m, \; S \in \mathcal{S}^n} \; b^T y, \quad \text{s.t.} \quad \sum_{i=1}^{m} y_i A_i + S = C, \quad S \succeq 0, \tag{1.43}
\]
where y = [y₁, …, y_m]^T ∈ R^m and S ∈ S^n are the dual variables. Again, there are n(n + 1)/2 variables in S because of the symmetry restriction. Without loss of generality, we assume that the A_i are linearly independent. Applying the KKT conditions (1.49) to (1.42) gives the KKT conditions for the positive semidefinite programming problem:
\[
A_i \bullet X = \mathrm{Tr}(A_i X) = b_i, \;\; i = 1, \ldots, m, \qquad X \succeq 0, \tag{1.44a}
\]
\[
\sum_{i=1}^{m} y_i A_i + S = C, \qquad S \succeq 0, \tag{1.44b}
\]
\[
XS = 0. \tag{1.44c}
\]
The KKT system has n(n + 1) + m equations and exactly n(n + 1) + m variables.
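A small numerical illustration of the inner product (1.41) (added here, not from the original text):

```python
import numpy as np

def sdp_inner(G, H):
    """G . H = Tr(G^T H), the inner product on the space of symmetric matrices."""
    return np.trace(G.T @ H)

G = np.array([[2.0, 1.0], [1.0, 3.0]])
H = np.array([[1.0, 0.0], [0.0, 4.0]])

print(sdp_inner(G, H))                       # 2*1 + 1*0 + 1*0 + 3*4 = 14
print(np.all(np.linalg.eigvalsh(G) >= 0))    # True: G is positive semidefinite
# A symmetric 2x2 matrix has n(n+1)/2 = 3 independent entries.
```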
1.4 Nonlinear Programming
The first-order optimality conditions are probably the most important result for general constrained optimization problems. These conditions are applicable to many optimization problems, such as a linear optimization problem, which has a linear objective function and linear constraints; a convex quadratic optimization problem, which has a convex quadratic objective function and linear constraints; and a general nonlinear optimization problem, which has a general nonlinear objective function and nonlinear constraints. Although the first-order optimality conditions for general constrained optimization problems are only necessary conditions, they are necessary and sufficient conditions for a linear optimization problem, a convex quadratic optimization problem, and the other convex optimization problems that are considered extensively in this book.
1.4.1 Problem description

Consider the general optimization problem:
\[
\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c_i(x) = 0, \; i \in \mathcal{E}, \qquad c_i(x) \ge 0, \; i \in \mathcal{I}, \tag{1.45}
\]
where f is the objective function and the c_i are the constraint functions; these functions are all smooth and real-valued on a subset of R^n, and E and I are two finite sets of indices for the equality constraints and the inequality constraints, respectively. The feasible set Ω is defined as the set of all points x that satisfy all the constraints, i.e.,
\[
\Omega = \{ x \mid c_i(x) = 0, \; i \in \mathcal{E}; \;\; c_i(x) \ge 0, \; i \in \mathcal{I} \},
\]
so that one can rewrite (1.45) as
\[
\min_{\Omega} f(x). \tag{1.46}
\]
A vector x* is a local solution of the problem (1.45) if x* ∈ Ω and there is a neighborhood N of x* such that f(x) ≥ f(x*) for x ∈ N ∩ Ω. A vector x* is a strict local solution of the problem (1.45) if x* ∈ Ω and there is a neighborhood N of x* such that f(x) > f(x*) for x ∈ N ∩ Ω with x ≠ x*. A vector x* is a global solution of the problem (1.45) if x* ∈ Ω and f(x) ≥ f(x*) for all x ∈ Ω. A vector x* is a strict global solution of the problem (1.45) if x* ∈ Ω and f(x) > f(x*) for all x ∈ Ω with x ≠ x*.
1.4.2 Karush-Kuhn-Tucker conditions

To state the first-order optimality conditions, we introduce the Lagrangian function for the constrained optimization problem (1.45), which is defined as
\[
L(x, \lambda) = f(x) - \sum_{i \in \mathcal{E} \cup \mathcal{I}} \lambda_i c_i(x). \tag{1.47}
\]
The active set at any feasible x is the union of the set E and the subset of I consisting of the indices of the active inequality constraints, given by
\[
\mathcal{A}(x) = \mathcal{E} \cup \{ i \in \mathcal{I} \mid c_i(x) = 0 \}. \tag{1.48}
\]
The first-order optimality conditions are directly related to the linear independence constraint qualification (LICQ), which is defined as follows.

Definition 1.1
Given the point x* and the active set A(x*) defined by (1.48), the linear independence constraint qualification is said to hold if the set of active constraint gradients {∇c_i(x*), i ∈ A(x*)} is linearly independent.

Note that if this condition holds, none of the active constraint gradients can be zero. Now we are ready to state the first-order necessary conditions.

Theorem 1.6
Suppose that x* is a local solution of (1.45) and that the LICQ holds at x*. Then, there is a Lagrange multiplier vector λ*, with components λ*_i, i ∈ E ∪ I, such that the following conditions are satisfied at (x*, λ*):
\[
\nabla_x L(x^*, \lambda^*) = 0, \tag{1.49a}
\]
\[
c_i(x^*) = 0, \quad \forall i \in \mathcal{E}, \tag{1.49b}
\]
\[
c_i(x^*) \ge 0, \quad \forall i \in \mathcal{I}, \tag{1.49c}
\]
\[
\lambda_i^* \ge 0, \quad \forall i \in \mathcal{I}, \tag{1.49d}
\]
\[
\lambda_i^* c_i(x^*) = 0, \quad \forall i \in \mathcal{E} \cup \mathcal{I}. \tag{1.49e}
\]
The proof of Theorem 1.6 is very technical and is therefore omitted. Readers who are interested in the proof are referred to [106]. The conditions (1.49) are widely known as the Karush-Kuhn-Tucker conditions, or KKT conditions for short. The KKT conditions were first proved by Karush in his master's thesis in 1939 [58] and rediscovered by Kuhn and Tucker in 1951 [75]. A special type of solution is important and deserves its own definition.

Definition 1.2
Given a local solution x* of (1.45) and a vector λ* satisfying (1.49), we say that the solution is strictly complementary if exactly one of λ*_i and c_i(x*) is zero for each index i ∈ I. In other words, λ*_i > 0 for each i ∈ I ∩ A(x*).
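The following short sketch (an illustration added here, not part of the book) evaluates the KKT residuals (1.49) and checks strict complementarity in the sense of Definition 1.2 for a one-constraint example with E = ∅ and I = {1}:

```python
import numpy as np

# min f(x) = (x1 - 1)^2 + (x2 - 1)^2  subject to  c1(x) = x1 + x2 >= 0.
grad_f = lambda x: 2.0 * (x - 1.0)
c1 = lambda x: x[0] + x[1]
grad_c1 = lambda x: np.array([1.0, 1.0])

def kkt_residuals(x, lam1):
    """Residuals of (1.49) for this problem: stationarity, feasibility,
    dual feasibility, and complementarity."""
    stationarity = grad_f(x) - lam1 * grad_c1(x)   # (1.49a)
    feasibility = min(c1(x), 0.0)                  # (1.49c) violation, if any
    dual_feasibility = min(lam1, 0.0)              # (1.49d) violation, if any
    complementarity = lam1 * c1(x)                 # (1.49e)
    return stationarity, feasibility, dual_feasibility, complementarity

# The unconstrained minimizer x* = (1, 1) is feasible, so lambda* = 0 works,
# and the solution is strictly complementary since c1(x*) = 2 > 0 = lambda*.
print(kkt_residuals(np.array([1.0, 1.0]), 0.0))
```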
Chapter 2
A Potential-Reduction Algorithm for LP
The potential-reduction method is closely related to the logarithmic barrier function, which is usually attributed to Frisch [38]. The logarithmic barrier function was extensively studied by Fiacco and McCormick in their book [36] on nonlinear programming. Since the evaluation of the logarithmic barrier function becomes increasingly difficult as the barrier parameter approaches zero, the method of using the logarithmic barrier function was not widely accepted. However, after Karmarkar announced his first polynomial interior-point algorithm for linear programming [57], which uses a logarithmic barrier function, research on interior-point methods using logarithmic barrier functions became popular. Two classes of potential-reduction methods, the primal potential-reduction method and the primal-dual potential-reduction method, were proposed. Since primal-dual potential functions give a more balanced consideration to both primal and dual variables, they are now considered to be better than their counterpart. Therefore, this chapter discusses only a potential-reduction method that uses a logarithmic barrier function involving both primal and dual variables [130, 135, 72, 163].
2.1 Preliminaries
One of the main ideas of an interior-point algorithm is to search for the optimizer from inside F°, reducing the duality gap in every iteration. For this reason, we need to know whether the iterates stay in a bounded set.
Theorem 2.1 Suppose that F^o ≠ ∅. Then, for each K ≥ 0, the set

{(x, s) | (x, λ, s) ∈ F for some λ, and x^T s ≤ K}   (2.1)
is bounded. Proof 2.1 Let (x¯ , λ¯ , s¯) be any vector in F o and (x, λ , s) be any vector in F with xT s ≤ K. Since A(x¯ − x) = 0, and AT (λ¯ − λ ) + (s¯ − s) = 0, we have (x¯ − x)T (s¯ − s) = −(x¯ − x)T AT (λ¯ − λ ) = 0.
Using xT s ≤ K yields
x¯ T s + s¯T x ≤ K + x¯ T s¯.
(2.2)
Since (x̄, s̄) > 0, denoting c = min_{i=1,2,...,n} min(x̄_i, s̄_i) > 0 and using (2.2), we have c e^T(x + s) ≤ K + x̄^T s̄. This implies that, for i = 1, 2, . . . , n,

0 ≤ x_i ≤ (1/c)(K + x̄^T s̄),   0 ≤ s_i ≤ (1/c)(K + x̄^T s̄).   (2.3)
This completes the proof.

When the strictly feasible set of the linear programming problem is not empty, we know that the level set defined by (2.1) is bounded. Starting from an interior point, the best way to search for the next iterate is to follow a curve named the central path [88], because this strategy avoids the numerical difficulty that arises when the iterate moves too close to the boundary before an optimizer is approached. The central path C is an arc of points (x(t), λ(t), s(t)) ∈ F^o parametrized by a positive scalar t. All points on C satisfy the perturbed KKT system for some t > 0:

Ax = b,   (2.4a)
A^T λ + s = c,   (2.4b)
x ◦ s = te,   (2.4c)
(x, s) > 0.   (2.4d)
Remark 2.1 The important feature of the points on the central path C is that the pairwise products x_i s_i are identical for all i.

A linear system of equations related to the perturbed KKT system is as follows:

[ A    0    0 ] [ Δx^k ]   [            0             ]
[ 0   A^T   I ] [ Δλ^k ] = [            0             ],   (2.5)
[ S    0    X ] [ Δs^k ]   [ σ_k µ_k e − x^k ◦ s^k ]
where σ_k is the centering parameter (its role is fully discussed in [144]) and µ_k is the duality measure. Quite a few algorithms are related to the solution of this linear system of equations.
2.2
A Potential-Reduction Algorithm
A popular primal-dual potential function is defined by

f_ρ(x, s) = ρ log(x^T s) − Σ_{i=1}^n log(x_i s_i),   (2.6)

where ρ > n is a weight parameter. It is easy to see that the first part is related to the duality gap x^T s, which goes to zero when an optimizer is found. The second term is a barrier which prevents the iterates from going to zero too fast before an optimizer is approached. The second term is also related to centrality, which can easily be seen if we rewrite f_ρ(x, s) as follows:

f_ρ(x, s) = ρ log(x^T s) − Σ_{i=1}^n log(x_i s_i)
         = (ρ − n) log(x^T s) − Σ_{i=1}^n log( x_i s_i / (x^T s/n) ) + n log n.   (2.7)

Notice that the first term in (2.7) is the objective with the weight (ρ − n). The second term is the penalty for x_i s_i deviating from the duality measure x^T s/n (it is better if no x_i s_i is much smaller than the average µ = x^T s/n). If x^T s > 0 but x_i s_i = 0, then f_ρ(x, s) blows up. The following lemma shows that the sum of the second and third terms has a lower bound of n log n. Therefore, f_ρ(x, s) → −∞ only if the first term goes to −∞, which means that the duality gap is reduced to zero.

Lemma 2.1 If the vector pair (x, s) > 0, then we have

− Σ_{i=1}^n log( x_i s_i / (x^T s/n) ) ≥ 0.   (2.8)

The equality holds if and only if s_i x_i / µ = 1 for all i.

Proof 2.2 Let β_i = s_i x_i / µ − 1, or equivalently 1 + β_i = s_i x_i / µ. In view of Lemma 1.2, we have

− Σ_{i=1}^n log( x_i s_i / (x^T s/n) ) ≥ − Σ_{i=1}^n ( s_i x_i / µ − 1 ) = −(n − n) = 0.   (2.9)
The equality holds if and only if 0 = β_i = s_i x_i/µ − 1, i.e., s_i x_i/µ = 1.
This lemma is useful to derive an important relation between f_ρ(x, s) and the duality measure µ = x^T s/n.

Lemma 2.2 For any (x, λ, s) ∈ F^o, it must have

µ ≤ exp( f_ρ(x, s) / (ρ − n) ).   (2.10)

Proof 2.3 By the definition of f_ρ(x, s) and Lemma 2.1, we have

f_ρ(x, s) = (ρ − n) log(x^T s) − Σ_{i=1}^n log( x_i s_i / (x^T s/n) ) + n log n
         ≥ (ρ − n) log µ + (ρ − n) log n + n log n
         ≥ (ρ − n) log µ.   (2.11)
Inequality (2.10) follows immediately from the above inequality.

Lemma 2.2 indicates that if f_ρ(x, s) → −∞, then µ → 0. For (x, λ, s) ∈ F, when µ → 0, in view of the KKT conditions (1.8) (which are necessary and sufficient for linear programming), an optimal solution of the linear programming problem (1.4) is found. Now we are ready to present the potential-reduction algorithm.

Algorithm 2.1
Data: ε > 0, ρ > n, σ = n/ρ, initial point (x^0, λ^0, s^0) ∈ F^o, and µ_0 = x^{0T} s^0 / n.
for iteration k = 0, 1, 2, . . .
Step 1: If µ_k < ε and (x^k, s^k) > 0 hold, stop. Otherwise continue.
Step 2: Solve the linear system of equations (2.5) for (Δx^k, Δλ^k, Δs^k).
Step 3: Calculate α_max = sup{α ∈ [0, 1] | (x^k, s^k) + α(Δx^k, Δs^k) ≥ 0}.
Step 4: Select α_k such that α_k = arg min_{α∈(0,α_max)} f_ρ(x^k + αΔx^k, s^k + αΔs^k).
Step 5: Set (x^{k+1}, λ^{k+1}, s^{k+1}) = (x^k, λ^k, s^k) + α_k(Δx^k, Δλ^k, Δs^k).
end (for)
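The loop above is straightforward to prototype. The following is a minimal NumPy sketch of one way to carry out a single iteration of Algorithm 2.1: it assembles and solves (2.5) with σ = n/ρ and then minimizes f_ρ along the direction by a crude grid search. The dense solve, the grid search, and the function names are illustrative assumptions, not the implementation used elsewhere in this book.

```python
import numpy as np

def f_rho(x, s, rho):
    """Primal-dual potential function (2.6)."""
    return rho * np.log(x @ s) - np.sum(np.log(x * s))

def potential_reduction_step(A, x, lam, s, rho):
    """One iteration of Algorithm 2.1 (illustrative sketch)."""
    m, n = A.shape
    mu = x @ s / n
    sigma = n / rho
    # Assemble and solve (2.5) for (dx, dlam, ds).
    K = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    rhs = np.concatenate([np.zeros(m), np.zeros(n), sigma * mu - x * s])
    dx, dlam, ds = np.split(np.linalg.solve(K, rhs), [n, n + m])
    # Step 3: largest alpha in [0, 1] keeping (x, s) + alpha (dx, ds) >= 0.
    d = np.concatenate([dx, ds])
    w = np.concatenate([x, s])
    alpha_max = min(1.0, np.min(-w[d < 0] / d[d < 0])) if np.any(d < 0) else 1.0
    # Step 4: crude minimization of f_rho over (0, alpha_max) by grid search.
    alphas = alpha_max * np.linspace(1e-3, 0.999, 200)
    k_best = np.argmin([f_rho(x + a * dx, s + a * ds, rho) for a in alphas])
    alpha_k = alphas[k_best]
    # Step 5: update the iterate.
    return x + alpha_k * dx, lam + alpha_k * dlam, s + alpha_k * ds
```

In practice, the one-dimensional minimization in Step 4 would be done more carefully (e.g., by a line-search routine), but the grid search is enough to illustrate the structure of the iteration.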
2.3
Convergence Analysis
We first show that if (x0 , λ 0 , s0 ) ∈ F o , then, all iterates will stay in F o . Lemma 2.3 Let (xk , λ k , sk ) ∈ F o , then the next iterate generated by Algorithm 2.1 satisfies (xk+1 , λ k+1 , sk+1 ) ∈ F o . Proof 2.4
From Step 5 of Algorithm 2.1, we have (xk+1 , λ k+1 , sk+1 ) = (xk , λ k , sk ) + αk (Δxk , Δλ k , Δsk ).
In view of the first row of (2.5), it follows that Ax^{k+1} = A(x^k + α_k Δx^k) = Ax^k = b. In view of the second row of (2.5), it follows that

A^T λ^{k+1} + s^{k+1} = A^T(λ^k + α_k Δλ^k) + (s^k + α_k Δs^k)
                     = (A^T λ^k + s^k) + α_k(A^T Δλ^k + Δs^k)
                     = A^T λ^k + s^k = c.   (2.12)
Steps 3 and 4 of Algorithm 2.1 guarantee that (x^{k+1}, s^{k+1}) > 0. This completes the proof.

Next, we show that f_ρ(x, s) is reduced by at least a fixed amount δ > 0, independent of n, at every iteration of Algorithm 2.1. First, we introduce a technical result.

Lemma 2.4 Let t ∈ R^n and ‖t‖_∞ ≤ τ < 1. We have

− Σ_{i=1}^n log(1 + t_i) ≤ −e^T t + ‖t‖^2 / (2(1 − τ)).   (2.13)

If τ = 0.5, it reduces to

− Σ_{i=1}^n log(1 + t_i) ≤ −e^T t + ‖t‖^2.   (2.14)

Proof 2.5 Let f(t) = − Σ_{i=1}^n log(1 + t_i), which is well defined and smooth for ‖t‖_∞ ≤ τ < 1. Then, it follows that

f'(t) = ( −1/(1 + t_1), . . . , −1/(1 + t_n) ),   f''(t) = diag( 1/(1 + t_i)^2 ).

Using Taylor's theorem, we have

f(t) = f(0) + t^T f'(0) + (1/2) ∫_0^1 t^T f''(αt) t dα.

Note that f(0) = 0 and t^T f'(0) = −e^T t, so we need only estimate the last term. Since

t^T f''(αt) t = Σ_{i=1}^n t_i^2/(1 + αt_i)^2 ≤ Σ_{i=1}^n t_i^2/(1 − ατ)^2 = ‖t‖^2/(1 − ατ)^2,

we have

∫_0^1 t^T f''(αt) t dα ≤ ‖t‖^2 ∫_0^1 dα/(1 − ατ)^2 = ‖t‖^2 (1/τ) [ 1/(1 − ατ) ]_{α=0}^{α=1} = ‖t‖^2/(1 − τ).

Combining the formulas above yields inequality (2.13). Substituting τ = 0.5 into (2.13) gives (2.14).

To simplify the notation, we will drop the superscript k. Some formulas derived from (2.5) are also needed. Pre-multiplying the second row of (2.5) by Δx^T (and using AΔx = 0 from the first row) yields

Δx^T Δs = 0.   (2.15)

By examining the last row of (2.5), it is easy to see that

s^T Δx + x^T Δs = −x^T s + σµn = −x^T s(1 − σ).   (2.16)
Lemma 2.5 Let α_c satisfy α_c max(‖X^{-1}Δx‖_∞, ‖S^{-1}Δs‖_∞) = 0.5. For α ∈ (0, α_c] ⊂ (0, α_max) ⊂ (0, 1], we have

f_ρ(x + αΔx, s + αΔs) − f_ρ(x, s) ≤ α f_1 + α^2 f_2,

where

f_1 := ρ (s^T Δx + x^T Δs)/(x^T s) − e^T(X^{-1}Δx + S^{-1}Δs),   f_2 := ‖X^{-1}Δx‖^2 + ‖S^{-1}Δs‖^2.

Proof 2.6 Using (2.7) and (2.15), we have

f_ρ(x + αΔx, s + αΔs) − f_ρ(x, s)
= ρ log((x + αΔx)^T(s + αΔs)) − Σ_{i=1}^n log((x_i + αΔx_i)(s_i + αΔs_i)) − ρ log(x^T s) + Σ_{i=1}^n log(x_i s_i)
= ρ log( (x + αΔx)^T(s + αΔs)/(x^T s) ) − Σ_{i=1}^n log( (x_i + αΔx_i)/x_i ) − Σ_{i=1}^n log( (s_i + αΔs_i)/s_i )
= ρ log( 1 + α (s^T Δx + x^T Δs)/(x^T s) ) − Σ_{i=1}^n log( 1 + αΔx_i/x_i ) − Σ_{i=1}^n log( 1 + αΔs_i/s_i ).   (2.17)

In view of (2.16), it has

β := α (s^T Δx + x^T Δs)/(x^T s) = −α(1 − σ) > −1.

Therefore, the first term meets the condition of Lemma 1.2. By the definition of α_c, we know that α|Δx_i|/|x_i| < 1 and α|Δs_i|/|s_i| < 1. Therefore, applying Lemmas 1.2 and 2.4 to (2.17) yields

f_ρ(x + αΔx, s + αΔs) − f_ρ(x, s) ≤ ρα (s^T Δx + x^T Δs)/(x^T s) − α e^T(X^{-1}Δx + S^{-1}Δs) + α^2(‖X^{-1}Δx‖^2 + ‖S^{-1}Δs‖^2) := α f_1 + α^2 f_2.

This completes the proof.

The next task is to estimate the sizes of f_1 and f_2. To simplify the analysis and to get a good polynomial bound, we select

σ = n/ρ = n/(n + √n) ≈ 1 − 1/√n,   (2.18)

which was proposed in [72] (computational experience shows that an adaptively selected σ, with small σ in later iterations, is much more efficient than this fixed choice). Therefore, the third row of (2.5) can be rewritten as

SΔx + XΔs = −XSe + (n/ρ)µe.   (2.19)

Denote

V = (XS)^{1/2},  v = Ve,  v_min = min_{i=1,...,n} v_i = min_{i=1,...,n} √(x_i s_i),   (2.20)

and

D = X^{1/2}S^{-1/2},  u = −v + (nµ/ρ) V^{-1} e.   (2.21)
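Before these quantities are used in the lemmas below, it can be reassuring to check the relations derived so far numerically. The following sketch builds a random strictly positive pair (x, s) and a random full-row-rank A, solves (2.5) with σ = n/ρ, and verifies (2.15), (2.16), and (2.19) together with the definitions (2.20)-(2.21); the random test data and tolerances are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))          # assumed full row rank
x = rng.uniform(0.5, 2.0, n)             # any strictly positive pair (x, s)
s = rng.uniform(0.5, 2.0, n)
mu = x @ s / n
rho = n + np.sqrt(n)
sigma = n / rho                          # the choice (2.18)

# Solve (2.5) for (dx, dlam, ds).
K = np.block([
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [np.zeros((n, n)), A.T,              np.eye(n)],
    [np.diag(s),       np.zeros((n, m)), np.diag(x)],
])
rhs = np.concatenate([np.zeros(m), np.zeros(n), sigma * mu - x * s])
dx, dlam, ds = np.split(np.linalg.solve(K, rhs), [n, n + m])

print(abs(dx @ ds))                                   # (2.15): ~ 0
print(s @ dx + x @ ds + (x @ s) * (1 - sigma))        # (2.16): ~ 0
v = np.sqrt(x * s)                                    # v = Ve, V = (XS)^{1/2}
u = -v + (n * mu / rho) / v                           # u from (2.21)
print(np.linalg.norm(s * dx + x * ds - v * u))        # (2.19): S dx + X ds = V u, ~ 0
```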
The following lemma provides formulas which will be used in the remainder of this chapter.

Lemma 2.6 By the definitions of (2.20) and (2.21), the following formulas hold.

‖v‖^2 = x^T s = nµ,  X = VD,  S = VD^{-1},   (2.22)
SΔx + XΔs = Vu,   (2.23)
D^{-1}Δx + DΔs = u,   (2.24)
X^{-1}Δx + S^{-1}Δs = V^{-1}u,   (2.25)
‖D^{-1}Δx‖, ‖DΔs‖ ≤ ‖u‖.   (2.26)

Proof 2.7 The formulas in (2.22) follow directly from the definitions (2.20) and (2.21). The formula (2.23) is easy to see from (2.19). Pre-multiplying both sides of (2.19) by V^{-1} gives (2.24). Pre-multiplying both sides of (2.19) by (XS)^{-1} shows (2.25). To show (2.26), we use (2.15) and (2.24) to get the following relations:

‖u‖^2 = ‖D^{-1}Δx + DΔs‖^2 = ‖D^{-1}Δx‖^2 + 2Δx^T D^{-1}DΔs + ‖DΔs‖^2
      = ‖D^{-1}Δx‖^2 + ‖DΔs‖^2 ≥ max{‖D^{-1}Δx‖^2, ‖DΔs‖^2}.

Taking the square root on both sides proves (2.26).

The estimation of f_1 and f_2 follows from Lemma 2.6.

Lemma 2.7 Let f_1 and f_2 be defined as in Lemma 2.5. Then, the following relations hold:

f_1 ≤ − (ρ/(nµ)) ‖u‖^2,   f_2 ≤ ‖u‖^2 / v_min^2.

Proof 2.8 From Lemma 2.5, using (2.23), (2.25), (2.20), and (2.21), we have

f_1 = (ρ/(x^T s)) e^T(SΔx + XΔs) − e^T(X^{-1}Δx + S^{-1}Δs)
    = (ρ/(nµ)) e^T Vu − e^T V^{-1}u
    = (ρ/(nµ)) ( e^T V − (nµ/ρ) e^T V^{-1} ) u
    = (ρ/(nµ)) ( v − (nµ/ρ) V^{-1}e )^T u
    = − (ρ/(nµ)) ( −v + (nµ/ρ) V^{-1}e )^T u
    = − (ρ/(nµ)) ‖u‖^2.   (2.27)

In view of Lemma 2.5, using (2.22), (2.26), and (2.20), we have

f_2 = ‖X^{-1}Δx‖^2 + ‖S^{-1}Δs‖^2 = ‖V^{-1}D^{-1}Δx‖^2 + ‖V^{-1}DΔs‖^2
    ≤ ‖V^{-1}‖^2 ( ‖D^{-1}Δx‖^2 + ‖DΔs‖^2 ) ≤ ‖u‖^2 / v_min^2.
This completes the proof.

Since f_1 < 0 and f_2 > 0, it follows from Lemma 2.5 that, for α small enough, f_ρ(x + αΔx, s + αΔs) ≤ f_ρ(x, s). We need to determine a value of α for which the potential function decreases and to estimate the amount of the decrease. Since f_1 and f_2 depend on ‖u‖, we need to estimate the size of u in terms of some constants and v_min.

Lemma 2.8 For any (x, λ, s) ∈ F^o and for ρ > n + √n, it must have

‖u‖ ≥ √3 nµ / (2 v_min ρ).   (2.28)

Proof 2.9 Using (2.22), we have

v^T( V^{-1}e − v/µ ) = n − n = 0.   (2.29)

For the vector V^{-1}e − (1/µ)v, the norm ‖V^{-1}e − (1/µ)v‖^2 is bounded below by the square of the component corresponding to v_min, i.e.,

‖V^{-1}e − (1/µ)v‖^2 ≥ ( 1/v_min − v_min/µ )^2.   (2.30)

Also, the following equalities hold:

‖V^{-1}e − ((n+√n)/(nµ)) v‖^2
 = ‖V^{-1}e‖^2 − 2((n+√n)/(nµ)) e^T V^{-1} v + ((n+√n)/(nµ))^2 v^T v   [use (2.22)]
 = ‖V^{-1}e‖^2 − 2(n+√n)/µ + (n^2 + 2n√n + n)/(nµ)
 = ‖V^{-1}e‖^2 − (n−1)/µ.   (2.31)

By using (2.21) and V^{-1}v = e, we have

‖u‖^2 ρ^2/(n^2 µ^2)
 = ‖V^{-1}e − (ρ/(nµ)) v‖^2   [use (2.21)]
 = ‖V^{-1}e‖^2 + (ρ^2/(n^2 µ^2)) ‖v‖^2 − 2(ρ/(nµ)) e^T V^{-1} v   [use (2.22)]
 = ‖V^{-1}e‖^2 + ρ^2/(nµ) − 2ρ/µ
 = ‖V^{-1}e‖^2 − (n−1)/µ + ( (ρ−n−√n)^2 + 2√n(ρ−n−√n) )/(nµ)   [elementary algebra]
 ≥ ‖V^{-1}e‖^2 − (n−1)/µ   [ρ > n + √n]
 = ‖V^{-1}e − ((n+√n)/(nµ)) v‖^2   [use (2.31)]
 = ‖V^{-1}e − (1/µ)v‖^2 + ‖(1/(√n µ)) v‖^2   [use (2.29)]
 ≥ ( 1/v_min − v_min/µ )^2 + v^T v/(nµ^2)   [use (2.30)]
 = ( 1/v_min − v_min/µ )^2 + 1/µ
 = ( v_min/µ − 1/(2v_min) )^2 + 3/(4 v_min^2)
 ≥ 3/(4 v_min^2).
(2.32)
Proof 2.10 First, it is easy to see that α¯ satisfies the condition of Lemma 2.5 because of the following relation: α¯ max(IX−1 ΔxI∞ , IS−1 ΔsI∞ ) ≤
¯ max(IX−1 ΔxI2 , IS−1 ΔsI2 ) α vmin IuI ≤ = 0.5, 2IuI vmin
where the last inequality follows from the proof of the last inequality in Lemma 2.7. Now, using Lemmas 2.5, 2.7, and 2.8, we have fρ (x + α¯ Δx, s + α¯ Δs) − fρ (x, s)
α¯ f1 + α¯ 2 f2 v2 IuI2 vmin ρ ≤ − IuI2 + min 2 2 2IuI nµ 4IuI vmin
≤
A Potential-Reduction Algorithm for LP
≤ ≤
•
31
ρvmin 1 IuI + 2nµ 4 √ ρvmin 3nµ 1 − + < −0.15. 2nµ 2vmin ρ 4 −
This completes the proof. Finally, we have the convergence result for the potential-reduction interior-point algorithm. Theorem 2.2 Given a starting point (x0 , λ 0 , s0 ) ∈ F o , all iterates will stay in F o , and √ there is a δ = −0.15 such that (2.32) holds. Moreover, for ε ∈ (0, 1) and ρ − n > n, we have an index K defined by √ fρ (x0 , s0 ) n K= + | log ε| (2.33) δ δ such that ∀k ≥ K. (2.34) µk ≤ ε, Proof 2.11 The first claim is shown in Lemma 2.3. The second claim is proved in Lemma 2.9. √ The last claim follows the following argument. From (2.10), (2.32) and ρ > n + n, it follows log µ
≤
≤ ≤
fρ (xk , sk )/(ρ − n) √ fρ (xk , sk )/ n
√ ( fρ (x0 , s0 ) − kδ )/ n.
√ Clearly, if ( fρ (x0 , s0 ) − kδ )/ n ≤ −| log ε|, then, it must have µ ≤ ε, i.e., for k ≥ K with K given in (2.33), it must have µk ≤ ε.
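As a quick illustration of the bound (2.33), the snippet below evaluates K for sample values of n, ε, and f_ρ(x^0, s^0); the numerical inputs are made up for illustration only.

```python
import numpy as np

def iteration_bound(f_rho_0, n, eps, delta=0.15):
    """Iteration bound K of (2.33) with the reduction delta from Lemma 2.9."""
    return f_rho_0 / delta + np.sqrt(n) / delta * abs(np.log(eps))

# e.g. n = 100 variables, eps = 1e-8, f_rho(x0, s0) = 50 (illustrative values)
print(int(np.ceil(iteration_bound(50.0, 100, 1e-8))))
```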
2.4
Concluding Remarks
The primal-dual potential-reduction interior-point algorithm discussed in this chapter is a feasible interior-point algorithm because it requires the initial point to be in F^o. This may not be a reasonable requirement, as it may take a lot of effort to find a point in F^o. Also, primal-dual potential-reduction interior-point algorithms are not as efficient as the primal-dual path-following interior-point algorithms that will be discussed in the next chapter. We will also discuss infeasible primal-dual path-following interior-point algorithms, which do not require the initial point to be in F^o, a very desirable feature for practical computations. However, the potential-reduction interior-point method was found to be useful in the development of interior-point algorithms for nonlinear programming. This is the main reason that we have included the method in this book.
Chapter 3
Feasible Path-Following Algorithms for LP
Path-following is an excellent idea of Megiddo [88] in the redevelopment of the interior-point method in the 1980s, and it has been studied extensively. The key idea of path-following algorithms is to search for the optimizer approximately along a curve known as the central path. Following the central path prevents the iterate from approaching the boundary too early, before the iterate approaches the optimal solution. Many algorithms were proposed based on this idea. The central path is the arc defined by (2.4) in Chapter 2. The potential-reduction algorithm discussed in Chapter 2 does not restrict its iterates to follow the central path explicitly, but the potential function includes a term that implicitly penalizes iterates far away from the central path. However, all path-following algorithms explicitly require the iterate to stay in a neighborhood of the central path. Several neighborhoods are used by researchers. In this chapter, the two most used neighborhoods are considered. The first neighborhood is defined by using the 2-norm:

N_2(θ) = {(x, λ, s) ∈ F^o | ‖x ◦ s − µe‖_2 ≤ θµ}
(3.1)
where θ ∈ (0, 1) is a constant and F^o is defined in (1.10). It is clear that this neighborhood restricts x_i s_i to stay not too far away from the average

µ = x^T s/n = (1/n) Σ_{i=1}^n x_i s_i.
The second neighborhood is motivated by the one-sided ∞-norm:
N−∞ (γ) = {(x, λ , s) ∈ F o | xi si ≥ γ µ}.
(3.2)
where γ ∈ (0, 1) is a constant. This neighborhood does not restrict x_i s_i from becoming too big; it only cares whether some x_i s_i are much smaller than the average characterized by the duality measure µ. Both neighborhoods require that the iterate stays inside the feasible set; therefore, x > 0 and s > 0, hence x_i s_i > 0, hold for all i = 1, 2, . . . , n in all iterations. Note that the N_2(θ) neighborhood can be expressed as

Σ_{i=1}^n ( x_i s_i/µ − 1 )^2 ≤ θ^2 < 1.
This means that each relative deviation of x_i s_i from µ is very small; therefore, the neighborhood is called the narrow neighborhood. On the other hand, the N_{−∞}(γ) neighborhood requires only that x_i s_i is not too close to zero; therefore, it is called the wide neighborhood. For algorithms that restrict the iterate to the narrow neighborhood, since the central path is a curve, the step size is normally short when the search is along a straight line; otherwise the iterate would leave the narrow neighborhood. Therefore, these algorithms are named short-step algorithms. For algorithms that restrict the iterate to the wide neighborhood, although the central path is still the same curve, the room to search for the new iterate is much larger and the step size is normally longer. Therefore, these algorithms are known as long-step algorithms.

Another idea to make the iterate take a longer step was proposed by Mizuno, Todd, and Ye [93]. The initial iterate is inside a narrow neighborhood with a small parameter θ, and the search for the next iterate is carried out in a larger narrow neighborhood with parameter 2θ; this generates a longer step than a search in the neighborhood with the small θ. This step is called the predictor step. Since θ < 1, to prevent θ from growing without bound, after the iterate moves to the neighborhood with the larger parameter 2θ, the algorithm brings the iterate back to the neighborhood with the original small θ, and the objective function does not increase while the iterate is moved back to the original narrow neighborhood. This is called the corrector step. This algorithm is known as the Mizuno-Todd-Ye algorithm, or MTY algorithm for short.

A different way to take a longer step was proposed by Monteiro, Adler, and Resende [97]. As the central path is an arc, searching along a straight line keeps the step size small in order to prevent the iterate from moving outside the neighborhood; an intuitive idea is therefore to search for the optimizer along the central path. However, the calculation of the central path defined by (2.4) would be too expensive to be practical. Monteiro, Adler, and Resende proposed using power series to approximate the central path. This method needs second-order derivatives to approximate the central path, which is different from the aforementioned algorithms, which need only the first-order derivative to get the search direction. Therefore, algorithms that need more than first-order derivatives are called higher-order algorithms.

It is worthwhile to note that all algorithms mentioned above assume that the initial point is either in N_2(θ) or N_{−∞}(γ). Therefore, they are called feasible interior-point algorithms. It has been known that, in general, feasible interior-point algorithms are not as efficient as the infeasible interior-point algorithms to be discussed in the
next chapter. However, all the iterates in a feasible interior-point algorithm are feasible, which is desirable for some engineering problems that will be discussed in Chapter 10. This chapter discusses only feasible interior-point algorithms. We start with the short-step path-following algorithm.
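Since both neighborhoods are used repeatedly in what follows, a small helper that tests whether an iterate satisfies the proximity conditions of (3.1) and (3.2) can make the definitions concrete. This is only a sketch: it checks positivity and the proximity inequalities, and it assumes the equality constraints of F^o are verified separately.

```python
import numpy as np

def in_N2(x, s, theta):
    """Proximity test of (3.1): ||x o s - mu e||_2 <= theta * mu, with x, s > 0."""
    mu = x @ s / len(x)
    return bool(np.all(x > 0) and np.all(s > 0)
                and np.linalg.norm(x * s - mu) <= theta * mu)

def in_Ninf(x, s, gamma):
    """Proximity test of (3.2): x_i s_i >= gamma * mu for all i, with x, s > 0."""
    mu = x @ s / len(x)
    return bool(np.all(x > 0) and np.all(s > 0) and np.all(x * s >= gamma * mu))
```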
3.1
A Short-Step Path-Following Algorithm
A well-known short-step path-following algorithm was proposed by Kojima, Mizuno, and Yoshise [70]. The polynomial bound established for this algorithm is still the best among all interior-point algorithms for linear programming. Before we discuss the algorithm and its convergence, we state several technical lemmas that will be used to analyze this algorithm and some algorithms to be discussed in the rest of the book.

Lemma 3.1 For any vector x ∈ R^n, the following vector norm inequalities hold:

‖x‖_∞ ≤ ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2 ≤ n ‖x‖_∞.   (3.3)

Lemma 3.2 For any vector pair u and v with u^T v ≥ 0, we have

‖u ◦ v‖ ≤ 2^{-3/2} ‖u + v‖^2.   (3.4)

Proof 3.1 Since u^T v ≥ 0, we have

0 ≤ u^T v = Σ_{u_i v_i > 0} |u_i v_i| − Σ_{u_i v_i < 0} |u_i v_i|.

Since (x(0), s(0)) = (x^k, s^k) > 0 and there is no chance for x_i(α) = 0 or s_i(α) = 0, it must have (x(1), s(1)) = (x^{k+1}, s^{k+1}) > 0. This shows that (x^{k+1}, λ^{k+1}, s^{k+1}) ∈ F^o.
Since all iterates stay in N_2(θ), using Lemma 3.4, we have

Lemma 3.8 Given ε > 0, suppose that the initial point (x^0, λ^0, s^0) ∈ N_2(0.4) and µ_0 ≤ 1/ε^κ for some positive constant κ. Then, there is a K = O(√n log(1/ε)) such that µ_k ≤ ε for all k ≥ K.
Proof 3.7
Using Lemma 3.4 with α = 1 and σ = 1 − 0.4/√n, we have

µ_{k+1} = µ(α) = ( 1 − 0.4/√n ) µ.

Using Theorem 1.4 with ω = 0.5 and δ = 0.4, the claim follows.

Remark 3.2 In the interior-point method literature, people say that the polynomial complexity of the short-step path-following algorithm is O(√n log(1/ε)).
3.2
A Long-Step Path-Following Algorithm
As pointed out earlier, the short-step path-following algorithm is not computationally very efficient because the search neighborhood is unnecessarily tight. Therefore, researchers developed long-step path-following algorithms. The first and a well-known long-step path-following algorithm is formally stated below:

Algorithm 3.2
Data: A, b, c. Parameters: γ ∈ (0, 1), ε ∈ (0, 1), and 0 < σ_min < σ_max < 1. Initial point (x^0, λ^0, s^0) ∈ N_{−∞}(γ) and µ_0 = x^{0T} s^0/n.
for iteration k = 0, 1, 2, . . .
Step 1: If µ_k < ε, stop. Otherwise continue.
Step 2: Select σ ∈ [σ_min, σ_max] and solve the linear system of equations (2.5) to get (Δx, Δλ, Δs).
Step 3: Choose the largest α_k in [0, 1] such that

(x(α), λ(α), s(α)) = (x^k, λ^k, s^k) + α(Δx, Δλ, Δs) ∈ N_{−∞}(γ).   (3.30)

Step 4: Set (x^{k+1}, λ^{k+1}, s^{k+1}) = (x(α_k), λ(α_k), s(α_k)) and µ_{k+1} = x^{k+1 T} s^{k+1}/n. Set k + 1 → k. Go back to Step 1.
end (for)
The rest of this section proves the convergence of Algorithm 3.2 and estimates its corresponding polynomial bound. First, it is easy to verify that (3.11) and (3.12) hold for Algorithm 3.2. It will be shown in Lemma 3.11 that, for all i and α ∈ (0, α_k], x_i(α)s_i(α) ≥ γµ(α) > 0; therefore, there is no chance for x_i(α) = 0 or s_i(α) = 0, i.e., x_i(α) > 0 and s_i(α) > 0. This shows

Lemma 3.9 If (x^0, λ^0, s^0) ∈ F^o, then (x^k, λ^k, s^k) ∈ F^o.

Therefore, we need only show that the proximity condition defined by N_{−∞}(γ) holds and that µ_k decreases at least at a constant rate.

Lemma 3.10 Suppose that (x, λ, s) ∈ N_{−∞}(γ). Then

‖Δx ◦ Δs‖ ≤ 2^{-3/2} ( 1 + 1/γ ) n µ_k.   (3.31)

Proof 3.8 By using Lemma 3.3, it has

‖Δx ◦ Δs‖ ≤ 2^{-3/2} ‖(XS)^{-1/2}( −x ◦ s + σµ_k e )‖^2
 = 2^{-3/2} ‖ −(XS)^{1/2} e + σµ_k (XS)^{-1/2} e ‖^2
 = 2^{-3/2} ( x^T s − 2σµ_k e^T e + σ^2 µ_k^2 Σ_{i=1}^n 1/(x_i s_i) )
 ≤ 2^{-3/2} ( x^T s − 2σµ_k n + σ^2 µ_k^2 n/(γµ_k) )   [use x_i s_i ≥ γµ_k]
 = 2^{-3/2} ( 1 − 2σ + σ^2/γ ) nµ_k
 ≤ 2^{-3/2} ( 1 + 1/γ ) nµ_k.   [use σ ∈ (0, 1)]

This completes the proof.

The following lemma provides the condition for (x^k, λ^k, s^k) to stay in N_{−∞}(γ).

Lemma 3.11 Let (x^k, λ^k, s^k) ∈ N_{−∞}(γ). For

α ∈ ( 0, 2^{3/2} γ (1 − γ)σ / ((1 + γ)n) ],   (3.32)

it has (x(α), λ(α), s(α)) ∈ N_{−∞}(γ).
For i = 1, . . . , n, using Lemma 3.10, we have
Proof 3.9
|Δxi Δsi | ≤ IΔx ◦ ΔsI2 ≤ 2−3/2 1 +
1 nµk . γ
(3.33)
Using (3.13), we have xi (α)si (α) = ≥ use (3.33)
≥
xi si + α(si Δxi + xi Δsi ) + α 2 Δxi Δsi xi si (1 − α) + ασ µk − |α 2 Δxi Δsi | (1 − α)γ µk + ασ µk − α 2 2−3/2 1 +
1 nµk . γ
(3.34)
Therefore, in view of (3.16), the proximity condition xi (α)si (α) ≥ γ µ(α) = γ(1 − α(1 − σ ))µk > 0
(3.35)
will be satisfied if the following condition holds: (1 − α)γ µk + ασ µk − α 2 2−3/2 1 +
1 nµk ≥ γ(1 − α(1 − σ ))µk γ
Simple manipulation gives (1 − γ)ασ µk ≥ α 2 2−3/2 1 +
1 nµk , γ
which yields α ≤ 23/2 γ
(1 − γ)σ , (1 + γ)n
the upper bound of (3.32). Therefore, the proximity condition (3.35) holds if (3.32) is satisfied. From (3.35), there is no chance for xi (α) = 0 or si (α) = 0 if α meets the condition of (3.32). In view of Lemma 3.9, we have (xk (α), λ k (α), sk (α)) ∈ N−∞ (γ). This completes the proof. Now, we are ready to prove the convergence and provide the polynomial bound for Algorithm 3.2. Theorem 3.1 Given the parameters γ, σmin , and σmax in Algorithm 3.2, there exists a constant δ independent of n such that, for all k ≥ 0, µk+1 ≤ 1 − Proof 3.10
δ µk . n
From (3.16) and (3.32), it must have µk+1
=
(1 − α(1 − σ ))µk
42
� Arc-Search Techniques for Interior-Point Methods
≤
�
1 − 23/2 γ
� (1 − γ)σ (1 − σ ) µk . (1 + γ)n
(3.36)
Noticing that σ (1 − σ ) is concave, its minimum is attained at one of its end points, i.e., σ (1 − σ ) ≥ min{σmin (1 − σmin ), σmax (1 − σmax )} for all σ ∈ [σmin , σmax ]. By setting (1 − γ) δ = 23/2 γ min{σmin (1 − σmin ), σmax (1 − σmax )}, (1 + γ) the proof is completed. Invoking Lemma 1.4, we have Theorem 3.2 Given the parameters ε > 0, γ ∈ (0, 1), σmin , and σmax in Algorithm 3.2, suppose that the initial point (x0 , λ 0 , s0 ) ∈ N−∞ (γ) with µ0 ≤ 1/ε κ for some positive constant κ = 1. Then, there is an integer K = O(n log(1/ε) such that for all k ≥ K µk ≤ ε. Remark 3.3 Algorithm 3.2 has the√polynomial bound of O(n log(1/ε)) which is worse than the polynomial bound O( n log(1/ε)) of Algorithm 3.1. However, it is well-known that Algorithm 3.2 is much more efficient in numerical computation. This is the first dilemma between theoretical analysis and numerical experience. Im proved algorithms will be provided in the later Chapters to fix this dilemma.
3.3
MTY Predictor-Corrector Algorithm
The dilemma that the short step path-following interior-point algorithm has better theoretical polynomial bound but the long step path-following interior-point algo rithm has demonstrated better performance in numerical experience was known in the very beginning of the development of interior-point method. Researchers have been working on closing the gap. This section discusses the MTY predictor-correct algorithm, which is proposed for the purpose of improving the computational perfor mance for the short step interior-point algorithm. In √the short step path-following algorithm, the parameters θ = 0.4 and σ = 1 − 0.4/ n are fixed, which balances the requirements of staying close to the centrality and reducing the duality measure. Noticing that reducing σ will reduce the weight of centrality and increasing θ will enlarge the neighborhood and possibly the step size to search for a smaller duality measure, the MTY predictor-corrector path-following algorithm separates the two tasks in two steps. In predictor step (when the iterate stay in a narrow neighborhood defined by N2 (0.25)), its goal is to reduce the duality measure as much as it can, therefore, it reduces σ = 0 and increases θ = 0.5 (the
Feasible Path-Following Algorithms for LP
�
43
searching is carried out in a wider neighborhood defined by N2 (0.5)). In the corrector step, it increases σ = 1 and reduces θ = 0.25 in order to bring the iterate back to N2 (0.25), but does not increase the duality measure. The algorithm is formally stated as follows: Algorithm 3.3 Data: A, b, c. Parameters: ε > 0, Initial point (x0 , λ 0 , s0 ) ∈ N2 (0.25), and µ0 = for iteration k = 0, 1, 2, . . .
T
x0 s0 n .
If µk < ε, stop. Otherwise continue. If k is even (predictor): Set σ = 0 and solve the linear systems of equations (2.5) to get (Δx, Δλ , Δs). Choose the largest αk ∈ [0, 1] such that (x(α), λ (α), s(α)) = (xk , λ k , sk ) + α(Δx, Δλ , Δs) ∈ N2 (0.5). (3.37) Set (xk+1 , λ k+1 , sk+1 ) = (x(αk ), λ (αk ), s(αk )), T
and µk+1 =
xk+1 sk+1 . n
Else (corrector)
Set σ = 1 and solve the linear systems of equations (2.5) to get (Δx, Δλ , Δs). Set (xk+1 , λ k+1 , sk+1 ) = (xk , λ k , sk ) + (Δx, Δλ , Δs), T
and µk+1 =
xk+1 sk+1 . n
Set k + 1 → k.
end (for) The convergence analysis can be simplified by using the results established ear lier for short step and long step algorithms. Lemma 3.12 Suppose that (x, λ , s) ∈ N2 (0.25) and (Δx, Δλ , Δs) be calculated by solving the lin ear systems of equations (2.5) with σ = 0. Then, (x(α), λ (α), s(α)) ∈ N2 (0.5) for all α ∈ [0, α¯ ], where �1 � � 12 � µ α¯ = min , . (3.38) 2 8�Δx ◦ Δs�
44
• Arc-Search Techniques for Interior-Point Methods
Proof 3.11
By using (3.23), α ≤ 0.5, and σ = 0, we have
ix(α) ◦ s(α) − µ(α)ei ≤ (1 − α)ix ◦ s − µei + α 2 iΔx ◦ Δsi [use (3.38) ] ≤ (1 − α)ix ◦ s − µei + µIΔx◦ΔsI 8IΔx◦ΔsI [(x, λ , s) ∈ N2 (0.25)] ≤ (1 − α) µ4 + µ8 ≤ (1 − α) µ4 + (1 − α) µ4 ≤ 0.5µ(α). This proves that (x(α), λ (α), s(α)) meets the proximity condition of N2 (0.5). Therefore, −0.5µ(α) ≤ xi (α)si (α) − µ(α) ≤ 0.5µ(α), hence, using Lemma 3.4 with σ = 0 gives
xi (α)si (α) ≥ 0.5µ(α) = 0.5(1 − α)µ > 0. There is no chance for xi (α) = 0 or xi (α) = 0 to occur. Positivity of (x(α), s(α)) > 0 holds. Using the same argument used to prove Lemma 3.9, it is easy to verify that Ax(α) = b and AT λ (α) + s(α) = c hold. This proves (x(α), λ (α), s(α)) ∈ N2 (0.5) for all α ∈ [0, α¯ ]. The next lemma estimates the reduction rate of µ(α) in predictor step. Lemma 3.13 The reduction rate of the duality measure for predictor step is given by 0.4 µk+1 ≤ 1 + √ µk . n Proof 3.12
(3.39)
By using Lemma 3.5, θ = 0.25, σ = 0, and n ≥ 1, we have µ 8iΔx ◦ Δsi
µ 23/2 (1 − θ )
2 8 (θ + n(1 − σ )2 )µ
≥
23/2 (1 − 0.25) 8(0.252 + n) √ 3 2 1 + 16n 0.16 . n
≥ = ≥
(3.40)
This shows α¯ = min
1 µ , 2 8iΔx ◦ Δsi
1 2
≥ min
1 0.16 , 2 n
Since σ = 0 in predictor step, from (3.16), we have 0.4 µk+1 ≤ 1 − √ . n
1 2
0.4 =√ . n
Feasible Path-Following Algorithms for LP
�
45
This completes the proof. Then, we will show that the corrector step will bring the iterate from N2 (0.5) back to N2 (0.25) without changing the duality measure. Lemma 3.14 Suppose that (x, λ , s) ∈ N2 (0.5) and let (Δx, Δλ , Δs) be calculated by solving the linear systems of equations (2.5) with σ = 1. Then, we have (x(1), λ (1), s(1)) ∈ N2 (0.25),
µ(1) = µ.
(3.41)
Proof 3.13 Substituting σ = 1 into Lemma 3.4 yields µ(1) = µ. Substituting σ = 1, θ = 0.5, and α = 1 into (3.24) gives �X(1)S(1)e − µ(1)e� ≤ 0.25µ = 0.25µ(1).
(3.42)
The proof is completed by verifying that (x(1), λ (1), s(1)) ∈ F o , which is very sim ilar to the argument that was used to prove Lemma 3.9. Invoking Lemma 1.4 and (3.39) and (3.41), we have Theorem 3.3 Give ε ∈ (0, 1), suppose that the initial point (x0 , λ 0 , s0 ) ∈ N2 (0.25) and µ0 ≤ 1/ε κ for √ some positive constant κ = 1 in Algorithm 3.3. Then, there is an integer K = O( n log(1/ε)) such that µk ≤ ε
for all
k ≥ K.
Remark 3.4 MTY algorithm is proposed to improve the efficiency of the short step interior-point algorithm by enlarging the search neighborhood without sacrific ing the convergence property. This idea has been used by many people to design computationally efficient algorithms. We will discuss some of the algorithms later in this book.
3.4
A Modified Short Step Path-Following Algorithm
Although interior-point method was viewed to be a mature discipline in mathemat ical programming after several research monographs were published in the 1990s [144, 164, 117], and some of the most cited books in mathematical programming [81, 106] included the method, there are still some fundamental problems that need to be answered [133]. For example, we have seen that the short step interior-point algorithm has better polynomial bound than the long step interior-point algorithm,
46
• Arc-Search Techniques for Interior-Point Methods
but their performances are in the reversed order. One of the main reasons leading to this dilemma is that, for most interior-point algorithms, the selection of the center ing parameter is based on heuristics. To develop algorithms with the best polynomial bound, some researches use the heuristics with the sole purpose in mind of devising algorithms wherein it is easy to show the low polynomial bound without considering the efficiency in practice. To develop efficient algorithms in practice, other researches focus on the heuristics which, by intuition, will generate good iterates but ignore the problem of proving a polynomial bound. A widely used shortcut in developing interior-point algorithms is to separate the selection of the centering parameter σ from the selection of the line-search step size [120, 68, 97, 89, 82, 172, 152]. This strategy makes the problem simple to deal with but has to use heuristics in the selection of the centering parameter. Therefore, this is not an optimal strategy. In this section, we propose a systematic way to optimally select the centering pa rameters and line-search step size at the same time, aiming at minimizing the duality gap in all iterations. We show that this algorithm will have the best-known poly nomial bound, even though the estimation is extremely conservative. We use some Netlib test problems to demonstrate that the proposed algorithm may be very efficient compared to the very efficient algorithm MPC. The main purpose of this section is to show that optimally selecting σk and line-search step size at the same time will be feasible and will improve the computational efficiency significantly. This strategy will be used in Chapters 7 and 8 for arc-search algorithms. The materials presented in this section are based on [154].
3.4.1 A modified interior point algorithm for LP Starting from any initial point (x0 , λ 0 , s0 ) in a central-path neighborhood N2 (θ ) that satisfies (x0 , s0 ) > 0 and Ix0 s0 − µ0 eI ≤ θ µ, instead of searching along the centralpath, which is difficult to find in practice, we consider searching along a line inside F o (θ ), defined as follows: (x(α, σ ), λ (α, σ ), s(α, σ )) := (xk − αx˙ (σ ), λ k − αλ˙ (σ ), sk − αs˙(σ )),
where α ∈ [0, 1], σ ∈ [0, 1], and (x˙ (σ ), λ˙ (σ ), s˙(σ )) is defined by ⎡ ⎤⎡ ⎤ ⎡ ⎤ x˙ (σ ) A 0 0 0 ⎦. ⎣ 0 AT I ⎦ ⎣ λ˙ (σ ) ⎦ = ⎣ 0 xk ◦ sk − σ µk e Sk 0 Xk s˙(σ )
(3.43)
(3.44)
We use (x(α, σ ), λ (α, σ ), s(α, σ )) to emphasize that the updated variables are func tions of both α and σ , which will be selected at the same time. Since the search stays in F o (θ ), as xk ◦ sk → 0, (1.15) implies that µk → 0; hence, the iterates will approach to an optimal solution of (1.4) because the perturbed KKT system (2.4) reduces to KKT condition. We will use several results that can easily be derived from (3.44). To simplify the notations, we will drop the superscript and subscript k unless a confusion may be introduced.
Feasible Path-Following Algorithms for LP
•
47
Lemma 3.15 Let (x˙ (σ ), λ˙ (σ ), s˙(σ )) be defined in (3.44). Then, the following relations hold. ˙ ) = xT s − σ µn, ˙ ) + xT s(σ sT x(σ µ(α, σ ) =
x(α, σ )T s(α, σ ) = µ(1 − α(1 − σ )). n
(3.45) (3.46)
Proof 3.14 The first relation is straightforward, using the third row of (3.44), there fore, the proof is omitted. For the second relation, Pre-multiplying x˙ (σ )T to both sides of the second row of (3.44) shows that x˙ (σ )T s˙(σ ) = 0. Using this relation, together with (3.43) and (3.45), we have µ(α, σ ) = = = =
x(α, σ )T s(α, σ ) n k (x − α x˙ (σ ))T (sk − α s˙(σ )) n T k k x s s˙(σ )T xk + x˙ (σ )T sk −α n n µ(1 − α(1 − σ )).
This completes the proof. Lemma 3.16 Let (x˙ (σ ), λ˙ (σ ), s˙(σ )) be defined in (3.44). Assume that (x, λ , s) ∈ F o (θ ). Then, the following relations hold. Ax(α, σ ) = b, AT λ (α, σ ) + s(α, σ ) = c. Proof 3.15
(3.47)
The proof is straightforward and is, therefore, omitted.
ˆ be an orthonormal basis for the null space of A. The following lemma will Let A be needed later in this section. Lemma 3.17 Let (x˙ (σ ), λ˙ (σ ), s˙(σ )) be defined in (3.44). Then, the following relations hold. ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T (Se − σ X−1 µe) := px − σ qx , x˙ (σ ) = A T
−1
T −1
−1
s˙(σ ) = A (AXS A ) A(Xe − σ S µe) := ps − σ qs , λ˙ (σ ) = −(AXS−1 AT )−1 A(Xe − σ S−1 µe),
where ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T Se, qx = µA ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T X−1 e, px = A
(3.48a) (3.48b) (3.48c)
48
• Arc-Search Techniques for Interior-Point Methods
ps = AT (AXS−1 AT )−1 AXe, qs = µAT (AXS−1 AT )−1 AS−1 e. Proof 3.16 In view of the first row of (3.44), it is clear that x˙ (σ ) is a vector in the null space of A. From the first two rows of (3.44), we have, for some vector v, ˆ ˙ ) = 0. S−1 AT λ˙ (σ ) + S−1 s(σ X−1 x˙ (σ ) = X−1 Av,
(3.49)
From the third row of (3.44), we have, X−1 x˙ (σ ) + S−1 s˙(σ ) = e − σ µX−1 S−1 e. Substituting (3.49) into the above equation and writing the result as a matrix form yields � � � � −1 v −1 T ˆ = e − σ µX−1 S−1 e. X A, −S A λ˙ (σ ) ˆ is the orthonormal basis for the null space of A, we have Since A is full rank and A � � ˆ T S � −1 ˆ T SX−1 A) ˆ −1 A � (A ˆ , −S−1 AT = I. X A −1 T −1 −(AXS A ) AX This gives �
v λ˙ (σ )
�
� =
ˆ )−1 A ˆ TS ˆ T SX−1 A (A −1 T −1 −(AXS A ) AX
�
(e − σ µX−1 S−1 e).
Substituting this equation into (3.49) proves the result. Since x˙ (σ ) ◦ s˙(σ )
= (px − σ qx ) ◦ (ps − σ qs ) = px ◦ ps − σ (qx ◦ ps + px ◦ qs ) + σ 2 qx ◦ qs := p − σ q + σ 2 r,
where p = px ◦ ps , q = qx ◦ ps + px ◦ qs , r = qx ◦ qs ,
(3.50)
to make sure that (x(α, σ ), λ (α, σ ), s(α, σ )) stays in N2 (θ ), we need to find some α¯ such that, for ∀α ∈ (0, α¯ ], the following inequality holds: Ix(α, σ ) ◦ s(α, σ ) − µ(α, σ )eI ≤ θ µ(α, σ ) = θ µ(1 − α(1 − σ )). Since Ix(α, σ ) ◦ s(α, σ ) − µ(α, σ )eI = I(x − αx˙ ) ◦ (s − αs˙) − µ(1 − α(1 − σ ))eI = Ix ◦ s − α(x ◦ s˙ + x˙ ◦ s) + α 2 x˙ ◦ s˙ − µ(1 − α(1 − σ ))eI
(3.51)
Feasible Path-Following Algorithms for LP
�
49
= �x ◦ s − α(x ◦ s − σ µe) + α 2 x˙ ◦ s˙ − µ(1 − α(1 − σ ))e� = �(1 − α)(x ◦ s − µe) + α 2 (p − σ q + σ 2 r)� ≤ (1 − α)�x ◦ s − µe� + α 2 �p − σ q + σ 2 r�, and �x ◦ s − µe� ≤ θ µ, we conclude that equation (3.51) holds if h(σ ) := �p − σ q + σ 2 r�2 ≤
θ 2σ 2 µ 2 . α2
(3.52)
Equation (3.52) is a quartic polynomial (in terms of σ ) inequality constraint which can be written as � � θ 2 µ 2σ 2 θ 2µ2 4 3 f (σ , α) := h(σ ) − = a σ − a σ + a − σ 2 − a1 σ + a0 ≤ 0, 4 3 2 α2 α2 (3.53) with h(σ ) = a4 σ 4 − a3 σ 3 + a2 σ 2 − a1 σ + a0 ≥ 0, and a0 = pT p ≥ 0, a1 = qT p + pT q, a2 = pT r + rT p + qT q, a3 = qT r + rT q, a4 = rT r ≥ 0.
(3.54)
Here, ai , i = 0, 1, 2, 3, 4 are all known constants since they are functions of x and s which are known at the beginning of every iteration. It is important to note that f (σ , α) is a monotonically increasing function of α. Therefore, for any fixed σ ∈ [0, 1], if for some α¯ , f (σ , α¯ ) ≤ 0 holds, then f (σ , α) ≤ 0 holds for ∀α ∈ (0, α¯ ]. Using the relation that �x(α, σ ) ◦ s(α, σ ) − µ(α, σ )e� ≤ θ µ(α, σ ), we have xi (α, σ )si (α, σ ) ≥ (1 − θ )µ(1 − α(1 − σ )) > 0 for all ∀α ∈ (0, α¯ ]. This means that (x(α, σ ), s(α, σ )) > 0 for all ∀α ∈ (0, α¯ ]. Therefore, in the remaining discussions, we simply use α instead of α¯ . Assuming that the initial point (x0 , λ 0 , s0 ) ∈ N2 (θ ), we want to minimize the duality measure µ(α, σ ) in each iteration under the constraint (x(α, σ ), λ (α, σ ), s(α, σ )) ∈ N2 (θ ). Because of Lemma 3.16, (3.51), (3.52), and (3.53), the selection of α and σ in each iteration is reduced to the following optimization problem. min α,σ
s.t.
µ(1 − α(1 − σ )) 0 ≤ α ≤ 1, 0 ≤ σ ≤ 1, f (σ , α) ≤ 0.
(3.55)
Since 0 ≤ α ≤ 1 and 0 ≤ σ ≤ 1, we have 0 ≤ α(1 − σ ) ≤ 1. i.e., 0 ≤ (1 − α(1 − σ )) ≤ 1. This means that 0 ≤ µ(α, σ ) = µ(1 − α(1 − σ )) ≤ µ. Clearly, if a0 = 0, then, the optimization problem has a solution of σ = 0 and α = 1 with the objective function µ(α, σ ) = 0. One iteration will find the solution of (1.4). Therefore, in the
50
• Arc-Search Techniques for Interior-Point Methods
rest discussions, we do not consider this simple case. Instead, we assume that a0 > 0 holds in all the iterations. Let the Lagrange function be defined as follows. L = µ(1 − α(1 − σ )) − ν1 α − ν2 (1 − α) − ν3 σ − ν4 (1 − σ ) + ν5 f (σ , α), where νi , i = 1, 2, 3, 4, 5, are Lagrange multipliers. The KKT conditions for Problem (3.55) are as follows. ∂L σ 2θ 2 µ 2 = −(1 − σ )µ − ν1 + ν2 + 2ν5 = 0, ∂α α3 (3.56a) ∂L θ 2µ2 = α µ − ν3 + ν4 + ν5 4a4 σ 3 − 3a3 σ 2 + 2 a2 − 2 ∂σ α
σ − a1
= 0, (3.56b)
ν1 ≥ 0, ν2 ≥ 0, ν3 ≥ 0, ν4 ≥ 0, ν5 ≥ 0, (3.56c) ν1 α = 0, ν2 (1 − α) = 0, ν3 σ = 0, ν4 (1 − σ ) = 0, ν5 f (σ , α) = 0, (3.56d) 0 ≤ α ≤ 1, 0 ≤ σ ≤ 1, f (σ , α) ≤ 0. (3.56e) Relations in (3.56) can be simplified because of the following claims. Claim 1 : α = 0. Otherwise, µ(α, σ ) = µ will be the maximum. Claim 2 : ν1 = 0 because of (3.56d) and Claim 1. Claim 3 : σ = 1. Otherwise, µ(α, σ ) = µ will be the maximum. Claim 4 : ν4 = 0 because of (3.56d) and Claim 3. Claim 5 : σ = 0. Otherwise (3.53) does not hold since a0 = pT p > 0 is as sumed. Claim 6 : ν3 = 0 because of (3.56d) and Claim 5. Therefore, we can rewrite the KKT conditions as follows. σ 2θ 2 µ 2 = 0, α3
(3.57a)
σ − a1
= 0,
(3.57b)
ν2 ≥ 0, ν5 ≥ 0, ν2 (1 − α) = 0, ν5 f (σ , α) = 0, 0 < α ≤ 1, 0 < σ < 1, f (σ , α) ≤ 0.
(3.57c) (3.57d) (3.57e)
(σ − 1)µ + ν2 + 2ν5 α µ + ν5 4a4 σ 3 − 3a3 σ 2 + 2 a2 −
θ 2µ2 α2
Feasible Path-Following Algorithms for LP
�
51
Notice that f (σ , 1) < 0 cannot hold for all σ ∈ (0, 1), otherwise let σ → 0, then f (σ , 1) → pT p > 0. Therefore, we divide our discussion into two cases. Case 1: f (σ , 1) = 0 has solution(s) in σ ∈ (0, 1). First, in view of the fact that f (0, 1) = pT p > 0, it is straightforward to check that the smallest solution of f (σ , 1) = 0 in σ ∈ (0, 1) and α = 1 is a feasible solution and a candidate of the op timal solution that minimizes µ(α, σ ) = µ(1 − α(1 − σ )) under all the constraints. Then, let us consider other feasible solutions which meet KKT condition but α < 1. � 1, we conclude that ν2 = 0 from (3.57d). From (3.57a), we have Since α = ν5 =
(1 − σ )α 3 �= 0. 2µσ 2 θ 2
The last relation follows from the facts that α �= 0 and σ �= 1. Substituting ν5 into (3.57b) yields � � � � (1 − σ )α 2 θ 2µ2 3 2 µ+ 4a σ − 3a σ + 2 a − σ − a ) = 0. (3.58) 4 3 2 1 α2 2µσ 2 θ 2 Since ν5 �= 0, from (3.57d), we have θ 2µ2 f (σ , α) = a4 σ − a3 σ + a2 − 2 α 4
3
�
�
σ 2 − a1 σ + a0 = 0,
which gives, 0
0 for all σ ∈ (0, 1). For any fixed σ , since f (σ , α) is a mono tonic increasing function of α and f (σ , 0) = −∞, there exists an α ∈ (0, 1) such that f (σ , α) = 0. It is easy to see that α = � 1 (otherwise the constraint f (σ , α) ≤ 0 will not hold). Therefore, all arguments for α = � 1 in Case 1 apply here. Furthermore, in this case, we have a stronger condition than (3.59), i.e., θ 2µ2 2 σ = a4 σ 4 − a3 σ 3 + a2 σ 2 − a1 σ + a0 := h(σ ) > f (σ , 1) > 0, ∀σ ∈ (0, 1). α2 (3.62)
52
� Arc-Search Techniques for Interior-Point Methods
In view of the facts that g(0) = −2a0 < 0 and g(1) = 2(a4 − a3 + a2 − a1 + a0 ) = 2h(1) > 0, g(σ ) = 0 has solution(s) in σ ∈ (0, 1). For any candidate pair (σ , α) of the optimal solution obtained in Cases 1 and 2, we use (3.46) to calculate µ(α, σ ) for all candidate pairs. The smallest µ(α, σ ) among all candidate pairs (σ , α) is the solution of (3.55). Now we are ready to present the algorithm. Algorithm 3.4 ˆ . Parameters: θ ∈ (0, 1). Iinitial point: (x0 , y0 , s0 ) ∈ F 0 , and µ0 = Data: A, b, c, A T
x0 s 0 n .
for iteration k = 0, 1, 2, . . . Step 1: Calculate px , qx , ps , qs , x˙ (σ ), y˙ (σ ), and s˙(σ ) using (3.48); p, q, and r using (3.50); a0 , a1 , a2 , a3 , and a4 using (3.54). Step 2: Select α and σ as follows. 1. If a0 = 0
set σ = 0 and α = 1.
2. else a0 > 0 (a) Solve f (σ , 1) = 0. If f (σ , 1) has solution(s) in σ ∈ (0, 1), the smallest solution σ ∈ (0, 1) and α = 1 is a candidate of optimal solution. (b) Solve g(σ ) = 0. If g(σ ) has solutions in σ ∈ (0, 1), calculate h(σ ) and α using (3.59) and (3.61); for each pair of (σ , α), if the pair meets 0 < σ < 1 and 0 < α < 1, the pair is a candidate of solution. (c) Calculate µ(α, σ ) using (3.46) for all candidate pairs; select σ and α that generate the smallest µ(α, σ ). Step 3: Set (x(k + 1), y(k + 1), s(k + 1)) = (x − αx˙ (σ ), y − αy˙ (σ ), s − αs˙(σ )). end (for) Based on the analysis in this section, it is easy to see the following: Theorem 3.4 Algorithm 3.4 finds the optimal solution of problem (3.55). Remark 3.5 The most expensive computation is in Step 1, which involves matrix inverse and products of matrices and vectors. It is worthwhile to note that the update of λ is not necessary but it is included. The computations in Step 2 involve the
Feasible Path-Following Algorithms for LP
�
53
quartic polynomial solutions of f(σ, 1) and g(σ), which are negligible [125]. The computational details for the quartic solution are described in [162].

Remark 3.6 Comparing the short-step path-following Algorithm 3.1 and the modified short-step path-following Algorithm 3.4, the former uses the fixed parameters θ = 0.4, σ = 1 − 0.4/√n, and α = 1, while the latter uses optimized parameters that minimize the duality measure in every iteration. Notice that the polynomial bound for Algorithm 3.1 is O(√n log(1/ε)); therefore, the polynomial bound of Algorithm 3.4 is at least the same as or better than O(√n log(1/ε)). We summarize the discussion in this section in the following theorem.

Theorem 3.5 Algorithm 3.4 is convergent with a polynomial bound at least the same as or better than O(√n log(1/ε)).
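Because Step 2 reduces to finding real roots of quartic polynomials, it is easy to prototype. The sketch below is one possible NumPy reading of part of Step 2: it solves f(σ, 1) = 0 with numpy.roots (case 2(a)) and, for any other candidate σ, recovers α from the relation θ²µ²σ²/α² = h(σ) in (3.62). The g(σ) = 0 branch is not reproduced here because its coefficients come from (3.58) and (3.60); the function names and the cap α ≤ 1 are assumptions for illustration, and this is not the author's MATLAB code optimalAlphaSigma.

```python
import numpy as np

def h(sigma, a):
    """h(sigma) = a4 s^4 - a3 s^3 + a2 s^2 - a1 s + a0, coefficients from (3.54)."""
    a0, a1, a2, a3, a4 = a
    return (((a4 * sigma - a3) * sigma + a2) * sigma - a1) * sigma + a0

def candidates_case_2a(a, mu, theta):
    """Case 2(a): smallest root of f(sigma, 1) = 0 in (0, 1), paired with alpha = 1."""
    a0, a1, a2, a3, a4 = a
    roots = np.roots([a4, -a3, a2 - (theta * mu) ** 2, -a1, a0])
    real = sorted(r.real for r in roots if abs(r.imag) < 1e-10 and 0 < r.real < 1)
    return [(real[0], 1.0)] if real else []

def alpha_from_sigma(sigma, a, mu, theta):
    """Recover alpha from f(sigma, alpha) = 0: alpha = theta*mu*sigma / sqrt(h(sigma))."""
    return min(1.0, theta * mu * sigma / np.sqrt(h(sigma, a)))  # capped at 1
```

Among all candidate pairs (σ, α), one then picks the pair with the smallest µ(α, σ) = µ(1 − α(1 − σ)), as prescribed in Step 2(c).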
3.4.2 Implementation and numerical test R Algorithm 3.4 is implemented in MATLAB� and test is conducted for Netlib test problems. We provide the implementation details and discuss the test result in this section.
3.4.2.1
Implementation
Algorithm 3.4 is presented in a simple form which is convenient for analysis. Some implementation details are provided here. First, to have a large step size, we need to have a large central-path neighborhood, therefore, parameter θ = 0.99 is used. Second, the program needs a stopping criterion to avoid an infinity loop, the code stops if µ < 10−8 max{1, �cT x�, �bT λ �} holds, which is similar to the stopping criterion of linprog [172]. Our experience shows that, when iterations approach an optimal point, some xi and/or s j approach to zero, which introduces large numerical error in the matrix inverses of (3.48). Therefore, the following alternative formulas are used to replace (3.48). Using the QR decomposition, we can write ˆ = Q1 R1 , X−0.5 S0.5 A where Q1 is an orthonormal matrix in Rn×(n−m) , and R1 is an invertible triangle matrix in R(n−m)×(n−m) . Then, we have � T −1 �−1 T ˆ ˆ ˆ A ˆ SX A A A
(3.63)
54
• Arc-Search Techniques for Interior-Point Methods
ˆ A ˆ T SX−1 A ˆ = X0.5 S−0.5 X−0.5 S0.5 A
−1
ˆ T X−0.5 S0.5 X0.5 S−0.5 A
= X0.5 S−0.5 Q1 QT1 X0.5 S−0.5 . Therefore, px = X0.5 S−0.5 Q1 QT1 X0.5 S0.5 e, qx = µX0.5 S−0.5 Q1 QT1 X−0.5 S−0.5 e,
(3.64)
Similarly, we can write X0.5 S−0.5 AT = Q2 R2 , where Q2 is an orthonormal matrix in Rn×m , and R2 is an invertible triangle matrix in Rm×m , AT AS−1 XAT
−1
A = X−0.5 S0.5 Q2 QT2 X−0.5 S0.5 ,
(3.65)
and ps = X−0.5 S0.5 Q2 Q2T X0.5 S0.5 e, qs = µX−0.5 S0.5 Q2 Q2T X−0.5 S−0.5 e,
(3.66)
Remark 3.7 It is observed that formulas (3.64) and (3.66) produce a much more accurate result than (3.48) when iterations approach to the optimal solution. For sparse matrix A, we can use sparse QR decomposition [26].
3.4.2.2
Test on Netlib problems
Numerical tests have been performed for linear programming problems in Netlib library. For Netlib problems, [20] has classified these problems into two categories: Problems with strict interior-point and problems without strict interior-point. Though the MATLAB code and other existing codes can solve problems without strict interior-point, we are most interested in the problems with strict interior-point that is assumed by all feasible interior-point methods. Among these problems, we only choose problems which are presented in standard form and whose A matrices are full rank. The selected problems are solved using our MATLAB function optimalAl phaSigma and function linprog in MATLAB optimization toolbox. For several reasons, it is impossible to be completely fair in the comparison of the test results obtained by optimalAlphaSigma and linprog. First, there is no detail about the initial point selection in linprog. Second, linprog does not allow to start from user selected initial point other than the one provided by linprog. Third, there is no information on what pre-process is actually used before linprog starts to run MPC, we only know from [82] that pre-process “generally increases computational efficiency, often substantial”. We compare the two codes simply by using the iteration numbers for the tested problem which are listed in Table 3.1. Only two Netlib problems that are classified as problems with strict interior-point and are presented in standard form are not in cluded in the table because the PC computer used in the testing does not have enough memory to handle problems of this size.
Feasible Path-Following Algorithms for LP
�
55
Table 3.1: Iteration counts for test problems in Netlib and MATLAB.

Problem     optimalAlphaSigma   linprog   source
AFIRO              4                7     Netlib
blend             13               12     Netlib
SCAGR25            5               16     Netlib
SCAGR7             7               12     Netlib
SCSD1             18               10     Netlib
SCSD6             26               12     Netlib
SCSD8             19               11     Netlib
SCTAP1            17               17     Netlib
SCTAP2            17               18     Netlib
SCTAP3            18               18     Netlib
SHARE1B           11               22     Netlib
For all problems, optimalAlphaSigma starts with x = s = e. A pre-process de scribed in [20] is used to find an initial point before Algorithm 3.4 runs. The initial point used in linprog is said to be similar to the one used in [82], with some minor modifications (see [172]). This result is very impressive because optimalAlphaSigma does not have a “cor rector step” which is used by MPC and many other algorithms. Although corrector step is not as expensive as “predictor step”, it still requires some substantial numeri cal operations.
3.5
A Higher-Order Feasible Interior-Point Algorithm
Monteiro, Adler, and Resende [97] proposed a higher-order path-following algorithm using the ∞-norm neighborhood max |xi si − µ| ≤ θ µ.
i=1,...,n
(3.67)
The main idea is to use higher-order derivatives to approximate the central path and search for a new iterate along this approximated path. They proved that, for r-order path-following algorithm, this algorithm has the polynomial bound of � 1+1/r � O n 2 [max(log n, log(1/ε), log µ0 )](1+1/r) . When r → ∞, this bound approaches the best bound of the short step path-following algorithm. However, for second-order (r = 2) path-following algorithm, this bound � 3 (3/2) is approximately O(n 4 [max(log n, log(1/ε), log µ0 )] , which is worse than the bound of the short step path-following algorithm but comparable to the long step path-following algorithm.
56
� Arc-Search Techniques for Interior-Point Methods
Let
fi (x, s) = xi si , fmin = min fi , fmax = max fi ,
i=1,...,n
i=1,...,n
and fmax (x0 , s0 ) + fmin (x0 , s0 ) fmax (x0 , s0 ) − fmin (x0 , s0 ) , θ0 = . 2 fmax (x0 , s0 ) + fmin (x0 , s0 ) � Denote γ = 1/(1 − θ0 ) and q(r) = 2r k=r+1 p(k) with the sequence p(k) defined re cursively as follows: µ0 =
p(1) = 1, p(k) =
k−1 � j=1
p( j)p(k − j), k ≥ 2.
Further, let vr (x, λ , s, α) be the rth-order Taylor polynomial that approximates the central path, r ≥ 1, at α = 0. The algorithm is presented below. Algorithm 3.5 Data: A, b, c. Parameters: ε ∈ (0, 1). Initial point: (x0 , λ 0 , s0 ) meets (3.67), and 0T 0
µ0 = x n s . for iteration k = 0, 1, 2, . . .
Step 1: Calculate r-order derivatives as described in [97]. Step 2: Select step size α as follows: �� r+3 r+1 ��−1 r+1 α = 2 2r γ 2r q(r)1/r n 2r �log(2nε −1 µ0 )�1/r Step 3: Set (xk+1 , λ k+1 , sk+1 ) = (xk , λ k , sk ) + αvr (x, λ , s, α). Step 4: Set k:=k+1. end (for) For a more detailed convergence analysis, the readers are referred to [97].
3.6
Concluding Remarks
In this chapter, we have presented five feasible interior-point algorithms. We also dis cussed three methods that are aimed at improving the efficiency of the path-following algorithms: (a) enlarge the neighborhood, (b) optimize both α and σ at the same time, (c) use high-order derivative in the algorithms. All these methods are proved to improve the computational efficiency. However, except for method (b), these meth ods appear to increase the polynomial bounds of the algorithms, which makes these methods less attractive in theory. We will resolve this dilemma in Chapters 5–8, when we introduce arc-search methods where method (b) plays an important role.
Chapter 4
Infeasible Interior-Point Method Algorithms for LP
All algorithms introduced so far are feasible interior-point algorithms that require the initial point to meet the equality constraints in the KKT conditions (1.8) and strictly feasible (i.e., inside F o ). This requirement, however, turns out to be computationally difficult for two reasons. First, it may take substantial effort to find a feasible interiorpoint to start with; second, many practical problems (or benchmark problems like Netlib problems) do not even have an interior-point [20]. A brilliant idea is to start with an infeasible interior-point that meets only strict positive condition and then reduce residual of the equality constraints constantly during the iteration process. This idea was one of the reasons that Mehrotra’s method had a huge success. At first, people thought that it may not be possible to have a rigorous analysis leading to a global and polynomial convergence result for infeasible interior-point algorithms. Surprisingly, Kojima, Megiddo, and Mizuno [68] and Zhang [171] proved polyno mial convergence for some long step infeasible interior-point algorithms. Miao [90] gave a proof for short step infeasible interior-point algorithms. This chapter provides the derivations of the polynomial bounds for infeasible interior-point algorithms ob tained in the 1990s. These polynomial bounds, however, are worse than the ones established in Chapter 3 for the feasible interior-point algorithms, which contradicts our expectation because better polynomial bounds should correspond to more effi cient algorithms. We will show in Chapters 6 and 8 that, by using arc-search method, the polynomial bounds for infeasible interior-point algorithms can be as good as the ones for feasible interior-point algorithms.
58
4.1
� Arc-Search Techniques for Interior-Point Methods
A Short Step Infeasible Interior-Point Algorithm
This algorithm was proposed by Miao [90]. Although infeasible interior-point algorithms may start from a point that does not meet the equality constraints of the KKT conditions, these constraints have to be met when the algorithms converge. To characterize how close the iterate is to meeting the equality requirements, the residuals of the primal and dual programs are defined as

r_b^k = Ax^k − b,   r_c^k = A^T λ^k + s^k − c.   (4.1)
To simplify the notation, we oftentimes omit the superscript k if it will not cause confusion. The narrow infeasible neighborhood under consideration is given by N2I (θ ) = {(x, s) | (x, s) > 0, �Xs − µe� ≤ θ µ},
(4.2)
where θ ∈ (0, 1). The initial point of the short step (or MTY-type) infeasible interiorpoint algorithm is in N2I (0.25). We also define the ε-solution set as Sε = {(x, λ , s) | (x, s) > 0, xT s ≤ ε, �rb � ≤ ε, �rc � ≤ ε}.
(4.3)
A linear system of equations to be solved in the predictor step of the short-step infeasible interior-point algorithm is

[ A     0    0   ] [ Δx^p ]   [   −r_b^k    ]
[ 0    A^T   I   ] [ Δλ^p ] = [   −r_c^k    ],   (4.4)
[ S^k   0   X^k  ] [ Δs^p ]   [ −x^k ◦ s^k ]

and the iterate at the predictor step is obtained by searching for an α such that the duality measure is reduced and

(x(α), λ(α), s(α)) = (x^k, λ^k, s^k) + α(Δx^p, Δλ^p, Δs^p) ∈ N_2^I(0.5).
(4.5)
The linear system of equations to be solved in the corrector step of the short-step infeasible interior-point algorithm is

[ A      0     0    ] [ Δx^c ]   [              0               ]
[ 0     A^T    I    ] [ Δλ^c ] = [              0               ],   (4.6)
[ S(α)   0    X(α)  ] [ Δs^c ]   [ (1 − α_k)µ_k e − x(α) ◦ s(α) ]

where α_k is the step size taken in the predictor step. The short-step infeasible interior-point algorithm discussed in this section can be formally stated below:

Algorithm 4.1
Data: A, b, c. Parameters: ε > 0. Initial point (x^0, λ^0, s^0) ∈ N_2^I(0.25), and µ_0 = x^{0T} s^0/n.
for iteration k = 0, 1, 2, . . .
If (xk , sk ) ∈ Sε , stop. Otherwise continue. If k is even (predictor): Solve the linear systems of equations (4.4) to get (Δx p , Δλ p , Δs p ). Choose the largest αk ∈ [0, 1] such that (x(α), λ (α), s(α)) ∈ N2I (0.5), α ≤ αk . Set µ(αk ) = (1 − αk )µk . Else (corrector) Solve the linear systems of equations (4.6) to get (Δxc , Δλ c , Δsc ). Set (xk+1 , λ k+1 , sk+1 ) = (x(αk ), λ (αk ), s(αk )) + (Δxc , Δλ c , Δsc ), T
and µk+1 =
xk+1 sk+1 . n
Set k + 1 → k.
end (for) Remark 4.1 It must be emphasized that, unlike most algorithms, where µ(αk ) := x(αk )T s(αk )/n is the duality measure, for this algorithm, we define µ(αk ) = (1 − αk )µk . The rest of the section is to establish the polynomial bound for this algorithm. Let parameter νk be defined as ν0 = 1, νk =
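The two directions used by Algorithm 4.1 differ only in their right-hand sides, which makes them easy to prototype. The sketch below sets up and solves (4.4) and (4.6) with dense NumPy linear algebra; the function names are illustrative assumptions, and the caller is expected to pass mu_next = (1 − α_k)µ_k to the corrector, consistent with Remark 4.1.

```python
import numpy as np

def infeasible_predictor(A, b, c, x, lam, s):
    """Predictor direction of (4.4): drive the residuals and x o s toward zero."""
    m, n = A.shape
    rb = A @ x - b
    rc = A.T @ lam + s - c
    K = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    rhs = np.concatenate([-rb, -rc, -x * s])
    return np.split(np.linalg.solve(K, rhs), [n, n + m])

def infeasible_corrector(A, x_a, lam_a, s_a, mu_next):
    """Corrector direction of (4.6), evaluated at the predictor point (x_a, lam_a, s_a)."""
    m, n = A.shape
    K = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [np.diag(s_a),     np.zeros((n, m)), np.diag(x_a)],
    ])
    rhs = np.concatenate([np.zeros(m), np.zeros(n), mu_next - x_a * s_a])
    return np.split(np.linalg.solve(K, rhs), [n, n + m])
```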
k−1 k i=0
(1 − αi )
(4.7)
In view of (4.6), it is easy to see T
Δxc Δsc = 0, x(αk )T Δsc + s(αk )T Δxc = (1 − αk )nµk − x(αk )T s(αk ).
(4.8)
The first lemma shows that the primal and dual residuals and duality gap decrease in every iteration. Lemma 4.1 Let (xk , λ k , sk ) be generated by Algorithm 4.1. Then, rk+1 = (1 − αk )rbk = νk+1 rb0 , rck+1 = (1 − αk )rck = νk+1 rc0 . b
(4.9)
60
• Arc-Search Techniques for Interior-Point Methods
Moreover,

(x^{k+1})^T s^{k+1} = (1 − α_k)(x^k)^T s^k = ν_{k+1} (x^0)^T s^0.    (4.10)
Proof 4.1 Using the definition of r_b^{k+1}, (4.6), and (4.4), we have

r_b^{k+1} = Ax^{k+1} − b = A(x(α_k) + Δx^c) − b = Ax(α_k) − b = A(x^k + α_k Δx^p) − b = Ax^k − b − α_k r_b^k = (1 − α_k) r_b^k.

Similarly, using the definition of r_c^{k+1}, (4.6), and (4.4), we can prove the second equality of (4.9). To show (4.10), using (4.8), we have

(x^{k+1})^T s^{k+1} = (x(α_k) + Δx^c)^T (s(α_k) + Δs^c) = x(α_k)^T s(α_k) + (1 − α_k) n µ_k − x(α_k)^T s(α_k) = (1 − α_k)(x^k)^T s^k.

This completes the proof.
The next lemma confirms that the predictor step is well-defined.

Lemma 4.2 If (x^k, s^k) > 0, then x(α_k) and s(α_k) satisfy

‖x(α) ∘ s(α) − µ(α)e‖ ≤ θ µ(α) = 0.5(1 − α)µ_k = 0.5 µ(α).    (4.11)

Moreover, (x(α_k), s(α_k)) > 0.

Proof 4.2 The first claim follows directly from the predictor step of the algorithm. From (4.11), the inequality x_i(α)s_i(α) ≥ (1 − θ)(1 − α)µ_k > 0 holds for all α with 0 < α ≤ α_k, so neither x_i(α) nor s_i(α) can vanish. Therefore, (x^k, s^k) > 0 implies that (x(α_k), s(α_k)) > 0.

Remark 4.2
Lemma 4.2 indicates that (x(αk ), λ (αk ), s(αk )) ∈ N2I (0.5).
The following lemma is given in [91, 90].

Lemma 4.3 Let (Δx^c, Δs^c) be given by solving (4.6). Then,

‖Δx^c ∘ Δs^c‖ ≤ (√2/4) ‖(X(α_k)S(α_k))^{−1/2} (x(α_k) ∘ s(α_k) − µ(α_k)e)‖².    (4.12)
Proof 4.3 Applying Lemma 3.3 to the last row of (4.6) proves this lemma.
Another basic property of the algorithm is given below.

Lemma 4.4 Assume that (x^0, λ^0, s^0) ∈ N_2^I(0.25). Let (x^k, λ^k, s^k) be generated by Algorithm 4.1. Then, for all k ≥ 0,

(x^{k+1}, λ^{k+1}, s^{k+1}) ∈ N_2^I(0.25).    (4.13)

Proof 4.4 In view of Lemma 4.2, we have (x(α), λ(α), s(α)) ∈ N_2^I(0.5). Therefore, ‖x(α) ∘ s(α) − µ(α)e‖ ≤ 0.5 µ(α) holds. This gives 0.5 µ(α) ≤ x_i(α)s_i(α), which means that

‖(X(α_k)S(α_k))^{−1/2}‖² = ‖diag(1/√(x_i(α)s_i(α)))‖² = 1/min_i x_i(α)s_i(α) ≤ 1/(0.5 µ(α)).

Using the last row of (4.6), we have

x^{k+1} ∘ s^{k+1} − µ_{k+1} e = (x(α_k) + Δx^c) ∘ (s(α_k) + Δs^c) − µ(α_k)e = Δx^c ∘ Δs^c.

Then, from these two relations and Lemma 4.3, we have

‖x^{k+1} ∘ s^{k+1} − µ_{k+1} e‖ = ‖Δx^c ∘ Δs^c‖
   ≤ (√2/4) ‖[X(α_k)S(α_k)]^{−1/2} [x(α_k) ∘ s(α_k) − µ(α_k)e]‖²
   ≤ (√2/4) ‖(X(α_k)S(α_k))^{−1/2}‖² ‖x(α_k) ∘ s(α_k) − µ(α_k)e‖²
   ≤ (√2/4) (0.5 µ(α_k))² / (0.5 µ(α_k))      [use Lemma 4.2]
   = (√2/4) · 0.5 µ_{k+1}                      [µ_{k+1} = µ(α_k)]
   ≤ 0.25 µ_{k+1}.
Still, we need to show that (x^{k+1}, s^{k+1}) > 0. This proof is very similar to the argument used in Theorem 6.2; therefore, we omit it here. Note that in all iterations, (x^k, s^k) > 0 is maintained.

Lemma 4.1 indicates that if at some iteration k, α_k = 1, then an ε-optimal solution is found. Therefore, in the rest of the analysis, we assume that α_k < 1 for all iterations. Similar to Lemma 3.12, which was used for the MTY algorithm, the next lemma is used for the short step infeasible interior-point algorithm.

Lemma 4.5 Suppose that (x^k, λ^k, s^k) ∈ N_2^I(0.25) is generated by Algorithm 4.1. Then, (x(α), λ(α), s(α)) ∈ N_2^I(0.5) for all α ∈ [0, ᾱ], where
ᾱ = min{ 1/2, ( µ_k / (8‖Δx^p ∘ Δs^p‖) )^{1/2} },    (4.14)

i.e., α_k ≥ ᾱ.

Proof 4.5
Note that µ(α) is defined as µ(α) = (1 − α)µ_k and

x(α) ∘ s(α) = (x + αΔx^p) ∘ (s + αΔs^p) = x ∘ s + α(x ∘ Δs^p + s ∘ Δx^p) + α² Δx^p ∘ Δs^p = (1 − α) x ∘ s + α² Δx^p ∘ Δs^p.

By using these two relations, ᾱ ≤ 0.5, and σ = 0, we have

‖x(α) ∘ s(α) − µ(α)e‖ = ‖(1 − α)(x ∘ s − µ_k e) + α² Δx^p ∘ Δs^p‖
   ≤ (1 − α)‖x ∘ s − µ_k e‖ + α² ‖Δx^p ∘ Δs^p‖
   ≤ (1 − α)‖x ∘ s − µ_k e‖ + [µ_k/(8‖Δx^p ∘ Δs^p‖)] ‖Δx^p ∘ Δs^p‖    [use (4.14)]
   ≤ (1 − α) µ_k/4 + µ_k/8                                            [(x, λ, s) ∈ N_2^I(0.25)]
   ≤ (1 − α) µ_k/4 + (1 − α) µ_k/4                                    [α ≤ 0.5]
   ≤ 0.5 µ(α).
This completes the proof.

Lemma 4.6 Suppose that the initial point of Algorithm 4.1 meets the conditions

(x^0, s^0) ∈ N_2^I(0.25),    x^* ≤ ρ x^0,    s^* ≤ ρ s^0,    (4.15)

for some sufficiently large ρ ≥ 1. Suppose further that (Δx^p, Δλ^p, Δs^p) is generated by solving (4.4). Then, there exists a positive constant C_1 > 1, independent of n, such that

‖Δx^p ∘ Δs^p‖ ≤ C_1 n² µ_k.    (4.16)

Proof 4.6 It is easy to see that (4.4) and (6.3) are identical, except that the notations are slightly different, i.e., (Δx^p, Δλ^p, Δs^p) is used in (4.4) and (ẋ, λ̇, ṡ) is used in (6.3). Therefore, the inequality (4.16) is the same as (6.56), including the argument of the proof. To avoid repetition, the proof is omitted here.

Using Lemma 4.6, we have a simpler and more useful expression for the step size ᾱ.

Theorem 4.1 Suppose that (x^k, λ^k, s^k) ∈ N_2^I(0.25) is generated by Algorithm 4.1. Then, (x(α), λ(α), s(α)) ∈ N_2^I(0.5) for all α ∈ [0, ᾱ], where

ᾱ ≥ 1/(√8 n).    (4.17)
Therefore, from Lemma 4.1, we have

µ_{k+1} ≤ (1 − 1/(√8 n)) µ_k,    (4.18)
‖r_b^{k+1}‖ ≤ (1 − 1/(√8 n)) ‖r_b^k‖,    (4.19)
‖r_c^{k+1}‖ ≤ (1 − 1/(√8 n)) ‖r_c^k‖.    (4.20)

Proof 4.7
Substituting (4.16) into (4.14), and noticing C_1 > 1 and n > 1, yields

ᾱ ≥ min{ 1/2, ( µ_k / (8‖Δx^p ∘ Δs^p‖) )^{1/2} }
   ≥ min{ 1/2, ( µ_k / (8 C_1 n² µ_k) )^{1/2} }
   ≥ min{ 1/2, ( 1/(8 n²) )^{1/2} }
   = 1/(√8 n).    (4.21)
Substituting α_k ≥ ᾱ into Lemma 4.1 proves the last claim.

Combining Theorems 1.4 and 4.1 gives the convergence result and polynomial bound of Algorithm 4.1.

Theorem 4.2 Let {(x^k, λ^k, s^k)} be generated by Algorithm 4.1 with (x^0, λ^0, s^0) satisfying conditions (4.15). For any given ε ∈ (0, 1), the algorithm will terminate with (x^k, λ^k, s^k) ∈ S_ε in at most O(nL) iterations, where

L = max{ log((x^0)^T s^0 / ε), log(‖Ax^0 − b‖/ε), log(‖A^T λ^0 + s^0 − c‖/ε) }.
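The bound of Theorem 4.2 is easy to evaluate for a concrete instance. The following back-of-the-envelope sketch uses the per-iteration contraction factor of Theorem 4.1 and ignores the constants hidden in O(nL); it is illustrative only.

```python
import numpy as np

def iteration_estimate(n, mu0, norm_rb0, norm_rc0, eps):
    """Rough k with (1 - 1/(sqrt(8) n))^k reducing the largest initial quantity below eps,
    mirroring Theorems 4.1 and 4.2 (hidden constants ignored)."""
    L = max(np.log(n * mu0 / eps), np.log(norm_rb0 / eps), np.log(norm_rc0 / eps))
    return int(np.ceil(np.sqrt(8) * n * L))

# Example: n = 1000, mu0 = 1, unit residual norms, eps = 1e-8 gives roughly 7e4 iterations
# as a worst-case bound, far more than what is observed in practice.
print(iteration_estimate(1000, 1.0, 1.0, 1.0, 1e-8))
```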
4.2 A Long Step Infeasible Interior-Point Algorithm

As pointed out earlier, enlarging the search neighborhood will likely allow a longer step, thereby enhancing the computational performance of the algorithm. Therefore, this section introduces a long step infeasible interior-point algorithm which is based on the neighborhood N_{−∞}(γ), with some modifications. Similar to the short step infeasible interior-point algorithm, it is assumed that the initial point is chosen in the special form

(x^0, λ^0, s^0) = (ρe, 0, ρe)    (4.22)
for ρ sufficiently large. This condition is imposed for getting a convergence result. For the same purpose, another condition,

‖(r_b, r_c)‖ ≤ [‖(r_b^0, r_c^0)‖/µ_0] β µ,

is imposed, where β ≥ 1 is a parameter that makes sure the initial point (x^0, λ^0, s^0) is also in the modified neighborhood defined as follows:

N_{−∞}^I(γ, β) = {(x, λ, s) | ‖(r_b, r_c)‖ ≤ [‖(r_b^0, r_c^0)‖/µ_0] β µ, (x, s) > 0, x_i s_i ≥ γ µ, i = 1, . . . , n},    (4.23)
where γ ∈ (0, 1) is a given parameter. For all points in the infeasible neighborhood N_{−∞}^I(γ, β), the infeasibility is uniformly bounded by a parameter times the duality measure µ. The condition (x, s) > 0 is required by the interior-point method; the condition x_i s_i ≥ γµ prevents the products x_i s_i from going to zero before the iterate approaches an optimal solution. By forcing µ_k to zero and restricting all iterates (x^k, λ^k, s^k) to the neighborhood N_{−∞}^I(γ, β), it is expected that, as k → ∞, (r_b^k, r_c^k) → 0 and an optimal solution of (1.4) is found. The long step infeasible interior-point algorithm is formally stated as follows:

Algorithm 4.2
Data: A, b, c. Parameters: ε ∈ (0, 1), γ ∈ (0, 1), β ≥ 1, and 0 < σ_min < σ_max ≤ 0.5. Initial point (x^0, λ^0, s^0) ∈ N_{−∞}^I(γ, β), and µ_0 = (x^0)^T s^0/n.
for iteration k = 0, 1, 2, . . .
Step 1. If (x^k, s^k) ∈ S_ε, stop. Otherwise continue.
Step 2. Solve the linear system of equations

[ A     0     0   ] [ Δx^k ]   [ −r_b^k                   ]
[ 0    A^T    I   ] [ Δλ^k ] = [ −r_c^k                   ],    (4.24)
[ S^k   0    X^k  ] [ Δs^k ]   [ −x^k ∘ s^k + σ_k µ_k e   ]

to get (Δx^k, Δλ^k, Δs^k).
Step 3. Choose the largest α_k ∈ [0, 1] such that

(x(α), λ(α), s(α)) = (x^k, λ^k, s^k) + α(Δx^k, Δλ^k, Δs^k) ∈ N_{−∞}^I(γ, β)    (4.25)

and

µ(α) = x(α)^T s(α)/n ≤ (1 − 0.01α) µ_k    (4.26)

hold.
Step 4. Set (x^{k+1}, λ^{k+1}, s^{k+1}) = (x(α_k), λ(α_k), s(α_k)) and µ_{k+1} = µ(α_k).
Step 5. Set k + 1 → k.
end (for)

To prove the convergence of Algorithm 4.2, we need to make the following assumption:

‖(x^*, s^*)‖_∞ ≤ ρ.    (4.27)

The following simple relations are useful and are listed as a lemma.

Lemma 4.7 Let x and s be two vectors. Then, we have

max{‖x‖_∞, ‖s‖_∞} = ‖(x, s)‖_∞,    ‖x‖_1 + ‖s‖_1 = ‖(x, s)‖_1,    (4.28)
‖x‖_∞ ≤ ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2 ≤ n ‖x‖_∞.    (4.29)
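The chain (4.29) is elementary but is used repeatedly below, so a quick numerical sanity check may help; the following two-line NumPy sketch is illustrative only.

```python
import numpy as np

x = np.random.randn(50)
n = x.size
ninf, n2, n1 = (np.linalg.norm(x, np.inf), np.linalg.norm(x, 2), np.linalg.norm(x, 1))
# ||x||_inf <= ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2 <= n ||x||_inf, as in (4.29)
assert ninf <= n2 <= n1 <= np.sqrt(n) * n2 <= n * ninf
```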
Similar to the proof of Lemma 4.1, it is easy to show the following lemma.

Lemma 4.8 Let (x^k, λ^k, s^k) be generated by Algorithm 4.2. Then,

r_b^{k+1} = (1 − α_k) r_b^k = ν_{k+1} r_b^0,    r_c^{k+1} = (1 − α_k) r_c^k = ν_{k+1} r_c^0.    (4.30)

Moreover,

(x^{k+1})^T s^{k+1} = (1 − α_k)(x^k)^T s^k = ν_{k+1} (x^0)^T s^0.    (4.31)

Proof 4.8
The proof is similar to the one of Lemma 4.1, therefore, it is omitted.
Assume (x^k, λ^k, s^k) ∈ N_{−∞}^I(γ, β). Lemma 4.8 indicates that

ν_k ‖(r_b^0, r_c^0)‖/µ_k = ‖(r_b^k, r_c^k)‖/µ_k ≤ β ‖(r_b^0, r_c^0)‖/µ_0.

Since this chapter considers infeasible interior-point algorithms, (r_b^0, r_c^0) ≠ 0, which means that

ν_k ≤ β µ_k/µ_0.    (4.32)

Let (x̄, λ̄, s̄) satisfy the following conditions:
Ax̄ = 0,    A^T λ̄ + s̄ = 0,    (4.33)
then

x̄^T s̄ = −x̄^T A^T λ̄ = 0.    (4.34)

The following lemma is based on this simple relation.

Lemma 4.9 There exists a positive constant C_1 (depending on n, the initial point, and the optimal solution) such that

ν_k ‖(x^k, s^k)‖_1 ≤ C_1 µ_k, for all k.    (4.35)

Proof 4.9 Let (x^*, λ^*, s^*) be any primal-dual solution of (1.4) and (1.5), and let (x̄, λ̄, s̄) be defined as

(x̄, λ̄, s̄) = ν_k (x^0, λ^0, s^0) + (1 − ν_k)(x^*, λ^*, s^*) − (x^k, λ^k, s^k).    (4.36)
From Ax^k − b = r_b^k and ν_k Ax^0 = ν_k (r_b^0 + b), we have

Ax̄ = A(ν_k x^0 + (1 − ν_k)x^* − x^k)
   = ν_k Ax^0 + (1 − ν_k)Ax^* − Ax^k
   = r_b^k + ν_k b + (1 − ν_k)b − r_b^k − b = 0,

and

A^T λ̄ + s̄ = A^T(ν_k λ^0 + (1 − ν_k)λ^* − λ^k) + (ν_k s^0 + (1 − ν_k)s^* − s^k)
   = ν_k A^T λ^0 − A^T λ^k + ν_k s^0 − s^k + (1 − ν_k)(A^T λ^* + s^*)
   = ν_k (A^T λ^0 + s^0) − (A^T λ^k + s^k) + (1 − ν_k)c
   = ν_k (A^T λ^0 + s^0 − c) − (A^T λ^k + s^k − c)
   = ν_k r_c^0 − r_c^k
   = 0.
This verifies that condition (4.33) holds, therefore (4.34) holds. This gives

0 = x̄^T s̄ = (ν_k x^0 + (1 − ν_k)x^* − x^k)^T (ν_k s^0 + (1 − ν_k)s^* − s^k)
  = ν_k² (x^0)^T s^0 + (1 − ν_k)² (x^*)^T s^* + ν_k(1 − ν_k)[(x^0)^T s^* + (s^0)^T x^*]
    + (x^k)^T s^k − ν_k[(x^k)^T s^0 + (s^k)^T x^0] − (1 − ν_k)[(s^k)^T x^* + (x^k)^T s^*].    (4.37)

Since (x^k, s^k) ≥ 0 and (x^*, s^*) ≥ 0, we have (s^k)^T x^* + (x^k)^T s^* ≥ 0. Since (x^*, λ^*, s^*) is an optimal solution, we have (x^*)^T s^* = 0. Simple rearranging of (4.37) yields

ν_k[(x^k)^T s^0 + (s^k)^T x^0] ≤ ν_k² (x^0)^T s^0 + ν_k(1 − ν_k)[(x^0)^T s^* + (s^0)^T x^*] + (x^k)^T s^k.    (4.38)
Since (x^0, s^0) > 0, we have

ξ = min_{i=1,...,n} min{x_i^0, s_i^0} > 0.    (4.39)
Using Lemma 4.7 gives

ξ ‖(x^k, s^k)‖_1 ≤ min_{i=1,...,n}{x_i^0} ‖s^k‖_1 + min_{i=1,...,n}{s_i^0} ‖x^k‖_1 ≤ (x^k)^T s^0 + (s^k)^T x^0.    (4.40)
Combining (4.40) and (4.38) yields

ξ ν_k ‖(x^k, s^k)‖_1
   ≤ ν_k [(x^k)^T s^0 + (s^k)^T x^0]
   ≤ ν_k² (x^0)^T s^0 + (x^k)^T s^k + ν_k(1 − ν_k)[(x^0)^T s^* + (s^0)^T x^*]
   ≤ ν_k² nµ_0 + nµ_k + ν_k(1 − ν_k)[‖x^0‖_∞ ‖s^*‖_1 + ‖s^0‖_∞ ‖x^*‖_1]     [use (4.28)]
   ≤ ν_k nµ_0 + nµ_k + ν_k ‖(x^0, s^0)‖_∞ ‖(s^*, x^*)‖_1                      [use ν_k ∈ (0, 1)]
   ≤ β nµ_k + nµ_k + β µ_k ‖(x^0, s^0)‖_∞ ‖(s^*, x^*)‖_1 /µ_0.                 [use (4.32)]    (4.41)
Setting C_1 = ξ^{−1} [β n + n + β ‖(x^0, s^0)‖_∞ ‖(s^*, x^*)‖_1 /µ_0] proves the lemma.

Applying the assumptions (4.22) and (4.27) to Lemma 4.9, we have

Lemma 4.10 Suppose that the initial point is selected to satisfy the assumptions (4.22) and (4.27). Then, we have

ρ ν_k ‖(x^k, s^k)‖_1 ≤ 4β n µ_k, for all k.    (4.42)

Proof 4.10 It is easy to see from (4.22) and (4.39) that ξ = ρ. Due to the choice of ξ and (x^0, λ^0, s^0), we have

‖(x^0, s^0)‖_∞ ≤ ρ,    ‖(s^*, x^*)‖_1 ≤ 2n ‖(s^*, x^*)‖_∞ ≤ 2nρ,    µ_0 = (x^0)^T s^0/n = ρ².
Substituting these relations into (4.41) gives

ρ ν_k ‖(x^k, s^k)‖_1 ≤ β nµ_k + nµ_k + β µ_k ‖(x^0, s^0)‖_∞ ‖(s^*, x^*)‖_1 /µ_0
   ≤ β nµ_k + nµ_k + β (µ_k/µ_0) 2nρ²
   ≤ 4β nµ_k.    [use β ≥ 1]
This proves the lemma.

In the following analysis, we will use the induced-norm inequality ‖Ax‖_p ≤ ‖A‖_p ‖x‖_p given in (1.3).

Lemma 4.11 Suppose that the initial point is chosen as in (4.22) and assumption (4.27) holds. Then, there exists a constant C_2 independent of n such that

‖D^{−1}Δx‖ ≤ C_2 n √µ_k,    ‖DΔs‖ ≤ C_2 n √µ_k.    (4.43)

Proof 4.11
Let

(x̄, λ̄, s̄) = (Δx, Δλ, Δs) + ν_k (x^0, λ^0, s^0) − ν_k (x^*, λ^*, s^*).    (4.44)

Then, using the first row of (4.24), we have

Ax̄ = AΔx + ν_k Ax^0 − ν_k Ax^* = −r_b^k + ν_k (r_b^0 + b) − ν_k b = 0,    (4.45)

and using the second row of (4.24), we have

A^T λ̄ + s̄ = A^T Δλ + ν_k A^T λ^0 − ν_k A^T λ^* + Δs + ν_k s^0 − ν_k s^*
   = (A^T Δλ + Δs) + ν_k (A^T λ^0 + s^0 − c) − ν_k (A^T λ^* + s^* − c)
   = −r_c^k + ν_k r_c^0 = 0.    (4.46)
This shows that (x̄, λ̄, s̄) satisfies (4.33). From (4.34), we have

0 = x̄^T s̄ = (Δx + ν_k x^0 − ν_k x^*)^T (Δs + ν_k s^0 − ν_k s^*).    (4.47)

Using the last row of (4.24), we have

S(Δx + ν_k (x^0 − x^*)) + X(Δs + ν_k (s^0 − s^*)) = −XSe + σ_k µ_k e + ν_k [S(x^0 − x^*) + X(s^0 − s^*)].

Pre-multiplying (XS)^{−1/2} on both sides gives

D^{−1}(Δx + ν_k (x^0 − x^*)) + D(Δs + ν_k (s^0 − s^*)) = −(XS)^{−1/2}(XSe − σ_k µ_k e) + ν_k [D^{−1}(x^0 − x^*) + D(s^0 − s^*)].    (4.48)
Using (4.47), we have

‖D^{−1}(Δx + ν_k (x^0 − x^*)) + D(Δs + ν_k (s^0 − s^*))‖² = ‖D^{−1}(Δx + ν_k (x^0 − x^*))‖² + ‖D(Δs + ν_k (s^0 − s^*))‖².    (4.49)

Combining (4.48) and (4.49) gives

‖D^{−1}(Δx + ν_k (x^0 − x^*))‖² + ‖D(Δs + ν_k (s^0 − s^*))‖²
   ≤ [ ‖(XS)^{−1/2}‖ ‖XSe − σ_k µ_k e‖ + ν_k ‖D^{−1}(x^0 − x^*)‖ + ν_k ‖D(s^0 − s^*)‖ ]².

This shows

‖D^{−1}(Δx + ν_k (x^0 − x^*))‖ ≤ ‖(XS)^{−1/2}‖ ‖XSe − σ_k µ_k e‖ + ν_k [ ‖D^{−1}(x^0 − x^*)‖ + ‖D(s^0 − s^*)‖ ].    (4.50)

Using the triangle inequality and (4.50) yields

‖D^{−1}Δx‖ = ‖D^{−1}(Δx + ν_k (x^0 − x^*)) − ν_k D^{−1}(x^0 − x^*)‖
   ≤ ‖D^{−1}(Δx + ν_k (x^0 − x^*))‖ + ν_k ‖D^{−1}(x^0 − x^*)‖
   ≤ ‖(XS)^{−1/2}‖ ‖XSe − σ_k µ_k e‖ + 2ν_k ‖D^{−1}(x^0 − x^*)‖ + 2ν_k ‖D(s^0 − s^*)‖.    (4.51)
Noticing that

‖XSe − σ_k µ_k e‖² = ‖XSe‖²_2 − 2σ_k µ_k x^T s + σ_k² µ_k² n
   ≤ ‖XSe‖²_1 − 2σ_k nµ_k² + σ_k² nµ_k²      [use (4.29)]
   ≤ (nµ_k)² − 2σ_k nµ_k² + σ_k² nµ_k²
   ≤ n² µ_k²,                                 [use σ_k ≤ 0.5]

and

‖(XS)^{−1/2}‖ = max_i 1/√(x_i s_i) ≤ 1/√(γ µ_k),    (4.52)

combining these two inequalities gives

‖(XS)^{−1/2}‖ ‖XSe − σ_k µ_k e‖ ≤ (n/√γ) √µ_k.    (4.53)
To have an estimate for the last two terms in (4.51), we will use the following rela tions. ID−1 eI2 = I(XS)−1/2 SeI2 ≤ I(XS)−1/2 IIsI1 , IDeI ≤ I(XS)−1/2 IIxI1 .
Since the initial point is chosen as in (4.22) and (4.27), we have ρe ≥ x0 − x∗ ≥ 0, ρe ≥ s0 − s∗ ≥ 0,
therefore,

‖D^{−1}(x^0 − x^*)‖ + ‖D(s^0 − s^*)‖ ≤ ρ (‖D^{−1}e‖ + ‖De‖).

Using these relations, (4.52), and Lemma 4.10, we can estimate the last two terms in (4.51):

2ν_k [ ‖D^{−1}(x^0 − x^*)‖ + ‖D(s^0 − s^*)‖ ]
   ≤ 2ν_k ρ (‖D^{−1}e‖ + ‖De‖)
   ≤ 2ν_k ρ (‖(XS)^{−1/2}s‖ + ‖(XS)^{−1/2}x‖)
   ≤ 2ν_k ρ ‖(x, s)‖_1 ‖(XS)^{−1/2}‖
   ≤ 8β nµ_k · 1/√(γ µ_k)
   ≤ (8β n/√γ) √µ_k.    (4.54)

Let

C_2 = 9β/√γ.
Substituting (4.53) and (4.54) into (4.51) proves the first claim of the lemma. A similar argument proves the second claim.

The last important lemma in this section is given as follows.

Lemma 4.12 Suppose that the initial point is chosen as in (4.22) and assumption (4.27) holds. Then, there exists a constant ᾱ ∈ (0, 1) such that the following three conditions are satisfied for all α ∈ [0, ᾱ] and all k ≥ 0:

(x^k + αΔx)^T (s^k + αΔs) ≥ (1 − α)(x^k)^T s^k,    (4.55)
(x_i^k + αΔx_i)(s_i^k + αΔs_i) ≥ (γ/n)(x^k + αΔx)^T (s^k + αΔs),    (4.56)
(x^k + αΔx)^T (s^k + αΔs) ≤ (1 − 0.01α)(x^k)^T s^k.    (4.57)

More specifically, there exists some C_3 > 0, independent of n, such that

ᾱ ≥ C_3/n².    (4.58)

In addition, the conditions (4.25) and (4.26) are satisfied for all α ∈ [0, ᾱ] and all k ≥ 0.
Proof 4.12
In view of (4.43), we have

Δx^T Δs = (D^{−1}Δx)^T(DΔs) ≤ ‖D^{−1}Δx‖ ‖DΔs‖ ≤ C_2² n² µ_k,    (4.59)
|Δx_i Δs_i| = |D_{ii}^{−1}Δx_i| |D_{ii}Δs_i| ≤ ‖D^{−1}Δx‖ ‖DΔs‖ ≤ C_2² n² µ_k.    (4.60)

From the last row of (4.24), we have

s_i Δx_i + x_i Δs_i = −x_i s_i + σ_k µ_k.    (4.61)

Taking the summation over i = 1, . . . , n gives

s^T Δx + x^T Δs = (σ_k − 1) x^T s.    (4.62)

Now we find the condition for α such that (4.55) holds. Using (4.62), we have

(x + αΔx)^T (s + αΔs) = x^T s + α(σ_k − 1) x^T s + α² Δx^T Δs
   ≥ (1 − α) x^T s + ασ_k x^T s − α² C_2² n² µ_k       [use (4.59)]
   ≥ (1 − α) x^T s + (ασ_min − α² C_2² n) x^T s.

If the last term of the above inequality is greater than zero, i.e.,

α ≤ σ_min/(n C_2²),    (4.63)

then inequality (4.55) holds.

Next, we find the condition for α such that (4.56) holds. Using (4.60), (4.61), and x_i^k s_i^k ≥ γ µ_k, we have

(x_i^k + αΔx_i)(s_i^k + αΔs_i) ≥ x_i^k s_i^k (1 − α) + ασ_k µ_k − α² C_2² n² µ_k ≥ γ(1 − α)µ_k + ασ_k µ_k − α² C_2² n² µ_k.

In view of (4.24) and (4.59), we have

(γ/n)(x + αΔx)^T (s + αΔs) ≤ (γ/n)[(1 − α) x^T s + (ασ_k + α² C_2² n) x^T s] ≤ γ[(1 − α)µ_k + (ασ_k + α² C_2² n)µ_k].    (4.64)

Combining the above two inequalities gives

(x_i^k + αΔx_i)(s_i^k + αΔs_i) − (γ/n)(x + αΔx)^T (s + αΔs)
   ≥ ασ_k µ_k − α² C_2² n² µ_k − γ(ασ_k + α² C_2² n)µ_k
   ≥ (1 − γ) ασ_min µ_k − 2α² C_2² n² µ_k.

Therefore, inequality (4.56) holds if the right hand side of the above inequality is greater than zero, i.e., if

α ≤ (1 − γ)σ_min/(2n² C_2²).    (4.65)
Finally, we find the condition for α such that (4.57) holds. Using the inequality (4.64) and σ ≤ 0.5, we have

(1/n)(x + αΔx)^T (s + αΔs) − (1 − 0.01α)µ_k ≤ [(1 − α)µ_k + (ασ_k + α² C_2² n²)µ_k] − (1 − 0.01α)µ_k
   ≤ −0.99α µ_k + (0.5α + α² C_2² n²)µ_k
   ≤ −0.49α µ_k + α² C_2² n² µ_k.

Therefore, inequality (4.57) holds if the right hand side of the above inequality is smaller than zero, i.e., if

α ≤ 0.49/(n² C_2²).    (4.66)

Setting

C_3 ≤ min{ σ_min/C_2², (1 − γ)σ_min/(2C_2²), 0.49/C_2² }    (4.67)

proves (4.58).

To show the last claim of the lemma, in view of (4.32), we have β/µ_0 ≥ ν_k/µ_k. Since

r_b(α) = Ax(α) − b = (1 − α) r_b^k,    r_c(α) = A^T λ(α) + s(α) − c = (1 − α) r_c^k,

we have

‖(r_b(α), r_c(α))‖/µ(α) ≤ (1 − α)‖(r_b^k, r_c^k)‖/[(1 − α)µ_k] ≤ ‖ν_k (r_b^0, r_c^0)‖/µ_k ≤ β ‖(r_b^0, r_c^0)‖/µ_0.

This shows that the first inequality of (4.23) holds. The positivity is guaranteed because (x^0, s^0) > 0 and (4.55) holds for all α ≤ ᾱ. Finally, x_i^k s_i^k ≥ γ µ_k is guaranteed by (4.56) and the initial point selection.

The convergence theorem is obtained by combining Lemma 4.12 and Theorem 1.4.

Theorem 4.3 Let ε ∈ (0, 1) be given. Suppose that the initial point is chosen as in (4.22) and assumption (4.27) holds. Then, there exists an index K with K = O(n² log(1/ε)) such that all iterates {(x^k, λ^k, s^k)} generated by Algorithm 4.2 satisfy

µ_k ≤ ε, for all k ≥ K.    (4.68)
Remark 4.3 The polynomial bound for the short step infeasible interior-point algorithm, which searches for the optimizer in a narrow neighborhood, is K = O(n log(1/ε)), while the polynomial bound for the long step infeasible interior-point algorithm, which searches for the optimizer in a wider neighborhood, is K = O(n² log(1/ε)). However, computational experience shows that searching over a wider neighborhood is more efficient than searching over a narrow neighborhood. This dilemma will be resolved in Chapters 6 and 8 when arc-search is introduced.
4.3 Mehrotra's Infeasible Interior-Point Algorithm
Since the 1990s, most interior-point software packages have been based on Mehrotra's algorithm because it was demonstrated to be very efficient in computational practice, which is one of the main reasons that interior-point methods have attracted so many researchers. As a matter of fact, Mehrotra's algorithm is the most efficient one among all algorithms introduced up to now in this book. Mehrotra's algorithm is based on several brilliant intuitions: (a) start from an infeasible point, which avoids the computational effort of finding a feasible starting point and makes the overall algorithm much more efficient, (b) search for the optimizer in a wide neighborhood to achieve a long step size and reduce the iteration count, (c) scale back a little from the calculated step size to prevent the iterate from coming too close to the boundary, which may cause difficulties in the following searches, (d) make full use of the Cholesky factorization so that the corrector direction can be calculated in an efficient way, (e) use a good heuristic to select the centering parameter σ, and (f) use different step sizes for the primal and dual variables. The original Mehrotra algorithm [89] is slightly different from the one discussed in this section. The original one suggested searching for the optimizer along an arc similar to [97], which was briefly discussed in Chapter 3. Because the Taylor approximation of the central path is good only in a small area, the later version described in [144] changed the search to a corrector direction (a straight line). The algorithm is formally provided below:

Algorithm 4.3
Data: A, b, c. Parameters: ε > 0. Initial point (x^0, λ^0, s^0) with (x^0, s^0) > 0, and µ_0 = (x^0)^T s^0/n.
for iteration k = 0, 1, 2, . . .
Step 1. If (x^k, s^k) ∈ S_ε, stop. Otherwise continue.
Step 2. Solve the linear system of equations (4.4) to get (Δx^p, Δλ^p, Δs^p).
Step 3. Calculate

α_p = arg max{α ∈ [0, 1] | x^k + αΔx^p ≥ 0},    α_d = arg max{α ∈ [0, 1] | s^k + αΔs^p ≥ 0}    (4.69)

and calculate

µ_p = (x + α_p Δx^p)^T (s + α_d Δs^p)/n.    (4.70)

Step 4. Set

σ_k = (µ_p/µ_k)³.    (4.71)
Step 5. Solve the linear system of equations

[ A     0     0   ] [ Δx^c ]   [ 0                          ]
[ 0    A^T    I   ] [ Δλ^c ] = [ 0                          ]    (4.72)
[ S^k   0    X^k  ] [ Δs^c ]   [ σ_k µ_k e − Δx^p ∘ Δs^p    ]

to get (Δx^c, Δλ^c, Δs^c).
Step 6. Calculate the search direction (Δx^k, Δλ^k, Δs^k) = (Δx^p, Δλ^p, Δs^p) + (Δx^c, Δλ^c, Δs^c).
Step 7. Calculate the step sizes

α_k^p = arg max{α ∈ [0, 1] | x^k + αΔx^k ≥ 0},    α_k^d = arg max{α ∈ [0, 1] | s^k + αΔs^k ≥ 0},    (4.73)

and set α_k^p = min(0.99 α_k^p, 1) and α_k^d = min(0.99 α_k^d, 1).
Step 8. Calculate the next iterate

x^{k+1} = x^k + α_k^p Δx^k,    (λ^{k+1}, s^{k+1}) = (λ^k, s^k) + α_k^d (Δλ^k, Δs^k),    (4.74)

and µ_{k+1} = (x^{k+1})^T s^{k+1}/n.
Step 9. Set k + 1 → k.
end (for)

We identify how the intuitions (a) to (f) are applied in the algorithm one by one: (a) the initial point is clearly infeasible because it meets only the condition (x^0, s^0) > 0; (b) the algorithm takes the longest step in the widest neighborhood, with the only restriction (x^k, s^k) ≥ 0, in Steps 3 and 7 when the step sizes α_p, α_d, α_k^p, and α_k^d are selected; this neighborhood is wider than both the wide feasible neighborhood and the wide infeasible neighborhood; (c) to prevent the iterate from coming too close to the boundary, a scaling-back factor of 0.99 is used in Step 7; (d) the Cholesky factorization is performed once but used twice, in Steps 2 and 5, because the matrices in (4.4) and (4.72) are the same, which reduces the computational cost; (e) the centering parameter σ is selected using a heuristic method in Step 4; and (f) the step sizes for the primal and dual variables are different, as can be seen from Steps 3 and 7.

Although Mehrotra's algorithm is computationally very efficient, much more efficient than all algorithms presented so far in this book, it has not been proved to be convergent. As a matter of fact, there exists analysis that supports a guess that Mehrotra's algorithm does not converge [19].

Remark 4.4 With extensive computational experience, this author believes that Intuition (f) may have the least impact on enhancing the efficiency of the algorithm, if there is any.
Instead of using some heuristics to select the centering parameter σ (Intuition (e)), a better way is to select the step size α and the centering parameter σ optimally at the same time, as suggested for the modified short step path-following algorithm discussed in Chapter 3. Intuition (d) should always be used when higher-order methods are used because it significantly reduces the computational cost; in fact, higher-order methods should always be considered, as they find a better approximation of the central path, which leads to a longer step size; in addition, arc-search should be used instead of the line search proposed in Mehrotra's method. Using Intuitions (c) and (b) together is more efficient than the narrow and wide neighborhoods considered so far in this book, even though an analytic neighborhood may be helpful in convergence analysis. In summary, starting from an infeasible point, using higher-order derivatives and the widest neighborhood, and optimally selecting σ and α at the same time are the most important strategies for interior-point algorithms.
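To make the structure of Algorithm 4.3 concrete, the following is a compact dense-algebra sketch of one Mehrotra iteration (predictor, heuristic σ from (4.71), corrector, scaled step). It mirrors Steps 2 to 8 but is not an efficient implementation: a real code forms and factors A X S⁻¹ Aᵀ once per iteration instead of assembling the full block system twice. All function names are ours.

```python
import numpy as np

def block_solve(A, x, s, rhs1, rhs2, rhs3):
    """Solve A dx = rhs1, A^T dlam + ds = rhs2, S dx + X ds = rhs3
    (same coefficient matrix as (4.4) and (4.72))."""
    m, n = A.shape
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A
    K[m:m + n, n:n + m] = A.T
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(s)
    K[m + n:, n + m:] = np.diag(x)
    sol = np.linalg.solve(K, np.concatenate([rhs1, rhs2, rhs3]))
    return sol[:n], sol[n:n + m], sol[n + m:]

def max_step(v, dv):
    """Largest alpha in [0, 1] keeping v + alpha * dv >= 0."""
    neg = dv < 0
    return min(1.0, np.min(-v[neg] / dv[neg])) if np.any(neg) else 1.0

def mehrotra_iteration(A, b, c, x, lam, s):
    n = x.size
    mu = x @ s / n
    r_b, r_c = A @ x - b, A.T @ lam + s - c
    # Step 2: affine (predictor) direction from (4.4)
    dxp, dlp, dsp = block_solve(A, x, s, -r_b, -r_c, -x * s)
    # Steps 3-4: trial step sizes and centering heuristic (4.71)
    ap, ad = max_step(x, dxp), max_step(s, dsp)
    mu_p = (x + ap * dxp) @ (s + ad * dsp) / n
    sigma = (mu_p / mu) ** 3
    # Step 5: corrector direction from (4.72)
    dxc, dlc, dsc = block_solve(A, x, s, np.zeros_like(b), np.zeros_like(c),
                                sigma * mu * np.ones(n) - dxp * dsp)
    dx, dl, ds = dxp + dxc, dlp + dlc, dsp + dsc
    # Steps 7-8: scaled-back primal and dual step sizes
    a_pri = min(1.0, 0.99 * max_step(x, dx))
    a_dual = min(1.0, 0.99 * max_step(s, ds))
    return x + a_pri * dx, lam + a_dual * dl, s + a_dual * ds
```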
4.4 Concluding Remarks
At this point, we have discussed the most important interior-point algorithms. We know three useful strategies that enhance the efficiency of the algorithms: (a) higher-order methods to obtain a better search direction, (b) a wide neighborhood to increase the step size, and (c) an infeasible initial point to avoid the cost of finding a feasible initial point. We also know which strategies are used in which algorithm(s). In addition, we know which polynomial bounds are associated with which algorithm(s). In this section, we summarize these results and hope that this information will give us some useful insights to direct our research. We list these algorithms in increasing order of complexity bounds, starting from the lowest polynomial bound, which is supposed to indicate the most efficient algorithm(s).

(1a) Algorithm 3.1: Feasible short step interior-point algorithm; it uses first-order derivatives to get the search direction, searches for the next iterate in a narrow neighborhood, and its polynomial bound is O(√n log L), where log L := log(1/ε).

(1b) Algorithm 3.3: Feasible predictor-corrector interior-point algorithm; it uses first-order derivatives twice, first to predict the iterate in a narrow neighborhood with a large θ and then to correct the iterate in a narrow neighborhood with a small θ, and its polynomial bound is O(√n log L), where log L := log(1/ε).

(2a) Algorithm 3.5: Feasible short step interior-point algorithm; it uses higher-order derivatives to get the search direction, searches for the next iterate in a narrow neighborhood, and its polynomial bound is O(n^{3/4} log L^{3/2}) when second-order derivatives are used, where log L := max(log n, log(1/ε), log µ_0).

(2b) Algorithm 3.2: Feasible long step interior-point algorithm; it uses first-order derivatives to get the search direction, searches for the next iterate
in a wide neighborhood, and its polynomial bound is O(n log L), where log L := log(1/ε).

(2c) Algorithm 4.1: Infeasible short step interior-point algorithm; it uses second-order derivatives to get the search direction, searches for the next iterate in a narrow neighborhood, and its polynomial bound is O(n log L), where

log L := max{ log((x^0)^T s^0/ε), log(‖Ax^0 − b‖/ε), log(‖A^T λ^0 + s^0 − c‖/ε) }.
(3) Algorithm 4.2: Infeasible long step interior-point algorithm; it uses first-order derivatives to get the search direction, searches for the next iterate in a wide neighborhood, and its polynomial bound is O(n² log L), where log L := log(1/ε).

(4) Algorithm 4.3: Infeasible long step interior-point algorithm; it uses second-order derivatives to get the search direction, searches for the next iterate in the widest neighborhood, and its convergence and polynomial bound are unknown.

While researchers have known for a long time that incorporating an infeasible starting point, a wide neighborhood, and higher-order derivatives significantly enhances the computational efficiency, this list shows that incorporating more strategies leads to worse polynomial bounds. Given that the polynomial bound is supposed to measure the efficiency of the algorithms, this is a dilemma that perplexed researchers for a long time, i.e., there is a big gap between the theory and practical computational experience. Closing this gap is the main focus of the next part of the book. In addition to using the three known strategies, we will introduce two new key ideas: (a) use the arc-search technique, and (b) select the step size α and the centering parameter σ at the same time using a systematic and optimal method. Both of these ideas were proposed by this author in [151, 154].
Part II
ARC-SEARCH INTERIOR-POINT METHODS FOR LINEAR PROGRAMMING
Chapter 5
A Feasible Arc-Search Algorithm for LP

Since Karmarkar's algorithm [57] was proved to be polynomial and demonstrated the potential to be computationally competitive with the simplex method, many interior-point polynomial algorithms, for example [42, 70, 71, 95, 97, 93, 114], have been developed. Since then, several software packages that demonstrated the computational efficiency of interior-point methods have been released. The most efficient ones, such as OB1, linprog, PCx, and LOQO,¹ are based on MPC (Mehrotra's predictor-corrector algorithm), proposed by Mehrotra [89] and refined by other researchers.

It is now believed that the state-of-the-art interior-point algorithms can be more efficient than the simplex algorithm for some large problems and may not be as efficient as the simplex algorithm for some other problems [133]. The popular interior-point algorithms use three strategies to improve efficiency: (a) start from an infeasible initial interior point, (b) use higher-order derivatives, and (c) search in a larger neighborhood. These strategies were discussed in previous chapters and resulted in poorer polynomial bounds.

This chapter discusses a higher-order interior-point algorithm using an arc-search idea proposed in [151, 153] for linear programming. The algorithm searches for optimizers along an ellipse that approximates the central path. This strategy is different from other higher-order methods, such as [97, 89, 41, 5], which search for optimizers either along an arc of power series approximation or along a straight line related to the first and higher-order derivatives of the central path. This chapter will prove that the proposed algorithm has a polynomial complexity bound of O(n^{1/2} log(1/ε)), which is equal to the best known complexity bound established in Chapter 3 for the short step interior-point algorithm and better than the complexity bound for the higher-order polynomial algorithms derived in [97]. Some numerical tests will be given to show its advantage over algorithms that search for the optimizer along straight lines.

¹ HOPDM is based on some other higher-order algorithms in [41, 5].
5.1 A Polynomial Arc-Search Algorithm for LP
We consider the linear programming in the standard form, described as (1.4), and its dual programming, also in standard form, described as (1.5). Throughout this chapter, we make the following assumptions.

Assumptions:
1. A is a full rank matrix.
2. F° is not empty.

Assumption 1 is trivial, as A can always be reduced to meet this condition with operation counts bounded by polynomial complexity. Assumption 2 implies the existence of a central path. When Assumption 2 does not hold, it may be a problem for the algorithm discussed in this chapter. This case will be discussed in Chapters 7 and 8, where infeasible interior-point algorithms are the main topic.

It is well known that x ∈ R^n is an optimal solution of (1.4) if and only if x, λ, and s satisfy the KKT conditions (1.8), which are restated below:

Ax = b,    (5.1a)
A^T λ + s = c,    (5.1b)
(x, s) ≥ 0,    (5.1c)
x_i s_i = 0, i = 1, . . . , n.    (5.1d)

The first three conditions imply that x is a feasible solution of the primal problem and (λ, s) is a feasible solution of the dual problem. The last condition implies that the duality gap is zero. The central path C is parametrized by a scalar τ > 0 as follows. For each interior point (x, λ, s) ∈ C on the central path, there is a τ > 0 such that

Ax = b,    (5.2a)
A^T λ + s = c,    (5.2b)
(x, s) > 0,    (5.2c)
x_i s_i = τ, i = 1, . . . , n.    (5.2d)

Therefore, the central path is an arc in R^{2n+m} parametrized as a function of τ and is denoted as

C = {(x(τ), λ(τ), s(τ)) : τ > 0}.    (5.3)
As τ → 0, the central path (x(τ), λ (τ), s(τ)) represented by (5.2) approaches to a solution of linear programming represented by (1.4). Theoretical analyses demon strated [99] that searching along the central path is an ideal way to find optimizers.
However, there is no practical way to calculate the entire arc of the central path. All path-following algorithms discussed in Chapter 3 try (a) to search, from the current point (x, λ , s) along certain directions related to the tangent of the central path, to a new point that reduces the value of xT s (the duality gap) and simultaneously satisfies (5.2a), (5.2b), and (5.2c), thereby moving the current point towards the solution, and (b) to stay close to the central path, thereby being able to make good progress in the next search. We will consider a central path-following algorithm that searches for the optimizers (located at the boundary of F) along an arc that approximates the central path C ∈ F o ⊂ F.
5.1.1 Ellipsoidal approximation of the central path

First, the ellipse in three-dimensional space [18] is extended to (2n + m)-dimensional space and denoted by E. This ellipse is used to approximate the central path C described by (5.2), where

E = {(x(α), λ(α), s(α)) : (x(α), λ(α), s(α)) = â cos(α) + b̂ sin(α) + ĉ},    (5.4)

â ∈ R^{2n+m} and b̂ ∈ R^{2n+m} are the axes of the ellipse, and ĉ ∈ R^{2n+m} is its center. Given a point y = (x, λ, s) = (x(α_0), λ(α_0), s(α_0)) ∈ E, which is close to or on the central path, we will determine â, b̂, ĉ, and α_0 such that the first and second order derivatives of (x(α_0), λ(α_0), s(α_0)) have the form as if they were on the central path (though they may not be on the central path). Therefore, we want the first and second order derivatives at (x(α_0), λ(α_0), s(α_0)) ∈ E to satisfy

[ A    0    0 ] [ ẋ ]   [ 0     ]
[ 0   A^T   I ] [ λ̇ ] = [ 0     ],    (5.5)
[ S    0    X ] [ ṡ ]   [ x ∘ s ]

[ A    0    0 ] [ ẍ ]   [ 0          ]
[ 0   A^T   I ] [ λ̈ ] = [ 0          ].    (5.6)
[ S    0    X ] [ s̈ ]   [ −2 ẋ ∘ ṡ  ]
Intuitively, this ellipse approximates the central path well when (x(α_0), λ(α_0), s(α_0)) is close to the central path and α_0 ± ε → α_0. To simplify the notation, let

y(α) = (x(α), λ(α), s(α)) = â cos(α) + b̂ sin(α) + ĉ.    (5.7)

Then,

ẏ(α) = (ẋ(α), λ̇(α), ṡ(α)) = −â sin(α) + b̂ cos(α),    (5.8)
ÿ(α) = (ẍ(α), λ̈(α), s̈(α)) = −â cos(α) − b̂ sin(α).    (5.9)

It is straightforward to verify from (5.7), (5.8), and (5.9) that

â = −ẏ sin(α) − ÿ cos(α),    b̂ = ẏ cos(α) − ÿ sin(α),    ĉ = y + ÿ.    (5.10)
82
• Arc-Search Techniques for Interior-Point Methods
5.1.2 Search along the approximate central path Although one can search for a better feasible point with reduced duality gap along the ellipse defined by (5.7) which needs to compute ba, bb, andbc, we will use a sim plified formula that reduces the operation counts slightly and is more convenient for convergence analysis. Denote ⎡ ⎤ ⎡ ⎤ −x˙ sin(α) − x¨ cos(α) ax ba = ⎣ aλ ⎦ = −y˙ sin(α) − y¨ cos(α) = ⎣ −λ˙ sin(α) − λ¨ cos(α) ⎦ , as −s˙ sin(α) − s¨ cos(α) ⎡ ⎤ ⎡ ⎤ x˙ cos(α) − x¨ sin(α) bx bb = ⎣ bλ ⎦ = y˙ cos(α) − y¨ sin(α) = ⎣ λ˙ cos(α) − λ¨ sin(α) ⎦ , bs s˙ cos(α) − s¨ sin(α) and
⎡
⎤ ⎡ ⎤ x + x¨ cx bc = ⎣ cλ ⎦ = y + y¨ = ⎣ λ + λ¨ ⎦ . cs s + s¨
Let x(α) and s(α) be the updated x and s after the search, this means, x = x(α0 ) = ax cos(α0 ) + bx sin(α0 ) + cx . Using this relation, we have
= = = = =
(5.11) x(α) = ax cos(α0 − α) + bx sin(α0 − α) + cx ax (cos(α0 ) cos(α) + sin(α0 ) sin(α)) + bx (sin(α0 ) cos(α) − cos(α0 ) sin(α)) +cx − cx cos(α) + cx cos(α) x cos(α) + ax sin(α0 ) sin(α) − bx cos(α0 ) sin(α) + cx (1 − cos(α)) x cos(α) − (x˙ sin(α0 ) + x¨ cos(α0 )) sin(α0 ) sin(α) −(x˙ cos(α0 ) − x¨ sin(α0 )) cos(α0 ) sin(α) + (x + x¨ )(1 − cos(α)) x − x˙ (sin2 (α0 ) sin(α) + cos2 (α0 ) sin(α)) +x¨ (− sin(α0 ) cos(α0 ) sin(α) + sin(α0 ) cos(α0 ) sin(α) + (1 − cos(α))) x − x˙ sin(α) + x¨ (1 − cos(α)). (5.12)
Similarly, s(α) = s−s˙ sin(α)+s¨(1−cos(α)), λ (α) = λ −λ˙ sin(α)+ λ¨ (1−cos(α)). (5.13) As pointed out above, (5.12) and (5.13) do not explicitly depend on α0 . We sum marize the above discussion as the following Theorem 5.1 Let (x(α), λ (α), s(α)) be an arc defined by (5.4) passing through a point (x, λ , s) ∈ E, and its first and second derivatives at (x, λ , s) be (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨), which are defined by (5.5) and (5.6). Then, an ellipsoidal approximation of the central path is given by x(α) = x − x˙ sin(α) + x¨ (1 − cos(α)), (5.14)
λ(α) = λ − λ̇ sin(α) + λ̈(1 − cos(α)),    (5.15)
s(α) = s − ṡ sin(α) + s̈(1 − cos(α)).    (5.16)
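A direct way to appreciate Theorem 5.1 is to code the update. The following sketch applies (5.14)-(5.16) for a given α; it assumes the derivative vectors are already available from solving (5.5)-(5.6), and all names are ours. Anticipating (5.18), for such directions the duality gap shrinks exactly by the factor (1 − sin α).

```python
import numpy as np

def arc_point(x, lam, s, xdot, ldot, sdot, xddot, lddot, sddot, alpha):
    """Evaluate the ellipsoidal approximation (5.14)-(5.16) at angle alpha."""
    sa, ca = np.sin(alpha), 1.0 - np.cos(alpha)
    x_a = x - sa * xdot + ca * xddot
    lam_a = lam - sa * ldot + ca * lddot
    s_a = s - sa * sdot + ca * sddot
    return x_a, lam_a, s_a

# For directions satisfying (5.5)-(5.6), x(alpha)^T s(alpha) = (1 - sin(alpha)) x^T s,
# so the duality measure decreases monotonically as alpha grows toward pi/2; see (5.18).
```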
Assuming (x, s) > 0, one can easily see that if ẋ/x, ẍ/x, ṡ/s, and s̈/s are bounded by some constants, and if α is small enough, then x(α) > 0 and s(α) > 0.

Lemma 5.1 Let ẋ, ṡ, ẍ, and s̈ be the solutions of (5.5) and (5.6). Then, ẋ^T ṡ = 0, ẍ^T ṡ = 0, ẋ^T s̈ = 0, ẍ^T s̈ = 0.

Proof 5.1 Pre-multiplying ẋ^T or ẍ^T to the second rows of (5.5) and (5.6), and using the first rows of (5.5) and (5.6), gives the results.

Denote the duality measure by

µ = x^T s/n.    (5.17)
We will show that searching along the ellipse, at least for a small step, will reduce the T
duality measure, i.e., µ(α) = x(α)n s(α) < µ. If (x(α), s(α)) > 0 holds in all iterations, reducing the duality measure to zero means approaching to the solution of the linear programming. Notice that µ(α) = = =
[x − x˙ sin(α) + x¨ (1 − cos(α))]T [s − s˙ sin(α) + s¨(1 − cos(α))] n T T T [x s − sin(α)(x s˙ + x˙ s)] n µ(1 − sin(α)) (5.18)
holds for any choice of α ∈ [0, π2 ] due to Lemma 5.1 and equation (5.5), this means that the larger the α is, the more improvement the µ(α) will be.
5.1.3 A polynomial arc-search algorithm Let θ ∈ (0, 0.5), and N2 (θ ) = {(x, λ , s) : Ax = b, AT λ + s = c, (x, s) > 0, �x ◦ s − µe� ≤ θ µ}. (5.19) Similar to a strategy used in Chapter 3 (which was developed in [93]), we present a predictor-corrector type polynomial algorithm which uses N2 (θ ) and N2 (2θ ). The algorithm starts the iterate inside N2 (θ ) and restricts the arc-search in N2 (2θ ). After the search finds an iterate with smaller duality measure, a corrector step brings the iterate from N2 (2θ ) back to N2 (θ ) without changing the duality measure.
84
� Arc-Search Techniques for Interior-Point Methods
The following lemma indicates that if the initial iterate satisfies the equality con straints in N2 (θ ), a search along the ellipse will satisfy the equality constraints in N2 (2θ ). Lemma 5.2 Let (x, λ , s) be a strictly feasible point of (1.4) and (1.5), (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) satisfy (5.5) and (5.6), (x(α), λ (α), s(α)) be calculated using (5.12) and (5.13), then the following conditions hold. Ax(α) = b, AT λ (α) + s(α) = c. Proof 5.2
Direct calculation verifies the result.
Before we present the algorithm, we will introduce several simple lemmas that will be used repeatedly in the convergence analysis. Lemma 5.3 Let p > 0, q > 0, and r > 0. If p + q ≤ r, then pq ≤
r2 4.
Lemma 5.4 For α ∈ [0, π2 ], sin(α) ≥ sin2 (α) = 1 − cos2 (α) ≥ 1 − cos(α) ≥
1 2 sin (α). 2
(5.20)
The next Lemma shows that xx˙ , xx¨ , ss˙ , and ss¨ are bounded by some constants. The lemma also gives some useful estimations and notations to be used later. Lemma 5.5 Let (x˙ , λ˙ , s˙) be calculated by (5.5) and (x¨ , λ¨ , s¨) be calculated by (5.6). Assuming that T (x, λ , s) ∈ N2 (θ ), µ = xn s , then � x˙ �2 (1 + θ ) � s˙ �2 (1 + θ ) � � � � n, � � ≤ n, � � ≤ x (1 − θ ) s (1 − θ ) �3 �3 � � � x¨ �2 � s¨ �2 1+θ 1+θ � � � � n2 , n2 , � � ≤4 � � ≤4 x 1−θ s 1−θ � x˙ ◦ s˙ � (1 + θ )2 � x¨ ◦ s¨ � 4(1 + θ )4 � � � � n, � n2 , � �≤ �≤ (1 − θ )3 µ (1 − θ ) µ � x¨ ◦ s˙ � 2(1 + θ )3 3 � x˙ ◦ s¨ � 2(1 + θ )3 3 � � � � n2 , � n2 . � �≤ �≤ (1 − θ )2 µ (1 − θ )2 µ
(5.21a) (5.21b) (5.21c) (5.21d)
A Feasible Arc-Search Algorithm for LP
•
85
ˆ denote the orthonormal basis of the null space of A, i.e., AA ˆ = 0. Proof 5.3 Let A It is easy to see from the first row of (5.5) that there exists a vector v such that x˙ ˆ = X−1 Av. x
(5.22)
From the second row and the third row of (5.5), we have s˙ s˙ x˙ + = e. S−1 AT λ˙ + = 0, s s x
(5.23)
Combining (5.22) and (5.23) gives � −1 � ˆ , −S−1 AT X A
�
v λ˙
� = e.
(5.24)
ˆ are full rank, we have Since both A and A � � ˆ )−1 A ˆ T S � −1 ˆ T SX−1 A � (A ˆ , −S−1 AT = I. X A −1 T −1 −(AXS A ) AX Taking the inverse in (5.24) gives � � � � ˆ )−1 A ˆ TS ˆ T SX−1 A v (A = e. λ˙ −(AXS−1 AT )−1 AX Substituting this relation to (5.22) gives � � x˙ ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T Se = I − S−1 AT (AXS−1 AT )−1 AX e, = X−1 A x
(5.25)
� � s˙ ˆ (A ˆ )−1 A ˆ T S e. ˆ T SX−1 A = S−1 AT (AXS−1 AT )−1 AXe = I − X−1 A s
(5.26)
and
Since (x, λ , s) ∈ N2 (θ ), we have (1 − θ )µI ≤ XS ≤ (1 + θ )µI. Repeatedly using this estimation and (5.25) yields � � � x˙ �2 � � x
ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T X−1 X−1 A ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T Se = eT ST A ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T SX−1 A ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T Se ≤ ((1 − θ )µ)−1 eT ST A ˆ (A ˆ T SX−1 A ˆ )−1 A ˆ T Se = ((1 − θ )µ)−1 eT SA (1 + θ ) T ˆ ˆ T 2 ˆ −1 ˆ T ≤ e SA(A S A) A Se. (1 − θ )
Using QR decomposition ˆ =Q SA
�
R1 0
�
� = [Q1 , Q2 ]
R1 0
� = Q1 R1 ,
86
• Arc-Search Techniques for Interior-Point Methods
where Q1 and Q2 are orthonormal matrices and orthogonal to each other, we have � x˙ �2 (1 + θ ) (1 + θ ) (1 + θ ) � � eT Q1 QT1 e ≤ IeI2 = n. � � ≤ x (1 − θ ) (1 − θ ) (1 − θ )
(5.27)
� s˙ �2 (1 + θ ) (1 + θ ) � � eT XAT (AX2 AT )−1 AXe ≤ n. � � ≤ s (1 − θ ) (1 − θ )
(5.28)
� x˙ s˙ � � x˙ �� s˙ � (1 + θ ) � � � �� � n. � ◦ � ≤ � �� � ≤ x s x s (1 − θ )
(5.29)
Similarly,
From these two inequalities, we have
Since Ix ◦ s − µeI ≤ θ µ, for any i, it follows that (1 − θ )µ ≤ xi si ≤ (1 + θ )µ, or equivalently, xi si maxi xi si mini xi si xi si ≤ ≤µ≤ ≤ . (5.30) 1+θ 1+θ 1−θ 1−θ Using (5.29), we have � � � � � � � � � x˙ s˙ � (1 + θ )2 �x˙ ◦ s˙� �x˙ ◦ s˙� � � ≤ (1 + θ ) ≤ (1 + θ )� ◦ � ≤ n. maxi (xi si ) 1−θ µ x s ˆ Let φ = −2 xx˙ ◦ ss˙ . From (5.6), there exists a vector v such that xx¨ = X−1 Av, s¨ x¨ s¨ −1 T ¨ S A λ + s = 0, and x + s = φ . Following a proof similar to the one used above, it is easy to get � x¨ �2 1 + θ � � Iφ I2 � � ≤ x 1−θ � � 1+θ � x˙ s˙ � � �2 ◦ ≤ �−2 � 1−θ x s �3 � 1+θ ≤4 n2 , (5.31a) 1−θ � s¨ �2 1 + θ � � Iφ I2 � � ≤ s 1−θ �3 � 1+θ ≤4 n2 . (5.31b) 1−θ From (5.30), (5.27), (5.28), and (5.31), we get � � � � � x˙ s¨ � � x˙ �� s¨ � �x˙ ◦ s¨� (1 + θ )3 3 � � � �� � ≤ (1 + θ )� ◦ � ≤ (1 + θ )� �� � ≤ 2 n2 . (1 − θ )2 µ x s x s � � � � � x¨ s˙ � � x¨ �� s˙ � �x¨ ◦ s˙� (1 + θ )3 3 � � � �� � n2 . ≤ (1 + θ )� ◦ � ≤ (1 + θ )� �� � ≤ 2 (1 − θ )2 µ x s x s
A Feasible Arc-Search Algorithm for LP
� � � � �x¨ ◦ s¨� µ
�
87
� x¨ s¨ � � x¨ �� s¨ � (1 + θ )4 2 � � � �� � n . ≤ (1 + θ )� ◦ � ≤ (1 + θ )� �� � ≤ 4 (1 − θ )3 x s x s
This finishes the proof. For the sake of simplicity in the analysis, we assume that an initial point inside N2 (θ ) is available (this can be achieved by calling Algorithm 1 of [20] prior to calling the algorithm below). The proposed arc-search algorithm is given as follows: Algorithm 5.1 Data: A, b, c, θ = 0.292, ε > 0, initial point (x0 , λ 0 , s0 ) ∈ N2 (θ ), and µ0 = for iteration k = 0, 1, 2, . . .
T
x0 s0 n .
Step 1: If µk < ε, and �rb � = �Axk − b� ≤ ε, �rc � = �AT λ k + sk − c� ≤ ε, �rt � = �xk sk e − µk e� ≤ ε. (5.32)
hold, stop. Otherwise continue. Step 2: Solve the linear systems of equations (5.5) and (5.6) to get (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨). Step 3: Find the smallest positive sin(α) that satisfies quartic polynomial in terms of sin(α) � � �� � � �� �� �� � � � � � � � � q(α) = �x¨ ◦ s¨� + �x˙ ◦ s˙� sin4 (α) + �x˙ ◦ s¨� + �x¨ ◦ s˙� sin3 (α) + θ µk sin(α) − θ µk = 0. (5.33)
Set (x(α), λ (α), s(α)) = (xk , λ k , sk )−(x˙ , λ˙ , s˙) sin(α)+(x¨ , λ¨ , s¨)(1−cos(α)), (5.34) and (5.35) µ(α) = µk (1 − sin(α)). Step 4: Calculate (Δx, Δλ , Δs) by solving ⎡ ⎤⎡ ⎤ ⎡ ⎤ A 0 0 Δx 0
⎣ 0 AT ⎦ . (5.36)
I ⎦ ⎣ Δλ ⎦ = ⎣
0
S(α) 0 X(α) Δs µ(α)e − x(α) ◦ s(α) Set (xk+1 , λ k+1 , sk+1 ) = (x(α), λ (α), s(α)) + (Δx, Δλ , Δs) T
and µk+1 = end (for)
xk+1 sk+1 . n
Set k + 1 → k. Go back to Step 1.
(5.37)
88
• Arc-Search Techniques for Interior-Point Methods
Using (5.5), (5.6), (5.34), and Lemma 5.1, it is easy to check that (5.35) satisfies the definition of duality measure. x(α)T s(α) = (xk − x˙ sin(α) + x¨ (1 − cos(α))T (sk − s˙ sin(α) + s¨(1 − cos(α)) T
= xk sk − (x˙ T sk + s˙T xk ) sin(α) + (x¨ T sk + s¨T xk )(1 − cos(α)) T
T
= xk sk − xk sk sin(α) − 2x˙ T s˙(1 − cos(α)) T
= xk sk (1 − sin(α)).
Dividing both sides by n shows the claim. Remark 5.1 For any feasible point, if α = π2 , then µ(α) = 0 means an optimal solution is obtained. In the remaining discussion in this chapter, we exclude this simple case and assume that α < π2 . Step 3 of the algorithm finds a step size sin(α) such that (x(α), λ (α), s(α)) ∈ N2 (2θ ). Lemma 5.6 will show that the step size sin(α) can be determined by the smallest positive solution of (5.33). Since (5.33) is a quartic polynomial which ac cepts analytic solutions [113], the computational cost of solving (5.33) is negligible. It is easy to see that the quartic polynomial (5.33) is a monotonic increasing function of α ∈ [0, π2 ], for (xk , sk ) > 0 (which will be shown in Lemma 5.7), q(0) < 0, and q( π2 ) ≥ 0. Therefore, q(α) has only one real solution in [0, π2 ], and α = sin−1 (α) is well-defined. In the rest of this section, we will show that if (xk , λ k , sk ) ∈ N2 (0.292), then (x(α), λ (α), s(α)) ∈ N2 (0.584) and (xk+1 , λ k+1 , sk+1 ) ∈ N2 (0.292). We will also estimate the size of α, which will be used to prove the polynomiality at the end of this section. Lemma 5.6 Let (xk , λ k , sk ) ∈ N2 (θ ), (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) be defined by (5.5) and (5.6), sin(α) > 0 be the solution of (5.33) and (x(α), λ (α), s(α)) be updated using (5.34). Then, (x(α), λ (α), s(α)) ∈ N2 (2θ ). Proof 5.4 First, it is worthwhile to note that Lemmas 5.1 and 5.2 hold in this case. From (5.33), we have � � � � � � � � � � � � � � � � �x¨ ◦ s¨� + �x˙ ◦ s˙� sin4 (α) + �x˙ ◦ s¨� + �x¨ ◦ s˙� sin3 (α) = θ µk (1 − sin(α)) = θ µ(α). Also, the following identity is useful. −2x˙ ◦ s˙(1 − cos(α)) + x˙ ◦ s˙ sin2 (α) = x˙ ◦ s˙(−2 + 2 cos(α) + sin2 (α)) = x˙ ◦ s˙(−1 + 2 cos(α) − cos2 (α)) = −x˙ ◦ s˙(1 − cos(α))2 .
(5.38)
A Feasible Arc-Search Algorithm for LP
�
89
Using this identity, relations (5.5), (5.6), (5.34), (5.35), and Lemma 5.4, we have �x(α) ◦ s(α) − µ(α)e� =
�[xk − x˙ sin(α) + x¨ (1 − cos(α))] ◦ [sk − s˙ sin(α) + s¨(1 − cos(α))] − µk (1 − sin(α))e�
=
�(xk ◦ sk − µk e)(1 − sin(α)) + (xk ◦ s¨ + sk ◦ x¨ )(1 − cos(α)) + x˙ ◦ s˙ sin2 (α) −(x˙ ◦ s¨ + s˙ ◦ x¨ ) sin(α)(1 − cos(α)) + x¨ ◦ s¨(1 − cos(α))2 � k
(5.39)
k
≤
(1 − sin(α))�x ◦ s − µk e�
≤
˙ − cos(α))2 − (x˙ ◦ s¨ + s˙ ◦ x¨ ) sin(α)(1 − cos(α))� +�(¨x ◦ s¨ − x˙ ◦ s)(1 � � �� � � �� �� �� � � � � � � � � (1 − sin(α))θ µk + �x¨ ◦ s¨� + �x˙ ◦ s˙� sin4 (α) + �x˙ ◦ s¨� + �x¨ ◦ s˙� sin3 (α)
=
θ µ(α) + θ µ(α) = 2θ µ(α).
(5.40)
Since xi (α )si (α ) ≥ (1 − 2θ )µ (α ) = µk (1 − 2θ )(1 − sin(α )) > 0, x(0) = x > 0 and s(0) = s > 0 means x(α) > 0 and s(α) > 0. This finishes the proof.
The following lemma is similar to Lemma 3.14. Lemma 5.7 Let (x(α), λ (α), s(α)) ∈ N2 (2θ ). Then, for 0 < θ ≤ 0.292, we have (xk+1 , λ k+1 , sk+1 ) ∈ N2 (θ ) and µk+1 = µ(α). Denote (xk+1 (t), sk+1 (t)) = (x(α)+tΔx, s(α)+tΔs)), and (xk+1 , sk+1 ) =
Proof 5.5
T
T
(xk+1 (1), sk+1 (1)). Using the third row of (5.36), we have x(α) Δs+s(α) n the second row of (5.36), we have ΔxT Δs = 0. Therefore, we have µk+1 (t) =
Δx
= 0. Using
(x(α) + tΔx)T (s(α) + tΔs) x(α)T s(α) = = µ(α). n n
1
1
1
Let D = X(α) 2 S(α)− 2 . Pre-multiplying (X(α)S(α))− 2 in the last row of (5.36) yields 1 DΔs + D−1 Δx = (X(α)S(α))− 2 (µ(α)e − X(α)S(α)e).
Denote u = DΔs and v = D−1 Δx. Using Lemma 3.2 and the assumption that (x(α), λ (α), s(α)) ∈ N2 (2θ ), we have �Δx ◦ Δs�
3
1
= �u ◦ v� ≤ 2− 2 �(X(α)S(α))− 2 (µ(α)e − X(α)S(α)e)�2 n 2 2 3 � (µ(α) − xi (α)si (α)) 3 �µ(α)e − x(α) ◦ s(α)� = 2− 2 ≤ 2− 2 xi (α)si (α) mini xi (α)si (α) i=1
− 32
≤ 2
2 1 θ µ(α) 1 (2θ )2 µ(α)2 θ2 = 22 = 22 µk+1 . (1 − 2θ )µ(α) (1 − 2θ ) (1 − 2θ )
(5.41)
90
• Arc-Search Techniques for Interior-Point Methods
Using this result, (5.37), the last row of (5.36), we have
= ≤
� � � � lxk+1 (t) ◦ sk+1 (t) − µk+1 (t)el = � x(α) + tΔx ◦ s(α) + tΔs − µ(α)e� � � � � 2 2 �(1 − t) x(α) ◦ s(α) − µ(α)e + t Δx ◦ Δs� ≤ (1 − t)2θ µ(α) + t lΔx ◦ Δsl � � 1 θ2 (1 − t)2θ + 2 2 t 2 µk+1 := h(t, θ )µk+1 . (5.42) (1 − 2θ ) 1
2
θ Taking t = 1 gives Ixk+1 ◦ sk+1 − µk+1 eI ≤ 2 2 (1−2θ ) . It is easy to verify that for 0 < θ ≤ 0.292, 1 θ2 22 ≤ θ. (1 − 2θ )
Since, for 0 < θ ≤ 0.292 and t ∈ [0, 1], 0 < h(t, θ ) ≤ h(t, 0.292) < 1, we have, for α ∈ [0, π2 ), xik+1 (t)sik+1 (t) ≥ (1 − h(t, θ ))µk+1 = (1 − h(t, θ ))µ(α) = (1 − h(t, θ ))(1 − sin(α))µk > 0.
Therefore, (xk+1 , sk+1 ) > 0. This finishes the proof. The choice of θ = 0.25 is used in Chapter 3. Taking a larger θ will allow a longer step size in arc-search, which may reduce the number of iterations to converge to the optimal solution. Lemma 5.8 Let 0 < θ ≤ 0.292, and sin(α), α ∈ [0, π/2], be the positive real solution of (5.33). Then, 1 sin(α) ≥ θ 2 n− 2 . Proof 5.6 Since q(sin(α)) is a monotonic increasing function of sin(α) ∈ [0, 1] 1 with q(sin(0)) < 0 and q(sin( π2 )) ≥ 0, we need only to show that q(θ 2 n− 2 ) < 0. 1
Using Lemma 5.5, for sin(α) ≤ θ 2 n− 2 , we have � � � � � � � � � � � � � � � � q(α) = �x¨ ◦ s¨� + �x˙ ◦ s˙� sin4 (α) + �x˙ ◦ s¨� + �x¨ ◦ s˙� sin3 (α) ≤
+θ µk sin(α) − θ µk � � 4(1 + θ )4 2 (1 + θ )2 n + n θ 8 n−2 µk (1 − θ )3 (1 − θ )
3 1 4(1 + θ )3 3 n 2 θ 6 n− 2 µk + θ µk θ 2 n− 2 − θ µk (1 − θ )2 � 7 � 4θ (1 + θ )4 θ 7 (1 + θ )2 4θ 5 (1 + θ )3 θ 2 = θ µk + + + √ −1 (1 − θ )n (1 − θ )2 (1 − θ )3 n := θ µk f (θ , n). (5.43)
+
A Feasible Arc-Search Algorithm for LP
�
91
Clearly, f (θ , n) is a monotonic increasing function of θ ∈ (0, 1), and a monotonic decreasing function of n ∈ {1, 2, . . .}. For θ ≤ 0.292, and n ≥ 1, f (θ , n) < 0, hence, 1 1 for sin(α) ≤ θ 2 n− 2 , q(sin(α)) < 0. This proves sin(α) ≥ θ 2 n− 2 . � � � � � � � � � � � � � � � � Remark 5.2 Clearly, if � xx˙ �, � xx¨ �, � ss˙ �, � ss¨ � are smaller than some constant inde pendent of m and n, then the arc-search algorithm proposed above would reduce the duality measure at a constant rate independent of m and n, a nice feature (polynomial bound would be independent of n) that appears to hold according to our numerical test, but we cannot prove it. Let (x∗ , λ ∗ , s∗ ) be any solution of (5.1). Let index sets B, N be defined as B = { j ∈ {1, . . . , n} | x∗j �= 0}.
x∗
N = { j ∈ {1, . . . , n} | s∗j �= 0}.
(5.44) (5.45) (λ ∗ , s∗ )
Let be a solution of the primal linear programming and be a solution of the dual linear programming, such that, B ∩ N = ∅ and B ∪ N = {1, . . . , n}, i.e., x∗ ◦ s∗ = 0, and x∗ + s∗ > 0. An optimal solution with this property is called strictly complementary. Now we are ready to state our main result. Theorem 5.2 The sequence (xk , λ k , sk ) generated by Algorithm 5.1 globally converges to a set of limit points (x∗ , λ ∗ , s∗ ). Moreover, Algorithm 5.1 is a polynomial algorithm with 1 polynomial complexity bound of O(n 2 log(1/ε)). For every limit point (x∗ , λ ∗ , s∗ ), x∗ is the optimal solution of the primal problem, (λ ∗ , s∗ ) is the optimal solution of the dual problem, and (x∗ , s∗ ) is strictly complementary. Proof 5.7 In view of Lemma 5.8, (5.18), and (1.4), Algorithms 5.1 converges in 1 polynomial time with the complexity bound of O(n 2 log(1/ε)). The rest of the proof is to show that the solution is complementary. In view of Assumption 2 in this chap ter, F 0 is not empty. Theorem 1.1 shows that the optimal solution set is bounded, i.e., there is a constant M such that �(x∗ , s∗ )� ≤ M.
P∗
be the set of all primal optimizers Let (λ ∗ , s∗ ), and � ε(A, b, c) = min
x∗ ,
D∗
(5.46) be the set of all dual optimizers �
min sup xi∗ , min sup s∗i i∈B x∗ ∈P∗ i∈N (λ ∗ ,s∗ )∈D∗
.
For any feasible (x, λ , s) ∈ F o with xi si > γ µ, we have Ax = Ax∗ = b and AT λ +s = AT λ ∗ + s∗ = c. Therefore, (x − x∗ )T (s − s∗ ) = −(x − x∗ )T AT (λ − λ ∗ ) = 0.
92
� Arc-Search Techniques for Interior-Point Methods
Since Theorem 1.3 implies xi ∗ = 0 for i ∈ N and s∗i = 0 for i ∈ B, we can rearrange this expression to obtain � � nµ = xT s∗ + sT x∗ = xi s∗i + si xi∗ i∈N
i∈B
Since each of these summations is nonnegative, each term is bounded by nµ. Hence, for any i ∈ N with s∗i > 0, it must have 0 < xi s∗i ≤ nµ, → xi ≤
nµ . s ∗i
This relation holds for any (x∗ , λ ∗ , s∗ ) with s∗i > 0, which implies 0 < xi ≤
nµ . sup(λ ∗ ,s∗ )∈D∗ s∗i
This relation holds for any xi with i ∈ N , which implies 0 < max xi ≤ i∈N
nµ . mini∈N sup(λ ∗ ,s∗ )∈D∗ s∗i
Similarly, 0 < max si ≤ i∈B
nµ . mini∈B supx∗ ∈P∗ xi∗
Adjoining this two inequalities gives � � max max xi , max si ≤ i∈N
Setting C1 =
ε(A,b,c) n
i∈B
nµ . ε(A, b, c)
given
0 < xi ≤ µ/C1 (i ∈ N ),
0 < si ≤ µ/C1 (i ∈ B).
(5.47)
Theorem 1.3 asserts the existence of a strictly complementary solution, which indi cates that ε(A, b, c) > 0 and so is C1 . Since xi si ≥ γ µ for all i = 1, 2, . . . , n, from (5.47), we have γµ γµ ≥ ≥ C1 γ (i ∈ N ), µ/C1 xi γµ γµ xi ≥ ≥ ≥ C1 γ (i ∈ B). µ/C1 si
si ≥
(5.48a) (5.48b)
Therefore, we conclude that every limit point is a strictly complementary primarydual solution of the linear programming, i.e., s∗ i ≥ C2 γ (i ∈ N ),
xi∗ ≥ C2 γ (i ∈ B).
(5.49)
5.2 Numerical Examples
In this section, a simple problem is used as an example to show the central path, the ellipse approximation, and the arc-search in every iteration by plots. From these plots, one can intuitively see that searching along an ellipse is more attractive than searching along a straight line. Some larger scale test problems are also provided in order to show the efficiency of the proposed algorithm.
5.2.1 A simple illustrative example

Let us consider

min x_1,    s.t. x_1 + x_2 = 5, x_1 ≥ 0, x_2 ≥ 0.

The central path (x, λ, s) satisfies the following conditions:

x_1 + x_2 = 5,    λ + s_1 = 1,    λ + s_2 = 0,    x_1 s_1 = µ,    x_2 s_2 = µ.

The optimizer is given by x_1 = 0, x_2 = 5, λ = 0, s_1 = 1, and s_2 = 0. The central path for this problem is given analytically as

λ = [5 − 2µ − √((5 − 2µ)² + 20µ)]/10,    s_1 = 1 − λ,    s_2 = −λ,    x_1 s_1 = µ,    x_2 s_2 = µ.

The central path is an arc in the 5-dimensional space (λ, x_1, s_1, x_2, s_2). If we project the central path onto the 2-dimensional space spanned by (s_1, x_1), it is an arc in 2-dimensional space. Similarly, we can project the ellipse in 5-dimensional space onto the same 2-dimensional space spanned by (s_1, x_1). Figure 5.1 shows all iterations of Algorithm 5.1 in the two-dimensional subspace spanned by (s_1, x_1). The first iteration moves the iterate very close to the solution, and the remaining iterations have to be rescaled to show the details. In Figure 5.1, (ẋ, ṡ, λ̇) is calculated using (5.5), (ẍ, s̈, λ̈) is calculated using (5.6), the projected central path is the continuous line in black, the projected ellipse approximations are the dotted lines in blue in every iteration (they may sometimes look like a continuous line because many dots are used), the initial point (s_1^0, x_1^0) is marked by 'x' in red; after moving 'x' towards the central path, (s_1^k, x_1^k) is marked by 'o' in red; after arc-search, the point (s_1^k, x_1^k) on the ellipse is marked by '+' in green; and the optimal solution (s^*, x^*) is marked by '*' in red. More detailed information on the second iteration, the third iteration, and the final result is presented in Figures 5.2, 5.3, and 5.4, which are amplified parts of Figure 5.1. It is clear that after the first iteration, the central path is close to a straight line, and the ellipses approximate the central path very well, regardless of whether the central path is close to a straight line or not. We expect that the developed algorithm is
Figure 5.1: Arc-search for the simple example.
Figure 5.2: Arc-search of the second iteration for the simple example.
more efficient in the early stage than MPC and other interior-point algorithms based on line search when the central path is not close to the straight line, as shown in Figure 5.1.
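The analytic expressions of Section 5.2.1 are easy to tabulate. A short sketch that traces the projected central path (s₁(µ), x₁(µ)) of the simple example follows; it is a plain numerical check, with names chosen by us.

```python
import numpy as np

def central_path_point(mu):
    """Analytic central path of min x1 s.t. x1 + x2 = 5, x >= 0, for a given mu > 0."""
    lam = (5 - 2 * mu - np.sqrt((5 - 2 * mu) ** 2 + 20 * mu)) / 10
    s1, s2 = 1 - lam, -lam
    x1, x2 = mu / s1, mu / s2
    return lam, x1, s1, x2, s2

for mu in [1.0, 0.1, 0.01, 1e-4]:
    lam, x1, s1, x2, s2 = central_path_point(mu)
    print(f"mu={mu:g}: (s1, x1) = ({s1:.4f}, {x1:.4f})")   # approaches (1, 0) as mu -> 0
```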
5.2.2 Some Netlib test examples

The algorithm developed in this chapter is implemented in the MATLAB function arc2. Numerical tests have been performed for linear programming problems in the Netlib LP library. Cartis and Gould [20] have classified the Netlib LP problems into two categories: problems with a strict interior point and problems without a strict interior point. The test is performed for problems with a strict interior point that are expressed in standard form with A being a full rank matrix. The selected problems are solved using the implemented MATLAB function and the linprog function in the MATLAB Optimization Toolbox. The iterations used to solve these problems are compared and
Figure 5.3: Arc-search of the third iteration for the simple example.
Figure 5.4: Arc-search of the final result for the simple example.
The iteration numbers are listed in Table 5.1. Only two Netlib problems that are classified as problems with a strict interior point and are presented in standard form are not included in the table, because the PC used in the test did not have enough memory to handle problems of that size. In the implementation, the stopping criterion used in Algorithm 5.1 is the same as in linprog [172]:

    ‖r_b‖ / max{1, ‖b‖} + ‖r_c‖ / max{1, ‖c‖} + µ / max{1, ‖c^T x‖, ‖b^T λ‖} < 10^{-8}.

For all problems, the initial point is set to x = s = e. The initial point used in linprog is essentially the same as was used in [83], with some minor modifications (see [172]).
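The stopping test is cheap to evaluate. The following fragment is a sketch of that check, assuming the current iterate (x, λ, s) and the data (A, b, c) are available as MATLAB arrays (variable names are illustrative, not taken from arc2):

    % Sketch of the linprog-style stopping test quoted above.
    rb   = A*x - b;                           % primal residual
    rc   = A'*lambda + s - c;                 % dual residual
    mu   = (x'*s) / length(x);                % duality measure
    crit = norm(rb)/max(1, norm(b)) + ...
           norm(rc)/max(1, norm(c)) + ...
           mu/max([1, abs(c'*x), abs(b'*lambda)]);
    converged = (crit < 1e-8);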
Table 5.1: Iteration counts for test problems in Netlib and MATLAB.

    Problem     iterations (arc2)   iterations (linprog)   source
    AFIRO               4                    7             Netlib
    blend              10                    8             Netlib
    SCAGR25             5                   16             Netlib
    SCAGR7              6                   12             Netlib
    SCSD1              13                   10             Netlib
    SCSD6              17                   12             Netlib
    SCSD8              13                   11             Netlib
    SCTAP1             14                   17             Netlib
    SCTAP2             13                   18             Netlib
    SCTAP3             14                   18             Netlib
    SHARE1B             9                   22             Netlib

5.3 Concluding Remarks
This chapter proposes an arc-search interior-point path-following algorithm that searches for the optimizers along ellipses that approximate the central path. The main purpose of this chapter is to show that higher-order interior-point path-following algorithms can achieve the best polynomial bound by using the arc-search method, with the potential to be very efficient. The convergence analysis proved the first claim. Some preliminary numerical results are presented to demonstrate that the proposed algorithm has numerical merit in comparison to a well-implemented algorithm based on Mehrotra's method, which is known to be very efficient but has no convergence result.
Chapter 6
A MTY-Type Infeasible Arc-Search Algorithm for LP
From Chapters 2, 3, and 4, we know that there are three strategies that can be used to improve the efficiency of interior-point algorithms, namely, (a) search for optimizers in a larger neighborhood, (b) use an infeasible initial point, and (c) approximate the central path using higher-order derivatives. However, when the line search method is used with interior-point methods, these strategies adversely affect the polynomial bounds, i.e., the theoretical analysis contradicts the computational experience [133]. In Chapter 5, we introduced an arc-search method and showed that the higher-order method does not undermine the polynomial bound if arc-search is used, because the lowest bound O(√n L) can be achieved for the algorithm that uses arc-search (higher-order derivatives). Preliminary computational experience also demonstrated promising results. In this chapter, we will incorporate arc-search into another important strategy, i.e., starting from an infeasible initial point. We show that, for a higher-order infeasible-interior-point method using arc-search, the polynomial bound is O(nL), which is as good as the infeasible first-order method using line search [144] and the best bound for infeasible-interior-point algorithms, obtained by [90]. This method also improves the polynomial bound obtained by Yang et al. [149]. In Chapter 8, we will further improve this bound to O(√n L). The material of this chapter is based on [161, 67].
6.1 An Infeasible Predictor-Corrector Algorithm
The standard form of linear programming considered in this chapter is the same as (1.4). The dual problem of (1.4) is given in (1.5). To simplify the analysis, we assume that the rank of matrix A equals m. However, this requirement is not essential because A can always be reduced to a full rank matrix in polynomial time. We use S to denote the set of all the optimal solutions (x*, λ*, s*) of (1.4) and (1.5). It is well known that the primal-dual vector (x, λ, s) is an optimal solution of (1.4) and (1.5) if and only if it satisfies the KKT conditions (5.1). We will consider the residuals of the primal and dual problems defined in Chapter 4:

    r_b^k = A x^k − b,   r_c^k = A^T λ^k + s^k − c.                                  (6.1)

Given a strictly positive current point (x^k, s^k) > 0, the infeasible predictor-corrector algorithm finds the solution of (1.4) approximately along a curve C(t) defined by the following system:

    A x(t) − b = t r_b^k,
    A^T λ(t) + s(t) − c = t r_c^k,
    x(t) ∘ s(t) = t x^k ∘ s^k,
    (x(t), s(t)) > 0,                                                                (6.2)

where t ∈ (0, 1]. As t → 0, (x(t), λ(t), s(t)) approaches the solution of (1.4). Since it is not easy to obtain C(t), an ellipse E (defined in (5.4)) in the (2n + m)-dimensional space will be used to approximate the curve defined by (6.2). Taking the derivatives with respect to t in (6.2) gives (see [153])

    [ A     0     0   ] [ ẋ ]   [ r_b^k     ]
    [ 0     A^T   I   ] [ λ̇ ] = [ r_c^k     ],                                      (6.3)
    [ S^k   0     X^k ] [ ṡ ]   [ x^k ∘ s^k ]

    [ A     0     0   ] [ ẍ ]   [ 0         ]
    [ 0     A^T   I   ] [ λ̈ ] = [ 0         ].                                      (6.4)
    [ S^k   0     X^k ] [ s̈ ]   [ −2 ẋ ∘ ṡ  ]

Here, X^k and S^k are diagonal matrices whose diagonal elements are given by x^k and s^k, respectively. We require the ellipse to pass through the same point (x^k, λ^k, s^k) on C(t) and to have the same derivatives given by (6.3) and (6.4). The ellipse can be derived similarly to the derivation of Theorem 5.1 and is given below.
Theorem 6.1
Let (x(α), λ(α), s(α)) be an arc defined by (5.4) passing through a point (x, λ, s) ∈ E ∩ C(t), and let its first and second derivatives at (x, λ, s) be (ẋ, λ̇, ṡ) and (ẍ, λ̈, s̈), which are defined by (6.3) and (6.4). Then, the ellipse approximation of (6.2) is given by

    x(α) = x − ẋ sin(α) + ẍ (1 − cos(α)),                                           (6.5)
    λ(α) = λ − λ̇ sin(α) + λ̈ (1 − cos(α)),                                          (6.6)
    s(α) = s − ṡ sin(α) + s̈ (1 − cos(α)).                                           (6.7)

Let the duality measure be defined as in (5.17), and define the neighborhood

    N_2^I(θ) := {(x, s) | (x, s) > 0, ‖x ∘ s − µ e‖ ≤ θ µ}.                          (6.8)
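The membership test in (6.8) is a one-line computation. The following fragment (an illustrative sketch with assumed variable names, not code from any of the book's packages) checks whether a strictly positive pair (x, s) lies in N_2^I(θ):

    % Check membership in the neighborhood N_2^I(theta) of (6.8); illustrative sketch.
    mu    = (x'*s) / length(x);                          % duality measure
    inN2I = all(x > 0) && all(s > 0) && norm(x.*s - mu) <= theta*mu;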
The proposed algorithm searches for an optimizer along the ellipse while staying inside N_2^I(θ).

Algorithm 6.1
Data: A, b, c, θ ∈ (0, 1/(2 + √2)], ε > 0, initial point (x^0, λ^0, s^0) ∈ N_2^I(θ).
for iteration k = 0, 1, 2, . . .

Step 1: If

    µ_k ≤ ε,                                                                         (6.9a)
    ‖r_b^k‖ = ‖A x^k − b‖ ≤ ε,                                                       (6.9b)
    ‖r_c^k‖ = ‖A^T λ^k + s^k − c‖ ≤ ε,                                               (6.9c)
    (x^k, s^k) > 0,                                                                  (6.9d)

stop. Otherwise continue.

Step 2: Solve the linear systems of equations (6.3) and (6.4) to get (ẋ, λ̇, ṡ) and (ẍ, λ̈, s̈).

Step 3: Find the largest positive α_k ∈ (0, π/2] such that for all α ∈ (0, α_k], (x(α), s(α)) > 0 and

    ‖x(α) ∘ s(α) − (1 − sin(α)) µ_k e‖ ≤ 2θ (1 − sin(α)) µ_k.                        (6.10)

Set

    (x(α_k), λ(α_k), s(α_k)) = (x^k, λ^k, s^k) − (ẋ, λ̇, ṡ) sin(α_k) + (ẍ, λ̈, s̈)(1 − cos(α_k)).   (6.11)
Step 4: Calculate (Δx, Δλ, Δs) by solving

    [ A        0     0      ] [ Δx ]   [ 0                                      ]
    [ 0        A^T   I      ] [ Δλ ] = [ 0                                      ].   (6.12)
    [ S(α_k)   0     X(α_k) ] [ Δs ]   [ (1 − sin(α_k)) µ_k e − x(α_k) ∘ s(α_k) ]

Update

    (x^{k+1}, λ^{k+1}, s^{k+1}) = (x(α_k), λ(α_k), s(α_k)) + (Δx, Δλ, Δs)            (6.13)

and

    µ_{k+1} = (x^{k+1})^T s^{k+1} / n.                                               (6.14)

Step 5: Set k + 1 → k. Go back to Step 1.
end (for)

In the rest of this chapter, we will show (1) r_b^k → 0, r_c^k → 0, and µ_k → 0; (2) there exists α_k ∈ (0, π/2] such that (x(α), s(α)) > 0 and (6.10) hold for any α ∈ (0, α_k]; (3) (x^k, s^k) ∈ N_2^I(θ).

It is easy to show that r_b^k, r_c^k, and µ_k decrease at the same rate in every iteration.

Lemma 6.1
Consider the sequence of points generated by Algorithm 6.1. Then,

    r_b^{k+1} = r_b^k (1 − sin(α_k)),
    r_c^{k+1} = r_c^k (1 − sin(α_k)),                                                (6.15)
    µ_{k+1} = µ_k (1 − sin(α_k)).

Proof 6.1
Using (6.1), (6.13), (6.12), (6.11), (6.4), and (6.3), we have

    r_b^{k+1} − r_b^k = A(x^{k+1} − x^k) = A(x(α_k) + Δx − x^k)
                      = A(x^k − ẋ sin(α_k) − x^k) = −A ẋ sin(α_k) = −r_b^k sin(α_k).

This shows the first relation. The second relation follows a similar derivation. From (6.12), it holds that (Δx)^T Δs = (Δx)^T (−A^T Δλ) = −(A Δx)^T Δλ = 0. Using (6.13) and (6.12), we have

    (x^{k+1})^T s^{k+1} = (x(α_k) + Δx)^T (s(α_k) + Δs)
                        = x(α_k)^T s(α_k) + x(α_k)^T Δs + s(α_k)^T Δx
                        = x(α_k)^T s(α_k) + (1 − sin(α_k)) µ_k n − x(α_k)^T s(α_k)
                        = (1 − sin(α_k)) µ_k n.

Dividing both sides by n proves the last relation.
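For readers who want to experiment, the two linear solves in Step 2 of Algorithm 6.1 can be written down directly from (6.3) and (6.4). The fragment below is a minimal dense MATLAB sketch of Step 2 only (illustrative; the variable names xk, sk, lamk are assumptions, and an efficient implementation would use the normal-equation form with sparse factorizations instead of forming the full block matrix):

    % Step 2 of Algorithm 6.1 (dense sketch): solve (6.3) and (6.4).
    [m, n] = size(A);
    Xk = diag(xk);  Sk = diag(sk);
    J  = [A,           zeros(m, m), zeros(m, n);
          zeros(n, n), A',          eye(n);
          Sk,          zeros(n, m), Xk          ];        % same coefficient matrix in (6.3)-(6.4)
    d1 = J \ [A*xk - b;  A'*lamk + sk - c;  xk.*sk];      % first derivatives
    xdot  = d1(1:n);   lamdot  = d1(n+1:n+m);   sdot  = d1(n+m+1:end);
    d2 = J \ [zeros(m,1); zeros(n,1); -2*xdot.*sdot];     % second derivatives
    xddot = d2(1:n);   lamddot = d2(n+1:n+m);   sddot = d2(n+m+1:end);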
Clearly, if we can take sin(α_k) = 1 (α_k = π/2) at some kth iteration, we will exactly reach the optimal solution (allowing some x_i = 0 and/or s_j = 0) at this iteration, which is rarely the case. Therefore, from now on, we assume α_k ∈ (0, π/2) throughout the execution of the algorithm. We will use the following lemma of [91, 90].

Lemma 6.2
Let (Δx, Δs) be given by (6.12). Then,

    ‖Δx ∘ Δs‖ ≤ (√2/4) ‖(X(α_k) S(α_k))^{-1/2} (x(α_k) ∘ s(α_k) − µ_{k+1} e)‖^2.     (6.16)

Proof 6.2
Noticing that µ_{k+1} = (1 − sin(α_k)) µ_k from Lemma 6.1, and applying Lemma 3.2 to the last row of (6.12), proves this lemma.

Theorem 6.2
Assume (x^k, s^k) ∈ N_2^I(θ). Then, (i) there is an α_k ∈ (0, π/2) such that (x(α), s(α)) > 0 and (6.10) hold for any α ∈ (0, α_k]; (ii) if θ ≤ 1/(2 + √2), then (x^{k+1}, s^{k+1}) ∈ N_2^I(θ).

Proof 6.3
First, note that the last rows of (6.3) and (6.4) are equivalent to

    s^k ∘ ẋ + x^k ∘ ṡ = x^k ∘ s^k,     s^k ∘ ẍ + x^k ∘ s̈ = −2 ẋ ∘ ṡ.               (6.17)

Also, we have the following simple identity:

    sin^2(α) − 2(1 − cos(α)) = −(1 − cos(α))^2.                                      (6.18)
Using (6.17) and (6.18), we have

    x(α) ∘ s(α)
      = (x^k − ẋ sin(α) + ẍ (1 − cos(α))) ∘ (s^k − ṡ sin(α) + s̈ (1 − cos(α)))
      = x^k ∘ s^k − (x^k ∘ ṡ + s^k ∘ ẋ) sin(α) + (x^k ∘ s̈ + s^k ∘ ẍ)(1 − cos(α))
          + ẋ ∘ ṡ sin^2(α) − (ẍ ∘ ṡ + ẋ ∘ s̈) sin(α)(1 − cos(α)) + ẍ ∘ s̈ (1 − cos(α))^2
      = x^k ∘ s^k − x^k ∘ s^k sin(α) − 2 ẋ ∘ ṡ (1 − cos(α))
          + ẋ ∘ ṡ sin^2(α) − (ẍ ∘ ṡ + ẋ ∘ s̈) sin(α)(1 − cos(α)) + ẍ ∘ s̈ (1 − cos(α))^2
      = x^k ∘ s^k (1 − sin(α)) + ẋ ∘ ṡ (sin^2(α) − 2(1 − cos(α)))
          − (ẍ ∘ ṡ + ẋ ∘ s̈) sin(α)(1 − cos(α)) + ẍ ∘ s̈ (1 − cos(α))^2
      = x^k ∘ s^k (1 − sin(α)) + (ẍ ∘ s̈ − ẋ ∘ ṡ)(1 − cos(α))^2
          − (ẍ ∘ ṡ + ẋ ∘ s̈) sin(α)(1 − cos(α)).                                      (6.19)
Furthermore, for α ∈ (0, π/2), it holds that 0 ≤ 1 − cos(α) ≤ 1 − cos^2(α) = sin^2(α). Using (6.19), 0 ≤ 1 − cos(α) ≤ sin^2(α), and (x^k, s^k) ∈ N_2^I(θ), we have

    ‖x(α) ∘ s(α) − (1 − sin(α)) µ_k e‖
      = ‖x^k ∘ s^k (1 − sin(α)) + (ẍ ∘ s̈ − ẋ ∘ ṡ)(1 − cos(α))^2
          − (ẍ ∘ ṡ + ẋ ∘ s̈) sin(α)(1 − cos(α)) − (1 − sin(α)) µ_k e‖
      = ‖(x^k ∘ s^k − µ_k e)(1 − sin(α)) + (ẍ ∘ s̈ − ẋ ∘ ṡ)(1 − cos(α))^2
          − (ẍ ∘ ṡ + ẋ ∘ s̈) sin(α)(1 − cos(α))‖
      ≤ θ µ_k (1 − sin(α)) + (‖ẍ ∘ s̈‖ + ‖ẋ ∘ ṡ‖) sin^4(α)
          + (‖ẍ ∘ ṡ‖ + ‖ẋ ∘ s̈‖) sin^3(α).                                            (6.20)

Clearly, if

    q(α) := (‖ẍ ∘ s̈‖ + ‖ẋ ∘ ṡ‖) sin^4(α) + (‖ẋ ∘ s̈‖ + ‖ẍ ∘ ṡ‖) sin^3(α)
            + θ µ_k sin(α) − θ µ_k ≤ 0,                                               (6.21)
then (6.10) holds. Indeed, since q(0) = −θ µ_k < 0 and q(α) is a monotonically increasing function of α, by continuity there exists an α_k ∈ (0, π/2) such that (6.21) holds for all α ∈ (0, α_k]. This shows that (6.10) holds. From (6.10), we have x_i(α) s_i(α) ≥ (1 − 2θ)(1 − sin(α)) µ_k > 0 for all θ ∈ [0, 0.5) and all α ∈ (0, α_k]. This shows (x(α), s(α)) > 0, which completes the proof of part (i).

From Lemma 6.1, inequality (6.10) is equivalent to ‖x(α) ∘ s(α) − µ_{k+1} e‖ ≤ 2θ µ_{k+1}. Using (6.13), (6.12), Lemmas 6.1 and 6.2, and part (i) of this theorem, we have

    ‖x^{k+1} ∘ s^{k+1} − µ_{k+1} e‖
      = ‖(x(α_k) + Δx) ∘ (s(α_k) + Δs) − µ_{k+1} e‖
      = ‖Δx ∘ Δs‖
      ≤ (√2/4) ‖(X(α_k) S(α_k))^{-1/2} (x(α_k) ∘ s(α_k) − µ_{k+1} e)‖^2
      ≤ (√2/4) ‖x(α_k) ∘ s(α_k) − µ_{k+1} e‖^2 / min_i x_i(α_k) s_i(α_k)
      ≤ (√2/4) (2θ µ_{k+1})^2 / ((1 − 2θ) µ_{k+1})
      = (√2 θ^2 / (1 − 2θ)) µ_{k+1}.                                                  (6.22)

It is easy to check that, for θ ≤ 1/(2 + √2) ≈ 0.29289, √2 θ^2 / (1 − 2θ) ≤ θ holds; therefore, for θ ≤ 1/(2 + √2),

    ‖x^{k+1} ∘ s^{k+1} − µ_{k+1} e‖ ≤ θ µ_{k+1}.

We now show that (x^{k+1}, s^{k+1}) > 0. Let x^{k+1}(t) = x(α_k) + t Δx and s^{k+1}(t) = s(α_k) + t Δs. Then, x^{k+1}(0) = x(α_k) and x^{k+1}(1) = x^{k+1}. Since

    x^{k+1}(t) ∘ s^{k+1}(t) = (x(α_k) + t Δx) ∘ (s(α_k) + t Δs)
      = x(α_k) ∘ s(α_k) + t (x(α_k) ∘ Δs + s(α_k) ∘ Δx) + t^2 Δx ∘ Δs,

using (6.12), Lemma 6.1, (6.10), (6.22), and the assumption that θ ≤ 1/(2 + √2), we have

    ‖x^{k+1}(t) ∘ s^{k+1}(t) − µ_{k+1} e‖
      = ‖x(α_k) ∘ s(α_k) + t (x(α_k) ∘ Δs + s(α_k) ∘ Δx) + t^2 Δx ∘ Δs − µ_{k+1} e‖
      = ‖x(α_k) ∘ s(α_k) + t [(1 − sin(α_k)) µ_k e − x(α_k) ∘ s(α_k)] + t^2 Δx ∘ Δs − µ_{k+1} e‖
      = ‖(1 − t)(x(α_k) ∘ s(α_k) − µ_{k+1} e) + t^2 Δx ∘ Δs‖
      ≤ 2(1 − t) θ µ_{k+1} + t^2 (√2 θ^2 / (1 − 2θ)) µ_{k+1}
      ≤ (2(1 − t) + t^2) θ µ_{k+1} := f(t) θ µ_{k+1}.                                 (6.23)

The function f(t) is a monotonically decreasing function of t ∈ [0, 1], and f(0) = 2. This proves ‖x^{k+1}(t) ∘ s^{k+1}(t) − µ_{k+1} e‖ ≤ 2θ µ_{k+1}. Therefore, x_i^{k+1}(t) s_i^{k+1}(t) ≥ (1 − 2θ) µ_{k+1} > 0 for all t ∈ [0, 1], which means (x^{k+1}, s^{k+1}) > 0. This completes the proof of part (ii).

This theorem indicates that the proposed algorithm is well-defined.
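In practice, the largest α satisfying (6.10) in Step 3 can be located numerically. The fragment below is a simple bisection sketch (illustrative only, not from the book's code); it assumes the current iterate (xk, sk), its duality measure muk, the parameter theta, and the derivatives (xdot, sdot, xddot, sddot) from Step 2, as in the earlier sketch:

    % Step 3 of Algorithm 6.1 (sketch): find a large alpha in (0, pi/2] satisfying (6.10).
    ok = @(a) all(xk - xdot*sin(a) + xddot*(1-cos(a)) > 0) && ...
              all(sk - sdot*sin(a) + sddot*(1-cos(a)) > 0) && ...
              norm((xk - xdot*sin(a) + xddot*(1-cos(a))) .* ...
                   (sk - sdot*sin(a) + sddot*(1-cos(a))) - (1-sin(a))*muk) ...
              <= 2*theta*(1-sin(a))*muk;
    lo = 0;  hi = pi/2;
    if ok(hi)
        alpha = hi;                        % the full step is acceptable
    else
        for it = 1:50                      % bisection on the acceptance condition
            md = (lo + hi)/2;
            if ok(md), lo = md; else, hi = md; end
        end
        alpha = lo;
    end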
6.2 Polynomiality
Let the initial point be selected to satisfy

    (x^0, s^0) ∈ N_2^I(θ),   x* ≤ ρ x^0,   s* ≤ ρ s^0,                               (6.24)

where ρ ≥ 1 and (x*, λ*, s*) ∈ S. Let (x̄, λ̄, s̄) be a member of the set

    S̄ = {(x̄, λ̄, s̄) | A x̄ = b, A^T λ̄ + s̄ = c}.                                     (6.25)

Let ω^f and ω^o be the qualities of the initial point, which are the "distances" from feasibility and optimality, given by

    ω^f = min_{(x̄, λ̄, s̄) ∈ S̄} max{ ‖(X^0)^{-1}(x̄ − x^0)‖_∞, ‖(S^0)^{-1}(s̄ − s^0)‖_∞ }          (6.26)

and

    ω^o = min_{(x*, λ*, s*) ∈ S} max{ (x*)^T s^0 / (x^0)^T s^0, (s*)^T x^0 / (x^0)^T s^0, 1 }.    (6.27)

From the definition of ω^f, there is (x̄, λ̄, s̄) ∈ S̄ such that

    |x_i^0 − x̄_i| ≤ ω^f x_i^0,   |s_i^0 − s̄_i| ≤ ω^f s_i^0.                         (6.28)

From the definition of ω^o, there is an optimal solution (x*, λ*, s*) ∈ S such that

    (x^0)^T s*, (s^0)^T x* ≤ ω^o (x^0)^T s^0.                                         (6.29)

Let ν_k = ∏_{i=0}^{k−1} (1 − sin(α_i)); then

    ν_k ω^o (x^0)^T s^0 ≤ ω^o (x^k)^T s^k.                                            (6.30)

Let ω_p^r and ω_d^r be the "ratios" of the feasibility and the total complementarity, defined by

    ω_p^r = ‖A x^0 − b‖ / (x^0)^T s^0,   ω_d^r = ‖A^T λ^0 + s^0 − c‖ / (x^0)^T s^0.   (6.31)

In view of Lemma 6.1, we have

    ‖A x^k − b‖ = ω_p^r (x^k)^T s^k,   ‖A^T λ^k + s^k − c‖ = ω_d^r (x^k)^T s^k.       (6.32)
Several lemmas, which will be used to prove the polynomiality, are provided below.

Lemma 6.3
Let (x^0, s^0) be defined by (6.24). Then, ω^f ≤ ρ and ω^o ≤ ρ.

Proof 6.4
First, from (6.24), (6.27), and ρ ≥ 1, it is easy to see ω^o ≤ ρ. Now, let

    ρ_f = max{ min{‖x‖ | A x = b}, min{‖s‖ | A^T λ + s = c} };                        (6.33)

there exists a constant ω* such that ρ_f (e, e) ≤ ω* (x^0, s^0), and by the definition of ρ_f we see that |x̄_i| ≤ ρ_f ≤ ω* x_i^0 and |s̄_i| ≤ ρ_f ≤ ω* s_i^0. Therefore,

    ω^f = min_{(x̄, λ̄, s̄) ∈ S̄} max{ ‖(X^0)^{-1}(x̄ − x^0)‖_∞, ‖(S^0)^{-1}(s̄ − s^0)‖_∞ }
        ≤ max{ |x̄_i − x_i^0| / x_i^0, |s̄_i − s_i^0| / s_i^0 }
        ≤ max{ (|x̄_i| + x_i^0) / x_i^0, (|s̄_i| + s_i^0) / s_i^0 }
        ≤ max{ (ω* x_i^0 + x_i^0) / x_i^0, (ω* s_i^0 + s_i^0) / s_i^0 } = 1 + ω*.     (6.34)
Resetting ρ = max{ρ, 1 + ω ∗ } finishes the proof. The next lemma is also useful. Lemma 6.4 Let (x0 , s0 ) be defined by (6.24). Then, T
T
T
νk (xk s0 + sk x0 ) ≤ (1 + 2ω 0 )xk sk
(6.35)
Proof 6.5 If ω 0 = ∞, then the lemma holds. Therefore, it is assumed that ω 0 < ∞. Let (x∗ , λ ∗ , s∗ ) ∈ S be the optimal solution that achieves the minimum of (6.27). From Lemma 6.1, it follows that T
T
Axk − b = νk (Ax0 − b), AT λ k + sk − c = νk (AT λ 0 + s0 − c), xk sk = νk x0 s0 . Let xˆ = νk x0 + (1 − νk )x∗ ,
(λˆ , sˆ) = νk (λ 0 , s0 ) + (1 − νk )(λ ∗ , s∗ ), then Axˆ − b = νk (Ax0 − b),
AT λˆ + sˆ − c = νk (AT λ 0 + s0 − c). Since sˆ = c − AT λˆ + νk (AT λ 0 + s0 − c) and sk = c − AT λ k + νk (AT λ 0 + s0 − c), it follows that sˆ − sk = c − AT λˆ − c + AT λ k = −AT (λˆ − λ k ). Noticing that A(xˆ − xk ) = 0, we have 0
= (xˆ − xk )T (sˆ − sk ) = (νk x0 + (1 − νk )x∗ − xk )T (νk s0 + (1 − νk )s∗ − sk ).
(6.36)
106
• Arc-Search Techniques for Interior-Point Methods
The last equation is equivalent to (νk x0 + (1 − νk )x∗ )T sk + (νk s0 + (1 − νk )s∗ )T xk T
= (νk x0 + (1 − νk )x∗ )T (νk s0 + (1 − νk )s∗ ) + xk sk
(6.37)
T
Using this relation, x∗ s∗ = 0, (6.29), and (6.30), and noticing that νk ∈ [0, 1] and (x∗ , s∗ ) ≥ 0, it follows that T
T
νk (x0 sk + s0 xk ) [use (6.37)] T [use x∗ s∗ = 0] [use (6.29)] [use ω o ≥ 1] [use (6.30)]
T
T
≤ (νk x0 + (1 − νk )x∗ )T sk + (νk s0 + (1 − νk )s∗ )T xk
T = (νk x0 + (1 − νk )x∗ )T (νk s0 + (1 − νk )s∗ ) + xk sk
T T T T = νk2 x0 s0 + νk (1 − νk )(x0 s∗ + s0 x∗ ) + xk sk
T T T ≤ νk2 x0 s0 + 2νk (1 − νk )ω o x0 s0 + xk sk
T T ≤ 2νk ω o x0 s0 + xk s k T = (1 + 2ω o )xk sk .
This finishes the proof. The next Lemma will be used a few times. Lemma 6.5 1 1 Let Dk = (Xk ) 2 (Sk )− 2 . For i = 1, 2, 3, let (δ xi , δ λ i , δ si ) be the solution of ⎡ ⎤⎡ ⎤ ⎡ ⎤ A 0 0 δ xi 0 ⎣ 0 AT I ⎦ ⎣ δ λ i ⎦ = ⎣ 0 ⎦ ri δ si S 0 X and (δ x, δ λ , δ s) be the solution of ⎡ ⎤⎡ ⎤ ⎡ ⎤ A 0 0 0 δx ⎣ 0 AT I ⎦ ⎣ δ λ ⎦ = ⎣ ⎦, 0 δs S 0 X r1 + r2 + r3 where r1 = x ◦ s, r2 = −νk s ◦ (x0 − x¯ ), and r3 = −νk x ◦ (s0 − s¯). Then, √ 1 √ ID−1 δ x1 I, IDδ s1 I ≤ ID−1 δ x1 + Dδ s1 I = I(Xs) 2 I = xT s = nµ, −1
ID
2
δ x I, IDδ s I ≤ ID
−1
ID
Proof 6.6
2
3
3
−1
δ x I, IDδ s I ≤ ID
2
2
δ x + Dδ s I = νk ID
−1
3
3
−1
0
(x − x¯ )I, 0
¯ δ x + Dδ s I = νk ID(s − s)I.
Clearly, we have δ x = δ x1 + δ x2 + δ x3 δλ = δλ1 +δλ2 +δλ3 δ s = δ s1 + δ s2 + δ s3 .
(6.38)
(6.39)
(6.40a) (6.40b) (6.40c)
A MTY-Type Infeasible Arc-Search Algorithm for LP
•
107
From the second row of (6.38), we have (D−1 δ xi )T (Dδ si ) = 0, for i = 1, 2, 3, there fore, ID−1 δ xi I2 , IDδ si I2 ≤ ID−1 δ xi I2 + IDδ si I2 = ID−1 δ xi + Dδ si I2 .
(6.41)
Applying (XS)−1/2 (Sδ xi + Xδ si ) = (XS)−1/2 ri to (6.41) for i = 1, 2, 3, respectively, we obtain (6.40). This finishes the proof. Lemma 6.6 1 1 Let (x˙ , s˙) be defined by (6.3), and Dk = (Xk ) 2 (Sk )− 2 . Then, (xk )T sk
1
max{I(Dk )−1 x˙ I, I(Dk )s˙I} ≤ I(xk ◦ sk ) 2 I + ω f (1 + 2ω o )
1
mini (xik ski ) 2
.
(6.42)
Proof 6.7 Let x¯ be a feasible solution of (1.4) and (λ¯ , s¯) be a feasible solution of (1.5) such that the minimum of (6.26) is achieved. Since Ax˙ = rkb = νk r0b = νk (Ax0 − b) = νk A(x0 − x¯ ), we have A(x˙ − νk (x0 − x¯ )) = 0. Similarly, since AT λ˙ + s˙ = rkc = νk r0c = νk (AT λ 0 + s0 − c) = νk (AT (λ 0 − λ¯ ) + (s0 − s¯)), we have AT (λ˙ − νk (λ 0 − λ¯ )) + (s˙ − νk (s0 − s¯)) = 0. Using the last row of (6.3), we have s ◦ (x˙ − νk (x0 − x¯ )) + x ◦ (s˙ − νk (s0 − s¯)) = x ◦ s − νk s ◦ (x0 − x¯ ) − νk x ◦ (s0 − s¯). Thus, in matrix form, we have ⎡ A 0 ⎣ 0 AT S 0 ⎡
⎤⎡ 0 I ⎦⎣ X
⎤ x˙ − νk (x0 − x¯ )
λ˙ − νk (λ 0 − λ¯ ) ⎦
s˙ − νk (s0 − s¯)
⎤ 0 ⎦ 0 = ⎣ 0 0 x ◦ s − νk s ◦ (x − x¯ ) − νk x ◦ (s − s¯)
Denote (δ x, δ λ , δ s) = (x˙ − νk (x0 − x¯ ), λ˙ − νk (λ 0 − λ¯ ), s˙ − νk (s0 − s¯)) and r = r1 + r2 + r3
(6.43)
108
• Arc-Search Techniques for Interior-Point Methods
with (r1 , r2 , r3 ) = (x ◦ s, −νk s ◦ (x0 − x¯ ), −νk x ◦ (s0 − s¯)). Applying Lemma 6.5, it follows that equations (6.40) holds. Considering (6.38) with i = 2, we have Sδ x2 + Xδ s 2 = r 2 = −νk S(x0 − x¯ ), which is equivalent to ¯ − D2 δ s2 . δ x2 = −νk (x0 − x)
(6.44)
Thus, from (6.44) and (6.40), we have ¯ ˙ = ID−1 [δ x1 + δ x2 + δ x3 + νk (x0 − x)]I ID−1 xI −1 1 2 −1 3 = ID δ x − Dδ s + D δ x I ≤ ID−1 δ x1 I + IDδ s2 I + ID−1 δ x3 I.
(6.45)
Considering (6.38) with i = 3, we have Sδ x3 + Xδ s 3 = r 3 = −νk X(s0 − s¯), which is equivalent to δ s3 = −νk (s0 − s¯) − D−2 δ x3 .
(6.46)
Thus, from (6.46) and (6.40), we have ¯ IDs˙I = ID[δ s1 + δ s2 + δ s3 + νk (s0 − s)]I 1 2 −1 3 = IDδ s + Dδ s − D δ x I ≤ IDδ s1 I + IDδ s2 I + ID−1 δ x3 I.
(6.47)
In view of (6.40a), it follows that IDs˙I, ID−1 x˙ I ≤
√ nµ + IDδ s2 I + ID−1 δ x3 I.
Using (6.40b), we have IDδ s2 I
≤ ≤
[use (6.28)] ≤ [use I · I2 ≤ I · I1 ]
≤
νk ID−1 (x0 − x¯ )I ν√ 0 k ¯ )I min xi si IS(x − x νk√ ωf 0 min xi si ISx I f νk√ ω T 0 min xi si s x
Using (6.40c), we have ID−1 δ x3 I
≤ νk ID(s0 − s¯)I ≤ minν√k xi si IX(s0 − s¯)I
[use (6.28)] ≤ [use I · I2 ≤ I · I1 ]
≤
νk√ ωf 0 min xi si IXs I f νk√ ω T 0 min xi si x s .
(6.48)
A MTY-Type Infeasible Arc-Search Algorithm for LP
•
109
Using Lemme 6.4, this shows that IDδ s2 I + ID−1 δ x3 I
≤
νk ω f √ (sT x0 + xT s0 ) min xi si
≤
ω f (1 + 2ω o )
(xk )T sk 1
mini (xik ski ) 2
.
(6.49)
This complete the proof. This leads to the following lemma. Lemma 6.7 Let (x˙ , s˙) be defined by (6.3). Then, there exists a positive constant C0 , independent of n, such that � max{I(Dk )−1 x˙ I, I(Dk )s˙I} ≤ C0
Proof 6.8
n(xk )T sk .
(6.50)
First, it is easy to see k
k 12
I(x ◦ s ) I =
��
xik ski =
� (xk )T sk .
(6.51)
i
Since (xk , sk ) ∈ N2I (θ ), we have mini (xik ski ) ≥ (1 − θ )µk = (1 − θ ) (x �
(xk )T sk 1
mini (xik ski ) 2
≤
n(xk )T sk . (1 − θ )
k T k
) s n
. Therefore, (6.52)
Substituting (6.51) and (6.52) into (6.42) and using Lemma 6.3 prove (6.50) with f o C0 ≥ 1 + ω√(1+2ω ) . (1−θ )
From Lemma 6.7, we can establish several useful inequalities. The following simple facts will be used several times. Let u and v be two vectors with the same dimension, then � �� � � � � Iu ◦ vI2 = (ui vi )2 ≤ u2i v2i = IuI2 IvI2 . (6.53) i
i
i
If u and v satisfy uT v = 0, then, max{IuI2 , IvI2 } ≤ IuI2 + IvI2 = Iu + vI2 ,
(6.54)
and (see Lemma 3.2) 3
Iu ◦ vI ≤ 2− 2 Iu + vI2 .
(6.55)
110
• Arc-Search Techniques for Interior-Point Methods
Lemma 6.8 Let (x˙ , s˙) and (x¨ , s¨) be defined by (6.3) and (6.4), respectively. Then, there exist posi tive constants C1 , C2 , C3 , and C4 , independent of n, such that I˙x ◦ s˙I ≤ C1 n2 µk , I¨x ◦ s¨I ≤ C2 n4 µk , √ max{I(Dk )−1 x¨ I, I(Dk )¨sI} ≤ C3 n2 µk , max{I¨x ◦ s˙I, I˙x ◦ s¨I} ≤ C4 n3 µk Proof 6.9
(6.56) (6.57) (6.58) (6.59)
First, using (6.53) and Lemma 6.7, we have I˙x ◦ s˙I
= I(Dk )−1 x˙ ◦ (Dk )˙sI ≤ I(Dk )−1 x˙ II(Dk )˙sI ≤ C02 n(xk )T sk := C1 n2 µk .
(6.60)
Second, using (6.55), (6.4), (6.56), and (6.51), we have I¨x ◦ s¨I
= I(Dk )−1 x¨ ◦ (Dk )¨sI 3
2− 2 I(Dk )−1 x¨ + (Dk )¨sI2 � �2 3� 1 � ≤ 2− 2 �−2(Xk Sk )− 2 (x˙ ◦ s˙)� ⎛ ⎞2 n n � 1 � x ˙ s ˙ (x˙i s˙i )2 i i ⎝ � � ⎠ = 2 12 = 22 xik ski x k sk i=1 i=1
≤
≤ ≤
i i n 2 1 i=1 (x˙i s˙i ) 22 mini=1,...,n xik ski 1 Ix 1 ˙ ◦ s˙I2
22 1
= 22
(1 − θ )µk
≤ 22
C12 n4 µk2 (1 − θ )µk
C12 n4 µk := C2 n4 µk . 1−θ
(6.61)
Third, using (6.54), (6.4), and (6.56), we have max{I(Dk )−1 x¨ I2 , I(Dk )¨sI2 } ≤ I(Dk )−1 x¨ + (Dk )s¨I2 � �2 4C2 n4 µ 1 � � k = �−2(Xk Sk )− 2 (x˙ ◦ s˙)� ≤ 1 := C32 n4 µk . 1−θ
(6.62)
Taking the square root on both sides proves (6.58). Finally, using (6.53), (6.58), and Lemma 6.7, we have Ix¨ ◦ s˙I = I(Dk )−1 x¨ ◦ (Dk )s˙I ≤ I(Dk )−1 x¨ II(Dk )s˙I √ √ ≤ (C3 n2 µk )(C0 n µk ) := C4 n3 µk .
(6.63)
A MTY-Type Infeasible Arc-Search Algorithm for LP
•
111
Similarly, we can show Ix˙ ◦ s¨I ≤ C4 n3 µk .
(6.64)
This completes the proof. Now we are ready to estimate a conservative bound for sin(αk ). Lemma 6.9 Let (xk , λ k , sk ) be generated by Algorithm 6.1. Then, αk obtained in Step 3 satisfies the following inequality. θ , (6.65) sin(αk ) ≥ 2Cn 1
1
where C = max{1,C43 , (C1 +C2 ) 4 }. Proof 6.10 we have q(α) ≤ = ≤
Let α ∈ (0, π2 ] satisfy sin(α) =
θ 2Cn .
In view of (6.21) and Lemma 6.8,
µk ((C1 +C2 )n4 sin4 (α) + 2C4 n3 sin3 (α) + θ sin(α) − θ ) := µk p(α) � � (C1 +C2 )θ 4 2C4 θ 3 θ 2
µk + + − θ
8C3 2Cn 16C4 � 4 � 3 2
θ θ θ µk + + − θ ≤ 0.
16 4 2
θ Since p(α) is a monotonic function of sin(α), for all sin(α) ≤ 2Cn , the above inequalities hold (the last inequality holds because of θ ≤ 1). Therefore, for all θ , the inequality (6.10) holds. This completes the proof. sin(α) ≤ 2Cn
Remark 6.1
It is worthwhile to point out that the constant C depends on C_0, which depends on ρ, but ρ is unknown before we find the solution. Also, we can always find a better step length sin(α) by solving the quartic q(α) = 0, and the calculation of the roots of a quartic polynomial is deterministic, has negligible cost, and is independent of n [125, 153].

Using the standard argument developed in Theorem 1.4, we have the following theorem.

Theorem 6.3
Let (x^k, λ^k, s^k) be generated by Algorithm 6.1 with an initial point given by (6.24). For any ε > 0, the algorithm will terminate with (x^k, λ^k, s^k) satisfying (6.9) in at most O(nL) iterations, where L = max{ ln((x^0)^T s^0 / ε), ln(‖r_b^0‖ / ε), ln(‖r_c^0‖ / ε) }.
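As Remark 6.1 suggests, a larger admissible step can be obtained by solving the quartic q(α) = 0 of (6.21) in u = sin(α). The following is a minimal sketch of that computation (illustrative only; the coefficient names follow (6.21) and the variable names mirror the earlier sketches, not any particular code base):

    % Solve the quartic q = 0 of (6.21) in u = sin(alpha); illustrative sketch.
    % q(u) = (||xddot.*sddot|| + ||xdot.*sdot||) u^4
    %      + (||xdot.*sddot|| + ||xddot.*sdot||) u^3 + theta*muk*u - theta*muk
    a4 = norm(xddot.*sddot) + norm(xdot.*sdot);
    a3 = norm(xdot.*sddot) + norm(xddot.*sdot);
    r  = roots([a4, a3, 0, theta*muk, -theta*muk]);   % coefficients of u^4 ... u^0
    r  = r(abs(imag(r)) < 1e-12 & real(r) > 0);       % keep the real positive root
    u  = min([real(r); 1]);                           % admissible sin(alpha), at most 1
    alpha = asin(u);

Since q(0) < 0 and q is increasing for u > 0, there is exactly one positive real root, and any sin(α) not exceeding it satisfies (6.21).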
Proof 6.11
In view of Lemma 6.1, r_b^k, r_c^k, and µ_k all decrease at the same rate (1 − sin(α_k)) in every iteration. In view of Lemma 6.9, sin(α_k) ≥ θ/(2Cn), so the reduction factor is at most (1 − θ/(2Cn)). Invoking Theorem 1.4 proves the claim.
6.3 Concluding Remarks
An infeasible arc-search interior-point algorithm was proposed in this chapter. The algorithm searches for the optimizer along an ellipse that approximates the central path. It is shown that the arc-search algorithm attains a polynomial bound O(nL), which is at least as good as the best existing bound for infeasible interior-point algorithms for linear programming using line search. Numerical test results reported in [161] are promising. In the next two chapters, the result will be improved further in terms of both complexity bound and computational efficiency.
Chapter 7
A Mehrotra-Type Infeasible Arc-Search Algorithm for LP
The interior-point method was regarded as a mature technique of linear programming [144, page 2], following many important developments between 1980 and 1990, such as the proposal of the path-following method [88], the establishment of polynomial bounds for path-following algorithms [70, 71], the development of Mehrotra's predictor-corrector (MPC) algorithm [89] and its independent implementation and verification [82, 83], and the proof of the polynomiality of some related infeasible interior-point algorithms [92, 171]. Although many more algorithms were proposed between 1990 and 2010 (see, for example, [93, 90, 116, 120, 63]), there was no significant improvement in the best polynomial bound for interior-point algorithms, and there was no report of a better algorithm than an MPC variation, described in [144, page 198], for general linear programming problems.¹ In fact, most popular interior-point software packages implemented the MPC variation described in [144, page 198], for example, OB1 [82], LOQO [139], PCx [23], and LIPSOL [172].

However, as pointed out in previous chapters, researchers knew that there were still some dilemmas between theoretical polynomial bounds and computational experience. Namely, the strategies implemented in the popular software packages, such as (a) searching for optimizers in a large neighborhood, (b) using higher-order derivatives to approximate the central path, and (c) starting from an infeasible interior point, were proved to be efficient, but theoretical analysis gave worse polynomial bounds when these strategies were considered. These dilemmas were discussed in Chapters 2, 3, and 4. In Chapter 5, we introduced arc-search to replace the traditional line search and showed that this higher-order method works for the feasible short-step algorithm, i.e., the higher-order algorithm with arc-search achieves the best polynomial bound and has promising numerical performance. In Chapter 6, we showed that arc-search also improves the polynomial bound for an algorithm starting from an infeasible initial point.

In this chapter, our goal is to demonstrate that arc-search incorporating all three strategies mentioned above will be more efficient than the best line search algorithm, Mehrotra's method. Instead of theoretical analysis, we perform extensive numerical testing in order to demonstrate the claim. We discuss an implementation of an infeasible arc-search interior-point algorithm, which allows us to test many more Netlib problems, because very few problems in the benchmark Netlib test set have an interior point, as noted in [20]. The implemented algorithm is almost identical to Mehrotra's, except that the former uses the arc-search discussed in Chapters 5 and 6 while the latter uses line search. We show that the implemented arc-search infeasible interior-point algorithm is very competitive in computation by testing all Netlib problems in standard form and comparing the results to those obtained by the MPC described in [144, page 198]. To have a fair comparison, both algorithms are implemented in MATLAB (curvelp.m for the arc-search algorithm and mehrotra.m for Mehrotra's algorithm); for all test problems, the two MATLAB codes use the same pre-processor and post-processor, start from the same initial point, use the same parameters, and terminate with the same stopping criterion. Since the main computational cost for both algorithms is solving linear systems of equations, which is exactly the same for both algorithms, and the arc-search infeasible interior-point algorithm uses fewer iterations on most tested problems than the MPC described in [144, page 198], we believe that the proposed arc-search algorithm is more attractive than the MPC algorithm. The material in this chapter was first presented in [156] (which was the first arc-search infeasible interior-point algorithm for LP) and was published in [158] a few years later.

¹ There was some noticeable progress focused on problems with special structures, for example [142].
7.1 Installation and Basic Structure of CurveLP
CurveLP is available from Netlib (http://www.netlib.org/numeralgo/) as the na43 package. It is distributed as a single file CurveLP.zip. To install the library, download and unpack the zip archive, start MATLAB, and change to the root folder CurveLP. The reader will see four m-files: curvelp.m is the implementation of the proposed algorithm; mehrotra.m is the implementation of Mehrotra's algorithm, which is mainly used for comparing the performance of the CurveLP and Mehrotra algorithms; extract.m is used to extract (A, b, c) from the Netlib linear programming test problems, which are in the same folder in MATLAB format; and infeasibleexample.m is the MATLAB code used to generate Figure 7.2. README describes the simple procedures for testing and running the code. All four m-files were developed in MATLAB version 2006a and tested in MATLAB versions 2006a and 2012b on Windows.
7.2 An Infeasible Long-Step Arc-Search Algorithm for Linear Programming

Consider the linear programming problem in the standard form described in (1.4) and its dual problem, also presented in standard form, described in (1.5). As pointed out before, if a problem is not represented in the standard form, it is easy to transform it to the standard form by using the method described in [144, pages 226-230]. Similar to the aforementioned discussions, we denote the residuals of the equality constraints (the deviation from feasibility) by

    r_b = A x − b,   r_c = A^T λ + s − c,                                            (7.1)

and the duality measure by

    µ = x^T s / n.                                                                   (7.2)

The central path C(t) of the primal-dual linear programming problem is parametrized by a scalar t > 0 as follows. For each interior point (x, λ, s) ∈ C(t) on the central path, there is a t > 0 such that

    A x = b,                                                                         (7.3a)
    A^T λ + s = c,                                                                   (7.3b)
    (x, s) ≥ 0,                                                                      (7.3c)
    x_i s_i = t,  i = 1, . . . , n.                                                  (7.3d)

Therefore, the central path is an arc in R^{2n+m} parametrized as a function of t and is denoted as

    C(t) = {(x(t), λ(t), s(t)) : t > 0}.                                             (7.4)

As t → 0, the central path (x(t), λ(t), s(t)) represented by (7.3) approaches a solution of the LP represented by (1.4), because (7.3) reduces to the KKT conditions as t → 0. Because of the high cost of finding an initial feasible point and the central path described in (7.3), we consider a modified problem which allows an infeasible initial point:

    A x − b = r_b,                                                                   (7.5a)
    A^T λ + s − c = r_c,                                                             (7.5b)
    (x, s) ≥ 0,                                                                      (7.5c)
    x_i s_i = t,  i = 1, . . . , n.                                                  (7.5d)
We search for the optimizer along an infeasible central path neighborhood. The infeasible central path neighborhood F(γ) considered in this chapter is defined as the collection of points that satisfy the following conditions:

    F(γ(t)) = {(x, λ, s) : ‖(r_b(t), r_c(t))‖ ≤ γ(t) ‖(r_b^0, r_c^0)‖, (x, s) > 0},   (7.6)
where r_b(1) = r_b^0, r_c(1) = r_c^0, and γ(t) ∈ [0, 1] is a monotonic function of t such that γ(1) = 1 and γ(t) → 0 as t → 0. In the next section, we will show (Lemma 7.2) that ‖(r_b(t), r_c(t))‖ ≤ γ(t) ‖(r_b^0, r_c^0)‖ always holds with γ(0) = ∏_{j=0}^{∞} (1 − sin(α_j^x)) → 0, because π/2 ≥ α_j^x > 0. Therefore, the only restriction of this neighborhood is (x, s) > 0. This shows that the central path neighborhood is the widest in comparison to any neighborhood we have considered.

Starting from any initial point (x^0, λ^0, s^0) in a central path neighborhood that satisfies (x^0, s^0) > 0, for k ≥ 0, we consider a special arc parametrized by t and defined by the current iterate as follows:

    A x(t) − b = t r_b^k,                                                            (7.7a)
    A^T λ(t) + s(t) − c = t r_c^k,                                                   (7.7b)
    (x(t), s(t)) > 0,                                                                (7.7c)
    x(t) ∘ s(t) = t x^k ∘ s^k.                                                       (7.7d)

Clearly, each iteration starts at t = 1, and (x(1), λ(1), s(1)) = (x^k, λ^k, s^k). We want the iterate to stay inside F(γ) as t decreases. We denote the infeasible central path defined by (7.7) as

    H(t) = {(x(t), λ(t), s(t)) : t ≥ τ ≥ 0}.                                         (7.8)
If this arc is inside F(γ) for τ = 0, then as t → 0, (r_b(t), r_c(t)) := t (r_b^k, r_c^k) → 0; equation (7.7d) implies that µ(t) → 0; hence, the arc will approach an optimal solution of (1.4), because (7.7) reduces to the KKT conditions as t → 0. To avoid computing the entire infeasible central path H(t), we will search along an approximation of H(t) and keep the iterate in F(γ). Therefore, we will use an ellipse E(α) in the (2n + m)-dimensional space to approximate the infeasible central path H(t), where E(α) is given by

    E(α) = {(x(α), λ(α), s(α)) : (x(α), λ(α), s(α)) = a⃗ cos(α) + b⃗ sin(α) + c⃗},     (7.9)

a⃗ ∈ R^{2n+m} and b⃗ ∈ R^{2n+m} are the axes of the ellipse, and c⃗ ∈ R^{2n+m} is its center. Given the current iterate y = (x^k, λ^k, s^k) = (x(α_0), λ(α_0), s(α_0)) ∈ E(α), which is also on H(t), we will determine a⃗, b⃗, c⃗, and α_0 such that the first and second derivatives of E(α) at (x(α_0), λ(α_0), s(α_0)) are the same as those of H(t) at (x(α_0), λ(α_0), s(α_0)). Therefore, by taking the first derivative of (7.7) at (x(α_0), λ(α_0), s(α_0)) = (x^k, λ^k, s^k) ∈ E, we have

    [ A     0     0   ] [ ẋ ]   [ r_b^k     ]
    [ 0     A^T   I   ] [ λ̇ ] = [ r_c^k     ].                                       (7.10)
    [ S^k   0     X^k ] [ ṡ ]   [ x^k ∘ s^k ]

These linear systems of equations are very similar to those used in Chapter 5, except that the equality constraints in (7.3) are not assumed to be satisfied. By taking the second derivative, we have

    [ A     0     0   ] [ ẍ ]   [ 0        ]
    [ 0     A^T   I   ] [ λ̈ ] = [ 0        ].                                        (7.11)
    [ S^k   0     X^k ] [ s̈ ]   [ −2 ẋ ∘ ṡ ]
One of the brilliant contributions of [89] is that, instead of using (7.11), the second-order term (the 'corrector') of the primal-dual trajectory is computed with the centering direction. Following the idea of [89], we modify (7.11) slightly to make sure that a substantial segment of the ellipse stays in F(t), thereby making sure that the step size along the ellipse is significantly greater than zero:

    [ A     0     0   ] [ ẍ(σ_k) ]   [ 0                   ]
    [ 0     A^T   I   ] [ λ̈(σ_k) ] = [ 0                   ],                        (7.12)
    [ S^k   0     X^k ] [ s̈(σ_k) ]   [ σ_k µ_k e − 2 ẋ ∘ ṡ ]

where the duality measure µ_k is evaluated at (x^k, λ^k, s^k), and we set the centering parameter σ_k satisfying 0 < σ_k < σ_max ≤ 0.5. Several slightly different heuristics have been proposed in the literature. Mehrotra's original idea in [89] was a subroutine CENPAR, which was fine-tuned by Lustig et al. in [82, page 440] in order to improve computational performance, which, in turn, was modified again; the modified centering method is still credited to Mehrotra in [144, page 196] (note that Mehrotra is one of the authors of PCx [23]). The formula of [144] is widely believed to be the best method and is now implemented in most state-of-the-art software packages and published papers, such as [120, 139, 23, 172]. We follow the common practice and use the formula of [144] in our implementation. We emphasize that the second derivatives are functions of σ_k, which is the idea discussed in Section 3.4 and is used to speed up the convergence of Algorithm 3.4.

Several relations follow immediately from (7.10) and (7.12).

Lemma 7.1
Let (ẋ, λ̇, ṡ) and (ẍ, λ̈, s̈) be defined in (7.10) and (7.12). Then, the following relations hold:

    s^T ẋ + x^T ṡ = x^T s = nµ,   s^T ẍ + x^T s̈ = σ µ n − 2 ẋ^T ṡ,   ẍ^T s̈ = 0.   (7.13)

Equations (7.10) and (7.12) can be solved in either the unreduced form, the augmented system form, or the normal equation form, as suggested in [144]. We solve the normal equations for (ẋ, λ̇, ṡ) and (ẍ, λ̈, s̈) as follows:

    (A X S^{-1} A^T) λ̇ = A X S^{-1} r_c − b,                                         (7.14a)
    ṡ = r_c − A^T λ̇,                                                                 (7.14b)
    ẋ = x − X S^{-1} ṡ,                                                              (7.14c)

and

    (A X S^{-1} A^T) λ̈ = −A S^{-1} (σ µ e − 2 ẋ ∘ ṡ),                               (7.15a)
    s̈ = −A^T λ̈,                                                                      (7.15b)
    ẍ = S^{-1} (σ µ e − X s̈ − 2 ẋ ∘ ṡ).                                             (7.15c)
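In MATLAB, the normal-equation solves (7.14)-(7.15) can be written compactly. The following fragment is a dense, illustrative sketch (it is not an excerpt from curvelp.m; the names lambda and sigma are assumptions, and a practical code would form A*X*S^{-1}*A' sparsely and reuse one Cholesky factorization for both solves, as done below):

    % Sketch of the normal-equation solves (7.14)-(7.15); dense and illustrative.
    d   = x ./ s;                               % diagonal of X*S^{-1}
    M   = A * diag(d) * A';                     % normal-equation matrix A*X*S^{-1}*A'
    R   = chol(M);                              % one factorization serves both solves
    rc  = A'*lambda + s - c;
    mu  = (x'*s) / length(x);
    % predictor direction (7.14)
    lamdot  = R \ (R' \ (A*(d.*rc) - b));
    sdot    = rc - A'*lamdot;
    xdot    = x - d.*sdot;
    % corrector direction (7.15), for a given centering parameter sigma
    rhs     = sigma*mu - 2*(xdot.*sdot);        % sigma*mu*e - 2*xdot.*sdot, elementwise
    lamddot = R \ (R' \ (-A*(rhs./s)));
    sddot   = -A'*lamddot;
    xddot   = (rhs - x.*sddot) ./ s;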
Given the first and second derivatives defined by (7.10) and (7.12), an analytic expression of the ellipse that is used to approximate the central path is derived in Chapter 5.

Theorem 7.1
Let (x(α), λ(α), s(α)) be an arc defined by (7.9) passing through a point (x, λ, s) ∈ E ∩ H, and let its first and second derivatives at (x, λ, s) be (ẋ, λ̇, ṡ) and (ẍ, λ̈, s̈), which are defined by (7.10) and (7.12). Then, the ellipse approximation of H(t) is given by

    x(α, σ) = x − ẋ sin(α) + ẍ(σ)(1 − cos(α)),                                       (7.16)
    λ(α, σ) = λ − λ̇ sin(α) + λ̈(σ)(1 − cos(α)),                                      (7.17)
    s(α, σ) = s − ṡ sin(α) + s̈(σ)(1 − cos(α)).                                       (7.18)
In the algorithm proposed below, we suggest taking step size α_k^s = α_k^λ, which may not be equal to the step size α_k^x.

Algorithm 7.1
Data: A, b, c, and step size scaling factor β ∈ (0, 1).
Initial point: λ^0 = 0, x^0 > 0, s^0 > 0, and µ_0 = (x^0)^T s^0 / n.
for iteration k = 0, 1, 2, . . .

Step 1: Calculate (ẋ, λ̇, ṡ) using (7.14) and set

    α_x^a := arg max{α ∈ [0, 1] | x^k − α ẋ ≥ 0},                                    (7.19a)
    α_s^a := arg max{α ∈ [0, 1] | s^k − α ṡ ≥ 0}.                                    (7.19b)

Step 2: Calculate µ^a = (x^k − α_x^a ẋ)^T (s^k − α_s^a ṡ) / n and compute the centering parameter

    σ = (µ^a / µ_k)^3.                                                               (7.20)

Step 3: Compute (ẍ, λ̈, s̈) using (7.15) with the σ calculated from (7.20).

Step 4: Set

    α^x = arg max{α ∈ [0, π/2] | x^k − ẋ sin(α) + ẍ(1 − cos(α)) ≥ 0},               (7.21a)
    α^s = arg max{α ∈ [0, π/2] | s^k − ṡ sin(α) + s̈(1 − cos(α)) ≥ 0}.               (7.21b)
�
119
Step 5: Scale the step size by αkx = β α x and αks = β α s such that the update xk+1 = xk − x˙ sin(αkx ) + x¨ (1 − cos(αkx )) > 0, λ k+1 = λ k − λ˙ sin(α s ) + λ¨ (1 − cos(α s )), s
k+1
=s
k
k k s s − s˙ sin(αk ) + s¨(1 − cos(αk )) > 0.
(7.22a) (7.22b) (7.22c)
Step 6: Set k ← k + 1. Go back to Step 1. end (for) Remark 7.1 The main difference between the proposed algorithm and Mehrotra’s MPC algorithm described in [144, 198] is in Steps 4 and 5 where the iterate moves along the ellipse instead of a straight line. More specifically, instead of using (7.21) and (7.22), Mehrotra’s method uses α x = arg max{α ∈ [0, 1] | xk − α(x˙ − x¨ ) ≥ 0}, s
k
(7.23a)
α = arg max{α ∈ [0, 1] | s − α(s˙ − s¨) ≥ 0}.
(7.23b)
xk+1 = xk − αkx (x˙ − x¨ ) > 0, λ k+1 = λ k − α s (λ˙ − λ¨ ),
(7.24a)
and
s
k+1
=s
k
k − αks (˙s − s¨) > 0.
(7.24b) (7.24c)
Note that the end points of arc-search algorithm (αkx , αks ) = (0, 0) and (αkx , αks ) = in (7.21) and (7.22) are equal to the end points of Mehrotra’s formulas in (7.23) and (7.24); for any (αkx , αks ) between (0, π2 ), the ellipse is a better approxima tion of the infeasible central path. Therefore, the proposed algorithm should have a larger step size than Mehrotra’s method and be more efficient. This intuitive has been verified by the numerical test result reported in Section 7.4. ( π2 , π2 )
The following lemma shows that searching along the ellipse in iterations will reduce the residuals of the equality constraints to zero as k → ∞, provided that αkx and αks are bounded below and away from zero. Lemma 7.2 �k−1 � x Let rkb = Axk − b, rkc = AT λ k + sk − c, ρk = k−1 j=0 (1 − sin(α j )), and νk = j=0 (1 − sin(α sj )). Then, the following relations hold. k−1 �
0 rbk = rbk−1 (1 − sin(αkx−1 )) = · · · = rb
rck
= rck−1 (1 − sin(αks−1 )) = · · ·
= r0c
(1 − sin(α xj )) = r0b ρk ,
(7.25a)
j=0
k−1 �
(1 − sin(α sj )) = r0c νk .
j=0
(7.25b)
120
� Arc-Search Techniques for Interior-Point Methods
Proof 7.1
From Theorem 7.1, searching along ellipse generates iterate as follows. xk+1 − xk = −x˙ sin(αkx ) + x¨ (1 − cos(αkx )), λ k+1 − λ k = −λ˙ sin(α s ) + λ¨ (1 − cos(α s )), k
k
sk+1 − sk = −s˙ sin(αks ) + s¨(1 − cos(αks )).
In view of (7.10) and (7.12), we have
k k+1 rk+1 − x k ) = A(−x˙ sin(αkx ) + x¨ (1 − cos(αkx )) b − rb = A(x x = −Ax˙ sin(αkx ) = −rk
b sin(αk ),
(7.26)
x k therefore, rk+1 b = rb (1 − sin(αk )); this proves (7.25a). Similarly,
rck+1 − rck = = = =
AT (λ k+1 − λ k ) + (sk+1 − sk ) AT (−λ˙ sin(αks ) + λ¨ (1 − cos(αks )) − s˙ sin(αks ) + s¨(1 − cos(αks )) −(AT λ˙ + s˙) sin(αks ) + (AT λ¨ + s¨)(1 − cos(αks )) −rkc sin(αks ), (7.27)
therefore, rk+1 = rck (1 − sin(αks )); this proves (7.25b). c To show that the duality measure decreases with iterations, we present the fol lowing lemma. Lemma 7.3 Let αx be the step length for x(σ , α) and αs be the step length for s(σ , α) and λ (σ , α) defined in Theorem 7.1. Assume that αx = αs := α, then, the updated duality measure can be expressed as µ(α)
= −
Proof 7.2
µ[1 − sin(α) + σ (1 − cos(α))] � 1� T (x¨ rc − λ¨ T rb ) sin(α)(1 − cos(α)) + (x˙ T rc − λ˙ T rb )(1 − cos(α))2 (7.28) . n
First, from (7.10) and (7.12), we have x˙ T AT λ˙ + x˙ T s˙ = x˙ T rc ,
this gives x˙ T s˙ = − λ˙ T rb + x˙ T rc . Similarly, x˙ T s¨ = −x˙ T AT λ¨ = −λ¨ T rb ,
x¨ T s˙ = x¨ T s˙ + x¨ T AT λ˙ = x¨ T rc .
Using these relations with (7.2) and Lemmas 7.1, we have µ(α) =
(x − x˙ sin(αx ) + x¨ (1 − cos(αx )))T (s − s˙ sin(αs ) + s¨(1 − cos(αs ))) /n
A Mehrotra-Type Infeasible Arc-Search Algorithm for LP
�
121
xT s xT s˙ sin(αs ) + sT x˙ sin(αx ) − n n xT s¨(1 − cos(αs )) + sT x¨ (1 − cos(αx )) x˙ T s˙ sin(αs ) sin(αx ) + + n n x˙ T s¨ sin(αx )(1 − cos(αs )) + s˙T x¨ sin(αs )(1 − cos(αx )) − n x˙ T s˙ sin2 (α) − 2x˙ T s˙(1 − cos(α)) = µ[1 − sin(α) + σ (1 − cos(α))] + n x˙ T s¨ sin(α)(1 − cos(α )) + s˙T x¨ sin(α)(1 − cos(α)) − n = µ[1 − sin(α) + σ (1 − cos(α))] � 1� T (x¨ rc − λ¨ T rb ) sin(α)(1 − cos(α )) + (x˙ T rc − λ˙ T rb )(1 − cos(α ))2 . − n (7.29) =
This finishes the proof. Remark 7.2 In view of Lemma 7.2, if sin(α) is bounded below and away from zero, then rb → 0 and rc → 0 as k → ∞. Therefore, in view of Lemmas 7.3 and 5.4, we have, as k → ∞, µ(α ) ≈ µ[1 − sin(α) + σ (1 − cos(α))] ≤ µ[1 − sin(α) + σ sin2 (α)] < µ provided that λ˙ , x˙ , λ¨ , and x¨ are bounded. This means that equation µ(α) < µ for any α ∈ (0, π2 ) as k → ∞. As a matter of fact, in all numerical tests, the decrease of the duality measure was observed in every iteration, even for αx �= αs . Positivity of x(σ , αx ) and s(σ , αs ) is guaranteed if (x, s) > 0 holds and αx and αs are small enough. Assuming that x˙ , s˙, x¨ , and s¨ are bounded, the claim can easily be seen from the following relations
7.3
x(σ , αx ) = x − x˙ sin(αx ) + x¨ (1 − cos(αx )) > 0,
(7.30)
s(σ , αs ) = s − s˙ sin(αs ) + x¨ (1 − cos(αs )) > 0.
(7.31)
Implementation Details
In this section, we discuss factors that are normally not discussed in the main body of algorithms but affect noticeably, if not significantly, the effectiveness and efficiency of the infeasible interior-point algorithms. Most of these factors have been discussed in widely spread literatures, and they are likely implemented differently from code to code. We will address all of these implementation topics and provide the detailed information of the implementation in this chapter. As we will compare arc-search
122
• Arc-Search Techniques for Interior-Point Methods
method and Mehrotra’s method, to make a meaningful and fair comparison, we will implement everything discussed in this section the same way for both methods, so that the only differences of the two algorithms in our implementations are in Steps 4 and 5, where the arc-search method uses formulas (7.21) and (7.22) and Mehrotra’s method uses (7.23) and (7.24). However, the difference of the computational cost is very small because these computations are all analytic and negligible compared to the cost of solving linear systems.
7.3.1 Initial point selection

Initial point selection has been known as an important factor in the computational efficiency of most infeasible interior-point algorithms. However, many commercial software packages do not provide sufficient details, for example, [23, 172]. We will use the methods proposed in [89, 83]. In [89, page 589], the initial point is selected as follows:

    λ̂ = (A A^T)^{-1} A c,   ŝ = c − A^T λ̂,   x̂ = A^T (A A^T)^{-1} b.                (7.32)

Let δ_x = max(−1.1 min_i{x̂_i}, 0) and δ_s = max(−1.1 min_i{ŝ_i}, 0). We then calculate

    δ̂_x = δ_x + 0.5 (x̂ + δ_x e)^T (ŝ + δ_s e) / Σ_{i=1}^n (ŝ_i + δ_s),              (7.33)

    δ̂_s = δ_s + 0.5 (ŝ + δ_s e)^T (x̂ + δ_x e) / Σ_{i=1}^n (x̂_i + δ_x),              (7.34)

and generate λ^0 = λ̂, s_i^0 = ŝ_i + δ̂_s, i = 1, . . . , n, and x_i^0 = x̂_i + δ̂_x, i = 1, . . . , n, as a possible initial point.

In [83, page 445], the initial point is selected as follows:

    x̂ = A^T (A A^T)^{-1} b.                                                          (7.35)

Let ξ_1 = max(−100 min_i{x̂_i}, 100, ‖b‖_1 / 100) and ξ_2 = 1 + ‖c‖_1, where ‖·‖_1 is the l_1 norm. We then calculate, for i = 1, . . . , n, x_i^0 = max{ξ_1, x̂_i} and

    s_i^0 = c_i + ξ_2,   if c_i > ξ_2;
            −c_i,        if c_i < −ξ_2;
            c_i + ξ_2,   if 0 ≤ c_i < ξ_2;
            ξ_2,         if −ξ_2 ≤ c_i < 0,                                           (7.36)

and set λ^0 = 0. For these two sets of initial points, we then calculate

    max{ ‖A x^0 − b‖, ‖A^T λ^0 + s^0 − c‖, µ_0 }                                      (7.37)
A Mehrotra-Type Infeasible Arc-Search Algorithm for LP
�
123
7.3.2 Pre-process Pre-process or pre-solver is a major factor that can significantly affect the numer ical stability and computational efficiency. Many literatures have been focused on this topic, for example [144, 83, 12, 7, 84]. As we will test all linear programming problems in standard form in Netlib, we focus on the strategies only for the standard linear programming problems in the form of (1.4), which are solved in normal equa tions2 . We will use Ai,· for the ith row of A, A·, j for the jth column of A, and Ai, j for the element at (i, j) position of A. While reducing � the problem, we will express the objective function in two parts, cT x = fob j + k ck xk . The first part fob j at the beginning is zero and is updated all the time as we reduce the problem (remove some ck from c); the terms in the summation in the second part are continuously reduced and ck are updated as necessary when we reduce the problem. When we select pre-process methods, besides considering the numerical stabil ity, efficiency and effectiveness in solving the linear programming problem, we will also assess their impact on the seamless implementation in post-process, which is the process to restore the expression of the solution in the original coordinate sys tem. To have a seamless implementation in the post-process, we store several vectors for every pre-process: corig = c for the original coefficients of the objective func tion; x f inal , which is set to zero at the beginning, will be used to store the optimal solution in the original unreduced coordinate; at the beginning of the pre-process, xidx = [1, 2, . . . , n]T , xidx will be reduced to keep a mapping between the orders of x in the reduced coordinate system and x f inal in the unreduced coordinate system. For pre-process 9, we will store a sparse matrix and a few more vectors: A post , which is empty at the beginning of the pre-process, is used to store equations which are eliminated in the pre-process 9 but need to be resolved in the final stage to recover the variables in the original coordinate system; b post is a vector associated with A post to recover the variables in the original coordinate system; and xtobe is a vector of coordinate information of the variables to be resolved in the final stage. These matrix and vectors need to be updated in the pre-process so that we can recover the solution expressed in the original coordinate system. We describe these updates only for the pre-processes that are chosen to be implemented. The first 6 pre-process methods presented below were reported in various litera tures, such as [144, 23, 12, 7, 84]; the rest of them, to the best of our knowledge, are not reported anywhere. 1. Empty row If Ai,· = 0 and bi = 0, this row can be removed. If Ai,· = 0 but bi �= 0, the problem is infeasible. 2. Duplicate rows If there is a constant k such that Ai,· = kA j,· and bi = kb j , a duplicate row can be removed. If Ai,· = kA j,· but bi �= kb j , the problem is infeasible. 2 Some strategies are specifically designed for solving augmented system form, for example, the ones discussed in [43], which we will not discuss in this chapter.
124
• Arc-Search Techniques for Interior-Point Methods
3. Empty column If A·,i = 0 and ci ≥ 0, xi = 0 is the right choice for the minimization. Remove the ith column A·,i and ci . Also remove xidx (i). If A·,i = 0 but ci < 0, the problem is unbounded as xi → ∞. 4. Duplicate columns If A·,i = A·, j , then Ax = b can be expressed as A·,i (xi + x j ) + k=i, � j A·,k xk = b, Moreover, if ci = c j , cT x can be expressed as ci (xi + x j ) + k=i, � j ck xk . Since xi ≥ 0 and x j ≥ 0, we have (xi + x j ) ≥ 0. Hence, a duplicate column can be removed. 5. Row singleton If Ai,· has exact one nonzero element, i.e., Ai,k �= 0 for some k, and for � k, Ai, j = 0; then xk = bi /Ai,k and cT x = ck bi /Ai,k + j=k ∀j = � c j x j . For �= � i, A�,· x = b� can be rewritten as j= � k A�, j x j = b� − A�,k bi /Ai,k . This suggests the following update: (i) if xk < 0, the problem is infeasible, oth erwise, continue, (ii) fopt + ck bi /Ai,k → fopt , (iii) remove ck from c, (iv) b� − A�,k bi /Ai,k → b� , (v) insert xk = bi /Ai,k into x f inal (xidx (k)), and (vi) re move xidx (k). With these changes, we can remove the ith row and the kth column. 6. Free variable If A·,i = −A·, j and ci = −c j , then we can rewrite Ax = b as A·,i (xi − x j ) + T � i, j A·,k xk , and c x = ci (xi − x j ) + � j ck xk . The new variable xi − x j k= k=i, is a free variable which can be solved if Aα,i �= 0 for some row α (otherwise, it is an empty column, which has been discussed). This gives ⎛ ⎞ � 1 ⎝ Aα,k xk ⎠ . xi − x j = bα − Aα,i k= � i, j
For any Aβ ,i �= 0, β �= α, Aβ ,· x = bβ can be expressed as � Aβ ,i (xi − x j ) + Aβ ,k xk = bβ , k�=i, j
or
⎛
Aβ ,i ⎝bα − Aα,i or
�� � j k=i,
Also,
cT x
⎞ �
Aα,k xk ⎠ +
� j k=i,
Aβ ,k −
Aβ ,i Aα,k Aα,i
�
Aβ ,k xk = bβ ,
� j k=i,
� xk = b β −
Aβ ,i bα . Aα,i
can be rewritten as ⎞ ⎛ � � ci ⎝ ck xk , bα − Aα,k xk ⎠ + Aα,i � j k=i,
� j k=i,
A Mehrotra-Type Infeasible Arc-Search Algorithm for LP
or
•
125
ci Aα ,k Aα,i
→
� � ci Aα ,k c i bα � xk . + ck − Aα,i Aα,i � j k=i,
bα This suggests the following update: (i) fob j + cAiα,i → fob j , (ii) ck − Aβ ,i Aα,k Aα,i
Aβ ,i bα Aα,i
ck , (iii) Aβ ,k − → Aβ ,k , (iv) bβ − → bβ , (v) delete Aα,· , bα , m − 1 → m, delete A·,i , A·, j , ci , c j , and n − 2 → n. 7. Fixed variable defined by a single row If bi < 0 and Ai,· ≥ 0 with at least one j such that Ai, j > 0, then the problem is infeasible. Similarly, If bi > 0 and Ai,· ≤ 0 with at least one j such that Ai, j < 0, then the problem is infeasible. If bi = 0, but either max(Ai,· ) ≤ 0 or min(Ai,· ) ≥ 0, then, for any j such that Ai, j �= 0, x j = 0 has to hold. Therefore, we can remove all such rows in A and b, and such columns in A and c. Also remove all corresponding elements in xidx . 8. Fixed variable defined by multiple rows If bi = b j , but either max(Ai,· − A j,· ) ≤ 0 or min(Ai,· − A j,· ) ≥ 0, then for any k such that Ai,k − A j,k = � 0, xk = 0 has to hold. This suggests the following update: (i) remove kth columns of A and c if Ai,k − A j,k �= 0, and (ii) remove either ith or jth row depending on which has more nonzeros. The same idea can be used for the case when bi + b j = 0. 9. Positive variable defined by signs of Ai,· and bi Since ⎞ ⎛ � 1 ⎝ bα − Aα,k xk ⎠ , xi = Aα,i k=i �
if the sign of Aα,i is the same as bα and opposite to all Aα,k for k �= i, then xi ≥ 0 is guaranteed. We can solve xi , and substitute back into Ax = b and cT x. A bα This suggests taking the following actions: (i) if Aβ ,i �= 0, bβ − Aβ ,iα,i → bβ , (ii) moreover, if Aα,k �= 0, then Aβ ,k − (iv) ck −
ci Aα,k Aα,i
Aβ ,i Aα,k Aα,i
bα → Aβ ,k , (iii) fob j + cAiα,i → fob j ,
post → ck , (v) add a row to the end of A post such that Aα,i = 1 and
post and add Aα,k /Aα,i → Aαpost ,k if Aα,k �= 0, (vi) add b(α)/Aα,i to the end of b xidx (i) to the end of btobe , (vii) remove the αth row and ith column from A and remove xidx (i).
10. A singleton variable defined by two rows If Ai,· −A j,· is a singleton and Ai,k −A j,k �= 0 for one and only one k, then xk = bi −b j Ai,k −A j,k . This suggests the following update: (i) if xk ≥ 0 does not hold, the problem is infeasible, (ii) if xk ≥ 0 does hold, for ∀� �= i, j and A�,k �= 0, b� − b −b A�,k A i −Aj → b� , (iii) remove either the ith or the jth row, and remove the i,k
j,k
126
• Arc-Search Techniques for Interior-Point Methods
kth column from A, (iv) remove ck from c, and (v) update fob j + ck fob j .
bi −b j Ai,k
→
We have tested all these ten pre-solvers, and they all work in terms of reducing the problem sizes and making the problems easier to solve, in most cases. How ever, pre-solvers 2, 4, 6, 8 and 10 are observed to be significantly more time con suming than pre-solvers 1, 3, 5, 7 and 9. Moreover, our experience shows that presolvers 1, 3, 5, 7 and 9 are more efficient in reducing the problem sizes than presolvers 2, 4, 6, 8 and 10. Therefore, in our implementation, we use only pre-solvers 1, 3, 5, 7, and 9 for all of our test problems. The proposed pre-process set (1,3,5,7,9) is very efficient. Table 7.1 compares the reduced problem sizes obtained by the proposed pre-process and by the pre-process of PCx [23]. Among all the tested problems, the pre-process of [23] is slightly bet ter only for 4 problems (agg, agg2, agg3, fffff800), while the proposed pre process in this chapter is better or the same for all other problems. PCx [23] does not report the reduced sizes for some large problems (Osa 07, Osa 14, Osa 30, Qap12, Qap15, Qap8). For the only one reported large problem Stocfor3, the proposed pre-process generates a much smaller problem: (m, n) = (5974, 12840) vs. (m, n) = (15362, 22228). If a full pre-process set (1 − 10), described in this section, is used, the reduced problem sizes are smaller for all tested problems than the ones obtained in [23]. Table 7.1: Pre-processes comparison for test problems in Netlib. Problem Adlittle Afiro Agg Agg2 Agg3 Bandm Beaconfd Blend Bnl1 Bnl2 Brandy Degen2 Degen3 fffff800 Israel Lotfi Maros r7 Osa 07 Osa 14 Osa 30 Qap12 Qap15
before prep m n 56 138 27 51 488 615 516 758 516 758 305 472 173 295 74 114 643 1586 2324 4486 220 303 444 757 1503 2604 525 1208 174 316 153 366 3136 9408 1118 25067 2337 54797 4350 104374 3192 8856 6330 22275
proposed pre-solve m n 54 136 8 32 391 479 514 755 514 755 192 347 57 147 49 89 429 1314 1007 3066 113 218 440 753 1490 2591 487 991 174 316 113 326 7440 2152 25030 1081 2300 54760 4313 104337 3048 8712 6105 22050
pre-solve of [23] m n 55 137 27 51 390 477 514 750 514 750 240 395 86 171 71 111 610 1491 1964 4009 133 238 444 757 1503 2604 322 826 174 316 133 346 2152 7440 -
A Mehrotra-Type Infeasible Arc-Search Algorithm for LP
Qap8 Sc105 Sc205 Sc50a Sc50b Scagr25 Scagr7 Scfxm1 Scfxm2 Scfxm3 Scrs8 Scsd1 Scsd6 Scsd8 Sctap1 Sctap2 Sctap3 Share1b Share2b Ship04l Ship04s Ship08l Ship08s Ship12l Ship12s Stocfor1 Stocfor2 Stocfor3 Truss
912 105 205 50 50 471 129 330 660 990 490 77 147 397 300 1090 1480 117 96 402 402 778 778 1151 1151 117 2157 16675 1000
1632 163 317 78 78 671 185 600 1200 1800 1275 760 1350 2750 660 2500 3340 253 162 2166 1506 4363 2467 5533 2869 165 3045 23541 8806
848 44 89 19 14 343 91 238 479 720 115 77 147 397 284 1033 1408 102 87 292 216 470 274 610 340 34 766 5974 1000
1568 102 201 47 42 543 147 500 1003 1506 893 760 1350 2750 644 2443 3268 238 153 1905 1281 3121 1600 4171 1943 82 1654 12840 8806
104 203 49 48 469 127 305 610 915 421 77 147 397 284 1033 1408 112 96 292 216 470 275 610 340 102 1980 15362 1000
•
127
162 315 77 76 669 183 568 1136 1704 1199 760 1350 2750 644 2443 3268 248 162 1905 1281 3121 1604 4171 1943 150 2868 22228 8806
An alternative way to see the superiority of the proposed pre-process is to use the performance profile, which, to our best knowledge, was first proposed in [132] and carefully analyzed in [29, 46]. Let S be the set of solvers and P be the set of test problems. Let n_s be the number of solvers, n_p be the number of test problems, and m_{p,s} be the merit function of using solver s for problem p. The performance ratio is defined as [132]:

    r_{p,s} = m_{p,s} / min{ m_{p,s} : s ∈ S }.                                      (7.38)

The performance profile is defined as the distribution function of a performance (merit) metric [29]. For this problem, the merit function m_{p,s} is the size of the reduced problem p using pre-process s. The performance profiles for the proposed pre-process and PCx's pre-process are given in Figure 7.1.
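Computing a performance profile from a table of merit values takes only a few lines. The following sketch is illustrative; it assumes a matrix merits of size n_p-by-n_s holding the reduced problem sizes, with one column per pre-process as in Figure 7.1:

    % Sketch: performance profiles P(r_{p,s} <= tau) from a merit matrix, per (7.38).
    ratios = bsxfun(@rdivide, merits, min(merits, [], 2));   % r_{p,s}, row-wise best
    tau    = linspace(1, 10, 200);
    np     = size(merits, 1);
    profiles = zeros(numel(tau), size(merits, 2));
    for k = 1:numel(tau)
        profiles(k, :) = sum(ratios <= tau(k), 1) / np;       % fraction of problems within tau
    end
    plot(tau, profiles); xlabel('\tau'); ylabel('P(r_{p,s} \leq \tau)');
    legend('CurveLP pre-process', 'PCx pre-process');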
Figure 7.1: Performance profiles comparison for CurveLP and PCx pre-processes (the plot shows P(r_{p,s} ≤ τ) against τ ∈ [1, 10] for the two pre-processes).
7.3.3 Post-process

With the implementation described in the pre-process, the post-process is very simple and is given as the following MATLAB code.

% Copy the variables kept by the pre-solvers back to their original positions.
for i = 1:length(x_idx)
    x_final(x_idx(i)) = x(i);
end
% Recover the variables eliminated by the pre-solvers, in reverse order.
for ii = length(x_tobe):-1:1
    idxp = find(A_post(ii,:));                 % nonzero columns of the stored row
    x_final(x_tobe(ii)) = b_post(ii);
    for jj = 1:length(idxp)
        if idxp(jj) ~= x_tobe(ii)
            x_final(x_tobe(ii)) = x_final(x_tobe(ii)) ...
                - A_post(ii, idxp(jj)) * x_final(idxp(jj));
        end
    end
end
Remark 7.3 After restoring the solution from the reduced x back to x_final in the original coordinate system, we can compare f_opt and c_orig^T x_final and verify that our code is correctly implemented if f_opt = c_orig^T x_final.
7.3.4 Matrix scaling

Scaling is believed (for example, [23]) to be a good practice for an ill-conditioned matrix A where the ratio

\frac{\max |A_{i,j}|}{\min \{|A_{k,l}| : A_{k,l} \neq 0\}}        (7.39)

is big. PCx adopted a scaling strategy proposed in [22]. Let Φ = diag(φ_1, ..., φ_m) and Ψ = diag(ψ_1, ..., ψ_n) be the diagonal scaling matrices of A. The scaling of the matrix A in [23, 22] is equivalent to

\min_{\Phi, \Psi} \sum_{A_{ij} \neq 0} \log_2^2 \frac{A_{ij}}{\phi_i \psi_j}.
Various methods for solving this problem are proposed in [23, 22]. We implemented these methods and some variations and tested them against the standard linear programming problems in Netlib. All of these methods have a similar impact on the efficiency of infeasible interior-point algorithms. We ran the proposed interior-point algorithm and Mehrotra's algorithm with and without scaling for the problems in Table 7.1 and compared the iteration counts. The following 9 problems use fewer iterations after scaling is used: Bnl1, Sc205, Scagr25, Scfxm1, Scfxm3, Sctap1, Sctap2, Ship04s, Stocfor1. The following 20 problems use the same number of iterations after scaling is used: Agg, Bandm, Lotfi, Maros r7, Qap8, Sc105, Scagr7, Scrs8, Scsd1, Scsd8, Sctap3, Share1b, Share2b, Ship04l, Ship08l, Ship08s, Ship12l, Stocfor2, Stocfor3, Truss. The remaining 22 problems use more iterations after scaling is used: Adlittle, Afiro, Agg2, Agg3, Beaconfd, Blend, Bnl2, Brandy, Degen2, Degen3, fffff800, Israel, Osa 07, Osa 14, Osa 30, Qap12, Qap15, Sc50a, Sc50b, Scfxm2, Scsd6, Ship12s. These test results show that, although scaling can improve the efficiency of infeasible interior-point algorithms for some problems, overall it does not help. In addition, there are no clear criteria on which problems may benefit from scaling and which problems may be adversely affected by it. Therefore, we decided not to use scaling in any of our test problems. However, the ratio defined in (7.39) is a good indicator that can be used to determine whether pre-solver 9 should be applied or not.
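As a small illustration, the indicator (7.39) can be computed directly from the nonzero entries of A; the following is a minimal MATLAB sketch (the threshold used to decide whether pre-solver 9 is worthwhile is an implementation choice and is not specified here).

% Minimal sketch: the ill-conditioning ratio (7.39) of the matrix A.
nzmag = abs(nonzeros(A));           % magnitudes of the nonzero entries of A
ratio = max(nzmag) / min(nzmag);    % a large value suggests applying pre-solver 9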
7.3.5 Removing row dependency from A

Theoretically, the convergence analyses in most of the existing literature assume that the matrix A has full rank. Practically, row dependency causes some computational difficulties. However, many real-world problems, including some problems in Netlib, have dependent rows. Though the standard Gaussian elimination method can reduce A to a full-rank matrix, the sparse structure of A will be destroyed. In [6], Andersen reported an efficient method that removes the row dependency of A. His paper also claimed that not only is the numerical stability improved by the method, but the cost of the effort can also be justified. One of the main ideas is to identify most of the independent rows of A in a cheap and easy way and to separate these independent rows from those that may be dependent. A variation of Andersen's method can be summarized as follows; a code sketch of the singleton-separation step is given at the end of this subsection.

First, it assumes that all empty rows have been removed by the pre-solver. Second, the matrix A often contains many column singletons (columns that have only one nonzero); for example, slack variables are column singletons. Clearly, a row containing a column singleton cannot be dependent. If these rows are separated (temporarily removed) from the other rows of A, new column singletons may appear and more rows may be separated. In practice, this process may separate many rows from the rest of the rows of A. Permutation operations can be used to move the singletons to the diagonal elements of A. The dependent rows are among the rows left by this process. Then, the Gaussian elimination method can be applied with pivot selection using the Markowitz criterion [30, 28]. Some implementation details include (a) breaking ties by choosing the element with the largest magnitude, and (b) using threshold pivoting.

We have tested and analyzed the impact of removing dependent rows using Andersen's method. Among the 51 tested Netlib problems, this method is beneficial (in terms of finding an accurate solution) to Mehrotra's algorithm for 9 problems: Bnl2, Degen2, Degen3, Osa 07, Qap15, Qap8, Scfxm1, Scfxm3, Stocfor1. It is beneficial (in terms of finding an accurate solution) to the proposed arc-search algorithm for only one problem: Stocfor1. Considering the significant computational cost, we choose not to use this function unless we feel it is necessary as part of handling degenerate solutions discussed later. To have a fair comparison between the two algorithms, we will make it clear in the test report in Section 7.4 which algorithms and/or problems use this function and which algorithms and/or problems do not.
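The following is a minimal MATLAB sketch of the singleton-separation step described above; it is an illustration under assumed variable names, not the implementation used in this chapter. Rows set aside by the loop are guaranteed to be independent, so Gaussian elimination with Markowitz pivoting only needs to examine the rows that remain.

% Minimal sketch, assuming A is a sparse m-by-n matrix.
active = true(size(A, 1), 1);          % rows not yet identified as independent
found  = true;
while found
    sub      = A(active, :);
    colcount = full(sum(sub ~= 0, 1)); % nonzeros per column among active rows
    singles  = find(colcount == 1);    % columns that just became singletons
    [rows, ~] = find(sub(:, singles)); % local indices of rows holding a singleton
    found = ~isempty(rows);
    idx = find(active);
    active(idx(unique(rows))) = false; % these rows cannot be dependent
end
% Dependent rows, if any, are among A(active,:); apply Gaussian elimination
% with Markowitz pivoting (and threshold pivoting) to this block only.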
7.3.6 Linear algebra for sparse Cholesky matrix

Similar to Mehrotra's algorithm, the majority of the computational cost of our proposed algorithm is related to solving the possibly ill-conditioned sparse Cholesky systems (7.14) and (7.15), which can be expressed as the abstract problem

A D^2 A^T u = v,        (7.40)

where D = X^{1/2} S^{-1/2} is identical in (7.14) and (7.15), but u and v are different vectors. Many popular LP solvers [23, 172] call a modified software package of [104] (see [144, pages 216-219] for details), which uses some linear algebra specifically developed for the ill-conditioned sparse Cholesky decomposition [77]. Although MATLAB has the function chol, it does not yet implement features for ill-conditioned matrices, which occur frequently in interior-point methods for linear programming problems. This is the major difference between our implementation and other popular LP solvers, and it is most likely the main reason that our test results are slightly different from the test results reported in other literature.
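For orientation, a plain (non-specialized) way to solve the abstract problem (7.40) in MATLAB is sketched below. It uses MATLAB's standard chol with a fill-reducing ordering and is only an illustration under assumed variable names, not the specialized linear algebra of [104, 77].

% Minimal sketch: solve A*D^2*A'*u = v with a sparse Cholesky factorization.
% x and s are the current iterates; variable names are assumptions.
[m, n] = size(A);
d2 = x ./ s;                           % diagonal of D^2 = X*S^{-1}
M  = A * spdiags(d2, 0, n, n) * A';    % normal-equations matrix A*D^2*A'
p  = symamd(M);                        % fill-reducing ordering
R  = chol(M(p, p));                    % M(p,p) = R' * R
u  = zeros(m, 1);
u(p) = R \ (R' \ v(p));                % two triangular solves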
7.3.7 Handling degenerate solutions

An important result in linear programming [40] is that there always exist strictly complementary optimal solutions that meet the conditions x* ∘ s* = 0 and x* + s* > 0. Therefore, the columns of A can be partitioned using B ⊆ {1, 2, ..., n}, the set of indices of the positive coordinates of x*, and N ⊆ {1, 2, ..., n}, the set of indices of the positive coordinates of s*, such that B ∪ N = {1, 2, ..., n} and B ∩ N = ∅. Thus, we can partition A = (A_B, A_N), and define the primal and dual optimal faces by

P* = {x : A_B x_B = b, x ≥ 0, x_N = 0},

and

D* = {(λ, s) : A_N^T λ + s_N = c_N, s_B = 0, s ≥ 0}.

However, not all optimal solutions in linear programming are strictly complementary. The following example shows this fact:

\min_{x \in \mathbb{R}^3} x_1 \quad \text{subject to} \quad x_1 + x_2 + x_3 = 1, \quad (x_1, x_2, x_3) \ge 0,

whose dual is

\max_{\lambda \in \mathbb{R}, \, s \in \mathbb{R}^3} \lambda \quad \text{subject to} \quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \lambda + s = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad s \ge 0.

The primal-dual solutions for this problem are x* = (0, t, 1-t)^T, λ* = 0, s* = (1, 0, 0)^T, for any t ∈ [0, 1]. For t ∈ (0, 1), it is clear that (x*, λ*, s*) are strictly complementary solutions. However, for t = 0 or t = 1 the primal-dual solutions are no longer strictly complementary because there is an index i for which x_i* and s_i* are both zero. Although many interior-point algorithms are proved to converge to strictly complementary solutions, this claim may not be true for Mehrotra's method and the arc-search method proposed in this chapter.

Recall that the problem pair (1.4) and (1.5) is said to have a primal degenerate solution if a primal optimal solution x* has fewer than m positive coordinates, and a dual degenerate solution if a dual optimal solution s* has fewer than n - m positive coordinates. The pair (x*, s*) is called degenerate if it is primal or dual degenerate. This means that, as x^k → x*, equation (7.40) can be written as

(A_B X_B S_B^{-1} A_B^T) u = v.        (7.41)

If the problem converges to a primal degenerate solution, then some diagonal elements of X_B are zeros; hence, the rank of (A_B X_B S_B^{-1} A_B^T) becomes less than m as x^k → x*. In this case, there is a difficulty in solving (7.41). The difficulty caused by degenerate solutions in interior-point methods for linear programming has been known for a long time [48]. We have observed this troublesome incident in quite a few Netlib test problems. A similar observation was also reported in [44]. Though we do not see any special attention to or report on this troublesome issue in some widely cited papers and LP solvers, such as [89, 82, 83, 23, 172], we noticed from [144, page 219] that some LP solvers [23, 172] twisted the sparse Cholesky decomposition code of [104] in order to overcome the difficulty.

In our implementation, we use a different method to avoid the difficulty because we do not have access to the code of [104]. After each iteration, the minimum of x^k is examined. If min{x^k} ≤ ε_x, then, for all components of x satisfying x_i ≤ ε_x, we delete A_{.i}, x_i, s_i, c_i, and the ith component of r_c; we then use the method proposed in Subsection 7.4.5 to check whether the updated A has full rank and make the updated A full rank if necessary. The default ε_x is 10^{-6}. For problems that need a different ε_x, we will make it clear in the report of the test results.
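A minimal MATLAB sketch of this column-deletion step is given below; the variable names are assumptions for illustration.

% Minimal sketch: remove components of x that have essentially converged to zero.
eps_x = 1e-6;
keep  = x > eps_x;                 % components to retain
A  = A(:, keep);   c = c(keep);
x  = x(keep);      s = s(keep);
rc = rc(keep);
% Afterwards, check that the updated A still has full row rank and repair it
% if necessary (see Subsection 7.4.5).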
7.3.8 Analytic solution of α_x and α_s

We know that α_x and α_s in (7.23) can easily be calculated in analytic form. Similarly, α_x and α_s in (7.21) can also be calculated in analytic form as follows. For each i ∈ {1, ..., n}, we can select the largest α_{x_i} such that for any α ∈ [0, α_{x_i}] the ith inequality of (7.21a) holds, and the largest α_{s_i} such that for any α ∈ [0, α_{s_i}] the ith inequality of (7.21b) holds. We then define

\alpha_x = \min_{i \in \{1, \ldots, n\}} \{\alpha_{x_i}\},        (7.42)

\alpha_s = \min_{i \in \{1, \ldots, n\}} \{\alpha_{s_i}\}.        (7.43)

α_{x_i} and α_{s_i} can be given in analytic form according to the values of \dot{x}_i, \ddot{x}_i, \dot{s}_i, \ddot{s}_i. First, from (7.21), we have

x_i + \ddot{x}_i \ge \dot{x}_i \sin(\alpha) + \ddot{x}_i \cos(\alpha).        (7.44)

Clearly, letting β = sin(α), this is equivalent to finding β ∈ (0, 1] such that

x_i - \dot{x}_i \beta + \ddot{x}_i \left(1 - \sqrt{1 - \beta^2}\right) \ge 0.        (7.45)
However, we prefer to use (7.44) in the following analysis because of its geometric property.

Case 1 (\dot{x}_i = 0 and \ddot{x}_i \neq 0): For \ddot{x}_i \ge -x_i and any α ∈ [0, π/2], x_i(α) ≥ 0 holds. For \ddot{x}_i \le -x_i, to meet (7.44), we must have \cos(\alpha) \ge \frac{x_i + \ddot{x}_i}{\ddot{x}_i}, or \alpha \le \cos^{-1}\left(\frac{x_i + \ddot{x}_i}{\ddot{x}_i}\right). Therefore,

\alpha_{x_i} = \begin{cases} \pi/2 & \text{if } x_i + \ddot{x}_i \ge 0, \\ \cos^{-1}\left(\frac{x_i + \ddot{x}_i}{\ddot{x}_i}\right) & \text{if } x_i + \ddot{x}_i \le 0. \end{cases}        (7.46)
Case 2 (\ddot{x}_i = 0 and \dot{x}_i \neq 0): For \dot{x}_i \le x_i and any α ∈ [0, π/2], x_i(α) ≥ 0 holds. For \dot{x}_i \ge x_i, to meet (7.44), we must have \sin(\alpha) \le \frac{x_i}{\dot{x}_i}, or \alpha \le \sin^{-1}\left(\frac{x_i}{\dot{x}_i}\right). Therefore,

\alpha_{x_i} = \begin{cases} \pi/2 & \text{if } \dot{x}_i \le x_i, \\ \sin^{-1}\left(\frac{x_i}{\dot{x}_i}\right) & \text{if } \dot{x}_i \ge x_i. \end{cases}        (7.47)
Case 3 (\dot{x}_i > 0 and \ddot{x}_i > 0): Let \dot{x}_i = \sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \cos(\beta) and \ddot{x}_i = \sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \sin(\beta); then (7.44) can be rewritten as

x_i + \ddot{x}_i \ge \sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \, \sin(\alpha + \beta),        (7.48)

where

\beta = \sin^{-1}\left(\frac{\ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right).        (7.49)

For \ddot{x}_i + x_i \ge \sqrt{\dot{x}_i^2 + \ddot{x}_i^2} and any α ∈ [0, π/2], x_i(α) ≥ 0 holds. For \ddot{x}_i + x_i \le \sqrt{\dot{x}_i^2 + \ddot{x}_i^2}, to meet (7.48), we must have \sin(\alpha + \beta) \le \frac{x_i + \ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}, or \alpha + \beta \le \sin^{-1}\left(\frac{x_i + \ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right). Therefore,

\alpha_{x_i} = \begin{cases} \pi/2 & \text{if } x_i + \ddot{x}_i \ge \sqrt{\dot{x}_i^2 + \ddot{x}_i^2}, \\ \sin^{-1}\left(\frac{x_i + \ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right) - \sin^{-1}\left(\frac{\ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right) & \text{if } x_i + \ddot{x}_i \le \sqrt{\dot{x}_i^2 + \ddot{x}_i^2}. \end{cases}        (7.50)
Case 4 (\dot{x}_i > 0 and \ddot{x}_i < 0): Let \dot{x}_i = \sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \cos(\beta) and \ddot{x}_i = -\sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \sin(\beta); then (7.44) can be rewritten as

x_i + \ddot{x}_i \ge \sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \, \sin(\alpha - \beta),        (7.51)

where

\beta = \sin^{-1}\left(\frac{-\ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right).        (7.52)

For \ddot{x}_i + x_i \ge \sqrt{\dot{x}_i^2 + \ddot{x}_i^2} and any α ∈ [0, π/2], x_i(α) ≥ 0 holds. For \ddot{x}_i + x_i \le \sqrt{\dot{x}_i^2 + \ddot{x}_i^2}, to meet (7.51), we must have \sin(\alpha - \beta) \le \frac{x_i + \ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}, or \alpha - \beta \le \sin^{-1}\left(\frac{x_i + \ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right). Therefore,

\alpha_{x_i} = \begin{cases} \pi/2 & \text{if } x_i + \ddot{x}_i \ge \sqrt{\dot{x}_i^2 + \ddot{x}_i^2}, \\ \sin^{-1}\left(\frac{x_i + \ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right) + \sin^{-1}\left(\frac{-\ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right) & \text{if } x_i + \ddot{x}_i \le \sqrt{\dot{x}_i^2 + \ddot{x}_i^2}. \end{cases}        (7.53)
Case 5 (\dot{x}_i < 0 and \ddot{x}_i < 0): Let \dot{x}_i = -\sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \cos(\beta) and \ddot{x}_i = -\sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \sin(\beta); then (7.44) can be rewritten as

x_i + \ddot{x}_i \ge -\sqrt{\dot{x}_i^2 + \ddot{x}_i^2} \, \sin(\alpha + \beta),        (7.54)

where

\beta = \sin^{-1}\left(\frac{-\ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right).        (7.55)

For \ddot{x}_i + x_i \ge 0 and any α ∈ [0, π/2], x_i(α) ≥ 0 holds. For \ddot{x}_i + x_i \le 0, to meet (7.54), we must have \sin(\alpha + \beta) \ge \frac{-(x_i + \ddot{x}_i)}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}, or \alpha + \beta \le \pi - \sin^{-1}\left(\frac{-(x_i + \ddot{x}_i)}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right). Therefore,

\alpha_{x_i} = \begin{cases} \pi/2 & \text{if } x_i + \ddot{x}_i \ge 0, \\ \pi - \sin^{-1}\left(\frac{-(x_i + \ddot{x}_i)}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right) - \sin^{-1}\left(\frac{-\ddot{x}_i}{\sqrt{\dot{x}_i^2 + \ddot{x}_i^2}}\right) & \text{if } x_i + \ddot{x}_i \le 0. \end{cases}        (7.56)
Case 6 (\dot{x}_i < 0 and \ddot{x}_i > 0): Clearly, (7.44) always holds for α ∈ [0, π/2]. Therefore, we can take

\alpha_{x_i} = \pi/2.        (7.57)

Case 7 (\dot{x}_i = 0 and \ddot{x}_i = 0): Clearly, (7.44) always holds for α ∈ [0, π/2]. Therefore, we can take

\alpha_{x_i} = \pi/2.        (7.58)
A similar analysis can be performed for α_s in (7.21) and similar results can be obtained for α_{s_i}. For completeness, we list the formulas without repeating the proofs.

Case 1a (\dot{s}_i = 0 and \ddot{s}_i \neq 0):

\alpha_{s_i} = \begin{cases} \pi/2 & \text{if } s_i + \ddot{s}_i \ge 0, \\ \cos^{-1}\left(\frac{s_i + \ddot{s}_i}{\ddot{s}_i}\right) & \text{if } s_i + \ddot{s}_i \le 0. \end{cases}        (7.59)
Case 2a (\ddot{s}_i = 0 and \dot{s}_i \neq 0):

\alpha_{s_i} = \begin{cases} \pi/2 & \text{if } \dot{s}_i \le s_i, \\ \sin^{-1}\left(\frac{s_i}{\dot{s}_i}\right) & \text{if } \dot{s}_i \ge s_i. \end{cases}        (7.60)
Case 3a (\dot{s}_i > 0 and \ddot{s}_i > 0):

\alpha_{s_i} = \begin{cases} \pi/2 & \text{if } s_i + \ddot{s}_i \ge \sqrt{\dot{s}_i^2 + \ddot{s}_i^2}, \\ \sin^{-1}\left(\frac{s_i + \ddot{s}_i}{\sqrt{\dot{s}_i^2 + \ddot{s}_i^2}}\right) - \sin^{-1}\left(\frac{\ddot{s}_i}{\sqrt{\dot{s}_i^2 + \ddot{s}_i^2}}\right) & \text{if } s_i + \ddot{s}_i < \sqrt{\dot{s}_i^2 + \ddot{s}_i^2}. \end{cases}        (7.61)
Case 4a (\dot{s}_i > 0 and \ddot{s}_i < 0):

\alpha_{s_i} = \begin{cases} \pi/2 & \text{if } s_i + \ddot{s}_i \ge \sqrt{\dot{s}_i^2 + \ddot{s}_i^2}, \\ \sin^{-1}\left(\frac{s_i + \ddot{s}_i}{\sqrt{\dot{s}_i^2 + \ddot{s}_i^2}}\right) + \sin^{-1}\left(\frac{-\ddot{s}_i}{\sqrt{\dot{s}_i^2 + \ddot{s}_i^2}}\right) & \text{if } s_i + \ddot{s}_i \le \sqrt{\dot{s}_i^2 + \ddot{s}_i^2}. \end{cases}        (7.62)
Case 5a (\dot{s}_i < 0 and \ddot{s}_i < 0):

\alpha_{s_i} = \begin{cases} \pi/2 & \text{if } s_i + \ddot{s}_i \ge 0, \\ \pi - \sin^{-1}\left(\frac{-(s_i + \ddot{s}_i)}{\sqrt{\dot{s}_i^2 + \ddot{s}_i^2}}\right) - \sin^{-1}\left(\frac{-\ddot{s}_i}{\sqrt{\dot{s}_i^2 + \ddot{s}_i^2}}\right) & \text{if } s_i + \ddot{s}_i \le 0. \end{cases}        (7.63)
Case 6a (\dot{s}_i < 0 and \ddot{s}_i > 0): Clearly, (7.44) always holds for α ∈ [0, π/2]. Therefore, we can take

\alpha_{s_i} = \pi/2.        (7.64)

Case 7a (\dot{s}_i = 0 and \ddot{s}_i = 0): Clearly, (7.44) always holds for α ∈ [0, π/2]. Therefore, we can take

\alpha_{s_i} = \pi/2.        (7.65)
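As a compact summary of the case analysis above, the following minimal MATLAB sketch computes α_{x_i} for all components (the same code applied to s, \dot{s}, \ddot{s} gives α_{s_i}); the variable names xk, xdot, xddot are assumptions for illustration.

% Minimal sketch: alpha_x(i) according to Cases 1-7; pi/2 unless a case restricts it.
n = length(xk);
alpha_x = (pi/2) * ones(n, 1);
r = sqrt(xdot.^2 + xddot.^2);
for i = 1:n
    if xdot(i) == 0 && xddot(i) ~= 0 && xk(i) + xddot(i) <= 0          % Case 1
        alpha_x(i) = acos((xk(i) + xddot(i)) / xddot(i));
    elseif xddot(i) == 0 && xdot(i) ~= 0 && xdot(i) >= xk(i)           % Case 2
        alpha_x(i) = asin(xk(i) / xdot(i));
    elseif xdot(i) > 0 && xddot(i) > 0 && xk(i) + xddot(i) <= r(i)     % Case 3
        alpha_x(i) = asin((xk(i)+xddot(i))/r(i)) - asin(xddot(i)/r(i));
    elseif xdot(i) > 0 && xddot(i) < 0 && xk(i) + xddot(i) <= r(i)     % Case 4
        alpha_x(i) = asin((xk(i)+xddot(i))/r(i)) + asin(-xddot(i)/r(i));
    elseif xdot(i) < 0 && xddot(i) < 0 && xk(i) + xddot(i) <= 0        % Case 5
        alpha_x(i) = pi - asin(-(xk(i)+xddot(i))/r(i)) - asin(-xddot(i)/r(i));
    end                                                                % Cases 6, 7: keep pi/2
end
% The step bound in (7.42) is then min(alpha_x).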
7.3.9 Step scaling parameter

A fixed step scaling parameter is used in PCx [23]. A more sophisticated step scaling parameter is used in LIPSOL, according to [144, pages 204-205]. In our implementation, we use an adaptive step scaling parameter given by

\beta = 1 - e^{-(k+2)},        (7.66)

where k is the iteration number. This parameter approaches one as k → ∞.
7.3.10 Termination criteria

The main stopping criterion used in our implementations of the arc-search method and Mehrotra's method is similar to that of LIPSOL [172]:

\frac{\|r_b^k\|}{\max\{1, \|b\|\}} + \frac{\|r_c^k\|}{\max\{1, \|c\|\}} + \frac{\mu_k}{\max\{1, |c^T x^k|, |b^T \lambda^k|\}} < 10^{-8}.

In case the algorithms fail to find a good search direction, the programs also stop if the step sizes satisfy α_x^k < 10^{-8} and α_s^k < 10^{-8}. Finally, if, due to numerical problems, r_b^k or r_c^k does not decrease but instead 10\|r_b^{k-1}\| < \|r_b^k\| or 10\|r_c^{k-1}\| < \|r_c^k\|, the programs stop.
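In MATLAB form, this main test amounts to a few lines; the sketch below uses assumed variable names and mirrors the criterion above.

% Minimal sketch of the main stopping test.
tol  = 1e-8;
crit = norm(rb) / max(1, norm(b)) + norm(rc) / max(1, norm(c)) ...
     + mu / max([1, abs(c'*x), abs(b'*lambda)]);
stop = (crit < tol);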
7.4 Numerical Tests

In this section, we first examine a simple problem and show graphically what the feasible central path and the infeasible central path look like, why an ellipsoidal approximation may be a better approximation to the infeasible central path than a straight line, and how the arc-search is carried out for this simple problem. Using a figure, we can easily see that searching along the ellipse is more attractive than searching along a straight line. We then provide the numerical test results of larger-scale Netlib test problems in order to validate our observation from this simple problem.
7.4.1 A simple illustrative example

Let us consider the following example:

\min x_1, \quad \text{s.t.} \quad x_1 + x_2 = 5, \quad x_1 \ge 0, \; x_2 \ge 0.

The feasible central path (x, λ, s) defined in (7.5) satisfies the following conditions:

x_1 + x_2 = 5, \qquad \begin{bmatrix} 1 \\ 1 \end{bmatrix} \lambda + \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad x_1 s_1 = \mu, \quad x_2 s_2 = \mu.

The optimizer is given by x_1 = 0, x_2 = 5, λ = 0, s_1 = 1, and s_2 = 0. The feasible central path of this problem is given analytically as

\lambda = \frac{5 - 2\mu - \sqrt{(5 - 2\mu)^2 + 20\mu}}{10},        (7.67a)

s_1 = 1 - \lambda, \quad s_2 = -\lambda, \quad x_1 = \mu/s_1, \quad x_2 = \mu/s_2.        (7.67b)
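For illustration, the analytic central path (7.67) can be evaluated directly; a minimal MATLAB sketch is shown below (the range of µ is an arbitrary choice).

% Minimal sketch: evaluate the feasible central path (7.67) for decreasing mu.
mu  = logspace(1, -8, 200);
lam = (5 - 2*mu - sqrt((5 - 2*mu).^2 + 20*mu)) / 10;
s1  = 1 - lam;   s2 = -lam;
x1  = mu ./ s1;  x2 = mu ./ s2;
% As mu -> 0, (x1, x2) -> (0, 5), i.e., the optimizer.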
Figure 7.2: Arc-search for the simple example (iterates shown in the (x_1, x_2) subspace).
The feasible and infeasible central paths for this example are arcs in the 5-dimensional space (λ, x_1, s_1, x_2, s_2). If we project the central paths onto the 2-dimensional subspace spanned by (x_1, x_2), they are arcs in this 2-dimensional subspace. Figure 7.2 shows the first two iterations of Algorithm 7.1 in the 2-dimensional subspace spanned by (x_1, x_2). In Figure 7.2, the initial point (x_1^0, x_2^0) is marked by 'x' in red; the optimal solution is marked by '*' in red; (\dot{x}, \dot{s}, \dot{\lambda}) is calculated by using (7.10); (\ddot{x}, \ddot{s}, \ddot{\lambda}) is calculated by using (7.12); the projected feasible central path C(t) near the optimal solution is calculated by using (7.67) and is plotted as a continuous line in black; the infeasible central path H(t) starting from the current iterate is calculated by using (7.7) and plotted as dotted lines in blue; and the projected ellipsoidal approximations E(α) are dotted lines in green (they may sometimes look like continuous lines because many dots are used). In the first iteration, the iterate 'x' moves along the ellipse (defined in Theorem 7.1) to reach the next iterate marked as 'o' in red, because the calculation of the infeasible central path (the blue line) is very expensive, while the ellipse is cheap to calculate and is a better approximation to the infeasible central path than a straight line. The remaining iterations are simply repetitions of this process until it reaches the optimal solution (x*, s*). Only two iterations are plotted in Figure 7.2. It is worthwhile to note that, in this simple problem, the infeasible central path has a sharp turn in the first iteration, which may happen a number of times for general problems, as discussed in [141]. The arc-search method is expected to perform better than Mehrotra's method in iterations that are close to the sharp turns. In this simple problem, after the first iteration, the feasible central path C(t), the infeasible central path H(t), and the ellipse E(α) are all very close to each other and close to a straight line.
7.4.2 Netlib test problems

The algorithm developed in this chapter is implemented in a MATLAB function which we name curvelp.m. Mehrotra's algorithm is also implemented in a MATLAB function which we name mehrotra.m. The two implementations are almost identical. Both algorithms use exactly the same initial point, the same stopping criteria, the same pre-process and post-process, and the same parameters. The only difference between the two implementations is that the arc-search method searches for the optimizer along an ellipse and Mehrotra's method searches for the optimizer along a straight line.

Netlib test problems represented in MATLAB format are available in [50]. Numerical tests for both algorithms have been performed for all Netlib LP problems that are presented in standard form, except Osa 60 (m = 10281 and n = 232966), because the PC used for the testing does not have enough memory to handle this problem. The iteration numbers used to solve these problems are listed in Table 7.2.

Table 7.2: Numerical results for test problems in Netlib.

              CurveLP.m                          Mehrotra.m
Problem       iter    objective     infeas       iter    objective     infeas
Adlittle      15      2.2549e+05    1.0e-07      15      2.2549e+05    3.4e-08
Afiro         9       -464.7531     1.0e-11      9       -464.7531     8.0e-12
Agg           18      -3.5992e+07   5.0e-06      22      -3.5992e+07   5.2e-05
Agg2          18      -2.0239e+07   4.6e-07      20      -2.0239e+07   5.2e-07
Agg3          17      1.0312e+07    3.1e-08      18      1.0312e+07    8.8e-09
Bandm         19      -158.6280     3.2e-11      22      -158.6280     8.3e-10
Beaconfd      10      3.3592e+04    1.4e-12      11      3.3592e+04    1.4e-10
Blend         12      -30.8121      1.0e-09      14      -30.8122      4.9e-11
Bnl1          32      1.9776e+03    2.7e-09      35      1.9776e+03    3.4e-09
Bnl2+         31      1.8112e+03    5.4e-10      38      1.8112e+03    9.3e-07
Brandy        21      1.5185e+03    3.0e-06      19      1.5185e+03    6.2e-08
Degen2*       16      -1.4352e+03   1.9e-08      17      -1.4352e+03   2.0e-10
Degen3*       22      -9.8729e+02   7.0e-05      22      -9.8729e+02   1.2e-09
fffff800      26      5.5568e+05    4.3e-05      31      5.5568e+05    7.7e-04
Israel        23      -8.9664e+05   7.4e-08      29      -8.9665e+05   1.8e-08
Lotfi         14      -25.2647      3.5e-10      18      -25.2647      2.7e-07
Maros r7      18      1.4972e+06    1.6e-08      21      1.4972e+06    6.4e-09
Osa 07*       37      5.3574e+05    4.2e-07      35      5.3578e+05    1.5e-07
Osa 14        35      1.1065e+06    2.0e-09      37      1.1065e+06    3.0e-08
Osa 30        32      2.1421e+06    1.0e-08      36      2.1421e+06    1.3e-08
Qap12         22      5.2289e+02    1.9e-08      24      5.2289e+02    6.2e-09
Qap15*        27      1.0411e+03    3.9e-07      44      1.0410e+03    1.5e-05
Qap8*         12      2.0350e+02    1.2e-12      13      2.0350e+02    7.1e-09
Sc105         10      -52.2021      3.8e-12      11      -52.2021      9.8e-11
Sc205         13      -52.2021      3.7e-10      12      -52.2021      8.8e-11
Sc50a         10      -64.5751      3.4e-12      9       -64.5751      8.3e-08
Sc50b         8       -70.0000      1.0e-10      8       -70.0000      9.1e-07
Scagr25       19      -1.4753e+07   5.0e-07      18      -1.4753e+07   4.6e-09
Scagr7        15      -2.3314e+06   2.7e-09      17      -2.3314e+06   1.1e-07
Scfxm1+       20      1.8417e+04    3.1e-07      22      1.8417e+04    1.6e-08
Scfxm2        23      3.6660e+04    2.3e-06      26      3.6660e+04    2.6e-08
Scfxm3+       24      5.4901e+04    1.9e-06      23      5.4901e+04    9.8e-08
Scrs8         23      9.0430e+02    1.2e-11      30      9.0430e+02    1.8e-10
Scsd1         12      8.6666        1.0e-10      13      8.6666        8.7e-14
Scsd6         14      50.5000       1.5e-13      16      50.5000       7.9e-13
Scsd8         13      9.0500e+02    6.7e-10      14      9.0500e+02    1.3e-10
Sctap1        20      1.4122e+03    2.6e-10      27      1.4123e+03    0.0031
Sctap2        21      1.7248e+03    2.1e-10      21      1.7248e+03    4.4e-07
Sctap3        20      1.4240e+03    5.7e-08      22      1.4240e+03    5.9e-07
Share1b       22      -7.6589e+04   6.5e-08      25      -7.6589e+04   1.5e-06
Share2b       13      -4.1573e+02   4.9e-11      15      -4.1573e+02   7.9e-10
Ship04l       17      1.7933e+06    5.2e-11      18      1.7933e+06    2.9e-11
Ship04s       17      1.7987e+06    2.2e-11      20      1.7987e+06    4.5e-09
Ship08l       19      1.9090e+06    1.6e-07      22      1.9091e+06    1.0e-10
Ship08s       17      1.9201e+06    3.7e-08      20      1.9201e+06    4.5e-12
Ship12l       21      1.4702e+06    4.7e-13      21      1.4702e+06    1.0e-08
Ship12s       17      1.4892e+06    1.0e-10      19      1.4892e+06    2.1e-13
Stocfor1**    20/14   -4.1132e+04   2.8e-10      14      -4.1132e+04   1.1e-10
Stocfor2      22      -3.9024e+04   2.1e-09      22      -3.9024e+04   1.6e-09
Stocfor3      34      -3.9976e+04   4.7e-08      38      -3.9976e+04   6.4e-08
Truss         25      4.5882e+05    1.7e-07      26      4.5882e+05    9.5e-06
Several problems have degenerate solutions, which make them difficult to solve or require significantly more iterations. We choose to use the option described in Section 7.4.7 to solve these problems. For problems marked with '+', this option is called only for Mehrotra's algorithm, because Mehrotra's algorithm cannot solve these problems without using this option. For problems marked with '*', both algorithms need to call this option for better results. For the problem marked with '**', only Mehrotra's algorithm needs to use the option described in Section 7.4.7; but if both algorithms use that option, the iteration counts are the same for both algorithms. We need to keep in mind that, although using the option described in Section 7.4.7 reduces the iteration count significantly, these iterations are significantly more expensive. Therefore, simply comparing iteration counts for problems marked with '+' will lead to a conclusion in favor of Mehrotra's method (which is nevertheless what we will do in the following discussion).

Since the major cost in each iteration for both algorithms is solving linear systems of equations, which are identical in the two algorithms, we conclude that the iteration number is a good measure of efficiency. In view of Table 7.2, it is clear that Algorithm 7.1 uses fewer iterations than Mehrotra's algorithm to find the optimal solutions for the majority of the tested problems. Among the 51 tested problems, Mehrotra's method uses fewer iterations (7 iterations in total) than the arc-search method for only 6 problems (brandy, osa 07, sc205, sc50a, scagr25, scfxm3^4), while the arc-search method uses fewer iterations (115 iterations in total) than Mehrotra's method for 38 problems. For the remaining 7 problems, both methods use the same number of iterations. The arc-search method is numerically more stable than Mehrotra's method because, for problems bnl2, scfxm1, and stocfor1, the arc-search method does not need to use the option described in Section 7.4.7, but Mehrotra's method does need to use the option to solve these problems. For problem sctap1, Mehrotra's method terminated with a relatively large error. We can also use performance profiles to explain the superiority of the proposed algorithm. The performance function in this case is the number of iterations. Figure 7.3 gives the performance profiles for both curvelp.m and mehrotra.m.

Figure 7.3: Performance profiles comparison for curvelp.m and mehrotra.m.

^4 For this problem, Mehrotra's method needs to use the option described in Section 7.4.7 but the arc-search method does not need to. As a result, Mehrotra's method uses noticeably more CPU time than the arc-search method.
7.5 Concluding Remarks

This chapter proposes and implements an arc-search interior-point path-following algorithm that searches for optimizers along an ellipse that approximates the infeasible central path. The proposed algorithm is different from Mehrotra's method only in the search path. Both the arc-search method and Mehrotra's method are implemented in MATLAB. Moreover, the two methods use exactly the same initial point, the same pre-process and post-process, the same parameters, and the same stopping criteria. By doing this, we can compare both algorithms in a fair and controlled way. Numerical tests are conducted using Netlib problems for both methods. The results show that the proposed arc-search method is more efficient and reliable than the well-known Mehrotra's method. The MATLAB implementations of both algorithms are also available on the MATLAB file exchange website [157].
Chapter 8
An O(√n L) Infeasible Arc-Search Algorithm for LP

For interior-point methods using line search, we have discussed in previous chapters the dilemmas between theoretical analysis and computational experience. For example, higher-order algorithms that use second or higher-order derivatives have been proved to improve the computational efficiency [89, 82, 83], but higher-order algorithms have either a poorer polynomial bound than the first-order algorithms [97] or no polynomial bound at all [89]. In addition, some algorithms with the best polynomial bound perform poorly in real computational tests. For example, the short-step interior-point algorithm, which searches for the optimizer in a narrow neighborhood, has the best polynomial bound [70] but performs very poorly in practical computation, while the long-step interior-point algorithm [71], which searches for the optimizer in a larger neighborhood, performs better in numerical tests but has an inferior polynomial bound [144]. Even worse, Mehrotra's predictor-corrector (MPC) algorithm, which was widely regarded as the most efficient interior-point algorithm in computation before 2010 and is competitive with the simplex algorithm for large problems, has not been proved to be polynomial (it may not even be convergent [19]). It should be noted that the lack of polynomiality was a serious concern for the simplex method [65]; it therefore motivated the ellipsoid method for linear programming [59] and was one of the main arguments in the early development of interior-point algorithms [144, 57].

In Chapters 5, 6, and 7, we introduced the arc-search method and showed that the new strategy may resolve these dilemmas. In Chapter 5, we proposed a feasible arc-search interior-point algorithm for linear programming which uses higher-order
derivatives to construct an ellipse and approximate the central path. Intuitively, searching along this ellipse will generate a longer step size than searching along any straight line. Indeed, the arc-search (higher-order) algorithm in Chapter 5 achieved the best polynomial bound, which has partially resolved the dilemma. Some promising numerical test results were demonstrated. In Chapter 6, we proposed an algorithm that uses an infeasible starting point and showed that the method has an improved polynomial bound over algorithms that use line search. To demonstrate the superiority of the arc-search strategy proposed in Chapters 5 and 6 for practical problems, we devised an arc-search infeasible interior-point algorithm in Chapter 7, which allowed us to test many more Netlib problems. The proposed algorithm is very similar to Mehrotra's algorithm but replaces the search direction by an arc as used in Chapters 5 and 6. The comprehensive numerical test on a larger pool of Netlib problems clearly shows the superiority of arc-search over traditional line search for interior-point methods.

Because the purpose of Chapter 7 is to demonstrate the computational merit of the arc-search method, the algorithm there is a mimic of Mehrotra's algorithm and we have not shown its convergence. In fact, we noticed in Chapter 7 that Mehrotra's algorithm does not converge for several Netlib problems if the measure for handling degeneracy is not applied. The purpose of this chapter is to develop some infeasible interior-point algorithms which are both computationally competitive with simplex algorithms (i.e., at least as good as Mehrotra's algorithm) and theoretically polynomial. We first devise an algorithm slightly different from the one in Chapter 7 and show that this algorithm is polynomial. We then propose a simplified version of the first algorithm. This simplified algorithm searches for optimizers in a neighborhood larger than those used in short-step and long-step path-following algorithms, thereby potentially generating a larger step size. Yet, we want to show that the modified algorithm has the best polynomial complexity bound; in particular, we want to show that the complexity bound is better than the O(n^2 L) in [171], which was established for an infeasible interior-point algorithm searching for optimizers in the long-step central-path neighborhood, and better than the O(nL) in [92, 90, 161], which were obtained for infeasible interior-point algorithms using narrow short-step central-path neighborhoods. As a matter of fact, the simplified algorithm achieves the complexity bound O(√n L), which is the same as the best polynomial bound for all interior-point algorithms.

To make these algorithms attractive in theory and efficient in numerical computation, we remove some unrealistic and unnecessary assumptions made by infeasible interior-point algorithms using line search to prove their convergence results. First, we do not assume that the initial point has the form (ζe, 0, ζe), where e is a vector of all ones and ζ is a scalar which is an upper bound of the optimal solution as defined by ‖(x*, s*)‖_∞ ≤ ζ, which is unknown before an optimal solution is found. Computationally, some of the most efficient starting point selections in [89, 83] (see Chapters 5 and 6) do not meet this restriction.
Second, we remove a requirement^1 that ‖(r_b^k, r_c^k)‖ ≤ [‖(r_b^0, r_c^0)‖/µ_0] β µ_k, which is required in the convergence analysis for infeasible interior-point algorithms in Chapter 4 (see also, for example, equation (6.2) of [144], equation (13) of [67], and Theorem 2.1 of [92]). Our extensive numerical experience shows that this unnecessary requirement is the major barrier to achieving a large step size for the infeasible interior-point methods developed before this author's work. This is the main reason that the infeasible interior-point algorithms using line search with proven convergence results do not perform well compared to Mehrotra's algorithm, which does not have any convergence result. We demonstrate the computational merits of the proposed algorithms by testing them along with Mehrotra's algorithm and the very efficient arc-search algorithm of Chapter 7 using Netlib problems and comparing the test results. To have a fair comparison, we use the same initial point, the same pre-process and post-process, and the same termination criteria for the four algorithms in all test problems. The results show the superiority of these arc-search algorithms. These results resolve all the dilemmas we discussed in Chapters 2, 3, and 4. The material of this chapter is based on [159].
8.1 Preliminaries
Consider the linear programming in the standard form described in (1.4) and the dual programming that is also presented in the standard form in (1.5). Interior-point algorithms require that all the iterates satisfy the conditions x > 0 and s > 0. Infeasible interior-point algorithms, however, allow the iterates to deviate from the equality constraints. Again, we denote the residuals of the equality constraints (the deviation from feasibility) by

r_b = Ax - b, \qquad r_c = A^T \lambda + s - c,        (8.1)

and the duality measure by

\mu = \frac{x^T s}{n}.        (8.2)

The central path C of the primal-dual linear programming problem is parametrized by a scalar τ > 0 as follows. For each interior point (x, λ, s) ∈ C on the central path, there is a τ > 0 such that

Ax = b,        (8.3a)
A^T \lambda + s = c,        (8.3b)
(x, s) > 0,        (8.3c)
x_i s_i = \tau, \quad i = 1, \ldots, n.        (8.3d)
^1 In the requirement ‖(r_b^k, r_c^k)‖ ≤ [‖(r_b^0, r_c^0)‖/µ_0] β µ_k, β ≥ 1 is a constant, k is the iteration count, r_b^k and r_c^k are the residuals of the primal and dual constraints, respectively, and µ_k is the duality measure. All of these notations will be given in the next section.
Therefore, the central path is an arc in R^{2n+m} parametrized as a function of τ and is denoted as

C = {(x(τ), λ(τ), s(τ)) : τ > 0}.        (8.4)
As τ → 0, the central path (x(τ), λ(τ), s(τ)) represented by (8.3) approaches a solution of the linear programming problem represented by (1.4), because (8.3) reduces to the KKT conditions as τ → 0. Because of the high cost of finding a feasible initial point and its corresponding central path described in (8.3), we consider a modified problem which allows infeasible iterates on an arc A(τ) that satisfies the following conditions:

Ax(τ) - b = τ r_b^k := r_b^k(τ),        (8.5a)
A^T λ(τ) + s(τ) - c = τ r_c^k := r_c^k(τ),        (8.5b)
(x(τ), s(τ)) > 0,        (8.5c)
x(τ) ∘ s(τ) = τ x^k ∘ s^k,        (8.5d)
where x(1) = x^k, s(1) = s^k, λ(1) = λ^k, r_b(1) = r_b^k, r_c(1) = r_c^k, and (r_b(τ), r_c(τ)) = τ(r_b^k, r_c^k) → 0 as τ → 0. Clearly, as τ → 0, the arc defined above approaches an optimal solution of (1.4) because (8.5) reduces to the KKT conditions as τ → 0. We restrict the search for the optimizer to either the neighborhood F_1 or the neighborhood F_2 defined as follows:

F_1 = {(x, λ, s) : (x, s) > 0, x_i^k s_i^k ≥ θ µ_k},        (8.6a)
F_2 = {(x, λ, s) : (x, s) > 0},        (8.6b)
where θ ∈ (0, 1) is a constant. The neighborhood (8.6b) is clearly the widest neighborhood used in all of the existing literature. Throughout this chapter, we make the following assumption.

Assumption 1: A is a full rank matrix.

Assumption 1 is trivial and non-essential, as A can always be reduced to meet this condition in polynomial operations. With this assumption, however, the mathematical treatment will be significantly simplified. In Section 8.4, we will describe a method based on [6] to check whether a problem meets this assumption. If it does not, the method will reduce the problem to meet this assumption.

Assumption 2: There is at least an optimal solution of (1.4), i.e., the KKT condition holds.

This assumption is weaker than the assumption made for feasible interior-point methods, where an interior point (or strictly feasible solution of (1.4)) is
required. The assumption implies that there is at least one feasible solution of (1.4), which will be used in our convergence analysis. Although the infeasible central path defined in (8.5) allows for an infeasible initial point, the calculation of (8.5) is still not practical. We consider a simple approximation of (8.5) discussed in Chapters 5-7. Starting from any point (x^k, λ^k, s^k) with (x^k, s^k) > 0, we consider a specific arc which is defined by the current iterate and (\dot{x}, \dot{\lambda}, \dot{s}) and (\ddot{x}, \ddot{\lambda}, \ddot{s}) as follows:

\begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S^k & 0 & X^k \end{bmatrix} \begin{bmatrix} \dot{x} \\ \dot{\lambda} \\ \dot{s} \end{bmatrix} = \begin{bmatrix} r_b^k \\ r_c^k \\ x^k \circ s^k \end{bmatrix},        (8.7)

\begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S^k & 0 & X^k \end{bmatrix} \begin{bmatrix} \ddot{x}(\sigma_k) \\ \ddot{\lambda}(\sigma_k) \\ \ddot{s}(\sigma_k) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -2\dot{x} \circ \dot{s} + \sigma_k \mu_k e \end{bmatrix},        (8.8)
where σ_k ∈ [0, 1] is the centering parameter discussed in [144, page 196], and the duality measure µ_k is evaluated at (x^k, λ^k, s^k). We emphasize that the second-order derivatives are functions of σ_k, which we will carefully select in the range 0 < σ_min ≤ σ_k ≤ 1. A crucial but normally not emphasized fact is that, if A is full rank and X^k and S^k are positive diagonal matrices, then the matrix

\begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S^k & 0 & X^k \end{bmatrix}

is nonsingular. This guarantees that (8.7) and (8.8) have unique solutions, which is important not only in theory but also in computation, for all interior-point algorithms in linear programming. Therefore, we introduce the following assumption.

Assumption 3: X^k > 0 and S^k > 0 are bounded below and away from zeros for all iterations k until the program is terminated.

A simple trick of rescaling the step length, introduced by Mehrotra [89] (see our implementation in Section 8.5.10), guarantees that this assumption holds. Given the first and second derivatives defined by (8.7) and (8.8), an analytic expression of an ellipse, which is an approximation of the curve defined by (8.5), is derived in Chapter 5 (see also [151, 153]).

Theorem 8.1
Let (x(α), λ(α), s(α)) be an ellipse defined by (x(α), λ(α), s(α))|_{α=0} = (x^k, λ^k, s^k) and its first and second derivatives (\dot{x}, \dot{\lambda}, \dot{s}) and (\ddot{x}, \ddot{\lambda}, \ddot{s}), which are defined by (8.7) and (8.8). Then, the ellipse is an approximation of A(τ) and is given by

x(α, σ) = x^k - \dot{x} \sin(α) + \ddot{x}(σ)(1 - \cos(α)),        (8.9)
λ(α, σ) = λ^k - \dot{λ} \sin(α) + \ddot{λ}(σ)(1 - \cos(α)),        (8.10)
s(α, σ) = s^k - \dot{s} \sin(α) + \ddot{s}(σ)(1 - \cos(α)).        (8.11)
⇐⇒ ⇐⇒
ˆ X−1 x˙ = X−1 [AT (AAT )−1 rb + Av],
S−1 (AT λ˙ + s˙) = S−1 rc , X−1 x˙ + S−1 s˙ = e
� � � −1 � v ˆ , −S−1 AT = e − X−1 AT (AAT )−1 rb − S−1 rc X A λ˙ � � � � ˆ T SX−1 A ˆ )−1 A ˆ TS � � v (A e − X−1 AT (AAT )−1 rb − S−1 rc . ˙λ = −(AXS−1 AT )−1 AX (8.12)
This gives the following analytic solutions for (8.7). λ˙ = −(AXS−1 AT )−1 AX(e − X−1 AT (AAT )−1 rb − S−1 rc ), T
−1
T −1
−1
T
T −1
(8.13a)
−1
(8.13b) s˙ = A (AXS A ) AX(e − X A (AA ) rb − S rc ) + rc , T −1 −1 T −1 T T −1 −1 T T ˆ) A ˆ S(e − X A (AA ) rb − S rc ) + A (AA )−1 rb . ˆ (A ˆ SX A x˙ = A (8.13c) These relations can easily be reduced to some simpler formulas that will be used in Sections 8.5.6 and 8.5.8. (AXS−1 AT )λ˙ = AXS−1 rc − b, s˙ = rc − AT λ˙ , x˙ = x − XS
−1
s˙.
(8.14a)
(8.14b)
(8.14c)
Several relations follow immediately from (8.7) and (8.8) (see also in [159]). Lemma 8.1 Let (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) be defined in (8.7) and (8.8). Then, the following relations hold. T ¨ s◦x˙ +x◦s˙ = x◦s, sT x˙ + xT s˙ = xT s, sT x+x s¨ = −2x˙ T s˙ + σ µn, x¨ T s¨ = 0. (8.15)
148
• Arc-Search Techniques for Interior-Point Methods
Most popular interior-point algorithms of linear programming (e.g., Mehro tra’s algorithm) use heuristics to select σ first and then select the step size. In Section 3.4 (see also [154]), it has been shown that a better strategy is to se lect both centering parameter σ and step size α at the same time. This requires (x¨ , λ¨ , s¨) to be represented explicitly in terms of σk . Applying the similar deriva tion in Section 3.4 to (8.8), we have the following explicit solution for (x¨ , λ¨ , s¨) in terms of σ . ˆ )−1 A ˆ T X−1 (−2x˙ ◦ s˙ + σ µe) := px σ + qx , ˆ (A ˆ T SX−1 A x¨ = A λ¨ = −(AXS−1 AT )−1 AS−1 (−2x˙ ◦ s˙ + σ µe) := pλ σ + qλ , −1
T
s¨ = A (AXS
T −1
A )
−1
AS
(−2x˙ ◦ s˙ + σ µe) := ps σ + qs .
(8.16a) (8.16b) (8.16c)
These relations can easily be reduced to some simpler formulas as follows: (AXS−1 AT )λ¨ = AS−1 (2x˙ ◦ s˙ − σ µe), s¨ = −AT λ¨ , −1
x¨ = S
(σ µe − Xs¨ − 2x˙ ◦ s˙),
(8.17a) (8.17b) (8.17c)
or their equivalence to be used in Sections 8.5.6 and 8.5.8: (AXS−1 AT )pλ = −AS−1 µe, (AXS−1 AT )qλ = 2AS−1 (x˙ ◦ s˙), px = S
−1
ps = −AT pλ , qs = −AT qλ ,
−1
µe − S
Xps , qx = −S
−1
−1
Xqs − 2S
(x˙ ◦ s˙).
From (8.16) and (8.8), we also have ⎡ ⎤⎡ ⎤ ⎡ ⎤ 0 px A 0 0 ⎣ 0 A T I ⎦ ⎣ pλ ⎦ = ⎣ 0 ⎦ , S 0 X ps µe ⎡ ⎤⎡ ⎤ ⎡ ⎤ 0 A 0 0 qx ⎦. ⎣ 0 AT I ⎦ ⎣ qλ ⎦ = ⎣ 0 S 0 X qs −2˙x ◦ s˙
(8.18a) (8.18b) (8.18c)
(8.19)
(8.20)
From these relations, it is straightforward to derive the following: Lemma 8.2 Let (px , pλ , ps ) and (qx , qλ , qs ) be defined in (8.19) and (8.20); (x˙ , λ˙ , s˙) be defined in (8.7). Then, for every iteration k (k is omitted for the sake of notational simplicity), the following relations hold. qTx ps = 0, qTs px = 0, qTx qs = 0, pTx ps = 0, T
T
T
T
T
˙ s px + x ps = nµ, s qx + x qs = −2x˙ s,
(8.21a) (8.21b)
√ An O( nL) Infeasible Arc-Search Algorithm for LP
�
s ◦ px + x ◦ ps = µe, s ◦ qx + x ◦ qs = −2x˙ ◦ s˙.
149
(8.21c)
Remark 8.1 Under Assumption 3, a simple but very important observation from (8.13) and (8.16) is that x˙ , s˙, x¨ , and s¨ are all bounded if rb and rc are bounded, which we will show later.
To prove the convergence of the first algorithm, the following condition is required in every iteration: (8.22) xk ◦ sk ≥ θ µk e where θ ∈ (0, 1) is a constant.
Remark 8.2 Given (xk , λ k , sk , x˙ , λ˙ , s˙, x¨ , λ¨ , s¨) with (xk , sk ) > (0, 0), our strategy is to use the relations described in Lemmas 8.1 and 8.2 to find some appropriate αk ∈ (0, π/2] and σk ∈ [σmin , 1] such that k+1 1. �(rk+1 b , rc )� and µk+1 decrease in every iteration and approach to zero as k → 0.
2. (xk+1 , sk+1 ) > (0, 0). 3. xk+1 ◦ sk+1 ≥ θ µk+1 e.
The next lemma to be used in the discussion is provided in Chapter 7. Lemma 8.3 � Let rkb = Axk − b, rck = AT λ k + sk − c, and νk = k−1 j=0 (1 − sin(α j )). Then, the fol lowing relations hold. 0 rbk = rk−1 b (1 − sin(αk−1 )) = · · · = rb
rck = rck−1 (1 − sin(αk−1 )) = · · · = rc0
k−1 �
(1 − sin(α j )) = r0b νk ,
(8.23a)
0 (1 − sin(α j )) = rc
νk .
(8.23b)
j=0 k−1 �
j=0
In a compact form, (8.23) can be rewritten as (rkb , rkc ) = νk (rb0 , r0c ).
(8.24)
Lemma 8.3 indicates clearly that, in order to reduce (rkb , rkc ) quickly, we should take the largest possible step size αk → π/2. If νk = 0, then rkb = 0 = rkc . In this case, the problem has a feasible interior-point (xk , λ k , sk ) and can be solved by using some efficient feasible interior-point algorithm. Therefore, in the re mainder of this chapter, we use the following assumption:
150
• Arc-Search Techniques for Interior-Point Methods
Assumption 4: νk > 0 for ∀k ≥ 0. A rescale of αk similar to the strategy discussed in [144] is suggested in Sec tion 8.4.10, which guarantees that Assumptions 3 and 4 hold in every iteration. To examine the decreasing property of µ(σk , αk ), we need the following result. Lemma 8.4 Let αk be the step length at kth iteration, and let x(σk , αk ), s(σk , αk ), and λ (σk , αk ) be defined in Theorem 8.1. Then, the updated duality measure after an iteration from k can be expressed as µk+1 := µ(σk , αk ) =
1 [au (αk )σk + bu (αk )] , n
(8.25)
where au (αk ) = nµk (1 − cos(αk )) − (x˙ T ps + s˙T px ) sin(αk )(1 − cos(αk )) and bu (αk ) = nµk (1−sin(αk ))−[x˙ T s˙(1−cos(αk ))2 +(s˙T qx +x˙ T qs ) sin(αk )(1−cos(αk ))] are coefficients which are functions of αk . Proof 8.1
Using (8.2), (8.16), and Lemmas 8.1 and 8.2, we have nµ(σk , αk )
= =
=
xk − x˙ sin(αk ) + x¨ (1 − cos(αk )) T
T
T
T
sk − s˙ sin(αk ) + s¨(1 − cos(αk )) T
T
xk sk − xk s˙ + sk x˙ sin(αk ) + xk s¨ + sk x¨ (1 − cos(αk )) +x˙ T s˙ sin2 (αk ) − x˙ T s¨ + s˙T x¨ sin(αk )(1 − cos(αk ))
nµk (1 − sin(αk )) + σk µk n − 2x˙ T s˙ (1 − cos(αk )) + x˙ T s˙ sin2 (αk ) − x˙ T s¨ + s˙T x¨ sin(αk )(1 − cos(αk ))
nµk (1 − sin(αk )) + nσk µk (1 − cos(αk )) − x˙ T s˙(1 − cos(αk ))2 − x˙ T s¨ + s˙T x¨ sin(αk )(1 − cos(αk )) � � = nµk (1 − cos(αk )) − (x˙ T ps + s˙T px ) sin(αk )(1 − cos(αk )) σk +nµk (1 − sin(αk )) −[x˙ T s˙(1 − cos(αk ))2 + (s˙T qx + x˙ T qs ) sin(αk )(1 − cos(αk ))] := au (αk )σk + bu (αk ). =
This proves the lemma.
(8.26)
√ An O( nL) Infeasible Arc-Search Algorithm for LP
�
151
T
Since Assumption 3 implies that µk = xk sk /n is bounded below and away from zero, in view of Lemma 5.4, it is easy to see from (8.26) the following proposition. Proposition 8.1 For any fixed σk , if x˙ , s˙, x¨ , and s¨ are bounded, then there always exists αk ∈ (0, 1) bounded below and away from zero such that µ(σk , αk ) decreases in every iteration. Moreover, µk+1 := µ(σk , αk ) → µk (1 − sin(αk )) as αk → 0.
Now, we show that there exists αk bounded below and away from zero such that the requirement 2 of Remark 8.2 holds. Let ρ ∈ (0, 1) be a constant, and xk = min xik , sk = min skj .
(8.27)
φk = min{ρxk , νk }, ψk = min{ρsk , νk }.
(8.28)
0 < φk e ≤ ρxk , 0 < φk e ≤ νk e,
(8.29a)
i
j
Denote φk and ψk such that
It is clear that k
0 < ψk e ≤ ρs , 0 < ψk e ≤ νk e.
(8.29b)
Positivity of x(σk , αk ) and s(σk , αk ) is guaranteed if (x0 , s0 ) > 0 and the fol lowing conditions hold. xk+1
= x(σk , αk ) = xk − x˙ sin(αk ) + x¨ (1 − cos(αk )) = px (1 − cos(αk ))σk + [xk − x˙ sin(αk ) + qx (1 − cos(αk ))] (8.30) := ax (αk )σk + bx (αk ) ≥ φk e.
sk+1
= s(σ , αk ) = sk − s˙ sin(αk ) + x¨ (1 − cos(αk )) = ps (1 − cos(αk ))σk + [sk − s˙ sin(αk ) + qs (1 − cos(αk ))] (8.31) := as (αk )σk + bs (αk ) ≥ ψk e.
If xk+1 = xk − x˙ sin(αk ) + x¨ (1 − cos(αk )) ≥ ρxk holds, from (8.29a), we have xk+1 ≥ φk e. Therefore, inequality (8.30) will be satisfied if (1 − ρ)xk − x˙ sin(αk ) + x¨ (1 − cos(αk )) ≥ 0,
(8.32)
which holds for some αk > 0 bounded below and away from zero because (1 − ρ)xk > 0 is bounded below and away from zero. Similarly, from (8.29b), inequality (8.31) will be satisfied if (1 − ρ)sk − s˙ sin(αk ) + s¨(1 − cos(αk )) ≥ 0,
(8.33)
152
� Arc-Search Techniques for Interior-Point Methods
which holds for some αk > 0 bounded below and away from zero because (1 − ρ)sk > 0 is bounded below and away from zero. We summarize the above discussion as the following proposition. Proposition 8.2 There exists αk > 0 bounded below and away from zero such that (xk+1 , sk+1 ) > 0 for all iteration k.
The next proposition addresses requirement 3 of Remark 8.2. Proposition 8.3 There exist αk bounded below and away from zero for all k such that (8.22) holds. Proof 8.2
From (8.30) and (8.31), since xik ski ≥ θ µk , we have
xik+1 sik+1 = [xik − x˙i sin(αk ) + x¨i (1 − cos(αk ))][ski − s˙i sin(αk ) + s¨i (1 − cos(αk ))] = xik ski − [x˙i sik + xik s˙i ] sin(αk ) + [x¨i ski + xik s¨i ](1 − cos(αk )) +x˙i s˙i sin2 (αk ) − [x¨i s˙i + x˙i s¨i ] sin(αk )(1 − cos(αk )) + x¨i s¨i (1 − cos(αk ))2 = xik ski (1 − sin(αk )) + x˙i s˙i [sin2 (αk ) − 2(1 − cos(αk ))] + σk µk (1 − cos(αk )) −[x¨i s˙i + x˙i s¨i ] sin(αk )(1 − cos(αk )) + x¨i s¨i (1 − cos(αk ))2
2 = xik sk
i (1 − sin(αk )) − x˙i s˙i (1 − cos(αk )) + σk µk (1 − cos(αk ))
−[x¨i s˙i + x˙i s¨i ] sin(αk )(1 − cos(αk )) + x¨i s¨i (1 − cos(αk ))2
≥ θ µk (1 − sin(αk )) + σk µk (1 − cos(αk ))
−[x¨i s˙i + x˙i s¨i ] sin(αk )(1 − cos(αk )) + (x¨i s¨i − x˙i s˙i )(1 − cos(αk ))2 .
Therefore, using µk+1 = µ(σk , αk ) and (8.26), we have xik+1 sk+1 − θ µk+1 i ≥ θ µk (1 − sin(αk )) + θ σk µk (1 − cos(αk )) + (1 − θ )σk µk (1 − cos(αk )) −[x¨i s˙i + x˙i s¨i ] sin(αk )(1 − cos(αk )) + (x¨i s¨i − x˙i s˙i )(1 − cos(αk ))2 . θ −θ µk (1 − sin(αk )) − θ σk µk (1 − cos(αk )) + x˙ T s˙(1 − cos(αk ))2 n � θ� T T + x˙ s¨ + s˙ x¨ sin(αk )(1 − cos(αk )) n � � � θ = (1 − cos(αk )) (1 − θ )σk µk + x¨i s¨i − x˙i s˙i + x˙ T s˙ (1 − cos(αk )) n � � � � � θ T x˙ s¨ + s˙T x¨ sin(αk ) − x¨i s˙i + x˙i s¨i − n (8.34) := (1 − cos(αk ))p(α).
√ An O( nL) Infeasible Arc-Search Algorithm for LP
�
153
Since Assumption 3 implies (a) µk is bounded below and away from zero, and (b) x˙ , s˙, x¨ , and s¨ are all bounded, (1 − θ )σk µk is bounded below and away from zero, there must be an αk bounded below and away from zero such that p(α) ≥ 0. This proves the claim.
Let σmin and σmax be constants, and 0 < σmin < σmax ≤ 1. From Propositions 8.1, and 8.2, and Lemma 8.3, we conclude: Proposition 8.4 For any fixed σk such that σmin ≤ σk ≤ σmax , there is a constant δ > 0 related to lower bound of αk such that (a) rkb − rk+1 ≥ δ e, (b) rck − rck+1 ≥ δ e, (c) µk − µk+1 ≥ δ , and b k+1 k+1 (d) (x , s ) > 0.
8.2
A Basic Algorithm
This algorithm considers the search in the neighborhood (8.6a). Based on the discussion in the previous section, we will show in this section that the following arc-search infeasible interior-point algorithm is well-defined and converges in polynomial iterations. Algorithm 8.1 Data: A, b, c.
Parameter: ε ∈ (0, 1), σmin ∈ (0, 1), σmin ≤ σmax ∈ (0, 1), θ ∈ (0, 1), and ρ ∈ (0, 1).
Initial point: λ 0 = 0 and (x0 , s0 ) > 0.
for iteration k = 0, 1, 2, . . .
Step 0: If �r0b � ≤ ε, �rc0 � ≤ ε, and µk ≤ ε, stop. Step 1: Calculate µk , rkb , rkc , λ˙ , s˙, x˙ , pkx , pkλ , pks , qkx , qkλ , and qks . Step 2: Find some appropriate αk ∈ (0, π/2] and σk ∈ [σmin , σmax ] to satisfy x(σk , αk ) ≥ φk e, s(σk , αk ) ≥ ψk e, µk > µ(σk , αk ), x(σk , αk )s(σk , αk ) ≥ θ µ(σk , αk )e. Step 3: Set (xk+1 , λ k+1 , sk+1 ) = (x(σk , αk ), λ (σk , αk ), s(σk , αk )) and µk+1 = µ(σk , αk ). Step 4: Set k + 1 → k. Go back to Step 1. end (for)
The algorithm is well defined because of the three propositions in the pre vious section, i.e., there is a series of αk bounded below and away from zero
154
� Arc-Search Techniques for Interior-Point Methods
such that all conditions in Step 2 hold. Therefore, a constant ρ ∈ (0, 1) satisfying ρ ≥ (1 − sin(αk )) does exist for all k ≥ 0. Denote βk =
min{xk , sk } ≥ 0, νk
(8.35)
and β = inf{βk } ≥ 0. k
(8.36)
The next lemma shows that β is bounded below and away from zero. Lemma 8.5 Assuming that ρ ∈ (0, 1) is a constant and for all k ≥ 0, ρ ≥ (1 − sin(αk )). Then, we have β ≥ min{x0 , s0 , 1}. Proof 8.3 For k = 0, (x0 , s0 ) > 0, and ν0 = 1, therefore, β0 ≥ min{x0 , s0 , 1} holds. Assuming that βk ≥ min{x0 , s0 , 1} holds for k > 0, we would like to show that βk+1 = min{βk , 1} holds for k + 1. We divide our discussion into three cases. Case 1: min{xk+1 , sk+1 } = xk+1 ≥ ρxk ≥ xk (1 − sin(αk )). Then, we have βk+1 =
min{xk+1 , sk+1 } xk (1 − sin(αk )) ≥ ≥ βk . νk (1 − sin(αk )) νk+1
Case 2: min{xk+1 , sk+1 } = sk+1 ≥ ρsk ≥ sk (1 − sin(αk )). Then, we have βk+1 =
min{xk+1 , sk+1 } sk (1 − sin(αk )) ≥ ≥ βk . νk (1 − sin(αk )) νk+1
Case 3: min{xk+1 , sk+1 } ≥ νk . Then, we have βk+1 =
min{xk+1 , sk+1 } νk ≥ ≥ 1. νk (1 − sin(αk )) νk+1
Adjoining these cases, we conclude β ≥ min{β0 , 1}.
The main purpose of the remaining section is to establish a polynomial bound for this algorithm. In view of (8.5) and (8.6a), to show the convergence of Algo rithm 8.1, we need to show that there is a sequence of αk ∈ (0, π/2] with sin(αk ) being bounded by a polynomial of n, and σk ∈ [σmin , σmax ], such that (a) rkb → 0, rkc → 0 (which can be shown by using Lemma 8.3), and µk → 0, (b) (xk , sk ) > 0 for all k ≥ 0, and (c) x(σk , αk ) ◦ s(σk , αk ) ≥ θ µ(σk , αk )e for all k ≥ 0. Although the strategy presented below is similar to the one used by Kojima [67], Kojima, Megiddo, and Mizuno [68], Wright [144], and Zhang [171], the
√ An O( nL) Infeasible Arc-Search Algorithm for LP
•
155
convergence results in this chapter do not depend on some unrealistic and unnec essary restrictions assumed in those papers. We start with a simple but important 1 1 observation. Let D = X 2 S− 2 = diag(Dii ). Lemma 8.6 For Algorithm 8.1, there is a constant C1 independent of n such that for ∀i ∈ {1, . . . , n} � � ski xik √ √ k −1 k (Dii ) νk = νk ≤ C1 nµk , Dii νk = νk ≤ C1 nµk . (8.37) k k xi si Proof 8.4 We know that min{xk , sk } > 0, νk > 0, and β > 0 is a constant inde pendent of n. By the definition of βk , we have xik ≥ xk ≥ βk νk ≥ β νk and skj ≥ sk ≥ βk νk ≥ β νk . This gives, for ∀i ∈ {1, . . . , n}, � � ski √ 1 1√ k −1 (Dii ) νk = ν ≤ sik xik ≤ nµk := C1 nµk . k β β xik √ Using a similar argument for skj ≥ β νk , we can show that Dii νk ≤ C1 nµk .
The main idea in the proof is based on a crucial observation used in many lit eratures, for example, Mizuno [92] and Kojima [67]. Let S¯ be defined by (6.25). Let (x¯ , λ¯ , s¯) ∈ S¯ be a feasible point satisfying Ax¯ = b and AT λ¯ + s¯ = c. The ex istence of (x¯ , λ¯ , s¯) is guaranteed by Assumption 2. We will make an additional assumption in the remaining discussion. Assumption 5: There exists a big constant M which is independent to the problem size n and m such that min(x¯ ,λ¯ ,s¯)∈S¯ I x0 − x¯ , s0 − s¯ I < M. For (x¯ , λ¯ , s¯) meeting Assumption 5, since Ax˙ = rkb = νk r0b = νk (Ax0 − b) = νk A(x0 − x¯ ), we have A(x˙ − νk (x0 − x¯ )) = 0. Similarly, since AT λ˙ + s˙ = rkc = νk r0c = νk (AT λ 0 + s0 − c) = νk (AT (λ 0 − λ¯ ) + (s0 − s¯)), we have AT (λ˙ − νk (λ 0 − λ¯ )) + (s˙ − νk (s0 − s¯)) = 0.
156
• Arc-Search Techniques for Interior-Point Methods
Using Lemma 8.1, we have s ◦ (x˙ − νk (x0 − x¯ )) + x ◦ (s˙ − νk (s0 − s¯)) = x ◦ s − νk s ◦ (x0 − x¯ ) − νk x ◦ (s0 − s¯). Thus, in matrix form, we have ⎡ ⎤ ⎡ ⎤ ⎤⎡ x˙ − νk (x0 − x¯ ) A 0 0 0 ⎣ 0 AT I ⎦⎣ λ˙ − νk (λ 0 − λ¯ ) ⎦ = ⎣ ⎦ 0 0 0 0 x ◦ s − νk s ◦ (x − x¯ ) − νk x ◦ (s − s¯) S 0 X s˙ − νk (s − s¯) (8.38) Denote (δ x, δ λ , δ s) = (x˙ − νk (x0 − x¯ ), λ˙ − νk (λ 0 − λ¯ ), s˙ − νk (s0 − s¯)) and r = r1 + r2 + r3 with (r1 , r2 , r3 ) = (x ◦ s, −νk s ◦ (x0 − x¯ ), −νk x ◦ (s0 − s¯)). For i = 1, 2, 3, let (δ xi , δ λ i , δ si ) be the solution of ⎡ ⎤⎡ ⎤ ⎡ ⎤ A 0 0 δ xi 0 ⎣ 0 AT I ⎦ ⎣ δ λ i ⎦ = ⎣ 0 ⎦ (8.39) S 0 X δ si ri Clearly, we have δ x = δ x1 + δ x2 + δ x3 = x˙ − νk (x0 − x¯ ), δ λ = δ λ 1 + δ λ 2 + δ λ 3 = λ˙ − νk (λ 0 − λ¯ ), 1
2
3
0
¯ δ s = δ s + δ s + δ s = s˙ − νk (s − s).
(8.40a) (8.40b) (8.40c)
From the second row of (8.39), we have (D−1 δ xi )T (Dδ si ) = 0, for i = 1, 2, 3, therefore, ID−1 δ xi I2 , IDδ si I2 ≤ ID−1 δ xi I2 + IDδ si I2 = ID−1 δ xi + Dδ si I2 .
(8.41)
Applying Sδ xi + Xδ si = ri to (8.41) for i = 1, 2, 3, respectively, we obtain the following relations √ √ 1 ID−1 δ x1 I, IDδ s1 I ≤ ID−1 δ x1 + Dδ s1 I = I(Xs) 2 I = xT s = nµ, (8.42a) ID−1 δ x2 I, IDδ s2 I ≤ ID−1 δ x2 + Dδ s2 I = νk ID−1 (x0 − x¯ )I, −1
ID
3
3
δ x I, IDδ s I ≤ ID
−1
3
3
0
δ x + Dδ s I = νk ID(s − s¯)I.
(8.42b) (8.42c)
Considering (8.39) with i = 2, we have Sδ x2 + Xδ s2 = r2 = −νk S(x0 − x¯ ), which is equivalent to δ x2 = −νk (x0 − x¯ ) − D2 δ s2 . Thus, from (8.40a), (8.43), and (8.42), we have ID−1 x˙ I = ID−1 [δ x1 + δ x2 + δ x3 + νk (x0 − x¯ )]I
(8.43)
√ An O( nL) Infeasible Arc-Search Algorithm for LP
= ID−1 δ x1 − Dδ s2 + D−1 δ x3 I ≤ ID−1 δ x1 I + IDδ s2 I + ID−1 δ x3 I.
•
157
(8.44)
Considering (8.39) with i = 3, we have Sδ x3 + Xδ s3 = r3 = −νk X(s0 − s¯), which is equivalent to δ s3 = −νk (s0 − s¯) − D−2 δ x3 .
(8.45)
Thus, from (8.40c), (8.45), and (8.42), we have IDs˙I = ID[δ s1 + δ s2 + δ s3 + νk (s0 − s¯)]I = IDδ s1 + Dδ s2 − D−1 δ x3 I ≤ IDδ s1 I + IDδ s2 I + ID−1 δ x3 I.
(8.46)
From (8.42a), we can summarize the above discussion as the following (see also [67]) lemma. Lemma 8.7 Let (x0 , λ 0 , s0 ) be the initial point of Algorithm 8.1, and (x¯ , λ¯ , s¯) ∈ S¯ meet Assump tion 5. Then, √ ˙ ≤ nµ + IDδ s2 I + ID−1 δ x3 I. ID˙sI, ID−1 xI (8.47) Remark 8.3 If the initial point (x0 , λ 0 , s0 ) is a feasible point satisfying Ax0 = b and AT λ 0 + s0 = c, then the problem is reduced to a feasible interior-point problem which has been discussed in Chapter 5. In this case, inequality (8.47) is reduced to √ IDs˙I, ID−1 x˙ I ≤ nµ because x0 = x¯ , s0 = s¯, and IDδ s2 I = ID−1 δ x3 I = 0 from √ (8.42b) and (8.42c). Using IDs˙I, ID−1 x˙ I ≤ nµ, we have proved in Chapter 5 that a √ feasible arc-search algorithm is polynomial with complexity bound O( n log(1/ε)). In the remainder of the chapter, we will focus on the case that the initial point is infeasible. Lemma 8.8 Let (x˙ , λ˙ , s˙) be defined in (8.7). Then, there is a constant C2 independent of n such that in every iteration of Algorithm 8.1, the following inequality holds. √ IDs˙I, ID−1 x˙ I ≤ C2 nµ. Proof 8.5
(8.48)
Since D and D−1 are diagonal matrices, in view of (8.39), we have (Dδ s2 )T (D−1 δ x3 ) = (δ s2 )T (δ x3 ) = 0.
158
• Arc-Search Techniques for Interior-Point Methods
Let (x0 , λ 0 , s0 ) be the initial point of Algorithm 8.1, and (x¯ , λ¯ , s¯) ∈ S¯ meet Assump tion 5. Then, from (8.42b) and (8.42c), we have IDδ s2 − D−1 δ x3 I2
= IDδ s2 I2 + ID−1 δ x3 I2
= νk2 ID−1 (x0 − x¯ )I2 + νk2 ID(s0 − s¯)I2 � k� � � � k� si xi 2 0 T 0 0 T 0 = νk (x − x¯ ) diag k (x − x¯ ) + (s − s¯) diag k (s − s¯) s x ⎡i � � ⎤ i k s diag xik 0 ⎥ T⎢ i 2 0 0 ⎢ � � ⎥ x0 − x¯ , s0 − s¯ = νk x − x¯ , s − s¯ ⎣ k ⎦ x 0 diag ski i � k k� s x ≤ νk2 max ik , ki I x0 − x¯ , s0 − s¯ I2 i xi s i = νk2 max{Dii−2 , Dii2 }I x0 − x¯ , s0 − s¯ I2 i
≤ C12 nµk I x0 − x¯ , s0 − s¯ I2 ,
(8.49)
where the last inequality follows from Lemma 8.6. Adjoining this result with Lemma 8.7 and Assumption 5 gives √ √ √ IDs˙I, ID−1 x˙ I ≤ nµk + I x0 − x¯ , s0 − s¯ IC1 nµk ≤ C2 nµk . This completes the proof.
From Lemma 8.8, we can obtain several inequalities that will be used in our convergence analysis. The first one is given as follows. Lemma 8.9 Let (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) be defined in (8.7) and (8.8). Then, there exists a constant C3 > 0 independent of n such that the following relations hold. ID−1 x¨ I, IDs¨I ≤ C3 nµk0.5 , � n 0.5 −1 ID px I, IDps I ≤ µ , θ k 2C2 ID−1 qx I, IDqs I ≤ √ 2 nµk0.5 . θ
(8.50a) (8.50b) (8.50c)
Proof 8.6 From the last row of (8.20), using the facts that qTx qs = 0, xik ski > θ µk , and Lemma 8.8, we have
⇐⇒
Sqx + Xqs = −2x˙ ◦ s˙
D−1 qx + Dqs = 2(XS)−0.5 (−x˙ ◦ s˙) = 2(XS)−0.5 (−D−1 x˙ ◦ Ds˙)
√ An O( nL) Infeasible Arc-Search Algorithm for LP
•
159
ID−1 qx I2 , IDqs I2 ≤ ID−1 qx I2 + IDqs I2 = ID−1 qx + Dqs I2
=⇒ ≤ ≤
4I(XS)−0.5 I2 ID−1 x˙ I · IDs˙I
4 C2 nµk θ µk 2
2
=
2
(2C22 n)2
µk . θ
Taking the square root on both sides gives 2C2 √ ID−1 qx I, IDqs I ≤ √ 2 n µk . θ
(8.51)
From the last row of (8.19), using the facts that pTx ps = 0 and xik ski ≥ θ µk , we have Spx + Xps = µk e ⇐⇒ =⇒
D−1 px + Dps = (XS)−0.5 µk e ID−1 px I2 , IDps I2 ≤ ID−1 px I2 + IDps I2 = ID−1 px + Dps I2 ≤ I(XS)−0.5 I2 n(µk )2 ≤ nµk /θ .
Taking the square root on both sides gives ID−1 px I, IDps I ≤
�
n√ µk . θ
(8.52)
Combining (8.51) and (8.52) proves (8.50a).
The following inequalities are direct results of Lemmas 8.8 and 8.9. Lemma 8.10 Let (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) be defined in (8.7) and (8.8). Then, the following relations hold. √ √ |x˙ T s˙| |x¨ T s˙| |x˙ T s¨| ≤ C2C3 nµk , ≤ C2C3 nµk . (8.53) ≤ C22 µk , n n n Moreover, 3
3
|x˙i s˙i | ≤ C22 nµk , |x¨i s˙i | ≤ C2C3 n 2 µk , |x˙i s¨i | ≤ C2C3 n 2 µk , |x¨i s¨i | ≤ C32 n2 µk . (8.54) Proof 8.7
The first relation of (8.53) is given as follows. |x˙ T s˙| |(D−1 x˙ )T (Ds˙)| ID−1 x˙ I · IDs˙I = ≤ ≤ C22 µk . n n n
(8.55)
Similarly, we have √ |x¨ T s˙| |(D−1 x¨ )T (Ds˙)| ID−1 x¨ I · IDs˙I = ≤ ≤ C2C3 nµk , n n n
(8.56)
160
• Arc-Search Techniques for Interior-Point Methods
and √ |˙xT s¨| |(D−1 x˙ )T (D¨s)| ID−1 x˙ I · ID¨sI ≤ = ≤ C2C3 nµk . n n n
(8.57)
The first relation of (8.54) is given as follows. −1 −1 ˙ I · ID˙sI ≤ C22 nµk . |x˙i s˙i | = |D−1 ii x˙i Dii s˙i | ≤ |Dii x˙i | · |Dii s˙i | ≤ ID x
(8.58)
Similar arguments can be used for the remaining inequalities of (8.54). 3
Now, we are ready to show that there exists a constant κ0 = O(n 2 ) such that for every iteration, for some σk ∈ [σmin , σmax ] and all sin(αk ) ∈ (0, κ10 ], all conditions in Step 2 of Algorithm 8.1 hold. Lemma 8.11 C4 There exists a positive constant C4 independent of n and an α¯ defined by sin(α¯ ) ≥ √ n such that for ∀k ≥ 0 and sin(αk ) ∈ (0, sin(α¯ )], (xik+1 , sik+1 ) := (xi (σk , αk ), si (σk , αk )) ≥ (φk , ψk ) > 0
(8.59)
holds. Proof 8.8
From (8.30) and (8.29a), a conservative estimation can be obtained by xi (σk , αk ) = xik − x˙i sin(αk ) + x¨i (1 − cos(αk )) ≥ ρxik
which is equivalent to xik (1 − ρ) − x˙i sin(αk ) + x¨i (1 − cos(αk )) ≥ 0. Multiplying D−1 ii to this inequality and using Lemmas 8.8, 8.10, and 5.4, we have −1 (xik sik )0.5 (1 − ρ) − D−1 ii x˙i sin(αk ) + Dii x¨i (1 − cos(αk )) √ √ √ ≥ µk θ (1 − ρ) −C2 n sin(αk ) −C3 n sin2 (αk ) .
Clearly, the last expression is greater than zero for all sin(αk ) ≤ √
C4 √ n
≤ sin(α¯ ), where
θ (1−ρ) √ C4 = 2 max{C . This proves xik+1 ≥ φk > 0. Similarly, we have sik+1 ≥ ψk > 0. 2 , C3 } This completes the proof.
Lemma 8.12 There exists a positive constant C5 independent of n and an αˆ defined by sin(αˆ ) ≥ such that for ∀k ≥ 0 and sin(α) ∈ (0, sin(αˆ )], the following relation
C5 1
n4
√ An O( nL) Infeasible Arc-Search Algorithm for LP
� � � � sin(αk ) C5 µk (σk , αk ) ≤ µk 1 − ≤ µk 1 − 1 4 4n 4
�
161
(8.60)
holds. Proof 8.9
Using (8.26), Lemmas 5.4 and 8.10, we have
µ(σk , αk )
=
µk (1 − sin(αk )) + σk µk (1 − cos(αk )) − −
≤
≤ =
x˙ T s˙ (1 − cos(αk ))2 n
x˙ T s¨ + s˙T x¨ sin(αk )(1 − cos(αk )) � n �
µk 1 − sin(αk ) + σk sin2 (αk ) � T � T � � |x˙ s¨| |x¨ T s˙| |x˙ s˙| 4 3 + sin (αk ) + + sin (αk ) n n n � � √ µk 1 − sin(αk ) + σk sin2 (αk ) +C22 sin4 (αk ) + 2C2C3 n sin3 (αk ) � � �� √ µk 1 − sin(αk ) 1 − σk sin(αk ) −C22 sin3 (αk ) − 2C2C3 n sin2 (αk ) .
Let C5 =
1
2/3 √ 4 max{σk ,C2 , 2C2C3 }
.
then, for all sin(αk ) ∈ (0, sin(αˆ )] and σmin ≤ σk ≤ σmax , inequality (8.60) holds. This completes the proof. Lemma 8.13 There exists a positive constant C6 independent of n and an αˇ defined by sin(αˇ ) ≥ C6 k k 3 such that if xi si ≥ θ µk holds, then for ∀k ≥ 0, ∀i ∈ {1, . . . , n}, and sin(α) ∈ n2
(0, sin(αˇ )], the following relation
xik+1 sik+1 ≥ θ µk+1
(8.61)
holds. Proof 8.10
Using (8.34) and (8.26), we have xik+1 sk+1 − θ µk+1 i ≥ σk µk (1 − θ )(1 − cos(αk )) � � θ (x˙ T s¨ + s˙T x¨ ) sin(αk )(1 − cos(αk )) − x¨i s˙i + x˙i s¨i − n � � θ (x˙ T s˙) (1 − cos(αk ))2 + x¨i s¨i − x˙i s˙i + n
(8.62)
162
• Arc-Search Techniques for Interior-Point Methods
Therefore, if � θ (x˙ T s¨ + s˙T x¨ ) sin(αk ) n � � θ (x˙ T s˙) + x¨i s¨i − x˙i s˙i + (1 − cos(αk )) ≥ 0, n �
σk µk (1 − θ ) −
x¨i s˙i + x˙i s¨i −
(8.63)
then xik+1 sik+1 ≥ θ µk+1 . The inequality (8.63) holds if � θ (x˙ T s¨ + s˙T x¨ ) �� � � σk µk (1 − θ ) − |x¨i s˙i | + |x˙i s¨i | + � � sin(αk ) n � � � θ (x˙ T s˙) � � � − |x¨i s¨i | + |x˙i s˙i | + � � sin2 (αk ) ≥ 0. n �
Using Lemma 8.10, we can easily find some αˇ defined by sin(αˇ ) ≥
C6 3
to meet the
n2
above inequality.
Now, the convergence result follows from the standard argument using The orem 1.4, which is restated below for convenience. Theorem 8.2 Let ε ∈ (0, 1) be given. Suppose that an algorithm generates a sequence of iterations {χk } that satisfies � � δ χk+1 ≤ 1 − ω χk , k = 0, 1, 2, . . . , (8.64) n for some positive constants δ and ω. Then, there exists an index K with K = O(nω log(χ0 /ε)) such that χk ≤ ε for ∀k ≥ K.
In view of Lemmas 8.3, 8.11, 8.12, 8.13, and Theorem 8.2, we can state our main result as the following: Theorem 8.3 Algorithms 8.1 is a polynomial algorithm with polynomial complexity bound of 3 O(n 2 max{log((x0 )T s0 /ε), log(Ir0b I/ε), log(Ir0c I/ε)}).
√ An O( nL) Infeasible Arc-Search Algorithm for LP
8.3
•
163
√ The O( nL) Algorithm
Algorithm 8.2 is a simplified version of Algorithm 8.1. The only difference be tween the two algorithms is that this algorithm searches for optimizers in a larger neighborhood defined by (8.6b). From the discussion in Section 8.1, the follow ing arc-search infeasible interior-point algorithm is well-defined. Algorithm 8.2 Data: A, b, c.
Parameter: ε ∈ (0, 1), σmin ∈ (0, 1), σmax ∈ (0, 1), and ρ ∈ (0, 1).
Initial point: λ 0 = 0 and (x0 , s0 ) > 0.
for iteration k = 0, 1, 2, . . .
Step 0: If Irkb I ≤ ε, Irkc I ≤ ε, and µk ≤ ε, stop. Step 1: Calculate µk , rkb , rkc , λ˙ , s˙, x˙ , pkx , pλk , psk , qxk , qλk , and qsk . Step 2: Find some appropriate αk ∈ (0, π/2] and σk ∈ [σmin , σmax ] to satisfy x(σk , αk ) ≥ φk e, s(σk , αk ) ≥ ψk e, µk > µ(σk , αk ).
(8.65)
Step 3: Set (xk+1 , λ k+1 , sk+1 ) = (x(σk , αk ), λ (σk , αk ), s(σk , αk )) and µk+1 = µ(σk , αk ). Step 4: Set k + 1 → k. Go back to Step 1. end (for) Remark 8.4 It is clear that the only difference between Algorithm 8.1 and Algo rithm 8.2 is in Step 2, where Algorithm 8.2 does not require xi (σk , αk )si (σk , αk ) ≥ θ µ(σk , αk ). We have seen from Lemma 8.13 that this requirement is the main barrier in achieving a better polynomial bound.
Denote γ = min{1, ρβ }.
(8.66)
Lemma 8.14 Assume that Algorithms 8.2 terminates in finite iterations K and for some constant γ 2 νk2 C C, sin(αk ) ≤ n1/4 holds. If µ0 > 1, then there is a positive constant θ = min{ } k∈K µk such that θ < 1. Proof 8.11 First , Assumption 3 implies that µk is bounded below and away from zero before the convergence. Assumption 4 indicates that νk > 0. Since K is finite
164
� Arc-Search Techniques for Interior-Point Methods
and γ > 0 is a constant, θ is an attainable positive constant. Setting k = 0 in (8.23) yields ν0 = 1. This means that γ 2 ν02 ≤ 1. Since µ0 > 1, we have θ ≤ νk2 µk
γ 2 ν02 µ0
≤
ν02 µ0
< 1.
This verifies that the claim holds for k = 0. Assume that < 1 is true for k ≥ 0, we show that the claim is true for k + 1. In view of (8.26) and Lemmas 8.10 and 5.4, we have � T � � x˙ s˙ � �(1 − cos(αk ))2 µk+1 ≥ µk (1 − sin(αk )) + σk µk (1 − cos(αk )) − �� n � � T � � x˙ s¨ + x¨ T s˙ � � sin(αk )(1 − cos(αk )) −�� � n 1 ≥ µk (1 − sin(αk )) + σk µk sin2 (αk ) −C22 µk sin4 (αk ) 2 √ (8.67) −2C2C3 nµk sin3 (αk ). Therefore, using assumption of 2 νk+1 µk+1
≤
0 is bounded below and away from zero before the convergence of Al k gorithm 8.2 and β > 0, then, for a nonnegative constant θ ≥ 0, the inequality xik ski ≥ θ µk holds for ∀i ∈ {1, 2, . . . , n} and ∀k ≥ 0. Proof 8.12 First, Assumption 3 implies that µk is bounded below and away from zero before the convergence. Assumption 4 indicates that νk > 0. By the definition of β , we have β ≤
xk νk
≤
xik νk
and β ≤
sk νk
≤
xk ≥ β νk > 0,
sik νk ,
which can be written as
sk ≥ β νk > 0.
√ An O( nL) Infeasible Arc-Search Algorithm for LP
�
165
Since φk = min{ρxk , νk }, we have either φk = ρxk ≥ ρβ νk or φk = νk , which means that (8.69) φk ≥ min{1, ρβ }νk = γ νk . Since ψk = min{ρsk , νk }, with a similar argument, we can show ψk ≥ min{1, ρβ }νk = γ νk .
(8.70)
Using (8.30) and (8.31), the definition of φk and ψk , and the above two formulas, we have xik ski ≥ φk−1 ψk−1 ≥ γ 2 νk2−1 > γ 2 νk2−1 (1 − sin(αk−1 ))2 = γ 2 νk2 > 0. Let θ = inf{ k
ν2 γ 2 νk2 ν2 }, then, γ 2 { µk } ≥ γ 2 inf{ k } = θ ≥ 0, and we have k k µk µk xik ski ≥
γ 2 νk2 µk ≥ θ µk . µk
This completes the proof.
Since νk > 0 for all iteration k (see Assumption 4 and Section 8.4.10), we immediately have the following corollary. Corollary 8.1 γ 2 νk2 } > 0, k≤K µk k k where θ is a positive constant independent of n; in addition, xi si ≥ θ µk holds for ∀i ∈ {1, 2, . . . , n} and for 0 ≤ k ≤ K. If Algorithm 8.2 terminates in finite iterations K, then we have θ = min{
Proposition 8.5 Assume that �r0b �, �r0c �, and µ0 are all finite. Then, Algorithm 8.2 terminates in finite iterations. Proof 8.13 Since �r0b �, �r0c �, and µ0 are finite, in view of Proposition 8.4, in every iteration, these variables decrease at least a constant. Therefore, the algorithm will terminate in finite steps.
Proposition 8.5 implies that xik sik ≥ θ µk holds for a positive constant θ , ∀i ∈ {1, 2, . . . , n} and for 0 ≤ k ≤ K. Since xik sik ≥ θ µk is the most strict condition required in Lemma 8.13 (sin(αˇ ) ≥ C36 is smaller than both sin(αˆ ) ≥ C15 and n2
n4
C4 sin(α¯ ) ≥ √ ) to show the polynormiality of Algorithm 8.1, and since Algorithm n
8.2 does not check the condition xik sik ≥ θ µk , and it checks only (8.65), from Lemmas 8.11, 8.12, and Theorem 8.2, we conclude
166
� Arc-Search Techniques for Interior-Point Methods
Theorem 8.4 If Algorithm 8.2 terminates in finite iterations, it converges in a number of iterations bounded by a polynomial of the order 1
O(n 2 max{log((x0 )T s0 /ε), log(�r0b �/ε), log(�r0c �/ε)}).
8.4
Implementation Details
The two proposed algorithms are very similar to the one in Chapter 7, in that they all solve standard form linear programming using arc-search techniques. Many implementation details for the four algorithms (two proposed in this chapter, one proposed in Chapter 7, and Mehrotra’s algorithm described in Chapter 4) are in common. However, some algorithm-specific parameters (Section 8.5.1), selec tion of αk (Section 8.5.8 for arc-search), section of σk (Section 8.5.9), and rescale αk (Section 8.5.10) for the two algorithms proposed in this chapter are different from the ones discussed in Chapter 7. Most implementation details have been thoroughly discussed in Chapter 7. Since all these details affect the numerical efficiency, we summarize all details implemented for the algorithms discussed in this chapter and explain the reasons why some implementations in Chapter 7 are adopted and some are not.
8.4.1 Default parameters Several parameters are used in Algorithms 8.1 and 8.2. In our implementation, the following defaults are used without a serious effort to optimize the results for all test problems: θ = min{10−6 , 0.1∗min{x0 ◦s0 }/µ0 }, σmin = 10−6 , σmax = 0.4 for Algorithm 8.1, σmax = 0.3 for Algorithm 8.2, ρ = 0.01, and ε = 10−8 . Note that θ is used only in Algorithm 8.1 and this θ selection guarantees xi0 si0 ≥ θ µ0 .
8.4.2 Initial point selection Initial point selection has been known to be an important factor in the computa tional efficiency for most infeasible interior-point algorithms [23, 172]. We use the methods proposed in [89, 83] to generate candidate initial points. We then compare (8.71) max{�Ax0 − b�, �AT λ 0 + s0 − c�, µ0 } obtained by these two methods and select the initial point with smaller value of (8.71) as we guess this selection may reduce the number of iterations (see Chapter 7 for detail).
√ An O( nL) Infeasible Arc-Search Algorithm for LP
�
167
8.4.3 Pre-process and post-process Pre-solver strategies for the standard linear programming problems represented in the form of (1.4) and solved in normal equations were fully investigated in Chapter 7. Five of them were determined to be effective and efficient in appli cation. The same set of the pre-solver is used for algorithms described in this chapter. The post-process is also the same as in Chapter 7.
8.4.4 Matrix scaling Based on the test and analysis of Chapter 7, it is determined that matrix scaling does not improve efficiency in general. Therefore, we will not use scaling in the implementation. However, the ratio max |Ai, j | min {|Ak,l | : Ak,l �= 0}
(8.72)
is used to determine if pre-process rule 9 of Chapter 7 is used.
8.4.5 Removing row dependency from A Removing row dependency from A is studied in [6], Andersen reported an effi cient method that removes row dependency of A. Based on the study in Chapter 7, we choose to not use this function unless we feel it is necessary when it is used as part of handling degenerate solutions, discussed below. To have a fair comparison of all tested algorithms, we will clarify in Section 8.5 which algo rithms and/or problems use this function and which algorithms and/or problems do not use this function.
8.4.6 Linear algebra for sparse Cholesky matrix Similar to Mehrotra’s algorithm, the majority of the computational cost of the proposed algorithms is related to solving sparse Cholesky systems (8.14) and (8.18), which can be expressed as an abstract problem as follows. AD2 AT u = LΛLT u = v, 1 2
− 12
(8.73)
where D = X S is identical in (8.14) and (8.18), L is a lower triangle ma trix, Λ is a diagonal matrix; but u and v are different vectors. Many popular LP solvers [23, 172] call a software package [104] which uses some linear al gebra specifically developed for the ill-conditioned sparse Cholesky decompo R has not yet implemented the function with sition [77]. However, MATLAB� the features for ill-conditioned matrices. We implement the same method as in Chapter 7.
168
• Arc-Search Techniques for Interior-Point Methods
8.4.7 Handling degenerate solutions Difficulty caused by degenerate solutions in interior-point methods for linear pro gramming has been an issue for a long time [48]. Similar observation was also reported in [44]. In our implementation, we have an option for handling degen erate solutions, as described in Chapter 7.
8.4.8 Analytic solution of αk Given σk , we know that αk can be calculated in analytic form [151, 157]. Since Algorithms 8.1 and 8.2 are slightly different from the ones in Chapter 7, the formulas to calculate αk are slightly different too. We provide these formulas without giving proofs (the proof is very similar to the ones in Chapter 7). For each i ∈ {1, . . . , n}, we can select the largest αxi such that for any α ∈ [0, αxi ], the ith inequality of (8.30) holds, and the largest αsi such that for any α ∈ [0, αsi ] the ith inequality of (8.31) holds. We then define αx =
min {αxi },
(8.74)
min {αsi },
(8.75)
αk = min{α , α s },
(8.76)
αs =
i∈{1,...,n}
i∈{1,...,n} x
where αxi and αsi can be obtained (using a similar argument as in Chapter 7) in analytical forms represented by φk , x˙ i , x¨ i (= pxi σ + qxi ), ψk , s˙i , and s¨i (= psi σ + qsi ). In every iteration, several σ may be tried to find the best σk and αk while φk , ψk , x˙ , s˙, px , ps , qx and qs are fixed (see details in the next section). Case 1 (x˙i = 0 and x¨i �= 0): � π α xi =
xi −φk +x¨i x¨i
−1
cos
Case 2 (x¨i = 0 and x˙i �= 0): �
−1
sin
α xi =
⎩ sin−1
�
xi −φk +x¨i √2 2 x˙i +x¨i
�
− sin−1
�
(8.78)
if x˙i ≥ xi − φk
� β = sin−1
π 2
if x˙i ≤ xi − φk
xi −φk x˙i
Case 3 (x˙i > 0 and x¨i > 0): Let
(8.77)
if xi − φk + x¨i ≤ 0.
π 2
α xi =
⎧ ⎨
if xi − φk + x¨i ≥ 0
2
x¨i �
x˙i2 + x¨i2
√ x¨2i
x˙i +x¨i2
�
� .
(8.79) �
x˙i2 + x¨i 2 �
if xi − φk + x¨i ≤ x˙i2 + x¨i2
if xi − φk + x¨i ≥
(8.80)
√ An O( nL) Infeasible Arc-Search Algorithm for LP
Case 4 (x˙i > 0 and x¨i < 0): Let
� β = sin
αxi =
⎧ ⎨
π 2
⎩ sin−1
�
xi −φk +x¨i √2 2 x˙i +x¨i
�
�
.
x˙i2 + x¨i2
+ sin−1
�
(8.81)
� x˙i2 + x¨i2 �
if xi − φk + x¨i ≤ x˙i2 + x¨i2 if xi − φk + x¨i ≥
�
¨i √−x 2
169
�
−x¨i
−1
•
x˙i +x¨i2
(8.82) Case 5 (x˙i < 0 and x¨i < 0): Let
� β = sin
αxi =
⎧ ⎨
π 2
⎩ π − sin−1
�
−(xi −φk +x¨i )
√
x˙i2 +x¨i 2
−1
�
−x¨i
�
x˙i2 + x¨i2
�
− sin−1
�
¨i √−x 2
.
�
x˙i +x¨i2
(8.83) if xi − φk + x¨i ≥ 0 if xi − φk + x¨i ≤ 0 (8.84)
Case 6 (x˙i < 0 and x¨i > 0): α xi =
π . 2
(8.85)
α xi =
π . 2
(8.86)
Case 7 (x˙i = 0 and x¨i = 0):
Case 1a (s˙i = 0, s¨i �= 0): � π αsi =
if si − ψk + s¨i ≥ 0
2 si −ψk +s¨i s¨i
−1
cos
if si − ψk + s¨i ≤ 0.
Case 2a (s¨i = 0 and s˙i �= 0): � π αsi =
if s˙i ≤ si − ψk
2 −1
sin
si −ψk s˙i
if s˙i ≥ si − ψk
(8.87)
(8.88)
Case 3a (s˙i > 0 and s¨i > 0):
α si =
⎧ ⎨
π 2
⎩ sin−1
�
si −ψk +s¨i √2 2 s˙i +s¨i
�
− sin−1
�
√ s2¨i
s˙i +s¨2i
�
� s˙2i + s¨2i � if si − ψk + s¨i < s˙2i + s¨i2
if si − ψk + s¨i ≥
(8.89)
170
• Arc-Search Techniques for Interior-Point Methods
Case 4a (s˙i > 0 and s¨i < 0):
αsi =
⎧ ⎨
π 2
⎩ sin−1
�
si −ψk +s¨i √2 2 s˙i +s¨i
�
+ sin−1
�
�
¨i √−s 2
�
s˙2i + s¨2i �
if si − ψk + s¨i ≤ s˙i2 + s¨2i
if si − ψk + s¨i ≥
s˙i +s¨2i
(8.90) Case 5a (s˙i < 0 and s¨i < 0):
αsi =
⎧ ⎨
π 2
⎩ π − sin−1
�
−(si −ψk +s¨i )
√2
s˙i +s¨i2
�
− sin−1
�
√−2s¨i
if si − ψk + s¨i ≥ 0
�
if si − ψk + s¨i ≤ 0
s˙i +s¨2i
(8.91) Case 6a (s˙i < 0 and s¨i > 0): αsi =
π . 2
(8.92)
αsi =
π . 2
(8.93)
Case 7a (s˙i = 0 and s¨i = 0):
8.4.9 Selection of centering parameter σk From the convergence analysis, it is clear that the best strategy to achieve a large step size is to select both αk and σk at the same time, similar to the idea of Section 3.4. Therefore, we will find a σk which maximizes the step size αk . The problem can be expressed as max
min {αxi (σ ), αsi (σ )},
σ ∈[σmin ,σmax ] i∈{1,...,n}
(8.94)
where 0 < σmin < σmax < 1, αxi (σ ) and αsi (σ ) are calculated using (8.77)-(8.93) for a fixed σ ∈ [σmin , σmax ]. Problem (8.94) is a minimax problem without regu larity conditions involving derivatives. Golden section search [34] seems to be a reasonable method for solving this problem. However, given the fact from (8.30) that αxi (σ ) is a monotonic increasing function of σ if pxi > 0 and αxi (σ ) is a monotonic decreasing function of σ if pxi < 0 (and similar properties hold for αsi (σ )), we can use the condition min{ min αxi (σ ), min αsi (σ )} > min{ min αxi (σ ), min αsi (σ )}, (8.95) i∈pxi 0
√ An O( nL) Infeasible Arc-Search Algorithm for LP
•
171
Algorithm 8.3 Data: (x˙ , s˙), (px , ps ), (qx , qs ), (xk , sk ), φk , and ψk . Parameter: ε ∈ (0, 1), σlb = σmin , σub = σmax ≤ 1. for iteration k = 0, 1, 2, . . . Step 0: If σub − σlb ≤ ε, set α =
min {αxi (σ ), αsi (σ )}, stop.
i∈{1,...,n}
Step 1: Set σ = σlb + 0.5(σub − σlb ). Step 2: Calculate αxi (σ ) and αsi (σ ) using (8.77)-(8.93). Step 3: If (8.95) holds, set σlb = σ , else set σub = σ . Step 4: Set k + 1 → k. Go back to Step 1. end (for)
It is known that Golden section search yields a new interval whose length is 0.618 of the previous interval in all iterations [80], while the proposed algorithm yields a new interval whose length is 0.5 of the previous interval and, therefore, is more efficient than Golden section. For Algorithm 8.1, after executing Algorithm 8.3, we may still need to further reduce αk using Golden section or bisection to satisfy µ(σk , αk ) < µk , x(σk , αk )s(σk , αk ) ≥ θ µ(σk , αk )e.
(8.96a) (8.96b)
For Algorithm 8.2, after executing Algorithm 8.3, we need to check only (8.96a) and decide if further reduction of αk is needed. Remark 8.5 Comparing Lemmas 8.11 and 8.12, we guess that the restriction of satisfying the condition of µ(σk , αk ) < µk is weaker than the restriction of satisfy ing the conditions of x(σk , αk ) ≥ φk and s(σk , αk ) ≥ ψk , which are solved in (8.94). Indeed, we observed that σk and αk obtained by solving (8.94) always satisfy the weaker restriction. Nevertheless, we keep this check for the safety concern.
8.4.10 Rescaling αk To maintain νk > 0, in each iteration, after an αk is found as in the above process, we rescale αk = min{0.9999αk , 0.99π/2} < 0.99π/2 so that νk = νk−1 (1 − sin(αk )) > 0 holds in every iteration. Therefore, Assumption 4 is al ways satisfied. We notice that the rescaling also prevents xk and sk from getting too close to zero in early iterations, which may cause problems while solving (8.73).
172
� Arc-Search Techniques for Interior-Point Methods
8.4.11 Terminate criteria The main stopping criterion used in the implementations is slightly deviated from the one described in previous sections but follows the convention used by most infeasible interior-point software implementations, such as LIPSOL [172] �rkb � µk �rkc � + + < 10−8 . max{1, �b�} max{1, �c�} max{1, �cT xk �, �bT λ k �} In case the algorithms fail to find a good search direction, the program also stops if step sizes αkx < 10−8 and αks < 10−8 . Finally, if (a) due to the numerical problem, �rkb � or �rkc � does not decrease k k−1 k −8 but 10�rk−1 b � < �rb � or 10�rc � < �rc �, or (b) if µ < 10 , the program stops.
8.5
Numerical Tests
The two algorithms proposed in this chapter are implemented in MATLAB func tions and named as arclp1.m and arclp2.m. These two algorithms are com pared with two efficient algorithms, the arc-search algorithm proposed in Chap ter 7 (implemented as curvelp.m) and the well-known Mehrotra’s algorithm discussed in Chapters 4 and 7 (implemented as mehrotra.m). The main cost of the four algorithms in each iteration is the same, involving the linear algebra for sparse Cholesky decomposition, which is the first equa tion of (8.73). The cost of Cholesky decomposition is O(n3 ), which is much higher than the next expensive task O(n2 ), the cost of solving the second equa tion of (8.73). Therefore, we conclude that the iteration count is a good measure to compare the performance for these four algorithms. All MATLAB codes of the above four algorithms are tested against to each other using the benchmark Netlib problems. The four MATLAB codes use exactly the same initial point, the same stopping criteria, the same pre-process and the same post-process, so that the comparison of the performance of the four algorithms is reasonable. Numerical tests for all algorithms have been performed for all Netlib linear programming problems that are presented in standard form, except Osa 60 (m = 10281 and n = 232966) because the PC computer used for the testing does not have enough memory to handle this problem. The iteration numbers used to solve these prob lems are listed in Table 8.1. We noted, in Chapter 7, that curvelp.m and mehrotra.m have difficulty with some problems because of the degenerate solutions, but the proposed Algo rithms 8.1 and 8.2 implemented as arclp1.m and arclp2.m have no difficulty with all test problems. Although we have the option of handling degenerate so lutions implemented in arclp1.m and arclp2.m, this option is not used for all the test problems. However, curvelp.m and mehrotra.m have to use this option because these two codes reached some degenerate solutions for several
√ An O( nL) Infeasible Arc-Search Algorithm for LP
�
173
problems which make them difficult to solve or need significantly more itera tions. For problems marked with ‘+’, this option is called only for Mehrotra’s method. For problems marked with ‘*’, both curvelp.m and mehrotra.m need to call this option for better results. For problems with ‘**’, this option is called for both curvelp.m and mehrotra.m but curvelp.m does not need to call this feature, however, calling this feature reduces iteration count. We need to keep in mind that, although using the option described in Section 8.4.7 reduces the iter ation count significantly, these iterations are significantly more expensive [161]. Therefore, simply comparing iteration counts for problems marked with ‘+’, ‘*’, and ‘**’ will lead to a conclusion in favor of curvelp.m and mehrotra.m (which is what we will do in the following discussions). Table 8.1: Comparison of arclp1.m, arclp2.m, curvelp.m, and mehrotra.m for problems in Netlib. Problem Adlittle
Afiro
Agg
Agg2
Agg3
Bandm
Beaconfd
Blend
Bnl1
algorithm curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m
iter 15 15 16 17 9 9 9 9 18 22 20 20 18 20 21 21 17 18 20 20 19 22 20 20 10 11 11 11 12 14 14 14 32 35
obj 2.2549e+05 2.2549e+05 2.2549e+05 2.2549e+05 -464.7531 -464.7531 -464.7531 -464.7531 -3.5992e+07 -3.5992e+07 -3.5992e+07 -3.5992e+07 -2.0239e+07 -2.0239e+07 -2.0239e+07 -2.0239e+07 1.0312e+07 1.0312e+07 1.0312e+07 1.0312e+07 -158.6280 -158.6280 -158.6280 -158.6280 3.3592e+04 3.3592e+04 3.3592e+04 3.3592e+04 -30.8121 -30.8122 -30.8122 -30.8121 1.9776e+03 1.9776e+03
infeasibility 1.0e-07 3.4e-08 3.0e-11 8.0e-11 1.0e-11 8.0e-12 6.2e-13 1.0e-12 5.0e-06 5.2e-05 3.7e-06 7.0e-06 4.6e-07 5.2e-07 3.1e-08 2.6e-08 3.1e-08 8.8e-09 1.5e-08 1.8e-08 3.2e-11 8.3e-10 3.6e-11 3.4e-11 1.4e-12 1.4e-10 1.8e-12 6.0e-12 1.0e-09 4.9e-11 1.6e-12 2.5e-12 2.7e-09 3.4e-09
174
• Arc-Search Techniques for Interior-Point Methods
Bnl2+
Brandy
Degen2*
Degen3*
fffff800
Israel
Lotfi
Maros r7
Osa 07*
Osa 14
Osa 30
Qap12
Qap15*
arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curve mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m
34 34 31 38 35 35 21 19 24 23 16 17 19 19 22 22 35 26 26 31 28 28 23 29 27 25 14 18 16 16 18 21 20 20 37 35 32 30 35 37 42 42 32 36 42 39 22 24 23 22 27 44 28
1.9776e+03 1.9776e+03 1.8112e+03 1.8112e+03 1.8112e+03 1.8112e+03 1.5185e+03 1.5185e+03 1.5185e+03 1.5185e+03 -1.4352e+03 -1.4352e+03 -1.4352e+03 -1.4352e+03 -9.8729e+02 -9.8729e+02 -9.8729e+02 -9.8729e+02 5.5568e+005 5.5568e+05 5.5568e+05 5.5568e+05 -8.9664e+05 -8.9665e+05 -8.9664e+05 -8.9664e+05 -25.2647 -25.2647 -25.2646 -25.2647 1.4972e+06 1.4972e+06 1.4972e+06 1.4972e+06 5.3574e+05 5.3578e+05 5.3578e+05 5.3578e+05 1.1065e+06 1.1065e+06 1.1065e+06 1.1065e+06 2.1421e+06 2.1421e+06 2.1421e+06 2.1421e+06 5.2289e+02 5.2289e+02 5.2289e+02 5.2289e+02 1.0411e+03 1.0410e+03 1.0410e+03
2.9e-09 7.8e-10 5.4e-10 9.3e-07 3.5e-06 1.9e-07 3.0e-06 6.2e-08 2.4e-06 1.8e-07 1.9e-08 2.0e-10 5.9e-10 1.5e-08 7.0e-05 1.2e-09 8.6e-08 1.2e-08 4.3e-05 7.7e-04 3.7e-09 4.9e-09 7.4e-08 1.8e-08 3.4e-08 6.3e-08 3.5e-10 2.7e-07 7.8e-09 6.5e-09 1.6e-08 6.4e-09 1.7e-09 1.8e-09 4.2e-07 1.5e-07 8.4e-10 5.7e-5 2.0e-09 3.0e-08 5.2e-09 8.9e-09 1.0e-08 1.3e-08 1.3e-08 1.6e-08 1.9e-08 6.2e-09 2.9e-10 2.5e-09 3.9e-07 1.5e-05 8.4e-08
√ An O( nL) Infeasible Arc-Search Algorithm for LP
Qap8*
Sc105
Sc205
Sc50a
Sc50b
Scagr25
Scagr7
Scfxm1+
Scfxm2
Scfxm3+
Scrs8
Scsd1
Scsd6
arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m
27 12 13 12 12 10 11 11 11 13 12 12 12 10 9 10 10 8 8 10 10 19 18 19 19 15 17 17 17 20 22 21 21 23 26 24 24 24 23 23 25 23 30 28 27 12 13 11 11 14 16 16 16
1.0410e+03 2.0350e+02 2.0350e+02 2.0350e+02 2.0350e+02 -52.2021 -52.2021 -52.2021 -52.2021 -52.2021 -52.2021 -52.2021 -52.2021 -64.5751 -64.5751 -64.5751 -64.5751 -70.0000 -70.0000 -70.0000 -70.0000 -1.4753e+07 -1.4753e+07 -1.4753e+07 -1.4753e+07 -2.3314e+06 -2.3314e+06 -2.3314e+06 -2.3314e+06 1.8417e+04 1.8417e+04 1.8417e+04 1.8417e+04 3.6660e+04 3.6660e+04 3.6660e+04 3.6661e+04 5.4901e+04 5.4901e+04 5.4901e+04 5.4901e+04 9.0430e+02 9.0430e+02 9.0430e+02 9.0429e+2 8.6666 8.6666 8.6666 8.6667 50.5000 50.5000 50.5000 50.5000
1.4e-08 1.2e-12 7.1e-09 6.2e-11 1.1e-10 3.8e-12 9.8e-11 2.2e-12 5.6e-12 3.7e-10 8.8e-11 4.4e-11 4.5e-11 3.4e-12 8.3e-08 8.5e-13 5.9e-13 1.0e-10 9.1e-07 3.6e-12 1.8373e-12 5.0e-07 4.6e-09 1.7e-08 2.1e-08 2.7e-09 1.1e-07 7.0e-10 9.2e-10 3.1e-07 1.6e-08 3.3e-05 6.8e-06 2.3e-06 2.6e-08 4.8e-05 1.1e-05 1.9e-06 9.8e-08 1.2e-04 4.0988e-04 1.2e-11 1.8e-10 1.0e-10 1.2e-08 1.0e-10 8.7e-14 3.3e-15 5.3e-15 1.5e-13 8.6e-15 2.6e-13 4.8e-13
•
175
176
• Arc-Search Techniques for Interior-Point Methods
Scsd8
Sctap1
Sctap2
Sctap3
Share1b
Share2b
Ship04l
Ship04s
Ship08l
Ship08s
Ship12l
Ship12s
Stocfor1**
curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m
13 14 15 15 20 27 20 20 21 21 22 20 20 22 21 21 22 25 26 26 13 15 15 15 17 18 19 18 17 20 19 19 19 22 20 20 17 20 19 19 21 21 21 21 17 19 21 20 20/14 14 13 14
9.0500e+02 9.0500e+02 9.0500e+02 9.0500e+02 1.4122e+03 1.4123e+03 1.4123e+03 1.4123e+03 1.7248e+03 1.7248e+03 1.7248e+03 1.7248e+03 1.4240e+03 1.4240e+03 1.4240e+03 1.4240e+03 -7.6589e+04 -7.6589e+04 -7.6589e+04 -7.6589e+04 -4.1573e+02 -4.1573e+02 -4.1573e+02 -4.1573e+02 1.7933e+06 1.7933e+06 1.7933e+06 1.7933e+06 1.7987e+06 1.7987e+06 1.7987e+06 1.7987e+06 1.9090e+06 1.9091e+06 1.9090e+06 1.9091e+06 1.9201e+06 1.9201e+06 1.9201e+06 1.9201e+06 1.4702e+06 1.4702e+06 1.4702e+06 1.4702e+06 1.4892e+06 1.4892e+06 1.4892e+06 1.4892e+06 -4.1132e+04 -4.1132e+04 -4.1132e+04 -4.1132e+04
6.7e-10 1.3e-10 2.6e-13 3.7e-13 2.6e-10 0.0031 1.4e-11 1.8e-11 2.1e-10 4.4e-07 1.4e-12 1.1e-12 5.7e-08 5.9e-07 1.9e-12 2.5e-12 6.5e-08 1.5e-06 1.9e-07 2.3e-07 4.9e-11 7.9e-10 1.4e-10 8.9e-11 5.2e-11 2.9e-11 1.3e-10 5.9e-11 2.2e-11 4.5e-09 3.1e-10 3.1e-09 1.6e-07 1.0e-10 1.8e-11 1.2e-11 3.7e-08 4.5e-12 1.7e-09 3.2e-11 4.7e-13 1.0e-08 3.0e-10 6.5e-11 1.0e-10 2.1e-13 5.0e-11 1.4e-10 2.8e-10 1.1e-10 8.6890e-11 1.1e-10
√ An O( nL) Infeasible Arc-Search Algorithm for LP
Stocfor2
curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m curvelp.m mehrotra.m arclp1.m arclp2.m
Stocfor3
Truss
22 22 22 22 34 38 37 37 25 26 24 24
-3.9024e+04 -3.9024e+04 -3.9024e+04 -3.9024e+04 -3.9976e+04 -3.9976e+04 -3.9977e+04 -3.9976e+04 4.5882e+05 4.5882e+05 4.5882e+05 4.5882e+05
�
177
2.1e-09 1.6e-09 4.3e-09 4.3e-09 4.7e-08 6.4e-08 7.7e-08 6.8e-08 1.7e-07 9.5e-06 5.2e-07 1.7e-09
Performance profile2 is used to compare the efficiency of the four algorithms. Figure 8.1 is the performance profile of iteration numbers of the four algorithms. It is clear that curvelp.m is the most efficient algorithm of the four algorithms; arclp2.m is slightly better than arclp1.m and mehrotra.m; and the efficien cies of arclp1.m and mehrotra.m are roughly the same. Overall, the effi ciency difference of the four algorithms is not very significant. Given the fact that arclp1.m and arclp2.m are convergent in theory and more stable in nu merical test, we believe that these two algorithms are better choices in practical applications. Performance (number of iterations) profiles comparison 1 CurveLP Mehrotra arclp1 arclp2
0.9 0.8
P (rp,s ≤ τ )
0.7 0.6 0.5 0.4 0.3 0.2 0.1 1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
τ
Figure 8.1: Performance profile comparison of the four algorithms.
2 To our best knowledge, performance profile was first used in [132] to compare the performance of different algorithms. The method has been becoming very popular after its merit was carefully analyzed in [29].
178
8.6
• Arc-Search Techniques for Interior-Point Methods
Concluding Remarks
In this chapter, we propose two computationally efficient polynomial interiorpoint algorithms. These algorithms search the optimizer along ellipses that ap proximate the central path. The first algorithm is proved to be polynomial and its simplified version, the second algorithm, has a better complexity bound than all existing infeasible interior-point algorithms and achieves the best complexity bound for all existing, feasible or infeasible, interior-point algorithms. Numeri cal test results for all the Netlib standard linear programming problems show that these two algorithms are competitive to the state-of-the-art Mehrotra’s Algorithm which has no convergence result. The results obtained in this chapter solve all the dilemmas discussed in Section 4.4.
ARC-SEARCH INTERIOR-POINT METHODS: EXTENSIONS
III
Chapter 9
An Arc-Search Algorithm for Convex Quadratic Programming Interior-point algorithms for convex quadratic programming have been proposed for decades by many researchers. One of the earliest and well-known algorithms was proposed by Dikin [27]. After Karmarkar’s pioneer work on interior-point polynomial algorithm for linear programming [57], interior-point polynomial al gorithms for convex quadratic programming have been investigated by many re searchers. For example, Ye and Tse [166] extended Karmarkar’s algorithm and proved that their algorithm has polynomial complexity bound O(n log(1/ε)). Monteiro and Adler√ proposed a different algorithm [96] and improved the com plexity bound to O( n log(1/ε)). However, these algorithms do not use higherorder information, which is believed intuitively (and demonstrated by numerical test) to be useful in practical implementation [8, 89]. Gondzio [41] also con sidered multiple centrality corrections for linear programming and provided nu merical results to demonstrate the potential benefit of higher-order method, but [89, 41] did not show the polynomiality of their methods. The first polynomial higher-order algorithm can probably be attributed to Monteiro, Adler, and Resende [97], who derived the complexity bound 1 1 O(n 2 (1+ r ) log(1/ε)) for their algorithm, where r ≥ 1 is the order of derivatives. Clearly, this bound is not as good as the one for the first-order algorithm [96]. This chapter extends the arc-search technique discussed in Chapter 5 to convex √ quadratic problem. A polynomialR algorithm with complexity bound code is implemented for the algo O( n log(1/ε)) is proposed. A MATLAB�
182
� Arc-Search Techniques for Interior-Point Methods
rithm. A simple example is used to demonstrate how this algorithm works. The proposed algorithm is also tested on all convex quadratic programming problems listed in [53]. The result is compared with the one obtained by LOQO [47]. This preliminary result shows that the proposed algorithm is promising because it uses fewer iterations in all these tested problems than the iterations LOQO uses.
9.1
Problem Descriptions
This chapter considers the convex quadratic programming (QP) in the standard form: 1 T (QP) min x Hx + cT x, 2 subject to Ax = b, x ≥ 0, (9.1) where 0 � H ∈ Rn×n is a positive semidefinite matrix, A ∈ Rm×n , b ∈ Rm , c ∈ Rn are given, and x ∈ Rn is the vector to be optimized. Associated with the quadratic programming is the dual programming (DQP) that is also presented in the standard form: 1 max − xT Hx + bT λ , (DQP) 2 (9.2) subject to −Hx + AT λ + s = c, s ≥ 0, x ≥ 0, where λ ∈ Rm is the dual variable vector, and s ∈ Rn is the dual slack vector. Denote the feasible set F as a collection of all points that meet the constraints of (QP) and (DQP). F = {(x, λ , s) : Ax = b, AT λ + s − Hx = c, (x, s) ≥ 0},
(9.3)
and the strictly feasible set F o as a collection of all points that meet the con straints of (QP) and (DQP) and are strictly positive F o = {(x, λ , s) : Ax = b, AT λ + s − Hx = c, (x, s) > 0}.
(9.4)
Throughout this chapter, we make the following assumptions. Assumptions: 1. A is a full rank matrix. 2. F o is not empty. Assumption 2 implies the existence of a central path. Since 1 ≤ m and m < n, assumption 2 implies 2 ≤ n. It is well known that x ∈ Rn is an optimal solution of (9.1) if and only if x, λ , and s meet the following KKT conditions Ax = b
(9.5a)
An Arc-Search Algorithm for Convex Quadratic Programming
AT λ + s − Hx = c xi si = 0, i = 1, . . . , n (x, s) ≥ 0.
•
183
(9.5b) (9.5c) (9.5d)
For convex (QP) problem, KKT condition is also sufficient for x to be a global optimal solution. Similar to the linear programming, the central path follow ing algorithm proposed here tries to search the optimizers along an arc that is an approximation of the central path C ∈ F o ⊂ F, where the central path C is parametrized by a scalar τ > 0 as follows. For each interior point (x, λ , s) on the central path, there is a τ > 0 such that Ax = b
(9.6a)
A λ + s − Hx = c xi si = τ, i = 1, . . . , n (x, s) > 0.
(9.6b) (9.6c) (9.6d)
T
Therefore, the central path is an arc in R2n+m parametrized as a function of τ and is denoted as C = {(x(τ), λ (τ), s(τ)) : τ > 0}. (9.7)
As τ → 0, the moving point (x(τ), λ (τ), s(τ)) on the central path represented by (9.6) approaches to the solution of (QP) represented by (9.1).
9.2 An Arc-search Algorithm for Convex Quadratic Programming Let 1 > θ > 0, and the duality measure be given by (5.17) which is repeated below. xT s µ= . n We denote N2 (θ ) = {(x, λ , s) | Ax = b, −Hx + AT λ + s = c, (x, s) > 0, Ix ◦ s − µeI ≤ θ µ}. (9.8) It is worthwhile to note N2 (θ ) ⊂ F o . For (x, λ , s) ∈ N2 (θ ), since (1 − θ )µ ≤ xi si ≤ (1 + θ )µ, we have xi si maxi (xi si ) mini (xi si ) xi si ≤ ≤µ≤ ≤ . 1+θ 1+θ 1−θ 1−θ
(9.9)
The idea of arc-search proposed in this chapter is very simple. The algorithm starts from a feasible point in N2 (θ ) close to the central path, constructs an arc that passes through the point and approximates the central path, searches along
184
� Arc-Search Techniques for Interior-Point Methods
the arc to a new point in a larger area N2 (2θ ) that reduces the duality gap xT s and meets (9.6a), (9.6b), and (9.6d). The process is repeated by finding a better point close to the central path or on the central path in N2 (θ ) that simultaneously meets (9.6a), (9.6b), and (9.6d). As the duality measure or duality gap approaches zero, condition (9.6c) is met and the optimal solution is then found.
9.2.1 Predictor step We will use an ellipse E in 2n + m dimensional space to approximate the central path C described by (9.6), where E = {(x(α), λ (α), s(α)) : (x(α), λ (α), s(α)) =�a cos(α) +�b sin(α) +�c}, (9.10) �a ∈ R2n+m and �b ∈ R2n+m are the axes of the ellipse, and�c ∈ R2n+m is the center of the ellipse. Given a point y = (x, λ , s) = (x(α0 ), λ (α0 ), s(α0 )) ∈ E, which is close to or on the central path,�a, �b,�c are functions of α, y, y˙ , and y¨ , where y˙ and y¨ are defined as ⎡ ⎤⎡ ⎤ ⎡ ⎤ x˙ 0 A 0 0 ⎣ −H AT I ⎦ ⎣ λ˙ ⎦ = ⎣ 0 ⎦ , (9.11) S 0 X x◦s s˙ ⎤ ⎡ ⎤⎡ ⎤ ⎡ x¨ A 0 0 0 ⎣ −H AT I ⎦ ⎣ λ¨ ⎦ = ⎣ ⎦. 0 (9.12) S 0 X −2x˙ ◦ s˙ s¨ It has been shown in Chapter 5 that the calculation of �a, �b, and �c in the expression of the ellipse can be avoided. The following formulas are used instead. Theorem 9.1 Let (x(α), λ (α), s(α)) be an arc defined by (9.10) passing through a point (x, λ , s) ∈ E, and its first and second derivatives at (x, λ , s) be (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) which are defined by (9.11) and (9.12). Then, an ellipse approximation of the central path is given by x(α) = x − x˙ sin(α) + x¨ (1 − cos(α)). (9.13) λ (α) = λ − λ˙ sin(α) + λ¨ (1 − cos(α)).
(9.14)
s(α) = s − s˙ sin(α) + s¨(1 − cos(α)).
(9.15)
Assuming (x, s) > 0, one can easily see that, if xx˙ , xx¨ , ss˙ , and ss¨ are bounded (we will show that this claim is true), and if α is small enough, then x(α) > 0 and s(α) > 0. We will also show that searching along this arc will reduce the duality T measure, i.e., µ (α) = x(α )n s(α) < µ.
An Arc-Search Algorithm for Convex Quadratic Programming
•
185
Lemma 9.1 Let (x, λ , s) be a strictly feasible point of (QP) and (DQP), (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) meet (9.11) and (9.12), (x(α), λ (α), s(α)) be calculated using (9.13), (9.14), and (9.15), then the following conditions hold. Ax(α) = b, AT λ (α) + s(α) − Hx(α) = c. Proof 9.1 Since (x, λ , s) is a strict feasible point, the result follows from direct calculation by using (9.4), (9.11), (9.12), and Theorem 9.1. Lemma 9.2 Let x˙ , s˙, x¨ , and s¨ be defined in (9.11) and (9.12), and H be positive semidefinite matrix. Then, the following relations hold. x˙ T s˙ = x˙ T H˙x ≥ 0,
(9.16)
x¨ T s¨ = x¨ T Hx¨ ≥ 0.
(9.17)
x¨ s˙ = x˙ s¨ = x˙ Hx¨.
(9.18)
T
T
T
−(x˙ T s˙)(1 − cos(α))2 − (x¨ T s¨) sin2 (α) ≤ (x¨ T s˙ + x˙ T s¨) sin(α)(1 − cos(α)) ≤ (x˙ T s˙)(1 − cos(α))2 + (x¨ T s¨) sin2 (α). −(x˙ T s˙) sin2 (α) − (x¨ T s¨)(1 − cos(α))2 ≤ (x¨ T s˙ + x˙ T s¨) sin(α)(1 − cos(α)) ≤ (x˙ T s˙) sin2 (α) + (x¨ T s¨)(1 − cos(α))2 .
(9.19)
(9.20)
For α = π2 , (9.19) and (9.20) reduce to − x˙ T s˙ + x¨ T s¨ ≤ (x¨ T s˙ + x˙ T s¨) ≤ x˙ T s˙ + x¨ T s¨.
(9.21)
Proof 9.2 Pre-multiplying x˙ T to the second rows of (9.11) and pre-multiplying x¨ T to the second rows of (9.12), then using the first rows of (9.11) and (9.12), gives x˙ T s˙ = x˙ T Hx˙ and x¨ T s¨ = x¨ T Hx¨. (9.16) and (9.17) follow from the fact that H is positive semidefinite. Pre-multiply x¨ T in the second row of (9.11) and, using the first row of (9.12), we have x¨ T H˙x = x¨T s˙. Pre-multiply x˙ T in the second row of (9.12) and, using the first row of (9.11), we have x˙ T s¨ = x˙ T Hx¨ = x¨T s˙. This gives, (x˙ (1 − cos(α)) + x¨ sin(α))TH(x˙(1 − cos(α)) + x¨ sin(α))
186
• Arc-Search Techniques for Interior-Point Methods
= (x˙ T Hx˙)(1 − cos(α))2 + 2(x˙T Hx¨) sin(α)(1 − cos(α)) + (x¨T Hx¨) sin2 (α) = (x˙ T s˙)(1 − cos(α))2 + (x¨ T s¨) sin2 (α) + (x¨ T s˙ + x˙ T s¨) sin(α)(1 − cos(α)) ≥ 0, which is the first inequality of (9.19), and (x˙ (1 − cos(α)) − x¨ sin(α))T H(x˙(1 − cos(α)) − x¨ sin(α)) = (x˙ T Hx˙)(1 − cos(α))2 − 2(x˙T Hx¨) sin(α)(1 − cos(α)) + (x¨T Hx¨) sin2 (α) = (x˙ T s˙)(1 − cos(α))2 + (x¨ T s¨) sin2 (α) − (x¨ T s˙ + x˙ T s¨) sin(α)(1 − cos(α)) ≥ 0, which is the second inequality of (9.19). Replacing x˙ (1 − cos(α)) and x¨ sin(α) by x˙ sin(α) and x¨ (1 − cos(α)), and following the same derivation, we can obtain (9.20). This finishes the proof.
A simple result, which was provided in (6.54) and restated here (see also [95]), will be used. Lemma 9.3 Let u, v, and w be real vectors of same size satisfying u + v = w and uT v ≥ 0. Then, 2IuI · IvI ≤ IuI2 + IvI2 ≤ IuI2 + IvI2 + 2uT v = Iu + vI2 = IwI2 .
(9.22)
Using Lemmas 9.2, 5.3, and 9.3, we can show that xx˙ , xx¨ , ss˙ , and ss¨ are bounded, as claimed in the following two Lemmas. Lemma 9.4 Let (x, λ , s) ∈ N2 (θ ) and (x˙ , λ˙ , s˙) meet (9.11). Then, � x˙ �2 � s˙ �2 n � � � � , � � +� � ≤ x s 1−θ �2 � x˙ �2 � s˙ �2 � n � � � � , � � � � ≤ x s 2(1 − θ ) 0≤ Proof 9.3
(9.23)
(9.24)
x˙ T s˙ n(1 + θ ) := δ1 n. ≤ 2(1 − θ ) µ
(9.25)
From the last row of (9.11), we have Sx˙ + Xs˙ = XSe 1 1 1 1 1 1 ⇐⇒ X− 2 S 2 x˙ + X 2 S− 2 s˙ = X 2 S 2 e. 1
1
1
1
1
1
Let u = X− 2 S 2 x˙ , v = X 2 S− 2 s˙, and w = X 2 S 2 e, from Lemma 9.2, uT v ≥ 0. Using Lemma 9.3, we have IuI2 + IvI2 =
n � x˙2 si i
i=1
xi
+
n � s˙2 xi i
i=1
si
1
1
≤ IX 2 S 2 eI2 = nµ.
An Arc-Search Algorithm for Convex Quadratic Programming
•
187
Dividing both sides of the inequality by min s j x j and using (9.9) gives j
n � x˙2
i 2 x i=1 i
+
n � s˙2 i
i=1
s2i
≤
nµ n ≤ , min j (s j x j ) 1 − θ
or equivalently � x˙ �2 � s˙ �2 n � � � � . � � +� � ≤ x s 1−θ This proves (9.23). Combining (9.23) and Lemma 5.3 yields �2 � x˙ �2 � s˙ �2 � n � � � � . � � � � ≤ x s 2(1 − θ ) This leads to
� x˙ �� s˙ � n � �� � . � �� � ≤ x s 2(1 − θ )
(9.26)
Therefore, using (9.9) and Cauchy-Schwarz inequality yields x˙ T s˙ µ
� �T � � |x˙ |T |s˙| |x˙ |T |s˙| |x˙ | |s˙| ≤ ≤ (1 + θ ) ≤ (1 + θ ) µ maxi (xi si ) x s � x˙ �� s˙ � n(1 + θ ) � �� � ≤ (1 + θ )� �� � ≤ , x s 2(1 − θ )
(9.27)
which is the second inequality of (9.25). From Lemma 9.2, x˙ T s˙ = x˙ T Hx˙ ≥ 0, we have the first inequality of (9.25). Lemma 9.5 Let (x, λ , s) ∈ N2 (θ ), (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) meet (9.11) and (9.12). Then, � x¨ �2 � s¨ �2 (1 + θ )n2 � � � � . � � +� � ≤ x s (1 − θ )3 0≤
x¨ T s¨ n2 (1 + θ )2 := δ2 n2 . ≤ µ 2(1 − θ )3
� x˙ T s¨ � n 32 (1 + θ ) 23 3 � � := δ3 n 2 , � �≤ 2 µ (1 − θ ) Proof 9.4
(9.28)
(9.29)
� x¨ T s˙ � n 32 (1 + θ ) 32 3 � � := δ3 n 2 . � �≤ 2 µ (1 − θ )
(9.30)
Similar to the proof of Lemma 9.4, from the last row of (9.12), we have Sx¨ + Xs¨ = −2 (x˙ ◦ s˙)
1
1
1
1
1
1
⇐⇒ X− 2 S 2 x¨ + X 2 S− 2 s¨ = −2X− 2 S− 2 (x˙ ◦ s˙) ,
188
• Arc-Search Techniques for Interior-Point Methods 1
1
1
1
1
1
Let u = X− 2 S 2 x¨ , v = X 2 S− 2 s¨, and w = −2X− 2 S− 2 (x˙ ◦ s˙), from Lemma 9.2, uT v ≥ 0. Using Lemma 9.3, we have 2
2
IuI + IvI =
n � x¨2 si i
i=1
xi
+
n � s¨2 xi i
i=1
si
n � 2 2� � �2 � x˙i s˙i � � − 12 − 12 ≤ �−2X S (x˙ ◦ s˙)� = 4 xi si i=1
Dividing both sides of the inequality by µ and using (9.9) gives � n � � n � � n � x¨2 � � x˙2 s˙2 � s¨i2 i i i (1 − θ ) + ≤ 4(1 + θ ) , x2 i=1 s2i xi2 s2i i=1 i i=1 or equivalently � x¨ �2 � s¨ �2 � 1+θ � � � � � � x˙ s˙ �2 (1 + θ )n2 . � � +� � ≤ 4 � ◦ � ≤ (1 − θ )3 x s 1−θ x s Therefore, using Lemma 5.3, we have � x¨ �2 � s¨ �2 1 � (1 + θ )n2 �2 � � � � , � � � � ≤ 4 (1 − θ )3 x s and x¨ T s¨ µ
≤ ≤
� �T � � |x¨ |T |s¨| |x¨ |T |s¨| |x¨ | |s¨| ≤ (1 + θ ) ≤ (1 + θ ) µ maxi (xi si ) x s � x¨ �� s¨ � n2 (1 + θ )2 � �� � (1 + θ )� �� � ≤ , 2(1 − θ )3 x s
which is the second inequality of (9.29). From Lemma 9.2, x¨ T s¨ = x¨ T Hx¨ ≥ 0, we have the first inequality of (9.29). Similarly, it is easy to show (9.30).
Using the bounds established in Lemmas 9.2, 9.4, 9.5, and 5.4, we can obtain the lower bound and upper bound for µ(α). Lemma 9.6 Let (x, λ , s) ∈ N2 (θ ), (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) meet (9.11) and (9.12). Let x(α) and s(α) be defined by (9.13) and (9.15). Then, 1 (x˙ T s˙) sin4 (α) + (x˙ T s˙) sin2 (α) n µ(α) = µ(1 − sin(α)) 1 + x¨ T s¨ − x˙ T s˙ (1 − cos(α))2 − x˙ T s¨ + s˙T x¨ sin(α)(1 − cos(α)) n 1 ¨ sin4 (α) + (x¨ T s) ¨ sin2 (α) . (9.31) µ(1 − sin(α)) + (x¨ T s) n µ(1 − sin(α)) −
≤
≤
An Arc-Search Algorithm for Convex Quadratic Programming
Proof 9.5
•
189
Using (9.13) and (9.15), we have
nµ(α) = x(α)T s(α)
=
xT − x˙ T sin(α) + x¨ T (1 − cos(α))
=
xT s − xT s˙ sin(α) + xT s¨(1 − cos(α))
=
use last rows of (9.11) and (9.12)
=
=
use (9.19) use Lemma 5.4 use Lemma 9.5
≤ ≤
s − s˙ sin(α) + s¨(1 − cos(α))
−x˙ T s sin(α) + x˙ T s˙ sin2 (α) − x˙ T s¨ sin(α)(1 − cos(α)) +x¨ T s(1 − cos(α)) − x¨ T s˙ sin(α)(1 − cos(α)) +x¨ T s¨(1 − cos(α))2 xT s − (xT s˙ + sT x˙ ) sin(α) + (xT s¨ + sT x¨ )(1 − cos(α)) −(x˙ T s¨ + s˙T x¨ ) sin(α)(1 − cos(α)) + x˙ T s˙ sin2 (α) +x¨ T s¨(1 − cos(α))2 nµ(1 − sin(α)) − 2x˙ T s˙(1 − cos(α)) −(x˙ T s¨ + s˙T x¨ ) sin(α)(1 − cos(α))
+x˙ T s˙(1 − cos2 (α)) + x¨ T s¨(1 − cos(α))2
nµ(1 − sin(α)) + (x¨ T s¨ − x˙ T s˙)(1 − cos(α))2
−(x˙ T s¨ + s˙T x¨ ) sin(α)(1 − cos(α))
nµ(1 − sin(α)) + (x¨ T s¨ − x˙ T s˙)(1 − cos(α))2
+(x˙ T s˙)(1 − cos(α))2 + (x¨ T s¨) sin2 (α)
nµ 1 − sin(α) + (x¨ T s¨) sin4 (α) + (x¨ T s¨) sin2 (α)
≤ nµ 1 − sin(α) + δ2 n sin2 (α) + δ2 n sin4 (α) .
This proves the second inequality of the lemma. Combining the last equality of the above formulas and (9.20) proves the first inequality of the lemma.
To keep all iterates of the algorithm inside the feasible set, we need (x(α), s(α)) > 0 in all the iterations. We will prove that this is guaranteed if µ(α) > 0 holds. The following corollary states the condition for µ(α) > 0 to hold. Corollary 9.1 If µ > 0, then for any fixed θ ∈ (0, 1), there is an α¯ depending on θ , such that for any sin(α) ≤ sin(α¯ ), µ(α) > 0. In particular, if θ = 0.148, sin(α¯ ) ≥ 0.6286. Proof 9.6
Since � � 1 µ(α) ≥ µ (1 − sin(α) − (x˙ T s˙) sin4 (α) + (x˙ T s˙) sin2 (α) nµ � � (1 + θ ) := µr(α), ≥ µ (1 − sin(α) − sin4 (α) + sin2 (α) 2(1 − θ )
µ > 0, and r(α) is a monotonic decreasing function in [0, π2 ] with r(0) > 0, r( π2 ) < 0, there is a unique real solution sin(α¯ ) ∈ (0, 1) of r(α) = 0 such that for all sin(α)
0 , or µ(α) > 0. It is easy to verify that if θ = 0.148, sin(α¯ ) = 0.6286 is the solution of r(α) = 0. Remark 9.1 Intuitively, to search in a wider region will generate a longer step. Therefore, the larger the θ is, the better. However, in order to derive the convergence result, θ ≤ 0.148 is imposed in Lemma 9.12.
To reduce the duality measure in an iteration, we need to have µ(α) ≤ µ. For linear programming, it has been shown in Chapter 5 that µ(α) ≤ µ for α ∈ [0, αˆ ] with αˆ = π2 , and the larger the α in the interval is, the smaller the µ(α) will be. This claim may not be true for the convex quadratic programming and it needs to be modified. We first introduce the famous Cardano’s formula, which can be found in [113]. Lemma 9.7 Let p and q be the real numbers that are related to the following cubic algebra equation x3 + px + q = 0. If Δ=
� q �2
+
� p �3
> 0, 2 3 then the cubic equation has one real root that is given by � �� � � � � �� � � � 3 q q 2 p 3 3 q q 2 p 3 x= − + + + − − + . 2 2 3 2 2 3 Lemma 9.8 Let (x, λ , s) ∈ N2 (θ ), (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) meet (9.11) and (9.12). Let x(α) and s(α) be defined by (9.13) and (9.15). Then, there exists ⎧ π x¨ T s¨ 1 ⎪ ⎨ 2 , if nµ ≤ 2 αˆ = (9.32) ⎪ T ⎩ sin−1 (g), if x¨nµs¨ > 12 where � � � � � � � �3 � � � � nµ �2 � 1 �3 � 2 3 3 nµ nµ 1 nµ � � g= + + + − + , 2x¨ T s¨ 3 2x¨ T s¨ 2x¨ T s¨ 3 2x¨ T s¨ such that for every α ∈ [0, αˆ ], µ(α) ≤ µ.
An Arc-Search Algorithm for Convex Quadratic Programming
Proof 9.7
Clearly, if
�
191
From the second inequality of (9.31), we have � � x¨ T s¨ x¨ T s¨ 3 sin(α) + sin (α) . µ(α) − µ ≤ µ sin(α) −1 + nµ nµ x¨ T s¨ nµ
≤ 12 , for any α ∈ [0, π2 ], the function � � x¨ T s¨ 3 x¨ T s¨ sin(α) + sin (α) ≤ 0, f (α) := −1 + nµ nµ T
and µ(α) ≤ µ. If x¨nµs¨ > 21 , using Cardano’s formula, we conclude that the function f has one real solution sin(α) ∈ (0, 1). The solution is given as � � � � � � �3 � � � � � � nµ �2 � 1 �3 2 3 3 1 nµ nµ nµ � � sin(αˆ ) = + + + − + . 2x¨ T s¨ 2x¨ T s¨ 3 2x¨ T s¨ 2x¨ T s¨ 3 This proves the Lemma.
According to Theorem 9.1, Lemmas 9.1, 9.4, 9.5, 9.6, and 9.8, if α is small enough, then (x(α), s(α)) > 0, and µ(α) < µ, i.e., the search along the arc de fined by Theorem 9.1, will generate a strict feasible point with smaller duality measure. Since (x, s) > 0 holds in all iterations, reducing the duality measure to zero means approaching to the solution of the convex quadratic programming. We will apply a similar idea used in Chapters 3 and 5, i.e., starting with an iterate in N2 (θ ), searching along the approximated central path in order to reduce the duality measure and to keep the iterate in N2 (2θ ), and then making a correction to move the iterate back to N2 (θ ). Let a0 = −θ µ < 0, a1 = θ µ > 0, a2 = 2θ a3
a4
x˙ T s˙ x˙ T H˙x = 2θ ≥ 0, n n
� � 1 � � = � x˙ ◦ s¨ + s˙ ◦ x¨ − (x˙ T s¨ + s˙T x¨ )e� ≥ 0,
n
� � x˙ T s˙ 1
� � = � x¨ ◦ s¨ − s˙ ◦ x˙ − (x¨ T s¨ − s˙T x˙ )e� + 2θ
≥ 0.
n
n
We define a quartic polynomial in terms of sin(α) as follows:
q(α) = a4 sin4 (α) + a3 sin3 (α) + a2 sin2 (α) + a1 sin(α) + a0 = 0.
(9.33)
192
• Arc-Search Techniques for Interior-Point Methods
Since q(α) is a monotonic increasing function of α ∈ [0, π2 ], q(0) < 0 and q( π2 ) > 0, the polynomial has exactly one positive root in [0, π2 ]. Moreover, since (9.33) is a quartic equation, all the solutions are analytical and the computational cost is independent of the size of A (n and m) and is negligible [35]. Lemma 9.9 Let θ ≤ 0.148 and (xk , λ k , sk ) ∈ N2 (θ ), (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) be calculated from (9.11) and (9.12). Let sin(α˜ ) be the only positive real solution of (9.33) in [0, 1]. Assume sin(α) ≤ min{sin(α˜ ), sin(α¯ )}, let (x(α), λ (α), s(α)) and µ(α) be updated as follows (x(α), λ (α), s(α)) = (xk , λ k , sk ) − (x˙ , λ˙ , s˙) sin(α) + (x¨ , λ¨ , s¨)(1 − cos(α)), (9.34) µ(α) = +
µk (1 − sin(α)) � 1� T (x¨ s¨ − x˙ T s˙)(1 − cos(α))2 − (x˙ T s¨ + s˙T x¨ ) sin(α)(1 − cos(α)) . n (9.35)
Then, (x(α), λ (α), s(α)) ∈ N2 (2θ ). Proof 9.8 Since sin(α˜ ) is the only positive real solution of (9.33) in [0, 1] and q(0) < 0, substituting a0 , a1 , a2 , a3 and a4 into (9.33), we have, for all sin(α) ≤ sin(α˜ ), �� �� 1 T � � T ¨ ˙ ¨ ˙ ¨ ˙ ˙ ¨ �x ◦ s − s ◦ x − (x s − s x)e� sin4 (α) n �� �� 1 � � + �x˙ ◦ s¨ + s˙ ◦ x¨ − (x˙ T s¨ + s˙T x¨ )e� sin3 (α) n � � � � x˙ T s˙ x˙ T s˙ 4 ≤ − 2θ sin2 (α) + θ µk (1 − sin(α)). (9.36) sin (α) − 2θ n n From (9.11), (9.12), (9.34) and (9.35), using Lemmas 5.4, 9.6 and (9.36), we have � � � � �x(α) ◦ s(α) − µ(α)e� � � = �(xk ◦ sk − µk e)(1 − sin(α)) � � 1 + x¨ ◦ s¨ − x˙ ◦ s˙ − (x¨ T s¨ − x˙ T s˙)e (1 − cos(α))2 n � � � 1 � − x˙ ◦ s¨ + s˙ ◦ x¨ − (x˙ T s¨ + x¨ T s˙)e sin(α)(1 − cos(α))� n � � � � ≤ (1 − sin(α))�xk ◦ sk − µk e� � � 1 � � +�(x¨ ◦ s¨ − x˙ ◦ s˙ − (x¨ T s¨ − x˙ T s˙))e�(1 − cos(α))2 n
An Arc-Search Algorithm for Convex Quadratic Programming
� � 1 � � +�(x˙ ◦ s¨ + s˙ ◦ x¨ − (x˙ T s¨ + x¨ T s˙)e� sin(α)(1 − cos(α)) n ≤ θ µk (1 − sin(α)) � � 1 � � +�(x¨ ◦ s¨ − x˙ ◦ s˙ − (x¨ T s¨ − x˙ T s˙))e� sin4 (α) + a3 sin3 (α) n x˙ T s˙ (sin4 (α) + sin2 (α)) ≤ 2θ µk (1 − sin(α)) − 2θ n ≤ 2θ µ(α). [use Lemma 9.6]
•
193
(9.37)
Hence, the point (x(α), λ (α), s(α)) satisfies the proximity condition for N2 (2θ ). To check the positivity condition (x(α), s(α)) > 0, note that the initial condition (x, s) > 0. It follows from (9.37) and Corollary 9.1 that, for θ ≤ 0.148 and sin(α) ≤ sin(α¯ ), xi (α)si (α) ≥ (1 − 2θ )µ(α) > 0.
(9.38)
Therefore, we cannot have xi (α) = 0 or si (α) = 0 for any index i when α ∈ [0, sin−1 (α¯ )]. This proves (x(α), s(α)) > 0. Remark 9.2 It is worthwhile to note, by examining the proof of Lemma 9.9, that sin(α˜ ) is selected for the proximity condition (9.37) to hold, and sin(α¯ ) is selected for µ(α) > 0, thereby assuring the positivity condition (9.38) to hold.
The lower bound of sin(α¯ ) is estimated in Corollary 9.1. To estimate the lower bound of sin(α˜ ), we need the following lemma. Lemma 9.10 Let (x, λ , s) ∈ N2 (θ ), (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) meet (9.11) and (9.12). Then, � � µ(1 + θ ) � � n, �x˙ ◦ s˙� ≤ 2(1 − θ )
(9.39)
� � (1 + θ )2 µ � � n2 , �x¨ ◦ s¨� ≤ 2(1 − θ )3
(9.40)
� � (1 + θ ) 32 µ 3 � � ˙ ¨ x ◦ s n2 , � �≤ (1 − θ )2 � � (1 + θ ) 32 µ 3 � � n2 . �x˙ ◦ s¨� ≤ (1 − θ )2 Proof 9.9
Since n � �2 n � �2 � x˙ �2 � � s˙ �2 � x˙i s˙i � � � � , � � = , � � = x xi s si i=1
i=1
(9.41) (9.42)
194
• Arc-Search Techniques for Interior-Point Methods
From Lemma 9.4, we have �
�2 n 2(1 − θ ) � n � � �� n � � � � x˙ �2 � s˙ �2 � x˙i 2 � s˙i 2 � � � � ≥ � � � � = x s xi si i=1 i=1 � � n 2 � x˙ s˙ �2 � x˙i s˙i � � ≥ =� ◦ � x s x i si i=1 �2 n � � �2 � x˙i s˙i 1 � � ≥ = �x˙ ◦ s˙� . 2 2 (1 + θ )µ (1 + θ ) µ i=1
This proves (9.39). Using n � �2 n � �2 � x¨ �2 � � s¨ �2 � x¨i s¨i � � � � , � � = , � � = x xi s si i=1
i=1
and Lemmas 9.5 and 5.3, then, following the same procedure, it is easy to verify (9.40). From (9.23) and (9.28), we have �� � � n (1 + θ )n2 (1 − θ ) (1 − θ )3 � x¨ �2 � s˙ �2 � x˙ �2 � s¨ �2 � � � � � � � � ≥ � � � � +� � � � x s �x n �s � � � � � n � � �� n � � � n � �2 � x¨i 2 � � x˙i 2 � s¨i 2 s˙i = + xi si xi si i=1 i=1 i=1 i=1 � � � � n n 2 � 2 � x¨i s˙i x˙i s¨i ≥ + xi si xi si i=1 i=1 �2 �2 � n � n � � x¨i s˙i x˙i s¨i ≥ + (1 + θ )µ (1 + θ )µ i=1 i=1 �� � � �2 � 1 � �2 � � ˙ ¨ ¨ ˙ = x ◦ s + x ◦ s � � � � . (1 + θ )2 µ 2 (9.43) This proves the lemma.
The next technical lemma will be used a few times in the rest of the book. Lemma 9.11 Let u and v be the n-dimensional vectors. Then, � � � � 1 � � � � �u ◦ v − uT v e� ≤ �u ◦ v�. n
An Arc-Search Algorithm for Convex Quadratic Programming
Proof 9.10
•
195
Simple calculation gives � �2 1 � � �u ◦ v − uT v e� n �2 � n n
� 1� = ui vi − ui vi
n i=1 i=1 ⎛ �2 ⎞ � n n n � � � 1 ⎝u2i vi2 − 2ui vi = ui vi + 2 ui vi ⎠ n n i=1
=
n �
i=1
u2i vi2
i=1
=
n �
u2i vi2
i=1
2 − n
�
1 − n
�
n �
i=1
�2 ui vi
i=1 n �
1 + n
�
n �
�2 ui vi
i=1
�2 ui vi
i=1
≤ Iu ◦ vI2 .
This completes the proof.
This result will be used in the proof of the following lemma. Lemma 9.12 Let θ ≤ 0.148. Then, sin(α˜ ) ≥
2θ √ n
for n ≥ 2.
Proof 9.11 First, notice that q(sin(α)) is a monotonic increasing function of sin(α) for α ∈ [0, π2 ] and q(sin(0)) < 0, therefore, we need only to show that 2θ ) < 0 for θ ≤ 0.148 and n ≥ 2. From Lemma 9.11, we have q( √ n � � � � � � 1 � � � � � � �x˙ ◦ s¨ + s˙ ◦ x¨ − (x˙ T s¨ + s˙T x¨ )� ≤ �x˙ ◦ s¨� + �s˙ ◦ x¨ �, n and
� � � � � � 1 � � � � � � �x¨ ◦ s¨ − s˙ ◦ x˙ − (x¨ T s¨ − s˙T x˙ )� ≤ �x¨ ◦ s¨� + �s˙ ◦ x˙ �. n Using these results and Lemmas 9.10, 9.4, and 9.5 for the quartic polynomial (9.33), we have, for α ∈ [0, π2 ], �� � � � � � � � � x˙ T s˙ � � � � � � � � q(sin(α)) ≤ �x¨ ◦ s¨� + �s˙ ◦ x˙ � + 2θ sin4 (α) + �x˙ ◦ s¨� + �s˙ ◦ x¨ � sin3 (α) n x˙ T s˙ 2 sin (α) + θ µ sin(α) − θ µ + 2θ n � � (1 + θ )2 2 n(1 + θ ) θ (1 + θ ) ≤µ n + + sin4 (α) 2(1 − θ )3 2(1 − θ ) (1 − θ )
196
• Arc-Search Techniques for Interior-Point Methods 3
+2
� (1 + θ ) 2 3 3 θ (1 + θ ) 2 2 sin (α) + n sin (α) + θ sin(α) − θ . (1 − θ )2 (1 − θ )
Substituting sin(α) =
2θ √ n
gives
� 3 3 � 2θ � � � (1 + θ )2 n(1 + θ ) θ (1 + θ ) 16θ 4 (1 + θ ) 2 n 2 8θ 3 2 q √ ≤µ n + + + 2 2(1 − θ )3 (1 − θ )2 n 32 2(1 − θ ) (1 − θ ) n2 n � θ (1 + θ ) 4θ 2 2θ + +θ √ −θ (1 − θ ) n n � 8θ 3 (1 + θ )2 8θ 3 (1 + θ ) 16θ 4 (1 + θ ) =θ µ + + (1 − θ )3 n(1 − θ ) (1 − θ )n2 3 � 16θ 2 (1 + θ ) 2 4θ 2 (1 + θ ) 2θ √ + + + − 1 := θ µ p(θ ). (9.44) n(1 − θ ) (1 − θ )2 n Since p(θ ) is monotonic increasing function of θ , p(0) < 0, n ≥ 2, and it is easy to verify that p(0.148) < 0 for n = 2, this proves the lemma.
9.2.2 Corrector step Corollary 9.1, Lemmas 9.1, 9.9, and 9.12 prove the feasibility of searching for the optimizer along the ellipse. To move the iterate back to N2 (θ ), we use the direction defined by ⎡
A 0 ⎣ −H AT S(α) 0
⎤⎡ ⎤ ⎡ ⎤ 0 Δx 0 ⎦, I ⎦ ⎣ Δλ ⎦ = ⎣ 0 X(α) Δs µ(α)e − x(α) ◦ s(α)
(9.45)
and we update (xk+1 , λ k+1 , sk+1 ) and µk+1 by (xk+1 , λ k+1 , sk+1 ) = (x(α), λ (α), s(α)) + (Δx, Δλ , Δs),
(9.46)
T
xk+1 sk+1 µk+1 = . (9.47) n Next, we show that the combined step (searching along the arc in N2 (2θ ) and moving back to N2 (θ )) will reduce the duality gap of the iterate, i.e., µk+1 < µk , if we select some appropriate θ and α. We introduce the following Lemma before we prove this result. Lemma 9.13 Let (x(α), λ (α), s(α)) ∈ N2 (2θ ) and (Δx, Δλ , Δs) be defined as in (9.45). Then, 0≤
ΔxT Δs 2θ 2 (1 + 2θ ) δ0 ≤ µ(α) := µ(α). 2 n n n(1 − 2θ )
(9.48)
An Arc-Search Algorithm for Convex Quadratic Programming
•
197
Proof 9.12 First, pre-multiply ΔxT in the second row of (9.45), then, applying the first row of (9.45) gives 0 ≤ ΔxT HΔx = ΔxT Δs, which is the first inequality of (9.48). From the third row of (9.45), we have S(α)Δx + X(α)Δs = µ(α)e − X(α)S(α)e. 1
1
Multiplying both sides by X− 2 (α)S− 2 (α) gives 1
1
1
1
1
1
X− 2 (α)S 2 (α)Δx + X 2 (α)S− 2 (α)Δs = X− 2 (α)S− 2 (α) µ(α)e − X(α)S(α)e . 1
1
Let D = X 2 (α)S− 2 (α), u = D−1 Δx, and v = DΔs. It is easy to see that uT v = ΔxT Δs = ΔxT HΔx ≥ 0. Using Lemma 9.3 and the assumption of (x(α), λ (α), s(α)) ∈ N2 (2θ ), we have n � � (Δxi )2 si (α) i=1 −1
=ID
xi (α)
(Δsi )2 xi (α) + si (α)
�
ΔxI2 + IDΔsI2
1
1
≤IX− 2 (α)S− 2 (α)[µ(α)e − X(α)S(α)e]I2 n � (µ(α) − xi (α)si (α))2 ≤ xi (α)si (α) ≤ ≤
i=1 n 2 i=1 (µ(α) − xi (α)si (α))
mini (xi (α)si (α))
(2θ )2 µ(α) (2θ )2 µ 2 (α) = . (1 − 2θ )µ(α) (1 − 2θ )
(9.49)
Dividing both sides by µ(α) and using xi (α)si (α) ≥ µ(α)(1 − 2θ ) yields n �
� (Δxi )2 (Δsi )2 (1 − 2θ ) 2 + xi (α) s2i (α) i=1 �� � � � � � Δx �2 � Δs �2 =(1 − 2θ ) � � +� � x(α) s(α) ≤
�
(2θ )2 , (1 − 2θ )
(9.50)
i.e., � Δx �2 � Δs �2 � 2θ �2 � � � � . � � +� � ≤ x(α) 1 − 2θ s(α)
(9.51)
198
� Arc-Search Techniques for Interior-Point Methods
Invoking Lemma 5.3, we have � Δx �2 � Δs �2 1 � 2θ �4 � � � � . � � ·� � ≤ x(α) 4 1 − 2θ s(α)
(9.52)
� Δx � � Δs � 2θ 2 � � � � . � �·� �≤ x(α) s(α) (1 − 2θ )2
(9.53)
This gives
Using Cauchy-Schwarz inequality, we have
≤
(Δx)T (Δs) µ(α) n � |Δxi ||Δsi | i=1
µ(α)
n � |Δxi | |Δsi | xi (α) si (α) i=1 � Δx �T � Δs � � � � � =(1 + 2θ )� � � � x(α) s(α) � Δx � � Δs � � � � � ≤(1 + 2θ )� �·� � x(α) s(α)
≤(1 + 2θ )
≤
2θ 2 (1 + 2θ ) . (1 − 2θ )2
(9.54)
Therefore, (Δx)T (Δs) 2θ 2 (1 + 2θ ) ≤ µ(α). n n(1 − 2θ )2
(9.55)
This proves the lemma.
For linear programming, it is known (see Chapter 3) that µk+1 = µ(α). This claim is not always true for the convex quadratic programming, as is pointed out in Lemma 9.14. Therefore, some extra work is needed in order to make sure that the duality measure will be reduced in every iteration. Lemma 9.14 Let (x(α), λ (α), s(α)) ∈ N2 (2θ ) and (Δx, Δλ , Δs) be defined as in (9.45). Let (xk+1 , λ k+1 , sk+1 ) be defined as in (9.46). Then, � � � � T 2θ 2 (1 + 2θ ) δ0 xk+1 sk+1 ≤ µ(α) 1 + µ(α) ≤ µk+1 := = µ(α) 1 + n n n(1 − 2θ )2
An Arc-Search Algorithm for Convex Quadratic Programming
Proof 9.13 Using the third row of (9.45), we have Lemma 9.13, it is, therefore, straightforward to obtain
x(α)T Δs+s(α)T Δx n
•
199
= 0. From
x(α)T s(α) 1 T (x(α) + Δx)T (s(α) + Δs) + Δx Δs = n n n 2θ 2 (1 + 2θ ) µk+1 ≤ µ(α) + µ(α). n(1 − 2θ )2
µ(α) ≤ =
This proves the lemma.
Now, we show that the correction step brings the iterate from N2 (2θ ) back to N2 (θ ). Lemma 9.15 Let (x(α), λ (α), s(α)) ∈ N2 (2θ ) and (Δx, Δλ , Δs) be defined as in (9.45). Let (xk+1 , λ k+1 , sk+1 ) be updated by using (9.46). Then, for θ ≤ 0.148 and sin(α) ≤ sin(α¯ ), (xk+1 , λ k+1 , sk+1 ) ∈ N2 (θ ). Proof 9.14 From Lemma 9.11, we have � �2 � �2 1 1 T � � 0 ≤ �Δx ◦ Δs − (ΔxT Δs)e� = IΔx ◦ ΔsI2 − n Δx Δs ≤ IΔx ◦ ΔsI2 . (9.56) n n 1
1
Let D = X 2 (α)S− 2 (α). Pre-multiplying X(α)S(α) yields DΔs + D−1 Δx = X(α)S(α)
− 12
− 12
in the last row of (9.45)
µ(α)e − X(α)S(α)e .
Let u = DΔs, v = D−1 Δx, use the technical Lemma 3.2 and the assumption of (x(α), λ (α), s(α)) ∈ N2 (2θ ), we have � � � � � �2 − 12 3� � � � � � µ(α)e − X(α)S(α)e � �Δx ◦ Δs� = �u ◦ v� ≤ 2− 2 � X(α)S(α) = 2
− 32
n � (µ(α) − xi (α)si (α))2 i=1
3
≤ 2− 2 ≤
3
2− 2
xi (α)si (α)
Iµ(α)e − x(α) ◦ s(α)I2 mini (xi (α)si (α)) 2 1 θ µ(α) (2θ )2 µ(α)2 = 22 . (1 − 2θ )µ(α) (1 − 2θ )
(9.57)
Define (xk+1 (t), sk+1 (t)) = (x(α), s(α)) +t(Δx, Δs). Using the last row of (9.45) and nµ(α) = x(α)T s(α), we have x(α) + tΔx µk+1 (t) =
T
n
s(α) + tΔs
200
• Arc-Search Techniques for Interior-Point Methods
=
ΔxT Δs x(α)T s(α) + t 2 ΔxT Δs = µ(α) + t 2 . n n
(9.58)
Using this relation and (9.56), (9.57), and Lemmas 9.9 and 9.14, we have
= = =
� � � k+1 � �x (t) ◦ sk+1 (t) − µk+1 (t)e� � � t2 � � �[x(α) + tΔx] ◦ [s(α) + tΔs] − µ(α)e − ΔxT Δse� n � � t2 � � �x(α) ◦ s(α) + t[µ(α)e − x(α) ◦ s(α)] + t 2 Δx ◦ Δs − µ(α)e − ΔxT Δse� n � �� � 1 � � �(1 − t) [x(α) ◦ s(α) − µ(α)e] + t 2 Δx ◦ Δs − ΔxT Δse � n 1
≤ ≤ :=
22 θ2 (1 − t)(2θ )µ(α) + t 2 µ(α) (1 − 2θ ) � � 1 2 2 2 2 θ (1 − t)(2θ ) + t µk+1 (1 − 2θ )
[use Lemma 9.9, (9.56), and (9.57)] [use Lemma 9.14]
f (t, θ )µk+1 .
(9.59)
� � � � Therefore, taking t = 1 gives �xk+1 ◦ sk+1 − µk+1 e� ≤ that, for θ ≤ 0.29, 1 22 θ2 ≤ θ. (1 − 2θ )
1
2 2 θ2 (1−2θ ) µk+1 .
It is easy to see
For θ ≤ 0.148 and t ∈ [0, 1], noticing that 0 ≤ f (t, θ ) ≤ f (t, 0.148) ≤ 0.296(1 − t) + 0.044t 2 < 1, and using Corollary 9.1, we have, for an additional condition sin(α) ≤ sin(α¯ ), xik+1 (t)sik+1 (t) ≥ (1 − f (t, θ )) µk+1 (t) � � t2 = (1 − f (t, θ )) µ(α) + ΔxT Δs n ≥ (1 − f (t, θ )) µ(α) > 0,
(9.60)
Therefore, (xk+1 (t), sk+1 (t)) > 0 for t ∈ [0, 1], i.e., (xk+1 , sk+1 ) > 0. This finishes the proof. Lemma 9.16 For θ ≤ 0.148, if
θ sin(α) = √ , n
(9.61)
An Arc-Search Algorithm for Convex Quadratic Programming
then µk+1 < µk . Moreover, for sin(α) =
θ √
n
•
201
,
� � 0.148θ µk+1 ≤ µk 1 − √ . n
(9.62)
Proof 9.15
From Lemmas 9.14, 9.6, 5.4, 9.2, 9.4, and 9.5, we have � � 2θ 2 (1 + 2θ ) µk+1 ≤ µ(α) 1 + (9.63a) n(1 − 2θ )2 � � T � � � � � � x˙ T s¨ s˙T x¨ δ0 x¨ s¨ x˙ T s˙ 4 3 ≤µk 1 − sin(α) + − sin (α) − + sin (α) 1+ nµ nµ nµ nµ n � � � � 1 3 n(1 + θ )2 4 2n 2 (1 + θ ) 2 δ0 ≤µk 1 − sin(α) + sin (α) + sin3 (α) 1+ (1 − θ )2 n 2(1 − θ )3 (9.63b)
�
�
�
�
δ0 δ0 δ0
− 1+ sin(α) + δ2 n 1 + sin4 (α)
n n n � � � 1 δ0 + 2δ3 n 2 1 + sin3 (α) . n �
=µk 1 +
Substituting sin(α) =
θ √ n
(9.63c)
into (9.63c) gives
µk+1 ≤
µk 1 +
2θ 2 (1 + 2θ ) θ 2θ 3 (1 + 2θ ) θ 4 (1 + θ )2 √ − − + n(1 − 2θ )2 n n 32 (1 − 2θ )2 2n(1 − θ )3 3
3
θ 6 (1 + θ )2 (1 + 2θ ) 2θ 3 (1 + θ ) 2 4θ 5 (1 + θ ) 2 (1 + 2θ ) + 2 + n2 (1 − θ )3 (1 − 2θ )2 n (1 − 2θ )2 (1 − θ )2 n(1 − θ )2 � � 3 � 1 1 2θ (1 + 2θ ) θ 3 (1 + θ )2 2θ 2 (1 + θ ) 2 � = µk 1 − θ √ − + + 2(1 − θ )3 (1 − θ )2 n n (1 − 2θ )2 � � 3 � 2θ 2 (1 + 2θ ) 1 θ 5 (1 + θ )2 (1 + 2θ ) 4θ 4 (1 + θ ) 2 (1 + 2θ ) � −θ 3 − 2 + (1 − θ )3 (1 − 2θ )2 (1 − 2θ )2 (1 − θ )2 n 2 (1 − 2θ )2 n +
(9.64)
For θ ≤ 0.148, we have 3
2θ 2 (1 + 2θ ) θ 5 (1 + θ )2 (1 + 2θ ) 4θ 4 (1 + θ ) 2 (1 + 2θ ) > + , (1 − 2θ )2 (1 − θ )3 (1 − 2θ )2 (1 − 2θ )2 (1 − θ )2 and
3
2θ (1 + 2θ ) θ 3 (1 + θ )2 2θ 2 (1 + θ ) 2 + + < 0.852, 2(1 − θ )3 (1 − θ )2 (1 − 2θ )2
202
• Arc-Search Techniques for Interior-Point Methods
therefore, � µk+1 < µk 1 − θ
1 0.852 √ − √ n n
�
� � 0.148θ = µk 1 − √ . n
This proves (9.62).
We summarize all the results in this section as the following theorem. Theorem 9.2 Let θ = 0.148, n ≥ 2, and (xk , λ k , sk ) ∈ N2 (θ ). Then, for sin(α ) = holds that (x(α), λ (α), s(α)) ∈ N2 (2θ ); √ . µk 1 − 0.148θ n
(xk+1 , λ k+1 , sk+1 )
θ √
n
, it
∈ N2 (θ ); and µk+1 ≤
Proof 9.16 From Corollary 9.1 and Lemma 9.12, we have sin(α) ≤ min{sin(α˜ ), sin(α¯ )}. Therefore, Lemma 9.9 holds, i.e., (x(α), λ (α), s(α)) ∈ N2 (2θ ). Since sin(α) ≤ sin(α¯ ) and (x(α), λ (α), s(α)) ∈ N2 (2θ ), Lemma 9.15 states (xk+1 , λ k+1 , sk+1 ) ∈ N2 (θ ). Since sin(α) = √θn , Lemma 9.16 states µk+1 ≤ √ µk 1 − 0.148θ . This finishes the proof. n
We present the proposed algorithm as follows: Algorithm 9.1 (Arc-search path-following)
Data: A, H � 0, b, c, θ = 0.148, ε > 0, initial point (x0 , λ 0 , s0 ) ∈ N2 (θ ), and µ0 =
T
x0 s0
n .
for iteration k = 0, 1, 2, . . . Step 1: Solve the linear systems of equations (9.11) and (9.12) to get (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨). Step 2: Let sin(α ) = (9.35).
θ √
n
. Update (x(α), λ (α), s(α)) and µ(α) by (9.34) and
Step 3: Calculate (Δx, Δλ , Δs) by solving (9.45), update (xk+1 , λ k+1 , sk+1 ) and µk+1 by using (9.46) and (9.47). Set k + 1 → k. Go back to Step 1. end (for)
9.3
Convergence Analysis
The first result in this section extends a result of linear programming (Theorem 2.1) to convex quadratic programming.
An Arc-Search Algorithm for Convex Quadratic Programming
•
203
Lemma 9.17 Suppose Assumption 2 holds, i.e., F o �= ∅. Then, for each K ≥ 0, the set {(x, s) | (x, λ , s) ∈ F, xT s ≤ K} is bounded. Proof 9.17 The proof is almost identical to the proof in Theorem 2.1. It is given here for completeness. Let (x¯ , λ¯ , s¯) be any fixed vector in F o , and (x, λ , s) be any vector in F with xT s ≤ K. Then, AT (λ¯ − λ ) + (s¯ − s) − H(x¯ − x) = 0, therefore, (x¯ − x)T AT (λ¯ − λ ) + (s¯ − s) − H(x¯ − x) = 0. Since A(x¯ − x) = 0, this means (x¯ − x)T (s¯ − s) = (x¯ − x)T H(x¯ − x) ≥ 0. This leads to ¯ x¯ T s¯ + K ≥ x¯ T s¯ + xT s ≥ x¯ T s + xT s. Since (x¯ , s¯) > 0 is fixed, let ξ = min
i=1,··· ,n
min{x¯i , s¯i }.
Then, x¯ T s¯ + K ≥ ξ eT (x + s) ≥ max max{ξ xi , ξ si }, i=1,··· ,n
i.e., for i ∈ {1, · · · , n}, 0 ≤ xi ≤
1 (K + x¯ T s¯), ξ
0 ≤ si ≤
1 (K + x¯ T s¯). ξ
This proves the lemma.
The following theorem is a direct result of Lemmas 9.17, 9.1, Theorem 9.2, KKT conditions, Theorem 1.5. Theorem 9.3 Suppose Assumptions 1 and 2 hold, then the sequence generated by Algorithm 9.1 converges to a set of accumulation points, and all these accumulation points are global optimal solutions of the convex quadratic programming.
204
• Arc-Search Techniques for Interior-Point Methods
Let (x∗ , λ ∗ , s∗ ) be any solution of (9.5), where x∗ is a solution of the primary quadratic programming and (λ ∗ , s∗ ) is a solution of the dual quadratic program ming, following the notation of [10], we denote index sets B, S, and T as B = { j ∈ {1, . . . , n} | x∗j �= 0}.
(9.65)
S = { j ∈ {1, . . . , n} | s∗j �= 0}.
(9.66)
T = { j ∈ {1, . . . , n} |
s∗j
=
x∗j
= 0}.
(9.67)
Goldman-Tucker demonstrated [40] (see Theorem 1.3) that if H = 0, then there is at least one primal-dual solution (x∗ , λ ∗ , s∗ ) such that the corresponding B ∩ S = ∅ = T and B ∪ S = {1, . . . , n}. A solution with this property is called strictly complementary. This property has been used in many papers to prove the locally super-linear convergence of interior-point algorithms in linear pro gramming. However, it is pointed out in [49] that this partition does not hold for general quadratic programming problems. A simple example of convex quadratic program is given by 1 min x2 , x ≥ 0. 2 Its dual program is 1 min s2 , s ≥ 0. 2 ∗ Clearly, the optimal solution x = 0 and s∗ = 0 is not a strictly complementary solution. We will show that as long as a convex quadratic programming has strictly complementary solution(s), an interior-point algorithm will generate a sequence to approach strict complementary solution(s). As a matter of fact, from Lemma 9.17, we can extend the result of [144, Lemma 5.13] to the case of convex quadratic programming, and obtain the following lemma which is independent of any algorithm. Lemma 9.18 Let µ0 > 0, and γ ∈ (0, 1). Assume that the convex QP has strictly complementary solution(s). Then, for all points (x, λ , s) ∈ F o , xi si > γ µ, and µ < µ0 , there are constants M, C1 , and C2 such that I(x, s)I ≤ M, 0 < xi ≤ µ/C1 (i ∈ S), si ≥ C2 γ (i ∈ S),
(9.68)
0 < si ≤ µ/C1 (i ∈ B).
(9.69)
xi ≥ C2 γ (i ∈ B).
(9.70)
An Arc-Search Algorithm for Convex Quadratic Programming
•
205
Proof 9.18 The first result follows immediately from Lemma 9.17 by setting K = nµ0 . Let (x∗ , λ ∗ , s∗ ) be any primal-dual strictly complementary solution. Since (x∗ , λ ∗ , s∗ ) and (x, λ , s) are both feasible, A(x − x∗ ) = 0, (x∗ , λ ∗ , s∗ )
AT (λ − λ ∗ ) + (s − s∗ ) − H(x − x∗ ) = 0.
(x − x∗ )T (s − s∗ ) = (x − x∗ )T H(x − x∗ ) ≥ 0.
(9.71)
xi∗
is strictly complementary solution, T = ∅, = 0 for i ∈ S, and Since s∗i = 0 for i ∈ B. Since xT s = nµ, (x∗ )T s∗ = 0, from (9.71), we have � � nµ ≥ xT s∗ + sT x∗ = xi s∗i + xi∗ si . i∈S
i∈B
Since each term in the summations is positive and bounded above by nµ, we have for any i ∈ S, s∗i > 0, therefore, 0 < xi ≤
nµ . s∗i
Denote D∗ = {(λ ∗ , s∗ ) | si∗ > 0} and P∗ = {(x∗ ) | xi∗ > 0}, we have 0 < xi ≤ This leads to max xi ≤ i∈S
sup(λ ∗ ,s∗ )∈D∗ s∗i
.
nµ . mini∈S sup(λ ∗ ,s∗ )∈D∗ s∗i
Similarly, max si ≤ i∈B
nµ
nµ . mini∈B supx∗ ∈P∗ xi∗
Combining these two inequalities gives max{max xi , max si } i∈S
i∈B
≤ =
nµ min{mini∈S sup(λ ∗ ,s∗ )∈D∗ s∗i , mini∈B supx∗ ∈P∗ xi∗ } µ . C1
This proves (9.69). Finally, xi si ≥ γ µ, hence, for any i ∈ S,
Similarly, for any i ∈ B,
si ≥
γµ γµ ≥ = C2 γ. µ/C1 xi
xi ≥
γµ γµ ≥ = C2 γ. µ/C1 si
This completes the proof.
Lemma 9.18 leads to the following
206
• Arc-Search Techniques for Interior-Point Methods
Theorem 9.4 Let (xk , λ k , sk ) ∈ N2 (θ ) be generated by Algorithm 9.1. Assume that the convex QP has strictly complementary solution(s). Then, every limit point of the sequence is a strictly complementary primary-dual solution of the convex quadratic programming, i.e., xi∗ ≥ C2 γ (i ∈ B). (9.72) s∗i ≥ C2 γ (i ∈ S), Proof 9.19 From Lemma 9.18, (xk , sk ) is bounded, therefore, there is at least one limit point (x∗ , s∗ ). Since (xik , sik ) is in the neighborhood of the central path, i.e., xik ski > γ µk = (1 − θ )µk , sik ≥ C2 γ (i ∈ S),
xik ≥ C2 γ (i ∈ B),
every limit point will meet (9.72) due to the fact that C2 γ is a constant.
√ We now show that the complexity bound of Algorithm 9.1 is O( n log(1/ε)). For this purpose, we need Theorem 1.4, which is restated below for convenience. Theorem 9.5 Let ε ∈ (0, 1) be given. Suppose that an algorithm for solving (9.5) generates a se quence of iterations that satisfies � � δ µk+1 ≤ 1 − ω µk , k = 0, 1, 2, . . . , (9.73) n for some positive constants δ and ω. Suppose that the starting point (x0 , λ 0 , s0 ) satisfies µ0 ≤ 1/ε. Then, there exists an index K with K = O(nω log(1/ε)) such that µk ≤ ε for ∀k ≥ K.
Combining Theorems 9.2 and 9.5 gives: Theorem 9.6 √ The complexity of Algorithm 9.1 is bounded by O( n log(1/ε)).
9.4
Implementation Details
Algorithm 9.1 is presented in a form that is convenient for convergence analysis. Some implementation details that make the algorithm effective and efficient are discussed in this section.
An Arc-Search Algorithm for Convex Quadratic Programming
•
207
9.4.1 Termination criterion Algorithm 9.1 needs a termination criterion in real implementation. One can use
µk ≤ ε,
k
IrB I = IAx − bI ≤ ε,
IrC I = IAT λ k + sk − Hxk − cI ≤ ε, k
k
(x , s ) > 0.
(9.74a)
(9.74b)
(9.74c)
(9.74d)
An alternative criterion is given in linprog [172] IrC I µ IrB I + + ≤ ε. T max{1, IbI} max{1, IcI} max{1, Ic xI, IbT λ I}
(9.75)
9.4.2 Finding initial (x0 , λ 0 , s0 ) ∈ N2 (θ )
Algorithm 9.1 requires an initial point (x0 , λ 0 , s0 ) ∈ N2 (θ ). We use a modified algorithm of [20] to provide such an initial point in our implementation. Denote X = diag(x1 , . . . , xn ), S = diag(s1 , . . . , sn ). Starting from any point (x, λ , s) with (x, s) > 0 that may or may not be in F o , moving the point to a point close to or on the central path amounts to approxi mately solving ⎛ ⎞ Ax − b (9.76) F(x(t), λ (t), s(t)) = ⎝AT λ + s − Hx − c⎠ = 0, (x, s) > 0. XSe − tµe (9.76) can be solved by repeatedly searching along Newton directions while keeping (x, s) > 0. In each step, the Newton direction (dx, dλ , ds) can be cal culated by ⎡ ⎤⎡ ⎤ ⎡ ⎤ A 0 0 dx 0 ⎦. ⎣ −H AT I ⎦ ⎣ dλ ⎦ = ⎣ 0 (9.77) S 0 X ds µe − x ◦ s This process is described in the following Algorithm 9.2 (Find Initial (x0 , λ 0 , s0 ) ∈ N2 (θ ))
Data: A, H � 0, b, c, ε > 0, and initial point (x0 , λ 0 , s0 ) with (x0 , s0 ) > 0.
for iteration k = 0, 1, 2, . . .
Check conditions IrB I = IAxk − bI ≤ ε,
(9.78a)
208
• Arc-Search Techniques for Interior-Point Methods
IrC I = IAT λ k + sk − Hxk − cI ≤ ε, k k
Irt I = IX S e − µeI ≤ θ µ, (xk , sk ) > 0.
(9.78b) (9.78c) (9.78d)
If (9.78) holds, (xk , λ k , sk ) is a point in N2 (θ ). Set the solution (x0 , λ 0 , s0 ) = (xk , λ k , sk ) and stop. If (9.78) does not hold, calculate the Newton direction (dxk , dλ k , dsk ) from (9.77). Carry out line search along the Newton direction (xk+1 , λ k+1 , sk+1 ) = (xk + αdxk , λ k + αdλ k , sk + αdsk )
(9.79)
such that the α satisfies (xk + αdxk , sk + αdsk ) > 0 and �
0, and the search direction is toward the central path, it is not surprising that we observe in all of our test examples that the condition (xk + αdxk , sk + αdsk ) > 0 always holds. However, a rigorous analysis is needed or an alternative algorithm with rigorous analysis should be used if one wants to find initial points for general convex quadratic programming problems.
9.4.3 Solving linear systems of equations In Algorithm 9.1, the majority computational operation in each iteration is to solve linear systems of equations (9.11), (9.12), and (9.45). Directly solving each of these linear systems of equations requires O(2n + m)3 operation count. The following Theorem and its corollaries provide a more efficient way to solve these linear systems of equations. Theorem 9.7 ˆ ∈ Rn×(n−m) be a base of the null space of A. Let (x, λ , s) ∈ F o and (x˙ , λ˙ , s˙) Let A meet (9.11). Then, x˙ ˆ −1 A ˆ (A ˆ T SX−1 + H A) ˆ T Se, = X−1 A x s˙ x˙ = e− , s x and λ˙ = AAT
−1
A (H˙x − s˙) .
An Arc-Search Algorithm for Convex Quadratic Programming
•
209
−1
Proof 9.20 Pre-multiplying AAT A in AT λ˙ + s˙ − Hx˙ = 0 gives the last equa x˙ tion. Since Ax˙ = 0, we have AX x = 0, this means that there exists a vector v such ˆ i.e., that X xx ˙ = Av, x˙
ˆ (9.81) = X−1 Av. x From the last row of (9.11) s˙ x˙ ˆ = e − = e − X−1 Av. s x
(9.82)
Similarly, AT λ˙ + s˙ − Hx˙ = 0 is equivalent to s˙ ˆ = 0. S−1 AT λ˙ + − S−1 HAv s
(9.83)
ˆ − S−1 HAv ˆ = 0, or in Substituting (9.82) into (9.83) gives S−1 AT λ˙ + e − X−1 Av matrix form � �� v � −1 −1 −1 T ˆ (X + S H)A, −S A = e. (9.84) λ˙ Since H is positive semidefinite, (SX−1 + H) is positive definite and invertible, hence, (X−1 +S−1 H) = S−1 (SX−1 +H) is invertible. Since (X−1 +S−1 H)−1 S−1 = ˆ are full rank matrices, we have that (SX−1 + H)−1 is positive definite, A and A ˆ T S(X−1 + S−1 H)A ˆ are positive definite and in A(X−1 + S−1 H)−1 S−1 AT and A vertible. It is easy to verify that ⎡ ⎣
⎤ �−1 � � ˆ T S(X−1 + S−1 H)A ˆ TS ˆ A A ˆ , −S−1 AT = I. ⎦ (X−1 + S−1 H)A � � −1 − A(X−1 + S−1 H)−1 S−1 AT A(X−1 + S−1 H)−1 �
Taking the inverse in (9.84) gives ⎡ ⎤ −1 � � ˆ T S(X−1 + S−1 H)A ˆ TS ˆ v A A ⎦ e. (9.85) =⎣ λ˙ −1 −1 −1 −1 T −1 −1 −1 −1 − A(X + S H) S A A(X + S H) Substituting (9.85) into (9.81) proves the result. ˆ will be a sparse matrix if A is a sparse matrix and if sparse Remark 9.4 A QR decomposition [26] is used. This feature is important for large size prob −1 lems. AAT A and H are constants and independent on iterations, therefore, −1 T AA A can be stored, and the computation of λ˙ can be very efficient. The ex actly same idea can be extended to solve (9.12) and (9.45).
210
• Arc-Search Techniques for Interior-Point Methods
Corollary 9.2 ˆ ∈ Rn×(n−m) be a base of the null space of A. Let (x, λ , s) ∈ F o , (x˙ , λ˙ , s˙) and Let A ¨ (x¨ , λ , s¨) meet (9.11) and (9.12). Define f = −2 xx˙ ◦ ss˙ . Then, x¨ ˆ ˆ A ˆ T SX−1 + H A = X−1 A x x¨ s¨ = f− , x s and λ¨ = AAT
−1
−1
ˆ T Sf, A
A (H¨x − s¨) .
Corollary 9.3 ˆ ∈ Rn×(n−m) be a base of the null space of A. Let (x, λ , s) ∈ F o , (x˙ , λ˙ , s˙) and Let A µe ¨ ¨ meet (9.11) and (9.12). Define f1 = x(α)◦s(α) − e. Then, (¨x, λ , s) Δx ˆ A ˆ T S(α)X−1 (α) + H A ˆ = X−1 (α)A x(α)
−1
ˆ T S(α)f1 , A
Δs Δx = f1 − , s(α) x(α) and, Δλ = AAT
−1
A (HΔx − Δs) .
Theorem 9.7, Corollaries 9.2 and 9.3 provide efficient formulas to calculate ˆ (x˙ , λ˙ , s˙), (x¨ , λ¨ �, s¨), and � (Δx, Δλ , Δs). Let A be obtained by QR decomposition R ˆ AT = [Q1 , A] . The computational procedure for x˙ , x¨ , s˙, s¨, λ˙ , and λ¨ is 0 summarized as follows. Algorithm 9.3 ˙ s, ¨ λ˙ , and λ¨ ) ˙ x, ¨ s, (compute x, ˆ H � 0, X, S, and e. Data: Matrices A, A, T ˆ Compute A Se. ˆ. ˆ T SX−1 + H A Compute P = A −1 Compute R = P . ˆ T Se, x˙ = X−1 x, ˆ A ˙ ss˙ = e − xx˙ , and s˙ = S ss˙ . Compute x˙ = AR x x˙ s˙ Compute f = −2 x ◦ s . ˆ ˆ T Sf and s¨ = Sf − SX−1 x¨ . Compute x¨ = ARA −1 −1 ˙ Compute λ = AAT A (Hx˙ − s˙) and λ¨ = AAT A (Hx¨ − s¨).
The computational counts for (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) are estimated as follows. ˆ T Se requires O((n − m)n); - Computing A
An Arc-Search Algorithm for Convex Quadratic Programming
�
211
� T � −1 � �−1 ˆ SX + H A ˆ - Computing R = A requires O((n − m)n2 + (n − 2 3 m) n + (n − m) ); � � ˆ ˆ T Se, x˙ = X−1 x, ˙ ss˙ = e − xx˙ , and s˙ = S ss˙ requires - Computing x˙ = ARA x O(n(n − m) + n); - Computing f = −2 xx˙ ◦ ss˙ requires O(n); ˆ A ˆ T Sf and s¨ = Sf − SX−1 x¨ requires O(n(n − m) + n). - Computing x¨ = AR � �−1 � �−1 - Computing λ˙ = AAT A (H˙x − s˙) and λ¨ = AAT A (H¨x − s¨) re quires O(n(n + m) + n). Clearly, this implementation is much cheaper than directly solving the sys tems of equations. A similar algorithm is used to calculate (Δx, Δλ , Δs). Algorithm 9.4 (compute Δx, Δλ , Δs) ˆ , H � 0, X(α), S(α), and e.
Data: Matrices A, A Compute f1 = X−1 (α)S−1 (α)µe − e.
ˆ T S(α)f1 .
Compute A � � ˆ. ˆ T S(α)X−1 (α) + H A Compute P = A −1 Compute R = P . ˆ A ˆ T S(α)f and Δs = S(α)f1 − S(α)X−1 (α)Δx. Compute Δx = AR � T �−1 1 Compute Δλ = AA A (HΔx − Δs).
9.4.4 Duality measure reduction Directly using sin(α) = √θn in Algorithm 9.1 provides a convenient formula to prove the polynomiality. However, this choice of sin(α) is too conservative in practice, because the step size in N2 (2θ ) may be small and the duality measure may not be reduced fast enough. A better choice of sin(α) should have a larger step in every iteration so that the polynomiality is reserved and fast convergence is achieved. From analysis in Section 9.2, conditions that restrict step size are proxim ity conditions, positivity conditions, and duality reduction condition. Assuming θ ≤ 0.148, the proximity condition (9.59) holds for (xk+1 , sk+1 ) without other re striction; however, three more factors restrict the search step length. First, prox imity condition (9.37) is met for sin(α) ∈ [0, sin(α˜ )], where sin(α˜ ) is the small est positive solution of (9.33) and it is estimated very conservatively in Lemma 9.12. However, an efficient implementation should use sin(α˜ ), the smallest pos itive solution of (9.33). Since (9.33) is a quartic function of sin(α), the cost of finding the smallest positive solution is negligible [35]. Second, from (9.38) and (9.60), µ(α) > 0 is required for positivity conditions (x(α), s(α)) > 0 and
212
� Arc-Search Techniques for Interior-Point Methods
(xk+1 , sk+1 ) > 0 to hold. Since sin(α¯ ) estimated in Corollary 9.1 may be a little conservative, we directly calculate sin(α¯ ), which is the smallest positive solution of � 1 � (9.86)
µ(α ) ≥ µ(1 − sin(α)) − (x˙ T s˙) sin4 (α ) + (x˙ T s˙) sin2 (α) = σ ,
n
where σ > 0 is a small number. The positivity conditions are guaranteed for all sin(α) ∈ [0, sin(α¯ )]. Since (9.86) is a quartic function of sin(α), the cost of finding the smallest positive solution is negligible. Third, from (9.63a) and Lemma 9.6, for µk+1 ≤ µk to hold, we need �
�
2θ 2 (1 + 2θ ) 2θ 2 (1 + 2θ ) − 1 +
sin(α)
n(1 − 2θ )2 n(1 − 2θ )2 � � �
2θ 2 (1 + 2θ ) x¨ T s¨ � 2 + 1 +
sin (α ) + sin4 (α) n(1 − 2θ )2 nµ ≤ 0. For the sake of convergence analysis, Lemma 9.16 is used. For efficient imple 2 (1+2θ ) mentation, the following solution should be adopted. Denote p = 2θ n(1−2θ )2 > 0, T
q = x¨nµs¨ > 0, z = sin(α) ∈ [0, 1], and
F (z) = (1 + p)qz4 + (1 + p)qz2 − (1 + p)z + p. For z ∈ [0, 1] and q ≤ 16 , F � (z) = (1 + p)(4qz3 + 2qz − 1) ≤ 0, therefore, the upper bound of the duality measure is a monotonic decreasing function of sin(α) for α ∈ [0, π2 ]. The larger the α is, the smaller the upper bound of the duality gap will be. For q > 16 , to minimize the upper bound of the duality gap, we can find the solution of F � (z) = 0. It is easy to check from discriminator of Lemma 9.7 (see also [113]) that the cubic polynomial F � (z) has only one real solution, which is given by � � �
�
� �
� � 3 � � �
� � nµ �2 � 1 �3 2 3 3 µ n
nµ
1
nµ
� � sin(α˘ ) =
+
+ +
+
− 8x¨ T s¨ 8x¨ T s¨ 8x¨ T s¨ 6 6
8x¨ T s¨ := χ.
(9.87)
Since F �� (sin(α˘ ) = (1 + p)(12q sin2 (α˘ ) + 2q) > 0, at sin(α˘ ) ∈ [0, 1), the upper bound of the duality gap is minimized. Therefore, we can define ⎧ T ⎪ π2 , if x¨nµs¨ ≤ 16 ⎨ α˘ =
(9.88)
⎪ ⎩ sin−1 (χ ) , if x¨ T s¨ > 1 . nµ 6
An Arc-Search Algorithm for Convex Quadratic Programming
213
•
It is worthwhile to note that for α < α˘ , F ' (sin(α)) < 0, i.e., F(sin(α)) is a mono tonic decreasing function of α ∈ [0, α˘ ]. As we can see from the above discussion, α¯ and α˘ are used for satisfying positivity conditions and minimizing the upper bound of the duality gap, and there is little room to improve these values. To further minimize the duality gap in each iteration, we may select the final step size sin(α) as follows. Algorithm 9.5 (Select step size)
Data: Fixed iteration number �, sin(α˘ ), sin(α¯ ), and sin(α˜ ).
Step 1: If sin(α˜ ) = min{sin(α˘ ), sin(α¯ ), sin(α˜ )} < min{sin(α˘ ), sin(α¯ )} = sin(αˇ ), using golden section search (� iterations) to get an α in the inter val [sin(α˜ ), sin(αˇ )] such that � � � � �x(α) ◦ s(α) − µ(α)e� ≤ 2θ µ(α). Step 2: Otherwise, select α = αˇ .
Therefore, Algorithm 9.1 can be implemented as follows. Algorithm 9.6 (Arc-search path-following) Data: A, H � 0, b, c, θ = 0.148, ε > σ > 0.
Step 0: Find initial point (x0 , λ 0 , s0 ) ∈ N2 (θ ) using Algorithm 9.2, and µ0 = for iteration k = 0, 1, 2, . . .
T
x0 s0 n .
Step 1: If (9.75) holds or µ ≤ σ , stop. Otherwise continue. Step 2: Compute (x˙ , λ˙ , s˙) and (x¨ , λ¨ , s¨) using Algorithm 9.3. Step 3: Find sin(α˜ ), the smallest positive solution of the quartic polyno mial of (9.33), sin(α¯ ), the smallest positive solution of the quartic polyno mial of (9.86), and sin(α˘ ) from (9.88). Select α by Algorithm 9.5. Update (x(α), λ (α), s(α)) and µ(α) by (9.34) and (9.35). Step 4: Compute (Δx, Δλ , Δs) using Algorithm 9.4, update (xk+1 , λ k+1 , sk+1 ) and µk+1 by using (9.46) and (9.47). Set k + 1 → k. Go back to Step 1. end (for) Remark 9.5 The condition µ > σ guarantees that the equation (9.86) has a positive solution before terminate criterion is met.
214
9.5
� Arc-Search Techniques for Interior-Point Methods
Numerical Examples
In this section, we will first use a simple example to demonstrate how the algo rithm works. Then, we will test QP examples originating from [53] and compare the result with the one reported in [47].
9.5.1 A simple example In this subsection, we use a problem in [106, page 464] to illustrate the dif ference between the active set method and the arc-search interior-point method developed in this chapter. The problem is given as follows: min f (x) = (x1 − 1)2 + (x2 − 2.5)2
(9.89a)
x1 − 2x2 + 2 ≥ 0, −x1 − 2x2 + 6 ≥ 0, −x1 + 2x2 + 2 ≥ 0, x1 ≥ 0, x2 ≥ 0.
(9.89b) (9.89c) (9.89d) (9.89e)
x
subject to
The optimal solution is x∗ = (1.4, 1.7). Having assumed that every search pro vides an accurate solution and the initial point is x0 = (2, 0), the active set al gorithm will find the initial active set, constraints 3 and 5, then searches to the point x1 = (1, 0), then searches to the point x2 = (1, 1.5), then finds the optimal solution. The detail of the search procedure is provided in [106, pages 464-465]. The search path is depicted in the red line in Figure 9.1. The problem is converted to the standard form suitable for interior point al gorithms as follows.
x1 ≥ 0,
1 min f (x) = xT Hx + cT x + 7.25 x 2 subject to x1 − 2x2 − x3 + 2 = 0, −x1 − 2x2 − x4 + 6 = 0, −x1 + 2x2 − x5 + 2 = 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0,
(9.90a) (9.90b) (9.90c) (9.90d) (9.90e)
where H = diag(2, 2, 0, 0, 0), cT = (−2, −5, 0, 0, 0), and bT = (−2, −6, −2). In standard form, the optimal solution is x∗ = (1.4, 1.7, 0.0, 1.2, 4.0). The initial point x0 = (2, 0.01, 0.01, 0.01, 0.01) and s0 = (0.5, 100, 100, 100, 100) is used so that the initial (x, s) is an interior point and close to the initial point of the active set method. This initial point satisfies x0 ◦ s0 = e. ε = 0.000001 is used in the termination criterion. σ = 0.000001ε 3 is used in (9.86). The central path pro jected in the (x1 , x2 ) plane in Figures 9.1, 9.2, and 9.3 is a dot line in black. The
An Arc-Search Algorithm for Convex Quadratic Programming
�
215
2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0
0.5
1
1.5
2
2.5
3
3.5
4
Figure 9.1: Arc-search for the example in [106, page 464].
1.7
1.68
1.66
1.64
1.62
1.6 1.38 1.385 1.39 1.395
1.4
1.405 1.41 1.415 1.42 1.425
Figure 9.2: Arc-search approaches optimal solution for the example.
approximations of the central path are ellipses that are projected to the (x1 , x2 ) plane in Figures 9.1, 9.2, and 9.3. They are dot lines in blue. Unlike its counter part in linear programming, the central path is not close to a straight line, but the ellipse approximation of the central path in quadratic programming is still very good. The corrector step is also very efficient in bringing the iterate back to the central path. Figures 9.2 and 9.3 are magnified parts of Figure 9.1. They provide more detailed information when the arc-search approaches the optimal solution. In the plots, the red ‘x’ is the initial point of the test problem; the red circle ‘o’ are the points obtained by either Algorithm 9.2 (move to the central path initially) or the correct steps (Step 4); the black ‘x’ are the points obtained by searching along the ellipses (Step 3); the red ‘*’ is the optimal solution of the problem. After five
216
� Arc-Search Techniques for Interior-Point Methods
1.7 1.699 1.698 1.697 1.696 1.695 1.694 1.693 1.692 1.691 1.3999
1.4
1.4001
1.4002
1.4003
1.4004
Figure 9.3: Arc-search approaches optimal solution for the example.
(5) iterations, the algorithm finds the optimal solution of this problem. At the con vergence point, the slack variable is s∗ = (0, 0, 0.8, 0, 0), therefore, the algo rithm converges to a strict complementary solution. The duality gap values in the iterations are (µ0 , µ1 , µ2 , µ3 , µ4 , µ5 ) = (1, 0.45454, 0.03578, 6.38692e − 004, 3.08053e − 007, 7.63864e − 014). From these figures, we can see intu itively that searching along an ellipse that approximates the central path is attrac tive. The convergence rate appears to be super-linear.
9.5.2 Test on problems in [53] In [47], a quadratic programming software LOQO is presented and numerical test were conducted for many quadratic programming problems. Among these prob lems, some are from a set of nonlinear programming test problems published in a book [53] which is open to access, easy to convert to MATLAB format, and widely used by researchers to test nonlinear programming algorithms. Algorithm 9.1, with some improvements described in Section 9.4, is implemented in MAT LAB and tested for all 7 convex quadratic problems collected in [53]. The result is compared to the one obtained by LOQO [47]. The polynomial algorithm pro posed in this chapter uses the same number iterations as or fewer iterations than LOQO in every problem, and converges to some equally good or better points in all problems. The comparison is listed in Table 9.1. For these 7 problems, LOQO uses 67 iterations, while the proposed algorithm uses 49 iterations, 27% fewer in total iterations than LOQO; and the latter converges to points which have equal or higher accuracy in objective functions.
An Arc-Search Algorithm for Convex Quadratic Programming
•
217
Table 9.1: Iteration counts for test problems in [53]. Problem hs021 hs035 hs035mod hs051 hs052 hs051 hs076
9.6
iteration numbers using arc-search 12 6 5 4 8 8 6
objective value obtained by arc-search 4.0000001e-2 -8.8888889e+0 -8.7500000e+0 -6.0000000e+0 -6.7335244e-1 -1.9069767e+0 -4.6818182e+0
iteration numbers using LOQO 13 8 5 8 9 11 13
objective value obtained by LOGO 4.0000001e-2 -8.8888889e+0 -8.7499999e+0 -6.0000000e+0 -6.7335244e-1 -1.9069767e+0 -4.6818182e+0
Concluding Remarks
This chapter proposed an arc-search interior-point path-following algorithm for convex quadratic programming that searches for the optimizers along ellipses that approximate the central √ path. The algorithm is proved to be polynomial with the complexity bound O( n log(1/ε)), which is the best bound for linear pro gramming. A simple example is provided in order to demonstrate how the al gorithm works. Preliminary test on quadratic programming problems originating from [53] shows that the proposed algorithm is promising. As we already know, infeasible starting point and wide neighborhood are two other strategies that can also improve the computational efficiency. However, these strategies have not yet been investigated.
Chapter 10
An Arc-Search Algorithm for QP with Box Constraints
To avoid the cost of finding a feasible initial point for general inequality con strained optimization problems, infeasible interior-point method becomes very popular. However, if a feasible initial point is available, intuitively, a feasible interior-point method is more attractive than the infeasible interior-point method, because the effort to bring the iterates gradually from infeasible region to feasible region is not needed anymore. Another advantage of the feasible interior-point method is that all iterates are feasible, which is desired by many engineering problems where real-time optimization code may have to be terminated before an optimizer is found. In this case, the latest available iterate is still feasible and can be applied as a nearly optimal solution. In this chapter, we show that the convex quadratic programming problem subject only to bound (box) constraints does have an explicit feasible initial point. This problem has been studied in [51], where a first-order interior-point algorithm is considered. In this chapter, we will consider an arc-search higher-order interior-point algorithm, which starts at a feasible initial point. We believe that these two features will improve the com putational efficiency. We propose an algorithm that is specially designed for this problem and show that the algorithm is also more efficient than the algorithm in the previous chapter because of two improvements due to the special struc ture of the constraints: (1) the enlarged searching neighborhood, and (2) an ex plicit feasible initial interior point. Moreover, we show that this algorithm has
An Arc-Search Algorithm for QP with Box Constraints
�
219
a very desirable theoretical property, the best polynomial complexity. Although the idea of the proof of polynomiality is somewhat similar to that used in Chapter 9, we provided a complete proof for several reasons: (i) the proof shows that the searching neighborhood is larger than the general convex quadratic programming with linear constraints, which supports the claim of the efficiency of the newly proposed algorithm, (ii) the proof is fairly different from the one in Chapter 9 because of some special structure of the box constraints and the proof carefully takes care of these differences, and (iii) the proof provides complete results. The R . Some numerical test is presented in algorithm is implemented in MATLAB� order to demonstrate the effectiveness and efficiency of the proposed algorithm. The materials of this chapter are based on [155].
10.1
Problem Descriptions
We will consider a convex quadratic problem with box constraints in a standard form: (BQP) min
1 T T 2 x Hx + c x,
subject to − e ≤ x ≤ e,
(10.1)
where 0 � H ∈ Rn×n is a positive definite matrix, c ∈ Rn is given, and x ∈ Rn is the variable vector to be optimized. In view of the KKT conditions (1.49), since H is positive definite matrix, x is an optimal solution of (9.1) if and only if x, λ , and γ satisfy −λ + γ − Hx = c, −e ≤ x ≤ e, (λ , γ) ≥ 0, λi (ei − xi ) = 0, γi (ei + xi ) = 0, i = 1, . . . , n.
(10.2a)
(10.2b)
(10.2c)
(10.2d)
Denote y = e − x ≥ 0, z = e + x ≥ 0. The KKT conditions can be rewritten as Hx + c + λ − γ = 0, x + y = e, x − z = −e, (y, z, λ , γ) ≥ 0, λi yi = 0, γi zi = 0, i = 1, . . . , n.
(10.3a)
(10.3b)
(10.3c)
(10.3d)
For the convex (QP) problem, the KKT conditions are also sufficient for x to be a global optimal solution (see Chapter 1). Denote the feasible set F as a collection of all points that meet the constraints (10.3a), (10.3b), (10.3c) F = {(x, y, z, λ , γ) : Hx + c + λ − γ = 0, (y, z, λ , γ) ≥ 0, x + y = e, x − z = −e}, (10.4)
220
• Arc-Search Techniques for Interior-Point Methods
and the strictly feasible set F o as a collection of all points that meet the con straints (10.3a), (10.3b), and are strictly positive in (10.3c) F o = {(x, y, z, λ , γ) : Hx + c + λ − γ = 0, (y, z, λ , γ) > 0, x + y = e, x − z = −e}. (10.5) Similar to the linear programming, the central path C ∈ F o ⊂ F is defined as a curve in finite dimensional space parametrized by a scalar τ > 0 as follows. For each interior point (x, y, z, λ , γ) ∈ F o on the central path, there is a τ > 0 such that Hx + c + λ − γ = 0, x + y = e, x − z = −e, (y, z, λ , γ) > 0, λi yi = τ, γi zi = τ, i = 1, . . . , n.
(10.6a)
(10.6b)
(10.6c)
(10.6d)
Therefore, the central path is an arc that is parametrized as a function of τ and is denoted as C = {(x(τ), y(τ), z(τ), λ (τ), γ(τ)) : τ > 0}. (10.7)
As τ → 0, the moving point (x(τ), y(τ), z(τ), λ (τ), γ(τ)) on the central path represented by (10.6) approaches the solution of (QP) represented by (9.1). Throughout the rest of this chapter, the following assumption is made. Assumption: 1. F o is not empty. Assumption 1 implies the existence of a central path. This assumption is always true for the convex quadratic programming problem with box constraints. An explicit initial interior point will be provided later in this chapter. Let 1 > θ > 0, denote p = (y, z), ω = (λ , γ), and the duality measure µ=
λ T y + γ T z pT ω = . 2n 2n
(10.8)
A set of neighborhood of the central path is defined as N2 (θ ) = {(x, y, z, λ , γ) ∈ F o : Ip ◦ ω − µeI ≤ θ µ} ⊂ F o .
(10.9)
As the duality measure is reduced to zero, the neighborhood of N2 (θ ) will be a neighborhood of the central path that approaches the optimizer(s) of the BQP problem, therefore, all points inside N2 (θ ) will approach the optimizer(s) of the BQP problem. For (x, y, z, λ , γ) ∈ N2 (θ ), since (1 − θ )µ ≤ ωi pi ≤ (1 + θ )µ, where ωi are either λi or γi , and pi are either yi or zi , it must have ωi pi maxi ωi pi mini ωi pi ωi pi ≤ ≤µ≤ ≤ . 1+θ 1+θ 1−θ 1−θ
(10.10)
An Arc-Search Algorithm for QP with Box Constraints
•
221
10.2 An Interior-Point Algorithm for Convex QP with Box Constraints The idea of the arc-search algorithm proposed in this section is very simple. The algorithm starts from a feasible point in N2 (θ ) close to the central path, constructs an arc that passes through the point and approximates the central path, searches along the arc to a new point in a larger area N2 (2θ ) that reduces the duality measure pT ω and meets (10.6a), (10.6b), and (10.6c). The process is repeated by finding a better point close to the central path or on the central path in N2 (θ ) that simultaneously meets (10.6a), (10.6b), and (10.6c) and reduces the duality measure. When the duality measure is reduced to zero, the optimal solution is found. Following the idea used in Chapter 9, an ellipse E in an appropriate dimen sional space introduced in Chapter 5 will be used to approximate the central path C described by (10.6), where E
= {(x(α), y(α), z(α), λ (α), γ(α)) : (x(α), y(α), z(α), λ (α), γ(α)) =ba cos(α) +bb sin(α) +bc}, (10.11)
ba ∈ R5n and bb ∈ R5n are the axes of the ellipse, bc ∈ R5n is the center of the ellipse. Given a point (x, y, z, λ , γ) = (x(α0 ), y(α0 ), z(α0 ), λ (α0 ), γ(α0 )) ∈ E which is close to or on the central path,ba, bb,bc are functions of α, (x, λ , γ, y, z), (x˙ , y˙ , z˙ , λ˙ , γ˙ ), and (x¨ , y¨ , z¨ , λ¨ , γ¨ ), where (x˙ , y˙ , z˙ , λ˙ , γ˙ ) and (x¨ , y¨ , z¨ , λ¨ , γ¨ ) are defined by ⎡ ⎤⎡ ⎤ ⎡ ⎤ x˙ H 0 0 I −I 0
⎢ I I 0 0 0 ⎥ ⎢ y˙ ⎥ ⎢ 0 ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ I 0 −I 0 0 ⎥ ⎢ z˙ ⎥ = ⎢ 0 ⎥ ,
(10.12)
⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎣ 0 Λ 0 Y 0 ⎦ ⎣ λ˙ ⎦ ⎣ λ ◦ y ⎦
0 0 Γ 0 Z γ ◦ z
γ˙ and
⎡
⎤⎡ ⎤ ⎡ ⎤ x¨ 0
H 0 0 I −I ⎢ I I 0 0 0 ⎥ ⎢ y¨ ⎥ ⎢ ⎥ 0
⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ I 0 −I 0 0 ⎥ ⎢ z¨ ⎥ = ⎢ ⎥ ,
0
⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎣ 0 Λ 0 Y 0 ⎦ ⎣ λ¨ ⎦ ⎣ −2λ˙ ◦ y
˙ ⎦
0 0 Γ 0 Z γ¨ −2γ˙ ◦ z˙
(10.13)
where Λ = diag(λ ), Γ = diag(γ), Y = diag(y), and Z = diag(z). The first rows of (10.12) and (10.13) are equivalent to Hx˙ = γ˙ − λ˙ ,
Hx¨ = γ¨ − λ¨ .
(10.14)
The next 2 rows of (10.12) and (10.13) are equivalent to x˙ = −y˙ ,
x˙ = z˙ ,
x¨ = −y¨ ,
x¨ = z¨ .
(10.15)
222
• Arc-Search Techniques for Interior-Point Methods
The last 2 rows of (10.12) and (10.13) are equivalent to p ◦ ω˙ + p˙ ◦ ω = p ◦ ω,
(10.16)
p ◦ ω¨ + p¨ ◦ ω = −2p˙ ◦ ω˙ ,
(10.17)
where ◦ denotes the Hadamard product. It has been shown in Chapter 5 that one can avoid the calculation ofba, bb, and bc in the expression of the ellipse. The following formulas are used instead. Theorem 10.1 Let the 5-tuple (x(α), y(α), z(α), λ (α), γ(α)) be an arc defined by (10.11) passing through a point (x, y, z, λ , γ) ∈ E, and its first and second derivatives at (x, y, z, λ , γ) be (x˙ , y˙ , z˙ , λ˙ , γ˙ ) and (x¨ , y¨ , z¨ , λ¨ , γ¨ ) which are defined by (10.12) and (10.13). Then, an ellipse approximation of the central path is given by x(α) = x − x˙ sin(α) + x¨ (1 − cos(α)),
(10.18)
y(α) = y − y˙ sin(α) + y¨ (1 − cos(α)),
(10.19)
z(α) = z − z˙ sin(α) + z¨ (1 − cos(α)), λ (α) = λ − λ˙ sin(α) + λ¨ (1 − cos(α)),
(10.20) (10.21)
γ(α) = γ − γ˙ sin(α) + γ¨ (1 − cos(α)).
(10.22)
Two compact representations for p(α) = (y(α), z(α)) and ω(α) = (λ (α), γ(α)) are given below: p(α) = p − p˙ sin(α) + p¨ (1 − cos(α)),
(10.23)
ω(α) = ω − ω˙ sin(α) + ω¨ (1 − cos(α)).
(10.24)
The duality measure at point (x(α), p(α), ω(α)) is defined as: µ(α) =
λ (α)T y(α) + γ(α)T z(α) p(α)T ω(α) = . 2n 2n ˙
(10.25) ¨
Assuming (y, z, λ , γ) > 0, one can easily see that if yy˙ , zz˙ , λλ , γγ˙ , yy¨ , zz¨ , λλ , γγ¨ are bounded (this will be shown to be true), and if α is small enough, then, y(α) > 0, z(α) > 0, λ (α) > 0, and γ(α) > 0. It will also be shown that searching along this ellipse will reduce the duality measure, i.e., µ(α) < µ. Lemma 10.1 Let (x, y, z, λ , γ) be a strictly feasible point of (BQP), (x˙ , y˙ , z˙ , λ˙ , γ˙ ) and (x¨ , y¨ , z¨ , λ¨ , γ¨ ) meet (10.12) and (10.13), (x(α), y(α), z(α), λ (α), γ(α)) be calculated using (10.18), (10.19), (10.20), (10.21), and (10.22), then the following conditions hold. x(α) + y(α) = e, x(α) − z(α) = −e, Hx(α) + c + λ (α) − γ(α) = 0.
An Arc-Search Algorithm for QP with Box Constraints
•
223
Proof 10.1 Since (x, y, z, λ , γ) is a strictly feasible point, the result follows from direct calculation by using (10.5), (10.12), (10.13), and Theorem 10.1. Lemma 10.2 Let (x˙ , p˙ , ω˙ ) be defined by (10.12), (x¨ , p¨ , ω¨ ) be defined by (10.13), and H be positive semidefinite matrix. Then, the following relations hold: p˙ T ω˙ = x˙ T (γ˙ − λ˙ ) = x˙ T Hx˙ ≥ 0,
(10.26)
p¨ T ω¨ = x¨ T (γ¨ − λ¨ ) = x¨ T Hx¨ ≥ 0,
(10.27)
p¨ T ω˙ = x¨ T (γ˙ − λ˙ ) = x˙ T (γ¨ − λ¨ ) = p˙ T ω¨ = x˙ T Hx¨ ;
(10.28)
−(x˙ T Hx˙ )(1 − cos(α))2 − (x¨ T Hx¨ ) sin2 (α) ≤ (x¨ T (γ˙ − λ˙ ) + x˙ T (γ¨ − λ¨ )) sin(α)(1 − cos(α)) ≤ (x˙ T Hx˙ )(1 − cos(α))2 + (x¨ T Hx¨ ) sin2 (α);
(10.29)
and −(x˙ T Hx˙ ) sin2 (α) − (x¨ T Hx¨ )(1 − cos(α))2 ≤ (x¨ T (γ˙ − λ˙ ) + x˙ T (γ¨ − λ¨ )) sin(α)(1 − cos(α)) ≤ (x˙ T Hx˙ ) sin2 (α) + (x¨ T Hx¨ )(1 − cos(α))2 .
(10.30)
For α = π2 , (10.29) and (10.30) reduce to − x˙ T Hx˙ + x¨ T Hx¨ ≤ (x¨ T Hx˙ + x˙ T Hx¨ ) ≤ x˙ T Hx˙ + x¨ T Hx¨ . Proof 10.2
From (10.15), we have x˙ T (γ˙ − λ˙ ) = z˙ T γ˙ + y˙ T λ˙ = p˙ T ω˙ , x¨ T (γ¨ − λ¨ ) = z¨ T γ¨ + y¨ T λ¨ = p¨ T ω¨ , x¨ T (γ˙ − λ˙ ) = p¨ T ω˙ ,
and x˙ T (γ¨ − λ¨ ) = p˙ T ω¨ .
Pre-multiplying x˙ T and x¨ T to (10.14) gives
x˙ T (γ˙ − λ˙ ) = x˙ T Hx˙ , x¨ T (γ¨ − λ¨ ) = x¨ T Hx¨ ,
x¨ T (γ˙ − λ˙ ) = x¨ T Hx˙ = x˙ T Hx¨ = x˙ T (γ¨ − λ¨ ).
(10.31)
224
• Arc-Search Techniques for Interior-Point Methods
Equations (10.26) and (10.27) follow from the first two equations and the fact that H is positive semidefinite. The last equation is equivalent to (10.28). Using (10.26), (10.27), and (10.28) gives (x˙ (1 − cos(α)) + x¨ sin(α ))T H(x˙ (1 − cos(α)) + x¨ sin(α)) = (x˙ T Hx˙ )(1 − cos(α ))2 + 2(x˙ T Hx¨ ) sin(α)(1 − cos(α)) + (x¨ T Hx¨ ) sin2 (α) = (x˙ T Hx˙ )(1 − cos(α))2 + (x¨ T Hx¨ ) sin2 (α) + (x¨ T (γ˙ − λ˙ ) + x˙ T (γ¨ − λ¨ )) sin(α)(1 − cos(α)) ≥ 0, which is the first inequality of (10.29). Using (10.26), (10.27), and (10.28) also gives (x˙ (1 − cos(α)) − x¨ sin(α ))T H(x˙ (1 − cos(α)) − x¨ sin(α)) = (x˙ T Hx˙ )(1 − cos(α ))2 − 2(x˙ T Hx¨ ) sin(α)(1 − cos(α)) + (x¨ T Hx¨ ) sin2 (α) = (x˙ T Hx˙ )(1 − cos(α ))2 + (x¨ T Hx¨ ) sin2 (α) −(x¨ T (γ˙ − λ˙ ) + x˙ T (γ¨ − λ¨ )) sin(α)(1 − cos(α)) ≥ 0, which is the second inequality of (10.29). Replacing x˙ (1 − cos(α)) and x¨ sin(α) by x˙ sin(α) and x¨ (1 − cos(α)), and using the same method, one can obtain equation (10.30).
From Lemmas 10.2, 5.3, and 9.3, it can be shown that λ˙ γ˙ λ,γ
, pp¨ := yy¨ , zz¨ , and ing two Lemmas.
ω¨ ω
:=
λ¨ γ¨ λ,γ
p˙ p
:=
y˙ z˙ y, z
,
ω˙ ω
:=
are all bounded as claimed in the follow
Lemma 10.3 Let (x, p, ω) = (x, y, z, λ , γ) ∈ N2 (θ ) and (x˙ , p˙ , ω˙ ) = (x˙ , y˙ , z˙ , λ˙ , γ˙ ) meet equation (10.12). Then, � p˙ �2 � ω˙ �2 2n � � � � , (10.32) � � +� � ≤ p 1−θ ω � p˙ �2 � ω˙ �2 � n �2 � � � � , (10.33) � � � � ≤ p 1−θ ω 0≤ Proof 10.3
p˙ T ω˙ 1+θ ≤ n := δ1 n. µ 1−θ
(10.34)
From the last two rows of (10.12), or equivalently (10.16), it must have Λy˙ + Yλ˙ = ΛYe, Γz˙ + Zγ˙ = ΓZe. 1
1
Pre-multiplying Y− 2 Λ− 2 on both sides of the first equality gives 1 1 1 1 1 1 Y− 2 Λ 2 y˙ + Y 2 Λ− 2 λ˙ = Y 2 Λ 2 e.
An Arc-Search Algorithm for QP with Box Constraints 1
•
225
1
Pre-multiplying Z− 2 Γ− 2 on both sides of the second equality gives 1
1
1
1
1
1
Z− 2 Γ 2 z˙ + Z 2 Γ− 2 γ˙ = Z 2 Γ 2 e. (10.35) � � � � � � 1 1 1 1 1 1 Y 2 Λ− 2 λ˙ Y2 Λ2 e Y− 2 Λ 2 y˙ Let u = ,v= , and w = , using (10.15) 1 1 1 1 1 1 Z− 2 Γ 2 z˙ Z 2 Γ− 2 γ˙ Z2 Γ2 e and Lemma 10.2 yields uT v = y˙ T λ˙ + z˙ T γ˙ = x˙ T (γ˙ − λ˙ ) ≥ 0. Using Lemma 9.3 and (10.8) yields � � � � n � 2 n 2γ ˙ 2 yi γ˙2 zi � y ˙ λ z ˙ λ i i i i IuI2 + IvI2 = + i + + i yi zi γi λi i=1
≤
n � i=1
i=1
(yi λi + zi γi ) =
2n �
pi ωi = 2nµ.
(10.36)
i=1
Since pi > 0 and ωi > 0, dividing both sides of the inequality by mini (pi ωi ) and using (10.10) gives � � � � n � 2 n � p˙ �2 � ω˙ �2 � y˙i z˙i2 γ˙i2 λ˙ i2 2nµ 2n � � � � + 2 + + 2 = � � +� � ≤ ≤ . 2 2 p ω min (p ω ) 1 −θ y z γ λ i i i i i i i i=1 i=1 (10.37) This proves (10.32). Combining (10.32) and Lemma 5.3 yields � p˙ �2 � ω˙ �2 � n �2 � � � � . � � � � ≤ p (1 − θ ) ω This leads to
� p˙ �� ω˙ � n � �� � . � �� � ≤ p ω (1 − θ )
(10.38)
Therefore, using (10.10) and Cauchy-Schwarz inequality yields � �T � � p˙ T ω˙ |p˙ |T |ω˙ | |p˙ |T |ω˙ | |ω˙ | |p˙ | ≤ ≤ (1 + θ ) ≤ (1 + θ ) µ maxi (pi ωi ) µ ω p � p˙ �� ω˙ � 1 + θ � �� � ≤ (1 + θ )� �� � ≤ n, (10.39) p ω 1−θ which is the second inequality of (10.34). From Lemma 10.2, p˙ T ω˙ = x˙ T (γ˙ − λ˙ ) = x˙ T Hx˙ ≥ 0, the first inequality of (10.34) follows. Lemma 10.4 Let (x, p, ω) = (x, y, z, λ , γ) ∈ N2 (θ ), (x˙ , y˙ , z˙ , λ˙ , γ˙ ) and (x¨ , y¨ , z¨ , λ¨ , γ¨ ) meet equations (10.12) and (10.13). Then � p¨ �2 � ω¨ �2 4(1 + θ )n2 � � � � , � � +� � ≤ p ω (1 − θ )3
(10.40)
226
• Arc-Search Techniques for Interior-Point Methods
� p¨ �2 � ω¨ �2 � 2(1 + θ )n2 �2 � � � � , � � � � ≤ p (1 − θ )3 ω
(10.41)
p¨ T ω¨ 2(1 + θ )2 2 ≤ n := δ2 n2 , µ (1 − θ )3 � p˙ T ω¨ � (2n(1 + θ )) 32 � p¨ T ω˙ � (2n(1 + θ )) 32 3 3 � � � � 2 := δ3 n , � := δ3 n 2 . � �≤ �≤ (1 − θ )2 µ µ (1 − θ )2 0≤
Proof 10.4 have
(10.42) (10.43)
Similar to the proof of Lemma 10.3, from (10.13) and (10.17), it must Λy¨ + Yλ¨ = −2 y˙ ◦ λ˙ ⇐⇒
1 1 1 1 1 1 Y− 2 Λ 2 y¨ + Y 2 Λ− 2 λ¨ = −2Y− 2 Λ− 2 y˙ ◦ λ˙ ,
and Γz¨ + Zγ¨ = −2 (z˙ ◦ γ˙ ) 1
1
1
1
1
1
⇐⇒ Z− 2 Γ 2 z¨ + Z 2 Γ− 2 γ¨ = −2Z− 2 Γ− 2 (z˙ ◦ γ˙ ) . � � � 1 � � � 1 1 1 1 1 −2Y− 2 Λ− 2 y˙ ◦ λ˙ Y 2 Λ− 2 λ¨ Y− 2 Λ 2 y¨ Let u = , v= , and w = , 1 1 1 1 1 1 Z− 2 Γ 2 z¨ Z 2 Γ− 2 γ¨ −2Z− 2 Γ− 2 (z˙ ◦ γ˙ ) using (10.15) and Lemma 10.2 yields uT v = y¨ T λ¨ + z¨ T γ¨ = x¨ T (γ¨ − λ¨ ) ≥ 0. Using Lemma 9.3 yields � � � � n � 2 n � y¨i λi z¨2i γi λ¨ i2 yi γ¨i2 zi 2 2 IuI + IvI = + + + yi zi γi λi i=1 i=1 � � � �2 1 1 1 1 � �2 � � ≤ �−2Y− 2 Λ− 2 y˙ ◦ λ˙ � + �−2Z− 2 Γ− 2 (z˙ ◦ γ˙)� � � n � y˙2i λ˙ i2 z˙2i γ˙i2 = 4 + . yi λi z i γi i=1
Dividing both sides of the inequality by µ and using (10.10), we have � n �
� �� n � y¨2 z¨2 � � λ¨ i2 γ¨i2 i i (1 − θ ) + 2 + + y2i zi λi2 γi2 i=1 i=1 �� � � � � � p¨ �2 � ω¨ �2 = (1 − θ ) � � + � � p ω � n � �� � y˙2 λ˙ 2 z˙2 γ˙2 i i ≤ 4(1 + θ ) + 2i i2 , 2 λ2 y zi γi i i i=1 in view of Lemma 10.3, this leads to � p¨ �2 � ω¨ �2 � � � � 1+θ � 1+θ � � � � � � p˙ ω˙ �2 � p˙ �2 � ω˙ �2 4(1 + θ )n2 . (10.44) � � +� � ≤ 4 � ◦ � ≤4 � � � � ≤ p 1−θ p ω ω 1−θ p ω (1 − θ )3
An Arc-Search Algorithm for QP with Box Constraints
•
227
This proves (10.40). Combining (10.40) and Lemma 5.3 yields � p¨ �2 � ω¨ �2 � 2(1 + θ )n2 �2 � � � � . � � � � ≤ p (1 − θ )3 ω Using (10.10) and Cauchy-Schwarz inequality yields p¨ T ω¨ µ
≤ ≤
� �T � � ¨ |p| |p¨ |T |ω¨ | |p¨ |T |ω¨ | |ω¨ | ≤ (1 + θ ) ≤ (1 + θ ) µ maxi (pi ωi ) p ω � p¨ �� ω¨ � 2n2 (1 + θ )2 � �� � (1 + θ )� �� � ≤ , (1 − θ )3 p ω
which is the second inequality of (10.42). Using (10.15) and Lemma 10.2, one must have p¨ T ω¨ = y¨ T λ¨ + z¨ T γ¨ = x¨ T (γ¨ − λ¨ ) = x¨ T Hx¨ ≥ 0. This proves the first inequality of (10.42). Finally, using (10.10), Cauchy-Schwarz inequality, (10.32), and (10.40) yields � T � � �T � � �p˙ ω¨ � |p˙ |T |ω¨ | |p˙ |T |ω¨ | |p˙ | |ω¨ | ≤ (1 + θ ) ≤ ≤ (1 + θ ) µ maxi (pi ωi ) µ p ω 1 1 � � � � 3 � p˙ �� ω¨ � 2 2n 4(1 + θ )n2 2 (2n(1 + θ )) 2 � �� � ≤ (1 + θ )� �� � ≤ (1 + θ ) ≤ . p ω 1−θ (1 − θ )2 (1 − θ )3 This proves the first inequality of (10.43). Replacing p˙ by p¨ and ω¨ by ω˙ , then using the same reasoning, one can prove the second inequality of (10.43).
From the bounds established in Lemmas 10.2, 10.3, 10.4, and 5.4, the lower bound and upper bound for µ(α) can be obtained. Lemma 10.5 Let (x, p, ω) = (x, y, z, λ , γ) ∈ N2 (θ ), (x˙ , y˙ , z˙ , λ˙ , γ˙ ) and (x¨ , y¨ , z¨ , λ¨ , γ¨ ) meet equations (10.12) and (10.13). Let x(α), y(α), z(α), λ (α), and γ(α) be defined by (10.18), (10.19), (10.20), (10.21), and (10.22). Then, 1 T x˙ Hx˙ (1 − cos(α))2 + sin2 (α) 2n 1 ≤µ(α) = µ(1 − sin(α)) + x¨ T (γ¨ − λ¨ ) − x˙ T (γ˙ − λ˙ ) (1 − cos(α))2 2n 1 − x˙ T (γ¨ − λ¨ ) + x¨ T (γ˙ − λ˙ ) sin(α)(1 − cos(α)) 2n 1 ≤µ(1 − sin(α)) + x¨ T Hx¨ (1 − cos(α))2 + sin2 (α) . (10.45) 2n µ(1 − sin(α)) −
Proof 10.5
Using (10.19), (10.21), (10.16), and (10.17), one must have
yT (α)λ (α)
228
• Arc-Search Techniques for Interior-Point Methods
= yT − y˙ T sin(α) + y¨ T (1 − cos(α))
λ − λ˙ sin(α) + λ¨ (1 − cos(α))
=yT λ − yT λ˙ sin(α) + yT λ¨ (1 − cos(α))
− y˙ T λ sin(α) + y˙ T λ˙ sin2 (α) − y˙ T λ¨ sin(α)(1 − cos(α))
+ y¨ T λ (1 − cos(α)) − y¨ T λ˙ sin(α)(1 − cos(α)) + y¨ T λ¨ (1 − cos(α))2 ˙ sin(α) + (yT λ¨ + λ T y)(1 ¨ − cos(α))
=yT λ − (yT λ˙ + λ T y)
T
− (y˙ T λ¨ + λ˙ y¨ ) sin(α)(1 − cos(α)) + y˙ T λ˙ sin2 (α) + y¨ T λ¨ (1 − cos(α))2
=yT λ (1 − sin(α)) − 2y˙ T λ˙ (1 − cos(α))
T
− (y˙ T λ¨ + λ˙ y¨ ) sin(α)(1 − cos(α))
+ y˙ T λ˙ (1 − cos2 (α)) + y¨ T λ¨ (1 − cos(α))2
=yT λ (1 − sin(α)) + (y¨ T λ¨ − y˙ T λ˙ )(1 − cos(α))2
T
− (y˙ T λ¨ + λ˙ y¨ ) sin(α)(1 − cos(α)).
(10.46)
Using (10.20), (10.22), (10.16), (10.17), and a similar derivation of (10.46), one gets zT (α)γ(α) = zT γ(1 − sin(α)) + (z¨ T γ¨ − z˙ T γ˙ )(1 − cos(α))2 −(z˙ T γ¨ + γ˙ T z¨ ) sin(α)(1 − cos(α)).
(10.47)
Combining (10.46) and (10.47), then using (10.15) and (10.29) yields 2nµ(α) = pT (α)ω(α)
=yT (α)λ (α) + zT (α)γ(α)
=(yT λ + zT γ)(1 − sin(α)) + (y¨ T λ¨ + z¨ T γ¨ − y˙ T λ˙ − z˙ T γ˙ )(1 − cos(α))2
− (y˙ T λ¨ + z˙ T γ¨ + y¨ T λ˙ + z¨ T γ˙ ) sin(α)(1 − cos(α)) =(yT λ + zT γ)(1 − sin(α)) + (x¨ T (γ¨ − λ¨ ) − x˙ T (γ˙ − λ˙ ))(1 − cos(α))2 − (x˙ T (γ¨ − λ¨ ) + x¨ T (γ˙ − λ˙ )) sin(α)(1 − cos(α)) T
T
T
T
(10.48)
2
≤(y λ + z γ) (1 − sin(α)) + (x¨ Hx¨ − x˙ Hx˙ )(1 − cos(α)) + x˙ T Hx˙ (1 − cos(α))2 + x¨ T Hx¨ sin2 (α)
=(yT λ + zT γ) (1 − sin(α)) + x¨ T Hx¨ (1 − cos(α))2 + x¨ T Hx¨ sin2 (α).
Dividing both sides by 2n proves the second inequality of the lemma. Combining (10.48) and (10.30) proves the first inequality of the lemma.
To keep all the iterates of the algorithm inside the strictly feasible set, (p(α), ω(α)) > 0 for all iterations is required. This is guaranteed when µ(α) > 0 holds. The following corollary states the condition for µ(α) > 0 to hold.
An Arc-Search Algorithm for QP with Box Constraints
•
229
Corollary 10.1 If µ > 0, then, for any fixed θ ∈ (0, 1), there is an α¯ > 0 depending on θ , such that for any sin(α) ≤ sin(α¯ ), µ(α) > 0. In particular, if θ = 0.19, sin(α¯ ) ≥ 0.6158. Proof 10.6 Using the formulas x˙ T Hx˙ T = x˙ T (γ˙ − λ˙ ) = p˙ T ω˙ and ((1 − cos(α))2 ≤ sin4 (α) presented in Lemmas 10.2 and 5.4, we can rewrite the first inequality of (10.45) as � � 1 T 4 2 µ(α) ≥ µ 1 − sin(α) − p˙ ω˙ sin (α) + sin (α) 2nµ � � (1 + θ ) 4 2 := µr(α) ≥ µ 1 − sin(α) − sin (α) + sin (α) 2(1 − θ ) where (10.34) is used for the second inequality. Since µ > 0, and r(α) is a mono tonic decreasing function in [0, π2 ] with r(0) > 0 and r( π2 ) < 0, there is a unique real solution sin(α¯ ) ∈ (0, 1) of r(α) = 0 such that for all sin(α) < sin(α¯ ), r(α) > 0 , or µ(α) > 0. It is easy to check that if θ = 0.19, sin(α¯ ) = 0.6158 is the solution of r(α) = 0. Remark 10.1 Corollary 10.1 indicates that for any θ ∈ (0, 1), there is a positive α¯ such that for α ≤ α¯ , µ(α) > 0. Intuitively, to search in a wider region will generate a longer step. Therefore, the larger the θ is, the better. To derive the convergence result, θ ≤ 0.148 is imposed in Lemma 9.12 and θ ≤ 0.19 is imposed in Lemma 10.13. This is an indication that an algorithm specifically derived for convex QP with box constraint will be more efficient than the general algorithm for convex QP proposed in Chapter 9.
To reduce the duality measure in an iteration, it must have µ(α) ≤ µ. For linear programming, it has been shown in Chapter 5 that µ(α) ≤ µ for α ∈ [0, αˆ ] with αˆ = π2 , and the larger the α in the interval is, the smaller the µ(α) will be. This claim is not true for the convex quadratic programming with box constraints and it needs to be modified as follows. Lemma 10.6 Let (x, p, ω) = (x, y, z, λ , γ) ∈ N2 (θ ), (x˙ , y˙ , z˙ , λ˙ , γ˙ ) and (x¨ , y¨ , z¨ , λ¨ , γ¨ ) meet equations (10.12) and (10.13). Let x(α), y(α), z(α), λ (α), and γ(α) be defined by (10.18), (10.19), (10.20), (10.21), and (10.22). Then, there exists ⎧ 1 π x¨ T Hx¨ ⎪ ⎨ 2 , if nµ ≤ 2 αˆ = (10.49) ⎪ ⎩ −1 x¨ T Hx¨ 1 sin (g), if nµ > 2
230
• Arc-Search Techniques for Interior-Point Methods
where � � � � 3 nµ nµ � g= + T T x¨ Hx¨ x¨ Hx¨
� � � �3 � � 2 3 1 nµ nµ � + + − T T 3 x¨ Hx¨ x¨ Hx¨
2
+
� �3 1 , 3
such that for every α ∈ [0, αˆ ], µ(α) ≤ µ. Proof 10.7
From the second inequality of (10.45), we have � � x¨ T Hx¨ 3 x¨ T Hx¨ µ(α) − µ ≤ µ sin(α) −1 + sin(α) + sin (α) . 2nµ 2nµ
Clearly, if
x¨ T Hx¨ 2nµ
(10.50)
≤ 12 , for any α ∈ [0, π2 ], the function � � x¨ T Hx¨ 3 x¨ T Hx¨ sin(α) + sin (α) ≤ 0, f (α) := −1 + 2nµ 2nµ
(10.51)
T
Hx¨ and µ(α) ≤ µ. If x¨2nµ > 21 , noticing that f (α) is a monotonic increasing function of α with f (0) < 0 and f ( π2 ) > 0, f (α) has a unique positive root in (0, π2 ). The solution is given by Cardano’s formula as follows. � � � � � � �3 � � �3 � � 2 3 3 nµ nµ 1 nµ nµ 2 1 � � ˆ sin(α ) = + + + − + . x¨ T Hx¨ x¨ T Hx¨ 3 x¨ T Hx¨ x¨ T Hx¨ 3 (10.52) This proves the Lemma.
According to Theorem 10.1, Lemmas 10.1, 10.3, 10.4, and 10.6, if α is small enough, then (p(α), ω(α)) > 0, and µ(α) < µ, i.e., the search along the ellipse defined by Theorem 10.1 will generate a strictly feasible point with a smaller duality measure. Since (p, ω) > 0 holds in all iterations, reducing the duality measure to zero means approaching the solution of the convex quadratic pro gramming. This can be achieved by applying a similar idea to the one that was used in Chapters 3 and 5, i.e., starting with an iterate in N2 (θ ), searching along the approximated central path to reduce the duality measure and to keep the iter ate in N2 (2θ ), and then making a correction to move the iterate back to N2 (θ ). The following notations will be used. a0 = −θ µ < 0, a1 = θ µ > 0, a2 = 2θ
p˙ T ω˙ x˙ T (γ˙ − λ˙ ) x˙ T Hx˙ = 2θ = 2θ ≥ 0, 2n 2n 2n
An Arc-Search Algorithm for QP with Box Constraints
a3
•
231
� � 1 � � = � p˙ ◦ ω¨ + ω˙ ◦ p¨ − (p˙ T ω¨ + ω˙ T p¨ )e� ≥ 0,
2n
and
a4
� � p˙ T ω˙ 1 � � = � p¨ ◦ ω¨ − ω˙ ◦ p˙ − (p¨ T ω¨ − ω˙ T p˙ )e� + 2θ
2n
2n
� � T ˙ ˙ x 1 T Hx � � = � p¨ ◦ ω¨ − ω˙ ◦ p˙ − (p¨ ω¨ − ω˙ T p˙ )e� + 2θ
≥ 0.
2n
2n
Denote a quartic polynomial in terms of sin(α) as follows: q(α) = a4 sin4 (α) + a3 sin3 (α) + a2 sin2 (α) + a1 sin(α) + a0 = 0.
(10.53)
Since q(α) is a monotonic increasing function of α ∈ [0, π2 ], q(0) = −θ µ < 0 and q( π2 ) = a2 + a3 + a4 ≥ 0, the polynomial has exactly one positive root in [0, π2 ]. Moreover, since (10.53) is a quartic equation, all the solutions are analytical and the computational cost is independent of m and n, and negligible [113]. Lemma 10.7 Let (x, y, z, λ , ω) ∈ N2 (θ ), (x˙ , y˙ , z˙ , λ˙ , ω˙ ) and (x¨ , y¨ , z¨ , λ¨ , ω¨ ) be calculated from (10.12) and (10.13). Denote by sin(α˜ ) ∈ [0, 1] the only positive real solution of (10.53). Assume sin(α) < min{sin(α˜ ), sin(α¯ )}, let (x(α), y(α), z(α), λ (α), γ(α)) and µ(α) be updated as follows: (x(α), y(α), z(α), λ (α), γ(α)) = (x, y, z, λ , γ) − (x˙ , y˙ , z˙ , λ˙ , γ˙ ) sin(α) + (x¨ , y¨ , z¨ , λ¨ , γ¨ )(1 − cos(α)),(10.54) µ(α) = +
µ(1 − sin(α)) 1 (p¨ T ω¨ − p˙ T ω˙ )(1 − cos(α))2 − (p˙ T ω¨ + p¨ T ω˙ ) sin(α)(1 − cos(α)) . 2n (10.55)
Then, (x(α), y(α), z(α), λ (α), γ(α)) ∈ N2 (2θ ). Proof 10.8 Since sin(α˜ ) is the only positive real solution of (10.53) in [0, 1] and q(0) < 0, substituting a0 , a1 , a2 , a3 and a4 into (10.53) yields, for all sin(α) ≤ sin(α˜ ), �� �� 1 T � � T ¨ ˙ ¨ ˙ ˙ ¨ ˙ ¨ p ◦ ω − ω ◦ p − (p ω − ω p )e � � sin4 (α) 2n �� �� 1 � � + �p˙ ◦ ω¨ + ω˙ ◦ p¨ − (p˙ T ω¨ + ω˙ T p¨ )e� sin3 (α) 2n � � � � p˙ T ω˙ p˙ T ω˙ 4 ≤ − 2θ sin2 (α) + θ µ(1 − sin(α)). (10.56) sin (α) − 2θ 2n 2n
232
• Arc-Search Techniques for Interior-Point Methods
Using (10.23), (10.24), (10.16), (10.17), (10.55), Lemma 5.4, (10.56), and the first inequality of (10.45) yields � � � � �p(α) ◦ ω(α) − µ(α)e� � � � � =� p − p˙ sin(α) + p¨ (1 − cos(α)) ◦ ω − ω˙ sin(α) + ω¨ (1 − cos(α)) − µ(α)e� � � � 1 � =�(p ◦ ω − µe)(1 − sin(α)) + p¨ ◦ ω¨ − p˙ ◦ ω˙ − (p¨ T ω¨ − p˙ T ω˙ )e (1 − cos(α))2 2n � � � 1 T � − p˙ ◦ ω¨ + ω˙ ◦ p¨ − (p˙ ω¨ + p¨ T ω˙ )e sin(α)(1 − cos(α))� 2n � � � � 1 � � � � ≤(1 − sin(α))�p ◦ ω − µe� + �(p¨ ◦ ω¨ − p˙ ◦ ω˙ − (p¨ T ω¨ − p˙ T ω˙ ))e�(1 − cos(α))2 2n � � 1 � � + �(p˙ ◦ ω¨ + ω˙ ◦ p¨ − (p˙ T ω¨ + p¨ T ω˙ )e� sin(α)(1 − cos(α)) (10.57) 2n � � 1 � � ≤θ µ(1 − sin(α)) + �(p¨ ◦ ω¨ − p˙ ◦ ω˙ − (p¨ T ω¨ − p˙ T ω˙ ))e� sin4 (α) + a3 sin3 (α) 2n p˙ T ω˙ ≤2θ µ(1 − sin(α)) − 2θ (sin4 (α) + sin2 (α)) 2n � � x˙ T Hx˙ ≤2θ µ(1 − sin(α)) − (1 − cos(α))2 + sin2 (α) 2n ≤2θ µ(α).
(10.58)
Hence, the point (x(α), p(α), ω(α)) satisfies the proximity condition for N2 (2θ ). To check the positivity condition (p(α), ω(α)) > 0, in view of the initial condition (p, ω) > 0, it follows from (10.58) and Corollary 10.1 that, for sin(α) < sin(α¯ ) and θ < 0.5, (10.59) pi (α)ωi (α) ≥ (1 − 2θ )µ(α) > 0. Therefore, it cannot have pi (α) = 0 or ωi (α) = 0 for any index i when α ∈ [0, sin−1 (α¯ )). This proves (p(α), ω(α)) > 0. Remark 10.2 It is worthwhile to note, by examining the proof of Lemma 10.7, that sin(α˜ ) is selected for the proximity condition (10.58) to hold, and sin(α¯ ) is selected for µ(α) > 0, thereby insuring that the positivity condition (10.59) holds.
The lower bound of sin(α¯ ) is estimated in Corollary 10.1. To estimate the lower bound of sin(α˜ ), the following lemma is needed. Lemma 10.8 Let (x, p, ω) ∈ N2 (θ ), (x˙ , p˙ , ω˙ ) and (x¨ , p¨ , ω¨ ) meet equations (10.12) and (10.13). Then, � � (1 + θ ) � � nµ, (10.60) �p˙ ◦ ω˙ � ≤ (1 − θ )
An Arc-Search Algorithm for QP with Box Constraints
� � 2(1 + θ )2 � � n2 µ, �p¨ ◦ ω¨ � ≤ (1 − θ )3 � � 2√2(1 + θ ) 32 3 � � n 2 µ, �p¨ ◦ ω˙ � ≤ (1 − θ )2 � 2√2(1 + θ ) 32 3 � � � n 2 µ. �p˙ ◦ ω¨ � ≤ (1 − θ )2 Proof 10.9
•
233
(10.61) (10.62) (10.63)
Since �2 2n � �2 2n � � ω˙ �2 � � p˙ �2 � p˙i ω˙ i � � � � , � � = , � � = p pi ωi ω i=1
i=1
from Lemma 10.3 and (10.10), it must have �2 n 1−θ � 2n � � � � 2n � � � � p˙ �2 � ω˙ �2 � p˙i 2 � ω˙ i 2 � � � � ≥ � � � � = pi ωi p ω i=1 i=1 �2 � 2n � � � p˙i ω˙ i � p˙ ω˙ �2 ≥ =� ◦ � p ω pi ωi i=1 �2 2n � � �2 � 1 p˙i ω˙ i � � ˙ ˙ ≥ = p ◦ ω � � , (1 + θ )µ (1 + θ )2 µ 2 �
i=1
i.e.,
This proves (10.60). Using
� �2 � 1 + θ �2 � � nµ . �p˙ ◦ ω˙ � ≤ 1−θ �2 2n � 2n � �2 � p¨ �2 � � ω¨ �2 � ω¨ i p¨i � � � � , , � � = � � = p ωi pi ω i=1
i=1
and Lemma 10.4, then following the same procedure, it is easy to verify (10.61). From (10.32) and (10.40), one obtains � �� � �� � � � � �� � � � � 2n 4(1 + θ )n2 � p˙ �2 � ω˙ �2 � p¨ �2 � ω¨ �2 ≥ � � +� � � � +� � 3 (1 − θ ) (1 − θ ) p ω p ω � p¨ �2 � ω˙ �2 � p˙ �2 � ω¨ �2 � � � � � � � � ≥ � � � � +� � � � p p ω ω � 2n � � � � 2n � � � � 2n � � � � 2n � � � 2 � p¨i � ω˙ i 2 � p˙i 2 � ω¨ i 2 = + pi pi ωi ωi i=1
i=1
i=1
i=1
234
• Arc-Search Techniques for Interior-Point Methods
≥ ≥ =
�2 2n � � p¨i ω˙ i i=1 2n � �
pi ωi
+
�2 2n � � p˙i ω¨ i i=1
pi ωi 2n � �
�2 �2 p¨i ω˙ i p˙i ω¨ i + (1 + θ )µ (1 + θ )µ i=1 i=1 �� � � �2 � 1 � �2 � � ˙ ¨ ¨ ˙ p ◦ ω p ◦ ω + � � � � , (1 + θ )2 µ 2 (10.64)
i.e., � �2 � �2 (2n)3 (1 + θ )3 � � � � µ 2. �p¨ ◦ ω˙ � + �p˙ ◦ ω¨ � ≤ (1 − θ )4
This proves the lemma.
Lemma 10.7 established a relation between the solution of (10.53) and the proximity condition of (x(α), p(α), ω(α)) ∈ N2 (2θ ). By estimating the solu tion of (10.53), we will find an estimation of sin(α˜ ) such that, for all α ≤ α˜ , the proximity condition of (x(α), p(α), ω(α)) ∈ N2 (2θ ) holds. The following lemma gives the estimation of sin(α˜ ). Lemma 10.9 Let θ ≤ 0.22. Then, sin(α˜ ) ≥
θ √
n
.
Proof 10.10 First notice from (10.53) that q(sin(α)) is a monotonic increasing function of sin(α) for α ∈ [0, π2 ]. Also, q(sin(0)) < 0 and q(sin( π2 )) ≥ 0 hold, there fore, one needs only to show that q( √θn ) < 0 for θ ≤ 0.22. Using Lemma 9.11 yields � � � � � � 1 � � � � � � �p˙ ◦ ω¨ + ω˙ ◦ p¨ − (p˙ T ω¨ + ω˙ T p¨ )e� ≤ �p˙ ◦ ω¨ � + �ω˙ ◦ p¨ �, 2n � � � � � � 1 � � � � � � �p¨ ◦ ω¨ − ω˙ ◦ p˙ − (p¨ T ω¨ − ω˙ T p˙ )e� ≤ �p¨ ◦ ω¨ � + �ω˙ ◦ p˙ �. 2n Substituting these relations into (10.53) and using Lemmas 10.8, 10.3, and 10.4, we have, for α ∈ [0, π2 ], �� � � � � p˙ T ω˙ � � � � q(sin(α)) ≤ �p¨ ◦ ω¨ � + �ω˙ ◦ p˙ � + 2θ sin4 (α) 2n � � � � � � � � + �p˙ ◦ ω¨ � + �ω˙ ◦ p¨ � sin3 (α) p˙ T ω˙ + 2θ sin2 (α) + θ µ sin(α) − θ µ 2n �� � 2(1 + θ )2 2 n(1 + θ ) θ (1 + θ ) ≤µ n + + sin4 (α) (1 − θ )3 (1 − θ ) (1 − θ )
An Arc-Search Algorithm for QP with Box Constraints
•
235
√ (1 + θ ) 32 3 3 +4 2 n 2 sin (α) (1 − θ )2 � θ (1 + θ ) 2 + sin (α) + θ sin(α) − θ . (1 − θ ) Since n ≥ 1 and θ > 0, substituting sin(α) = θ q √ ≤µ n
�
θ √
n
gives
2(1 + θ )2 2 n(1 + θ ) θ (1 + θ ) + n + (1 − θ ) (1 − θ ) (1 − θ )3
�
θ4 n2
√ (1 + θ ) 32 n 32 θ 3 θ (1 + θ ) θ 2 θ +4 2 +θ √ −θ 3 + 2 (1 − θ ) n 2 (1 − θ ) n n 3 2 3 4 θ (1 + θ ) θ (1 + θ ) 2θ (1 + θ ) =θ µ + + (1 − θ )3 n(1 − θ ) (1 − θ )n2 √ 2 3 4 2θ (1 + θ ) 2 θ 2 (1 + θ ) θ + + + √ −1 2 (1 − θ ) n(1 − θ ) n
2θ 3 (1 + θ )2 θ 3 (1 + θ ) θ 4 (1 + θ ) + + (1 − θ )3 (1 − θ ) (1 − θ ) √ 2 3 2 4 2θ (1 + θ ) 2 θ (1 + θ ) + + + θ − 1 := θ µ p(θ ). (1 − θ )2 (1 − θ )
≤θ µ
(10.65)
Since p(θ ) is monotonic increasing function of θ ∈ [0, 1), p(0) < 0, and it is easy to verify that p(0.22) < 0, this proves the lemma.
Corollary 10.1, Lemmas 10.7, and 10.9 prove the feasibility of searching op timizer along the ellipse. To move the iterate back to N2 (θ ), one can use the direction (Δx, Δy, Δz, Δλ , Δγ), defined by ⎡ ⎤⎡ ⎤ ⎡ ⎤ H 0 0 I −I Δx 0
⎢ I ⎢ ⎥ ⎢ ⎥ I 0 0 0 ⎥ 0
⎢ ⎥ ⎢ Δy ⎥ ⎢ ⎥ ⎢ I ⎥ ⎢ ⎥ ⎢ ⎥ .
0 −I 0 0 ⎥ ⎢ Δz ⎥ = ⎢ 0
⎢ ⎥ ⎣ 0 Λ(α) 0 Y(α)
0 ⎦ ⎣ Δλ ⎦ ⎣ µ(α)e − λ (α) ◦ y(α) ⎦
0 0 Γ(α) 0 Z(α) Δγ µ(α)e − γ(α) ◦ z(α) (10.66) and update (xk+1 , pk+1 , ω k+1 ) and µk+1 by (xk+1 , pk+1 , ω k+1 ) = (x(α), p(α), ω(α)) + (Δx, Δp, Δω),
(10.67)
T
pk+1 ω k+1 µk+1 = , 2n where Δp = (Δy, Δz)
(10.68)
236
• Arc-Search Techniques for Interior-Point Methods
and Δω = (Δλ , Δγ). � � � Y(α) 0 Λ(α) 0 Denote P(α) = , Ω(α) = , and D = 0 Z(α) 0 Γ(α) 1 1 P 2 (α)Ω− 2 (α). Then, the last 2 rows of (10.66) can be rewritten as �
PΔω + ΩΔp = µ(α)e − P(α)Ω(α)e.
(10.69)
Now, we are ready to show that the correction step brings the iterate from N2 (2θ ) back to N2 (θ ). Lemma 10.10 Let (x(α), p(α), ω(α)) ∈ N2 (2θ ) and (Δx, Δp, Δω) be defined as in (10.66). Let (xk+1 , pk+1 , ω k+1 ) be updated by using (10.67). Then, for θ ≤ 0.29 and sin(α) ≤ sin(α¯ ), (xk+1 , pk+1 , ω k+1 ) ∈ N2 (θ ). Proof 10.11
Using Lemma 9.11 yields � �2 1 � � 0 ≤ �Δp ◦ Δω − (ΔpT Δω)e� ≤ IΔp ◦ ΔωI2 . 2n
Pre-multiplying P(α)Ω(α)
− 12
(10.70)
on the both sides of (10.69) yields
DΔω + D−1 Δp = P(α)Ω(α)
− 12
µ(α)e − P(α)Ω(α)e .
Let u = DΔω, v = D−1 Δp, from the first three rows of (10.66), it must have uT v = ΔpT Δω = ΔyT Δλ + ΔzT Δγ = ΔxT (Δγ − Δλ ) = ΔxT HΔx ≥ 0.
(10.71)
Using Lemma 3.2, the last two rows of (10.66), and the assumption of (x(α), p(α), ω(α)) ∈ N2 (2θ ) yields � � � � �Δp ◦ Δω �
� � � 3� � � = �u ◦ v� ≤ 2− 2 � P(α)Ω(α) 3
= 2− 2 3
≤
3
2− 2
�2 � µ(α)e − P(α)Ω(α)e �
2n � (µ(α) − pi (α)ωi (α))2 i=1
≤ 2− 2
− 12
pi (α)ωi (α)
Iµ(α)e − p(α) ◦ ω(α)I2 mini pi (α)ωi (α) 2 1 θ µ(α) (2θ )2 µ(α)2 = 22 . (1 − 2θ )µ(α) (1 − 2θ )
(10.72)
An Arc-Search Algorithm for QP with Box Constraints
•
237
Define (pk+1 (t), ω k+1 (t)) = (p(α), ω(α)) + t(Δp, Δω). From (10.69) and (10.25), one gets p(α)T Δω + ω(α)T Δp = 2nµ(α) −
2n �
pi (α)ωi (α) = 0.
(10.73)
i=1
Therefore, p(α) + tΔp µk+1 (t) = =
T
ω(α) + tΔω
2n ΔpT Δω p(α)T ω(α) + t 2 ΔpT Δω = µ(α) + t 2 . 2n 2n
(10.74)
In view of (10.71), we have ΔpT Δω = ΔxT HΔx ≥ 0, therefore, it must have µk+1 (t) ≥ µ(α). Using (10.74), (10.69), (10.70), and (10.72) yields
= =
=
=
� � � k+1 � �p (t) ◦ ω k+1 (t) − µk+1 (t)e� � � t2 � � ΔpT Δω e� �(p(α) + tΔp) ◦ (ω(α) + tΔω) − µ(α)e − 2n � � �p(α) ◦ ω(α) + t[ω(α) ◦ Δp + p(α) ◦ Δω] � t2 � +t 2 Δp ◦ Δω − µ(α)e − ΔpT Δω e� 2n � � �p(α) ◦ ω(α) + t[µ(α)e − p(α) ◦ ω(α)] � t2 � +t 2 Δp ◦ Δω − µ(α)e − ΔpT Δω e� 2n � �� � 1 � � 2 T Δp Δω e � �(1 − t) [p(α) ◦ ω(α) − µ(α)e] + t Δp ◦ Δω − 2n 1
22 θ2 ≤ (1 − t)(2θ )µ(α) + t µ(α) (1 − 2θ ) � � 1 22 θ2 ≤ (1 − t)(2θ ) + t 2 µk+1 := f (t, θ )µk+1 . (1 − 2θ ) 2
� � � � Therefore, taking t = 1 gives �pk+1 ◦ ω k+1 − µk+1 e� ≤ that, for θ ≤ 0.29, 1 22 θ2 = 0.2832 < θ . (1 − 2θ )
(10.75)
1
2 2 θ2 (1−2θ ) µk+1 .
It is easy to see
For θ ≤ 0.29 and t ∈ [0, 1], noticing
0 ≤ f (t, θ ) ≤ f (t, 0.29) ≤ 0.58(1 − t) + 0.2832t 2 < 1,
238
• Arc-Search Techniques for Interior-Point Methods
and using (10.74), (10.71), and Corollary 10.1, one gets, for an additional condition sin(α) ≤ sin−1 (α¯ ), pk+1 (t)ωik+1 (t) ≥ (1 − f (t, θ )) µk+1 (t) i � � t2 = (1 − f (t, θ )) µ(α) + ΔpT Δω n ≥ (1 − f (t, θ )) µ(α) > 0,
(10.76)
Therefore, (pk+1 (t), ω k+1 (t)) > 0 for t ∈ [0, 1], i.e., (pk+1 , ω k+1 ) > 0. This finishes the proof.
The next step is to show that the combined step (searching along the arc in N2 (2θ ) and moving back to N2 (θ )) will reduce the duality measure of the iterate, i.e., µk+1 < µk , if some appropriate θ and α are selected. The following two Lemmas are introduced for this purpose. Lemma 10.11 Let (x(α), p(α), ω(α)) ∈ N2 (2θ ) and (Δx, Δp, Δω) be defined as in (10.66). Then, 0≤
θ 2 (1 + 2θ ) δ0 ΔpT Δω ≤ µ(α) := µ(α). n 2n n(1 − 2θ )2
(10.77)
Proof 10.12 The first inequality of (10.77) follows from (10.71). Pre-multiplying 1 1 both sides of (10.69) by P− 2 (α)Ω− 2 (α) gives 1
1
1
1
1
1
P− 2 (α)Ω 2 (α)Δp+P 2 (α)Ω− 2 (α)Δω = P− 2 (α)Ω− 2 (α) µ(α)e−P(α)Ω(α)e . Let
1
1
u = P− 2 (α)Ω 2 (α)Δp, 1
1
v = P 2 (α)Ω− 2 (α)Δω, and
1
1
w = P− 2 (α)Ω− 2 (α) µ(α)e − P(α)Ω(α)e , in view of (10.71), it must have uT v = ΔpT Δω ≥ 0. Using Lemma 9.3 and the assumption of (x(α), p(α), ω(α)) ∈ N2 (2θ ) yields IuI2 + IvI2 =
2n � � (Δpi )2 ωi (α) i=1
pi (α)
+
(Δωi )2 pi (α) ωi (α)
�
An Arc-Search Algorithm for QP with Box Constraints
≤IwI2 = ≤
•
239
2n � (µ(α) − pi (α)ωi (α))2
pi (α)ωi (α)
i=1
2n 2
i=1 (µ(α) − pi (α)ωi (α))
mini pi (α)ωi (α)
(2θ )2 µ(α)
(2θ )2 µ 2 (α) ≤ = . (1 − 2θ )µ(α) (1 − 2θ )
(10.78)
Dividing both sides by µ(α) and using pi (α)ωi (α) ≥ µ(α)(1 − 2θ ) yields � � 2n � (Δpi )2 (Δωi )2 (1 − 2θ ) + p2i (α) ωi2 (α) i=1 �� � � � � � Δp �2 � Δω �2 =(1 − 2θ ) � � +� � p(α) ω(α) ≤
(2θ )2 , (1 − 2θ )
(10.79)
i.e., � Δp �2 � Δω �2 � 2θ �2 � � � � . � � +� � ≤ p(α) ω(α) 1 − 2θ
(10.80)
Invoking Lemma 5.3, one gets � Δp �2 � Δω �2 1 � 2θ �4 � � � � . � � ·� � ≤ p(α) ω(α) 4 1 − 2θ
(10.81)
� Δp � � Δω � 2θ 2 � � � � . � �·� �≤ p(α) ω(α) (1 − 2θ )2
(10.82)
This gives
Using Cauchy-Schwarz inequality leads to (Δp)T (Δω) µ(α) ≤
2n � |Δpi ||Δωi | i=1
µ(α)
2n � |Δpi | |Δωi | pi (α) ωi (α) i=1 � Δp �T � Δω � � � � � =(1 + 2θ )� � � � p(α) ω(α)
≤(1 + 2θ )
240
• Arc-Search Techniques for Interior-Point Methods
� Δp � � Δω � � � � � ≤(1 + 2θ )� �·� � p(α) ω(α) ≤
2θ 2 (1 + 2θ ) . (1 − 2θ )2
(10.83)
Therefore, (Δp)T (Δω) θ 2 (1 + 2θ ) ≤ µ(α). 2n n(1 − 2θ )2
(10.84)
This proves the lemma. Lemma 10.12 Let (x(α), p(α), ω(α)) ∈ N2 (2θ ) and (Δx, Δp, Δω) be defined as in (10.66). Let (xk+1 , pk+1 , ω k+1 ) be defined as in (10.67). Then, µ(α) ≤ µk+1 :=
� � � � T δ0 θ 2 (1 + 2θ ) pk+1 ω k+1 ≤ µ(α) 1 + = µ(α) 1 + . 2n n n(1 − 2θ )2
Proof 10.13 Using the fact that p(α)T Δω + ω(α)T Δp = 0 established in (10.73) in the proof of Lemma 10.10, and Lemma 10.11, it is straightforward to obtain µ(α) ≤ = ≤
p(α)T ω(α) 1 + ΔpT Δω 2n 2n (p(α) + Δp)T (ω(α) + Δω) = µk+1 2n
θ 2 (1 + 2θ )
µ(α) + µ(α). n(1 − 2θ )2
(10.85)
This proves the lemma.
For linear programming, it is shown in Chapter 5 that µk+1 = µ(α). This claim is not always true for the convex quadratic programming, as pointed out in Lemma 10.12. Therefore, some extra work is needed to make sure that the µk will be reduced in every iteration. Lemma 10.13 For θ ≤ 0.19, if
θ sin(α) = √ , n
then µk+1 < µk . Moreover, for sin(α) =
θ √
� µk+1 ≤ µk
n
=
(10.86)
0.19 √ , n
� 0.0185 1− √ . n
(10.87)
An Arc-Search Algorithm for QP with Box Constraints
•
241
Proof 10.14 Using Lemmas 10.12, 10.5, 5.4, 10.2, 10.3, and 10.4, and noticing p¨ T ω¨ ≥ 0 and p˙ T ω˙ ≥ 0 yields � � � � δ0 θ 2 (1 + 2θ ) µk+1 ≤ µ(α) 1 + = µ(α) 1 + (10.88a) n n(1 − 2θ )2 � � T � p˙ T ω˙ p¨ ω¨ =µk 1 − sin(α) + − (1 − cos(α))2 2nµk 2nµk � � �� � p˙ T ω¨ ω˙ T p¨ δ0 − + sin(α)(1 − cos(α)) 1 + n 2nµk 2nµk � �� �� � �� � � � ω˙ T p¨ � Tω � � ¨ ˙ p¨ T ω¨ δ0 p � � 4 3 � � ≤µk 1 − sin(α) + sin (α) + � +� 1+ � sin (α) n 2nµk 2nµk � � 2nµk � � �� � 1 3 n(1 + θ )2 4 2(2n) 2 (1 + θ ) 2 δ0 3 ≤µk 1 − sin(α) + sin (α) + sin (α) 1 + . (1 − θ )2 n (1 − θ )3 (10.88b)
Substituting sin(α) =
θ √
n
into (10.88b) gives
�� � 1 3 θ n(1 + θ )2 θ 4 2(2n) 2 (1 + θ ) 2 θ 3 δ0 µk+1 ≤µk 1 − √ + + 1 + 3 n (1 − θ )2 (1 − θ )3 n2 n n2 � � � � 3 3 δ0 θ θ 4 (1 + θ )2 2 2 θ 3 (1 + θ ) 2 =µk 1 − √ + + 1+ 3 2 n n(1 − θ ) n(1 − θ ) n � 3 3 θ δ0 θ 4 (1 + θ )2 2 2 θ 3 (1 + θ ) 2 θ δ0 =µk 1 − √ + + + − 3 n n(1 − θ )3 n(1 − θ )2 n n2 � �� 3 3 3 4 2 δ0 θ (1 + θ ) 2 2 θ (1 + θ ) 2 + + 3 n n(1 − θ ) n(1 − θ )2 � � � 3 3 θ 3 (1 + θ )2 2 2 θ 2 (1 + θ ) 2 θ δ0 =µk 1 − √ 1 − √ − √ − √ n nθ n(1 − θ )3 n(1 − θ )2 � �� 3 3 θ 3 (1 + θ )2 2 2 θ 2 (1 + θ ) 2 θ δ0 − 3 1− √ − √ . n(1 − θ )3 n(1 − θ )2 n2 �
Since 3
3
θ 3 (1 + θ )2 2 2 θ 2 (1 + θ ) 2 1− √ − √ n(1 − θ )2 n(1 − θ )3 3
≥
1−
3
θ 3 (1 + θ )2 2 2 θ 2 (1 + θ ) 2 − := f (θ ), (1 − θ )2
(1 − θ )3
where f (θ ) is a monotonic decreasing function of θ , and for θ ≤ 0.37, f (θ ) > 0.
242
• Arc-Search Techniques for Interior-Point Methods
Therefore, for θ ≤ 0.37, the following relation holds. � � �� 3 3 θ 3 (1 + θ )2 2 2 θ 2 (1 + θ ) 2 θ δ0 µk+1 ≤µk 1 − √ 1 − √ − √ − √ n(1 − θ )2 n nθ n(1 − θ )3 � � �� 3 3 θ θ (1 + 2θ ) θ 3 (1 + θ )2 2 2 θ 2 (1 + θ ) 2 =µk 1 − √ 1 − √ −√ − √ . n(1 − θ )3 n(1 − θ )2 n n(1 − 2θ )2
(10.89)
Since 3
3
θ (1 + 2θ ) θ 3 (1 + θ )2 2 2 θ 2 (1 + θ ) 2 √ 1− √ − − √ n(1 − θ )3 n(1 − θ )2 n(1 − 2θ )2 3
≥
3
θ (1 + 2θ ) θ 3 (1 + θ )2 2 2 θ 2 (1 + θ ) 2 1− − − := g(θ ), (1 − θ )3 (1 − θ )2 (1 − 2θ )2
(10.90)
where g(θ ) is a monotonic decreasing function of θ , one can conclude, for θ ≤ 0.19, g(θ ) > 0.0976 > 0. For θ = 0.19, it must have θ g(θ ) > 0.0185 and � � 0.0185 µk+1 ≤ µk 1 − √ . n This proves (10.87). Remark 10.3 As seen in this section, that starting with (x0 , p0 , ω 0 ), the arcsearch interior-point algorithm is carried out to find (x(α), p(α), ω(α)) ∈ N2 (2θ ) and (xk+1 , pk+1 , ω k+1 ) ∈ N2 (θ ) such that µk+1 < µk . In view of the proofs of Lem mas 10.7, 10.10, and 10.13, the positivity conditions of (x(α), p(α), ω(α)) > 0 and (xk+1 , pk+1 , ω k+1 ) > 0 relies on µ(α) > 0 which, according to Corollary 10.1, is achievable for any θ ≤ 0.19 and is given by a bound in terms of α¯ . The proximity condition for (x(α), p(α), ω(α)) relies on the real positive root of q(sin(α)) = 0, denoted by sin(α˜ ), which is conservatively estimated in Lemma 10.9 under the con dition that θ ≤ 0.22; the proximity condition for (xk+1 , pk+1 , ω k+1 ) is established in Lemma 10.10 under the condition that θ ≤ 0.29. Finally, duality measure reduction µk+1 < µk is established in Lemma 10.13 under the condition that θ ≤ 0.19. For all these results to hold, it just needs to take the smallest bound θ = 0.19.
Summarizing all the results in this section leads to the following theorem. Theorem 10.2 Let θ = 0.19 and (xk , pk , ω k ) ∈ N2 (θ ). Then, (x(α), p(α), ω(α)) ∈ N2 (2θ ); √ (xk+1 , pk+1 , ω k+1 ) ∈ N2 (θ ); and µk+1 ≤ µk 1 − 0.0185 . n Proof 10.15 By examining Corollary 10.1 and Lemma 10.9, one can se lect sin(α) ≤ min{sin(α˜ ), sin(α¯ )}. Therefore, Lemma 10.7 holds. This means
An Arc-Search Algorithm for QP with Box Constraints
•
243
that the relation (x(α), p(α), ω(α)) ∈ N2 (2θ ) holds. Since sin(α) ≤ sin(α¯ ) and (x(α), p(α), ω(α)) ∈ N2 (2θ ), Lemma 10.10 states (xk+1 , pk+1 , ω k+1 ) ∈ N2 (θ ). For √ θ = 0.19 and sin(α) = √θn , Lemma 10.13 states µk+1 ≤ µk 1 − 0.0185 . This fin n ishes the proof. Remark 10.4 It is worthwhile to point out that θ = 0.19 for the box constrained quadratic optimization problem is larger than the θ = 0.148 for linearly constrained quadratic optimization problem. This makes the searching neighborhood larger and the following algorithm more efficient than the algorithm in Chapter 9.
The proposed algorithm is presented as below: Algorithm 10.1 (Arc-search path-following) Data: H � 0, c, n, θ = 0.19, ε > 0.
Initial point (x0 , p0 , ω 0 ) ∈ N2 (θ ), and µ0 = for iteration k = 0, 1, 2, . . .
T
p0 ω 0 2n .
Step 1: Solve the linear systems of equations (10.12) and (10.13) to get (x˙ , p˙ , ω˙ ) and (x¨ , p¨ , ω¨ ). Step 2: Let sin(α) = and (10.55).
θ √ . n
Update (x(α), p(α), ω(α)) and µ(α) by (10.54)
Step 3: Solve (10.66) to get (Δx, Δp, Δω), update (xk+1 , pk+1 , ω k+1 ) and
µk+1 by using (10.67) and (10.68).
Step 4: Set k + 1 → k. Go back to Step 1.
end (for)
10.3
Convergence Analysis
The first result in this section extends a result of Chapter 9 to convex quadratic programming subject to box constraints. Lemma 10.14 Suppose F o �= ∅. Then, for each K ≥ 0, the set {(x, p, ω) | (x, p, ω) ∈ F, pT ω ≤ K} is bounded.
244
• Arc-Search Techniques for Interior-Point Methods
Proof 10.16 The proof is similar to the proof in Chapter 9. It is given here for completeness. First, x is bounded because −e ≤ x ≤ e. Since x + y = e and −e ≤ x ≤ e, it is easy to see 0 ≤ y = e − x ≤ 2e. Since x − z = −e, it is easy to see 0 ≤ z = x + e ≤ 2e. Therefore, y and z are also bounded. Let (x¯ , y¯ , z¯ , λ¯ , γ¯ ) be any fixed point in F o , and (x, y, z, λ , γ) be any point in F with yT λ + zT γ ≤ K. Using the definition of F o and F yields H(x¯ − x) + (λ¯ − λ ) − (γ¯ − γ) = 0. Therefore, (x¯ − x)T H(x¯ − x) + (x¯ − x)T (λ¯ − λ ) − (x¯ − x)T (γ¯ − γ) = 0, or equivalently, (x¯ − x)T (γ¯ − γ) − (x¯ − x)T (λ¯ − λ ) = (x¯ − x)T H(x¯ − x) ≥ 0. This gives ((x¯ + e) − (x + e))T (γ¯ − γ) − ((x¯ − e) − (x − e))T (λ¯ − λ ) ≥ 0. Substituting x − e = −y and x + e = z yields (z¯ − z)T (γ¯ − γ) + (y¯ − y)T (λ¯ − λ ) ≥ 0. This leads to z¯ T γ¯ + zT γ − zT γ¯ − z¯ T γ + y¯ T λ¯ + yT λ − yT λ¯ − y¯ T λ ≥ 0, or in a compact form p¯ T ω¯ + pT ω − pT ω¯ − p¯ T ω ≥ 0. Since (p¯ , ω¯ ) > 0 is fixed, let ξ = min
i=1,··· ,n
min{p¯i , ω¯ i },
then, using pT ω ≤ K, p¯ T ω¯ + K ≥ ξ eT (p + ω) ≥ max max{ξ pi , ξ ωi }, i=1,··· ,n
i.e., for i ∈ {1, · · · , n}, 0 ≤ pi ≤
1 (K + p¯ T ω¯ ), ξ
0 ≤ ωi ≤
1 (K + p¯ T ω¯ ). ξ
This proves the lemma.
The following theorem is a direct result of Lemmas 10.14, 10.1, Theorem 10.2, KKT conditions, Theorem 1.5.
An Arc-Search Algorithm for QP with Box Constraints
•
245
Theorem 10.3 Suppose that Assumption 1 holds, then the sequence generated by Algorithm 10.1 converges to a set of accumulation points, and all these accumulation points are global optimal solutions of the convex quadratic programming subject to box con straints.
Let (x∗ , p∗ , ω ∗ ) be any solution of (10.2), denote index sets B, S, and T as B = { j ∈ {1, . . . , 2n} | p∗j = � 0}.
(10.91)
S = { j ∈ {1, . . . , 2n} | ω ∗j = � 0}.
(10.92)
T = { j ∈ {1, . . . , 2n} | p∗j = ω ∗j = 0}.
(10.93)
According to Goldman-Tucker theorem [40], for the linear programming, B ∩ S = ∅ = T and B ∪S = {1, . . . , 2n}. A solution with this property is called strictly complementary. An example in Chapter 9 shows that there may have quadratic programming problems that do not have strictly complementary solution. But if a convex quadratic programming subject to box constraints has strictly com plementary solution(s), an interior-point algorithm will generate a sequence to approach strict complementary solution(s). As a matter of fact, from Lemma 10.14, we can show that Lemma 9.18 has a parallel result for convex quadratic programming subject to box constraints given as follows: Lemma 10.15 Let µ0 > 0, and ρ ∈ (0, 1). Assume that the convex BQP (10.1) has strictly com plementary solution(s). Then, for all points (x, p, ω) ∈ F o , pi ωi > ρ µ, and µ < µ0 , there are constants M, C1 , and C2 such that I(p, ω)I ≤ M, 0 < pi ≤ µ/C1 (i ∈ S), ωi ≥ C2 ρ (i ∈ S),
(10.94)
0 < ωi ≤ µ/C1 (i ∈ B).
(10.95)
pi ≥ C2 ρ (i ∈ B).
(10.96)
Proof 10.17 The first result (10.94) follows immediately from Lemma 10.14. Let (x∗ , p∗ , ω ∗ ) be any strictly complementary solution. Since (x∗ , p∗ , ω ∗ ) and (x, p, ω) are both feasible, it must have (y − y∗ ) = −(x − x∗ ) = −(z − z∗ ),
H(x − x∗ ) + (λ − λ ∗ ) − (γ − γ ∗ ) = 0.
Therefore, (y − y∗ )T (λ − λ ∗ ) + (z − z∗ )T (γ − γ ∗ ) = (x − x∗ )T H(x − x∗ ) ≥ 0, or
yT λ + zT γ ≥ yT λ ∗ + zT γ ∗ + (y∗ )T λ + (z∗ )T γ
(10.97)
246
• Arc-Search Techniques for Interior-Point Methods
as (p∗ )T ω ∗ = 0. Since (x∗ , y∗ , z∗ , λ ∗ , γ ∗ ) = (x∗ , p∗ , ω ∗ ) is strictly complementary solution, it must have T = ∅, p∗i = 0 for i ∈ S, and ωi∗ = 0 for i ∈ B. Since pT ω = 2nµ, from (10.97), it must have pT ω
= yT λ + zT γ ≥ yT λ ∗ + zT γ ∗ + (y∗ )T λ + (z∗ )T γ = pT ω ∗ + ω T p∗
2nµ ≥ pT ω ∗ + ω T p∗ =
⇐⇒
�
pi ωi∗ +
i∈S
�
p∗i ωi .
(10.98)
i∈B
Since each term in the summations is positive and bounded above by 2nµ, it must have ωi∗ > 0 for any i ∈ S; therefore, 0 < pi ≤
2nµ . ωi∗
Denote D∗ = {(p∗ , ω ∗ ) | ωi∗ > 0} and P∗ = {(p∗ , ω ∗ ) | pi∗ > 0}, it must have 0 < pi ≤ This leads to
2nµ . sup(p∗ ,ω ∗ )∈D∗ ωi∗
max pi ≤ i∈S
2nµ . mini∈S sup(p∗ ,ω ∗ )∈D∗ ωi∗
max ωi ≤
2nµ . mini∈B sup(p∗ ,ω ∗ )∈P∗ p∗i
Similarly, i∈B
Combining these two inequalities gives max{max pi , max ωi } i∈S
≤ =
i∈B
2nµ min{mini∈S sup(p∗ ,ω ∗ )∈D∗ ωi∗ , mini∈B sup(p∗ ,ω ∗ )∈P∗ p∗i } µ . C1
This proves (10.95). Finally, since pi ωi ≥ ρ µ, we have, for any i ∈ S,
Similarly, for any i ∈ B, This completes the proof.
ωi ≥
ρµ ρµ ≥ = C2 ρ. µ/C1 pi
pi ≥
ρµ ρµ ≥ = C2 ρ. µ/C1 ωi
(10.99)
An Arc-Search Algorithm for QP with Box Constraints
•
247
Lemma 10.15 leads to the following Theorem 10.4 Let (xk , pk , ω k ) ∈ N2 (θ ) be generated by Algorithm 10.1. Assume that the convex QP with box constraints has strictly complementary solution(s). Then, every limit point of the sequence is a strictly complementary solution of the convex quadratic programming with box constraints, i.e., ωi∗ ≥ C2 ρ (i ∈ S),
p∗i ≥ C2 ρ (i ∈ B).
(10.100)
Proof 10.18 From Lemma 10.15, (pk , ω k ) is bounded; therefore, there is at least one limit point (p∗ , ω ∗ ). Since (pki , ωik ) is in the neighborhood of the central path, i.e., pki ωik > ρ µk := (1 − θ )µk , pki ≥ C2 ρ (i ∈ B),
ωik ≥ C2 ρ (i ∈ S),
every limit point will meet (10.100) due to the fact that C2 ρ is a constant.
We are now ready to prove the complexity bound of Algorithm 10.1. Com bining Lemma 10.13 and Theorem 1.4 gives Theorem 10.5 √ The complexity of Algorithm 10.1 is bounded by O( n log(1/ε)).
10.4
Implementation Issues
Algorithm 10.1 is presented in a form that is convenient for the convergence analysis. Some implementation details that make the algorithm more efficient are discussed in this section.
10.4.1 Termination criterion Algorithm 10.1 needs a termination criterion in real implementation. One can use
k
k
k
k
k
k
k
µk ≤ ε,
IrX I = IHx + λ − γ + cI ≤ ε,
IrY I = Ix + y − eI ≤ ε, IrZ I = Ix − z + eI ≤ ε, k
k
(p , ω ) > 0.
(10.101a) (10.101b) (10.101c) (10.101d) (10.101e)
248
• Arc-Search Techniques for Interior-Point Methods
An alternate criterion is similar to the one used in linprog [172] κ :=
IrX I µk IrY I + IrZ I + + ≤ ε. (10.102) 2n max{1, IcI} max{1, IxkT Hxk + cT xk I}
10.4.2 A feasible initial point For feasible interior-point algorithms, an important prerequisite is to start with a feasible interior point. While finding an initial feasible point may not be a simple and trivial task for even linear programming with equality constraints [144], for quadratic programming subject to box constraints, finding the initial point is not an issue. As a matter of fact, the following initial point (x0 , y0 , z0 , λ 0 , γ 0 ) is an interior point, moreover, (x0 , y0 , z0 , λ 0 , γ 0 ) ∈ N2 (θ ). x0 = 0, y0 = z0 = e > 0, ci λi0 = 4(1 + IcI2 ) − > 0, 2 ci 0 2 γi = 4(1 + IcI ) + > 0. 2
(10.103a) (10.103b) (10.103c)
It is easy to see that this selected point meets (10.5). First, the following rela tions, Hx0 + c + λ 0 − γ 0 = 0, (y0 , z0 , λ 0 , γ 0 ) > 0, x0 + y0 = e, and x0 − z0 = −e are easy to verify. Since µ0 =
n i=1
λi0 + γi0 = 2n
n i=1
8(1 + IcI2 ) = 4(1 + IcI2 ), 2n
(10.104)
for θ = 0.19, it must have
n n �2 � � � � � 0 (λi0 − µ0 )2 + (γi0 − µ0 )2 �p
◦ ω 0 − µ0 e� =
i=1
i=1
2
=
IcI ≤ 16θ 2 (1 + IcI2 )2 = θ 2 (µ0 )2 . 2
This shows that (x0 , y0 , z0 , λ 0 , γ 0 ) ∈ N2 (θ ).
10.4.3 Step size Directly using sin(α) = √θn in Algorithm 10.1 provides an effective formula to prove the polynomiality. However, this choice of sin(α) is too conservative in practice because this search step in N2 (2θ ) is too small and the speed of duality measure reduction is slow. A better choice of sin(α) should have a larger step in every iteration so that the polynomiality is reserved and fast convergence is achieved. In view of Remark 10.3, conditions that restrict step size are positivity
An Arc-Search Algorithm for QP with Box Constraints
•
249
conditions, proximity conditions, and duality reduction condition. This subsec tion examines how to enlarge the step size under these restrictions. First, from (10.59) and (10.76), µ(α) > 0 is required for positivity condi tions (p(α), ω(α)) > 0 and (pk+1 , ω k+1 ) > 0 to hold. Since sin(α¯ ), estimated in Corollary 10.1, is conservative, a better selection of α¯ is to use (10.45), Lemmas 5.4 and 10.2: 1 µ(α) ≥ µ(1 − sin(α)) − x˙ T Hx˙ (1 − cos(α))2 + sin2 (α) 2n 1 ˙ sin4 (α) + sin2 (α) ≥ µ(1 − sin(α)) − (p˙ T ω) 2n := f (sin(α)) = σ , (10.105) where σ > 0 is a small number, and f (sin(α)) is a monotonic decreasing func tion of sin(α) with f (sin(0)) = µ > 0 and f (sin( π2 )) < 0. Therefore, equation (10.105) has a unique positive real solution for α ∈ [0, π2 ]. Since (10.105) is a quartic function of sin(α), the cost of finding the smallest positive solution is negligible [113]. Second, in view of (10.75), the proximity condition for (xk+1 , yk+1 , zk+1 , λ k+1 , γ k+1 ) ∈ N2 (θ ) holds for θ ≤ 0.19 without further restriction. The proximity condition (10.58) is met for sin(α) ∈ [0, sin(α˜ )], where sin(α˜ ) is the smallest positive solution of (10.53) and it is estimated very conservatively in Lemma 10.9. An efficient implementation should use sin(α˜ ), the smallest positive solution of (10.53). Ac tually, there exists a α´ which is normally larger than α˜ such that the proximity condition (10.58) is met for sin(α) ∈ [0, sin(α´ )]. Let b0 = −θ µ < 0, b3
b4 and
b1 = θ µ > 0, � � � � 1 = � p˙ ◦ ω¨ + ω˙ ◦ p¨ − 2n (p˙ T ω¨ + ω˙ T p¨ )e� + θn p˙ T ω¨ + p¨ T ω˙ = a3 + θn p˙ T ω¨ + p¨ T ω˙ ,
� � � � 1 (p¨ T ω¨ − ω˙ T p˙ )e� − θn p¨ T ω¨ − p˙ T ω˙ = � p¨ ◦ ω¨ − ω˙ ◦ p˙ − 2n = a4 − θn p¨ T ω¨ ,
p(α) := b4 (1 − cos(α))2 + b3 sin(α)(1 − cos(α)) + b1 sin(α) + b0 . (10.106) Applying the second inequality of (10.30) to cos(α)), we have
θ n
p˙ T ω¨ + p¨ T ω˙ sin(α)(1 −
�b3 sin(α)(1 − cos(α))� = a3 + θn p˙ T ω¨ + p¨ T ω˙ sin(α)(1 − cos(α)) ≤ a3 sin3 (α) + θn x˙ T Hx˙ sin2 (α) + x¨ T Hx¨ (1 − cos(α))2 .
250
• Arc-Search Techniques for Interior-Point Methods
Therefore,
2 b� 4 (1 − cos(α)) + b3 sin(α)(1 − cos(α)) � θ T ≤ a4 − n x¨ Hx¨ (1 − cos(α))2 +a3 sin3 (α) + θn x˙ T Hx˙ sin2 (α) + x¨ T Hx¨ (1 − cos(α))2 = a4 (1 − cos(α))2 + a3 sin3 (α) + θn x˙ T Hx˙ sin2 (α) ≤ a4 sin4 (α) + a3 sin3 (α) + a2 sin2 (α).
This shows that p(α) = b4 (1 − cos(α))2 + b3 sin(α)(1 − cos(α)) + b1 sin(α) + b0 ≤ a4 sin4 (α) + a3 sin3 (α) + a2 sin2 (α) + a1 sin(α) + a0 = q(α) where q(α) is defined in (10.53). Therefore, the smallest positive solution α` of p(α) is larger than the smallest positive solution α˜ of q(α). Hence, the goal is to show that for sin(α) ∈ [0, sin(α` )], the proximity condition (10.58) holds. Since for sin(α) ∈ [0, sin(α` )], p(α) ≤ 0 is equivalent to � � 1 � � � p¨ ◦ ω¨ − ω˙ ◦ p˙ − (p¨ T ω¨ − ω˙ T p˙ )e� (1 − cos(α))2 2n
� � 1 � � + � p˙ ◦ ω¨ + ω˙ ◦ p¨ − (p˙ T ω¨ + ω˙ T p¨ )e� sin(α)(1 − cos(α))
2n
� � � � � 1 1 � T p¨ T ω¨ − p˙ T ω˙ (1 − cos(α))2 − ≤ (2θ ) p˙ ω¨ + p¨ T ω˙ sin(α)(1 − cos(α)) 2n
+θ µ(1 − sin(α)).
2n
(10.107)
Substituting this inequality into (10.57) and using the formula for µ(α) in Lemma 10.45, we have
≤
≤
≤
=
� � � � �p(α) ◦ ω(α) − µ(α)e� � � � � (1 − sin(α))�p ◦ ω − µe� � � 1 � � +�(p¨ ◦ ω¨ − p˙ ◦ ω˙ − (p¨ T ω¨ − p˙ T ω˙ ))e�(1 − cos(α))2 2n � � 1 � � +�(p˙ ◦ ω¨ + ω˙ ◦ p¨ − (p˙ T ω¨ + p¨ T ω˙ )e� sin(α)(1 − cos(α)) 2n θ µ(1 − sin(α)) � θ� T + p¨ ω¨ − p˙ T ω˙ (1 − cos(α))2 − p˙ T ω¨ + p¨ T ω˙ sin(α)(1 − cos(α)) n +θ µ(1 − sin(α)) � 1 2θ µ(1 − sin(α)) + x¨ T (γ¨ − λ¨ ) − x˙ T (γ˙ − λ˙ ) (1 − cos(α))2 2n � 1 − x˙ T (γ¨ − λ¨ ) + x¨ T (γ˙ − λ˙ ) sin(α)(1 − cos(α)) 2n 2θ µ(α). [use Lemma 10.45] (10.108)
An Arc-Search Algorithm for QP with Box Constraints
•
251
This is the proximity condition for (x(α), y(α), z(α), λ (α), γ(α)). Therefore, the smallest positive solution α` of p(α) gives a better condition for the proximity condition (10.108) to hold. However, p(α) may not be a monotonic increasing function, therefore, it may not be solved efficiently. Denote bˆ 0 = b0 , bˆ 1 = b1 , bˆ 3 =
b3 if b3 ≥ 0, 0 if b3 < 0,
bˆ 4 =
b4 if b4 ≥ 0, 0 if b4 < 0,
and pˆ(α) := bˆ 4 (1 − cos(α))2 + bˆ 3 sin(α)(1 − cos(α)) + bˆ 1 sin(α) + bˆ 0 . (10.109) Since p(α) ˆ ≥ p(α), the smallest positive solution α´ of p(α) ˆ is smaller than smallest positive solution α` of p(α). To estimate the smallest solution of α´ , by noticing that p(α) ˆ is a monotonic increasing function of α and p(0) ˆ = −θ µ < 0, one can simply use the bisection method. The computational cost is independent of the problem size n and is negligible. Since both estimated step sizes α´ and α˜ guarantee the proximity condition for (x(α), y(α), z(α), λ (α), γ(α)) to hold, one can select αˇ = max{α´ , α˜ } ≥ α˜ , which guarantees that the polynomiality claim will hold. Third, for duality measure reduction, from the first inequality of (10.29), we have − x¨ T (γ˙ − λ˙ ) + x˙ T (γ¨ − λ¨ ) sin(α)(1 − cos(α)) ≤ (x˙ T Hx˙ )(1 − cos(α))2 + (x¨ T Hx¨ ) sin2 (α).
Substituting this relation into µ(α) in Lemma 10.5 and using Lemmas 10.2 and 5.4 yields 1 µ(α) = µ(1 − sin(α)) + 2n x¨ T (γ¨ − λ¨ ) − x˙ T (γ˙ − λ˙ ) (1 − cos(α))2 1 x˙ T (γ¨ − λ¨ ) + x¨ T (γ˙ − λ˙ ) sin(α)(1 − cos(α)) − 2n
1 ≤ µ(1 − sin(α)) + 2n x¨ T (γ¨ − λ¨ ) − x˙ T (γ˙ − λ˙ ) (1 − cos(α))2 + 1 + 2n (x˙ T Hx˙ )(1 − cos(α))2 + (x¨ T Hx¨ ) sin2 (α) 1 1 T ≤ µ(1 − sin(α)) + 2n x¨ (γ¨ −- λ¨ )(1 − cos(α))2 ++ 2n (x¨ T Hx¨ ) sin2 (α) 4 2 1 T ≤ µ(1 − sin(α)) + 2n x¨ Hx¨ sin (α) + sin (α)
Substituting this inequality into (10.88a) yields µk+1
θ 2 (1 + 2θ ) ≤ µ(α) 1 + n(1 − 2θ )2 � θ 2 (1 + 2θ ) θ 2 (1 + 2θ ) − 1+ sin(α) ≤ µk 1 + 2 n(1 − 2θ )2 n(1 − 2θ ) � � θ 2 (1 + 2θ ) p¨ T ω¨ � 2 4 + 1+ sin (α) + sin (α) . (10.110) n(1 − 2θ )2 2nµ
252
• Arc-Search Techniques for Interior-Point Methods
For µk+1 ≤ µk to hold, one needs �
�
θ 2 (1 + 2θ ) θ 2 (1 + 2θ ) − 1 +
sin(α)
n(1 − 2θ )2
n(1 − 2θ )2 � � θ 2 (1 + 2θ ) p¨ T ω¨
+ 1 +
sin2 (α) + sin4 (α) ≤ 0.
n(1 − 2θ )2 2nµ For the sake of convenience in convergence analysis, a conservative estimate is used in Lemma 10.13. For an efficient implementation, the following solution 2 (1+2θ ) p¨ T ω¨ should be adopted. Denote u = θn(1−2θ )2 > 0, v = 2nµ > 0, z = sin(α) ∈ [0, 1], and F(z) = (1 + u)vz4 + (1 + u)vz2 − (1 + u)z + u.
For z ∈ [0, 1] and v ≤ 16 , F ' (z) = (1 + u)(4vz3 + 2vz − 1) ≤ 0; therefore, the upper bound of the duality measure µk+1 is a monotonic decreasing function of sin(α) for α ∈ [0, π2 ]. The larger α is, the smaller the upper bound of the duality measure will be. For v > 16 , in order to minimize the upper bound of the duality measure, one can find the solution of F ' (z) = 0. It is easy to check from discriminator in Lemma 9.7 that the cubic polynomial F ' (z) has only one real solution which is given by sin(α˘ )
=
� � � � � � � �2 � �3 � �2 � �3 � � 3 3 nµ 1 nµ nµ 1 � nµ � + + + − + 6 6 4p¨ T ω¨ 4p¨ T ω¨ 4p¨ T ω¨ 4p¨ T ω¨
:=
χ.
(10.111)
Since F '' (sin(α˘ )) = (1 + u)(12v sin2 (α˘ ) + 2v) > 0 at sin(α˘ ) ∈ [0, 1), the upper bound of the duality measure is minimized. Therefore, one can define ⎧ ¨ T ω¨ π ⎪ if p2nµ ≤ 61 ⎨ 2, α˘ =
(10.112) ⎪ ⎩ sin−1 (χ) , if p¨ T ω¨ > 1 . 2nµ 6 It is worthwhile to note that for α < α˘ , F ' (sin(α)) < 0, i.e., F(sin(α)) is a monotonic decreasing function of α ∈ [0, α˘ ]. The step size selection process, is therefore, a simple algorithm, given below. Algorithm 10.2 (Step Size Selection) Data: σ > 0.
Step 1: Find the positive real solution of (10.105) to get sin(α¯ ).
Step 2: Find the smallest positive real solution of (10.109) to get sin(α´ ), the smallest
positive real solution of (10.53) to get sin(α˜ ), and set sin(αˇ ) = max{sin(α˜ ), sin(α´ )}.
Step 3: Calculate α˘ given by (10.112).
Step 4: The step size is obtained as sin(α) = min{sin(α¯ ), sin(αˇ ), sin(α˘ )}.
An Arc-Search Algorithm for QP with Box Constraints
•
253
10.4.4 The practical implementation Therefore, a more efficient implementation than Algorithm 10.1 is the following algorithm: Algorithm 10.3 (Arc-search path-following)
Data: H � 0, c, n, θ = 0.19, ε > σ > 0.
Step 0: Find initial point (x0 , p0 , ω 0 ) ∈ N2 (θ ) using (10.103), κ using (10.102), and
µ0 using (10.104).
while κ > ε
Step 1: Compute (x˙ , p˙ , ω˙ ) and (x¨ , p¨ , ω¨ ) using (10.12) and (10.13). Step 2: Select sin(α) using Algorithm 10.2. Update (x(α), p(α), ω(α)) and µ(α) using (10.54) and (10.55). Step 3: Compute (Δx, Δp, Δω) using (10.66), update (xk+1 , pk+1 , ω k+1 ) and µk+1 using (10.67) and (10.68). Step 4: Computer κ using (10.102). Step 5: If κ ≤ ε, quit. Otherwise, set k + 1 → k. Go back to Step 1. end (while) Remark 10.5 The condition µ > σ guarantees that the equation (10.105) has a positive solution before termination criterion is met.
10.5
A Design Example
We claim that the convex quadratic programming problem with box constraints has a lot of applications. One example is the widely-used linear quadratic reg ulator with control saturation. In this section, the OrbView-2 spacecraft orbitraising design example discussed in [160] is used to demonstrate the effective ness and efficiency of the proposed algorithm1 . Let w = (wx , wy , wz ) be the space craft body rate with respect to the reference frame expressed in the body frame, q¯ = (q0 , q1 , q2 , q3 ) be the quaternion of the spacecraft attitude with respect to the reference frame represented in the body frame and q = (q1 , q2 , q3 ) be the reduced quaternion, J = diag(Jx , Jy , Jz ) be the spacecraft inertia matrix, and hw be the an gular momentum produced by a momentum wheel. Orbit-raising is performed by 4 fixed thrusters (1 Newton) with on/off switches which are mounted on the antinadir face of the spacecraft in each corner of a square with a side length of 2d 1 There
are algorithms to solve this type of problem [9], but the new method is more efficient.
254
• Arc-Search Techniques for Interior-Point Methods
meter. The thrusters point to +z direction and canted 5 degree from z-axis. (more details were provided in [160]). The matrices of the thruster force direction F and moment arms R in the body frame are given as ⎡ ⎤ −a −a a a
F = [f1 , f2 , f3 , f4 ] = ⎣ a −a −a a ⎦
1 1 1 1 ⎡ ⎤ −d −d d d d −d ⎦ .
R = [r1 , r2 , r3 , r4 ] = ⎣ −d d −� −� −� −� Let x = (wx , wy , wz , q1 , q2 , q3 ) be the states of the attitude and u = (T1 , T2 , T3 , T4 ) be the control variable with T1 , T2 , T3 , T4 the thrust level of the four thrusters. The linear time-invariant system under consideration is represented in a reduced quaternion model (see [160]). ⎡
⎤
hw 0 0 0 0 0 Jx ⎢ 0 0 0 0 0 0 ⎥ ⎢ h ⎥ ⎢ w ⎥ 0 0 0 0 0 ⎥ x
Jz x˙ = ⎢ ⎢ 0.5 0 0 0 0 0 ⎥ ⎢ ⎥ ⎣ 0 0.5 0 0 0 0 ⎦
0 0 0.5 0 0 0 ⎡ 1 ⎤ 0 Jx 0 ⎡ ⎤ ⎡ ⎤
⎢ 0 J1 0 ⎥ r1 × f1 T T1 y ⎢ ⎥ ⎢ 0 0 1 ⎥ ⎢ r2 × f2 ⎥ ⎢ T2 ⎥ ⎥ ⎢ ⎥ J ⎥⎢ + ⎢ ⎢ 0 0 0z ⎥ ⎣ r3 × f3 ⎦ ⎣ T3 ⎦
⎢ ⎥ ⎣ 0 0 0 ⎦ r4 × f4 T4 0 0 0 = Ax + Bu, (10.113) with the control constraints −e ≤ u = (T1 , T2 , T3 , T4 ) ≤ e.
(10.114)
The problem is converted to discrete model using MATLAB function c2d with sampling time 1 second. The design is to minimize J=
N−1 �
1 T 1 � � T x N PxN + x k Qxk + uTk Ruk , u0 ,u1 ,··· ,uN−1 2
2
min
(10.115)
k=0
where the horizon number N = 30, the matrices P, Q, and R are given by
� 1 �
I3 0 2.5 P = Q =
, R = I6 . 0 10000I3
An Arc-Search Algorithm for QP with Box Constraints
�
255
Other spacecraft parameters (d = 0.248m, � = 0.815m, Ix = 189kg.m2 , Iy = 159kg.m2 , and Iz = 114kg.m2 , and hw = −2.8N.m.s) are the same as the ones of [128]. The algorithm is implemented in MATLAB. In our implementation of Algorithm 10.3, ε = 10−6 and σ = 10−10 are selected. After 20 iterations, the algorithm converges. The optimal thrust controller design is obtained.
10.6
Concluding Remarks
As explained at the beginning of the chapter, there is no need to work on infea sible interior-point algorithms for convex quadratic programming problem with box constraints because a feasible initial point is always available. There is, how ever, a need to develop feasible interior-point algorithms that search for the opti mizer in a wider neighborhood as this strategy may lead to better algorithms than the one proposed in this chapter.
Chapter 11
An Arc-Search Algorithm for LCP Linear complementarity problem (LCP) [21] is a class of optimization problems that includes linear programming, quadratic programming, and linear comple mentarity problem. Because of the great success of interior-point method in linear programming [82, 83, 89], researchers quickly turned their strike direc tion to solve LCP using interior-point method and achieved the expected success [72, 171]. After arc-search method demonstrated its superiority in interior-point method [152, 153], quite a few papers on arc-search interior-point algorithms for LCP have been published since 2015. The first one is probably due to Kheirfam and Chitsaz [64], which consider a special LCP problem, the so called P∗ (κ) linear complementarity problem. Pirhaji, Zangiabadi, and Mansouri [110] dis cussed a �2 -neighborhood infeasible interior-point algorithm for LCP. Pirhaji and Kheirfam developed an arc-search infeasible interior-point algorithm for hori zontal linear complementarity problem in the N−∞ neighborhood of the cen tral path [61]. Zangiabadi, Mansouri, and Amin [112] proposed an arc-search interior-point algorithm for monotone linear complementary problem over sym metric cones. Shahraki et al. [123] discussed a wide neighborhood infeasibleinterior-point method with arc-search for linear complementarity problem over symmetric cones (SCLCPs). Yuan, Zhang, and Huang [168] devised a wide neighborhood interior-point algorithm with arc-search for P∗ (k) LCP. Shahraki and Delavarkhalafi [122] proposed an arc-search predictor-corrector infeasibleinterior-point algorithm for P∗ (k)-LCPs. This chapter presents an arc-search fea sible interior-point algorithm for LCP [167], which is tied closely to the method discussed in Chapter 9.
An Arc-Search Algorithm for LCP
11.1
•
257
Linear Complementarity Programming
In Section 1.3.3, we introduced a linear complementarity problem and showed that it includes linear programming and quadratic programming. Another form of general LCP is the so-called horizontal linear complementarity problem [171]. By the horizontal linear complementarity problem, we mean the following non linear system with non-negativity constraints: Mx + Ns = q, x ◦ s = 0, (x, s) ≥ 0,
(11.1)
where, given vector q ∈ Rn and matrices M, N ∈ Rn×n , x, s ∈ Rn are variables to be determined. This problem is general enough to include the standard (mono tone) linear complementarity problem (monotone LCP), linear, and quadratic programs. It is easy to see that when N = −I, problem (11.1) is equivalent to the monotone linear complementarity problem given in (1.38). Moreover, con sider the quadratic programming: 1 min cT x + xT Hx, s.t. Ax = b, x ≥ 0, (11.2) 2 where given c ∈ Rn , b ∈ Rm , A ∈ Rm×n , and symmetric H ∈ Rn×n , x ∈ Rn is the variable vector to be optimized. The KKT conditions of (1.37) can be rewritten as horizontal LCP with: � � � � � � A 0 b M= , N= ˆ , q= ˆ , (11.3) ˆ −AH A Ac ˆ ∈ R(n−m)×n is the full rank matrix that meets AA ˆ T = 0, i.e., the rows where A ˆ of A span the null space of A. When H = 0, the problem (11.2) is reduced to a linear programming problem. On the other hand, horizontal linear complementarity problem can also be represented as a standard LCP. Assuming N is invertible, problem (11.1) can be rewritten as: ¯ + s = q, ¯ x ◦ s = 0, (x, s) ≥ 0, −Mx (11.4) −1 −1 ¯ which is a standard LCP with −M = N M and q¯ = N q. For this reason, in this chapter, we will discuss the standard LCP problem (1.38) which is rewritten below: s = Mx + q, (x, s) ≥ 0, x ◦ s = 0, (11.5) We assume, as in [69], that 0 � M ∈ Rn×n is positive semi-definite, and n ≥ 2 (because n = 1 is trivial).
11.2
An Arc-search Interior-point Algorithm
For convenience of reference, we define the feasible set of problem (11.5) by F = {(x, s) ∈ Rn × Rn | s = Mx + q, (x, s) ≥ 0},
(11.6)
258
• Arc-Search Techniques for Interior-Point Methods
and the strict feasible set F 0 = {(x, s) ∈ Rn × Rn | s = Mx + q, (x, s) > 0}.
(11.7)
We assume that the interior-point condition holds [118], i.e., F 0 is not empty. The basic idea of the feasible interior-point method is to replace the x ◦ s = 0 in (11.5) by the perturbed equation x ◦ s = µe in order to get the following parametrized system s = Mx + q, x ◦ s = µe, x, s ≥ 0.
(11.8a) (11.8b) (11.8c)
If the LCP satisfies the interior-point condition, then system (11.8) has a unique solution for every µ > 0, denoted by (x(µ), s(µ)). The set of all such solutions forms a homotopy path, which is called the central path of the LCP and is used as a guideline to the solution of LCP [174]. Thus, the central path is an arc in R2n parametrized as a function of µ, defined as C = {(x(µ), s(µ)) | µ > 0}.
(11.9)
If µ → 0, then the limit of the central path exists and is an optimal solution for LCP (11.5). Similar to the definition in Chapter 9, we define an ellipse E(α) in 2n dimensional space in order to approximate the central path C as follows E(α) = {(x(α), s(α)) | (x(α), s(α)) =ba cos(α) +bb sin(α) +bc},
(11.10)
where ba,bb ∈ R2n are the axes of the ellipse, and bc ∈ R2n is the center of the ellipse. Let z = (x(µ), s(µ)) = (x(α0 ), s(α0 )) ∈ E(α) be close to or on the central path. Using the approach of Chapter 9, we define the first and second derivatives at (x(α0 ), s(α0 )) to have the form as if they were on the central path, satisfying � � � � � � M −I x˙ 0 = , (11.11) S X s˙ x ◦ s − σ µe � � � � � � M −I x¨ 0 = , (11.12) S X s¨ −2x˙ s˙ T
where µ = xn s , σ ∈ (0, 14 ) is the centering parameter. Let α ∈ [0, π2 ] and (x(α), s(α)) be updated from (x, s) after the searching along the ellipse. Similar to the previous chapters, we have the following theo rem.
An Arc-Search Algorithm for LCP
�
259
Theorem 11.1 Let (x(α), s(α)) be an arc defined by (11.10) passing through a point (x, s) ∈ E(α), and its first and second derivatives at (x, s) be (x˙ , s˙) and (x¨ , s¨), which are defined by (11.11) and (11.12). Then, an ellipse approximation of the central path is given by x(α) = x − sin(α)x˙ + (1 − cos(α))x¨ ,
(11.13)
s(α) = s − sin(α)s˙ + (1 − cos(α))s¨.
(11.14)
x(α) ◦ s(α) = x ◦ s − sin(α)(x ◦ s − σ µe) + χ(α),
(11.15)
To simplify the notation, let g(α) := 1 − cos(α). Using Theorem 11.1, Equa tions (11.11) and (11.12), we get
where χ(α) := −g2 (α)x˙ ◦ s˙ − sin(α)g(α)(x˙ ◦ s¨ + s˙ ◦ x¨ ) + g2 (α)x¨ ◦ s¨.
(11.16)
The search for the optimizer is carried out in the wide neighborhood of the central path, defined as N−∞ (γ) = {(x, s) ∈ F 0 | min(x ◦ s) ≥ γ µ},
(11.17)
where γ ∈ (0, 12 ) is a constant independent of n. In order to facilitate the analysis of the algorithm, we denote � � π �� sin(αˆ ) := max sin(α) | (x(α), s(α)) ∈ N−∞ (γ), α ∈ 0, . (11.18) 2 The proposed algorithm starts from an initial point (x0 , s0 ) ∈ N−∞ (γ), which is close to or on the central path C. Using the ellipse that passes through the point (x0 , s0 ), the algorithm searches along the ellipse and gets a new iterate (x(α), s(α)), which reduces the value of the duality gap. The procedure is re peated until an ε-approximate solution of the LCP is found. It is worthwhile to indicate that this algorithm is a feasible interior-point algorithm. A formal de scription of the algorithm is presented below: Algorithm 11.1 (Arc-search IPM for LCP) data: ε > 0, γ ∈ (0, 12 ), and σ ∈ (0, 14 ). initial point: (x0 , s0 ) ∈ N−∞ (γ) and (x0 )T s0 . for iteration k = 0, 1, 2, ... Step 1: If (xk )T sk ≤ ε, then stop. Step 2: Solve the systems (11.11) and (11.12) to get (x˙ , s˙) and (x¨ , s¨). Step 3: Compute sin(αˆ k ) by (11.18), and let (xk+1 , sk+1 ) = (x(αˆ k ), s(αˆ k )). Step 4: Calculate µk+1 = end(for)
(xk+1 )T sk+1 , n
and set k := k + 1.
260
• Arc-Search Techniques for Interior-Point Methods
11.3
Convergence Analysis
We first establish several lemmas which are useful for the convergence analysis of the algorithm. First, starting from a feasible solution, searching for the opti mizer along the ellipse, the iterate always meets the equality constraints. For the sake of simplicity, in the rest of the analysis, we omit the index k if confusion will not be introduced. Lemma 11.1 Let (x, s) be a strictly feasible point of LCP, (x˙ , s˙) and (x¨ , s¨) meet (11.11) and (11.12), respectively, (x(α), s(α)) be calculated using (11.13) and 11.14), then Mx(α) − s(α) = −q. Proof 11.1 Since (x, s) is a strict feasible point, by Theorem 11.1, equations (11.11) and (11.12), we have Mx(α) − s(α) = M[x − sin(α)x˙ + (1 − cos(α))x¨ ] − [s − sin(α)s˙ + (1 − cos(α))s¨] = Mx − sin(α)Mx˙ + (1 − cos(α))Mx¨ − s + sin(α)s˙ − (1 − cos(α))s¨ = Mx − s − sin(α)(Mx˙ − s˙) + (1 − cos(α))(Mx¨ − s¨) = −q. This completes the proof.
The next lemma provides several useful relations. Lemma 11.2 Let x˙ , s˙, x¨ and s¨ be defined in (11.11) and (11.12), and M be a positive semi-define matrix. Then, the following relations hold: x˙ T s˙ = x˙ T Mx˙ ≥ 0,
(11.19)
x¨ T s¨ = x¨ T Mx¨ ≥ 0,
(11.20)
x¨ s˙ = x˙ s¨ = x˙ Mx¨ ,
(11.21)
T
T
T
−(x˙ T s˙)(1 − cos(α))2 − (x¨ T s¨)sin2 (α) ≤ [x¨ T s˙ + x˙ T s¨] sin(α)(1 − cos(α)) ≤ (x˙ T s˙)(1 − cos(α))2 + (x¨ T s¨) sin2 (α),
(11.22)
−(x˙ T s˙)sin2 (α) − (x¨ T s¨)(1 − cos(α))2 ≤ [x¨ T s˙ + x˙ T s¨] sin(α)(1 − cos(α)) ≤ (x˙ T s˙)sin2 (α) + (x¨ T s¨)(1 − cos(α))2 .
(11.23)
An Arc-Search Algorithm for LCP
•
261
Proof 11.2 Pre-multiplying x˙ T and x¨ T to the first rows of (11.11) and (11.12), respectively, we have x˙ T s˙ = x˙ T Mx˙ and x¨ T s¨ = x¨ T Mx¨ . The second two inequalities of (11.19) and (11.20) follow from the fact that M is positive semi-definite. Similarly, we also have x¨ T Mx˙ = x¨ T s˙ and x˙ T Mx¨ = x˙ T s¨, which means x¨ T s˙ = x˙ T s¨ = x˙ T Mx¨ . Since M is a positive semi-definite matrix, we have [(1 − cos(α))x˙ + sin(α)x¨ ]T M[(1 − cos(α))x˙ + sin(α)x¨ ] = (x˙ T Mx˙ )(1 − cos(α))2 + 2(x˙ T Mx¨ ) sin(α)(1 − cos(α)) + (x¨ T Mx¨ ) sin2 (α) = (x˙ T s˙)(1 − cos(α))2 + (x¨ T s¨) sin2 (α) + (x¨ T s˙ + x˙ T s¨) sin(α)(1 − cos(α)) ≥ 0. (11.24) From the above result, we can get the first inequality of (11.22). Similarly, we also have [(1 − cos(α))x˙ − sin(α)x¨ ]T M[(1 − cos(α))x˙ − sin(α)x¨ ] = (x˙ T Mx˙ )(1 − cos(α))2 − 2(x˙ T Mx¨ ) sin(α)(1 − cos(α)) + (x¨ T Mx¨ ) sin2 (α) = (x˙ T s˙)(1 − cos(α))2 + (x¨ T s¨) sin2 (α) − (x¨ T s˙ + x˙ T s¨) sin(α)(1 − cos(α)) ≥ 0, (11.25) which implies the second inequality of (11.22). Substituting (1 − cos(α))x˙ and sin(α)x¨ by sin(α)x˙ and (1 − cos(α))x¨ , respectively, following the same way, we can obtain the inequalities of (11.23).
The next lemma was used in previous chapters. We repeat it here for easy reference. Lemma 11.3 For α ∈ [0, π2 ], the following inequalities hold. sin(α) ≥ sin2 (α) = 1 − cos2 (α)≥1 − cos(α).
(11.26)
The following lemma is useful for the estimation of the upper bound of µ(α). Lemma 11.4 Let (x, s) ∈ N−∞ (γ), (x˙ , s˙) and (x¨ , s¨) be calculated from (11.11) and (11.12). Let x(α) and s(α) be defined as (11.13) and (11.14), respectively. Then, 1� 2 µ(α) = µ[1 − (1 − σ ) sin(α)] + 1 − cos(α) x¨ T s¨ − x˙ T s˙ n � − sin(α)(1 − cos(α)) x˙ T s¨ + s˙T x¨ � 1� ≤ µ[1 − (1 − σ ) sin(α)] + x¨ T s¨ sin4 (α) + sin2 (α) . (11.27) n
262
� Arc-Search Techniques for Interior-Point Methods
Proof 11.3
Using (11.13), (11.14), (11.11) and (11.12), we get
x(α)T s(α) � �� � = xT − x˙ T sin(α) + x¨ T (1 − cos(α)) s − s˙ sin(α) + s¨(1 − cos(α))
nµ(α) =
¨ − cos(α))
= xT s − xT s˙ sin(α) + xT s(1 −x˙ T s sin(α) + x˙ T s˙ sin2 (α) − x˙ T s¨ sin(α)(1 − cos(α)) ¨ − cos(α))2 +x¨ T s(1 − cos(α)) − x¨ T s˙ sin(α)(1 − cos(α)) + x¨ T s(1 = xT s − (xT s˙ + sT x˙ ) sin(α) + (xT s¨ + sT x¨ )(1 − cos(α)) −(x˙ T s¨ + s˙T x¨ ) sin(α)(1 − cos(α)) + x˙ T s˙ sin2 (α) + x¨ T s¨(1 − cos(α))2 = nµ[1 − (1 − σ ) sin(α)] + (1 − cos(α))2 (x¨ T s¨ − x˙ T s˙) −(x˙ T s¨ + s˙T x¨ ) sin(α)(1 − cos(α)) ≤ nµ[1 − (1 − σ ) sin(α)] + (1 − cos(α))2 (x¨ T s¨ − x˙ T s˙) +(1 − cos(α))2 x˙ T s˙ + sin2 (α)x¨ T s¨ = nµ[1 − (1 − σ ) sin(α)] + (1 − cos(α))2 x¨ T s¨ + sin2 (α)x¨ T s¨ ¨ ≤ nµ[1 − (1 − σ ) sin(α)] + (sin4 (α) + sin2 (α))(x¨ T s), where the first inequality follows from Lemma 11.2, and the second inequality is from Lemma 11.3. This proves the lemma.
Now, we present some technical results which are used to prove the main 1 1 results of convergence analysis. Denote D = X 2 S− 2 , where 0 < (x, s) ∈ R2n . The next result was presented in [174]. Lemma 11.5 Let (u, v) ≥ 0 and u, v ∈ Rn , then, �u ◦ v�1 ≤ �Du� �D−1 v� ≤ Proof 11.4 �u ◦ v�1
� 1 � �Du�2 + �D−1 v�2 . 2
Since −1 = �Du ◦ D−1 v�1 = |(D1 u1 )(D−1 1 v1 )| + . . . + |(Dn un )(Dn vn )| −1 −1 = (D1 u1 )(D1 v1 ) + . . . + (Dn un )(Dn vn ),
and �Du�2 �D−1 v�2 =
� � −1 2 2 (D1 u1 )2 + . . . + (Dn un )2 (D−1 1 v1 ) + . . . + (Dn vn ) ,
it follows directly from Cauchy-Schwarz inequality that �u ◦ v�1 ≤ �Du�2 �D−1 v�2 . This proves the first inequality. The second inequality follows the fact that ab ≤ 1 2 2 2 (a + b ) for any scalars a and b.
In what follows, we give the upper bounds for �D−1 x˙ �, �Ds˙�, �D−1 x¨ � and �Ds¨�, which are essential to the convergence analysis.
An Arc-Search Algorithm for LCP
•
263
Lemma 11.6 For (x, s) ∈ N−∞ (γ), let (x˙ , s˙) be the solution of (11.11), then, we have ID−1 x˙ I2 + IDs˙I2 ≤ β1 µn, Ix˙ ◦ s˙I1 ≤
β1 µn , 2
and ID−1 x˙ I ≤ 2
� � β1 µn, IDs˙I ≤ β1 µn,
where β1 ≥ 1 ≥ 1 − 2σ + σγ .
Proof 11.5
1
Pre-Multiplying the second equation in (11.11) by (XS)− 2 , we get
1
D−1 x˙ + Ds˙ = (XS)− 2 (x ◦ s − σ µe). Then, taking the squared norm on both sides and using xi si ≥ γ µ yields 1
ID−1 x˙ I2 + 2x˙ T s˙ + IDs˙I2
1
= I(x ◦ s) 2 I2 − 2σ µn + I(XS)− 2 σ µeI2 ≤
xT s − 2σ µn +
Iσ µeI2 mini (xi si )
≤
µn − 2σ µn +
σ 2 µn γ
≤
β1 µn.
(11.28)
Since x˙ T s˙ = x˙ T Mx˙ ≥ 0, we have ID−1 x˙ I2 + IDs˙||2 ≤ β1 µn. Using this result and Lemma 11.5, we obtain � � ID−1 x˙ I ≤ β1 µn, IDs˙I ≤ β1 µn, and Ix˙ ◦ s˙||1 ≤
β1 µn . 2
This proves the lemma. Lemma 11.7 If (x, s) ∈ N−∞ (γ), let (x¨ , s¨) be the solution of (11.12), then we have ¨ 2 + IDs¨I2 ≤ β2 µn2 , Ix¨ ◦ s¨I1 ≤ ID−1 xI
β2 µn2 , 2
and where β2 =
β12 γ
ID−1 x¨ I ≤ ≥ 1.
� � β2 µn, IDs¨I ≤ β2 µn,
264
• Arc-Search Techniques for Interior-Point Methods 1
Proof 11.6 Pre-multiplying (XS)− 2 to the second equation in (11.12), taking the squared norm on both sides, and using the fact that x¨ T s¨ ≥ 0, we obtain ID−1 x¨ I2 + IDs¨I2
≤ ≤ ≤ ≤
1
I(XS)− 2 (−2x˙ ◦ s˙)I2 I − 2x˙ ◦ s˙I2 mini (xi si ) 4Ix˙ ◦ s˙I21 γµ β2 µn2 ,
(11.29)
From the above inequality and Lemma 11.5, we get � � ID−1 x¨ I ≤ β2 µn, IDs¨I ≤ β2 µn, and Ix¨ ◦ s¨I1 ≤
β2 µn2 . 2
This completes the proof.
The next result follows directly from Lemmas 11.6 and 11.7. Lemma 11.8 � For β3 = β1 β2 ≥ 1, we have 3
3
Ix˙ ◦ s¨I1 ≤ ID−1 x˙ IIDs¨I ≤ β3 µn 2 , Ix¨ ◦ s˙I1 ≤ ID−1 x¨ IIDs˙I ≤ β3 µn 2 .
Let sin(αˆ 0 ) = lemma.
1 β4 n ,
where β4 =
β1 +β2 +4β3 σ (1−γ)
≥ 1, and we have the following
Lemma 11.9 Let χ(α) be defined as (11.16), then, for all α ∈ (0, αˆ 0 ], we have χ(α) ≥ − 12 sin(α)µ(1 − γ)σ e. Proof 11.7 χ(α)
Using Lemmas 11.6, 11.7, 11.8 and g(α) ≤ sin2 (α), we have = −g2 (α)x˙ ◦ s˙ − sin(α)g(α)(x˙ ◦ s¨ + s˙ ◦ x¨ ) + g2 (α)x¨ ◦ s¨ � � ≥ − sin4 (α)Ix˙ ◦ s˙I1 − sin3 (α)Ix˙ ◦ s¨ + s˙ ◦ x¨ I1 − sin4 (α)Ix¨ ◦ s¨I1 e � � 3 1 4 1 4 3 2 2 ≥ − sin (α)β1 µn + 2 sin (α)β3 µn + sin (α)β2 µn e 2 2
An Arc-Search Algorithm for LCP
•
265
� � 3 1 − sin(α)µ sin3 (αˆ 0 )β1 n + 4 sin2 (αˆ 0 )β3 n 2 + sin3 (αˆ 0 )β2 n2 e 2 � � 1 β1 4β3 β2 = − sin(α)µ + + e
2 β43 n2 β 2 n 12 β43 n
≥
4
1 1 ≥ − sin(α)µ [β1 + β2 + 4β3 ]e 2 β4
1
= − sin(α)µ(1 − γ)σ e. 2
(11.30)
This completes the proof.
Lemma 11.10 Let (x, s) ∈ N₋∞(γ), and let sin(α̂) be defined by (11.18). Then, for all sin(α) ∈ [0, sin(α̂₀)], we have (x(α), s(α)) ∈ N₋∞(γ) and sin(α̂) ≥ sin(α̂₀).
Proof 11.8 Using (11.15), (11.17), Lemmas 11.9, 11.3, and 11.4, we have
min(x(α) ∘ s(α))
= min[x ∘ s − sin(α)(x ∘ s − σµe) + χ(α)]
≥ min(x ∘ s)(1 − sin(α)) + sin(α)σµ + min(χ(α))
≥ (1 − sin(α))γµ + sin(α)σµ − ½ sin(α)(1 − γ)σµ
= γµ(α) + sin(α)(1 − γ)σµ − ½ sin(α)(1 − γ)σµ − (γ/n)(1 − cos(α))²(ẍᵀs̈ − ẋᵀṡ) + (γ/n) sin(α)(1 − cos(α))(ẋᵀs̈ + ṡᵀẍ)
≥ γµ(α) + ½ sin(α)(1 − γ)σµ − (γ/n)(1 − cos(α))²(ẍᵀs̈ + ẋᵀṡ) + (γ/n) sin(α)(1 − cos(α))(ẋᵀs̈ + ṡᵀẍ)
≥ γµ(α) + ½ sin(α)(1 − γ)σµ − (γ/n) sin²(α)(ẋᵀṡ) − (γ/n)(1 − cos(α))²(ẍᵀs̈) + (γ/n) sin(α)(1 − cos(α))(ẋᵀs̈ + ṡᵀẍ),   (11.31)
where the last two inequalities follow from the relation ẋᵀṡ ≥ 0 and from Lemma 11.3, respectively. Using Lemmas 11.6, 11.7 and 11.8, we can verify that the following relation holds:
½ sin(α)(1 − γ)σµ − (γ/n) sin²(α)(ẋᵀṡ) − (γ/n)(1 − cos(α))²(ẍᵀs̈) + (γ/n) sin(α)(1 − cos(α))(ẋᵀs̈ + ṡᵀẍ)
≥ µ sin(α)[½(1 − γ)σ − (γ/n)(½β₁n sin(α) + ½β₂n² sin³(α) + 2β₃n^{3/2} sin²(α))]
≥ µ sin(α)[½(1 − γ)σ − (γ/n)(β₁/(2β₄) + β₂/(2β₄³n) + 2β₃/(β₄²n^{1/2}))]   [use sin(α̂₀) = 1/(β₄n)]
≥ µ sin(α)[½(1 − γ)σ − (γ/(2β₄n))(β₁ + β₂ + 4β₃)]
= µ sin(α)[½(1 − γ)σ − (γ/(2n))(1 − γ)σ]   [use β₄ = (β₁ + β₂ + 4β₃)/(σ(1 − γ))]
≥ 0.   (11.32)
Combining (11.31) and (11.32) shows that min(x(α) ∘ s(α)) ≥ γµ(α) > 0; in particular, x(α) ∘ s(α) > 0. Since x(α) and s(α) are continuous in α on [0, α̂₀] and (x, s) > 0, it follows that x(α) > 0 and s(α) > 0. By (11.17) and (11.18), we have (x(α), s(α)) ∈ N₋∞(γ). In view of (11.18), we obtain sin(α̂) ≥ sin(α̂₀).
Now, we are ready to present the iteration complexity bound of Algorithm 11.1. To this end, we need a new upper bound on µ(α). From Lemmas 11.4, 11.3, and 11.7, it follows that, for α ∈ [0, α̂₀],
µ(α) ≤ µ[1 − (1 − σ) sin(α)] + (1/n) ẍᵀs̈ (sin⁴(α) + sin²(α))
≤ µ[1 − (1 − σ) sin(α)] + (1/n)‖ẍ ∘ s̈‖₁ (sin⁴(α) + sin²(α))
≤ µ[1 − (1 − σ − ½β₂n(sin³(α) + sin(α))) sin(α)]
≤ µ[1 − (1 − σ − β₂/(2β₄³n²) − β₂/(2β₄)) sin(α)]
≤ (1 − ⅛ sin(α))µ,
where the last two inequalities come from sin(α̂₀) = 1/(β₄n), 0 < σ < ¼, β₄ = (β₁ + β₂ + 4β₃)/((1 − γ)σ) ≥ β₂/σ ≥ 4β₂, and n ≥ 2.
Theorem 11.2 Let (x⁰, s⁰) ∈ N₋∞(γ) and sin(α̂₀) = 1/(β₄n). Then Algorithm 11.1 terminates in at most O(n log((x⁰)ᵀs⁰/ε)) iterations.
Proof 11.9 Using the relation sin(α̂) ≥ sin(α̂₀), we have
µ(α̂) ≤ (1 − ⅛ sin(α̂))µ ≤ (1 − ⅛ sin(α̂₀))µ = (1 − 1/(8β₄n))µ.
Then, we obtain
(xᵏ)ᵀsᵏ ≤ (1 − 1/(8β₄n))ᵏ (x⁰)ᵀs⁰,
so it suffices to have
(1 − 1/(8β₄n))ᵏ (x⁰)ᵀs⁰ ≤ ε,
i.e.,
k log(1 − 1/(8β₄n)) ≤ log(ε/((x⁰)ᵀs⁰)).
Since log(1 − 1/(8β₄n)) ≤ −1/(8β₄n) < 0, for k ≥ 8β₄n log((x⁰)ᵀs⁰/ε) we have (xᵏ)ᵀsᵏ ≤ ε.
This completes the proof.
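To make the bound of Theorem 11.2 concrete, the short sketch below (an added illustration, not part of the original text) evaluates the iteration count k ≥ 8β₄n log((x⁰)ᵀs⁰/ε) for a hypothetical problem size and a hypothetical value of β₄.

```python
import math

def lcp_iteration_bound(n, x0_dot_s0, eps, beta4):
    """Smallest integer k with k >= 8*beta4*n*log((x0^T s0)/eps), as in Theorem 11.2."""
    return math.ceil(8.0 * beta4 * n * math.log(x0_dot_s0 / eps))

# Illustrative numbers only: n = 1000, (x0)^T s0 = 1e4, eps = 1e-8, beta4 = 10.
print(lcp_iteration_bound(1000, 1.0e4, 1.0e-8, 10.0))
```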
11.4 Concluding Remarks
Although there are quite a few papers that discuss arc-search algorithms for LCP problems, the majority of the research is focused on a very special type of LCP, i.e., P∗(k)-LCPs, because of the technical difficulty of using higher-order methods for general monotone LCPs. There are many problems worth investigating for general LCPs. For example, are there efficient infeasible arc-search interior-point algorithms for general monotone LCPs? Moreover, if such algorithms exist, what is the lowest polynomial bound achievable for infeasible arc-search interior-point algorithms? Computationally, how efficient are the arc-search interior-point algorithms in comparison to traditional interior-point algorithms?
Chapter 12
An Arc-Search Algorithm for Semidefinite Programming
Semidefinite programming (SDP) is a convex optimization problem over the intersection of an affine set and the cone of positive semidefinite matrices [66, 165]. SDP has applications in many areas, such as combinatorial optimization [2], system and control theory [14], and eigenvalue optimization problems [107]. In the last two decades, SDP has become an active research area in mathematical programming as it became clear that the interior-point methods for linear programming (LP) can often be extended to the more general SDP case. Several interior-point methods (IPMs) designed for LP have been successfully generalized to SDP, for example, the ones in [73, 74, 103, 138]. As discussed in Chapters 2, 3 and 4, the majority of IPMs search for an optimizer along a straight line related to the first-order and higher-order derivatives of the central path [55, 56]. However, most optimization problems are nonlinear, and the ideal search should be along arcs, such as the central path in LP, not straight lines. After this author developed the idea of primal-dual IPMs with arc-search, the idea has been used by different authors to solve various IPM problems. For example, Yang et al. [145, 149, 147, 148, 161], Kheirfam and Chitsaz [64], and Mansouri et al. [85, 111] investigated infeasible IPMs with arc-search for LP and symmetric conic optimization (SCO). Later, Pirhaji et al. [110] introduced an l₂-neighborhood infeasible IPM with arc-search for the linear complementarity problem (LCP). Kheirfam [60, 61] proposed infeasible arc-search algorithms in the negative infinity neighborhood for SCO and horizontal LCP. Many of these
works improved the theoretical complexity bounds over traditional line-search IPMs. Recently, Kheirfam and Moslemi [62] extended the arc-search strategy to the SDP case; the derived complexity bound is of the same order as the one for the LP case. This chapter discusses an arc-search algorithm for the SDP problem, which was proposed by Zhang et al. [170].
12.1 Preliminary
The following notations are used throughout the chapter. The set of all symmetric n × n real matrices is denoted by Sⁿ. For M ∈ Sⁿ, we write M ≻ 0 (M ⪰ 0) if M is positive definite (positive semidefinite). We also denote by S₊₊ⁿ (S₊ⁿ) the set of all matrices in Sⁿ which are positive definite (positive semidefinite). For a matrix M with all real eigenvalues, we denote its eigenvalues by λᵢ(M), i = 1, 2, ..., n, in increasing order, and its smallest and largest eigenvalues by λ_min(M) = λ₁(M) and λ_max(M) = λₙ(M), respectively. Moreover, the spectral condition number of a symmetric matrix M is denoted by cond(M) = λ_max(M)/λ_min(M). The spectral radius of M is denoted by ρ(M) := max{|λᵢ(M)| : i = 1, 2, ..., n}. Given G and H in R^{m×n}, the inner product between them is defined as G • H := Tr(GᵀH), where Tr(M) is the trace of M ∈ R^{n×n}. For M ∈ R^{n×n}, ‖M‖_F and ‖M‖ denote the Frobenius norm and the matrix 2-norm, respectively.
Some matrix analysis results will play important roles in this chapter. The following formulas involving the matrix trace (see [54, 109]) will be used repeatedly.
Lemma 12.1 Let A, B, X ∈ R^{n×n}. Then, we have
Tr(A) = Tr(Aᵀ),   (12.1)
Tr(A + B) = Tr(A) + Tr(B),   (12.2)
Tr(AB) = Tr(BA),   (12.3)
∂/∂X Tr(AXᵀ) = A;  moreover, if X ∈ Sⁿ, then ∂/∂X Tr(AX) = A.   (12.4)
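As a sanity check of Lemma 12.1 (an added illustration assuming NumPy; not part of the original text), the trace identities (12.1)-(12.3) can be verified on random matrices, and (12.4) can be checked entrywise by finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# (12.1)-(12.3): basic trace identities
assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# (12.4): d/dX Tr(A X^T) = A, checked entrywise by central differences
X = rng.standard_normal((n, n))
eps = 1e-6
grad = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(X); E[i, j] = eps
        grad[i, j] = (np.trace(A @ (X + E).T) - np.trace(A @ (X - E).T)) / (2 * eps)
assert np.allclose(grad, A)
```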
The Kronecker product of an m × n matrix A and a p × q matrix B is the mp × nq matrix A ⊗ B defined as
A ⊗ B = ⎡ a₁₁B  a₁₂B  ...  a₁ₙB ⎤
        ⎢ a₂₁B  a₂₂B  ...  a₂ₙB ⎥
        ⎢  ...    ...   ...   ...  ⎥
        ⎣ aₘ₁B  aₘ₂B  ...  aₘₙB ⎦.   (12.5)
Some Kronecker product identities are useful; these are listed below. The proofs can be found in [54, 109].
Lemma 12.2 Let A, B, C, and D be matrices with appropriate dimensions. Then, we have
(A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ,   (12.6)
(A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹,   (12.7)
(A ⊗ B)(C ⊗ D) = AC ⊗ BD,   (12.8)
Tr(A ⊗ B) = Tr(A)Tr(B) = Tr(Λ_A ⊗ Λ_B),   (12.9)
eig(A ⊗ B) = eig(A) ⊗ eig(B),   (12.10)
where Λ_A denotes the diagonal matrix with the eigenvalues of A.
Corollary 12.1 The Kronecker product A ⊗ B is positive semidefinite if and only if both A and B are positive semidefinite or both are negative semidefinite.
Proof 12.1 In view of the last formula of Lemma 12.2, the claim follows from the fact that, given the eigenvalues λ₁, ..., λₙ of A and σ₁, ..., σₙ of B, the eigenvalues of A ⊗ B are λᵢσⱼ for all i, j. This completes the proof.
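The identities of Lemma 12.2 and Corollary 12.1 can also be verified numerically. The sketch below is an added illustration assuming NumPy (not from the original text); it uses symmetric positive definite A and B so that the eigenvalue and positive semidefiniteness claims apply.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)); A = A @ A.T + 3 * np.eye(3)   # symmetric positive definite
B = rng.standard_normal((2, 2)); B = B @ B.T + 2 * np.eye(2)   # symmetric positive definite
C = rng.standard_normal((3, 3))
D = rng.standard_normal((2, 2))

K = np.kron(A, B)
assert np.allclose(K.T, np.kron(A.T, B.T))                                          # (12.6)
assert np.allclose(np.linalg.inv(K), np.kron(np.linalg.inv(A), np.linalg.inv(B)))   # (12.7)
assert np.allclose(K @ np.kron(C, D), np.kron(A @ C, B @ D))                        # (12.8)
assert np.isclose(np.trace(K), np.trace(A) * np.trace(B))                           # (12.9)

# (12.10) and Corollary 12.1: eigenvalues of A (x) B are all products lambda_i(A)*sigma_j(B),
# so A (x) B is positive definite here because A and B are.
eigK = np.sort(np.linalg.eigvalsh(K))
prods = np.sort(np.outer(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)).ravel())
assert np.allclose(eigK, prods)
assert eigK.min() > 0
```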
The vector operator applied to a matrix A, denoted by vec(A), stacks the columns one by one from left to right into a vector. For example, a 2 × 2 matrix A and vec(A) can be expressed as
A = ⎡ a₁₁  a₁₂ ⎤ ,   vec(A) = [a₁₁, a₂₁, a₁₂, a₂₂]ᵀ.
    ⎣ a₂₁  a₂₂ ⎦
The Kronecker product and the vector operator are related by the following lemma, which can be found in [54, 109].
Lemma 12.3 Let A, B, and X be matrices with appropriate dimensions, let a and b be vectors with appropriate dimensions, and let α be a scalar. Then, the following equations hold:
vec(AXB) = (Bᵀ ⊗ A)vec(X),   (12.11)
Tr(AᵀB) = vec(A)ᵀvec(B),   (12.12)
vec(A + B) = vec(A) + vec(B),   (12.13)
vec(αA) = α · vec(A),   (12.14)
aᵀXBXᵀb = vec(X)ᵀ(B ⊗ baᵀ)vec(X).   (12.15)
Proof 12.2 We prove only (12.12) and leave the rest to the reader. Let A = [a₁, a₂, ..., aₙ] ∈ R^{n×n} and B = [b₁, b₂, ..., bₙ] ∈ R^{n×n}, where aᵢ and bᵢ denote the columns. Then, we have
(vec(A))ᵀvec(B) = [a₁ᵀ, a₂ᵀ, ..., aₙᵀ][b₁; b₂; ...; bₙ] = Σᵢ₌₁ⁿ aᵢᵀbᵢ = Tr([a₁ᵀ; ...; aₙᵀ][b₁, ..., bₙ]) = Tr(AᵀB).   (12.16)
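A small numerical check of (12.11) and (12.12) follows; this is an added illustration assuming NumPy, with vec implemented in column-major (Fortran) order to match the definition above.

```python
import numpy as np

def vec(M):
    """Stack the columns of M from left to right (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

# (12.11): vec(A X B) = (B^T kron A) vec(X)
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))
# (12.12): Tr(A^T B) = vec(A)^T vec(B)
assert np.isclose(np.trace(A.T @ B), vec(A) @ vec(B))
```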
Proposition 12.1 Let vectors u, v, r ∈ Rⁿ and nonsingular matrices E, F ∈ R^{n×n} satisfy Eu + Fv = r. If S = FEᵀ is symmetric positive definite, then
D = S^{-1/2}F = S^{1/2}E^{-T},  D^{-T} = S^{-1/2}E = S^{1/2}F^{-T},   (12.17)
and
‖D^{-T}u‖² + ‖Dv‖² + 2uᵀv = ‖S^{-1/2}r‖².   (12.18)
Proof 12.3 First, the following relations are equivalent:
S = FEᵀ  ⟺  SE^{-T} = F  ⟺  S^{1/2}E^{-T} = S^{-1/2}F := D.
Similarly, we have the following equivalent relations:
S⁻¹ = E^{-T}F⁻¹ = F^{-T}E⁻¹  ⟺  S⁻¹E = F^{-T}  ⟺  S^{-1/2}E = S^{1/2}F^{-T}.
Using D^{-T} = (S^{-1/2}F)^{-T} = (FᵀS^{-1/2})⁻¹ = S^{1/2}F^{-T}, we get (12.17). Pre-multiplying both sides of Eu + Fv = r by S^{-1/2}, using (12.17) (noticing that uᵀEᵀS⁻¹Fv = uᵀv), and taking the squared 2-norm shows (12.18).
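Proposition 12.1 can be verified numerically as well. In the sketch below (an added illustration assuming NumPy and SciPy), S is built as an arbitrary symmetric positive definite matrix and F is chosen as F = S E^{-T} so that F Eᵀ = S holds by construction; all names are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(4)
n = 5
E = rng.standard_normal((n, n)) + n * np.eye(n)   # nonsingular E
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)                       # symmetric positive definite S
F = S @ np.linalg.inv(E).T                        # then F E^T = S by construction

S_half = np.real(sqrtm(S))
S_inv_half = np.linalg.inv(S_half)

D = S_inv_half @ F
assert np.allclose(D, S_half @ np.linalg.inv(E).T)               # (12.17), first identity
assert np.allclose(S_inv_half @ E, S_half @ np.linalg.inv(F).T)  # (12.17), second identity

u, v = rng.standard_normal(n), rng.standard_normal(n)
r = E @ u + F @ v
D_invT = S_inv_half @ E                                           # D^{-T}
lhs = np.linalg.norm(D_invT @ u) ** 2 + np.linalg.norm(D @ v) ** 2 + 2 * u @ v
assert np.isclose(lhs, np.linalg.norm(S_inv_half @ r) ** 2)       # (12.18)
```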
12.2 A Long Step Feasible SDP Arc-Search Algorithm
Let Sⁿ be the set of real symmetric n × n matrices. We consider the SDP in standard form, which is given as
min_{X∈Sⁿ} C • X,  s.t.  Aᵢ • X = bᵢ, i = 1, ..., m,  X ⪰ 0,   (12.19)
where C ∈ Sⁿ, Aᵢ ∈ Sⁿ for i = 1, ..., m, and b = (b₁, ..., bₘ) ∈ Rᵐ are given, and X ∈ Sⁿ with X ⪰ 0 is the primal variable to be optimized. Because of the symmetry restriction, there are n(n + 1)/2 free variables (not n² variables) in X. The dual problem of (12.19) is given by
max_{y∈Rᵐ, S∈Sⁿ} bᵀy,  s.t.  Σᵢ₌₁ᵐ yᵢAᵢ + S = C,  S ⪰ 0,   (12.20)
where y = [y₁, ..., yₘ]ᵀ ∈ Rᵐ and S ∈ Sⁿ with S ⪰ 0 are the dual variables. Again, there are n(n + 1)/2 free variables in S because of the symmetry restriction. The set of primal-dual feasible solutions is denoted by
F := {(X, y, S) ∈ Sⁿ × Rᵐ × Sⁿ : Aᵢ • X = bᵢ, Σᵢ₌₁ᵐ yᵢAᵢ + S = C, X, S ⪰ 0},   (12.21)
and the set of primal-dual strictly feasible solutions is
F⁰ := {(X, y, S) ∈ Sⁿ × Rᵐ × Sⁿ : Aᵢ • X = bᵢ, Σᵢ₌₁ᵐ yᵢAᵢ + S = C, X, S ≻ 0}.   (12.22)
Throughout this chapter, we also make two standard assumptions.
Assumptions:
1. F⁰ ≠ ∅.
2. The matrices Aᵢ, i = 1, ..., m, are linearly independent.
The first assumption makes sure that the SDP problem has an optimal solution. The second assumption guarantees that the symmetric first- and second-order derivatives have solutions (see Proposition 12.2). The KKT conditions for the positive semidefinite problem are given by
Aᵢ • X = Tr(AᵢX) = bᵢ, i = 1, ..., m,  X ⪰ 0,   (12.23a)
Σᵢ₌₁ᵐ yᵢAᵢ + S = C,  S ⪰ 0,   (12.23b)
XS = 0.   (12.23c)
The KKT system of equations has a total of n(n + 1) + m variables.
Denote the duality measure for the SDP as
µ = X • S/n = Tr(XS)/n.   (12.24)
The perturbed KKT conditions for the positive semidefinite problem are, therefore, given by
Aᵢ • X = bᵢ, i = 1, ..., m,  X ⪰ 0,   (12.25a)
Σᵢ₌₁ᵐ yᵢAᵢ + S = C,  S ⪰ 0,   (12.25b)
XS = µI.   (12.25c)
(12.27)
˙ and S˙ ! gives symmetric X n×n and H(M) = � Let usT �consider a little more general case M ∈ R 1 M + M . The following simple observation is useful for the future analysis: 2 for any scalar α and matrices M, M1 , M2 ∈ Rn×n , it is easy to verify H(M) = (H(M))T , H(αM) = αH(M), H(M1 + M2 ) = H(M1 ) + H(M2 ).
(12.28)
(12.29)
(12.30)
Zhang [173] introduced a more general similarly transformed symmetrization operator � 1� HP (M) = PMP−1 + (PMP−1 )T . (12.31) 2
For the special case P = I, we define H_P(M) = H_I(M) = H(M); the latter notation is used for this special case.
Lemma 12.4 For any scalar α and matrices M, M₁, M₂ ∈ R^{n×n}, it is easy to verify
H_P(M) = (H_P(M))ᵀ,   (12.32)
H_P(αM) = αH_P(M),   (12.33)
H_P(M₁ + M₂) = H_P(M₁) + H_P(M₂),   (12.34)
Tr(H_P(M)) = Tr(M).   (12.35)
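For concreteness, a direct implementation of H(·) and of the operator H_P(·) from (12.31) is sketched below (an added illustration assuming NumPy; not the book's code), together with numerical checks of the properties (12.32)-(12.35) and of the reduction H_I = H.

```python
import numpy as np

def H(M):
    """Plain symmetrization H(M) = (M + M^T) / 2."""
    return 0.5 * (M + M.T)

def HP(M, P):
    """Similarly transformed symmetrization HP(M) = (P M P^{-1} + (P M P^{-1})^T) / 2."""
    PMP = P @ M @ np.linalg.inv(P)
    return 0.5 * (PMP + PMP.T)

rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n))
P = rng.standard_normal((n, n)); P = P @ P.T + n * np.eye(n)   # a symmetric positive definite P

out = HP(M, P)
assert np.allclose(out, out.T)                       # (12.32): HP(M) is symmetric
assert np.allclose(HP(2.5 * M, P), 2.5 * out)        # (12.33): homogeneity
assert np.isclose(np.trace(out), np.trace(M))        # (12.35): trace preservation
assert np.allclose(HP(M, np.eye(n)), H(M))           # P = I reduces HP to H
```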
Therefore, the modified perturbed KKT conditions, which guarantee symmetric Ẋ and Ṡ, can be rewritten as follows:
Aᵢ • Ẋ = 0, i = 1, ..., m,   (12.36a)
Σᵢ₌₁ᵐ ẏᵢAᵢ + Ṡ = 0,   (12.36b)
H_P(XṠ + ẊS) = H_P(XS) − σµI,   (12.36c)
where σ ∈ [0, 1] is the centering parameter. Similarly, we have the second-order equations:
Aᵢ • Ẍ = 0, i = 1, ..., m,   (12.37a)
Σᵢ₌₁ᵐ ÿᵢAᵢ + S̈ = 0,   (12.37b)
H_P(XS̈ + ẌS) = −2H_P(ẊṠ).   (12.37c)
It is important to emphasize that (12.36) and (12.37) guarantee that Ẋ, Ṡ, Ẍ and S̈ are all symmetric, which in turn guarantees that Xᵏ and Sᵏ will be symmetric. By taking extra care in the iteration, we can prove that they are actually positive definite in all iterations. Let P be a nonsingular matrix, and define
X̂ = PXP,  \dot{\hat{X}} = PẊP,  Ŝ = P⁻¹SP⁻¹,  \dot{\hat{S}} = P⁻¹ṠP⁻¹.   (12.38)
Then, we have
H_P(XṠ + ẊS) = ½[P(XṠ + ẊS)P⁻¹ + P^{-T}(ṠX + SẊ)Pᵀ]
= ½[PXP·P⁻¹ṠP⁻¹ + PẊP·P⁻¹SP⁻¹ + P^{-T}ṠP^{-T}·PᵀXPᵀ + P^{-T}SP^{-T}·PᵀẊPᵀ]
= ½[X̂\dot{\hat{S}} + \dot{\hat{X}}Ŝ + (X̂\dot{\hat{S}} + \dot{\hat{X}}Ŝ)ᵀ]
= H(X̂\dot{\hat{S}} + \dot{\hat{X}}Ŝ).   (12.39)
This means that H_P(XṠ + ẊS) is equivalent to performing a scaling first and then performing the symmetrization. Therefore, the last equation of (12.36) can be rewritten as
H(X̂\dot{\hat{S}} + \dot{\hat{X}}Ŝ) = H(X̂Ŝ) − σµI.   (12.40)
Similarly, the last equation of (12.37) can be rewritten as
H(X̂\ddot{\hat{S}} + \ddot{\hat{X}}Ŝ) = −2H(\dot{\hat{X}}\dot{\hat{S}}).   (12.41)
Applying the vector operator (see Lemma 12.3) to (12.40) gives
Ê vec(\dot{\hat{X}}) + F̂ vec(\dot{\hat{S}}) = vec(H(X̂Ŝ) − σµI),   (12.42)
where
Ê ≡ ½(Ŝ ⊗ I + I ⊗ Ŝ),  F̂ ≡ ½(X̂ ⊗ I + I ⊗ X̂).   (12.43)
In view of Corollary 12.1, it follows that Ê ∈ S₊₊^{n²} if S ≻ 0 (Ê ∈ S₊^{n²} if S ⪰ 0) and F̂ ∈ S₊₊^{n²} if X ≻ 0 (F̂ ∈ S₊^{n²} if X ⪰ 0). Applying the vector operator to (12.41) gives
Ê vec(\ddot{\hat{X}}) + F̂ vec(\ddot{\hat{S}}) = −2vec(H(\dot{\hat{X}}\dot{\hat{S}})).   (12.44)
Several popular selections of P are: (a) P = I proposed by Alizadeh, Haeberly, and Overton [4] (denoted as AHO), (b) P = S1/2 proposed by Helmberg, Randl, Vanderbei, and Wolkowicz [52], Kojima, Shida, and Shindoh [73], and Menteiro [94] (denoted as HRVW/KSS/M), (c) P = X−1/2 proposed by Kojima, Shida, 1/2 and Shindoh [73], and Menteiro [94] (denoted as KSS/M), and (d) P = Wnt proposed by Nesterov and Todd [102, 103] (denoted as NT), where Wnt := S1/2 (S1/2 XS1/2 )−1/2 S1/2 = X−1/2 (X1/2 SX1/2 )1/2 X−1/2 . ˙ and generate symmetric All these selections meet the definition of HP (XS˙ + XS) ˙ ˙ X and S. To make the convergence analysis easier to handle, we want to select P ˆ and Sˆ are commutative, i.e., such that the matrices X ˆ Sˆ = Sˆ X ˆ. X
(12.45)
We denote the set of P that meets the condition (12.45) as
P(X, S) = {P ∈ S₊₊ⁿ : X̂Ŝ = ŜX̂}.   (12.46)
It is easy to see that this set is not empty, as the HRVW/KSS/M, KSS/M, and NT scaling matrices are all in the set defined in (12.46); however, the AHO scaling (P = I) is not. Using the HRVW/KSS/M scaling P = S^{1/2} as an example, we have
X̂ = S^{1/2}XS^{1/2},  Ŝ = S^{-1/2}SS^{-1/2} = I,
and therefore X̂Ŝ = ŜX̂. The following lemma will be used later on.
Lemma 12.5 If X̂Ŝ = ŜX̂ holds, then we have ÊF̂ = F̂Ê.
Proof 12.4 It is straightforward to verify that
ÊF̂ = ¼(Ŝ ⊗ I + I ⊗ Ŝ)(X̂ ⊗ I + I ⊗ X̂) = ¼(ŜX̂ ⊗ I + Ŝ ⊗ X̂ + X̂ ⊗ Ŝ + I ⊗ ŜX̂),
F̂Ê = ¼(X̂ ⊗ I + I ⊗ X̂)(Ŝ ⊗ I + I ⊗ Ŝ) = ¼(X̂Ŝ ⊗ I + X̂ ⊗ Ŝ + Ŝ ⊗ X̂ + I ⊗ X̂Ŝ).
Since X̂Ŝ = ŜX̂ holds, we have ÊF̂ = F̂Ê.
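The NT scaling can be computed directly from its definition. The following sketch (an added illustration assuming NumPy and SciPy; not the book's code) forms P = W_{nt}^{1/2} and checks that X̂ = Ŝ, so that in particular X̂ and Ŝ commute and P belongs to P(X, S).

```python
import numpy as np
from scipy.linalg import sqrtm

def nt_scaling(X, S):
    """NT scaling: W = X^{-1/2}(X^{1/2} S X^{1/2})^{1/2} X^{-1/2}; the scaling matrix is P = W^{1/2}."""
    X_half = np.real(sqrtm(X))
    X_inv_half = np.linalg.inv(X_half)
    W = X_inv_half @ np.real(sqrtm(X_half @ S @ X_half)) @ X_inv_half
    return np.real(sqrtm(W))

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n)); X = A @ A.T + n * np.eye(n)   # X > 0
B = rng.standard_normal((n, n)); S = B @ B.T + n * np.eye(n)   # S > 0

P = nt_scaling(X, S)
P_inv = np.linalg.inv(P)
X_hat = P @ X @ P
S_hat = P_inv @ S @ P_inv

# For the NT scaling, X_hat = S_hat, hence X_hat S_hat = S_hat X_hat.
assert np.allclose(X_hat, S_hat)
assert np.allclose(X_hat @ S_hat, S_hat @ X_hat)
```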
In the remaining discussion, we assume that NT scaling is used. Therefore, Ê and F̂ are commutative. Now, we are ready to discuss the arc-search technique for the aforementioned SDP problem. Let us consider the modified central path
Aᵢ • X = bᵢ, i = 1, ..., m,  X ⪰ 0,   (12.47a)
Σᵢ₌₁ᵐ yᵢAᵢ + S = C,  S ⪰ 0,   (12.47b)
H_P(XS) = µI.   (12.47c)
The following lemma [173] shows that the central path (12.25) and the modified central path (12.47) are equivalent.
Lemma 12.6 For M ∈ R^{n×n} with real spectrum, nonsingular P ∈ R^{n×n}, and a scalar µ,
H_P(M) = µI  ⇔  M = µI.   (12.48)
Proof 12.5 Suppose that the equality on the left holds. We must have
PMP⁻¹ = µI + G
for some skew-symmetric matrix G. Since the spectrum is real, we must have G = 0; otherwise M would have an eigenvalue µ + φₖ(G) for a nonzero pure imaginary number φₖ(G), contradicting the realness assumption on the spectrum of M. The converse is obvious.
The algorithm to be discussed restricts iterates to the negative infinity neighborhood of the central path, defined by
N₋∞(γ) = {(X, y, S) ∈ F⁰ : λ_min(XS) ≥ γµ},   (12.49)
where γ ∈ (0, 1) is a constant independent of n. Since
λ_min(XS) = λ_min(PXPP⁻¹SP⁻¹) = λ_min(X̂Ŝ),
this negative infinity neighborhood definition works for both the original and the modified central paths. Following the idea in Chapter 5, we will use an ellipse E(α) to approximate the central path C, described by
E(α) = {(X(α), y(α), S(α)) : (X(α), y(α), S(α)) = cos(α)\vec{a} + sin(α)\vec{b} + \vec{c}},   (12.50)
where \vec{a}, \vec{b} ∈ Sⁿ × Rᵐ × Sⁿ are the axes of the ellipse, and \vec{c} ∈ Sⁿ × Rᵐ × Sⁿ is its center. From the first two rows of equations (12.36) and (12.37), it is obvious that
Ẋ • Ṡ = 0,  Ẍ • Ṡ = 0,  Ẋ • S̈ = 0,  Ẍ • S̈ = 0.   (12.51)
Let α ∈ [0, π/2]. The following lemma provides the formulas of the ellipsoidal approximation of the central path C.
Lemma 12.7 Let E(α) be the ellipse defined in (12.50) which passes through the point (X, y, S). Moreover, assume that the first and second derivatives of the central path (Ẋ, ẏ, Ṡ) and (Ẍ, ÿ, S̈) at (X, y, S) satisfy (12.36) and (12.37). Then, an ellipsoidal approximation of the (modified) central path is given by
X(α) = X − sin(α)Ẋ + (1 − cos(α))Ẍ,   (12.52)
S(α) = S − sin(α)Ṡ + (1 − cos(α))S̈,   (12.53)
y(α) = y − sin(α)ẏ + (1 − cos(α))ÿ.   (12.54)
Proof 12.6 Similar to the proof in Chapter 5, let
Z(α) = (X(α), y(α), S(α)) = cos(α)\vec{a} + sin(α)\vec{b} + \vec{c}.   (12.55)
Then, taking derivatives twice at Z(α₀) gives
Ż(α) = (Ẋ(α), ẏ(α), Ṡ(α)) = −sin(α)\vec{a} + cos(α)\vec{b},   (12.56)
Z̈(α) = (Ẍ(α), ÿ(α), S̈(α)) = −cos(α)\vec{a} − sin(α)\vec{b}.   (12.57)
It is straightforward to verify the following relations from the results above:
\vec{a} = −sin(α)Ż − cos(α)Z̈,  \vec{b} = cos(α)Ż − sin(α)Z̈,  \vec{c} = Z + Z̈.   (12.58)
Denote \vec{a} = (a_X, a_y, a_S) = −sin(α)Ż − cos(α)Z̈; this gives
a_X = −sin(α)Ẋ − cos(α)Ẍ,  a_y = −sin(α)ẏ − cos(α)ÿ,  a_S = −sin(α)Ṡ − cos(α)S̈.
Similarly, denote \vec{b} = (b_X, b_y, b_S) = cos(α)Ż − sin(α)Z̈; this gives
b_X = cos(α)Ẋ − sin(α)Ẍ,  b_y = cos(α)ẏ − sin(α)ÿ,  b_S = cos(α)Ṡ − sin(α)S̈;
and denote \vec{c} = (c_X, c_y, c_S) = Z + Z̈; this gives
c_X = X + Ẍ,  c_y = y + ÿ,  c_S = S + S̈.
Let X(α) and S(α) be the updated X and S after the search. Since X = X(α₀) = cos(α₀)a_X + sin(α₀)b_X + c_X, we have
X(α) = cos(α₀ − α)a_X + sin(α₀ − α)b_X + c_X
= [cos(α₀)cos(α) + sin(α₀)sin(α)]a_X + [sin(α₀)cos(α) − cos(α₀)sin(α)]b_X + c_X − cos(α)c_X + cos(α)c_X
= cos(α)X + sin(α₀)sin(α)a_X − cos(α₀)sin(α)b_X + (1 − cos(α))c_X
= cos(α)X − [sin(α₀)Ẋ + cos(α₀)Ẍ]sin(α₀)sin(α) − [cos(α₀)Ẋ − sin(α₀)Ẍ]cos(α₀)sin(α) + (1 − cos(α))(X + Ẍ)
= X − [sin²(α₀) + cos²(α₀)]sin(α)Ẋ + [−sin(α₀)cos(α₀)sin(α) + sin(α₀)cos(α₀)sin(α) + 1 − cos(α)]Ẍ
= X − sin(α)Ẋ + (1 − cos(α))Ẍ.   (12.59)
Similarly, it follows that
S(α) = S − sin(α)Ṡ + (1 − cos(α))S̈,  y(α) = y − sin(α)ẏ + (1 − cos(α))ÿ.
This finishes the proof.
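The update formulas (12.52)-(12.54) translate directly into code. The helper below is an added illustration (assuming NumPy); it simply evaluates a point on the approximating ellipse for a given α, e.g. when testing candidate step sizes for (12.60).

```python
import numpy as np

def arc_point(X, y, S, Xdot, ydot, Sdot, Xddot, yddot, Sddot, alpha):
    """Ellipsoidal approximation (12.52)-(12.54) of the central path:
    Z(alpha) = Z - sin(alpha)*Zdot + (1 - cos(alpha))*Zddot."""
    s, g = np.sin(alpha), 1.0 - np.cos(alpha)
    return (X - s * Xdot + g * Xddot,
            y - s * ydot + g * yddot,
            S - s * Sdot + g * Sddot)

# At alpha = 0 the function returns the current iterate (X, y, S) unchanged.
```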
Define
sin(α̂) := max{sin(α) : (X(α), y(α), S(α)) ∈ N₋∞(γ), ∀α ∈ [0, π/2]}.   (12.60)
We are now in a position to describe the wide neighborhood interior-point algorithm with arc-search for the SDP problem.
Algorithm 12.1
Data: ε > 0, γ ∈ (0, 1), and σ ∈ (0, 1). Initial point: (X⁰, y⁰, S⁰) ∈ N₋∞(γ), and µ₀ = X⁰ • S⁰/n.
for iteration k = 0, 1, 2, ...
Step 1. If Xᵏ • Sᵏ ≤ ε, then stop.
Step 2. Solve the systems (12.36) and (12.37) to get (Ẋ, ẏ, Ṡ) and (Ẍ, ÿ, S̈).
Step 3. Compute sin(α̂ₖ) by (12.60).
Step 4. Calculate (Xᵏ⁺¹, yᵏ⁺¹, Sᵏ⁺¹) = (X(α̂ₖ), y(α̂ₖ), S(α̂ₖ)) and set µₖ₊₁ = Xᵏ⁺¹ • Sᵏ⁺¹/n.
Step 5. Set k := k + 1. Go back to Step 1.
end (for)
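A minimal skeleton of Algorithm 12.1 is sketched below (an added illustration assuming NumPy; not the book's implementation). The callables solve_derivatives and find_alpha are hypothetical placeholders for the linear algebra of (12.36)-(12.37) and the step-size rule (12.60), which are problem-specific and not reproduced here.

```python
import numpy as np

def arc_search_sdp(X, y, S, solve_derivatives, find_alpha, eps=1e-8, max_iter=200):
    """Skeleton of Algorithm 12.1: solve_derivatives is assumed to return
    (Xdot, ydot, Sdot, Xddot, yddot, Sddot) from (12.36)-(12.37), and find_alpha
    is assumed to return alpha_hat satisfying (12.60); both are placeholders."""
    for k in range(max_iter):
        if np.trace(X @ S) <= eps:                     # Step 1: stop when X . S <= eps
            break
        mu = np.trace(X @ S) / X.shape[0]              # duality measure (12.24)
        Xd, yd, Sd, Xdd, ydd, Sdd = solve_derivatives(X, y, S, mu)    # Step 2
        alpha = find_alpha(X, y, S, Xd, yd, Sd, Xdd, ydd, Sdd)        # Step 3
        s, g = np.sin(alpha), 1.0 - np.cos(alpha)                     # Step 4: move along the ellipse
        X, y, S = X - s * Xd + g * Xdd, y - s * yd + g * ydd, S - s * Sd + g * Sdd
    return X, y, S
```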
The following lemma indicates that if the initial point satisfies the equality constraints in N−∞ (γ), then a search along the ellipse will also meet the equality constraints in N−∞ (γ).
Lemma 12.8 Let (X, y, S) be a strictly feasible point of the primal SDP (12.19) and the dual SDP (12.20), let (Ẋ, ẏ, Ṡ) and (Ẍ, ÿ, S̈) satisfy (12.36) and (12.37), respectively, and let (X(α), y(α), S(α)) be calculated using Lemma 12.7. Then
Aᵢ • X(α) = bᵢ,  Σᵢ₌₁ᵐ yᵢ(α)Aᵢ + S(α) = C.
Proof 12.7 Since (X, y, S) is a strictly feasible point, from Lemma 12.7, (12.36), and (12.37), we have
Aᵢ • X(α) = Aᵢ • [X − sin(α)Ẋ + (1 − cos(α))Ẍ] = Aᵢ • X − sin(α)Aᵢ • Ẋ + (1 − cos(α))Aᵢ • Ẍ = bᵢ.
Similarly, we obtain
Σᵢ₌₁ᵐ yᵢ(α)Aᵢ + S(α) = C.
This completes the proof.
The next two lemmas derive a relation between the duality measure and the updated duality measure.
Lemma 12.9 Let (X, y, S) be a strictly feasible point of the primal SDP (12.19) and the dual (12.20), let (Ẋ, ẏ, Ṡ) and (Ẍ, ÿ, S̈) satisfy (12.36) and (12.37), respectively, and let (X(α), y(α), S(α)) be calculated using Lemma 12.7. Let g(α) := 1 − cos(α); then
H_P(X(α)S(α)) = (1 − sin(α))H_P(XS) + sin(α)σµI + H_P(χ(α)),   (12.61)
where
H_P(χ(α)) := −g²(α)H_P(ẊṠ) − sin(α)g(α)H_P(ẊS̈ + ẌṠ) + g²(α)H_P(ẌS̈).   (12.62)
Proof 12.8 From Lemma 12.7, we get
X(α)S(α) = [X − sin(α)Ẋ + (1 − cos(α))Ẍ][S − sin(α)Ṡ + (1 − cos(α))S̈]
= XS − sin(α)(XṠ + ẊS) + g(α)(XS̈ + ẌS) + sin²(α)ẊṠ − sin(α)g(α)(ẊS̈ + ẌṠ) + g²(α)ẌS̈.
•
281
Applying Lemma 12.4 (the linearity of HP (·)) to this equality, and using the last rows of (12.36) and (12.37) and the fact that sin2 (α) − 2g(α) = sin2 (α) − 2(1 − cos(α)) = sin2 (α) − 1 + 2 cos(α) − sin2 (α) − cos2 (α) = −(1 − cos(α))2 = −g2 (α), we obtain ˙ + g2 (α)HP (X ˙ + XS) ¨ S¨ ) HP (X(α)S(α)) = HP (XS) − sin(α)HP (XS 2 ˙ S˙ ) − sin(α)g(α)HP (X ˙ S¨ + X ¨ S˙ ) + sin (α)HP (X ¨ ¨ +g(α)HP (XS + XS) [use (12.36)] = (1 − sin(α))HP (XS) + sin(α)σ µI ˙ S¨ + X ¨ S˙ ) − sin(α)g(α)HP (X 2 2 ˙ ˙ S) + g (α)HP (X ¨ S¨ ) + sin (α)HP (X ¨ ¨ + XS) +g(α)HP (XS [use (12.37)] = (1 − sin(α))HP (XS) + sin(α)σ µI ˙ S¨ + X ¨ S˙ ) − sin(α)g(α)HP (X 2 ˙ ˙ ˙ S˙ ) + sin (α)HP (XS) − 2g(α)HP (X ¨ ¨ S) +g2 (α)HP (X = (1 − sin(α))HP (XS) + sin(α)σ µI ˙ S¨ + X ¨ S˙ ) − sin(α)g(α)HP (X 2 2 ˙ ˙ ¨ S¨ ) −g (α)HP (XS) + g (α)HP (X = (1 − sin(α))HP (XS) + sin(α)σ µI + HP (χ(α)), (12.63) which completes the proof. Lemma 12.10 ¨ be calculated from (12.36) and ˙ , y˙ , S˙ ) and (X ¨ , y¨ , S) Let (X, y, S) ∈ N−∞ (γ), (X (12.37). Let X(α) and S(α) be defined as in Lemma 12.7. Then, µ(α) = [1 − (1 − σ ) sin(α)]µ. Proof 12.9 Using the identity Tr(HP (M)) = Tr(M) in Lemma 12.4 and (12.61) in Lemma 12.9 and (12.51), we have X(α) • S(α) = Tr(X(α)S(α)) = Tr(HP (X(α)S(α)) = (1 − sin(α))Tr(HP (XS)) + sin(α)σ µn + Tr(HP (χ(α))) = (1 − sin(α))X • S + sin(α)σ µn ˙ • S¨ + X ¨ • S˙ ) − sin(α)g(α)(X
282
• Arc-Search Techniques for Interior-Point Methods
˙ • S˙ ) + g2 (α)(X ¨ • S¨ ) −g2 (α)(X = [1 − (1 − σ ) sin(α)]nµ.
(12.64) (12.65)
Dividing both sides by n yields µ(α) =
X(α) • S(α) = [1 − (1 − σ ) sin(α)]µ. n
This completes the proof.
To proceed, we need several technical lemmas, which are derived by Mon teiro and Zhang in [98]. We denote the eigenvalues of the matrix XS as λ1 ≤ λ2 ≤ . . . , ≤ λn , and λ [(XS)] be any eigenvalue of XS. Since λ [(XS)] = λ [(XS)T ] = ˆ Sˆ , and Sˆ X ˆ are similar to ei λ [(ST XT )] = λ [(SX)], and S1/2 XS1/2 , X1/2 SX1/2 , X ther XS or SX, they all have the same eigenvalues. In addition, we let Λ denote the diagonal matrix Λ = diag(λ1 , λ2 , . . . , λn ). Lemma 12.11 For any P ∈ P(X, S), there exists an orthogonal matrix QP and diagonal matrices Λ(X) and Λ(S) such that ˆ )QT . ˆ := PXP = QP Λ(X (i). X P (ii). Sˆ := P−1 SP−1 = QP Λ(Sˆ )QTP . ˆ )Λ(Sˆ ), and hence, X ˆ Sˆ = Sˆ X ˆ = QP ΛQT . (iii). Λ = Λ(X P n and S n , the commutativity of X ˆ ∈ S++ ˆ ∈ S++ ˆ and Sˆ Proof 12.10 Noticing that X ensures that the two matrices share a common set of orthogonal eigenvectors [54, Theorem 1.3.19], from which (i) and (ii) follow. Moreover, by (i) and (ii), we have ˆ )Λ(Sˆ )QT . Since the spectra XS and X ˆ Sˆ are the same, by permuting the ˆ Sˆ = QP Λ(X X P ˆ )Λ(Sˆ ), therefore, (iii) holds. columns of QP , if necessary, we have Λ = Λ(X
Lemma 12.12 Let P ∈ P(X, S) and G := Ê⁻¹F̂. Then
‖G^{-1/2}vec(\dot{\hat{X}})‖² + ‖G^{1/2}vec(\dot{\hat{S}})‖² + 2\dot{\hat{X}} • \dot{\hat{S}} = Σᵢ₌₁ⁿ (σµ − λᵢ)²/λᵢ.
Moreover, if λ_min(XS) ≥ γµ for some γ ∈ (0, 1), then
Σᵢ₌₁ⁿ (σµ − λᵢ)²/λᵢ ≤ (1 − 2σ + σ²/γ)nµ.   (12.66)
Proof 12.11 Pre-multiplying both sides of (12.42) by (F̂Ê)^{-1/2}, taking the squared 2-norm, noticing that Ê and F̂ are positive definite and commutative, and using (12.12), we get
‖(F̂Ê)^{-1/2}Ê vec(\dot{\hat{X}})‖² + ‖(F̂Ê)^{-1/2}F̂ vec(\dot{\hat{S}})‖² + 2\dot{\hat{X}} • \dot{\hat{S}} = ‖(F̂Ê)^{-1/2}vec(H(X̂Ŝ) − σµI)‖².
Since P ∈ P(X, S), F̂ and Ê are commutative, which implies that
(F̂Ê)^{-1/2}Ê = F̂^{-1/2}Ê^{-1/2}Ê = (Ê⁻¹F̂)^{-1/2} = G^{-1/2},  (F̂Ê)^{-1/2}F̂ = Ê^{-1/2}F̂^{-1/2}F̂ = (Ê⁻¹F̂)^{1/2} = G^{1/2}.
Hence, for the proof of the first statement it remains to show that
‖(F̂Ê)^{-1/2}vec(H(X̂Ŝ) − σµI)‖² = Σᵢ₌₁ⁿ (σµ − λᵢ)²/λᵢ.   (12.67)
Using (12.43), Lemma 12.11 (ii), and (12.10), we find the spectral decomposition of Ê to be
Ê = ½(Ŝ ⊗ I + I ⊗ Ŝ) = ½Q̂[Λ(Ŝ) ⊗ I + I ⊗ Λ(Ŝ)]Q̂ᵀ,
where Q̂ = Q_P ⊗ Q_P is an orthogonal matrix of dimension n². Similarly, by (12.43) and Lemma 12.11 (i), we have
F̂ = ½(X̂ ⊗ I + I ⊗ X̂) = ½Q̂[Λ(X̂) ⊗ I + I ⊗ Λ(X̂)]Q̂ᵀ.
Therefore, using Lemma 12.11 (iii), we obtain
(F̂Ê)⁻¹ = 4Q̂[Λ ⊗ I + I ⊗ Λ + Λ(X̂) ⊗ Λ(Ŝ) + Λ(Ŝ) ⊗ Λ(X̂)]⁻¹Q̂ᵀ,
where, from Lemma 12.2, the matrix in the middle is diagonal with the property that its ((i − 1)n + i)-th diagonal element is equal to 1/(4λᵢ), for i = 1, ..., n; in particular, for i = 1, the first diagonal element is equal to 1/(4λ₁). On the other hand, observe that
vec(X̂Ŝ − σµI) = Q̂ vec(Λ − σµI),
where vec(Λ − σµI) is an n²-vector having at most n nonzero components, namely, its ((i − 1)n + i)-th component is equal to λᵢ − σµ for i = 1, ..., n. The above two relations and a straightforward verification finally yield
vec(X̂Ŝ − σµI)ᵀ(ÊF̂)⁻¹vec(X̂Ŝ − σµI) = Σᵢ₌₁ⁿ (σµ − λᵢ)²/λᵢ,
which proves (12.67) and, hence, the first part of the lemma. To prove the second part of the lemma, we use the fact that nµ = Tr(XS) = Σᵢ₌₁ⁿ λᵢ in order to obtain
Σᵢ₌₁ⁿ (σµ − λᵢ)²/λᵢ ≤ σ²µ²n/λ₁ − 2σnµ + Σᵢ₌₁ⁿ λᵢ ≤ (σ²/γ)nµ − 2σnµ + nµ,
which completes the proof of the lemma.
The next technical lemma is as follows.
Lemma 12.13 Suppose that (X, y, S) ∈ S₊₊ⁿ × Rᵐ × S₊₊ⁿ, P ∈ S₊₊ⁿ, and Q ∈ P(X, S). Then
λ_min[H_P(XS)] ≤ λ_min[XS] = λ_min[H_Q(XS)].   (12.68)
Proof 12.12 Since Q ∈ P(X, S), we have
H_Q(XS) = ½[QXQQ⁻¹SQ⁻¹ + (QXQQ⁻¹SQ⁻¹)ᵀ] = ½(X̂Ŝ + ŜX̂) = X̂Ŝ = QXSQ⁻¹.
By similarity, λ_min[XS] = λ_min[QXSQ⁻¹] = λ_min[H_Q(XS)]. Moreover,
λ_min[XS] = λ_min[PXSP⁻¹] ≥ λ_min[H(PXSP⁻¹)] = λ_min[H_P(XS)],
where the inequality follows from the fact that the real part of the spectrum of a real matrix is contained between the smallest and largest eigenvalues of its Hermitian part (see, for example, p. 187 of [54]). We have, thus, shown that (12.68) holds.
The last technical lemma is given as follows.
Lemma 12.14 For any u, v ∈ Rⁿ and G ∈ S₊₊ⁿ, we have
‖u‖‖v‖ ≤ √(cond(G)) ‖G^{-1/2}u‖ ‖G^{1/2}v‖ ≤ (√(cond(G))/2)(‖G^{-1/2}u‖² + ‖G^{1/2}v‖²).   (12.69)
Proof 12.13 We have
‖u‖² ≤ uᵀG⁻¹u/λ_min(G⁻¹) = λ_max(G)‖G^{-1/2}u‖²
and
‖v‖² ≤ vᵀGv/λ_min(G) = ‖G^{1/2}v‖²/λ_min(G).
Combining these two inequalities gives
‖u‖‖v‖ ≤ √(cond(G)) ‖G^{-1/2}u‖ ‖G^{1/2}v‖ ≤ (√(cond(G))/2)(‖G^{-1/2}u‖² + ‖G^{1/2}v‖²),
where the last inequality follows from ab ≤ ½(a² + b²).
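Inequality (12.69) is straightforward to check numerically. The sketch below is an added illustration (assuming NumPy and SciPy) with a random symmetric positive definite G and random u, v.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(8)
n = 6
A = rng.standard_normal((n, n))
G = A @ A.T + 0.5 * np.eye(n)                 # symmetric positive definite G
G_half = np.real(sqrtm(G))
G_inv_half = np.linalg.inv(G_half)
cond_G = np.linalg.cond(G)                    # lambda_max / lambda_min for SPD G

u, v = rng.standard_normal(n), rng.standard_normal(n)
a = np.linalg.norm(G_inv_half @ u)            # ||G^{-1/2} u||
b = np.linalg.norm(G_half @ v)                # ||G^{1/2} v||

lhs = np.linalg.norm(u) * np.linalg.norm(v)
assert lhs <= np.sqrt(cond_G) * a * b + 1e-9
assert np.sqrt(cond_G) * a * b <= 0.5 * np.sqrt(cond_G) * (a ** 2 + b ** 2) + 1e-9
```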
The remaining proofs are provided by Zhang, Yuan, Zhou, Luo, and Huang in [170].
Lemma 12.15 Let a point (X, y, S) ∈ N₋∞(γ) and a scaling matrix P ∈ P(X, S) be given, and define G = Ê⁻¹F̂, where Ê and F̂ are given by equation (12.43). Then the solution (Ẋ, ẏ, Ṡ) of (12.36) satisfies
‖H_P(ẊṠ)‖_F ≤ (√(cond(G))/2) β₁µn,
and
‖G^{-1/2}vec(\dot{\hat{X}})‖ ≤ √(β₁µn),  ‖G^{1/2}vec(\dot{\hat{S}})‖ ≤ √(β₁µn),
where β₁ ≥ 1 ≥ 1 − 2σ + σ²/γ.
Proof 12.14 Pre-multiplying equation (12.42) by (Ê F̂)^{-1/2} and taking the squared norm on both sides, we have
‖G^{-1/2}vec(\dot{\hat{X}})‖² + ‖G^{1/2}vec(\dot{\hat{S}})‖² + 2\dot{\hat{X}} • \dot{\hat{S}} = ‖(F̂Ê)^{-1/2}vec(H(X̂Ŝ) − σµI)‖².
Since P ∈ P(X, S), F̂ and Ê are commutative, which implies that
(F̂Ê)^{-1/2}Ê = (Ê⁻¹F̂)^{-1/2} = G^{-1/2},  (F̂Ê)^{-1/2}F̂ = (Ê⁻¹F̂)^{1/2} = G^{1/2}.
From \dot{\hat{X}} • \dot{\hat{S}} = 0 and Lemma 12.12, we have
‖G^{-1/2}vec(\dot{\hat{X}})‖² + ‖G^{1/2}vec(\dot{\hat{S}})‖² ≤ (1 − 2σ + σ²/γ)nµ ≤ β₁µn.
By Lemma 12.14, we can obtain
‖\dot{\hat{X}}\dot{\hat{S}}‖_F ≤ ‖\dot{\hat{X}}‖_F‖\dot{\hat{S}}‖_F = ‖vec(\dot{\hat{X}})‖‖vec(\dot{\hat{S}})‖ ≤ (√(cond(G))/2)(‖G^{-1/2}vec(\dot{\hat{X}})‖² + ‖G^{1/2}vec(\dot{\hat{S}})‖²) ≤ (√(cond(G))/2) β₁µn.
Thus, we have
‖H_P(ẊṠ)‖_F = ‖H(\dot{\hat{X}}\dot{\hat{S}})‖_F = ‖½(\dot{\hat{X}}\dot{\hat{S}} + (\dot{\hat{X}}\dot{\hat{S}})ᵀ)‖_F ≤ ½(‖\dot{\hat{X}}\dot{\hat{S}}‖_F + ‖(\dot{\hat{X}}\dot{\hat{S}})ᵀ‖_F) = ‖\dot{\hat{X}}\dot{\hat{S}}‖_F ≤ (√(cond(G))/2) β₁µn,
and
‖G^{-1/2}vec(\dot{\hat{X}})‖ ≤ √(β₁µn),  ‖G^{1/2}vec(\dot{\hat{S}})‖ ≤ √(β₁µn).
This completes the proof.
Corollary 12.2 Let a point (X, y, S) ∈ N₋∞(γ), G ∈ S₊₊^{n²}, and let P ∈ P(X, S) be the NT scaling matrix. Then
‖H_P(ẊṠ)‖_F ≤ ½β₁µn.
Proof 12.15
Since P is the NT scaling matrix, we have (see also [134, 98])
W_{nt} X W_{nt} = X^{-1/2}(X^{1/2}SX^{1/2})^{1/2}X^{-1/2} X X^{-1/2}(X^{1/2}SX^{1/2})^{1/2}X^{-1/2} = S,
which is equivalent to
W_{nt}^{1/2} X W_{nt}^{1/2} = W_{nt}^{-1/2} S W_{nt}^{-1/2},
which means X̂ = PXP = P⁻¹SP⁻¹ = Ŝ and, consequently, Ê = F̂. This implies cond(G) = 1. In view of Lemma 12.15, the claim follows.
Lemma 12.16 Let a point (X, y, S) ∈ N₋∞(γ), let P ∈ P(X, S) be the NT scaling matrix, and define G = Ê⁻¹F̂, where Ê and F̂ are given by equation (12.43). Then the solution (Ẍ, ÿ, S̈) of (12.37) satisfies
‖H_P(ẌS̈)‖_F ≤ (1/8)β₂µn²,
and
‖G^{-1/2}vec(\ddot{\hat{X}})‖ ≤ ½√(β₂µn²),  ‖G^{1/2}vec(\ddot{\hat{S}})‖ ≤ ½√(β₂µn²),
where β₂ = β₁²/γ ≥ 1.
Proof 12.16 Pre-multiplying equation (12.44) by (F̂Ê)^{-1/2} and taking the squared norm on both sides, we have
‖G^{-1/2}vec(\ddot{\hat{X}})‖² + ‖G^{1/2}vec(\ddot{\hat{S}})‖² + 2\ddot{\hat{X}} • \ddot{\hat{S}} = 4‖(F̂Ê)^{-1/2}vec(H(\dot{\hat{X}}\dot{\hat{S}}))‖².
Since NT scaling is used, X̂ = Ŝ and, consequently, Ê = F̂, which implies cond(G) = 1. In addition, it has been shown in Lemma 12.12 that ρ((F̂Ê)⁻¹) ≤ 1/(4λ₁). Using Lemmas 12.14 and 12.15, we obtain
4‖(F̂Ê)^{-1/2}vec(H(\dot{\hat{X}}\dot{\hat{S}}))‖² ≤ 4‖(F̂Ê)^{-1/2}‖² ‖vec(\dot{\hat{X}}\dot{\hat{S}})‖² = 4ρ((F̂Ê)⁻¹)‖\dot{\hat{X}}\dot{\hat{S}}‖²_F ≤ 4·(1/(4λ₁))·¼β₁²µ²n² ≤ (1/(4γµ))β₁²µ²n² = ¼β₂µn²,
since ρ((F̂Ê)⁻¹) ≤ 1/(4λ₁). Since \ddot{\hat{X}} • \ddot{\hat{S}} = 0, we have
‖G^{-1/2}vec(\ddot{\hat{X}})‖² + ‖G^{1/2}vec(\ddot{\hat{S}})‖² ≤ ¼β₂µn².
From Lemma 12.14 and cond(G) = 1, we get
‖\ddot{\hat{X}}\ddot{\hat{S}}‖_F ≤ ‖\ddot{\hat{X}}‖_F‖\ddot{\hat{S}}‖_F = ‖vec(\ddot{\hat{X}})‖‖vec(\ddot{\hat{S}})‖ ≤ (√(cond(G))/2)(‖G^{-1/2}vec(\ddot{\hat{X}})‖² + ‖G^{1/2}vec(\ddot{\hat{S}})‖²) ≤ ½·¼β₂µn² = ⅛β₂µn².
Hence, it follows that
‖H_P(ẌS̈)‖_F = ‖H(\ddot{\hat{X}}\ddot{\hat{S}})‖_F = ‖½(\ddot{\hat{X}}\ddot{\hat{S}} + (\ddot{\hat{X}}\ddot{\hat{S}})ᵀ)‖_F ≤ ½(‖\ddot{\hat{X}}\ddot{\hat{S}}‖_F + ‖(\ddot{\hat{X}}\ddot{\hat{S}})ᵀ‖_F) = ‖\ddot{\hat{X}}\ddot{\hat{S}}‖_F ≤ ⅛β₂µn²,
and
‖G^{-1/2}vec(\ddot{\hat{X}})‖ ≤ ½√(β₂µn²),  ‖G^{1/2}vec(\ddot{\hat{S}})‖ ≤ ½√(β₂µn²).
The proof is completed.
Lemma 12.17 Assume that NT scaling is used. Let β₃ = √(β₁β₂) ≥ 1; then
‖H_P(ẊS̈)‖_F ≤ ½β₃µn^{3/2},  ‖H_P(ẌṠ)‖_F ≤ ½β₃µn^{3/2}.
Proof 12.17 Using Lemma 12.14 and cond(G) = 1, some straightforward computation yields
‖H_P(ẊS̈)‖_F = ‖H(\dot{\hat{X}}\ddot{\hat{S}})‖_F = ‖½(\dot{\hat{X}}\ddot{\hat{S}} + (\dot{\hat{X}}\ddot{\hat{S}})ᵀ)‖_F ≤ ‖\dot{\hat{X}}\ddot{\hat{S}}‖_F ≤ ‖vec(\dot{\hat{X}})‖‖vec(\ddot{\hat{S}})‖ ≤ ‖G^{-1/2}vec(\dot{\hat{X}})‖ ‖G^{1/2}vec(\ddot{\hat{S}})‖.
From Lemmas 12.15 and 12.16, it follows that ‖H_P(ẊS̈)‖_F ≤ ½β₃µn^{3/2}. Similarly, ‖H_P(ẌṠ)‖_F ≤ ½β₃µn^{3/2}. This completes the proof.
We need a result in [54, Theorem 4.3.27, page 194] in our proof, which is stated as the following lemma. Lemma 12.18 Let A, B ∈ S n , and let λi (A), λi (B), λi (A + B) denote the ith eigenvalues of A, B, and A + B arranged in increasing order, i.e., λ1 (A + B) = λmin (A + B). Then λ1 (A + B) ≥ λ1 (A) + λ1 (B).
Let sin(α̂₀) = 1/(βn^{3/4}), where β = (β₁ + β₂ + 2β₃)/(σ(1 − γ)), which will be used in the following lemmas.
Lemma 12.19 Let H_P(χ(α)) be defined as in (12.62), let α ∈ (0, α̂₀], and let P be the NT scaling matrix. Then we have
λ_min(H_P(χ(α))) ≥ −½ sin(α)(1 − γ)µσ.
•
289
Proof 12.18 We will use Lemmas 12.9, 12.15, 12.16, 12.17, the simple fact of g(α) ≤ sin2 (α) and λmin (·) is a homogeneous concave function on the space of sym metric matrices. From (12.62) and Lemma 12.18, we have � ˙ ˙ S)) λmin (HP (χ(α))) = λmin −g2 (α)(HP (X ˙ S¨ + X ¨ S˙ )) − sin(α)g(α)(HP (X � 2 ¨ ¨ +g (α)(HP (XS))
˙ S˙ )) ≥ −g2 (α)λmax (HP (X ˙ S¨ + X ¨ S˙ )) − sin(α)g(α)λmax (HP (X ¨ S¨ )) +g2 (α)λmin (HP (X 2 ˙ S˙ )IF ≥ −g (α)IHP (X ˙ S¨ + X ¨ S˙ )IF − sin(α)g(α)IHP (X ¨ S¨ )IF −g2 (α)IHP (X 2 ¨ F ˙ S˙ )IF − sin(α)g(α)IHP (X ˙ S)I ≥ −g (α)IHP (X 2 ¨ S˙ )IF − g (α)IHP (X ¨ S¨ )IF − sin(α)g(α)IHP (X � � 3 1 1 ≥ − sin4 (α)β1 µn + 2 sin3 (α)β3 µn 2 + sin4 (α)β2 µn2 2 4 � � 3 1 1 ≥ − sin(α)µ sin3 (αˆ 0 )β1 n + 2 sin2 (αˆ 0 )β3 n 2 + sin3 (αˆ 0 )β2 n2 2 4 � � 2β3 β2 1 β1 = − sin(α)µ + 1 5 + 2 2 β 4β 3 n 4 β 3n 4 1 1 ≥ − sin(α)µ [β1 + β2 + 2β3 ] 2 β
1
= − sin(α)(1 − γ)µσ , 2 where the last inequality follows from β =
β1 +β2 +2β3 σ (1−γ) .
(12.70) This finishes the proof.
Now, we are ready to present the main result of the analysis.
Lemma 12.20 Let (X, y, S) ∈ N₋∞(γ), α ∈ (0, α̂₀], and let sin(α̂) be defined by (12.60). Then (X(α), y(α), S(α)) ∈ N₋∞(γ) and sin(α̂) ≥ sin(α̂₀).
Proof 12.19 Combining (12.61), (12.62), and Lemmas 12.10, 12.13, 12.18, and 12.19, one has
λ_min(X(α)S(α))
[use Lemma 12.13] ≥ λ_min(H_P(X(α)S(α)))
[use (12.61)] = λ_min[(1 − sin(α))H_P(XS) + σµ sin(α)I + H_P(χ(α))]
[use Lemma 12.18] ≥ (1 − sin(α))λ_min(H_P(XS)) + σµ sin(α) + λ_min(H_P(χ(α)))
[use Lemma 12.13] = (1 − sin(α))λ_min(X̂Ŝ) + σµ sin(α) + λ_min(H_P(χ(α)))
[use Lemma 12.19] ≥ (1 − sin(α))γµ + σµ sin(α) − ½ sin(α)(1 − γ)σµ
[use Lemma 12.10] = γµ(α) + sin(α)(1 − γ)σµ − ½ sin(α)(1 − γ)σµ
≥ γµ(α) > 0.   (12.71)
This reveals that X(α)S(α) is nonsingular for all α ∈ [0, α̂₀]. By the continuity of the eigenvalues of a symmetric matrix [173] and X ≻ 0, S ≻ 0, it follows that X(α) ≻ 0 and S(α) ≻ 0 for all α ∈ [0, α̂₀]. From (12.49) and Lemma 12.8, we have (X(α), y(α), S(α)) ∈ N₋∞(γ). By (12.60), we obtain sin(α̂) ≥ sin(α̂₀). This completes the proof.
In the following theorem, we give an upper bound for the number of iterations required by Algorithm 12.1 to obtain an ε-approximate solution of (12.19) and (12.20).
Theorem 12.1 Let (X⁰, y⁰, S⁰) ∈ N₋∞(γ), sin(α̂₀) = 1/(βn^{3/4}), and β = (β₁ + β₂ + 2β₃)/(σ(1 − γ)). Then Algorithm 12.1 terminates in at most O(n^{3/4} log(X⁰ • S⁰/ε)) iterations.
Proof 12.20 Due to Lemma 12.10 and the inequality sin(α̂) ≥ sin(α̂₀), we have
µ(α̂) = [1 − (1 − σ)sin(α̂)]µ ≤ [1 − (1 − σ)sin(α̂₀)]µ = (1 − (1 − σ)/(βn^{3/4}))µ.
Thus, we obtain
Xᵏ • Sᵏ ≤ (1 − (1 − σ)/(βn^{3/4}))ᵏ X⁰ • S⁰.
Since we need Xᵏ • Sᵏ ≤ ε, it suffices to have
(1 − (1 − σ)/(βn^{3/4}))ᵏ X⁰ • S⁰ ≤ ε.   (12.72)
Taking logarithms, we obtain
k log(1 − (1 − σ)/(βn^{3/4})) ≤ log(ε/(X⁰ • S⁰)).
Using −log(1 − θ) ≥ θ for 0 < θ < 1, which is equivalent to Lemma 1.2, we conclude that (12.72) holds if k ≥ (βn^{3/4}/(1 − σ)) log(X⁰ • S⁰/ε). This completes the proof.
12.3 Concluding Remarks
Monteiro and Zhang [98] analyzed several popular interior-point algorithms, such as AHO [4], HRVW/KSS/M [52, 73, 94], KSS/M [73, 94], and NT [103]. They derived polynomial bounds, summarized here:
(a) O(n^{3/2} log(X⁰ • S⁰/ε)) for Algorithm HRVW/KSS/M;
(b) O(n^{3/2} log(X⁰ • S⁰/ε)) for Algorithm KSS/M;
(c) O(n log(X⁰ • S⁰/ε)) for Algorithm NT.
Clearly, the algorithm discussed in this chapter has a better polynomial complexity bound, which is O(n^{3/4} log(X⁰ • S⁰/ε)). Still, there are problems to be investigated and questions to be answered. For example, can we develop infeasible interior-point arc-search algorithms for the SDP problem? Can we find infeasible interior-point arc-search algorithms for the SDP problem with a lower polynomial bound? Computationally, are arc-search infeasible interior-point algorithms more efficient than traditional infeasible interior-point algorithms for SDP? To answer the last question, a software package has to be developed and compared to packages such as SDPT3 [136], SeDuMi [129], and SDPA [146].
Finally, there are quite a few papers on interior-point algorithms for nonlinear programming problems, for example, [13, 16, 17, 32, 33, 45, 78, 105, 131, 137, 140]. There should be many opportunities to extend the arc-search techniques to this broad research area. A very recent effort in this direction is described in [76].
References
[1] P. A. Absil, R. Mahony and R. Sepulchre, Optimization algorithms on matrix mani folds, Princeton University Press, Princeton, 2008. [2] F. Alizadeh, Combinatorial optimization with interior-point methods and semi-definite matrices, Ph.D. thesis, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1993. [3] F. Alizadeh, Interior-point methods in semidefinite programming with applications to combinatorial optimization, SIAM Journal on Optimization, 5 (1993), pp. 13-51. [4] F. Alizadeh, J. P. A. Haeberly and M. L. Overton, Complementarity and nondegen eracy in semidefinite programming, Mathematical Programming, 77 (1997), pp. 111 128. [5] A. Altman and J. Gondzio, Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization, Optimization Methods and Software, 11 (1999), pp. 275-302. [6] E. D. Andersen, Finding all linearly dependent rows in large-scale linear program ming, Optimization methods and software, 6 (1995), pp. 219-227. [7] E. D. Andersen and K. D. Andersen, Presolving in linear programming, Mathematical Programming, 71 (1993), pp. 221-245. [8] D. A. Bayer and J. C. Lagaris, The nonlinear geometry of linear programming, I. Affine and projective scaling trajectories, Transactions of the American Mathematical Society, 314 (1989), pp. 499-526. [9] D. P. Bertsekas, Projected Newton methods for optimization problems with simple con straints, SIAM Journal on Control and Optimization, 20 (1982), pp. 221-246. [10] A. B. Berkelaar, K. Roos and T. Terlaky, The optimal set and optimal partition ap proach to linear and quadratic programming, in Recent Advances in Sensitivity Anal ysis and Parametric Programming, H. Greenberg and T. Gal, eds., Kluwer Publishers Berlin, 1997. [11] R. Bland, D. Goldfarb and M. Todd, The ellipsoid method: A survey, Operations Re search, 29 (1981), pp. 1039-1091.
[12] A. L. Brearley, G. Mitra and H. P. Williams, Analysis of mathematical program ming problems prior to applying the simplex algorithm, Mathematical Programming, 8 (1975), pp. 54-83. [13] S. Browne, J. Dongarra, E. Grosse and T. Rowan, The Netlib mathematical software repository, DLib magazine, http://www.dlib.org/dlib/september95/netlib/09browne.html, Accessed 2 January, 2020. [14] S. Boyd, L. El Ghaoui, E. Feron and V. Balakrishnan, Linear matrix inequalities in sys tem and control theory, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1994. [15] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, Cambridge, UK, 1994. [16] R. H. Byrd, J. C. Gilbert and J. Nocedal, A trust region method based on interior point techniques for nonlinear programming, Mathematical Programming, 89 (2000), pp. 149-185. [17] R. H. Byrd, M. E. Hribar and J. Nocedal, An interior point algorithm for large-scale nonlinear programming, SIAM Journal on Optimization, 9 (1999), 877-900. [18] M. P. Do Carmo, Differential geometry of curves and surfaces, Prentice-Hall, New Jersey, 1976. [19] C. Cartis, Some disadvantages of a Mehrotra-type primal-dual corrector interior point algorithm for linear programming, Applied Numerical Mathematics, 59 (2009), pp. 1110-1119. [20] C. Cartis and N. I. M. Could, Finding a point in the relative interior of a polyhedron, Technical Report NA-07/01, Computing Laboratory, Oxford University, Oxford, UK, 2007. [21] R. W. Cottle, J. S. Pang and R. E. Stone, The linear complementarity problem, New York, Academic Press, 1992. [22] A. R. Curtis and J. K. Reid, On the automatic scaling of matrices for Gaussian elimi nation, IMA Journal of Applied Mathematics, 10 (1972), pp. 118-124. [23] J. Czyzyk, S. Mehrotra, M. Wagner and S. J. Wright, PCx user guide (version 1.1), Technical Report OTC 96/01, Optimization Technology Center, 1997. [24] G. B. Dantzig, Programming in a linear structure, Econometrica, 17 (1949), pp. 73-74. [25] G. B. Dantzig, Linear programming and extension, Princeton University Press, New Jersey, 1963. [26] T. A. Davis, Multifrontal multithreaded rank-revealing sparse QR factorization, Tech nical Report, Department of Computer and information Science and engineering, Uni versity of Florida, Florida, 2008. [27] I. I. Dikin, Iterative solution of problems of linear and quadratic programming, Soviet Mathematics Doklady, 8 (1967), pp. 674-675.
[28] J. Dobes, A modified Markowitz criterion for the fast modes of the LU factorization. Proceedings of 48th Midwest Symposium on Circuits and Systems, (2005), pp. 955 959. [29] E. D. Dolan and J. J. More, Benchmarking optimization software with performance profiles, Mathematical Programming, 91 (2002), pp. 201-213. [30] J. F. Duff, A. M. Erisman and J. K. Reid, Direct method for sparse matrices, Oxford University Press, New York, 1989. [31] A. Edelman, T. A. Arias and S. T. Smith, The Geometry of algorithms with orthogo nality of constraints, SIAM J. Matrix Anal. Appl, 20 (1998), 303-353. [32] A. S. El-Bakry, R. A. Tapia, T. Tsuchiya and Y. Zhang, On the formulation and theory of the Newton interior-point method for nonlinear programming, Journal of Optimiza tion Theory and Applications, 89 (1996), pp. 507-541, [33] A. Forsgren and P. E. Gill, Primal-dual interior methods for nonconvex nonlinear programming, SIAM Journal on Optimization, 8 (1998), pp. 1132-1152. [34] J. Ekefer, Sequential minimax search for a maximum, Proceedings of the American Mathematical Society, 4 (1953), pp. 502-506. [35] D. Herbison-Evans, Solving quartics and cubics for graphics, Technical Report TR94 487, Basser Department of Computer Science, University of Sydney, Sydney, Aus tralia, 1994. [36] A. V. Fiacco and G. P. McCormick, Nonlinear programming: Sequential unconstrined minimization techniques, Wiley, New York, 1968. [37] O. Friedmann, A subexponential lower bound for Zadeh’s pivoting rule for solving linear programs and games, in Integer Programming and Combinatoral Optimization 2011, Lecture Notes in Computer Science 6655, O. Gunluk and G. J. Woeginger, eds., Springer, Berlin, 2011, pp. 192-206. [38] K. R. Frisch, The logarithmic potential method of convex programming, Technical Report, University Institute of Economics, Oslo, Norway, 1955. [39] D. Gabay, Minimizing a differentiable function over a differentiable manifold, Journal of Optimization Theory and Applications, 37 (1982), pp. 177-219. [40] A. Goldman and A. Tucker, Theory of linear programming, in Linear equalities and related systems, H. Kuhn and A. Tucker, eds., Princeton University Press, Princeton, 1956, pp. 53-97. [41] J. Gondzio, Multiple centrality corrections in a primal-dual method for linear pro gramming, Computational Optimization and Applications, 6 (1994), pp. 137-156 [42] C. C. Gonzaga, Polynomial affine algorithms for linear programming, Mathematical Programming 49 (1990), pp. 7-21. [43] C. T. L. S. Ghidini, A. R. L. Oliveira, J. Silvab and M. I. Velazco, Combining a hybrid preconditioner and a optimal adjustment algorithm to accelerate the convergence of interior point methods, Linear Algebra and its Applications, 436 (2012), pp. 1267 1284.
[44] P. E. Gill, W. Murray, M. A. Saunders, J. A. Tomlin and M. H. Wright, On projected Newton barrier methods for linear programming and an equivalence of Karmarkar’s projective method, Mathematical Programming, 36 (1986), pp. 183-209. [45] N. I. M. Gould, D. Orban and P. L. Toint, CUTEst: A constrained and unconstrained testing environment with safe threads for mathematical optimization, Computational Optimization and Applications, 60 (2015), pp. 545–557. [46] N. I. M. Gould and J. Scott, A note on performance profiles for benchmarking software, ACM Transactions on Mathematical Software, 43 (2016), pp. 15. [47] I. Griva, D. F. Shanno, R. J. Vanderbei and H. Y. Benson, Global convergence of a primal-dual interior-point method for nonlinear programming, Algorithmic Opera tions Research, 3 (2008), pp. 27-52. [48] O. Guler, D. den Hertog, C. Roos, T. Terlaky and Tsuchiya, Degeneracy in interiorpoint methods for linear programming: A survey, Annals of Operations Research, 46 (1993), pp. 107-138. [49] O. Guler and Y. Ye, Convergence behavior of interior-point algorithms, Mathematical Programming, 60, (1993), pp. 215-228. [50] http://users.clas.ufl.edu/hager/coap/format.htm. [51] C. G. Han, P. Pardalos and Y. Ye, Computational aspects of an interior point algorithm for quadratic programming problem with box constraints, in Large-Scale Numerical Optimization, T. F. Coleman and Y. Li, eds. SIAM Publications, Philadelphia, PA, 1990, pp. 92-112. [52] C. Helmberg, F. Rendl, R. J. Vanderbei and H. Wolkowicz, An interior-point method for semidefinite programming, SIAM Journal on Optimization, 6 (1996), pp. 342-361. [53] W. Hock and K. Schittkowski, Test examples for nonlinear programming codes, Lec ture Notes in Economics and Mathematical Systems, 187, Springer-Verlag, Berlin, 1981. [54] R. A. Horn and C. R. Johnson, Topics in matrix analysis, Cambridge University Press, Cambridge, UK, 1991. √ [55] P. Hung and Y. Ye, An asymptotical O( nL)-iteration path-following linear pro gramming algorithm that use wide neighborhoods, SIAM Journal on Optimization, 6 (1996), pp. 570-586. [56] B. Jansen, C. Roos and T. Terlaky, Improved complexity using higher-order correctors for primal-dual Dikin affine scaling, Mathematical Programming, 76 (1996), pp. 117 130. [57] N. Karmarkar, A new polynomial-time algorithm for linear programming, Combina torics, 4 (1984), pp. 373-395. [58] W. Karush, Minima of functions of several variables with inequalities as side con straints, M.Sc. Dissertation. Dept. of Mathematics, Univ. of Chicago, Chicago, Illi nois, 1939. [59] L. Khachiyan, A polynomial algorithm in linear programming, Doklady Akademiia Nauk SSSR, 224 (1979), pp. 1093-1096.
[60] B. Kheirfam, An arc-search interior point method in the N −∞ neighborhood for sym metric optimization, Fundamenta Informaticae, 146 (2016), pp. 255-269. [61] B. Kheirfam, An arc-search infeasible interior-point algorithm for horizontal linear complementarity problem in the N −∞ neighbourhood of the central path, Interna tional Journal of Computer Mathematics, 94 (2017), pp. 2271-2282. [62] B. Kheirfam and M. Moslemi, On the extend of an arc-search interior-point algo rithm for semidefinite optimization, Numerical Algebra, Control, and Optimization, 2 (2018), pp. 261-275. [63] B. Kheirfam, K. Ahmadi and F. Hasani, A modified full-Newton step infeasible interior-point algoirhm for linear optimization, Asia-Pacific Journal of Operational Research, 30 (2013), pp. 11-23. [64] B. Kheirfam and M. Chitsaz, A corrector-predictor arc-search interior point algorithm for P∗ (k)-LCP acting in a wide neighborhood of the central path, Iranian Journal of Operations Research, 6 (2015), pp. 1-18. [65] V. Klee and G. J. Minty, How good is the simplex algorithm? in Inequalities, O. Shisha, eds., Academic Press, Providence, RI, 1972, pp. 159-175. [66] E. de Klerk, Aspects of semidefinite programming: Interior point algorithms and se lected applications, Kluwer Academic Publishers, Dordrecht, The Netherlands. 2002. [67] M. Kojima, Basic lemmas in polynomial-time infeasible-interior-point methods for linear programs, Annals of Operations Research, 62 (1996), pp. 1-28. [68] M. Kojima, N. Megiddo and S. Mizuno, A primal-dual infeasible interior-point algo rithm for linear programming, Mathematical Programming, 61 (1993), pp. 261-280. [69] M. Kojima, N. Megiddo, T. Noma and A. Yoshise, A unified approach to interior point algorithms for linear complementarity problems: A summary, Operations Research Letters, 10, (1991), pp. 247254. [70] M. Kojima, S. Mizuno and A. Yoshise, A polynomial-time algorithm for a class of linear complementarity problem, Mathematical Programming, 44 (1989), pp. 1039 1091. [71] M. Kojima, S. Mizuno and A. Yoshise, A primal-dual interior point algorithm for lin ear programming, in Mathematical Programming: Interior-point and related methods, N. Megiddo, eds., Springer-Verlag, New York, 1989, pp. 29-47. √ [72] M. Kojima, S. Mizuno and A. Yoshise, A O( nL) iteration potential reduction al gorithm for linear complementarity programming, Mathematical Programming, 50 (1991), pp. 331-342. [73] M. Kojima, S. Shindoh and S. Hara, Interior-point methods for the monotone semidef inite linear complementarity problem in symmetric matrices, SIAM journal on Opti mization, 7 (1997), pp. 86-125. [74] M. H. Koulaei and T. Terlaky, On the extension of a Mehrotra-type algorithm for semidefinit optimization, Technical Report 2007/4, Advanced optimization Lab., De partment of Computing and Software, McMaster University, Hamilton, Ontario, 2007.
[75] H. W. Kuhn and A. W. Tucker, Nonlinear programming, Proceedings of 2nd Berkeley Symposium, Berkeley, University of California Press, pp. 481-492, 1951. [76] E. Lida, Y. Yang and M. Yamashita, An infeasible interior-point arc-search algorithm for nonlinear constrained optimization, arXiv:1909.10706[math.OC], 2019. [77] J. W. Liu, Modification of the minimum degree algorithm by multiple elimination, ACM Transactions on Mathematical Software, 11 (1985), pp. 141-153. [78] T. T. Lu and S. H. Shiou, Inverses of 2 × 2 block matrices, Computers and Mathematics with Applications, 43 (2002), pp. 119-129. [79] D. G. Luenberger, Introduction to linear and nonlinear programming, Addison Wes ley, Massachusetts, 1972. [80] D. Luenberger, Linear and nonlinear programming, Second Edition, Addison-Wesley Publishing Company, Menlo Park, 1984. [81] D. G. Luenberger and Y. Ye, Linear and nonlinear programming, Springer, New York, 2008. [82] I. Lustig, R. Marsten and D. Shannon, Computational experience with a primal-dual interior-point method for linear programming, Linear Algebra and Its Applications, 152, 1991, pp. 191-222. [83] I. Lustig, R. Marsten and D. Shannon, On implementing Mehrotra’s predictorcorrector interior-point method for linear programming, SIAM journal on Optimiza tion, 2 (1992), pp. 432-449. [84] A. Mahajan, Presolving mixed-integer linear programs, Preprint ANL/MCS-P1752 0510, Argonne National Laboratory, 2010. [85] H. Mansouri, M. Pirhaji and M. Zangiabadi, An arc search infeasible interior-point algorithm for symmetric optimization using a new wide neighborhood, Acta Appli candae Mathematicae, 157 (2018), pp. 75-91. [86] H. Markowitz, Portfolio selection, The Journal of Finance, 7 (1952), pp. 77-91. [87] G. P. McCormick, A modification of Armijo’s step-size rule for negative curvature, Mathematical Programming, 13 (1977), pp. 111–115. [88] N. Megiddo, Pathway to the optimal set in linear propramming, in Program in mathe matical programming: interior point and related methods, N. Megiddo, eds., Springer Verlag, New York, 1989, pp. 131-158. [89] S Mehrotra, On the implementation of a primal-dual interior point method, SIAM Journal on Optimization, 2 (1992), pp. 575-601. [90] J. Miao, Two infeasible interior-point predictor-corrector algorithms for linear pro gramming, SIAM journal on Optimization, 6 (1996), pp. 587-599. [91] S. Mizuno, A new polynomial time method for a linear complementarity problem, Mathematical Programming, 56 1(992), pp. 31-43. [92] S. Mizuno, Polynomiality of infeasible-interior-point algorithms for linear program ming, Mathematical Programming, 67 (1994), pp. 109-119.
[93] S. Mizuno, M. Todd and Y. Ye, On adaptive step primal-dual interior-point algorithms for linear programming, Mathematics of Operations Research, 18 (1993), pp. 964-981.
[94] R. Monteiro, Primal-dual path-following algorithms for semidefinite programming, SIAM Journal on Optimization, 7 (1997), pp. 663-678.
[95] R. Monteiro and I. Adler, Interior path following primal-dual algorithms. Part I: linear programming, Mathematical Programming, 44 (1989), pp. 27-41.
[96] R. Monteiro and I. Adler, Interior path following primal-dual algorithms. Part II: convex quadratic programming, Mathematical Programming, 44 (1989), pp. 43-66.
[97] R. Monteiro, I. Adler and M. Resende, A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension, Mathematics of Operations Research, 15 (1990), pp. 191-214.
[98] R. D. C. Monteiro and Y. Zhang, A unified analysis for a class of long-step primal-dual path-following interior-point algorithms for semidefinite programming, Mathematical Programming, 81 (1998), pp. 281-299.
[99] Y. Nesterov, Long-step strategies in interior-point primal-dual methods, Mathematical Programming, 76 (1996), pp. 47-94.
[100] Y. Nesterov, Lectures on convex optimization, Springer, Gewerbestrasse, Switzerland, 2018.
[101] Y. Nesterov and A. Nemirovskii, Interior-point polynomial methods in convex programming, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1994.
[102] Y. Nesterov and M. Todd, Self-scaled barriers and interior-point methods for convex programming, Mathematics of Operations Research, 22 (1997), pp. 1-42.
[103] Y. Nesterov and M. Todd, Primal-dual interior-point methods for self-scaled cones, SIAM Journal on Optimization, 8 (1998), pp. 256-268.
[104] E. Ng and B. W. Peyton, Block sparse Cholesky algorithm on advanced uniprocessor computers, SIAM Journal on Scientific Computing, 14 (1993), pp. 1034-1056.
[105] J. Nocedal, A. Wachter and R. A. Waltz, Adaptive barrier update strategies for nonlinear interior methods, SIAM Journal on Optimization, 19 (2009), pp. 1674-1693.
[106] J. Nocedal and S. J. Wright, Numerical optimization, Springer-Verlag, New York, 1999.
[107] M. L. Overton, Large-scale optimization of eigenvalues, SIAM Journal on Optimization, 2 (1992), pp. 88-120.
[108] K. Paparrizos, N. Samaras and D. Zissopoulos, Linear programming: Klee–Minty examples, in Encyclopedia of Optimization, C. Floudas and P. Pardalos, eds., Springer, Boston, MA, 2008, pp. 17-36.
[109] K. B. Petersen and M. S. Pedersen, The matrix cookbook, https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf, 2012.
[110] M. Pirhaji, M. Zangiabadi and H. Mansouri, An ℓ2-neighborhood infeasible interior-point algorithm for linear complementarity problems, 4OR, 15 (2017), pp. 111-131.
[111] M. Pirhaji, M. Zangiabadi and H. Mansouri, A corrector-predictor arc search interior-point algorithm for symmetric optimization, Acta Mathematica Scientia, 38 (2018), pp. 1269-1284.
[112] M. Pirhaji, M. Zangiabadi, H. Mansouri and S. H. Amin, An arc-search interior-point algorithm for monotone linear complementary problem over symmetric cones, Mathematical Modeling and Analysis, 23 (2018), pp. 1-16.
[113] A. D. Polyanin and A. V. Manzhirov, Handbook of mathematics for engineers and scientists, Chapman & Hall/CRC, Boca Raton, 2007.
[114] J. Renegar, A polynomial-time algorithm, based on Newton's method, for linear programming, Mathematical Programming, 40 (1988), pp. 59-93.
[115] R. T. Rockafellar, Convex analysis, Princeton University Press, New Jersey, 1970.
[116] C. Roos, A full-Newton step O(n) infeasible interior-point algorithm for linear optimization, SIAM Journal on Optimization, 16 (2006), pp. 1110-1136.
[117] C. Roos, T. Terlaky and J-Ph. Vial, Theory and algorithms for linear optimization: An interior-point approach, John Wiley and Sons, Chichester, 1997.
[118] C. Roos, T. Terlaky and J-Ph. Vial, Interior-point methods for linear optimization, Springer, New York, 2006.
[119] J. B. Rosen, Pattern separation by convex programming, Journal of Mathematical Analysis and Applications, 10 (1965), pp. 123-134.
[120] M. Salahi, J. Peng and T. Terlaky, On Mehrotra-type predictor-corrector algorithms, SIAM Journal on Optimization, 18 (2007), pp. 1377-1397.
[121] F. Santos, A counterexample to the Hirsch conjecture, Annals of Mathematics, 176 (2012), pp. 383-412.
[122] M. S. Shahraki and A. Delavarkhalafi, An arc-search predictor-corrector infeasible interior-point algorithm for P∗(κ)-SCLCPs, Numerical Algorithms, accepted, doi:10.1007/s11075-019-00736-4 (2019).
[123] M. S. Shahraki, H. Mansouri and M. Zangiabadi, A wide neighborhood infeasible interior-point method with arc-search for SCLCPs, Optimization, 67 (2018), pp. 409-425.
[124] M. Shida, S. Shindoh and M. Kojima, Existence and uniqueness of search directions in interior-point algorithms for the SDP and the monotone SDLCP, SIAM Journal on Optimization, 8 (1998), pp. 387-396.
[125] S. L. Shmakov, A universal method of solving quartic equations, International Journal of Pure and Applied Mathematics, 71 (2011), pp. 251-259.
[126] S. Smale, Mathematical problems for the next century, in V. I. Arnold, M. Atiyah, P. Lax and B. Mazur, Mathematics: Frontiers and perspectives, American Mathematical Society, 1999, pp. 271-294.
[127] S. T. Smith, Geometric optimization methods for adaptive filtering, Ph.D. thesis, Department of Applied Mathematics, Harvard University, Cambridge, MA, 1993.
[128] P. M. Stoltz, S. Sivapiragasam and T. Anthony, Satellite orbit-raising using LQR control with fixed thrusters, Advances in the Astronautical Sciences, 98 (1998), pp. 109-120.
[129] J. F. Sturm, Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones, Optimization Methods and Software, 11 (1999), pp. 625-653.
[130] K. Tanabe, Centered Newton method for mathematical programming, in System modeling and optimization: Proceedings of the 13th IFIP conference, Lecture Notes in Control and Information Sciences 113, Berlin, 1987, Springer-Verlag, New York, 1988, pp. 197-206.
[131] A. L. Tits, A. Wachter, S. Bakhtiari, T. J. Urban and C. T. Lawrence, A primal-dual method for nonlinear programming with strong global and local convergence properties, Mathematical Programming, 8 (1998), pp. 1132-1152.
[132] A. L. Tits and Y. Yang, Globally convergent algorithms for robust pole assignment by state feedback, IEEE Transactions on Automatic Control, 41 (1996), pp. 1432-1452.
[133] M. J. Todd, The many facets of linear programming, Mathematical Programming, Ser. B, 91 (2002), pp. 417-436.
[134] M. J. Todd, K. C. Toh and R. H. Tutuncu, On the Nesterov–Todd direction in semidefinite programming, SIAM Journal on Optimization, 8(3) (1999), pp. 769-796.
[135] M. J. Todd and Y. Ye, A centered projective algorithm for linear programming, Mathematics of Operations Research, 15 (1990), pp. 508-529.
[136] K. C. Toh, M. J. Todd and R. H. Tutuncu, SDPT3: A Matlab software package for semidefinite programming, Version 1.3, Optimization Methods and Software, 11 (1999), pp. 545-581.
[137] M. Ulbrich, S. Ulbrich and L. N. Vicente, A globally convergent primal-dual interior-point filter method for nonlinear programming, Mathematical Programming, 100 (2004), pp. 379-410.
[138] L. Vandenberghe and S. Boyd, A primal-dual potential reduction method for problems involving matrix inequalities, Mathematical Programming, 69 (1995), pp. 205-236.
[139] R. J. Vanderbei, LOQO: An interior point code for quadratic programming, Optimization Methods and Software, 12 (1999), pp. 451-484.
[140] R. J. Vanderbei and D. F. Shanno, An interior-point algorithm for nonconvex nonlinear programming, Computational Optimization and Applications, 13 (1999), pp. 231-252.
[141] S. A. Vavasis and Y. Ye, A primal-dual interior-point method whose running time depends on the constraint matrix, Mathematical Programming, 74 (1996), pp. 79-120.
[142] L. B. Winternitz, A. L. Tits and P.-A. Absil, Addressing rank degeneracy in constraint-reduced interior-point methods for linear optimization, Journal of Optimization Theory and Applications, 160 (2014), pp. 127-157.
[143] P. Wolfe, The simplex method for quadratic programming, Econometrica, 27 (1959), pp. 382-398.
[144] S. Wright, Primal-Dual Interior-Point Methods, Society for Industrial and Applied Mathematics, Philadelphia, 1997.
[145] X. Yang, Study on wide neighborhood in interior-point method for symmetric cone programming, Ph.D. thesis, Xidian University, Xi'an, 2017.
[146] M. Yamashita, K. Fujisawa, M. Fukuda, K. Nakata and M. Nakata, A high-performance software package for semidefinite programs: SDPA 7, Research Report B-463, Dept. of Mathematical and Computing Science, Tokyo Institute of Technology, Tokyo, Japan, 2010.
[147] X. Yang, H. Liu and Y. Zhang, An arc-search infeasible-interior-point method for symmetric optimization in a wide neighborhood of the central path, Optimization Letters, 11 (2017), pp. 135-152.
[148] X. Yang and Y. Zhang, A Mizuno-Todd-Ye predictor-corrector infeasible-interior-point method for symmetric optimization with the arc-search strategy, Journal of Inequalities and Applications, 2017, 2017:291.
[149] X. Yang, Y. Zhang and H. Liu, A wide neighborhood infeasible-interior-point method with arc-search for linear programming, Journal of Applied Mathematics and Computing, 51 (2016), pp. 209-225.
[150] Y. Yang, Robust system design: Pole assignment approach, Ph.D. thesis, Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 1996.
[151] Y. Yang, Arc-search path-following interior-point algorithm for linear programming, Optimization Online, 2009.
[152] Y. Yang, A polynomial arc-search interior-point algorithm for convex quadratic programming, European Journal of Operational Research, 215 (2011), pp. 25-38.
[153] Y. Yang, A polynomial arc-search interior-point algorithm for linear programming, Journal of Optimization Theory and Applications, 158 (2013), pp. 859-873.
[154] Y. Yang, An efficient polynomial interior-point algorithm for linear programming, arXiv:1304.3677 [math.OC], 2013.
[155] Y. Yang, Constrained LQR design using interior-point arc-search method for convex quadratic programming with box constraints, arXiv:1304.4685 [math.OC], 2013.
[156] Y. Yang, Arc-search infeasible interior-point algorithm for linear programming, arXiv:1406.4539 [math.OC], 2014.
[157] https://www.mathworks.com/matlabcentral/fileexchange/53911-curvelp.
[158] Y. Yang, CurveLP-a MATLAB implementation of an infeasible interior-point algorithm for linear programming, Numerical Algorithms, 74 (2017), pp. 967-996.
[159] Y. Yang, Two computationally efficient polynomial-iteration infeasible interior-point algorithms for linear programming, Numerical Algorithms, 79 (2018), pp. 957-992.
[160] Y. Yang, Spacecraft modeling, attitude determination, and control: Quaternion-based method, CRC Press, Boca Raton, 2019.
[161] Y. Yang and M. Yamashita, An arc-search O(nL) infeasible-interior-point algorithm for linear programming, Optimization Letters, 12 (2018), pp. 781-798.
[162] Y. Yang and Z. Zhou, An analytic solution to Wahba's problem, Aerospace Science and Technology, 30 (2013), pp. 46-49.
[163] Y. Ye, An O(n³L) potential reduction algorithm for linear programming, Mathematical Programming, 50 (1991), pp. 239-258.
[164] Y. Ye, Interior point algorithms: Theory and analysis, John Wiley & Sons, Inc., New York, 1997.
[165] Y. Ye, Conic linear programming, preprint available on the internet, http://web.stanford.edu/class/msande314/sdpmain.pdf, 2017.
[166] Y. Ye and E. Tse, An extension of Karmarkar's projective algorithm for convex quadratic programming, Mathematical Programming, 44 (1989), pp. 157-179.
[167] B. Yuan, M. Zhang and Z. Huang, A wide neighborhood primal-dual interior-point algorithm with arc-search for linear complementarity problem, Journal of Nonlinear Functional Analysis, Article ID 31, (2018).
[168] B. Yuan, M. Zhang and Z. Huang, A wide neighborhood interior-point algorithm with arc-search for P∗(κ) linear complementarity problem, Applied Numerical Mathematics, 136 (2019), pp. 293-304.
[169] N. Zadeh, What is the worst case behavior of the simplex algorithm? Technical Report No. 27, Department of Operations Research, Stanford University, Stanford, California, 1980.
[170] M. Zhang, B. Yuan, Y. Zhou, X. Luo and Z. Huang, A primal-dual interior-point algorithm with arc-search for semidefinite programming, Optimization Letters, 13 (2019), pp. 1157-1175.
[171] Y. Zhang, On the convergence of a class of infeasible interior-point methods for the horizontal linear complementarity problem, SIAM Journal on Optimization, 4 (1994), pp. 208-227.
[172] Y. Zhang, Solving large-scale linear programs by interior-point methods under the MATLAB environment, Technical Report TR96-01, Department of Mathematics and Statistics, University of Maryland, Baltimore, MD, 1996.
[173] Y. Zhang, On extending some primal-dual interior-point algorithms from linear programming to semi-definite programming, SIAM Journal on Optimization, 8 (1998), pp. 365-386.
[174] Y. Zhang and D. T. Zhang, On polynomiality of the Mehrotra-type predictor-corrector interior-point algorithms, Mathematical Programming, 68 (1995), pp. 303-318.
Index
σk, 23
centering parameter, 23
maximizes step size, 170
algorithm, 3
long-step, 33
long-step path-following, 39
Mehrotra’s, 73
MTY, 33
MTY predictor-corrector, 42
path-following, 32
polynomial interior-point, 21
polynomial iterations, 153
potential-reduction, 24
potential-reduction interior-point, 31
primal-dual path-following interior-point, 31
primal-dual potential-reduction interior-point, 31
short step, 33
short step infeasible interior-point, 72
short step interior-point, 57
short step path-following, 55
short step path-following interior-point, 42
simplex, 3
step size, 249
well-defined, 153
algorithms
infeasible interior-point, 143
analysis, 14
convex, 14
arc-search, 3, 79, 221
ellipse, 221
ℓ2-neighborhood interior-point for LCP, 256
arc-search feasible interior-point, 256
arc-search higher-order interior-point, 218
arc-search infeasible interior-point, 114, 153
convergence, 153
curveLP, 114
efficiency, 4
feasible interior-point, 31, 33, 248
high-order, 33
higher-order path-following, 55
infeasible arc-search interior-point, 267
infeasible interior-point, 33, 144
infeasible predictor-corrector, 98
infeasible primal-dual path-following interior-point, 31
infeasible-predictor-corrector, 98
interior-point, 3, 10, 242
Karmarkar’s, 79
Khachiyan’s ellipsoid, 7
long step infeasible interior-point, 63, 72
long step interior-point, 57
long step path-following, 55
long step path-following interior-point, 42
box constraints, 218
central path, 4, 22, 32, 55, 79, 80, 144, 183, 220, 273
neighborhood, 220
of LCP, 258
central path, 115
complexity
polynomial, 4
simplex method, 7
condition, 8
complementarity, 9
duality reduction, 249
first order necessary, 20
first order optimality, 19, 20
KKT, 17, 18, 80, 145, 182
KKT conditions, 8
necessary, 19
necessary and sufficient, 19
positivity, 249
proximity, 40, 249
convergence, 229
convergence theory, 4
convex optimization, 268
convex quadratic programming, 181, 218, 229
arc-search, 221
corrector step, 43, 45, 58
criteria
termination, 144
criterion, 130
Markowitz, 130
stopping, 136, 172
termination, 207, 247
decomposition, 53
Cholesky, 131, 167
ill-conditioned sparse Cholesky, 131
QR, 53
sparse QR, 54
dependency
removing, 167
derivative
first, 81
higher-order, 4, 55
second, 81
dual programming, 6
dual slack vector, 6
dual variable vector, 6
duality gap, 9, 23, 184, 212
duality measure, 9, 23, 42, 49, 58, 83, 99, 115, 117, 144, 146, 220, 222, 240, 252
ellipse, 81, 98, 116, 221
analytic expression, 146
axes, 81, 116
center, 116
center of, 221
equation, 22
augmented system form, 117
linear system, 22
normal equation form, 117
unreduced form, 117
formula, 190
Cardano's, 190
function, 3
convex, 15
Lagrange, 50
Lagrangian, 19
logarithmic barrier, 21
objective, 3
primal and dual potential, 23
smooth, 3
strict convex, 15
geodesic, 3
Goldman-Tucker theorem, 245
initial point, 4, 6, 122, 143, 144, 166
explicit feasible, 218
feasible, 75, 207, 218
infeasible, 75, 145, 157, 218
inner product, 18
real symmetric matrices, 18
KKT conditions, 20, 219, 244
Lagrange multipliers, 50
lemma, 10
Farkas, 10
linear complementarity problem, 17, 18
mixed, 18
monotone, 17
linear independent, 18
linear programming, 3, 6, 13
canonical form, 6
standard form, 6
manifold
Riemannian, 3
matrix, 5
2-norm, 269
Cholesky, 130
commutative, 275
diagonal, 6
eigenvalue, 269
Frobenius norm, 269
identity, 6
ill-conditioned, 129, 131, 167
inner product, 269
Kronecker product, 269
nonsingular, 146, 274
positive definite, 6, 219, 269
positive diagonal, 146
positive semidefinite, 6, 17, 182, 269
scaling, 129, 167
sparse structure, 130
spectral condition number, 269
spectral radius, 269
symmetric, 269
trace, 6, 18, 269
transpose, 6
method, 3
feasible interior-point, 9
Gaussian elimination, 130
higher-order, 75
infeasible interior-point, 10
interior-point, 3, 7, 42, 268
Khachiyan’s ellipsoid, 7
Mehrotra’s, 4
potential-reduction, 21
predictor-corrector, 3
primal potential-reduction, 21
primal-dual potential-reduction, 21
second-order, 4
the first-order, 4
neighborhood, 99, 145, 153, 163
2-norm, 32
central path, 259
central-path, 32, 46
infeasible, 64
infeasible central path, 115
narrow, 4, 33, 42, 72
narrow infeasible, 58
one-side ∞-norm, 32
wide, 4, 33, 75
wider, 72
nonlinear programming, 21
norm, 5
1-norm, 5
∞-norm, 6
Euclidean norm, 5
induced norm, 6
matrix norm, 6
numerical stability, 130
objective, 23
operator
similarly transformed symmetrization, 273
optimal solution, 143
upper bound, 143
optimization, 3
combinatorial, 268
convex, 14
convex quadratic, 19
equality constrained, 3
linear, 19
nonlinear optimization, 19
unconstrained, 3
optimizer, 3, 218
parameter, 23
adjustable σ, 147
centering, 117, 146, 148
default for Algorithm 8.1, 166
weight, 23
penalty, 23
performance profile, 127, 177
pivot rule, 7
Dantzig’s, 7
Zadeh’s, 7
polynomial, 3
Taylor, 56
polynomial bound, 4, 41, 42, 53
worst case, 4
polynomial time, 7
polynomiality, 13
polytope, 7
diameter, 7
edge, 7
vertex, 7
positivity, 121, 151, 242
post-process, 128, 144, 167
pre-process, 123, 144
pre-solve, 167
pre-solver, 123
predictor step, 42, 58
problem, 8
P∗(κ)-linear complementarity, 256
constrained optimization, 19
convex, 8
convex optimization, 19
convex quadratic, 14
convex quadratic programming, 15
dual, 18
horizontal linear complementarity, 256
inequality constrained optimization, 218
linear complementarity, 256
minimax, 170
monotone linear complementarity, 15, 256
Netlib benchmark, 114
pattern separation, 14
portfolio selection, 14
positive semidefinite, 18
positive semidefinite programming, 15, 18
primal-dual linear programming, 144
proximity condition, 249
qualification, 20
linearly independent constraint, 20
real positive root, 242
residual, 58
dual, 59
dual programming, 58, 98
equality constraint, 144
equality constraints, 115
primal, 59
primal programming, 58, 98
Riemannian manifold, 3
row dependency, 129
scaling, 129
semidefinite programming, 268
set, 3, 219
active, 20
convex, 15
empty set, 6
ε-solution, 58
equality constraints, 19
feasible, 8, 19, 182, 189, 257
feasible set, 219
index set B, 91, 245
index set N, 91
index set S, 245
index set T, 245
inequality constraints, 19
level, 22
real symmetric matrices, 18
strict feasible, 22, 258
strictly feasible, 8, 182, 220
simplex method
Dantzig’s, 7
solution
ε-optimal, 61
ε-solution, 58
analytic αk, 168
degenerate, 132, 168
dual feasible, 8
global, 15, 19
global optimal, 183, 219
linear programming, 145
local, 15, 19
nearly optimal, 218
optimal, 8
optimal dual, 8
optimal primal, 8
primal feasible, 8
primal-dual, 7
strictly complementary optimal, 13, 131
strictly dual feasible, 8
strictly global, 15, 19
strictly local, 15, 19
strictly primal feasible, 8
unique, 146
space
null space, 6, 47
starting point, 4
feasible, 4
infeasible, 4
step size, 58, 148, 248
step-size, 3
strictly complementary, 13, 91, 131, 245
system, 22
perturbed KKT, 22, 46
technique, 4
arc-search, 4
theorem, 10
convergence, 72
Goldman-Tucker, 10
separating hyperplane, 10
variable dual, 18
primal, 18
slack variable, 7
vector, 5
dual slack, 17
dual variable, 17
vector operator, 270