Duality in Global Optimization: Optimality Conditions and Algorithmical Aspects 9783826561153, 3826561155



English Pages 117 Year 1999


Duality in Global Optimization: Optimality Conditions and Algorithmical Aspects

Dissertation for the award of the academic degree of Doctor of Natural Sciences (Doktorin der Naturwissenschaften), submitted to Department IV (Fachbereich IV) of the University of Trier by

Mirjam Dür

Berichte aus der Mathematik

Mirjam Dür


Shaker Verlag Aachen 1999

Die Deutsche Bibliothek - CIP catalogue record: Dür, Mirjam: Duality in Global Optimization: Optimality Conditions and Algorithmical Aspects / Mirjam Dür. - Printed as manuscript. - Aachen: Shaker, 1999 (Berichte aus der Mathematik). Also: Trier, Univ., Diss., 1999. ISBN 3-8265-6115-5

Copyright Shaker Verlag 1999. All rights reserved, including the rights of partial reprinting, of partial or complete reproduction, of storage in data processing systems, and of translation. Printed as manuscript. Printed in Germany. ISBN 3-8265-6115-5, ISSN 0945-0882. Shaker Verlag GmbH • Postfach 1290 • 52013 Aachen. Telephone: 02407 / 95 96 - 0 • Fax: 02407 / 95 96 - 9. Internet: www.shaker.de • eMail: [email protected]

To M.

Acknowledgements

The work on this thesis was carried out under the supervision of R. Horst, University of Trier, Germany, whom I thank for giving me the opportunity to study at the University of Trier for two years. I am also grateful to J.-B. Hiriart-Urruty, Université Paul Sabatier, Toulouse, France, for his readiness to act as a referee of this thesis and for giving many hints for future research. I benefited greatly from the collaboration with the members of the global optimization group in Trier: N. V. Thoai, U. Raber and M. Locatelli. W. Oettli, University of Mannheim, Germany, gave valuable comments on Chapter 2 of this thesis. My thanks also go to I. M. Bomze, University of Vienna, Austria, for continual encouragement and stimulating discussions. My work would not have been possible without the financial support of the "Deutsche Forschungsgemeinschaft" during my time in the "Graduiertenkolleg Mathematische Optimierung" in Trier from April 1996 to May 1998. The members of the "Graduiertenkolleg Mathematische Optimierung", in particular its speaker E. Sachs, provided a pleasant atmosphere for working. Finally, I would like to thank the members of the Department of Statistics at Vienna's University of Economics and Business Administration, with which I have been affiliated since June 1998, for their support and encouragement.

First referee: Prof. Dr. R. Horst, Universität Trier

Second referee: Prof. Dr. J.-B. Hiriart-Urruty, Université Paul Sabatier, Toulouse

Date of the oral examination: 12 April 1999

Contents

I Optimality Conditions

1 Introduction to Part I

2 Global Optimality Conditions for Minimizing Differences of Functions
  2.1 Introducing the Problem
  2.2 Preliminaries from Convex Analysis
  2.3 Optimality Conditions for D.C. Problems
  2.4 Optimality Conditions for a More General Class of Functions

3 Global Optimality Conditions for Convex Maximization
  3.1 From D.C. Programming to Convex Maximization
  3.2 Interconnection of Global Optimality Criteria
  3.3 Further Optimality Conditions
    3.3.1 Reformulation of (HU) in the Differentiable Case
    3.3.2 Maximization of Strictly Convex Quadratic Functions over Convex Sets
    3.3.3 Equivalences between Nonconvex Optimization Problems

4 Connections between Local and Global Optimality Conditions
  4.1 Necessary Conditions for Local Optimality
  4.2 The Piecewise Affine Case
  4.3 Sufficient Conditions for Local Optimality
  4.4 A Generalization of Strekalovsky's Optimality Condition to D.C. Problems

5 Remarks on D.C. Decompositions
  5.1 Existence of D.C. Decompositions
  5.2 D.C. Decompositions for Polynomials

II Algorithmical Aspects

6 Introduction to Part II

7 The Branch-and-Bound Algorithm
  7.1 The Basic Branch-and-Bound Scheme
  7.2 Branching and Bounding Procedures
  7.3 Convergence Conditions

8 Lagrange Duality and Partitioning Techniques
  8.1 Convex Envelopes and Duality
    8.1.1 Convex Envelopes
    8.1.2 Duality Gap
  8.2 Branch-and-Bound Methods with Dual Bounds
    8.2.1 Limit Behaviour on Nested Sequences
    8.2.2 Partitioning Methods with Dual Bounds
    8.2.3 Partly Convex Optimization Problems
  8.3 Some Applications
    8.3.1 Linearly Constrained Problems and Convexification
    8.3.2 Generalized Bilinear Constraints
    8.3.3 Maximizing the Sum of Affine Ratios
    8.3.4 Concave Minimization under Reverse Convex Constraints

9 Global Optimization of Sums of Ratios and the Corresponding Multiple-Criteria Decision Problem
  9.1 Applications and Background
  9.2 Application of the Basic Branch-and-Bound Scheme to the Sum-of-Ratios Problem
  9.3 Convergence
  9.4 Upper Bounds for Sums of Affine Ratios
  9.5 Lower Bounds for Sums of Affine Ratios
    9.5.1 The Corresponding Multiple-Objective Problem
    9.5.2 A Generalized Parametric Approach
    9.5.3 A Finite Procedure for Calculating Efficient Points
  9.6 Numerical Results

10 Second Branch-and-Bound Approach for the Sum-of-Ratios Problem
  10.1 Reformulating the Problem
  10.2 The Algorithm
    10.2.1 Upper Bounds
    10.2.2 Computation of Feasible Points
  10.3 Convergence
  10.4 Numerical Results

Bibliography

Index

Part I Optimality Conditions


Chapter 1 Introduction to Part I

Duality theory is a very old subject of investigation in the field of optimization. It is best known in the context of linear programming, where a "dual problem" can be associated with each given problem, which is then called the "primal problem". The dual linear problem is closely related to the primal one: the optimal values (if they exist) are equal, and the dual optimal solution allows for nice interpretations. This duality concept is the classical one named after Lagrange. A Lagrange-dual problem can also be assigned to nonlinear or even nonconvex problems. But whereas in convex programming, provided that suitable regularity conditions are fulfilled, equality of the primal and dual optimal values can still be guaranteed, this does not necessarily hold true for nonconvex problems. Early references dealing with Lagrange duality theory for nonlinear problems are Geoffrion [32] and Falk [27]. A detailed chapter on duality in convex programming can be found in Avriel [7]. A more recent survey is the tutorial paper by Brinkhuis [13]. Lagrange-duality for a special class of problems was treated, among others, by Fülöp [31].

There exist, however, other notions of duality. Apart from Lagrange-duality, the one most often encountered in the area of optimization is probably Fenchel-duality. Fenchel-duality is often investigated when theoretical properties of a certain type of optimization problem are to be examined. The connection between Fenchel- and Lagrange-duality was studied, among others, by Hiriart-Urruty and Lemaréchal [40] and by Antoni [4].

Dealing with duality theory is interesting both from a theoretical and from a practical point of view. This twofold interest was the motivation for the present thesis and is also the reason why the thesis is divided into two parts. The first part deals with some theoretical aspects of duality theory, in particular with global optimality conditions. Here, Fenchel's concept of duality will play an important role. In contrast, the second part investigates practical, i.e. algorithmical, aspects of duality. More precisely, the use of Lagrange-duality in Branch-and-Bound algorithms is investigated, and the convergence theory developed in this way is used to design algorithms for a particular nonconvex global optimization problem, the sum-of-ratios maximization problem. Here, some aspects of multiple criteria optimization will also be taken into account.

But let us start with Part I, i.e. with global optimality conditions. Various optimality criteria have been proposed for different types of global optimization problems. Many of them make use of "dual information". They operate in the topological dual of the given topological vector space or use Fenchel-duality. We first review some optimality criteria given in the literature for so-called d.c. optimization problems. These conditions basically rely on techniques from convex analysis, which are briefly recalled. Then it is shown how the d.c. optimality conditions extend to the more general setting of differences of functions belonging to (almost) arbitrarily large classes.

In Chapter 3, optimality criteria for the convex maximization problem are investigated. The convex maximization problem is in fact a special case of the d.c. optimization problem analyzed before. We show the interconnection of several optimality criteria and give specializations to the differentiable case.

Chapter 4 investigates the connection between global and local optimality conditions both for the d.c. problem and for the convex maximization problem. Necessary and sufficient optimality criteria are examined and different "globalization methods" are studied.

Finally, in Chapter 5 we give some remarks on d.c. decompositions, i.e. we consider the problem of finding a decomposition of a given function as the difference of two convex functions. This is an important question, because it is known that many nonconvex functions are d.c. functions, so the optimality criteria from Chapters 2-4 apply. No general method is known, however, to find this decomposition explicitly.

Chapter 2 Global Optimality Conditions for Minimizing Differences of Functions

In convex programming, various necessary and sufficient optimality conditions, i.e. conditions characterizing the optimal solution of a given optimization problem, are known. Perhaps the most famous among them are the Karush-Kuhn-Tucker conditions. These conditions rely on first and second order information, such as the gradients of the objective and constraint functions. The fundamental property which makes convex problems relatively easy to treat is the fact that every local solution is already a global one. Therefore, methods using local information (such as gradients) are sufficient to detect a global solution of the problem.

For nonconvex programming problems, this is no longer true. Here we have to deal with many inefficient local solutions that are different from the global one. Thus, local information is no longer sufficient for solving such problems. We need global information about the behaviour of the functions, and different tools must be developed. Evidently, optimality conditions for nonconvex problems have to incorporate the structure of the problem. One type of problem for which several optimality conditions are known is the so-called d.c. problem. In this chapter, we give an overview of optimality conditions for d.c. problems and investigate in which more general settings they are valid. The results of this chapter have been formulated and proved in Dür [23].

2.1 Introducing the Problem

Let $X$ be a locally convex Hausdorff topological vector space and let $g, h : X \to \mathbb{R} \cup \{+\infty\}$ be convex functions. Recall that a function $f : X \to \mathbb{R} \cup \{+\infty\}$ is called convex if for any two points $x_1, x_2 \in X$ and for any $\lambda \in [0,1]$ we have

\[ f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2). \]

We follow the convention to set $(+\infty) - (+\infty) = +\infty$. The reason why we allow functions to take the value $+\infty$ will become clear in Chapter 3, when we consider optimization problems subject to constraints. With the functions $g$ and $h$, the sum $g + h$ is again a convex function, as is the maximum $\max\{g, h\}$ and the multiple $\lambda g$ for any positive $\lambda$. The difference $g - h$, however, is in general no longer a convex function. This is the reason why optimization problems which involve differences of convex functions are particularly difficult: they are nonconvex problems. In this chapter, we will deal with optimization problems of the following type:

\[ \min\, [g(x) - h(x)] \quad \text{s.t. } x \in X. \tag{2.1} \]

A function representable as the difference of two convex functions is often called a d.c. function; therefore (2.1) is often called a d.c. problem. When solving (2.1), we are looking for a globally optimal solution, i.e. for a point $\bar{x} \in X$ with the property

\[ g(\bar{x}) - h(\bar{x}) \le g(x) - h(x) \quad \forall x \in X. \]

Conditions which characterize such a global minimizer are called global optimality conditions. Several global optimality conditions for (2.1) have been proposed. Even though d.c. problems are nonconvex problems, global optimality criteria for this type of problem basically rely on techniques from convex analysis. In the next section we briefly review the most important topics from this field.
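As a small numerical illustration of problem (2.1), the following Python sketch uses a one-dimensional d.c. function. The choice $g(x) = x^4$, $h(x) = 2x^2$, the grid and all function names are assumptions made for this example only; they do not come from the thesis. The sketch checks that $g - h$ violates the convexity inequality and locates a global minimizer by plain grid search.

```python
# Illustrative d.c. function (an assumption for this sketch): g(x) = x^4, h(x) = 2 x^2.

def g(x):
    return x ** 4

def h(x):
    return 2.0 * x ** 2

def dc(x):
    """The d.c. objective g - h of problem (2.1)."""
    return g(x) - h(x)

def violates_convexity(f, x1, x2, lam=0.5):
    """True if f breaks f(lam*x1 + (1-lam)*x2) <= lam*f(x1) + (1-lam)*f(x2)."""
    mid = lam * x1 + (1.0 - lam) * x2
    return f(mid) > lam * f(x1) + (1.0 - lam) * f(x2) + 1e-12

# Brute-force grid search on [-2, 2]; only meant to visualise the definition
# of a global minimizer, not to be an efficient method.
grid = [-2.0 + 0.001 * k for k in range(4001)]
best_x = min(grid, key=dc)
best_val = dc(best_x)

print(violates_convexity(dc, -1.0, 1.0))  # g - h is not convex
print(violates_convexity(g, -1.0, 1.0))   # g itself is convex
print(best_x, best_val)
```

For this particular choice the two global minimizers are $x = \pm 1$ with value $-1$, while $x = 0$ is a stationary point that is not a minimizer.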

2.2 Preliminaries from Convex Analysis

Denote by $X^*$ the topological dual of $X$ and for $x \in X$, $y \in X^*$ define $\langle x, y \rangle := y(x)$. The following definitions apply not only to convex functions, but to arbitrary functions.

Definition 2.2.1 Let $f : X \to \mathbb{R} \cup \{+\infty\}$.

• $\operatorname{dom} f := \{x \in X : f(x) < +\infty\}$ is called the domain of $f$.

• $f$ is called proper if $\operatorname{dom} f \ne \emptyset$.

• $f$ is called lower semicontinuous (l.s.c.) at $\bar{x} \in X$ if $f(\bar{x}) = \liminf_{y \to \bar{x}} f(y)$; it is called l.s.c. if it is l.s.c. at every point $x \in X$. Recall that in the context of topological vector spaces the limes inferior is defined as follows:

\[ \liminf_{y \to \bar{x}} f(y) := \sup_{N \in \mathcal{N}(\bar{x})} \, \inf_{y \in N} f(y), \]

where $\mathcal{N}(\bar{x})$ denotes the family of neighbourhoods of $\bar{x}$, cf. Laurent [53].

• $f^* : X^* \to \mathbb{R} \cup \{+\infty\}$ defined as

\[ f^*(y) := \sup_{x \in X}\, [\langle x, y \rangle - f(x)] \]

is called the Fenchel-conjugate function (sometimes Fenchel-Rockafellar conjugate function) of $f$. $f^{**} := (f^*)^*$ is called the biconjugate function of $f$. Note that $f(x) + f^*(y) \ge \langle x, y \rangle$ for all $(x, y) \in X \times X^*$.
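The conjugate and biconjugate can be approximated numerically in one dimension by replacing the supremum with a maximum over a grid. The sketch below does this for $f(x) = x^2/2$, whose conjugate is known to be $f^*(y) = y^2/2$; the function names, the grid, and the truncation of $X = \mathbb{R}$ to $[-5, 5]$ are assumptions of this example, so the values are only approximations for arguments well inside the grid.

```python
# Grid approximation of the Fenchel conjugate f*(y) = sup_x [x*y - f(x)]
# for the illustrative choice f(x) = x^2 / 2 (an assumption of this sketch).

def f(x):
    return 0.5 * x * x

def conjugate(func, xs, y):
    """Approximate func*(y) by maximising over the finite grid xs."""
    return max(x * y - func(x) for x in xs)

xs = [-5.0 + 0.05 * k for k in range(201)]   # grid on [-5, 5]

# Tabulate f* on the same grid, then apply the conjugate once more.
f_star = [conjugate(f, xs, y) for y in xs]

def biconjugate(x):
    """Approximate f**(x) = sup_y [x*y - f*(y)] using the tabulated f*."""
    return max(x * y - fs for y, fs in zip(xs, f_star))

def fenchel_young_holds(x, y, tol=1e-9):
    """Check f(x) + f*(y) >= <x, y> for a grid point x."""
    return f(x) + conjugate(f, xs, y) >= x * y - tol

print(conjugate(f, xs, 2.0))   # close to 2.0 = 2^2 / 2
print(biconjugate(1.5))        # close to f(1.5) = 1.125
```

The last line illustrates the identity $f^{**} = f$ for a convex l.s.c. function, which is stated as Lemma 2.2.1(iv) in the text.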

A convex function $f$ on $\mathbb{R}^n$ need not be differentiable, but the set of points in $\operatorname{dom} f$ where the gradient $\nabla f(x)$ does not exist is of Lebesgue measure zero. The usual concept of differentiability therefore has to be extended, and we are led to so-called subdifferentials and approximate subdifferentials.

Definition 2.2.2 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be a convex function and let $\bar{x} \in \operatorname{dom} f$.

• The set

\[ \partial f(\bar{x}) := \{ y \in X^* : f(x) \ge f(\bar{x}) + \langle x - \bar{x}, y \rangle \quad \forall x \in X \} \]

is called the subdifferential of $f$ at $\bar{x}$. The elements of $\partial f(\bar{x})$ are called subgradients.

• For given $\varepsilon \ge 0$, the set

\[ \partial_\varepsilon f(\bar{x}) := \{ y \in X^* : f(x) \ge f(\bar{x}) + \langle x - \bar{x}, y \rangle - \varepsilon \quad \forall x \in X \} \]

is called the approximate subdifferential or $\varepsilon$-subdifferential of $f$ at $\bar{x}$.
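The definition of the $\varepsilon$-subdifferential can be tested directly on a grid. The sketch below uses $f(x) = x^2$ at $\bar{x} = 1$ as an illustrative choice (an assumption of this example, as are the names and the grid). For this $f$ one can compute by hand that $\partial_\varepsilon f(\bar{x}) = [2\bar{x} - 2\sqrt{\varepsilon},\, 2\bar{x} + 2\sqrt{\varepsilon}]$, so for $\varepsilon = 0.25$ the set is the interval $[1, 3]$, strictly larger than the singleton $\{f'(1)\} = \{2\}$.

```python
# Membership test for the eps-subdifferential of Definition 2.2.2, checked on a grid.
# f(x) = x^2 and the grid are illustrative assumptions; because f grows quadratically,
# restricting the inequality to a bounded grid does not change the answer here.

def f(x):
    return x * x

xs = [-10.0 + 0.01 * k for k in range(2001)]   # grid on [-10, 10]

def in_eps_subdifferential(func, x_bar, y, eps, grid, tol=1e-12):
    """True if func(x) >= func(x_bar) + (x - x_bar)*y - eps for all grid points x."""
    return all(func(x) >= func(x_bar) + (x - x_bar) * y - eps - tol for x in grid)

x_bar = 1.0
print(in_eps_subdifferential(f, x_bar, 2.0, 0.0, xs))    # the gradient is a subgradient
print(in_eps_subdifferential(f, x_bar, 2.1, 0.0, xs))    # no other slope works for eps = 0
print(in_eps_subdifferential(f, x_bar, 1.2, 0.25, xs))   # inside [1, 3]
print(in_eps_subdifferential(f, x_bar, 3.2, 0.25, xs))   # outside [1, 3]
```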

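If $f$ is differentiable at $\bar{x}$, then the set $\partial f(\bar{x})$ reduces to the singleton $\{\nabla f(\bar{x})\}$. The $\varepsilon$-subdifferential $\partial_\varepsilon f(\bar{x})$, however, is in general a set larger than $\partial f(\bar{x})$, even if $\nabla f(\bar{x})$ exists. We will need another useful definition:

Definition 2.2.3 Let $f : \mathbb{R}^n \to \mathbb{R}$ be an arbitrary function. The epigraph of $f$ is defined to be

\[ \operatorname{Epi} f := \{ (x, \alpha) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le \alpha \}, \]

i.e. the epigraph of $f$ is the set of points on and above the graph of $f$. $\operatorname{Epi} f$ is a convex set if and only if $f$ is a convex function; it is closed if and only if $f$ is l.s.c. (see Laurent [53]). Recall some well-known properties which will be needed in the sequel:

Lemma 2.2.1 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be a proper convex function and $\bar{x} \in \operatorname{int} \operatorname{dom} f$. We have:
(i) $y \in \partial f(\bar{x}) \iff f^*(y) = \langle \bar{x}, y \rangle - f(\bar{x})$,
(ii) $y \in \partial_\varepsilon f(\bar{x}) \iff f^*(y) + f(\bar{x}) - \langle \bar{x}, y \rangle \le \varepsilon$,
(iii) If, in addition, $f$ is l.s.c., then $\partial f(\bar{x})$ is nonempty,
(iv) If $f$ is convex and l.s.c., then $f^{**}(x) = f(x)$ for all $x \in \operatorname{int} \operatorname{dom} f$.

Proof. (i, $\Rightarrow$): $y \in \partial f(\bar{x}) \implies \langle \bar{x}, y \rangle - f(\bar{x}) \ge \langle x, y \rangle - f(x) \ \ \forall x \in X \implies \langle \bar{x}, y \rangle - f(\bar{x}) \ge f^*(y)$. The converse inequality follows from the definition of $f^*$.

(i, $\Leftarrow$): $\langle \bar{x}, y \rangle - f(\bar{x}) = f^*(y) \ge \langle x, y \rangle - f(x) \ \ \forall x \in X \implies f(x) \ge f(\bar{x}) + \langle x - \bar{x}, y \rangle \ \ \forall x \in X \implies y \in \partial f(\bar{x})$.

(ii): $y \in \partial_\varepsilon f(\bar{x}) \iff f(\bar{x}) - \langle \bar{x}, y \rangle + \langle x, y \rangle - f(x) \le \varepsilon \ \ \forall x \in X \iff f(\bar{x}) - \langle \bar{x}, y \rangle + f^*(y) \le \varepsilon$.

(iii): Since $f$ is l.s.c. and proper, the epigraph $\operatorname{Epi} f$ is nonempty and closed. Therefore, for every $\bar{x} \in \operatorname{int} \operatorname{dom} f$ there exists a hyperplane supporting $\operatorname{Epi} f$ at the point $(\bar{x}, f(\bar{x})) \in \operatorname{Epi} f$, i.e. there exists a $y \in X^*$ such that $f(x) \ge f(\bar{x}) + \langle x - \bar{x}, y \rangle$ for all $x \in X$.

(iv): For any $x \in X$, $y \in X^*$, we have $f(x) \ge \langle x, y \rangle - f^*(y)$, therefore $f(x) \ge f^{**}(x)$ for all $x \in \operatorname{int} \operatorname{dom} f$. To see the converse inequality, take $\bar{x} \in \operatorname{int} \operatorname{dom} f$ and $y \in \partial f(\bar{x}) \ne \emptyset$. Then $f^{**}(\bar{x}) \ge \langle \bar{x}, y \rangle - f^*(y) = f(\bar{x})$, where the last equality comes from (i). $\Box$

Following the notion of Flores-Bazán [30], we introduce a last definition: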

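Definition 2.2.4 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be l.s.c., let $\mathcal{C}$ denote the space of continuous, real-valued functions on $X$ and let $\bar{x} \in \operatorname{dom} f$. We call

\[ \partial^{\mathcal{C}} f(\bar{x}) := \{ \varphi \in \mathcal{C} : f(x) \ge f(\bar{x}) + \varphi(x) - \varphi(\bar{x}) \quad \forall x \in X \} \]

the $\mathcal{C}$-subdifferential of $f$ at $\bar{x}$.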

2.3 Optimality Conditions for D.C. Problems

The next theorem summarizes several global optimality conditions for d.c. problems given in the literature.

Theorem 2.3.1 Let $g, h : X \to \mathbb{R}$ be lower semicontinuous proper convex functions and let $\bar{x} \in \operatorname{dom} g \cap \operatorname{dom} h$. Then the following conditions are equivalent:

(G) $\bar{x}$ is a global minimizer of $g - h$ on $X$,
(FB) $\partial^{\mathcal{C}} h(\bar{x}) \subseteq \partial^{\mathcal{C}} g(\bar{x})$,
(HU) $\partial_\varepsilon h(\bar{x}) \subseteq \partial_\varepsilon g(\bar{x}) \quad \forall \varepsilon \ge 0$,
(ST) $g(\bar{x}) - h(\bar{x}) = \inf_{y \in X^*} [h^*(y) - g^*(y)]$,
(HPT) $\max\{ h(x) - \sigma : g(x) \le \tau,\ \tau - \sigma \le g(\bar{x}) - h(\bar{x}),\ x \in X,\ \sigma, \tau \in \mathbb{R} \} = 0$.

Condition (FB) was recently given by Flores-Bazán in [30]. Condition (HU) was developed by Hiriart-Urruty in a series of papers, see [35, 36, 39]. Condition (ST) was discovered independently by Singer [67] and Toland [74, 75]. A version of (HPT) was shown by Horst, Pardalos and Thoai in [47]; a similar condition was derived by Thoai in [73].

Proof. We give a proof of each of the conditions by showing some implications among them rather than establishing separate equivalences to (G). We show (G) $\Rightarrow$ (FB) $\Rightarrow$ (HU) $\Rightarrow$ (ST) $\Rightarrow$ (G) $\Rightarrow$ (HPT) $\Rightarrow$ (G).

(G) $\Rightarrow$ (FB): (G) is equivalent to saying $g(x) - g(\bar{x}) \ge h(x) - h(\bar{x})$ for all $x \in X$. Therefore, for any function $\varphi \in \mathcal{C}$, if $h(x) - h(\bar{x}) \ge \varphi(x) - \varphi(\bar{x})$ for all $x \in X$, then trivially $g(x) - g(\bar{x}) \ge \varphi(x) - \varphi(\bar{x})$ for all $x \in X$, so (FB) is fulfilled.

(FB) $\Rightarrow$ (HU): Let $\varepsilon \ge 0$ and $y \in \partial_\varepsilon h(\bar{x})$ be given. We have to show that $y \in \partial_\varepsilon g(\bar{x})$. Assume we can construct a function $\varphi$ with the properties $\varphi \in \partial^{\mathcal{C}} h(\bar{x})$ and

\[ \varphi(x) - \varphi(\bar{x}) \ge \langle x - \bar{x}, y \rangle - \varepsilon \quad \forall x \in X. \tag{2.2} \]

If we succeed in constructing such a $\varphi$, then we are done, since (FB) gives $g(x) - g(\bar{x}) \ge \varphi(x) - \varphi(\bar{x})$ for all $x \in X$. To this purpose, take $\bar{y} \in \partial h(\bar{x})$ and define

\[ \varphi(x) := \sup\{ \langle x - \bar{x}, y \rangle - \varepsilon,\ \langle x - \bar{x}, \bar{y} \rangle \} \quad \forall x \in X. \]

Clearly, $\varphi(\bar{x}) = 0$ and $\varphi \in \partial^{\mathcal{C}} h(\bar{x})$ (since $y \in \partial_\varepsilon h(\bar{x})$ and $\bar{y} \in \partial h(\bar{x})$). Property (2.2) is also fulfilled, so (HU) holds.

(HU) $\Rightarrow$ (ST): For $y \in \partial h(\bar{x})$, Lemma 2.2.1(i) and (HU) yield $h^*(y) - g^*(y) = g(\bar{x}) - h(\bar{x})$. For $y \notin \partial h(\bar{x})$, we clearly have $h^*(y) > \langle \bar{x}, y \rangle - h(\bar{x})$, so

\[ \varepsilon := h^*(y) + h(\bar{x}) - \langle \bar{x}, y \rangle > 0. \]

We have $y \in \partial_\varepsilon h(\bar{x})$ from Lemma 2.2.1(ii), therefore $y \in \partial_\varepsilon g(\bar{x})$ and, applying Lemma 2.2.1(ii) once more,

\[ g^*(y) + g(\bar{x}) - \langle \bar{x}, y \rangle \le \varepsilon = h^*(y) + h(\bar{x}) - \langle \bar{x}, y \rangle. \]

Hence we get $h^*(y) - g^*(y) \ge g(\bar{x}) - h(\bar{x})$.

(ST) $\Rightarrow$ (G): From the assumption we have $g(\bar{x}) + g^*(y) \le h(\bar{x}) + h^*(y)$ for all $y \in X^*$. But then for every $x \in \operatorname{dom} h \cap \operatorname{dom} g$ we have

\[ \langle x, y \rangle - g^*(y) - g(\bar{x}) \ge \langle x, y \rangle - h^*(y) - h(\bar{x}) \quad \forall y \in X^*. \]

Taking the supremum over all $y \in X^*$ gives with Lemma 2.2.1(iv)

\[ g(\bar{x}) - h(\bar{x}) \le g(x) - h(x) \quad \forall x \in \operatorname{dom} h \cap \operatorname{dom} g, \]

as desired.

(G) $\Rightarrow$ (HPT): Assume (HPT) is not fulfilled, i.e. there exists $(\tilde{x}, \tilde{\sigma}, \tilde{\tau}) \in X \times \mathbb{R} \times \mathbb{R}$ satisfying the constraints $g(\tilde{x}) \le \tilde{\tau}$ and $\tilde{\tau} - \tilde{\sigma} \le g(\bar{x}) - h(\bar{x})$, but with an objective function value $h(\tilde{x}) - \tilde{\sigma} > 0$. Then it follows that $g(\tilde{x}) - h(\tilde{x}) < \tilde{\tau} - \tilde{\sigma} \le g(\bar{x}) - h(\bar{x})$, which contradicts (G).

(HPT) $\Rightarrow$ (G): Assume $\bar{x}$ is not a global minimizer, i.e. there exists $\tilde{x} \in X$ such that $g(\tilde{x}) - h(\tilde{x}) < g(\bar{x}) - h(\bar{x})$. Set $\tilde{\tau} := g(\tilde{x})$ and $\tilde{\sigma} := g(\tilde{x}) - g(\bar{x}) + h(\bar{x})$. Then $(\tilde{x}, \tilde{\sigma}, \tilde{\tau})$ is feasible for the auxiliary problem in (HPT), but we have $h(\tilde{x}) - \tilde{\sigma} > 0$, a contradiction. $\Box$

Condition (HPT) is substantially different from the remaining conditions, as it does not make use of any "dual information" whatsoever. Instead, it transforms the given minimization problem into a maximization problem with known optimal value, but at the price of two additional variables. Also, as can be seen from the proof, it does not depend on convexity of the functions $g$ and $h$, i.e. (HPT) is also valid in nonconvex settings.

(FB) was originally given for a more general problem: it is a necessary and sufficient condition for global optimality if $h$ is a proper and lower semicontinuous function and $g$ is an arbitrary proper function (see Theorem 3.1 in Flores-Bazán [30]).

It is striking how similar conditions (HU) and (FB) are. If we remember that convex functions are upper envelopes of affine functions, whereas lower semicontinuous functions are upper envelopes of continuous functions, and if we recall the definitions of $\partial_\varepsilon h(\bar{x})$ and $\partial^{\mathcal{C}} h(\bar{x})$, then we can guess that there is some connection between the "supporting functions" of $g$ and $h$ and the optimality conditions for the respective problems. This is indeed the case, as we outline in the next section.
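Conditions (ST) and (HPT) of Theorem 2.3.1 can be checked numerically on a one-dimensional example. The sketch below again uses the illustrative choice $g(x) = x^4$, $h(x) = 2x^2$ (an assumption of this example, not taken from the thesis) and replaces every supremum and infimum by a maximum or minimum over a finite grid, so all values are approximations. For (HPT), the two auxiliary variables are eliminated by hand: for fixed $x$ the best feasible choice is $\tau = g(x)$ and $\sigma = \tau - (g(\bar{x}) - h(\bar{x}))$, which leaves the objective $h(x) - g(x) + g(\bar{x}) - h(\bar{x})$.

```python
# Numerical check of (ST) and (HPT) for the illustrative pair g(x) = x^4, h(x) = 2 x^2.

def g(x):
    return x ** 4

def h(x):
    return 2.0 * x * x

xs = [-3.0 + 0.01 * k for k in range(601)]    # primal grid on [-3, 3]
ys = [-10.0 + 0.05 * k for k in range(401)]   # dual grid on [-10, 10]

def conj(func, y):
    """Grid approximation of the Fenchel conjugate func*(y)."""
    return max(x * y - func(x) for x in xs)

# Left-hand side of (ST) at a global minimizer, and the dual infimum on the right.
primal = min(g(x) - h(x) for x in xs)
dual = min(conj(h, y) - conj(g, y) for y in ys)

def hpt_value(x_bar):
    """Optimal value of the auxiliary problem in (HPT), with sigma and tau eliminated."""
    c = g(x_bar) - h(x_bar)
    return max(h(x) - g(x) + c for x in xs)

print(primal, dual)       # both close to -1
print(hpt_value(1.0))     # close to 0: x_bar = 1 is a global minimizer
print(hpt_value(0.0))     # close to 1 > 0: x_bar = 0 is not
```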

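2.4 Optimality Conditions for a More General Class of Functions

We first make the notion of "supporting functions" more precise:

Definition 2.4.1 Let $\mathcal{F}$ be an arbitrary family of real-valued functions defined on $X$. We call a function $f : X \to \mathbb{R} \cup \{+\infty\}$ an $\mathcal{F}$-function if $f$ is representable as

\[ f(x) = \max_{\varphi \in \mathcal{F}_f} \varphi(x) \quad \forall x \in X, \]

where $\mathcal{F}_f \subseteq \mathcal{F}$ is a suitable subset.

For $\mathcal{F}$-functions, we define an $\mathcal{F}$-subdifferential and an $\mathcal{F}$-conjugate function similar to Definitions 2.2.1 and 2.2.2:

Definition 2.4.2 Let $f$ be an $\mathcal{F}$-function and let $\bar{x} \in \operatorname{dom} f$.

• For $\varepsilon \ge 0$, we define the set

\[ \partial_\varepsilon^{\mathcal{F}} f(\bar{x}) := \{ \varphi \in \mathcal{F} : f(x) \ge f(\bar{x}) + \varphi(x) - \varphi(\bar{x}) - \varepsilon \quad \forall x \in X \} \]

to be the $\varepsilon$-$\mathcal{F}$-subdifferential of $f$ at $\bar{x}$. For $\varepsilon = 0$, $\partial^{\mathcal{F}} f(\bar{x}) := \partial_0^{\mathcal{F}} f(\bar{x})$ will be called the $\mathcal{F}$-subdifferential of $f$ at $\bar{x}$.

• We call the function $f^{\mathcal{F}} : \mathcal{F} \to \mathbb{R}$, defined as

\[ f^{\mathcal{F}}(\varphi) := \sup_{x \in X}\, [\varphi(x) - f(x)], \]

the $\mathcal{F}$-conjugate function of $f$. The function

\[ f^{\mathcal{F}\mathcal{F}}(x) := (f^{\mathcal{F}})^{\mathcal{F}}(x) = \sup_{\varphi \in \mathcal{F}}\, [\varphi(x) - f^{\mathcal{F}}(\varphi)] \]

is called the $\mathcal{F}$-biconjugate function.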

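$\mathcal{F}$-subdifferentials and $\mathcal{F}$-conjugate functions enjoy properties similar to those of the "ordinary" subdifferential and conjugate functions (cf. Lemma 2.2.1).

Lemma 2.4.1 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be a proper $\mathcal{F}$-function and let $\bar{x} \in \operatorname{dom} f$.
(i) $\partial_\varepsilon^{\mathcal{F}} f(\bar{x})$ is nonempty for any $\varepsilon \ge 0$,
(ii) $\varphi \in \partial^{\mathcal{F}} f(\bar{x}) \iff f^{\mathcal{F}}(\varphi) = \varphi(\bar{x}) - f(\bar{x})$,
(iii) $f^{\mathcal{F}\mathcal{F}}(x) = f(x)$ for all $x \in \operatorname{dom} f$.

Proof. (i): From the assumption we conclude that there exists a function $\varphi_0 \in \mathcal{F}_f$ with the properties

\[ \varphi_0(\bar{x}) = f(\bar{x}), \qquad \varphi_0(x) \le f(x) \quad \forall x \in X. \]

It is easy to see that $\varphi_0 \in \partial_\varepsilon^{\mathcal{F}} f(\bar{x})$ for all $\varepsilon \ge 0$.

(ii, $\Rightarrow$): We have $\varphi(\bar{x}) - f(\bar{x}) \ge \varphi(x) - f(x)$ for all $x \in X$ from the assumption, therefore

\[ \varphi(\bar{x}) - f(\bar{x}) \ge \sup_{x \in X}\, [\varphi(x) - f(x)] = f^{\mathcal{F}}(\varphi). \]

The definition of $f^{\mathcal{F}}$ entails equality.

(ii, $\Leftarrow$): If $\varphi(\bar{x}) - f(\bar{x}) = f^{\mathcal{F}}(\varphi)$, then

\[ \varphi(\bar{x}) - f(\bar{x}) \ge \varphi(x) - f(x) \quad \forall x \in X, \]

so $\varphi \in \partial^{\mathcal{F}} f(\bar{x})$.

(iii): From the definitions, we have

\[ f(x) \ge \varphi(x) - f^{\mathcal{F}}(\varphi) \quad \forall x \in X,\ \forall \varphi \in \mathcal{F}. \]

Therefore, $f(x) \ge f^{\mathcal{F}\mathcal{F}}(x)$ for all $x \in \operatorname{dom} f$. For the converse inequality, let $\bar{x} \in \operatorname{dom} f$ and $\varphi \in \partial^{\mathcal{F}} f(\bar{x})$. Then $f^{\mathcal{F}\mathcal{F}}(\bar{x}) \ge \varphi(\bar{x}) - f^{\mathcal{F}}(\varphi) = f(\bar{x})$, with the last equality coming from (ii). $\Box$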

Part (i) of this proof relies on the fact that we have defined $\mathcal{F}$-functions to be the maximum rather than the supremum of some family of functions. This is analogous to Lemma 2.2.1(iii): if a convex function $f$ is l.s.c. at $\bar{x} \in \operatorname{int} \operatorname{dom} f$, then there exists an affine function minorizing $f$ and taking the value $f(\bar{x})$ at $\bar{x}$. Its slope is in $\partial f(\bar{x})$, so the subdifferential is nonempty. With the tools developed above, we are able to formulate and prove optimality conditions for $\mathcal{F}$-functions in a way analogous to Section 2.3, but for a much more general class of functions.
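Because Definitions 2.4.1 and 2.4.2 make no use of the linear structure of $X$, they can be tried out on a finite set. The following sketch takes $X = \{0, \dots, 4\}$ and a small hypothetical family $\mathcal{F}$ of functions stored as tuples of values; the family, the subset $\mathcal{F}_f$ and all names are assumptions made for this illustration. It computes the $\mathcal{F}$-conjugate, the $\mathcal{F}$-biconjugate and the $\mathcal{F}$-subdifferential and lets one check statements (ii) and (iii) of Lemma 2.4.1 by enumeration.

```python
# Finite toy model of F-functions (all data below are illustrative assumptions).
X = range(5)

F = [
    (0, 1, 2, 3, 4),
    (4, 3, 2, 1, 0),
    (2, 2, 2, 2, 2),
    (0, 3, 1, 3, 0),
    (5, 5, 5, 5, 5),
]

# f is the pointwise maximum over a subset F_f of F, hence an F-function.
F_f = [F[0], F[1], F[3]]
f = [max(phi[x] for phi in F_f) for x in X]

def conjugate(func, phi):
    """F-conjugate: func^F(phi) = max_x [phi(x) - func(x)]."""
    return max(phi[x] - func[x] for x in X)

def biconjugate(func, x):
    """F-biconjugate: func^FF(x) = max_phi [phi(x) - func^F(phi)]."""
    return max(phi[x] - conjugate(func, phi) for phi in F)

def subdifferential(func, x_bar, eps=0):
    """Indices of all phi in F belonging to the eps-F-subdifferential at x_bar."""
    return [i for i, phi in enumerate(F)
            if all(func[x] >= func[x_bar] + phi[x] - phi[x_bar] - eps for x in X)]

print(f)                                  # values of f on X
print([biconjugate(f, x) for x in X])     # Lemma 2.4.1(iii): equals f
print(subdifferential(f, 2))              # F-subdifferential at x_bar = 2
```

Note that the constant function with value 5 belongs to the $\mathcal{F}$-subdifferential at every point although it lies above $f$: only differences $\varphi(x) - \varphi(\bar{x})$ enter the definition.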

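Theorem 2.4.1 Let $\mathcal{F}$ be an arbitrary family of real-valued functions defined on $X$, and let $h : X \to \mathbb{R} \cup \{+\infty\}$ be a proper $\mathcal{F}$-function. Let $g : X \to \mathbb{R} \cup \{+\infty\}$ be an arbitrary proper function. Then a necessary and sufficient condition for $\bar{x} \in \operatorname{dom} g \cap \operatorname{dom} h$ to be a global solution to the problem

\[ \min_{x \in X}\, [g(x) - h(x)] \]

is that

\[ \partial_\varepsilon^{\mathcal{F}} h(\bar{x}) \subseteq \partial_\varepsilon^{\mathcal{F}} g(\bar{x}) \quad \forall \varepsilon \ge 0. \tag{2.3} \]

Proof. Necessity is obvious (cf. the implication (G) $\Rightarrow$ (FB) in the proof of Theorem 2.3.1).

Sufficiency: Assume that $\bar{x}$ is not a global minimizer. Then there exists a point $\hat{x} \in \operatorname{dom} g \cap \operatorname{dom} h$ such that $g(\hat{x}) - h(\hat{x}) < g(\bar{x}) - h(\bar{x})$. Take $\varphi \in \partial^{\mathcal{F}} h(\hat{x})$. Then

\[ h(\bar{x}) \ge h(\hat{x}) + \varphi(\bar{x}) - \varphi(\hat{x}), \]

and

\[ \varepsilon := h(\bar{x}) - h(\hat{x}) - \varphi(\bar{x}) + \varphi(\hat{x}) \ge 0. \]

We have $\varphi \in \partial_\varepsilon^{\mathcal{F}} h(\bar{x})$. But from

\[ g(\hat{x}) < h(\hat{x}) + g(\bar{x}) - h(\bar{x}) = g(\bar{x}) + \varphi(\hat{x}) - \varphi(\bar{x}) - \varepsilon \]

we see that $\varphi \notin \partial_\varepsilon^{\mathcal{F}} g(\bar{x})$. $\Box$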

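A special instance of this general optimality condition is Hiriart-Urruty's condition for d.c. problems.

Remark 2.4.1 In (2.3) it is sufficient to require

\[ \partial_\varepsilon^{\mathcal{F}} h(\bar{x}) \subseteq \partial_\varepsilon^{\mathcal{F}} g(\bar{x}) \quad \forall \varepsilon \in S, \]

where $S$ is a dense subset of $\mathbb{R}_+$.

Proof. We show that

\[ \partial_\varepsilon^{\mathcal{F}} h(\bar{x}) \subseteq \partial_\varepsilon^{\mathcal{F}} g(\bar{x}) \ \ \forall \varepsilon \in S \quad \implies \quad \partial_\varepsilon^{\mathcal{F}} h(\bar{x}) \subseteq \partial_\varepsilon^{\mathcal{F}} g(\bar{x}) \ \ \forall \varepsilon \in \mathbb{R}_+. \]

Let $\varepsilon \in \mathbb{R}_+$. Then there exists a sequence

\[ \{\varepsilon_n\}_{n \in \mathbb{N}} \subseteq S, \qquad \varepsilon_n \ge \varepsilon, \]

with $\varepsilon = \lim_{n \to \infty} \varepsilon_n$. Now take $\varphi \in \partial_\varepsilon^{\mathcal{F}} h(\bar{x})$. Since $\partial_\varepsilon^{\mathcal{F}} h(\bar{x}) \subseteq \partial_{\varepsilon_n}^{\mathcal{F}} h(\bar{x})$ for all $n \in \mathbb{N}$ (because of $\varepsilon \le \varepsilon_n$), we have

\[ \varphi \in \partial_{\varepsilon_n}^{\mathcal{F}} h(\bar{x}) \quad \forall n. \]

But then

\[ \varphi \in \partial_{\varepsilon_n}^{\mathcal{F}} g(\bar{x}) \quad \forall n, \]

which means

\[ g(x) \ge g(\bar{x}) + \varphi(x) - \varphi(\bar{x}) - \varepsilon_n \quad \forall x \in X,\ \forall n \in \mathbb{N}. \]

By passing to the limit $n \to \infty$, we see that $\varphi \in \partial_\varepsilon^{\mathcal{F}} g(\bar{x})$. $\Box$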

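Now the question arises when we have to check $\partial_\varepsilon^{\mathcal{F}} h(\bar{x}) \subseteq \partial_\varepsilon^{\mathcal{F}} g(\bar{x})$ for all $\varepsilon \ge 0$ (as in Hiriart-Urruty's condition) and when it suffices to check this condition for $\varepsilon = 0$ only (as in Flores-Bazán's condition). The answer is that $\partial^{\mathcal{F}} h(\bar{x}) \subseteq \partial^{\mathcal{F}} g(\bar{x})$ is sufficient if the family $\mathcal{F}$, and with it the subdifferential, is rich enough. More precisely:

Theorem 2.4.2 Under the assumptions of Theorem 2.4.1, if

\[ \partial^{\mathcal{F}} h(x_1) \cap \partial^{\mathcal{F}} h(x_2) \ne \emptyset \quad \text{for any two points } x_1 \in \operatorname{dom} h,\ x_2 \in \operatorname{dom} h, \]

then $\partial^{\mathcal{F}} h(\bar{x}) \subseteq \partial^{\mathcal{F}} g(\bar{x})$ is a necessary and sufficient condition for the point $\bar{x} \in \operatorname{dom} g \cap \operatorname{dom} h$ to be a global minimizer of $\min_{x \in X} [g(x) - h(x)]$.

Proof. This follows from the proof of Theorem 2.4.1: choosing $\varphi \in \partial^{\mathcal{F}} h(\hat{x}) \cap \partial^{\mathcal{F}} h(\bar{x})$ there yields $\varepsilon = 0$. $\Box$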

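A different characterization was pointed out to me by Werner Oettli (University of Mannheim, Germany):

Theorem 2.4.3 If the family $\mathcal{F}$ fulfills the two properties
(i) $\varphi \in \mathcal{F} \implies \varphi + c \in \mathcal{F}$ for any $c \in \mathbb{R}$,
(ii) $\varphi_1 \in \mathcal{F},\ \varphi_2 \in \mathcal{F} \implies \max\{\varphi_1, \varphi_2\} \in \mathcal{F}$,
then the following implication holds true:

\[ \partial_0^{\mathcal{F}} h(\bar{x}) \subseteq \partial_0^{\mathcal{F}} g(\bar{x}) \quad \implies \quad \partial_\varepsilon^{\mathcal{F}} h(\bar{x}) \subseteq \partial_\varepsilon^{\mathcal{F}} g(\bar{x}) \quad \forall \varepsilon \ge 0. \]

Proof. Take $\varphi \in \partial_\varepsilon^{\mathcal{F}} h(\bar{x})$. We have to show that $\varphi \in \partial_\varepsilon^{\mathcal{F}} g(\bar{x})$. To this purpose, choose $\varphi_h \in \partial_0^{\mathcal{F}} h(\bar{x})$ and define

\[ \psi(x) := \max\{ \varphi(x) - \varphi(\bar{x}) - \varepsilon,\ \varphi_h(x) - \varphi_h(\bar{x}) \}. \]

Then $\psi \in \mathcal{F}$ and $\psi(\bar{x}) = 0$. Since

\[ h(x) - h(\bar{x}) \ge \varphi(x) - \varphi(\bar{x}) - \varepsilon, \qquad h(x) - h(\bar{x}) \ge \varphi_h(x) - \varphi_h(\bar{x}), \]

we have $h(x) - h(\bar{x}) \ge \psi(x) = \psi(x) - \psi(\bar{x})$ and therefore $\psi \in \partial_0^{\mathcal{F}} h(\bar{x})$. But then, from our assumption, $\psi \in \partial_0^{\mathcal{F}} g(\bar{x})$, giving

\[ g(x) - g(\bar{x}) \ge \psi(x) - \psi(\bar{x}) \ge \varphi(x) - \varphi(\bar{x}) - \varepsilon, \]

so we get $\varphi \in \partial_\varepsilon^{\mathcal{F}} g(\bar{x})$. $\Box$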

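A Singer-Toland type of optimality theorem (cf. (ST) of Theorem 2.3.1) also holds in the context of $\mathcal{F}$-functions:

Theorem 2.4.4 Let $g, h : X \to \mathbb{R} \cup \{+\infty\}$ be $\mathcal{F}$-functions. Then $\bar{x} \in \operatorname{dom} g \cap \operatorname{dom} h$ is a global minimizer of $g - h$ on $X$ if and only if

\[ g(\bar{x}) - h(\bar{x}) = \inf_{\varphi \in \mathcal{F}}\, [h^{\mathcal{F}}(\varphi) - g^{\mathcal{F}}(\varphi)]. \]

Proof. ($\Rightarrow$): If $\bar{x}$ is a global minimizer, then for any function $\varphi \in \mathcal{F}$ we have

\[ \varphi(x) - g(x) + g(\bar{x}) \le \varphi(x) - h(x) + h(\bar{x}) \quad \forall x \in X. \]

Taking first the supremum over all $x \in X$ and then the infimum over all $\varphi \in \mathcal{F}$ yields

\[ g(\bar{x}) - h(\bar{x}) \le \inf_{\varphi \in \mathcal{F}}\, [h^{\mathcal{F}}(\varphi) - g^{\mathcal{F}}(\varphi)]. \]

Now assume the above inequality holds strictly. Then

\[ g(\bar{x}) - h(\bar{x}) < h^{\mathcal{F}}(\varphi) - g^{\mathcal{F}}(\varphi) \quad \forall \varphi \in \mathcal{F}. \tag{2.4} \]

Choose $\varphi_0 \in \partial^{\mathcal{F}} h(\bar{x})$. Then we know from Lemma 2.4.1(ii) that $h^{\mathcal{F}}(\varphi_0) = \varphi_0(\bar{x}) - h(\bar{x})$. Inserting this into (2.4) gives

\[ g^{\mathcal{F}}(\varphi_0) < \varphi_0(\bar{x}) - g(\bar{x}), \]

contradicting Definition 2.4.2.

($\Leftarrow$): We have $g(\bar{x}) + g^{\mathcal{F}}(\varphi) \le h(\bar{x}) + h^{\mathcal{F}}(\varphi)$ for all $\varphi \in \mathcal{F}$. Therefore, for any $x \in X$,

\[ \varphi(x) - g^{\mathcal{F}}(\varphi) - g(\bar{x}) \ge \varphi(x) - h^{\mathcal{F}}(\varphi) - h(\bar{x}) \quad \forall \varphi \in \mathcal{F}. \]

Taking the supremum over all $\varphi \in \mathcal{F}$ and exploiting Lemma 2.4.1(iii) gives

\[ g(\bar{x}) - h(\bar{x}) \le g(x) - h(x) \quad \forall x \in X, \]

so $\bar{x}$ is a global minimizer. $\Box$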
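The duality formula of Theorem 2.4.4 can be verified by enumeration on a finite set. In the sketch below, $X = \{0, \dots, 4\}$ and the family $\mathcal{F}$ is a hypothetical list of six integer-valued functions; $g$ and $h$ are built as pointwise maxima over subsets of $\mathcal{F}$, so both are $\mathcal{F}$-functions. All data and names are assumptions made for this illustration only.

```python
# Finite check of g(x_bar) - h(x_bar) = inf_phi [h^F(phi) - g^F(phi)]  (Theorem 2.4.4).
X = range(5)

F = [
    (0, 1, 2, 3, 4),
    (4, 3, 2, 1, 0),
    (2, 2, 2, 2, 2),
    (0, 3, 1, 3, 0),
    (5, 5, 5, 5, 5),
    (6, 1, 0, 1, 6),
]

def pointwise_max(family):
    """Build an F-function as the pointwise maximum of the given members of F."""
    return [max(phi[x] for phi in family) for x in X]

h = pointwise_max([F[0], F[1], F[3]])   # h = (4, 3, 2, 3, 4)
g = pointwise_max([F[2], F[5]])         # g = (6, 2, 2, 2, 6)

def conjugate(func, phi):
    """F-conjugate: func^F(phi) = max_x [phi(x) - func(x)]."""
    return max(phi[x] - func[x] for x in X)

primal = min(g[x] - h[x] for x in X)
dual = min(conjugate(h, phi) - conjugate(g, phi) for phi in F)
minimizers = [x for x in X if g[x] - h[x] == primal]

def satisfies_duality_condition(x_bar):
    """The optimality condition of Theorem 2.4.4 evaluated at x_bar."""
    return g[x_bar] - h[x_bar] == dual

print(primal, dual)    # the two optimal values coincide
print(minimizers)      # exactly the points satisfying the duality condition
```

Working with integers keeps every comparison exact, so the equivalence can be tested point by point without tolerances.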

We have thus generalized conditions (HU) and (FB) as well as (ST). Throughout this chapter, the structure of the locally convex topological vector space $X$ was, except for existence questions, remarkably rarely used. This shows that all of the optimality conditions are mere reformulations of the condition $g(\bar{x}) - h(\bar{x}) \le g(x) - h(x)$ for all $x \in X$, and the topological structure is not a question of substance. In the remaining chapters we will therefore confine ourselves to finite-dimensional problems, i.e. problems where $X = \mathbb{R}^n$.


Chapter 3 Global Optimality Conditions for Convex Maximization

Let $D \subseteq \mathbb{R}^n$ be a closed convex set satisfying $\operatorname{int} D \ne \emptyset$, and let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. Recall that a set $D$ is called convex if for any $x_1 \in D$, $x_2 \in D$, and for any $\lambda \in [0,1]$ we have

\[ \lambda x_1 + (1-\lambda) x_2 \in D. \]

When talking about convex maximization, we will mean the problem

\[ \max f(x) \quad \text{s.t. } x \in D. \tag{3.1} \]

In spite of the convexity of the function $f$, (3.1) is a nonconvex problem. This is because we have convexity "in the wrong direction". It is easy to construct problems where $D$ is a polytope and each of its vertices yields a local maximizer of (3.1). Global optimality conditions for (3.1) can be derived from optimality conditions for d.c. problems, as the two problems are closely related. Nonetheless, the convex maximization problem has received a lot of research attention in its own right. For the state of the art in convex maximization, we refer to the survey by Benson [8] as well as to the textbooks by Horst and Tuy [51] and Horst et al. [47]. Various algorithms and abundant applications can also be found there.

As already mentioned, in the remainder of the thesis we restrict ourselves to the finite-dimensional space $\mathbb{R}^n$ rather than treating arbitrary locally convex topological spaces. The results, however, extend in a straightforward way to the infinite-dimensional case. The results of this chapter have been published in Dür, Horst and Locatelli [25].
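The remark that every vertex of a polytope can be a local maximizer of (3.1) is easy to reproduce. The sketch below maximizes the convex function $f(x) = x_1^2 + x_2^2$ over the box $D = [-1, 2]^2$; the function, the box and all names are assumptions chosen for this example. Local maximality is only sampled on a small grid of feasible points near each vertex, and the global maximizer is found by comparing the vertices, using the well-known fact that a convex function attains its maximum over a polytope at a vertex.

```python
from itertools import product

# Illustrative convex maximization problem (assumed data): f(x) = x1^2 + x2^2 on a box.
def f(x):
    return x[0] ** 2 + x[1] ** 2

lower = (-1.0, -1.0)
upper = (2.0, 2.0)

# The four vertices of the box D = [lower, upper].
vertices = list(product(*zip(lower, upper)))

def is_local_max(v, radius=0.1, steps=10):
    """Sample feasible points near vertex v and check that none has a larger value."""
    inward = [1.0 if v[k] == lower[k] else -1.0 for k in range(2)]
    for i in range(steps + 1):
        for j in range(steps + 1):
            p = (v[0] + inward[0] * radius * i / steps,
                 v[1] + inward[1] * radius * j / steps)
            if f(p) > f(v) + 1e-12:
                return False
    return True

best = max(vertices, key=f)

for v in vertices:
    print(v, f(v), is_local_max(v))
print("global maximizer:", best)
```

In this instance all four vertices pass the local test, but their objective values differ, so local information at a vertex does not reveal whether it is globally optimal.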


3.1 From D.C. Programming to Convex Maximization

We first explain how the convex maximization problem can be derived from the d.c. problem. To this end we introduce functions that take the value zero on the constraint set $D$ and the value $+\infty$ outside of $D$.

Definition 3.1.1 Let $D$ be an arbitrary set. The characteristic function (or indicator function) of $D$, $\chi_D : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, is defined as
$$\chi_D(x) = \begin{cases} 0 & \text{for } x \in D, \\ +\infty & \text{for } x \notin D. \end{cases}$$

The characteristic function of a set $D$ is a convex function if and only if $D$ is a convex set. Now problem (3.1) can be rewritten as
$$\max\ [f(x) - \chi_D(x)] \quad \text{s.t. } x \in \mathbb{R}^n,$$
which is the same as
$$-\min\ [\chi_D(x) - f(x)] \quad \text{s.t. } x \in \mathbb{R}^n.$$
So we see that (3.1) is, up to a change of sign, a special case of the d.c. problem (2.1), where $\chi_D$ plays the role of $g$ and $f$ plays the role of $h$. The optimality conditions outlined in Chapter 2 extend in a natural way to the convex maximization case. We only need to calculate the approximate subdifferential and the conjugate function of $\chi_D$. Recall some definitions and properties.

Definition 3.1.2 Let $D$ be a nonempty convex set and let $\varepsilon \ge 0$.

• Given $\bar{x} \in D$, the set
$$N_\varepsilon(D;\bar{x}) := \{ y \in \mathbb{R}^n : \langle x - \bar{x}, y \rangle \le \varepsilon \quad \forall\, x \in D \}$$
is called the set of $\varepsilon$-normal directions to $D$ at $\bar{x}$. It contains the ordinary cone $N(D;\bar{x})$ of normal directions, which is the limiting case for $\varepsilon = 0$.

• The function $\sigma_D : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$,
$$\sigma_D(y) := \sup\{ \langle x, y \rangle : x \in D \},$$
is called the support function of the set $D$. Note that $N_\varepsilon(D;\bar{x}) = \{ y \in \mathbb{R}^n : \sigma_D(y) \le \langle \bar{x}, y \rangle + \varepsilon \}$.
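The two notions of Definition 3.1.2 can be made concrete for a polytope. The following is a minimal sketch, assuming that D is given as the convex hull of finitely many vertices, so that the supremum in the support function is attained at a vertex; the names `support_function`, `is_eps_normal` and the sample set `SQUARE` are hypothetical and serve illustration only.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product <a, b>."""
    return sum(ai * bi for ai, bi in zip(a, b))


def support_function(vertices: Sequence[Vector], y: Vector) -> float:
    """sup{<x, y> : x in D} for D = conv(vertices).

    A linear function attains its maximum over a polytope at a vertex,
    so it suffices to scan the vertex list.
    """
    return max(dot(v, y) for v in vertices)


def is_eps_normal(vertices: Sequence[Vector], x_bar: Vector, y: Vector,
                  eps: float, tol: float = 1e-12) -> bool:
    """Test y in N_eps(D; x_bar) via: support(y) <= <x_bar, y> + eps."""
    return support_function(vertices, y) <= dot(x_bar, y) + eps + tol


# Hypothetical data: the unit square, examined at its corner (1, 1).
SQUARE = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

For instance, the direction (1, 2) is normal to the square at the corner (1, 1), whereas (-1, 0) becomes an element of the enlarged set only once the tolerance parameter reaches 1.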


It turns out that the approximate subdifferential of $\chi_D$ coincides with the set of $\varepsilon$-normal directions, and that the conjugate function of $\chi_D$ coincides with the support function of $D$.

Lemma 3.1.1 Let $D$ be a convex set, let $\bar{x} \in \mathbb{R}^n$, and let $\varepsilon \ge 0$. We have
$$\partial_\varepsilon \chi_D(\bar{x}) = N_\varepsilon(D;\bar{x}) \quad \text{and} \quad \chi_D^*(y) = \sigma_D(y) \quad \forall\, y \in \mathbb{R}^n.$$

Proof. See Hiriart-Urruty and Lemaréchal [40]. □

Therefore, for the convex maximization problem, the conditions (HU) and (ST) read as follows:

Condition (HU) for the convex maximization problem: $\bar{x} \in D$ is a global maximizer of (3.1) if and only if
$$\partial_\varepsilon f(\bar{x}) \subseteq N_\varepsilon(D;\bar{x}) \quad \forall\, \varepsilon \ge 0.$$

Condition (ST) for the convex maximization problem: $\bar{x} \in D$ is a global maximizer of (3.1) if and only if
$$f(\bar{x}) = \sup_{y \in \mathbb{R}^n} [\, \sigma_D(y) - f^*(y) \,].$$

Another global optimality condition for the convex maximization problem was proposed by Strekalovsky [69]-[71], see also Hiriart-Urruty [36]. It is closely related to (HU), but it uses a different "globalization tool": instead of introducing the parameter $\varepsilon$ and requiring that the condition holds for all $\varepsilon \ge 0$, Strekalovsky considers all $x$ which are in the same level set as $\bar{x}$:

Condition (S) for the convex maximization problem: Assume that
$$\inf_{x \in D} f(x) < f(\bar{x}). \tag{3.2}$$
Then $\bar{x} \in D$ is a global maximizer of (3.1) if and only if
$$\partial f(x) \subseteq N(D;x) \quad \forall\, x \in \mathbb{R}^n \text{ such that } f(x) = f(\bar{x}).$$

Without the assumption $\inf\{ f(x) : x \in D \} < f(\bar{x})$, condition (S) is not sufficient for global optimality, as the following example shows:


Example 3.1.1 Consider $f(x) = x^2$, $D = [-1,1]$ and $\bar{x} = 0$. Then $\bar{x}$ is the only point in the level set $\{ x \in \mathbb{R}^n : f(x) = 0 \}$ and $\partial f(0) = \{0\} \subseteq N(D;\bar{x})$, but $\bar{x}$ is not a global maximizer.
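Example 3.1.1 can be checked numerically. The sketch below assumes a uniform grid on D as a stand-in for the whole interval; the names `GRID`, `assumption_3_2_holds` and `is_global_maximizer` are hypothetical.

```python
def f(x: float) -> float:
    return x * x


X_BAR = 0.0
# Uniform grid on D = [-1, 1]; the grid contains the point 0 exactly.
GRID = [-1.0 + k / 100.0 for k in range(201)]

# The level set {x : f(x) = f(X_BAR)} is {0}, and the gradient of f there is 0.
# The zero vector belongs to every normal cone, so the inclusion in (S) holds.
gradient_at_x_bar = 2.0 * X_BAR
inclusion_holds = gradient_at_x_bar == 0.0


def assumption_3_2_holds() -> bool:
    """Is there a point of D with a strictly smaller value than f(X_BAR)?"""
    return min(f(x) for x in GRID) < f(X_BAR)


def is_global_maximizer() -> bool:
    return all(f(x) <= f(X_BAR) for x in GRID)
```

On this grid the inclusion holds, the strict inequality required before applying (S) fails because 0 minimizes f, and 0 is not a global maximizer since f(1) = 1.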

Strekalovsky's requirement that the condition $\partial f(x) \subseteq N(D;x)$ has to hold for all points on the level surface of $f$ at level $f(\bar{x})$ seems somewhat unnatural. Next, we present an alternative formulation of (S) with a new proof showing that only those points on this level surface have to be considered which are in $D$.

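Proposition 3.1.1 If there exists $x_0 \in \operatorname{int} D$ satisfying $f(x_0) < f(\bar{x})$, then
$$\bar{x} \in \operatorname{Argmax}\{ f(x) : x \in D \} \iff \{ x \in \mathbb{R}^n : f(x) = f(\bar{x}) \} \cap \operatorname{int} D = \emptyset, \tag{3.3}$$
where $\operatorname{Argmax}\{ f(x) : x \in D \}$ denotes the set of maximizers of the problem $\max\{ f(x) : x \in D \}$.

The proof uses the following lemma, where $\operatorname{bd} A$ denotes the boundary of a set $A \subseteq \mathbb{R}^n$.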

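Lemma 3.1.2 Let $A, B \subseteq \mathbb{R}^n$ be convex sets satisfying $\operatorname{int} A \cap \operatorname{int} B \ne \emptyset$. Then
$$\operatorname{bd} A \cap \operatorname{int} B = \emptyset \iff B \subseteq A.$$

Proof. ($\Rightarrow$): Assume there exists $y \in B \setminus A$. Let $x \in \operatorname{int} A \cap \operatorname{int} B$, and let
$$[x, y[\ = \{ z = (1-\lambda)x + \lambda y : 0 \le \lambda < 1 \}.$$
Then $[x, y[\ \subseteq \operatorname{int} B$ (cf. Rockafellar [60, Theorem 6.1]). But $x \in \operatorname{int} A$ and $y \in B \setminus A$ would imply $\emptyset \ne ([x, y[\ \cap \operatorname{bd} A) \subseteq \operatorname{int} B \cap \operatorname{bd} A$, a contradiction.
($\Leftarrow$): $B \subseteq A \implies \operatorname{int} B \subseteq \operatorname{int} A \implies \operatorname{bd} A \cap \operatorname{int} B = \emptyset$. □

Proof of Proposition 3.1.1. Since $f$ is convex on $\mathbb{R}^n$, it is continuous everywhere (see Hiriart-Urruty and Lemaréchal [40]), and hence lower semicontinuous and upper semicontinuous. Lower semicontinuity implies that $A := \{ x : f(x) \le f(\bar{x}) \}$ is closed; upper semicontinuity and $f(x_0) < f(\bar{x})$ imply that $\{ x : f(x) < f(\bar{x}) \}$ is open and nonempty. Moreover, from a standard argument on the convexity of $f$, it follows that
$$\operatorname{bd} A = \{ x \in \mathbb{R}^n : f(x) = f(\bar{x}) \}.$$
Therefore, Lemma 3.1.2 with $B = D$ leads to (3.3). □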


Clearly, if $\operatorname{int} D \ne \emptyset$, with $D$ convex and closed, then $D = \operatorname{cl} \operatorname{int} D$, so that the assumption that $\inf_{x \in D} f(x) < f(\bar{x})$ in (S) is equivalent to the assumption that $f(x_0) < f(\bar{x})$ for some $x_0 \in \operatorname{int} D$ in Proposition 3.1.1. Moreover,
$$0 \notin \partial f(y) \quad \forall\, y \text{ such that } f(y) = f(\bar{x}),$$
since otherwise $f(\bar{x}) = f(y) = \min\{ f(x) : x \in \mathbb{R}^n \}$, which is absurd in view of $f(x_0) < f(\bar{x})$. Using $N(D;y) = \{0\}$ $\forall\, y \in \operatorname{int} D$ and $\partial f(y) \ne \emptyset$ $\forall\, y \in D$, we see that
$$\partial f(y) \subseteq N(D;y) \quad \forall\, y \in D \text{ such that } f(y) = f(\bar{x}) \tag{3.4}$$
implies the optimality condition in (3.3), i.e. (3.4) is sufficient for global optimality of $\bar{x}$. Necessity of (3.4) is trivial, since (3.4) holds for local maxima (cf. Chapter 4, where more discussion of Condition (S) will also be provided). Additional proofs of (S) and of (ST) will result from Section 3.2.

We will also study a slightly different version of Condition (HPT) stated in Section 2.3. First, rewrite problem (3.1). Introducing an additional variable $t \in \mathbb{R}$, (3.1) can be written in so-called canonical d.c. form with linear objective …

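… 0. Then $f(\tilde{x}) > \bar{t}$, i.e. $\bar{t} \ne \max\{ t : (x,t) \in F \}$.
($\Leftarrow$): Assume that $(\bar{x}, \bar{t})$ is not an optimal solution of (3.5). Then there exists $(\tilde{x}, \tilde{t}) \in D \times \mathbb{R}$ with $f(\tilde{x}) - \tilde{t} \ge 0$ and $\tilde{t} > \bar{t}$, and hence $f(\tilde{x}) - \bar{t} > 0$. □

Notice that $F = \operatorname{cl} \operatorname{int} F$ follows from $D = \operatorname{cl} \operatorname{int} D$ and $f$ convex, and assumption (ii) in (CDC) is obviously fulfilled.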

3.2 Interconnection of Global Optimality Criteria

Next, we show how each of the four optimality conditions above can be derived from each of the other conditions. We first show the following technical lemma.

Lemma 3.2.1 Let $\bar{x} \in D$. Then
$$f(\bar{x}) \le \sup\{ \sigma_D(y) - f^*(y) : y \in \mathbb{R}^n \}.$$

Proof. If $f(\bar{x}) > \sup\{ \sigma_D(y) - f^*(y) : y \in \mathbb{R}^n \}$, then
$$f(\bar{x}) > \sigma_D(y) - f^*(y) \quad \forall\, y \in \mathbb{R}^n. \tag{3.6}$$
In (3.6), choose $y \in \partial f(\bar{x})$. From Lemma 2.2.1(i) we know that
$$\langle \bar{x}, y \rangle - f(\bar{x}) = f^*(y). \tag{3.7}$$
Combining (3.6) and (3.7) yields $\sigma_D(y) < \langle \bar{x}, y \rangle$, contradicting the definition of $\sigma_D(y)$, since $\bar{x} \in D$. □

Now we are able to show the connection between the mentioned optimality criteria. To this end, we assume throughout that (3.2) holds.

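(HU) $\iff$ (ST)

It follows from the definitions of $f^*(y)$, $\partial_\varepsilon f(\bar{x})$, $N_\varepsilon(D;\bar{x})$ and $\sigma_D(y)$ that (HU) is equivalent to
$$\Big[\, f^*(y) \le \langle \bar{x}, y \rangle - f(\bar{x}) + \varepsilon \implies \sigma_D(y) \le \langle \bar{x}, y \rangle + \varepsilon \,\Big] \quad \forall\, \varepsilon \ge 0. \tag{3.8}$$

($\Leftarrow$): (ST) is equivalent to
$$\sigma_D(y) \le f(\bar{x}) + f^*(y) \quad \forall\, y \in \mathbb{R}^n,$$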


so that (3.8) follows trivially.

($\Rightarrow$): Assume that (ST) does not hold. In view of Lemma 3.2.1, this means
$$f(\bar{x}) < \sup\{ \sigma_D(y) - f^*(y) : y \in \mathbb{R}^n \}.$$
Then
$$\sigma_D(y_0) > f(\bar{x}) + f^*(y_0) \ge \langle \bar{x}, y_0 \rangle \quad \text{for some } y_0 \in \mathbb{R}^n, \tag{3.9}$$
where the last inequality follows from Definition 2.2.1. Choose
$$\varepsilon := f^*(y_0) - \langle \bar{x}, y_0 \rangle + f(\bar{x}) \ge 0.$$
Then
$$f^*(y_0) = \langle \bar{x}, y_0 \rangle - f(\bar{x}) + \varepsilon,$$
i.e. the left-hand side of the implication (3.8) is fulfilled for this particular value of $\varepsilon$. But inserting $\varepsilon$ into (3.9) yields $\sigma_D(y_0) > \langle \bar{x}, y_0 \rangle + \varepsilon$, so the right-hand side of (3.8) is not satisfied. Hence (HU) is also violated. □

(HU) $\iff$ (S)

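($\Rightarrow$): Assume that (S) does not hold, i.e.
$$\exists\, \tilde{x} \in D \text{ with } f(\tilde{x}) = f(\bar{x}) \text{ and } y \in \partial f(\tilde{x}), \text{ where } y \notin N(D;\tilde{x}).$$
From $y \in \partial f(\tilde{x})$ and $f(\tilde{x}) = f(\bar{x})$ follows
$$\langle \bar{x} - \tilde{x}, y \rangle \le 0,$$
whereas $y \notin N(D;\tilde{x})$ implies
$$\exists\, x \in D \text{ such that } \langle x - \tilde{x}, y \rangle > 0. \tag{3.10}$$
We distinguish two cases. If $\langle \bar{x} - \tilde{x}, y \rangle = 0$, then $y \in \partial_\varepsilon f(\bar{x})$ $\forall\, \varepsilon \ge 0$, since
$$f(x) \ge f(\tilde{x}) + \langle x - \tilde{x}, y \rangle = f(\bar{x}) + \langle x - \bar{x}, y \rangle + \langle \bar{x} - \tilde{x}, y \rangle = f(\bar{x}) + \langle x - \bar{x}, y \rangle \quad \forall\, x \in \mathbb{R}^n,$$
and, obviously, $y \in \partial f(\bar{x})$ implies $y \in \partial_\varepsilon f(\bar{x})$ $\forall\, \varepsilon > 0$. Moreover, $\langle \bar{x} - \tilde{x}, y \rangle = 0$ implies
$$\langle x - \bar{x}, y \rangle = \langle x - \tilde{x}, y \rangle > 0$$
because of (3.10), such that $y \notin N_\varepsilon(D;\bar{x})$ whenever $\varepsilon < \langle x - \tilde{x}, y \rangle$. If, however, $\langle \bar{x} - \tilde{x}, y \rangle < 0$, then, for $\varepsilon := \langle \tilde{x} - \bar{x}, y \rangle$, one sees in a similar way by using $f(\tilde{x}) = f(\bar{x})$ and (3.10) that $y \in \partial_\varepsilon f(\bar{x})$ but $y \notin N_\varepsilon(D;\bar{x})$.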


($\Leftarrow$): Assume that (HU) does not hold, i.e.
$$\exists\, \varepsilon > 0,\ y_0 \in \mathbb{R}^n,\ x_0 \in D \text{ such that } f(x) \ge f(\bar{x}) + \langle x - \bar{x}, y_0 \rangle - \varepsilon \quad \forall\, x \in \mathbb{R}^n, \tag{3.11}$$
and
$$\langle x_0 - \bar{x}, y_0 \rangle > \varepsilon. \tag{3.12}$$
Choosing $x = x_0$ in (3.11) yields, by means of (3.12), $f(x_0) > f(\bar{x})$. But, from the assumption (3.2), there must exist $x_1 \in D$ satisfying
$$f(x_1) < f(\bar{x}) < f(x_0).$$
Then, from continuity and convexity of the function $f$, one sees that there exists $\tilde{x} \in [x_1, x_0[\ \subseteq D$ such that $f(\tilde{x}) = f(\bar{x})$ and, for $d := (x_0 - \tilde{x})$, the directional derivative $f'(\tilde{x}; d)$ must be positive. Recall that the directional derivative of $f$ at $\tilde{x}$ in direction $d$ is defined to be
$$f'(\tilde{x}; d) := \lim_{t \downarrow 0} \frac{f(\tilde{x} + td) - f(\tilde{x})}{t}$$
(cf. Hiriart-Urruty and Lemaréchal [40, Chapter VI.1]). Because of
$$f'(\tilde{x}; d) = \sup\{ \langle d, y \rangle : y \in \partial f(\tilde{x}) \} \tag{3.13}$$
(cf. again Hiriart-Urruty and Lemaréchal [40, Chapter VI.1]), there is $y_0 \in \partial f(\tilde{x})$ satisfying
$$\langle d, y_0 \rangle = \langle x_0 - \tilde{x}, y_0 \rangle > 0,$$
so that $y_0 \notin N(D;\tilde{x})$ and (S) does not hold. □

(HU) $\iff$ (CDC)

($\Leftarrow$): Assume that (HU) does not hold, i.e. we have (3.11) and (3.12) and, as above,
$$f(x_0) > f(\bar{x}),$$
which means that (CDC) does not hold for $(\bar{x}, \bar{t})$ with $\bar{t} = f(\bar{x})$.

($\Rightarrow$): Assume that (CDC) does not hold. Then there is $x_0 \in D$ such that $f(x_0) > f(\bar{x})$. In other words, with $g(x) = f(x) - f(\bar{x})$, we have
$$(x_0, 0) \notin \operatorname{Epi} g.$$
From convexity and continuity of $f$, we know that $\operatorname{Epi} g$ is a closed convex set. Therefore, there exists a hyperplane in $\mathbb{R}^{n+1}$ strictly separating $(x_0, 0)$ from $\operatorname{Epi} g$, i.e. there exist $y \in \mathbb{R}^n$ and $\varepsilon > 0$ such that
$$\langle x - \bar{x}, y \rangle - \varepsilon \le f(x) - f(\bar{x}) \quad \forall\, x \in \mathbb{R}^n,$$
while
$$\langle x_0 - \bar{x}, y \rangle - \varepsilon > 0.$$
This implies $y \in \partial_\varepsilon f(\bar{x})$ and $y \notin N_\varepsilon(D;\bar{x})$. □

(ST) $\iff$ (S)

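From Definition 3.1.2, it follows immediately that
$$y \in N(D;\tilde{x}) \iff \sigma_D(y) = \langle \tilde{x}, y \rangle.$$
Therefore, in the argumentation below, we will replace (S) by
$$\Big[\, y \in \partial f(\tilde{x}) \implies \sigma_D(y) = \langle \tilde{x}, y \rangle \,\Big] \quad \forall\, \tilde{x} \in D \text{ such that } f(\tilde{x}) = f(\bar{x}). \tag{3.14}$$

($\Rightarrow$): Assume that (3.14) does not hold. Then
$$\exists\, y_0 \in \partial f(\tilde{x}) \text{ with } f(\tilde{x}) = f(\bar{x}) \text{ such that } \sigma_D(y_0) > \langle \tilde{x}, y_0 \rangle. \tag{3.15}$$
But from $y_0 \in \partial f(\tilde{x})$ and Lemma 2.2.1(i) we know that
$$f^*(y_0) + f(\tilde{x}) = \langle \tilde{x}, y_0 \rangle,$$
and hence by (3.15)
$$\sigma_D(y_0) - f^*(y_0) > \langle \tilde{x}, y_0 \rangle - f^*(y_0) = f(\tilde{x}) = f(\bar{x}),$$
a contradiction to (ST).

($\Leftarrow$): Assume that (ST) does not hold. Then, by Definition 2.2.1 and Lemma 3.2.1,
$$\exists\, y_0 \in \mathbb{R}^n,\ x_0 \in D \text{ such that } \langle x_0, y_0 \rangle - f(\bar{x}) > \langle x, y_0 \rangle - f(x) \quad \forall\, x \in \mathbb{R}^n. \tag{3.16}$$
For $x = x_0$, (3.16) yields $f(x_0) > f(\bar{x})$. The remaining arguments are similar to those in the proof of (S) $\implies$ (HU). □

(ST) $\iff$ (CDC)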

($\Leftarrow$): In the proof of (ST) $\Longleftarrow$ (S) above, we saw that, if (ST) does not hold, then (passing through nonoptimality) one has $f(x_0) > f(\bar{x})$ for some $x_0 \in D$. Then, clearly, (CDC) cannot hold (consider $\bar{t} = f(\bar{x})$).

($\Rightarrow$): If (CDC) does not hold, then
$$\exists\, x_0 \in D,\ t_0 \in \mathbb{R},\ t_0 \ge f(\bar{x}) \text{ such that } f(x_0) > t_0,$$
which implies $f(x_0) > f(\bar{x})$. Then, for $y_0 \in \partial f(x_0)$, we have $f(x_0) + f^*(y_0) = \langle x_0, y_0 \rangle$, and hence
$$\sigma_D(y_0) - f^*(y_0) = \sigma_D(y_0) - \langle x_0, y_0 \rangle + f(x_0) \ge f(x_0) > f(\bar{x}),$$
since $\sigma_D(y_0) \ge \langle x_0, y_0 \rangle$, i.e. (ST) does not hold. □

(S) $\iff$ (CDC)

($\Rightarrow$): Above we saw that $f(x_0) > f(\bar{x})$ for some $x_0 \in D$ immediately follows when (CDC) does not hold. Then (S) cannot be fulfilled (see the above proof of the sufficiency of (S) for global optimality in the proof of (HU) $\Longleftarrow$ (S)).

($\Leftarrow$): Following the lines of previous proofs involving (S), it is easy to see that, if (S) is not fulfilled, then again there is $x_0 \in D$ satisfying $f(x_0) > f(\bar{x})$ (use $f(\bar{x}) = f(\tilde{x})$ and the definitions of $\partial f(\tilde{x})$ and $N(D;\tilde{x})$), which again, by setting $\bar{t} = f(\bar{x})$, contradicts (CDC). □

3.3 Further Optimality Conditions

3.3.1 Reformulation of (HU) in the Differentiable Case

Assume, in addition to the assumptions in the definition of problem (3.1), that the set $D$ is compact, and the convex function $f$ is differentiable everywhere. Let $\nabla f(x)$ denote the gradient of $f$ at $x$. Without loss of generality, let
$$\bar{x} = 0 \quad \text{and} \quad f(\bar{x}) = 0.$$

We begin with formulating condition (HU) in a different way. From the definition of $\partial_\varepsilon f(0)$ it follows that
$$y \in \partial_\varepsilon f(0) \iff \sup\{ \langle x, y \rangle - f(x) : x \in \mathbb{R}^n \} \le \varepsilon \iff f^*(y) \le \varepsilon.$$
We assume that the above supremum is attained: this is the case if, for example, $f(x) - \langle x, y \rangle$ is coercive, which is implied, e.g., when $f(x)$ is 1-coercive (cf. Hiriart-Urruty and Lemaréchal [39]). Since maximizers $z \in \mathbb{R}^n$ of the concave function $\langle x, y \rangle - f(x)$ are characterized by the system $\nabla f(z) - y = 0$, we see that
$$\partial_\varepsilon f(0) = \{ \nabla f(z) : z \in \mathbb{R}^n,\ \langle z, \nabla f(z) \rangle - f(z) \le \varepsilon \}.$$

Notice that the above assumptions on the existence of $\max\{ \langle x, y \rangle - f(x) : x \in \mathbb{R}^n \}$ for $y \in \partial_\varepsilon f(0)$ could be replaced in the following sense. Since $y \in \partial_\varepsilon f(0)$, we are not interested in vectors $y$ for which $\langle x, y \rangle - f(x)$ is unbounded from above. Therefore, the above reasoning remains valid if, in the formula for $\partial_\varepsilon f(0)$, we admit "stationary points at infinity", i.e. we add to $\partial_\varepsilon f(0)$ all limits $\lim_{\|z_i\| \to \infty} \nabla f(z_i)$, where $\{z_i\}_{i \in \mathbb{N}} \subseteq \mathbb{R}^n$ is such that the limit exists and
$$\limsup_{\|z_i\| \to \infty} \big( \langle z_i, \nabla f(z_i) \rangle - f(z_i) \big) \le \varepsilon.$$

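Let again
$$\sigma_D(y) := \sup\{ \langle x, y \rangle : x \in D \}$$
denote the support function of $D$ and define
$$r(\varepsilon) := \sup\{ \sigma_D(\nabla f(z)) : \langle z, \nabla f(z) \rangle - f(z) \le \varepsilon,\ z \in \mathbb{R}^n \}. \tag{3.17}$$
Then, the above arguments yield an equivalent formulation of $\partial_\varepsilon f(0) \subseteq N_\varepsilon(D;0)$ $\forall\, \varepsilon \ge 0$ (that is, (HU)) as
$$\sup_{\varepsilon \ge 0}\, [\, r(\varepsilon) - \varepsilon \,] \le 0. \tag{3.18}$$
Notice that, since the order of the two maximization processes involved in (3.17) can be reversed, one has
$$r(\varepsilon) = \sup_{x \in D} \Big\{ \sup_{z \in \mathbb{R}^n} \big[\, \langle x, \nabla f(z) \rangle : \langle z, \nabla f(z) \rangle - f(z) \le \varepsilon \,\big] \Big\}. \tag{3.19}$$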

3.3.2 Maximization of Strictly Convex Quadratic Functions over Convex Sets

In this subsection, the preceding result is specialized to the case of a strictly convex quadratic function
$$f(x) = \tfrac{1}{2} \langle x, Qx \rangle + \langle x, c \rangle, \tag{3.20}$$
where $Q \in \mathbb{R}^{n \times n}$ is a real positive definite symmetric $n \times n$ matrix and $c \in \mathbb{R}^n$. Global optimality conditions and algorithms especially for quadratic programming over polyhedra have been developed by Bomze and Danninger in [12], [18] and [19].

Proposition 3.3.1 The point $\bar{x} = 0$ is an optimal solution of problem (3.1) with $f(x)$ of the form (3.20), $f(0) = 0$, if and only if
$$\max_{\varepsilon \ge 0} \max_{x \in D} \Big[ \langle x, c \rangle + \sqrt{2\varepsilon}\, \sqrt{\langle x, Qx \rangle} - \varepsilon \Big] \le 0. \tag{3.21}$$

Proof. An easy calculation shows that (3.19) reduces to
$$r(\varepsilon) = \max_{x \in D} \Big[ \langle x, c \rangle + \max_{z \in \mathbb{R}^n} \big\{ \langle x, Qz \rangle : \tfrac{1}{2} \langle z, Qz \rangle \le \varepsilon \big\} \Big]. \tag{3.22}$$

Let, for every $x \in D$, $\varepsilon \ge 0$, $z(x,\varepsilon)$ denote an optimal solution of the inner maximization problem in (3.22). Clearly, for all $x \in D$, $z(x,0) = 0$, and for all $\varepsilon \ge 0$, one can choose $z(0,\varepsilon) = 0$. If $x \ne 0$ and $\varepsilon > 0$, then every $z(x,\varepsilon)$ must satisfy the corresponding Karush-Kuhn-Tucker (KKT) conditions if the Slater condition is fulfilled. Moreover, since the linear function $\langle x, Qz \rangle$ attains its maximum over the ellipsoid $E(\varepsilon) = \{ z \in \mathbb{R}^n : \tfrac{1}{2} \langle z, Qz \rangle \le \varepsilon \}$ at a boundary point of $E(\varepsilon)$, the KKT conditions reduce to
$$\tfrac{1}{2} \langle z, Qz \rangle = \varepsilon, \quad Q(x - uz) = 0, \quad u \in \mathbb{R}_+. \tag{3.23}$$
Since $Q$ is positive definite, hence nonsingular, one obtains from (3.23) the unique solution
$$z(x,\varepsilon) = \frac{\sqrt{2\varepsilon}\, x}{\sqrt{\langle x, Qx \rangle}}$$
and
$$r(\varepsilon) = \max_{x \in D} \Big[ \langle x, c \rangle + \sqrt{2\varepsilon}\, \sqrt{\langle x, Qx \rangle} \Big]. \tag{3.24}$$
Since (3.24) also holds when $\varepsilon = 0$, and when $r(\varepsilon)$ is attained at $x = 0$, one obtains (3.21) from (3.18). □
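The closed-form solution of the inner maximization problem can be sanity-checked numerically. The following sketch assumes a hypothetical 2×2 positive definite matrix `Q` and compares the closed-form point with randomly sampled boundary points of the ellipsoid; it requires x ≠ 0 and ε > 0, and all names are illustrative only.

```python
import math
import random

# Hypothetical symmetric positive definite 2x2 matrix (illustration only).
Q = ((2.0, 0.5), (0.5, 1.0))


def q_inner(u, v):
    """Bilinear form <u, Q v>."""
    return sum(u[i] * Q[i][j] * v[j] for i in range(2) for j in range(2))


def z_closed_form(x, eps):
    """Candidate maximizer sqrt(2 eps) * x / sqrt(<x, Q x>)."""
    scale = math.sqrt(2.0 * eps) / math.sqrt(q_inner(x, x))
    return (scale * x[0], scale * x[1])


def inner_max_value(x, eps):
    """Claimed optimal value sqrt(2 eps) * sqrt(<x, Q x>)."""
    return math.sqrt(2.0 * eps) * math.sqrt(q_inner(x, x))


def boundary_point(angle, eps):
    """Point z on the ellipsoid boundary (1/2) <z, Q z> = eps in a given direction."""
    d = (math.cos(angle), math.sin(angle))
    t = math.sqrt(2.0 * eps / q_inner(d, d))
    return (t * d[0], t * d[1])


def closed_form_is_maximal(x, eps, samples=1000, seed=0):
    """No sampled boundary point should beat the closed-form value."""
    rng = random.Random(seed)
    best = inner_max_value(x, eps)
    return all(
        q_inner(x, boundary_point(rng.uniform(0.0, 2.0 * math.pi), eps)) <= best + 1e-9
        for _ in range(samples)
    )
```

The sampling test is only a plausibility check; the actual argument is the Cauchy-Schwarz inequality for the inner product induced by Q.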

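Remark 3.3.1 (i) Notice that Proposition 3.3.1 can also be proved by the following arguments. Let, for $x \in \mathbb{R}^n$, $\varepsilon \in \mathbb{R}_+$,
$$r(\varepsilon, x) := \langle x, c \rangle + \sqrt{2\varepsilon}\, \sqrt{\langle x, Qx \rangle} - \varepsilon,$$
and rewrite (3.21) as
$$\max_{\varepsilon \ge 0} \max_{x \in D} r(\varepsilon, x) = \max_{x \in D} \max_{\varepsilon \ge 0} r(\varepsilon, x) \le 0.$$
Clearly, for every $x \in \mathbb{R}^n$, $r(\cdot, x)$ is concave in $[0, \infty[$ and differentiable in $]0, \infty[$. Therefore, if it exists, $r^*(x) := \max\{ r(\varepsilon, x) : \varepsilon \ge 0 \}$ is attained at $\varepsilon(x) = 0$ or at a point $\varepsilon(x)$ satisfying $\partial r(\varepsilon, x) / \partial \varepsilon = 0$ at $\varepsilon = \varepsilon(x)$. This yields $\varepsilon(x) = \tfrac{1}{2} \langle x, Qx \rangle$, so that
$$r^*(x) = \langle x, c \rangle + \tfrac{1}{2} \langle x, Qx \rangle = f(x),$$
and (3.21) is equivalent to $\max\{ f(x) : x \in D \} \le 0$.

(ii) If $D$ is a polytope, then, by convexity of $r(\varepsilon, \cdot)$ (note that $\sqrt{\langle x, Qx \rangle}$ is a norm), for all $\varepsilon \ge 0$ the maximum of $r(\varepsilon, x)$ over $D$ is attained at some vertex $v$ of $D$. Hence (3.21) can be reduced to
$$\max_{\varepsilon \ge 0} \max_{v \in V(D)} r(\varepsilon, v) = \max_{v \in V(D)} \max_{\varepsilon \ge 0} r(\varepsilon, v) \le 0,$$
where $V(D)$ denotes the vertex set of $D$. Let $\varepsilon(v)$ be the point where $\max\{ r(\varepsilon, v) : \varepsilon \ge 0 \}$ is attained, and let
$$\bar{\varepsilon} := \max\{ \varepsilon(v) : v \in V(D) \}.$$
Then, obviously, we can replace $\varepsilon \ge 0$ by $0 \le \varepsilon \le \bar{\varepsilon}$, and in (3.18) the maximum has to be taken only over this finite interval.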
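For a polytope, the reduction in Remark 3.3.1 (ii) amounts to a finite computation over the vertex set. The sketch below uses hypothetical data (a diagonal matrix `Q`, a vector `C` and the unit square as D, chosen so that 0 is a vertex with f(0) = 0) and evaluates the bivariate function at the stationary value of the parameter for each vertex; all names are assumptions made for illustration.

```python
import math

# Hypothetical problem data (illustration only).
Q = ((2.0, 0.0), (0.0, 1.0))   # symmetric positive definite
C = (-3.0, -2.0)
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quad(x):
    """Quadratic form <x, Q x>."""
    return sum(x[i] * Q[i][j] * x[j] for i in range(2) for j in range(2))


def f(x):
    return 0.5 * quad(x) + dot(x, C)


def r(eps, x):
    return dot(x, C) + math.sqrt(2.0 * eps) * math.sqrt(quad(x)) - eps


def eps_star(x):
    """Stationary point of r(., x): eps = <x, Q x> / 2."""
    return 0.5 * quad(x)


def zero_is_optimal(vertices, tol=1e-12):
    """Check the vertex form of the condition; also return the interval bound."""
    eps_bar = max(eps_star(v) for v in vertices)
    worst = max(r(eps_star(v), v) for v in vertices)
    return worst <= tol, eps_bar
```

With these data the check succeeds, in agreement with the fact that f is nonpositive at all four vertices, and the parameter interval can be truncated at the largest stationary value over the vertices.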


3.3.3 Equivalences between Nonconvex Optimization Problems

The alternative proof of Proposition 3.3.1 given in Remark 3.3.1 shows that Proposition 3.3.1 extends to much more general problem classes.

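Proposition 3.3.2 Let $f : \mathbb{R}^n \to \mathbb{R}$, let $0 \in D \subseteq \mathbb{R}^n$, with $D$ compact. Assume that there exists a closed, convex set $C \subseteq \mathbb{R}^m_+$ and a map
$$r : C \times \mathbb{R}^n \to \mathbb{R}, \quad (\varepsilon, x) \mapsto r(\varepsilon, x)$$
satisfying

(i) $r(\cdot, x)$ is concave for all $x \in D$ and $\exists\, r^*(x) := \max\{ r(\varepsilon, x) : \varepsilon \in C \}$,
(ii) $r^*(x) = f(x)$ $\forall\, x \in D$.

Then $\max\{ f(x) : x \in D \} = f(0) = 0$ if and only if
$$\max_{\varepsilon \in C} \max_{x \in D} r(\varepsilon, x) = \max_{x \in D} \max_{\varepsilon \in C} r(\varepsilon, x) \le 0. \tag{3.25}$$

Proof. This is immediate from
$$\max_{\varepsilon \in C} r(\varepsilon, x) = f(x) \quad \forall\, x \in D. \qquad \Box$$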

Notice that concavity of $r(\cdot, x)$ is not needed in Proposition 3.3.2. Concavity is assumed, however, in view of possible applications, since then $r^*(x)$ can easily be computed by standard univariate optimization techniques. Although its proof is trivial, stating Proposition 3.3.2 seems worthwhile because of its interesting practical applications in view of the left-hand side of the equality in (3.25): finding $r(\varepsilon) := \max\{ r(\varepsilon, x) : x \in D \}$ can be very easy, so that considering the problem $\max\{ r(\varepsilon) : \varepsilon \ge 0 \}$ rather than treating the original problem $\max\{ f(x) : x \in D \}$ makes sense numerically. Some examples are considered next.

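Example 3.3.1 Let $D \subseteq \mathbb{R}^n$ be compact, let $h : \mathbb{R}^n \to \mathbb{R}$, let $s : \mathbb{R}^n \to \mathbb{R}_+$ with $s(x) > 0$ $\forall\, x \in D$. Then the following pairs $f(x)$ and $r(\varepsilon, x)$ satisfy (i) and (ii) in Proposition 3.3.2 for the specified set $C \subseteq \mathbb{R}_+$ (verification via $\partial r(\varepsilon, x) / \partial \varepsilon = 0$):

(i) $f(x) = h(x) - 2\sqrt{s(x)}$, $\quad r(\varepsilon, x) = h(x) - \varepsilon s(x) - \frac{1}{\varepsilon}$, $\quad C = [\varepsilon_0, \infty[$, $\varepsilon_0 > 0$.

(ii) $f(x) = h(x) + (1 - \frac{1}{p}) [s(x)]^{p/(p-1)}$, $\quad r(\varepsilon, x) = h(x) - \frac{1}{p} \varepsilon^p + \varepsilon s(x)$, $\quad p > 1$, $C = \mathbb{R}_+$.

(iii) $f(x) = h(x) - \log s(x)$, $\quad r(\varepsilon, x) = h(x) - \varepsilon s(x) + \log \varepsilon + 1$, $\quad C = [\varepsilon_0, \infty[$, $\varepsilon_0 > 0$.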
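The three pairs of Example 3.3.1 can be verified pointwise by inserting the stationary value of the parameter. The sketch below works with the scalar values h = h(x) and s = s(x) at a fixed point x and assumes that the stationary point lies in the set C (for (i) and (iii) this requires the lower end of C to be small enough); the function names are hypothetical.

```python
import math


def r_i(eps, h, s):
    return h - eps * s - 1.0 / eps


def f_i(h, s):
    return h - 2.0 * math.sqrt(s)


def eps_i(s):
    """Stationary point of r_i: -s + 1/eps^2 = 0."""
    return 1.0 / math.sqrt(s)


def r_ii(eps, h, s, p):
    return h - eps ** p / p + eps * s


def f_ii(h, s, p):
    return h + (1.0 - 1.0 / p) * s ** (p / (p - 1.0))


def eps_ii(s, p):
    """Stationary point of r_ii: -eps^(p-1) + s = 0."""
    return s ** (1.0 / (p - 1.0))


def r_iii(eps, h, s):
    return h - eps * s + math.log(eps) + 1.0


def f_iii(h, s):
    return h - math.log(s)


def eps_iii(s):
    """Stationary point of r_iii: -s + 1/eps = 0."""
    return 1.0 / s
```

Since each function is concave in the parameter on the positive half-line, the stationary value is a maximizer there, and perturbing it can only decrease the value.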

Example 3.3.2 Let $D \subseteq \mathbb{R}^n$ be compact, $h : \mathbb{R}^n \to \mathbb{R}$, $k_i : \mathbb{R}^n \to \mathbb{R}_+$, and $s_i : \mathbb{R}^n \to \mathbb{R}_+$ such that $s_i(x) > 0$ $\forall\, x \in D$ $(i = 1, \ldots, m)$. For $C = \mathbb{R}^m_+$, the functions
$$f(x) = h(x) + \frac{1}{2} \sum_{i=1}^m k_i(x) / s_i(x)$$
and
$$r(\varepsilon, x) = h(x) - \frac{1}{2} \sum_{i=1}^m \varepsilon_i^2 s_i(x) + \sum_{i=1}^m \varepsilon_i \sqrt{k_i(x)}$$
satisfy (i) and (ii) of Proposition 3.3.2 (verification via $\nabla_\varepsilon r(\varepsilon, x) = 0$). Notice that, for example, if $D$ is a polytope, $h$ is convex, and $k_i = \ell_i^2$, with $\ell_i : \mathbb{R}^n \to \mathbb{R}_+$ convex and $s_i$ concave $(i = 1, \ldots, m)$, then $r(\varepsilon, \cdot)$ is convex, and hence, for each $\varepsilon \in \mathbb{R}^m_+$, $\max\{ r(\varepsilon, x) : x \in D \}$ is attained at a vertex of $D$, whereas $f(x)$ is (in general) neither convex nor concave, i.e. vertex optimality of $\max\{ f(x) : x \in D \}$ cannot be inferred directly. If $D$ is convex, $h$ is concave, $k_i$ are concave, and $s_i$ are convex $(i = 1, \ldots, m)$, then, for each $\varepsilon \in \mathbb{R}^m_+$, $\max\{ r(\varepsilon, x) : x \in D \}$ is a standard concave maximization problem, i.e. the problem of maximizing a concave function over a convex set.

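Example 3.3.3 Let $D \subseteq \mathbb{R}^n$ be compact, $h : \mathbb{R}^n \to \mathbb{R}$; $s, k : \mathbb{R}^n \to \mathbb{R}_+$, $C = \mathbb{R}_+$. Then
$$f(x) = h(x) - \min\{ s(x), k(x) \}$$
and
$$r(\varepsilon, x) = h(x) - (|\varepsilon - 1|\, k(x) + \varepsilon s(x))$$
satisfy Proposition 3.3.2 (consider the cases $\varepsilon > 1$ and $\varepsilon \le 1$). If $D$ is convex, $h$ is concave and $s, k$ are convex, then $\max\{ r(\varepsilon, x) : x \in D \}$ is again a standard concave maximization problem.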
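The case distinction of Example 3.3.3 can be illustrated pointwise. For fixed nonnegative values s = s(x) and k = k(x), the function of the parameter is piecewise linear with a breakpoint at 1 and is nonincreasing beyond it, so its maximum over the nonnegative reals is attained at 0 or at 1. A minimal sketch with hypothetical names:

```python
def r(eps, h, s, k):
    return h - (abs(eps - 1.0) * k + eps * s)


def f(h, s, k):
    return h - min(s, k)


def r_star(h, s, k):
    """Maximum of r(., x) over eps >= 0, taken over the two candidate points.

    On [0, 1] the function is linear in eps, so an endpoint is optimal;
    for eps >= 1 it is nonincreasing because s and k are nonnegative.
    """
    return max(r(0.0, h, s, k), r(1.0, h, s, k))
```

The value at 0 equals h minus k and the value at 1 equals h minus s, which gives the minimum in the definition of f.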

Chapter 4
Connections between Local and Global Optimality Conditions

Up to now, we have developed and proved several criteria for global optimality. In this chapter, we will investigate the connection between these global optimality conditions and conditions for local optimality. We will deal with both the d.c. problem and the convex maximization problem, which we restate for ease of reference:

D.C. Problem:
$$\min\ [g(x) - h(x)] \quad \text{s.t. } x \in \mathbb{R}^n, \tag{4.1}$$
where $g, h : \mathbb{R}^n \to \mathbb{R}$ are taken to be convex functions.

Convex Maximization Problem:
$$\max f(x) \quad \text{s.t. } x \in D, \tag{4.2}$$
with $f : \mathbb{R}^n \to \mathbb{R}$ being a convex function and $D \subseteq \mathbb{R}^n$ being a closed convex set.

In the next sections, we will develop necessary and sufficient local optimality conditions in terms of $\varepsilon$-subdifferentials. We also give a generalization of Strekalovsky's condition (S) to the d.c. problem. The results of this chapter can also be found in Dür [22].

4.1 Necessary Conditions for Local Optimality

In Section 3.1 we saw that in the optimality conditions for the convex maximization problem of Hiriart-Urruty and Strekalovsky, respectively, the inclusion $\partial f(\bar{x}) \subseteq N(D;\bar{x})$ was "globalized" in two different ways: in Hiriart-Urruty's condition by introducing the parameter $\varepsilon$ and by requiring that the condition $\partial_\varepsilon f(\bar{x}) \subseteq N_\varepsilon(D;\bar{x})$ holds for all $\varepsilon \ge 0$, in Strekalovsky's condition by demanding that $\partial f(x) \subseteq N(D;x)$ is valid for all $x$ in the level set of $\bar{x}$. Since Hiriart-Urruty's condition for the convex maximization problem was derived from the d.c. problem, the idea to modify Strekalovsky's condition in an analogous way in order to obtain an optimality condition for the d.c. problem is immediate, but it turns out that this is not possible in a straightforward way. We have
$$\bar{x} \text{ is a global solution of (4.1)} \implies \partial h(x) \subseteq \partial g(x) \quad \forall\, x \in \mathbb{R}^n \text{ such that } g(x) - h(x) = g(\bar{x}) - h(\bar{x}),$$
but the converse is not true, as the following example shows:

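Example 4.1.1 Consider the functions $g, h : \mathbb{R} \to \mathbb{R}$,
$$g(x) = \begin{cases} 0 & \text{for } x \le 0, \\ x^2 & \text{for } x > 0, \end{cases} \quad \text{and} \quad h(x) = \begin{cases} x^2 & \text{for } x \le 0, \\ 0 & \text{for } x > 0. \end{cases}$$
The point $\bar{x} = 0$ fulfills $\partial h(\bar{x}) = \partial g(\bar{x}) = \{0\}$, and, since $g - h$ is strictly increasing, it is the only point with the property $g(x) - h(x) = g(\bar{x}) - h(\bar{x})$. Nevertheless $\bar{x} = 0$ is not a global minimizer of $g(x) - h(x)$.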

We will give a modification of Strekalovsky's optimality criterion to d.c. problems in Section 4.4.

In both optimality conditions by Hiriart-Urruty, for the d.c. problem as well as for the convex maximization problem, there is an underlying inclusion which is globalized by introducing the parameter $\varepsilon$. Next, we investigate these underlying inclusions. Without the globalizing parameter $\varepsilon$, are these inclusions perhaps local optimality conditions? It turns out that they are both necessary but not sufficient for local optimality. First, we deal with the convex maximization problem.

Proposition 4.1.1 If $\bar{x}$ is a local solution of (4.2), then $\partial f(\bar{x}) \subseteq N(D;\bar{x})$.

Proof. Using again the directional derivative
$$f'(\bar{x}; d) := \lim_{t \downarrow 0} \frac{f(\bar{x} + td) - f(\bar{x})}{t},$$
we have:
$$\begin{aligned}
\bar{x} \text{ is a local solution of (4.2)} &\implies f'(\bar{x}; x - \bar{x}) \le 0 \quad \forall\, x \in D \\
&\iff \sup_{y \in \partial f(\bar{x})} \langle x - \bar{x}, y \rangle \le 0 \quad \forall\, x \in D \\
&\iff \langle x - \bar{x}, y \rangle \le 0 \quad \forall\, x \in D,\ \forall\, y \in \partial f(\bar{x}) \\
&\iff \partial f(\bar{x}) \subseteq N(D;\bar{x}).
\end{aligned}$$
The first equivalence comes from (3.13). □

In the above proof, the implication
$$f'(\bar{x}; x - \bar{x}) \le 0 \quad \forall\, x \in D \implies \bar{x} \text{ is a local solution of (4.2)}$$
does not hold in general (it does hold under additional assumptions, cf. Proposition 4.2.1). To see this, consider

Example 4.1.2
$$\max f(x) = x_1^2 + x_2^2 \quad \text{s.t. } x \in D = [0,1]^2.$$
Take the point $\bar{x} = (0,1)$, which is obviously not a local solution to this problem, although $f'(\bar{x}; x - \bar{x}) \le 0$ $\forall\, x \in D$. We also have $\partial f(\bar{x}) = \{(0,2)\} \subseteq N(D;\bar{x})$.
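Example 4.1.2 is easy to reproduce numerically. The sketch below assumes a uniform grid on the unit square as a stand-in for D and uses the gradient to evaluate the directional derivative of the differentiable objective; the names `GRID`, `directional_derivative` and `improves` are hypothetical.

```python
def f(x):
    return x[0] ** 2 + x[1] ** 2


def grad_f(x):
    return (2.0 * x[0], 2.0 * x[1])


X_BAR = (0.0, 1.0)
GRID = [(i / 20.0, j / 20.0) for i in range(21) for j in range(21)]


def directional_derivative(x):
    """Directional derivative of f at X_BAR in direction x - X_BAR (f is smooth)."""
    g = grad_f(X_BAR)
    return g[0] * (x[0] - X_BAR[0]) + g[1] * (x[1] - X_BAR[1])


# The first-order condition holds at every grid point of D = [0, 1]^2 ...
necessary_condition_holds = all(directional_derivative(x) <= 0.0 for x in GRID)


def improves(t):
    """... yet moving along the edge x_2 = 1 strictly increases the objective."""
    return f((t, 1.0)) > f(X_BAR)
```

Feasible points of the form (t, 1) with arbitrarily small positive t have a larger objective value, so the point (0, 1) is not a local maximizer even though the first-order condition is satisfied.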

This example shows that $\partial f(\bar{x}) \subseteq N(D;\bar{x})$ is not sufficient for local optimality, even if the additional assumption (3.2) in Strekalovsky's condition, namely $\inf_{x \in \mathbb{R}^n} f(x) < f(\bar{x})$, is fulfilled. For the d.c. problem, we can show a result analogous to Proposition 4.1.1:

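Proposition 4.1.2 Let $\bar{x}$ be a local solution of (4.1). Then $\partial h(\bar{x}) \subseteq \partial g(\bar{x})$.

To prove this, we first show the following lemma:

Lemma 4.1.1 Let $g : \mathbb{R}^n \to \mathbb{R}$ be a convex function, $U(\bar{x})$ a neighbourhood of $\bar{x}$. If
$$g(x) \ge g(\bar{x}) + \langle x - \bar{x}, y \rangle \quad \forall\, x \in U(\bar{x}),$$
then $y \in \partial g(\bar{x})$.

Proof. Let $\tilde{x} \notin U(\bar{x})$. We can choose $\lambda \in\ ]0,1[$ such that $z := \lambda \bar{x} + (1-\lambda) \tilde{x} \in U(\bar{x})$. Then we have
$$g(z) \ge g(\bar{x}) + \langle z - \bar{x}, y \rangle$$
from our assumption, and
$$g(z) \le \lambda g(\bar{x}) + (1-\lambda) g(\tilde{x})$$
because of the convexity of $g$. This yields
$$\lambda g(\bar{x}) + (1-\lambda) g(\tilde{x}) \ge g(\bar{x}) + \langle z - \bar{x}, y \rangle,$$
or equivalently
$$g(\tilde{x}) \ge g(\bar{x}) + \frac{1}{1-\lambda} \langle z - \bar{x}, y \rangle.$$
Plugging in the definition of $z$ gives
$$g(\tilde{x}) \ge g(\bar{x}) + \langle \tilde{x} - \bar{x}, y \rangle.$$
Since $\tilde{x}$ was arbitrary, this means $y \in \partial g(\bar{x})$. □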

The remarkable property described in Lemma 4.1.1 does not hold for the $\varepsilon$-subdifferential, which shows that the $\varepsilon$-subgradient is not a local but a global concept:

Example 4.1.3 Consider the function $g(x) = x^2$ and the point $\bar{x} = 0$. The $\varepsilon$-subdifferential of $g$ at $\bar{x}$ is $\partial_\varepsilon g(0) = [-2\sqrt{\varepsilon}, 2\sqrt{\varepsilon}]$. Choose $\alpha > 2$. Then clearly $y := \alpha \sqrt{\varepsilon} \notin \partial_\varepsilon g(0)$. Define a neighbourhood
$$U(0) := \Big]\, {-\big(\alpha - \sqrt{\alpha^2 - 4}\big) \sqrt{\varepsilon}/2},\ \big(\alpha - \sqrt{\alpha^2 - 4}\big) \sqrt{\varepsilon}/2 \,\Big[.$$
Then we have $g(x) \ge yx - \varepsilon$ for all $x \in U(0)$. So we see that for any $y \notin \partial_\varepsilon g(0)$ we can find a neighbourhood $U(0)$ such that $g(x) \ge yx - \varepsilon$ $\forall\, x \in U(0)$.
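The local validity and global failure described in Example 4.1.3 can be observed numerically. In the sketch below, the multiplier written as `alpha` (any number larger than 2) and the helper names are assumptions made for illustration; the radius is the smaller root of the quadratic that arises when the defining inequality is written with equality.

```python
import math


def g(x):
    return x * x


def local_radius(alpha, eps):
    """Smaller root of x^2 - alpha*sqrt(eps)*x + eps = 0."""
    return (alpha - math.sqrt(alpha ** 2 - 4.0)) * math.sqrt(eps) / 2.0


def eps_inequality_holds(x, y, eps, tol=1e-12):
    """Check g(x) >= g(0) + y*x - eps at a single point x (here g(0) = 0)."""
    return g(x) >= y * x - eps - tol
```

For instance, with `alpha = 3` and `eps = 0.25` the slope `y = 1.5` satisfies the inequality on the whole neighbourhood of radius `local_radius(3.0, 0.25)` but violates it at `x = 0.75`, so it is not an approximate subgradient in the global sense.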

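Proof of Proposition 4.1.2. Let $y \in \partial h(\bar{x})$. Then
$$h(x) - h(\bar{x}) \ge \langle x - \bar{x}, y \rangle \quad \forall\, x \in \mathbb{R}^n.$$
Since $\bar{x}$ is a local solution, there exists a neighbourhood $U(\bar{x})$ such that
$$g(x) - g(\bar{x}) \ge h(x) - h(\bar{x}) \quad \forall\, x \in U(\bar{x}),$$
implying that
$$g(x) \ge g(\bar{x}) + \langle x - \bar{x}, y \rangle \quad \forall\, x \in U(\bar{x}).$$
Lemma 4.1.1 shows that $y \in \partial g(\bar{x})$. □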

To see that $\partial h(\bar{x}) \subseteq \partial g(\bar{x})$ is not sufficient for $\bar{x}$ to be a local minimizer of (4.1), consider again the functions of Example 4.1.1. The point $\bar{x} = 0$ fulfills $\partial h(\bar{x}) = \partial g(\bar{x}) = \{0\}$, but is not a local minimizer of $g(x) - h(x)$.

4.2 The Piecewise Affine Case

In this section, we show that the necessary local optimality conditions established in Section 4.1 are also sufficient provided that the functions $g$ and $h$ are piecewise affine. The same result was, by slightly different means, attained by Hiriart-Urruty [35]. Recall that a convex function $f : \mathbb{R}^n \to \mathbb{R}$ is piecewise affine if it can be represented as the supremum of a finite number of affine functions:
$$f(x) = \sup_{i \in I}\, [\, \langle x, a_i \rangle + b_i \,],$$
where $a_i \in \mathbb{R}^n$, $b_i \in \mathbb{R}$ and $I$ is some finite index set.

Proposition 4.2.1 Let $f$ be a piecewise affine convex function. If
$$f'(\bar{x}; x - \bar{x}) \le 0 \quad \forall\, x \in D,$$
then $\bar{x}$ is a local maximizer of (4.2).

Proof. Let $\bar{x} \in D$. Then there exists a neighbourhood $U(\bar{x})$ such that the function $f$ is affine in $U(\bar{x})$ along each feasible direction $d$. Now take $\tilde{x} \in U(\bar{x})$ and consider the direction $d := \tilde{x} - \bar{x}$. There exists an index $j \in I$ (depending on $\tilde{x}$), with $f(\tilde{x}) = \langle \tilde{x}, a_j \rangle + b_j$ and $f(\bar{x}) = \langle \bar{x}, a_j \rangle + b_j$. Then we have from our assumption:
$$f'(\bar{x}; \tilde{x} - \bar{x}) = \langle \tilde{x} - \bar{x}, a_j \rangle \le 0 \iff \langle \tilde{x}, a_j \rangle + b_j \le \langle \bar{x}, a_j \rangle + b_j \iff f(\tilde{x}) \le f(\bar{x}),$$
which means that $\bar{x}$ is a local maximizer of (4.2). □

Corollary 4.2.1 Let $f$ be a piecewise affine convex function. Then $\bar{x}$ is a local maximizer of (4.2) if and only if $\partial f(\bar{x}) \subseteq N(D;\bar{x})$.

Proof. This follows from Proposition 4.2.1 and the proof of Proposition 4.1.1. □
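In one dimension, the characterization of Corollary 4.2.1 can be turned into a small test. The sketch below assumes that f is given as the maximum of finitely many affine pieces `(a, b)` and that D is an interval `[lo, hi]` with `lo < hi`; the subdifferential at a point is then the interval spanned by the slopes of the active pieces, and the normal cone is {0} in the interior and a half-line at the endpoints. All names are hypothetical.

```python
def f(x, pieces):
    """Piecewise affine convex function max_i (a_i * x + b_i)."""
    return max(a * x + b for a, b in pieces)


def subdifferential(x, pieces, tol=1e-12):
    """Interval [min, max] of the slopes of all pieces active at x."""
    fx = f(x, pieces)
    active = [a for a, b in pieces if abs(a * x + b - fx) <= tol]
    return min(active), max(active)


def is_local_maximizer(x, pieces, lo, hi, tol=1e-12):
    """Test the inclusion of the subdifferential in the normal cone of [lo, hi] at x."""
    g_lo, g_hi = subdifferential(x, pieces, tol)
    if x <= lo + tol:          # normal cone is ]-inf, 0]
        return g_hi <= tol
    if x >= hi - tol:          # normal cone is [0, +inf[
        return g_lo >= -tol
    return abs(g_lo) <= tol and abs(g_hi) <= tol   # normal cone is {0}


# Hypothetical example: f(x) = max(-x, x - 2) on D = [0, 3].
PIECES = [(-1.0, 0.0), (1.0, -2.0)]
```

For this example both endpoints of the interval pass the test, while the kink at 1 and the interior point 2 do not.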

In particular, Proposition 4.2.1 and Corollary 4.2.1 are valid for linear problems. For the d.c. problem (4.1) we have the same result:

Proposition 4.2.2 Let $g, h$ be piecewise affine convex functions. If $\partial h(\bar{x}) \subseteq \partial g(\bar{x})$, then $\bar{x}$ is a local solution of (4.1).

Proof. Let $d$ be a direction in $\mathbb{R}^n$. Then
$$\begin{aligned}
\partial h(\bar{x}) \subseteq \partial g(\bar{x}) &\implies \sup\{ \langle d, y \rangle : y \in \partial h(\bar{x}) \} \le \sup\{ \langle d, y \rangle : y \in \partial g(\bar{x}) \} \\
&\iff h'(\bar{x}; d) \le g'(\bar{x}; d) \\
&\iff (g - h)'(\bar{x}; d) \ge 0.
\end{aligned}$$
Since $g$ and $h$ are piecewise affine, so is $g - h$. Therefore, there exists a neighbourhood $U(\bar{x})$ such that $g - h$ is affine along all directions in $U(\bar{x})$. A similar reasoning as in the proof of Proposition 4.2.1 now shows that $\bar{x}$ is a local solution of (4.1). □

Combining Proposition 4.1.2 and Proposition 4.2.2 we see that in the piecewise affine case the modification of Strekalovsky's condition is easy:

Corollary 4.2.2 Let $g, h$ be piecewise affine convex functions. $\bar{x} \in \mathbb{R}^n$ is a global minimizer of (4.1) if and only if
$$\partial h(x) \subseteq \partial g(x) \quad \text{for all } x \in \mathbb{R}^n \text{ such that } g(x) - h(x) = g(\bar{x}) - h(\bar{x}).$$


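4.3 Sufficient Conditions for Local Optimality

The next theorem shows that, by reducing the size of the parameter set from which $\varepsilon$ is taken, we get a sufficient local optimality condition for d.c. problems.

Theorem 4.3.1 Let $g, h : \mathbb{R}^n \to \mathbb{R}$ be convex functions. $\bar{x} \in \mathbb{R}^n$ is a local minimizer of $g - h$ on $\mathbb{R}^n$ if there exists $\bar{\varepsilon} > 0$ such that
$$\partial_\varepsilon h(\bar{x}) \subseteq \partial_\varepsilon g(\bar{x}) \quad \forall\, \varepsilon \in [0, \bar{\varepsilon}].$$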

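Proof. Assume the assertion is not true. Then in every neighbourhood of $\bar{x}$ there is a point with smaller objective function value, i.e. there exists a sequence $\{x_n\}_{n \in \mathbb{N}}$ with
$$x_n \to \bar{x} \quad (n \to \infty)$$
and
$$g(x_n) - h(x_n) < g(\bar{x}) - h(\bar{x}) \quad \forall\, n \in \mathbb{N}.$$
Now for every $n$, choose $y_n \in \partial h(x_n)$. Then, by definition,
$$h(x) \ge h(x_n) + \langle x - x_n, y_n \rangle \quad \forall\, x \in \mathbb{R}^n. \tag{4.3}$$
Therefore, for all $n$,
$$\varepsilon_n := h(\bar{x}) - h(x_n) + \langle x_n - \bar{x}, y_n \rangle \ge 0.$$
From (4.3) and the definition of $\varepsilon_n$ it follows that
$$y_n \in \partial_{\varepsilon_n} h(\bar{x}) \quad \forall\, n \in \mathbb{N}. \tag{4.4}$$
Since $x_n \to \bar{x}$, the sequence $y_n$ is bounded (this follows from Hiriart-Urruty and Lemaréchal [40, Proposition VI.6.2.2]). Hence $\varepsilon_n \to 0$ $(n \to \infty)$. Therefore, there must be an index $N$ such that
$$\varepsilon_n \in [0, \bar{\varepsilon}] \quad \forall\, n \ge N.$$
But from
$$\begin{aligned}
& g(x_N) - g(\bar{x}) < h(x_N) - h(\bar{x}) \\
\iff\ & g(x_N) - g(\bar{x}) < \langle x_N - \bar{x}, y_N \rangle - h(\bar{x}) + h(x_N) - \langle y_N, x_N - \bar{x} \rangle \\
\iff\ & g(x_N) - g(\bar{x}) < \langle x_N - \bar{x}, y_N \rangle - \varepsilon_N
\end{aligned}$$
we see that $y_N \notin \partial_{\varepsilon_N} g(\bar{x})$. With (4.4) this contradicts our assumptions. □

Unfortunately, the converse of Theorem 4.3.1 is again not true. We can have local minimizers $\bar{x}$ of $g(x) - h(x)$ which do not fulfill $\partial_\varepsilon h(\bar{x}) \subseteq \partial_\varepsilon g(\bar{x})$ for any strictly positive $\varepsilon$, as illustrated by the following example: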


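Example 4.3.1 Let
$$g(x) = \begin{cases} 0 & \text{for } x \le 1, \\ (x-1)^2 & \text{for } x > 1, \end{cases} \quad \text{and} \quad h(x) = \begin{cases} (x+1)^2 & \text{for } x \le -1, \\ 0 & \text{for } x > -1. \end{cases}$$
The point $\bar{x} = 0$ is a local minimizer. But for any $\varepsilon > 0$, we have
$$\partial_\varepsilon g(0) = [\, 0,\ 2(\sqrt{1+\varepsilon} - 1) \,],$$
whereas
$$\partial_\varepsilon h(0) = [\, -2(\sqrt{1+\varepsilon} - 1),\ 0 \,].$$
So there is no $\varepsilon > 0$ such that $\partial_\varepsilon h(0) \subseteq \partial_\varepsilon g(0)$, in spite of the fact that $\bar{x} = 0$ is a local minimizer.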
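The two intervals of Example 4.3.1 can be probed numerically from the definition of the approximate subdifferential. The sketch below tests the defining inequality on a finite grid over a bounded window, which is an assumption of this illustration: passing the grid test is only a necessary condition for membership, while a violation on the grid does rule membership out. The names `GRID`, `in_eps_subdifferential` and `claimed_bound` are hypothetical.

```python
import math


def g(x):
    return 0.0 if x <= 1.0 else (x - 1.0) ** 2


def h(x):
    return (x + 1.0) ** 2 if x <= -1.0 else 0.0


# Grid on the window [-50, 50] with step 0.01.
GRID = [k / 100.0 for k in range(-5000, 5001)]


def in_eps_subdifferential(func, y, eps, grid=GRID):
    """Grid test of func(x) >= func(0) + y*x - eps for the base point 0."""
    return all(func(x) >= func(0.0) + y * x - eps for x in grid)


def claimed_bound(eps):
    """Interval end point 2 * (sqrt(1 + eps) - 1) stated in the example."""
    return 2.0 * (math.sqrt(1.0 + eps) - 1.0)
```

Slopes slightly inside the stated intervals pass the grid test and slopes slightly outside fail it; in particular, small negative slopes pass for h and fail for g, which is why the inclusion breaks down for every positive value of the parameter.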

The equivalent of Theorem 4.3.1 for the convex maximization problem (4.2) is immediate:

Theorem 4.3.2 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function, $D$ be a convex set. $\bar{x} \in D$ is a local maximizer of $f$ on $D$ if there exists $\bar{\varepsilon} > 0$ such that
$$\partial_\varepsilon f(\bar{x}) \subseteq N_\varepsilon(D;\bar{x}) \quad \forall\, \varepsilon \in [0, \bar{\varepsilon}].$$

The proof is analogous to the proof of Theorem 4.3.1 and is omitted here. This condition is also not necessary for local optimality, as the next example shows:

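Example 4.3.2
$$\max h(x) = (x-1)^2 \quad \text{s.t. } x \in D = [0,3].$$
Obviously, the point $\bar{x} = 0$ is a local maximizer. An easy calculation shows that
$$\partial_\varepsilon h(\bar{x}) = \big[\, 2(-\sqrt{1+\varepsilon} - 1),\ 2(\sqrt{1+\varepsilon} - 1) \,\big]$$
and
$$N_\varepsilon(D;\bar{x}) =\ ]-\infty, \varepsilon/3].$$
So we have $\partial_\varepsilon h(\bar{x}) \subseteq N_\varepsilon(D;\bar{x})$ only if $\varepsilon \ge 24$ or $\varepsilon = 0$.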

4.4 A Generalization of Strekalovsky's Optimality Condition to D.C. Problems

We saw that a straightforward modification of Strekalovsky's criterion is only possible in special cases. With the help of Theorem 4.3.1, we are now able to give a generalization that is valid without any additional assumptions on $g$ and $h$.

Theorem 4.4.1 Let $g, h : \mathbb{R}^n \to \mathbb{R}$ be convex functions. $\bar{x} \in \mathbb{R}^n$ is a global minimizer of $g - h$ on $\mathbb{R}^n$ if and only if there exists $\bar{\varepsilon} > 0$ such that for all $x \in \mathbb{R}^n$ with $g(x) - h(x) = g(\bar{x}) - h(\bar{x})$ we have
$$\partial_\varepsilon h(x) \subseteq \partial_\varepsilon g(x) \quad \forall\, \varepsilon \in [0, \bar{\varepsilon}].$$

Proof. ($\Rightarrow$): All $x \in \mathbb{R}^n$ satisfying $g(x) - h(x) = g(\bar{x}) - h(\bar{x})$ are also global solutions; therefore, for all $x \in \mathbb{R}^n$ such that $g(x) - h(x) = g(\bar{x}) - h(\bar{x})$, the inclusion $\partial_\varepsilon h(x) \subseteq \partial_\varepsilon g(x)$ holds true for all $\varepsilon \ge 0$.
($\Leftarrow$): From Theorem 4.3.1, we know that all $x \in \mathbb{R}^n$ with $g(x) - h(x) = g(\bar{x}) - h(\bar{x})$ are local minimizers. But if all $x$ in the level set of $\bar{x}$ are local minimizers, then $\bar{x}$ must be a global minimizer (this follows from the continuity of $g - h$; for a detailed proof see [37, Exercice 2.5]). □

Chapter 5 Remarks on D.C. Decompositions

In the previous chapters, we saw several optimality conditions for problems involving d.c. functions, i.e. functions which are representable as the difference of two convex functions. Two questions arise:

1. How can we recognize whether or not a function is d.c.?
2. How can we construct a d.c. decomposition for a given function?

5.1 Existence of D.C. Decompositions

An answer to the first question was given by Hartman [33], who showed that d.c. decomposability is in fact a local property:

Definition 5.1.1 A function $f : \mathbb{R}^n \to \mathbb{R}$ is called locally d.c., if, for every $\bar x \in \mathbb{R}^n$, there exists a convex neighbourhood $U$ of $\bar x$ and convex functions $g_U$ and $h_U$ such that

$$f(x) = g_U(x) - h_U(x) \quad \text{for all } x \in U.$$

Hartman [33] showed the following:

Theorem 5.1.1 Every locally d.c. function on $\mathbb{R}^n$ is globally d.c. on $\mathbb{R}^n$.

An important consequence of this theorem is the following corollary.

Corollary 5.1.1 Every function $f : \mathbb{R}^n \to \mathbb{R}$ whose second partial derivatives are continuous everywhere is d.c.

Another interesting property is stated in Hiriart-Urruty [34].

Theorem 5.1.2 Let $f$ be a differentiable d.c. function on $\mathbb{R}$. Then $f$ is continuously differentiable and can be written as the difference of continuously differentiable convex functions.

More about d.c. functions, the behaviour of the d.c. property under function operations, and the construction of new d.c. functions from given ones can be found in Hiriart-Urruty [34], in Horst et al. [47], as well as in Tuy [77].

5.2 D.C. Decompositions for Polynomials

The second question, how to construct a d.c. decomposition for a given function, is a very difficult one. According to Corollary 5.1.1, d.c. functions are related to functions of bounded variation. Recall that a function $f : [a, b] \subset \mathbb{R} \to \mathbb{R}$ is called of bounded variation, if there exists a constant $M > 0$ such that for every choice of $\{x_0, x_1, \dots, x_n\} \subset [a, b]$ we have

$$\sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| \le M.$$

It is well-known from elementary calculus that a function $f$ is of bounded variation on $[a, b]$ if and only if it can be written as the difference of two monotonically increasing functions. These two functions are given explicitly in terms of the variation of $f$. Remembering that the integral of a monotonically increasing function is a convex function, this approach may lead to the desired d.c. decomposition of a function whose first derivative is of bounded variation.

The main difficulty in this approach is, of course, the calculation of the variation. This may be as difficult as solving the corresponding optimization problem. Another drawback is that the above reasoning is only valid for functions of a single variable, as there is no adequate concept of variation in higher dimensions. Thus, the question remains difficult.

Nevertheless, at least for polynomials some decompositions can be given. First, consider polynomials of one variable. Obviously, there is no problem if the exponent is even, since $x^n$ is then itself convex. For polynomials of the form $x^n$ with $n$ odd, we give a decomposition in the next theorem. The main idea behind this theorem is that finding a d.c. decomposition of a function $f$ is equivalent to finding a convex function $\varphi$ such that $f + \varphi$ is convex. Also recall that a twice differentiable function is convex if and only if its Hessian is a positive semidefinite matrix.

Theorem 5.2.1 Let $f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^n$, with $n \ge 3$ odd. Let

$$\alpha := \frac{n(n-1)}{4(n+1)(n-2)} > 0.$$

Then $g(x) := x^{n+1} + x^n + \alpha x^{n-1}$ is a convex function.


Proof. The second derivative of $g$ is

$$g''(x) = x^{n-3} \left[ (n+1)n\,x^2 + n(n-1)\,x + \frac{n(n-1)^2}{4(n+1)} \right].$$

It is easy to see that the expression in brackets is nonnegative for any $x \in \mathbb{R}$ (its discriminant vanishes), and $x^{n-3} \ge 0$ because $n - 3$ is even. □

Therefore,

$$f(x) = \left[ x^{n+1} + x^n + \alpha x^{n-1} \right] - \left[ x^{n+1} + \alpha x^{n-1} \right]$$

is a d.c. decomposition of $f(x) = x^n$, for $n \ge 3$ odd. With Theorem 5.2.1, any one-dimensional polynomial can be decomposed as the difference of two convex polynomials. This method has the advantage that the decomposition remains within the class of polynomials.

Next, we aim to develop decompositions of polynomials of more than one variable. This is not straightforward any more. We will end up with an iterative procedure which is based on a series of lemmata that are proved first. It will turn out that nonnegative d.c. decompositions play an important role.
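As a sanity check of Theorem 5.2.1, the one-dimensional decomposition of $x^n$ can be verified in exact rational arithmetic. This is only a sketch: the symbol `alpha` stands for the constant defined in the theorem, the function names are hypothetical, and a finite grid can of course only support, not prove, the convexity claim.

```python
from fractions import Fraction

def alpha(n):
    """Constant from Theorem 5.2.1: n(n-1) / (4(n+1)(n-2))."""
    return Fraction(n * (n - 1), 4 * (n + 1) * (n - 2))

def g(n, x):
    return x ** (n + 1) + x ** n + alpha(n) * x ** (n - 1)

def h(n, x):
    return x ** (n + 1) + alpha(n) * x ** (n - 1)

def g_second(n, x):
    """Second derivative of g, differentiated term by term."""
    return ((n + 1) * n * x ** (n - 1) + n * (n - 1) * x ** (n - 2)
            + alpha(n) * (n - 1) * (n - 2) * x ** (n - 3))

def h_second(n, x):
    """Second derivative of h, differentiated term by term."""
    return (n + 1) * n * x ** (n - 1) + alpha(n) * (n - 1) * (n - 2) * x ** (n - 3)

def check(n, xs):
    """True if g'' >= 0, h'' >= 0 and g - h = x^n at every sample point."""
    return all(g_second(n, x) >= 0 and h_second(n, x) >= 0
               and g(n, x) - h(n, x) == x ** n for x in xs)

xs = [Fraction(i, 4) for i in range(-20, 21)]   # sample points in [-5, 5]
results = {n: check(n, xs) for n in (3, 5, 7)}
```

For $n = 3$ the bracket in $g''$ has its double root at $x = -1/4$, so `g_second(3, Fraction(-1, 4))` evaluates to exactly zero, which shows that the constant $\alpha$ cannot be made smaller in this construction.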

Definition 5.2.1 A d.c. decomposition $f = g - h$ is called nonnegative, if both $g \ge 0$ and $h \ge 0$.

Lemma 5.2.1 Any convex function $f : \mathbb{R}^n \to \mathbb{R}$ has a nonnegative d.c. decomposition.

Proof. If $f \ge 0$, then no decomposition is necessary. Therefore, assume there exists $\bar x \in \mathbb{R}^n$ with $f(\bar x) < 0$. Take $y \in \partial f(\bar x)$. Then, by definition,

$$f(\bar x) + \langle x - \bar x, y \rangle \le f(x) \quad \forall\, x \in \mathbb{R}^n,$$

so

$$\varphi(x) := \min\{ f(\bar x) + \langle x - \bar x, y \rangle,\; 0 \} \le f(x) \quad \forall\, x \in \mathbb{R}^n.$$

$\varphi$ is nonpositive by construction and, as the minimum of two affine functions, $\varphi$ is concave. Hence $f - \varphi$ is convex and nonnegative, and

$$f = [\, f - \varphi \,] - [\, -\varphi \,]$$

is the desired nonnegative d.c. decomposition of $f$. □

Corollary 5.2.1 Any d.c. function $f : \mathbb{R}^n \to \mathbb{R}$ has a nonnegative d.c. decomposition.


Proof. Let $f = g - h$ be an arbitrary d.c. decomposition of $f$. From the above lemma we know that both $g$ and $h$ have nonnegative d.c. decompositions, say

$$g = \varphi_g - \psi_g, \qquad h = \varphi_h - \psi_h,$$

where $\varphi_g, \psi_g, \varphi_h, \psi_h : \mathbb{R}^n \to \mathbb{R}_+$ are nonnegative convex functions. Then

$$f = [\, \varphi_g + \psi_h \,] - [\, \varphi_h + \psi_g \,]$$

is a nonnegative d.c. decomposition of $f$. □

The next lemma is trivial, but important for our iterative method to construct d.c. decompositions of arbitrary polynomials.

Lemma 5.2.2 Let $f : \mathbb{R}^n \to \mathbb{R}_+$ be a nonnegative convex function. Then $f^2$ is convex.

Proof. This is straightforward from the definition of convexity. □

Corollary 5.2.2 Let $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}_+$ be nonnegative convex functions. Then

$$f_1 f_2 = \tfrac{1}{2}(f_1 + f_2)^2 - \left[ \tfrac{1}{2} f_1^2 + \tfrac{1}{2} f_2^2 \right] \qquad (5.1)$$

is a nonnegative d.c. decomposition of their product.

Proof. This obviously follows from Lemma 5.2.2. □

To derive a d.c. decomposition of a product of arbitrary d.c. functions $f_1$ and $f_2$, we can assume that we are given nonnegative d.c. decompositions of both $f_1$ and $f_2$, say

$$f_1 = g_1 - h_1, \qquad f_2 = g_2 - h_2,$$

where $g_1, h_1, g_2, h_2 : \mathbb{R}^n \to \mathbb{R}_+$ are nonnegative convex functions. Then

$$f_1 f_2 = (g_1 - h_1)(g_2 - h_2) = g_1 g_2 - h_1 g_2 - g_1 h_2 + h_1 h_2.$$

The last four terms are products of nonnegative convex functions and can be decomposed according to (5.1), yielding the nonnegative decomposition

$$f_1 f_2 = \tfrac{1}{2}\left[ (g_1 + g_2)^2 + (h_1 + h_2)^2 \right] - \tfrac{1}{2}\left[ (g_1 + h_2)^2 + (g_2 + h_1)^2 \right]. \qquad (5.2)$$

This decomposition has also been given by Horst et al. [47].


Now we iteratively obtain d.c. decompositions of polynomials in arbitrarily many variables. To start, we only need a nonnegative d.c. decomposition for $f(x) = x$. The decomposition used in the proof of Lemma 5.2.1 is not a polynomial one any more. Therefore, as we would like the decomposition to stay within the class of polynomials, we propose to use

$$x = [\, x^2 + x + 1 \,] - [\, x^2 + 1 \,].$$

Decomposing

$$xy = \left( [\, x^2 + x + 1 \,] - [\, x^2 + 1 \,] \right) \left( [\, y^2 + y + 1 \,] - [\, y^2 + 1 \,] \right)$$

according to (5.2) gives, after some calculation,

$$xy = \tfrac{1}{2}\left[ 2x^4 + 2x^3 + 9x^2 + 4x + 4x^2y^2 + 2x^2y + 2xy^2 + 2y^4 + 2y^3 + 9y^2 + 4y + 8 + 2xy \right]$$
$$\qquad - \tfrac{1}{2}\left[ 2x^4 + 2x^3 + 9x^2 + 4x + 4x^2y^2 + 2x^2y + 2xy^2 + 2y^4 + 2y^3 + 9y^2 + 4y + 8 \right].$$

This decomposition immediately shows the drawback of the iterative method: the exponents grow too quickly and the decomposition becomes huge and complicated. For example, the function $f(x, y) = xy$ allows a much simpler decomposition, which is a consequence of the next proposition.

Proposition 5.2.1 $g(x, y) := xy + x^2 + y^2$ is a convex function.

Proof. The Hessian $H$ of $g$ is

$$H = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},$$

which is a positive definite matrix. □

This proposition yields the (even nonnegative) d.c. decomposition

$$xy = \left[ xy + x^2 + y^2 \right] - \left[ x^2 + y^2 \right].$$

Polynomials of the form $f(x, y) = x^n y^n$, with $n \ge 2$ even, also permit an elegant decomposition. We have

$$x^n y^n = \left[ x^n y^n + x^{2n} + y^{2n} \right] - \left[ x^{2n} + y^{2n} \right]$$

for $n \ge 2$ even. This is entailed by the following proposition.

Proposition 5.2.2 If $n \ge 2$ is even, then $g(x, y) := x^n y^n + x^{2n} + y^{2n}$ is a convex function.


Proof. The Hessian of this function is

$$H = \begin{pmatrix} n(n-1)\,x^{n-2}y^{n} + 2n(2n-1)\,x^{2n-2} & n^2\, x^{n-1}y^{n-1} \\ n^2\, x^{n-1}y^{n-1} & n(n-1)\,x^{n}y^{n-2} + 2n(2n-1)\,y^{2n-2} \end{pmatrix}.$$

Since the first principal minor

$$n(n-1)\,x^{n-2}y^{n} + 2n(2n-1)\,x^{2n-2} \ge 0 \quad \forall\, (x, y) \in \mathbb{R}^2$$

is nonnegative and the determinant

$$\det H = n^2(n-1)^2\, x^{2n-2}y^{2n-2} + 2n^2(2n-1)(n-1)\, x^{3n-2}y^{n-2} + 2n^2(2n-1)(n-1)\, x^{n-2}y^{3n-2} + 4n^2(2n-1)^2\, x^{2n-2}y^{2n-2} - n^4\, x^{2n-2}y^{2n-2}$$
$$\phantom{\det H} = 2n^2(n-1)(2n-1)\left[ x^{3n-2}y^{n-2} + x^{n-2}y^{3n-2} \right] + n^2\left[ (n-1)^2 + 4(2n-1)^2 - n^2 \right] x^{2n-2}y^{2n-2}$$

is also nonnegative for all $(x, y) \in \mathbb{R}^2$, $H$ is positive semidefinite and $g$ is a convex function. □
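The Hessian computation in the proof of Proposition 5.2.2 can be cross-checked with exact integer arithmetic. The sketch below evaluates the Hessian entries of $g(x, y) = x^n y^n + x^{2n} + y^{2n}$, compares the determinant with the closed form from the proof, and tests the minors for nonnegativity on an integer grid. The function names and the grid radius are assumptions made for illustration; a grid test is supporting evidence only.

```python
def hessian(n, x, y):
    """Exact entries (h_xx, h_xy, h_yy) of the Hessian of x^n y^n + x^(2n) + y^(2n)."""
    hxx = n * (n - 1) * x ** (n - 2) * y ** n + 2 * n * (2 * n - 1) * x ** (2 * n - 2)
    hyy = n * (n - 1) * x ** n * y ** (n - 2) + 2 * n * (2 * n - 1) * y ** (2 * n - 2)
    hxy = n * n * x ** (n - 1) * y ** (n - 1)
    return hxx, hxy, hyy

def det_closed_form(n, x, y):
    """Determinant as simplified in the proof of Proposition 5.2.2."""
    mixed = x ** (3 * n - 2) * y ** (n - 2) + x ** (n - 2) * y ** (3 * n - 2)
    square = x ** (2 * n - 2) * y ** (2 * n - 2)
    return (2 * n * n * (n - 1) * (2 * n - 1) * mixed
            + n * n * ((n - 1) ** 2 + 4 * (2 * n - 1) ** 2 - n * n) * square)

def hessian_is_psd_on_grid(n, radius=6):
    """Check both diagonal entries and the determinant on an integer grid."""
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            hxx, hxy, hyy = hessian(n, x, y)
            det = hxx * hyy - hxy * hxy
            if det != det_closed_form(n, x, y):
                return False
            if hxx < 0 or hyy < 0 or det < 0:
                return False
    return True

psd_results = {n: hessian_is_psd_on_grid(n) for n in (2, 4, 6)}
```

For a symmetric 2 by 2 matrix, nonnegative diagonal entries together with a nonnegative determinant are equivalent to positive semidefiniteness, which is why the check uses exactly these three quantities.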

Summing up, we have seen that finding a d.c. decomposition of arbitrary polynomials is possible in an iterative way, but this method is not really practical. For some special polynomials, a simpler decomposition can be found, but the question has not been solved satisfactorily in the general case. Of course, not only polynomials are decomposable as the difference of convex functions. For example, decompositions of piecewise affine functions can be found in Bittner [11] and Melzer [56].

Part II Algorithmical Aspects


Chapter 6 Introduction to Part II

In the first part of this thesis, we dealt with several conditions characterizing a global optimizer of a given optimization problem. Such criteria are very interesting from a theoretical point of view. In practical situations, however, one wants to calculate the global optimizer explicitly. In most cases, global optimality conditions do not help very much in this task, as they are mere descriptions of the optimal solution (but see Bomze and Danninger [12], [18] and [19]). Therefore, we are led to the field of algorithms for solving global optimization problems.

Many different types of algorithms have been proposed for different classes of problems. The most popular among them are cutting plane algorithms, outer approximation methods, and Branch-and-Bound algorithms. For a detailed introduction to the theory of these algorithms we refer to the textbooks co-authored by Horst [51], [47] and [46].

In the present thesis, we will concentrate on Branch-and-Bound methods. These methods are often used to solve various types of global optimization problems. The basic idea is to relax the feasible set, to partition this relaxed set successively and to compute lower and upper bounds of the optimal objective function value on the partition sets so obtained. As both the partitioning and the bounding procedures can be adapted to the respective problem, Branch-and-Bound forms a relatively flexible method. We will briefly review the prototype Branch-and-Bound algorithm in Chapter 7, where we will also discuss the bounding and partitioning procedures and give some basic convergence conditions.

In Chapter 8, duality aspects will come in again. We will develop an idea which comes straightforwardly from Lagrange duality. It is well-known that, when a Lagrange dual problem is associated to a minimization problem, the optimal value of the dual yields a lower bound on the optimal value of the primal problem. We make this idea more precise and investigate the convergence of algorithms which, in each iteration, solve a Lagrange dual problem to calculate a lower bound. We also investigate problems which are convex in some of the variables, and nonconvex in the rest of the variables. For these problems, we show that operating entirely in the space of the nonconvex variables also results in a convergent algorithm. Since the problem dimension can be reduced considerably by this technique, numerical effort will be reduced as well. Moreover, it turns out that in some special cases the Lagrange dual problem can be transformed into an ordinary linear problem, so that this approach seems to be numerically promising.

A special problem to which this reasoning applies is studied in both Chapter 9 and 10: the sum-of-ratios problem consists of maximizing a sum of $p \ge 2$ quotients of affine functions over a polytope. We show how its dual, which is equivalent to an LP, can be used to obtain upper bounds. To obtain lower bounds, we propose to compute efficient points of a corresponding multiple-criteria problem, which is also discussed in more detail. Numerical results from a Fortran 90 implementation show that the algorithm is fast and applicable to problems with two, three or four ratios involved.

For the sake of comparison, a second Branch-and-Bound algorithm for the same problem is studied in Chapter 10. This algorithm does not use Lagrange duality in the bounding procedures. Instead, it uses linear subfunctionals to compute bounds, which is also a widely used technique. Numerical experiments show, however, that this algorithm does not behave satisfactorily, compared to the algorithm presented in Chapter 9.

Chapter 7 The Branch-and-Bound Algorithm

This chapter is meant to be an auxiliary one for the remainder of the thesis. It collects some well-known basic facts on Branch-and-Bound algorithms, on practical issues as well as on convergence theory. The purpose of this is to create a framework which can be referred to in the remaining chapters. Branch-and-Bound algorithms have found a vast number of applications. Their general theory, implementational topics, numerical experiments and many instances of Branch-and-Bound algorithms applicable to special types of problems are comprehensively described in Horst/Tuy [51] as well as in Horst et al. [47]. A large number of further references can also be found there.

7.1 The Basic Branch-and-Bound Scheme

We present a Branch-and-Bound scheme for problems of the following type:

$$\min f(x) \quad \text{s.t. } h_i(x) \le 0, \; i = 1, \dots, m,$$

where $f, h_i : \mathbb{R}^n \to \mathbb{R}$ are lower semicontinuous (l.s.c.) functions ($i = 1, \dots, m$). Assume the feasible set

$$M := \{ x \in \mathbb{R}^n : h_i(x) \le 0, \; i = 1, \dots, m \}$$

to be nonempty and compact. The Weierstrass theorem then ensures the existence of the global minimum.

The basic idea of Branch-and-Bound is rather simple: Start with a relaxed set $C_1 \supseteq M$ and compute a lower and an upper bound on $\min_{x \in M} f(x)$. Partition $C_1$ into finitely many subsets and compute improved lower and upper bounds. Repeat this process until the difference of lower and upper bounds is small enough.

In order to make this more precise we next state the exact algorithmical scheme. The following sections provide more details about partitioning and bounding procedures as well as convergence conditions.

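Basic Branch-and-Bound Algorithm

Initialization:

Compute a relaxed set $C_1 \supseteq M$.
Compute a lower bound $\beta(C_1)$ satisfying $\beta(C_1) \le \min_{x \in M \cap C_1} f(x)$.
Compute a (finite) set $Q(C_1) \subset M \cap C_1$ of feasible points. $\alpha(C_1) := \min\{ f(x) : x \in Q(C_1) \}$ is an upper bound for the optimal value on $M \cap C_1$.
Set $\beta_1 := \beta(C_1)$, set $\alpha_1 := \alpha(C_1)$.
Choose $x^1 \in Q(C_1)$ such that $f(x^1) = \alpha_1$.
Set $\mathcal{M}_1 := \{C_1\}$. Set $k := 1$.

Iteration $k \ge 1$:

Stopping Criterion: If $\alpha_k = \beta_k$ then Stop. The point $x^k$ is an optimal solution, $\alpha_k$ is the optimal value.

Otherwise:

Partition $C_k$ into a finite number of sets $C_{k,1}, \dots, C_{k,r_k}$.
For $j = 1, \dots, r_k$, compute lower bounds $\beta(C_{k,j})$ satisfying

$$\beta_k \le \beta(C_{k,j}) \le \min_{x \in M \cap C_{k,j}} f(x)$$

and sets $Q(C_{k,j}) \subset M \cap C_{k,j}$ of feasible points.
Set $\alpha_{k+1} := \min\left\{ \alpha_k,\; \min\left\{ f(x) : x \in \bigcup_{j=1}^{r_k} Q(C_{k,j}) \right\} \right\}$.
Choose $x^{k+1} \in \{x^k\} \cup \bigcup_{j=1}^{r_k} Q(C_{k,j})$ such that $f(x^{k+1}) = \alpha_{k+1}$ ($x^{k+1}$ is the best feasible point known at iteration $k$).
Set $\mathcal{M}'_{k+1} := (\mathcal{M}_k \setminus \{C_k\}) \cup \bigcup_{j=1}^{r_k} \{C_{k,j}\}$.
Set $\mathcal{M}_{k+1} := \{ C \in \mathcal{M}'_{k+1} : \beta(C) < \alpha_{k+1} \}$.
If $\mathcal{M}_{k+1} \ne \emptyset$, set $\beta_{k+1} := \min\{ \beta(C) : C \in \mathcal{M}_{k+1} \}$, else set $\beta_{k+1} := \alpha_{k+1}$.
Choose $C_{k+1} \in \mathcal{M}_{k+1}$ such that $\beta(C_{k+1}) = \beta_{k+1}$.

Go to iteration $k + 1$.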
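To make the scheme concrete, here is a minimal one-dimensional sketch, not the implementation used in this thesis. It assumes the feasible set is an interval $[a, b]$ (so $M = C_1$), that a Lipschitz constant of $f$ is known and supplies the lower bounds, that the midpoint of each interval serves as the set $Q$ of feasible points, and that partitioning is done by bisection. The function name, the test function and the tolerance are all illustrative choices.

```python
import heapq
import math

def branch_and_bound(f, a, b, lipschitz, eps=1e-3, max_iter=100_000):
    """Minimize f over [a, b]; returns (x_best, alpha, beta) with beta <= min f <= alpha."""

    def lower_bound(lo, hi):
        # f(x) >= f(mid) - lipschitz * |x - mid| for all x in [lo, hi]
        mid = 0.5 * (lo + hi)
        return f(mid) - lipschitz * 0.5 * (hi - lo), mid

    beta, x_best = lower_bound(a, b)
    alpha = f(x_best)                          # upper bound from a feasible point
    active = [(beta, a, b)]                    # active partition sets, keyed by bound
    for _ in range(max_iter):
        if not active:                         # every set was pruned
            beta = alpha
            break
        beta, lo, hi = heapq.heappop(active)   # set with the smallest lower bound
        if alpha - beta <= eps:                # practical stopping criterion
            beta = min(beta, alpha)
            break
        mid = 0.5 * (lo + hi)
        for c_lo, c_hi in ((lo, mid), (mid, hi)):      # bisection of the chosen set
            child_beta, x_child = lower_bound(c_lo, c_hi)
            child_beta = max(child_beta, beta)         # keep lower bounds monotone
            fx = f(x_child)
            if fx < alpha:                             # better feasible point found
                alpha, x_best = fx, x_child
            if child_beta < alpha:                     # otherwise the set is pruned
                heapq.heappush(active, (child_beta, c_lo, c_hi))
    return x_best, alpha, beta

def objective(x):
    """Illustrative multimodal test function, Lipschitz with constant 1 + 10/3."""
    return math.sin(x) + math.sin(10.0 * x / 3.0)

x_best, alpha, beta = branch_and_bound(objective, 2.7, 7.5, lipschitz=1.0 + 10.0 / 3.0)
```

The priority queue realizes the selection rule of the scheme (always refine a set with the smallest lower bound), and the stopping test uses a tolerance instead of exact equality of the two bounds.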

Note that as soon as a set $C_{k,j}$ is detected to be infeasible ($M \cap C_{k,j} = \emptyset$), it can be deleted from the set $\mathcal{M}_k$. This is usually done by assigning a very large lower bound to $C_{k,j}$, e.g. by setting $\beta(C_{k,j}) = +\infty$. A drawback of the Branch-and-Bound method is that although the optimal solution is often found rather quickly, numerous further iterations are frequently necessary to ensure that this really is the desired solution.

7.2 Branching and Bounding Procedures

The partition sets $C_k$ in the Branch-and-Bound algorithm are most often chosen to be simplices, (hyper)rectangles or cones. This is expressed by the terms simplicial, rectangular and conical Branch-and-Bound methods. The former two will be used in Chapters 9 and 10, respectively. The partitioning procedure performed in every iteration is understood in the following sense:

Definition 7.2.1 Let $C \subset \mathbb{R}^n$ and let $I$ be a finite set of indices. A family $\{D_i : i \in I\}$ of subsets of $C$ is said to be a partition of $C$ if

$$C = \bigcup_{i \in I} D_i \quad \text{and} \quad D_i \cap D_j = \partial D_i \cap \partial D_j \quad \forall\, i, j \in I, \; i \ne j,$$

where $\partial D_i$ denotes the boundary of $D_i$.

If the partition sets used in the algorithm are rectangles, then an often used partitioning method is bisection: Let the rectangle $R$ be described as $R = \{x \in \mathbb{R}^n : a \le x \le b\}$, where $a, b \in \mathbb{R}^n$ and the $\le$-symbol is meant componentwise. Choose the longest of its edges, $[a_\ell, b_\ell]$ say, and compute $\omega_\ell := \tfrac{1}{2}(a_\ell + b_\ell)$. Then the sets

$$R_1 := \{x \in R : x_\ell \le \omega_\ell\}, \qquad R_2 := \{x \in R : x_\ell \ge \omega_\ell\}$$

form a partition of $R$ in the sense of Definition 7.2.1. This procedure is called "bisection along the longest edge".

If the partition sets used in a Branch-and-Bound algorithm are simplices, then two methods of partitioning are commonly used: radial subdivision and bisection. First recall that an $n$-simplex $S \subset \mathbb{R}^n$ is defined to be the convex hull of $n + 1$ affinely independent points $v^0, v^1, \dots, v^n \in \mathbb{R}^n$, i.e. a simplex is a full-dimensional polytope with $n + 1$ vertices. Now let $S = \mathrm{conv}\{v^0, v^1, \dots, v^n\}$ be given and choose an arbitrary point $\omega \in S \setminus \{v^0, v^1, \dots, v^n\}$. This point is uniquely represented as a convex combination of the vertices:

$$\omega = \sum_{i=0}^{n} \lambda_i v^i, \qquad \lambda_i \ge 0 \;\; \forall\, i, \qquad \sum_{i=0}^{n} \lambda_i = 1.$$

For each $i$ such that $\lambda_i \ne 0$ form the simplex

$$S_i := \mathrm{conv}\{v^0, \dots, v^{i-1}, \omega, v^{i+1}, \dots, v^n\}.$$
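The two partitioning rules just described can be sketched in a few lines. The code below is an illustration under simple assumptions: rectangles are stored as a pair of coordinate lists `(a, b)`, simplices as lists of vertices, and the function names are hypothetical.

```python
def bisect_rectangle(a, b):
    """Bisect the rectangle {x : a <= x <= b} along its longest edge."""
    lengths = [bi - ai for ai, bi in zip(a, b)]
    ell = lengths.index(max(lengths))        # index of the longest edge
    omega = 0.5 * (a[ell] + b[ell])          # midpoint of that edge
    upper_of_first = list(b)
    upper_of_first[ell] = omega              # R1 = {x in R : x_ell <= omega}
    lower_of_second = list(a)
    lower_of_second[ell] = omega             # R2 = {x in R : x_ell >= omega}
    return (list(a), upper_of_first), (lower_of_second, list(b))

def radial_subdivision(vertices, lambdas):
    """Subdivide conv(vertices) radially at omega = sum_i lambdas[i] * vertices[i].

    Returns omega and one child simplex for every index i with lambdas[i] != 0,
    obtained by replacing vertex i with omega.
    """
    dim = len(vertices[0])
    omega = [sum(lam * v[c] for lam, v in zip(lambdas, vertices)) for c in range(dim)]
    children = []
    for i, lam in enumerate(lambdas):
        if lam != 0:
            child = [list(v) for v in vertices]
            child[i] = list(omega)
            children.append(child)
    return omega, children

r1, r2 = bisect_rectangle([0.0, 0.0], [4.0, 1.0])
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
omega, children = radial_subdivision(triangle, [0.5, 0.5, 0.0])
```

In the triangle example $\omega$ is the midpoint of the edge between the first two vertices, so only two coefficients are nonzero and the subdivision produces two children, which is exactly a bisection of that edge.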


Note that, since $\omega$ is not allowed to be a vertex of $S$, the number of indices $i$ such that $\lambda_i \ne 0$ is at least 2. The family $\{S_i\}$ of simplices so obtained forms a so-called radial subdivision of $S$ which is a partition in the sense of Definition 7.2.1. If the point $\omega$ is taken to be the midpoint of the longest edge of $S$, then the partitioning method is called bisection. If $\omega$ is not the midpoint but an arbitrary point on the longest edge, then one speaks of generalized bisection. More on this topic can be found in Horst [42].

Some partitioning procedures have a special quality: Every nested sequence of sets generated by the procedure eventually shrinks to a singleton.

Definition 7.2.2 A partitioning method is called exhaustive, if for all decreasing sequences $\{C_k\}_{k \in \mathbb{N}}$ there holds

$$\lim_{k \to \infty} C_k = \bigcap_{k \in \mathbb{N}} C_k = \{\bar x\}.$$

Bisection of simplices or rectangles along the longest edge is an exhaustive subdivision method; radial subdivision of simplices is not.

How a starting set $C_1 \supseteq M$ can be obtained depends on the structure of the problem. In most cases this is no difficulty at all. The main difficulty in using Branch-and-Bound algorithms consists in computing lower bounds. If we are to calculate a lower bound

$$\beta(C_k) \le \min_{x \in M \cap C_k} f(x)$$

there are several possibilities of doing so. Which one can actually be realized depends again on the structure of the problem. An often encountered technique is using linear or convex subfunctionals, i.e. functionals underestimating $f$. Assume that for each $C_k$ in the algorithm a function $g_k$ can be found such that

$$g_k(x) \le f(x) \quad \forall\, x \in C_k$$

and the minimization problem

$$\beta(C_k) = \min_{x \in C_k} g_k(x) \qquad (7.1)$$

can be solved. $\beta(C_k)$ is then a lower bound, as desired. Often linear subfunctionals $g_k$ can be constructed. Then in every iteration of the algorithm the linear subproblem (7.1) has to be solved. A different possibility will be studied in Chapter 8: the use of Lagrange duality for computing lower bounds.

Determination of the sets $Q(C_k)$ of feasible points normally does not require additional calculations. In most cases feasible points are a by-product of the lower bound calculation.

7.3 Convergence Conditions

A Branch-and-Bound algorithm is called finite, if it terminates at some iteration $k$ because the stopping criterion is met, otherwise it is called infinite. We have the following obvious result (cf. Horst/Tuy [51, Corollary IV.1]):

Lemma 7.3.1 If a Branch-and-Bound procedure is infinite, then it generates at least one infinitely decreasing sequence $\{C_k\}_{k \in \mathbb{N}}$ of successively refined partition sets.

An infinitely decreasing sequence of successively refined partition sets is meant to be a sequence $\{C_k\}_{k \in \mathbb{N}}$ generated by the algorithm such that $C_k \supset C_{k+1}$ for all $k \in \mathbb{N}$. Note that the index $k$ need not correspond to the iteration index, it may be some subsequence thereof.

In practice, a Branch-and-Bound procedure will always be made finite by replacing the stopping criterion $\alpha_k = \beta_k$ with $\alpha_k - \beta_k \le \varepsilon$, where $\varepsilon$ is a prescribed accuracy. However, we are still interested in convergence conditions, i.e. in conditions ensuring that the sequence $\beta_k$ of lower bounds will converge to the sought minimum $\min_{x \in M} f(x)$. A crucial lemma is the following one:

Lemma 7.3.2 If every infinite decreasing sequence $\{C_k\}_{k \in \mathbb{N}}$ of successively refined partition sets satisfies

(i) $\bar C := \bigcap_{k \in \mathbb{N}} C_k \subset M$,
(ii) $\lim_{k \to \infty} \beta(C_k) = \min\{f(x) : x \in \bar C\}$,

then

$$\lim_{k \to \infty} \beta_k = \min\{f(x) : x \in M\}.$$

Proof. Since in every iteration we choose $C_k$ such that $\beta_k = \beta(C_k)$, we have

$$\lim_{k \to \infty} \beta_k = \lim_{k \to \infty} \beta(C_k) = \min_{x \in \bar C} f(x) \ge \min_{x \in M} f(x).$$

The inequality comes from the fact that $\bar C$ is a subset of $M$. On the other hand, all $\beta_k$ are lower bounds, i.e.

$$\beta_k \le \min_{x \in M} f(x) \quad \forall\, k \in \mathbb{N}.$$

This yields the desired equality. □

New results showing the convergence of Branch-and-Bound algorithms using radial subdivision of simplices along with convex envelopes for the lower bound calculation can be found in Locatelli and Raber [54], [55].


Note that, when solving a maximization problem instead of the described minimization problem, convergence depends on the behaviour of the upper bounds. It is obvious how the analogue of Lemma 7.3.2 reads in this case. With the tools developed in this chapter, we are now able to prove convergence of Branch-and-Bound algorithms using Lagrange-dual bounds, the topic of Chapter 8. We point out once more that all the theory of this chapter is well-known. In outlining the most important facts we basically followed the textbooks by Horst/Tuy [51] and Horst et al. [47].

Chapter 8 Lagrange Duality and Partitioning Techniques

In this chapter, we investigate whether and under which conditions solving the Lagrangian dual of a given problem yields a valid bounding procedure, i.e. a bounding procedure satisfying a sufficient condition for the convergence of the corresponding Branch-and-Bound approach. Such a dual approach seems to be conceptually natural, since, by the well-known "weak duality", a feasible solution of the dual yields a lower bound without any convexity or regularity requirements on the given (primal) problem. Due to nonconvexity, however, a positive duality gap has to be expected between the optimal values of primal and dual. It is shown that, for very general problem classes, this duality gap can be reduced to zero in the limit by appropriate refinement of the partition sets so that, in general, solving the dual yields a valid bound. A similar result holds for partly convex problems where exhaustive partitioning is applied only in the space of nonconvex variables.

Applications include Branch-and-Bound approaches for linearly constrained problems where convex envelopes can be computed, certain generalized bilinear problems, linearly constrained optimization of the sum of ratios of affine functions (see Chapter 9), and concave minimization under reverse convex constraints. The results of this chapter have appeared as Dür and Horst [24]. A more recent reference dealing with the same theory is Thoai [72].

8.1 Convex Envelopes and Duality

8.1.1 Convex Envelopes

We begin by recalling the concept of the convex envelope of a nonconvex function, which is a basic tool in the theory and algorithms of nonconvex global optimization (see Horst/Tuy [51] or Horst et al. [47] and references therein).


Definition 8.1.1 Let $C \subset \mathbb{R}^n$ be nonempty, compact and convex, and let $f : C \to \mathbb{R}$ be lower semicontinuous (l.s.c.) on $C$. Then the function $\varphi_{C,f} : C \to \mathbb{R}$,

$$\varphi_{C,f}(x) := \sup\{ h(x) : h : C \to \mathbb{R} \text{ convex},\; h \le f \text{ on } C \}$$

is said to be the convex envelope of $f$ over $C$.

Notice that it is often convenient to eliminate formally the set $C$ by setting

$$f(x) = \begin{cases} f(x), & x \in C \\ +\infty, & x \notin C \end{cases}$$

and replacing $\varphi_{C,f}$ accordingly by its extension $\varphi_{C,f} : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$. It is well-known and easy to see that $\varphi_{C,f}$ is l.s.c. on $C$, and hence is representable as the pointwise supremum of the affine minorants of $f$. Geometrically, $\varphi_{C,f}$ is the function whose (closed) epigraph coincides with the convex hull of the epigraph of $f$. Alternative representations are

$$\varphi_{C,f}(x) = \inf\left\{ \sum_{i=1}^{n+1} \lambda_i f(x^i) : (\lambda_1, \dots, \lambda_{n+1}) \in \Delta_{n+1},\; x^i \in C,\; x = \sum_{i=1}^{n+1} \lambda_i x^i \right\},$$

where $\Delta_{n+1}$ is the standard simplex

$$\Delta_{n+1} := \left\{ (\lambda_1, \dots, \lambda_{n+1}) \in \mathbb{R}^{n+1} : \sum_{i=1}^{n+1} \lambda_i = 1,\; \lambda_i \ge 0,\; i = 1, \dots, n+1 \right\},$$

as well as

$$\varphi_{C,f} = (f^*)^*,$$

where, as in Definition 2.2.1, $f^*$ denotes the Fenchel-Rockafellar conjugate function of $f$. For proofs and further references see, e.g., Horst/Tuy [51], Horst et al. [47] or Rockafellar [60]. In the sequel we need the following basic properties (the proofs of which can also be found in the aforementioned references).

Lemma 8.1.1 Let $f$ and $C$ be defined as in Definition 8.1.1, and let $D \subset C$ be compact and convex, $g : \mathbb{R}^n \to \mathbb{R}$ be an affine function. Then

(i) $m := \min\{f(x) : x \in C\} = \min\{\varphi_{C,f}(x) : x \in C\}$,
(ii) $\{y \in C : f(y) = m\} \subset \{y \in C : \varphi_{C,f}(y) = m\}$,
(iii) $\varphi_{C,f}(x) \le \varphi_{D,f}(x) \quad \forall\, x \in D$,
(iv) $\varphi_{C,f+g} = \varphi_{C,f} + g$.

Notice that the result (ii) can be stated more precisely: it is easy to see that the set of global minimizers of $\varphi_{C,f}$ over $C$ is the convex hull of the set of global minimizers of $f$ over $C$.
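In one dimension the convex envelope of a sampled function can be approximated by the lower convex hull of the sample points, which gives a simple way to observe Lemma 8.1.1(i). The following sketch uses the standard monotone chain construction; the sampled function $f(x) = -x^2$ on $C = [0, 2]$, the grid, and the function names are illustrative assumptions, and the result is the envelope of the samples rather than of $f$ itself.

```python
def lower_convex_hull(points):
    """Lower convex hull of planar points (monotone chain), sorted by x."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross <= 0:          # middle point lies on or above the chord
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def envelope_value(hull, x):
    """Evaluate the piecewise linear function through the hull vertices at x."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return (1.0 - t) * y0 + t * y1
    raise ValueError("x lies outside the sampled interval")

def f(x):
    return -x * x                   # concave on C = [0, 2]

xs = [i / 100.0 for i in range(201)]
samples = [(x, f(x)) for x in xs]
hull = lower_convex_hull(samples)

min_f = min(y for _, y in samples)
min_envelope = min(y for _, y in hull)
value_at_one = envelope_value(hull, 1.0)
```

For this concave example only the two endpoints survive in the hull, so the computed envelope is the affine function $-2x$ through the vertices of the interval, and its minimum over $C$ coincides with that of $f$.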

8.1.2 Duality Gap

Consider the general nonconvex global optimization problem

$$(P) \qquad \min f(x) \quad \text{s.t. } x \in C,\; h(x) \le 0,$$

where $C \subset \mathbb{R}^n$ is nonempty, compact and convex, $f : C \to \mathbb{R}$ is l.s.c. on $C$, and $h : C \to \mathbb{R}^m$ is (componentwise) l.s.c. on $C$. Assume that the feasible set $M = \{x \in C : h(x) \le 0\}$ is nonempty so that an optimal solution exists. For $u \in \mathbb{R}^m_+$, $x \in C$ define the Lagrangian

$$L(x, u) := f(x) + \langle h(x), u \rangle$$

of $(P)$ and the dual objective function $d : \mathbb{R}^m_+ \to \mathbb{R}$,

$$d(u) := \min_{x \in C} L(x, u).$$

Then the dual of $(P)$ is defined as the problem

$$(D) \qquad \max d(u) \quad \text{s.t. } u \in \mathbb{R}^m_+.$$

Let $\min(P)$ and $\sup(D)$ (respectively $\max(D)$ when the maximum is attained) denote the optimal values of $(P)$ and $(D)$, respectively. Since $(P)$ can be written as

$$\min_{x \in C} \; \sup_{u \in \mathbb{R}^m_+} L(x, u),$$

one immediately has the well-known weak duality:

$$\min(P) \ge \sup(D).$$

If $f$ is convex on the feasible set $M$ of $(P)$ and a certain "constraint qualification" is fulfilled then strong duality holds:

$$\min(P) = \sup(D).$$

See Geoffrion [32] for a thorough discussion of duality in mathematical programming. We mention two of the most frequently used constraint qualifications:

(CQ1) $f$ is convex on an open set containing $M$, and $h(x) = Ax - b$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$,
(CQ2) the mapping $h$ is convex and there exists $x^0 \in C$ satisfying $h(x^0) < 0$ (this condition is called Slater's Condition).


In our nonconvex case, however, a duality gap

$$\Delta := \min(P) - \sup(D) > 0$$

has to be expected. Consider now the linearly constrained case where $h(x) = Ax - b$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$. In this case, a probably not yet fully exploited relation between the dual of the nonconvex problem $(P)$ and its convexified primal was proved in Falk [27] by means of the theory of Fenchel-Rockafellar conjugate functions. Next we review this relation and provide a simpler proof. Let

$$(\bar P) \qquad \min \varphi_{C,f}(x) \quad \text{s.t. } x \in C,\; Ax \le b$$

denote the problem which arises from $(P)$ by replacing the objective function $f$ by its convex envelope $\varphi_{C,f}$. Consider the corresponding Lagrangian

$$\bar L(x, u) = \varphi_{C,f}(x) + \langle Ax - b, u \rangle$$

and denote by $(\bar D)$, $\sup(\bar D)$, $\min(\bar P)$ the dual of $(\bar P)$, its optimal value, and the optimal value of $(\bar P)$, respectively.

Proposition 8.1.1 Assume that $h(x) = Ax - b$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and that a constraint qualification holds for $(\bar P)$. Then

$$\min(\bar P) = \sup(\bar D) = \sup(D)$$

and

$$\Delta = \min(P) - \min(\bar P).$$

Proof. By Lemma 8.1.1(iv), we have

$$\varphi_{C,L}(\cdot, u) = \bar L(\cdot, u) \text{ on } C \quad \forall\, u \in \mathbb{R}^m_+,$$

where $\varphi_{C,L}(\cdot, u)$ denotes the convex envelope of $L(x, u)$ with respect to $x$ over $C$. Therefore, it follows from Lemma 8.1.1(i) that, for all $u \in \mathbb{R}^m_+$,

$$d(u) := \min_{x \in C} L(x, u) = \min_{x \in C} \varphi_{C,L}(x, u) = \min_{x \in C} \bar L(x, u) = \bar d(u),$$

i.e. the objective functions of $(D)$ and $(\bar D)$ coincide, and hence $\sup(D) = \sup(\bar D)$. The assertion follows, since for problems $(\bar P)$ and $(\bar D)$ strong duality holds: $\min(\bar P) = \sup(\bar D)$. □
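Proposition 8.1.1 can be observed on a tiny instance. The sketch below takes the illustrative problem $\min -x^2$ subject to $x - 1 \le 0$ on $C = [0, 2]$ (an assumption chosen for this example, not taken from the thesis), evaluates the dual objective on a grid of multipliers, and compares the dual optimum with the optimum of the convexified primal, whose objective is the affine envelope $-2x$ of $-x^2$ over $[0, 2]$. Grid ranges and names are hypothetical.

```python
def f(x):
    return -x * x              # concave objective

def h(x):
    return x - 1.0             # linear constraint h(x) <= 0

def envelope(x):
    return -2.0 * x            # convex envelope of f over C = [0, 2]

xs = [i / 100.0 for i in range(201)]    # grid on C = [0, 2]
us = [i / 100.0 for i in range(501)]    # multipliers u in [0, 5]

def dual_objective(u):
    """d(u) = min over C of the Lagrangian f(x) + u * h(x), on the grid."""
    return min(f(x) + u * h(x) for x in xs)

primal_value = min(f(x) for x in xs if h(x) <= 0)             # min(P)
dual_value = max(dual_objective(u) for u in us)               # sup(D)
convexified_value = min(envelope(x) for x in xs if h(x) <= 0) # convexified primal
duality_gap = primal_value - dual_value
```

Here the Lagrangian is concave in $x$, so its minimum over $C$ is attained at an endpoint and $d(u) = \min\{-u,\, u - 4\}$, which is maximized at $u = 2$. The dual value $-2$ equals the optimal value of the convexified primal, while the primal optimum is $-1$, giving a duality gap of 1.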

In this chapter, our intention is to investigate the possible use of dual problems of type (D) to obtain lower bounds on the optimal value of (P ) within Branch{and{Bound

59

8.2. Branch{and{Bound Methods with Dual Bounds

schemes for solving nonconvex global optimization problems. In such a scheme, the set C is usually a partition set as described in Section 7.2, i.e. an n{simplex or an n{rectangle. When the constraints are linear then Proposition 8.1.1 allows us to switch from the dual (D) to the convexi ed primal (P ) according as which formulation is easier to handle. For example, when f is a concave function and C is an n{simplex, it is known that ' is the ane function which coincides with f at the vertices of C (see Horst [41], Horst/Tuy [51] or Horst et al. [47]), so that problem (P ) reduces to a linear program, a result which is more dicult to deduce from the dual formulation (D). In the nonlinearly constrained case, where h = (h1 ; : : :; hm ) is an arbitrary l.s.c. mapping on C , estimates of the duality gap involving certain de nitions of the \lack of convexity of a function" have been given by Aubin and Ekeland [6] and by Pappalardo [57]. For our purpose, however, it seems to be more convenient to employ convex envelopes in a similar way as for the linearly constrained case. Returning to the nonlinearly constrained case, let (P ) denote the problem which arises from (P ) when all of the functions f; hi are replaced by their convex envelopes over C , and let d(u); (D) denote the corresponding dual objective and dual problem, respectively. Notice that, for every u 2 IRm+ , the Lagrangian of this convexi ed primal is a convex underestimator of the Lagrangian of (P ) over C but not necessarily its convex envelope. Therefore, we have min(P )  sup(D)  sup(D); and min(D) = min(P ); whenever a constraint quali cation is ful lled for (P ). This yields C;f

Corollary 8.1.1 Let $(P), (D)$ and $(\bar P), (\bar D)$ be defined as above and assume that a constraint qualification holds for problem $(\bar P)$. Then the duality gap satisfies
\[ 0 \le \min(P) - \sup(D) \le \min(P) - \min(\bar P). \]

8.2 Branch-and-Bound Methods with Dual Bounds

8.2.1 Limit Behaviour on Nested Sequences

Convergence of a Branch-and-Bound algorithm depends crucially on the limit behaviour of the lower bounds on nested sequences of partition sets (cf. Chapter 7). In view of Proposition 8.1.1 and Corollary 8.1.1 the limit behaviour of convex envelopes on such sequences is studied first.

Lemma 8.2.1 Let $A_k \subset \mathbb{R}^n$ be compact and $\emptyset \ne A_{k+1} \subseteq A_k$ for all $k \in \mathbb{N}$. Then
\[ \operatorname{conv}\Big( \bigcap_{k \in \mathbb{N}} A_k \Big) = \bigcap_{k \in \mathbb{N}} \operatorname{conv} A_k, \]
where conv denotes the convex hull operation.

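Proof. The inclusion
\[ \operatorname{conv}\Big( \bigcap_{k \in \mathbb{N}} A_k \Big) \subseteq \bigcap_{k \in \mathbb{N}} \operatorname{conv} A_k \]
is clear, since for any two sets $A \subseteq B$ there holds $\operatorname{conv} A \subseteq \operatorname{conv} B$.

For the opposite inclusion recall that every point of the convex hull of a compact set $A$ in $\mathbb{R}^n$ can be represented as a convex combination of $n+1$ points of $A$ (this is Carathéodory's theorem). Thus, for every $x \in \bigcap_{k \in \mathbb{N}} \operatorname{conv} A_k$, there exist, for every $k \in \mathbb{N}$, $y_{k,i} \in A_k$, $\lambda_{k,i} \in [0,1]$, $i = 1, \dots, n+1$, satisfying
\[ \sum_{i=1}^{n+1} \lambda_{k,i} = 1, \qquad x = \sum_{i=1}^{n+1} \lambda_{k,i}\, y_{k,i}. \]
By compactness of $A_1$ and of $[0,1]$, we can find a subsequence $\{k_q\}_{q \in \mathbb{N}} \subseteq \mathbb{N}$ such that, for each $i = 1, \dots, n+1$, there exist $\lambda_i \in [0,1]$, $y_i \in A_1$ satisfying
\[ \lambda_{k_q,i} \to \lambda_i \quad\text{and}\quad y_{k_q,i} \to y_i \quad\text{as } q \to \infty. \]
Since the sequence $\{A_k\}_{k \in \mathbb{N}}$ is decreasing (nested) by assumption, it follows that $y_i \in \bigcap_{k \in \mathbb{N}} A_k$. Finally, from
\[ \sum_{i=1}^{n+1} \lambda_i = \lim_{q \to \infty} \sum_{i=1}^{n+1} \lambda_{k_q,i} = 1 \]
and
\[ \sum_{i=1}^{n+1} \lambda_i\, y_i = \lim_{q \to \infty} \sum_{i=1}^{n+1} \lambda_{k_q,i}\, y_{k_q,i} = x \]
we see that $x \in \operatorname{conv} \bigcap_{k \in \mathbb{N}} A_k$. $\Box$

Notice that Lemma 8.2.1 does not hold for unbounded sets $A_k$ (take $A_k = \{0\} \cup [k, \infty[$).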
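As a short verification of this counterexample (added here for illustration, with $n = 1$), the two sides of the identity in Lemma 8.2.1 can be computed explicitly:
\[ \operatorname{conv} A_k = [0, \infty[ \ \text{ for all } k \in \mathbb{N}, \quad\text{hence}\quad \bigcap_{k \in \mathbb{N}} \operatorname{conv} A_k = [0, \infty[, \]
\[ \text{whereas}\quad \bigcap_{k \in \mathbb{N}} A_k = \{0\}, \quad\text{hence}\quad \operatorname{conv}\Big( \bigcap_{k \in \mathbb{N}} A_k \Big) = \{0\}. \]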

Application of Lemma 8.2.1 to "truncated" epigraphs of lower semicontinuous functions yields the following result on the convergence of convex envelopes.

Corollary 8.2.1 For all $k \in \mathbb{N}$, let $C_k \subset \mathbb{R}^n$ be compact, convex and $\emptyset \ne C_{k+1} \subseteq C_k$. Let $C := \lim_{k \to \infty} C_k = \bigcap_{k \in \mathbb{N}} C_k$. Moreover, let $f : C_1 \to \mathbb{R}$ be lower semicontinuous and bounded on $C_1$. Then, for the convex envelopes $\varphi_{C_k,f}$ and $\varphi_{C,f}$ of $f$ over $C_k$ and $C$, respectively, there holds
\[ \lim_{k \to \infty} \varphi_{C_k,f}(x) = \sup_{k \in \mathbb{N}} \varphi_{C_k,f}(x) = \varphi_{C,f}(x) \qquad \forall\, x \in C. \]

Proof. Let $\mu \ge \sup\{f(x) : x \in C_1\}$ and, for $k \in \mathbb{N}$, let
\[ A_k = \{(x,t) \in C_k \times \mathbb{R} : f(x) \le t \le \mu\} \]
denote the "truncated" epigraph of $f$ on $C_k$. Since $C_k$ is compact and the epigraph $\operatorname{Epi}(f, C_k)$ of the l.s.c. function $f$ over $C_k$ is closed, we see that $A_k$ is compact for all $k \in \mathbb{N}$. Clearly,
\[ \lim_{k \to \infty} A_k = \bigcap_{k \in \mathbb{N}} A_k = \{(x,t) \in C \times \mathbb{R} : f(x) \le t \le \mu\}. \]
The assertion then follows from Lemma 8.1.1(iii), Lemma 8.2.1 and the well-known fact that the epigraph of the convex envelope of $f$ (over $C_k$, respectively $C$) is the convex hull of the epigraph of $f$ over the corresponding set, where, of course, epigraphs can be replaced by truncated epigraphs in the above sense. $\Box$

The limit behaviour of sequences of minima of convex envelopes over nested sequences of compact convex sets follows from the following more general result.

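Lemma 8.2.2 For all $k \in \mathbb{N}$, let $C_k \subset \mathbb{R}^n$ be compact and $\emptyset \ne C_{k+1} \subseteq C_k$, and let $C := \lim_{k \to \infty} C_k = \bigcap_{k \in \mathbb{N}} C_k$. Moreover, let $f_k : C_k \to \mathbb{R}$ be l.s.c. on $C_k$ satisfying $f_{k+1} \ge f_k$ on $C_{k+1}$ and $f_k \le \mu$ on $C_k$, for all $k$. Then
\[ \min\Big\{ \sup_{k \in \mathbb{N}} f_k(x) : x \in C \Big\} = \lim_{k \to \infty} \min\{f_k(x) : x \in C_k\} = \sup_{k \in \mathbb{N}} \min\{f_k(x) : x \in C_k\}. \]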

Proof. It is easy to see that $f := \sup_{k \in \mathbb{N}} f_k$ is l.s.c. on $C$ (cf. Aubin [5]), so that all minima exist. Let $x_k \in \operatorname{Argmin}\{f_k(x) : x \in C_k\}$. Since $C_{k+1} \subseteq C_k$ and $f_{k+1} \ge f_k$ on $C_{k+1}$ we have
\[ f_k(x_k) \le f_{k+1}(x_{k+1}) \le \min\{f(x) : x \in C\} =: m, \]
and hence existence of $f^* := \lim_{k \to \infty} f_k(x_k) \le m$.

In order to show the opposite inequality, consider the truncated epigraphs
\[ F_k := \{(x,r) \in \mathbb{R}^n \times \mathbb{R} : x \in C_k,\ f_k(x) \le r \le \mu\}. \]
Again it is easy to see that $F_k$ is compact for all $k \in \mathbb{N}$. Moreover, $F_{k+1} \subseteq F_k$ so that $F := \lim_{k \to \infty} F_k = \bigcap_{k \in \mathbb{N}} F_k$ exists. It is well-known that
\[ \operatorname{Epi}\Big( \sup_{k \in \mathbb{N}} f_k,\ C \Big) = \bigcap_{k \in \mathbb{N}} \operatorname{Epi}(f_k, C) \]
(see again Aubin [5]), so that
\[ F = \{(x,r) : x \in C,\ f(x) \le r \le \mu\}. \]
The sequence of pairs $(x_k, f_k(x_k)) \in F_k$, $k \in \mathbb{N}$, has accumulation points by compactness of $F_1$, and, by passing to a subsequence, if necessary, we can assume that
\[ \lim_{k \to \infty} (x_k, f_k(x_k)) = (x^*, f^*) \in F. \]
Hence $f^* \ge f(x^*) \ge m$ because of lower semicontinuity of $f$. $\Box$

Corollary 8.2.2 With the notations and assumptions of Corollary 8.2.1, there holds
\[ \min\{\varphi_{C,f}(x) : x \in C\} = \lim_{k \to \infty} \min\{\varphi_{C_k,f}(x) : x \in C_k\}. \]

Proof. Apply Lemma 8.2.2 to $f_k = \varphi_{C_k,f}$. $\Box$
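As a small numerical illustration of Corollary 8.2.2 (not part of the original text), the following Python sketch uses the fact recalled in Section 8.1 that the convex envelope of a concave function over a simplex is the affine interpolant at the vertices; in one dimension this is the chord over an interval. The test function `f`, the point `x_star` and all helper names are assumptions chosen for the example.

```python
def chord_envelope(f, a, b):
    """Convex envelope of a concave f over [a, b]: the chord through the endpoints."""
    fa, fb = f(a), f(b)

    def phi(x):
        if a == b:
            return fa
        t = (x - a) / (b - a)
        return (1.0 - t) * fa + t * fb

    return phi


def envelope_min(f, a, b):
    """Minimum of the chord over [a, b]; an affine function attains it at an endpoint."""
    return min(f(a), f(b))


def f(x):
    # Hypothetical concave test function.
    return 1.0 - x * x


x_star = 0.5  # the nested intervals C_k = [x_star - 1/k, x_star + 1/k] shrink to {x_star}
ks = [1, 2, 4, 8, 16, 1000]
mins = [envelope_min(f, x_star - 1.0 / k, x_star + 1.0 / k) for k in ks]

# Sampled check on C_2 = [0, 1] that the chord underestimates the concave f.
phi = chord_envelope(f, 0.0, 1.0)
underestimates = all(phi(i / 10.0) <= f(i / 10.0) + 1e-12 for i in range(11))
```

In this sketch the values in `mins` are nondecreasing in k and approach `f(x_star)`, which is the minimum of the envelope over the limit set.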

Of course, Corollary 8.2.2 can also be derived directly via Lemma 8.1.1(i) and lower semicontinuity and boundedness of $f$.

8.2.2 Partitioning Methods with Dual Bounds

We now establish a convergence result for Branch-and-Bound algorithms with dual bounds to solve global optimization problems of type
\[ \min f(x) \quad \text{s.t.} \quad h_i(x) \le 0,\ i = 1, \dots, m, \tag{P} \]
with nonempty compact feasible set
\[ M = \{x \in \mathbb{R}^n : h_i(x) \le 0,\ i = 1, \dots, m\}, \]
where $f, h_i$ are l.s.c. real-valued functions on a compact convex set $C_1 \supseteq M$. The following theorem, based on the results of the previous section, shows that dual bounds in partitioning methods for problems of type $(P)$ lead to convergent procedures in the sense of Lemma 7.3.2.

Theorem 8.2.1 Let $M = \{x \in \mathbb{R}^n : h_i(x) \le 0,\ i = 1, \dots, m\}$ be nonempty and compact, where $h_i : C_1 \to \mathbb{R}$, $i = 1, \dots, m$, are l.s.c. on the compact convex set $C_1 \supseteq M$. Let $f : C_1 \to \mathbb{R}$ be l.s.c. and bounded on $C_1$, and let $\{C_k\}_{k \in \mathbb{N}}$ be a decreasing sequence of nonempty, compact convex sets in $\mathbb{R}^n$ converging to $C \subseteq M$. Furthermore, for every $k \in \mathbb{N}$, let
\[ v_k = \max_{u \in \mathbb{R}^m_+} \min_{x \in C_k} \Big\{ f(x) + \sum_{i=1}^m u_i h_i(x) \Big\}, \]
and assume that a constraint qualification holds for the convexified problem
\[ \min \varphi_{C_k,f}(x) \quad \text{s.t.} \quad \varphi_{C_k,h_i}(x) \le 0,\ i = 1, \dots, m,\quad x \in C_k. \tag{$\bar P_k$} \]
Then
\[ \lim_{k \to \infty} v_k = \min\{f(x) : x \in C\}. \]

Note that $v_k$ is actually the optimal value of the dual of problem $(P_k)$ stated below.

Proof. Consider the problem
\[ \min f(x) \quad \text{s.t.} \quad h_i(x) \le 0,\ i = 1, \dots, m,\quad x \in C_k. \tag{$P_k$} \]
Recall from weak duality and from Corollary 8.1.1 that, for every $k \in \mathbb{N}$, we have the bounds
\[ 0 \le \min(P_k) - v_k \le \min(P_k) - \min(\bar P_k) \le \min(P_k) - \min\{\varphi_{C_k,f}(x) : x \in C_k\}, \tag{8.1} \]
where the last inequality follows from
\[ \{x : \varphi_{C_k,h_i}(x) \le 0,\ i = 1, \dots, m,\ x \in C_k\} \subseteq C_k. \]
Lower semicontinuity of the functions $h_i$ on $C_1$ implies compactness of the feasible sets $C_k \cap M$ of $(P_k)$. Let $x_k \in \operatorname{Argmin}\{f(x) : x \in C_k \cap M\}$. Then, from $C_{k+1} \subseteq C_k$, we have
\[ f(x_k) \le f(x_{k+1}) \le \min\{f(x) : x \in C \cap M\} =: m, \]
so that
\[ \lim_{k \to \infty} f(x_k) \le m \]
exists. By passing to a subsequence, if necessary, we can assume that
\[ x_k \to \bar x \in C \cap M, \]
and hence
\[ \lim_{k \to \infty} f(x_k) \ge f(\bar x) \ge m. \]
Since $C \subseteq M$, we have shown that
\[ \lim_{k \to \infty} \min(P_k) = \min\{f(x) : x \in C\}. \]
In view of Corollary 8.2.2 and Lemma 8.1.1(i) we likewise have
\[ \lim_{k \to \infty} \min\{\varphi_{C_k,f}(x) : x \in C_k\} = \min\{f(x) : x \in C\}, \]
and Theorem 8.2.1 follows from (8.1) by letting $k \to \infty$. $\Box$


8.2.3 Partly Convex Optimization Problems

Many optimization problems involve two variables $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^p$ such that objective and constraint functions are convex (or even linear) in $x$ (for each fixed $y$) and nonconvex in $y$ (for each fixed $x$). Often the dimension $p$ of the space of the "nonconvex" variable $y$ is considerably smaller than the dimension $n$ of the $x$-space. When constructing a Branch-and-Bound method for such a problem, one would like to employ partitioning only in the $y$-space, whereas the bounding procedure inevitably has to involve both variables. Examples of such decomposition methods include the approaches of Pardalos and Rosen [58] for concave quadratic minimization over polytopes, of Horst and Thoai [48] for linearly constrained concave minimization, of Horst and Thoai [49] for biconcave minimization problems, and of Horst, Muu and Nast [43] for so-called quasiconvex-concave programs.

In this section, we show that for fairly large classes of optimization problems satisfying certain regularity conditions, dual bounds lead to convergent decomposition-partitioning algorithms in the above sense. Let now $(P)$ denote the problem
\[ \min f(x,y) \quad \text{s.t.} \quad h_i(x,y) \le 0,\ i = 1, \dots, m,\quad x \in X,\ y \in Y, \tag{P} \]
where $X \subset \mathbb{R}^n$, $Y \subset \mathbb{R}^p$ are convex and compact, $f, h_i : X \times Y \to \mathbb{R}$; $f(\cdot, y)$ convex, l.s.c. and bounded on $X$, $h_i(\cdot, y)$ convex, l.s.c. on $X$ for every $y \in Y$; and $f(x, \cdot), h_i(x, \cdot)$ continuous on $Y$ for every $x \in X$. Assume that the feasible set
\[ M = \{(x,y) \in X \times Y : h_i(x,y) \le 0,\ i = 1, \dots, m\} \]
is nonempty and compact. Let $(P_k)$ denote the problem which arises from problem $(P)$ when $Y$ is replaced by a nonempty compact convex subset $Y_k$ of $Y$, i.e.
\[ \min f(x,y) \quad \text{s.t.} \quad h_i(x,y) \le 0,\ i = 1, \dots, m,\quad x \in X,\ y \in Y_k. \tag{$P_k$} \]
Let
\[ v_k = \max_{u \in \mathbb{R}^m_+} \min_{x \in X,\ y \in Y_k} \Big\{ f(x,y) + \sum_{i=1}^m u_i h_i(x,y) \Big\} \tag{8.2} \]
be the optimal value of the dual problem $(D_k)$ corresponding to $(P_k)$, and, as before, let $(\bar P_k), (\bar D_k)$ denote the convexified primal and dual, which arise from $(P_k)$ and $(D_k)$ when $f$ and each $h_i$ is replaced by its convex envelope, respectively.

In order to derive correctness of dual bounds in Branch-and-Bound methods involving partitions only in the $y$-space via Lemma 7.3.2, we have to assume that the partitioning procedure is exhaustive in the sense of Definition 7.2.2, i.e. that every nested sequence $\{Y_k\}_{k \in \mathbb{N}}$ of partition sets eventually shrinks to a singleton $\{y^*\}$ as $k \to \infty$. The following theorem shows that the convergence condition in Lemma 7.3.2 is fulfilled in this case.

Theorem 8.2.2 Consider problem $(P)$ with the above assumptions and notations. Let $\{Y_k\}_{k \in \mathbb{N}}$ be a decreasing exhaustive sequence of nonempty compact convex sets in $\mathbb{R}^p$ satisfying $Y_1 \subseteq Y$ and converging to a singleton $\{y^*\}$, $y^* \in Y$. Assume that a constraint qualification holds for each convexified problem $(\bar P_k)$. Then, for $v_k$ defined in (8.2), there holds
\[ \lim_{k \to \infty} v_k = \min\{f(x,y) : h_i(x,y) \le 0,\ i = 1, \dots, m,\ (x,y) \in X \times \{y^*\}\}. \]

Proof. Let
\[ M_k := \{(x,y) : h_i(x,y) \le 0,\ i = 1, \dots, m,\ x \in X,\ y \in Y_k\} \]
denote the feasible set of problem $(P_k)$,
\[ \bar M_k := \{(x,y) : \varphi_{X \times Y_k, h_i}(x,y) \le 0,\ i = 1, \dots, m,\ x \in X,\ y \in Y_k\} \]
the feasible set of $(\bar P_k)$, and
\[ M^* := \{(x,y^*) : h_i(x,y^*) \le 0,\ i = 1, \dots, m,\ x \in X\}. \]
We have $M^* \cap M \ne \emptyset$ by compactness of $X, Y, M$, continuity of $h_i(x, \cdot)$ and the assumption $y^* \in Y$. Moreover,
\[ \lim_{k \to \infty} M_k = \lim_{k \to \infty} \bar M_k = M^*, \]
because of Corollary 8.2.1, continuity of $h_i(x, \cdot)$ and, by convexity of $h_i(\cdot, y^*)$,
\[ \varphi_{X \times \{y^*\}, h_i}(x, y^*) = h_i(x, y^*) \quad \text{on } X \times \{y^*\}. \]
Application of Lemma 8.2.2 with $C_k = \bar M_k$, $f_k = \varphi_{X \times Y_k, f}$ shows that
\[ \lim_{k \to \infty} \min(\bar P_k) = \min\{\varphi_{X \times \{y^*\}, f}(x, y^*) : (x, y^*) \in M^*\} = \min\{f(x, y^*) : (x, y^*) \in M^*\}, \]
since $\varphi_{X \times \{y^*\}, f}(x, y^*) = f(x, y^*)$ on $X \times \{y^*\}$ from convexity of $f(\cdot, y^*)$. On the other hand, it is easy to see by the same arguments as in the corresponding part of the proof of Theorem 8.2.1, that
\[ \lim_{k \to \infty} \min(P_k) = \min\{f(x, y^*) : (x, y^*) \in M^*\}. \]
The assertion then follows from Corollary 8.1.1, which states that
\[ 0 \le \min(P_k) - v_k \le \min(P_k) - \min(\bar P_k), \]
by passing to the limit $k \to \infty$. $\Box$

Notice that again (with inf replacing min if necessary) the result does not hold for unbounded sets $X$: Choose $f(x,y) = e^{-xy}$, $X = \mathbb{R}_+$, $Y = [0,1]$, $Y_k = [0, \frac{1}{k}]$.
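The failure for unbounded $X$ can be checked numerically. The following Python sketch (an illustration added here, with hypothetical helper names) truncates $X$ to $[0, x_{\max}]$ and lets $x_{\max}$ grow: for fixed $k$ the minimum over the truncated box tends to 0, while on the limit set $X \times \{0\}$ the objective is identically 1.

```python
import math


def f(x, y):
    return math.exp(-x * y)


def min_over_truncated_box(k, x_max):
    """Minimum of f over [0, x_max] x [0, 1/k].

    f is nonincreasing in x and in y on the nonnegative orthant,
    so the minimum is attained at the corner (x_max, 1/k).
    """
    return f(x_max, 1.0 / k)


# For fixed k = 10 the infimum over X = [0, inf) is approached as x_max grows ...
values = [min_over_truncated_box(10, x_max) for x_max in (1e1, 1e2, 1e3, 1e4)]
# ... whereas on the limit set X x {0} the objective equals 1 everywhere.
limit_value = f(123.0, 0.0)
```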


8.3 Some Applications

8.3.1 Linearly Constrained Problems and Convexification

Applications of Theorem 8.2.1 include all Branch-and-Bound algorithms for global optimization of nonconvex objective functions over polytopes which use either convex envelope constructions or dual lower bounds, since, by Proposition 8.1.1, these two bounding procedures are equivalent. Examples include the algorithm of Pardalos and Rosen [58] for concave quadratic minimization (which can also be regarded as application of Theorem 8.2.2 when additive linear terms in the objective function are admitted), the algorithm of Horst [41] for general concave minimization over polytopes, and the method of Al-Khayyal and Falk [1] for certain biconcave problems.

8.3.2 Generalized Bilinear Constraints

Another application of Theorem 8.2.2 are Branch-and-Bound methods with exhaustive partitioning procedures in the $y$-space for problems of type
\[ \min \langle x, c \rangle \quad \text{s.t.} \quad A(y)x \le b,\quad x \in X,\ y \in Y, \tag{8.3} \]
where $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $X$ and $Y$ are polytopes in $\mathbb{R}^n_+$ and $\mathbb{R}^p$, respectively, and $A(y) : Y \to \mathbb{R}^{m \times n}$ is a continuous matrix mapping. Assume that each entry $a_{ij}(y)$ of $A(y)$ is a concave function $Y \to \mathbb{R}$ (alternatively it will turn out that quasiconcavity of $a_{ij}(y)$ is sufficient for the practical applicability of dual bounds). Notice that (8.3) includes bilinearly constrained problems and various practical problems such as, for example, the pooling problem in oil refineries. Often, one encounters the condition $x \ge 0$ rather than $x \in X$ for a polytope $X$. However, when upper bounds on $x$ are known, which is often the case, the conditions $x \ge 0$ can be replaced by $x \in X$, with the compact set $X$ defined as $X := \{x \in \mathbb{R}^n : 0 \le x \le Me\}$, with $M > 0$ sufficiently large, $e = (1, \dots, 1)^T \in \mathbb{R}^n$.

A dual problem corresponding to a nonempty, compact convex partition set $Y_k$ of $Y$ (or of a suitable set $Y_1 \supseteq Y$) is
\[ \max_{u \in \mathbb{R}^m_+}\ \min_{x \in X,\ y \in Y_k} \{\langle x, c \rangle + \langle A(y)x - b, u \rangle\}. \tag{8.4} \]
When $X$ is the above box with $M$ sufficiently large, problem (8.4) reduces to a linear program, if we assume that there is $u \in \mathbb{R}^m_+$ such that
\[ A^T(y)u + c \ge 0 \qquad \forall\, y \in Y_k. \]
This assumption is fulfilled, for example, when $A(y)$ has a row with positive entries for all $y \in Y_k$. Notice that such a row can always be generated by adding the redundant constraint $e^T x \le M \cdot n$ to the original constraints. Given the above assumption, problem (8.4) reduces to
\[ \max\ -\langle u, b \rangle \quad \text{s.t.} \quad A^T(y)u + c \ge 0\ \ \forall\, y \in Y_k,\quad u \in \mathbb{R}^m_+. \tag{8.5} \]
Let $a_j^T(y)u + c_j$ denote the $j$-th row in $A^T(y)u + c$. Then the constraints in (8.5) are equivalent to $u \in \mathbb{R}^m_+$ and
\[ \min\{a_j^T(y)u + c_j : y \in Y_k\} \ge 0 \quad \text{for all } j = 1, \dots, n. \tag{8.6} \]
But, by our concavity assumption on the elements of $A(y)$, each minimum in (8.6) is attained at a vertex $v$ of $Y_k$ so that (8.5) reduces to the LP
\[ \max\ -\langle u, b \rangle \quad \text{s.t.} \quad a_j^T(v)u + c_j \ge 0,\ v \in V(Y_k),\ j = 1, \dots, n,\quad u \in \mathbb{R}^m_+, \tag{8.7} \]
where $V(Y_k)$ denotes the vertex set of $Y_k$. Notice that $Y_k$ is often a simplex or a rectangle with known vertex set.

Convergence of a Branch-and-Bound approach for solving problem (8.3) which uses exhaustive subdivision in the $y$-space and the dual bounds from (8.7) follows from Theorem 8.2.2. A completely different proof employing results from parametric programming has been given by Ben-Tal, Eiger and Gershovitz [10].
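The passage from the semi-infinite constraints in (8.5) to the finitely many vertex constraints in (8.7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: $Y_k$ is taken to be a rectangle, the matrix mapping `A_of_y` and the vector `c` are hypothetical example data, and the helper names are not from the original text. The resulting rows could be handed to any LP solver together with the objective $-\langle u, b \rangle$; no particular solver is assumed here.

```python
from itertools import product


def rectangle_vertices(lower, upper):
    """All 2^p vertices of the rectangle {y : lower <= y <= upper}."""
    return [tuple(v) for v in product(*zip(lower, upper))]


def vertex_constraints(A_of_y, c, lower, upper):
    """Constraints of (8.7) as pairs (coeffs, rhs) meaning sum_i coeffs[i]*u[i] >= rhs."""
    rows = []
    for v in rectangle_vertices(lower, upper):
        A = A_of_y(v)  # m x n matrix given as a list of m rows
        for j in range(len(c)):
            column_j = [row[j] for row in A]  # a_j(v), the j-th column of A(v)
            rows.append((column_j, -c[j]))
    return rows


def is_dual_feasible(rows, u, tol=1e-12):
    """Check u >= 0 and all vertex constraints for a candidate multiplier u."""
    if any(ui < 0 for ui in u):
        return False
    return all(sum(a * ui for a, ui in zip(coeffs, u)) >= rhs - tol
               for coeffs, rhs in rows)


def A_of_y(y):
    # Hypothetical 2 x 2 example with concave entries on Y_k = [0, 1];
    # the second row has positive entries, as in the assumption above.
    (t,) = y
    return [[1.0 - t * t, t],
            [1.0, 1.0]]


c = [-1.0, -2.0]
rows = vertex_constraints(A_of_y, c, [0.0], [1.0])
```

For this example data, `is_dual_feasible(rows, [1.0, 2.0])` holds, whereas `[0.0, 1.0]` violates the constraint for $j = 2$ at the vertex $y = 0$.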

8.3.3 Maximizing the Sum of Affine Ratios

As it will be extensively studied in Chapter 9, we only briefly mention the sum-of-ratios maximization problem here for the sake of completeness: Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $P := \{x \in \mathbb{R}^n : Ax \le b,\ x \ge 0\}$ be bounded. Furthermore, for $i = 1, \dots, p$ ($p \in \mathbb{N}$), let $c_i, d_i \in \mathbb{R}^n$, $\gamma_i, \delta_i \in \mathbb{R}$. Assume that for all $i$,
\[ \langle x, d_i \rangle + \delta_i > 0 \qquad \forall\, x \in P. \]
The problem of maximizing a sum of ratios of affine functions is the problem
\[ \max \sum_{i=1}^p \frac{\langle x, c_i \rangle + \gamma_i}{\langle x, d_i \rangle + \delta_i} \quad \text{s.t.} \quad Ax \le b,\ x \ge 0. \]
A reformulation of this problem into a partly convex problem to which Theorem 8.2.2 can be applied will be given in Chapter 9.


8.3.4 Concave Minimization under Reverse Convex Constraints

As final example, consider the problem class
\[ \min f(x) \quad \text{s.t.} \quad g(x) \le 0,\quad x \in X, \tag{8.8} \]
where $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ are concave mappings, and $X \subset \mathbb{R}^n$ is a partitioning polytope with known vertex set $V(X)$. Since $f(x) + \langle g(x), u \rangle$ is concave in $x$ for all $u \in \mathbb{R}^m_+$, the dual of (8.8) becomes
\[ \max_{u \in \mathbb{R}^m_+}\ \min_{v \in V(X)} \{f(v) + \langle g(v), u \rangle\}, \]
which is equivalent to the linear program (in the variables $t \in \mathbb{R}$, $u \in \mathbb{R}^m_+$)
\[ \max t \quad \text{s.t.} \quad f(v) + \langle g(v), u \rangle \ge t\ \ \forall\, v \in V(X),\quad u \in \mathbb{R}^m_+. \]
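The dual objective above only needs function values at the vertices of $X$, which the following Python sketch makes explicit. The one-dimensional instance ($f(x) = -x^2$, $g(x) = 1 - x^2$, $X = [-2, 3]$) and the function names are assumptions chosen for illustration; they are not taken from the text.

```python
def dual_value(f, g, vertices, u):
    """d(u) = min over vertices v of f(v) + <g(v), u>, for concave f, g and u >= 0."""
    return min(f(v) + sum(ui * gi for ui, gi in zip(u, g(v))) for v in vertices)


def f(x):
    return -x * x            # concave objective (hypothetical example)


def g(x):
    return [1.0 - x * x]     # concave constraint g(x) <= 0, i.e. |x| >= 1


vertices = [-2.0, 3.0]       # vertex set of X = [-2, 3]
d0 = dual_value(f, g, vertices, [0.0])
d1 = dual_value(f, g, vertices, [1.0])
primal_value = f(3.0)        # x = 3 is feasible for (8.8)
```

In this instance every dual value is a lower bound on the primal value at the feasible point $x = 3$, in line with weak duality, and the bound obtained for $u = 0$ is already tight.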

Chapter 9

Global Optimization of Sums of Ratios and the Corresponding Multiple-Criteria Decision Problem

Let $n_i, d_i$ ($i = 1, \dots, p$), $g_k$ ($k = 1, \dots, m$) denote continuous real-valued functions on the $n$-dimensional Euclidean space $\mathbb{R}^n$, and let
\[ P = \{x \in \mathbb{R}^n : g_k(x) \le 0,\ k = 1, \dots, m\}. \tag{9.1} \]
We assume throughout the chapter that $P$ is nonempty and bounded. Furthermore, let
\[ d_i(x) > 0 \ \text{ on } P, \quad i = 1, \dots, p. \]
Consider the sum-of-ratios program
\[ \max \sum_{i=1}^p \frac{n_i(x)}{d_i(x)} \quad \text{s.t.} \quad x \in P. \tag{9.2} \]
Usually, the number $p$ of ratios is considerably smaller than the number $n$ of decision variables. We are interested in $p \ge 2$ and present a general algorithmic approach which is (theoretically) applicable for various classes of nonlinear functions involved in (9.1)-(9.2). However, we will concentrate on the case when all functions involved are affine. For this case, we will substantiate our theoretical concept and we will report on a Fortran90 implementation yielding promising numerical results. The results of this chapter have been summarized in Dür, Horst and Thoai [26].

9.1 Applications and Background

Model (9.2) arises in various economic as well as non-economic applications, whenever one or several rates are to be optimized. We give a few examples mainly following the surveys by Schaible [64] and [65], where numerous other applications can be found. Numerators and denominators in (9.2) may represent profit, cost, capital, risk or time.

Model (9.2) is closely related to the associated multiple-objective optimization problem, where several ratios are to be maximized simultaneously and the objective function in (9.2) can be seen as a utility function expressing a compromise between the different objective functions of the multiple-objective problem. Notice that model (9.2) does include the case where some ratios are not proper quotients, i.e. $d_i(x) = 1$. Hence, model (9.2) also describes situations where a compromise is sought between absolute and relative terms like profit and return on investment (profit/capital) or return and return/risk (cf. Schaible [63]). Almogy and Levin [2] and Falk and Palocsay [28] each formulate a deterministic equivalent of a multistage stochastic shipping problem in the form of (9.2). Other applications include profit maximization under fixed cost (Colantoni, Manes and Whinston [17]), various models in cluster analysis (Rao [59]), queueing location problems (Drezner, Schaible and Simchi-Levi [21], see also Zhang [78]), and inventory models (Schaible and Lowe [66]).

Single-ratio fractional programming (where $p = 1$ in (9.2)) has been extensively studied (see, e.g., the bibliography Schaible [64] in the Handbook of Global Optimization). This case is equivalent to a linear program when all functions involved are affine (Charnes and Cooper [16]). When $P$ is a convex set and the objective function is a single ratio of a concave and a convex function, then the fractional program can be transformed into a convex program according to Schaible [61]. In this case, a local maximum is a global one, strong duality relations hold, and several solution techniques are available (cf. Schaible [62] and [64]).

Unfortunately, for $p > 1$, the case we are interested in here, none of the above properties of single-ratio fractional programs is true any longer. In particular, a local maximum may not be a global one, even if all functions involved are linear. This unpleasant multiextremality of model (9.2) explains why, compared with other types of multi-ratio fractional programs, very little is known about this sum-of-ratios problem.

For model (9.2) with $p > 1$, if all functions involved are affine and $P$ is bounded, three approaches have been proposed (each of which allows some generalization). For a comparison of algorithms and references, we follow Schaible [65]. Almogy and Levin [3] give a necessary and sufficient optimality condition in terms of a related parametric linear program, which can be viewed as a generalization of Dinkelbach's parametric method for the single-ratio case (see Dinkelbach [20] or Schaible [64] and references therein) to the case of $p > 1$, and propose several related algorithmic procedures. Falk and Palocsay [28], however, give a counterexample to Almogy and Levin's optimality condition (for $p = 2$) which takes away the basis of their procedures. Two further interesting approaches for $p = 2$, each with a number of different sophisticated ideas, have been proposed by Cambini, Martein and Schaible [15], and by Falk and Palocsay [28] and [29].

The approach of Cambini et al. relies on properties which hold only for $p = 2$, so that it cannot be extended to the case of more than two ratios. The algorithm uses pivoting methods. It is the only finite method so far and converges even if the feasible region is unbounded. Falk and Palocsay's method can be extended to the case of more than two ratios. They transform the problem from the original space $\mathbb{R}^n$ into the image space
\[ T := \{y \in \mathbb{R}^p : y_i = n_i(x)/d_i(x),\ i = 1, \dots, p,\ x \in P\} \]
(an idea which we will adopt) and successively reduce the size of the feasible subset containing the solution, thus isolating the optimal solution. Their method is based on a sufficient optimality condition related to the same parametric program as in Almogy and Levin [3]. A drawback in their approach is that this condition is not necessary so that an optimal solution may be identified as such only through additional iterations. However, some results relating to the justification of the algorithm are contained in Falk and Palocsay [28]. A minor error in their approach has been corrected by Cambini, Marchi, Martein and Schaible [14].

A fourth method which may be applied to our problem is one by Konno, Kuno and Yajima. In a series of papers they discuss several (parametric) approaches for problems of type
\[ \min\Big\{ f(x) + \sum_{i=1}^p f_i^1(x) \cdot f_i^2(x) : x \in P \Big\}, \]
where $P$ is compact and convex, $f$ is convex, and $f_i^1, f_i^2$ are positive convex. Parametrization techniques similar to those for multiplicative programs can be developed for sum-of-ratios problems. A recent tutorial survey of these approaches can be found in the monograph Konno, Thach, and Tuy [52]. However, these approaches are designed for general multiplicative problems and not for the special case of sum-of-ratios problems, so that it seems reasonable to develop algorithms which take into account the special structure of this problem.

9.2 Application of the Basic Branch-and-Bound Scheme to the Sum-of-Ratios Problem

The approach we propose is to solve sum-of-ratios problems by a Branch-and-Bound algorithm. The algorithm is shown through numerical examples to be applicable to the cases of at least four ratios. We first reformulate the problem in a way that yields a reduction of the dimension of the problem. Then we give the basic scheme of a Branch-and-Bound algorithm for this problem. In this section, our intention is to give the results in as mathematically general a way as possible. However, we also discuss in which cases this algorithm is practically implementable.

The following considerations are valid for very general classes of functions involved in problem (9.1)-(9.2), provided that Assumptions 9.2.1 and 9.2.2 as stated below are fulfilled. This is particularly the case if

(i) the set $P$ is convex, the functions $n_i$ are positive concave and the functions $d_i$ are convex, $i = 1, \dots, p$, or

(ii) all functions involved in (9.1)-(9.2) are affine.

In this section, we outline our theory in as general a way as possible. In the later sections, however, we will concentrate on case (ii).


Assumption 9.2.1 Assume that, for $i = 1, \dots, p$, the bounds
\[ \ell_i^0 \le \min\{n_i(x)/d_i(x) : x \in P\} \tag{9.3} \]
and
\[ u_i^0 \ge \max\{n_i(x)/d_i(x) : x \in P\} \tag{9.4} \]
can be computed. Set $\ell^0 = (\ell_1^0, \dots, \ell_p^0)$, $u^0 = (u_1^0, \dots, u_p^0)$.

This assumption is not only fulfilled whenever the corresponding single-ratio fractional program can be solved (exact bounds), but also for large classes of multiextremal functions (inexact bounds, cf. the monographs Horst et al. [47] and Horst/Tuy [51]). These include all problem classes with convex $P$ where, for each $n_i(x)/d_i(x)$, a concave majorant and a convex minorant over $P$ can be constructed. Other possible inexact bounds are:
\[ \ell_i^0 = \frac{\min\{n_i(x) : x \in P\}}{\max\{d_i(x) : x \in P\}} \quad\text{and}\quad u_i^0 = \frac{\max\{n_i(x) : x \in P\}}{\min\{d_i(x) : x \in P\}}. \]
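For affine numerators and denominators these quotient bounds are cheap to evaluate. The Python sketch below is an illustration only, under two assumptions that are not part of the text: $P$ is replaced by a box (so that the extreme values of an affine function are available in closed form), and the numerator is nonnegative on that box, which is what makes the quotient of extreme values a valid bound. All function names are hypothetical.

```python
def affine_range_on_box(coeffs, const, lower, upper):
    """Minimum and maximum of x -> <coeffs, x> + const over the box lower <= x <= upper."""
    lo = const + sum(c * (l if c >= 0 else u) for c, l, u in zip(coeffs, lower, upper))
    hi = const + sum(c * (u if c >= 0 else l) for c, l, u in zip(coeffs, lower, upper))
    return lo, hi


def inexact_ratio_bounds(num, den, lower, upper):
    """Bounds (min n / max d, max n / min d) for n(x)/d(x) over the box.

    num and den are pairs (coeffs, const). The bounds are only valid under
    the assumption n >= 0 and d > 0 on the box, which is checked here.
    """
    n_lo, n_hi = affine_range_on_box(num[0], num[1], lower, upper)
    d_lo, d_hi = affine_range_on_box(den[0], den[1], lower, upper)
    if n_lo < 0 or d_lo <= 0:
        raise ValueError("sketch assumes n >= 0 and d > 0 on the box")
    return n_lo / d_hi, n_hi / d_lo


# Example: n(x) = x1 + 2*x2 + 1, d(x) = x1 + x2 + 1 on the unit square.
num = ([1.0, 2.0], 1.0)
den = ([1.0, 1.0], 1.0)
l0, u0 = inexact_ratio_bounds(num, den, [0.0, 0.0], [1.0, 1.0])
corner_ratios = [(x1 + 2 * x2 + 1) / (x1 + x2 + 1) for x1 in (0.0, 1.0) for x2 in (0.0, 1.0)]
```

The bounds obtained this way are typically much wider than the exact range of the ratio, which is why they are called inexact above.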

After introducing the additional variable $y = (y_1, \dots, y_p) \in \mathbb{R}^p$ and the rectangle
\[ Y_0 := \{y \in \mathbb{R}^p : \ell^0 \le y \le u^0\}, \]
it is easy to see that problem (9.2) is equivalent to the problem
\[ \max \sum_{i=1}^p y_i \quad \text{s.t.} \quad n_i(x) - y_i d_i(x) \ge 0,\ i = 1, \dots, p,\quad x \in P,\ y \in Y_0. \tag{9.5} \]
For this formulation of the problem, we propose a Branch-and-Bound algorithm which uses rectangular partition sets in the space $\mathbb{R}^p$. Note that the number $p$ of quotients is usually much smaller than the dimension $n$ of the original problem. Therefore, operating in the image space $\mathbb{R}^p$ substantially reduces the computational effort. Denote the objective function in (9.5) by
\[ f(y) := \sum_{i=1}^p y_i \]
and, for $y \in Y_0$, define
\[ P(y) := \{x \in P : n_i(x) - y_i d_i(x) \ge 0,\ i = 1, \dots, p\}. \tag{9.6} \]
With these notations we arrive at the formulation
\[ \max f(y) \quad \text{s.t.} \quad x \in P(y),\ y \in Y_0, \tag{9.7} \]
which is problem (9.5) rewritten.

Assumption 9.2.2 Our second assumption is that, for each rectangle
\[ Y = \{y \in \mathbb{R}^p : \ell \le y \le u\} \]
contained in the rectangle $Y_0$, we must be able to decide whether or not $Y$ contains a feasible point of problem (9.7), i.e. we must be able to decide whether the system
\[ x \in P(y),\quad y \in Y \tag{9.8} \]
has a solution $(x, y)$, and if the answer is in the affirmative, to determine it.

For solving this problem, it is sufficient to investigate $P(\ell)$, but, since the $y$-part of a solution $(x, y)$ of (9.8) defines a lower bound of the objective function $f(y)$, additional devices to find $y$ with
\[ \sum_{i=1}^p y_i > \sum_{i=1}^p \ell_i \]
should be investigated (cf. Falk and Palocsay [29] and Section 9.5). Since the feasibility problem $x \in P(\ell)$ is equivalent to optimization problems of type (9.3), (9.4), Assumption 9.2.2 is fulfilled for virtually the same problem classes for which Assumption 9.2.1 is fulfilled. It is well-known that linear and certain concave systems of inequalities can be treated in polynomial time. For problem classes involving convex, d.c. or Lipschitz functions, we refer to Horst et al. [47], Horst and Tuy [51], Horst, Nast and Thoai [45], Horst and Nast [44], Horst and Thoai [50].

Now we can apply the Branch-and-Bound Scheme given in Section 7.1 to our special problem. The partition sets we use are rectangles $Y$ in $\mathbb{R}^p$. A starting rectangle is
\[ Y_0 := \{y \in \mathbb{R}^p : \ell^0 \le y \le u^0\}, \]
where $\ell^0$ and $u^0$ are defined as in (9.3)-(9.4). The partitioning method used in every iteration is bisection of the rectangle along its longest edge as described in Section 7.2, yielding in iteration $q$ the rectangles
\[ Y_{q,1} \subseteq Y_q \quad\text{and}\quad Y_{q,2} \subseteq Y_q. \]
In this chapter, denote lower bounds by $\alpha$ and upper bounds by $\beta$. Given a rectangle
\[ Y := \{y \in \mathbb{R}^p : \ell \le y \le u\}, \]
lower and upper bounds are easily obtained: It is easy to see that
\[ \beta(Y) := f(u) \tag{9.9} \]
is an upper bound on
\[ \max_{y \in Y,\ x \in P(y)} f(y), \]
whereas
\[ \alpha(Y) := f(\ell) \]
is a lower bound. Of course, if a solution of (9.8), i.e. a feasible point $(x, y)$, is calculated, then
\[ \alpha(Y) := f(y) \tag{9.10} \]
is a better lower bound. If the rectangle $Y$ is discovered to be infeasible, i.e. if the system
\[ x \in P(y),\quad y \in Y \]
does not have a solution, then $Y$ is deleted from further consideration for the rest of the algorithm. If it does have a solution $(x, y)$, then this is a feasible point to problem (9.7). During the iteration, the respective best feasible point is chosen to derive global lower bounds. We want the algorithm to terminate with an $\varepsilon$-optimal solution. This is defined to be the following:

Definition 9.2.1 Given $\varepsilon > 0$, a point $(x^q, y^q)$ is called an $\varepsilon$-optimal solution of problem (9.7) if it satisfies
\[ y^q \in Y_0,\quad x^q \in P(y^q) \]
and, for all $y \in Y_0$ such that $P(y) \ne \emptyset$,
\[ f(y) \le \begin{cases} f(y^q) + \varepsilon\,|f(y^q)| & \text{if } f(y^q) \ne 0, \\ f(y^q) + \varepsilon & \text{if } f(y^q) = 0. \end{cases} \]

This definition takes into account the relative weights of $\varepsilon$ and the objective function value, resulting not in an absolute but a relative error. In order to obtain such an $\varepsilon$-optimal solution, we define in iteration $q$ the new family of partition sets, which have to be further investigated, to be as follows:
\[ \mathcal{M}_{q+1} := \begin{cases} \mathcal{M}'_{q+1} \setminus \{Y \in \mathcal{M}'_{q+1} : \beta(Y) - \alpha_{q+1} \le \varepsilon\,|\alpha_{q+1}|\} & \text{if } \alpha_{q+1} \ne 0, \\ \mathcal{M}'_{q+1} \setminus \{Y \in \mathcal{M}'_{q+1} : \beta(Y) \le \varepsilon\} & \text{if } \alpha_{q+1} = 0, \end{cases} \]
where
\[ \mathcal{M}'_{q+1} = (\mathcal{M}_q \setminus \{Y_q\}) \cup \{Y_{q,1}, Y_{q,2}\} \]
(cf. Section 7.1). The stopping criterion must be defined accordingly. Theorem 9.3.1 in the next section shows that, for a rectangle $Y = \{y \in \mathbb{R}^p : \ell \le y \le u\}$, the (worst) bounds $\alpha(Y) = f(\ell)$ (if $P(\ell) \ne \emptyset$, otherwise $Y$ is deleted) along with $\beta(Y) = f(u)$ yield a convergent algorithm.


Therefore, the above described basic algorithm is intended to provide a conceptual scheme rather than a universal efficient algorithm. Improvement of the bounds will obviously preserve convergence in the sense of Theorem 9.3.1. When solving the system of inequalities $x \in P(y)$ is computationally cheap, one could, for example, after finding $x \in P(\ell)$ search for $x \in P(u)$. If $P(u) \ne \emptyset$, lower and upper bounds on $Y$ would coincide, i.e. the best feasible solution in the rectangle being examined has been found, so that the rectangle can be pruned (deleted from further consideration). If $P(u) = \emptyset$, one could investigate $P((\ell + u)/2)$; in case $P((\ell + u)/2) = \emptyset$, consider $P((3\ell + u)/4)$, and consider $P((\ell + 3u)/4)$ when $P((\ell + u)/2) \ne \emptyset$, etc. Sections 9.4 and 9.5 present some more sophisticated LP-based bound improvement strategies for the case when all functions involved in problem (9.2) are affine.

9.3 Convergence

Of course we must show that our Branch-and-Bound algorithm converges if the bounds given in (9.9) and (9.10) are used and bisection of rectangles is used in every iteration. Remember that bisection along the longest edge is an exhaustive partitioning technique.

Theorem 9.3.1 The Branch-and-Bound algorithm for the sum-of-ratios problem described in the previous section terminates after a finite number $q$ of iterations yielding an $\varepsilon$-optimal solution $(x^q, y^q)$.

Proof. Let $\{Y_{q_i}\}_{i \in \mathbb{N}}$ be an arbitrary decreasing subsequence of rectangles generated by the algorithm. Suppose that $\{Y_{q_i}\}$ is infinite. Since exhaustive partitioning is used, the sequence of diameters $d(Y_{q_i})$ of $Y_{q_i}$ converges to 0 as $i \to \infty$. Therefore, since $f(y)$ is continuous, for any positive real number $\varepsilon_0$, there must be an index $i_0$ such that
\[ f(u^{q_{i_0}}) - f(\ell^{q_{i_0}}) \le \varepsilon_0, \]
where $Y_{q_{i_0}} = \{y : \ell^{q_{i_0}} \le y \le u^{q_{i_0}}\}$. Since $q_{i_0}$ describes a finite depth in the Branch-and-Bound tree of the algorithm and at each depth only a finite number of iterations can occur, there must be a finite iteration index $q$ corresponding to $q_{i_0}$ such that the algorithm stops in iteration $q$.

It remains to show that the point $(x^q, y^q)$ is an $\varepsilon$-optimal solution. We demonstrate this for the case $f(y^q) \ne 0$. The case $f(y^q) = 0$ can be treated analogously. If the algorithm stops at iteration $q$, then $\mathcal{M}_q = \emptyset$, in other words
\[ \beta(Y) \le \alpha_q + \varepsilon\,|\alpha_q| \qquad \forall\, Y \in \mathcal{M}'_{q-1}, \]
i.e.
\[ \max_{Y \in \mathcal{M}'_{q-1}} \beta(Y) \le \alpha_q + \varepsilon\,|\alpha_q|. \]
But it is clear that for all $y \in Y_0$ with $P(y) \ne \emptyset$, we have
\[ f(y) \le \max_{Y \in \mathcal{M}'_{q-1}} \beta(Y). \]
Combined with (9.10), i.e. with the fact that $\alpha_q = f(y^q)$ by construction, the last two inequalities give the desired result. $\Box$

9.4 Upper Bounds for Sums of Affine Ratios

In the basic sum-of-ratios algorithm, as outlined above, the crude upper bound $f(u)$ proposed in (9.9) for the optimal value of problem (9.7) restricted to the rectangle $Y = \{y \in \mathbb{R}^p : \ell \le y \le u\}$ can be improved substantially when the optimal solution of its Lagrange dual can be computed. For $y \in Y$, let the set $P(y)$ from (9.6) be described as
\[ P(y) = \{x \in \mathbb{R}^n : g(x, y) \ge 0\} \]
with appropriate $g : \mathbb{R}^n \times Y \to \mathbb{R}^{m+p}$. Then the Lagrange-dual problem of problem (9.7) is
\[ \min\ d(v) \quad \text{s.t.}\ v \in \mathbb{R}^{m+p}_+, \tag{9.11} \]
where
\[ d(v) := \sup \Big\{ \sum_{i=1}^p y_i + v^T g(x, y) : x \in \mathbb{R}^n,\ y \in Y \Big\}. \tag{9.12} \]
Let $\tilde y$ be an optimal solution to problem (9.7) restricted to the partition set $Y$, i.e. let $\tilde y$ satisfy
\[ f(\tilde y) = \sum_{i=1}^p \tilde y_i = \max \Big\{ \sum_{i=1}^p y_i : x \in P(y),\ y \in Y \Big\}, \]
and let $\tilde v$ be an optimal solution to problem (9.11). Then it is well known that
\[ d(\tilde v) \ge f(\tilde y). \]
This is the weak duality theorem. But, unless suitable convexity and regularity conditions are fulfilled, we have to expect a positive duality gap
\[ d(\tilde v) - f(\tilde y) > 0. \]
However, it has been shown in Chapter 8 that, under very mild regularity conditions (such as upper semicontinuity of the function involved), the duality gap eventually reduces to zero when in (9.7) and in (9.11)-(9.12) the rectangle $Y$ is replaced by a nested sequence of rectangles $Y_q$ satisfying $d(Y_q) \searrow 0$ as $q \to \infty$. Dual bounds are exact in the limit when combined with an exhaustive subdivision procedure.


For the remainder of this section, we confine ourselves to the case where all functions involved in problem (9.1)-(9.2) are affine: Let $P := \{x \in \mathbb{R}^n : Ax \ge b,\ x \ge 0\}$ where $A = (a_{ij}) \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and let, for $i = 1, \dots, p$,
\[ n_i(x) = \langle x, c_i \rangle + \alpha_i, \qquad d_i(x) = \langle x, d_i \rangle + \beta_i \]
with $c_i, d_i \in \mathbb{R}^n$ and $\alpha_i, \beta_i \in \mathbb{R}$. For a given rectangle $Y = \{y \in \mathbb{R}^p : \ell \le y \le u\}$, the problem we have to solve is
\[ \max \sum_{i=1}^p y_i \quad \text{s.t.}\ x \in P(y),\ y \in Y, \tag{9.13} \]
where now $P(y) = \{x \in P : n_i(x) - y_i d_i(x) \ge 0,\ i = 1, \dots, p\}$. Next we show that, for the case of affine ratios, the Lagrange dual of this problem reduces to a linear program, which can easily be solved by standard optimization software. Problem (9.13) can be rewritten in the form
\[ \max \sum_{i=1}^p y_i \quad \text{s.t.}\ A(y)x \ge b(y),\ x \ge 0,\ y \in Y, \tag{9.14} \]
where, letting $c^T$ denote the transpose of a vector $c$, we define for all $y \in Y$
\[ A(y) := \begin{pmatrix} c_1^T - y_1 d_1^T \\ \vdots \\ c_p^T - y_p d_p^T \\ A \end{pmatrix} \in \mathbb{R}^{(p+m) \times n} \quad \text{and} \quad b(y) := \begin{pmatrix} y_1 \beta_1 - \alpha_1 \\ \vdots \\ y_p \beta_p - \alpha_p \\ b \end{pmatrix} \in \mathbb{R}^{p+m}. \]
The dual objective function in (9.12) becomes
\[ d(v) = \sup \Big\{ \sum_{i=1}^p y_i + \langle A(y)x - b(y), v \rangle : x \ge 0,\ y \in Y \Big\}, \]
and the dual problem (9.11) now reads
\[ \min\ \sup \Big\{ \sum_{i=1}^p y_i + \langle A(y)x - b(y), v \rangle : x \ge 0,\ y \in Y \Big\} \quad \text{s.t.}\ v \in \mathbb{R}^{m+p}_+. \tag{9.15} \]


The objective function can be simplified (at the cost of additional constraints) as follows: Clearly, there exist $v \in \mathbb{R}^{m+p}_+$ such that
\[ A^T(y) v \le 0 \quad \forall\, y \in Y. \tag{9.16} \]
Take, e.g., $v = 0$. For every $v$ satisfying (9.16), the value $d(v)$ is attained at $x = 0$, i.e. we have
\[ d(v) = \max_{y \in Y} \Big\{ \sum_{i=1}^p y_i - \langle b(y), v \rangle \Big\}. \]
On the other hand, whenever for some $\tilde v \in \mathbb{R}^{m+p}_+$ there exist $y \in Y$ and $i \in \{1, \dots, n\}$ such that $(A^T(y)\tilde v)_i > 0$, then
\[ x_i (A^T(y)\tilde v)_i \to +\infty \quad \text{as}\ x_i \to +\infty, \]
so $d(\tilde v) = +\infty$. Therefore, the dual problem reduces to
\[ \min\ \max_{y \in Y} \Big\{ \sum_{i=1}^p y_i - \langle b(y), v \rangle \Big\} \quad \text{s.t.}\ A^T(y) v \le 0 \ \ \forall\, y \in Y,\quad v \in \mathbb{R}^{m+p}_+. \tag{9.17} \]
Now that the objective function has been considerably simplified, the same can be done with the constraints. To this purpose, let $a_j^T(y)$ denote the $j$-th row of $A^T(y)$. The constraint
\[ A^T(y) v \le 0 \quad \forall\, y \in Y \tag{9.18} \]
can be rewritten as
\[ \max_{y \in Y} \langle a_j^T(y), v \rangle \le 0, \quad j = 1, \dots, n. \tag{9.19} \]
For every $v$, the functions $\langle a_j^T(y), v \rangle$ are affine in $y \in Y$. Hence, each maximum in (9.19) is attained at some vertex of $Y$, so that (9.18) reduces to a finite number of linear constraints. Moreover, the maxima in (9.19) can be calculated explicitly. Let $c_{ij}, d_{ij}$ ($i = 1, \dots, p$; $j = 1, \dots, n$) denote the entries of $c_i, d_i$, respectively, and let $a_{ij}$ ($i = 1, \dots, m$; $j = 1, \dots, n$) denote the entries of $A$. Since
\[ \langle a_j^T(y), v \rangle = \sum_{i=1}^p (c_{ij} - y_i d_{ij}) v_i + \sum_{i=1}^m a_{ij} v_{i+p}, \]
we have
\[ \max_{y \in Y} \langle a_j^T(y), v \rangle = \sum_{i=1}^p \max_{\ell_i \le y_i \le u_i} (c_{ij} - y_i d_{ij}) v_i + \sum_{i=1}^m a_{ij} v_{i+p} = \sum_{i=1}^p (c_{ij} - y_{ij} d_{ij}) v_i + \sum_{i=1}^m a_{ij} v_{i+p}, \]
where
\[ y_{ij} = \begin{cases} \ell_i & \text{if } d_{ij} \ge 0, \\ u_i & \text{if } d_{ij} < 0. \end{cases} \]
We have thus reformulated the constraint $A^T(y) v \le 0\ \forall\, y \in Y$ of (9.17) as
\[ \sum_{i=1}^p (c_{ij} - y_{ij} d_{ij}) v_i + \sum_{i=1}^m a_{ij} v_{i+p} \le 0, \quad j = 1, \dots, n. \]
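The endpoint rule for $y_{ij}$ can be checked numerically. The following is a minimal sketch, assuming one column $j$ of the data is passed as plain Python lists; the function names and the sample numbers are illustrative and not taken from the thesis. It compares the closed-form maximum with a brute-force maximum over all vertices of the rectangle.

```python
from itertools import product

def worst_case_row(c_col, d_col, a_col, v, lower, upper):
    """Closed-form maximum over the box [lower, upper] of
    sum_i (c_ij - y_i d_ij) v_i + sum_i a_ij v_{i+p} for one column j (v >= 0)."""
    p = len(c_col)
    total = 0.0
    for i in range(p):
        # y_ij = l_i if d_ij >= 0, u_i otherwise
        y_ij = lower[i] if d_col[i] >= 0 else upper[i]
        total += (c_col[i] - y_ij * d_col[i]) * v[i]
    return total + sum(a * w for a, w in zip(a_col, v[p:]))

def brute_force_row(c_col, d_col, a_col, v, lower, upper):
    """Same maximum, obtained by enumerating all vertices of the box."""
    p = len(c_col)
    fixed = sum(a * w for a, w in zip(a_col, v[p:]))
    return max(
        sum((c_col[i] - y[i] * d_col[i]) * v[i] for i in range(p)) + fixed
        for y in product(*zip(lower, upper))
    )
```

Enumerating vertices costs $2^p$ evaluations per column, whereas the closed form needs only $p$; this is what keeps the number of dual constraints at $n$.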

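A similar reasoning shows that the objective function $d(v)$ in (9.17) can be reduced. We have
\[ \max_{y \in Y} \Big\{ \sum_{i=1}^p y_i - v^T b(y) \Big\} = \max_{y \in Y} \Big\{ \sum_{i=1}^p y_i - \sum_{i=1}^p v_i (y_i \beta_i - \alpha_i) - \sum_{i=1}^m b_i v_{i+p} \Big\} = \max_{y \in Y} \Big\{ \sum_{i=1}^p (1 - \beta_i v_i) y_i \Big\} + \sum_{i=1}^p \alpha_i v_i - \sum_{i=1}^m b_i v_{i+p}, \]
where $b_i$, $i = 1, \dots, m$, are the entries of the vector $b$. We thus arrive at the formulation
\[ \begin{array}{ll} \min & \displaystyle \sum_{i=1}^p \alpha_i v_i - \sum_{i=1}^m b_i v_{i+p} + \max \Big\{ \sum_{i=1}^p (1 - \beta_i v_i) y_i : y \in Y \Big\} \\ \text{s.t.} & \displaystyle \sum_{i=1}^p (c_{ij} - y_{ij} d_{ij}) v_i + \sum_{i=1}^m a_{ij} v_{i+p} \le 0, \quad j = 1, \dots, n, \\ & v \in \mathbb{R}^{p+m}_+. \end{array} \tag{9.20} \]
Next, we show that problem (9.20) can be formulated as one linear program. In the last part of the objective function of (9.20), we express the values $y_i$ by the new variables $z_i = (y_i - \ell_i)/(u_i - \ell_i)$, which maps $\ell_i \mapsto 0$ and $u_i \mapsto 1$. Then one obtains, using separability of linear optimization over a rectangle,
\[ \begin{aligned} \max \Big\{ \sum_{i=1}^p (1 - \beta_i v_i) y_i : y \in Y \Big\} &= \max_z \Big\{ \sum_{i=1}^p (1 - \beta_i v_i)(u_i - \ell_i) z_i : 0 \le z_i \le 1,\ i = 1, \dots, p \Big\} + \sum_{i=1}^p (1 - \beta_i v_i) \ell_i \\ &= \sum_{i=1}^p (u_i - \ell_i) \max\{0, 1 - \beta_i v_i\} + \sum_{i=1}^p (1 - \beta_i v_i) \ell_i \\ &= \min_t \Big\{ \sum_{i=1}^p (u_i - \ell_i) t_i : t_i \ge 0,\ t_i \ge 1 - \beta_i v_i,\ i = 1, \dots, p \Big\} + \sum_{i=1}^p (1 - \beta_i v_i) \ell_i. \end{aligned} \]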
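The chain of equalities above can be verified on small data. The sketch below, with illustrative function names and numbers that are assumptions rather than part of the original text, evaluates the closed form $\sum_i (u_i - \ell_i)\max\{0, 1 - \beta_i v_i\} + \sum_i (1 - \beta_i v_i)\ell_i$ and compares it with the maximum of the linear function over all vertices of the rectangle.

```python
from itertools import product

def box_max_closed_form(beta, v, lower, upper):
    """sum_i (u_i - l_i) * max(0, 1 - beta_i v_i) + sum_i (1 - beta_i v_i) * l_i."""
    return sum(
        (u - l) * max(0.0, 1.0 - b * w) + (1.0 - b * w) * l
        for b, w, l, u in zip(beta, v, lower, upper)
    )

def box_max_brute_force(beta, v, lower, upper):
    """Maximum of sum_i (1 - beta_i v_i) y_i over the vertices of the box."""
    return max(
        sum((1.0 - b * w) * y_i for b, w, y_i in zip(beta, v, y))
        for y in product(*zip(lower, upper))
    )
```

Because the objective is separable, each coordinate is pushed to $u_i$ when its coefficient $1 - \beta_i v_i$ is positive and to $\ell_i$ otherwise, which is exactly what the $\max\{0, \cdot\}$ term encodes.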


From the above discussion we obtain the following result.

Proposition 9.4.1 For each rectangle $Y = \{y : \ell \le y \le u\}$ an upper bound for the optimal value of problem (9.14) is given by $\mu(Y) = \nu(Y) + \sum_{i=1}^p \ell_i$, where $\nu(Y)$ is the optimal value of the following linear program (in the variables $t, v$):
\[ \begin{array}{ll} \min & \displaystyle \sum_{i=1}^p (u_i - \ell_i) t_i + \sum_{i=1}^p (\alpha_i - \ell_i \beta_i) v_i - \sum_{i=1}^m b_i v_{i+p} \\ \text{s.t.} & \displaystyle \sum_{i=1}^p (c_{ij} - y_{ij} d_{ij}) v_i + \sum_{i=1}^m a_{ij} v_{i+p} \le 0, \quad j = 1, \dots, n, \\ & t_i \ge 1 - \beta_i v_i, \quad i = 1, \dots, p, \\ & v \in \mathbb{R}^{m+p}_+, \quad t \in \mathbb{R}^p_+. \end{array} \]
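To make the structure of this linear program concrete, the following sketch assembles its data in the variable order $(t_1, \dots, t_p, v_1, \dots, v_{p+m})$, written as "minimise cost subject to rows $\cdot (t, v) \le$ rhs and $(t, v) \ge 0$". The function name `build_dual_lp` and the list-of-lists layout are assumptions made for illustration; no particular LP solver is implied. Note that $t = (1, \dots, 1)$, $v = 0$ is always feasible with objective value $\sum_i (u_i - \ell_i)$, so the resulting bound is never worse than the crude bound $f(u)$.

```python
def build_dual_lp(C, D, alpha, beta, A, b, lower, upper):
    """Return (cost, rows, rhs) for the LP in the variables (t, v).

    C, D: p x n coefficient lists of numerators and denominators,
    A, b: m x n matrix and right-hand side of P = {x : Ax >= b, x >= 0}.
    """
    p, n, m = len(C), len(C[0]), len(A)
    # Objective: (u_i - l_i) t_i + (alpha_i - l_i beta_i) v_i - b_k v_{p+k}
    cost = ([upper[i] - lower[i] for i in range(p)]
            + [alpha[i] - lower[i] * beta[i] for i in range(p)]
            + [-b[k] for k in range(m)])
    rows, rhs = [], []
    for j in range(n):
        row = [0.0] * p  # t does not appear in these constraints
        for i in range(p):
            y_ij = lower[i] if D[i][j] >= 0 else upper[i]
            row.append(C[i][j] - y_ij * D[i][j])
        row += [A[k][j] for k in range(m)]
        rows.append(row)
        rhs.append(0.0)
    for i in range(p):
        # t_i >= 1 - beta_i v_i  is written as  -t_i - beta_i v_i <= -1
        row = [0.0] * (2 * p + m)
        row[i] = -1.0
        row[p + i] = -beta[i]
        rows.append(row)
        rhs.append(-1.0)
    return cost, rows, rhs
```

The returned triple can be handed to any LP solver that accepts inequality constraints in this form; the solver's optimal value plays the role of $\nu(Y)$.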

In the Branch-and-Bound algorithm, before applying Proposition 9.4.1, we first check whether the partition rectangle $Y = \{y : \ell \le y \le u\}$ can immediately be deleted from further consideration. Immediate deletion occurs when there is no $x \in P$ satisfying
\[ \ell_i \le \frac{n_i(x)}{d_i(x)} \le u_i, \]
which can be checked, e.g., by Phase I of a simplex algorithm (deletion by infeasibility). Of course, immediate deletion also occurs when
\[ f(u) = \sum_{i=1}^p u_i \le \gamma_q, \]
where $\gamma_q$ is the current lower bound (i.e. the best objective function value at a feasible point found so far).

9.5 Lower Bounds for Sums of Affine Ratios

9.5.1 The Corresponding Multiple-Objective Problem

To compute lower bounds in a Branch-and-Bound algorithm for the affine sum-of-ratios problem, we associate with (9.2) the following multiple-objective program:
\[ \begin{array}{ll} \max & n_1(x)/d_1(x) \\ & \quad \vdots \\ \max & n_p(x)/d_p(x) \\ \text{s.t.} & x \in P. \end{array} \tag{9.21} \]
In this problem, we attempt to maximize the $p$ objective functions simultaneously over the feasible set $P$. As, in the context of multiple-objective programming, there is in general no point $x \in P$ which maximizes the $p$ objective functions at the same time, we have to deal with the following concept of "efficient solutions":

Definition 9.5.1 A point $x^* \in P$ is called efficient for the multi-objective problem (9.21), if there is no point $\tilde x \in P$ satisfying
\[ \frac{n_i(\tilde x)}{d_i(\tilde x)} \ge \frac{n_i(x^*)}{d_i(x^*)} \quad \text{for all } i \in \{1, \dots, p\}, \]
with strict inequality holding for at least one index $i_0 \in \{1, \dots, p\}$.

The connection between the sum-of-ratios problem (9.2) and the multi-objective problem (9.21) is established in the following known result.

Lemma 9.5.1 Every optimal solution of problem (9.2) is efficient for the multi-criteria problem (9.21).

Proof. Assume that $x^*$ is an optimal solution of problem (9.2) but not efficient for the multiple-objective program (9.21). Then, there is $\tilde x \in P$ satisfying $\frac{n_i(\tilde x)}{d_i(\tilde x)} \ge \frac{n_i(x^*)}{d_i(x^*)}$, $i = 1, \dots, p$, and $\frac{n_j(\tilde x)}{d_j(\tilde x)} > \frac{n_j(x^*)}{d_j(x^*)}$ for at least one $j \in \{1, \dots, p\}$. This implies
\[ \sum_{i=1}^p \frac{n_i(\tilde x)}{d_i(\tilde x)} > \sum_{i=1}^p \frac{n_i(x^*)}{d_i(x^*)}, \]
contradicting the optimality of the point $x^*$ for problem (9.2). $\square$
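Lemma 9.5.1 can be illustrated on a finite set of ratio vectors. The following sketch is only an illustration under that simplifying assumption (a finite list stands in for the image of $P$ under the $p$ ratios; the names `dominates` and `efficient_indices` and the sample values are hypothetical). It filters the efficient vectors and confirms that a vector with maximal sum is among them.

```python
def dominates(fa, fb):
    """True if fa >= fb componentwise with strict inequality in at least one component."""
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))

def efficient_indices(values):
    """Indices of the vectors that are not dominated by any other vector in the list."""
    return [k for k, fk in enumerate(values)
            if not any(dominates(other, fk) for other in values)]

# Ratio vectors (f_1(x), f_2(x)) of five hypothetical feasible points.
values = [(1.0, 3.0), (2.0, 2.0), (1.5, 1.5), (3.0, 0.5), (2.0, 1.0)]
best_by_sum = max(range(len(values)), key=lambda k: sum(values[k]))
```

Here the vectors with indices 2 and 4 are dominated by `(2.0, 2.0)`, and the index with the largest sum belongs to the efficient set, as the lemma predicts.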

Notice that Lemma 9.5.1 holds not only for affine functions involved in the problems, but also for general nonlinear functions. Now suppose we are given a partition set $Y = \{y \in \mathbb{R}^p : \ell \le y \le u\}$ generated by the Branch-and-Bound algorithm and we want to compute a better lower bound for the objective function $f(y) = \sum_{i=1}^p y_i$ than the crude bound $\gamma(Y) = f(\ell)$ given in (9.10). Our intention is to calculate efficient points $x^* \in P$ of problem (9.21) and use them to obtain better lower bounds. Notice that in view of Lemma 9.5.1, we can restrict the search for optimal solutions to the set of efficient points.

9.5.2 A Generalized Parametric Approach

For $p = 1$, there is a rich class of algorithms based on the parametric optimization problem
\[ \max \Big\{ \sum_{i=1}^p \big( n_i(x) - y_i d_i(x) \big) : x \in P \Big\} \tag{9.22} \]
with parameter $y \in \mathbb{R}^p$. Let $z(y)$ denote the optimal objective function value of (9.22), and let $x^*$ be an optimal solution of (9.2). Then, for $p = 1$ and $y^* = n_1(x^*)/d_1(x^*)$ one has (cf. Dinkelbach [20])
\[ z(y) > 0 \iff y < y^*, \qquad z(y) = 0 \iff y = y^*, \qquad z(y) < 0 \iff y > y^*. \]
Optimal solutions of (9.2) are optimal solutions of (9.22) with $y = y^*$. Thus, solving (9.2) is essentially equivalent to finding the root of the equation $z(y) = 0$. For a bibliographic survey on the various methods based on this connection between (9.2) and (9.22), see Schaible [64].

For $p > 1$ and all functions involved affine, Almogy and Levin [3] claim that an optimal solution of (9.2) is also characterized by $z(y) = 0$. This is not true anymore for $p > 1$; a counterexample (for $p = 2$) is given in Falk and Palocsay [28]. However, for arbitrary $p \in \mathbb{N}$, no exact analysis of the meaning of a pair $(x^*, y^*)$ satisfying $z(y^*) = 0$, $x^*$ optimal solution of (9.22) with $y = y^*$, is known.
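For the single-ratio case $p = 1$, the root-finding idea behind (9.22) can be sketched as the classical Dinkelbach iteration. The code below is a minimal illustration, assuming a finite list of candidate points in place of the set $P$ and a positive denominator on all candidates; the function name `dinkelbach` and the sample functions are hypothetical.

```python
def dinkelbach(points, n, d, y0=0.0, tol=1e-12, max_iter=100):
    """Maximise n(x)/d(x) over a finite candidate list by searching for the root of
    z(y) = max_x { n(x) - y d(x) }; assumes d(x) > 0 on all candidates."""
    y = y0
    x = points[0]
    for _ in range(max_iter):
        # Solve the parametric problem for the current parameter y.
        x = max(points, key=lambda pt: n(pt) - y * d(pt))
        z = n(x) - y * d(x)
        if abs(z) <= tol:
            break  # z(y) = 0: y is the optimal ratio
        y = n(x) / d(x)
    return x, y

# Hypothetical one-dimensional example: n(x) = 2x + 1, d(x) = x^2 + 1.
x_opt, y_opt = dinkelbach([0.0, 1.0, 2.0, 3.0],
                          lambda x: 2.0 * x + 1.0,
                          lambda x: x * x + 1.0)
```

On this example the parameter takes the values 0, 0.7 and 1.5, after which $z(y) = 0$ and the iteration stops at the candidate with the largest ratio.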

We next show that an optimal solution $x^* = x(y^*)$ of (9.22) satisfying $z(y^*) = 0$ is an efficient point of the multiple-objective problem (9.21). Conversely, for each efficient point $x^*$ there is a point $y^*$ such that $z(y^*) = 0$. Every optimal solution $x^*$ of problem (9.2) is also efficient for the associated multiple-objective program, but there might exist $y \in \mathbb{R}^p$ such that $x^*$ solves (9.22) with $z(y) \neq 0$ (cf. Falk and Palocsay [28]).

Lemma 9.5.2 Let for fixed $y^* \in \mathbb{R}^p$ with $P(y^*) \neq \emptyset$
\[ z(y^*) = \max \Big\{ z(x) = \sum_{i=1}^p \big( n_i(x) - y_i^* d_i(x) \big) : x \in P(y^*) \Big\} = \sum_{i=1}^p \big( n_i(x^*) - y_i^* d_i(x^*) \big) = 0. \tag{9.23} \]
Then $x^*$ is efficient for the multiple-criteria problem (9.21).

Proof. Define $f_i(x) = n_i(x)/d_i(x)$ $(i = 1, \dots, p)$. When $z(y^*) = 0$, we see from the definition of $P(y^*)$ and $d_i(x) > 0$, $i = 1, \dots, p$, that
\[ y_i^* = f_i(x^*), \quad i = 1, \dots, p. \]
Suppose that $x^*$ is not efficient. Then there exists $\tilde x \in P$ satisfying
\[ f_i(\tilde x) \ge y_i^* \ \text{ for all } i = 1, \dots, p, \qquad f_j(\tilde x) > y_j^* \ \text{ for some } j \in \{1, \dots, p\}. \]
It follows that $\tilde x \in P(y^*)$ and $z(\tilde x) > 0$, contradicting the optimality of $x^*$ in (9.23). $\square$

Conversely, if $x^*$ is efficient for the multiple-objective problem (9.21), and $y_i^* = f_i(x^*)$, $i = 1, \dots, p$, then clearly $z(y^*) = 0$. Moreover, $x^*$ must also be an optimal solution of (9.23), since otherwise, there would exist a point $\tilde x \in P(y^*) \subseteq P$ satisfying $f_i(\tilde x) \ge f_i(x^*)$, $i = 1, \dots, p$, and $f_j(\tilde x) > f_j(x^*)$ for at least one $j \in \{1, \dots, p\}$.

Next, we discuss a straightforward sequential algorithm, which is similar to approaches usually designed for solving parametric problems with linear dependence on the parameters. Monotonic, but not necessarily finite, convergence to an efficient point is shown.

The algorithm can be implemented by standard available optimization software, for example, when (i) the set $P$ is convex, the functions $n_i$ are positive concave and the functions $d_i$ are convex, $i = 1, \dots, p$, or (ii) all functions involved in (9.1)-(9.2) are affine.

Algorithm A: Starting with $y^0 = \ell^0$, $q = 0$, determine an optimal solution $x^q$ of
\[ z^q(x^q) = \max \Big\{ \sum_{i=1}^p \big( n_i(x) - y_i^q d_i(x) \big) : x \in P(y^q) \Big\}. \tag{9.24} \]
If $z^q(x^q) = 0$, then Stop. Otherwise, set
\[ y_i^{q+1} = \frac{n_i(x^q)}{d_i(x^q)}, \quad i = 1, \dots, p, \]
set $q := q + 1$ and repeat.
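The iteration of Algorithm A can be sketched in a few lines. The version below is an illustration only: it assumes that a finite list of candidate points stands in for the feasible set (so that each subproblem (9.24) is solved by enumeration rather than by an LP or convex solver), and the names `algorithm_a`, `points` and `ratios` are hypothetical.

```python
def algorithm_a(points, ratios, lower, tol=1e-12, max_iter=1000):
    """Sequential parametric scheme on a finite candidate set.

    ratios: list of (n_i, d_i) pairs of callables with d_i > 0 on all candidates,
    lower:  starting parameter vector y^0 (lower bounds on the ratios).
    Returns the last iterate x^q and the parameter vector y^q with z^q(x^q) = 0."""
    y = list(lower)
    for _ in range(max_iter):
        # P(y^q): candidates with n_i(x) - y_i d_i(x) >= 0 for all i
        feasible = [x for x in points
                    if all(n(x) - yi * d(x) >= -tol for (n, d), yi in zip(ratios, y))]
        if not feasible:
            raise ValueError("P(y) is empty for the current parameter vector")

        def z(x):
            return sum(n(x) - yi * d(x) for (n, d), yi in zip(ratios, y))

        x_q = max(feasible, key=z)
        if z(x_q) <= tol:
            return x_q, y  # stop: z^q(x^q) = 0
        y = [n(x_q) / d(x_q) for n, d in ratios]
    raise RuntimeError("iteration limit reached")
```

On a finite candidate set the scheme always stops, because the parameter vector increases strictly in at least one component per iteration; on a general compact set $P$ only convergence in the limit can be expected.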

The following properties hold if $P$ is compact, $d_i(x) > 0$ on $P$, and all functions involved are continuous.

Lemma 9.5.3 If the above algorithm does not terminate at iteration 0, then for each $q \ge 0$ we have

(i) $P(y^q) \neq \emptyset$,

(ii) $y^q \le y^{q+1}$, $y^q \neq y^{q+1}$,

(iii) for $q \ge 1$, we have $P(y^{q+1}) \subseteq P(y^q)$, $P(y^{q+1}) \neq P(y^q)$,

(iv) $z^q(x^q) > z^{q+1}(x^{q+1})$.


Proof. (i): The initial problem (9.24) has feasible points since $P = P(\ell^0)$ due to (9.3), and $x^q \in P(y^{q+1})$ $\forall\, q$.

(ii): For $i = 1, \dots, p$ and $x^q \in P(y^q)$ we have
\[ y_i^q \le \frac{n_i(x^q)}{d_i(x^q)} = y_i^{q+1} \le \frac{n_i(x^{q+1})}{d_i(x^{q+1})}, \]
since $x^{q+1} \in P(y^{q+1})$. But there is at least one index $j$ with $y_j^q < y_j^{q+1}$, since $y_i^q = y_i^{q+1} = n_i(x^q)/d_i(x^q)$ for all $i$ would imply $z^q(x^q) = 0$, and the algorithm would have stopped at iteration $q$.

(iii): From the definition of $P(y)$ and from (ii) we have $P(y^{q+1}) \subseteq P(y^q)$ and $x^{q-1} \in P(y^q)$. But, by (ii), there is at least one index $j \in \{1, \dots, p\}$ such that
\[ y_j^{q+1} > y_j^q = \frac{n_j(x^{q-1})}{d_j(x^{q-1})}, \]
i.e. $x^{q-1} \notin P(y^{q+1})$.

(iv): We have
\[ z^q(x^q) \ge z^q(x^{q+1}) = \sum_{i=1}^p \big( n_i(x^{q+1}) - y_i^q d_i(x^{q+1}) \big) > \sum_{i=1}^p \big( n_i(x^{q+1}) - y_i^{q+1} d_i(x^{q+1}) \big) = z^{q+1}(x^{q+1}), \]
where the first inequality comes from $\{x^q, x^{q+1}\} \subseteq P(y^q)$, and the second from (ii). $\square$

Lemma 9.5.4 If Algorithm A is infinite, then

(i) $y^q \uparrow y^*$ (as $q \to \infty$); $y_i^* \le \max\{n_i(x)/d_i(x) : x \in P\}$ $\forall\, i = 1, \dots, p$.

(ii) $\bigcap_{q=1}^{\infty} P(y^q) = P(y^*)$.

Proof. (i): For all $q$, since $P(y^q) \neq \emptyset$ (Lemma 9.5.3(iii)), there is $\tilde x^q \in P$ such that
\[ y_i^q \le \frac{n_i(\tilde x^q)}{d_i(\tilde x^q)}, \quad i = 1, \dots, p, \]
and hence
\[ y_i^q \le \max\{n_i(x)/d_i(x) : x \in P\}. \]
The maximum exists by continuity of $n_i, d_i$ and compactness of $P$. Using monotonicity (Lemma 9.5.3(ii)) we obtain (i).

(ii): By Lemma 9.5.3(iii) and the fact that $P(y^q)$ is compact for every $q$,
\[ P^* := \bigcap_{q=1}^{\infty} P(y^q) \neq \emptyset \]
exists. Since
\[ \begin{aligned} x \in P^* &\iff x \in P(y^q) \quad \forall\, q \in \mathbb{N} \\ &\iff n_i(x)/d_i(x) \ge y_i^q \quad \forall\, q \in \mathbb{N},\ i = 1, \dots, p \\ &\iff n_i(x)/d_i(x) \ge \sup_{q \in \mathbb{N}} y_i^q = y_i^*, \quad i = 1, \dots, p \end{aligned} \]
(Lemma 9.5.3(ii)), we obtain $P^* = P(y^*)$. $\square$

Proposition 9.5.1 If Algorithm A is infinite, then $z^* := \lim_{q \to \infty} z^q(x^q) = 0$. Every accumulation point $x^*$ of the sequence $\{x^q\}$ is efficient for the multiple-objective problem (9.21) and optimal for the limit problem (with $y^* = \lim_{q \to \infty} y^q$ as in Lemma 9.5.4)
\[ z^* = \max \Big\{ \sum_{i=1}^p \big( n_i(x) - y_i^* d_i(x) \big) : x \in P(y^*) \Big\}. \]

Proof. Clearly, $z^q(x^q) > 0$ for all $q$ so that, by Lemma 9.5.3(iv), $z^* := \lim_{q \to \infty} z^q(x^q)$ exists, and $z^* \ge 0$. Suppose that $z^* > 0$. Since $z^q(x^q) > z^{q+1}(x^{q+1})$ for all $q$, we must have
\[ \sum_{i=1}^p \big( n_i(x^q) - y_i^q d_i(x^q) \big) \ge z^* > 0. \]
This is only possible if, for all $q$, there is at least one index $i_q \in \{1, \dots, p\}$ such that
\[ n_{i_q}(x^q) - y_{i_q}^q d_{i_q}(x^q) \ge z^*/p > 0, \]
which, after dividing by $d_{i_q}(x^q) > 0$, is equivalent to
\[ \frac{n_{i_q}(x^q)}{d_{i_q}(x^q)} \ge \frac{z^*}{p \cdot d_{i_q}(x^q)} + y_{i_q}^q. \]
Switching to a subsequence, if necessary, since $p$ is finite, we can assume that $i_q = j$, with $j$ fixed for all $q$. Using
\[ 0 < d_j(x^q) \le \max\{d_j(x) : x \in P(y^0)\} =: \delta_j \quad \forall\, q \ge 0 \]
by Lemma 9.5.3(iii) and compactness of $P(y^0) \neq \emptyset$, we deduce that
\[ y_j^{q+1} = \frac{n_j(x^q)}{d_j(x^q)} \ge y_j^q + \frac{z^*}{p\, \delta_j}. \]
This contradicts Lemma 9.5.4(ii), since $\{y_j^q\}$ would be unbounded.

Next, let $x^*$ be an accumulation point of the sequence $\{x^q\}$, which exists, since all $P(y^q)$ are contained in the compact set $P$. Without loss of generality, we denote the corresponding subsequence converging to $x^*$ by $\{x^q\}$ again. We show that $x^*$ is an optimal solution of the limit problem
\[ \max \Big\{ z^*(x) = \sum_{i=1}^p \big( n_i(x) - y_i^* d_i(x) \big) : x \in P(y^*) \Big\}. \]
Then $z^*(x^*) = 0$, which, by Lemma 9.5.2, implies that $x^*$ is efficient for the multiple-objective program (9.21). Now suppose that $x^*$ is not an optimal solution of $\max\{z^*(x) : x \in P(y^*)\}$. Then there exists $\tilde x \in P(y^*)$ satisfying $z^*(\tilde x) > z^*(x^*)$, i.e.
\[ \lim_{q \to \infty} \sum_{i=1}^p \big( n_i(\tilde x) - y_i^q d_i(\tilde x) \big) = \sum_{i=1}^p \big( n_i(\tilde x) - y_i^* d_i(\tilde x) \big) = z^*(\tilde x) > z^*(x^*) = \sum_{i=1}^p \big( n_i(x^*) - y_i^* d_i(x^*) \big) = \lim_{q \to \infty} \sum_{i=1}^p \big( n_i(x^q) - y_i^q d_i(x^q) \big). \]
But this is only possible if, for some index $q_0$,
\[ \sum_{i=1}^p \big( n_i(\tilde x) - y_i^{q_0} d_i(\tilde x) \big) > \sum_{i=1}^p \big( n_i(x^{q_0}) - y_i^{q_0} d_i(x^{q_0}) \big), \]
contradicting that $x^{q_0}$ is defined to be optimal for $\max\{z^{q_0}(x) : x \in P(y^{q_0})\}$; recall that $\tilde x \in P(y^{q_0})$ by Lemma 9.5.4(iii). Finally, suppose that
\[ z^*(x^*) = \sum_{i=1}^p \big( n_i(x^*) - y_i^* d_i(x^*) \big) > 0. \]
This is only possible if, for at least one $j \in \{1, \dots, p\}$,
\[ n_j(x^*) - y_j^* d_j(x^*) > 0, \]
and hence, by continuity of the functions involved,
\[ y_j^* < \frac{n_j(x^*)}{d_j(x^*)} = \lim_{q \to \infty} \frac{n_j(x^q)}{d_j(x^q)} = \lim_{q \to \infty} y_j^{q+1} = y_j^*, \]
a contradiction. $\square$
9.5.3 A Finite Procedure for Calculating Efficient Points

The following Procedure EFF determines an efficient point for problem (9.21) in at most $p$ iterations. It is called the lexicographic method for finding an efficient point. For more details about this method as well as multiple-objective optimization in general we refer to Steuer [68]. A finite outer approximation algorithm which generates all efficient points for a multiple-objective linear problem has recently been proposed by Benson [9].

Procedure EFF:

Initialization: Set $y_i = \ell_i$, $i = 1, \dots, p$. Set $q = 1$.

Iteration $q$: Solve the single-ratio program
\[ \max \big\{ n_q(x)/d_q(x) : n_i(x) - y_i d_i(x) \ge 0,\ i = 1, \dots, p;\ x \in P \big\}. \tag{9.25} \]
Let $x^q$ and $t_q$ be an optimal solution and the optimal value of (9.25), respectively. If $x^q$ is the unique optimal solution of (9.25) or $q = p$, then set
\[ x^* = x^q, \qquad y_i^* = \frac{n_i(x^*)}{d_i(x^*)}, \quad i = 1, \dots, p, \]
and terminate. Otherwise, set $y_q = t_q$, set $q := q + 1$, and go to the next iteration.
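The lexicographic idea of Procedure EFF can be sketched as follows. This is a simplified illustration, assuming a finite list of candidate points in place of $P$, so that each single-ratio program (9.25) is solved by enumeration and uniqueness is detected by counting maximisers; with an LP solver, uniqueness would have to be detected differently. The names `procedure_eff`, `points` and `ratios` are hypothetical, and the loop index is zero-based.

```python
def procedure_eff(points, ratios, lower, tol=1e-12):
    """Lexicographic search for an efficient point on a finite candidate set.

    ratios: list of (n_i, d_i) pairs of callables with d_i > 0 on all candidates."""
    y = list(lower)
    p = len(ratios)
    for q in range(p):
        n_q, d_q = ratios[q]
        # Feasible set of (9.25) for the current parameter vector y
        feasible = [x for x in points
                    if all(n(x) - yi * d(x) >= -tol for (n, d), yi in zip(ratios, y))]
        t_q = max(n_q(x) / d_q(x) for x in feasible)
        maximisers = [x for x in feasible if n_q(x) / d_q(x) >= t_q - tol]
        if len(maximisers) == 1 or q == p - 1:
            x_star = maximisers[0]
            return x_star, [n(x_star) / d(x_star) for n, d in ratios]
        # Fix the q-th ratio at its optimal level and continue with the next one.
        y[q] = t_q
```

For example, with the two ratios $f_1(x) = x_1$, $f_2(x) = x_2$ and the candidates $(3,1), (3,2), (1,5), (2,2)$, the first ratio is maximised by two points, so the second iteration breaks the tie and returns $(3, 2)$.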

If $P$ is a polyhedral set and all functions $n_i, d_i$ $(i = 1, \dots, p)$ are affine, then each of the single-ratio problems (9.25) in the procedure reduces to an ordinary linear problem (cf. Charnes and Cooper [16]) and is therefore easily solvable, e.g. with the simplex method. However, Procedure EFF can be applied in more general situations whenever the single-ratio problems (9.25) can be solved. This is also the case if the set $P$ is convex, the functions $n_i$ are positive concave and the functions $d_i$ are convex $(i = 1, \dots, p)$.

Proposition 9.5.2 The pair $(x^*, y^*)$ generated by the Procedure EFF satisfies the following properties:

(i) $y^* \ge \ell$ and $x^* \in P(y^*)$,

(ii) $x^*$ is an efficient point for the multiple-objective problem (9.21).

Proof. Property (i) is obvious. (ii): If Procedure EFF terminates at iteration $q < p$, then $x^* = x^q$ is the unique solution of problem (9.25). Suppose that $x^*$ is not efficient. Then there exists $\tilde x \in P(\ell)$ such that
\[ \frac{n_i(\tilde x)}{d_i(\tilde x)} \ge \frac{n_i(x^*)}{d_i(x^*)}, \quad i = 1, \dots, p, \]
with strict inequality holding for at least one index $i_0 \in \{1, \dots, p\}$. Uniqueness of $x^q$ implies
\[ \frac{n_q(\tilde x)}{d_q(\tilde x)} \neq \frac{n_q(x^*)}{d_q(x^*)} \]
and hence
\[ \frac{n_q(\tilde x)}{d_q(\tilde x)} > \frac{n_q(x^*)}{d_q(x^*)}, \]
contradicting the optimality of $x^* = x^q$ for (9.25). If Procedure EFF terminates at iteration $q = p$, then $x^* = x^p$ is efficient for (9.21) by construction of the procedure. $\square$

Notice that we have $y^* \ge \ell$ but not necessarily $y^* \in Y = \{y \in \mathbb{R}^p : \ell \le y \le u\}$. Nevertheless we can use
\[ \gamma(Y) = \sum_{i=1}^p y_i^* = f(y^*) \ge f(\ell) \]
as a lower bound. This does not destroy the convergence of the basic Branch-and-Bound algorithm since $x^*$ is feasible and $f(y^*) \ge f(\ell)$.

Notice also another useful property of Procedure EFF: Consider two rectangles
\[ Y_1 = \{y \in \mathbb{R}^p : \ell^1 \le y \le u^1\}, \qquad Y_2 = \{y \in \mathbb{R}^p : \ell^2 \le y \le u^2\} \]
satisfying $\ell^1 \le \ell^2$, $\ell^1 \neq \ell^2$, and denote by $y^{*i}$ the point generated by Procedure EFF for $Y_i$, $i = 1, 2$. If we have $y^{*1} \ge \ell^2$, then it is easy to see that $y^{*2} = y^{*1}$ holds. Therefore, $\gamma(Y_2) = \gamma(Y_1)$, i.e. no lower bound calculation is necessary for the rectangle $Y_2$.

9.6 Numerical Results

Our algorithm was implemented for affine sum-of-ratios problems in Fortran 90 and run on a Sun Sparc-station 4. We first illustrate our algorithm by one concrete example, then we present some statistical data gained from running random test examples. Consider the following example which is taken from Falk and Palocsay [28]:
\[ \begin{array}{lrcl} \max & \multicolumn{3}{l}{\dfrac{3x_1 + x_2 - 2x_3 + 0.8}{2x_1 - x_2 + x_3} + \dfrac{4x_1 - 2x_2 + x_3}{7x_1 + 3x_2 - x_3}} \\[2ex] \text{s.t.} & -x_1 - x_2 + x_3 & \ge & -1 \\ & x_1 - x_2 + x_3 & \ge & 1 \\ & -12x_1 - 5x_2 - 12x_3 & \ge & -34.8 \\ & -12x_1 - 12x_2 - 7x_3 & \ge & -29.1 \\ & 6x_1 - x_2 - x_3 & \ge & 4.1 \\ & x_1, x_2, x_3 & \ge & 0 \end{array} \]
Our initial partition set is $Y_0 = \{y : \ell^0 \le y \le u^0\}$ with $\ell^0 = (0, 0.35131)$ and $u^0 = (1.9, 1.15686)$. For this rectangle we obtain the upper bound $\mu_0 = 3.05686$ and the lower bound $\gamma_0 = 2.47143$. Procedure EFF, which is used to calculate the lower bound, gives the efficient point $x^* = (1, 0, 0)$ and $y^* = (1.9, 0.57143)$. It turns out that $x^* = (1, 0, 0)$ is the optimal solution, although our algorithm takes 23 more iterations to identify it as such (in this example, $\varepsilon = 10^{-3}$ was the chosen accuracy). This phenomenon is a drawback which is often encountered in Branch-and-Bound algorithms. The maximal number of partition sets generated throughout the algorithm was 6, the required CPU time was 51 msec.

Next, we present some numerical results obtained from randomly generated test examples. For each of the combinations of $p$, $n$, and $m$ listed in Table 9.1, 100 test examples were randomly generated in a way that ensured that all the assumptions made in Section 9.2 were satisfied. The statistical results can be seen from Table 9.1. Here "Iterations" stands for the average number of iterations needed to solve the problem, "PartSets" stands for the average maximal number of partition sets generated by the algorithm and "CPU-Time" denotes the average run time in seconds. The accuracy $\varepsilon$ was again chosen to be $10^{-3}$.

  p    n    m    Iterations    PartSets    CPU-Time
  2    6    8         55.33       12.22     0.36733
  2    8   10         73.10       17.50     0.74150
  2   10   12         65.48       15.88     0.96370
  3    6    8        386.20       86.73     3.29903
  3    8   10        564.13      134.31     8.08791
  3   10   12        498.56      111.43     9.75902
  4    6    8       1038.11      212.10    13.61427
  4    8   10       1721.78      469.12    37.79000
  4   10   12       1405.30      318.90    33.35443

Table 9.1: Numerical results for 100 random test examples.

Table 9.2 reveals some more insight into the nature of these sum-of-ratios optimization problems from the point of view of statistical behaviour.

                    Minimum    Average    Maximum    Stand.Dev.
  No. Iterations         18     498.56       8757      1120.578
  No. PartSets            5     111.43       1882       271.617
  CPU-Time           0.4690    9.75902    181.437        23.657

Table 9.2: Statistical results for 100 random test examples with p = 3, n = 10, m = 12.
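The reported values for the example from Falk and Palocsay [28] at the start of this section can be cross-checked directly. The sketch below assumes the problem data as read from the example (objective with the two ratios $(3x_1 + x_2 - 2x_3 + 0.8)/(2x_1 - x_2 + x_3)$ and $(4x_1 - 2x_2 + x_3)/(7x_1 + 3x_2 - x_3)$, constraints in "$\ge$" form); the function names are illustrative. It verifies that $x^* = (1, 0, 0)$ is feasible and evaluates the two ratios.

```python
def ratios(x):
    """The two ratios of the example, evaluated at x = (x1, x2, x3)."""
    x1, x2, x3 = x
    r1 = (3 * x1 + x2 - 2 * x3 + 0.8) / (2 * x1 - x2 + x3)
    r2 = (4 * x1 - 2 * x2 + x3) / (7 * x1 + 3 * x2 - x3)
    return r1, r2

def is_feasible(x, tol=1e-9):
    """Check the linear constraints of the example in '>=' form."""
    x1, x2, x3 = x
    return (min(x) >= -tol
            and -x1 - x2 + x3 >= -1 - tol
            and x1 - x2 + x3 >= 1 - tol
            and -12 * x1 - 5 * x2 - 12 * x3 >= -34.8 - tol
            and -12 * x1 - 12 * x2 - 7 * x3 >= -29.1 - tol
            and 6 * x1 - x2 - x3 >= 4.1 - tol)

x_star = (1.0, 0.0, 0.0)
r1, r2 = ratios(x_star)
objective = r1 + r2
```

The two ratios evaluate to 1.9 and 4/7, matching the reported $y^* = (1.9, 0.57143)$, and their sum reproduces the lower bound 2.47143 after rounding.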


Chapter 10

Second Branch-and-Bound Approach for the Sum-of-Ratios Problem

In this chapter, we present a different Branch-and-Bound algorithm for the affine sum-of-ratios problem. This algorithm uses a different reformulation of the problem and does not make use of Lagrange duality to calculate bounds. The techniques used here to obtain bounds are often used to design Branch-and-Bound algorithms for various types of problems.

As partition sets, the algorithm presented here uses simplices rather than rectangles. As in the algorithm outlined in Chapter 9, this partitioning is performed only in the space $\mathbb{R}^p$, where $p$ is the number of ratios. Since normally $p$ is considerably smaller than the dimension $n$ of the $x$-space, this will improve efficiency as compared to conceivable Branch-and-Bound approaches operating in the space $\mathbb{R}^n$ of decision variables.

For each simplex, upper bounds are obtained by maximizing a linear function which overestimates the objective function. This can be done by solving an ordinary linear program. While doing so in every iteration, we immediately obtain feasible points whose objective function values are used to compute lower bounds. We prove the convergence of this algorithm and conclude with some numerical examples.

However, these numerical results are not too encouraging, compared to the results obtained in the previous chapter. While the algorithm outlined there, which used more sophisticated techniques for obtaining bounds (dual bounding procedures and efficient point calculation), was applicable to problems with up to four ratios involved, the algorithm presented in this chapter does not seem to be satisfactory even for problems with only two ratios. This is a point in favour of the algorithm using dual bounds. It cannot be inferred, however, that the use of dual bounds will generally lead to faster algorithms, not even for this particular problem. The structure of the two algorithms is too different to allow an estimation of the effect of the upper bounding procedure only. Both the partitioning method and the lower bounding procedure also affect the numerical performance of the algorithm.

10.1 Reformulating the Problem

The problem we are concerned with is the following:
\[ \max \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{\langle x, d_i \rangle + \beta_i} \quad \text{s.t.}\ x \in P, \tag{10.1} \]
where $c_i, d_i \in \mathbb{R}^n$; $\alpha_i, \beta_i, \omega_i \in \mathbb{R}$, $\omega_i > 0$ $(i = 1, \dots, p)$, and $P$ is a polytope in $\mathbb{R}^n$. This formulation is slightly different from the one considered in Chapter 9, as the numbers $\omega_i$ come into the problem. If these $\omega_i$ are interpreted as weights, the objective function in (10.1) can be viewed as a utility function expressing a weighted compromise between the objective functions of a multiple-objective problem. For utility function programming, we refer to Horst and Thoai [50].

We assume throughout that both numerators and denominators are positive on $P$; more precisely, we assume that
\[ \langle x, c_i \rangle + \alpha_i > 0 \quad \forall\, x \in P, \]
and that there exists $\delta > 0$ such that for $i = 1, \dots, p$
\[ \langle x, d_i \rangle + \beta_i \ge \delta > 0 \quad \forall\, x \in P. \tag{10.2} \]
Introducing new variables $y_i$, $i = 1, \dots, p$, we can transform problem (10.1) into the following equivalent problem:
\[ \begin{array}{ll} \max & \displaystyle \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{y_i} \\ \text{s.t.} & \langle x, d_i \rangle + \beta_i \le y_i, \quad i = 1, \dots, p, \\ & x \in P. \end{array} \tag{10.3} \]
Equivalence is here understood in the following sense: Let $x^* \in P$ be an optimal solution of (10.1). Put
\[ y_i^* = \langle x^*, d_i \rangle + \beta_i, \quad i = 1, \dots, p; \]
then $(x^*, y^*)$ is an optimal solution of (10.3). Conversely, every optimal solution $(x^*, y^*)$ of (10.3) fulfills
\[ y_i^* = \langle x^*, d_i \rangle + \beta_i, \quad i = 1, \dots, p, \]
and $x^*$ is an optimal solution of (10.1).

In formulation (10.3), the problem is linear in $x$ and convex in $y$. It is therefore possible to use algorithms proposed for more general problem classes in Horst and Thoai [49] or in Horst, Muu and Nast [43], respectively. Here, however, we follow a different approach which takes into account the specific structure in (10.1). To this purpose we need some notations and definitions.

First, we can easily calculate numbers $M_i$ $(i = 1, \dots, p)$ satisfying
\[ M_i \ge \max \{ \langle x, d_i \rangle + \beta_i : x \in P \}, \]
which exist by compactness of $P$. Next, we define the set
\[ \Omega := \{ (x, y) \in \mathbb{R}^n \times \mathbb{R}^p : x \in P,\ \langle x, d_i \rangle + \beta_i \le y_i \le M_i,\ i = 1, \dots, p \}. \]
Denote by $Y$ the projection of $\Omega$ onto $\mathbb{R}^p$:
\[ Y := \{ y \in \mathbb{R}^p : \exists\, x \in P \text{ such that } \langle x, d_i \rangle + \beta_i \le y_i \le M_i,\ i = 1, \dots, p \}. \]
Since projections of polyhedra onto subspaces are again polyhedra, clearly, both $\Omega$ and $Y$ are polytopes (boundedness follows from (10.2)). With the definition
\[ F(y) := \max_x \Big\{ \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{y_i} : (x, y) \in \Omega \Big\}, \]
problem (10.3) becomes equivalent to
\[ \max\ F(y) \quad \text{s.t.}\ y \in Y. \tag{10.4} \]
Clearly, if $(x^*, y^*)$ is an optimal solution of (10.3) then $y^*$ is an optimal solution of (10.4). Conversely, if $y^*$ is an optimal solution of (10.4), choose $x^* \in P$ such that
\[ F(y^*) = \sum_{i=1}^p \omega_i \frac{\langle x^*, c_i \rangle + \alpha_i}{y_i^*}, \]
i.e. choose
\[ x^* \in \operatorname{Argmax}_x \Big\{ \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{y_i^*} : (x, y^*) \in \Omega \Big\}. \]
Then $(x^*, y^*)$ is an optimal solution of (10.3), i.e. $x^*$ is an optimal solution of the original problem (10.1).

For formulation (10.4) of the problem we propose a simplicial Branch-and-Bound algorithm which is described next.

10.2 The Algorithm

In order to apply the Branch-and-Bound scheme of Chapter 7 to our problem (10.4), we need to find a starting simplex $S_1$ containing $Y$. Such a simplex is, e.g.,
\[ S_1 = \Big\{ y \in \mathbb{R}^p : y_i \ge \delta\ (i = 1, \dots, p),\ \sum_{i=1}^p y_i \le M \Big\}, \]
where $M = \sum_{i=1}^p M_i$.

The upper bounding procedure is described in the following section. The main idea there is to find a function which overestimates $F(y)$ on the current partition set. Calculation of an upper bound then leads to the solution of an ordinary LP. Feasible points are a by-product of the solution of this linear program, as described in Section 10.2.2.

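10.2.1 Upper Bounds

Let $S = \operatorname{conv}\{v^0, v^1, \dots, v^p\} \subseteq S_1$ be a simplex generated by the algorithm. We construct a convex function $\hat F_S(y)$ which overestimates $F(y)$ on $S$: Define the set
\[ S^n := \{ x \in \mathbb{R}^n : \exists\, y \in S \text{ such that } (x, y) \in \Omega \}. \tag{10.5} \]
$S^n$ is the projection of the set $\Omega \cap (\mathbb{R}^n \times S)$ onto $\mathbb{R}^n$. Next, define the function $\hat F_S(y) : S \to \mathbb{R}$ as
\[ \hat F_S(y) := \max_x \Big\{ \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{y_i} : x \in S^n \Big\}. \]
$\hat F_S$ has the following properties:

Lemma 10.2.1

(i) $\hat F_S(y)$ is convex.

(ii) $F(y) \le \hat F_S(y)$ for $y \in S$.

Proof. (i) For every fixed $x \in S^n$, clearly $\varphi(y) := \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{y_i}$ is a convex function, since $y_i \ge \delta > 0$ and $\langle x, c_i \rangle + \alpha_i \ge 0$ on $P$ $(i = 1, \dots, p)$. Therefore, the function $\hat F_S(y)$, being the pointwise maximum of a family of convex functions, is also convex.

(ii) Let $y \in S$. Then
\[ \begin{aligned} F(y) &= \max_x \Big\{ \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{y_i} : (x, y) \in \Omega \Big\} \\ &\le \max_x \Big\{ \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{y_i} : (x, y) \in S^n \times S \Big\} \\ &= \max_x \Big\{ \sum_{i=1}^p \omega_i \frac{\langle x, c_i \rangle + \alpha_i}{y_i} : x \in S^n \Big\} = \hat F_S(y). \end{aligned} \]
$\square$

Let $\Delta_{p+1}$ denote the standard simplex in $\mathbb{R}^{p+1}$, i.e.
\[ \Delta_{p+1} := \Big\{ \lambda \in \mathbb{R}^{p+1} : \lambda_i \ge 0\ (i = 0, \dots, p),\ \sum_{i=0}^p \lambda_i = 1 \Big\}. \]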

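Next, for a given simplex $S = \operatorname{conv}\{v^0, v^1, \dots, v^p\}$, consider the following linear program (in the variables $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_p) \in \mathbb{R}^{p+1}$ and $x \in \mathbb{R}^n$):
\[ \begin{array}{ll} \max & \displaystyle \sum_{j=0}^p \hat F_S(v^j) \lambda_j \\ \text{s.t.} & \displaystyle \langle x, d_i \rangle - \sum_{j=0}^p v_i^j \lambda_j \le -\beta_i, \quad i = 1, \dots, p, \\ & x \in P, \quad \lambda \in \Delta_{p+1}. \end{array} \tag{10.6} \]
Our method for computing upper bounds is based on the following result: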

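Theorem 10.2.1 Let $\mu(S)$ denote the optimal value of the linear problem (10.6). Then $\mu(S)$ is an upper bound for $F(y)$ on $S \cap Y$.

Proof. Every $y \in S$ is uniquely representable as
\[ y = \sum_{j=0}^p \lambda_j v^j, \quad \lambda \in \Delta_{p+1}, \]
and since $\hat F_S(y)$ is convex, we have
\[ \hat F_S(y) = \hat F_S \Big( \sum_{j=0}^p \lambda_j v^j \Big) \le \sum_{j=0}^p \hat F_S(v^j) \lambda_j. \]
Therefore, using Lemma 10.2.1, we obtain
\[ \begin{aligned} \max \{ F(y) : y \in S \cap Y \} &\le \max_y \big\{ \hat F_S(y) : y \in S \cap Y \big\} \\ &= \max_{\lambda, x} \Big\{ \hat F_S \Big( \sum_{j=0}^p \lambda_j v^j \Big) : x \in P,\ \lambda \in \Delta_{p+1},\ \langle x, d_i \rangle - \sum_{j=0}^p v_i^j \lambda_j \le -\beta_i\ (i = 1, \dots, p) \Big\} \end{aligned} \]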

8p