English Pages 354 [364] Year 2020
Springer Optimization and Its Applications 155
Alexander J. Zaslavski
Convex Optimization with Computational Errors
Springer Optimization and Its Applications, Volume 155

Series Editors: Panos M. Pardalos, University of Florida; My T. Thai, University of Florida

Honorary Editor: Ding-Zhu Du, University of Texas at Dallas

Advisory Editors: J. Birge, University of Chicago; S. Butenko, Texas A&M; F. Giannessi, University of Pisa; S. Rebennack, Karlsruhe Institute of Technology; T. Terlaky, Lehigh University; Y. Ye, Stanford University

Aims and Scope

Optimization has continued to expand in all directions at an astonishing rate. New algorithmic and theoretical techniques are continually developing and the diffusion into other disciplines is proceeding at a rapid pace, with a spotlight on machine learning, artificial intelligence, and quantum computing. Our knowledge of all aspects of the field has grown even more profound. At the same time, one of the most striking trends in optimization is the constantly increasing emphasis on the interdisciplinary nature of the field. Optimization has been a basic tool in areas not limited to applied mathematics, engineering, medicine, economics, computer science, operations research, and other sciences.

The series Springer Optimization and Its Applications (SOIA) aims to publish state-of-the-art expository works (monographs, contributed volumes, textbooks, handbooks) that focus on theory, methods, and applications of optimization. Topics covered include, but are not limited to, nonlinear optimization, combinatorial optimization, continuous optimization, stochastic optimization, Bayesian optimization, optimal control, discrete optimization, multi-objective optimization, and more. New to the series portfolio are works at the intersection of optimization and machine learning, artificial intelligence, and quantum computing.

Volumes from this series are indexed by Web of Science, zbMATH, Mathematical Reviews, and SCOPUS.

More information about this series at http://www.springer.com/series/7393
Alexander J. Zaslavski Department of Mathematics Amado Building Israel Institute of Technology Haifa, Israel
ISSN 1931-6828 ISSN 1931-6836 (electronic) Springer Optimization and Its Applications ISBN 978-3-030-37821-9 ISBN 978-3-030-37822-6 (eBook) https://doi.org/10.1007/978-3-030-37822-6 Mathematics Subject Classification: 49M37, 65K05, 90C25, 90C26, 90C30 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The book is devoted to the study of approximate solutions of optimization problems in the presence of computational errors. It contains a number of results on the convergence behavior of algorithms in a Hilbert space, which are known as important tools for solving optimization problems. The research presented in the book is a continuation and further development of our book Numerical Optimization with Computational Errors, Springer 2016 [92]. In that book as well as in this new one, we study algorithms taking into account computational errors, which are always present in practice. In this case convergence to a solution does not take place. We show that our algorithms generate a good approximate solution if the computational errors are bounded from above by a small positive constant. Clearly, in practice it is sufficient to find a good approximate solution instead of constructing a minimizing sequence. On the other hand, in practice computations induce numerical errors, and if one uses methods in order to solve minimization problems, these methods usually provide only approximate solutions of the problems. Our main goal is, for a known computational error, to find out what approximate solution can be obtained and how many iterates one needs for this. The main difference between this new book and the previous one is that here we take into consideration the fact that, for every algorithm, each iteration consists of several steps and that the computational errors for different steps are different, in general. This fact, which was not taken into account in our previous book, is indeed important in practice. For example, the subgradient projection algorithm consists of two steps. The first step is a calculation of a subgradient of the objective function while in the second one we calculate a projection on the feasible set. In each of these two steps there is a computational error, and these two computational errors are different in general.
It may happen that the feasible set is simple and the objective function is complicated. As a result, the computational error made when one calculates the projection is essentially smaller than the computational error of the calculation of the subgradient. Clearly, the opposite case is possible too. Another feature of this book is that here we study a number of important algorithms which appeared recently in the literature and which are not discussed in the previous book.
The monograph contains 12 chapters. Chapter 1 is an introduction. In Chap. 2 we study the subgradient projection algorithm for minimization of convex and nonsmooth functions. We begin with minimization problems on bounded sets and generalize Theorem 2.4 of [92] proved in the case when the computational errors for the two steps of an iteration are the same. We also consider minimization problems on unbounded sets and generalize Theorem 2.8 of [92] proved in the case when the computational errors for the two steps of an iteration are the same and prove two results which have no prototype in [92]. Finally, in Chap. 2 we study the subgradient projection algorithm for zero-sum games with two players. For this algorithm each iteration consists of four steps. In each of these steps there is a computational error. We suppose that these computational errors are different and prove two results. In the first theorem, which is a generalization of Theorem 2.11 of [92] obtained in the case when all the computational errors are the same, we study the games on bounded sets. In the second result, which has no prototype in [92], we deal with the games on unbounded sets. In Chap. 3 we analyze the mirror descent algorithm for minimization of convex and nonsmooth functions, under the presence of computational errors. The problem is described by an objective function and a set of feasible points. For this algorithm each iteration consists of two steps. The first step is a calculation of a subgradient of the objective function while in the second one we solve an auxiliary minimization problem on the set of feasible points. In each of these two steps there is a computational error. In general, these two computational errors are different. We begin with minimization problems on bounded sets and generalize Theorem 3.1 of [92] proved in the case when the computational errors for the two steps of an iteration are the same. 
We also consider minimization problems on unbounded sets and generalize Theorem 3.3 of [92] proved in the case when the computational errors for the two steps of an iteration are the same and prove two results which have no prototype in [92]. Finally, in Chap. 3 we study the mirror descent algorithm for zero-sum games with two players. For this algorithm each iteration consists of four steps. In each of these steps there is a computational error. We suppose that these computational errors are different and prove two theorems. In the first result, which is a generalization of Theorem 3.4 of [92] obtained in the case when all the computational errors are the same, we study the games on bounded sets. In the second result, which has no prototype in [92], we deal with the games on unbounded sets. In Chap. 4 we analyze the projected gradient algorithm with a smooth objective function under the presence of computational errors. For this algorithm each iteration also consists of two steps. The first step is a calculation of a gradient of the objective function while in the second one we calculate a projection on the feasible set. In each of these two steps there is a computational error. Our first result in this chapter, for minimization problems on bounded sets, is a generalization of Theorem 4.2 of [92] proved in the case when the computational errors for the two steps of an iteration are the same. We also consider minimization problems on unbounded sets and generalize Theorem 4.5 of [92] proved in the case when the computational errors for the two steps of an iteration are the same and prove
two results which have no prototype in [92]. In Chap. 5 we consider an algorithm which is an extension of the projected gradient algorithm used for solving linear inverse problems arising in signal/image processing. This algorithm is used for minimization of the sum of two given convex functions, and each of its iterations consists of two steps. In each of these two steps there is a computational error. These two computational errors are different. We generalize Theorem 5.1 of [92], obtained in the case when the computational errors for the two steps of an iteration are the same, and prove two results which have no prototype in [92]. In Chap. 6 we study the continuous subgradient method and the continuous subgradient projection algorithm for minimization of convex nonsmooth functions and for computing the saddle points of convex–concave functions, under the presence of computational errors. For these algorithms each iteration consists of a few calculations, and for each of these calculations there is a computational error produced by our computer system. In general, these computational errors are different. None of the results of this chapter has a prototype in [92]. In Chaps. 7–12 we analyze several algorithms under the presence of computational errors which were not considered in [92]. Again, each step of an iteration has a computational error, and we take into account that these errors are, in general, different. An optimization problem with a composite objective function is studied in Chap. 7. A zero-sum game with two players is considered in Chap. 8. A predicted decrease approximation-based method is used in Chap. 9 for constrained convex optimization. Chapter 10 is devoted to minimization of quasiconvex functions. Minimization of sharp weakly convex functions is discussed in Chap. 11. Chapter 12 is devoted to a generalized projected subgradient method for minimization of a convex function over a set which is not necessarily convex.
The author believes that this book will be useful for researchers interested in optimization theory and its applications.

Rishon LeZion, Israel
May 19, 2019
Alexander J. Zaslavski
Contents

1 Introduction
  1.1 Subgradient Projection Method
  1.2 The Mirror Descent Method
  1.3 Gradient Algorithm with a Smooth Objective Function
  1.4 Examples

2 Subgradient Projection Algorithm
  2.1 Preliminaries
  2.2 A Convex Minimization Problem
  2.3 The Main Lemma
  2.4 Proof of Theorem 2.4
  2.5 Subgradient Algorithm on Unbounded Sets
  2.6 Proof of Theorem 2.8
  2.7 Proof of Theorem 2.9
  2.8 Proof of Theorem 2.12
  2.9 Zero-Sum Games with Two Players
  2.10 Proof of Proposition 2.13
  2.11 Zero-Sum Games on Bounded Sets
  2.12 Zero-Sum Games on Unbounded Sets
  2.13 Proof of Theorem 2.16
  2.14 An Example for Theorem 2.16

3 The Mirror Descent Algorithm
  3.1 Optimization on Bounded Sets
  3.2 The Main Lemma
  3.3 Proof of Theorem 3.1
  3.4 Optimization on Unbounded Sets
  3.5 Proof of Theorem 3.3
  3.6 Proof of Theorem 3.4
  3.7 Proof of Theorem 3.5
  3.8 Zero-Sum Games on Bounded Sets
  3.9 Zero-Sum Games on Unbounded Sets

4 Gradient Algorithm with a Smooth Objective Function
  4.1 Optimization on Bounded Sets
  4.2 Auxiliary Results
  4.3 The Main Lemma
  4.4 Proof of Theorem 4.1
  4.5 Optimization on Unbounded Sets

5 An Extension of the Gradient Algorithm
  5.1 Preliminaries and the Main Result
  5.2 Auxiliary Results
  5.3 Proof of Theorem 5.1
  5.4 The First Extension of Theorem 5.1
  5.5 The Second Extension of Theorem 5.1

6 Continuous Subgradient Method
  6.1 Bochner Integrable Functions
  6.2 Convergence Analysis for Continuous Subgradient Method
  6.3 An Auxiliary Result
  6.4 Proof of Theorem 6.1
  6.5 Continuous Subgradient Method for Zero-Sum Games
  6.6 An Auxiliary Result
  6.7 Proof of Theorem 6.5
  6.8 Continuous Subgradient Projection Method
  6.9 An Auxiliary Result
  6.10 Proof of Theorem 6.7
  6.11 Continuous Subgradient Projection Method on Unbounded Sets
  6.12 An Auxiliary Result
  6.13 The Convergence Result
  6.14 Subgradient Projection Algorithm for Zero-Sum Games
  6.15 An Auxiliary Result
  6.16 A Convergence Result for Games on Bounded Sets
  6.17 A Convergence Result for Games on Unbounded Sets

7 An Optimization Problems with a Composite Objective Function
  7.1 Preliminaries
  7.2 The Algorithm and Main Results
  7.3 Auxiliary Results
  7.4 Proof of Theorem 7.4
  7.5 Proof of Theorem 7.5

8 A Zero-Sum Game with Two Players
  8.1 The Algorithm and the Main Result
  8.2 Auxiliary Results
  8.3 Proof of Theorem 8.1

9 PDA-Based Method for Convex Optimization
  9.1 Preliminaries and the Main Result
  9.2 Auxiliary Results
  9.3 Proof of Theorem 9.2 and Examples

10 Minimization of Quasiconvex Functions
  10.1 Preliminaries
  10.2 An Auxiliary Result
  10.3 The Main Result

11 Minimization of Sharp Weakly Convex Functions
  11.1 Preliminaries
  11.2 The Subdifferential of Weakly Convex Functions
  11.3 An Auxiliary Result
  11.4 The First Main Result
  11.5 An Algorithm with Constant Step Sizes
  11.6 An Auxiliary Result
  11.7 The Second Main Result
  11.8 Convex Problems
  11.9 An Auxiliary Result
  11.10 Proof of Theorem 11.7

12 A Projected Subgradient Method for Nonsmooth Problems
  12.1 Preliminaries and Main Results
  12.2 Auxiliary Results
  12.3 An Auxiliary Result with Assumption A2
  12.4 An Auxiliary Result with Assumption A3
  12.5 Proof of Theorem 12.1
  12.6 Proof of Theorem 12.2

References
Index
Chapter 1
Introduction
In this book we study the behavior of algorithms for constrained convex minimization problems in a Hilbert space. Our goal is to obtain a good approximate solution of the problem in the presence of computational errors. It is known that the algorithm generates a good approximate solution if the sequence of computational errors is bounded from above by a small constant. In our study, presented in this book, we take into consideration the fact that, for every algorithm, each iteration consists of several steps and that the computational errors for different steps are different, in general. In this chapter we discuss several algorithms which are studied in this book.
1.1 Subgradient Projection Method

In Chap. 2 we study the subgradient projection algorithm for minimization of convex and nonsmooth functions and for computing the saddle points of convex–concave functions, under the presence of computational errors. It should be mentioned that the subgradient projection algorithm is one of the most important tools in the optimization theory, nonlinear analysis, and their applications. See, for example, [56, 58, 64, 65, 67–73, 76, 79, 80, 83, 85, 89–92, 94] and the references mentioned therein. The problem is described by an objective function and a set of feasible points. For this algorithm each iteration consists of two steps. The first step is the calculation of a subgradient of the objective function while in the second one we calculate the projection on the feasible set. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.
We use the subgradient projection algorithm for constrained minimization problems in Hilbert spaces equipped with an inner product denoted by $\langle \cdot, \cdot \rangle$ which induces a complete norm $\|\cdot\|$.

In this book we use the following notation. For every $z \in R^1$ denote by $\lfloor z \rfloor$ the largest integer which does not exceed $z$:
$$\lfloor z \rfloor = \max\{i \in R^1 : i \text{ is an integer and } i \le z\}.$$
For every nonempty set $D$, every function $f : D \to R^1$, and every nonempty set $C \subset D$ we set
$$\inf(f, C) = \inf\{f(x) : x \in C\}$$
and
$$\operatorname{argmin}(f, C) = \operatorname{argmin}\{f(x) : x \in C\} = \{x \in C : f(x) = \inf(f, C)\}.$$

Let $X$ be a Hilbert space equipped with an inner product denoted by $\langle \cdot, \cdot \rangle$ which induces a complete norm $\|\cdot\|$. For each $x \in X$ and each $r > 0$ set
$$B_X(x, r) = \{y \in X : \|x - y\| \le r\}$$
and for each $x \in X$ and each nonempty set $E \subset X$ set
$$d(x, E) = \inf\{\|x - y\| : y \in E\}.$$
For each nonempty open convex set $U \subset X$ and each convex function $f : U \to R^1$, for every $x \in U$ set
$$\partial f(x) = \{l \in X : f(y) - f(x) \ge \langle l, y - x \rangle \text{ for all } y \in U\},$$
which is called the subdifferential of the function $f$ at the point $x$ [61, 62, 81].

Let $C$ be a nonempty closed convex subset of $X$, $U$ be an open convex subset of $X$ such that $C \subset U$, and $f : U \to R^1$ be a convex function. Suppose that there exist $L > 0$, $M_0 > 0$ such that
$$C \subset B_X(0, M_0),$$
$$|f(x) - f(y)| \le L\|x - y\| \text{ for all } x, y \in U.$$
It is not difficult to see that for each $x \in U$,
$$\emptyset \ne \partial f(x) \subset B_X(0, L).$$
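The notation above can be made concrete in a finite-dimensional setting. The following is a minimal sketch, assuming $X = R^n$ with the Euclidean inner product and taking $f(x) = \|x\|_1$ as an illustrative convex function; the function names are hypothetical and are not taken from the book. It computes $d(x, E)$ for a closed ball $E$ and checks the subgradient inequality on a finite sample of points.

```python
import math

def norm(v):
    """Euclidean norm induced by the standard inner product on R^n."""
    return math.sqrt(sum(c * c for c in v))

def dist_to_ball(x, center, r):
    """d(x, E) for E = B_X(center, r): zero inside the ball, otherwise the excess over r."""
    return max(norm([a - b for a, b in zip(x, center)]) - r, 0.0)

def l1(x):
    """Illustrative convex, nonsmooth function f(x) = ||x||_1 (an assumption, not from the text)."""
    return sum(abs(c) for c in x)

def l1_subgradient(x):
    """One element of the subdifferential of the l1 norm: the sign vector (0 at kinks)."""
    return [(c > 0) - (c < 0) for c in x]

def satisfies_subgradient_inequality(f, l, x, points, tol=1e-12):
    """Check f(y) - f(x) >= <l, y - x> for every y in a finite sample."""
    fx = f(x)
    return all(
        f(y) - fx >= sum(li * (yi - xi) for li, yi, xi in zip(l, y, x)) - tol
        for y in points
    )

x = [1.5, -2.0, 0.0]
l = l1_subgradient(x)
sample = [[0.0, 0.0, 0.0], [2.0, 1.0, -1.0], [-3.0, -2.5, 0.5], [1.5, -2.0, 4.0]]
print(dist_to_ball([3.0, 4.0], [0.0, 0.0], 1.0))
print(l, satisfies_subgradient_inequality(l1, l, x, sample))
print(norm(l) <= math.sqrt(len(x)))  # the l1 norm is Lipschitz with L = sqrt(n)
```

The last line illustrates the inclusion $\partial f(x) \subset B_X(0, L)$ for this particular $f$, whose Lipschitz constant with respect to the Euclidean norm is $\sqrt{n}$.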
For every nonempty closed convex set $D \subset X$ and every $x \in X$ there is a unique point $P_D(x) \in D$ satisfying
$$\|x - P_D(x)\| = \inf\{\|x - y\| : y \in D\}.$$
We consider the minimization problem
$$f(z) \to \min, \quad z \in C.$$
Suppose that $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$. Let us describe our algorithm.

Subgradient projection algorithm

Initialization: select an arbitrary $x_0 \in U$.

Iterative Step: given a current iteration vector $x_t \in U$ calculate
$$\xi_t \in \partial f(x_t)$$
and the next iteration vector $x_{t+1} = P_C(x_t - a_t \xi_t)$.

In [92] we study this algorithm under the presence of computational errors. Namely, in [92] we suppose that $\delta \in (0, 1]$ is a computational error produced by our computer system, and study the following algorithm.

Subgradient projection algorithm with computational errors

Initialization: select an arbitrary $x_0 \in U$.

Iterative Step: given a current iteration vector $x_t \in U$ calculate
$$\xi_t \in \partial f(x_t) + B_X(0, \delta)$$
and the next iteration vector $x_{t+1} \in U$ such that
$$\|x_{t+1} - P_C(x_t - a_t \xi_t)\| \le \delta.$$

In Chap. 2 of this book we consider a more complicated, but more realistic, version of this algorithm. Clearly, for the algorithm each iteration consists of two steps. The first step is a calculation of a subgradient of the objective function $f$ while in the second one we calculate a projection on the set $C$. In each of these two steps there is a computational error produced by our computer system. In general, these two computational errors are different. This fact is taken into account in the following projection algorithm studied in Chap. 2 of this book.

Suppose that $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$ and $\delta_f, \delta_C \in (0, 1]$.

Initialization: select an arbitrary $x_0 \in U$.

Iterative Step: given a current iteration vector $x_t \in U$ calculate
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f)$$
and the next iteration vector $x_{t+1} \in U$ such that
$$\|x_{t+1} - P_C(x_t - a_t \xi_t)\| \le \delta_C.$$

Note that in practice for some problems the set $C$ is simple but the function $f$ is complicated. In this case $\delta_C$ is essentially smaller than $\delta_f$. On the other hand, there are cases when $f$ is simple but the set $C$ is complicated and therefore $\delta_f$ is much smaller than $\delta_C$.

In Chap. 2 we prove the following result (see Theorem 2.4).

Theorem 1.1 Let $\delta_f, \delta_C \in (0, 1]$, $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$ and let $x_* \in C$ satisfy
$$f(x_*) \le f(x) \text{ for all } x \in C.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset U$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M_0 + 1$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f)$$
and
$$\|x_{t+1} - P_C(x_t - a_t \xi_t)\| \le \delta_C.$$
Then for each natural number $T$,
$$\sum_{t=0}^{T} a_t (f(x_t) - f(x_*)) \le 2^{-1}\|x_* - x_0\|^2 + \delta_C (T + 1)(4M_0 + 1) + \delta_f (2M_0 + 1) \sum_{t=0}^{T} a_t + 2^{-1}(L + 1)^2 \sum_{t=0}^{T} a_t^2.$$
Moreover, for each natural number $T$,
$$f\Big(\Big(\sum_{t=0}^{T} a_t\Big)^{-1} \sum_{t=0}^{T} a_t x_t\Big) - f(x_*), \quad \min\{f(x_t) : t = 0, \dots, T\} - f(x_*)$$
$$\le 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1} \|x_* - x_0\|^2 + \Big(\sum_{t=0}^{T} a_t\Big)^{-1} \delta_C (T + 1)(4M_0 + 1) + \delta_f (2M_0 + 1) + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1} (L + 1)^2 \sum_{t=0}^{T} a_t^2.$$
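The two-step iteration with separate errors $\delta_f$ and $\delta_C$, together with the first inequality of Theorem 1.1, can be illustrated numerically. The following is a minimal sketch under assumptions that are not from the book: $X = R^2$, $C$ is the closed unit ball (so $M_0 = 1$), $f(x) = |x_1 - 0.5| + |x_2|$ (so $L = \sqrt{2}$ and $x_* = (0.5, 0)$), and the computational errors are modelled as random vectors of norm below $\delta_f$ and $\delta_C$. All function names are hypothetical.

```python
import math
import random

M0 = 1.0                 # C = closed ball B(0, M0) in R^2 (illustrative choice)
L = math.sqrt(2.0)       # Lipschitz constant of f in the Euclidean norm
X_STAR = (0.5, 0.0)      # minimizer of f over C, with f(X_STAR) = 0

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def f(x):
    return abs(x[0] - 0.5) + abs(x[1])

def subgradient(x):
    """An exact subgradient of f at x (sign vector, 0 at kinks)."""
    sign = lambda s: (s > 0) - (s < 0)
    return [sign(x[0] - 0.5), sign(x[1])]

def project_onto_C(x):
    """Exact projection P_C onto the ball B(0, M0)."""
    n = norm(x)
    return list(x) if n <= M0 else [M0 * c / n for c in x]

def error_vector(rng, radius):
    """Random vector of norm below radius, modelling one computational error."""
    d = [rng.gauss(0.0, 1.0) for _ in range(2)]
    scale = radius * rng.random() / (norm(d) or 1.0)
    return [scale * c for c in d]

def run(T, a, delta_f, delta_C, x0=(-1.0, 1.0), seed=0):
    """Generate x_0, ..., x_T with constant step a and errors delta_f, delta_C."""
    rng = random.Random(seed)
    xs, x = [], list(x0)
    for _ in range(T + 1):
        xs.append(x)
        # Step 1: inexact subgradient, xi_t in df(x_t) + B(0, delta_f).
        xi = [g + e for g, e in zip(subgradient(x), error_vector(rng, delta_f))]
        # Step 2: inexact projection, ||x_{t+1} - P_C(x_t - a xi_t)|| <= delta_C.
        p = project_onto_C([xc - a * gc for xc, gc in zip(x, xi)])
        x = [pc + e for pc, e in zip(p, error_vector(rng, delta_C))]
    return xs

def theorem_1_1_sides(xs, a, delta_f, delta_C):
    """Both sides of the first inequality of Theorem 1.1 for a_t = a."""
    T = len(xs) - 1
    lhs = sum(a * (f(x) - f(X_STAR)) for x in xs)
    dist2 = sum((s - c) ** 2 for s, c in zip(X_STAR, xs[0]))
    rhs = (0.5 * dist2 + delta_C * (T + 1) * (4 * M0 + 1)
           + delta_f * (2 * M0 + 1) * (T + 1) * a
           + 0.5 * (L + 1) ** 2 * (T + 1) * a * a)
    return lhs, rhs

delta_f, delta_C, T = 1e-2, 1e-3, 200
a = math.sqrt(2 * delta_C * (4 * M0 + 1)) / (L + 1)
xs = run(T, a, delta_f, delta_C)
lhs, rhs = theorem_1_1_sides(xs, a, delta_f, delta_C)
print(lhs <= rhs, min(f(x) for x in xs))
```

The sketch only checks the inequality on one simulated run; it is an illustration of the statement, not a substitute for the proof given in Sect. 2.4.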
Theorem 1.1 is proved in Sect. 2.4. It is a generalization of Theorem 2.4 of [92] proved in the case when $\delta_f = \delta_C$.

We are interested in an optimal choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and $A_T = \sum_{t=0}^{T} a_t$ be given. It is shown in Chap. 2 that the best choice is $a_t = (T + 1)^{-1} A_T$, $t = 0, \dots, T$. Let $T$ be a natural number and $a_t = a > 0$, $t = 0, \dots, T$. It is shown in Chap. 2 that the best choice of $a$ is
$$a = (2\delta_C (4M_0 + 1))^{1/2} (L + 1)^{-1}.$$
Now we can think about the best choice of $T$. It is not difficult to see that it should be of the same order as $\delta_C^{-1}$.

In Chap. 2 we also study the subgradient algorithm for minimization problems on unbounded sets. Let $D$ be a nonempty closed convex subset of $X$, $V$ be an open convex subset of $X$ such that
$$D \subset V,$$
and $f : V \to R^1$ be a convex function which is Lipschitz on all bounded subsets of $V$. Set
$$D_{\min} = \{x \in D : f(x) \le f(y) \text{ for all } y \in D\}.$$
We suppose that $D_{\min} \ne \emptyset$. In Chap. 2 we will prove the following result.

Theorem 1.2 Let $\delta_f, \delta_C \in (0, 1]$, $M > 0$ satisfy
$$D_{\min} \cap B_X(0, M) \ne \emptyset,$$
$$M_0 \ge 4M + 4,$$
$L > 0$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0 + 2),$$
$$0 < \tau_0 \le \tau_1 \le (L + 1)^{-1},$$
$$\epsilon_0 = 2\tau_0^{-1} \delta_C (4M_0 + 1) + 2\delta_f (2M_0 + 1) + 2\tau_1 (L + 1)^2$$
and let
$$n_0 = \lfloor \tau_0^{-1} (2M + 2)^2 \epsilon_0^{-1} \rfloor.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\{a_t\}_{t=0}^{\infty} \subset [\tau_0, \tau_1],$$
$$\|x_0\| \le M$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f)$$
and
$$\|x_{t+1} - P_D(x_t - a_t \xi_t)\| \le \delta_C.$$
Then there exists an integer $q \in [1, n_0 + 1]$ such that
$$\|x_i\| \le 3M + 2, \quad i = 0, \dots, q$$
and
$$f(x_q) \le f(x) + \epsilon_0 \text{ for all } x \in D.$$

Theorem 1.2 is a generalization of Theorem 2.8 of [92] which was proved in the case when $\delta_C = \delta_f$. It is proved in Sect. 2.6.

We are interested in the best choice of $a_t$, $t = 0, 1, \dots$. Assume for simplicity that $\tau_1 = \tau_0 = \tau$. In order to meet our goal we need to minimize the function
$$2\tau^{-1} \delta_C (4M_0 + 1) + 2(L + 1)^2 \tau, \quad \tau \in (0, \infty).$$
This function has a minimizer
$$\tau = (\delta_C (4M_0 + 1))^{1/2} (L + 1)^{-1},$$
the minimal value of $\epsilon_0$ is
$$2\delta_f (2M_0 + 1) + 4(\delta_C (4M_0 + 1))^{1/2} (L + 1),$$
and $n_0 = \lfloor \Delta \rfloor$ where
$$\Delta = (2M + 2)^2 (4M_0 + 1)^{-1/2} (L + 1) \delta_C^{-1/2} \big(2\delta_f (2M_0 + 1) + 4\delta_C^{1/2} (4M_0 + 1)^{1/2} (L + 1)\big)^{-1}.$$
Note that in the theorem above $\delta_f, \delta_C$ are the computational errors produced by our computer system. In view of the inequality above, in order to obtain a good approximate solution we need
$$\lfloor c_1 \delta_C^{-1/2} \max\{\delta_f, \delta_C^{1/2}\}^{-1} \rfloor + 1$$
iterations, where $c_1$ is a constant which depends only on $M$, $M_0$, $L$. As a result, we obtain a point $\xi \in V$ such that
$$B_X(\xi, \delta_C) \cap D \ne \emptyset$$
and
$$f(\xi) \le \inf\{f(x) : x \in D\} + 2\delta_f (2M_0 + 1) + (4M_0 + 1)^{1/2}\, 4(L + 1) \delta_C^{1/2}.$$

The next result, which is proved in Sect. 2.7, does not have a prototype in [92].

Theorem 1.3 Let $\delta_f, \delta_C \in (0, 1)$, $M > 1$, $L > 0$ and let $\{a_t\}_{t=0}^{\infty} \subset (0, \infty)$ be such that
$$\{x \in V : f(x) \le \inf(f, D) + 3\} \subset B_X(0, M - 1),$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M + 8),$$
$$\delta_f \le (6M + 5)^{-1}$$
and
$$\delta_C (12M + 9) \le a_t \le (L + 1)^{-2} \text{ for all integers } t \ge 0.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M, \quad B_X(x_0, \delta_C) \cap D \ne \emptyset$$
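and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f) \text{ and } \|x_{t+1} - P_D(x_t - a_t\xi_t)\| \le \delta_C.$$
Then $\|x_t\| \le 3M$ for all integers $t \ge 0$ and for each natural number $T$,
$$\sum_{t=0}^{T} a_t(f(x_t) - \inf(f, D)) \le 2M^2 + \delta_C(T+1)(12M+9) + 2^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2 + \delta_f(6M+5)\sum_{t=0}^{T} a_t.$$
Moreover, for each natural number $T$,
$$\begin{aligned}
& f\Big(\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D) \\
& \quad \le 2M^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + \delta_C(T+1)(12M+9)\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + 2^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + \delta_f(6M+5).
\end{aligned}$$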
We are interested in an optimal choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and let $A_T = \sum_{t=0}^{T} a_t$ be given. It is shown in Chap. 2 that the best choice is $a_t = (T+1)^{-1}A_T$, $t = 0, \dots, T$. Let $T$ be a natural number and $a_t = a > 0$, $t = 0, \dots, T$. It is shown in Chap. 2 that the best choice of $a$ is
$$a = (2\delta_C(12M+9))^{1/2}(L+1)^{-1}.$$
Now we can think about the best choice of $T$. It is shown that it should be of the same order as $\delta_C^{-1}$.

The next result, which is proved in Sect. 2.8, does not have a prototype in [92].

Theorem 1.4 Let $\delta_f, \delta_C \in (0, 16^{-1})$, $M > 4$, $L > 0$,
$$D_{\min} \cap B_X(0, M) \neq \emptyset,$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 3M+6),$$
$$a = (L+1)^{-1}\delta_C^{1/2}$$
and
$$T = 6^{-1}\min\{\delta_C^{-1}, \delta_C^{-1/2}\delta_f^{-1}\} - 1.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M, \quad B_X(x_0, \delta_C) \cap D \neq \emptyset$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f) \text{ and } \|x_{t+1} - P_D(x_t - a_t\xi_t)\| \le \delta_C.$$
Then
$$\begin{aligned}
& \sum_{t=0}^{T}(T+1)^{-1}f(x_t) - \inf(f, D), \quad f\Big(\sum_{t=0}^{T}(T+1)^{-1}x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D) \\
& \quad \le 12(2M+2)^2(L+1)\max\{\delta_C^{1/2}, \delta_f\}.
\end{aligned}$$
In Chap. 2 we also study the subgradient projection algorithm for zero-sum games with two players. For this algorithm each iteration consists of four steps. In each of these steps there is a computational error. We suppose that these computational errors are different and prove two results: Theorems 2.15 and 2.16. In Theorem 2.15, which is a generalization of Theorem 2.11 of [92] obtained in the case when all the computational errors are the same, we study the games on bounded sets. In Theorem 2.16, which has no prototype in [92], we deal with the games on unbounded sets.
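The inexact iteration studied in this section, $\xi_t \in \partial f(x_t) + B_X(0, \delta_f)$ followed by $\|x_{t+1} - P_C(x_t - a_t\xi_t)\| \le \delta_C$, can be simulated numerically. The following Python sketch is an illustration only and is not taken from the book: the box $C = [0,1]^n$, the objective $f(x) = \sum_i |x_i - c_i|$, all function names, and the way the errors are injected (a random vector of norm exactly $\delta_f$ or $\delta_C$) are assumptions made for the example.

```python
import math
import random


def project_box(x):
    """Exact projection onto C = [0, 1]^n (coordinatewise clipping)."""
    return [min(max(xi, 0.0), 1.0) for xi in x]


def perturb(v, radius, rng):
    """Add an error vector of Euclidean norm `radius` to v (simulated inexactness)."""
    d = [rng.gauss(0.0, 1.0) for _ in v]
    norm = math.sqrt(sum(di * di for di in d))
    if norm == 0.0:  # degenerate draw: add no error
        return list(v)
    return [vi + radius * di / norm for vi, di in zip(v, d)]


def f(x, c):
    """Illustrative nonsmooth convex objective: l1 distance to c."""
    return sum(abs(xi - ci) for xi, ci in zip(x, c))


def subgradient(x, c):
    """One subgradient of f at x: sign(x_i - c_i), taking 0 at a kink."""
    return [float((xi > ci) - (xi < ci)) for xi, ci in zip(x, c)]


def inexact_subgradient_projection(c, a, T, delta_f, delta_C, seed=0):
    """Run t = 0, ..., T with constant step a; return min over t of f(x_t)."""
    rng = random.Random(seed)
    x = [0.0] * len(c)  # x_0 = 0 belongs to C
    best = float("inf")
    for _ in range(T + 1):
        best = min(best, f(x, c))
        xi = perturb(subgradient(x, c), delta_f, rng)   # inexact subgradient
        target = project_box([xj - a * gj for xj, gj in zip(x, xi)])
        x = perturb(target, delta_C, rng)               # inexact projection
    return best
```

When $c \in C$ the minimal value of this $f$ on $C$ is $0$, so the returned number is the optimality gap $\min_t f(x_t) - f(x_*)$. For this example one may take $M_0 = n^{1/2}$ and $L = n^{1/2}$ (the Lipschitz constant of the $l_1$ distance with respect to the Euclidean norm) when comparing the gap with the estimates of this section.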
1.2 The Mirror Descent Method

In Chap. 3 we analyze the mirror descent algorithm for minimization of convex and nonsmooth functions and for computing the saddle points of convex–concave functions, under the presence of computational errors. The problem is described by an objective function and a set of feasible points. For this algorithm each iteration consists of two steps. The first step is a calculation of a subgradient of the objective function while in the second one we solve an auxiliary minimization problem on the set of feasible points. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.

Let $X$ be a Hilbert space equipped with an inner product $\langle \cdot, \cdot \rangle$ which induces a complete norm $\|\cdot\|$. Let $C$ be a nonempty closed convex subset of $X$, $U$ be an open convex subset of $X$ such that $C \subset U$, and let $f : U \to R^1$ be a convex function. Suppose that there exist $L > 0$, $M_0 > 0$ such that
$$C \subset B_X(0, M_0),$$
$$|f(x) - f(y)| \le L\|x - y\| \text{ for all } x, y \in U.$$
We study the convergence of the mirror descent algorithm under the presence of computational errors. This method was introduced by Nemirovsky and Yudin for solving convex optimization problems [66]. Here we use a derivation of this algorithm proposed by Beck and Teboulle [17]. Let $\delta_f, \delta_C \in (0, 1]$ and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$. We describe the inexact version of the mirror descent algorithm.

Mirror descent algorithm

Initialization: select an arbitrary $x_0 \in U$.

Iterative Step: given a current iteration vector $x_t \in U$ calculate
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f),$$
define
$$g_t(x) = \langle \xi_t, x \rangle + (2a_t)^{-1}\|x - x_t\|^2, \quad x \in X$$
and calculate the next iteration vector $x_{t+1} \in U$ such that
$$B_X(x_{t+1}, \delta_C) \cap \mathrm{argmin}\{g_t(y) : y \in C\} \neq \emptyset.$$
Here $\delta_f$ is a computational error produced by our computer system when we calculate a subgradient of $f$ and $\delta_C$ is a computational error produced by our computer system when we calculate a minimizer of the function $g_t$. Note that $g_t$ is a convex function on $X$ which is bounded from below and possesses a minimizer on $C$.

In Sect. 3.3 we prove the following result.

Theorem 1.5 Let $\delta_f, \delta_C \in (0, 1]$, $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$ and let $x_* \in C$ satisfy
$$f(x_*) \le f(x) \text{ for all } x \in C.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset U$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M_0 + 1$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f),$$
$$g_t(z) = \langle \xi_t, z \rangle + (2a_t)^{-1}\|z - x_t\|^2, \quad z \in X$$
and
$$B_X(x_{t+1}, \delta_C) \cap \mathrm{argmin}(g_t, C) \neq \emptyset.$$
Then for each natural number $T$,
$$\begin{aligned}
\sum_{t=0}^{T} a_t(f(x_t) - f(x_*)) \le{} & 2^{-1}(2M_0+1)^2 + (\delta_f(2M_0+1) + \delta_C(L+1))\sum_{t=0}^{T} a_t \\
& + 4\delta_C(T+1)(2M_0+1) + 2^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2.
\end{aligned}$$
Moreover, for each natural number $T$,
$$\begin{aligned}
& f\Big(\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t x_t\Big) - f(x_*), \quad \min\{f(x_t) : t = 0, \dots, T\} - f(x_*) \\
& \quad \le 2^{-1}(2M_0+1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + \delta_f(2M_0+1) + \delta_C(L+1) \\
& \qquad + 4\delta_C(T+1)(2M_0+1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + 2^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}.
\end{aligned}$$
This theorem is a generalization of Theorem 3.1 of [92], which was proved in the case when $\delta_f = \delta_C$.

We are interested in an optimal choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and let $A_T = \sum_{t=0}^{T} a_t$ be given. It is shown that the best choice is $a_t = (T+1)^{-1}A_T$, $t = 0, \dots, T$. Let $T$ be a natural number and $a_t = a$, $t = 0, \dots, T$. It is shown that the best choice of $a > 0$ is
$$a = (8\delta_C(2M_0+1))^{1/2}(L+1)^{-1}$$
and that the best choice of $T$ should be of the same order as $\delta_C^{-1}$. As a result, we obtain a point $\xi \in U$ such that $B_X(\xi, \delta_C) \cap C \neq \emptyset$ and
$$f(\xi) \le f(x_*) + c_1\delta_C^{1/2} + \delta_f(2M_0+1),$$
where the constant $c_1 > 0$ depends only on $L$ and $M_0$.
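In the Hilbert space setting used here, the auxiliary minimization problem of the iterative step has a closed form. The following short computation is not part of the original text; it is the standard completion of the square, included as a sketch of how the mirror descent step with the quadratic function $g_t$ relates to a projection step. The symbol $P_C$ denotes the nearest point projection onto $C$.

```latex
g_t(x) = \langle \xi_t, x \rangle + (2a_t)^{-1}\|x - x_t\|^2
       = (2a_t)^{-1}\|x - (x_t - a_t\xi_t)\|^2
         + \langle \xi_t, x_t \rangle - 2^{-1}a_t\|\xi_t\|^2,
\qquad x \in X.
% The last two terms do not depend on x, hence
\mathrm{argmin}(g_t, C) = \{P_C(x_t - a_t\xi_t)\}.
```

Under this identification the requirement $B_X(x_{t+1}, \delta_C) \cap \mathrm{argmin}(g_t, C) \neq \emptyset$ reads $\|x_{t+1} - P_C(x_t - a_t\xi_t)\| \le \delta_C$.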
In Chap. 3 we also use the mirror descent algorithm for minimization problems on unbounded sets. Let $D$ be a nonempty closed convex subset of $X$, $V$ be an open convex subset of $X$ such that $D \subset V$, and $f : V \to R^1$ be a convex function which is Lipschitz on all bounded subsets of $V$. Set
$$D_{\min} = \{x \in D : f(x) \le f(y) \text{ for all } y \in D\}.$$
We suppose that $D_{\min} \neq \emptyset$. In Sect. 3.5 we will prove the following result.

Theorem 1.6 Let $\delta_f, \delta_C \in (0, 1]$, $M > 1$ satisfy
$$D_{\min} \cap B_X(0, M) \neq \emptyset,$$
$M_0 > 80M + 6$, $L > 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0+2),$$
$$0 < \tau_0 \le \tau_1 \le (4L+4)^{-1},$$
$$\epsilon_0 = 8\tau_0^{-1}\delta_C(M_0+1) + 2\delta_C(L+1) + 2\delta_f(2M_0+1) + \tau_1(L+1)^2$$
and let
$$n_0 = \tau_0^{-1}(2M+2)^2\epsilon_0^{-1} + 1.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\{a_t\}_{t=0}^{\infty} \subset [\tau_0, \tau_1]$,
$$\|x_0\| \le M$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f),$$
$$g_t(v) = \langle \xi_t, v \rangle + (2a_t)^{-1}\|v - x_t\|^2, \quad v \in X$$
and
$$B_X(x_{t+1}, \delta_C) \cap \mathrm{argmin}(g_t, D) \neq \emptyset.$$
Then there exists an integer $q \in [1, n_0]$ such that
$$f(x_q) \le \inf(f, D) + \epsilon_0, \quad \|x_t\| \le 15M + 1, \ t = 0, \dots, q.$$

This theorem is a generalization of Theorem 3.3 of [92], which was proved with $\delta_f = \delta_C$. We are interested in the best choice of $a_t$, $t = 0, 1, \dots$. Assume for simplicity that $\tau_1 = \tau_0$. In order to meet our goal we need to minimize $\epsilon_0$, which attains its minimal value when
$$\tau_0 = (8\delta_C(M_0+1))^{1/2}(L+1)^{-1},$$
and the minimal value of $\epsilon_0$ is
$$2\delta_C(L+1) + 2\delta_f(2M_0+1) + 2(8\delta_C(M_0+1))^{1/2}(L+1).$$
Thus $\epsilon_0$ is of the same order as $\max\{\delta_C^{1/2}, \delta_f\}$. It is not difficult to see that $n_0$ is of the same order as $\delta_C^{-1/2}\max\{\delta_C^{1/2}, \delta_f\}^{-1}$.

In Chap. 3 we will also prove the following two theorems which have no prototype in [92].

Theorem 1.7 Let $\delta_f, \delta_C \in (0, 1)$, $M > 0$ satisfy
$$\delta_f \le (24M+48)^{-1}, \quad \delta_C \le 8^{-1}(L+1)^{-2}(16M+16)^{-1},$$
$$\{x \in V : f(x) \le \inf(f, D) + 4\} \subset B_X(0, M),$$
$M_0 = 12M + 12$, $L > 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0+2)$$
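and let $\{a_t\}_{t=0}^{\infty} \subset (0, \infty)$ satisfy for all integers $t \ge 0$,
$$4\delta_C(M_0+1) \le a_t \le (2L+2)^{-1}(L+1)^{-1}.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f),$$
$$g_t(v) = \langle \xi_t, v \rangle + (2a_t)^{-1}\|v - x_t\|^2, \quad v \in X$$
and
$$B_X(x_{t+1}, \delta_C) \cap \mathrm{argmin}(g_t, D) \neq \emptyset.$$
Then $\|x_t\| \le 5M + 3$ for all integers $t \ge 0$ and for each natural number $T$,
$$\begin{aligned}
\sum_{t=0}^{T} a_t(f(x_t) - \inf(f, D)) \le{} & 2M^2 + (\delta_f(2M_0+1) + \delta_C(L+1))\sum_{t=0}^{T} a_t \\
& + 4\delta_C(T+1)(M_0+1) + 2^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2.
\end{aligned}$$
Moreover, for each natural number $T$,
$$\begin{aligned}
& f\Big(\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D) \\
& \quad \le 2M^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + \delta_f(2M_0+1) + \delta_C(L+1) \\
& \qquad + 4\delta_C(T+1)(M_0+1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + 2^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}.
\end{aligned}$$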
We are interested in the best choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and let $A_T = \sum_{t=0}^{T} a_t$ be given. It is shown that the best choice is $a_t = (T+1)^{-1}A_T$, $t = 0, \dots, T$. Let $T$ be a natural number and $a_t = a$, $t = 0, \dots, T$. The best choice of $a > 0$ is
$$a = (8\delta_C(M_0+1))^{1/2}(L+1)^{-1}$$
and the best choice of $T$ is of the same order as $\delta_C^{-1}$. As a result, we obtain a point $\xi \in V$ such that $B_X(\xi, \delta_C) \cap D \neq \emptyset$ and
$$f(\xi) \le \inf(f, D) + c_1\delta_C^{1/2} + \delta_f(2M_0+1),$$
where the constant $c_1 > 0$ depends only on $L$ and $M_0$.

Theorem 1.8 Let $\delta_f, \delta_C \in (0, 1)$, $M > 8$ satisfy
$$D_{\min} \cap B_X(0, M) \neq \emptyset,$$
$L > 0$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 8M+8),$$
$$a = (4L+4)^{-1}\delta_C^{1/2},$$
$$T = 8^{-1}\delta_C^{-1/2}\max\{\delta_f, \delta_C^{1/2}\}^{-1}.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M, \quad B_X(x_0, \delta_C) \cap D \neq \emptyset$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f),$$
$$g_t(v) = \langle \xi_t, v \rangle + (2a_t)^{-1}\|v - x_t\|^2, \quad v \in X$$
and
$$B_X(x_{t+1}, \delta_C) \cap \mathrm{argmin}(g_t, D) \neq \emptyset.$$
Then $\|x_t\| \le 3M + 2$ for all integers $t \in [0, T]$ and
$$\begin{aligned}
& \sum_{t=0}^{T}(T+1)^{-1}f(x_t) - \inf(f, D), \quad f\Big((T+1)^{-1}\sum_{t=0}^{T} x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D) \\
& \quad \le 64M^2(L+1)\max\{\delta_f, \delta_C^{1/2}\}.
\end{aligned}$$

In Chap. 3 we also study the mirror descent algorithm for zero-sum games with two players. For this algorithm each iteration consists of four steps. In each of these steps there is a computational error. We suppose that these computational errors are different and prove two results: Theorems 3.6 and 3.7. In Theorem 3.6, which is a generalization of Theorem 3.4 of [92] obtained in the case when all the computational errors are the same, we study the games on bounded sets. In Theorem 3.7, which has no prototype in [92], we deal with the games on unbounded sets.
1.3 Gradient Algorithm with a Smooth Objective Function

In Chap. 4 we analyze the convergence of a projected gradient algorithm with a smooth objective function under the presence of computational errors. The problem is described by an objective function and a set of feasible points. For this algorithm each iteration consists of two steps. The first step is a calculation of a gradient of the objective function while in the second one we calculate a projection on the feasible set. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.

Let $X$ be a Hilbert space equipped with an inner product $\langle \cdot, \cdot \rangle$ which induces a complete norm $\|\cdot\|$. Let $C$ be a nonempty closed convex subset of $X$, $U$ be an open convex subset of $X$ such that $C \subset U$, and let $f : U \to R^1$ be a convex continuous function. We suppose that the function $f$ is Fréchet differentiable at every point $x \in U$ and for every $x \in U$ we denote by $f'(x) \in X$ the Fréchet derivative of $f$ at $x$. It is clear that for any $x \in U$ and any $h \in X$,
$$\langle f'(x), h \rangle = \lim_{t \to 0} t^{-1}(f(x + th) - f(x)).$$
We suppose that the mapping $f' : U \to X$ is Lipschitz on all bounded subsets of $U$. It is well known (see Lemma 2.2) that for each nonempty closed convex set $D \subset X$ and each $x \in X$ there exists a unique point $P_D(x) \in D$ such that
$$\|x - P_D(x)\| = \inf\{\|x - y\| : y \in D\}.$$
Suppose that there exist $L > 1$, $M_0 > 0$ such that
$$C \subset B_X(0, M_0),$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in U,$$
$$\|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in U.$$
Let $\delta_f, \delta_C \in (0, 1]$. We describe below our algorithm.

Gradient algorithm

Initialization: select an arbitrary $x_0 \in U \cap B_X(0, M_0)$.

Iterative Step: given a current iteration vector $x_t \in U$ calculate
$$\xi_t \in f'(x_t) + B_X(0, \delta_f)$$
and calculate the next iteration vector $x_{t+1} \in U$ such that
$$\|x_{t+1} - P_C(x_t - L^{-1}\xi_t)\| \le \delta_C.$$
In Chap. 4 we prove the following result, which is a generalization of Theorem 4.2 of [92] proved in the case when $\delta_f = \delta_C$.

Theorem 1.9 Let $\delta_f, \delta_C \in (0, 1]$ and let $x_0 \in U \cap B_X(0, M_0)$. Assume that $\{x_t\}_{t=1}^{\infty} \subset U$, $\{\xi_t\}_{t=0}^{\infty} \subset X$ and that for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f$$
and
$$\|x_{t+1} - P_C(x_t - L^{-1}\xi_t)\| \le \delta_C.$$
Then for each natural number $T$,
$$\begin{aligned}
& \min\{f(x_t) : t = 2, \dots, T+1\} - \inf(f, C), \quad f\Big(\sum_{t=2}^{T+1} T^{-1}x_t\Big) - \inf(f, C) \\
& \quad \le (2T)^{-1}L(2M_0+1)^2 + L\delta_C(6M_0+7) + \delta_f(4M_0+4).
\end{aligned}$$

We are interested in an optimal choice of $T$. If we choose $T$ in order to minimize the right-hand side of the estimate in the theorem above, we obtain that $T$ should be of the same order as $\max\{\delta_f, \delta_C\}^{-1}$. In this case the right-hand side of the estimate is of the same order as $\max\{\delta_f, \delta_C\}$.

In Chap. 4 we also study optimization on unbounded sets. Let $D$ be a nonempty closed convex subset of $X$, $V$ be an open convex subset of $X$ such that $D \subset V$, and $f : V \to R^1$ be a convex Fréchet differentiable function which is Lipschitz on all bounded subsets of $V$. Set
$$D_{\min} = \{x \in D : f(x) \le f(y) \text{ for all } y \in D\}.$$
We suppose that $D_{\min} \neq \emptyset$. We will prove the following result, which is a generalization of Theorem 4.5 of [92] proved in the case when $\delta_f = \delta_C$.

Theorem 1.10 Let $\delta_f, \delta_C \in (0, 1]$, $M > 0$,
$$D_{\min} \cap B_X(0, M) \neq \emptyset,$$
$M_0 = 4M + 8$, $L \ge 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0+2),$$
$$\|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0+2),$$
$$\epsilon_0 = \delta_f(4M_0+6) + \delta_C L(6M_0+8)$$
and let
$$n_0 = 2^{-1}L(2M+1)^2(\delta_f + L\delta_C)^{-1} + 1.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M$$
and that for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f$$
and
$$\|x_{t+1} - P_D(x_t - L^{-1}\xi_t)\| \le \delta_C.$$
Then there exists an integer $q \in [1, n_0+1]$ such that
$$f(x_q) \le \inf(f, D) + \epsilon_0,$$
$$\|x_i\| \le 3M + 3, \quad i = 0, \dots, q.$$

The following two results have no prototype in [92].

Theorem 1.11 Let $\delta_f, \delta_C \in (0, 1]$, $M > 2$ satisfy
$$\{x \in V : f(x) \le \inf(f, D) + 3\} \subset B_X(0, M-2),$$
$L \ge 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M+8),$$
$$\|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M+8),$$
$$\delta_C \le (L(18M+19))^{-1}, \quad \delta_f \le (12M+3)^{-1}.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M$$
and that for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f$$
and
$$\|x_{t+1} - P_D(x_t - L^{-1}\xi_t)\| \le \delta_C.$$
Then $\|x_t\| \le 3M$ for all integers $t \ge 0$ and for each natural number $T$,
$$\begin{aligned}
& \min\{f(x_t) : t = 1, \dots, T+1\} - \inf(f, D), \quad f\Big(\sum_{t=1}^{T+1}(T+1)^{-1}x_t\Big) - \inf(f, D) \\
& \quad \le L\delta_C(18M+19) + \delta_f(12M+13) + 2(T+1)^{-1}M^2L.
\end{aligned}$$
It is clear that a best choice of $T$ should be of the same order as $\max\{\delta_C, \delta_f\}^{-1}$.

Theorem 1.12 Let $\delta_f, \delta_C \in (0, 1)$, $M > 1$,
$$\delta_C \le (160M)^{-1}, \quad \delta_f \le 120^{-1},$$
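$$D_{\min} \cap B_X(0, M) \neq \emptyset,$$
$L \ge 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M+10),$$
$$\|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M+10),$$
$$T = 36^{-1}\min\{\delta_C^{-1}, L\delta_f^{-1}\}.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M$$
and that for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f$$
and
$$\|x_{t+1} - P_D(x_t - L^{-1}\xi_t)\| \le \delta_C.$$
Then $\|x_t\| \le 3M$ for all integers $t = 0, \dots, T+1$ and
$$\begin{aligned}
& \sum_{t=0}^{T}(T+1)^{-1}(f(x_{t+1}) - \inf(f, D)), \quad \min\{f(x_t) : t = 1, \dots, T+1\} - \inf(f, D), \quad f\Big(\sum_{t=1}^{T+1}(T+1)^{-1}x_t\Big) - \inf(f, D) \\
& \quad \le 72(M+1)^2\max\{\delta_C L, \delta_f\}.
\end{aligned}$$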
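The gradient algorithm described at the beginning of this section, together with the estimate of Theorem 1.9 for a bounded feasible set, can be illustrated numerically. The Python sketch below is an assumption-laden example and not the author's code: the objective $f(x) = \|x - c\|^2$, the box $C = [0,1]^n$, the function names, and the random error vectors of norm $\delta_f$ and $\delta_C$ are all chosen here only for illustration.

```python
import math
import random


def project_box(x):
    """Exact projection onto C = [0, 1]^n."""
    return [min(max(xi, 0.0), 1.0) for xi in x]


def perturb(v, radius, rng):
    """Add an error vector of Euclidean norm `radius` to v."""
    d = [rng.gauss(0.0, 1.0) for _ in v]
    norm = math.sqrt(sum(di * di for di in d))
    if norm == 0.0:  # degenerate draw: add no error
        return list(v)
    return [vi + radius * di / norm for vi, di in zip(v, d)]


def inexact_gradient_method(c, L, T, delta_f, delta_C, seed=0):
    """Iterate x_{t+1} ~ P_C(x_t - xi_t / L) for f(x) = ||x - c||^2.

    Returns min{f(x_t) : t = 2, ..., T + 1}, the quantity bounded in Theorem 1.9.
    """
    rng = random.Random(seed)

    def f(x):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    x = [0.0] * len(c)  # x_0 = 0 belongs to C
    best = float("inf")
    for t in range(T + 1):  # produces x_1, ..., x_{T+1}
        grad = [2.0 * (xi - ci) for xi, ci in zip(x, c)]
        xi_t = perturb(grad, delta_f, rng)               # inexact gradient
        step = [xj - gj / L for xj, gj in zip(x, xi_t)]
        x = perturb(project_box(step), delta_C, rng)     # inexact projection
        if t + 1 >= 2:
            best = min(best, f(x))
    return best
```

With $U = \{x : \|x\| < 2n^{1/2}\}$ and $M_0 = 2n^{1/2}$, one admissible constant for this particular $f$ is $L = 6n^{1/2}$, since $\|f'(x)\| = 2\|x - c\| < 6n^{1/2}$ on $U$ and $f'$ is Lipschitz with constant $2$. If $c \in C$ then $\inf(f, C) = 0$ and the returned value is the optimality gap.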
1.4 Examples

Example 1.13 Let $X = R^n$,
$$C = \{x = (x_1, \dots, x_n) \in R^n : x_i \in [0, 1], \ i = 1, \dots, n\},$$
$$U = \{x \in R^n : \|x\| < 2n^{1/2}\}$$
and let for all $x \in R^n$,
$$f(x) = \langle Ax, x \rangle + \langle b, x \rangle,$$
where $A = (a_{i,j})_{i,j=1}^{n}$ is a symmetric positive semidefinite matrix and $b \in R^n$. Clearly, the function $f$ is convex and $\nabla f(x) = b + 2Ax$ for all $x \in R^n$. We consider the minimization problem
$$f(x) \to \min, \quad x \in C$$
and use the subgradient projection method taking into account computational errors (see Sect. 1.1). It is clear that for all $x = (x_1, \dots, x_n) \in R^n$,
$$P_C(x) = (\min\{\max\{x_1, 0\}, 1\}, \dots, \min\{\max\{x_i, 0\}, 1\}, \dots, \min\{\max\{x_n, 0\}, 1\}).$$
Assume that our computer system has an accuracy $\delta_*$. Then it is not difficult to see that in our case
$$\delta_C = \delta_* n^{1/2}, \quad M_0 = 2n^{1/2}, \quad \delta_f = 2\delta_* n\max\{|a_{i,j}| : i, j = 1, \dots, n\},$$
$$L = \|b\| + 4\Big(\sum_{i,j=1}^{n} a_{i,j}^2\Big)^{1/2}n^{1/2}.$$
We apply Theorem 1.1 and obtain that for all integers $t \ge 0$,
$$a_t = a = (2\delta_C(4M_0+1))^{1/2}(L+1)^{-1} = (2\delta_* n^{1/2}(8n^{1/2}+1))^{1/2}\Big(\|b\| + 1 + 4\Big(\sum_{i,j=1}^{n} a_{i,j}^2\Big)^{1/2}n^{1/2}\Big)^{-1}$$
and
$$T = \delta_C^{-1} = \delta_*^{-1}n^{-1/2}.$$

Example 1.14 Consider the previous example and apply the gradient projection algorithm with the smooth objective function (see Theorem 1.9) and with the same $C$, $U$, $f$, $A$, $b$, $\delta_*$. It is not difficult to see that
$$M_0 = 2n^{1/2}, \quad L = \|b\| + 4\Big(\sum_{i,j=1}^{n} a_{i,j}^2\Big)^{1/2}n^{1/2},$$
$$\delta_C = \delta_* n^{1/2}, \quad \delta_f = 2\delta_* n\max\{|a_{i,j}| : i, j = 1, \dots, n\}.$$
According to Theorem 1.9, $\max\{\delta_C, \delta_f\}^{-1}$ iterates should be done. As a result we obtain a point $z \in R^n$ such that
$$d(z, C) \le \delta_C$$
and
$$f(z) \le \inf(f, C) + c_0\max\{\delta_f, \delta_C\},$$
where $c_0$ is a positive constant. Its value can be easily calculated using Theorem 1.9.

Example 1.15 Consider the minimization problem from the previous two examples applying Theorem 1.9. Assume that we do $10^3$ iterates. By Theorem 1.9, we obtain $z \in R^n$ such that
$$d(z, C) \le \delta_C$$
and
$$f(z) \le \inf(f, C) + 2^{-1}10^{-3}L(2M_0+1)^2 + L\delta_C(6M_0+7) + \delta_f(4M_0+4).$$
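The constants in Examples 1.13 to 1.15 are explicit, so they can be evaluated for a concrete matrix. The following Python sketch only transcribes the formulas stated in the examples; the function names and the dictionary layout are hypothetical, and the sample data in the usage note are made up for illustration.

```python
import math


def example_constants(A, b, delta_star):
    """Constants of Example 1.13 for f(x) = <Ax, x> + <b, x> on C = [0, 1]^n."""
    n = len(b)
    frob = math.sqrt(sum(a_ij ** 2 for row in A for a_ij in row))
    max_abs = max(abs(a_ij) for row in A for a_ij in row)
    norm_b = math.sqrt(sum(bi ** 2 for bi in b))
    return {
        "delta_C": delta_star * math.sqrt(n),
        "M0": 2.0 * math.sqrt(n),
        "delta_f": 2.0 * delta_star * n * max_abs,
        "L": norm_b + 4.0 * frob * math.sqrt(n),
    }


def subgradient_step_and_horizon(k):
    """Step size a and iteration count T = delta_C^{-1} from Example 1.13."""
    a = math.sqrt(2.0 * k["delta_C"] * (4.0 * k["M0"] + 1.0)) / (k["L"] + 1.0)
    return a, 1.0 / k["delta_C"]


def gradient_bound(k, T):
    """Right-hand side of Theorem 1.9 after T iterates (Example 1.15: T = 10**3)."""
    return (k["L"] * (2.0 * k["M0"] + 1.0) ** 2 / (2.0 * T)
            + k["L"] * k["delta_C"] * (6.0 * k["M0"] + 7.0)
            + k["delta_f"] * (4.0 * k["M0"] + 4.0))
```

For instance, calling `example_constants([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], 1e-6)` and then `gradient_bound(k, 10**3)` evaluates the estimate of Example 1.15 for the $2 \times 2$ identity matrix. With only $10^3$ iterates the first term $(2T)^{-1}L(2M_0+1)^2$ dominates, which is consistent with the remark that about $\max\{\delta_C, \delta_f\}^{-1}$ iterates are needed.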
Chapter 2
Subgradient Projection Algorithm
In this chapter we study the subgradient projection algorithm for minimization of convex and nonsmooth functions and for computing the saddle points of convex–concave functions, under the presence of computational errors. The problem is described by an objective function and a set of feasible points. For this algorithm each iteration consists of two steps. The first step is a calculation of a subgradient of the objective function while in the second one we calculate a projection on the feasible set. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.
2.1 Preliminaries

The subgradient projection algorithm is one of the most important tools in optimization theory, nonlinear analysis, and their applications. See, for example, [1–3, 11, 15, 17, 22, 27, 29, 30, 32, 35, 36, 38, 39, 43, 46, 53–55] and the references mentioned therein. In this chapter we use this method for constrained minimization problems in Hilbert spaces.

Let $X$ be a Hilbert space equipped with an inner product $\langle \cdot, \cdot \rangle$ which induces a complete norm $\|\cdot\|$. Let $C$ be a nonempty closed convex subset of $X$, $U$ be an open convex subset of $X$ such that $C \subset U$, and $f : U \to R^1$ be a convex function. Suppose that there exist $L > 0$, $M_0 > 0$ such that
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_2
$$C \subset B_X(0, M_0), \tag{2.1}$$
$$|f(x) - f(y)| \le L\|x - y\| \text{ for all } x, y \in U. \tag{2.2}$$
In view of (2.2), for each $x \in U$,
$$\emptyset \neq \partial f(x) \subset B_X(0, L). \tag{2.3}$$
It is easy to see that the following result is true.

Lemma 2.1 Let $z, y_0, y_1 \in X$. Then
$$\|z - y_0\|^2 - \|z - y_1\|^2 - \|y_0 - y_1\|^2 = 2\langle z - y_1, y_1 - y_0 \rangle.$$

The next result is well known in the literature [12, 13, 92, 93].

Lemma 2.2 Let $D$ be a nonempty closed convex subset of $X$. Then for each $x \in X$ there is a unique point $P_D(x) \in D$ satisfying
$$\|x - P_D(x)\| = \inf\{\|x - y\| : y \in D\}.$$
Moreover,
$$\|P_D(x) - P_D(y)\| \le \|x - y\| \text{ for all } x, y \in X$$
and for each $x \in X$ and each $z \in D$,
$$\langle z - P_D(x), x - P_D(x) \rangle \le 0,$$
$$\|z - P_D(x)\|^2 + \|x - P_D(x)\|^2 \le \|z - x\|^2.$$

For the proof of the next simple result see Lemma 2.3 of [92].

Lemma 2.3 Let $A > 0$ and $n \ge 2$ be an integer. Then the minimization problem
$$\sum_{i=1}^{n} a_i^2 \to \min,$$
$$a = (a_1, \dots, a_n) \in R^n \text{ and } \sum_{i=1}^{n} a_i = A$$
has a unique solution $a^* = (a_1^*, \dots, a_n^*)$ where $a_i^* = n^{-1}A$, $i = 1, \dots, n$.
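The text refers to [92] for the proof of Lemma 2.3. For orientation, one elementary way to see the claim is the identity below; this is a sketch added here and is not claimed to be the argument used in [92].

```latex
% For every a = (a_1, ..., a_n) in R^n with a_1 + ... + a_n = A:
\sum_{i=1}^{n} a_i^2
  = \sum_{i=1}^{n} (a_i - n^{-1}A)^2 + 2n^{-1}A\sum_{i=1}^{n} a_i - n^{-1}A^2
  = \sum_{i=1}^{n} (a_i - n^{-1}A)^2 + n^{-1}A^2 .
% Hence the sum of squares is at least n^{-1}A^2,
% with equality if and only if a_i = n^{-1}A for all i = 1, ..., n.
```

In the sequel the lemma is applied with $n = T + 1$ and $A = A_T$, which is why equal step sizes $a_t = (T+1)^{-1}A_T$ are optimal.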
2.2 A Convex Minimization Problem

Suppose that $\delta_f, \delta_C \in (0, 1]$ and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$. Let us describe our algorithm.

Subgradient projection algorithm

Initialization: select an arbitrary $x_0 \in U$.

Iterative Step: given a current iteration vector $x_t \in U$ calculate
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f)$$
and the next iteration vector $x_{t+1} \in U$ such that
$$\|x_{t+1} - P_C(x_t - a_t\xi_t)\| \le \delta_C.$$
Here $\delta_f$ is a computational error produced by our computer system when we calculate a subgradient of $f$ and $\delta_C$ is a computational error produced by our computer system when we calculate a projection on the set $C$.

In this chapter we prove the following result.

Theorem 2.4 Let $\delta_f, \delta_C \in (0, 1]$, $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$ and let
$$x_* \in C \tag{2.4}$$
satisfy
$$f(x_*) \le f(x) \text{ for all } x \in C. \tag{2.5}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset U$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M_0 + 1 \tag{2.6}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f) \tag{2.7}$$
and
$$\|x_{t+1} - P_C(x_t - a_t\xi_t)\| \le \delta_C. \tag{2.8}$$
Then for each natural number $T$,
$$\sum_{t=0}^{T} a_t(f(x_t) - f(x_*)) \le 2^{-1}\|x_* - x_0\|^2 + \delta_C(T+1)(4M_0+1) + \delta_f(2M_0+1)\sum_{t=0}^{T} a_t + 2^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2. \tag{2.9}$$
Moreover, for each natural number $T$,
$$\begin{aligned}
& f\Big(\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t x_t\Big) - f(x_*), \quad \min\{f(x_t) : t = 0, \dots, T\} - f(x_*) \\
& \quad \le 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\|x_* - x_0\|^2 + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\delta_C(T+1)(4M_0+1) \\
& \qquad + \delta_f(2M_0+1) + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2.
\end{aligned} \tag{2.10}$$
Theorem 2.4 is proved in Sect. 2.4. It is a generalization of Theorem 2.4 of [92], which was proved in the case when $\delta_f = \delta_C$.

We are interested in an optimal choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and let $A_T = \sum_{t=0}^{T} a_t$ be given. By Theorem 2.4, in order to make the best choice of $a_t$, $t = 0, \dots, T$, we need to minimize the function
$$\phi(a_0, \dots, a_T) := 2^{-1}A_T^{-1}\|x_* - x_0\|^2 + A_T^{-1}\delta_C(T+1)(4M_0+1) + \delta_f(2M_0+1) + 2^{-1}A_T^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2$$
on the set
$$\Big\{a = (a_0, \dots, a_T) \in R^{T+1} : a_i \ge 0, \ i = 0, \dots, T, \ \sum_{i=0}^{T} a_i = A_T\Big\}.$$
By Lemma 2.3, this function has a unique minimizer $a^* = (a_0^*, \dots, a_T^*)$ where $a_i^* = (T+1)^{-1}A_T$, $i = 0, \dots, T$. This is the best choice of $a_t$, $t = 0, 1, \dots, T$.

Theorem 2.4 implies the following result.
Theorem 2.5 Let $\delta_f, \delta_C \in (0, 1]$, $a > 0$ and let $x_* \in C$ satisfy
$$f(x_*) \le f(x) \text{ for all } x \in C.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset U$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M_0 + 1$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f)$$
and
$$\|x_{t+1} - P_C(x_t - a\xi_t)\| \le \delta_C.$$
Then for each natural number $T$,
$$\begin{aligned}
& f\Big((T+1)^{-1}\sum_{t=0}^{T} x_t\Big) - f(x_*), \quad \min\{f(x_t) : t = 0, \dots, T\} - f(x_*) \\
& \quad \le 2^{-1}(T+1)^{-1}a^{-1}(2M_0+1)^2 + a^{-1}\delta_C(4M_0+1) + \delta_f(2M_0+1) + 2^{-1}(L+1)^2a.
\end{aligned}$$

Now we will find the best $a > 0$. Since $T$ can be arbitrarily large, we need to find a minimizer of the function
$$\phi(a) := a^{-1}\delta_C(4M_0+1) + 2^{-1}(L+1)^2a, \quad a \in (0, \infty).$$
Clearly, the minimizer $a$ satisfies
$$a^{-1}\delta_C(4M_0+1) = 2^{-1}(L+1)^2a,$$
$$a = (2\delta_C(4M_0+1))^{1/2}(L+1)^{-1}$$
and the minimal value of $\phi$ is
$$(2\delta_C(4M_0+1))^{1/2}(L+1).$$
Theorem 2.5 implies the following result.
Theorem 2.6 Let $\delta_f, \delta_C \in (0, 1]$,
$$a = (2\delta_C(4M_0+1))^{1/2}(L+1)^{-1},$$
$x_* \in C$ satisfy
$$f(x_*) \le f(x) \text{ for all } x \in C.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset U$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M_0 + 1$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f)$$
and
$$\|x_{t+1} - P_C(x_t - a\xi_t)\| \le \delta_C.$$
Then for each natural number $T$,
$$\begin{aligned}
& f\Big((T+1)^{-1}\sum_{t=0}^{T} x_t\Big) - f(x_*), \quad \min\{f(x_t) : t = 0, \dots, T\} - f(x_*) \\
& \quad \le 2^{-1}(T+1)^{-1}(2M_0+1)^2(L+1)(2\delta_C(4M_0+1))^{-1/2} + \delta_f(2M_0+1) \\
& \qquad + 2^{-1}(2\delta_C(4M_0+1))^{1/2}(L+1) + \delta_C^{1/2}(4M_0+1)(L+1)(2(4M_0+1))^{-1/2}.
\end{aligned}$$

Now we can think about the best choice of $T$. It is clear that it should be of the same order as $\delta_C^{-1}$. Putting $T = \delta_C^{-1}$, we obtain that
$$\begin{aligned}
& f\Big((T+1)^{-1}\sum_{t=0}^{T} x_t\Big) - f(x_*), \quad \min\{f(x_t) : t = 0, \dots, T\} - f(x_*) \\
& \quad \le 2^{-1}(2M_0+1)^2(L+1)(8M_0+2)^{-1/2}\delta_C^{1/2} + \delta_f(2M_0+1) + (L+1)\delta_C^{1/2}(8M_0+2)^{1/2}.
\end{aligned}$$
Note that in the theorems above $\delta_f$ is the computational error produced by our computer system when we calculate a subgradient of $f$ and $\delta_C$ is the computational error produced by our computer system when we calculate a projection on $C$. In view of the inequality above, which has the right-hand side bounded by $c_1\delta_C^{1/2} + \delta_f(2M_0+1)$ with a constant $c_1 > 0$, we conclude that after $T = \delta_C^{-1}$ iterations we obtain a point $\xi \in U$ such that
$$B_X(\xi, \delta_C) \cap C \neq \emptyset$$
and
$$f(\xi) \le f(x_*) + c_1\delta_C^{1/2} + \delta_f(2M_0+1),$$
where the constant $c_1 > 0$ depends only on $L$ and $M_0$. It is interesting that the number of iterations depends only on $\delta_C$.
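The choice of the step size and of the number of iterations made in this section can be checked numerically. The sketch below evaluates the function $\phi(a) = a^{-1}\delta_C(4M_0+1) + 2^{-1}(L+1)^2a$, its minimizer, and the right-hand side of the estimate of Theorem 2.5; the function names and the sample values in the usage note are assumptions made for illustration.

```python
import math


def optimal_step(delta_C, M0, L):
    """Minimizer a = (2 delta_C (4 M0 + 1))^{1/2} (L + 1)^{-1} of phi."""
    return math.sqrt(2.0 * delta_C * (4.0 * M0 + 1.0)) / (L + 1.0)


def phi(a, delta_C, M0, L):
    """phi(a) = delta_C (4 M0 + 1) / a + (L + 1)^2 a / 2."""
    return delta_C * (4.0 * M0 + 1.0) / a + 0.5 * (L + 1.0) ** 2 * a


def bound_theorem_2_5(T, a, delta_f, delta_C, M0, L):
    """Right-hand side of the estimate in Theorem 2.5 for a constant step a."""
    return ((2.0 * M0 + 1.0) ** 2 / (2.0 * (T + 1.0) * a)
            + phi(a, delta_C, M0, L)
            + delta_f * (2.0 * M0 + 1.0))
```

For example, with $\delta_C = 10^{-4}$, $\delta_f = 10^{-3}$ and $M_0 = L = 1$, evaluating `bound_theorem_2_5` at `a = optimal_step(...)` for increasing `T` shows the first term decaying like $(T+1)^{-1}$ while the remaining terms stay fixed. Once $T$ is of the order $\delta_C^{-1}$ the first term is already of the order $\delta_C^{1/2}$, like the minimal value of $\phi$, so further iterations do not improve the order of the bound.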
2.3 The Main Lemma

Lemma 2.7 Let $\delta_f, \delta_C \in (0, 1]$, $a > 0$ and let
$$z \in C. \tag{2.11}$$
Assume that
$$x \in U \cap B_X(0, M_0+1), \tag{2.12}$$
$$\xi \in \partial f(x) + B_X(0, \delta_f) \tag{2.13}$$
and that
$$u \in U \tag{2.14}$$
satisfies
$$\|u - P_C(x - a\xi)\| \le \delta_C. \tag{2.15}$$
Then
$$a(f(x) - f(z)) \le 2^{-1}\|z - x\|^2 - 2^{-1}\|z - u\|^2 + \delta_C(4M_0+1) + a\delta_f(2M_0+1) + 2^{-1}a^2(L+1)^2.$$
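Proof In view of (2.13), there exists
$$l \in \partial f(x) \tag{2.16}$$
such that
$$\|l - \xi\| \le \delta_f. \tag{2.17}$$
By Lemmas 2.1 and 2.2 and (2.11),
$$\begin{aligned}
0 & \le \langle z - P_C(x - a\xi), P_C(x - a\xi) - (x - a\xi) \rangle \\
& = \langle z - P_C(x - a\xi), P_C(x - a\xi) - x \rangle + \langle a\xi, z - P_C(x - a\xi) \rangle \\
& = 2^{-1}[\|z - x\|^2 - \|z - P_C(x - a\xi)\|^2 - \|x - P_C(x - a\xi)\|^2] \\
& \quad + \langle a\xi, z - x \rangle + \langle a\xi, x - P_C(x - a\xi) \rangle.
\end{aligned} \tag{2.18}$$
Clearly,
$$|\langle a\xi, x - P_C(x - a\xi) \rangle| \le 2^{-1}(\|a\xi\|^2 + \|x - P_C(x - a\xi)\|^2). \tag{2.19}$$
It follows from (2.18) and (2.19) that
$$\begin{aligned}
0 & \le 2^{-1}[\|z - x\|^2 - \|z - P_C(x - a\xi)\|^2 - \|x - P_C(x - a\xi)\|^2] \\
& \quad + \langle a\xi, z - x \rangle + 2^{-1}a^2\|\xi\|^2 + 2^{-1}\|x - P_C(x - a\xi)\|^2 \\
& \le 2^{-1}\|z - x\|^2 - 2^{-1}\|z - P_C(x - a\xi)\|^2 + 2^{-1}a^2\|\xi\|^2 + \langle a\xi, z - x \rangle.
\end{aligned} \tag{2.20}$$
Relations (2.1), (2.11) and (2.15) imply that
$$\begin{aligned}
& |\|z - P_C(x - a\xi)\|^2 - \|z - u\|^2| \\
& \quad = |\|z - P_C(x - a\xi)\| - \|z - u\||(\|z - P_C(x - a\xi)\| + \|z - u\|) \\
& \quad \le \|u - P_C(x - a\xi)\|(4M_0+1) \le (4M_0+1)\delta_C.
\end{aligned} \tag{2.21}$$
By (2.1), (2.11), (2.12), and (2.17),
$$\begin{aligned}
\langle a\xi, z - x \rangle & = \langle al, z - x \rangle + \langle a(\xi - l), z - x \rangle \\
& \le \langle al, z - x \rangle + a\|\xi - l\|\|z - x\| \\
& \le \langle al, z - x \rangle + a\delta_f(2M_0+1).
\end{aligned} \tag{2.22}$$
It follows from (2.3), (2.16), (2.17), (2.20), (2.21), and (2.22) that
$$\begin{aligned}
0 & \le 2^{-1}\|z - x\|^2 - 2^{-1}\|z - P_C(x - a\xi)\|^2 + 2^{-1}a^2\|\xi\|^2 + \langle a\xi, z - x \rangle \\
& \le 2^{-1}\|z - x\|^2 - 2^{-1}\|z - u\|^2 + \delta_C(4M_0+1) + 2^{-1}a^2(L+1)^2 \\
& \quad + \langle al, z - x \rangle + a\delta_f(2M_0+1).
\end{aligned} \tag{2.23}$$
By (2.16) and (2.23),
$$a(f(z) - f(x)) \ge \langle al, z - x \rangle$$
and
$$\begin{aligned}
a(f(x) - f(z)) & \le \langle al, x - z \rangle \\
& \le 2^{-1}\|z - x\|^2 - 2^{-1}\|z - u\|^2 + \delta_C(4M_0+1) + 2^{-1}a^2(L+1)^2 + a\delta_f(2M_0+1).
\end{aligned}$$
This completes the proof of Lemma 2.7.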
2.4 Proof of Theorem 2.4

It is clear that
$$\|x_t\| \le M_0 + 1, \quad t = 0, 1, \dots.$$
Indeed, for $t = 0$ this is (2.6), and for $t \ge 1$ it follows from (2.1) and (2.8) because $\delta_C \le 1$. Let $t \ge 0$ be an integer. Applying Lemma 2.7 with
$$z = x_*, \quad a = a_t, \quad x = x_t, \quad \xi = \xi_t, \quad u = x_{t+1}$$
we obtain that
$$a_t(f(x_t) - f(x_*)) \le 2^{-1}\|x_* - x_t\|^2 - 2^{-1}\|x_* - x_{t+1}\|^2 + \delta_C(4M_0+1) + a_t\delta_f(2M_0+1) + 2^{-1}a_t^2(L+1)^2. \tag{2.24}$$
By (2.24), for each natural number $T$,
$$\begin{aligned}
\sum_{t=0}^{T} a_t(f(x_t) - f(x_*)) & \le \sum_{t=0}^{T}\big(2^{-1}\|x_* - x_t\|^2 - 2^{-1}\|x_* - x_{t+1}\|^2 + \delta_C(4M_0+1) + a_t\delta_f(2M_0+1) + 2^{-1}a_t^2(L+1)^2\big) \\
& \le 2^{-1}\|x_* - x_0\|^2 + \delta_C(T+1)(4M_0+1) + \delta_f(2M_0+1)\sum_{t=0}^{T} a_t + 2^{-1}(L+1)^2\sum_{t=0}^{T} a_t^2.
\end{aligned}$$
Thus (2.9) is true. Evidently, (2.9) implies (2.10). Theorem 2.4 is proved.
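The last step of the proof, the passage from the weighted sum estimate (2.9) to the estimate (2.10), is left to the reader in the text. A short sketch of this standard step, using only the convexity of $f$ on the convex set $U$ and $a_t > 0$, is as follows.

```latex
% Set A_T = a_0 + ... + a_T > 0. The weights a_t / A_T are positive and sum to 1, so
f\Big(A_T^{-1}\sum_{t=0}^{T} a_t x_t\Big) \le A_T^{-1}\sum_{t=0}^{T} a_t f(x_t),
\qquad
\min\{f(x_t) : t = 0, \dots, T\} \le A_T^{-1}\sum_{t=0}^{T} a_t f(x_t).
% Subtracting f(x_*) from both sides, each left-hand side of (2.10) is at most
A_T^{-1}\sum_{t=0}^{T} a_t\,(f(x_t) - f(x_*)),
% and dividing (2.9) by A_T gives the right-hand side of (2.10).
```

The same argument gives the second estimate in each of the theorems of this chapter that are stated for a weighted average of the iterates.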
2.5 Subgradient Algorithm on Unbounded Sets

Let $X$ be a Hilbert space with an inner product $\langle \cdot, \cdot \rangle$ which induces a complete norm $\|\cdot\|$, $D$ be a nonempty closed convex subset of $X$, $V$ be an open convex subset of $X$ such that
$$D \subset V, \tag{2.25}$$
and $f : V \to R^1$ be a convex function which is Lipschitz on all bounded subsets of $V$. Set
$$D_{\min} = \{x \in D : f(x) \le f(y) \text{ for all } y \in D\}. \tag{2.26}$$
We suppose that
$$D_{\min} \neq \emptyset. \tag{2.27}$$
In this chapter we will prove the following result.

Theorem 2.8 Let $\delta_f, \delta_C \in (0, 1]$, $M > 0$ satisfy
$$D_{\min} \cap B_X(0, M) \neq \emptyset, \tag{2.28}$$
$$M_0 \ge 4M + 4, \tag{2.29}$$
$L > 0$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0+2), \tag{2.30}$$
$$0 < \tau_0 \le \tau_1 \le (L+1)^{-1}, \tag{2.31}$$
$$\epsilon_0 = 2\tau_0^{-1}\delta_C(4M_0+1) + 2\delta_f(2M_0+1) + 2\tau_1(L+1)^2 \tag{2.32}$$
and let
$$n_0 = \tau_0^{-1}(2M+2)^2\epsilon_0^{-1}. \tag{2.33}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\{a_t\}_{t=0}^{\infty} \subset [\tau_0, \tau_1], \tag{2.34}$$
$$\|x_0\| \le M \tag{2.35}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f) \tag{2.36}$$
and
$$\|x_{t+1} - P_D(x_t - a_t\xi_t)\| \le \delta_C. \tag{2.37}$$
Then there exists an integer $q \in [1, n_0+1]$ such that
$$\|x_i\| \le 3M + 2, \quad i = 0, \dots, q$$
and
$$f(x_q) \le f(x) + \epsilon_0 \text{ for all } x \in D.$$

Theorem 2.8 is a generalization of Theorem 2.8 of [92], which was proved in the case when $\delta_C = \delta_f$. It is proved in Sect. 2.6.

We are interested in the best choice of $a_t$, $t = 0, 1, \dots$. Assume for simplicity that $\tau_1 = \tau_0 = \tau$. In order to meet our goal we need to minimize the function
$$2\tau^{-1}\delta_C(4M_0+1) + 2(L+1)^2\tau, \quad \tau \in (0, \infty).$$
This function has a minimizer
$$\tau = (\delta_C(4M_0+1))^{1/2}(L+1)^{-1},$$
the minimal value of $\epsilon_0$ is
$$2\delta_f(2M_0+1) + 4(\delta_C(4M_0+1))^{1/2}(L+1)$$
and $n_0 = \Delta$ where
$$\Delta = (2M+2)^2(4M_0+1)^{-1/2}(L+1)\delta_C^{-1/2}\big(2\delta_f(2M_0+1) + 4\delta_C^{1/2}(4M_0+1)^{1/2}(L+1)\big)^{-1}.$$
Note that in the theorem above $\delta_f$, $\delta_C$ are the computational errors produced by our computer system. In view of the inequality above, in order to obtain a good approximate solution we need
$$c_1\delta_C^{-1/2}\max\{\delta_f, \delta_C^{1/2}\}^{-1} + 1$$
iterations, where $c_1$ is a constant which depends only on $M$, $M_0$, $L$. As a result, we obtain a point $\xi \in V$ such that
$$B_X(\xi, \delta_C) \cap D \neq \emptyset$$
and
$$f(\xi) \le \inf\{f(x) : x \in D\} + 2\delta_f(2M_0+1) + 4(4M_0+1)^{1/2}(L+1)\delta_C^{1/2}.$$

The next result, which is proved in Sect. 2.7, does not have a prototype in [92].

Theorem 2.9 Let $\delta_f, \delta_C \in (0, 1)$, $M > 1$, $L > 0$ and let $\{a_t\}_{t=0}^{\infty} \subset (0, \infty)$ be such that
$$\{x \in V : f(x) \le \inf(f, D) + 3\} \subset B_X(0, M-1), \tag{2.38}$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M+8), \tag{2.39}$$
$$\delta_f \le (6M+5)^{-1} \text{ and } \delta_C(12M+9) \le a_t \le (L+1)^{-2} \text{ for all integers } t \ge 0. \tag{2.40}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M, \quad B_X(x_0, \delta_C) \cap D \neq \emptyset \tag{2.41}$$
and that for each integer $t \ge 0$,
ξt ∈ ∂f (xt ) + BX (0, δf )
(2.42)
xt+1 − PD (xt − at ξt ) ≤ δC .
(2.43)
and
Then xt ≤ 3M for all integers t ≥ 0 and for each natural number T , T
at (f (xt ) − inf(f, D))
t=0
≤ 2M 2 + δC (T + 1)(12M + 9) +2−1 (L + 1)2
T
at2 + δf (6M + 5)
t=0
T
at .
t=0
Moreover, for each natural number T , f ((
T
at )−1
t=0
T
at xt ) − inf(f, D),
t=0
min{f (xt ) : t = 0, . . . , T } − inf(f, D) ≤ 2M 2 (
T
T at )−1 + δC (T + 1)(12M + 9)( at )−1
t=0
+2−1 (L + 1)2
t=0 T t=0
at2 (
T
at )−1 + δf (6M + 5).
t=0
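The iteration (2.42)-(2.43) can be illustrated numerically. The following is only a minimal sketch under assumptions that are not part of the text: $X = V = R^2$, $D = [-1, 1]^2$, $f(x) = |x_1 - c_1| + |x_2 - c_2|$ with $c \in D$ (so that $\inf(f, D) = 0$), equal steps $a_t = a$, and computational errors simulated by bounded random perturbations. All function names are hypothetical. For equal steps the right-hand side of the second estimate of Theorem 2.9 reduces to $2M^2((T + 1)a)^{-1} + \delta_C(12M + 9)a^{-1} + 2^{-1}(L + 1)^2 a + \delta_f(6M + 5)$, which is what `constant_step_bound` evaluates.

```python
import math
import random


def project_box(x, lo=-1.0, hi=1.0):
    """Exact projection onto the box D = [lo, hi]^n (assumed feasible set)."""
    return [min(max(xi, lo), hi) for xi in x]


def perturb(v, radius, rng):
    """Return v plus an error vector of Euclidean norm at most `radius`."""
    e = [rng.uniform(-1.0, 1.0) for _ in v]
    norm = math.sqrt(sum(ei * ei for ei in e)) or 1.0
    scale = radius * rng.random() / norm
    return [vi + scale * ei for vi, ei in zip(v, e)]


def f(x, c):
    """Assumed objective: l1-distance to the point c."""
    return sum(abs(xi - ci) for xi, ci in zip(x, c))


def subgradient(x, c):
    """One subgradient of f at x (componentwise sign)."""
    return [float((xi > ci) - (xi < ci)) for xi, ci in zip(x, c)]


def inexact_subgradient(c, x0, a, T, delta_f, delta_C, seed=0):
    """Run T + 1 steps of (2.42)-(2.43) with a_t = a; return f(x_0), ..., f(x_T)."""
    rng = random.Random(seed)
    x = list(x0)
    values = []
    for _ in range(T + 1):
        values.append(f(x, c))
        xi = perturb(subgradient(x, c), delta_f, rng)    # inexact subgradient, (2.42)
        step = [xj - a * gj for xj, gj in zip(x, xi)]
        x = perturb(project_box(step), delta_C, rng)     # inexact projection, (2.43)
    return values


def constant_step_bound(M, L, a, T, delta_f, delta_C):
    """Right-hand side of the second estimate of Theorem 2.9 for a_t = a."""
    return (2.0 * M ** 2 / ((T + 1) * a) + delta_C * (12 * M + 9) / a
            + 0.5 * (L + 1) ** 2 * a + delta_f * (6 * M + 5))
```

With $c = (0.5, -0.25)$, $M = 5$, $L = 1.5$, $a = 0.05$, $\delta_f = 10^{-2}$ and $\delta_C = 10^{-4}$ the assumptions (2.38)-(2.41) hold for this toy problem, so $\min\{f(x_t)\}$ is guaranteed to stay below the value returned by `constant_step_bound`.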
We are interested in an optimal choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and $A_T = \sum_{t=0}^{T} a_t$ be given. By Theorem 2.9, in order to make the best choice of $a_t$, $t = 0, \dots, T$, we need to minimize the function
$$\phi(a_0, \dots, a_T) := 2A_T^{-1}M^2 + A_T^{-1}\delta_C(T + 1)(12M + 9) + \delta_f(6M + 5) + 2^{-1}A_T^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2$$
on the set
$$\Big\{a = (a_0, \dots, a_T) \in R^{T+1} : a_i \ge 0, \ i = 0, \dots, T, \ \sum_{i=0}^{T} a_i = A_T\Big\}.$$
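Only the term $\sum_{t=0}^{T} a_t^2$ of $\phi$ depends on how $A_T$ is split among the steps, and on the set above it is smallest for equal steps $a_t = (T + 1)^{-1}A_T$. The following small sketch (hypothetical helper names, random feasible points used purely for illustration) compares the equal split with other feasible choices.

```python
import random


def sum_of_squares(a):
    """The only part of phi that depends on the individual steps."""
    return sum(x * x for x in a)


def equal_steps(A_T, T):
    """Equal split a_t = A_T / (T + 1), t = 0, ..., T."""
    return [A_T / (T + 1)] * (T + 1)


def random_feasible_steps(A_T, T, rng):
    """A random point of the set {a_i >= 0, sum a_i = A_T}."""
    w = [rng.random() + 1e-12 for _ in range(T + 1)]
    s = sum(w)
    return [A_T * wi / s for wi in w]
```

For example, with $A_T = 2$ and $T = 9$ the equal split gives $\sum a_t^2 = 0.4$, and no randomly drawn feasible point does better.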
Theorem 2.9 and Lemma 2.3 imply the following result.

Theorem 2.10 Let $\delta_f, \delta_C \in (0, 1)$, $a > 0$, $M > 1$ and $L > 0$ be such that
$$\{x \in V : f(x) \le \inf(f, D) + 3\} \subset B_X(0, M - 1),$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M + 8),$$
$$\delta_f \le (6M + 5)^{-1} \text{ and } \delta_C(12M + 9) \le a \le (L + 1)^{-2}.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M, \quad B_X(x_0, \delta_C) \cap D \ne \emptyset$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f) \text{ and } \|x_{t+1} - P_D(x_t - a\xi_t)\| \le \delta_C.$$
Then $\|x_t\| \le 3M$ for all integers $t \ge 0$ and for each natural number $T$,
$$f\Big((T + 1)^{-1}\sum_{t=0}^{T} x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D)$$
$$\le 2M^2(T + 1)^{-1}a^{-1} + \delta_C(12M + 9)a^{-1} + 2^{-1}(L + 1)^2 a + \delta_f(6M + 5).$$

Now we will find the best $a > 0$. Since $T$ can be arbitrarily large, we need to find a minimizer of the function
$$\phi(a) := a^{-1}\delta_C(12M + 9) + 2^{-1}(L + 1)^2 a, \quad a \in (0, \infty).$$
Clearly, the minimizer $a$ satisfies
$$a^{-1}\delta_C(12M + 9) = 2^{-1}(L + 1)^2 a, \quad a = (2\delta_C(12M + 9))^{1/2}(L + 1)^{-1}$$
and the minimal value of $\phi$ is
$$(2\delta_C(12M + 9))^{1/2}(L + 1).$$
Clearly, we need to check if the minimizer $a$ defined above satisfies (2.40). It is not difficult to see that this takes place if
$$\delta_C \le 2^{-1}(12M + 9)^{-1}(L + 1)^{-2}.$$
Theorem 2.10 implies the following result.

Theorem 2.11 Let $\delta_f, \delta_C \in (0, 1)$, $M > 1$ and $L > 0$ be such that
$$\{x \in V : f(x) \le \inf(f, D) + 3\} \subset B_X(0, M - 1),$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M + 8),$$
$$\delta_C \le 2^{-1}(12M + 9)^{-1}(L + 1)^{-2}, \quad \delta_f \le (6M + 5)^{-1}$$
and
$$a = (2\delta_C(12M + 9))^{1/2}(L + 1)^{-1}.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M, \quad B_X(x_0, \delta_C) \cap D \ne \emptyset$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f) \text{ and } \|x_{t+1} - P_D(x_t - a\xi_t)\| \le \delta_C.$$
Then $\|x_t\| \le 3M$ for all integers $t \ge 0$ and for each natural number $T$,
$$f\Big((T + 1)^{-1}\sum_{t=0}^{T} x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D)$$
$$\le 2M^2(T + 1)^{-1}(L + 1)(2\delta_C(12M + 9))^{-1/2} + (L + 1)(2\delta_C(12M + 9))^{1/2} + \delta_f(6M + 5).$$

Now we can think about the best choice of $T$. It is clear that it should be at the same order as $\delta_C^{-1}$. Putting $T = \lfloor \delta_C^{-1} \rfloor$, we obtain that
$$f\Big((T + 1)^{-1}\sum_{t=0}^{T} x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D)$$
$$\le 2M^2(L + 1)\delta_C^{1/2}(24M + 18)^{-1/2} + \delta_C^{1/2}(12M + 9)(L + 1)(24M + 18)^{-1/2} + 2^{-1}(L + 1)\delta_C^{1/2}(24M + 18)^{1/2} + \delta_f(6M + 5).$$
In view of the inequality above, which has the right-hand side bounded by $c_1\delta_C^{1/2} + \delta_f(6M + 5)$ with a constant $c_1 > 0$, we conclude that after $T = \lfloor \delta_C^{-1} \rfloor$ iterations we obtain a point $\xi \in U$ such that
$$B_X(\xi, \delta_C) \cap D \ne \emptyset$$
and
$$f(\xi) \le \inf(f, D) + c_1\delta_C^{1/2} + \delta_f(6M + 5),$$
where the constant $c_1 > 0$ depends only on $L$ and $M$.

The next result, which is proved in Sect. 2.8, does not have a prototype in [92].

Theorem 2.12 Let $\delta_f, \delta_C \in (0, 16^{-1})$, $M > 4$, $L > 0$,
$$D_{\min} \cap B_X(0, M) \ne \emptyset,$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 3M + 6), \tag{2.44}$$
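$$a = (L + 1)^{-1}\delta_C^{1/2}$$
and
$$T = \lfloor 6^{-1}\min\{\delta_C^{-1}, \delta_C^{-1/2}\delta_f^{-1}\} \rfloor - 1.$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M, \quad B_X(x_0, \delta_C) \cap D \ne \emptyset \tag{2.45}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f) \tag{2.46}$$
and
$$\|x_{t+1} - P_D(x_t - a\xi_t)\| \le \delta_C. \tag{2.47}$$
Then
$$\sum_{t=0}^{T}(T + 1)^{-1}f(x_t) - \inf(f, D), \quad f\Big(\sum_{t=0}^{T}(T + 1)^{-1}x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D)$$
$$\le 12(2M + 2)^2(L + 1)\max\{\delta_C^{1/2}, \delta_f\}.$$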
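To get a feeling for the quantities in Theorem 2.12, the step size $a$, the number of iterations $T$ and the right-hand side of the final estimate can be evaluated for concrete error levels. The sketch below is an illustration only; the function name is hypothetical, and $T$ is computed as the integer part of $6^{-1}\min\{\delta_C^{-1}, \delta_C^{-1/2}\delta_f^{-1}\}$ minus one, which is an assumption about how the formula for $T$ is to be read.

```python
import math


def theorem_2_12_parameters(M, L, delta_f, delta_C):
    """Return (a, T, bound) for given M > 4, L > 0 and errors in (0, 1/16)."""
    if not (0.0 < delta_f < 1.0 / 16 and 0.0 < delta_C < 1.0 / 16):
        raise ValueError("delta_f and delta_C must lie in (0, 1/16)")
    if M <= 4 or L <= 0:
        raise ValueError("Theorem 2.12 assumes M > 4 and L > 0")
    # Step size a = (L + 1)^{-1} * delta_C^{1/2}.
    a = math.sqrt(delta_C) / (L + 1.0)
    # Number of iterations (integer part assumed, see the lead-in).
    T = math.floor(min(1.0 / delta_C, 1.0 / (math.sqrt(delta_C) * delta_f)) / 6.0) - 1
    # Guaranteed accuracy 12 (2M + 2)^2 (L + 1) max{delta_C^{1/2}, delta_f}.
    bound = 12.0 * (2 * M + 2) ** 2 * (L + 1.0) * max(math.sqrt(delta_C), delta_f)
    return a, T, bound
```

For instance, $M = 5$, $L = 1$, $\delta_f = 10^{-3}$, $\delta_C = 10^{-4}$ gives $a = 0.005$ and $T = 1665$; the guaranteed accuracy is governed by $\max\{\delta_C^{1/2}, \delta_f\} = 10^{-2}$.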
2.6 Proof of Theorem 2.8

By (2.28) there exists
$$z \in D_{\min} \cap B_X(0, M). \tag{2.48}$$
Lemma 2.2, (2.18), (2.34), (2.35), and (2.37) imply that
$$\|x_1 - z\| \le \|x_1 - P_D(x_0 - a_0\xi_0)\| + \|P_D(x_0 - a_0\xi_0) - z\| \le \delta_C + \|x_0 - z\| + a_0\|\xi_0\| \le 1 + 2M + \tau_1\|\xi_0\|. \tag{2.49}$$
In view of (2.30), (2.35), and (2.36),
$$\xi_0 \in \partial f(x_0) + B_X(0, 1) \subset B_X(0, L + 1), \quad \|\xi_0\| \le L + 1. \tag{2.50}$$
It follows from (2.31) and (2.48)–(2.50) that
$$\|x_1 - z\| \le 2M + 2, \quad \|x_1\| \le 3M + 2. \tag{2.51}$$
Assume that $T$ is a natural number and that
$$f(x_t) - f(z) > \epsilon_0, \quad t = 1, \dots, T. \tag{2.52}$$
Set
$$U = V \cap \{v \in X : \|v\| < M_0 + 2\} \tag{2.53}$$
and
$$C = D \cap B_X(0, M_0). \tag{2.54}$$
By induction we show that for every integer $t \in [1, T]$,
$$\|x_t - z\| \le 2M + 2, \tag{2.55}$$
$$f(x_t) - f(z) \le (2\tau_0)^{-1}(\|z - x_t\|^2 - \|z - x_{t+1}\|^2) + \tau_0^{-1}\delta_C(4M_0 + 1) + \delta_f(2M_0 + 1) + 2^{-1}\tau_1(L + 1)^2. \tag{2.56}$$
In view of (2.51), (2.55) holds for $t = 1$. Assume that $t \in [1, T]$ is an integer and that (2.55) holds. It follows from (2.29), (2.48), and (2.53)–(2.55) that
$$z \in C \subset B_X(0, M_0), \tag{2.57}$$
$$x_t \in U \cap B_X(0, M_0 + 1). \tag{2.58}$$
Relation (2.37) implies that $x_{t+1} \in V$ satisfies
$$\|x_{t+1} - P_D(x_t - a_t\xi_t)\| \le 1. \tag{2.59}$$
By (2.30), (2.36) and (2.58),
$$\xi_t \in \partial f(x_t) + B_X(0, 1) \subset B_X(0, L + 1). \tag{2.60}$$
It follows from (2.31), (2.34), (2.48), (2.55), (2.60), and Lemma 2.2 that
$$\|z - P_D(x_t - a_t\xi_t)\| \le \|z - x_t + a_t\xi_t\| \le \|z - x_t\| + \|\xi_t\|a_t \le 2M + 3, \quad \|P_D(x_t - a_t\xi_t)\| \le 3M + 3. \tag{2.61}$$
In view of (2.29), (2.54), and (2.61),
$$P_D(x_t - a_t\xi_t) \in C, \tag{2.62}$$
and
$$P_D(x_t - a_t\xi_t) = P_C(x_t - a_t\xi_t). \tag{2.63}$$
Relations (2.29), (2.53), (2.59), and (2.61) imply that
$$\|x_{t+1}\| \le 3M + 4, \quad x_{t+1} \in U. \tag{2.64}$$
By (2.25), (2.30), (2.36), (2.37), (2.53), (2.54), (2.57), (2.58), (2.63), (2.64), and Lemma 2.7 which holds with
$$x = x_t, \quad a = a_t, \quad \xi = \xi_t, \quad u = x_{t+1},$$
we have
$$a_t(f(x_t) - f(z)) \le 2^{-1}\|z - x_t\|^2 - 2^{-1}\|z - x_{t+1}\|^2 + \delta_C(4M_0 + 1) + a_t\delta_f(2M_0 + 1) + 2^{-1}a_t^2(L + 1)^2.$$
It follows from the relation above, (2.31) and (2.34) that
$$f(x_t) - f(z) \le (2\tau_0)^{-1}\|z - x_t\|^2 - (2\tau_0)^{-1}\|z - x_{t+1}\|^2 + \tau_0^{-1}\delta_C(4M_0 + 1) + (2M_0 + 1)\delta_f + 2^{-1}\tau_1(L + 1)^2. \tag{2.65}$$
By (2.30), (2.37), (2.48), (2.62), and (2.63),
$$f(x_t) \ge f(P_D(x_t - a_t\xi_t)) - \delta_C L \ge f(z) - \delta_C L.$$
In view of the relation above, (2.32), (2.52), (2.55), (2.65), and the inclusion $t \in [1, T]$,
$$\|z - x_t\|^2 - \|z - x_{t+1}\|^2 \ge 0, \quad \|z - x_{t+1}\| \le \|z - x_t\| \le 2M + 2. \tag{2.66}$$
Thus, assuming that (2.55) is true, we have shown that (2.65) and (2.66) hold. Hence by induction (2.65) holds for all $t = 1, \dots, T$ and (2.55) holds for all $t = 1, \dots, T + 1$. It follows from (2.65), which holds for all $t = 1, \dots, T$, (2.51) and (2.52) that
$$T\epsilon_0 < T(\min\{f(x_t) : t = 1, \dots, T\} - f(z)) \le \sum_{t=1}^{T}(f(x_t) - f(z))$$
$$\le (2\tau_0)^{-1}\sum_{t=1}^{T}(\|z - x_t\|^2 - \|z - x_{t+1}\|^2) + T\tau_0^{-1}\delta_C(4M_0 + 1) + T(2M_0 + 1)\delta_f + 2^{-1}T\tau_1(L + 1)^2$$
$$\le (2\tau_0)^{-1}(2M + 2)^2 + T\tau_0^{-1}\delta_C(4M_0 + 1) + T(2M_0 + 1)\delta_f + 2^{-1}T\tau_1(L + 1)^2.$$
Together with (2.32) and (2.33) this implies that
$$\epsilon_0 < (2\tau_0 T)^{-1}(2M + 2)^2 + \tau_0^{-1}\delta_C(4M_0 + 1) + (2M_0 + 1)\delta_f + 2^{-1}\tau_1(L + 1)^2,$$
$$2^{-1}\epsilon_0 < (2\tau_0 T)^{-1}(2M + 2)^2, \quad T < \tau_0^{-1}(2M + 2)^2\epsilon_0^{-1} \le n_0 + 1.$$
Thus we have shown that if an integer $T \ge 1$ satisfies (2.52), then $T \le n_0$ and
$$\|z - x_t\| \le 2M + 2, \ t = 1, \dots, T + 1, \quad \|x_t\| \le 3M + 2, \ t = 0, \dots, T + 1.$$
This implies that there exists an integer $q \in [1, n_0 + 1]$ such that
$$\|x_t\| \le 3M + 2, \ t = 0, \dots, q, \quad \text{and} \quad f(x_q) - f(z) \le \epsilon_0.$$
Theorem 2.8 is proved.
2.7 Proof of Theorem 2.9

Fix
$$z \in D_{\min}. \tag{2.67}$$
Set
$$U = V \cap \{x \in X : \|x\| < 3M + 2\} \tag{2.68}$$
and
$$C = D \cap B_X(0, 3M + 1). \tag{2.69}$$
In view of (2.38) and (2.67),
$$\|z\| \le M - 1. \tag{2.70}$$
By (2.41) and (2.70),
$$\|x_0 - z\| \le 2M. \tag{2.71}$$
Assume that $t \ge 0$ is an integer and that
$$\|x_t - z\| \le 2M. \tag{2.72}$$
It follows from (2.38), (2.67)–(2.69), and (2.72) that
$$z \in C \subset B_X(0, 3M + 1), \tag{2.73}$$
$$x_t \in U \cap B_X(0, 3M). \tag{2.74}$$
Relation (2.43) implies that $x_{t+1} \in V$ satisfies
$$\|x_{t+1} - P_D(x_t - a_t\xi_t)\| \le 1. \tag{2.75}$$
By (2.39), (2.42), and (2.74),
$$\xi_t \in \partial f(x_t) + B_X(0, 1) \subset B_X(0, L + 1). \tag{2.76}$$
It follows from (2.38), (2.40), (2.67), (2.70), (2.72), (2.76), and Lemma 2.2 that
$$\|z - P_D(x_t - a_t\xi_t)\| \le \|z - x_t + a_t\xi_t\| \le \|z - x_t\| + \|\xi_t\|a_t \le 2M + (L + 1)a_t \le 2M + 1, \quad \|P_D(x_t - a_t\xi_t)\| \le 3M + 1. \tag{2.77}$$
In view of (2.69) and (2.77),
$$P_D(x_t - a_t\xi_t) \in C, \tag{2.78}$$
and
$$P_D(x_t - a_t\xi_t) = P_C(x_t - a_t\xi_t). \tag{2.79}$$
Relations (2.43), (2.68), and (2.77) imply that
$$\|x_{t+1}\| \le \|P_D(x_t - a_t\xi_t)\| + \delta_C < 3M + 2, \tag{2.80}$$
$$x_{t+1} \in U. \tag{2.81}$$
By (2.43), (2.73), (2.74), (2.79), (2.81), and Lemma 2.7 which holds with
$$x = x_t, \quad a = a_t, \quad \xi = \xi_t, \quad u = x_{t+1}, \quad M_0 = 3M + 2,$$
we have
$$a_t(f(x_t) - f(z)) \le 2^{-1}\|z - x_t\|^2 - 2^{-1}\|z - x_{t+1}\|^2 + \delta_C(12M + 9) + a_t\delta_f(6M + 5) + 2^{-1}a_t^2(L + 1)^2. \tag{2.82}$$
There are two cases:
$$\|z - x_{t+1}\| \le \|z - x_t\|; \tag{2.83}$$
$$\|z - x_{t+1}\| > \|z - x_t\|. \tag{2.84}$$
If (2.83) holds, then in view of (2.72), $\|z - x_{t+1}\| \le 2M$. Assume that (2.84) holds. It follows from (2.40), (2.82), and (2.84) that
$$f(x_t) - f(z) < a_t^{-1}\delta_C(12M + 9) + \delta_f(6M + 5) + 2^{-1}a_t(L + 1)^2 \le 3.$$
By the inequality above, (2.38) and (2.67),
$$\|x_t\| \le M - 1. \tag{2.85}$$
Relations (2.70) and (2.85) imply that
$$\|x_t - z\| \le 2M - 2. \tag{2.86}$$
Lemma 2.2, (2.40), (2.43), (2.67), and (2.76) imply that
$$\|x_{t+1} - z\| \le \|x_{t+1} - P_D(x_t - a_t\xi_t)\| + \|P_D(x_t - a_t\xi_t) - z\| \le \delta_C + \|x_t - a_t\xi_t - z\|$$
$$\le \delta_C + \|x_t - z\| + a_t\|\xi_t\| \le 1 + \|x_t - z\| + (L + 1)a_t \le \|x_t - z\| + 2 \le 2M.$$
Thus in both cases $\|x_{t+1} - z\| \le 2M$ and (2.82) holds. Therefore by induction we showed that for all integers $t \ge 0$, $\|x_t\| \le 3M$ and that (2.82) holds. Let $T$ be a natural number. By (2.67), (2.72), and (2.82),
$$\sum_{t=0}^{T} a_t(f(x_t) - \inf(f, D)) \le \sum_{t=0}^{T}(2^{-1}\|z - x_t\|^2 - 2^{-1}\|z - x_{t+1}\|^2) + \delta_C(12M + 9)(T + 1) + \delta_f(6M + 5)\sum_{t=0}^{T} a_t + 2^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2$$
$$\le 2M^2 + \delta_C(12M + 9)(T + 1) + \delta_f(6M + 5)\sum_{t=0}^{T} a_t + 2^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2.$$
This completes the proof of Theorem 2.9.
2.8 Proof of Theorem 2.12

Clearly, there exists
$$z \in D_{\min} \cap B_X(0, M). \tag{2.87}$$
In view of the theorem assumptions and (2.87),
$$\|x_0 - z\| \le 2M. \tag{2.88}$$
Set
$$U = V \cap \{v \in X : \|v\| < 3M + 6\} \tag{2.89}$$
and
$$C = D \cap B_X(0, 3M + 4). \tag{2.90}$$
Assume that $t \ge 0$ is an integer and that
$$\|x_t - z\| \le 2M + 2. \tag{2.91}$$
It follows from (2.87) and (2.91) that
$$\|x_t\| \le 3M + 2. \tag{2.92}$$
In view of (2.89) and (2.92),
$$x_t \in U \cap B_X(0, 3M + 2). \tag{2.93}$$
By (2.44), (2.46), and (2.92),
$$\|\xi_t\| \le L + 1. \tag{2.94}$$
Lemma 2.2, (2.45), (2.47), (2.87), (2.89), (2.91), and (2.94) imply that
$$\|x_{t+1} - z\| \le \|x_{t+1} - P_D(x_t - a\xi_t)\| + \|P_D(x_t - a\xi_t) - z\| \le \delta_C + \|x_t - a\xi_t - z\|$$
$$\le \delta_C + \|x_t - z\| + a\|\xi_t\| \le \delta_C + 2M + 2 + a(L + 1) \le 2M + 4, \quad \|x_{t+1}\| \le 3M + 4 \tag{2.95}$$
and
$$x_{t+1} \in U. \tag{2.96}$$
It follows from (2.45), (2.87), (2.90), (2.91), (2.94), and Lemma 2.2 that
$$\|P_D(x_t - a\xi_t)\| \le \|z\| + \|P_D(x_t - a\xi_t) - z\| \le M + \|z - x_t\| + a\|\xi_t\| \le M + 2M + 2 + 1,$$
$$P_D(x_t - a\xi_t) \in C, \tag{2.97}$$
and
$$P_D(x_t - a\xi_t) = P_C(x_t - a\xi_t). \tag{2.98}$$
By (2.46), (2.47), (2.87), (2.89), (2.90), (2.93), (2.96), (2.98), and Lemma 2.7 which holds with
$$M_0 = 3M + 4, \quad x = x_t, \quad \xi = \xi_t, \quad u = x_{t+1},$$
we have
$$a(f(x_t) - f(z)) \le 2^{-1}\|z - x_t\|^2 - 2^{-1}\|z - x_{t+1}\|^2 + \delta_C(12M + 17) + a\delta_f(6M + 9) + 2^{-1}a^2(L + 1)^2. \tag{2.99}$$
In view of (2.44), (2.47), (2.87), (2.92), and the relation $B_X(x_0, \delta_C) \cap D \ne \emptyset$, we have
$$f(x_t) \ge f(z) - \delta_C L.$$
It follows from (2.87), (2.99), and the inequality above that
$$\|z - x_{t+1}\|^2 - \|z - x_t\|^2 \le 2\delta_C(12M + 18) + a^2(L + 1)^2 + 2a\delta_f(6M + 9). \tag{2.100}$$
Thus we have shown that if for an integer $t \ge 0$ relation (2.91) holds, then (2.99) and (2.100) are true. Let us show that for all integers $t = 0, \dots, T$,
$$\|z - x_t\|^2 \le \|z - x_0\|^2 + t[2\delta_C(12M + 18) + a^2(L + 1)^2 + 2a\delta_f(6M + 9)]$$
$$\le \|z - x_0\|^2 + T[2\delta_C(12M + 18) + a^2(L + 1)^2 + 2a\delta_f(6M + 9)] \le 4M^2 + 8M + 4. \tag{2.101}$$
First, note that by (2.45),
$$T[2\delta_C(12M + 18) + a^2(L + 1)^2 + 2a\delta_f(6M + 9)] \le 2T\delta_C(12M + 18) + 2T\delta_C + 2Ta\delta_f(6M + 9)$$
$$= 2T\delta_C(12M + 19) + 2T\delta_f(6M + 9)\delta_C^{1/2}(L + 1)^{-1}$$
$$\le 6^{-1}\min\{\delta_C^{-1}, \delta_C^{-1/2}\delta_f^{-1}\}\big(2\delta_C(12M + 19) + 2\delta_f(6M + 9)\delta_C^{1/2}(L + 1)^{-1}\big)$$
$$\le 6^{-1}(24M + 38 + 12M + 18) \le 8M + 4. \tag{2.102}$$
It follows from (2.88) and (2.102) that
$$\|z - x_0\|^2 + T[2\delta_C(12M + 18) + a^2(L + 1)^2 + 2a\delta_f(6M + 9)] \le 4M^2 + 8M + 4 = (2M + 2)^2. \tag{2.103}$$
Assume that $t \in \{0, \dots, T\} \setminus \{T\}$ and that (2.101) holds. Then
$$\|x_t - z\| \le 2M + 2$$
and (2.99) and (2.100) hold. By (2.100), (2.101), and (2.103),
$$\|z - x_{t+1}\|^2 \le \|z - x_t\|^2 + 2\delta_C(12M + 18) + a^2(L + 1)^2 + 2a\delta_f(6M + 9)$$
$$\le \|z - x_0\|^2 + (t + 1)[2\delta_C(12M + 18) + a^2(L + 1)^2 + 2a\delta_f(6M + 9)]$$
$$\le \|z - x_0\|^2 + T[2\delta_C(12M + 18) + a^2(L + 1)^2 + 2a\delta_f(6M + 9)] \le 4M^2 + 8M + 4$$
and
$$\|z - x_{t+1}\| \le 2M + 2.$$
Thus we have shown by induction that (2.99) and (2.101) hold for all $t = 0, \dots, T$. It follows from (2.45), (2.87), (2.88), and (2.99) that
$$\sum_{t=0}^{T} a(f(x_t) - \inf(f, D)) \le 2^{-1}\|z - x_0\|^2 + (T + 1)\big(\delta_C(12M + 17) + a\delta_f(6M + 9) + 2^{-1}a^2(L + 1)^2\big)$$
$$\le 2M^2 + \delta_C(12M + 18)(6\delta_C)^{-1} + \delta_C^{1/2}\delta_f(6M + 9)(6\delta_f\delta_C^{1/2})^{-1} \le 2M^2 + 2M + 3 + M + 2 = 2M^2 + 3M + 5.$$
Together with (2.45) this implies that
$$\sum_{t=0}^{T}(T + 1)^{-1}f(x_t) - \inf(f, D), \quad f\Big(\sum_{t=0}^{T}(T + 1)^{-1}x_t\Big) - \inf(f, D), \quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D)$$
$$\le (2M + 2)^2 a^{-1}(T + 1)^{-1} \le 12(2M + 2)^2(L + 1)\delta_C^{-1/2}\max\{\delta_C, \delta_C^{1/2}\delta_f\} \le 12(2M + 2)^2(L + 1)\max\{\delta_C^{1/2}, \delta_f\}.$$
Theorem 2.12 is proved.
2.9 Zero-Sum Games with Two Players

Let $(X, \langle\cdot,\cdot\rangle)$, $(Y, \langle\cdot,\cdot\rangle)$ be Hilbert spaces equipped with the complete norms $\|\cdot\|$ which are induced by their inner products. Let $C$ be a nonempty closed convex subset of $X$, $D$ be a nonempty closed convex subset of $Y$, $U$ be an open convex subset of $X$, and $V$ be an open convex subset of $Y$ such that
$$C \subset U, \quad D \subset V \tag{2.104}$$
and let a function $f : U \times V \to R^1$ possess the following properties:
(i) for each $v \in V$, the function $f(\cdot, v) : U \to R^1$ is convex;
(ii) for each $u \in U$, the function $f(u, \cdot) : V \to R^1$ is concave.

Assume that a function $\phi : R^1 \to [0, \infty)$ is bounded on all bounded sets and that positive numbers $M_1, M_2, L_1, L_2$ satisfy
$$C \subset B_X(0, M_1), \quad D \subset B_Y(0, M_2), \tag{2.105}$$
$$|f(u_1, v) - f(u_2, v)| \le L_1\|u_1 - u_2\| \text{ for all } v \in V \text{ and all } u_1, u_2 \in U, \tag{2.106}$$
$$|f(u, v_1) - f(u, v_2)| \le L_2\|v_1 - v_2\| \text{ for all } u \in U \text{ and all } v_1, v_2 \in V. \tag{2.107}$$
Let
$$x_* \in C \text{ and } y_* \in D \tag{2.108}$$
satisfy
$$f(x_*, y) \le f(x_*, y_*) \le f(x, y_*) \tag{2.109}$$
for each $x \in C$ and each $y \in D$. The following result is proved in Sect. 2.10.

Proposition 2.13 Let $T$ be a natural number, $\delta_C, \delta_D \in (0, 1]$, $\{a_t\}_{t=0}^{T} \subset (0, \infty)$ and let $\{b_{t,1}\}_{t=0}^{T}, \{b_{t,2}\}_{t=0}^{T} \subset (0, \infty)$. Assume that $\{x_t\}_{t=0}^{T+1} \subset U$, $\{y_t\}_{t=0}^{T+1} \subset V$, for each $t \in \{0, \dots, T + 1\}$,
$$B_X(x_t, \delta_C) \cap C \ne \emptyset, \quad B_Y(y_t, \delta_D) \cap D \ne \emptyset, \tag{2.110}$$
for each $z \in C$ and each $t \in \{0, \dots, T\}$,
$$a_t(f(x_t, y_t) - f(z, y_t)) \le \phi(\|z - x_t\|) - \phi(\|z - x_{t+1}\|) + b_{t,1} \tag{2.111}$$
and that for each $v \in D$ and each $t \in \{0, \dots, T\}$,
$$a_t(f(x_t, v) - f(x_t, y_t)) \le \phi(\|v - y_t\|) - \phi(\|v - y_{t+1}\|) + b_{t,2}. \tag{2.112}$$
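Let
$$\widehat{x}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t x_t, \quad \widehat{y}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t y_t. \tag{2.113}$$
Then
$$B_X(\widehat{x}_T, \delta_C) \cap C \ne \emptyset, \quad B_Y(\widehat{y}_T, \delta_D) \cap D \ne \emptyset, \tag{2.114}$$
$$\Big|\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t) - f(x_*, y_*)\Big| \le \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + \max\{L_1\delta_C, L_2\delta_D\}$$
$$+ \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\}, \tag{2.115}$$
$$\Big|f(\widehat{x}_T, \widehat{y}_T) - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t)\Big| \le \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\}$$
$$+ \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + \max\{L_1\delta_C, L_2\delta_D\} \tag{2.116}$$
and for each $z \in C$ and each $v \in D$,
$$f(z, \widehat{y}_T) \ge f(\widehat{x}_T, \widehat{y}_T) - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\}$$
$$- 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} - \max\{L_1\delta_C, L_2\delta_D\}, \tag{2.117}$$
$$f(\widehat{x}_T, v) \le f(\widehat{x}_T, \widehat{y}_T) + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\}$$
$$+ 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + \max\{L_1\delta_C, L_2\delta_D\}. \tag{2.118}$$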
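Corollary 2.14 Suppose that all the assumptions of Proposition 2.13 hold and that $\tilde{x} \in C$, $\tilde{y} \in D$ satisfy
$$\|\widehat{x}_T - \tilde{x}\| \le \delta_C, \quad \|\widehat{y}_T - \tilde{y}\| \le \delta_D. \tag{2.119}$$
Then
$$|f(\tilde{x}, \tilde{y}) - f(\widehat{x}_T, \widehat{y}_T)| \le L_1\delta_C + L_2\delta_D \tag{2.120}$$
and for each $z \in C$ and each $v \in D$,
$$f(z, \tilde{y}) \ge f(\tilde{x}, \tilde{y}) - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\} - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} - 4\max\{L_1\delta_C, L_2\delta_D\}$$
and
$$f(\tilde{x}, v) \le f(\tilde{x}, \tilde{y}) + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\} + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + 4\max\{L_1\delta_C, L_2\delta_D\}.$$

Proof In view of (2.106), (2.107), and (2.119),
$$|f(\tilde{x}, \tilde{y}) - f(\widehat{x}_T, \widehat{y}_T)| \le |f(\tilde{x}, \tilde{y}) - f(\tilde{x}, \widehat{y}_T)| + |f(\tilde{x}, \widehat{y}_T) - f(\widehat{x}_T, \widehat{y}_T)| \le L_2\|\tilde{y} - \widehat{y}_T\| + L_1\|\tilde{x} - \widehat{x}_T\| \le L_1\delta_C + L_2\delta_D$$
and (2.120) holds. Let $z \in C$ and $v \in D$. Relations (2.106), (2.107), and (2.119) imply that
$$|f(z, \tilde{y}) - f(z, \widehat{y}_T)| \le L_2\delta_D, \quad |f(\tilde{x}, v) - f(\widehat{x}_T, v)| \le L_1\delta_C.$$
By the relation above and (2.117)–(2.120),
$$f(z, \tilde{y}) \ge f(z, \widehat{y}_T) - L_2\delta_D$$
$$\ge f(\widehat{x}_T, \widehat{y}_T) - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\} - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} - \max\{L_1\delta_C, L_2\delta_D\} - L_2\delta_D$$
$$\ge f(\tilde{x}, \tilde{y}) - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\} - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} - 4\max\{L_1\delta_C, L_2\delta_D\}$$
and
$$f(\tilde{x}, v) \le f(\widehat{x}_T, v) + L_1\delta_C$$
$$\le f(\widehat{x}_T, \widehat{y}_T) + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\} + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + \max\{L_1\delta_C, L_2\delta_D\} + L_1\delta_C$$
$$\le f(\tilde{x}, \tilde{y}) + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\} + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + 4\max\{L_1\delta_C, L_2\delta_D\}.$$
This completes the proof of Corollary 2.14.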
2.10 Proof of Proposition 2.13

It is clear that (2.114) is true. In view of (2.110), for each $t \in \{0, \dots, T + 1\}$, there exist
$$\tilde{x}_t \in C, \quad \tilde{y}_t \in D \tag{2.121}$$
such that
$$\|x_t - \tilde{x}_t\| \le \delta_C, \quad \|y_t - \tilde{y}_t\| \le \delta_D. \tag{2.122}$$
Let $t \in \{0, \dots, T\}$. By (2.106)–(2.109), (2.111), (2.112), and (2.122),
$$a_t(f(x_t, y_t) - f(x_*, y_*)) \le a_t(f(x_t, y_t) - f(x_*, \tilde{y}_t)) \le a_t(f(x_t, y_t) - f(x_*, y_t)) + a_t(f(x_*, y_t) - f(x_*, \tilde{y}_t))$$
$$\le \phi(\|x_* - x_t\|) - \phi(\|x_* - x_{t+1}\|) + b_{t,1} + a_t L_2\|y_t - \tilde{y}_t\| \le \phi(\|x_* - x_t\|) - \phi(\|x_* - x_{t+1}\|) + b_{t,1} + a_t L_2\delta_D \tag{2.123}$$
and
$$a_t(f(x_*, y_*) - f(x_t, y_t)) \le a_t(f(\tilde{x}_t, y_*) - f(x_t, y_t)) = a_t(f(\tilde{x}_t, y_*) - f(x_t, y_*)) + a_t(f(x_t, y_*) - f(x_t, y_t))$$
$$\le a_t L_1\|\tilde{x}_t - x_t\| + \phi(\|y_* - y_t\|) - \phi(\|y_* - y_{t+1}\|) + b_{t,2} \le a_t L_1\delta_C + \phi(\|y_* - y_t\|) - \phi(\|y_* - y_{t+1}\|) + b_{t,2}. \tag{2.124}$$
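In view of (2.123) and (2.124),
$$\sum_{t=0}^{T} a_t f(x_t, y_t) - \sum_{t=0}^{T} a_t f(x_*, y_*) \le \sum_{t=0}^{T}(\phi(\|x_* - x_t\|) - \phi(\|x_* - x_{t+1}\|)) + \sum_{t=0}^{T} b_{t,1} + \delta_D L_2\sum_{t=0}^{T} a_t$$
$$\le \phi(\|x_* - x_0\|) + \sum_{t=0}^{T} b_{t,1} + \delta_D L_2\sum_{t=0}^{T} a_t \tag{2.125}$$
and
$$\sum_{t=0}^{T} a_t f(x_*, y_*) - \sum_{t=0}^{T} a_t f(x_t, y_t) \le \sum_{t=0}^{T}(\phi(\|y_* - y_t\|) - \phi(\|y_* - y_{t+1}\|)) + \sum_{t=0}^{T} b_{t,2} + L_1\delta_C\sum_{t=0}^{T} a_t$$
$$\le \phi(\|y_* - y_0\|) + \sum_{t=0}^{T} b_{t,2} + L_1\delta_C\sum_{t=0}^{T} a_t. \tag{2.126}$$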
Relations (2.105), (2.108), (2.110), (2.125), and (2.126) imply that
$$\Big|\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t) - f(x_*, y_*)\Big| \le \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\}$$
$$+ \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + \max\{L_1\delta_C, L_2\delta_D\}. \tag{2.127}$$
By (2.114), there exists
$$z_T \in C \tag{2.128}$$
such that
$$\|z_T - \widehat{x}_T\| \le \delta_C. \tag{2.129}$$
In view of (2.128), we apply (2.111) with $z = z_T$ and obtain that for all $t = 0, \dots, T$,
$$a_t(f(x_t, y_t) - f(z_T, y_t)) \le \phi(\|z_T - x_t\|) - \phi(\|z_T - x_{t+1}\|) + b_{t,1}. \tag{2.130}$$
It follows from (2.106) and (2.129) that for all $t = 0, \dots, T$,
$$|f(z_T, y_t) - f(\widehat{x}_T, y_t)| \le L_1\|z_T - \widehat{x}_T\| \le L_1\delta_C. \tag{2.131}$$
By (2.130) and (2.131), for all $t = 0, \dots, T$,
$$a_t(f(x_t, y_t) - f(\widehat{x}_T, y_t)) \le a_t(f(x_t, y_t) - f(z_T, y_t)) + a_t L_1\delta_C \le \phi(\|z_T - x_t\|) - \phi(\|z_T - x_{t+1}\|) + b_{t,1} + a_t L_1\delta_C. \tag{2.132}$$
Combined with (2.105), (2.110), and (2.128) this implies that
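$$\sum_{t=0}^{T} a_t f(x_t, y_t) - \sum_{t=0}^{T} a_t f(\widehat{x}_T, y_t) \le \sum_{t=0}^{T}(\phi(\|z_T - x_t\|) - \phi(\|z_T - x_{t+1}\|)) + \sum_{t=0}^{T} b_{t,1} + \sum_{t=0}^{T} a_t L_1\delta_C$$
$$\le \phi(\|z_T - x_0\|) + \sum_{t=0}^{T} b_{t,1} + \sum_{t=0}^{T} a_t L_1\delta_C \le \sup\{\phi(s) : s \in [0, 2M_1 + 1]\} + \sum_{t=0}^{T} b_{t,1} + \sum_{t=0}^{T} a_t L_1\delta_C. \tag{2.133}$$
Property (ii) and (2.113) imply that
$$\sum_{t=0}^{T} a_t f(\widehat{x}_T, y_t) = \Big(\sum_{i=0}^{T} a_i\Big)\sum_{t=0}^{T}\Big(a_t\Big(\sum_{i=0}^{T} a_i\Big)^{-1} f(\widehat{x}_T, y_t)\Big) \le \Big(\sum_{t=0}^{T} a_t\Big) f(\widehat{x}_T, \widehat{y}_T). \tag{2.134}$$
By (2.133) and (2.134),
$$\sum_{t=0}^{T} a_t f(x_t, y_t) - \Big(\sum_{t=0}^{T} a_t\Big) f(\widehat{x}_T, \widehat{y}_T) \le \sum_{t=0}^{T} a_t f(x_t, y_t) - \sum_{t=0}^{T} a_t f(\widehat{x}_T, y_t)$$
$$\le \sup\{\phi(s) : s \in [0, 2M_1 + 1]\} + \sum_{t=0}^{T} b_{t,1} + \sum_{t=0}^{T} a_t L_1\delta_C. \tag{2.135}$$
By (2.114), there exists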
$$h_T \in D \tag{2.136}$$
such that
$$\|h_T - \widehat{y}_T\| \le \delta_D. \tag{2.137}$$
In view of (2.136), we apply (2.112) with $v = h_T$ and obtain that for all $t = 0, \dots, T$,
$$a_t(f(x_t, h_T) - f(x_t, y_t)) \le \phi(\|h_T - y_t\|) - \phi(\|h_T - y_{t+1}\|) + b_{t,2}. \tag{2.138}$$
It follows from (2.107) and (2.137) that for all $t = 0, \dots, T$,
$$|f(x_t, h_T) - f(x_t, \widehat{y}_T)| \le L_2\|h_T - \widehat{y}_T\| \le L_2\delta_D. \tag{2.139}$$
By (2.138) and (2.139), for all $t = 0, \dots, T$,
$$a_t(f(x_t, \widehat{y}_T) - f(x_t, y_t)) \le a_t(f(x_t, \widehat{y}_T) - f(x_t, h_T)) + a_t(f(x_t, h_T) - f(x_t, y_t))$$
$$\le \phi(\|h_T - y_t\|) - \phi(\|h_T - y_{t+1}\|) + b_{t,2} + a_t L_2\delta_D. \tag{2.140}$$
In view of (2.140),
$$\sum_{t=0}^{T} a_t f(x_t, \widehat{y}_T) - \sum_{t=0}^{T} a_t f(x_t, y_t) \le \sum_{t=0}^{T}(\phi(\|h_T - y_t\|) - \phi(\|h_T - y_{t+1}\|)) + \sum_{t=0}^{T} b_{t,2} + \sum_{t=0}^{T} a_t L_2\delta_D. \tag{2.141}$$
Property (i) and (2.113) imply that
$$\sum_{t=0}^{T} a_t f(x_t, \widehat{y}_T) = \Big(\sum_{i=0}^{T} a_i\Big)\sum_{t=0}^{T}\Big(a_t\Big(\sum_{i=0}^{T} a_i\Big)^{-1} f(x_t, \widehat{y}_T)\Big) \ge \sum_{t=0}^{T} a_t f(\widehat{x}_T, \widehat{y}_T). \tag{2.142}$$
By (2.105), (2.110), (2.136), (2.141), and (2.142),
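$$\sum_{t=0}^{T} a_t f(\widehat{x}_T, \widehat{y}_T) - \sum_{t=0}^{T} a_t f(x_t, y_t) \le \sum_{t=0}^{T} a_t f(x_t, \widehat{y}_T) - \sum_{t=0}^{T} a_t f(x_t, y_t)$$
$$\le \sum_{t=0}^{T}(\phi(\|h_T - y_t\|) - \phi(\|h_T - y_{t+1}\|)) + \sum_{t=0}^{T} b_{t,2} + \sum_{t=0}^{T} a_t L_2\delta_D \le \sup\{\phi(s) : s \in [0, 2M_2 + 1]\} + \sum_{t=0}^{T} b_{t,2} + \sum_{t=0}^{T} a_t L_2\delta_D. \tag{2.143}$$
It follows from (2.135) and (2.143) that
$$\Big|\sum_{t=0}^{T} a_t f(\widehat{x}_T, \widehat{y}_T) - \sum_{t=0}^{T} a_t f(x_t, y_t)\Big| \le \sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\} + \max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + \sum_{t=0}^{T} a_t\max\{L_1\delta_C, L_2\delta_D\}.$$
This relation implies (2.116). Let $z \in C$. By (2.111),
$$\sum_{t=0}^{T} a_t(f(x_t, y_t) - f(z, y_t)) \le \sum_{t=0}^{T}[\phi(\|z - x_t\|) - \phi(\|z - x_{t+1}\|)] + \sum_{t=0}^{T} b_{t,1}. \tag{2.144}$$
By property (ii) and (2.113),
$$\sum_{t=0}^{T} a_t f(z, y_t) = \Big(\sum_{i=0}^{T} a_i\Big)\sum_{t=0}^{T}\Big(a_t\Big(\sum_{i=0}^{T} a_i\Big)^{-1} f(z, y_t)\Big)$$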
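$$\le \Big(\sum_{t=0}^{T} a_t\Big) f(z, \widehat{y}_T). \tag{2.145}$$
In view of (2.144) and (2.145),
$$\sum_{t=0}^{T} a_t f(x_t, y_t) - \sum_{t=0}^{T} a_t f(z, \widehat{y}_T) \le \sum_{t=0}^{T} a_t(f(x_t, y_t) - f(z, y_t))$$
$$\le \sum_{t=0}^{T}[\phi(\|z - x_t\|) - \phi(\|z - x_{t+1}\|)] + \sum_{t=0}^{T} b_{t,1} \le \phi(\|z - x_0\|) + \sum_{t=0}^{T} b_{t,1}. \tag{2.146}$$
It follows from (2.105) and (2.110) that
$$f(z, \widehat{y}_T) \ge \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t) - \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sup\{\phi(s) : s \in [0, 2M_1 + 1]\} - \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} b_{t,1}$$
$$\ge f(\widehat{x}_T, \widehat{y}_T) - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, \max\{2M_1, 2M_2\} + 1]\} - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} - \max\{L_1\delta_C, L_2\delta_D\}$$
and (2.117) holds. Let $v \in D$. By (2.112),
$$\sum_{t=0}^{T} a_t(f(x_t, v) - f(x_t, y_t)) \le \sum_{t=0}^{T}[\phi(\|v - y_t\|) - \phi(\|v - y_{t+1}\|)] + \sum_{t=0}^{T} b_{t,2}. \tag{2.147}$$
By property (i) and (2.113),
$$\sum_{t=0}^{T} a_t f(x_t, v) = \Big(\sum_{i=0}^{T} a_i\Big)\sum_{t=0}^{T}\Big(a_t\Big(\sum_{i=0}^{T} a_i\Big)^{-1} f(x_t, v)\Big) \ge \Big(\sum_{t=0}^{T} a_t\Big) f(\widehat{x}_T, v). \tag{2.148}$$
In view of (2.147) and (2.148),
$$\sum_{t=0}^{T} a_t f(\widehat{x}_T, v) - \sum_{t=0}^{T} a_t f(x_t, y_t) \le \sum_{t=0}^{T} a_t f(x_t, v) - \sum_{t=0}^{T} a_t f(x_t, y_t) \le \phi(\|v - y_0\|) + \sum_{t=0}^{T} b_{t,2}.$$
Together with (2.105), (2.110), and (2.116) this implies that
$$f(\widehat{x}_T, v) \le \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t) + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, 2M_2 + 1]\} + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} b_{t,2}$$
$$\le f(\widehat{x}_T, \widehat{y}_T) + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sup\{\phi(s) : s \in [0, 2\max\{M_1, M_2\} + 1]\} + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\Big\{\sum_{t=0}^{T} b_{t,1}, \sum_{t=0}^{T} b_{t,2}\Big\} + \max\{L_1\delta_C, L_2\delta_D\}.$$
Therefore (2.118) holds. This completes the proof of Proposition 2.13.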
2.11 Zero-Sum Games on Bounded Sets

Let $(X, \langle\cdot,\cdot\rangle)$, $(Y, \langle\cdot,\cdot\rangle)$ be Hilbert spaces equipped with the complete norms $\|\cdot\|$ which are induced by their inner products. Let $C$ be a nonempty closed convex subset of $X$, $D$ be a nonempty closed convex subset of $Y$, $U$ be an open convex subset of $X$, and $V$ be an open convex subset of $Y$ such that
$$C \subset U, \quad D \subset V. \tag{2.149}$$
For each concave function $g : V \to R^1$ and each $x \in V$ set
$$\partial g(x) = \{l \in Y : \langle l, y - x\rangle \ge g(y) - g(x) \text{ for all } y \in V\}. \tag{2.150}$$
Clearly, for each $x \in V$,
$$\partial g(x) = -(\partial(-g)(x)). \tag{2.151}$$
Suppose that there exist $L_1, L_2, M_1, M_2 > 0$ such that
$$C \subset B_X(0, M_1), \quad D \subset B_Y(0, M_2), \tag{2.152}$$
a function $f : U \times V \to R^1$ possesses the following properties:
(i) for each $v \in V$, the function $f(\cdot, v) : U \to R^1$ is convex;
(ii) for each $u \in U$, the function $f(u, \cdot) : V \to R^1$ is concave,
for each $v \in V$,
$$|f(u_1, v) - f(u_2, v)| \le L_1\|u_1 - u_2\| \text{ for all } u_1, u_2 \in U \tag{2.153}$$
and that for each $u \in U$,
$$|f(u, v_1) - f(u, v_2)| \le L_2\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V. \tag{2.154}$$
For each $(\xi, \eta) \in U \times V$, set
$$\partial_x f(\xi, \eta) = \{l \in X : f(y, \eta) - f(\xi, \eta) \ge \langle l, y - \xi\rangle \text{ for all } y \in U\}, \tag{2.155}$$
$$\partial_y f(\xi, \eta) = \{l \in Y : \langle l, y - \eta\rangle \ge f(\xi, y) - f(\xi, \eta) \text{ for all } y \in V\}. \tag{2.156}$$
In view of properties (i) and (ii), (2.153), and (2.154), for each $\xi \in U$ and each $\eta \in V$,
$$\emptyset \ne \partial_x f(\xi, \eta) \subset B_X(0, L_1), \tag{2.157}$$
$$\emptyset \ne \partial_y f(\xi, \eta) \subset B_Y(0, L_2). \tag{2.158}$$
Let
$$x_* \in C \text{ and } y_* \in D \tag{2.159}$$
satisfy
$$f(x_*, y) \le f(x_*, y_*) \le f(x, y_*) \tag{2.160}$$
for each $x \in C$ and each $y \in D$. Let $\delta_{f,1}, \delta_{f,2}, \delta_C, \delta_D \in (0, 1]$ and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$. Let us describe our algorithm.

Subgradient projection algorithm for zero-sum games

Initialization: select arbitrary $x_0 \in U$ and $y_0 \in V$.

Iterative Step: given current iteration vectors $x_t \in U$ and $y_t \in V$ calculate
$$\xi_t \in \partial_x f(x_t, y_t) + B_X(0, \delta_{f,1}), \quad \eta_t \in \partial_y f(x_t, y_t) + B_Y(0, \delta_{f,2})$$
and the next pair of iteration vectors $x_{t+1} \in U$, $y_{t+1} \in V$ such that
$$\|x_{t+1} - P_C(x_t - a_t\xi_t)\| \le \delta_C, \quad \|y_{t+1} - P_D(y_t + a_t\eta_t)\| \le \delta_D.$$

In this chapter we prove the following result.

Theorem 2.15 Let $\delta_{f,1}, \delta_{f,2}, \delta_C, \delta_D \in (0, 1]$ and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$. Assume that $\{x_t\}_{t=0}^{\infty} \subset U$, $\{y_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\{\eta_t\}_{t=0}^{\infty} \subset Y$,
$$B_X(x_0, \delta_C) \cap C \ne \emptyset, \quad B_Y(y_0, \delta_D) \cap D \ne \emptyset \tag{2.161}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial_x f(x_t, y_t) + B_X(0, \delta_{f,1}), \tag{2.162}$$
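$$\eta_t \in \partial_y f(x_t, y_t) + B_Y(0, \delta_{f,2}), \tag{2.163}$$
$$\|x_{t+1} - P_C(x_t - a_t\xi_t)\| \le \delta_C \tag{2.164}$$
and
$$\|y_{t+1} - P_D(y_t + a_t\eta_t)\| \le \delta_D. \tag{2.165}$$
Let for each natural number $T$,
$$\widehat{x}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t x_t, \quad \widehat{y}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t y_t. \tag{2.166}$$
Then for each natural number $T$,
$$B_X(\widehat{x}_T, \delta_C) \cap C \ne \emptyset, \quad B_Y(\widehat{y}_T, \delta_D) \cap D \ne \emptyset,$$
$$\Big|\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t) - f(x_*, y_*)\Big| \le \Big(2\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_1, 2M_2\} + 1)^2 + \max\{L_1\delta_C, L_2\delta_D\}$$
$$+ \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(T + 1)\max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\} + 2^{-1}(\max\{L_1, L_2\} + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2$$
$$+ \max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{M_1, M_2\} + 1), \tag{2.167}$$
$$\Big|f(\widehat{x}_T, \widehat{y}_T) - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t)\Big| \le \Big(2\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_1, 2M_2\} + 1)^2 + \max\{L_1\delta_C, L_2\delta_D\}$$
$$+ \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(T + 1)\max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\} + 2^{-1}(\max\{L_1, L_2\} + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2$$
$$+ \max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{M_1, M_2\} + 1) \tag{2.168}$$
and for each natural number $T$, each $z \in C$, and each $u \in D$,
$$f(z, \widehat{y}_T) \ge f(\widehat{x}_T, \widehat{y}_T) - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_1, 2M_2\} + 1)^2 - \max\{L_1\delta_C, L_2\delta_D\}$$
$$- 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(T + 1)\max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\} - (\max\{L_1, L_2\} + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2$$
$$- 2\max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{M_1, M_2\} + 1),$$
$$f(\widehat{x}_T, u) \le f(\widehat{x}_T, \widehat{y}_T) + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_1, 2M_2\} + 1)^2 + \max\{L_1\delta_C, L_2\delta_D\}$$
$$+ 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(T + 1)\max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\} + (\max\{L_1, L_2\} + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2$$
$$+ 2\max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{M_1, M_2\} + 1).$$

Proof By (2.152), (2.164), and (2.165), for all integers $t \ge 0$,
$$\|x_t\| \le M_1 + 1, \quad \|y_t\| \le M_2 + 1. \tag{2.169}$$
Let $t \ge 0$ be an integer. Applying Lemma 2.7 with
$$a = a_t, \quad x = x_t, \quad f = f(\cdot, y_t), \quad \xi = \xi_t, \quad u = x_{t+1}$$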
we obtain that for each $z \in C$,
$$a_t(f(x_t, y_t) - f(z, y_t)) \le 2^{-1}\|z - x_t\|^2 - 2^{-1}\|z - x_{t+1}\|^2 + \delta_C(4M_1 + 1) + a_t\delta_{f,1}(2M_1 + 1) + 2^{-1}a_t^2(L_1 + 1)^2. \tag{2.170}$$
Applying Lemma 2.7 with
$$a = a_t, \quad x = y_t, \quad f = -f(x_t, \cdot), \quad \xi = -\eta_t, \quad u = y_{t+1}$$
we obtain that for each $v \in D$,
$$a_t(f(x_t, v) - f(x_t, y_t)) \le 2^{-1}\|v - y_t\|^2 - 2^{-1}\|v - y_{t+1}\|^2 + \delta_D(4M_2 + 1) + a_t\delta_{f,2}(2M_2 + 1) + 2^{-1}a_t^2(L_2 + 1)^2. \tag{2.171}$$
For all integers $t \ge 0$ set
$$b_{t,1} = \delta_C(4M_1 + 1) + a_t\delta_{f,1}(2M_1 + 1) + 2^{-1}a_t^2(L_1 + 1)^2,$$
$$b_{t,2} = \delta_D(4M_2 + 1) + a_t\delta_{f,2}(2M_2 + 1) + 2^{-1}a_t^2(L_2 + 1)^2,$$
and define
$$\phi(s) = 2^{-1}s^2, \quad s \in R^1.$$
It is easy to see that all the assumptions of Proposition 2.13 hold and it implies Theorem 2.15.

Theorem 2.15 is a generalization of Theorem 2.11 of [92] obtained in the case when $\delta_{f,1} = \delta_{f,2} = \delta_C = \delta_D$.

We are interested in the optimal choice of $a_t$, $t = 0, 1, \dots, T$. For simplicity assume that $L_1 = L_2$. Let $T$ be a natural number and $A_T = \sum_{t=0}^{T} a_t$ be given. By Theorem 2.15, in order to make the best choice of $a_t$, $t = 0, \dots, T$, we need to minimize the function $\sum_{t=0}^{T} a_t^2$ on the set
$$\Big\{a = (a_0, \dots, a_T) \in R^{T+1} : a_i \ge 0, \ i = 0, \dots, T, \ \sum_{i=0}^{T} a_i = A_T\Big\}.$$
By Lemma 2.3, this function has a unique minimizer $a^* = (a_0^*, \dots, a_T^*)$ where $a_i^* = (T + 1)^{-1}A_T$, $i = 0, \dots, T$, which is the best choice of $a_t$, $t = 0, 1, \dots, T$.
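The subgradient projection algorithm for zero-sum games described in this section, run with equal step sizes, can be illustrated on a toy problem. The sketch below is not taken from the text: it assumes $X = Y = R^1$, $C = D = [-1, 1]$, $U = V = (-2, 2)$ and the bilinear function $f(x, y) = xy$, for which $M_1 = M_2 = 1$, $L_1 = L_2 = 2$ and the saddle point is $(0, 0)$; computational errors are simulated by bounded random perturbations, and all function names are hypothetical. For this $f$ the quantity $\sup_{v \in D} f(\widehat{x}_T, v) - \inf_{z \in C} f(z, \widehat{y}_T)$ equals $|\widehat{x}_T| + |\widehat{y}_T|$, and by the last two inequalities of Theorem 2.15 it is at most twice the value returned by `one_sided_bound`.

```python
import random


def clip(v, lo=-1.0, hi=1.0):
    """Exact projection onto the interval [lo, hi]."""
    return min(max(v, lo), hi)


def noisy(v, radius, rng):
    """v plus a simulated computational error of absolute value <= radius."""
    return v + rng.uniform(-radius, radius)


def game_subgradient(x0, y0, a, T, d_f1, d_f2, d_C, d_D, seed=0):
    """Inexact iterations for f(x, y) = x * y; returns the averages (2.166) for a_t = a."""
    rng = random.Random(seed)
    x, y = x0, y0
    sum_x = sum_y = 0.0
    for _ in range(T + 1):
        sum_x += x
        sum_y += y
        xi = noisy(y, d_f1, rng)     # partial subgradient in x of x*y is y
        eta = noisy(x, d_f2, rng)    # partial (super)gradient in y of x*y is x
        x_next = noisy(clip(x - a * xi), d_C, rng)    # descent step in x
        y_next = noisy(clip(y + a * eta), d_D, rng)   # ascent step in y
        x, y = x_next, y_next
    return sum_x / (T + 1), sum_y / (T + 1)


def one_sided_bound(M1, M2, L1, L2, a, T, d_f1, d_f2, d_C, d_D):
    """Error term in the last two inequalities of Theorem 2.15 for a_t = a."""
    A = (T + 1) * a
    K = max((4 * M1 + 1) * d_C, (4 * M2 + 1) * d_D)
    return ((max(2 * M1, 2 * M2) + 1) ** 2 / A + max(L1 * d_C, L2 * d_D)
            + 2.0 * K / a + (max(L1, L2) + 1) ** 2 * a
            + 2.0 * max(d_f1, d_f2) * (2 * max(M1, M2) + 1))
```

With $a = 0.05$, $T = 1999$, $\delta_{f,1} = \delta_{f,2} = 10^{-2}$ and $\delta_C = \delta_D = 10^{-4}$ the averaged pair stays close to the saddle point, well within the guaranteed accuracy.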
Let $T$ be a natural number and $a_t = a$ for all $t = 0, \dots, T$. Now we will find the best $a > 0$. In order to meet this goal we need to choose $a$ which is a minimizer of the function
$$\Psi_T(a) = ((T + 1)a)^{-1}(2(\max\{2M_1, 2M_2\} + 1)^2) + 2a^{-1}\max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\} + (\max\{L_1, L_2\} + 1)^2 a.$$
Since $T$ can be arbitrarily large, we need to find a minimizer of the function
$$\phi(a) := 2a^{-1}\max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\} + (\max\{L_1, L_2\} + 1)^2 a, \quad a \in (0, \infty).$$
It is not difficult to see that the minimizer is
$$a = (2\max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\})^{1/2}(\max\{L_1, L_2\} + 1)^{-1}$$
and the minimal value of $\phi$ is
$$(8\max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\})^{1/2}(\max\{L_1, L_2\} + 1).$$
Now our goal is to find the best integer $T > 0$ which gives us an appropriate value of $\Psi_T(a)$. Since in view of the inequalities above, this value is bounded from below by $c_0\max\{\delta_C, \delta_D\}^{1/2}$, with the constant $c_0$ depending on $L_1, L_2, M_1, M_2$, it is clear that in order to make the best choice of $T$, it should be at the same order as $\max\{\delta_C, \delta_D\}^{-1}$. Let $T = \lfloor \max\{\delta_C, \delta_D\}^{-1} \rfloor$. In this case we obtain a pair of points $\widehat{x} \in U$, $\widehat{y} \in V$ such that
$$B_X(\widehat{x}, \delta_C) \cap C \ne \emptyset, \quad B_Y(\widehat{y}, \delta_D) \cap D \ne \emptyset$$
and for each $z \in C$ and each $v \in D$,
$$f(z, \widehat{y}) \ge f(\widehat{x}, \widehat{y}) - c\max\{\delta_C, \delta_D\}^{1/2} - 2\max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{M_1, M_2\} + 1),$$
$$f(\widehat{x}, v) \le f(\widehat{x}, \widehat{y}) + c\max\{\delta_C, \delta_D\}^{1/2} + 2\max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{M_1, M_2\} + 1),$$
where the constant $c > 0$ depends only on $L_1, L_2, M_1, M_2$.
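The choice of the step size above amounts to minimizing a function of the form $2Ka^{-1} + \Lambda^2 a$ with $K = \max\{(4M_1 + 1)\delta_C, (4M_2 + 1)\delta_D\}$ and $\Lambda = \max\{L_1, L_2\} + 1$. The following sketch (hypothetical helper names; $K$ and $\Lambda$ are shorthand introduced here only for the code) evaluates the minimizer and the minimal value and can be used to check them numerically.

```python
import math


def phi(a, K, Lam):
    """phi(a) = 2 K / a + Lam^2 a, the part of the estimate that depends on a."""
    return 2.0 * K / a + Lam ** 2 * a


def best_step(M1, M2, L1, L2, d_C, d_D):
    """Return (a, phi(a), K, Lam) for the minimizer a = (2K)^{1/2} / Lam."""
    K = max((4 * M1 + 1) * d_C, (4 * M2 + 1) * d_D)
    Lam = max(L1, L2) + 1.0
    a = math.sqrt(2.0 * K) / Lam
    minimal_value = math.sqrt(8.0 * K) * Lam
    return a, minimal_value, K, Lam
```

For example, $M_1 = M_2 = 1$, $L_1 = L_2 = 2$ and $\delta_C = \delta_D = 10^{-4}$ give $K = 5 \cdot 10^{-4}$ and $\Lambda = 3$, so the minimizer is $a = (10^{-3})^{1/2}/3$ and the minimal value is $3(4 \cdot 10^{-3})^{1/2}$.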
2.12 Zero-Sum Games on Unbounded Sets

Let $(X, \langle\cdot,\cdot\rangle)$, $(Y, \langle\cdot,\cdot\rangle)$ be Hilbert spaces equipped with the complete norms $\|\cdot\|$ which are induced by their inner products. Let $A$ be a nonempty closed convex subset of $X$, $D$ be a nonempty closed convex subset of $Y$, $W$ be an open convex subset of $X$, and $V$ be an open convex subset of $Y$ such that
$$A \subset W, \quad D \subset V \tag{2.172}$$
and let a function $f : W \times V \to R^1$ possess the following properties:
(i) for each $v \in V$, the function $f(\cdot, v) : W \to R^1$ is convex and continuous;
(ii) for each $w \in W$, the function $f(w, \cdot) : V \to R^1$ is concave and continuous.

Suppose that
$$L_1 > 1, \quad L_2 > 0, \quad M_0, M_1, M_2 > 0, \tag{2.173}$$
$$\theta_0 \in A \cap B_X(0, M_0), \tag{2.174}$$
$$D \subset B_Y(0, M_0), \tag{2.175}$$
$$\sup\{f(\theta_0, y) : y \in V \cap B_Y(0, M_0 + 1)\} < M_1, \tag{2.176}$$
for each $y \in V \cap B_Y(0, M_0 + 1)$,
$$\{x \in W : f(x, y) \le M_1 + 4\} \subset B_X(0, M_2), \tag{2.177}$$
for each $v \in V \cap B_Y(0, M_0 + 1)$,
$$|f(u_1, v) - f(u_2, v)| \le L_1\|u_1 - u_2\| \text{ for all } u_1, u_2 \in W \cap B_X(0, M_0 + 3M_2 + 4) \tag{2.178}$$
and that for each $u \in W \cap B_X(0, M_0 + 3M_2 + 4)$,
$$|f(u, v_1) - f(u, v_2)| \le L_2\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_Y(0, M_0 + 2). \tag{2.179}$$
Let
$$x_* \in A \text{ and } y_* \in D \tag{2.180}$$
satisfy
$$f(x_*, y) \le f(x_*, y_*) \le f(x, y_*) \tag{2.181}$$
for each $x \in A$ and each $y \in D$. In view of (2.172), (2.174)–(2.176), (2.180), and (2.181),
$$f(x_*, y_*) \le f(\theta_0, y_*) < M_1, \tag{2.182}$$
$$\|x_*\| \le M_2, \quad \|\theta_0\| \le M_2. \tag{2.183}$$
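In this chapter we prove the following result which does not have an analog in [92]. (Note that in [92] we consider only games on bounded sets.)

Theorem 2.16 Let $\delta_{f,1}, \delta_{f,2}, \delta_A, \delta_D \in (0, 1/2)$ and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$ satisfy
$$\delta_A \le (12M_2 + 9)^{-1}(L_1 + 1)^{-2}, \quad \delta_{f,1} \le (6M_2 + 5)^{-1}, \tag{2.184}$$
$$\delta_A(12M_2 + 9) \le a_t \le (L_1 + 1)^{-2} \text{ for all integers } t \ge 0. \tag{2.185}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset W$, $\{y_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\{\eta_t\}_{t=0}^{\infty} \subset Y$,
$$B_X(x_0, \delta_A) \cap A \ne \emptyset, \quad B_Y(y_0, \delta_D) \cap D \ne \emptyset, \tag{2.186}$$
$$\|x_0\| \le M_2 + 1 \tag{2.187}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial_x f(x_t, y_t) + B_X(0, \delta_{f,1}), \tag{2.188}$$
$$\eta_t \in \partial_y f(x_t, y_t) + B_Y(0, \delta_{f,2}), \tag{2.189}$$
$$\|x_{t+1} - P_A(x_t - a_t\xi_t)\| \le \delta_A \tag{2.190}$$
and
$$\|y_{t+1} - P_D(y_t + a_t\eta_t)\| \le \delta_D. \tag{2.191}$$
Let for each natural number $T$,
$$\widehat{x}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t x_t, \quad \widehat{y}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t y_t. \tag{2.192}$$
Then for all integers $t \ge 0$,
$$\|x_t\| \le 3M_2 + 1 \tag{2.193}$$
and for each natural number $T$,
$$B_X(\widehat{x}_T, \delta_A) \cap A \ne \emptyset, \quad B_Y(\widehat{y}_T, \delta_D) \cap D \ne \emptyset, \tag{2.194}$$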
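$$
\begin{aligned}
\Big|\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t) - f(x_*, y_*)\Big|
&\le \max\{L_1\delta_A, L_2\delta_D\} + \Big(2\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_0, 6M_2 + 4\} + 1)^2\\
&\quad + \max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{2M_0 + 1, 6M_2 + 5\})\\
&\quad + 2^{-1}(\max\{L_1, L_2\} + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\\
&\quad + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(T + 1)\max\{(12M_2 + 9)\delta_A, (4M_0 + 1)\delta_D\},
\end{aligned}\tag{2.195}
$$
$$
\begin{aligned}
\Big|f(\hat{x}_T, \hat{y}_T) - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t)\Big|
&\le \max\{L_1\delta_A, L_2\delta_D\} + \Big(2\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_0, 6M_2 + 4\} + 1)^2\\
&\quad + \max\{\delta_{f,1}, \delta_{f,2}\}(\max\{2M_0 + 1, 6M_2 + 5\})\\
&\quad + 2^{-1}(\max\{L_1, L_2\} + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\\
&\quad + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(T + 1)\max\{(12M_2 + 9)\delta_A, (4M_0 + 1)\delta_D\}
\end{aligned}\tag{2.196}
$$
and for each natural number $T$, each $z \in A$, and each $u \in D$,
$$
\begin{aligned}
f(z, \hat{y}_T) &\ge f(\hat{x}_T, \hat{y}_T) - \max\{L_1\delta_A, L_2\delta_D\} - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_0, 6M_2 + 4\} + 1)^2\\
&\quad - 2\max\{\delta_{f,1}, \delta_{f,2}\}(\max\{2M_0 + 1, 6M_2 + 5\})\\
&\quad - (\max\{L_1, L_2\} + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\\
&\quad - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(T + 1)\max\{(12M_2 + 9)\delta_A, (4M_0 + 1)\delta_D\},
\end{aligned}\tag{2.197}
$$
$$
\begin{aligned}
f(\hat{x}_T, u) &\le f(\hat{x}_T, \hat{y}_T) + \max\{L_1\delta_A, L_2\delta_D\} + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_0, 6M_2 + 4\} + 1)^2\\
&\quad + \max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{2M_0 + 1, 6M_2 + 5\})\\
&\quad + (\max\{L_1, L_2\} + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\\
&\quad + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(T + 1)\max\{(12M_2 + 9)\delta_A, (4M_0 + 1)\delta_D\}.
\end{aligned}\tag{2.198}
$$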
We are interested in the optimal choice of $a_t$, $t = 0, 1, \dots, T$. Let $T$ be a natural number and $A_T = \sum_{t=0}^{T} a_t$ be given. By Theorem 2.16, in order to make the best choice of $a_t$, $t = 0, \dots, T$, we need to minimize the function $\sum_{t=0}^{T} a_t^2$ on the set
$$\Big\{a = (a_0, \dots, a_T) \in R^{T+1} : a_i \ge 0,\ i = 0, \dots, T,\ \sum_{i=0}^{T} a_i = A_T\Big\}.$$
By Lemma 2.3, this function has a unique minimizer $a^* = (a_0^*, \dots, a_T^*)$ where $a_i^* = (T + 1)^{-1}A_T$, $i = 0, \dots, T$, which is the best choice of $a_t$, $t = 0, 1, \dots, T$.
Let $T$ be a natural number and $a_t = a$ for all $t = 0, \dots, T$. Now we will find the best $a > 0$. In order to meet this goal we need to choose $a$ which is a minimizer of the function
$$\Psi_T(a) = ((T + 1)a)^{-1}(\max\{2M_0, 6M_2 + 2\} + 1)^2 + a^{-1}\max\{(12M_2 + 9)\delta_A, (4M_0 + 1)\delta_D\} + (\max\{L_1, L_2\} + 1)^2 a,\quad a > 0.$$
Since $T$ can be arbitrarily large, we need to find a minimizer of the function
$$\varphi(a) := a^{-1}\max\{(12M_2 + 9)\delta_A, (4M_0 + 1)\delta_D\} + (\max\{L_1, L_2\} + 1)^2 a,\quad a \in (0, \infty).$$
It is not difficult to see that the minimizer is
$$a = (\max\{(12M_2 + 9)\delta_A, (4M_0 + 1)\delta_D\})^{1/2}(\max\{L_1, L_2\} + 1)^{-1}$$
and the minimal value of $\varphi$ is
$$2(\max\{(12M_2 + 9)\delta_A, (4M_0 + 1)\delta_D\})^{1/2}(\max\{L_1, L_2\} + 1).$$
Now our goal is to find the best integer $T > 0$ which gives us an appropriate value of $\Psi_T(a)$. In view of the inequalities above, this value is bounded from below by $c_0\max\{\delta_A, \delta_D\}^{1/2}$, with the constant $c_0$ depending on $L_1, L_2, M_0, M_1, M_2$, and it is clear that the best choice of $T$ is of the same order as $\max\{\delta_A, \delta_D\}^{-1}$. In this case we obtain a pair of points $\hat{x} \in W \cap B_X(0, 3M_2 + 4)$, $\hat{y} \in V$ such that
$$B_X(\hat{x}, \delta_A) \cap A \cap B_X(0, 3M_2 + 4) \ne \emptyset,\quad B_Y(\hat{y}, \delta_D) \cap D \ne \emptyset$$
and for each $z \in A$ and each $v \in D$,
$$f(z, \hat{y}) \ge f(\hat{x}, \hat{y}) - c\max\{\delta_A, \delta_D\}^{1/2} - 2\max\{\delta_{f,1}, \delta_{f,2}\}\max\{6M_2 + 5, 2M_0 + 1\},$$
$$f(\hat{x}, v) \le f(\hat{x}, \hat{y}) + c\max\{\delta_A, \delta_D\}^{1/2} + 2\max\{\delta_{f,1}, \delta_{f,2}\}\max\{6M_2 + 5, 2M_0 + 1\},$$
where the constant $c > 0$ depends only on $L_1, L_2, M_0, M_1, M_2$.
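The step-size choice discussed above can be checked numerically. The following is a minimal sketch, assuming the error term and the Lipschitz term are collapsed into two hypothetical scalars `c` (standing for the maximum of the two error quantities) and `b` (standing for max{L1, L2} + 1); the numerical values are illustrative only and do not come from the text.

```python
import math

def phi(a, c, b):
    """phi(a) = c / a + b**2 * a for a > 0."""
    return c / a + b * b * a

def best_step(c, b):
    """Return the stationary point a* = sqrt(c) / b and the value phi(a*)."""
    a_star = math.sqrt(c) / b
    return a_star, phi(a_star, c, b)

# Hypothetical constants, chosen only for illustration.
c, b = 1e-4, 3.0
a_star, phi_min = best_step(c, b)

# Compare with a grid search over [0.5 * a_star, 1.5 * a_star].
grid = [a_star * (0.5 + 0.01 * k) for k in range(101)]
grid_best = min(grid, key=lambda a: phi(a, c, b))
print(a_star, phi_min, grid_best)
```

On the grid, the smallest value of `phi` is attained at the grid point that coincides with `a_star`, in agreement with the first-order condition for this function.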
2.13 Proof of Theorem 2.16
By (2.175), (2.186), and (2.191), for all integers $t \ge 0$,
$$\|y_t\| \le M_0 + 1. \tag{2.199}$$
Set
$$C = A \cap B_X(0, 3M_2 + 2) \tag{2.200}$$
and
$$U = W \cap \{x \in X : \|x\| < 3M_2 + 4\}. \tag{2.201}$$
We show that $\|x_t - \theta_0\| \le 2M_2 + 1$ for all integers $t \ge 0$. In view of (2.183) and (2.187),
$$\|x_0 - \theta_0\| \le \|x_0\| + \|\theta_0\| \le M_2 + 1 + M_2 = 2M_2 + 1. \tag{2.202}$$
Assume that $t \ge 0$ is an integer and that
$$\|x_t - \theta_0\| \le 2M_2 + 1. \tag{2.203}$$
It follows from (2.183), (2.201), and (2.203) that
$$x_t \in U. \tag{2.204}$$
Relation (2.190) implies
$$\|x_{t+1} - P_A(x_t - a_t\xi_t)\| \le 1. \tag{2.205}$$
By (2.178), (2.188), (2.199), (2.201), and (2.204),
$$\xi_t \in \partial_x f(x_t, y_t) + B_X(0, 1) \subset B_X(0, L_1 + 1). \tag{2.206}$$
It follows from (2.174), (2.185), (2.203), (2.206), and Lemma 2.2 that
$$\|\theta_0 - P_A(x_t - a_t\xi_t)\| \le \|\theta_0 - x_t + a_t\xi_t\| \le \|\theta_0 - x_t\| + \|\xi_t\|a_t \le 2M_2 + 2. \tag{2.207}$$
In view of (2.183) and (2.207),
$$\|P_A(x_t - a_t\xi_t)\| \le \|\theta_0\| + 2M_2 + 2 \le 3M_2 + 2. \tag{2.208}$$
By (2.205) and (2.208),
$$\|x_{t+1}\| \le \|P_A(x_t - a_t\xi_t)\| + 1 \le 3M_2 + 3. \tag{2.209}$$
It follows from (2.201) and (2.209) that
$$x_{t+1} \in U. \tag{2.210}$$
Relations (2.200) and (2.208) imply that
$$P_A(x_t - a_t\xi_t) \in C, \tag{2.211}$$
and
$$P_A(x_t - a_t\xi_t) = P_C(x_t - a_t\xi_t). \tag{2.212}$$
By (2.178), (2.183), (2.188), (2.190), (2.200), (2.201), (2.203), (2.204), and (2.210)–(2.212), we apply Lemma 2.7 with $M_0 = 3M_2 + 2$, $a = a_t$, $x = x_t$, $f = f(\cdot, y_t)$, $\xi = \xi_t$, $u = x_{t+1}$ and obtain that for each $z \in C$,
$$a_t(f(x_t, y_t) - f(z, y_t)) \le 2^{-1}\|x_t - z\|^2 - 2^{-1}\|x_{t+1} - z\|^2 + \delta_A(12M_2 + 9) + a_t\delta_{f,1}(6M_2 + 5) + 2^{-1}a_t^2(L_1 + 1)^2. \tag{2.213}$$
In view of (2.213) with $z = \theta_0$,
$$a_t(f(x_t, y_t) - f(\theta_0, y_t)) \le 2^{-1}\|x_t - \theta_0\|^2 - 2^{-1}\|x_{t+1} - \theta_0\|^2 + \delta_A(12M_2 + 9) + a_t\delta_{f,1}(6M_2 + 5) + 2^{-1}a_t^2(L_1 + 1)^2. \tag{2.214}$$
There are two cases:
$$\|\theta_0 - x_{t+1}\| \le \|\theta_0 - x_t\|; \tag{2.215}$$
$$\|\theta_0 - x_{t+1}\| > \|\theta_0 - x_t\|. \tag{2.216}$$
Assume that (2.215) holds. By (2.203) and (2.215),
$$\|\theta_0 - x_{t+1}\| \le 2M_2 + 1. \tag{2.217}$$
Assume that (2.216) is true. It follows from (2.184), (2.185), (2.214), and (2.216) that
$$f(x_t, y_t) - f(\theta_0, y_t) \le a_t^{-1}\delta_A(12M_2 + 9) + \delta_{f,1}(6M_2 + 5) + 2^{-1}a_t(L_1 + 1)^2 \le 3. \tag{2.218}$$
By (2.176), (2.199), and (2.218),
$$f(x_t, y_t) \le M_1 + 3. \tag{2.219}$$
In view of (2.177), (2.199), and (2.219), $\|x_t\| \le M_2$. Together with (2.183) this implies that $\|x_t - \theta_0\| \le 2M_2$. It follows from the inequality above, (2.174), (2.185), (2.190), (2.206), and Lemma 2.2 that
$$
\begin{aligned}
\|\theta_0 - x_{t+1}\| &\le \|x_{t+1} - P_A(x_t - a_t\xi_t)\| + \|\theta_0 - P_A(x_t - a_t\xi_t)\|\\
&\le \delta_A + \|\theta_0 - P_A(x_t - a_t\xi_t)\| \le \delta_A + \|\theta_0 - x_t + a_t\xi_t\|\\
&\le \delta_A + \|\theta_0 - x_t\| + \|\xi_t\|a_t \le \delta_A + 2M_2 + (L_1 + 1)a_t \le 2M_2 + 1
\end{aligned}
$$
and (2.217) holds in both cases. Thus by induction we have shown that for all integers $t \ge 0$,
$$\|x_t - \theta_0\| \le 2M_2 + 1$$
and that (2.210)–(2.212) hold. Now we can apply Theorem 2.15 with $C$, $D$, $U$ and $V \cap \{y \in Y : \|y\| < M_0 + 2\}$ and obtain that (2.195), (2.196) are true and that for each natural number $T$, each $z \in C$, and each $u \in D$, (2.197) and (2.198) hold. In order to complete the proof it is sufficient to show that (2.197) holds for all $z \in A \setminus C$. By (2.176), (2.177), (2.194), (2.197) (with $z = \theta_0$), and (2.200),
$$f(z, \hat{y}_T) > M_1 + 4 > 4 + f(\theta_0, \hat{y}_T)$$
and (2.197) holds for all $z \in A$. Theorem 2.16 is proved.
2.14 An Example for Theorem 2.16
Let $X$, $Y$ be Hilbert spaces, $A$ be a nonempty closed convex subset of $X$, $D$ be a nonempty closed convex subset of $Y$, $W$ be an open convex subset of $X$, and $V$ be an open convex subset of $Y$ such that $A \subset W$, $D \subset V$, let a function $g : W \to R^1$ be convex and continuous, a function $h : V \to R^1$ be concave and continuous, and
$$f(w, v) = g(w)h(v),\quad w \in W,\ v \in V.$$
Suppose that $c_0, M_0, M_1, M_2, L_0, L_1 > 0$,
$$g(w) \ge c_0,\ w \in W,\quad h(v) \ge c_0,\ v \in V,$$
$$D \subset B_Y(0, M_0),\quad \theta_0 \in A \cap B_X(0, M_0),$$
$$|h(v_1) - h(v_2)| \le L_0\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_Y(0, M_0 + 2),$$
$$\sup\{h(y) : y \in V \cap B_Y(0, M_0 + 1)\} < M_1,$$
$$\{x \in W : g(x) \le c_0^{-1}(M_1(g(\theta_0) + 1) + 4)\} \subset B_X(0, M_2),$$
$$|g(u_1) - g(u_2)| \le L_1\|u_1 - u_2\| \text{ for all } u_1, u_2 \in W \cap B_X(0, 3M_2 + M_0 + 4)$$
and that $x_* \in A$ and $y_* \in D$ satisfy
$$f(x_*, y) \le f(x_*, y_*) \le f(x, y_*)$$
for all $x \in A$ and all $y \in D$. It is not difficult to see that all the assumptions made in Sect. 2.12 hold.
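As a sketch of why the example fits the setting of Sect. 2.12, the convexity, concavity, and level-set conditions can be checked as follows. The symbol $M_1'$ is introduced here only for illustration and is not part of the text: it stands for the constant that plays the role of $M_1$ in (2.176) and (2.177). The Lipschitz conditions (2.178) and (2.179) are not checked in this sketch.

```latex
% Sketch only: M_1' := M_1 (g(\theta_0) + 1) is an illustrative symbol.
\begin{align*}
&\text{For fixed } v \in V:\quad f(\cdot, v) = h(v)\,g(\cdot),\ h(v) \ge c_0 > 0
  \ \Longrightarrow\ f(\cdot, v) \text{ is convex and continuous};\\
&\text{For fixed } w \in W:\quad f(w, \cdot) = g(w)\,h(\cdot),\ g(w) \ge c_0 > 0
  \ \Longrightarrow\ f(w, \cdot) \text{ is concave and continuous};\\
&\sup\{f(\theta_0, y) : y \in V \cap B_Y(0, M_0 + 1)\}
  = g(\theta_0)\sup\{h(y) : y \in V \cap B_Y(0, M_0 + 1)\}
  < g(\theta_0) M_1 \le M_1';\\
&f(x, y) \le M_1' + 4
  \ \Longrightarrow\ g(x) \le \frac{M_1' + 4}{h(y)} \le c_0^{-1}(M_1' + 4)
  \ \Longrightarrow\ x \in B_X(0, M_2).
\end{align*}
```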
Chapter 3
The Mirror Descent Algorithm
In this chapter we analyze the mirror descent algorithm for minimization of convex and nonsmooth functions and for computing the saddle points of convex–concave functions, in the presence of computational errors. The problem is described by an objective function and a set of feasible points. For this algorithm each iteration consists of two steps. The first step is a calculation of a subgradient of the objective function, while in the second one we solve an auxiliary minimization problem on the set of feasible points. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterations one needs for this.
3.1 Optimization on Bounded Sets
Let $X$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$. Let $C$ be a nonempty closed convex subset of $X$, $U$ be an open convex subset of $X$ such that $C \subset U$ and let $f : U \to R^1$ be a convex function. Suppose that there exist $L > 0$, $M_0 > 0$ such that
$$C \subset B_X(0, M_0), \tag{3.1}$$
$$|f(x) - f(y)| \le L\|x - y\| \text{ for all } x, y \in U. \tag{3.2}$$
In view of (3.2), for each $x \in U$,
$$\emptyset \ne \partial f(x) \subset B_X(0, L). \tag{3.3}$$
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_3
For each nonempty set $D \subset X$ and each function $h : D \to R^1$ put
$$\inf(h, D) = \inf\{h(y) : y \in D\}$$
and
$$\operatorname{argmin}(h, D) = \operatorname{argmin}\{h(y) : y \in D\} = \{y \in D : h(y) = \inf(h, D)\}.$$
We study the convergence of the mirror descent algorithm in the presence of computational errors. This method was introduced by Nemirovsky and Yudin for solving convex optimization problems [66]. Here we use a derivation of this algorithm proposed by Beck and Teboulle [17]. Let $\delta_f, \delta_C \in (0, 1]$ and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$. We describe the inexact version of the mirror descent algorithm.
Mirror descent algorithm
Initialization: select an arbitrary $x_0 \in U$.
Iterative Step: given a current iteration vector $x_t \in U$ calculate
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f),$$
define
$$g_t(x) = \langle\xi_t, x\rangle + (2a_t)^{-1}\|x - x_t\|^2,\quad x \in X$$
and calculate the next iteration vector $x_{t+1} \in U$ such that
$$B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}\{g_t(y) : y \in C\} \ne \emptyset.$$
Here $\delta_f$ is a computational error produced by our computer system when we calculate a subgradient of $f$ and $\delta_C$ is a computational error produced by our computer system when we calculate a minimizer of the function $g_t$. Note that $g_t$ is a convex function on $X$ which is bounded from below and possesses a minimizer on $C$.
In this chapter we prove the following result.
Theorem 3.1 Let $\delta_f, \delta_C \in (0, 1]$, $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$ and let
$$x_* \in C \tag{3.4}$$
satisfy
$$f(x_*) \le f(x) \text{ for all } x \in C. \tag{3.5}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset U$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M_0 + 1 \tag{3.6}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f), \tag{3.7}$$
$$g_t(z) = \langle\xi_t, z\rangle + (2a_t)^{-1}\|z - x_t\|^2,\quad z \in X \tag{3.8}$$
and
$$B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}(g_t, C) \ne \emptyset. \tag{3.9}$$
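Then for each natural number $T$,
$$
\begin{aligned}
\sum_{t=0}^{T} a_t(f(x_t) - f(x_*)) &\le 2^{-1}(2M_0 + 1)^2 + (\delta_f(2M_0 + 1) + \delta_C(L + 1))\sum_{t=0}^{T} a_t\\
&\quad + 4\delta_C(T + 1)(2M_0 + 1) + 2^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2.
\end{aligned}\tag{3.10}
$$
Moreover, for each natural number $T$,
$$
\begin{aligned}
&f\Big(\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t x_t\Big) - f(x_*),\quad \min\{f(x_t) : t = 0, \dots, T\} - f(x_*)\\
&\quad \le 2^{-1}(2M_0 + 1)^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + \delta_f(2M_0 + 1) + \delta_C(L + 1)\\
&\qquad + 4\delta_C(T + 1)(2M_0 + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + 2^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}.
\end{aligned}\tag{3.11}
$$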
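To make the inexact iteration concrete, here is a minimal sketch under strong simplifying assumptions that are not part of the text: $X = R^n$ with the Euclidean norm, $C$ is the closed ball of radius `M0`, $U$ is the whole space, and $f(x) = \sum_i |x_i - p_i|$ for a hypothetical point `p` inside the ball. In this Euclidean setting, completing the square shows that the exact minimizer of $g_t$ over $C$ is the projection of $x_t - a_t\xi_t$ onto $C$. The function names and the random error model are illustrative choices.

```python
import math
import random

def project_ball(x, radius):
    """Euclidean projection onto the closed ball B(0, radius)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]

def noise(dim, size, rng):
    """Random vector of Euclidean norm at most `size` (models a computational error)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g)) or 1.0
    scale = size * rng.random() / norm
    return [v * scale for v in g]

def inexact_mirror_descent(p, M0, a, T, delta_f, delta_C, seed=0):
    """Return [f(x_0), ..., f(x_T)] for f(x) = sum_i |x_i - p_i| on C = B(0, M0)."""
    rng = random.Random(seed)
    n = len(p)
    x = [0.0] * n  # x_0 = 0, so ||x_0|| <= M0 + 1
    values = []
    for _ in range(T + 1):
        values.append(sum(abs(xi - pi) for xi, pi in zip(x, p)))
        # A subgradient of f at x, perturbed by an error of norm <= delta_f.
        sub = [(xi > pi) - (xi < pi) for xi, pi in zip(x, p)]
        xi_t = [s + e for s, e in zip(sub, noise(n, delta_f, rng))]
        # Exact minimizer of g_t over C, then an error of norm <= delta_C.
        exact = project_ball([xv - a * gv for xv, gv in zip(x, xi_t)], M0)
        x = [u + e for u, e in zip(exact, noise(n, delta_C, rng))]
    return values

# Hypothetical data: f is Lipschitz with constant sqrt(n) in the Euclidean norm.
p, M0, delta_f, delta_C, T = [0.5, -0.25, 0.1], 1.0, 1e-3, 1e-4, 2000
L = math.sqrt(len(p))
a = math.sqrt(8 * delta_C * (2 * M0 + 1)) / (L + 1)
values = inexact_mirror_descent(p, M0, a, T, delta_f, delta_C)
print(min(values))
```

Since `p` lies in the ball, the minimal value of $f$ on $C$ is 0, so `min(values)` can be compared directly with the right-hand side of (3.11) evaluated for the constant step `a`.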
Theorem 3.1 is proved in Sect. 3.3. It is a generalization of Theorem 3.1 of [92] proved in the case when $\delta_f = \delta_C$.
We are interested in an optimal choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and $A_T = \sum_{t=0}^{T} a_t$ be given. By Theorem 3.1, in order to make the best choice of $a_t$, $t = 0, \dots, T$, we need to minimize the function $\sum_{t=0}^{T} a_t^2$ on the set
$$\Big\{a = (a_0, \dots, a_T) \in R^{T+1} : a_i \ge 0,\ i = 0, \dots, T,\ \sum_{i=0}^{T} a_i = A_T\Big\}.$$
By Lemma 2.3, this function has a unique minimizer $a^* = (a_0^*, \dots, a_T^*)$ where $a_i^* = (T + 1)^{-1}A_T$, $i = 0, \dots, T$. This is the best choice of $a_t$, $t = 0, 1, \dots, T$.
Let $T$ be a natural number and $a_t = a$, $t = 0, \dots, T$. Now we will find the best $a > 0$. By Theorem 3.1, we need to choose $a$ which is a minimizer of the function
$$2^{-1}((T + 1)a)^{-1}(2M_0 + 1)^2 + \delta_f(2M_0 + 1) + \delta_C(L + 1) + 4a^{-1}\delta_C(2M_0 + 1) + 2^{-1}(L + 1)^2 a.$$
Since $T$ can be arbitrarily large, we need to find a minimizer of the function
$$\varphi(a) := 4a^{-1}\delta_C(2M_0 + 1) + 2^{-1}(L + 1)^2 a,\quad a \in (0, \infty).$$
Clearly, the minimizer is
$$a = (8\delta_C(2M_0 + 1))^{1/2}(L + 1)^{-1}$$
and the minimal value of $\varphi$ is
$$(8\delta_C(2M_0 + 1))^{1/2}(L + 1).$$
Now we can think about the best choice of $T$. It is clear that it should be of the same order as $\delta_C^{-1}$. As a result, we obtain a point $\xi \in U$ such that
$$B_X(\xi, \delta_C) \cap C \ne \emptyset$$
and
$$f(\xi) \le f(x_*) + c_1\delta_C^{1/2} + \delta_f(2M_0 + 1),$$
where the constant $c_1 > 0$ depends only on $L$ and $M_0$.
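The minimizer and the minimal value of $\varphi$ quoted above can be verified by a short computation. This is a sketch of the routine calculus step; the abbreviation $\kappa$ is introduced here only for brevity and is not used in the text.

```latex
% Sketch: kappa := delta_C (2 M_0 + 1) is an illustrative abbreviation.
\begin{align*}
\varphi(a) &= 4\kappa a^{-1} + 2^{-1}(L+1)^2 a, \qquad \kappa := \delta_C(2M_0+1),\\
\varphi'(a) &= -4\kappa a^{-2} + 2^{-1}(L+1)^2 = 0
  \iff a^2 = 8\kappa (L+1)^{-2}
  \iff a = (8\kappa)^{1/2}(L+1)^{-1},\\
\varphi\big((8\kappa)^{1/2}(L+1)^{-1}\big)
  &= \frac{4\kappa (L+1)}{(8\kappa)^{1/2}} + 2^{-1}(L+1)(8\kappa)^{1/2}
   = (2\kappa)^{1/2}(L+1) + (2\kappa)^{1/2}(L+1)
   = (8\kappa)^{1/2}(L+1).
\end{align*}
```

Since $\varphi''(a) = 8\kappa a^{-3} > 0$ on $(0, \infty)$, the stationary point is the unique minimizer.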
3.2 The Main Lemma
Lemma 3.2 Let $\delta_f, \delta_C \in (0, 1]$, $a > 0$ and let
$$z \in C. \tag{3.12}$$
Assume that
$$x \in U \cap B_X(0, M_0 + 1), \tag{3.13}$$
$$\xi \in \partial f(x) + B_X(0, \delta_f), \tag{3.14}$$
$$g(v) = \langle\xi, v\rangle + (2a)^{-1}\|v - x\|^2,\quad v \in X \tag{3.15}$$
and that
$$u \in U \tag{3.16}$$
satisfies
$$B_X(u, \delta_C) \cap \{v \in C : g(v) = \inf(g, C)\} \ne \emptyset. \tag{3.17}$$
Then
$$a(f(x) - f(z)) \le \delta_f a(2M_0 + 1) + \delta_C a(L + 1) + 4\delta_C(M_0 + 1) + 2^{-1}a^2(L + 1)^2 + 2^{-1}\|z - x\|^2 - 2^{-1}\|z - u\|^2.$$
Proof In view of (3.14), there exists
$$l \in \partial f(x) \tag{3.18}$$
such that
$$\|l - \xi\| \le \delta_f. \tag{3.19}$$
Clearly, the function $g$ is Fréchet differentiable on $X$. We denote by $g'(v)$ its Fréchet derivative at $v \in X$. It is easy to see that
$$g'(v) = \xi + a^{-1}(v - x),\quad v \in X. \tag{3.20}$$
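By (3.17), there exists
$$\hat{u} \in B_X(u, \delta_C) \cap C \tag{3.21}$$
such that
$$g(\hat{u}) = \inf(g, C). \tag{3.22}$$
It follows from (3.20)–(3.22) that for all $v \in C$,
$$0 \le \langle g'(\hat{u}), v - \hat{u}\rangle = \langle\xi + a^{-1}(\hat{u} - x), v - \hat{u}\rangle. \tag{3.23}$$
By (3.1), (3.12), (3.13), (3.18), and (3.19),
$$
\begin{aligned}
a(f(x) - f(z)) &\le a\langle x - z, l\rangle = a\langle x - z, \xi\rangle + a\langle x - z, l - \xi\rangle\\
&\le a\langle x - z, \xi\rangle + a\|x - z\|\|l - \xi\| \le \delta_f a(2M_0 + 1) + a\langle x - z, \xi\rangle\\
&= \delta_f a(2M_0 + 1) + a\langle\xi, x - \hat{u}\rangle + a\langle\xi, \hat{u} - z\rangle\\
&\le \delta_f a(2M_0 + 1) + \langle x - \hat{u}, a\xi\rangle + \langle z - \hat{u}, x - \hat{u} - a\xi\rangle + \langle z - \hat{u}, \hat{u} - x\rangle.
\end{aligned}\tag{3.24}
$$
Relations (3.12) and (3.23) imply that
$$\langle z - \hat{u}, x - \hat{u} - a\xi\rangle \le 0. \tag{3.25}$$
In view of Lemma 2.1,
$$\langle z - \hat{u}, \hat{u} - x\rangle = 2^{-1}[\|z - x\|^2 - \|z - \hat{u}\|^2 - \|\hat{u} - x\|^2]. \tag{3.26}$$
It follows from (3.3), (3.18), (3.19), and (3.21) that
$$
\begin{aligned}
\langle x - \hat{u}, a\xi\rangle &= \langle x - u, a\xi\rangle + \langle u - \hat{u}, a\xi\rangle \le a\delta_C\|\xi\| + \langle x - u, a\xi\rangle\\
&\le a\delta_C(L + 1) + \langle x - u, a\xi\rangle \le a\delta_C(L + 1) + 2^{-1}\|x - u\|^2 + 2^{-1}a^2\|\xi\|^2.
\end{aligned}\tag{3.27}
$$
By (3.3), (3.18), (3.19), and (3.24)–(3.27),
$$
\begin{aligned}
a(f(x) - f(z)) &\le \delta_f a(2M_0 + 1) + \langle x - \hat{u}, a\xi\rangle + \langle z - \hat{u}, x - \hat{u} - a\xi\rangle + \langle z - \hat{u}, \hat{u} - x\rangle\\
&\le \delta_f a(2M_0 + 1) + a\delta_C(L + 1) + 2^{-1}\|x - u\|^2 + 2^{-1}a^2\|\xi\|^2\\
&\quad + 2^{-1}\|z - x\|^2 - 2^{-1}\|z - \hat{u}\|^2 - 2^{-1}\|\hat{u} - x\|^2\\
&\le \delta_f a(2M_0 + 1) + \delta_C a(L + 1) + 2^{-1}a^2(L + 1)^2\\
&\quad + 2^{-1}\|z - x\|^2 - 2^{-1}\|z - \hat{u}\|^2 + 2^{-1}\|x - u\|^2 - 2^{-1}\|\hat{u} - x\|^2.
\end{aligned}\tag{3.28}
$$
In view of (3.1), (3.13), (3.17), and (3.21),
$$
\begin{aligned}
|\|x - u\|^2 - \|\hat{u} - x\|^2| &\le |\|x - u\| - \|\hat{u} - x\||(\|x - u\| + \|\hat{u} - x\|)\\
&\le 4\|u - \hat{u}\|(M_0 + 1) \le 4(M_0 + 1)\delta_C.
\end{aligned}\tag{3.29}
$$
Relations (3.1), (3.12), (3.13), and (3.21) imply that
$$
\begin{aligned}
|\|z - \hat{u}\|^2 - \|z - u\|^2| &\le |\|z - \hat{u}\| - \|z - u\||(\|z - \hat{u}\| + \|z - u\|)\\
&\le \|u - \hat{u}\|(4M_0 + 1) \le (4M_0 + 1)\delta_C.
\end{aligned}\tag{3.30}
$$
By (3.28), (3.29), and (3.30),
$$a(f(x) - f(z)) \le \delta_f a(2M_0 + 1) + \delta_C a(L + 1) + 2^{-1}a^2(L + 1)^2 + 2^{-1}\|z - x\|^2 - 2^{-1}\|z - u\|^2 + 4(M_0 + 1)\delta_C.$$
This completes the proof of Lemma 3.2.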
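The way the estimates (3.29) and (3.30) enter the final inequality can be spelled out as follows. This is a sketch of the bookkeeping only; here $\hat{u}$ denotes the exact minimizer from (3.21) and (3.22), and both estimates appear with the factor $2^{-1}$ inherited from (3.28).

```latex
% Sketch of how the two perturbation terms combine into 4 (M_0 + 1) delta_C.
\begin{align*}
-2^{-1}\|z-\hat u\|^2 &\le -2^{-1}\|z-u\|^2 + 2^{-1}(4M_0+1)\delta_C
  && \text{by (3.30)},\\
2^{-1}\|x-u\|^2 - 2^{-1}\|\hat u-x\|^2 &\le 2(M_0+1)\delta_C
  && \text{by (3.29)},\\
2^{-1}(4M_0+1)\delta_C + 2(M_0+1)\delta_C &= \big(4M_0 + \tfrac{5}{2}\big)\delta_C
  \le 4(M_0+1)\delta_C .
\end{align*}
```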
3.3 Proof of Theorem 3.1
In view of (3.1), (3.6), and (3.9),
$$\|x_t\| \le M_0 + 1,\quad t = 0, 1, \dots. \tag{3.31}$$
Let $t \ge 0$ be an integer. Applying Lemma 3.2 with $g = g_t$, $z = x_*$, $a = a_t$, $x = x_t$, $\xi = \xi_t$, $u = x_{t+1}$ we obtain that
$$
\begin{aligned}
a_t(f(x_t) - f(x_*)) &\le 2^{-1}\|x_* - x_t\|^2 - 2^{-1}\|x_* - x_{t+1}\|^2\\
&\quad + 4\delta_C(M_0 + 1) + a_t\delta_f(2M_0 + 1) + a_t\delta_C(L + 1) + 2^{-1}a_t^2(L + 1)^2.
\end{aligned}\tag{3.32}
$$
By (3.1), (3.4), (3.31), and (3.32), for each natural number $T$,
$$
\begin{aligned}
\sum_{t=0}^{T} a_t(f(x_t) - f(x_*)) &\le \sum_{t=0}^{T}(2^{-1}\|x_* - x_t\|^2 - 2^{-1}\|x_* - x_{t+1}\|^2)\\
&\quad + \delta_f(2M_0 + 1)\sum_{t=0}^{T} a_t + \delta_C(L + 1)\sum_{t=0}^{T} a_t\\
&\quad + 4(T + 1)\delta_C(M_0 + 1) + 2^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2\\
&\le 2^{-1}(2M_0 + 1)^2 + (\delta_f(2M_0 + 1) + \delta_C(L + 1))\sum_{t=0}^{T} a_t\\
&\quad + 4(T + 1)\delta_C(M_0 + 1) + 2^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2.
\end{aligned}
$$
Thus (3.10) is true. Evidently, (3.10) implies (3.11). This completes the proof of Theorem 3.1.
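The step from (3.10) to (3.11) rests on the convexity of $f$ and of $U$. A sketch of the argument, with the weights $\lambda_t$ introduced here only for illustration:

```latex
% Sketch: lambda_t are illustrative convex weights, not notation from the text.
\begin{align*}
\lambda_t &:= a_t\Big(\sum_{s=0}^{T} a_s\Big)^{-1} > 0, \qquad \sum_{t=0}^{T}\lambda_t = 1,
  \qquad \sum_{t=0}^{T}\lambda_t x_t \in U \ \text{ since } U \text{ is convex},\\
f\Big(\sum_{t=0}^{T}\lambda_t x_t\Big) - f(x_*) &\le \sum_{t=0}^{T}\lambda_t\,(f(x_t) - f(x_*))
  && \text{(convexity of } f\text{)},\\
\min\{f(x_t) : t = 0,\dots,T\} - f(x_*) &\le \sum_{t=0}^{T}\lambda_t\,(f(x_t) - f(x_*))
  && \text{(a minimum does not exceed a weighted mean)},\\
\sum_{t=0}^{T}\lambda_t\,(f(x_t) - f(x_*)) &= \Big(\sum_{s=0}^{T} a_s\Big)^{-1}\sum_{t=0}^{T} a_t\,(f(x_t) - f(x_*)).
\end{align*}
```

Dividing the right-hand side of (3.10) by $\sum_{t=0}^{T} a_t$ then gives the right-hand side of (3.11).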
3.4 Optimization on Unbounded Sets
Let $X$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$. Let $D$ be a nonempty closed convex subset of $X$, $V$ be an open convex subset of $X$ such that
$$D \subset V, \tag{3.33}$$
and $f : V \to R^1$ be a convex function which is Lipschitz on all bounded subsets of $V$. Set
$$D_{\min} = \{x \in D : f(x) \le f(y) \text{ for all } y \in D\}. \tag{3.34}$$
We suppose that $D_{\min} \ne \emptyset$. In Sect. 3.5 we will prove the following result.
Theorem 3.3 Let $\delta_f, \delta_C \in (0, 1]$, $M > 1$ satisfy
$$D_{\min} \cap B_X(0, M) \ne \emptyset, \tag{3.35}$$
$$M_0 > 80M + 6, \tag{3.36}$$
$L > 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0 + 2), \tag{3.37}$$
$$0 < \tau_0 \le \tau_1 \le (4L + 4)^{-1}, \tag{3.38}$$
and let
$$\epsilon_0 = 8\tau_0^{-1}\delta_C(M_0 + 1) + 2\delta_C(L + 1) + 2\delta_f(2M_0 + 1) + \tau_1(L + 1)^2, \tag{3.39}$$
$$n_0 = \tau_0^{-1}(2M + 2)^2\epsilon_0^{-1} + 1. \tag{3.40}$$
Assume that
$$\{x_t\}_{t=0}^{\infty} \subset V,\quad \{\xi_t\}_{t=0}^{\infty} \subset X,\quad \{a_t\}_{t=0}^{\infty} \subset [\tau_0, \tau_1], \tag{3.41}$$
$$\|x_0\| \le M \tag{3.42}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f), \tag{3.43}$$
$$g_t(v) = \langle\xi_t, v\rangle + (2a_t)^{-1}\|v - x_t\|^2,\quad v \in X \tag{3.44}$$
and
$$B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}(g_t, D) \ne \emptyset. \tag{3.45}$$
Then there exists an integer $q \in [1, n_0]$ such that
$$f(x_q) \le \inf(f, D) + \epsilon_0,\quad \|x_t\| \le 15M + 1,\ t = 0, \dots, q.$$
We are interested in the best choice of $a_t$, $t = 0, 1, \dots$. Assume for simplicity that $\tau_1 = \tau_0$. In order to meet our goal we need to minimize $\epsilon_0$, which attains its minimal value when
$$\tau_0 = (8\delta_C(M_0 + 1))^{1/2}(L + 1)^{-1}$$
and the minimal value of $\epsilon_0$ is
$$2\delta_C(L + 1) + 2\delta_f(2M_0 + 1) + 2(8\delta_C(M_0 + 1))^{1/2}(L + 1).$$
Thus $\epsilon_0$ is of the same order as $\max\{\delta_C^{1/2}, \delta_f\}$. By (3.40) and the inequalities above, $n_0$ is of the same order as $\delta_C^{-1/2}\max\{\delta_C^{1/2}, \delta_f\}^{-1}$.
Theorem 3.3 is a generalization of Theorem 3.3 of [92] proved with $\delta_f = \delta_C$. In this chapter we will prove the following two theorems which have no prototype in [92].
Theorem 3.4 Let $\delta_f, \delta_C \in (0, 1)$, $M > 0$ satisfy
$$\delta_f \le (24M + 48)^{-1},\quad \delta_C \le 8^{-1}(L + 1)^{-2}(16M + 16)^{-1}, \tag{3.46}$$
$$\{x \in V : f(x) \le \inf(f, D) + 4\} \subset B_X(0, M), \tag{3.47}$$
$$M_0 = 12M + 12, \tag{3.48}$$
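$L > 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0 + 2) \tag{3.49}$$
and let $\{a_t\}_{t=0}^{\infty} \subset (0, \infty)$ satisfy for all integers $t \ge 0$,
$$4\delta_C(M_0 + 1) \le a_t \le (2L + 2)^{-1}(L + 1)^{-1}. \tag{3.50}$$
Assume that
$$\{x_t\}_{t=0}^{\infty} \subset V,\quad \{\xi_t\}_{t=0}^{\infty} \subset X, \tag{3.51}$$
$$\|x_0\| \le M \tag{3.52}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f), \tag{3.53}$$
$$g_t(v) = \langle\xi_t, v\rangle + (2a_t)^{-1}\|v - x_t\|^2,\quad v \in X \tag{3.54}$$
and
$$B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}(g_t, D) \ne \emptyset. \tag{3.55}$$
Then $\|x_t\| \le 5M + 3$ for all integers $t \ge 0$ and for each natural number $T$,
$$
\begin{aligned}
\sum_{t=0}^{T} a_t(f(x_t) - \inf(f, D)) &\le 2M^2 + (\delta_f(2M_0 + 1) + \delta_C(L + 1))\sum_{t=0}^{T} a_t\\
&\quad + 4\delta_C(T + 1)(M_0 + 1) + 2^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2.
\end{aligned}\tag{3.56}
$$
Moreover, for each natural number $T$,
$$
\begin{aligned}
&f\Big(\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t x_t\Big) - \inf(f, D),\quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D)\\
&\quad \le 2M^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + \delta_f(2M_0 + 1) + \delta_C(L + 1)\\
&\qquad + 4\delta_C(T + 1)(M_0 + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1} + 2^{-1}(L + 1)^2\sum_{t=0}^{T} a_t^2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}.
\end{aligned}\tag{3.57}
$$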
We are interested in the best choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and $A_T = \sum_{t=0}^{T} a_t$ be given. By Theorem 3.4, in order to make the best choice of $a_t$, $t = 0, \dots, T$, we need to minimize the function $\sum_{t=0}^{T} a_t^2$ on the set
$$\Big\{a = (a_0, \dots, a_T) \in R^{T+1} : a_i \ge 0,\ i = 0, \dots, T,\ \sum_{i=0}^{T} a_i = A_T\Big\}.$$
By Lemma 2.3, this function has a unique minimizer $a^* = (a_0^*, \dots, a_T^*)$ where $a_i^* = (T + 1)^{-1}A_T$, $i = 0, \dots, T$. This is the best choice of $a_t$, $t = 0, 1, \dots, T$.
Let $T$ be a natural number and $a_t = a$, $t = 0, \dots, T$. Now we will find the best $a > 0$. By Theorem 3.4, we need to choose $a$ which is a minimizer of the function
$$2((T + 1)a)^{-1}M^2 + \delta_f(2M_0 + 1) + \delta_C(L + 1) + 4a^{-1}\delta_C(M_0 + 1) + 2^{-1}(L + 1)^2 a.$$
Since $T$ can be arbitrarily large, we need to find a minimizer of the function
$$\varphi(a) := 4a^{-1}\delta_C(M_0 + 1) + 2^{-1}(L + 1)^2 a,\quad a \in (0, \infty).$$
Clearly, the minimizer is
$$a = (8\delta_C(M_0 + 1))^{1/2}(L + 1)^{-1}$$
and the minimal value of $\varphi$ is
$$(8\delta_C(M_0 + 1))^{1/2}(L + 1).$$
Now we can think about the best choice of $T$. It is clear that it should be of the same order as $\delta_C^{-1}$. As a result, we obtain a point $\xi \in V$ such that
$$B_X(\xi, \delta_C) \cap D \ne \emptyset$$
and
$$f(\xi) \le \inf(f, D) + c_1\delta_C^{1/2} + \delta_f(2M_0 + 1),$$
where the constant $c_1 > 0$ depends only on $L$ and $M_0$.
Theorem 3.5 Let $\delta_f, \delta_C \in (0, 1)$,
$$M > 8 \tag{3.58}$$
satisfy
$$D_{\min} \cap B_X(0, M) \ne \emptyset, \tag{3.59}$$
$L > 0$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 8M + 8), \tag{3.60}$$
$$a = (4L + 4)^{-1}\delta_C^{1/2}, \tag{3.61}$$
$$T = 8^{-1}\delta_C^{-1/2}\max\{\delta_f, \delta_C^{1/2}\}^{-1}. \tag{3.62}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M,\quad B_X(x_0, \delta_C) \cap D \ne \emptyset \tag{3.63}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f), \tag{3.64}$$
$$g_t(v) = \langle\xi_t, v\rangle + (2a)^{-1}\|v - x_t\|^2,\quad v \in X \tag{3.65}$$
and
$$B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}(g_t, D) \ne \emptyset. \tag{3.66}$$
Then
$$\|x_t\| \le 3M + 2 \text{ for all integers } t \in [0, T] \tag{3.67}$$
and
$$
\begin{aligned}
&(T + 1)^{-1}\sum_{t=0}^{T} f(x_t) - \inf(f, D),\quad f\Big((T + 1)^{-1}\sum_{t=0}^{T} x_t\Big) - \inf(f, D),\quad \min\{f(x_t) : t = 0, \dots, T\} - \inf(f, D)\\
&\quad \le 64M^2(L + 1)\max\{\delta_f, \delta_C^{1/2}\}.
\end{aligned}\tag{3.68}
$$
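For orientation, the step size (3.61), the horizon (3.62), and the right-hand side of (3.68) can be evaluated for given constants. The sketch below is illustrative only: the function name is hypothetical, the numerical inputs are made up, and rounding the horizon down to an integer is an assumption made here so that it can serve as an iteration count.

```python
import math

def theorem_3_5_parameters(L, M, delta_f, delta_C):
    """Return (a, T, bound) computed from formulas (3.61), (3.62), and (3.68)."""
    root = math.sqrt(delta_C)
    err = max(delta_f, root)
    a = root / (4.0 * L + 4.0)               # step size from (3.61)
    # Horizon from (3.62); the floor is an assumption of this sketch.
    T = math.floor(1.0 / (8.0 * root * err))
    bound = 64.0 * M ** 2 * (L + 1.0) * err  # right-hand side of (3.68)
    return a, T, bound

# Hypothetical constants with M > 8 and both errors in (0, 1).
a, T, bound = theorem_3_5_parameters(L=1.0, M=10.0, delta_f=1e-3, delta_C=1e-6)
print(a, T, bound)
```

The example shows the trade-off stated in the theorem: the bound scales with the larger of `delta_f` and the square root of `delta_C`, while the horizon grows as those errors shrink.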
3.5 Proof of Theorem 3.3
By (3.35) there exists
$$z \in D_{\min} \cap B_X(0, M). \tag{3.69}$$
Assume that $T$ is a natural number and that
$$f(x_t) - f(z) > \epsilon_0,\quad t = 1, \dots, T. \tag{3.70}$$
In view of (3.45), there exists
$$\eta \in B_X(x_1, \delta_C) \cap \operatorname{argmin}(g_0, D). \tag{3.71}$$
Relations (3.44), (3.69), and (3.71) imply that
$$\langle\xi_0, \eta\rangle + (2a_0)^{-1}\|\eta - x_0\|^2 \le \langle\xi_0, z\rangle + (2a_0)^{-1}\|z - x_0\|^2. \tag{3.72}$$
It follows from (3.37), (3.42), and (3.43) that
$$\|\xi_0\| \le L + 1. \tag{3.73}$$
In view of (3.38) and (3.41),
$$a_0^{-1} \ge \tau_1^{-1} \ge 4(L + 1). \tag{3.74}$$
By (3.42), (3.69), (3.72), and (3.73),
$$
\begin{aligned}
(L + 1)M + (2a_0)^{-1}(2M + 1)^2 &\ge \langle\xi_0, z\rangle + (2a_0)^{-1}\|z - x_0\|^2\\
&\ge (2a_0)^{-1}\|\eta - x_0\|^2 + \langle\xi_0, \eta - x_0\rangle + \langle\xi_0, x_0\rangle\\
&\ge (2a_0)^{-1}\|\eta - x_0\|^2 - (L + 1)\|\eta - x_0\| - (L + 1)M.
\end{aligned}
$$
Together with (3.38) and (3.41) this implies that
$$M + (2M + 1)^2 \ge \|\eta - x_0\|^2 - 2^{-1}\|\eta - x_0\|,$$
$$(\|\eta - x_0\| - 4^{-1})^2 \le (4M + 1)^2,$$
$$\|\eta - x_0\| \le 8M. \tag{3.75}$$
Together with (3.42), (3.69), and (3.71) this implies that
$$\|\eta\| \le 9M,\quad \|x_1\| \le 9M + 1,\quad \|\eta - z\| \le 10M,\quad \|x_1 - z\| \le 10M + 1. \tag{3.76}$$
By induction we show that for every integer $t \in [1, T]$,
$$\|x_t - z\| \le 14M + 1, \tag{3.77}$$
$$
\begin{aligned}
f(x_t) - f(z) &\le \delta_f(2M_0 + 1) + \delta_C(L + 1) + 4\tau_0^{-1}\delta_C(M_0 + 1) + 2^{-1}\tau_1(L + 1)^2\\
&\quad + (2\tau_0)^{-1}(\|z - x_t\|^2 - \|z - x_{t+1}\|^2).
\end{aligned}\tag{3.78}
$$
Set
$$U = V \cap \{v \in X : \|v\| < M_0 + 2\} \tag{3.79}$$
and
$$C = D \cap B_X(0, M_0). \tag{3.80}$$
In view of (3.69) and (3.76), (3.77) holds for $t = 1$.
Assume that $t \in [1, T]$ is an integer and that (3.77) holds. It follows from (3.36), (3.69), (3.79), and (3.80) that
$$z \in C \subset B_X(0, M_0). \tag{3.81}$$
In view of (3.36), (3.69), (3.77), and (3.79),
$$x_t \in U \cap B_X(0, M_0 + 1). \tag{3.82}$$
Relations (3.37), (3.43), and (3.82) imply that
$$\xi_t \in \partial f(x_t) + B_X(0, \delta_f) \subset B_X(0, L + 1). \tag{3.83}$$
In view of (3.45), there exists
$$h \in B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}(g_t, D). \tag{3.84}$$
By (3.44), (3.69), and (3.84),
$$\langle\xi_t, h\rangle + (2a_t)^{-1}\|h - x_t\|^2 \le \langle\xi_t, z\rangle + (2a_t)^{-1}\|z - x_t\|^2. \tag{3.85}$$
In view of (3.85),
$$\langle\xi_t, z\rangle + (2a_t)^{-1}\|z - x_t\|^2 \ge (2a_t)^{-1}\|h - x_t\|^2 + \langle\xi_t, h - x_t\rangle + \langle\xi_t, x_t\rangle.$$
It follows from the inequality above, (3.36), (3.38), (3.41), (3.69), (3.77), and (3.83) that
$$
\begin{aligned}
(L + 1)M + (2a_t)^{-1}(14M + 1)^2 &\ge (2a_t)^{-1}\|h - x_t\|^2 - (L + 1)\|h - x_t\| - (L + 1)(15M + 1)\\
&\ge (2a_t)^{-1}(\|h - x_t\|^2 - \|h - x_t\|) - (L + 1)(15M + 1)\\
&\ge (2a_t)^{-1}(\|h - x_t\| - 1)^2 - (2a_t)^{-1} - (L + 1)(15M + 1),
\end{aligned}
$$
$$4(14M + 1)^2 \ge 2 + 16M + (14M + 1)^2 \ge (\|h - x_t\| - 1)^2,$$
$$\|h - x_t\| \le 28M + 4,\quad \|h\| \le 44M + 5 < M_0. \tag{3.86}$$
By (3.80), (3.84), and (3.86),
$$h \in C. \tag{3.87}$$
Relations (3.80), (3.84), and (3.87) imply that $h \in \operatorname{argmin}(g_t, C)$ and
$$h \in B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}(g_t, C). \tag{3.88}$$
It follows from (3.37), (3.38), (3.43), (3.44), (3.69), (3.79), (3.82), (3.88), and Lemma 3.2 which holds with $x = x_t$, $a = a_t$, $\xi = \xi_t$, $u = x_{t+1}$ that
$$
\begin{aligned}
a_t(f(x_t) - f(z)) &\le \delta_f a_t(2M_0 + 1) + \delta_C a_t(L + 1) + 4\delta_C(M_0 + 1) + 2^{-1}a_t^2(L + 1)^2\\
&\quad + 2^{-1}\|z - x_t\|^2 - 2^{-1}\|z - x_{t+1}\|^2.
\end{aligned}
$$
Together with the inclusion $a_t \in [\tau_0, \tau_1]$ this implies that
$$
\begin{aligned}
f(x_t) - f(z) &\le 4\tau_0^{-1}\delta_C(M_0 + 1) + (2M_0 + 1)\delta_f + (L + 1)\delta_C + 2^{-1}\tau_1(L + 1)^2\\
&\quad + (2\tau_0)^{-1}\|z - x_t\|^2 - (2\tau_0)^{-1}\|z - x_{t+1}\|^2
\end{aligned}\tag{3.89}
$$
and (3.78) holds. In view of (3.39), (3.70), (3.77), and (3.89),
$$\|z - x_t\|^2 - \|z - x_{t+1}\|^2 \ge 0,\quad \|z - x_{t+1}\| \le \|z - x_t\| \le 14M + 1.$$
Hence by induction we showed that (3.78) holds for all $t = 1, \dots, T$ and (3.77) holds for all $t = 1, \dots, T + 1$. It follows from (3.39), (3.40), (3.42), (3.69), (3.70), and (3.78) that
$$
\begin{aligned}
T\epsilon_0 &< T(\min\{f(x_t) : t = 1, \dots, T\} - f(z)) \le \sum_{t=1}^{T}(f(x_t) - f(z))\\
&\le (2\tau_0)^{-1}\sum_{t=1}^{T}(\|z - x_t\|^2 - \|z - x_{t+1}\|^2)\\
&\quad + T(\delta_f(2M_0 + 1) + \delta_C(L + 1) + 4\delta_C(M_0 + 1)\tau_0^{-1} + 2^{-1}\tau_1(L + 1)^2)\\
&\le (2\tau_0)^{-1}(2M + 1)^2 + 4T\tau_0^{-1}\delta_C(M_0 + 1) + T(\delta_f(2M_0 + 1) + \delta_C(L + 1) + 2^{-1}\tau_1(L + 1)^2),
\end{aligned}
$$
$$2^{-1}\epsilon_0 < (2\tau_0 T)^{-1}(2M + 1)^2$$
and
$$T < \tau_0^{-1}(2M + 1)^2\epsilon_0^{-1} < n_0.$$
Thus we have shown that if an integer $T \ge 1$ satisfies
$$f(x_t) - f(z) > \epsilon_0,\quad t = 1, \dots, T,$$
then $T < n_0$ and (3.77) holds for all $t = 1, \dots, T + 1$. This implies that there exists an integer $q \in \{1, \dots, n_0\}$ such that
$$f(x_q) - f(z) \le \epsilon_0,\quad \|x_t\| \le 15M + 1,\ t = 0, \dots, q.$$
Theorem 3.3 is proved.
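The accuracy level of (3.39) and the iteration bound of (3.40) in Theorem 3.3 are explicit, so they can be tabulated for given constants. The following is a minimal sketch; the function name and all numerical inputs are hypothetical, and the iteration bound is returned as a real number exactly as the formula is printed above.

```python
import math

def theorem_3_3_constants(L, M, M0, tau0, tau1, delta_f, delta_C):
    """Evaluate the accuracy level (3.39) and the iteration bound (3.40)."""
    eps0 = (8.0 * delta_C * (M0 + 1.0) / tau0
            + 2.0 * delta_C * (L + 1.0)
            + 2.0 * delta_f * (2.0 * M0 + 1.0)
            + tau1 * (L + 1.0) ** 2)
    n0 = (2.0 * M + 2.0) ** 2 / (tau0 * eps0) + 1.0
    return eps0, n0

# Hypothetical constants satisfying L > 1, M > 1, and M0 > 80 * M + 6.
L, M, M0, delta_f, delta_C = 2.0, 2.0, 167.0, 1e-6, 1e-8

# Step size suggested in the discussion after Theorem 3.3 (case tau0 = tau1).
tau_star = math.sqrt(8.0 * delta_C * (M0 + 1.0)) / (L + 1.0)
eps_star, n_star = theorem_3_3_constants(L, M, M0, tau_star, tau_star, delta_f, delta_C)
print(tau_star, eps_star, n_star)
```

Evaluating the same function at other admissible step sizes gives a quick way to see how far a given choice is from the suggested one.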
3.6 Proof of Theorem 3.4 Fix z ∈ Dmin .
(3.90)
Set U = V ∩ {v ∈ X : v < M0 + 2}
(3.91)
C = D ∩ BX (0, M0 ),
(3.92)
M1 = 4M + 3.
(3.93)
z ≤ M.
(3.94)
x0 − z ≤ 2M.
(3.95)
and
In view of (3.47) and (3.90),
By (3.52) and (3.94),
Assume that t ≥ 0 is an integer and that xt − z ≤ M1 .
(3.96)
It follows from (3.48), (3.90), (3.91), (3.93), (3.94), and (3.96) that z ∈ C ∩ BX (0, M),
(3.97)
xt ≤ M1 + M,
(3.98)
xt ∈ U ∩ BX (0, M1 + M).
(3.99)
Relations (3.48), (3.49), (3.53), (3.93), and (3.98) imply that ξt ∈ ∂f (xt ) + BX (0, δf ) ⊂ BX (0, L + 1).
(3.100)
In view of (3.55), there exists h ∈ BX (xt+1 , δC ) ∩ argmin(gt , D).
(3.101)
By (3.54), (3.90), and (3.101), ξt , h + (2at )−1 h − xt 2 ≤ ξt , z + (2at )−1 z − xt 2 .
(3.102)
In view of (3.102), ξt , z + (2at )−1 z − xt 2 ≥ (2at )−1 h − xt 2 + ξt , h − xt + ξt , xt .
(3.103)
It follows from (3.50), (3.100), and (3.103) that (2at )−1 (h − xt − 1)2 − (2at )−1 ≤ (2at )−1 (h − xt 2 − h − xt ) ≤ (2at )−1 h − xt 2 − (L + 1)h − xt ≤ (2at )−1 h − xt 2 + ξt , h − xt ≤ (2at )−1 h − xt 2 + ξt , h − xt + ξt , xt + ξt xt .
(3.104)
By (3.94), (3.96), (3.98), (3.100), and (3.104), (2at )−1 (h − xt − 1)2 − (2at )−1 ≤ (2at )−1 h − xt 2 + ξt , h − xt +ξt , xt + (L + 1)(M1 + M) ≤ (2at )−1 z − xt 2 + ξt , z + (L + 1)(M1 + M) ≤ (L + 1)(M1 + 2M) + (2at )−1 M12 .
(3.105)
In view of (3.48), (3.50), (3.93), (3.98), and (3.105), (h − xt − 1)2 ≤ 1 + 2M + M12 + M1 , h − xt ≤ M1 + 2, h ≤ M1 + 2 + M1 + M ≤ 3M1 + 2 = 12M + 11 < M0 .
(3.106)
By (3.48), (3.92), (3.101), and (3.106), h ∈ C.
(3.107)
Relations (3.93), (3.101), and (3.107) imply that h ∈ argmin(gt , C).
(3.108)
xt+1 ∈ U ∩ BX (0, M0 + 1).
(3.109)
By (3.101) and (3.106),
It follows from (3.48), (3.53), (3.54), (3.93), (3.97), (3.99), (3.101), (3.108), (3.109), and Lemma 3.2 which holds with x = xt , a = at , ξ = ξt , u = xt+1 that at (f (xt ) − f (z)) ≤ δf at (2M0 + 1) + δC at (L + 1) +4δC (M0 + 1) + 2−1 at2 (L + 1)2 + 2−1 z − xt 2 − 2−1 z − xt+1 2 .
(3.110)
z − xt+1 ≤ z − xt ;
(3.111)
z − xt+1 > z − xt .
(3.112)
There are two cases:
Assume that (3.111) is true. In view of (3.96) and (3.111), z − xt+1 ≤ z − xt ≤ M1 . Assume that (3.112) holds. It follows from (3.46), (3.48), (3.50), (3.110), and (3.112) that f (xt ) − f (z) ≤ 4at−1 δC (M0 + 1) + δf (2M0 + 1) +δC (L + 1) + 2−1 at (L + 1)2 ≤ 4. By the inequality above, (3.47) and (3.90),
xt ≤ M.
(3.113)
It follows from (3.97), (3.100), (3.103), and (3.113) that (2at )−1 (h − xt − 1)2 − (2at )−1 ≤ (2at )−1 h − xt 2 + ξt , h − xt + ξt , xt + (L + 1)M ≤ (2at )−1 z − xt 2 + ξt , z + (L + 1)M ≤ 2(L + 1)M + 4(2at )−1 M 2 . By the relation above, (3.50) and (3.113), (h − xt − 1)2 ≤ 1 + 4M 2 + 2M ≤ (2M + 1)2 , h − xt ≤ 2M + 2, h ≤ 3M + 2.
(3.114)
xt+1 ≤ 3M + 3.
(3.115)
In view of (3.101) and (3.114),
By (3.97) and (3.115), xt+1 − z ≤ 4M + 3. Together with (3.93) this implies that xt+1 − z ≤ M1 . Hence by induction we showed that for all integers t ≥ 0, xt − z ≤ M1 and (3.110) is true. It follows from (3.93), (3.97), and (3.116) that xt ≤ 5M + 3 for all integers t ≥ 0. By (3.90), (3.95), and (3.110), for each natural number T , T t=0
at (f (xt ) − inf(f, D))
(3.116)
≤ 2−1 z − x0 2 + (δf (2M0 + 1) + δC (L + 1))
T
at
t=0 T
+4δC (T + 1)(M0 + 1) + 2−1 (L + 1)2
at2
t=0
≤ 2M 2 + (δf (2M0 + 1) + δC (L + 1))
T
at
t=0
+4δC (T + 1)(M0 + 1) + 2−1 (L + 1)2
T
at2 .
t=0
Theorem 3.4 is proved.
3.7 Proof of Theorem 3.5 By (3.59), there exists z ∈ Dmin ∩ BX (0, M).
(3.117)
U = V ∩ {v ∈ X : v < 8M + 8}
(3.118)
C = D ∩ BX (0, 5M + 6).
(3.119)
Set
and
In view of (3.63) and (3.116), x0 − z ≤ 2M.
(3.120)
Assume that t ≥ 0 is an integer and that xt − z ≤ 2M + 2.
(3.121)
It follows from (3.116)–(3.119), (3.121) that z ∈ C ∩ BX (0, M),
(3.122)
xt ≤ 3M + 2,
(3.123)
xt ∈ U ∩ BX (0, 3M + 2).
(3.124)
Relations (3.60) and (3.123) imply that ξt ∈ ∂f (xt ) + BX (0, δf ) ⊂ BX (0, L + 1).
(3.125)
In view of (3.66), there exists h ∈ BX (xt+1 , δC ) ∩ argmin(gt , D).
(3.126)
By (3.65), (3.116), and (3.126), ξt , h + (2a)−1 h − xt 2 ≤ ξt , z + (2a)−1 z − xt 2 .
(3.127)
In view of (3.127), ξt , z + (2a)−1 z − xt 2 ≥ (2a)−1 h − xt 2 + ξt , h − xt + ξt , xt .
(3.128)
It follows from (3.58), (3.61), (3.125), and (3.128) that (2a)−1 (h − xt − 1)2 − (2a)−1 ≤ (2a)−1 (h − xt 2 − h − xt ) ≤ (2a)−1 h − xt 2 − (L + 1)h − xt ≤ (2a)−1 h − xt 2 + ξt , h − xt ≤ (2a)−1 h − xt 2 + ξt , h − xt + ξt , xt + ξt xt .
(3.129)
By (3.116), (3.125), (3.128), and (3.129), (2a)−1 (h − xt − 1)2 − (2a)−1 ≤ (2a)−1 h − xt 2 + ξt , h − xt + ξt , xt + (L + 1)xt ≤ (2a)−1 z − xt 2 + ξt , z + (L + 1)xt ≤ (L + 1)(M + xt ) + (2a)−1 z − xt 2 .
(3.130)
In view of (3.61), (3.121), (3.123), and (3.130), (h − xt − 1)2 ≤ 1 + z − xt 2 + M + xt 1 + 4M 2 + 8M + 4 + 4M + 2 = 4M 2 + 12M + 7 ≤ (2M + 3)2 and h − xt − 1 ≤ 2M + 3, h ≤ 5M + 6.
(3.131)
By (3.119)) and (3.131), h ∈ C. Together with relations (3.119) and (3.126) this implies that h ∈ argmin(gt , C).
(3.132)
xt+1 ≤ 5M + 7.
(3.133)
By (3.126) and (3.131),
It follows from (3.64), (3.65), (3.116)–(3.119), (3.124), (3.126), (3.132), (3.133), and Lemma 3.2 applied with M0 = 5M + 6, x = xt , ξ = ξt , u = xt+1 , that a(f (xt ) − f (z)) ≤ 2−1 z − xt 2 − 2−1 z − xt+1 2 +aδf (10M + 13) + aδC (L + 1) + 4δC (5M + 7) + 2−1 a 2 (L + 1)2 . In view of (3.60), (3.63), (3.67), (3.116), and (3.123), f (xt ) − f (z) ≥ −LδC . It follows from (3.61), (3.116), (3.134), and the inequality above that
(3.134)
z − xt+1 2 − z − xt 2 ≤ 8δC (5M + 7) + a 2 (L + 1)2 +2aδf (10M + 3) + 2δC a(L + 1) + 2aLδC ≤ 8δC (5M + 8) + a 2 (L + 1)2 + 2aδf (10M + 3).
(3.135)
Thus we have shown that the following property holds: (a) if for an integer t ≥ 0 relation (3.121) holds, then (3.134) and (3.135) are true. Let us show that for all integers t = 0, . . . , T , z − xt 2 ≤ z − x0 2 + t[8δC (5M + 8) + a 2 (L + 1)2 + 2aδf (10M + 3)] ≤ z − x0 2 + T [8δC (5M + 8) + a 2 (L + 1)2 + 2aδf (10M + 3)] ≤ 4M 2 + 8M + 4.
(3.136)
First, note that by (3.61) and (3.62), T [8δC (5M + 8) + a 2 (L + 1)2 + 2aδf (10M + 3)] ≤ 8T δC (5M + 8) + T δC + 2T aδf (10M + 3) −1/2 −1 δf }(8δC (5M
≤ 8−1 min{δC−1 , δC
+ 9) + δf (10M + 3)δC (L + 1)−1 ) 1/2
≤ 5M + 9 + 2M + 1 = 7M + 10 ≤ 8M + 4.
(3.137)
It follows from (3.120) and (3.137) that z − x0 2 + T [8δC (5M + 8) + a 2 (L + 1)2 + 2aδf (10M + 3)] ≤ (2M + 2)2 . Assume that t ∈ {0, . . . , T } \ {T } and that (3.136) holds. Then xt − z ≤ 2M + 2 (see (3.121)) and by property (a), (3.134) and (3.135) hold. By (3.135), (3.136), and (3.138),
(3.138)
$$
\begin{aligned}
\|z - x_{t+1}\|^2 &\le \|z - x_t\|^2 + 8\delta_C(5M + 8) + a^2(L + 1)^2 + 2a\delta_f(10M + 3)\\
&\le \|z - x_0\|^2 + (t + 1)[8\delta_C(5M + 8) + a^2(L + 1)^2 + 2a\delta_f(10M + 3)]\\
&\le \|z - x_0\|^2 + T[8\delta_C(5M + 8) + a^2(L + 1)^2 + 2a\delta_f(10M + 3)]\\
&\le 4M^2 + 8M + 4
\end{aligned}
$$
and
$$\|z - x_{t+1}\| \le 2M + 2.$$
Thus we have shown by induction that (3.136) holds for all $t = 0, \dots, T$. Together with property (a) this implies that (3.134) holds for all $t = 0, \dots, T$. By (3.116) and (3.136),
$$\|x_t\| \le 3M + 2,\quad t = 0, \dots, T.$$
It follows from (3.134) that
$$\sum_{t=0}^{T} a(f(x_t) - f(z)) \le 2^{-1}\|z - x_0\|^2 + (T + 1)(4\delta_C(5M + 7) + a\delta_f(10M + 13) + 2^{-1}a^2(L + 1)^2 + a\delta_C(L + 1)).$$
By the relation above, (3.61), (3.62), (3.116), and (3.120),
$$
\begin{aligned}
(T + 1)^{-1}\sum_{t=0}^{T} f(x_t) - \inf(f, D) &\le \delta_f(10M + 13) + \delta_C(L + 1) + 4\delta_C(5M + 7)\,2(L + 1)\delta_C^{-1/2}\\
&\quad + 2^{-2}(L + 1)\delta_C^{1/2} + 4M^2(L + 1)\,8\max\{\delta_C^{1/2}, \delta_f\}\\
&\le 32M^2(L + 1)\max\{\delta_C^{1/2}, \delta_f\} + 44M(L + 1)\max\{\delta_C^{1/2}, \delta_f\}\\
&\le 64M^2(L + 1)\max\{\delta_C^{1/2}, \delta_f\}.
\end{aligned}
$$
This implies (3.68). Theorem 3.5 is proved.
3.8 Zero-Sum Games on Bounded Sets

Let $(X, \langle\cdot,\cdot\rangle)$, $(Y, \langle\cdot,\cdot\rangle)$ be Hilbert spaces equipped with the complete norms $\|\cdot\|$ which are induced by their inner products. Let $C$ be a nonempty closed convex subset of $X$, $D$ be a nonempty closed convex subset of $Y$, $U$ be an open convex subset of $X$, and $V$ be an open convex subset of $Y$ such that
$$C \subset U, \quad D \subset V. \tag{3.139}$$
Suppose that there exist $L_1, L_2 > 0$, $M_1, M_2 > 0$ such that
$$C \subset B_X(0, M_1), \quad D \subset B_Y(0, M_2), \tag{3.140}$$
a function $f : U \times V \to R^1$ possesses the following properties:
(i) for each $v \in V$, the function $f(\cdot, v) : U \to R^1$ is convex;
(ii) for each $u \in U$, the function $f(u, \cdot) : V \to R^1$ is concave,
for each $u \in U$,
$$|f(u, v_1) - f(u, v_2)| \le L_2\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V, \tag{3.141}$$
and that for each $v \in V$,
$$|f(u_1, v) - f(u_2, v)| \le L_1\|u_1 - u_2\| \text{ for all } u_1, u_2 \in U. \tag{3.142}$$
Recall that for each $(\xi, \eta) \in U \times V$,
$$\partial_x f(\xi, \eta) = \{l \in X : f(y, \eta) - f(\xi, \eta) \ge \langle l, y - \xi\rangle \text{ for all } y \in U\},$$
$$\partial_y f(\xi, \eta) = \{l \in Y : \langle l, y - \eta\rangle \ge f(\xi, y) - f(\xi, \eta) \text{ for all } y \in V\}. \tag{3.143}$$
In view of properties (i) and (ii) and (3.141)–(3.143), for each $\xi \in U$ and each $\eta \in V$,
$$\emptyset \ne \partial_x f(\xi, \eta) \subset B_X(0, L_1), \tag{3.144}$$
$$\emptyset \ne \partial_y f(\xi, \eta) \subset B_Y(0, L_2). \tag{3.145}$$
Assume that
$$x_* \in C \text{ and } y_* \in D \tag{3.146}$$
satisfy
$$f(x_*, y) \le f(x_*, y_*) \le f(x, y_*) \tag{3.147}$$
for each $x \in C$ and each $y \in D$. Let $\delta_{f,1}, \delta_{f,2}, \delta_C, \delta_D \in (0, 1]$, and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$. Let us describe our algorithm.

Mirror descent algorithm for zero-sum games
Initialization: select arbitrary $x_0 \in U$ and $y_0 \in V$.
Iterative Step: given current iteration vectors $x_t \in U$ and $y_t \in V$ calculate
$$\xi_t \in \partial_x f(x_t, y_t) + B_X(0, \delta_{f,1}), \quad \eta_t \in \partial_y f(x_t, y_t) + B_Y(0, \delta_{f,2})$$
and the next pair of iteration vectors $x_{t+1} \in U$, $y_{t+1} \in V$ such that
$$B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}\{\langle\xi_t, v\rangle + (2a_t)^{-1}\|v - x_t\|^2 : v \in C\} \ne \emptyset,$$
$$B_Y(y_{t+1}, \delta_D) \cap \operatorname{argmin}\{\langle -\eta_t, u\rangle + (2a_t)^{-1}\|u - y_t\|^2 : u \in D\} \ne \emptyset.$$
In this chapter we prove the following result.

Theorem 3.6 Let $\delta_{f,1}, \delta_{f,2}, \delta_C, \delta_D \in (0, 1]$, and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$. Assume that $\{x_t\}_{t=0}^{\infty} \subset U$, $\{y_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\{\eta_t\}_{t=0}^{\infty} \subset Y$,
$$B_X(x_0, \delta_C) \cap C \ne \emptyset, \quad B_Y(y_0, \delta_D) \cap D \ne \emptyset$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial_x f(x_t, y_t) + B_X(0, \delta_{f,1}), \quad \eta_t \in \partial_y f(x_t, y_t) + B_Y(0, \delta_{f,2}), \tag{3.148}$$
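$$B_X(x_{t+1}, \delta_C) \cap \operatorname{argmin}\{\langle\xi_t, v\rangle + (2a_t)^{-1}\|v - x_t\|^2 : v \in C\} \ne \emptyset,$$
$$B_Y(y_{t+1}, \delta_D) \cap \operatorname{argmin}\{\langle -\eta_t, u\rangle + (2a_t)^{-1}\|u - y_t\|^2 : u \in D\} \ne \emptyset.$$
Let for each natural number $T$,
$$\hat{x}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t x_t, \quad \hat{y}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t y_t.$$
Then for each natural number $T$,
$$B_X(\hat{x}_T, \delta_C) \cap C \ne \emptyset, \quad B_Y(\hat{y}_T, \delta_D) \cap D \ne \emptyset, \tag{3.149}$$
$$\Big|\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t) - f(x_*, y_*)\Big| \le \max\{\delta_{f,1}(2M_1 + 1), \delta_{f,2}(2M_2 + 1)\} + 2\max\{\delta_C(L_1 + 1), \delta_D(L_2 + 1)\}$$
$$+ (T + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\{4\delta_C(M_1 + 1), 4\delta_D(M_2 + 1)\} + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\max\{(L_1 + 1)^2, (L_2 + 1)^2\} + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_1, 2M_2\} + 1)^2,$$
$$\Big|f(\hat{x}_T, \hat{y}_T) - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t)\Big| \le \max\{\delta_{f,1}(2M_1 + 1), \delta_{f,2}(2M_2 + 1)\} + 2\max\{\delta_C(L_1 + 1), \delta_D(L_2 + 1)\}$$
$$+ (T + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\{4\delta_C(M_1 + 1), 4\delta_D(M_2 + 1)\} + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\max\{(L_1 + 1)^2, (L_2 + 1)^2\} + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_1, 2M_2\} + 1)^2,$$
and for each natural number $T$, each $z \in C$, and each $u \in D$,
$$f(z, \hat{y}_T) \ge f(\hat{x}_T, \hat{y}_T) - 2\max\{\delta_{f,1}(2M_1 + 1), \delta_{f,2}(2M_2 + 1)\} - 3\max\{\delta_C(L_1 + 1), \delta_D(L_2 + 1)\}$$
$$- 2(T + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\{4\delta_C(M_1 + 1), 4\delta_D(M_2 + 1)\} - 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\max\{(L_1 + 1)^2, (L_2 + 1)^2\} - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_1, 2M_2\} + 1)^2,$$
$$f(\hat{x}_T, u) \le f(\hat{x}_T, \hat{y}_T) + 2\max\{\delta_{f,1}(2M_1 + 1), \delta_{f,2}(2M_2 + 1)\} + 3\max\{\delta_C(L_1 + 1), \delta_D(L_2 + 1)\}$$
$$+ 2(T + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\{4\delta_C(M_1 + 1), 4\delta_D(M_2 + 1)\} + 2\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\max\{(L_1 + 1)^2, (L_2 + 1)^2\} + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(\max\{2M_1, 2M_2\} + 1)^2.$$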
Proof Evidently, (3.149) holds. It is not difficult to see that
$$\|x_t\| \le M_1 + 1, \quad \|y_t\| \le M_2 + 1, \quad t = 0, 1, \dots.$$
Let $t \ge 0$ be an integer. Applying Lemma 3.2 with $a = a_t$, $x = x_t$, $f = f(\cdot, y_t)$, $\xi = \xi_t$, $u = x_{t+1}$ we obtain that for each $z \in C$,
$$a_t(f(x_t, y_t) - f(z, y_t)) \le 2^{-1}\|z - x_t\|^2 - 2^{-1}\|z - x_{t+1}\|^2 + a_t\delta_{f,1}(2M_1 + 1) + a_t\delta_C(L_1 + 1) + 4\delta_C(M_1 + 1) + 2^{-1}a_t^2(L_1 + 1)^2.$$
Applying Lemma 3.2 with $a = a_t$, $x = y_t$, $f = -f(x_t, \cdot)$, $\xi = -\eta_t$, $u = y_{t+1}$ we obtain that for each $v \in D$,
$$a_t(f(x_t, v) - f(x_t, y_t)) \le 2^{-1}\|v - y_t\|^2 - 2^{-1}\|v - y_{t+1}\|^2 + a_t\delta_{f,2}(2M_2 + 1) + a_t\delta_D(L_2 + 1) + 4\delta_D(M_2 + 1) + 2^{-1}a_t^2(L_2 + 1)^2.$$
For all integers $t \ge 0$ set
$$b_{t,1} = a_t\delta_{f,1}(2M_1 + 1) + a_t\delta_C(L_1 + 1) + 4\delta_C(M_1 + 1) + 2^{-1}a_t^2(L_1 + 1)^2,$$
$$b_{t,2} = a_t\delta_{f,2}(2M_2 + 1) + a_t\delta_D(L_2 + 1) + 4\delta_D(M_2 + 1) + 2^{-1}a_t^2(L_2 + 1)^2$$
and define $\phi(s) = 2^{-1}s^2$, $s \in R^1$. It is easy to see that all the assumptions of Proposition 2.13 hold and it implies Theorem 3.6.

We are interested in the optimal choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and $A_T = \sum_{t=0}^{T} a_t$ be given. By Theorem 3.6, in order to make the best choice of $a_t$, $t = 0, \dots, T$, we need to minimize the function $\sum_{t=0}^{T} a_t^2$ on the set
$$\Big\{a = (a_0, \dots, a_T) \in R^{T+1} : a_i \ge 0,\ i = 0, \dots, T,\ \sum_{i=0}^{T} a_i = A_T\Big\}.$$
By Lemma 2.3, this function has a unique minimizer $a^* = (a_0^*, \dots, a_T^*)$ where $a_i^* = (T + 1)^{-1}A_T$, $i = 0, \dots, T$, which is the best choice of $a_t$, $t = 0, 1, \dots, T$.

Let $T$ be a natural number and $a_t = a$ for all $t = 0, \dots, T$. Now we will find the best $a > 0$. Since $T$ can be arbitrarily large we need to choose $a$ which is a minimizer of the function
$$\Psi(a) = a^{-1}\max\{4\delta_C(M_1 + 1), 4\delta_D(M_2 + 1)\} + a\max\{(L_1 + 1)^2, (L_2 + 1)^2\}.$$
This function has a minimizer
$$a = (\max\{4\delta_C(M_1 + 1), 4\delta_D(M_2 + 1)\})^{1/2}\max\{L_1 + 1, L_2 + 1\}^{-1}.$$
Now our goal is to find the best $T > 0$. In view of the inequalities above, the best choice of $T$ should be of the same order as $\max\{\delta_C, \delta_D\}^{-1}$. For example, $T = \lfloor\max\{\delta_C, \delta_D\}^{-1}\rfloor$. In this case, we obtain a pair of points $\hat{x} \in U$, $\hat{y} \in V$ such that
$$B_X(\hat{x}, \delta_C) \cap C \ne \emptyset, \quad B_Y(\hat{y}, \delta_D) \cap D \ne \emptyset$$
and for each $z \in C$ and each $v \in D$,
$$f(z, \hat{y}) \ge f(\hat{x}, \hat{y}) - c\max\{\delta_C, \delta_D\}^{1/2} - 2\max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{M_1, M_2\} + 1)$$
and
$$f(\hat{x}, v) \le f(\hat{x}, \hat{y}) + c\max\{\delta_C, \delta_D\}^{1/2} + 2\max\{\delta_{f,1}, \delta_{f,2}\}(2\max\{M_1, M_2\} + 1)$$
where the constant $c > 0$ depends only on $L_1, L_2$ and $M_1, M_2$. Theorem 3.6 is a generalization of Theorem 3.4 of [92] obtained in the case when $\delta_{f,1} = \delta_{f,2} = \delta_C = \delta_D$.
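The iteration described in this section can be tried on a toy problem. The following is a minimal sketch, not the author's implementation: it assumes the one-dimensional bilinear game $f(x, y) = xy$ on $C = D = [-1, 1]$ with $U = V = (-2, 2)$, so that $M_1 = M_2 = 1$ and $L_1 = L_2 = 2$, and it models the computational errors as seeded uniform noise. The function names, the noise model, and the choice $T = 2000$ are illustrative assumptions. For this quadratic prox term the exact argmin is the projection of $x_t - a\xi_t$ onto the interval.

```python
import math
import random


def clip(v, lo, hi):
    """Projection of a real number onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def noisy_mirror_descent(T, a, delta_f, delta_set, seed=0):
    """Run t = 0..T steps for f(x, y) = x * y on [-1, 1] x [-1, 1].

    Returns the weighted averages (x_hat, y_hat) of x_0..x_T and y_0..y_T.
    """
    rng = random.Random(seed)
    x, y = 1.0, 1.0                      # x_0 in C, y_0 in D
    sum_x = sum_y = 0.0
    for _ in range(T + 1):
        sum_x += a * x
        sum_y += a * y
        # Inexact subgradients: d_x f(x, y) = {y}, d_y f(x, y) = {x}.
        xi = y + rng.uniform(-delta_f, delta_f)
        eta = x + rng.uniform(-delta_f, delta_f)
        # Exact prox steps are projections; add a bounded projection error.
        x_next = clip(x - a * xi, -1.0, 1.0) + rng.uniform(-delta_set, delta_set)
        y_next = clip(y + a * eta, -1.0, 1.0) + rng.uniform(-delta_set, delta_set)
        x, y = x_next, y_next
    total = a * (T + 1)
    return sum_x / total, sum_y / total


def gap_bound(T, a, delta_f, delta_set, M=1.0, L=2.0):
    """Right-hand side of the one-sided estimates of Theorem 3.6, constant step a."""
    sum_a = a * (T + 1)
    sum_a2 = a * a * (T + 1)
    return (2 * delta_f * (2 * M + 1)
            + 3 * delta_set * (L + 1)
            + 2 * (T + 1) / sum_a * 4 * delta_set * (M + 1)
            + 2 / sum_a * sum_a2 * (L + 1) ** 2
            + (2 * M + 1) ** 2 / sum_a)


delta = 1e-4
T = 2000
a = math.sqrt(4 * delta * (1.0 + 1.0)) / (2.0 + 1.0)   # minimizer of Psi
x_hat, y_hat = noisy_mirror_descent(T, a, delta, delta)
bound = gap_bound(T, a, delta, delta)
# For f(x, y) = x * y: min over z in C of f(z, y_hat) is -|y_hat|,
# and max over u in D of f(x_hat, u) is |x_hat|.
gap_x = x_hat * y_hat + abs(y_hat)
gap_y = abs(x_hat) - x_hat * y_hat
print(x_hat, y_hat, gap_x, gap_y, bound)
```

Both `gap_x` and `gap_y` are expected to stay below `bound`, which is what the two one-sided inequalities of Theorem 3.6 assert for this setting.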
3.9 Zero-Sum Games on Unbounded Sets

Let $(X, \langle\cdot,\cdot\rangle)$, $(Y, \langle\cdot,\cdot\rangle)$ be Hilbert spaces equipped with the complete norms $\|\cdot\|$ which are induced by their inner products. In this section we prove an extension of Theorem 3.6 for games on unbounded sets. This result has no prototype in [92], where only games on bounded sets are studied. Let $A$ be a nonempty closed convex subset of $X$, $D$ be a nonempty closed convex subset of $Y$, $W$ be an open convex subset of $X$, and $V$ be an open convex subset of $Y$ such that
$$A \subset W, \quad D \subset V$$
and let a function $f : W \times V \to R^1$ possess the following properties:
(i) for each $v \in V$, the function $f(\cdot, v) : W \to R^1$ is convex and continuous;
(ii) for each $w \in W$, the function $f(w, \cdot) : V \to R^1$ is concave and continuous.
Suppose that
$$M_0, M_1 > 0, \quad M_2 \ge M_0, \tag{3.150}$$
$$\theta_0 \in A \cap B_X(0, M_0), \tag{3.151}$$
$$D \subset B_Y(0, M_0), \tag{3.152}$$
$$L_1 \ge 1, \quad L_2 > 0, \tag{3.153}$$
$$\sup\{f(\theta_0, y) : y \in V \cap B_Y(0, M_0 + 1)\} < M_1, \tag{3.154}$$
for each $y \in V \cap B_Y(0, M_0 + 1)$,
$$\{x \in W : f(x, y) \le M_1 + 4\} \subset B_X(0, M_2), \tag{3.155}$$
for each $v \in V \cap B_Y(0, M_0 + 4)$,
$$|f(u_1, v) - f(u_2, v)| \le L_1\|u_1 - u_2\| \text{ for all } u_1, u_2 \in W \cap B_X(0, 10M_2 + 12), \tag{3.156}$$
and that for each $u \in W \cap B_X(0, 10M_2 + 12)$,
$$|f(u, v_1) - f(u, v_2)| \le L_2\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_Y(0, M_0 + 2). \tag{3.157}$$
Let
$$x_* \in A \text{ and } y_* \in D \tag{3.158}$$
satisfy
$$f(x_*, y) \le f(x_*, y_*) \le f(x, y_*) \tag{3.159}$$
for each $x \in A$ and each $y \in D$. In view of (3.151), (3.152), (3.154), (3.155), (3.158), and (3.159),
$$f(x_*, y_*) \le f(\theta_0, y_*) < M_1, \tag{3.160}$$
$$\|x_*\| \le M_2, \quad \|\theta_0\| \le M_2. \tag{3.161}$$
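In this section we prove the following result which does not have an analog in [92].

Theorem 3.7 Let $\delta_{f,1}, \delta_{f,2}, \delta_A, \delta_D \in (0, 1/2)$ and $\{a_k\}_{k=0}^{\infty} \subset (0, \infty)$ satisfy
$$\delta_A \le (36M_2 + 44)^{-1}(L_1 + 1)^{-2}, \tag{3.162}$$
$$\delta_{f,1} \le (18M_2 + 21)^{-1}, \tag{3.163}$$
$$\delta_A(36M_2 + 44) \le a_t \le (L_1 + 1)^{-2} \text{ for all integers } t \ge 0. \tag{3.164}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset W$, $\{y_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\{\eta_t\}_{t=0}^{\infty} \subset Y$,
$$B_X(x_0, \delta_A) \cap A \ne \emptyset, \quad B_Y(y_0, \delta_D) \cap D \ne \emptyset, \tag{3.165}$$
$$\|x_0\| \le M_2 + 1 \tag{3.166}$$
and that for each integer $t \ge 0$,
$$\xi_t \in \partial_x f(x_t, y_t) + B_X(0, \delta_{f,1}), \tag{3.167}$$
$$\eta_t \in \partial_y f(x_t, y_t) + B_Y(0, \delta_{f,2}), \tag{3.168}$$
$$B_X(x_{t+1}, \delta_A) \cap \operatorname{argmin}\{\langle\xi_t, v\rangle + (2a_t)^{-1}\|v - x_t\|^2 : v \in A\} \ne \emptyset, \tag{3.169}$$
$$B_Y(y_{t+1}, \delta_D) \cap \operatorname{argmin}\{\langle -\eta_t, u\rangle + (2a_t)^{-1}\|u - y_t\|^2 : u \in D\} \ne \emptyset. \tag{3.170}$$
Let for each natural number $T$,
$$\hat{x}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t x_t, \quad \hat{y}_T = \Big(\sum_{i=0}^{T} a_i\Big)^{-1}\sum_{t=0}^{T} a_t y_t. \tag{3.171}$$
Then for all integers $t \ge 0$,
$$\|x_t\| \le 5M_2 + 4 \tag{3.172}$$
and for each natural number $T$,
$$B_X(\hat{x}_T, \delta_A) \cap A \ne \emptyset, \quad B_Y(\hat{y}_T, \delta_D) \cap D \ne \emptyset, \tag{3.173}$$
$$\Big|\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t) - f(x_*, y_*)\Big| \le \max\{\delta_{f,1}(18M_2 + 21), \delta_{f,2}(2M_0 + 1)\} + 2\max\{\delta_A(L_1 + 1), \delta_D(L_2 + 1)\}$$
$$+ (T + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\{4\delta_A(9M_2 + 11), 4\delta_D(M_0 + 18)\} + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\max\{(L_1 + 1)^2, (L_2 + 1)^2\} + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(10M_2 + 10)^2, \tag{3.174}$$
$$\Big|f(\hat{x}_T, \hat{y}_T) - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t f(x_t, y_t)\Big| \le \max\{\delta_{f,1}(18M_2 + 21), \delta_{f,2}(2M_0 + 1)\} + 2\max\{\delta_A(L_1 + 1), \delta_D(L_2 + 1)\}$$
$$+ (T + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\{4\delta_A(9M_2 + 11), 4\delta_D(M_0 + 18)\} + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\max\{(L_1 + 1)^2, (L_2 + 1)^2\} + 2^{-1}\Big(\sum_{t=0}^{T} a_t\Big)^{-1}(10M_2 + 10)^2 \tag{3.175}$$
and for each natural number $T$, each $z \in A$, and each $u \in D$,
$$f(z, \hat{y}_T) \ge f(\hat{x}_T, \hat{y}_T) - 2\max\{\delta_{f,1}(18M_2 + 21), \delta_{f,2}(2M_0 + 1)\} - 3\max\{\delta_A(L_1 + 1), \delta_D(L_2 + 1)\}$$
$$- 2(T + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\{4\delta_A(9M_2 + 11), 4\delta_D(M_0 + 18)\} - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\max\{(L_1 + 1)^2, (L_2 + 1)^2\} - \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(10M_2 + 10)^2, \tag{3.176}$$
$$f(\hat{x}_T, u) \le f(\hat{x}_T, \hat{y}_T) + 2\max\{\delta_{f,1}(18M_2 + 21), \delta_{f,2}(2M_0 + 1)\} + 3\max\{\delta_A(L_1 + 1), \delta_D(L_2 + 1)\}$$
$$+ 2(T + 1)\Big(\sum_{t=0}^{T} a_t\Big)^{-1}\max\{4\delta_A(9M_2 + 11), 4\delta_D(M_0 + 18)\} + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}\sum_{t=0}^{T} a_t^2\max\{(L_1 + 1)^2, (L_2 + 1)^2\} + \Big(\sum_{t=0}^{T} a_t\Big)^{-1}(10M_2 + 10)^2. \tag{3.177}$$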
Proof By (3.152) and (3.165), for all integers t ≥ 0, yt ≤ M0 + 1 ≤ M2 + 1.
(3.178)
C = A ∩ BX (0, 9M2 + 10)
(3.179)
U = W ∩ {x ∈ X : x < 10M2 + 12}.
(3.180)
Set
and
In view of (3.151) and (3.166), x0 − θ0 ≤ x0 + θ0 ≤ 2M2 + 1.
(3.181)
Assume that t ≥ 0 is an integer and that xt − θ0 ≤ 4M2 + 4.
(3.182)
It follows from (3.151), (3.180), and (3.182) that xt ≤ 5M2 + 4,
(3.183)
xt ∈ U ∩ BX (0, 5M2 + 4).
(3.184)
By (3.156), (3.167), (3.178), and (3.183), ξt ∈ ∂f (xt , yt ) + BX (0, 1) ⊂ BX (0, L1 + 1).
(3.185)
In view of (3.169), there exists h ∈ BX (xt+1 , δA ) ∩ argmin{ξt , w + (2at )−1 w − xt 2 : w ∈ A}.
(3.186)
By (3.151) and (3.186), ξt , h + (2at )−1 h − xt 2 ≤ ξt , θ0 + (2at )−1 θ0 − xt 2 . In view of (3.187),
(3.187)
ξt , θ0 + (2at )−1 θ0 − xt 2 ≥ (2at )−1 h − xt 2 + ξt , h − xt + ξt , xt .
(3.188)
It follows from (3.162), (3.185), and (3.188) that (2at )−1 (h − xt − 1)2 − (2at )−1 ≤ (2at )−1 (h − xt 2 − h − xt ) ≤ (2at )−1 h − xt 2 − (L1 + 1)h − xt ≤ (2at )−1 h − xt 2 + ξt , h − xt ≤ (2at )−1 h − xt 2 + ξt , h − xt + ξt , xt + ξt xt .
(3.189)
By (3.161), (3.182), (3.184), (3.185), (3.188), and (3.189), (2at )−1 (h − xt − 1)2 − (2at )−1 ≤ (2at )−1 h − xt 2 + ξt , h − xt + ξt , xt + (L1 + 1)(5M2 + 4) ≤ (2at )−1 θ0 − xt 2 + ξt , θ0 + (L1 + 1)(5M2 + 4) ≤ (L1 + 1)M2 + (2at )−1 (4M2 + 4)2 + (L1 + 1)(5M2 + 4).
(3.190)
In view of (3.164) and (3.190), (h − xt − 1)2 ≤ 1 + (2at )(L1 + 1)(6M2 + 4) + (4M2 + 4)2 ≤ (4M2 + 4)2 + 6M2 + 5 and h − xt ≤ 4M2 + 6.
(3.191)
It follows from (3.183) and (3.191) that h ≤ 9M2 + 10. By (3.179), (3.186), and (3.192),
(3.192)
h ∈ C.
(3.193)
Relations (3.179), (3.186), and (3.193) imply that h ∈ argmin{ξt , w + (2at )−1 w − xt 2 : w ∈ C}.
(3.194)
By (3.180) and (3.192), xt+1 ∈ U.
(3.195)
It follows from (3.158), (3.167), (3.179), (3.180), (3.184), (3.186), (3.194), (3.195), and Lemma 3.2 applied with M0 = 9M2 + 10, f = f (·, yt ), a = at , x = xt , ξ = ξt , u = xt+1 that for each z ∈ C, at (f (xt , yt ) − f (z, yt )) ≤ 2−1 z − xt 2 − 2−1 z − xt+1 2 +at δf,1 (18M2 + 21) + at δA (L1 + 1) + 4δA (9M2 + 11) + 2−1 at2 (L1 + 1)2 .
(3.196)
By (3.157), (3.168), (3.170), (3.178)–(3.180), and Lemma 3.2 applied with a = at , x = yt , f = −f (xt , ·), ξ = −ηt , u = yt+1 that for each v ∈ D, at (f (xt , v) − f (xt , yt )) ≤ 2−1 v − yt 2 − 2−1 v − yt+1 2 +at δf,2 (2M0 + 1) + at δD (L2 + 1) + 4δD (M0 + 1) + 2−1 at2 (L2 + 1)2 .
(3.197)
θ0 − xt+1 ≤ θ0 − xt ;
(3.198)
θ0 − xt+1 > θ0 − xt .
(3.199)
There are two cases:
If (3.198) holds, then in view of (3.182), θ0 − xt+1 ≤ 4M2 + 4.
(3.200)
Assume that (3.199) is true. It follows from (3.151), (3.154), (3.163)–(3.165), (3.178), (3.179), and (3.196) with z = θ0 that f (xt , yt ) ≤ f (θ0 , yt ) +δf,1 (18M2 + 21) + δA (L1 + 1) + 4at−1 δA (9M2 + 11) + 1 ≤ M1 + 4.
(3.201)
By (3.155), (3.178), and (3.201), xt ≤ M2 .
(3.202)
It follows from (3.151), (3.185), (3.188), (3.189), and (3.202) that (2at )−1 (h − xt − 1)2 − (2at )−1 ≤ (2at )−1 h − xt 2 + ξt , h − xt + ξt , xt + (L1 + 1)M2 ≤ (2at )−1 θ0 − xt 2 + ξt , θ0 + (L1 + 1)M2 ≤ (L1 + 1)M0 + (2at )−1 (2M2 + 1)2 + (L1 + 1)M2 .
(3.203)
In view of (3.164), (3.202), and (3.203), (h − xt − 1)2 ≤ 1 + (M0 + M2 ) + (2M2 + 1)2 ≤ 4M22 + 6M2 + 2 ≤ 4(M2 + 1)2 , and h − xt ≤ 1 + 2M2 + 2, h ≤ 3M2 + 3.
(3.204)
By (3.186) and (3.204), xt+1 ≤ 3M2 + 4.
(3.205)
xt+1 − θ0 ≤ 4M2 + 4.
(3.206)
In view of (3.151) and (3.205),
Thus we have shown by induction that for all integers $t \ge 0$, (3.206) holds, (3.196) is true for all $z \in C = A \cap B_X(0, 9M_2 + 10)$, and (3.197) holds for all $v \in D$. It follows from (3.155) and (3.206) that (3.172) is true for all integers $t \ge 0$. For all integers $t \ge 0$ set
$$b_{t,1} = a_t\delta_{f,1}(18M_2 + 21) + a_t\delta_A(L_1 + 1) + 4\delta_A(9M_2 + 11) + 2^{-1}a_t^2(L_1 + 1)^2,$$
$$b_{t,2} = a_t\delta_{f,2}(2M_0 + 1) + a_t\delta_D(L_2 + 1) + 4\delta_D(M_0 + 1) + 2^{-1}a_t^2(L_2 + 1)^2$$
and define $\phi(s) = 2^{-1}s^2$, $s \in R^1$. It is easy to see that all the assumptions of Proposition 2.13 hold and it implies (3.173)–(3.175) for each natural number $T$ and (3.176) and (3.177) for each natural number $T$, each $u \in D$, and each $z \in A \cap B_X(0, 9M_2 + 10)$. By (3.154), (3.155), and (3.178), for each $z \in A \setminus B_X(0, 9M_2 + 10)$,
$$f(z, \hat{y}_T) > M_1 + 4 > 4 + f(\theta_0, \hat{y}_T)$$
and (3.176) holds for all natural numbers $T$ and all $z \in A$. Theorem 3.7 is proved.

We are interested in the optimal choice of $a_t$, $t = 0, 1, \dots$. Let $T$ be a natural number and $A_T = \sum_{t=0}^{T} a_t$ be given. By Theorem 3.7, in order to make the best choice of $a_t$, $t = 0, \dots, T$, we need to minimize the function $\sum_{t=0}^{T} a_t^2$ on the set
$$\Big\{a = (a_0, \dots, a_T) \in R^{T+1} : a_i \ge 0,\ i = 0, \dots, T,\ \sum_{i=0}^{T} a_i = A_T\Big\}.$$
By Lemma 2.3, this function has a unique minimizer $a^* = (a_0^*, \dots, a_T^*)$ where $a_i^* = (T + 1)^{-1}A_T$, $i = 0, \dots, T$, which is the best choice of $a_t$, $t = 0, 1, \dots, T$.

Let $T$ be a natural number and $a_t = a$ for all $t = 0, \dots, T$. Now we will find the best $a > 0$. Since $T$ can be arbitrarily large we need to choose $a$ which is a minimizer of the function
$$\Psi(a) = 2a^{-1}\max\{4\delta_A(9M_2 + 11), 4\delta_D(M_0 + 18)\} + a\max\{(L_1 + 1)^2, (L_2 + 1)^2\}, \quad a > 0.$$
This function has a minimizer
$$a = (2\max\{4\delta_A(9M_2 + 11), 4\delta_D(M_0 + 18)\})^{1/2}\max\{L_1 + 1, L_2 + 1\}^{-1}$$
and the minimal value of $\Psi$ is
$$(8\max\{4\delta_A(9M_2 + 11), 4\delta_D(M_0 + 18)\})^{1/2}\max\{L_1 + 1, L_2 + 1\}.$$
Now our goal is to find the best $T > 0$. In view of the inequalities above, the best choice of $T$ should be of the same order as $\max\{\delta_A, \delta_D\}^{-1}$. For example, $T = \lfloor\max\{\delta_A, \delta_D\}^{-1}\rfloor + 1$. In this case, we obtain a pair of points $\hat{x} \in U$, $\hat{y} \in V$ such that
$$B_X(\hat{x}, \delta_A) \cap A \ne \emptyset, \quad B_Y(\hat{y}, \delta_D) \cap D \ne \emptyset$$
and for each $z \in A$ and each $v \in D$,
$$f(z, \hat{y}) \ge f(\hat{x}, \hat{y}) - c\max\{\delta_A, \delta_D\}^{1/2} - 2\max\{\delta_{f,1}, \delta_{f,2}\}\max\{18M_2 + 21, 2M_0 + 1\}$$
and
$$f(\hat{x}, v) \le f(\hat{x}, \hat{y}) + c\max\{\delta_A, \delta_D\}^{1/2} + 2\max\{\delta_{f,1}, \delta_{f,2}\}\max\{18M_2 + 21, 2M_0 + 1\}$$
where the constant $c > 0$ depends only on $L_1, L_2$ and $M_0, M_1, M_2$.
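The step-size choice above rests on minimizing a function of the form $\Psi(a) = 2a^{-1}A + aB$ over $a > 0$. The sketch below checks the closed-form minimizer $(2A/B)^{1/2}$ and the minimal value $(8AB)^{1/2}$ numerically. Here `A` stands for the error term $\max\{\cdot\}$ and `B` for $\max\{(L_1 + 1)^2, (L_2 + 1)^2\}$; the numeric values are hypothetical and chosen only for illustration.

```python
import math


def psi(a, A, B):
    """Psi(a) = 2 * A / a + a * B for a > 0."""
    return 2.0 * A / a + a * B


def best_step(A, B):
    """Closed-form minimizer of psi: set the derivative -2A/a^2 + B to zero."""
    return math.sqrt(2.0 * A / B)


A = 3.0e-3      # hypothetical value of the max{...} error term
B = 6.25        # hypothetical value of max{(L1 + 1)^2, (L2 + 1)^2}
a_star = best_step(A, B)
psi_min = psi(a_star, A, B)
closed_form_min = math.sqrt(8.0 * A * B)

# Compare with nearby step sizes to confirm a_star is a minimizer.
neighbours = [psi(a_star * s, A, B) for s in (0.5, 0.9, 1.1, 2.0)]
print(a_star, psi_min, closed_form_min, neighbours)
```

Since $\Psi$ is strictly convex on $(0, \infty)$, the stationary point is the unique minimizer, so every entry of `neighbours` exceeds `psi_min`.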
Chapter 4
Gradient Algorithm with a Smooth Objective Function
In this chapter we analyze the convergence of a projected gradient algorithm with a smooth objective function under the presence of computational errors. The problem is described by an objective function and a set of feasible points. For this algorithm each iteration consists of two steps. The first step is a calculation of a gradient of the objective function while in the second one we calculate a projection on the feasible set. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.
© Springer Nature Switzerland AG 2020. A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_4

4.1 Optimization on Bounded Sets

Let $X$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$. Let $C$ be a nonempty closed convex subset of $X$, $U$ be an open convex subset of $X$ such that $C \subset U$, and $f : U \to R^1$ be a convex continuous function. We suppose that the function $f$ is Fréchet differentiable at every point $x \in U$ and for every $x \in U$ we denote by $f'(x) \in X$ the Fréchet derivative of $f$ at $x$. It is clear that for any $x \in U$ and any $h \in X$
$$\langle f'(x), h\rangle = \lim_{t \to 0} t^{-1}(f(x + th) - f(x)). \tag{4.1}$$
Recall that for each nonempty set $D$ and each function $g : D \to R^1$,
$$\inf(g, D) = \inf\{g(y) : y \in D\},$$
$$\operatorname{argmin}(g, D) = \operatorname{argmin}\{g(z) : z \in D\} = \{z \in D : g(z) = \inf(g, D)\}.$$
We suppose that the mapping $f' : U \to X$ is Lipschitz on all bounded subsets of $U$. It is well known (see Lemma 2.2) that for each nonempty closed convex set $D \subset X$ and each $x \in X$ there exists a unique point $P_D(x) \in D$ such that
$$\|x - P_D(x)\| = \inf\{\|x - y\| : y \in D\}.$$
In this chapter we study the behavior of a projected gradient algorithm with a smooth objective function which is used for solving convex constrained minimization problems [67, 68, 72]. Suppose that there exist $L > 1$, $M_0 > 0$ such that
$$C \subset B_X(0, M_0), \tag{4.2}$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in U, \tag{4.3}$$
$$\|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in U. \tag{4.4}$$
Let $\delta_f, \delta_C \in (0, 1]$. We describe below our algorithm.

Gradient algorithm
Initialization: select an arbitrary $x_0 \in U \cap B_X(0, M_0)$.
Iterative Step: given a current iteration vector $x_t \in U$ calculate
$$\xi_t \in f'(x_t) + B_X(0, \delta_f)$$
and calculate the next iteration vector $x_{t+1} \in U$ such that
$$\|x_{t+1} - P_C(x_t - L^{-1}\xi_t)\| \le \delta_C.$$
In this chapter we prove the following result.

Theorem 4.1 Let $\delta_f, \delta_C \in (0, 1]$ and let
$$x_0 \in U \cap B_X(0, M_0). \tag{4.5}$$
Assume that $\{x_t\}_{t=1}^{\infty} \subset U$, $\{\xi_t\}_{t=0}^{\infty} \subset X$ and that for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f \tag{4.6}$$
and
$$\|x_{t+1} - P_C(x_t - L^{-1}\xi_t)\| \le \delta_C. \tag{4.7}$$
Then for each natural number $T$,
$$\min\{f(x_t) : t = 2, \dots, T + 1\} - \inf(f, C), \quad f\Big(\sum_{t=2}^{T+1} T^{-1}x_t\Big) - \inf(f, C)$$
$$\le (2T)^{-1}L(2M_0 + 1)^2 + L\delta_C(6M_0 + 7) + \delta_f(4M_0 + 5). \tag{4.8}$$
Theorem 4.1 is a generalization of Theorem 4.2 of [92] proved in the case when $\delta_f = \delta_C$. We are interested in an optimal choice of $T$. If we choose $T$ in order to minimize the right-hand side of (4.8) we obtain that $T$ should be of the same order as $\max\{\delta_f, \delta_C\}^{-1}$. In this case the right-hand side of (4.8) is of the same order as $\max\{\delta_f, \delta_C\}$. For example, if $T = \lfloor\max\{\delta_f, \delta_C\}^{-1}\rfloor + 1$, then the right-hand side of (4.8) does not exceed
$$2^{-1}(2M_0 + 1)^2 L\max\{\delta_f, \delta_C\} + \delta_C(6M_0 + 7)L + (4M_0 + 5)\delta_f.$$
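The gradient algorithm with inexact gradients and inexact projections can be illustrated on a small example. The following is a sketch under stated assumptions, not the author's code: it takes $X = R^2$, $C$ the closed unit ball (so $M_0 = 1$), $U$ the open ball of radius $2.5$, and $f(x) = 2^{-1}\|x - c\|^2$ with $c = (2, 0)$, for which $L = 4.5$ satisfies both Lipschitz conditions on $U$ and $\inf(f, C) = f((1, 0)) = 0.5$. The error model (a seeded random vector of bounded norm) and all function names are hypothetical. The bound uses the constant $\delta_f(4M_0 + 5)$ that the proof in Section 4.4 yields.

```python
import math
import random


def project_ball(p, radius=1.0):
    """Exact projection of a point of R^2 onto the closed ball B(0, radius)."""
    n = math.hypot(p[0], p[1])
    if n <= radius:
        return p
    return (p[0] * radius / n, p[1] * radius / n)


def bounded_error(rng, size):
    """Random vector of R^2 with norm at most `size` (illustrative error model)."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.uniform(0.0, size)
    return (r * math.cos(angle), r * math.sin(angle))


def f(p, c=(2.0, 0.0)):
    return 0.5 * ((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)


def grad_f(p, c=(2.0, 0.0)):
    return (p[0] - c[0], p[1] - c[1])


def inexact_projected_gradient(x0, L, T, delta_f, delta_C, seed=0):
    """Return the iterates x_0, ..., x_{T+1} of the gradient algorithm."""
    rng = random.Random(seed)
    x = x0
    iterates = [x]
    for _ in range(T + 1):
        g, e = grad_f(x), bounded_error(rng, delta_f)
        xi = (g[0] + e[0], g[1] + e[1])            # inexact gradient
        p = project_ball((x[0] - xi[0] / L, x[1] - xi[1] / L))
        e = bounded_error(rng, delta_C)
        x = (p[0] + e[0], p[1] + e[1])             # inexact projection
        iterates.append(x)
    return iterates


M0, L = 1.0, 4.5
delta_f = delta_C = 1e-3
T = math.floor(1.0 / max(delta_f, delta_C)) + 1
xs = inexact_projected_gradient((0.0, 1.0), L, T, delta_f, delta_C)
best_gap = min(f(x) for x in xs[2:T + 2]) - 0.5    # x_2, ..., x_{T+1}
bound = (L * (2 * M0 + 1) ** 2 / (2 * T)
         + L * delta_C * (6 * M0 + 7)
         + delta_f * (4 * M0 + 5))
print(T, best_gap, bound)
```

With these values `best_gap` should not exceed `bound`. Note that `best_gap` may be slightly negative, because the iterates are only required to lie within $\delta_C$ of $C$.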
4.2 Auxiliary Results

Proposition 4.2 Let $D$ be a nonempty closed convex subset of $X$, $x \in X$, and $y \in D$. Assume that for each $z \in D$,
$$\langle z - y, x - y\rangle \le 0. \tag{4.9}$$
Then $y = P_D(x)$.
Proof Let $z \in D$. By (4.9),
$$\langle z - x, z - x\rangle = \langle z - y + (y - x), z - y + (y - x)\rangle = \langle y - x, y - x\rangle + 2\langle z - y, y - x\rangle + \langle z - y, z - y\rangle$$
$$\ge \langle y - x, y - x\rangle + \langle z - y, z - y\rangle = \|y - x\|^2 + \|z - y\|^2.$$
Thus $y = P_D(x)$. Proposition 4.2 is proved.
Proposition 4.3 Assume that $x, u \in U$, $L > 0$ and that for each $v_1, v_2 \in \{tx + (1 - t)u : t \in [0, 1]\}$,
$$\|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\|.$$
Then
$$f(u) \le f(x) + \langle f'(x), u - x\rangle + 2^{-1}L\|u - x\|^2.$$
Proof For each $t \in [0, 1]$ set
$$\phi(t) = f(x + t(u - x)). \tag{4.10}$$
Clearly, $\phi$ is a differentiable function and for each $t \in [0, 1]$,
$$\phi'(t) = \langle f'(x + t(u - x)), u - x\rangle. \tag{4.11}$$
By (4.10), (4.11) and the proposition assumptions,
$$f(u) - f(x) = \phi(1) - \phi(0) = \int_0^1 \phi'(t)\,dt = \int_0^1 \langle f'(x + t(u - x)), u - x\rangle\,dt$$
$$= \int_0^1 \langle f'(x), u - x\rangle\,dt + \int_0^1 \langle f'(x + t(u - x)) - f'(x), u - x\rangle\,dt$$
$$\le \langle f'(x), u - x\rangle + \int_0^1 Lt\|u - x\|^2\,dt = \langle f'(x), u - x\rangle + L\|u - x\|^2\int_0^1 t\,dt = \langle f'(x), u - x\rangle + L\|u - x\|^2/2.$$
Proposition 4.3 is proved.
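The quadratic upper bound of Proposition 4.3 can be checked numerically on a simple function. The sketch below assumes the one-dimensional example $f(x) = \log(1 + e^x)$, whose derivative is the logistic function and is Lipschitz with constant $L = 1/4$; the function names and the grid are illustrative choices, not part of the text.

```python
import math


def softplus(x):
    """f(x) = log(1 + exp(x)), a smooth convex function on the real line."""
    return math.log1p(math.exp(x))


def sigmoid(x):
    """f'(x); its derivative is bounded by 1/4, so f' is (1/4)-Lipschitz."""
    return 1.0 / (1.0 + math.exp(-x))


L = 0.25


def quadratic_upper_bound(x, u):
    """Right-hand side f(x) + <f'(x), u - x> + L * |u - x|^2 / 2."""
    return softplus(x) + sigmoid(x) * (u - x) + 0.5 * L * (u - x) ** 2


grid = [i / 4.0 for i in range(-20, 21)]           # points in [-5, 5]
worst = max(softplus(u) - quadratic_upper_bound(x, u)
            for x in grid for u in grid)
print(worst)
```

The quantity `worst` is the largest violation of the inequality over the grid; the proposition predicts that it is never positive.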
4.3 The Main Lemma

Lemma 4.4 Let $\delta_f, \delta_C \in (0, 1]$,
$$u \in B_X(0, M_0 + 1) \cap U, \tag{4.12}$$
$\xi \in X$ satisfy
$$\|\xi - f'(u)\| \le \delta_f \tag{4.13}$$
and let $v \in U$ satisfy
$$\|v - P_C(u - L^{-1}\xi)\| \le \delta_C. \tag{4.14}$$
Then for each $x \in U$ satisfying
$$B(x, \delta_C) \cap C \ne \emptyset \tag{4.15}$$
the following inequalities hold:
$$f(x) - f(v) \ge 2^{-1}L\|x - v\|^2 - 2^{-1}L\|x - u\|^2 - \delta_C L(6M_0 + 7) - \delta_f(4M_0 + 5), \tag{4.16}$$
$$f(x) - f(v) \ge 2^{-1}L\|u - v\|^2 + L\langle v - u, u - x\rangle - \delta_C L(4M_0 + 5) - \delta_f(4M_0 + 5). \tag{4.17}$$
Proof For each x ∈ U define g(x) = f (u) + f (u), x − u + 2−1 Lx − u2 .
(4.18)
Clearly, g : U → R 1 is a convex Fréchet differentiable function, for each x ∈ U , g (x) = f (u) + L(x − u), lim g(x) = ∞
x→∞
(4.19) (4.20)
and there exists x0 ∈ C
(4.21)
g(x0 ) ≤ g(x) for all x ∈ C.
(4.22)
such that
By (4.21) and (4.22), for all z ∈ C, g (x0 ), z − x0 ≥ 0.
(4.23)
In view of (4.19) and (4.23), L−1 f (u) + x0 − u, z − x0 ≥ 0 for all z ∈ C.
(4.24)
Proposition 4.2, (4.21), and (4.24) imply that x0 = PC (u − L−1 f (u)).
(4.25)
It follows from (4.13), (4.14), (4.25), and Lemma 2.2 that v − x0 ≤ v − PC (u − L−1 ξ ) +PC (u − L−1 ξ ) − PC (u − L−1 f (u)) ≤ δC + L−1 ξ − f (u) ≤ δC + L−1 δf .
(4.26)
In view of (4.2) and (4.25), x0 ≤ M0 .
(4.27)
Relations (4.2) and (4.14) imply that v ≤ M0 + 1.
(4.28)
By (4.4), (4.12), (4.18), and Proposition 4.3, for all x ∈ U , f (x) ≤ f (u) + f (u), x − u + 2−1 Lu − x2 = g(x).
(4.29)
Let x∈U
(4.30)
B(x, δC ) ∩ C = ∅.
(4.31)
satisfy
It follows from (4.3), (4.26), and (4.30) that |f (x0 ) − f (v)| ≤ Lv − x0 ≤ δC L + δf .
(4.32)
In view of (4.21) and (4.29), g(x0 ) ≥ f (x0 ).
(4.33)
By (4.18), (4.33), and convexity of f , f (x) − f (x0 ) ≥ f (x) − g(x0 ) = f (x) − f (u) − f (u), x0 − u − 2−1 Lu − x0 2 ≥ f (u) + f (u), x − u −f (u) − f (u), x0 − u − 2−1 Lu − x0 2 = f (u), x − x0 − 2−1 Lu − x0 2 .
(4.34)
Relation (4.31) implies that there exists x1 ∈ C
(4.35)
x1 − x ≤ δC .
(4.36)
such that
By (4.3), (4.30), (4.35), and (4.36), |f (x1 ) − f (x)| ≤ LδC .
(4.37)
It follows from (4.23) (with z = x1 ) and (4.35) that 0 ≤ g (x0 ), x1 − x0 = g (x0 ), x1 − x + g (x0 ), x − x0 .
(4.38)
By (4.3), (4.12), (4.20), (4.27), (4.35), and (4.36), g (x0 ), x1 − x = f (u) + L(x0 − u), x1 − x = f (u), x1 − x + Lx0 − u, x1 − x ≤ LδC + LδC (2M0 + 1).
(4.39)
g (x0 ), x − x0 = f (u) + L(x0 − u), x − x0 .
(4.40)
In view of (4.19) and (4.21),
Relations (4.38)–(4.40) imply that f (u), x − x0 = g (x0 ), x − x0 − Lx0 − u, x − x0 ≥ −g (x0 ), x1 − x − Lx0 − u, x − x0 ≥ −Lx0 − u, x − x0 − LδC (2M0 + 2).
(4.41)
It follows from (4.34) and (4.41) that
$$f(x) - f(x_0) \ge \langle f'(u), x - x_0\rangle - 2^{-1}L\|x_0 - u\|^2 \ge -L\delta_C(2M_0 + 2) - L\langle x_0 - u, x - x_0\rangle - 2^{-1}L\|x_0 - u\|^2. \tag{4.42}$$
In view of (4.42) and Lemma 2.1, f (x) − f (x0 ) ≥ −LδC (2M0 + 2) − 2−1 Lx0 − u2 −2−1 L[x − u2 − x − x0 2 − u − x0 2 ] = 2−1 Lx − x0 2 − 2−1 Lx − u2 − LδC (2M0 + 2).
(4.43)
By (4.32), f (x) − f (v) ≥ f (x) − f (x0 ) − δC L − δf .
(4.44)
It follows from (4.12), (4.14), (4.25), (4.26), (4.30), and (4.31) that |x − x0 2 − x − v2 | = |x − x0 − x − v|(x − x0 + x − v) ≤ (δC + L−1 δf )(4M0 + 4)
(4.45)
and |u − x0 2 − u − v2 | = |u − x0 − u − v|(u − x0 + u − v) ≤ (δC + L−1 δf )(4M0 + 4).
(4.46)
In view of (4.42), f (x) − f (x0 ) ≥ −LδC (2M0 + 2) + 2−1 Lx0 − u2 −Lx0 − u, x0 − u − Lx0 − u, x − x0 ≥ −LδC (2M0 + 2) + 2−1 Lx0 − u2 − Lx0 − u, x − u.
(4.47)
By (4.32), (4.43), and (4.45), f (x) − f (v) ≥ f (x) − f (x0 ) − δC L − δf ≥ 2−1 Lx − x0 2 − 2−1 Lx − u2 −LδC (2M0 + 2) − LδC − δf ≥ 2−1 Lx − v2 − 2−1 Lx − u2 −L(δC + L−1 δf )(4M0 + 4) −LδC (2M0 + 2) − LδC − δf and (4.16) holds. It follows from (4.12), (4.26), (4.31), (4.32), (4.46), and (4.47) that f (x) − f (v) ≥ f (x) − f (x0 ) − δC L − δf ≥ −LδC (2M0 + 2) + 2−1 Lx0 − u2 −Lx0 − u, x − u − LδC − δf ≥ −LδC − δf − L(δC + L−1 δf )(2M0 + 2) + 2−1 Lu − v2 −Lv − u, x − u − L(δC + L−1 δf )(2M0 + 2) and (4.17) holds. Lemma 4.4 is proved.
4.4 Proof of Theorem 4.1

Clearly, the function $f$ has a minimizer on the set $C$. Fix
$$z \in C \tag{4.48}$$
such that
$$f(z) = \inf(f, C). \tag{4.49}$$
It is easy to see that
$$\|x_t\| \le M_0 + 1, \quad t = 0, 1, \dots. \tag{4.50}$$
Let $T$ be a natural number and $t \ge 0$ be an integer. Applying Lemma 4.4 with $u = x_t$, $\xi = \xi_t$, $v = x_{t+1}$, $x = z$ we obtain that
$$f(z) - f(x_{t+1}) \ge 2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2 - \delta_C L(6M_0 + 7) - \delta_f(4M_0 + 5).$$
This implies that
$$Tf(z) - \sum_{t=1}^{T} f(x_{t+1}) \ge \sum_{t=1}^{T}\big(2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2\big) - \delta_C LT(6M_0 + 7) - T\delta_f(4M_0 + 5)$$
$$= 2^{-1}L(\|z - x_{T+1}\|^2 - \|z - x_1\|^2) - \delta_C LT(6M_0 + 7) - T\delta_f(4M_0 + 5). \tag{4.51}$$
Relations (4.2), (4.48), (4.50), and (4.51) imply that
$$T\big(\min\{f(x_t) : t = 2, \dots, T + 1\} - f(z)\big), \quad T\Big(f\Big(\sum_{t=2}^{T+1} T^{-1}x_t\Big) - f(z)\Big)$$
$$\le 2^{-1}(2M_0 + 1)^2 L + \delta_C LT(6M_0 + 7) + T\delta_f(4M_0 + 5).$$
Together with (4.49) this implies that
$$\min\{f(x_t) : t = 2, \dots, T + 1\} - \inf(f, C), \quad f\Big(\sum_{t=2}^{T+1} T^{-1}x_t\Big) - \inf(f, C)$$
$$\le 2^{-1}(2M_0 + 1)^2 T^{-1}L + \delta_C L(6M_0 + 7) + \delta_f(4M_0 + 5).$$
This completes the proof of Theorem 4.1.
4.5 Optimization on Unbounded Sets

Let $X$ be a Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$. Let $D$ be a nonempty closed convex subset of $X$, $V$ be an open convex subset of $X$ such that
$$D \subset V,$$
and $f : V \to R^1$ be a convex Fréchet differentiable function which is Lipschitz on all bounded subsets of $V$. Set
$$D_{\min} = \{x \in D : f(x) \le f(y) \text{ for all } y \in D\}. \tag{4.52}$$
We suppose that
$$D_{\min} \ne \emptyset. \tag{4.53}$$
We will prove the following result.

Theorem 4.5 Let $\delta_f, \delta_C \in (0, 1]$, $M > 0$,
$$D_{\min} \cap B_X(0, M) \ne \emptyset, \tag{4.54}$$
$$M_0 = 4M + 8, \tag{4.55}$$
$L \ge 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0 + 2), \tag{4.56}$$
$$\|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, M_0 + 2), \tag{4.57}$$
$$\epsilon_0 = \delta_f(4M_0 + 6) + \delta_C L(6M_0 + 8) \tag{4.58}$$
and let
$$n_0 = \lfloor 2^{-1}L(2M + 1)^2(\delta_f + L\delta_C)^{-1}\rfloor + 1. \tag{4.59}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M \tag{4.60}$$
and that for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f \tag{4.61}$$
and
$$\|x_{t+1} - P_D(x_t - L^{-1}\xi_t)\| \le \delta_C. \tag{4.62}$$
Then there exists an integer $q \in [1, n_0 + 1]$ such that
$$f(x_q) \le \inf(f, D) + \epsilon_0, \quad \|x_i\| \le 3M + 3, \quad i = 0, \dots, q.$$
Proof By (4.54) there exists
$$z \in D_{\min} \cap B_X(0, M). \tag{4.63}$$
By (4.56), (4.60)–(4.63), and Lemma 2.2,
$$\|x_1 - z\| \le \|x_1 - P_D(x_0 - L^{-1}\xi_0)\| + \|P_D(x_0 - L^{-1}\xi_0) - z\| \le \delta_C + \|x_0 - z\| + L^{-1}\|\xi_0\| \le 1 + 2M + L^{-1}(L + 1) \le 2M + 3. \tag{4.64}$$
In view of (4.63) and (4.64),
$$\|x_1\| \le 3M + 3. \tag{4.65}$$
Assume that $T \ge 0$ is an integer and that for all $t = 1, \dots, T + 1$,
$$f(x_t) - f(z) > \epsilon_0. \tag{4.66}$$
Set
$$U = V \cap \{v \in X : \|v\| < M_0 + 2\} \tag{4.67}$$
and
$$C = D \cap B_X(0, M_0). \tag{4.68}$$
Assume that $t \in [0, T]$ is an integer and that
$$\|x_t - z\| \le 2M + 3. \tag{4.69}$$
(In view of (4.60) and (4.63), our assumption is true for $t = 0$.) By (4.55), (4.63), and (4.68),
$$z \in C \subset B_X(0, M_0). \tag{4.70}$$
Relations (4.55), (4.63), (4.67), and (4.69) imply that
$$x_t \in U \cap B_X(0, M_0 + 1). \tag{4.71}$$
It follows from (4.56), (4.61), and (4.71) that
$$\xi_t \in f'(x_t) + B_X(0, 1) \subset B_X(0, L + 1). \tag{4.72}$$
By (4.63), (4.69), (4.72), and Lemma 2.2,
$$\|z - P_D(x_t - L^{-1}\xi_t)\| \le \|z - x_t + L^{-1}\xi_t\| \le \|z - x_t\| + L^{-1}\|\xi_t\| \le 2M + 5. \tag{4.73}$$
In view of (4.63) and (4.73),
$$\|P_D(x_t - L^{-1}\xi_t)\| \le 3M + 5. \tag{4.74}$$
Relations (4.55), (4.68), and (4.74) imply that
$$P_D(x_t - L^{-1}\xi_t) \in C, \tag{4.75}$$
$$P_D(x_t - L^{-1}\xi_t) = P_C(x_t - L^{-1}\xi_t). \tag{4.76}$$
It follows from (4.62), (4.67), and (4.74) that
$$\|x_{t+1}\| \le 3M + 6, \quad x_{t+1} \in U. \tag{4.77}$$
By (4.56), (4.57), (4.61)–(4.63), (4.68), (4.71), (4.77), and Lemma 4.4 applied with $u = x_t$, $\xi = \xi_t$, $v = x_{t+1}$, $x = z$ we obtain that
$$f(z) - f(x_{t+1}) \ge 2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2 - L\delta_C(6M_0 + 7) - \delta_f(4M_0 + 5). \tag{4.78}$$
By (4.58), (4.66), and (4.78),
$$L\delta_C(6M_0 + 8) + \delta_f(4M_0 + 6) = \epsilon_0 < f(x_{t+1}) - f(z) \le 2^{-1}L(\|z - x_t\|^2 - \|z - x_{t+1}\|^2) + L\delta_C(6M_0 + 7) + \delta_f(4M_0 + 5). \tag{4.79}$$
In view of (4.69) and (4.79),
$$\|z - x_{t+1}\| \le \|z - x_t\| \le 2M + 3.$$
Thus by induction we showed that for all $t = 0, \dots, T + 1$,
$$\|z - x_t\| \le 2M + 3, \quad \|x_t\| \le 3M + 3$$
and that (4.79) holds for all $t = 0, \dots, T$. It follows from (4.79) that
$$(1 + T)(L\delta_C(6M_0 + 8) + \delta_f(4M_0 + 6)) \le (1 + T)(\min\{f(x_t) : t = 1, \dots, T + 1\} - f(z)) \le \sum_{t=0}^{T}(f(x_{t+1}) - f(z))$$
$$\le 2^{-1}L\sum_{t=0}^{T}(\|z - x_t\|^2 - \|z - x_{t+1}\|^2) + (T + 1)(L\delta_C(6M_0 + 7) + \delta_f(4M_0 + 5)).$$
By the relation above, (4.61) and (4.63),
$$(T + 1)(\delta_f + L\delta_C) \le 2^{-1}L\|z - x_0\|^2 \le 2^{-1}L(2M + 1)^2$$
and
$$T < 2^{-1}L(2M + 1)^2(\delta_f + L\delta_C)^{-1} \le n_0.$$
Thus we assumed that an integer $T \ge 0$ satisfies
$$f(x_t) - f(z) > \epsilon_0, \quad t = 1, \dots, T + 1$$
and showed that $T \le n_0 - 1$ and
$$\|x_t\| \le 3M + 3, \quad t = 0, \dots, T + 1.$$
This implies that there exists a natural number $q \le n_0 + 1$ such that
$$f(x_q) - f(z) \le \epsilon_0, \quad \|x_t\| \le 3M + 3, \quad t = 0, \dots, q.$$
Theorem 4.5 is proved.

The next result has no prototype in [92].
Theorem 4.6 Let $\delta_f, \delta_C \in (0, 1]$, $M > 2$ satisfy
$$\{x \in V : f(x) \le \inf(f, D) + 3\} \subset B_X(0, M - 2), \tag{4.80}$$
$L \ge 1$ satisfy
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M + 8), \tag{4.81}$$
$$\|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M + 8), \tag{4.82}$$
$$\delta_C \le (L(18M + 19))^{-1}, \quad \delta_f \le (12M + 13)^{-1}. \tag{4.83}$$
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M \tag{4.84}$$
and that for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f \tag{4.85}$$
and
$$\|x_{t+1} - P_D(x_t - L^{-1}\xi_t)\| \le \delta_C. \tag{4.86}$$
Then $\|x_t\| \le 3M$ for all integers $t \ge 0$ and for each natural number $T$,
$$\min\{f(x_t) : t = 1, \dots, T + 1\} - \inf(f, D), \quad f\Big(\sum_{t=1}^{T+1}(T + 1)^{-1}x_t\Big) - \inf(f, D)$$
$$\le L\delta_C(18M + 19) + \delta_f(12M + 13) + 2LM^2(T + 1)^{-1}.$$
Proof Fix
$$z \in D_{\min}. \tag{4.87}$$
Set U = V ∩ {v ∈ X : v < 3M + 4}
(4.88)
C = D ∩ BX (0, 3M + 2).
(4.89)
and
In view of (4.80) and (4.87), z ≤ M − 2.
(4.90)
x0 − z ≤ 2M.
(4.91)
By (4.84) and (4.90),
Assume that t ≥ 0 is an integer and that xt − z ≤ 2M.
(4.92)
z ∈ C ∩ BX (0, M − 1).
(4.93)
By (4.87), (4.89), and (4.90),
Relations (4.88), (4.90), and (4.92) imply that xt ∈ U ∩ BX (0, 3M).
(4.94)
xt+1 − PD (xt − L−1 ξt ) ≤ 1.
(4.95)
In view of (4.86),
It follows from (4.81), (4.85), and (4.94) that ξt ∈ f (xt ) + BX (0, 1) ⊂ BX (0, L + 1).
(4.96)
By (4.88), (4.92), (4.96), and Lemma 2.2, z − PD (xt − L−1 ξt ) ≤ z − xt + L−1 ξt ≤ z − xt + L−1 ξt ≤ 2M + L−1 (L + 1) ≤ 2M + 2.
(4.97)
In view of (4.90) and (4.97),
\[ \|P_D(x_t - L^{-1}\xi_t)\| \le 3M + 2. \tag{4.98} \]
Relations (4.89) and (4.98) imply that
\[ P_D(x_t - L^{-1}\xi_t) \in C, \tag{4.99} \]
\[ P_D(x_t - L^{-1}\xi_t) = P_C(x_t - L^{-1}\xi_t). \tag{4.100} \]
It follows from (4.83), (4.86), (4.88), and (4.98) that
\[ \|x_{t+1}\| \le \|P_D(x_t - L^{-1}\xi_t)\| + \delta_C < 3M + 3, \tag{4.101} \]
\[ x_{t+1} \in U. \tag{4.102} \]
By (4.81), (4.82), (4.85), (4.86)–(4.89), (4.94), (4.101), (4.102), and Lemma 4.4 applied with $M_0 = 3M + 2$, $u = x_t$, $\xi = \xi_t$, $v = x_{t+1}$, $x = z$ we obtain that
\[ f(z) - f(x_{t+1}) \ge 2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2 - L\delta_C(18M + 19) - \delta_f(12M + 13). \tag{4.103} \]
There are two cases:
\[ \|z - x_{t+1}\| \le \|z - x_t\|; \tag{4.104} \]
\[ \|z - x_{t+1}\| > \|z - x_t\|. \tag{4.105} \]
If (4.104) holds, then in view of (4.92), $\|z - x_{t+1}\| \le 2M$. Assume that (4.105) holds. It follows from (4.83), (4.103), and (4.105) that
\[ f(x_{t+1}) - f(z) \le L\delta_C(18M + 19) + \delta_f(12M + 13) \le 2. \tag{4.106} \]
By (4.80), (4.87), and (4.106),
\[ \|x_{t+1}\| \le M - 2. \tag{4.107} \]
In view of (4.93) and (4.107),
\[ \|z - x_{t+1}\| \le 2M - 2. \tag{4.108} \]
Thus in both cases $\|z - x_{t+1}\| \le 2M$ and (4.103) holds. Therefore by induction we showed that for all integers $t \ge 0$, $\|x_t\| \le 3M$ and (4.103) holds. Let $T$ be a natural number. By (4.91) and (4.103),
\[ \sum_{t=0}^{T} (f(x_{t+1}) - f(z)) \le (T + 1)(L\delta_C(18M + 19) + \delta_f(12M + 13)) + 2^{-1}L \sum_{t=0}^{T} (\|z - x_t\|^2 - \|z - x_{t+1}\|^2) \]
\[ \le (T + 1)(L\delta_C(18M + 19) + \delta_f(12M + 13)) + 2LM^2. \]
This implies that
\[ \min\{f(x_t) : t = 1, \dots, T + 1\} - f(z), \quad f\Big(\sum_{t=1}^{T+1} (T + 1)^{-1} x_t\Big) - f(z) \]
\[ \le L\delta_C(18M + 19) + \delta_f(12M + 13) + 2(T + 1)^{-1}LM^2. \]
Theorem 4.6 is proved.
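The iteration analyzed in Theorem 4.6 can be illustrated numerically. The following is a minimal sketch, not the author's implementation: it assumes $X = V = \mathbb{R}^2$, takes $D$ to be the closed unit ball, uses the hypothetical objective $f(x) = \frac{1}{2}\|x - c\|^2$ with $c = (2, 0)$, and models the computational errors in (4.85) and (4.86) by random vectors of norm exactly $\delta_f$ and $\delta_C$. The helper names (`project_ball`, `error_vector`, `inexact_projected_gradient`) are introduced here for illustration only. For this example $M = 7$ satisfies (4.80) and $L = 38$ satisfies (4.81) and (4.82) on $B_X(0, 4M + 8)$.

```python
import numpy as np


def project_ball(x, radius):
    """Exact Euclidean projection onto the ball D = {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def error_vector(rng, dim, size):
    """Random vector of norm `size`, standing in for a computational error."""
    direction = rng.standard_normal(dim)
    return size * direction / np.linalg.norm(direction)


def inexact_projected_gradient(grad, x0, L, delta_f, delta_C, radius, T, seed=0):
    """Return [x_0, ..., x_{T+1}] generated with errors delta_f and delta_C."""
    rng = np.random.default_rng(seed)
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(T + 1):
        x = xs[-1]
        xi = grad(x) + error_vector(rng, x.size, delta_f)        # models (4.85)
        exact = project_ball(x - xi / L, radius)                 # P_D(x_t - xi_t / L)
        xs.append(exact + error_vector(rng, x.size, delta_C))    # models (4.86)
    return xs


# Hypothetical test problem: f(x) = 0.5 * ||x - c||^2 over the unit ball.
c = np.array([2.0, 0.0])
f = lambda x: 0.5 * np.dot(x - c, x - c)
grad = lambda x: x - c
M, L, delta_f, delta_C, T = 7.0, 38.0, 1e-3, 1e-4, 200
xs = inexact_projected_gradient(grad, np.zeros(2), L, delta_f, delta_C, 1.0, T)
gap = min(f(x) for x in xs[1:]) - 0.5   # inf(f, D) = f((1, 0)) = 0.5
bound = L * delta_C * (18 * M + 19) + delta_f * (12 * M + 13) + 2 * L * M**2 / (T + 1)
```

In this sketch `gap` is the left-hand side of the estimate of Theorem 4.6 for the minimum over $t = 1, \dots, T + 1$, and `bound` is its right-hand side. The gap may be slightly negative, because the iterates are only required to lie in $V$ and can leave $D$ by at most $\delta_C$.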
It is clear that the best choice of $T$ is of the same order as $\max\{\delta_C, \delta_f\}^{-1}$. The next result has no prototype in [92].
Theorem 4.7 Let $\delta_f, \delta_C \in (0, 1)$, $M > 1$,
\[ \delta_C \le (160M)^{-1}, \quad \delta_f \le 120^{-1}, \tag{4.109} \]
\[ D_{\min} \cap B_X(0, M) \ne \emptyset, \tag{4.110} \]
$L \ge 1$ satisfy
\[ |f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M + 10), \tag{4.111} \]
\[ \|f'(v_1) - f'(v_2)\| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in V \cap B_X(0, 4M + 10), \tag{4.112} \]
\[ T = \lfloor 36^{-1} \min\{\delta_C^{-1}, L\delta_f^{-1}\} \rfloor. \tag{4.113} \]
Assume that $\{x_t\}_{t=0}^{\infty} \subset V$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
\[ \|x_0\| \le M \tag{4.114} \]
and that for each integer $t \ge 0$,
\[ \|\xi_t - f'(x_t)\| \le \delta_f \tag{4.115} \]
and
\[ \|x_{t+1} - P_D(x_t - L^{-1}\xi_t)\| \le \delta_C. \tag{4.116} \]
Then $\|x_t\| \le 3M + 3$ for all integers $t = 0, \dots, T + 1$ and
\[ \sum_{t=0}^{T} (T + 1)^{-1}(f(x_{t+1}) - \inf(f, D)), \quad \min\{f(x_t) : t = 1, \dots, T + 1\} - \inf(f, D), \]
\[ f\Big(\sum_{t=1}^{T+1} (T + 1)^{-1} x_t\Big) - \inf(f, D) \le 72(M + 1)^2 \max\{\delta_C L, \delta_f\}. \]

Proof By (4.110) there exists
\[ z \in D_{\min} \cap B_X(0, M). \tag{4.117} \]
By (4.114) and (4.117),
\[ \|x_0 - z\| \le 2M. \tag{4.118} \]
Set
\[ U = V \cap \{v \in X : \|v\| < 4M + 9\} \tag{4.119} \]
and
\[ C = D \cap B_X(0, 4M + 8). \tag{4.120} \]
Assume that $t \ge 0$ is an integer and that
\[ \|x_t - z\| \le 2M + 2. \tag{4.121} \]
By (4.117) and (4.121),
\[ \|x_t\| \le 3M + 2. \tag{4.122} \]
In view of (4.119) and (4.122),
\[ x_t \in U \cap B_X(0, 3M + 2). \tag{4.123} \]
It follows from (4.111), (4.115), and (4.122) that
\[ \|\xi_t\| \le L + 1. \tag{4.124} \]
By (4.117), (4.121), (4.124), and Lemma 2.2,
\[ \|z - P_D(x_t - L^{-1}\xi_t)\| \le \|z - x_t + L^{-1}\xi_t\| \le \|z - x_t\| + L^{-1}\|\xi_t\| \le 2M + 2 + L^{-1}(L + 1) \le 2M + 4. \tag{4.125} \]
In view of (4.117) and (4.125),
\[ \|P_D(x_t - L^{-1}\xi_t)\| \le 3M + 4. \tag{4.126} \]
Relations (4.120) and (4.126) imply that
\[ P_D(x_t - L^{-1}\xi_t) \in C, \tag{4.127} \]
\[ P_D(x_t - L^{-1}\xi_t) = P_C(x_t - L^{-1}\xi_t). \tag{4.128} \]
It follows from (4.109), (4.116), and (4.126) that
\[ \|x_{t+1}\| \le 3M + 5. \tag{4.129} \]
Relations (4.119) and (4.129) imply that
\[ x_{t+1} \in U \cap B_X(0, 3M + 5). \tag{4.130} \]
By (4.111), (4.112), (4.115)–(4.117), (4.123), and Lemma 4.4 applied with $M_0 = 4M + 8$, $u = x_t$, $\xi = \xi_t$, $v = x_{t+1}$, $x = z$ we obtain that
\[ f(z) - f(x_{t+1}) \ge 2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2 - L\delta_C(24M + 55) - \delta_f(16M + 37). \tag{4.131} \]
It follows from (4.111), (4.116), (4.117), (4.126), and (4.129) that
\[ f(x_{t+1}) \ge f(P_D(x_t - L^{-1}\xi_t)) - L\delta_C \ge f(z) - L\delta_C. \tag{4.132} \]
By (4.131) and (4.132),
\[ \|z - x_{t+1}\|^2 - \|z - x_t\|^2 \le 2\delta_C + 2\delta_C(24M + 55) + 2L^{-1}\delta_f(16M + 37). \tag{4.133} \]
Thus we have shown that the following property holds:
(a) if for an integer $t \ge 0$ relation $\|x_t - z\| \le 2M + 2$ holds, then (4.131) and (4.133) are true.
Let us show that for all integers $t = 0, \dots, T$,
\[ \|z - x_t\|^2 \le \|z - x_0\|^2 + t[2\delta_C(24M + 55) + 2L^{-1}\delta_f(16M + 37)] \]
\[ \le \|z - x_0\|^2 + T[2\delta_C(24M + 55) + 2L^{-1}\delta_f(16M + 37)] \le 4M^2 + 8M + 4. \tag{4.134} \]
First, note that by (4.113),
\[ T[2\delta_C(24M + 55) + 2L^{-1}\delta_f(16M + 37)] = 2T\delta_C(24M + 55) + 2TL^{-1}\delta_f(16M + 37) \]
\[ \le 18^{-1}(24M + 55) + 18^{-1}(16M + 37) \le 5M + 4. \tag{4.135} \]
It follows from (4.118) and (4.135) that
\[ \|z - x_0\|^2 + T[2\delta_C(24M + 55) + 2L^{-1}\delta_f(16M + 37)] \le 4M^2 + 8M + 4 = (2M + 2)^2. \tag{4.136} \]
Assume that $t \in \{0, \dots, T\} \setminus \{T\}$ and that (4.134) holds. Then $\|x_t - z\| \le 2M + 2$ and by property (a), relations (4.131) and (4.133) hold. By (4.133) and (4.134),
\[ \|z - x_{t+1}\|^2 \le \|z - x_t\|^2 + 2\delta_C(24M + 55) + 2L^{-1}\delta_f(16M + 37) \]
\[ \le \|z - x_0\|^2 + (t + 1)[2\delta_C(24M + 55) + 2L^{-1}\delta_f(16M + 37)] \]
\[ \le \|z - x_0\|^2 + T[2\delta_C(24M + 55) + 2L^{-1}\delta_f(16M + 37)] \le 4M^2 + 8M + 4. \]
Thus we have shown by induction that for all $t = 0, \dots, T$, (4.134) holds and
\[ \|x_t - z\| \le 2M + 2. \tag{4.137} \]
Property (a) implies that (4.131) and (4.137) hold for all $t = 0, \dots, T$. In view of (4.117) and (4.137), $\|x_t\| \le 3M + 2$, $t = 0, \dots, T$. It follows from (4.117), (4.118), and (4.131) that
\[ \sum_{t=0}^{T} (T + 1)^{-1} f(x_{t+1}) - \inf(f, D) \le 2^{-1}L(T + 1)^{-1}\|z - x_0\|^2 + \delta_C L(24M + 55) + \delta_f(16M + 37) \]
\[ \le 72LM^2 \max\{\delta_C, L^{-1}\delta_f\} + \delta_C L(24M + 55) + \delta_f(16M + 37) \]
\[ \le \max\{\delta_C, L^{-1}\delta_f\}(72LM^2 + L(24M + 55) + L(16M + 37)) \]
\[ \le \max\{L\delta_C, \delta_f\}(72M^2 + 40M + 92) \le 72 \max\{L\delta_C, \delta_f\}(M + 1)^2. \]
Property (a), (4.110), (4.133), and (4.137) with $t = T$ imply that
\[ \|z - x_{T+1}\|^2 \le \|z - x_T\|^2 + 2\delta_C(24M + 56) + 2L^{-1}\delta_f(16M + 37) \]
\[ \le 4M^2 + 8M + 4 + 160M\delta_C + 106L^{-1}\delta_f \le 4M^2 + 8M + 6 \le (2M + 3)^2 \]
and $\|x_{T+1}\| \le 3M + 3$. Theorem 4.7 is proved.
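For concrete error levels, the number of iterations prescribed by (4.113) and the guaranteed accuracy of Theorem 4.7 can be computed directly. The sketch below is illustrative only: the function name `theorem_4_7_horizon` is hypothetical, and the integer part in (4.113) is an assumption made here because $T$ is used as an iteration count.

```python
import math


def theorem_4_7_horizon(M, L, delta_f, delta_C):
    """Return (T, bound): T from (4.113) and the bound 72 (M+1)^2 max{delta_C L, delta_f}."""
    if not (M > 1 and L >= 1
            and 0 < delta_C <= 1.0 / (160.0 * M)
            and 0 < delta_f <= 1.0 / 120.0):
        raise ValueError("parameters violate the hypotheses of Theorem 4.7")
    # Integer part assumed: T counts iterations.
    T = math.floor(min(1.0 / delta_C, L / delta_f) / 36.0)
    bound = 72.0 * (M + 1) ** 2 * max(delta_C * L, delta_f)
    return T, bound
```

For example, with $M = 2$, $L = 1$ and $\delta_f = \delta_C = 10^{-3}$ the call `theorem_4_7_horizon(2.0, 1.0, 1e-3, 1e-3)` gives $T = 27$ and a bound of about $0.648$.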
Chapter 5
An Extension of the Gradient Algorithm
In this chapter we analyze the convergence, in the presence of computational errors, of a gradient type algorithm which was introduced by Beck and Teboulle [18] for solving linear inverse problems arising in signal/image processing. This algorithm is used for minimization of the sum of two given convex functions, and each of its iterations consists of two steps. The first step is a calculation of a gradient of the first function, while in the second one we solve an auxiliary minimization problem. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterations one needs for this.
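The two-step iteration described above can be sketched in a simple finite-dimensional setting. The code below is an illustration under stated assumptions, not the algorithm as implemented in [18]: it assumes $X = \mathbb{R}^n$, a hypothetical smooth term $f(x) = \frac{1}{2}\|Ax - b\|^2$, and a hypothetical nonsmooth term $g(x) = \lambda\|x\|_1$, for which the auxiliary problem of minimizing $\langle \xi, w - u\rangle + \frac{L}{2}\|w - u\|^2 + g(w)$ over $w$ is solved exactly by soft thresholding of $u - L^{-1}\xi$. The two computational errors are modeled by random vectors of norm $\delta_f$ and $\delta_G$; all function names are introduced here for illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Minimizer of 0.5*||w - v||^2 + tau*||w||_1, computed componentwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def inexact_proximal_gradient(A, b, lam, x0, L, delta_f, delta_G, steps, seed=0):
    """Run `steps` iterations; return the list of iterates [x_0, ..., x_steps]."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    dim = x.size

    def error_vector(size):
        d = rng.standard_normal(dim)
        return size * d / np.linalg.norm(d)

    iterates = [x]
    for _ in range(steps):
        # Step 1: gradient of the smooth term, computed with error delta_f.
        xi = A.T @ (A @ x - b) + error_vector(delta_f)
        # Step 2: auxiliary minimization, solved up to an error delta_G.
        exact = soft_threshold(x - xi / L, lam / L)
        x = exact + error_vector(delta_G)
        iterates.append(x)
    return iterates
```

With `A = np.eye(2)`, `b = np.array([3.0, 0.1])`, `lam = 1.0` and `L = 1.0`, the exact minimizer of $f + g$ is $(2, 0)$, and the inexact iterates stay within a distance of order $\delta_f + \delta_G$ of it.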
5.1 Preliminaries and the Main Result

Let $X$ be a Hilbert space equipped with an inner product $\langle \cdot, \cdot \rangle$ which induces a complete norm $\|\cdot\|$. Suppose that $f : X \to R^1$ is a convex Fréchet differentiable function on $X$ and for every $x \in X$ denote by $f'(x) \in X$ the Fréchet derivative of $f$ at $x$. It is clear that for any $x \in X$ and any $h \in X$,
\[ \langle f'(x), h \rangle = \lim_{t \to 0} t^{-1}(f(x + th) - f(x)). \]
Recall that for each function $\phi : X \to R^1$,
\[ \inf(\phi) = \inf\{\phi(y) : y \in X\}, \]
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_5
\[ \operatorname{argmin}(\phi) = \operatorname{argmin}\{\phi(z) : z \in X\} = \{z \in X : \phi(z) = \inf(\phi)\}. \]
We suppose that the mapping $f' : X \to X$ is Lipschitz on all bounded subsets of $X$. Let $g : X \to R^1$ be a convex continuous function which is Lipschitz on all bounded subsets of $X$. Define
\[ F(x) = f(x) + g(x), \quad x \in X. \]
We suppose that
\[ \operatorname{argmin}(F) \ne \emptyset \tag{5.1} \]
and that there exists $c_* \in R^1$ such that
\[ g(x) \ge c_* \text{ for all } x \in X. \tag{5.2} \]
For each $u \in X$, each $\xi \in X$, and each $L > 0$ define a convex function
\[ G^{(L)}_{u,\xi}(w) = f(u) + \langle \xi, w - u \rangle + 2^{-1}L\|w - u\|^2 + g(w), \quad w \in X, \tag{5.3} \]
which has a minimizer. In this chapter we analyze the gradient type algorithm, which was introduced by Beck and Teboulle in [18] for solving linear inverse problems, and prove the following result.

Theorem 5.1 Let $\delta_f, \delta_G \in (0, 1]$, $M \ge 1$ satisfy
\[ \operatorname{argmin}(F) \cap B_X(0, M) \ne \emptyset, \tag{5.4} \]
$L \ge 1$ satisfy
\[ |f(w_1) - f(w_2)| \le L\|w_1 - w_2\| \text{ for all } w_1, w_2 \in B_X(0, 3M + 2) \tag{5.5} \]
and
\[ \|f'(w_1) - f'(w_2)\| \le L\|w_1 - w_2\| \text{ for all } w_1, w_2 \in X, \tag{5.6} \]
$M_1 \ge 3M$ satisfy
\[ |f(w)|, F(w) \le M_1 \text{ for all } w \in B_X(0, 3M + 2), \tag{5.7} \]
\[ M_2 = (8(M_1 + |c_*| + 1 + (L + 1)^2))^{1/2} + 3M + 2, \tag{5.8} \]
$L_0 \ge 1$ satisfy
\[ |g(w_1) - g(w_2)| \le L_0\|w_1 - w_2\| \text{ for all } w_1, w_2 \in B_X(0, M_2), \tag{5.9} \]
\[ \delta_f \le 4^{-1}(M_2 + 9M + 4)^{-1}, \quad \delta_G \le 4^{-1}(2 + L_0 + L(M_2 + 3M + 3))^{-1}, \tag{5.10} \]
\[ \epsilon_0 = 2\delta_f(M_2 + 9M + 4) + 2\delta_G(2 + L_0 + L(M_2 + 3M + 3)) \tag{5.11} \]
and let
\[ n_0 = \lfloor 4LM^2\epsilon_0^{-1} \rfloor + 2. \tag{5.12} \]
Assume that $\{x_t\}_{t=0}^{\infty} \subset X$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
\[ \|x_0\| \le M \tag{5.13} \]
and that for each integer $t \ge 0$,
\[ \|\xi_t - f'(x_t)\| \le \delta_f \tag{5.14} \]
and
\[ B_X(x_{t+1}, \delta_G) \cap \operatorname{argmin}(G^{(L)}_{x_t,\xi_t}) \ne \emptyset. \tag{5.15} \]
Then there exists an integer $q \in [0, n_0 + 1]$ such that
\[ \|x_i\| \le M_2, \ i = 0, \dots, q \]
and
\[ F(x_q) \le \inf(F) + \epsilon_0. \]

Note that in the theorem above $\delta_f, \delta_G$ are the computational errors produced by our computer system. The computational error $\delta_f$ is produced when we calculate a gradient of $f$, while the computational error $\delta_G$ is produced when we solve the auxiliary minimization problem with the objective functions $G^{(L)}_{x_t,\xi_t}$, $t = 0, 1, \dots$. By Theorem 5.1, after $n_0 + 1$ iterations we obtain an approximate solution $x$ satisfying
\[ F(x) \le \inf(F) + \epsilon_0, \]
where $\epsilon_0$, $n_0$ are defined by (5.11) and (5.12). Theorem 5.1 is a generalization of Theorem 5.1 of [92] proved in the case when $\delta_f = \delta_G$. In this chapter we also prove two extensions of Theorem 5.1 which do not have prototypes in [92].
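The constants of Theorem 5.1 can be evaluated mechanically once the problem data are fixed. The following sketch is a plain transcription of (5.8) and (5.10)–(5.12) under two assumptions that should be checked against the printed text: the constant $M_2$ is read as $(8(M_1 + |c_*| + 1 + (L + 1)^2))^{1/2} + 3M + 2$, matching Lemma 5.3 with $M_0 = 3M$, and $n_0$ is read with an integer part. The function name `theorem_5_1_constants` is hypothetical.

```python
import math


def theorem_5_1_constants(M, L, L0, M1, c_star, delta_f, delta_G):
    """Return (M2, eps0, n0) as in (5.8), (5.11) and (5.12)."""
    M2 = math.sqrt(8.0 * (M1 + abs(c_star) + 1.0 + (L + 1.0) ** 2)) + 3.0 * M + 2.0
    a = M2 + 9.0 * M + 4.0                       # multiplies delta_f in (5.11)
    b = 2.0 + L0 + L * (M2 + 3.0 * M + 3.0)      # multiplies delta_G in (5.11)
    if delta_f > 1.0 / (4.0 * a) or delta_G > 1.0 / (4.0 * b):
        raise ValueError("error levels violate (5.10)")
    eps0 = 2.0 * delta_f * a + 2.0 * delta_G * b
    n0 = math.floor(4.0 * L * M ** 2 / eps0) + 2  # integer part assumed
    return M2, eps0, n0
```

For instance, `theorem_5_1_constants(1.0, 1.0, 1.0, 3.0, 0.0, 0.005, 0.005)` gives $M_2 = 13$, $\epsilon_0 \approx 0.48$ and $n_0 = 10$. Condition (5.10) forces $\epsilon_0 \le 1$, so the guaranteed accuracy is always at most one.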
5.2 Auxiliary Results

Lemma 5.2 ([18]) Let $u, \xi \in X$ and $L > 0$. Then the function $G^{(L)}_{u,\xi}$ has a point of minimum and $z \in X$ is a minimizer of $G^{(L)}_{u,\xi}$ if and only if
\[ 0 \in \xi + L(z - u) + \partial g(z). \]

Proof By (5.2) and (5.3),
\[ \lim_{\|w\| \to \infty} G^{(L)}_{u,\xi}(w) = \infty. \]
This implies that the function $G^{(L)}_{u,\xi}$ has a minimizer. Clearly, $z$ is a minimizer of $G^{(L)}_{u,\xi}$ if and only if
\[ 0 \in \partial G^{(L)}_{u,\xi}(z) = \xi + L(z - u) + \partial g(z). \]
Lemma 5.2 is proved.

Lemma 5.3 ([92]) Let $M_0 \ge 1$, $L \ge 1$ satisfy
\[ |f(w_1) - f(w_2)| \le L\|w_1 - w_2\| \text{ for all } w_1, w_2 \in B_X(0, M_0 + 2) \tag{5.16} \]
and $M_1 \ge M_0$ satisfy
\[ |f(w)|, F(w) \le M_1 \text{ for all } w \in B_X(0, M_0 + 2). \tag{5.17} \]
Assume that
\[ u \in B_X(0, M_0 + 1), \tag{5.18} \]
$\xi \in X$ satisfies
\[ \|\xi - f'(u)\| \le 1 \tag{5.19} \]
and that $v \in X$ satisfies
\[ B_X(v, 1) \cap \{z \in X : G^{(L)}_{u,\xi}(z) \le \inf(G^{(L)}_{u,\xi}) + 1\} \ne \emptyset. \tag{5.20} \]
Then
\[ \|v\| \le (8(M_1 + |c_*| + (L + 1)^2 + 1))^{1/2} + M_0 + 2. \tag{5.21} \]
Proof In view of (5.20), there exists
\[ \hat{v} \in B_X(v, 1) \tag{5.22} \]
such that
\[ G^{(L)}_{u,\xi}(\hat{v}) \le \inf(G^{(L)}_{u,\xi}) + 1. \tag{5.23} \]
By (5.1), (5.3), (5.17), (5.18), and (5.23),
\[ f(u) + \langle \xi, \hat{v} - u \rangle + 2^{-1}L\|\hat{v} - u\|^2 + g(\hat{v}) = G^{(L)}_{u,\xi}(\hat{v}) \le G^{(L)}_{u,\xi}(u) + 1 = F(u) + 1 \le M_1 + 1. \tag{5.24} \]
It follows from (5.2), (5.17), (5.18), and (5.24) that
\[ \langle \xi, \hat{v} - u \rangle + 2^{-1}L\|\hat{v} - u\|^2 \le 2M_1 + 1 + |c_*|. \tag{5.25} \]
It is clear that
\[ 2L^{-1}|\langle \xi, \hat{v} - u \rangle| \le L^{-1}(4^{-1}\|\hat{v} - u\|^2 + 4\|\xi\|^2). \tag{5.26} \]
Since the function $f$ is Lipschitz on $B_X(0, M_0 + 2)$, relations (5.16), (5.18), and (5.19) imply that
\[ \|\xi\| \le \|f'(u)\| + 1 \le L + 1. \tag{5.27} \]
By (5.25)–(5.27),
\[ 2L^{-1}(2M_1 + 1 + |c_*|) \ge \|\hat{v} - u\|^2 - 2L^{-1}|\langle \xi, \hat{v} - u \rangle| \]
\[ \ge \|\hat{v} - u\|^2 - 4^{-1}\|\hat{v} - u\|^2 - 4\|\xi\|^2 \ge 2^{-1}\|\hat{v} - u\|^2 - 4(L + 1)^2. \]
This implies that
\[ \|\hat{v} - u\|^2 \le 4(M_1 + 1 + |c_*|) + 8(L + 1)^2 \]
and
\[ \|\hat{v} - u\| \le (4(M_1 + 1 + |c_*|) + 8(L + 1)^2)^{1/2}. \]
Together with (5.18) and (5.22) this implies that
\[ \|v\| \le \|\hat{v}\| + 1 \le \|\hat{v} - u\| + \|u\| + 1 \le (4(M_1 + 1 + |c_*|) + 8(L + 1)^2)^{1/2} + M_0 + 2. \]
Lemma 5.3 is proved.

Lemma 5.4 Let $\delta_f, \delta_G \in (0, 1]$, $M_0 \ge 1$, $L \ge 1$ satisfy
\[ |f(w_1) - f(w_2)| \le L\|w_1 - w_2\| \text{ for all } w_1, w_2 \in B_X(0, M_0 + 2), \tag{5.28} \]
\[ \|f'(w_1) - f'(w_2)\| \le L\|w_1 - w_2\| \text{ for all } w_1, w_2 \in X, \tag{5.29} \]
$M_1 \ge M_0$ satisfy
\[ |f(w)|, F(w) \le M_1 \text{ for all } w \in B_X(0, M_0 + 2), \tag{5.30} \]
\[ M_2 = (8(M_1 + |c_*| + (L + 1)^2 + 1))^{1/2} + M_0 + 2 \tag{5.31} \]
and let $L_0 \ge 1$ satisfy
\[ |g(w_1) - g(w_2)| \le L_0\|w_1 - w_2\| \text{ for all } w_1, w_2 \in B_X(0, M_2). \tag{5.32} \]
Assume that
\[ u \in B_X(0, M_0 + 1), \tag{5.33} \]
$\xi \in X$ satisfies
\[ \|\xi - f'(u)\| \le \delta_f \tag{5.34} \]
and that $v \in X$ satisfies
\[ B_X(v, \delta_G) \cap \operatorname{argmin}(G^{(L)}_{u,\xi}) \ne \emptyset. \tag{5.35} \]
Then for each $x \in B_X(0, M_0 + 1)$,
\[ F(x) - F(v) \ge 2^{-1}L\|v - x\|^2 - 2^{-1}L\|u - x\|^2 - \delta_f(M_2 + 3M_0 + 3) - \delta_G(1 + L_0 + L(M_0 + M_2 + 3)). \]

Proof By (5.35) there exists $\hat{v} \in X$ such that
\[ \hat{v} \in \operatorname{argmin}(G^{(L)}_{u,\xi}) \tag{5.36} \]
and
\[ \|v - \hat{v}\| \le \delta_G. \tag{5.37} \]
In view of the assumptions of the lemma, Lemma 5.3, (5.31), (5.35), and (5.36),
\[ \|v\|, \|\hat{v}\| \le M_2. \tag{5.38} \]
Let
\[ x \in B_X(0, M_0 + 1). \tag{5.39} \]
Clearly,
\[ F(x) = f(x) + g(x), \quad F(v) = f(v) + g(v). \tag{5.40} \]
Proposition 4.3 and (5.33) imply that
\[ g(v) + f(v) \le g(v) + f(u) + \langle f'(u), v - u \rangle + 2^{-1}L\|v - u\|^2. \tag{5.41} \]
By (5.3), (5.33), (5.34), (5.38), (5.40), and (5.41),
\[ F(x) - F(v) \ge [f(x) + g(x)] - [f(v) + g(v)] \]
\[ \ge f(x) + g(x) - [f(u) + \langle f'(u), v - u \rangle + 2^{-1}L\|v - u\|^2 + g(v)] \]
\[ = [f(x) + g(x)] - [f(u) + \langle \xi, v - u \rangle + 2^{-1}L\|v - u\|^2 + g(v)] + \langle \xi - f'(u), v - u \rangle \]
\[ \ge [f(x) + g(x)] - G^{(L)}_{u,\xi}(v) - \|\xi - f'(u)\|\|v - u\| \]
\[ \ge [f(x) + g(x)] - G^{(L)}_{u,\xi}(v) - \delta_f(M_2 + M_0 + 1). \tag{5.42} \]
It follows from (5.3) that
\[ G^{(L)}_{u,\xi}(v) - G^{(L)}_{u,\xi}(\hat{v}) = \langle \xi, v - u \rangle + 2^{-1}L\|v - u\|^2 + g(v) - [\langle \xi, \hat{v} - u \rangle + 2^{-1}L\|\hat{v} - u\|^2 + g(\hat{v})] \]
\[ = \langle \xi, v - \hat{v} \rangle + 2^{-1}L[\|v - u\|^2 - \|\hat{v} - u\|^2] + g(v) - g(\hat{v}). \tag{5.43} \]
Relations (5.28), (5.32), and (5.34) imply that
\[ \|\xi\| \le \|f'(u)\| + 1 \le L + 1. \tag{5.44} \]
In view of (5.37) and (5.44),
\[ |\langle \xi, v - \hat{v} \rangle| \le (L + 1)\delta_G. \tag{5.45} \]
By (5.33), (5.37), and (5.38),
\[ |\|v - u\|^2 - \|\hat{v} - u\|^2| \le |\|v - u\| - \|\hat{v} - u\||(\|v - u\| + \|\hat{v} - u\|) \]
\[ \le \|v - \hat{v}\|(2M_2 + 2M_0 + 2) \le \delta_G(2M_2 + 2M_0 + 2). \tag{5.46} \]
In view of (5.32), (5.37), and (5.38),
\[ |g(v) - g(\hat{v})| \le L_0\|v - \hat{v}\| \le L_0\delta_G. \tag{5.47} \]
It follows from (5.43) and (5.45)–(5.47) that
\[ |G^{(L)}_{u,\xi}(v) - G^{(L)}_{u,\xi}(\hat{v})| \le (L + 1)\delta_G + 2^{-1}L\delta_G(2M_0 + 2M_2 + 2) + L_0\delta_G \]
\[ = \delta_G(L + 1 + L_0 + L(M_0 + M_2 + 1)). \tag{5.48} \]
Relations (5.42) and (5.48) imply that
\[ F(x) - F(v) \ge f(x) + g(x) - G^{(L)}_{u,\xi}(v) - \delta_f(M_2 + M_0 + 1) \]
\[ \ge f(x) + g(x) - G^{(L)}_{u,\xi}(\hat{v}) - \delta_f(M_2 + M_0 + 1) - \delta_G(L_0 + L + 1 + L(M_2 + M_0 + 1)). \tag{5.49} \]
By the convexity of $f$, (5.33), (5.34), and (5.39),
\[ f(x) \ge f(u) + \langle f'(u), x - u \rangle \ge f(u) + \langle \xi, x - u \rangle - |\langle f'(u) - \xi, x - u \rangle| \]
\[ \ge f(u) + \langle \xi, x - u \rangle - \|f'(u) - \xi\|\|x - u\| \ge f(u) + \langle \xi, x - u \rangle - \delta_f(2M_0 + 2). \tag{5.50} \]
Lemma 5.2 and (5.36) imply that there exists
\[ l \in \partial g(\hat{v}) \tag{5.51} \]
such that
\[ \xi + L(\hat{v} - u) + l = 0. \tag{5.52} \]
In view of (5.51) and the convexity of $g$,
\[ g(x) \ge g(\hat{v}) + \langle l, x - \hat{v} \rangle. \tag{5.53} \]
It follows from (5.50) and (5.53) that
\[ f(x) + g(x) \ge f(u) + \langle \xi, x - u \rangle - \delta_f(2M_0 + 2) + g(\hat{v}) + \langle l, x - \hat{v} \rangle. \tag{5.54} \]
In view of (5.3),
\[ G^{(L)}_{u,\xi}(\hat{v}) = f(u) + \langle \xi, \hat{v} - u \rangle + 2^{-1}L\|\hat{v} - u\|^2 + g(\hat{v}). \tag{5.55} \]
By (5.52), (5.54), and (5.55),
\[ f(x) + g(x) - G^{(L)}_{u,\xi}(\hat{v}) \ge \langle \xi, x - \hat{v} \rangle + \langle l, x - \hat{v} \rangle - 2^{-1}L\|\hat{v} - u\|^2 - \delta_f(2M_0 + 2) \]
\[ = -\delta_f(2M_0 + 2) + \langle \xi + l, x - \hat{v} \rangle - 2^{-1}L\|\hat{v} - u\|^2 \]
\[ = -2^{-1}L\|\hat{v} - u\|^2 - L\langle \hat{v} - u, x - \hat{v} \rangle - \delta_f(2M_0 + 2) \]
\[ = 2^{-1}L\|\hat{v} - u\|^2 + L\langle \hat{v} - u, u - x \rangle - \delta_f(2M_0 + 2). \tag{5.56} \]
In view of (5.37)–(5.39),
\[ |\|\hat{v} - x\|^2 - \|v - x\|^2| \le |\|\hat{v} - x\| - \|v - x\||(\|\hat{v} - x\| + \|v - x\|) \]
\[ \le \|\hat{v} - v\|(2M_2 + 2M_0 + 2) \le \delta_G(2M_2 + 2M_0 + 2). \tag{5.57} \]
Lemma 2.1 implies that
\[ \langle \hat{v} - u, u - x \rangle = 2^{-1}[\|\hat{v} - x\|^2 - \|\hat{v} - u\|^2 - \|u - x\|^2]. \tag{5.58} \]
By (5.56)–(5.58),
\[ f(x) + g(x) - G^{(L)}_{u,\xi}(\hat{v}) \ge 2^{-1}L\|\hat{v} - u\|^2 + 2^{-1}L\|\hat{v} - x\|^2 - 2^{-1}L\|\hat{v} - u\|^2 - 2^{-1}L\|u - x\|^2 - \delta_f(2M_0 + 2) \]
\[ \ge 2^{-1}L\|\hat{v} - x\|^2 - 2^{-1}L\|u - x\|^2 - \delta_f(2M_0 + 2) \]
\[ \ge 2^{-1}L\|v - x\|^2 - 2^{-1}L\|u - x\|^2 - 2^{-1}L\delta_G(2M_2 + 2M_0 + 2) - \delta_f(2M_0 + 2). \tag{5.59} \]
It follows from (5.49) and (5.59) that
\[ F(x) - F(v) \ge 2^{-1}L\|v - x\|^2 - 2^{-1}L\|u - x\|^2 - \delta_G L(M_2 + M_0 + 1) - \delta_f(2M_0 + 2) \]
\[ - \delta_f(M_2 + M_0 + 1) - \delta_G(1 + L_0 + L(M_2 + M_0 + 2)). \]
Lemma 5.4 is proved.
5.3 Proof of Theorem 5.1

By (5.4), there exists
\[ z \in \operatorname{argmin}(F) \cap B_X(0, M). \tag{5.60} \]
In view of (5.13) and (5.60),
\[ \|x_0 - z\| \le 2M. \tag{5.61} \]
If $F(x_0) \le F(z) + \epsilon_0$, then the assertion of the theorem holds. Let
\[ F(x_0) > F(z) + \epsilon_0. \tag{5.62} \]
If $F(x_1) \le F(z) + \epsilon_0$, then in view of Lemma 5.3, $\|x_1\| \le M_2$ and the assertion of the theorem holds. Let
\[ F(x_1) > F(z) + \epsilon_0. \]
Assume that $T \ge 0$ is an integer and that for all integers $t = 0, \dots, T$,
\[ F(x_{t+1}) - F(z) > \epsilon_0. \tag{5.63} \]
We show that for all $t \in \{0, \dots, T\}$,
\[ \|x_t - z\| \le 2M \tag{5.64} \]
and
\[ F(z) - F(x_{t+1}) \ge 2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2 - \delta_f(M_2 + 9M + 3) - \delta_G(1 + L_0 + L(3M + M_2 + 3)). \tag{5.65} \]
In view of (5.61), (5.64) is true for $t = 0$. Assume that $t \in \{0, \dots, T\}$ and (5.64) holds. Relations (5.60) and (5.64) imply that
\[ \|x_t\| \le 3M. \tag{5.66} \]
Set
\[ M_0 = 3M. \tag{5.67} \]
By (5.14), (5.15), (5.60), (5.66), (5.67), and Lemma 5.4 applied with $x = z$, $u = x_t$, $\xi = \xi_t$, $v = x_{t+1}$, we have
\[ F(z) - F(x_{t+1}) \ge 2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2 - \delta_f(M_2 + 3M_0 + 3) - \delta_G(1 + L_0 + L(M_0 + M_2 + 3)). \tag{5.68} \]
It follows from (5.63), (5.67), and (5.68) that
\[ \epsilon_0 < F(x_{t+1}) - F(z) \le 2^{-1}L\|z - x_t\|^2 - 2^{-1}L\|z - x_{t+1}\|^2 + \delta_f(M_2 + 9M + 3) + \delta_G(1 + L_0 + L(M_2 + 3M + 3)). \tag{5.69} \]
In view of (5.11), (5.64), and (5.69),
\[ 2M \ge \|z - x_t\| \ge \|z - x_{t+1}\|. \]
Thus we have shown by induction that (5.65) holds for all $t = 0, \dots, T$ and that (5.64) holds for all $t = 0, \dots, T + 1$. By (5.63) and (5.65),
5.4 The First Extension of Theorem 5.1
(T + 1)0
z − xt .
(5.90)
Assume that (5.89) is true. Then in view of (5.86) and (5.89), $\|z - x_{t+1}\| \le 2M$. Assume that (5.90) holds. It follows from (5.76), (5.77), (5.83), (5.88), and (5.90) that
\[ F(x_{t+1}) \le F(z) + \delta_f(2M_2 + 9M + 3) + \delta_G(1 + L_0 + L(3M + M_2 + 3)) \le \inf(F) + 2. \]
By the inequality above, (5.70) and (5.84), $\|x_{t+1}\| \le M$ and $\|x_{t+1} - z\| \le 2M$. Therefore by induction we showed that for all integers $t \ge 0$, (5.86)–(5.88) hold. Let $T$ be a natural number. By (5.83), (5.84) and (5.88),
\[ \sum_{t=0}^{T} (T + 1)^{-1} F(x_{t+1}) - \inf(F) \le \delta_f(2M_2 + 9M + 3) + \delta_G(1 + L_0 + L(3M + M_2 + 3)) \]
\[ + 2^{-1}L(T + 1)^{-1} \sum_{t=0}^{T} (\|z - x_t\|^2 - \|z - x_{t+1}\|^2) \]
\[ \le \delta_f(2M_2 + 9M + 3) + \delta_G(1 + L_0 + L(3M + M_2 + 3)) + 2(T + 1)^{-1}LM^2. \]
This implies that
\[ F\Big(\sum_{t=1}^{T+1} (T + 1)^{-1} x_t\Big) - \inf(F), \quad \min\{F(x_t) : t = 1, \dots, T + 1\} - \inf(F) \]
\[ \le \delta_f(2M_2 + 9M + 3) + \delta_G(1 + L_0 + L(3M + M_2 + 3)) + 2(T + 1)^{-1}LM^2. \]
This completes the proof of Theorem 5.5. By Theorem 5.5, the best choice of $T$ is of the same order as $\max\{\delta_f, \delta_G\}^{-1}$.
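The remark on the choice of $T$ can be made explicit by a short calculation. In the display below, $\delta$, $K_f$ and $K_G$ are shorthand introduced here for illustration and do not appear in the original text; the calculation only rearranges the bound obtained at the end of the proof.

```latex
\[
\delta := \max\{\delta_f, \delta_G\}, \qquad
K_f := 2M_2 + 9M + 3, \qquad
K_G := 1 + L_0 + L(3M + M_2 + 3).
\]
\[
\delta_f K_f + \delta_G K_G + 2(T+1)^{-1} L M^2
\;\le\; \delta\,(K_f + K_G) + 2(T+1)^{-1} L M^2
\;\le\; \delta\,(K_f + K_G + 2LM^2)
\quad \text{whenever } T + 1 \ge \delta^{-1}.
\]
```

The term $\delta(K_f + K_G)$ does not depend on $T$, so taking $T$ much larger than $\delta^{-1}$ cannot improve the order of the estimate, while taking $T$ much smaller leaves the term $2(T+1)^{-1}LM^2$ dominant.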
5.5 The Second Extension of Theorem 5.1

Theorem 5.6 Let $\delta_f, \delta_G \in (0, 1)$, $M > 1$,
\[ \operatorname{argmin}(F) \cap B_X(0, M) \ne \emptyset, \tag{5.91} \]
$L \ge 1$ satisfy
\[ |f(w_1) - f(w_2)| \le L\|w_1 - w_2\| \text{ for all } w_1, w_2 \in B_X(0, 4M + 10) \tag{5.92} \]
and
\[ \|f'(w_1) - f'(w_2)\| \le L\|w_1 - w_2\| \text{ for all } w_1, w_2 \in X, \tag{5.93} \]
$M_1 \ge 4M + 8$ satisfy
\[ |f(w)|, F(w) \le M_1 \text{ for all } w \in B_X(0, 4M + 10), \tag{5.94} \]
\[ M_2 = (8(M_1 + |c_*| + 1 + (L + 1)^2))^{1/2} + 4M + 10, \tag{5.95} \]
$L_0 \ge 1$ satisfy
\[ |g(w_1) - g(w_2)| \le L_0\|w_1 - w_2\| \text{ for all } w_1, w_2 \in B_X(0, M_2), \tag{5.96} \]
\[ \delta_f, \delta_G < 4^{-1}((M_2 + 12M + 27)L + 1 + L_0)^{-1}, \tag{5.97} \]
\[ T = \lfloor 2^{-1}L \min\{\delta_f^{-1}, \delta_G^{-1}\}((M_2 + 12M + 27)L + 1 + L_0)^{-1} \rfloor. \tag{5.98} \]
Assume that $\{x_t\}_{t=0}^{\infty} \subset X$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
\[ \|x_0\| \le M \tag{5.99} \]
and that for each integer $t \ge 0$,
\[ \|\xi_t - f'(x_t)\| \le \delta_f \tag{5.100} \]
and
\[ B_X(x_{t+1}, \delta_G) \cap \operatorname{argmin}(G^{(L)}_{x_t,\xi_t}) \ne \emptyset. \tag{5.101} \]
Then for each integer $t \in [0, T]$,
\[ \|x_t\| \le 3M + 2 \]
and
\[ \sum_{t=0}^{T-1} T^{-1} F(x_{t+1}) - \inf(F), \quad F\Big(\sum_{t=0}^{T-1} T^{-1} x_{t+1}\Big) - \inf(F), \quad \min\{F(x_t) : t = 1, \dots, T\} - \inf(F) \]
\[ \le \max\{\delta_G, \delta_f\}(8M^2 + 1)((M_2 + 12M + 27)(L + 1) + L_0 + 1). \]

Proof In view of (5.91), there exists
\[ z \in \operatorname{argmin}(F) \cap B_X(0, M). \tag{5.102} \]
By (5.99) and (5.102),
\[ \|x_0 - z\| \le 2M. \tag{5.103} \]
Assume that $t \ge 0$ is an integer and that
\[ \|x_t - z\| \le 2M + 2. \tag{5.104} \]
Relations (5.102) and (5.104) imply that
\[ \|x_t\| \le 3M + 2. \tag{5.105} \]
Set
\[ M_0 = 4M + 8. \tag{5.106} \]
By (5.100), (5.102), (5.105), (5.106) and Lemma 5.4 applied with $x = z$, $u = x_t$, $\xi = \xi_t$, $v = x_{t+1}$, we have
\[ F(z) - F(x_{t+1}) \ge 2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2 - \delta_f(M_2 + 3M_0 + 3) - \delta_G(1 + L_0 + L(M_0 + M_2 + 3)) \]
\[ \ge 2^{-1}L\|z - x_{t+1}\|^2 - 2^{-1}L\|z - x_t\|^2 - \delta_f(M_2 + 12M + 27) - \delta_G(1 + L_0 + L(4M + M_2 + 11)). \tag{5.107} \]
By (5.102) and (5.107),
\[ \|z - x_{t+1}\|^2 - \|z - x_t\|^2 \le 2L^{-1}\delta_f(M_2 + 12M + 27) + 2L^{-1}\delta_G(1 + L_0 + L(4M + M_2 + 11)). \tag{5.108} \]
Therefore we have shown that the following property holds:
(a) for each integer $t \ge 0$ satisfying (5.104), relations (5.107) and (5.108) hold.

Let us show that for all integers $t = 0, \dots, T$,
\[ \|z - x_t\|^2 \le \|z - x_0\|^2 + t[2L^{-1}\delta_f(M_2 + 12M + 27) + 2L^{-1}\delta_G(1 + L_0 + L(4M + M_2 + 11))] \]
\[ \le \|z - x_0\|^2 + T[2L^{-1}\delta_f(M_2 + 12M + 27) + 2L^{-1}\delta_G(1 + L_0 + L(4M + M_2 + 11))] \le 4M^2 + 8M + 4. \tag{5.109} \]
First, note that in view of (5.98),
\[ T[2L^{-1}\delta_f(M_2 + 12M + 27) + 2L^{-1}\delta_G(1 + L_0 + L(4M + M_2 + 11))] \]
\[ = 2L^{-1}T\delta_f(M_2 + 12M + 27) + 2L^{-1}T\delta_G(1 + L_0 + L(4M + M_2 + 11)) \le 2. \tag{5.110} \]
By (5.103) and (5.110),
\[ \|z - x_0\|^2 + T[2L^{-1}\delta_f(M_2 + 12M + 27) + 2L^{-1}\delta_G(1 + L_0 + L(4M + M_2 + 11))] \le 4M^2 + 2 \le (2M + 2)^2. \tag{5.111} \]
Assume that $t \in \{0, \dots, T\} \setminus \{T\}$ and that (5.109) holds. Then
\[ \|x_t - z\| \le 2M + 2 \]
and by property (a), relations (5.107) and (5.108) hold. By (5.108), (5.109), and (5.111),
\[ \|z - x_{t+1}\|^2 \le \|z - x_t\|^2 + 2L^{-1}\delta_f(M_2 + 12M + 27) + 2L^{-1}\delta_G(1 + L_0 + L(4M + M_2 + 11)) \]
\[ \le \|z - x_0\|^2 + (t + 1)[2L^{-1}\delta_f(M_2 + 12M + 27) + 2L^{-1}\delta_G(1 + L_0 + L(4M + M_2 + 11))] \]
\[ \le \|z - x_0\|^2 + T[2L^{-1}\delta_f(M_2 + 12M + 27) + 2L^{-1}\delta_G(1 + L_0 + L(4M + M_2 + 11))] \le (2M + 2)^2 \]
and
\[ \|z - x_{t+1}\| \le 2M + 2. \]
Thus we have shown by induction that for all $t = 0, \dots, T$,
\[ \|z - x_t\| \le 2M + 2 \]
and
\[ \|x_t\| \le 3M + 2. \]
Property (a) implies that for all $t = 0, \dots, T$, (5.107) holds. It follows from (5.97), (5.98), (5.102), (5.103), (5.106), and (5.107), which holds for all $t = 0, \dots, T$, that
\[ \sum_{t=0}^{T-1} T^{-1} F(x_{t+1}) - \inf(F) \le 2^{-1}T^{-1}L\|z - x_0\|^2 + \delta_f(M_2 + 12M + 27) + \delta_G(1 + L_0 + L(4M + M_2 + 11)) \]
\[ \le 2LM^2(T/2)^{-1} + \delta_f(M_2 + 12M + 27) + \delta_G(1 + L_0 + L(4M + M_2 + 11)) \]
\[ \le \max\{\delta_G, \delta_f\}(8((M_2 + 12M + 27)L + 1 + L_0)M^2 + M_2 + 12M + 27 + (1 + L_0 + L(4M + M_2 + 8))). \]
Theorem 5.6 is proved.
Chapter 6
Continuous Subgradient Method
In this chapter we study the continuous subgradient algorithm for minimization of convex nonsmooth functions and for computing the saddle points of convex– concave functions, under the presence of computational errors. The problem is described by an objective function and a set of feasible points. For this algorithm we need a calculation of a subgradient of the objective function and a calculation of a projection on the feasible set. In each of these two calculations there is a computational error produced by our computer system. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two calculations of our algorithm, we find out what approximate solution can be obtained and how much time one needs for this.
6.1 Bochner Integrable Functions

Let $(Y, \|\cdot\|)$ be a Banach space and $-\infty < a < b < \infty$. A function $x : [a, b] \to Y$ is strongly measurable on $[a, b]$ if there exists a sequence of functions $x_n : [a, b] \to Y$, $n = 1, 2, \dots$, such that for any integer $n \ge 1$ the set $x_n([a, b])$ is countable and the set $\{t \in [a, b] : x_n(t) = y\}$ is Lebesgue measurable for any $y \in Y$, and
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_6
$x_n(t) \to x(t)$ as $n \to \infty$ in $(Y, \|\cdot\|)$ for almost every (a. e.) $t \in [a, b]$. Denote by $\operatorname{mes}(E)$ the Lebesgue measure of a Lebesgue measurable set $E \subset R^1$. The function $x : [a, b] \to Y$ is Bochner integrable if it is strongly measurable and there exists a finite $\int_a^b \|x(t)\|\,dt$. If $x : [a, b] \to Y$ is a Bochner integrable function, then for almost every (a. e.) $t \in [a, b]$,
\[ \lim_{\Delta t \to 0} (\Delta t)^{-1} \int_t^{t + \Delta t} \|x(\tau) - x(t)\|\,d\tau = 0 \]
and the function
\[ y(t) = \int_a^t x(s)\,ds, \quad t \in [a, b] \]
is continuous and a. e. differentiable on $[a, b]$. Let $-\infty < \tau_1 < \tau_2 < \infty$. Denote by $W^{1,1}(\tau_1, \tau_2; Y)$ the set of all functions $x : [\tau_1, \tau_2] \to Y$ for which there exists a Bochner integrable function $u : [\tau_1, \tau_2] \to Y$ such that
\[ x(t) = x(\tau_1) + \int_{\tau_1}^{t} u(s)\,ds, \quad t \in (\tau_1, \tau_2] \]
(see, e.g., [10, 21]). It is known that if $x \in W^{1,1}(\tau_1, \tau_2; Y)$, then this equation defines a unique Bochner integrable function $u$ which is called the derivative of $x$ and is denoted by $x'$.
6.2 Convergence Analysis for Continuous Subgradient Method

The study of continuous subgradient algorithms is an important topic in optimization theory. See, for example, [6, 9, 20, 21] and the references mentioned therein. In this chapter we analyze its convergence under the presence of computational errors. We suppose that $X$ is a Hilbert space equipped with an inner product denoted by $\langle \cdot, \cdot \rangle$ which induces a complete norm $\|\cdot\|$. Suppose that $f : X \to R^1 \cup \{\infty\}$ is a convex, lower semicontinuous function, bounded from below, such that
\[ \operatorname{dom}(f) := \{x \in X : f(x) < \infty\} \ne \emptyset \]
and that $f$ possesses a minimizer. Set
\[ \inf(f) = \inf\{f(x) : x \in X\} \]
and
\[ \operatorname{argmin}(f) = \{x \in X : f(x) = \inf(f)\}. \]
For each set $D \subset X$ put
\[ \inf(f, D) = \inf\{f(z) : z \in D\}, \quad \sup(f, D) = \sup\{f(z) : z \in D\}. \]
In Sect. 6.4 we will prove the following result which has no prototype in [92].

Theorem 6.1 Let $\delta \in (0, 1]$, $0 < \mu_1 < \mu_2$, $M \ge 1$,
\[ z \in B_X(0, M) \cap \operatorname{argmin}(f), \tag{6.1} \]
and let $\mu : [0, \infty) \to R^1$ be a Lebesgue measurable function such that
\[ \mu_1 \le \mu(t) \le \mu_2 \text{ for all } t \in [0, \infty). \tag{6.2} \]
Assume that
\[ T_0 \in [(2\delta)^{-1}, 5(4\delta)^{-1}], \tag{6.3} \]
$x \in W^{1,1}(0, T_0; X)$,
\[ x(0) \in \operatorname{dom}(f) \cap B_X(0, M) \tag{6.4} \]
and that for almost every $t \in [0, T_0]$,
\[ x(t) \in \operatorname{dom}(f) \tag{6.5} \]
and
\[ B_X(x'(t), \delta) \cap (-\mu(t)\partial f(x(t))) \ne \emptyset. \tag{6.6} \]
Then the following two assertions hold.

1.
\[ T_0^{-1} \int_0^{T_0} f(x(t))\,dt - \inf(f), \quad f\Big(T_0^{-1} \int_0^{T_0} x(t)\,dt\Big) - \inf(f), \quad \min\{f(x(t)) : t \in [0, T_0]\} - \inf(f) \le (2M + 2)^2 \mu_1^{-1} \delta, \]
\[ \|z - x(t)\| \le 2M + 2, \quad t \in [0, T_0], \]
and for almost every $t \in [0, T_0]$,
\[ 2\langle x(t) - z, x'(t) \rangle \le 2\mu_1(f(z) - f(x(t))) + 2\delta(2M + 2). \]

2. Assume that
\[ M \ge 4, \quad \delta \le \min\{\mu_1(2M + 2)^{-1}, 16^{-1}\} \tag{6.7} \]
and that
\[ \{x \in X : f(x) \le \inf(f) + 4\} \subset B_X(0, 4^{-1}M). \]
Then $\|x(T_0)\| \le M$.

Theorem 6.1 easily implies the following result.

Theorem 6.2 Let $\delta \in (0, 16^{-1}]$, $0 < \mu_1 < 1 \le \mu_2$, $M \ge 1$, $\mu : [0, \infty) \to R^1$ be a Lebesgue measurable function such that
\[ \mu_1 \le \mu(t) \le \mu_2 \text{ for all } t \in [0, \infty), \]
\[ \delta \le \mu_1(2M + 2)^{-1}, \quad \{x \in X : f(x) \le \inf(f) + 4\} \subset B_X(0, 4^{-1}M), \]
$T \ge 5(4\delta)^{-1}$, $x \in W^{1,1}(0, T; X)$,
\[ x(0) \in \operatorname{dom}(f) \cap B_X(0, M) \]
and that for almost every $t \in [0, T]$,
\[ x(t) \in \operatorname{dom}(f) \]
and
\[ B_X(x'(t), \delta) \cap (-\mu(t)\partial f(x(t))) \ne \emptyset. \]
Assume that $k \ge 1$ is an integer,
\[ (2\delta)^{-1}k < T \le (2\delta)^{-1}(k + 1), \quad T_i = i(2\delta)^{-1}, \ i = 0, \dots, k - 1, \quad T_k = T. \]
Then
\[ \|x(t)\| \le 3M + 2 \text{ for all } t \in [0, T], \quad \|x(T_i)\| \le M, \ i = 0, \dots, k \]
and for all integers $i = 0, \dots, k - 1$,
\[ f\Big((T_{i+1} - T_i)^{-1} \int_{T_i}^{T_{i+1}} x(t)\,dt\Big) - \inf(f), \quad \min\{f(x(t)) : t \in [T_i, T_{i+1}]\} - \inf(f) \]
\[ \le (T_{i+1} - T_i)^{-1} \int_{T_i}^{T_{i+1}} f(x(t))\,dt - \inf(f) \le (2M + 2)^2 \delta \mu_1^{-1}. \]
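A trajectory satisfying a perturbed inclusion of the form (6.6) can be imitated numerically. The sketch below is only a heuristic illustration and is not covered by Theorems 6.1 and 6.2, which concern continuous-time trajectories: it assumes $X = \mathbb{R}^n$, a constant $\mu$, and replaces the differential inclusion by an explicit Euler scheme with step `h`, in which a random vector of norm $\delta$ plays the role of the computational error. The function name `perturbed_subgradient_flow` is introduced here for illustration.

```python
import numpy as np


def perturbed_subgradient_flow(subgrad, x0, mu, delta, T0, h, seed=0):
    """Explicit Euler steps for x'(t) = -mu * xi(t) + e(t), with xi(t) a
    subgradient at x(t) and ||e(t)|| = delta. Returns an array whose rows
    are the states at times 0, h, 2h, ..., T0."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n_steps = int(round(T0 / h))
    traj = np.empty((n_steps + 1, x.size))
    traj[0] = x
    for k in range(n_steps):
        d = rng.standard_normal(x.size)
        e = delta * d / np.linalg.norm(d)        # stands in for the error in (6.6)
        x = x + h * (-mu * subgrad(x) + e)
        traj[k + 1] = x
    return traj


# Hypothetical example: f(x) = ||x||_1 on R^2, for which sign(x) is a subgradient.
delta, mu, M, T0 = 0.01, 1.0, 1.0, 100.0         # T0 lies in [1/(2 delta), 5/(4 delta)]
traj = perturbed_subgradient_flow(np.sign, [0.5, -0.5], mu, delta, T0, h=0.01)
avg_gap = np.mean(np.sum(np.abs(traj[:-1]), axis=1))   # Riemann sum of f(x(t)) - inf(f)
reference = (2 * M + 2) ** 2 * delta / mu               # right-hand side in Theorem 6.1
```

In this example $z = 0$ is the minimizer, and the time average `avg_gap` can be compared with the value `reference` from assertion 1 of Theorem 6.1. The comparison is indicative only, since the Euler discretization introduces an additional error of order `h`.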
6.3 An Auxiliary Result

Let $V \subset X$ be an open convex set and $g : V \to R^1$ be a convex locally Lipschitzian function. Let $T > 0$, $x_0 \in X$ and let $u : [0, T] \to X$ be a Bochner integrable function. Set
\[ x(t) = x_0 + \int_0^t u(s)\,ds, \quad t \in [0, T]. \]
Then $x : [0, T] \to X$ is differentiable and $x'(t) = u(t)$ for almost every $t \in [0, T]$. Assume that $x(t) \in V$ for all $t \in [0, T]$. We claim that the restriction of $g$ to the set $\{x(t) : t \in [0, T]\}$ is Lipschitzian. Indeed, since the set $\{x(t) : t \in [0, T]\}$ is compact, the closure of its convex hull $C$ is both compact and convex, and so the restriction of $g$ to $C$ is Lipschitzian. Hence the function $(g \cdot x)(t) := g(x(t))$, $t \in [0, T]$, is absolutely continuous. It follows that for almost every $t \in [0, T]$, both the derivatives $x'(t)$ and $(g \cdot x)'(t)$ exist:
\[ x'(t) = \lim_{h \to 0} h^{-1}[x(t + h) - x(t)], \tag{6.8} \]
\[ (g \cdot x)'(t) = \lim_{h \to 0} h^{-1}[g(x(t + h)) - g(x(t))]. \tag{6.9} \]
We continue with the following fact (see Proposition 14.2 of [92]).

Proposition 6.3 Assume that $t \in [0, T]$ and that both the derivatives $x'(t)$ and $(g \cdot x)'(t)$ exist. Then
\[ (g \cdot x)'(t) = \lim_{h \to 0} h^{-1}[g(x(t) + hx'(t)) - g(x(t))]. \tag{6.10} \]

Corollary 6.4 Let $z \in X$ and $g(y) = \|z - y\|^2$ for all $y \in X$. Then for almost every $t \in [0, T]$, the derivative $(g \cdot x)'(t)$ exists and
\[ (g \cdot x)'(t) = 2\langle x'(t), x(t) - z \rangle. \]
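The formula of Corollary 6.4 can be checked numerically on a smooth curve. The sketch below assumes $X = \mathbb{R}^3$ and uses the hypothetical curve $x(t) = (\cos t, \sin t, t)$; it compares the closed-form derivative $2\langle x'(t), x(t) - z\rangle$ with a central finite difference of $t \mapsto \|z - x(t)\|^2$. All names are introduced here for illustration.

```python
import numpy as np


def curve(t):
    """Hypothetical smooth curve x(t) in R^3."""
    return np.array([np.cos(t), np.sin(t), t])


def curve_derivative(t):
    """Exact derivative x'(t) of the curve above."""
    return np.array([-np.sin(t), np.cos(t), 1.0])


def squared_distance_rate(z, t):
    """Right-hand side of Corollary 6.4: 2 <x'(t), x(t) - z>."""
    return 2.0 * np.dot(curve_derivative(t), curve(t) - z)


def central_difference(z, t, h=1e-6):
    """Finite-difference approximation of d/dt ||z - x(t)||^2."""
    phi = lambda s: np.sum((z - curve(s)) ** 2)
    return (phi(t + h) - phi(t - h)) / (2.0 * h)
```

For a fixed point such as `z = np.array([1.0, 2.0, 3.0])`, the two functions agree up to the finite-difference error at every `t`.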
6.4 Proof of Theorem 6.1

Set
\[ \phi(t) = \|z - x(t)\|^2, \quad t \in [0, T_0]. \tag{6.11} \]
In view of Corollary 6.4, for a. e. $t \in [0, T_0]$, there exist derivatives $x'(t)$, $\phi'(t)$, and
\[ \phi'(t) = 2\langle x'(t), x(t) - z \rangle. \tag{6.12} \]
By (6.6), for a. e. $t \in [0, T_0]$, there exists
\[ \xi(t) \in \partial f(x(t)) \tag{6.13} \]
such that
\[ \|x'(t) + \mu(t)\xi(t)\| \le \delta. \tag{6.14} \]
It follows from (6.12) that for almost every $t \in [0, T_0]$,
\[ \phi'(t) = 2\langle x(t) - z, x'(t) \rangle = 2\langle x(t) - z, -\mu(t)\xi(t) \rangle + 2\langle x(t) - z, x'(t) + \mu(t)\xi(t) \rangle. \tag{6.15} \]
In view of (6.14), for almost every $t \in [0, T_0]$,
\[ |\langle z - x(t), x'(t) + \mu(t)\xi(t) \rangle| \le \delta\|z - x(t)\|. \tag{6.16} \]
By (6.1) and (6.13), for almost every $t \in [0, T_0]$,
\[ \langle z - x(t), \xi(t) \rangle \le -f(x(t)) + f(z). \tag{6.17} \]
It follows from (6.1), (6.2), and (6.15)–(6.17) that for almost every $t \in [0, T_0]$,
\[ \phi'(t) \le 2\mu(t)(-f(x(t)) + f(z)) + 2\delta\|z - x(t)\| \le 2\mu_1(-f(x(t)) + f(z)) + 2\delta\|z - x(t)\|. \tag{6.18} \]
We show that for all $t \in [0, T_0]$,
\[ \|z - x(t)\| \le 2M + 2. \tag{6.19} \]
Assume the contrary. Then there exists $\tau \in (0, T_0]$ such that
\[ \|z - x(t)\| < 2M + 2, \quad t \in [0, \tau), \tag{6.20} \]
\[ \|z - x(\tau)\| = 2M + 2. \tag{6.21} \]
By (6.1), (6.4), (6.11), (6.18), (6.20), and (6.21),
\[ 8M + 4 = (2M + 2)^2 - 4M^2 \le \|z - x(\tau)\|^2 - \|z - x(0)\|^2 = \phi(\tau) - \phi(0) = \int_0^{\tau} \phi'(t)\,dt \le \int_0^{\tau} 2\delta\|z - x(t)\|\,dt \le 2\tau\delta(2M + 2). \tag{6.22} \]
In view of (6.3) and (6.22),
\[ 5(4\delta)^{-1} \ge T_0 \ge \tau \ge (8M + 4)(4M + 4)^{-1}\delta^{-1} \ge 3(2\delta)^{-1}. \]
The contradiction we have reached proves that
\[ \|z - x(t)\| \le 2M + 2, \quad t \in [0, T_0]. \tag{6.23} \]
By (6.18) and (6.23), for almost every $t \in [0, T_0]$,
\[ \phi'(t) \le 2\mu_1(f(z) - f(x(t))) + 2\delta(2M + 2). \tag{6.24} \]
It follows from (6.1), (6.4), (6.11), (6.12), and (6.24) that
\[ 4M^2 \ge \phi(0) - \phi(T_0) = -\int_0^{T_0} \phi'(t)\,dt \ge 2\mu_1 \int_0^{T_0} (f(x(t)) - f(z))\,dt - 2\delta T_0(2M + 2). \tag{6.25} \]
By (6.3) and (6.25),
\[ T_0^{-1} \int_0^{T_0} (f(x(t)) - f(z))\,dt \le (2\mu_1)^{-1}[4M^2 T_0^{-1} + 2\delta(2M + 2)] \le 2\delta(2\mu_1)^{-1}(4M^2 + 4M + 4) \le (2M + 2)^2 \mu_1^{-1} \delta. \]
Therefore assertion 1 is proved. Let us complete the proof of assertion 2. By (6.7) and the inequality above,
\[ T_0^{-1} \int_0^{T_0} (f(x(t)) - f(z))\,dt \le 1. \]
This implies that
\[ 4T_0^{-1} \operatorname{mes}\{t \in [0, T_0] : f(x(t)) > f(z) + 4\} < 1, \]
\[ \operatorname{mes}\{t \in [0, T_0] : f(x(t)) > f(z) + 4\} < 4^{-1}T_0. \tag{6.26} \]
This implies that there exists
\[ t_0 \in [3T_0/4, T_0] \tag{6.27} \]
such that
\[ f(x(t_0)) \le f(z) + 4. \tag{6.28} \]
By (6.1), (6.28), and the assumptions of the assertion,
\[ \|x(t_0)\| \le M/4, \quad \|z\| \le M/4. \tag{6.29} \]
It follows from (6.1), (6.3), (6.11), (6.12), (6.18), (6.23), and (6.27) that
\[ \|z - x(T_0)\|^2 - \|z - x(t_0)\|^2 = \int_{t_0}^{T_0} \phi'(t)\,dt \le 2\delta(T_0 - t_0)(2M + 2) \le (3/2)T_0\delta(2M + 2) \le 4M + 4. \]
In view of the relation above, (6.1) and (6.29),
\[ \|z - x(T_0)\|^2 \le M^2/4 + 4M + 4 \le (M/2 + 4)^2 \]
and
\[ \|x(T_0)\| \le \|z\| + M/2 + 4 \le 3M/4 + 4 \le M. \]
Assertion 2 is proved. This completes the proof of Theorem 6.1.
6.5 Continuous Subgradient Method for Zero-Sum Games

In this section we study the continuous subgradient method for zero-sum games with two players. This topic was not considered in [92]. Let $(X, \langle \cdot, \cdot \rangle)$, $(Y, \langle \cdot, \cdot \rangle)$ be Hilbert spaces equipped with the complete norms $\|\cdot\|$ which are induced by their inner products. Let a Borel function $f : X \times Y \to R^1 \cup \{-\infty\}$ possess the following property:
\[ \{(x, y) \in X \times Y : f(x, y) > -\infty\} = X \times C, \]
where $C$ is a nonempty bounded convex subset of $Y$.
Recall that for for each concave function g : Y → R 1 ∪ {−∞} and each x ∈ Y satisfying g(x) > −∞, ∂g(x) = {l ∈ Y : l, y − x ≥ g(y) − g(x) for all y ∈ Y }. Clearly, for each x ∈ Y satisfying g(x) > −∞, ∂g(x) = −(∂(−g)(x)). Suppose that the following properties hold: (i) for each y ∈ C, the function f (·, y) : X → R 1 is convex and lower semicontinuous; (ii) for each x ∈ X, the function f (x, ·) : Y → R 1 ∪ {−∞} is concave and upper semicontinuous. Let x∗ ∈ X and y∗ ∈ C
(6.30)
f (x∗ , y) ≤ f (x∗ , y∗ ) ≤ f (x, y∗ )
(6.31)
satisfy
for each x ∈ X and each y ∈ C. Let M > 0, x∗ < M/8,
(6.32)
C ⊂ BY (0, M),
(6.33)
f is bounded from above on BX (0, M/3) × C and let for every y ∈ C, {x ∈ X : f (x, y) ≤ f (x∗ , y∗ ) + 4} ⊂ BX (0, M/3).
(6.34)
In Sect. 6.7 we prove the following result.
Theorem 6.5 Let
$$0 < \mu_1 < \mu_2, \tag{6.35}$$
$$\mu \in [\mu_1, \mu_2], \tag{6.36}$$
$$\delta_{f,1}, \delta_{f,2} \in (0,1), \quad \delta_{f,1} < \mu_1/M. \tag{6.37}$$
Assume that $T > 0$, $x \in W^{1,1}(0,T;X)$, $y \in W^{1,1}(0,T;Y)$,
$$y(0) \in C, \tag{6.38}$$
for almost every $t \in [0,T]$,
$$y(t) \in C, \tag{6.39}$$
$$B_Y(y'(t), \delta_{f,2}) \cap (\mu\,\partial_y f(x(t),y(t))) \ne \emptyset, \tag{6.40}$$
$$B_X(x'(t), \delta_{f,1}) \cap (-\mu\,\partial_x f(x(t),y(t))) \ne \emptyset \tag{6.41}$$
and that
$$\|x(0)\| \le M/8. \tag{6.42}$$
Then
$$\|x(t)\| \le M \text{ for all } t \in [0,T], \tag{6.43}$$
$$\Big|T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*)\Big| \le 2M^2T^{-1}\mu_1^{-1} + 2M\mu_1^{-1}\max\{\delta_{f,1},\delta_{f,2}\}, \tag{6.44}$$
$$\Big|f\Big(T^{-1}\int_0^T x(t)\,dt,\ T^{-1}\int_0^T y(t)\,dt\Big) - T^{-1}\int_0^T f(x(t),y(t))\,dt\Big| \le 2M^2T^{-1}\mu_1^{-1} + 2M\mu_1^{-1}\max\{\delta_{f,1},\delta_{f,2}\}, \tag{6.45}$$
for each $v \in C$,
$$f\Big(T^{-1}\int_0^T x(t)\,dt,\ v\Big) \le f\Big(T^{-1}\int_0^T x(t)\,dt,\ T^{-1}\int_0^T y(t)\,dt\Big) + 4M^2T^{-1}\mu_1^{-1} + 4M\mu_1^{-1}\max\{\delta_{f,1},\delta_{f,2}\}, \tag{6.46}$$
and for each $z \in X$,
$$f\Big(z,\ T^{-1}\int_0^T y(t)\,dt\Big) \ge f\Big(T^{-1}\int_0^T x(t)\,dt,\ T^{-1}\int_0^T y(t)\,dt\Big) - 4M^2T^{-1}\mu_1^{-1} - 3M\mu_1^{-1}\max\{\delta_{f,1},\delta_{f,2}\}. \tag{6.47}$$
Clearly, the best choice of $T$ should be of the same order as $\max\{\delta_{f,1},\delta_{f,2}\}^{-1}$.
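The trade-off behind this remark can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not part of the original text: the function names `averaged_game_error_bound` and `suggested_horizon` are hypothetical, and the code only evaluates the right-hand side of (6.44) and (6.45) for given constants.

```python
def averaged_game_error_bound(M, T, mu1, delta_f1, delta_f2):
    """Right-hand side of (6.44)-(6.45): 2*M^2/(T*mu1) + 2*M*max(delta)/mu1."""
    if M <= 0 or T <= 0 or mu1 <= 0:
        raise ValueError("M, T and mu1 must be positive")
    # Assumption (6.37): both errors lie in (0, 1) and delta_f1 < mu1 / M.
    if not (0 < delta_f1 < 1 and 0 < delta_f2 < 1 and delta_f1 < mu1 / M):
        raise ValueError("computational errors violate (6.37)")
    delta = max(delta_f1, delta_f2)
    horizon_term = 2.0 * M**2 / (T * mu1)  # decays like 1/T
    error_term = 2.0 * M * delta / mu1     # does not depend on T
    return horizon_term + error_term


def suggested_horizon(delta_f1, delta_f2):
    """Hypothetical helper: a horizon T of the order 1/max(delta)."""
    return 1.0 / max(delta_f1, delta_f2)
```

With $T$ of order $\max\{\delta_{f,1},\delta_{f,2}\}^{-1}$ both terms are proportional to the computational error, so a longer horizon no longer improves the order of the estimate.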
6.6 An Auxiliary Result
Proposition 6.6 Let
$$0 < \mu_1 < \mu_2, \tag{6.48}$$
$$\mu \in [\mu_1, \mu_2], \tag{6.49}$$
$$\delta_{f,1}, \delta_{f,2} \in (0,1), \quad \delta_{f,1} < \mu_1/M. \tag{6.50}$$
Assume that $T > 0$, $x \in W^{1,1}(0,T;X)$, $y \in W^{1,1}(0,T;Y)$,
$$y(0) \in C, \tag{6.51}$$
and that for almost every $t \in [0,T]$,
$$y(t) \in C, \tag{6.52}$$
$$B_Y(y'(t), \delta_{f,2}) \cap (\mu\,\partial_y f(x(t),y(t))) \ne \emptyset, \tag{6.53}$$
$$B_X(x'(t), \delta_{f,1}) \cap (-\mu\,\partial_x f(x(t),y(t))) \ne \emptyset. \tag{6.54}$$
Then the following assertions hold.
1. Let $z \in X$ and $v \in C$. Then for almost every $t \in [0,T]$,
$$(d/dt)(\|x(t) - z\|^2) \le 2\delta_{f,1}\|z - x(t)\| + 2\mu(f(z,y(t)) - f(x(t),y(t))), \tag{6.55}$$
$$(d/dt)(\|y(t) - v\|^2) \le 2\delta_{f,2}\|v - y(t)\| + 2\mu(-f(x(t),v) + f(x(t),y(t))). \tag{6.56}$$
2. Let
$$\|x(0)\| \le M/8. \tag{6.57}$$
Then for each $t \in [0,T]$,
$$\|x(t)\| \le M \tag{6.58}$$
and for each $z \in X$, each $v \in C$ and almost every $t \in [0,T]$,
$$(d/dt)(\|x(t) - z\|^2) \le 2\delta_{f,1}(\|z\| + M) + 2\mu(f(z,y(t)) - f(x(t),y(t))), \tag{6.59}$$
$$(d/dt)(\|y(t) - v\|^2) \le 4\delta_{f,2}M + 2\mu(-f(x(t),v) + f(x(t),y(t))). \tag{6.60}$$
Proof Let $z \in X$ and $v \in C$. For all $t \in [0,T]$ set
$$\phi_1(t) = \|z - x(t)\|^2, \quad \phi_2(t) = \|v - y(t)\|^2. \tag{6.61}$$
By (6.53) and (6.54), for almost every $t \in [0,T]$, there exist
$$\xi(t) \in \partial_x f(x(t),y(t)) \tag{6.62}$$
and
$$\eta(t) \in \partial_y f(x(t),y(t)) \tag{6.63}$$
such that
$$\|x'(t) + \mu\xi(t)\| \le \delta_{f,1} \tag{6.64}$$
and
$$\|y'(t) - \mu\eta(t)\| \le \delta_{f,2}. \tag{6.65}$$
Corollary 6.4 and (6.61) imply that for almost every $t \in [0,T]$,
$$\phi_1'(t) = 2\langle x'(t), x(t) - z\rangle, \tag{6.66}$$
$$\phi_2'(t) = 2\langle y'(t), y(t) - v\rangle. \tag{6.67}$$
In view of (6.66) and (6.67), for almost every $t \in [0,T]$,
$$\phi_1'(t) = 2\langle x'(t), x(t) - z\rangle = 2\langle x(t) - z, -\mu\xi(t)\rangle + 2\langle x(t) - z, x'(t) + \mu\xi(t)\rangle \tag{6.68}$$
and
$$\phi_2'(t) = 2\langle y(t) - v, y'(t)\rangle = 2\langle y(t) - v, \mu\eta(t)\rangle + 2\langle y(t) - v, y'(t) - \mu\eta(t)\rangle. \tag{6.69}$$
By (6.64) and (6.65), for almost every $t \in [0,T]$,
$$|\langle x(t) - z, x'(t) + \mu\xi(t)\rangle| \le \delta_{f,1}\|z - x(t)\| \tag{6.70}$$
and
$$|\langle y(t) - v, y'(t) - \mu\eta(t)\rangle| \le \delta_{f,2}\|v - y(t)\|. \tag{6.71}$$
It follows from (6.62) and (6.63) that for almost every $t \in [0,T]$,
$$\langle z - x(t), \xi(t)\rangle \le f(z,y(t)) - f(x(t),y(t)) \tag{6.72}$$
and
$$\langle v - y(t), \eta(t)\rangle \ge f(x(t),v) - f(x(t),y(t)). \tag{6.73}$$
By (6.61) and (6.68)–(6.73), for almost every $t \in [0,T]$,
$$(d/dt)(\|x(t) - z\|^2) \le 2\delta_{f,1}\|z - x(t)\| + 2\mu(f(z,y(t)) - f(x(t),y(t))) \tag{6.74}$$
and
$$(d/dt)(\|y(t) - v\|^2) \le 2\delta_{f,2}\|v - y(t)\| + 2\mu(-f(x(t),v) + f(x(t),y(t))).$$
Thus assertion 1 is proved. Let us complete the proof of assertion 2. We show that
$$\|x(t)\| \le M \text{ for all } t \in [0,T]. \tag{6.75}$$
In view of (6.32) and (6.57),
$$\|x_* - x(0)\| \le M/4. \tag{6.76}$$
By (6.31) and (6.55) with $z = x_*$, for almost every $t \in [0,T]$,
$$(d/dt)(\|x(t) - x_*\|^2) \le 2\delta_{f,1}\|x_* - x(t)\| + 2\mu(f(x_*,y(t)) - f(x(t),y(t))) \le 2\delta_{f,1}\|x_* - x(t)\| + 2\mu(f(x_*,y_*) - f(x(t),y(t))). \tag{6.77}$$
Define
$$E = \{\tau \in [0,T] : \|x(t) - x_*\| \le (3/4)M \text{ for all } t \in [0,\tau]\}. \tag{6.78}$$
In view of (6.76), $E \ne \emptyset$. Set
$$\tau_0 = \sup E. \tag{6.79}$$
Since the function $x$ is continuous we have
$$\tau_0 \in E. \tag{6.80}$$
In order to complete the proof of assertion 2 it is sufficient to show that $\tau_0 = T$. Assume the contrary. Then
$$\tau_0 < T. \tag{6.81}$$
In view of (6.76), $\tau_0 > 0$. Relations (6.78)–(6.81) imply that
$$\|x(\tau_0) - x_*\| = (3/4)M. \tag{6.82}$$
There exists $\tau_1 \in (0,\tau_0)$ such that for all $t \in [\tau_1,\tau_0]$ we have $\|x(t) - x_*\| \ge (2/3)M$. Together with (6.32) this implies that for all $t \in [\tau_1,\tau_0]$,
$$\|x(t)\| \ge (2/3)M - M/8 > M/3.$$
By the relation above, (6.34) and (6.52), for almost every $t \in [\tau_1,\tau_0]$,
$$f(x(t),y(t)) > f(x_*,y_*) + 4. \tag{6.83}$$
It follows from (6.49) and (6.77)–(6.80) that
$$0 \le \|x(\tau_0) - x_*\|^2 - \|x(\tau_1) - x_*\|^2 \le 2(\tau_0 - \tau_1)\delta_{f,1}M - 8\mu_1(\tau_0 - \tau_1)$$
and $4\mu_1 \le \delta_{f,1}M$. This contradicts (6.50). The contradiction we have reached proves that $\tau_0 = T$ and completes the proof of assertion 2 and Proposition 6.6.
6.7 Proof of Theorem 6.5
Proposition 6.6 and the assumptions of the theorem imply that for each $t \in [0,T]$,
$$\|x(t)\| \le M \tag{6.84}$$
and for each $z \in X$, each $v \in C$ and almost every $t \in [0,T]$,
$$(d/dt)(\|x(t) - z\|^2) \le 2\delta_{f,1}(\|z\| + M) + 2\mu(f(z,y(t)) - f(x(t),y(t))), \tag{6.85}$$
$$(d/dt)(\|y(t) - v\|^2) \le 4\delta_{f,2}M + 2\mu(-f(x(t),v) + f(x(t),y(t))). \tag{6.86}$$
By (6.85) and (6.86), for each $z \in X$, each $v \in C$,
$$T^{-1}(\|x(T) - z\|^2 - \|x(0) - z\|^2) + 2\mu T^{-1}\int_0^T f(x(t),y(t))\,dt - 2\mu T^{-1}\int_0^T f(z,y(t))\,dt \le 2\delta_{f,1}(\|z\| + M) \tag{6.87}$$
and
$$T^{-1}(\|y(T) - v\|^2 - \|y(0) - v\|^2) - 2\mu T^{-1}\int_0^T f(x(t),y(t))\,dt + 2\mu T^{-1}\int_0^T f(x(t),v)\,dt \le 4\delta_{f,2}M. \tag{6.88}$$
Let $z \in X$ satisfy
$$\|z\| \le M \tag{6.89}$$
and $v \in C$. Relations (6.84), (6.85), and (6.89) imply that
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(z,y(t))\,dt \le (2\mu T)^{-1}4M^2 + 2M\delta_{f,1}(2\mu)^{-1}. \tag{6.90}$$
By (6.33) and (6.88),
$$T^{-1}\int_0^T f(x(t),v)\,dt - T^{-1}\int_0^T f(x(t),y(t))\,dt \le (2\mu T)^{-1}4M^2 + 4M\delta_{f,2}(2\mu)^{-1}. \tag{6.91}$$
It follows from (6.30), (6.31), (6.90) with $z = x_*$ and (6.91) with $v = y_*$ that
$$\begin{aligned} T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*) &\ge T^{-1}\int_0^T f(x(t),y(t))\,dt - f\Big(T^{-1}\int_0^T x(t)\,dt,\ y_*\Big) \\ &\ge T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x(t),y_*)\,dt \\ &\ge -(\mu T)^{-1}2M^2 - 2M\delta_{f,2}\mu^{-1} \end{aligned} \tag{6.92}$$
and
$$\begin{aligned} T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*) &\le T^{-1}\int_0^T f(x(t),y(t))\,dt - f\Big(x_*,\ T^{-1}\int_0^T y(t)\,dt\Big) \\ &\le T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x_*,y(t))\,dt \\ &\le (\mu T)^{-1}2M^2 + M\delta_{f,1}\mu^{-1}. \end{aligned} \tag{6.93}$$
Set
$$x_T = T^{-1}\int_0^T x(t)\,dt, \quad y_T = T^{-1}\int_0^T y(t)\,dt. \tag{6.94}$$
Since the set $C$ is convex and closed we have
$$y_T \in C. \tag{6.95}$$
Relations (6.33), (6.84), (6.94), and (6.95) imply that
$$\|x_T\|, \|y_T\| \le M. \tag{6.96}$$
By (6.84), (6.90) with $z = x_T$ and (6.96),
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x_T,y(t))\,dt \le 2M^2(\mu T)^{-1} + 2\delta_{f,1}M\mu^{-1}. \tag{6.97}$$
By (6.52), (6.91) with $v = y_T$ and (6.96),
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x(t),y_T)\,dt \ge -2M^2(\mu T)^{-1} - 2\delta_{f,2}M\mu^{-1}. \tag{6.98}$$
Since the function $f(\cdot, y_T)$ is convex and lower semicontinuous it follows from (6.94) and (6.98) that
$$\begin{aligned} f(x_T,y_T) - T^{-1}\int_0^T f(x(t),y(t))\,dt &= f\Big(T^{-1}\int_0^T x(t)\,dt,\ y_T\Big) - T^{-1}\int_0^T f(x(t),y(t))\,dt \\ &\le T^{-1}\int_0^T f(x(t),y_T)\,dt - T^{-1}\int_0^T f(x(t),y(t))\,dt \\ &\le 2M^2(\mu T)^{-1} + 2\delta_{f,2}M\mu^{-1}. \end{aligned} \tag{6.99}$$
Since the function $f(x_T, \cdot)$ is concave and upper semicontinuous it follows from (6.94) and (6.97) that
$$\begin{aligned} f(x_T,y_T) - T^{-1}\int_0^T f(x(t),y(t))\,dt &= f\Big(x_T,\ T^{-1}\int_0^T y(t)\,dt\Big) - T^{-1}\int_0^T f(x(t),y(t))\,dt \\ &\ge T^{-1}\int_0^T f(x_T,y(t))\,dt - T^{-1}\int_0^T f(x(t),y(t))\,dt \\ &\ge -2M^2(\mu T)^{-1} - 2\delta_{f,1}M\mu^{-1}. \end{aligned} \tag{6.100}$$
By (6.99) and (6.100),
$$\Big|f(x_T,y_T) - T^{-1}\int_0^T f(x(t),y(t))\,dt\Big| \le \max\{2M^2(\mu T)^{-1} + 2\delta_{f,1}M\mu^{-1},\ 2M^2(\mu T)^{-1} + 2\delta_{f,2}M\mu^{-1}\} = 2M^2(\mu T)^{-1} + 2M\mu^{-1}\max\{\delta_{f,1},\delta_{f,2}\}. \tag{6.101}$$
Let
$$v \in C, \quad z \in B_X(0,M). \tag{6.102}$$
Then (6.90) and (6.91) hold. It follows from (6.94) and properties (i) and (ii) that
$$T^{-1}\int_0^T f(z,y(t))\,dt \le f\Big(z,\ T^{-1}\int_0^T y(t)\,dt\Big) = f(z,y_T) \tag{6.103}$$
and
$$T^{-1}\int_0^T f(x(t),v)\,dt \ge f\Big(T^{-1}\int_0^T x(t)\,dt,\ v\Big) = f(x_T,v). \tag{6.104}$$
It follows from (6.90) and (6.103) that
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - f(z,y_T) \le T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(z,y(t))\,dt \le 2M^2(\mu T)^{-1} + M\mu^{-1}\delta_{f,1}. \tag{6.105}$$
In view of (6.101) and (6.105),
$$f(z,y_T) \ge T^{-1}\int_0^T f(x(t),y(t))\,dt - 2M^2(\mu T)^{-1} - M\mu^{-1}\delta_{f,1} \ge f(x_T,y_T) - 2M\mu^{-1}\max\{\delta_{f,1},\delta_{f,2}\} - 4M^2(\mu T)^{-1} - M\mu^{-1}\delta_{f,1}. \tag{6.106}$$
It follows from (6.91) and (6.104) that
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_T,v) \ge T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x(t),v)\,dt \ge -2M^2(\mu T)^{-1} - 2M\mu^{-1}\delta_{f,2}. \tag{6.107}$$
In view of (6.101) and (6.107),
$$f(x_T,v) \le T^{-1}\int_0^T f(x(t),y(t))\,dt + 2M^2(\mu T)^{-1} + 2M\mu^{-1}\delta_{f,2} \le f(x_T,y_T) + 2M\mu^{-1}\max\{\delta_{f,1},\delta_{f,2}\} + 4M^2(\mu T)^{-1} + 2\delta_{f,2}M\mu^{-1}.$$
If $\|z\| > M$, then it follows from (6.34), (6.93), (6.96), and (6.101) that
$$f(z,y_T) \ge f(x_*,y_*) + 4 \ge T^{-1}\int_0^T f(x(t),y(t))\,dt - 2M^2(\mu T)^{-1} - M\mu^{-1}\delta_{f,1} \ge f(x_T,y_T) - 2M\mu^{-1}\max\{\delta_{f,1},\delta_{f,2}\} - 4M^2(\mu T)^{-1} - \delta_{f,1}M\mu^{-1}.$$
Theorem 6.5 is proved.
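The behaviour described by Theorem 6.5 can be observed numerically on a toy problem. The sketch below is an illustration only and rests on assumptions that are not in the text: the function $f(x,y) = x^2/2 + xy - y^2/2$ on $R \times [-1,1]$ with saddle point $(0,0)$ and value $0$, an explicit Euler discretization of the inclusions (6.40) and (6.41), and deterministic perturbations of size at most `delta` that stand in for the computational errors. The name `simulate_saddle_flow` is hypothetical.

```python
import math


def simulate_saddle_flow(T=50.0, h=0.01, mu=1.0, delta=0.01, x0=0.5, y0=0.0):
    """Euler sketch of x' ~ -mu*df/dx, y' ~ +mu*df/dy with errors <= delta.

    Toy assumption: f(x, y) = x^2/2 + x*y - y^2/2, saddle point (0, 0).
    Returns the time averages of f(x(t), y(t)), x(t) and y(t).
    """
    n = int(round(T / h))
    x, y = x0, y0
    sum_f = sum_x = sum_y = 0.0
    for k in range(n):
        sum_f += 0.5 * x * x + x * y - 0.5 * y * y
        sum_x += x
        sum_y += y
        # Bounded perturbations standing in for the computational errors.
        e1 = delta * math.cos(k)
        e2 = delta * math.sin(k)
        dx = -mu * (x + y) + e1   # df/dx = x + y
        dy = mu * (x - y) + e2    # df/dy = x - y
        x, y = x + h * dx, y + h * dy
    return sum_f / n, sum_x / n, sum_y / n
```

In this toy run the averaged value of $f$ along the trajectory and the averaged point both stay close to the saddle value and the saddle point, which is the kind of statement made by (6.44)–(6.47). The discretization itself is not covered by the theorem, which concerns the continuous-time trajectory.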
6.8 Continuous Subgradient Projection Method
Let $X$ be a Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$. Let $C$ be a nonempty, convex, and closed set in the Hilbert space $X$, $U$ be an open and convex subset of $X$ such that $C \subset U$, and $f : U \to R^1$ be a convex locally Lipschitzian function. Let $x \in U$ and $\xi \in X$. Set
$$f^0(x,\xi) = \lim_{t \to 0^+} t^{-1}[f(x + t\xi) - f(x)],$$
$$\partial f(x;\xi) = \{l \in \partial f(x) : \langle l, \xi\rangle = f^0(x,\xi)\}.$$
It is a well-known fact of convex analysis that $\partial f(x;\xi) \ne \emptyset$.
Let $M > 1$, $L > 0$ and assume that $C \subset B_X(0, M-1)$, $\{y \in X : d(y,C) \le 1\} \subset U$,
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in B_X(0, M+1) \cap U. \tag{6.108}$$
We will prove the following result which also has no prototype in [92].
Theorem 6.7 Let $\delta_f, \delta_C \in (0,1]$,
$$0 < \mu_1 < \mu_2, \quad \mu_1 \le 1, \tag{6.109}$$
$$\mu \in [\mu_1, \mu_2]. \tag{6.110}$$
Assume that $T_0 > 0$, $x \in W^{1,1}(0,T_0;X)$,
$$d(x(0), C) \le \delta_C \tag{6.111}$$
and that for almost every $t \in [0,T_0]$, there exists $\xi(t) \in X$ such that
$$B_X(\xi(t), \delta_f) \cap \partial f(x(t); x'(t)) \ne \emptyset, \tag{6.112}$$
$$P_C(x(t) - \mu\xi(t)) \in B_X(x(t) + x'(t), \delta_C). \tag{6.113}$$
Then
$$f\Big(T_0^{-1}\int_0^{T_0} x(s)\,ds\Big) - \inf(f,C),\quad \min\{f(x(t)) : t \in [0,T_0]\} - \inf(f,C)$$
$$\le T_0^{-1}\int_0^{T_0} f(x(s))\,ds - \inf(f,C) \le (2M^2\mu_1^{-1} + 2(M+1))T_0^{-1} + 6\delta_f M + \mu_1^{-1}\delta_C(4M + \mu_2(L+1)) + 2\delta_C L T_0^{-1}.$$
Clearly, the best choice of $T_0$ should be of the same order as $\max\{\delta_f, \delta_C\}^{-1}$.
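The dynamics governed by (6.112) and (6.113) can be sketched numerically. The example below is a minimal illustration under assumptions that are not in the text: $X = R^2$, $C$ is the closed unit ball, $f(x) = \|x - a\|$ with $a = (2,0)$ (so that $\inf(f,C) = 1$ is attained at $(1,0)$), an explicit Euler step for $x'(t) = P_C(x(t) - \mu\xi(t)) - x(t)$, exact projections, and a deterministic perturbation of norm `delta_f` added to the gradient. The names `project_unit_ball` and `simulate_projection_flow` are hypothetical.

```python
import math


def project_unit_ball(p):
    """Projection of a point of R^2 onto the closed unit ball."""
    n = math.hypot(p[0], p[1])
    return p if n <= 1.0 else (p[0] / n, p[1] / n)


def simulate_projection_flow(T=30.0, h=0.01, mu=0.5, delta_f=0.01,
                             x0=(0.0, 0.5), a=(2.0, 0.0)):
    """Euler sketch of x' = P_C(x - mu*xi) - x for f(x) = ||x - a||.

    Returns (time average of f minus inf(f, C), final f minus inf(f, C)),
    where inf(f, C) = 1 for the default data.
    """
    n = int(round(T / h))
    x = x0
    total = 0.0
    for k in range(n):
        d = (x[0] - a[0], x[1] - a[1])
        dist = math.hypot(d[0], d[1])
        total += dist
        # Gradient of f at x plus an error of norm delta_f.
        xi = (d[0] / dist + delta_f * math.cos(k),
              d[1] / dist + delta_f * math.sin(k))
        target = project_unit_ball((x[0] - mu * xi[0], x[1] - mu * xi[1]))
        x = (x[0] + h * (target[0] - x[0]), x[1] + h * (target[1] - x[1]))
    final = math.hypot(x[0] - a[0], x[1] - a[1])
    return total / n - 1.0, final - 1.0
```

In this run the averaged gap shrinks roughly like a constant divided by the horizon, while the gap at the final time is limited by the size of the perturbation, in line with the structure of the estimate in Theorem 6.7. A smooth $f$ was chosen on purpose: at kinks the theorem asks for a subgradient in $\partial f(x(t); x'(t))$, which a naive discrete scheme does not select automatically.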
6.9 An Auxiliary Result
Lemma 6.8 Let $D$ be a nonempty convex closed subset of $X$, $T, \delta > 0$, $x \in W^{1,1}(0,T;X)$,
$$d(x(0), D) \le \delta, \tag{6.114}$$
$$D_\delta = \{x \in X : d(x,D) \le \delta\}, \tag{6.115}$$
$$B_X(x(t) + x'(t), \delta) \cap D \ne \emptyset \text{ for almost every } t \in [0,T]. \tag{6.116}$$
Then $x(t) \in D_\delta$ for all $t \in [0,T]$.
Proof For almost every $t \in [0,T]$ set
$$\phi(t) = x(t) + x'(t). \tag{6.117}$$
It is clear that $\phi : [0,T] \to X$ is a Bochner integrable function. In view of (6.116) and (6.117), for almost every $t \in [0,T]$,
$$B(\phi(t), \delta) \cap D \ne \emptyset. \tag{6.118}$$
Clearly, $D_\delta$ is a closed convex set, for each $x \in D_\delta$,
$$B(x,\delta) \cap D \ne \emptyset \tag{6.119}$$
and in view of (6.115) and (6.118),
$$\phi(t) \in D_\delta \text{ for almost every } t \in [0,T]. \tag{6.120}$$
Evidently, the function $e^s\phi(s)$, $s \in [0,T]$ is Bochner integrable. We claim that for all $t \in [0,T]$,
$$x(t) = e^{-t}x(0) + e^{-t}\int_0^t e^s\phi(s)\,ds. \tag{6.121}$$
Clearly, (6.121) holds for $t = 0$. For every $t \in (0,T]$ we have
$$\int_0^t e^s\phi(s)\,ds = \int_0^t e^s(x(s) + x'(s))\,ds = \int_0^t (e^s x(s))'\,ds = e^t x(t) - x(0).$$
This implies (6.121) for all $t \in [0,T]$. By (6.121), for all $t \in [0,T]$,
$$x(t) = e^{-t}x(0) + (1 - e^{-t})(1 - e^{-t})^{-1}e^{-t}\int_0^t e^s\phi(s)\,ds = e^{-t}x(0) + (1 - e^{-t})\int_0^t e^s(e^t - 1)^{-1}\phi(s)\,ds. \tag{6.122}$$
In view of (6.120), for all $t \in [0,T]$,
$$\int_0^t e^s(e^t - 1)^{-1}\phi(s)\,ds \in D_\delta. \tag{6.123}$$
Relations (6.114), (6.122), and (6.123) imply that $x(t) \in D_\delta$ for all $t \in [0,T]$. Lemma 6.8 is proved.
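The last two steps of the proof rest on a normalization that is not written out. As a brief sketch of the computation, for $t \in (0,T]$ the weights appearing in (6.122) and (6.123) satisfy:

```latex
\int_0^t \frac{e^s}{e^t - 1}\,ds = \frac{e^t - 1}{e^t - 1} = 1,
\qquad
e^{-t} + (1 - e^{-t}) = 1 .
```

Hence the integral in (6.123) is a weighted average of the values $\phi(s)$, which belong to the closed convex set $D_\delta$ for almost every $s$ by (6.120), and (6.122) expresses $x(t)$ as a convex combination of $x(0) \in D_\delta$ and this average.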
6.10 Proof of Theorem 6.7
Let
$$z \in C. \tag{6.124}$$
Set
$$C_{\delta_C} = \{x \in X : d(x,C) \le \delta_C\}. \tag{6.125}$$
For almost every $t \in [0,T_0]$ set
$$\phi(t) = x(t) + x'(t). \tag{6.126}$$
Lemma 6.8, (6.111), and (6.113) imply that
$$x(t) \in C_{\delta_C} \text{ for all } t \in [0,T_0]. \tag{6.127}$$
It follows from (6.125) and (6.127) that for every $t \in [0,T_0]$, there exists
$$\hat{x}(t) \in C \tag{6.128}$$
such that
$$\|x(t) - \hat{x}(t)\| \le \delta_C. \tag{6.129}$$
By (6.128) and Lemma 2.2, for almost every $t \in [0,T_0]$,
$$\langle \hat{x}(t) - P_C(x(t) - \mu\xi(t)),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle \le 0. \tag{6.130}$$
Inequality (6.130) implies that for almost every $t \in [0,T_0]$,
$$\langle x(t) - P_C(x(t) - \mu\xi(t)),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle \le \langle x(t) - \hat{x}(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle. \tag{6.131}$$
It follows from (6.124) and Lemma 2.2 that
$$\langle z - P_C(x(t) - \mu\xi(t)),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle \le 0. \tag{6.132}$$
In view of (6.112), for almost every $t \in [0,T_0]$ there exists
$$\hat{\xi}(t) \in \partial f(x(t); x'(t)) \tag{6.133}$$
such that
$$f^0(x(t), x'(t)) = \langle \hat{\xi}(t), x'(t)\rangle, \tag{6.134}$$
$$\|\xi(t) - \hat{\xi}(t)\| \le \delta_f. \tag{6.135}$$
In view of (6.133), for almost every $t \in [0,T_0]$,
$$f(z) \ge f(x(t)) + \langle \hat{\xi}(t), z - x(t)\rangle. \tag{6.136}$$
By (6.113), for almost every $t \in [0,T_0]$,
$$\|(x(t) + x'(t)) - P_C(x(t) - \mu\xi(t))\| \le \delta_C. \tag{6.137}$$
Relations (6.126) and (6.137) imply that for almost every $t \in [0,T_0]$,
$$\begin{aligned} \langle z - x(t) - x'(t),\ x(t) - \mu\xi(t) - x(t) - x'(t)\rangle &= \langle z - \phi(t),\ x(t) - \mu\xi(t) - \phi(t)\rangle \\ &= \langle z - \phi(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle + \langle z - \phi(t),\ P_C(x(t) - \mu\xi(t)) - \phi(t)\rangle \\ &\le \langle z - \phi(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle + \delta_C\|z - \phi(t)\|. \end{aligned} \tag{6.138}$$
In view of (6.110), (6.125), and (6.127), for all $t \in [0,T_0]$,
$$\|x(t)\| \le M. \tag{6.139}$$
It follows from (6.110), (6.113), (6.126), and (6.139) that for almost every $t \in [0,T_0]$,
$$\|\phi(t)\| = \|x(t) + x'(t)\| \le M, \tag{6.140}$$
$$\|x'(t)\| \le 2M. \tag{6.141}$$
By (6.110), (6.124), (6.126), (6.138), and (6.140), for almost every $t \in [0,T_0]$,
$$\langle z - x(t) - x'(t),\ -\mu\xi(t) - x'(t)\rangle \le \langle z - \phi(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle + 2M\delta_C. \tag{6.142}$$
By (6.110)–(6.112), (6.126), (6.132), and (6.137),
$$\langle z - \phi(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle \le \langle z - P_C(x(t) - \mu\xi(t)),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle + \delta_C\|x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\| \le \delta_C(2M + \mu_2(L+1)). \tag{6.143}$$
In view of (6.142) and (6.143), for almost every $t \in [0,T_0]$,
$$\langle x(t) + x'(t) - z,\ \mu\xi(t) + x'(t)\rangle \le \delta_C(4M + \mu_2(L+1)). \tag{6.144}$$
It follows from (6.110), (6.124), (6.135), (6.136), and (6.141) that for almost every $t \in [0,T_0]$,
$$\begin{aligned} f(x(t)) - f(z) &\le \langle \hat{\xi}(t), x(t) - z\rangle \\ &= \langle \xi(t), x(t) - z + x'(t)\rangle - \langle \xi(t), x'(t)\rangle + \langle \hat{\xi}(t) - \xi(t), x(t) - z + x'(t)\rangle + \langle \xi(t) - \hat{\xi}(t), x'(t)\rangle \\ &\le \langle \xi(t), x(t) - z + x'(t)\rangle - \langle \xi(t), x'(t)\rangle + 4M\delta_f. \end{aligned} \tag{6.145}$$
Relations (6.144) and (6.145) imply that for almost every $t \in [0,T_0]$,
$$f(x(t)) - f(z) \le \mu^{-1}\langle -x'(t), x(t) + x'(t) - z\rangle + \mu^{-1}(4M + \mu_2(L+1))\delta_C - \langle \xi(t), x'(t)\rangle + 4M\delta_f. \tag{6.146}$$
By (6.110) and (6.146), for almost every $t \in [0,T_0]$,
$$f(x(t)) - f(z) \le -\mu^{-1}\|x'(t)\|^2 - \mu^{-1}\langle x'(t), x(t) - z\rangle - \langle \xi(t), x'(t)\rangle + \delta_C(4M + \mu_2(L+1))\mu_1^{-1} + 4M\delta_f \tag{6.147}$$
and
$$\mu^{-1}\|x'(t)\|^2 + \mu^{-1}\langle x'(t), x(t) - z\rangle + \langle \xi(t), x'(t)\rangle + f(x(t)) - f(z) \le \delta_C(4M + \mu_2(L+1))\mu_1^{-1} + 4M\delta_f. \tag{6.148}$$
In view of (6.135), (6.141), and (6.148),
$$\mu^{-1}\|x'(t)\|^2 + \mu^{-1}\langle x'(t), x(t) - z\rangle + \langle \hat{\xi}(t), x'(t)\rangle + f(x(t)) - f(z) \le \delta_C(4M + \mu_2(L+1))\mu_1^{-1} + 6M\delta_f. \tag{6.149}$$
It follows from (6.134), (6.149), and Corollary 6.4 that for almost every $t \in [0,T_0]$,
$$(2\mu)^{-1}(d/dt)(\|x(t) - z\|^2) + f^0(x(t), x'(t)) + f(x(t)) - f(z) + \mu^{-1}\|x'(t)\|^2 \le 6\delta_f M + \delta_C(4M + \mu_2(L+1))\mu_1^{-1}. \tag{6.150}$$
Using Proposition 6.3, the equality
$$f^0(x(t), x'(t)) = (f \circ x)'(t),$$
and integrating inequality (6.150) over the interval $[0,t]$, we obtain that for all $t \in [0,T_0]$,
$$(2\mu)^{-1}\|x(t) - z\|^2 - (2\mu)^{-1}\|x(0) - z\|^2 + f(x(t)) - f(x(0)) + \int_0^t (f(x(s)) - f(z))\,ds \le 6\delta_f Mt + \mu_1^{-1}\delta_C t(4M + \mu_2(L+1)). \tag{6.151}$$
Relations (6.108), (6.110), (6.111), and (6.129) imply that for all $t \in [0,T_0]$,
$$f(x(t)) \ge \inf(f,C) - \delta_C L, \quad f(x(t)) \le \sup(f,C) + \delta_C L. \tag{6.152}$$
By (6.110), (6.151), and (6.152), for all $t \in [0,T_0]$,
$$(2\mu_2)^{-1}\|x(t) - z\|^2 - (2\mu_1)^{-1}\|x(0) - z\|^2 + \inf(f,C) - f(x(0)) - \delta_C L + \int_0^t (f(x(s)) - f(z))\,ds \le 6\delta_f Mt + \delta_C t(4M + \mu_2(L+1))\mu_1^{-1}.$$
The relation above with $t = T_0$, (6.110), (6.111), (6.128), (6.129), and (6.152) imply that
$$\begin{aligned} \int_0^{T_0} (f(x(s)) - f(z))\,ds &\le (2\mu_1)^{-1}\|x(0) - z\|^2 + \sup(f,C) - \inf(f,C) + 2\delta_C L + 6\delta_f MT_0 + \delta_C T_0(4M + \mu_2(L+1))\mu_1^{-1} \\ &\le 2M^2\mu_1^{-1} + 2LM + 2\delta_C L + 6\delta_f MT_0 + \delta_C T_0(4M + \mu_2(L+1))\mu_1^{-1}. \end{aligned}$$
This implies that
$$f\Big(T_0^{-1}\int_0^{T_0} x(s)\,ds\Big) - \inf(f,C),\quad \min\{f(x(t)) : t \in [0,T_0]\} - \inf(f,C)$$
$$\le T_0^{-1}\int_0^{T_0} f(x(s))\,ds - \inf(f,C) \le (2M^2\mu_1^{-1} + 2LM)T_0^{-1} + 6\delta_f M + \mu_1^{-1}\delta_C(4M + \mu_2(L+1)) + 2\delta_C L T_0^{-1}.$$
This completes the proof of Theorem 6.7.
6.11 Continuous Subgradient Projection Method on Unbounded Sets
In Sect. 6.13 of this chapter we obtain an extension of Theorem 6.7 for minimization problems on unbounded sets. Note that in [92] we study the continuous subgradient projection method only for problems on bounded sets. Let $X$ be a Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$. Let $C$ be a nonempty, convex, and closed set in the Hilbert space $X$, $U$ be an open and convex subset of $X$ such that $C \subset U$, and $f : U \to R^1$ be a convex locally Lipschitzian function. Let $M > 4$, $L \ge 1$,
$$B_X(0, M-1) \cap \mathrm{argmin}(f,C) \ne \emptyset, \tag{6.153}$$
$$\{y \in X : d(y,C) \le 1\} \subset U, \tag{6.154}$$
$$0 < \mu_1 < \mu_2. \tag{6.155}$$
We suppose that
$$\{x \in U : f(x) \le \inf(f,C) + 1\} \subset B_X(0, M/4), \tag{6.156}$$
$$c_* < 0, \quad f(v) \ge c_* \text{ for all } v \in U, \tag{6.157}$$
$$\tilde{M} > 4M + 210\mu_2\mu_1^{-1}, \tag{6.158}$$
$$(8\mu_2)^{-1}\tilde{M}^2 > 4M^2\mu_1^{-1} - 2c_* + 2|\sup\{f(v) : v \in B_X(0,M)\}|, \tag{6.159}$$
$$|f(v_1) - f(v_2)| \le L\|v_1 - v_2\| \text{ for all } v_1, v_2 \in B_X(0, \tilde{M}+1) \cap U, \tag{6.160}$$
and that $\delta_f, \delta_C \in (0,1)$ satisfy
$$4\max\{\delta_f,\delta_C\}\big(16(L+1)\mu_1^{-1}M^2 + \sup\{|f(v)| : v \in B_X(0,M)\} - c_* + 6\mu_2(L+3) + 3 + 14\tilde{M} + \mu_1^{-1}(5\tilde{M} + 2\mu_2(L+1) + L + 4)\big) \le 1. \tag{6.161}$$
6.12 An Auxiliary Result
Proposition 6.9 Let
$$z \in B_X(0,M) \cap \mathrm{argmin}(f,C), \tag{6.162}$$
$T_0 > 0$, $x \in W^{1,1}(0,T_0;X)$,
$$d(x(0), C) \le \delta_C \tag{6.163}$$
and for almost every $t \in [0,T_0]$ there exist $\xi(t) \in X$ such that
$$B_X(\xi(t), \delta_f) \cap \partial f(x(t); x'(t)) \ne \emptyset, \tag{6.164}$$
$$P_C(x(t) - \mu\xi(t)) \in B_X(x(t) + x'(t), \delta_C). \tag{6.165}$$
Then for almost every $t \in [0,T_0]$,
$$(2\mu)^{-1}(d/dt)(\|x(t) - z\|^2) + (f \circ x)'(t) + f(x(t)) - f(z) \le \delta_f(3\|x'(t)\| + \|x(t) - z\|) + \delta_C(\mu_1^{-1}\|x(t) - z\| + \mu_1^{-1}\|z - \phi(t)\| + \|\xi(t)\|)$$
and for all $t \in [0,T_0]$, $d(x(t), C) \le \delta_C$.
Proof In view of (6.156) and (6.162),
$$\|z\| \le M - 1 \tag{6.166}$$
and
$$f(z) = \inf(f,C). \tag{6.167}$$
For almost every $t \in [0,T_0]$ set
$$\phi(t) = x(t) + x'(t). \tag{6.168}$$
It is clear that $\phi : [0,T_0] \to X$ is a Bochner integrable function. In view of (6.165) and (6.168), for almost every $t \in [0,T_0]$,
$$B_X(\phi(t), \delta_C) \cap C \ne \emptyset. \tag{6.169}$$
Define
$$C_{\delta_C} = \{x \in X : d(x,C) \le \delta_C\}. \tag{6.170}$$
Clearly, $C_{\delta_C}$ is a convex closed set, for each $x \in C_{\delta_C}$,
$$B_X(x, \delta_C) \cap C \ne \emptyset \tag{6.171}$$
and in view of (6.169),
$$\phi(t) \in C_{\delta_C} \text{ for almost every } t \in [0,T_0]. \tag{6.172}$$
Lemma 6.8, (6.163), and (6.172) imply that
$$x(t) \in C_{\delta_C} \text{ for all } t \in [0,T_0]. \tag{6.173}$$
It follows from (6.171) and (6.173) that for every $t \in [0,T_0]$ there exists
$$\hat{x}(t) \in C \tag{6.174}$$
such that
$$\|x(t) - \hat{x}(t)\| \le \delta_C. \tag{6.175}$$
By (6.174) and Lemma 2.2, for almost every $t \in [0,T_0]$,
$$\langle \hat{x}(t) - P_C(x(t) - \mu\xi(t)),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle \le 0. \tag{6.176}$$
By (6.176),
$$\langle x(t) - P_C(x(t) - \mu\xi(t)),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle \le \langle x(t) - \hat{x}(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle. \tag{6.177}$$
It follows from (6.162) and Lemma 2.2 that
$$\langle z - P_C(x(t) - \mu\xi(t)),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle \le 0. \tag{6.178}$$
In view of (6.164), for almost every $t \in [0,T_0]$ there exists
$$\hat{\xi}(t) \in \partial f(x(t); x'(t)) \tag{6.179}$$
such that
$$f^0(x(t), x'(t)) = \langle \hat{\xi}(t), x'(t)\rangle, \tag{6.180}$$
$$\|\xi(t) - \hat{\xi}(t)\| \le \delta_f. \tag{6.181}$$
In view of (6.109) and (6.179), for almost every $t \in [0,T_0]$,
$$f(z) \ge f(x(t)) + \langle \hat{\xi}(t), z - x(t)\rangle. \tag{6.182}$$
By (6.165), for almost every $t \in [0,T_0]$,
$$\|(x(t) + x'(t)) - P_C(x(t) - \mu\xi(t))\| \le \delta_C. \tag{6.183}$$
Relations (6.168) and (6.183) imply that for almost every $t \in [0,T_0]$,
$$\begin{aligned} \langle z - x(t) - x'(t),\ x(t) - \mu\xi(t) - x(t) - x'(t)\rangle &= \langle z - \phi(t),\ x(t) - \mu\xi(t) - \phi(t)\rangle \\ &= \langle z - \phi(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle + \langle z - \phi(t),\ P_C(x(t) - \mu\xi(t)) - \phi(t)\rangle \\ &\le \langle z - \phi(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle + \delta_C\|z - \phi(t)\|. \end{aligned} \tag{6.184}$$
By (6.168), (6.178), (6.183), and the definition of $P_C$, for a. e. $t \in [0,T_0]$,
$$\langle z - \phi(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle \le \langle z - P_C(x(t) - \mu\xi(t)),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle + \delta_C\|x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\| \le \delta_C\|x(t) - \mu\xi(t) - z\|. \tag{6.185}$$
In view of (6.184) and (6.185), for a. e. $t \in [0,T_0]$,
$$\langle x(t) + x'(t) - z,\ \mu\xi(t) + x'(t)\rangle \le \delta_C\|z - \phi(t)\| + \delta_C\|x(t) - z - \mu\xi(t)\|. \tag{6.186}$$
It follows from (6.181) and (6.182) that for almost every $t \in [0,T_0]$,
$$\begin{aligned} f(x(t)) - f(z) &\le \langle \hat{\xi}(t), x(t) - z\rangle \\ &= \langle \xi(t), x(t) - z + x'(t)\rangle - \langle \xi(t), x'(t)\rangle + \langle \hat{\xi}(t) - \xi(t), x(t) - z + x'(t)\rangle + \langle \xi(t) - \hat{\xi}(t), x'(t)\rangle \\ &\le \langle \xi(t), x(t) - z + x'(t)\rangle - \langle \xi(t), x'(t)\rangle + \delta_f(\|x(t) - z\| + 2\|x'(t)\|). \end{aligned} \tag{6.187}$$
In view of (6.184) and (6.185), for almost every $t \in [0,T_0]$,
$$\begin{aligned} \langle \xi(t), x(t) - z + x'(t)\rangle &\le \mu^{-1}\langle -x'(t), x(t) - z + x'(t)\rangle + \langle \xi(t) + \mu^{-1}x'(t), x(t) - z + x'(t)\rangle \\ &\le \mu^{-1}\langle -x'(t), x(t) - z + x'(t)\rangle + \mu^{-1}\delta_C\|z - \phi(t)\| + \langle z - \phi(t),\ x(t) - \mu\xi(t) - P_C(x(t) - \mu\xi(t))\rangle\mu^{-1} \\ &\le \mu^{-1}\delta_C\|z - \phi(t)\| - \mu^{-1}\langle x'(t), x(t) - z + x'(t)\rangle + \mu^{-1}\delta_C\|x(t) - \mu\xi(t) - z\|. \end{aligned} \tag{6.188}$$
Relations (6.187) and (6.188) imply that for almost every $t \in [0,T_0]$,
$$f(x(t)) - f(z) \le \mu^{-1}\langle -x'(t), x(t) + x'(t) - z\rangle + \mu^{-1}\delta_C(\|z - \phi(t)\| + \|x(t) - z\| + \mu\|\xi(t)\|) - \langle \xi(t), x'(t)\rangle + \delta_f(\|x(t) - z\| + 2\|x'(t)\|). \tag{6.189}$$
By (6.189), for almost every $t \in [0,T_0]$,
$$f(x(t)) - f(z) \le -\mu^{-1}\|x'(t)\|^2 - \mu^{-1}\langle x'(t), x(t) - z\rangle - \langle \xi(t), x'(t)\rangle + \delta_C\mu^{-1}(\|z - \phi(t)\| + \|x(t) - z\| + \mu\|\xi(t)\|) + \delta_f(\|x(t) - z\| + 2\|x'(t)\|). \tag{6.190}$$
In view of (6.155), (6.181), and (6.190),
$$\mu^{-1}\|x'(t)\|^2 + \mu^{-1}\langle x'(t), x(t) - z\rangle + \langle \hat{\xi}(t), x'(t)\rangle + f(x(t)) - f(z) \le \delta_f\|x'(t)\| + \delta_f(\|x(t) - z\| + 2\|x'(t)\|) + \delta_C(\|\xi(t)\| + \mu_1^{-1}\|z - \phi(t)\| + \mu^{-1}\|x(t) - z\|). \tag{6.191}$$
Corollary 6.4, (6.181), and (6.191) imply that for almost every $t \in [0,T_0]$,
$$(2\mu)^{-1}(d/dt)(\|x(t) - z\|^2) + f^0(x(t), x'(t)) + f(x(t)) - f(z) + \mu^{-1}\|x'(t)\|^2 \le \delta_f(\|x(t) - z\| + 3\|x'(t)\|) + \delta_C(\|\xi(t)\| + \mu_1^{-1}\|z - \phi(t)\| + \mu^{-1}\|x(t) - z\|).$$
Together with Proposition 6.3 and the equality $f^0(x(t), x'(t)) = (f \circ x)'(t)$ for almost every $t \in [0,T_0]$, this implies that for almost every $t \in [0,T_0]$,
$$(2\mu)^{-1}(d/dt)(\|x(t) - z\|^2) + (f \circ x)'(t) + f(x(t)) - f(z) \le \delta_f(\|x(t) - z\| + 3\|x'(t)\|) + \delta_C(\|\xi(t)\| + \mu_1^{-1}\|z - \phi(t)\| + \mu^{-1}\|x(t) - z\|).$$
Proposition 6.9 is proved.
6.13 The Convergence Result
We will prove the following result.
Theorem 6.10 Let
$$0 < \mu_1 \le 1 < \mu_2, \tag{6.192}$$
$$\epsilon_0 = 4\max\{\delta_f,\delta_C\}\big[8(L+1)(2M^2\mu_1^{-1} + \sup\{|f(v)| : v \in B_X(0,M)\} - c_*) + 14\tilde{M} + 3\mu_2(L+1) + 3 + \mu_1^{-1}(5\tilde{M} + 2\mu_2(L+1) + L + 4)\big]. \tag{6.193}$$
Then the following two assertions hold.
1. Let
$$8^{-1}\min\{\delta_f^{-1},\delta_C^{-1}\}(L+1)^{-1} \le T_0 \le \min\{\delta_f^{-1},\delta_C^{-1}\}(L+1)^{-1}, \tag{6.194}$$
$\mu_1 \le \mu \le \mu_2$. Assume that $x \in W^{1,1}(0,T_0;X)$,
$$x(0) \in B_X(0,M), \quad d(x(0),C) \le \delta_C \tag{6.195}$$
and that for almost every $t \in [0,T_0]$, there exists $\xi(t) \in X$ such that
$$B_X(\xi(t), \delta_f) \cap \partial f(x(t); x'(t)) \ne \emptyset, \tag{6.196}$$
$$P_C(x(t) - \mu\xi(t)) \in B(x(t) + x'(t), \delta_C). \tag{6.197}$$
Then for all $t \in [0,T_0]$, $\|x(t)\| \le \tilde{M}$,
$$\min\{f(x(t)) : t \in [0,T_0]\} - \inf(f,C),\quad f\Big(T_0^{-1}\int_0^{T_0} x(s)\,ds\Big) - \inf(f,C) \le T_0^{-1}\int_0^{T_0} f(x(s))\,ds - \inf(f,C) \le \epsilon_0/4,$$
$$\mathrm{mes}\{t \in [0,T_0] : f(x(t)) > \inf(f,C) + \epsilon_0\} \le T_0/4,$$
$d(x(t),C) \le \delta_C$, $t \in [0,T_0]$ and there exists $t \in [(3/4)T_0, T_0]$ such that $\|x(t)\| \le M$.
2. Assume that $T \ge \max\{\delta_f^{-1},\delta_C^{-1}\}(L+1)^{-1}$, $x \in W^{1,1}(0,T;X)$, $x(0) \in B_X(0,M)$, $d(x(0),C) \le \delta_C$ and that for almost every $t \in [0,T]$, there exists $\xi(t) \in X$ such that (6.196) and (6.197) hold. Then $\|x(t)\| \le \tilde{M}$, $t \in [0,T]$, and there exists a sequence of numbers $\{T_i\}_{i=0}^k$, where $k$ is a natural number, such that $T_0 = 0$, $T_k = T$ and that for all integers $i = 0, \dots, k-1$,
$$4^{-1}\min\{\delta_f^{-1},\delta_C^{-1}\}(L+1)^{-1} \le T_{i+1} - T_i \le \min\{\delta_f^{-1},\delta_C^{-1}\}(L+1)^{-1},$$
$\|x(T_i)\| \le M$, $i = 0, \dots, k-1$ and that for all $i = 0, \dots, k-1$,
$$\min\{f(x(t)) : t \in [T_i,T_{i+1}]\} - \inf(f,C),\quad f\Big((T_{i+1} - T_i)^{-1}\int_{T_i}^{T_{i+1}} x(s)\,ds\Big) - \inf(f,C) \le (T_{i+1} - T_i)^{-1}\int_{T_i}^{T_{i+1}} f(x(s))\,ds - \inf(f,C) \le \epsilon_0.$$
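The length constraints on the subintervals in Assertion 2 are easy to satisfy in isolation. The sketch below is a hypothetical helper, not the construction used in the proof: it computes the time scale $\min\{\delta_f^{-1},\delta_C^{-1}\}(L+1)^{-1}$ and splits $[0,T]$ into equal pieces whose lengths lie between one quarter of that scale and the scale itself. It does not check the additional requirement $\|x(T_i)\| \le M$, which depends on the trajectory and is what the theorem actually guarantees can be arranged.

```python
import math


def time_scale(delta_f, delta_C, L):
    """Length scale min(1/delta_f, 1/delta_C) / (L + 1) from Theorem 6.10."""
    return min(1.0 / delta_f, 1.0 / delta_C) / (L + 1.0)


def uniform_partition(T, delta_f, delta_C, L):
    """Equal-length partition 0 = T_0 < ... < T_k = T (hypothetical helper).

    Each piece has length in [tau/2, tau], hence in [tau/4, tau],
    where tau = time_scale(delta_f, delta_C, L).
    """
    tau = time_scale(delta_f, delta_C, L)
    # Hypothesis of Assertion 2: T >= max(1/delta_f, 1/delta_C) / (L + 1).
    if T < max(1.0 / delta_f, 1.0 / delta_C) / (L + 1.0):
        raise ValueError("T is below the threshold required in Assertion 2")
    k = math.ceil(T / tau)
    return [T * i / k for i in range(k + 1)]
```

Since $T \ge \tau$ and $k = \lceil T/\tau\rceil$, each piece has length $T/k \le \tau$ and $T/k > T\tau/(T+\tau) \ge \tau/2$, so the length constraints hold.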
Proof In view of (6.156), there exists
$$z \in B_X(0, M-1) \cap \mathrm{argmin}(f,C). \tag{6.198}$$
Proposition 6.9 implies that for almost every $t \in [0,T_0]$,
$$(2\mu)^{-1}(d/dt)(\|x(t) - z\|^2) + (f \circ x)'(t) + f(x(t)) - f(z) \le \delta_f(\|x(t) - z\| + 3\|x'(t)\|) + \delta_C(\|\xi(t)\| + \mu_1^{-1}\|z - \phi(t)\| + \mu^{-1}\|x(t) - z\|). \tag{6.199}$$
For almost every $t \in [0,T_0]$ set
$$\phi(t) = x(t) + x'(t). \tag{6.200}$$
Define
$$E = \{\tau \in (0,T_0] : \|x(t)\| \le \tilde{M} \text{ for all } t \in [0,\tau]\}. \tag{6.201}$$
Since the function $x$ is continuous it follows from (6.158) and (6.195) that $E \ne \emptyset$. Set
$$\tau = \sup(E). \tag{6.202}$$
Clearly,
$$\tau \in E \tag{6.203}$$
and
$$\|x(t)\| \le \tilde{M}, \quad t \in [0,\tau]. \tag{6.204}$$
By (6.160), (6.196), and (6.204), for almost every $t \in [0,\tau]$,
$$\|\xi(t)\| \le L + 1. \tag{6.205}$$
Lemma 2.2, (6.198), (6.204), and (6.205) imply that for almost every $t \in [0,\tau]$,
$$\|P_C(x(t) - \mu\xi(t))\| \le \|P_C(x(t) - \mu\xi(t)) - z\| + \|z\| \le \|x(t) - \mu\xi(t) - z\| + \|z\| \le \tilde{M} + \mu_2(L+1) + 2M. \tag{6.206}$$
It follows from (6.197), (6.200), (6.204), and (6.206) that for almost every $t \in [0,\tau]$,
$$\|\phi(t)\| \le \|P_C(x(t) - \mu\xi(t))\| + 1 \le \tilde{M} + \mu_2(L+1) + 2M + 1 \tag{6.207}$$
and
$$\|x'(t)\| \le \|x(t)\| + \|\phi(t)\| \le 2\tilde{M} + \mu_2(L+1) + 2M + 1. \tag{6.208}$$
Thus for a. e. $t \in [0,\tau]$, (6.207) and (6.208) are true. Combined with (6.198), (6.199), and (6.201)–(6.203), this implies that for almost every $t \in [0,\tau]$,
$$(2\mu)^{-1}(d/dt)(\|x(t) - z\|^2) + (f \circ x)'(t) + f(x(t)) - f(z) \le \delta_f(6\tilde{M} + 3\mu_2(L+1) + 6M + 3 + \tilde{M} + M) + \delta_C(\mu_1^{-1}\tilde{M} + \mu_1^{-1}(\tilde{M} + \mu_2(L+1) + 4M + 1) + L + 1).$$
Integrating the inequality above over the interval $[t_1,t_2] \subset [0,\tau]$, we obtain that for all $t_1, t_2 \in [0,\tau]$ satisfying $t_1 < t_2$,
$$(2\mu)^{-1}\|x(t_2) - z\|^2 - (2\mu)^{-1}\|x(t_1) - z\|^2 + f(x(t_2)) - f(x(t_1)) + \int_{t_1}^{t_2} (f(x(s)) - f(z))\,ds \le (t_2 - t_1)\delta_f(14\tilde{M} + 3\mu_2(L+1) + 3) + (t_2 - t_1)\delta_C\mu_1^{-1}(5\tilde{M} + \mu_2(L+1) + 4) + \delta_C(L+1)(t_2 - t_1). \tag{6.209}$$
Since the function $x$ is continuous it follows from (6.201) to (6.203) that at least one of the following equalities holds:
$$\tau = T_0; \tag{6.210}$$
$$\|x(\tau)\| = \tilde{M}. \tag{6.211}$$
Assume that (6.211) holds. By (6.157), (6.192), (6.195), (6.198), (6.209) with t2 = τ , t1 = 0 and (6.211), − M)2 − 2(μ1 )−1 4M 2 (2μ2 )−1 (M +c∗ − sup{f (v) : v ∈ BX (0, M)}
+ 3μ2 (L + 1) + 3) ≤ τ δf (14M + τ δC μ−1 1 (5M + 6μ2 (L + 1) + 1) + τ δC (L + 1).
(6.212)
By (6.158) and (6.159), − 2M)2 − 2(μ1 )−1 4M 2 (2μ2 )−1 (M +c∗ − sup{f (v) : v ∈ BX (0, M)} −1 2 2 ≥ 2−1 μ−1 2 M /4 − 2μ1 M
+c∗ − sup{f (v) : v ∈ BX (0, M)} 2 ≥ 16−1 μ−1 2 M .
(6.213)
It follows from (6.194), (6.212), (6.213), and the inequality τ ≤ T0 that 2 16−1 μ−1 2 M + 3μ2 (L + 1) + 3) ≤ τ δf (14M +τ δC μ−1 1 (5M + 6μ2 (L + 1) + 1) +τ δC (L + 1) + 3μ2 (L + 1) + 3) ≤ (L + 1)−1 min{δf−1 , δC−1 }δf (14M +(L + 1)−1 min{δf−1 , δC−1 }δC [μ−1 1 (5M +6μ2 (L + 1) + 1) + L + 1)] + 3μ2 + 1 ≤ 7M −1 −1 + 5μ−1 1 M + 6μ2 μ1 + μ1 + 1.
In view of (6.158), (6.192), and (6.214), 2 ≤ 16 · 7μ2 M + 48μ22 M 2 μ−1 +16μ2 + 90Mμ 1 −1 +96μ22 μ−1 1 + 16μ2 μ1 + 16μ2
(6.214)
μ2 μ−1 1 (202M + 16 + 160μ2 + 32) −1 ≤ μ2 μ−1 1 (202M + 208μ2 ) ≤ 203μ2 μ1 M
and ≤ 203μ2 μ−1 . M 1 This contradicts (6.158). The contradiction we have reached proves that x(τ ) < M. Therefore τ = T0 and t ∈ [0, T0 ]. x(t) ≤ M,
(6.215)
It follows from (6.157), (6.192), (6.195), (6.198), and (6.209) with $t_2 = T_0$ and $t_1 = 0$ that
$$\begin{aligned} \int_0^{T_0} (f(x(s)) - f(z))\,ds &\le (2\mu_1)^{-1}\|x(0) - z\|^2 + f(x(0)) - f(x(T_0)) + T_0\delta_f(14\tilde{M} + 3\mu_2(L+1) + 3) \\ &\quad + T_0\delta_C\mu_1^{-1}(5\tilde{M} + \mu_2(L+1) + 4) + \delta_C T_0(L+1) \\ &\le (2\mu_1)^{-1}4M^2 - c_* + \sup\{f(v) : v \in B_X(0,M)\} + T_0\delta_f(14\tilde{M} + 3\mu_2(L+1) + 3) \\ &\quad + T_0\delta_C\mu_1^{-1}(5\tilde{M} + \mu_2(L+1) + 4) + \mu_1^{-1}\delta_C T_0(L+1). \end{aligned} \tag{6.216}$$
By (6.191), (6.192), (6.193), (6.198), and (6.216),
$$T_0^{-1}\int_0^{T_0} (f(x(s)) - \inf(f,C))\,ds \le ((2\mu_1)^{-1}4M^2 + \sup\{f(v) : v \in B_X(0,M)\} - c_*)8\max\{\delta_f,\delta_C\}(L+1) + \delta_f(14\tilde{M} + 3\mu_2(L+1) + 3) + \delta_C\mu_1^{-1}(5\tilde{M} + \mu_2(L+1) + 4) \le \epsilon_0/4. \tag{6.217}$$
By (6.217),
$$f\Big(T_0^{-1}\int_0^{T_0} x(s)\,ds\Big) - \inf(f,C),\quad \min\{f(x(t)) : t \in [0,T_0]\} - \inf(f,C) \le T_0^{-1}\int_0^{T_0} f(x(s))\,ds - \inf(f,C) \le \epsilon_0/4. \tag{6.218}$$
Define
$$E = \{t \in [0,T_0] : f(x(t)) > \inf(f,C) + \epsilon_0\}. \tag{6.219}$$
In view of (6.217) and (6.219),
$$T_0^{-1}\mathrm{mes}(E)\epsilon_0 \le T_0^{-1}\int_0^{T_0} (f(x(s)) - \inf(f,C))\,ds \le \epsilon_0/4$$
and
$$\mathrm{mes}(E) \le T_0/4. \tag{6.220}$$
Proposition 6.9 implies that
$$d(x(t),C) \le \delta_C, \quad t \in [0,T_0]. \tag{6.221}$$
Set $t_0 = \max\{t \in [0,T_0] : f(x(t)) - \inf(f,C) \le \epsilon_0\}$. It follows from (6.219) and (6.220) that $t_0 > 3T_0/4$. By the choice of $t_0$, (6.156), (6.161), (6.193), and the inequalities $\epsilon_0 \le 1$,
$$f(x(t_0)) \le \inf(f,C) + 1 \tag{6.222}$$
we have
$$\|x(t_0)\| \le M. \tag{6.223}$$
Now Assertion 1 follows from (6.215) and (6.217)–(6.223). Assertion 2 follows from Assertion 1 applied by induction. Theorem 6.10 is proved.
6.14 Subgradient Projection Algorithm for Zero-Sum Games
Let $(X, \langle\cdot,\cdot\rangle)$, $(Y, \langle\cdot,\cdot\rangle)$ be Hilbert spaces equipped with the complete norms $\|\cdot\|$ which are induced by their inner products. Let $C$ be a nonempty closed convex subset of $X$, $D$ be a nonempty closed convex subset of $Y$, $U$ be an open convex subset of $X$, and $V$ be an open convex subset of $Y$ such that
$$C \subset U, \quad D \subset V, \tag{6.224}$$
and let a Borel function $f : U \times V \to R^1$ possess the following properties:
(i) for each $v \in V$, the function $f(\cdot, v) : U \to R^1$ is convex and locally Lipschitzian;
(ii) for each $u \in U$, the function $f(u, \cdot) : V \to R^1$ is concave and locally Lipschitzian.
Let
$$\delta_C, \delta_D, \delta_{f,1}, \delta_{f,2} \in (0,1]. \tag{6.225}$$
For each $(\xi,\eta) \in U \times V$, set
$$\partial_x f(\xi,\eta) = \{l \in X : f(y,\eta) - f(\xi,\eta) \ge \langle l, y - \xi\rangle \text{ for all } y \in U\},$$
$$\partial_y f(\xi,\eta) = \{l \in Y : \langle l, y - \eta\rangle \ge f(\xi,y) - f(\xi,\eta) \text{ for all } y \in V\}. \tag{6.226}$$
Let $x \in U$, $y \in V$, $\xi \in X$, and $\eta \in Y$. Set
$$f^0(x,y,\xi) = \lim_{t \to 0^+} t^{-1}[f(x + t\xi, y) - f(x,y)], \tag{6.227}$$
$$f^0(x,y,\eta) = \lim_{t \to 0^+} t^{-1}[f(x, y + t\eta) - f(x,y)], \tag{6.228}$$
$$\partial_x f(x,y;\xi) = \{l \in \partial_x f(x,y) : \langle l, \xi\rangle = f^0(x,y,\xi)\}, \tag{6.229}$$
$$\partial_y f(x,y;\eta) = \{l \in \partial_y f(x,y) : \langle l, \eta\rangle = f^0(x,y,\eta)\}. \tag{6.230}$$
We study the subgradient projection algorithm for the zero-sum game defined by the triplet $(f, U, V)$ and prove in Sect. 6.16 a convergence result which is based on an auxiliary result of Sect. 6.15.
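To fix ideas, the two-player projected dynamics treated in the next section can be sketched on a one-dimensional toy game. The example is an illustration under assumptions that are not in the text: $X = Y = R$, $C = D = [-1,1]$, $f(x,y) = (x-2)^2/2 + xy - (y+2)^2/2$, whose saddle point on $C \times D$ is $(1,-1)$, exact partial derivatives and projections (no computational errors), and an explicit Euler step for $x' = P_C(x - \mu\xi) - x$, $y' = P_D(y + \mu\eta) - y$. The names `clamp` and `simulate_projected_game` are hypothetical.

```python
def clamp(v, lo=-1.0, hi=1.0):
    """Projection of a real number onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def simulate_projected_game(T=30.0, h=0.01, mu=0.5, x0=0.0, y0=0.0):
    """Euler sketch of x' = P_C(x - mu*xi) - x, y' = P_D(y + mu*eta) - y.

    Toy assumption: f(x, y) = (x - 2)^2/2 + x*y - (y + 2)^2/2 on
    C = D = [-1, 1]; xi and eta are the exact partial derivatives of f.
    """
    x, y = x0, y0
    for _ in range(int(round(T / h))):
        xi = (x - 2.0) + y     # df/dx: the first player minimizes in x
        eta = x - (y + 2.0)    # df/dy: the second player maximizes in y
        dx = clamp(x - mu * xi) - x
        dy = clamp(y + mu * eta) - y
        x, y = x + h * dx, y + h * dy
    return x, y
```

The minimizing player moves against its subgradient and the maximizing player along its own, each followed by a projection onto the player's constraint set. In this example both constraints are active at the saddle point, so the projections matter.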
6.15 An Auxiliary Result
Proposition 6.11 Let
$$T > 0, \quad x \in W^{1,1}(0,T;X), \quad y \in W^{1,1}(0,T;Y), \quad \mu > 0, \tag{6.231}$$
$$x(t) \in U,\ y(t) \in V,\ t \in [0,T], \quad d(x(0),C) \le \delta_C, \quad d(y(0),D) \le \delta_D \tag{6.232}$$
and let for almost every $t \in [0,T]$ there exist $\xi(t) \in X$, $\eta(t) \in Y$ such that
$$B_X(\xi(t), \delta_{f,1}) \cap \partial_x f(x(t),y(t)) \ne \emptyset, \tag{6.233}$$
$$P_C(x(t) - \mu\xi(t)) \in B_X(x(t) + x'(t), \delta_C), \tag{6.234}$$
$$B_Y(\eta(t), \delta_{f,2}) \cap \partial_y f(x(t),y(t)) \ne \emptyset, \tag{6.235}$$
$$P_D(y(t) + \mu\eta(t)) \in B_Y(y(t) + y'(t), \delta_D). \tag{6.236}$$
Then for almost every $t \in [0,T]$, every $z \in C$, and every $v \in D$,
$$(2\mu)^{-1}(d/dt)(\|x(t) - z\|^2) + f(x(t),y(t)) - f(z,y(t)) \le 4^{-1}\mu\|\xi(t)\|^2 + (\mu^{-1}\delta_C + \delta_{f,1})(2 + 3\|z\| + 3\|x(t)\| + 2\mu\|\xi(t)\|),$$
$$(2\mu)^{-1}(d/dt)(\|y(t) - v\|^2) + f(x(t),v) - f(x(t),y(t)) \le 4^{-1}\mu\|\eta(t)\|^2 + (\mu^{-1}\delta_D + \delta_{f,2})(2 + 3\|v\| + 3\|y(t)\| + 2\mu\|\eta(t)\|)$$
and for all $t \in [0,T]$,
$$B_X(x(t), \delta_C) \cap C \ne \emptyset, \quad B_Y(y(t), \delta_D) \cap D \ne \emptyset.$$
Proof For almost every $t \in [0,T]$ set
$$\phi_X(t) = x(t) + x'(t), \quad \phi_Y(t) = y(t) + y'(t). \tag{6.237}$$
It is clear that $\phi_X : [0,T] \to X$ and $\phi_Y : [0,T] \to Y$ are Bochner integrable functions. In view of (6.237), (6.234), and (6.236), for almost every $t \in [0,T]$,
$$B_X(\phi_X(t), \delta_C) \cap C \ne \emptyset, \tag{6.238}$$
$$B_Y(\phi_Y(t), \delta_D) \cap D \ne \emptyset. \tag{6.239}$$
Define
$$C_\delta = \{x \in X : d(x,C) \le \delta_C\}, \tag{6.240}$$
$$D_\delta = \{y \in Y : d(y,D) \le \delta_D\}. \tag{6.241}$$
Clearly, $C_\delta$ and $D_\delta$ are convex closed sets, for each $x \in C_\delta$,
$$B(x, \delta_C) \cap C \ne \emptyset, \tag{6.242}$$
for each $x \in D_\delta$,
$$B(x, \delta_D) \cap D \ne \emptyset. \tag{6.243}$$
In view of (6.238) and (6.239),
$$\phi_X(t) \in C_\delta, \quad \phi_Y(t) \in D_\delta \text{ for almost every } t \in [0,T]. \tag{6.244}$$
Lemma 6.8, (6.232), (6.237), (6.240), (6.241), and (6.244) imply that for every $t \in [0,T]$,
$$x(t) \in C_\delta, \quad y(t) \in D_\delta.$$
It follows from the relation above that for every $t \in [0,T]$ there exist $\hat{x}(t) \in C$, $\hat{y}(t) \in D$ such that
$$\|x(t) - \hat{x}(t)\| \le \delta_C, \quad \|y(t) - \hat{y}(t)\| \le \delta_D. \tag{6.245}$$
By (6.245) and Lemma 2.2, for almost every t ∈ [0, T ], x (t) − PC (x(t) − μξ(t)), x(t) − μξ(t) − PC (x(t) − μξ(t)) ≤ 0
(6.246)
and y (t) − PD (y(t) + μη(t)), y(t) + μη(t) − PD (y(t) + μη(t)) ≤ 0.
(6.247)
Relations (6.246) and (6.247) imply that for almost every t ∈ [0, T ], x(t) − PC (x(t) − μξ(t)), x(t) − μξ(t) − PC (x(t) − μξ(t)) ≤ x(t) − x (t), x(t) − μξ(t) − PC (x(t) − μξ(t))
(6.248)
and y(t) − PD (y(t) + μη(t)), y(t) + μη(t) − PD (y(t) + μη(t)) ≤ y(t) − y (t), y(t) + μη(t) − PD (y(t) + μη(t)).
(6.249)
Let z ∈ C, v ∈ D.
(6.250)
It follows from (6.250) and Lemma 2.2 that z − PC (x(t) − μξ(t)), x(t) − μξ(t) − PC (x(t) − μξ(t)) ≤ 0
(6.251)
v − PD (y(t) + μη(t)), y(t) + μη(t) − PD (y(t) + μη(t)) ≤ 0.
(6.252)
and
In view of (6.229), (6.230), (6.233), and (6.235), for almost every t ∈ [0, T ] there exists ξ (t) ∈ ∂x f (x(t), y(t))
(6.253)
ξ (t), x (t), f 0 (x(t), y(t), x (t) ≥
(6.254)
ξ(t) − ξ (t) ≤ δf,1
(6.255)
such that
and η(t) ∈ ∂y f (x(t), y(t))
(6.256)
η(t), y (t), f 0 (x(t), y(t), y (t)) ≤
(6.257)
η(t) − η(t) ≤ δf,2 .
(6.258)
such that
In view of (6.226), (6.253), and (6.256), for almost every t ∈ [0, T ], f (z, y(t)) ≥ f (x(t), y(t)) + ξ (t), z − x(t), f (x(t), v) − f (x(t), y(t)) ≤ η(t), v − y(t) and f (x(t), y(t)) − f (z, y(t)) ξ (t), x (t) ≤ ξ (t), x(t) − z + x (t) −
(6.259)
f (x(t), y(t)) − f (x(t), v) η(t), y (t). ≥ η(t), y(t) − v + y (t) −
(6.260)
By (6.234) and (6.236), for almost every t ∈ [0, T ], (x(t) + x (t)) − PC (x(t) − μξ(t)) ≤ δC
(6.261)
and (y(t) + y (t)) − PD (y(t) + μη(t)) ≤ δD .
(6.262)
Relations (6.234), (6.237), (6.245), (6.261), and (6.262) imply that for almost every t ∈ [0, T ], z − x(t) − x (t), x(t) − μξ(t) − x(t) − x (t) = z − φX (t), x(t) − μξ(t) − φX (t) = z − φX (t), x(t) − μξ(t) − PC (x(t) − μξ(t)) +z − φX (t), PC (x(t) − μξ(t)) − φX (t)
≤ z − φX (t), x(t) − μξ(t) − PC (x(t) − μξ(t)) + δC z − φX (t)
(6.263)
and v − y(t) − y (t), y(t) + μη(t) − y(t) − y (t) = v − φY (t), y(t) + μη(t) − φY (t) = v − φY (t), y(t) + μη(t) − PC (y(t) + μη(t)) +v − φY (t), PD (y(t) + μη(t)) − φY (t) ≤ v − φY (t), y(t) + μη(t) − PD (y(t) + μη(t)) + δD v − φY (t).
(6.264)
By (6.237), (6.250)–(6.252), (6.261), and (6.262), z − φX (t), x(t) − μξ(t) − PC (x(t) − μξ(t)) ≤ z − PC (x(t) − μξ(t)), x(t) − μξ(t) − PC (x(t) − μξ(t)) +PC (x(t) − μξ(t)) − φX (t), x(t) − μξ(t) − PC (x(t) − μξ(t)) ≤ z − PC (x(t) − μξ(t)), x(t) − μξ(t) − PC (x(t) − μξ(t)) +δC x(t) − μξ(t) − z ≤ z − PC (x(t) − μξ(t)), x(t) − μξ(t) − PC (x(t) − μξ(t)) +δC (x(t) + μξ(t) + z) ≤ δC (x(t) + μξ(t) + z) and v − φY (t), y(t) + μη(t) − PD (y(t) + μη(t)) ≤ v − PD (y(t) + μη(t)), y(t) + μη(t) − PD (y(t) + μη(t)) +PD (y(t) + μη(t)) − φY (t), y(t) + μη(t) − PD (y(t) + μη(t)) ≤ v − PD (y(t) + μη(t)), y(t) + μη(t) − PD (y(t) + μη(t))
(6.265)
+δD y(t) + μη(t) − v ≤ v − PD (y(t) + μη(t)), y(t) + μη(t) − PD (y(t) + μη(t)) +δD (y(t) + μη(t) + v) ≤ δD (y(t) + μη(t) + v).
(6.266)
Lemma 2.2, (6.237), (6.250), and (6.261)–(6.266) imply that for almost every t ∈ [0, T ], x(t) + x (t) − z, μξ(t) + x (t) ≤ z − φX (t), x(t) − μξ(t) − PC (x(t) − μξ(t))) +δC z − φX (t) ≤ δC (z − x(t) − x (t) + x(t) + μξ(t) + z) ≤ δC (1 + z + x(t) + μξ(t) + x(t) + μξ(t) + z)
(6.267)
and y(t) + y (t) − v, −μη(t) + y (t) ≤ v − φY (t), y(t) + μη(t) − PD (y(t) + μη(t))) +δC v − φY (t) ≤ δD (v − y(t) − y (t) + y(t) + μη(t) + v) ≤ δD (1 + v + y(t) + μη(t) + y(t) + μη(t) + v).
(6.268)
It follows from (6.163), (6.250), (6.253), (6.254), (6.256), (6.258), (6.261), and Lemma 2.2 that for almost every t ∈ [0, T ], f (x(t), y(t)) − f (z, y(t)) ≤ ξ (t), x(t) − z = ξ(t), x(t) − z + x (t) − ξ(t), x (t) + ξ (t) − ξ(t), x(t) − z + x (t) +ξ(t) − ξ (t), x (t)
≤ ξ(t), x(t) − z + x (t) − ξ(t), x (t) +δf,1 x(t) − z + 2x (t) ≤ ξ(t), x(t) − z + x (t) − ξ(t), x (t) +δf,1 (2x(t) − z + x (t) + z + x(t)) ≤ ξ(t), x(t) − z + x (t) − ξ(t), x (t) + δf,1 (z + x(t) + 2 + 2(z + x(t) + μξ(t)))
(6.269)
and f (x(t), y(t)) − f (x(t), v) ≥ η(t), y(t) − v = η(t), y(t) − v + y (t) − η(t), y (t) + η(t) − η(t), y(t) − v + y (t) η(t) − η(t), y (t) ≥ η(t), y(t) − v + y (t) − η(t), y (t) −δf,2 y(t) − v + 2y (t) ≥ η(t), y(t) − v + y (t) − η(t), y (t) −δf,2 (2y(t) − v + y (t) + v + y(t)) ≥ η(t), y(t) − v + y (t) − η(t), y (t) − δf,2 (2 + 2(y(t) + μη(t) + v) + v + y(t)). In view of (6.267), for almost every t ∈ [0, T ], ξ(t), x(t) − z + x (t) ≤ μ−1 −x (t), x(t) − z + x (t) +ξ(t) + μ−1 x (t), x(t) − z + x (t) ≤ μ−1 −x (t), x(t) − z + x (t)
(6.270)
+ μ−1 δC (1 + 2z + 2x(t) + 2μξ(t)).
(6.271)
In view of (6.268), for almost every t ∈ [0, T ], η(t), y(t) − v + y (t) = μ−1 y (t), y(t) − v + y (t) +η(t) − μ−1 y (t), y(t) − v + y (t) ≥ μ−1 y (t), y(t) − v + y (t) − μ−1 δD (1 + 2v + 2y(t) + 2μη(t)).
(6.272)
Relations (6.162), (6.163), (6.269), and (6.271) imply that for almost every t ∈ [0, T ], f (x(t), y(t)) − f (z, y(t)) ≤ μ−1 −x (t), x(t) + x (t) − z −ξ(t), x (t) +μ−1 δC (1 + 2z + 2x(t) + 2μξ(t)) + δf,1 (3z + x(t) + 2 + 2μξ(t)).
(6.273)
By (6.270) and (6.272),
$$f(x(t),y(t)) - f(x(t),v) \ge \mu^{-1}\langle y'(t), y(t) + y'(t) - v\rangle - \langle \eta(t), y'(t)\rangle - \delta_{f,2}(2 + 3\|v\| + 3\|y(t)\| + 2\mu\|\eta(t)\|) - \mu^{-1}\delta_D(1 + 2\|v\| + 2\|y(t)\| + 2\mu\|\eta(t)\|). \tag{6.274}$$
In view of (6.273) and (6.274), for almost every $t \in [0,T]$,
$$\mu^{-1}\|x'(t)\|^2 + \mu^{-1}\langle x'(t), x(t) - z\rangle + \langle \xi(t), x'(t)\rangle + f(x(t),y(t)) - f(z,y(t)) \le (\mu^{-1}\delta_C + \delta_{f,1})(2 + 3\|z\| + 3\|x(t)\| + 2\mu\|\xi(t)\|) \tag{6.275}$$
and
$$\mu^{-1}\|y'(t)\|^2 + \mu^{-1}\langle y'(t), y(t) - v\rangle - \langle \eta(t), y'(t)\rangle + f(x(t),v) - f(x(t),y(t)) \le (\mu^{-1}\delta_D + \delta_{f,2})(2 + 3\|v\| + 3\|y(t)\| + 2\mu\|\eta(t)\|). \tag{6.276}$$
Corollary 6.4, (6.275), and (6.276) imply that for almost every $t \in [0,T]$,
$$(2\mu)^{-1}(d/dt)(\|x(t)-z\|^2) + f(x(t),y(t)) - f(z,y(t))$$
$$\le \mu^{-1}\langle x'(t), x(t) - z\rangle + f(x(t),y(t)) - f(z,y(t)) + \mu^{-1}\|x'(t) + 2^{-1}\mu\xi(t)\|^2$$
$$\le \mu^{-1}\langle x'(t), x(t) - z\rangle + f(x(t),y(t)) - f(z,y(t)) + \mu^{-1}\|x'(t)\|^2 + \langle x'(t), \xi(t)\rangle + 4^{-1}\mu\|\xi(t)\|^2$$
$$\le 4^{-1}\mu\|\xi(t)\|^2 + (\mu^{-1}\delta_C + \delta_{f,1})(2 + 3\|z\| + 3\|x(t)\| + 2\mu\|\xi(t)\|)$$
and
$$(2\mu)^{-1}(d/dt)(\|y(t)-v\|^2) + f(x(t),v) - f(x(t),y(t))$$
$$\le \mu^{-1}\langle y'(t), y(t) - v\rangle + f(x(t),v) - f(x(t),y(t)) + \mu^{-1}\|y'(t) - 2^{-1}\mu\eta(t)\|^2$$
$$= \mu^{-1}\langle y'(t), y(t) - v\rangle + f(x(t),v) - f(x(t),y(t)) + \mu^{-1}\|y'(t)\|^2 - \langle y'(t), \eta(t)\rangle + 4^{-1}\mu\|\eta(t)\|^2$$
$$\le 4^{-1}\mu\|\eta(t)\|^2 + (\mu^{-1}\delta_D + \delta_{f,2})(2 + 3\|v\| + 3\|y(t)\| + 2\mu\|\eta(t)\|).$$
Proposition 6.11 is proved.
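The last step of the proof of Proposition 6.11 rests on expanding a squared norm. As a reading aid (this display is not part of the original text), the two expansions can be written out as follows, where $x'(t)$ and $y'(t)$ denote the derivatives of $x$ and $y$; the cross term changes sign in the second identity because $\eta(t)$ enters with a minus sign.

```latex
\mu^{-1}\|x'(t) + 2^{-1}\mu\xi(t)\|^2
  = \mu^{-1}\|x'(t)\|^2 + \langle x'(t), \xi(t)\rangle + 4^{-1}\mu\|\xi(t)\|^2,
\qquad
\mu^{-1}\|y'(t) - 2^{-1}\mu\eta(t)\|^2
  = \mu^{-1}\|y'(t)\|^2 - \langle y'(t), \eta(t)\rangle + 4^{-1}\mu\|\eta(t)\|^2.
```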
6.16 A Convergence Result for Games on Bounded Sets
Suppose that $L_1, L_2 \ge 1$, $M_1, M_2 > 1$,
$$C \subset B_X(0, M_1 - 1), \quad D \subset B_Y(0, M_2 - 1), \tag{6.277}$$
$$\{x \in X : d(x,C) \le 1\} \subset U, \quad \{y \in Y : d(y,D) \le 1\} \subset V$$
and that the function $f : U \times V \to R^1$ is continuous and has the following properties:
(iii) for each $v \in V \cap B_Y(0, M_2 + 1)$, $|f(u_1,v) - f(u_2,v)| \le L_1\|u_1 - u_2\|$ for all $u_1, u_2 \in U \cap B_X(0, M_1 + 1)$;
(iv) for each $u \in U \cap B_X(0, M_1 + 1)$, $|f(u,v_1) - f(u,v_2)| \le L_2\|v_1 - v_2\|$ for all $v_1, v_2 \in V \cap B_Y(0, M_2 + 1)$.
Let
$$x_* \in C \ \text{ and } \ y_* \in D \tag{6.278}$$
satisfy
$$f(x_*, y) \le f(x_*, y_*) \le f(x, y_*) \tag{6.279}$$
for each $x \in C$ and each $y \in D$. In this section we prove the following result.
Theorem 6.12 Let $M, T > 0$, $x \in W^{1,1}(0,T;X)$, $y \in W^{1,1}(0,T;Y)$, $x(t) \in U$, $y(t) \in V$, $t \in [0,T]$,
$$d(x(0),C) \le \delta_C, \quad d(y(0),D) \le \delta_D \tag{6.280}$$
and for almost every $t \in [0,T]$ there exist $\xi(t) \in X$, $\eta(t) \in Y$ such that
$$B_X(\xi(t),\delta_{f,1}) \cap \partial_x f(x(t),y(t)) \ne \emptyset, \tag{6.281}$$
$$P_C(x(t) - \mu\xi(t)) \in B_X(x(t) + x'(t), \delta_C), \tag{6.282}$$
$$B_Y(\eta(t),\delta_{f,2}) \cap \partial_y f(x(t),y(t)) \ne \emptyset, \tag{6.283}$$
$$P_D(y(t) + \mu\eta(t)) \in B_Y(y(t) + y'(t), \delta_D). \tag{6.284}$$
Then for almost every $t \in [0,T]$,
$$B_X(x(t),\delta_C) \cap C \ne \emptyset, \quad B_Y(y(t),\delta_D) \cap D \ne \emptyset,$$
$$B_X\Big(T^{-1}\int_0^T x(t)\,dt, \delta_C\Big) \cap C \ne \emptyset, \quad B_Y\Big(T^{-1}\int_0^T y(t)\,dt, \delta_D\Big) \cap D \ne \emptyset,$$
$$\Big|T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*)\Big|$$
$$\le \max\{6M_1\mu^{-1}\delta_C + 2\delta_C(L_1+1) + 6\delta_{f,1}M_1 + 2\mu\delta_{f,1}(L_1+1) + 4^{-1}\mu(L_1+1)^2 + (2\mu T)^{-1}4M_1^2,$$
$$4(2T\mu)^{-1}M_2^2 + 6M_2\mu^{-1}\delta_D + 2\delta_D(L_2+1) + 6\delta_{f,2}M_2 + 2\mu\delta_{f,2}(L_2+1) + 4^{-1}\mu(L_2+1)^2\},$$
$$\Big|f\Big(T^{-1}\int_0^T x(t)\,dt, T^{-1}\int_0^T y(t)\,dt\Big) - T^{-1}\int_0^T f(x(t),y(t))\,dt\Big|$$
$$\le \max\{6M_1\mu^{-1}\delta_C + \delta_C(3L_1+1) + 6\delta_{f,1}M_1 + 2\mu\delta_{f,1}(L_1+1) + 4^{-1}\mu(L_1+1)^2 + (2\mu T)^{-1}4M_1^2,$$
$$4(2T\mu)^{-1}M_2^2 + 6M_2\mu^{-1}\delta_D + \delta_D(3L_2+1) + 6\delta_{f,2}M_2 + 2\mu\delta_{f,2}(L_2+1) + 4^{-1}\mu(L_2+1)^2\},$$
and for every $z \in C$ and every $v \in D$,
$$f\Big(z, T^{-1}\int_0^T y(t)\,dt\Big) \ge f\Big(T^{-1}\int_0^T x(t)\,dt, T^{-1}\int_0^T y(t)\,dt\Big)$$
$$- 2\max\{6M_1\mu^{-1}\delta_C + \delta_C(3L_1+1) + 6\delta_{f,1}M_1 + 2\mu\delta_{f,1}(L_1+1) + 4^{-1}\mu(L_1+1)^2 + (2\mu T)^{-1}4M_1^2,$$
$$(2T\mu)^{-1}4M_2^2 + 6M_2\mu^{-1}\delta_D + \delta_D(3L_2+1) + 6\delta_{f,2}M_2 + 2\mu\delta_{f,2}(L_2+1) + 4^{-1}\mu(L_2+1)^2\},$$
$$f\Big(T^{-1}\int_0^T x(t)\,dt, v\Big) \le f\Big(T^{-1}\int_0^T x(t)\,dt, T^{-1}\int_0^T y(t)\,dt\Big)$$
$$+ 2\max\{6M_1\mu^{-1}\delta_C + \delta_C(3L_1+2) + 6\delta_{f,1}M_1 + 4^{-1}\mu(L_1+1)^2 + (2\mu T)^{-1}4M_1^2,$$
$$4(2T\mu)^{-1}M_2^2 + 6M_2\mu^{-1}\delta_D + \delta_D(3L_2+1) + 6\delta_{f,2}M_2 + 2\mu\delta_{f,2}(L_2+1) + 4^{-1}\mu(L_2+1)^2\}.$$
Proof Let
$$z \in C, \quad v \in D. \tag{6.285}$$
Proposition 6.11 implies that for almost every $t \in [0,T]$,
$$(2\mu)^{-1}(d/dt)(\|x(t)-z\|^2) + f(x(t),y(t)) - f(z,y(t)) \le 4^{-1}\mu\|\xi(t)\|^2 + (\mu^{-1}\delta_C + \delta_{f,1})(2 + 3\|z\| + 3\|x(t)\| + 2\mu\|\xi(t)\|), \tag{6.286}$$
$$(2\mu)^{-1}(d/dt)(\|y(t)-v\|^2) + f(x(t),v) - f(x(t),y(t)) \le 4^{-1}\mu\|\eta(t)\|^2 + (\mu^{-1}\delta_D + \delta_{f,2})(2 + 3\|v\| + 3\|y(t)\| + 2\mu\|\eta(t)\|) \tag{6.287}$$
and that for all $t \in [0,T]$,
$$B_X(x(t),\delta_C) \cap C \ne \emptyset, \quad B_Y(y(t),\delta_D) \cap D \ne \emptyset. \tag{6.288}$$
In view of (6.225), (6.277), and (6.288), for all $t \in [0,T]$,
$$\|x(t)\| \le M_1, \quad \|y(t)\| \le M_2. \tag{6.289}$$
Properties (iii) and (iv), (6.281), (6.283), and (6.289) imply that for almost every $t \in [0,T]$,
$$\|\xi(t)\| \le L_1 + 1, \quad \|\eta(t)\| \le L_2 + 1. \tag{6.290}$$
By (6.277), (6.285)–(6.287), (6.289), and (6.290), for almost every $t \in [0,T]$,
$$(2\mu)^{-1}(d/dt)(\|x(t)-z\|^2) + f(x(t),y(t)) - f(z,y(t))$$
$$\le 4^{-1}\mu(L_1+1)^2 + (\mu^{-1}\delta_C + \delta_{f,1})(2 + 3(M_1 - 1) + 3M_1 + 2\mu(L_1+1))$$
$$\le 4^{-1}\mu(L_1+1)^2 + (\mu^{-1}\delta_C + \delta_{f,1})(6M_1 + 2\mu(L_1+1))$$
$$= 6M_1\mu^{-1}\delta_C + 2\delta_C(L_1+1) + 6M_1\delta_{f,1} + 2\mu\delta_{f,1}(L_1+1) + 4^{-1}\mu(L_1+1)^2 \tag{6.291}$$
and
$$(2\mu)^{-1}(d/dt)(\|y(t)-v\|^2) + f(x(t),v) - f(x(t),y(t))$$
$$\le 4^{-1}\mu(L_2+1)^2 + (\mu^{-1}\delta_D + \delta_{f,2})(2 + 3(M_2 - 1) + 3M_2 + 2\mu(L_2+1))$$
$$\le 4^{-1}\mu(L_2+1)^2 + (\mu^{-1}\delta_D + \delta_{f,2})(6M_2 + 2\mu(L_2+1))$$
$$= 6M_2\mu^{-1}\delta_D + 2\delta_D(L_2+1) + 6M_2\delta_{f,2} + 2\mu\delta_{f,2}(L_2+1) + 4^{-1}\mu(L_2+1)^2. \tag{6.292}$$
By integrating (6.291) and (6.292) we obtain that
$$T^{-1}((2\mu)^{-1}\|x(T)-z\|^2 - (2\mu)^{-1}\|x(0)-z\|^2) + T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(z,y(t))\,dt$$
$$\le 6M_1\mu^{-1}\delta_C + 2\delta_C(L_1+1) + 6M_1\delta_{f,1} + 2\mu\delta_{f,1}(L_1+1) + 4^{-1}\mu(L_1+1)^2 \tag{6.293}$$
and
$$T^{-1}((2\mu)^{-1}\|y(T)-v\|^2 - (2\mu)^{-1}\|y(0)-v\|^2) + T^{-1}\int_0^T f(x(t),v)\,dt - T^{-1}\int_0^T f(x(t),y(t))\,dt$$
$$\le 6M_2\mu^{-1}\delta_D + 2\delta_D(L_2+1) + 6M_2\delta_{f,2} + 2\mu\delta_{f,2}(L_2+1) + 4^{-1}\mu(L_2+1)^2. \tag{6.294}$$
It follows from (6.277), (6.280), (6.285), (6.293), and (6.294) that
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(z,y(t))\,dt$$
$$\le (2\mu T)^{-1}4M_1^2 + 6M_1\mu^{-1}\delta_C + 2\delta_C(L_1+1) + 6M_1\delta_{f,1} + 2\mu\delta_{f,1}(L_1+1) + 4^{-1}\mu(L_1+1)^2 \tag{6.295}$$
and
$$T^{-1}\int_0^T f(x(t),v)\,dt - T^{-1}\int_0^T f(x(t),y(t))\,dt$$
$$\le (2\mu T)^{-1}4M_2^2 + 6M_2\mu^{-1}\delta_D + 2\delta_D(L_2+1) + 6M_2\delta_{f,2} + 2\mu\delta_{f,2}(L_2+1) + 4^{-1}\mu(L_2+1)^2. \tag{6.296}$$
Set
$$\Delta_1 = (2\mu T)^{-1}4M_1^2 + 6M_1\mu^{-1}\delta_C + 2\delta_C(L_1+1) + 6M_1\delta_{f,1} + 2\mu\delta_{f,1}(L_1+1) + 4^{-1}\mu(L_1+1)^2 \tag{6.297}$$
and
$$\Delta_2 = (2\mu T)^{-1}4M_2^2 + 6M_2\mu^{-1}\delta_D + 2\delta_D(L_2+1) + 6M_2\delta_{f,2} + 2\mu\delta_{f,2}(L_2+1) + 4^{-1}\mu(L_2+1)^2. \tag{6.298}$$
By (6.278), (6.279), and (6.295)–(6.298),
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*) \ge T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x(t),y_*)\,dt \ge -\Delta_2$$
and
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*) \le T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x_*,y(t))\,dt \le \Delta_1.$$
The relations above imply that
$$\Big|T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*)\Big| \le \max\{\Delta_1, \Delta_2\}.$$
Clearly, the functions $P_C(x(t))$, $t \in [0,T]$ and $P_D(y(t))$, $t \in [0,T]$ are Bochner integrable. Set
$$\hat{x} = T^{-1}\int_0^T P_C(x(t))\,dt, \quad \hat{y} = T^{-1}\int_0^T P_D(y(t))\,dt, \tag{6.299}$$
$$x_T = T^{-1}\int_0^T x(s)\,ds, \quad y_T = T^{-1}\int_0^T y(s)\,ds. \tag{6.300}$$
In view of (6.299),
$$\hat{x} \in C, \quad \hat{y} \in D. \tag{6.301}$$
By (6.288) and (6.299),
$$\Big\|\hat{x} - T^{-1}\int_0^T x(s)\,ds\Big\| \le \delta_C, \quad \Big\|\hat{y} - T^{-1}\int_0^T y(s)\,ds\Big\| \le \delta_D. \tag{6.302}$$
It follows from (6.295) to (6.298) and (6.301) that
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(\hat{x},y(t))\,dt \le \Delta_1 \tag{6.303}$$
and
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x(t),\hat{y})\,dt \ge -\Delta_2. \tag{6.304}$$
Properties (iii) and (iv), (6.299), and (6.302) imply that for all $t \in [0,T]$,
$$\Big|f(\hat{x},y(t)) - f\Big(T^{-1}\int_0^T x(s)\,ds, y(t)\Big)\Big| \le L_1\delta_C, \tag{6.305}$$
$$\Big|f(x(t),\hat{y}) - f\Big(x(t), T^{-1}\int_0^T y(s)\,ds\Big)\Big| \le L_2\delta_D. \tag{6.306}$$
In view of (6.300), (6.305), and (6.306),
$$T^{-1}\Big|\int_0^T f(\hat{x},y(t))\,dt - \int_0^T f(x_T,y(t))\,dt\Big| \le L_1\delta_C, \tag{6.307}$$
$$T^{-1}\Big|\int_0^T f(x(t),\hat{y})\,dt - \int_0^T f(x(t),y_T)\,dt\Big| \le L_2\delta_D. \tag{6.308}$$
By (6.303), (6.304), (6.307), and (6.308),
$$T^{-1}\int_0^T f(x(t),y(t))\,dt \le \Delta_1 + T^{-1}\int_0^T f(\hat{x},y(t))\,dt \le \Delta_1 + T^{-1}\int_0^T f(x_T,y(t))\,dt + L_1\delta_C \tag{6.309}$$
and
$$T^{-1}\int_0^T f(x(t),y(t))\,dt \ge -\Delta_2 + T^{-1}\int_0^T f(x(t),\hat{y})\,dt \ge -\Delta_2 - L_2\delta_D + T^{-1}\int_0^T f(x(t),y_T)\,dt. \tag{6.310}$$
Properties (i) and (ii), (6.300), (6.309), and (6.310) imply that
$$f(x_T,y_T) - T^{-1}\int_0^T f(x(t),y(t))\,dt = f\Big(T^{-1}\int_0^T x(t)\,dt, y_T\Big) - T^{-1}\int_0^T f(x(t),y(t))\,dt$$
$$\le T^{-1}\int_0^T f(x(t),y_T)\,dt - T^{-1}\int_0^T f(x(t),y(t))\,dt \le \Delta_2 + L_2\delta_D,$$
$$f(x_T,y_T) - T^{-1}\int_0^T f(x(t),y(t))\,dt = f\Big(x_T, T^{-1}\int_0^T y(t)\,dt\Big) - T^{-1}\int_0^T f(x(t),y(t))\,dt$$
$$\ge T^{-1}\int_0^T f(x_T,y(t))\,dt - T^{-1}\int_0^T f(x(t),y(t))\,dt \ge -\Delta_1 - L_1\delta_C$$
and
$$\Big|f(x_T,y_T) - T^{-1}\int_0^T f(x(t),y(t))\,dt\Big| \le \max\{\Delta_1 + L_1\delta_C, \Delta_2 + L_2\delta_D\}. \tag{6.311}$$
Let
$$z \in C, \quad v \in D. \tag{6.312}$$
By (6.295)–(6.298),
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(z,y(t))\,dt \le \Delta_1 \tag{6.313}$$
and
$$T^{-1}\int_0^T f(x(t),v)\,dt - T^{-1}\int_0^T f(x(t),y(t))\,dt \le \Delta_2. \tag{6.314}$$
Properties (i) and (ii) and (6.300) imply that
$$T^{-1}\int_0^T f(z,y(t))\,dt \le f\Big(z, T^{-1}\int_0^T y(t)\,dt\Big) = f(z,y_T) \tag{6.315}$$
and
$$T^{-1}\int_0^T f(x(t),v)\,dt \ge f\Big(T^{-1}\int_0^T x(t)\,dt, v\Big) = f(x_T,v). \tag{6.316}$$
By (6.313) and (6.315),
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - f(z,y_T) \le T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(z,y(t))\,dt \le \Delta_1.$$
Together with (6.311) this implies that
$$f(z,y_T) \ge T^{-1}\int_0^T f(x(t),y(t))\,dt - \Delta_1 \ge f(x_T,y_T) - \Delta_1 - \max\{\Delta_1 + L_1\delta_C, \Delta_2 + L_2\delta_D\}.$$
It follows from (6.314) and (6.316) that
$$T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_T,v) \ge T^{-1}\int_0^T f(x(t),y(t))\,dt - T^{-1}\int_0^T f(x(t),v)\,dt \ge -\Delta_2.$$
Together with (6.311) this implies that
$$f(x_T,v) \le T^{-1}\int_0^T f(x(t),y(t))\,dt + \Delta_2 \le f(x_T,y_T) + \Delta_2 + \max\{\Delta_1 + L_1\delta_C, \Delta_2 + L_2\delta_D\}.$$
This completes the proof of Theorem 6.12.
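To make the setting of Theorem 6.12 concrete, the following is a minimal numerical sketch, not taken from the text. It assumes exact data ($\delta_C = \delta_D = \delta_{f,1} = \delta_{f,2} = 0$), a hypothetical test function $f(x,y) = x^2/2 + xy - y^2/2$ on $C = D = [-1,1]$ (so $M_1 = M_2 = 2$, saddle point $(0,0)$, $f(x_*,y_*) = 0$), and an explicit Euler discretization of the dynamics $x'(t) = P_C(x(t) - \mu\xi(t)) - x(t)$, $y'(t) = P_D(y(t) + \mu\eta(t)) - y(t)$ suggested by (6.282) and (6.284). The step size `dt`, the starting point, and the Lipschitz constant `L1 = 6` are choices made only for this illustration.

```python
import numpy as np

def proj_interval(u, lo=-1.0, hi=1.0):
    """Projection onto C = D = [lo, hi] (assumed example sets)."""
    return float(np.clip(u, lo, hi))

def f(x, y):
    """Hypothetical test function: convex in x, concave in y, saddle point (0, 0)."""
    return 0.5 * x * x + x * y - 0.5 * y * y

def simulate(mu=0.5, T=40.0, dt=0.01, x0=0.8, y0=-0.6):
    """Euler discretization (an assumption) of the error-free dynamics."""
    n = int(round(T / dt))
    x, y = x0, y0
    xs, ys, fs = np.empty(n), np.empty(n), np.empty(n)
    for k in range(n):
        xs[k], ys[k], fs[k] = x, y, f(x, y)
        xi = x + y    # partial derivative of f in x (exact, delta_{f,1} = 0)
        eta = x - y   # partial derivative of f in y (exact, delta_{f,2} = 0)
        dx = proj_interval(x - mu * xi) - x
        dy = proj_interval(y + mu * eta) - y
        x, y = x + dt * dx, y + dt * dy
    # time averages approximate T^{-1} times the integrals over [0, T]
    return xs.mean(), ys.mean(), fs.mean()

mu, T, M1, L1 = 0.5, 40.0, 2.0, 6.0
x_T, y_T, f_avg = simulate(mu=mu, T=T)
# Delta_1 of (6.297) with all error terms set to zero
delta_1 = (2 * mu * T) ** -1 * 4 * M1 ** 2 + 0.25 * mu * (L1 + 1) ** 2
print(x_T, y_T, f_avg, delta_1)
```

In this toy run the averaged value `f_avg` stays well inside the bound `delta_1`, which is quite loose here because the example is strongly convex-concave while the theorem only assumes Lipschitz continuity on bounded sets.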
We are interested in making the best choice of the parameter $\mu$. In view of Theorem 6.12, we need to minimize the function
$$\max\{6M_1\mu^{-1}\delta_C + 2\mu(L_1+1)^2,\ 6M_2\mu^{-1}\delta_D + 2\mu(L_2+1)^2\}.$$
It is not difficult to see that the best choice of $\mu$ is $c_1\max\{\delta_C,\delta_D\}^{1/2}$, where the constant $c_1$ depends on $L_1, M_1, L_2, M_2$. Then the minimal value of the function above is $c_2\max\{\delta_C,\delta_D\}^{1/2}$, where the constant $c_2$ depends on $L_1, M_1, L_2, M_2$. The best choice of $T$ is $c_3\max\{\delta_C,\delta_D\}^{1/2}$, where the constant $c_3$ also depends on $L_1, M_1, L_2, M_2$.
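The square-root scaling of the best $\mu$ can be checked numerically. The sketch below is only an illustration under assumed constants ($M_1 = M_2 = 2$, $L_1 = L_2 = 1$, and $\delta_C = \delta_D$); the helper names `bound` and `best_mu` are hypothetical. It minimizes the function $\max\{6M_1\mu^{-1}\delta_C + 2\mu(L_1+1)^2, 6M_2\mu^{-1}\delta_D + 2\mu(L_2+1)^2\}$ over a logarithmic grid and compares the minimizer with the closed form $(6M_1\delta_C / (2(L_1+1)^2))^{1/2}$ that holds in this symmetric case.

```python
import math

def bound(mu, delta_C, delta_D, M1=2.0, M2=2.0, L1=1.0, L2=1.0):
    """The function of mu that the text proposes to minimize."""
    return max(6 * M1 * delta_C / mu + 2 * mu * (L1 + 1) ** 2,
               6 * M2 * delta_D / mu + 2 * mu * (L2 + 1) ** 2)

def best_mu(delta_C, delta_D):
    """Grid search over mu in [1e-8, 1] on a logarithmic grid (illustrative only)."""
    grid = [10 ** (-8 + 8 * i / 4000) for i in range(4001)]
    return min(grid, key=lambda mu: bound(mu, delta_C, delta_D))

mu_a = best_mu(1e-4, 1e-4)
mu_b = best_mu(1e-6, 1e-6)
val_a = bound(mu_a, 1e-4, 1e-4)
val_b = bound(mu_b, 1e-6, 1e-6)
# shrinking the errors by a factor 100 shrinks both mu and the bound by about 10
print(mu_a / mu_b, val_a / val_b)
```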
6.17 A Convergence Result for Games on Unbounded Sets
We continue to use the notation and definitions of Section 14. Let
$$x_* \in C, \quad y_* \in D \tag{6.317}$$
and
$$f(x_*, v) \le f(x_*, y_*) \le f(x, y_*) \tag{6.318}$$
for all $x \in C$ and all $v \in D$. Let
$$M_0 \ge 1, \quad D \subset B_Y(0, M_0 - 1), \tag{6.319}$$
$$M > 8, \quad \|x_*\| \le M/4, \tag{6.320}$$
for each $y \in V \cap B_Y(0, M_0)$,
$$\{x \in U : f(x,y) \le \sup\{f(x_*,v) : v \in V \cap B_Y(0, M_0 + 1)\} + 4\} \subset B_X(0, M/4), \tag{6.321}$$
$$\tilde{M} \ge 4M, \quad L_1, L_2 \ge 1 \tag{6.322}$$
and the continuous function $f : U \times V \to R^1$ has the following properties:
(v) for each $v \in V \cap B_Y(0, M_0 + 1)$, $|f(u_1,v) - f(u_2,v)| \le L_1\|u_1 - u_2\|$ for all $u_1, u_2 \in U \cap B_X(0, \tilde{M} + 1)$;
(vi) for each $u \in B_X(0, \tilde{M} + 1)$, $|f(u,v_1) - f(u,v_2)| \le L_2\|v_1 - v_2\|$ for all $v_1, v_2 \in B_Y(0, M_0 + 1)$;
(vii) $\{x \in X : d(x,C) \le 1\} \subset U$, $\{y \in Y : d(y,D) \le 1\} \subset V$.
In this section we prove the following result.
Theorem 6.13 Let $\mu \in (0,1]$ satisfy
$$\mu(L_1+1)^2 \le 1, \quad (\mu^{-1}\delta_C + \delta_{f,1})(4\tilde{M} + 2\mu(L_1+1)) \le 1, \tag{6.323}$$
$T > 0$, $x \in W^{1,1}(0,T;X)$, $y \in W^{1,1}(0,T;Y)$, $x(t) \in U$, $y(t) \in V$, $t \in [0,T]$,
$$d(x(0),C) \le \delta_C, \quad d(y(0),D) \le \delta_D, \tag{6.324}$$
$$\|x(0)\| \le M, \tag{6.325}$$
and let for almost every $t \in [0,T]$ there exist $\xi(t) \in X$, $\eta(t) \in Y$ such that
$$B_X(\xi(t),\delta_{f,1}) \cap \partial_x f(x(t),y(t)) \ne \emptyset, \tag{6.326}$$
$$P_C(x(t) - \mu\xi(t)) \in B_X(x(t) + x'(t), \delta_C), \tag{6.327}$$
$$B_Y(\eta(t),\delta_{f,2}) \cap \partial_y f(x(t),y(t)) \ne \emptyset, \tag{6.328}$$
$$P_D(y(t) + \mu\eta(t)) \in B_Y(y(t) + y'(t), \delta_D). \tag{6.329}$$
Let
$$\Delta_1 = 12\tilde{M}\mu^{-1}\delta_C + 4\delta_C(L_1+1) + 12\delta_{f,1}\tilde{M} + 2^{-1}\mu(L_1+1)^2 + (2\mu T)^{-1}8\tilde{M}^2 + 4\mu\delta_{f,1}(L_1+1), \tag{6.330}$$
$$\Delta_2 = 12M_0\mu^{-1}\delta_D + 4\delta_D(L_2+1) + 12\delta_{f,2}M_0 + 2^{-1}\mu(L_2+1)^2 + (2\mu T)^{-1}8M_0^2 + 2\mu\delta_{f,2}(L_2+1),$$
$$x_T = T^{-1}\int_0^T x(t)\,dt, \quad y_T = T^{-1}\int_0^T y(t)\,dt. \tag{6.331}$$
Then for almost every $t \in [0,T]$,
$$B_X(x(t),\delta_C) \cap C \ne \emptyset, \quad B_Y(y(t),\delta_D) \cap D \ne \emptyset,$$
$$B_X(x_T,\delta_C) \cap C \ne \emptyset, \quad B_Y(y_T,\delta_D) \cap D \ne \emptyset, \quad x_T \in U, \quad y_T \in V,$$
$$\Big|T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*)\Big| \le \max\{\Delta_1,\Delta_2\},$$
$$\Big|f(x_T,y_T) - T^{-1}\int_0^T f(x(t),y(t))\,dt\Big| \le \max\{\Delta_1,\Delta_2\},$$
and for every $z \in C$ and every $v \in D$,
$$f(z,y_T) \ge f(x_T,y_T) - \max\{\Delta_1,\Delta_2\}, \quad f(x_T,v) \le f(x_T,y_T) + \max\{\Delta_1,\Delta_2\}.$$
Proof Proposition 6.11 implies that for every $z \in C$, every $v \in D$, and almost every $t \in [0,T]$,
$$(2\mu)^{-1}(d/dt)(\|x(t)-z\|^2) + f(x(t),y(t)) - f(z,y(t)) \le 4^{-1}\mu\|\xi(t)\|^2 + (\mu^{-1}\delta_C + \delta_{f,1})(2 + 3\|z\| + 3\|x(t)\| + 2\mu\|\xi(t)\|) \tag{6.332}$$
and
$$(2\mu)^{-1}(d/dt)(\|y(t)-v\|^2) + f(x(t),v) - f(x(t),y(t)) \le 4^{-1}\mu\|\eta(t)\|^2 + (\mu^{-1}\delta_D + \delta_{f,2})(2 + 3\|v\| + 3\|y(t)\| + 2\mu\|\eta(t)\|). \tag{6.333}$$
Set
$$E = \{\tau \in [0,T] : \|x(t) - x_*\| \le \tilde{M} - M \text{ for all } t \in [0,\tau]\}. \tag{6.334}$$
In view of (6.322), (6.325), and (6.334), $E \ne \emptyset$. Set
$$\tau_0 = \sup(E). \tag{6.335}$$
Clearly, $\tau_0 \in E$. Proposition 6.11 implies that for every $t \in [0,T]$,
$$B_X(x(t),\delta_C) \cap C \ne \emptyset, \quad B_Y(y(t),\delta_D) \cap D \ne \emptyset. \tag{6.336}$$
In view of (6.319) and (6.334)–(6.336), for all $t \in [0,\tau_0]$,
$$\|y(t)\| \le M_0, \quad \|x(t)\| \le \tilde{M}. \tag{6.337}$$
Properties (v) and (vi), (6.326), (6.328), (6.336), and (6.337) imply that for almost every $t \in [0,\tau_0]$,
$$\|\xi(t)\| \le L_1 + 1, \quad \|\eta(t)\| \le L_2 + 1. \tag{6.338}$$
We claim that $\tau_0 = T$. Assume the contrary. Then in view of (6.322) and (6.334)–(6.336),
$$\tau_0 < T, \tag{6.339}$$
$$\|x(\tau_0) - x_*\| = \tilde{M} - M, \tag{6.340}$$
$$\|x(\tau_0)\| > 3M/4. \tag{6.341}$$
By (6.317), (6.320), (6.332), (6.337), and (6.338), for almost every $t \in [0,\tau_0]$,
$$(2\mu)^{-1}(d/dt)(\|x(t)-x_*\|^2) + f(x(t),y(t)) - f(x_*,y(t)) \le 4^{-1}\mu(L_1+1)^2 + (\mu^{-1}\delta_C + \delta_{f,1})(2 + M + 3\tilde{M} + 2\mu(L_1+1)). \tag{6.342}$$
By (6.321) and (6.341),
$$f(x(\tau_0),y(\tau_0)) > \sup\{f(x_*,v) : v \in V \cap B_Y(0, M_0+1)\} + 4. \tag{6.343}$$
Since the function $f$ is continuous it follows from (6.343) that there exists $\tau_1 \in (0,\tau_0)$ such that for each $t \in [\tau_1,\tau_0]$,
$$f(x(t),y(t)) > \sup\{f(x_*,v) : v \in V \cap B_Y(0, M_0+1)\} + 4. \tag{6.344}$$
By integrating (6.342) on $[\tau_1,\tau_0]$ we obtain that
$$(2\mu)^{-1}\|x(\tau_0)-x_*\|^2 - (2\mu)^{-1}\|x(\tau_1)-x_*\|^2 + \int_{\tau_1}^{\tau_0} f(x(t),y(t))\,dt - \int_{\tau_1}^{\tau_0} f(x_*,y(t))\,dt$$
$$\le (\tau_0 - \tau_1)(4^{-1}\mu(L_1+1)^2 + (\mu^{-1}\delta_C + \delta_{f,1})(2 + M + 3\tilde{M} + 2\mu(L_1+1))). \tag{6.345}$$
It follows from (6.320), (6.322), (6.323), (6.325), (6.337), (6.344), and (6.345) that
$$4(\tau_0 - \tau_1) \le \int_{\tau_1}^{\tau_0} f(x(t),y(t))\,dt - (\tau_0 - \tau_1)\sup\{f(x_*,v) : v \in V \cap B_Y(0, M_0+1)\} \le 2(\tau_0 - \tau_1),$$
a contradiction. The contradiction we have reached proves that
$$\tau_0 = T. \tag{6.346}$$
By (6.320), (6.322), (6.334)–(6.336), and (6.346),
$$\|x(t) - x_*\| \le \tilde{M} - M \text{ for all } t \in [0,T], \quad \|x(t)\| \le \tilde{M} - (3/4)M \text{ for all } t \in [0,T]. \tag{6.347}$$
Set
$$\tilde{D} = D, \quad \tilde{V} = V, \quad \tilde{C} = C \cap B_X(0, \tilde{M} - 1), \quad \tilde{U} = U \cap \{x \in X : \|x\| < \tilde{M}\}. \tag{6.348}$$
In view of (6.324), there exists $\hat{x}(0) \in C$ such that
$$\|x(0) - \hat{x}(0)\| \le \delta_C. \tag{6.349}$$
It follows from (6.320), (6.347), and (6.349) that
$$\|\hat{x}(0)\| \le \|x(0)\| + 1 \le \tilde{M} - (3/4)M + 1 \le \tilde{M} - 1. \tag{6.350}$$
By (6.348), (6.350), and the choice of $\hat{x}(0)$,
$$\hat{x}(0) \in \tilde{C}, \quad d(x(0),\tilde{C}) \le \|x(0) - \hat{x}(0)\| \le \delta_C. \tag{6.351}$$
Lemma 2.2, (6.317), and (6.347) imply that for all $t \in [0,T]$,
$$\|x_* - P_C(x(t))\| \le \|x_* - x(t)\| \le \tilde{M} - M. \tag{6.352}$$
It follows from (6.319), (6.320), (6.323), (6.338), and (6.352) that for all $t \in [0,T]$,
$$\|x_* - P_C(x(t) + \mu\xi(t))\| \le \tilde{M} - M + 1, \quad \|P_C(x(t) + \mu\xi(t))\| \le \tilde{M} - 3M/4 + 1 \le \tilde{M} - 1. \tag{6.353}$$
In view of (6.348) and (6.353),
$$P_C(x(t) + \mu\xi(t)) \in \tilde{C}, \quad P_C(x(t) + \mu\xi(t)) = P_{\tilde{C}}(x(t) + \mu\xi(t)). \tag{6.354}$$
By (6.347), (6.348), (6.351), and (6.354) all the assumptions of Theorem 6.12 hold with
$$C = \tilde{C}, \quad U = \tilde{U}, \quad D = \tilde{D}, \quad V = \tilde{V}. \tag{6.355}$$
Together with (6.330) and (6.331) and property (vii) this implies that for all $t \in [0,T]$,
$$B_X(x(t),\delta_C) \cap \tilde{C} \ne \emptyset, \quad B_Y(y(t),\delta_D) \cap \tilde{D} \ne \emptyset,$$
$$B_X(x_T,\delta_C) \cap \tilde{C} \ne \emptyset, \quad B_Y(y_T,\delta_D) \cap D \ne \emptyset, \tag{6.356}$$
$$x_T \in U, \quad y_T \in V, \tag{6.357}$$
$$\Big|T^{-1}\int_0^T f(x(t),y(t))\,dt - f(x_*,y_*)\Big| \le \max\{\Delta_1,\Delta_2\},$$
$$\Big|f(x_T,y_T) - T^{-1}\int_0^T f(x(t),y(t))\,dt\Big| \le \max\{\Delta_1,\Delta_2\},$$
and for every $z \in \tilde{C} = C \cap B_X(0, \tilde{M} - 1)$ and every $v \in D$,
$$f(z,y_T) \ge f(x_T,y_T) - \max\{\Delta_1,\Delta_2\}, \quad f(x_T,v) \le f(x_T,y_T) + \max\{\Delta_1,\Delta_2\}. \tag{6.358}$$
It follows from (6.319) to (6.321), (6.348), and (6.356) to (6.358) that for each $z \in C \setminus \tilde{C}$,
$$f(z,y_T) \ge \sup\{f(x_*,v) : v \in V \cap B_Y(0, M_0+1)\} + 4 \ge f(x_*,y_T) + 4 \ge f(x_T,y_T) - \max\{\Delta_1,\Delta_2\} + 4.$$
Theorem 6.13 is proved.
As in the case of Theorem 6.12 we can show that the best choice of $\mu$ is $c_1\max\{\delta_C,\delta_D\}^{1/2}$, where the constant $c_1$ depends on $L_1, M_1, L_2, M_2$, and the best choice of $T$ is $c_2\max\{\delta_C,\delta_D\}^{1/2}$, where the constant $c_2$ depends on $L_1, M_1, L_2, M_2$.
Chapter 7
An Optimization Problems with a Composite Objective Function
In this chapter we study an algorithm for minimization of the sum of two functions, the first one being smooth and convex and the second being convex. For this algorithm each iteration consists of two steps. The first step is a calculation of a subgradient of the first function while the second one is a proximal gradient step for the second function. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.
© Springer Nature Switzerland AG 2020. A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_7
7.1 Preliminaries
Let $H$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$, let $\mathrm{Id} : H \to H$ be the identity operator such that $\mathrm{Id}(x) = x$, $x \in H$, and let a set $D \subset H$ be nonempty. An operator $T : D \to H$ is called firmly nonexpansive if
$$\|Tx - Ty\|^2 + \|(\mathrm{Id} - T)x - (\mathrm{Id} - T)y\|^2 \le \|x - y\|^2$$
for all $x, y \in D$. Let $f, g : H \to (-\infty,\infty]$. The infimal convolution of $f$ and $g$ is the function $f \,\square\, g : H \to [-\infty,\infty]$ defined by
$$(f \,\square\, g)(x) = \inf\{f(y) + g(x - y) : y \in H\}$$
[13]. Let $f : H \to (-\infty,\infty]$ and $\gamma > 0$. The Moreau envelope of $f$ of parameter $\gamma$ [13] is the function
$${}^{\gamma}f := f \,\square\, ((2\gamma)^{-1}\|\cdot\|^2).$$
Clearly, for all $x \in H$,
$${}^{\gamma}f(x) = \inf\{f(y) + (2\gamma)^{-1}\|x - y\|^2 : y \in H\}. \tag{7.1}$$
Let $f : H \to (-\infty,\infty]$ be a convex and lower semicontinuous function which is not identically infinity. Then $\mathrm{Prox}_f x$ is the unique point in $H$ which satisfies
$${}^{1}f(x) = \inf\{f(y) + 2^{-1}\|x - y\|^2 : y \in H\} = f(\mathrm{Prox}_f x) + 2^{-1}\|x - \mathrm{Prox}_f x\|^2. \tag{7.2}$$
Let $h : H \to (-\infty,\infty]$ be a convex and lower semicontinuous function which is not identically infinity and let $x \in H$. In view of (7.2),
$$\mathrm{Prox}_{th}(x) = \mathrm{argmin}\{th(y) + 2^{-1}\|x - y\|^2 : y \in H\} = \mathrm{argmin}\{h(y) + (2t)^{-1}\|x - y\|^2 : y \in H\}. \tag{7.3}$$
In view of (7.3),
$$0 \in \partial h(\mathrm{Prox}_{th}(x)) + t^{-1}(\mathrm{Prox}_{th}(x) - x)$$
and
$$t^{-1}(x - \mathrm{Prox}_{th}(x)) \in \partial h(\mathrm{Prox}_{th}(x)). \tag{7.4}$$
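As a concrete instance of (7.3) and (7.4), consider the hypothetical one-dimensional choice $h(y) = |y|$ on $H = R^1$, which is not discussed in the text. In that case the proximal mapping is the well-known soft-thresholding operator. The sketch below compares it with a brute-force minimization of $h(y) + (2t)^{-1}|x - y|^2$ over a grid and checks the inclusion (7.4); the function names are illustrative.

```python
import numpy as np

def prox_abs(x, t):
    """Prox of t*|.| at x: the soft-thresholding formula."""
    return float(np.sign(x) * max(abs(x) - t, 0.0))

def prox_brute(x, t, grid):
    """Grid minimizer of |y| + (2t)^{-1} (x - y)^2, cf. (7.3)."""
    vals = np.abs(grid) + (grid - x) ** 2 / (2 * t)
    return float(grid[np.argmin(vals)])

def in_subdifferential_abs(s, p, tol=1e-12):
    """Check s in the subdifferential of |.| at p: {sign(p)} if p != 0, else [-1, 1]."""
    if p != 0.0:
        return abs(s - np.sign(p)) <= tol
    return -1.0 - tol <= s <= 1.0 + tol

grid = np.linspace(-3.0, 3.0, 60001)
t = 0.5
results = []
for x in (1.3, 0.2, -2.0):
    p = prox_abs(x, t)
    # (7.4): t^{-1}(x - Prox_{th}(x)) belongs to the subdifferential of h at Prox_{th}(x)
    results.append((x, p, prox_brute(x, t, grid), in_subdifferential_abs((x - p) / t, p)))
print(results)
```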
In the sequel we use the following results.
Proposition 7.1 ([13]) Let $f : H \to (-\infty,\infty]$ be a convex and lower semicontinuous function which is not identically infinity. Then $\mathrm{Prox}_f$ and $\mathrm{Id} - \mathrm{Prox}_f$ are firmly nonexpansive mappings.
Proposition 7.2 ([13]) Let $f : H \to (-\infty,\infty]$ be a convex and lower semicontinuous function which is not identically infinity and $x, p \in H$. Then $p = \mathrm{Prox}_f(x)$ if and only if
$$\langle y - p, x - p\rangle + f(p) \le f(y)$$
for all $y \in H$.
Proposition 7.3 Let $h : H \to (-\infty,\infty]$ be a convex and lower semicontinuous function which is not identically infinity, $u, w \in H$, $t > 0$ and
$$v = \mathrm{Prox}_{th}(u). \tag{7.5}$$
Then
$$h(w) \ge h(v) + (2t)^{-1}(\|u - v\|^2 + \|w - v\|^2 - \|u - w\|^2).$$
Proof Lemma 2.1, Proposition 7.2, and (7.5) imply that
$$th(w) \ge th(v) + \langle w - v, u - v\rangle = th(v) + 2^{-1}[\|u - v\|^2 + \|w - v\|^2 - \|u - w\|^2].$$
Proposition 7.3 is proved.
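The inequality of Proposition 7.3 can be sanity-checked numerically. The sketch below assumes the hypothetical choice $h = \|\cdot\|_1$ on $R^5$, for which $\mathrm{Prox}_{th}$ is componentwise soft-thresholding; it evaluates the gap between the two sides at random points $u, w$ and several values of $t$. This is an illustration only, not a substitute for the proof.

```python
import numpy as np

def soft_threshold(u, t):
    """Prox of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def prop_7_3_gap(u, w, t):
    """h(w) - [h(v) + (2t)^{-1}(|u-v|^2 + |w-v|^2 - |u-w|^2)] with v = Prox_{th}(u)."""
    v = soft_threshold(u, t)
    h = lambda z: float(np.abs(z).sum())
    rhs = h(v) + (np.sum((u - v) ** 2) + np.sum((w - v) ** 2)
                  - np.sum((u - w) ** 2)) / (2 * t)
    return h(w) - rhs

rng = np.random.default_rng(0)
gaps = [prop_7_3_gap(rng.normal(size=5), rng.normal(size=5), t)
        for t in (0.1, 1.0, 3.0) for _ in range(200)]
# Proposition 7.3 predicts that every gap is nonnegative (up to rounding)
print(min(gaps))
```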
7.2 The Algorithm and Main Results
Let $X$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$, $f : X \to R^1$ be a convex Fréchet differentiable function, $g : X \to (-\infty,\infty]$ be a convex and lower semicontinuous function which is not identically infinity and
$$F(x) = f(x) + g(x), \quad x \in X. \tag{7.6}$$
We consider the minimization problem
$$F(x) \to \min, \quad x \in X$$
studied in [69] in the finite-dimensional case. We denote by $f'(x)$ the Fréchet derivative of the function $f$ at the point $x \in X$. Let $g$ be bounded from below, $\theta \in X$, $M > 1$,
$$\|\theta\| < M - 1, \quad g(\theta) < \infty, \tag{7.7}$$
$$c_* > 0, \quad g(x) \ge -c_* \text{ for all } x \in X, \tag{7.8}$$
$$\mathrm{argmin}(F) \cap B_X(0,M) \ne \emptyset, \tag{7.9}$$
$L_1 \ge 1$ be such that
$$|f(z_1) - f(z_2)| \le L_1\|z_1 - z_2\| \text{ for all } z_1, z_2 \in B_X(0, 3M + 2). \tag{7.10}$$
Let
$$M_1 = 6M + 2L_1 + 4 + 2c_* + 2|g(\theta)| + M, \tag{7.11}$$
$L \ge 1$ be such that
$$\|f'(z_1) - f'(z_2)\| \le L\|z_1 - z_2\| \text{ for all } z_1, z_2 \in B_X(0, M_1 + 2). \tag{7.12}$$
Let $\delta_f, \delta_G \in (0,1]$,
$$0 < S_- < S_+ \le L^{-1}, \tag{7.13}$$
$$S_- \le S_t \le S_+, \quad t = 0, 1, \dots. \tag{7.14}$$
Let us describe our algorithm.
Initialization: select an arbitrary $x_0 \in X$.
Iterative Step: given a current iteration vector $x_k \in X$ calculate $\xi_k \in X$ such that
$$\|\xi_k - f'(x_k)\| \le \delta_f$$
and $x_{k+1} \in X$ such that
$$g(x_{k+1}) + (2S_k)^{-1}\|x_{k+1} - (x_k - S_k\xi_k)\|^2 \le g(x) + (2S_k)^{-1}\|x - (x_k - S_k\xi_k)\|^2 + \delta_G \text{ for all } x \in X.$$
Our iterative step consists of two calculations. The second one is a proximal method applied to the function $g$. It should be mentioned that proximal methods are important tools in analysis and optimization [4, 5, 7, 8, 14, 19, 23–26, 28, 31, 33, 34, 40–45, 47, 48]. In this chapter we prove the following two results.
Theorem 7.4 Let $\delta_G \le 4^{-1}$, $\{x_t\}_{t=0}^{\infty} \subset X$, $\{\xi_t\}_{t=0}^{\infty} \subset X$,
$$\|x_0\| \le M, \tag{7.15}$$
for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f \tag{7.16}$$
and
$$g(x_{t+1}) + (2S_t)^{-1}\|x_{t+1} - (x_t - S_t\xi_t)\|^2 \le g(w) + (2S_t)^{-1}\|w - (x_t - S_t\xi_t)\|^2 + \delta_G \text{ for all } w \in X. \tag{7.17}$$
Let
$$\epsilon_0 = 2\delta_G + 4\delta_f M_1 + S_-^{-1}\delta_G^{1/2}(4M_1 + 1), \quad n_0 = 4\epsilon_0^{-1}S_-^{-1}M^2 + 1.$$
Then there exists an integer $q \in [0, n_0 + 1]$ such that
$$F(x_q) \le \inf(F) + \epsilon_0, \quad \|x_t\| \le 3M, \ t = 0, \dots, q.$$
It is easy to see that
$$\epsilon_0 = c_1\max\{\delta_G^{1/2}, \delta_f\} \ \text{ and } \ n_0 = c_2\max\{\delta_G^{1/2}, \delta_f\}^{-1},$$
where $c_1, c_2$ are positive constants which depend on $L$, $M_1$ and $S_-$.
Theorem 7.5 Let
$$\delta_f < M_1/8, \quad \delta_G < \min\{4^{-1}, ((16M_1 + 4)^{-1}S_-)^2\}, \tag{7.18}$$
$$\{x \in X : F(x) \le \inf(F) + 4\} \subset B_X(0,M), \tag{7.19}$$
$\{x_t\}_{t=0}^{\infty} \subset X$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\|x_0\| \le M$, for each integer $t \ge 0$,
$$\|\xi_t - f'(x_t)\| \le \delta_f \tag{7.20}$$
and
$$g(x_{t+1}) + (2S_t)^{-1}\|x_{t+1} - (x_t - S_t\xi_t)\|^2 \le g(w) + (2S_t)^{-1}\|w - (x_t - S_t\xi_t)\|^2 + \delta_G \text{ for all } w \in X. \tag{7.21}$$
Then for all integers $t \ge 0$, $\|x_t\| \le 3M$ and for all pairs of natural numbers $m, T$,
$$\min\{F(x_t) : t = m+1, \dots, m+T\} - \inf(F), \quad F\Big(\sum_{t=m+1}^{m+T} T^{-1}x_t\Big) - \inf(F)$$
$$\le 4T^{-1}M^2 + (2S_-)^{-1}(4M_1 + 1)\delta_G^{1/2} + \delta_G + 2\delta_f M_1.$$
It is easy to see that
$$T = c_1\max\{\delta_G^{1/2}, \delta_f\}^{-1},$$
where $c_1$ is a positive constant which depends on $M$, $M_1$ and $S_-$, is the best choice of $T$.
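The iterative step of this section can be sketched in a few lines. The example below is an illustration under stated assumptions, not the author's implementation: it takes the hypothetical data $f(x) = 2^{-1}\|Ax - b\|^2$ and $g(x) = \lambda\|x\|_1$ on $R^3$, a constant step size $S_t = S$, a gradient perturbed by a random vector of norm $\delta_f$, and an approximate proximal point obtained by perturbing the exact one and shrinking the perturbation until the $\delta_G$-optimality condition (7.17) holds. The function name `inexact_prox_grad` and all numerical values are assumptions made for the demonstration.

```python
import numpy as np

def soft_threshold(u, t):
    """Exact prox of t*||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def inexact_prox_grad(A, b, lam, S, delta_f, delta_G, n_iter, seed=0):
    """Proximal gradient iteration with gradient error delta_f and prox error delta_G."""
    rng = np.random.default_rng(seed)
    F = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()
    x = np.zeros(A.shape[1])
    best = F(x)
    for _ in range(n_iter):
        e = rng.normal(size=x.size)
        # xi satisfies ||xi - f'(x)|| = delta_f, so the gradient condition holds
        xi = A.T @ (A @ x - b) + delta_f * e / np.linalg.norm(e)
        c = x - S * xi
        phi = lambda z: lam * np.abs(z).sum() + np.sum((z - c) ** 2) / (2 * S)
        p0 = soft_threshold(c, S * lam)          # exact minimizer of phi
        d = rng.normal(size=x.size) * np.sqrt(S * delta_G)
        # shrink the perturbation until phi(p0 + d) <= min(phi) + delta_G, i.e. (7.17)
        while phi(p0 + d) > phi(p0) + delta_G:
            d *= 0.5
        x = p0 + d
        best = min(best, F(x))
    return x, best

A = np.diag([1.0, 0.5, 2.0])
b = np.ones(3)
lam, S = 0.1, 0.25            # S = 1/L with L = largest eigenvalue of A^T A = 4
_, best_exact = inexact_prox_grad(A, b, lam, S, 0.0, 0.0, 500)
_, best_noisy = inexact_prox_grad(A, b, lam, S, 1e-3, 1e-3, 500)
print(best_exact, best_noisy)
```

For this separable example the minimal value of $F$ can be computed by hand, $\inf(F) = 0.32375$, so the gap of the noisy run can be compared directly with the error levels $\delta_f$ and $\delta_G^{1/2}$ that appear in Theorems 7.4 and 7.5.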
7.3 Auxiliary Results
Lemma 7.6 Let $S > 0$, $u_0, u_1, \xi_0, \hat{u}_1 \in X$,
$$g(u_1) + (2S)^{-1}\|u_1 - (u_0 - S\xi_0)\|^2 \le g(x) + (2S)^{-1}\|x - (u_0 - S\xi_0)\|^2 + \delta_G$$
for all $x \in X$ and
$$g(\hat{u}_1) + (2S)^{-1}\|\hat{u}_1 - (u_0 - S\xi_0)\|^2 \le g(x) + (2S)^{-1}\|x - (u_0 - S\xi_0)\|^2$$
for all $x \in X$. Then
$$\|u_1 - \hat{u}_1\| \le 2(S\delta_G)^{1/2}.$$
Proof Set
$$h_1(w) = g(w) + (2S)^{-1}\|w - (u_0 - S\xi_0)\|^2, \quad w \in X.$$
Clearly, $h_1$ is a convex function. Since the function $g$ is convex it is not difficult to see that
$$h_1(\hat{u}_1) \le h_1(2^{-1}u_1 + 2^{-1}\hat{u}_1)$$
$$= g(2^{-1}u_1 + 2^{-1}\hat{u}_1) + (2S)^{-1}\|2^{-1}u_1 + 2^{-1}\hat{u}_1 - u_0 + S\xi_0\|^2$$
$$\le 2^{-1}g(u_1) + 2^{-1}g(\hat{u}_1) + (2S)^{-1}\|2^{-1}(u_1 - u_0 + S\xi_0) + 2^{-1}(\hat{u}_1 - u_0 + S\xi_0)\|^2$$
$$= 2^{-1}g(u_1) + 2^{-1}g(\hat{u}_1) + (2S)^{-1}[2\|2^{-1}(u_1 - u_0 + S\xi_0)\|^2 + 2\|2^{-1}(\hat{u}_1 - u_0 + S\xi_0)\|^2 - \|2^{-1}(u_1 - \hat{u}_1)\|^2]$$
$$= 2^{-1}(g(u_1) + (2S)^{-1}\|u_1 - u_0 + S\xi_0\|^2) + 2^{-1}(g(\hat{u}_1) + (2S)^{-1}\|\hat{u}_1 - u_0 + S\xi_0\|^2) - (2S)^{-1}\|2^{-1}(u_1 - \hat{u}_1)\|^2$$
$$= 2^{-1}h_1(u_1) + 2^{-1}h_1(\hat{u}_1) - (2S)^{-1}\|2^{-1}(u_1 - \hat{u}_1)\|^2$$
$$\le h_1(\hat{u}_1) + \delta_G/2 - (2S)^{-1}\|2^{-1}(u_1 - \hat{u}_1)\|^2$$
and
$$\|u_1 - \hat{u}_1\| \le 2(S\delta_G)^{1/2}.$$
Lemma 7.6 is proved.
Lemma 7.7 Let $S \in (0,1]$, $x, y, \xi, p \in X$, $\delta_G \le 4^{-1}$,
$$\|\xi - f'(y)\| \le \delta_f, \tag{7.22}$$
$$\|x\|, \|y\| \le 3M, \tag{7.23}$$
$$g(p) + (2S)^{-1}\|p - (x - S\xi)\|^2 \le g(z) + (2S)^{-1}\|z - (x - S\xi)\|^2 + \delta_G \tag{7.24}$$
for all $z \in X$. Then $\|p\| \le M_1$ and for every $z \in X$ satisfying $\|z\| \le 3M$,
$$F(z) - F(p) \ge ((2S)^{-1} - 2^{-1}L)\|p - z\|^2 + (2S)^{-1}(\|x - p\|^2 - \|x - z\|^2) + \langle p - z, \xi - f'(z)\rangle - \delta_G - (2S)^{1/2}\delta_G^{1/2}(4M_1 + 1),$$
$$F(z) - F(p) \ge (2S)^{-1}(\|x - p\|^2 + \|z - p\|^2 - \|x - z\|^2) + f(z) - f(y) - 2^{-1}L\|p - y\|^2 + \langle f'(y), y - z\rangle - 2M_1\delta_f - \delta_G - (2S)^{1/2}\delta_G^{1/2}(4M_1 + 1),$$
$$F(z) - F(p) \ge (2S)^{-1}(\|x - p\|^2 + \|z - p\|^2 - \|x - z\|^2) - 2^{-1}L\|p - y\|^2 - 2M_1\delta_f - \delta_G - (2S)^{1/2}\delta_G^{1/2}(4M_1 + 1).$$
Proof Set
$$p_0 = \mathrm{Prox}_{Sg}(x - S\xi). \tag{7.25}$$
In view of (7.25), g(p0 ) + (2S)−1 p0 − (x − Sξ )2 ≤ g(z) + (2S)−1 z − (x − Sξ )2
(7.26)
for all z ∈ X. Let z ∈ BX (0, 3M). Applying Proposition 7.3 with v = p0 , u = x − Sξ , z = w, t = S we obtain that g(z) − g(p0 ) ≥ (2S)−1 (x − Sξ − p0 2 + z − p0 2 − x − Sξ − z2 ).
(7.27)
In view of (7.27), g(z) + (2S)−1 x − Sξ − z2 −(g(p0 ) + (2S)−1 x − Sξ − p0 2 ) ≥ (2S)−1 z − p0 2 .
(7.28)
Lemma 7.6, (7.24), and (7.26) imply that p0 − p ≤ 2(SδG )1/2 .
(7.29)
ξ ≤ f (y) + 1 ≤ L1 + 1.
(7.30)
By (7.10), (7.22), and (7.23),
It follows from (7.24), (7.28), and (7.29) that g(z) + (2S)−1 x − Sξ − z2 −(g(p) + (2S)−1 x − Sξ − p2 ) ≥ g(z) + (2S)−1 x − Sξ − z2 −(g(p0 ) + (2S)−1 x − Sξ − p0 2 ) − δG ≥ (2S)−1 z − p0 2 − δG
≥ (2S)−1 z − p2 − δG − p − p0 (z − p0 + z + p) ≥ (2S)−1 z − p2 − δG − 2(SδG )1/2 (2z + 2p0 + 1).
(7.31)
In view of (7.8) and (7.24), (2S)−1 p − (x − Sξ )2 ≤ c∗ + g(θ ) + (2S)−1 θ − x + Sξ )2 + 1.
(7.32)
In view of (7.32), p − (x − Sξ )2 ≤ 2(c∗ + |g(θ )| + 1) + θ − x + Sξ )2 .
(7.33)
It follows from (7.7), (7.11), (7.23), and (7.30) that p ≤ x + ξ +2(c∗ + |g(θ )| + 1) + θ + x + ξ ≤ 3 + M + L1 + 1 + 2(c∗ + |g(θ )| + 1) + θ + 3M + L1 + 1 = 6M + 2L1 + 4 + 2(c∗ + |g(θ )|) + θ ≤ M1 .
(7.34)
By (7.11), (7.23), (7.31), and (7.34), g(z) + (2S)−1 x − Sξ − z2 −(g(p) + (2S)−1 x − Sξ − p2 ) ≥ (2S)−1 z − p2 − δG − 2(SδG )1/2 (4M1 + 1).
(7.35)
In view of (7.35), g(z) − g(p) ≥ (2S)−1 (x − Sξ − p2 − x − Sξ − z2 + z − p2 ) −δG − (2SδG )1/2 (4M1 + 1) = (2S)−1 (x − p2 − x − z2 + z − p2 ) + p − z, ξ − δG − (2SδG )1/2 (4M1 + 1).
(7.36)
Proposition 4.3 and (7.12) imply that f (z) − f (p) ≥ −2−1 Lp − z2 − f (z), p − z.
(7.37)
It follows from (7.6), (7.36), and (7.37) that F (z) − F (p) ≥ ((2S)−1 − 2−1 L)p − z2 + (2S)−1 (x − p2 − x − z2 ) + ξ − f (z), p − z − δG − (2SδG )1/2 (4M1 + 1).
(7.38)
In view of (7.38), the first inequality in the statement of the lemma is proved. Let us prove the second inequality in the statement of the lemma. Proposition 4.3 and (7.12) imply that f (y), p − y ≥ f (p) − f (y) − 2−1 Lp − y2 .
(7.39)
Combining (7.22), (7.34), (7.36), and (7.39) we obtain that g(z) − g(p) ≥ (2S)−1 (x − p2 − x − z2 + z − p2 ) +f (p) − f (y) − 2−1 Lp − y2 +f (y), y − z + ξ − f (y), p − z −δG − (2SδG )1/2 (4M1 + 1) ≥ (2S)−1 (x − p2 − x − z2 + z − p2 ) +f (p) − f (y) − 2−1 Lp − y2 + f (y), y − z − 2M1 δf − δG − (2SδG )1/2 (4M1 + 1).
(7.40)
We add f (z) − f (p) to (7.40) and obtain the second inequality in the statement of the lemma. Together with the convexity of f this implies the third inequality in the statement of the lemma. Lemma 7.7 is proved. Lemma 7.7 implies the following result. Lemma 7.8 Let S ∈ (0, 1], x, ξ, p ∈ X, δG ≤ 4−1 , S ≤ L−1 ,
ξ − f (x) ≤ δf , x ≤ M + 3, g(p) + (2S)−1 p − (x − Sξ )2 ≤ g(z) + (2S)−1 z − (x − Sξ )2 + δG for all z ∈ X. Then p ≤ M1 and for every z ∈ X satisfying z ≤ 3M, F (z) − F (p) (2S)−1 (z − p2 − x − z2 ) +((2S)−1 − 2−1 L)x − p2 −2M1 δf − δG − (2S)−1 δG (4M1 + 1) 1/2
≥ (2S)−1 (z − p2 − x − z2 ) −2M1 δf − δG − (2S)−1 δG (4M1 + 1). 1/2
7.4 Proof of Theorem 7.4 In view of (7.9), there exists z ∈ argmin(F ) ∩ BX (0, M).
(7.41)
x0 − z ≤ 2M.
(7.42)
By (7.15) and (7.41),
If F (x0 ) ≤ inf(F ) + 0 , then the assertion of the theorem holds. Assume that F (x0 ) > inf(F ) + 0 .
(7.43)
If F (x1 ) ≤ inf(F ) + 0 , then by Lemma 7.8, x1 ≤ M1 and the assertion of the theorem holds. Let F (x1 ) > inf(F ) + 0 .
(7.44)
Assume that T ≥ 0 is an integer and that for all integers t = 0, . . . , T , F (xt+1 ) − F (z) > 0 .
(7.45)
(Note that in view of (7.44), our assumption holds for $T = 0$.) We show that for all $t = 0, \dots, T$, $\|x_t - z\| \le 2M$
(7.46)
and F (z) − F (xt+1 ) ≥ (2St )−1 (z − xt+1 2 − z − xt 2 ) − 2M1 δf − δG − (2St )−1 δG (4M1 + 1). 1/2
(7.47)
Assume that t ∈ {0, . . . , T } and (7.46) holds. By (7.41) and Lemma 7.8 applied with p = xt+1 , x = xt , ξ = ξt , (7.47) is true. It follows from (7.13), (7.45), and (7.47) that 0 < (2St )−1 (z − xt 2 − z − xt+1 2 ) +(2St )−1 δG (4M1 + 1) + 2M1 δf + δG 1/2
≤ (2St )−1 (z − xt 2 − z − xt+1 2 ) + (2S− )−1 δG (4M1 + 1) + 2M1 δf + δG . 1/2
(7.48)
Relations (7.13), (7.18), and (7.48) imply that $\|z - x_{t+1}\| \le \|z - x_t\|$. Thus by induction we have shown that (7.46) holds for all $t = 0, \dots, T + 1$, (7.47) holds for all $t = 0, \dots, T$ and that
$$\|z - x_{t+1}\| \le \|z - x_t\| \tag{7.49}$$
is true for all $t = 0, \dots, T$. By (7.42) and (7.45)–(7.49),
$$\begin{aligned}
(T + 1)\epsilon_0 &< (T + 1)(\min\{F(x_t) : t = 1, \dots, T + 1\} - F(z)) \le \sum_{t=0}^{T}(F(x_{t+1}) - F(z)) \\
&\le \sum_{t=0}^{T}(2S_-)^{-1}(\|z - x_t\|^2 - \|z - x_{t+1}\|^2) + (T + 1)[(2S_-)^{-1}\delta_G^{1/2}(4M_1 + 1) + 2M_1\delta_f + \delta_G] \\
&\le (2S_-)^{-1}4M^2 + (T + 1)[(2S_-)^{-1}\delta_G^{1/2}(4M_1 + 1) + 2M_1\delta_f + \delta_G].
\end{aligned}$$
Together with the choice of $\epsilon_0$, $n_0$ this implies that
$$(T + 1)\epsilon_0/2 \le (2S_-)^{-1}4M^2 \quad\text{and}\quad T \le 4S_-^{-1}M^2\epsilon_0^{-1} < n_0 + 1.$$
Thus we have shown that if $T$ is an integer and (7.45) holds for all integers $t = 0, \dots, T$, then $T < n_0$ and in view of (7.41) and (7.46) $\|x_t\| \le 3M$, $t = 0, \dots, T + 1$. This implies that there exists an integer $q \in [0, n_0 + 1]$ such that $F(x_q) \le \inf(F) + \epsilon_0$, $\|x_t\| \le 3M$, $t = 0, \dots, q$. Theorem 7.4 is proved.
7.5 Proof of Theorem 7.5
In view of (7.9) and (7.19), there exists
$$z \in \operatorname{argmin}(F) \subset B_X(0, M). \tag{7.50}$$
It follows from (7.50) and the assumptions of the theorem that
$$\|x_0 - z\| \le 2M. \tag{7.51}$$
Assume that $t \ge 0$ is an integer and that
$$\|x_t - z\| \le 2M. \tag{7.52}$$
By (7.20), (7.50), (7.52), and Lemma 7.8 applied with $p = x_{t+1}$, $x = x_t$, $\xi = \xi_t$, we have
$$F(z) - F(x_{t+1}) \ge (2S_t)^{-1}(\|z - x_{t+1}\|^2 - \|z - x_t\|^2) - 2M_1\delta_f - \delta_G - (2S_t)^{-1}\delta_G^{1/2}(4M_1 + 1). \tag{7.53}$$
There are two cases:
$$\|z - x_{t+1}\| \le \|z - x_t\|; \tag{7.54}$$
$$\|z - x_{t+1}\| > \|z - x_t\|. \tag{7.55}$$
Assume that (7.54) holds. Then by (7.52) and (7.54),
$$\|z - x_{t+1}\| \le 2M. \tag{7.56}$$
Assume that (7.55) holds. It follows from (7.18), (7.50), (7.53), and (7.55) that
$$F(x_{t+1}) \le \inf(F) + 2M_1\delta_f + \delta_G + (2S_t)^{-1}\delta_G^{1/2}(4M_1 + 1) \le \inf(F) + 1. \tag{7.57}$$
Together with (7.19) and (7.57) this implies that $\|x_{t+1}\| \le M$, $\|x_{t+1} - z\| \le 2M$ and that (7.56) holds in both cases. Thus (7.52) and (7.53) hold for all integers $t \ge 0$. In view of (7.50) and (7.52), $\|x_t\| \le 3M$ for all integers $t \ge 0$. Let $m \ge 1$ and $T > 0$ be integers. It follows from (7.50) and (7.53) that
$$\sum_{t=m+1}^{m+T}(F(x_t) - \inf(F)) \le \sum_{t=m}^{m+T-1}(\|z - x_t\|^2 - \|z - x_{t+1}\|^2) + T[(2S_-)^{-1}\delta_G^{1/2}(4M_1 + 1) + 2M_1\delta_f + \delta_G].$$
Together with (7.51) this implies that
$$\min\{F(x_t) : t = m + 1, \dots, m + T\} - \inf(F),\quad F\Big(\sum_{t=m+1}^{m+T} T^{-1}x_t\Big) - \inf(F)$$
$$\le 4T^{-1}M^2 + (2S_-)^{-1}(4M_1 + 1)\delta_G^{1/2} + \delta_G + 2\delta_f M_1.$$
Theorem 7.5 is proved.
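The kind of step analysed in Lemma 7.8 (a gradient of $f$ known up to an error $\delta_f$, followed by a proximal step for $g$) can be illustrated on a one-dimensional toy problem. The sketch below is only an illustration under stated assumptions: the choices $f(x) = (x - b)^2/2$, $g(x) = |x|$, the step size $0.5$, the uniform gradient noise, and the function names are all hypothetical and are not taken from the text. The proximal step is computed exactly, so any tolerance $\delta_G$ is met trivially.

```python
import math
import random


def soft_threshold(c, lam):
    """Exact minimizer of lam*|z| + 0.5*(z - c)**2 over the real line."""
    return math.copysign(max(abs(c) - lam, 0.0), c)


def inexact_prox_gradient(b=3.0, step=0.5, delta_f=1e-3, iters=60, seed=0):
    """Iterate x_{t+1} = argmin_z { g(z) + (2*step)^{-1} * (z - (x_t - step*xi_t))**2 }
    with |xi_t - f'(x_t)| <= delta_f, for the toy choices
    f(x) = 0.5*(x - b)**2 and g(x) = |x| (assumptions of this sketch)."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(iters):
        # Inexact gradient: exact derivative x - b plus bounded noise.
        xi = (x - b) + rng.uniform(-delta_f, delta_f)
        # Exact proximal step for g with parameter `step`.
        x = soft_threshold(x - step * xi, step)
    return x


x_final = inexact_prox_gradient()
x_star = soft_threshold(3.0, 1.0)  # exact minimizer of f + g when b = 3
```

In this toy setting the iterates settle within roughly $\delta_f$ of the exact minimizer, which matches the qualitative message of Theorems 7.4 and 7.5: the attainable accuracy is governed by the size of the computational errors.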
Chapter 8
A Zero-Sum Game with Two Players
In this chapter we study an algorithm for finding a saddle point of a zero-sum game with two players. For this algorithm each iteration consists of two steps. The first step is a calculation of a subgradient while the second one is a proximal gradient step. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.
8.1 The Algorithm and the Main Result
Let $(X, \langle\cdot,\cdot\rangle)$, $(Y, \langle\cdot,\cdot\rangle)$ be Hilbert spaces equipped with the complete norms $\|\cdot\|$ which are induced by their inner products. Denote by Id the identity operator in X. Suppose that $f : X \to R^1$ is a convex Fréchet differentiable function such that for each $r > 0$ its Fréchet derivative $f'(\cdot)$ is Lipschitz on $B_X(0, r)$, $g : Y \to R^1 \cup \{\infty\}$ is a convex lower semicontinuous function which is not identically $\infty$, $A : Y \to X$ is a linear continuous operator, $A^* : X \to Y$ is its dual and that
$$\|A\| = \sup\{\|Ay\| : y \in Y,\ \|y\| \le 1\}. \tag{8.1}$$
For each $x \in X$ and each $y \in Y$ define
$$F(x, y) = f(x) + \langle x, Ay\rangle - g(y). \tag{8.2}$$
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_8
Suppose that there exists $(u^*, v^*) \in X \times Y$ such that
$$F(u^*, v) \le F(u^*, v^*) \le F(u, v^*) \tag{8.3}$$
for all $(u, v) \in X \times Y$. Fix a constant $\tau > 0$ such that
$$\tau\|A\| \le 1. \tag{8.4}$$
Let us describe our algorithm.
Initialization: select arbitrary $(u_0, v_0) \in X \times Y$.
Iterative Step: for $t = 1, 2, \dots$, given current iteration vectors $u_{t-1} \in X$ and $v_{t-1} \in Y$ calculate
$$p_t = u_{t-1} - \tau(Av_{t-1} + f'(u_{t-1})), \tag{8.5}$$
$$v_t = \operatorname{Prox}_g(v_{t-1} + A^*p_t) = \operatorname{argmin}\{g(z) + 2^{-1}\|z - v_{t-1} - A^*p_t\|^2 : z \in Y\} \tag{8.6}$$
(see (7.2)),
$$u_t = u_{t-1} - \tau(Av_t + f'(u_{t-1})).$$
This algorithm was considered in [37]. One of its steps (see (8.6)) is a proximal method applied to the function $g$. It should be mentioned that proximal methods are an important tool in analysis and optimization [49–52, 57, 59, 60, 63, 74, 75, 77, 78, 82, 84, 86–88]. Set
$$G = \mathrm{Id} - \tau A^*A. \tag{8.7}$$
Let $\delta_f, \delta_G \in (0, 1]$. In this chapter we study the algorithm described above, taking into account the computational errors.
Initialization: select arbitrary $(u_0, v_0) \in X \times Y$.
Iterative Step: for $t = 1, 2, \dots$, given current iteration vectors $u_{t-1} \in X$ and $v_{t-1} \in Y$ calculate $\xi_{t-1} \in X$ such that
$$\|\xi_{t-1} - f'(u_{t-1})\| \le \delta_f, \tag{8.8}$$
$$p_t = u_{t-1} - \tau(Av_{t-1} + \xi_{t-1}), \tag{8.9}$$
$v_t \in Y$ such that
$$g(v_t) + 2^{-1}\|v_t - v_{t-1} - A^*p_t\|^2 \le g(z) + 2^{-1}\|z - v_{t-1} - A^*p_t\|^2 + \delta_G \text{ for all } z \in Y, \tag{8.10}$$
and
$$u_t = u_{t-1} - \tau(Av_t + \xi_{t-1}). \tag{8.11}$$
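The iterative step (8.8)–(8.11) can be sketched in code on a scalar toy instance. Everything concrete below is an assumption made for illustration and does not come from the text: $X = Y = R^1$, $f(x) = (x - b)^2/2$, $g$ is the indicator function of $[-1, 1]$ (so its proximal map is a clip and dom(g) is bounded), $A$ is multiplication by a number `a`, and the gradient error is uniform noise of size at most $\delta_f$. The proximal step is evaluated exactly, which satisfies (8.10) for every $\delta_G > 0$.

```python
import random


def inexact_primal_dual(b=3.0, a=1.0, tau=0.5, delta_f=1e-3, steps=200, seed=0):
    """Run the iteration (8.8)-(8.11) on a scalar toy problem and return the
    averaged iterates (u_hat_T, v_hat_T) of (8.24).

    Hypothetical setting: f(x) = 0.5*(x - b)**2, g = indicator of [-1, 1],
    A = multiplication by `a`, so A* is multiplication by `a` as well."""
    rng = random.Random(seed)
    u, v = 0.0, 0.0                      # (u_0, v_0)
    u_sum, v_sum = 0.0, 0.0
    for _ in range(steps):
        # (8.8): inexact derivative of f at u_{t-1}
        xi = (u - b) + rng.uniform(-delta_f, delta_f)
        # (8.9): auxiliary point p_t
        p = u - tau * (a * v + xi)
        # (8.10): proximal step for g at v_{t-1} + A* p_t (here an exact clip)
        v_new = min(1.0, max(-1.0, v + a * p))
        # (8.11): primal update uses the new dual iterate v_t
        u = u - tau * (a * v_new + xi)
        v = v_new
        u_sum += u
        v_sum += v
    return u_sum / steps, v_sum / steps


u_avg, v_avg = inexact_primal_dual()
```

For this instance the saddle point of $f(x) + \langle x, Ay\rangle - g(y)$ is $(2, 1)$, and the averaged iterates approach it at a rate of order $1/T$ up to an error of the order of $\delta_f$.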
Define
$$\operatorname{dom}(g) = \{y \in Y : g(y) < \infty\}. \tag{8.12}$$
Assume that the set dom(g) is bounded and let $M_0 > 0$ be such that
$$\operatorname{dom}(g) \subset B_Y(0, M_0). \tag{8.13}$$
Let $M_1 > 0$ be such that
$$\{x \in X : f(x) \le |f(0)| + 4 + \|x\|\|A\|M_0\} \subset B_X(0, M_1) \tag{8.14}$$
and let $L \ge 1$ be such that
$$|f(z_1) - f(z_2)| \le L\|z_1 - z_2\| \text{ for all } z_1, z_2 \in B_X(0, M_1 + 1) \tag{8.15}$$
and
$$\|f'(z_1) - f'(z_2)\| \le L\|z_1 - z_2\| \text{ for all } z_1, z_2 \in X. \tag{8.16}$$
In view of (8.3) and (8.13), $v^* \in \operatorname{dom}(g) \subset B_Y(0, M_0)$,
$$f(u^*) - g(v^*) + \langle u^*, Av^*\rangle = F(u^*, v^*) \le F(0, v^*) = f(0) - g(v^*). \tag{8.17}$$
By (8.2) and (8.3),
$$f(0) \ge f(u^*) + \langle u^*, Av^*\rangle \ge f(u^*) - \|u^*\|\|A\|M_0, \quad f(u^*) \le f(0) + M_0\|A\|\|u^*\|. \tag{8.18}$$
It follows from (8.14) and (8.18) that
$$\|u^*\| \le M_1. \tag{8.19}$$
In this chapter we prove the following result.
Theorem 8.1 Let
$$\delta_f \le \min\{\tau M_1^{-1}, (\|A\|M_0 + L + 1)^{-1}\}, \tag{8.20}$$
$$\tau \le L^{-1}. \tag{8.21}$$
Assume that $\{u_t\}_{t=0}^{\infty} \subset X$, $\{v_t\}_{t=0}^{\infty} \subset Y$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\{p_t\}_{t=0}^{\infty} \subset X$ satisfy (8.8)–(8.11) for all natural numbers $t$ and that
$$g(v_0) < \infty, \quad \|u_0\| \le M_1. \tag{8.22}$$
Let
$$\Delta = 2M_1\delta_f\tau^{-1} + \delta_G^{1/2}(12M_0 + 3 + 4\|A\|(M_1 + M_0 + 2)) \tag{8.23}$$
and for each natural number $T$ let
$$\hat{u}_T = T^{-1}\sum_{t=1}^{T} u_t, \quad \hat{v}_T = T^{-1}\sum_{t=1}^{T} v_t. \tag{8.24}$$
Then for all integers $t \ge 0$, $\|v_t\| \le M_0$, $\|u_t\| \le M_1$, for all integers $t \ge 1$, $\|p_t\| \le M_1 + M_0 + 2$ and for all integers $T \ge 1$, $\|\hat{u}_T\| \le M_1$, $\|\hat{v}_T\| \le M_0$, $\hat{v}_T \in \operatorname{dom}(g)$, and for each $u \in X$ and each $v \in \operatorname{dom}(g)$,
$$F(\hat{u}_T, v) \le F(\hat{u}_T, \hat{v}_T) + \Delta + (2\tau T)^{-1}4M_1^2 + (2T)^{-1}4M_0^2\|G\|$$
and
$$F(\hat{u}_T, \hat{v}_T) \le F(u, \hat{v}_T) + \Delta + (2\tau T)^{-1}4M_1^2 + (2T)^{-1}4M_0^2\|G\|.$$
It is clear that the best choice of $T$ is at the same order as $(\max\{\delta_f, \delta_G^{1/2}\}^{1/2})^{-1}$.
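8.2 Auxiliary Results
Lemma 8.2 Let a linear continuous operator $G_0 : Y \to Y$ satisfy $G_0^* = G_0$ and $\langle z, G_0z\rangle \ge 0$ for all $z \in Y$. Then for each $u, v, w \in Y$,
$$2\langle w - v, G_0(u - v)\rangle = \langle w - v, G_0(w - v)\rangle - \langle w - u, G_0(w - u)\rangle + \langle u - v, G_0(u - v)\rangle.$$
Proof We have
$$\begin{aligned}
&\langle w - v, G_0(w - v)\rangle - \langle w - u, G_0(w - u)\rangle + \langle u - v, G_0(u - v)\rangle \\
&= \langle w, G_0w\rangle + \langle v, G_0v\rangle - 2\langle w, G_0v\rangle - (\langle w, G_0w\rangle + \langle u, G_0u\rangle - 2\langle w, G_0u\rangle) + (\langle v, G_0v\rangle + \langle u, G_0u\rangle - 2\langle v, G_0u\rangle) \\
&= 2[-\langle w, G_0v\rangle + \langle v, G_0v\rangle + \langle w, G_0u\rangle - \langle v, G_0u\rangle] \\
&= 2[\langle G_0u, w - v\rangle + \langle G_0v, v - w\rangle] = 2\langle w - v, G_0(u - v)\rangle.
\end{aligned}$$
Lemma 8.2 is proved.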
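The identity of Lemma 8.2 is easy to check numerically in finite dimensions. The following sketch is an illustration only: it takes $Y = R^2$, a fixed symmetric positive semidefinite matrix in place of $G_0$, and random vectors; the helper names `quad` and `identity_gap` are hypothetical.

```python
import random


def quad(a, G, b):
    """Return <a, G b> for vectors a, b and a square matrix G (nested lists)."""
    return sum(a[i] * G[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def identity_gap(G, u, v, w):
    """Absolute difference between the two sides of the identity in Lemma 8.2."""
    def sub(x, y):
        return [xi - yi for xi, yi in zip(x, y)]

    lhs = 2.0 * quad(sub(w, v), G, sub(u, v))
    rhs = (quad(sub(w, v), G, sub(w, v))
           - quad(sub(w, u), G, sub(w, u))
           + quad(sub(u, v), G, sub(u, v)))
    return abs(lhs - rhs)


rng = random.Random(1)
G0 = [[2.0, 1.0], [1.0, 3.0]]  # symmetric and positive definite
worst_gap = max(
    identity_gap(G0, *[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)])
    for _ in range(100)
)
```

Only the symmetry of $G_0$ is used by the algebra; positive semidefiniteness matters later, when the quadratic forms are dropped or bounded in the proof of Theorem 8.1.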
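Lemma 8.3 Assume that $\{u_t\}_{t=0}^{\infty} \subset X$, $\{v_t\}_{t=0}^{\infty} \subset Y$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\{p_t\}_{t=0}^{\infty} \subset X$ satisfy (8.8)–(8.11) for all natural numbers $t$. Let $t \ge 0$ be an integer and $\hat{v}_t \in Y$ satisfy
$$g(\hat{v}_t) + 2^{-1}\|\hat{v}_t - v_{t-1} - A^*p_t\|^2 \le g(z) + 2^{-1}\|z - v_{t-1} - A^*p_t\|^2 \tag{8.25}$$
for all $z \in Y$. Then
$$\|v_t - \hat{v}_t\| \le 2\delta_G^{1/2}.$$
Proof For all $y \in Y$ set
$$h(y) = g(y) + 2^{-1}\|y - v_{t-1} - A^*p_t\|^2. \tag{8.26}$$
In view of (8.25) and (8.26), $\hat{v}_t \in \operatorname{argmin}(h)$. By (8.12), (8.25), and (8.26),
$$h(v_t) \le h(\hat{v}_t) + \delta_G. \tag{8.27}$$
Since the function $h$ is convex it follows from (8.25) to (8.27) that
$$\begin{aligned}
h(\hat{v}_t) &\le h(2^{-1}v_t + 2^{-1}\hat{v}_t) \\
&= g(2^{-1}v_t + 2^{-1}\hat{v}_t) + 2^{-1}\|2^{-1}v_t + 2^{-1}\hat{v}_t - v_{t-1} - A^*p_t\|^2 \\
&\le g(2^{-1}v_t + 2^{-1}\hat{v}_t) + 2^{-1}\|2^{-1}(v_t - v_{t-1} - A^*p_t) + 2^{-1}(\hat{v}_t - v_{t-1} - A^*p_t)\|^2 \\
&\le 2^{-1}g(v_t) + 2^{-1}g(\hat{v}_t) + 2^{-1}[2\|2^{-1}(v_t - v_{t-1} - A^*p_t)\|^2 + 2\|2^{-1}(\hat{v}_t - v_{t-1} - A^*p_t)\|^2 - \|2^{-1}(v_t - \hat{v}_t)\|^2] \\
&= 2^{-1}h(v_t) + 2^{-1}h(\hat{v}_t) - 2^{-1}\|2^{-1}(v_t - \hat{v}_t)\|^2 \\
&\le 2^{-1}h(\hat{v}_t) + 2^{-1}\delta_G + 2^{-1}h(\hat{v}_t) - 2^{-1}\|2^{-1}(v_t - \hat{v}_t)\|^2,
\end{aligned}$$
$$\|v_t - \hat{v}_t\|^2 \le 4\delta_G \quad\text{and}\quad \|\hat{v}_t - v_t\| \le 2\delta_G^{1/2}.$$
Lemma 8.3 is proved.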
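The bound of Lemma 8.3 can be observed on a scalar example. The sketch below rests on assumptions that are not in the text: $Y = R^1$, $g(y) = |y|$, and a fixed number `c` standing in for $v_{t-1} + A^*p_t$. It scans a grid for all points that minimize $h$ up to $\delta_G$ and reports how far the farthest one lies from the exact minimizer.

```python
import math


def max_distance_of_delta_minimizers(c=2.5, delta=0.01, grid=20001, radius=5.0):
    """For h(y) = |y| + 0.5*(y - c)**2, return (worst, v_hat) where v_hat is the
    exact minimizer of h and worst is the largest |y - v_hat| over grid points y
    with h(y) <= h(v_hat) + delta."""
    def h(y):
        return abs(y) + 0.5 * (y - c) ** 2

    # Exact minimizer: soft-thresholding of c with threshold 1.
    v_hat = math.copysign(max(abs(c) - 1.0, 0.0), c)
    worst = 0.0
    for i in range(grid):
        y = -radius + 2.0 * radius * i / (grid - 1)
        if h(y) <= h(v_hat) + delta:
            worst = max(worst, abs(y - v_hat))
    return worst, v_hat


worst, v_hat = max_distance_of_delta_minimizers()
```

In this example the observed distance is close to $(2\delta_G)^{1/2}$, which is below the bound $2\delta_G^{1/2}$ stated in the lemma.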
Lemma 8.4 Assume that $\{u_t\}_{t=0}^{\infty} \subset X$, $\{v_t\}_{t=0}^{\infty} \subset Y$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, $\{p_t\}_{t=0}^{\infty} \subset X$ satisfy (8.8)–(8.11) for all natural numbers $t$,
$$\|u_0\| \le M_1, \quad \|v_0\| \le M_0, \quad g(v_0) < \infty. \tag{8.28}$$
Then the following assertions hold.
1. $\|v_t\| \le M_0$ for all integers $t \ge 0$ and $v_t \in \operatorname{dom}(g)$ for all integers $t \ge 0$.
2. Let $t \ge 0$ be an integer. Then for all $u \in X$,
$$F(u_t, v_t) - F(u, v_t) \le (2\tau)^{-1}(\|u - u_{t-1}\|^2 - \|u - u_t\|^2) - 2^{-1}(\tau^{-1} - L)\|u_t - u_{t-1}\|^2 + \tau^{-1}\delta_f\|u_t - u\|.$$
3. Let $t \ge 0$ be an integer. Then for all $v \in \operatorname{dom}(g)$,
$$F(u_t, v) - F(u_t, v_t) \le \delta_G^{1/2}(12M_0 + 3 + 4\|A\|\|p_t\|) + 2^{-1}[\langle v - v_{t-1}, G(v - v_{t-1})\rangle - \langle v - v_t, G(v - v_t)\rangle - \langle v_{t-1} - v_t, G(v_{t-1} - v_t)\rangle].$$
Proof In view of (8.10), (8.13), and (8.28), $g(v_t) < \infty$, $t = 0, 1, \dots$. Clearly, $\|v_t\| \le M_0$ for all integers $t \ge 0$. Let us prove assertion 2. For every $u \in X$ set
$$h(u) = F(u, v_t) = f(u) + \langle u, Av_t\rangle - g(v_t). \tag{8.29}$$
By (8.29), the function $h$ is Fréchet differentiable, for all $u \in X$,
$$h'(u) = f'(u) + Av_t \tag{8.30}$$
and for all $u_1, u_2 \in X$,
$$h'(u_1) - h'(u_2) = f'(u_1) - f'(u_2). \tag{8.31}$$
Proposition 4.3, (8.16), and (8.31) imply that
$$h(u_t) \le h(u_{t-1}) + \langle h'(u_{t-1}), u_t - u_{t-1}\rangle + 2^{-1}L\|u_t - u_{t-1}\|^2. \tag{8.32}$$
Let $u \in X$. Since the function $h$ is convex we have
$$h(u) \ge h(u_{t-1}) + \langle h'(u_{t-1}), u - u_{t-1}\rangle. \tag{8.33}$$
It follows from (8.32) and (8.33) that
$$\begin{aligned}
h(u_t) &\le h(u) - \langle h'(u_{t-1}), u - u_{t-1}\rangle + \langle h'(u_{t-1}), u_t - u_{t-1}\rangle + 2^{-1}L\|u_t - u_{t-1}\|^2 \\
&= h(u) + \langle h'(u_{t-1}), u_t - u\rangle + 2^{-1}L\|u_t - u_{t-1}\|^2.
\end{aligned} \tag{8.34}$$
By (8.29), (8.30), and (8.34),
$$F(u_t, v_t) - F(u, v_t) \le \langle f'(u_{t-1}) + Av_t, u_t - u\rangle + 2^{-1}L\|u_t - u_{t-1}\|^2. \tag{8.35}$$
In view of (8.11),
$$\|\tau^{-1}(u_{t-1} - u_t) - (f'(u_{t-1}) + Av_t)\| = \|\tau^{-1}(\xi_{t-1} - f'(u_{t-1}))\| \le \tau^{-1}\delta_f. \tag{8.36}$$
It follows from (8.35) and (8.36) that
$$F(u_t, v_t) - F(u, v_t) \le \langle \tau^{-1}(u_{t-1} - u_t), u_t - u\rangle + \tau^{-1}\delta_f\|u_t - u\| + 2^{-1}L\|u_t - u_{t-1}\|^2.$$
Lemma 2.1 and the relation above imply that
$$F(u_t, v_t) - F(u, v_t) \le (2\tau)^{-1}(\|u - u_{t-1}\|^2 - \|u - u_t\|^2 - \|u_t - u_{t-1}\|^2) + \tau^{-1}\delta_f\|u_t - u\| + 2^{-1}L\|u_t - u_{t-1}\|^2.$$
Thus assertion 2 holds. Let us prove assertion 3. In view of (8.10),
$$g(v_t) + 2^{-1}\|v_t - v_{t-1} - A^*p_t\|^2 \le g(z) + 2^{-1}\|z - v_{t-1} - A^*p_t\|^2 + \delta_G \tag{8.37}$$
for all $z \in Y$. Let
$$\hat{v} = \operatorname{argmin}\{g(z) + 2^{-1}\|z - v_{t-1} - A^*p_t\|^2 : z \in Y\}. \tag{8.38}$$
Lemma 8.3, (8.37), and (8.38) imply that
$$\|v_t - \hat{v}\| \le 2\delta_G^{1/2}. \tag{8.39}$$
In view of (8.3), for every $v \in Y$,
$$-F(p_t, v) = -f(p_t) - \langle p_t, Av\rangle + g(v). \tag{8.40}$$
By (8.40), for every $v \in Y$,
$$\begin{aligned}
g(v) + 2^{-1}\|v - v_{t-1} - A^*p_t\|^2 &= g(v) + 2^{-1}\|v - v_{t-1}\|^2 - \langle v - v_{t-1}, A^*p_t\rangle + \|A^*p_t\|^2 \\
&= g(v) - \langle p_t, Av\rangle + \langle Av_{t-1}, p_t\rangle + 2^{-1}\|v - v_{t-1}\|^2 + \|A^*p_t\|^2 \\
&= -F(p_t, v) + 2^{-1}\|v - v_{t-1}\|^2 + f(p_t) + \langle Av_{t-1}, p_t\rangle + \|A^*p_t\|^2.
\end{aligned} \tag{8.41}$$
It follows from (8.38) and (8.41) that
$$\hat{v} = \operatorname{argmin}\{-F(p_t, v) + 2^{-1}\|v - v_{t-1}\|^2 : v \in Y\}. \tag{8.42}$$
Set
$$h(v) = -F(p_t, v), \quad v \in Y. \tag{8.43}$$
Proposition 7.2 and (8.43) imply that for every $v \in Y$,
$$F(p_t, \hat{v}) - F(p_t, v) = h(v) - h(\hat{v}) \ge \langle v - \hat{v}, v_{t-1} - \hat{v}\rangle. \tag{8.44}$$
Let $v \in \operatorname{dom}(g)$. In view of (8.2),
$$\begin{aligned}
&F(u_t, v) - F(u_t, v_t) + F(p_t, v_t) - F(p_t, v) \\
&= f(u_t) + \langle u_t, Av\rangle - g(v) - (f(u_t) + \langle u_t, Av_t\rangle - g(v_t)) + f(p_t) + \langle p_t, Av_t\rangle - g(v_t) - (f(p_t) + \langle p_t, Av\rangle - g(v)) \\
&= \langle u_t - p_t, Av - Av_t\rangle.
\end{aligned}$$
By the relation above, (8.9)–(8.11) and (8.44),
$$\begin{aligned}
F(u_t, v) - F(u_t, v_t) &= F(p_t, v) - F(p_t, v_t) + \langle u_t - p_t, A(v - v_t)\rangle \\
&= F(p_t, v) - F(p_t, v_t) + \tau\langle A(v_{t-1} - v_t), A(v - v_t)\rangle \\
&= F(p_t, v) - F(p_t, v_t) + \tau\langle v_{t-1} - v_t, A^*A(v - v_t)\rangle.
\end{aligned} \tag{8.45}$$
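In view of (8.2) and (8.38),
$$\begin{aligned}
F(p_t, \hat{v}) - F(p_t, v_t) &= f(p_t) + \langle p_t, A\hat{v}\rangle - g(\hat{v}) - (f(p_t) + \langle p_t, Av_t\rangle - g(v_t)) \\
&= \langle p_t, A(\hat{v} - v_t)\rangle + g(v_t) - g(\hat{v}).
\end{aligned} \tag{8.46}$$
It is clear that
$$\begin{aligned}
g(v_t) - g(\hat{v}) &= g(v_t) + 2^{-1}\|v_t - v_{t-1} - A^*p_t\|^2 - (g(\hat{v}) + 2^{-1}\|\hat{v} - v_{t-1} - A^*p_t\|^2) \\
&\quad - (2^{-1}\|v_t - v_{t-1} - A^*p_t\|^2 - 2^{-1}\|\hat{v} - v_{t-1} - A^*p_t\|^2).
\end{aligned} \tag{8.47}$$
It follows from (8.37), (8.39), and (8.47) that
$$\begin{aligned}
|g(v_t) - g(\hat{v})| &\le |g(v_t) + 2^{-1}\|v_t - v_{t-1} - A^*p_t\|^2 - (g(\hat{v}) + 2^{-1}\|\hat{v} - v_{t-1} - A^*p_t\|^2)| \\
&\quad + 2^{-1}|\|v_t - v_{t-1} - A^*p_t\|^2 - \|\hat{v} - v_{t-1} - A^*p_t\|^2| \\
&\le \delta_G + 2^{-1}\|v_t - \hat{v}\|(\|v_t\| + 2\|v_{t-1}\| + 2\|A\|\|p_t\| + \|\hat{v}\|) \\
&\le \delta_G + \delta_G^{1/2}(2 + 2\|v_t\| + 2\|v_{t-1}\| + 2\|A\|\|p_t\|).
\end{aligned} \tag{8.48}$$
Relations (8.39), (8.46), and (8.48) imply that
$$|F(p_t, \hat{v}) - F(p_t, v_t)| \le 2\|p_t\|\|A\|\delta_G^{1/2} + \delta_G + \delta_G^{1/2}(2 + 2\|v_t\| + 2\|v_{t-1}\| + 2\|A\|\|p_t\|). \tag{8.49}$$
By (8.44) and (8.45),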
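$$\begin{aligned}
F(u_t, v) - F(u_t, v_t) &= F(p_t, v) - F(p_t, v_t) + \tau\langle v_{t-1} - v_t, A^*A(v - v_t)\rangle \\
&= F(p_t, v) - F(p_t, \hat{v}) + F(p_t, \hat{v}) - F(p_t, v_t) + \tau\langle v_{t-1} - v_t, A^*A(v - v_t)\rangle \\
&\le \langle \hat{v} - v_{t-1}, v - \hat{v}\rangle + \delta_G^{1/2}(2 + 2\|v_t\| + 2\|v_{t-1}\| + 2\|A\|\|p_t\|) + \delta_G + 2\delta_G^{1/2}\|p_t\|\|A\| \\
&\quad + \tau\langle v_{t-1} - v_t, A^*A(v - v_t)\rangle \\
&= \delta_G + 2\delta_G^{1/2}\|p_t\|\|A\| + \delta_G^{1/2}(2 + 2\|v_t\| + 2\|v_{t-1}\| + 2\|A\|\|p_t\|) + \tau\langle v_{t-1} - v_t, A^*A(v - v_t)\rangle \\
&\quad + \langle v_t - v_{t-1}, v - v_t\rangle + [\langle v_{t-1} - v_t, v - v_t\rangle + \langle \hat{v} - v_{t-1}, v - \hat{v}\rangle].
\end{aligned} \tag{8.50}$$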
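By (8.13), (8.39), assertion 1, and the inclusion $v \in \operatorname{dom}(g)$,
$$\begin{aligned}
|\langle \hat{v} - v_{t-1}, v - \hat{v}\rangle - \langle v_t - v_{t-1}, v - v_t\rangle| &\le |\langle \hat{v} - v_{t-1}, v - \hat{v}\rangle - \langle \hat{v} - v_{t-1}, v - v_t\rangle| \\
&\quad + |\langle \hat{v} - v_{t-1}, v - v_t\rangle - \langle v_t - v_{t-1}, v - v_t\rangle| \\
&\le \|\hat{v} - v_{t-1}\|\|\hat{v} - v_t\| + \|v - v_t\|\|\hat{v} - v_t\| \\
&\le 2\delta_G^{1/2}(2M_0 + 2M_0 + \|\hat{v}\|).
\end{aligned} \tag{8.51}$$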
Assertion 1, (8.50), and (8.51) imply that
$$\begin{aligned}
F(u_t, v) - F(u_t, v_t) &\le \delta_G^{1/2}(4M_0 + 3 + 4\|A\|\|p_t\|) + 8M_0\delta_G^{1/2} + \tau\langle v_{t-1} - v_t, A^*A(v - v_t)\rangle + \langle v_t - v_{t-1}, v - v_t\rangle \\
&= \delta_G^{1/2}(12M_0 + 3 + 4\|A\|\|p_t\|) + \langle v_t - v_{t-1}, (\mathrm{Id} - \tau A^*A)(v - v_t)\rangle.
\end{aligned} \tag{8.52}$$
Recall (see (8.7)) that
$$G = \mathrm{Id} - \tau A^*A. \tag{8.53}$$
In view of (8.4) and (8.53), $G^* = G$ and for each $z \in Y$,
$$\langle z, Gz\rangle = \langle z, z - \tau A^*Az\rangle = \|z\|^2 - \tau\|Az\|^2 \ge \|z\|^2(1 - \tau\|A\|) \ge 0. \tag{8.54}$$
By (8.52)–(8.54) and Lemma 8.2,
$$F(u_t, v) - F(u_t, v_t) \le \delta_G^{1/2}(12M_0 + 3 + 4\|A\|\|p_t\|) + 2^{-1}(\langle v - v_{t-1}, G(v - v_{t-1})\rangle - \langle v - v_t, G(v - v_t)\rangle - \langle v_{t-1} - v_t, G(v_{t-1} - v_t)\rangle).$$
Thus assertion 3 holds. This completes the proof of Lemma 8.4.
8.3 Proof of Theorem 8.1
Assertion 1 of Lemma 8.4 implies that for all integers $t \ge 0$,
$$v_t \in \operatorname{dom}(g), \quad \|v_t\| \le M_0. \tag{8.55}$$
Assertion 2 of Lemma 8.4 and (8.21) imply that for all integers $t \ge 0$ and all $u \in X$,
$$F(u_t, v_t) - F(u, v_t) \le (2\tau)^{-1}(\|u - u_{t-1}\|^2 - \|u - u_t\|^2) + \tau^{-1}\delta_f\|u_t - u\|. \tag{8.56}$$
By Assertion 3 of Lemma 8.4, for all integers $t \ge 0$ and all $v \in \operatorname{dom}(g)$,
$$F(u_t, v) - F(u_t, v_t) \le \delta_G^{1/2}(12M_0 + 3 + 4\|A\|\|p_t\|) + 2^{-1}[\langle v - v_{t-1}, G(v - v_{t-1})\rangle - \langle v - v_t, G(v - v_t)\rangle]. \tag{8.57}$$
We show that for all integers $t \ge 0$,
$$\|u_t\| \le M_1, \quad \|p_{t+1}\| \le M_1 + M_0 + 2. \tag{8.58}$$
Assume that $t \ge 1$ is an integer and that
$$\|u_{t-1}\| \le M_1. \tag{8.59}$$
(Note that in view of (8.22), inequality (8.59) holds for $t = 1$.) By (8.56),
$$F(u_t, v_t) \le F(0, v_t) + (2\tau)^{-1}(\|u_{t-1}\|^2 - \|u_t\|^2) + \tau^{-1}\delta_f\|u_t\|. \tag{8.60}$$
There are two cases:
$$\|u_t\| \le \|u_{t-1}\|; \tag{8.61}$$
$$\|u_t\| > \|u_{t-1}\|. \tag{8.62}$$
Assume that (8.61) holds. In view of (8.59) and (8.61), $\|u_t\| \le \|u_{t-1}\| \le M_1$. Assume that (8.62) holds. By (8.2), (8.60), and (8.62),
$$f(u_t) + \langle u_t, Av_t\rangle \le f(0) + \tau^{-1}\delta_f\|u_t\|. \tag{8.63}$$
Relations (8.55) and (8.63) imply that
$$f(u_t) \le f(0) + \|A\|M_0\|u_t\| + \delta_f\tau^{-1}\|u_t\|. \tag{8.64}$$
It follows from (8.8), (8.11), (8.15), (8.55), and (8.59) that
$$\|u_t\| \le \|u_{t-1}\| + \tau(\|A\|\|v_t\| + \|\xi_{t-1}\|) \le M_1 + \tau(\|A\|M_0 + L + 1). \tag{8.65}$$
By (8.20) and (8.65),
$$\delta_f\tau^{-1}\|u_t\| \le \delta_f\tau^{-1}M_1 + \delta_f(\|A\|M_0 + L + 1) \le 2. \tag{8.66}$$
It follows from (8.14), (8.64), and (8.66) that
$$f(u_t) \le f(0) + \|A\|M_0\|u_t\| + 2. \tag{8.67}$$
In view of (8.67), $\|u_t\| \le M_1$. Thus by induction we showed that
$$\|u_t\| \le M_1 \text{ for all integers } t \ge 0. \tag{8.68}$$
Relations (8.8), (8.15), and (8.68) imply that for all integers $t \ge 0$,
$$\|\xi_t\| \le L + 1. \tag{8.69}$$
By (8.4), (8.9), (8.21), (8.55), (8.68), and (8.69),
$$\|p_t\| = \|u_{t-1} - \tau(Av_{t-1} + \xi_{t-1})\| \le \|u_{t-1}\| + \tau(\|A\|\|v_{t-1}\| + \|\xi_{t-1}\|) \le M_1 + \tau(\|A\|M_0 + L + 1) \le M_1 + M_0 + 2.$$
Thus (8.58) holds. In view of (8.56), (8.58), and (8.68), for all integers $t \ge 0$, all $u \in B_X(0, M_1)$, and all $v \in \operatorname{dom}(g)$,
$$F(u_t, v_t) - F(u, v_t) \le (2\tau)^{-1}(\|u - u_{t-1}\|^2 - \|u - u_t\|^2) + 2\tau^{-1}\delta_f M_1 \tag{8.70}$$
and
$$F(u_t, v) - F(u_t, v_t) \le 2^{-1}[\langle v - v_{t-1}, G(v - v_{t-1})\rangle - \langle v - v_t, G(v - v_t)\rangle] + \delta_G^{1/2}(12M_0 + 3 + 4\|A\|(M_1 + M_0 + 2)). \tag{8.71}$$
It follows from (8.7) and (8.70) that for all integers $t \ge 0$, all $u \in B_X(0, M_1)$, and all $v \in \operatorname{dom}(g)$,
$$\begin{aligned}
F(u_t, v) - F(u, v_t) &\le (2\tau)^{-1}(\|u - u_{t-1}\|^2 - \|u - u_t\|^2) + 2\tau^{-1}\delta_f M_1 \\
&\quad + 2^{-1}[\langle v - v_{t-1}, G(v - v_{t-1})\rangle - \langle v - v_t, G(v - v_t)\rangle] + \delta_G^{1/2}(12M_0 + 3 + 4\|A\|(M_1 + M_0 + 2)).
\end{aligned} \tag{8.72}$$
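Let $T$ be a natural number. By (8.24), (8.62), and (8.72), for all $u \in B_X(0, M_1)$ and all $v \in \operatorname{dom}(g)$,
$$\begin{aligned}
F(\hat{u}_T, v) - F(u, \hat{v}_T) &= F\Big(\sum_{t=1}^{T} T^{-1}u_t, v\Big) - F\Big(u, \sum_{t=1}^{T} T^{-1}v_t\Big) \\
&\le T^{-1}\sum_{t=1}^{T} F(u_t, v) - T^{-1}\sum_{t=1}^{T} F(u, v_t) \\
&\le 2M_1\tau^{-1}\delta_f + \delta_G^{1/2}(12M_0 + 3 + 4\|A\|(M_1 + M_0 + 2)) + (2T\tau)^{-1}\sum_{t=1}^{T}(\|u - u_{t-1}\|^2 - \|u - u_t\|^2) \\
&\quad + (2T)^{-1}\sum_{t=1}^{T}(\langle v - v_{t-1}, G(v - v_{t-1})\rangle - \langle v - v_t, G(v - v_t)\rangle) \\
&\le 2M_1\tau^{-1}\delta_f + \delta_G^{1/2}(12M_0 + 3 + 4\|A\|(M_1 + M_0 + 2)) + (2T\tau)^{-1}4M_1^2 + (2T)^{-1}4M_0^2\|G\| \\
&= \Delta + (2T\tau)^{-1}4M_1^2 + (2T)^{-1}4M_0^2\|G\|.
\end{aligned} \tag{8.73}$$
Set
$$\Delta_T = \Delta + (2T\tau)^{-1}4M_1^2 + (2T)^{-1}4M_0^2\|G\|. \tag{8.74}$$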
In view of (8.73) and (8.74), with $v = v^*$ and $u = u^*$,
$$F(\hat{u}_T, v^*) - F(u^*, \hat{v}_T) \le \Delta_T. \tag{8.75}$$
Relations (8.3) and (8.75) imply that
$$0 \le F(\hat{u}_T, v^*) - F(u^*, v^*) \le \Delta_T, \tag{8.76}$$
$$0 \le F(u^*, v^*) - F(u^*, \hat{v}_T) \le \Delta_T. \tag{8.77}$$
By (8.73) and (8.74), for each $u \in B_X(0, M_1)$ and each $v \in \operatorname{dom}(g)$,
$$F(\hat{u}_T, \hat{v}_T) - F(u, \hat{v}_T) \le \Delta_T, \tag{8.78}$$
$$F(\hat{u}_T, v) - F(\hat{u}_T, \hat{v}_T) \le \Delta_T. \tag{8.79}$$
Assume that
$$u \in X \setminus B_X(0, M_1). \tag{8.80}$$
In view of (8.2) and (8.78),
$$F(\hat{u}_T, \hat{v}_T) \le \Delta_T + F(0, \hat{v}_T) = \Delta_T + f(0) - g(\hat{v}_T). \tag{8.81}$$
It follows from (8.2), (8.14), (8.24), (8.55), and (8.80) that
$$F(u, \hat{v}_T) - F(0, \hat{v}_T) = f(u) + \langle u, A\hat{v}_T\rangle - f(0) \ge f(u) - \|u\|\|A\|M_0 - |f(0)| \ge 4. \tag{8.82}$$
By (8.81) and (8.82),
$$F(\hat{u}_T, \hat{v}_T) \le \Delta_T + F(0, \hat{v}_T) \le \Delta_T + F(u, \hat{v}_T) - 4. \tag{8.83}$$
In view of (8.78) and (8.83),
$$F(\hat{u}_T, \hat{v}_T) \le \Delta_T + F(u, \hat{v}_T)$$
for all $u \in X$. Theorem 8.1 is proved.
Chapter 9
PDA-Based Method for Convex Optimization
In this chapter we use predicted decrease approximation (PDA) for constrained convex optimization. For PDA-based method each iteration consists of two steps. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.
9.1 Preliminaries and the Main Result
Let $X$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$, $F : X \to R^1$ be a convex Fréchet differentiable function, $G : X \to R^1 \cup \{\infty\}$ be a convex lower semicontinuous function which is not identically infinity,
$$\operatorname{dom}(G) = \{x \in X : G(x) < \infty\}, \quad H(x) = F(x) + G(x), \ x \in X. \tag{9.1}$$
We denote by $F'(x)$ the Fréchet derivative of $F$ at $x \in X$. We suppose that dom(G) is a bounded set and that $\inf(H) = \inf\{H(x) : x \in X\}$ is a finite number. Let
$$D := \operatorname{diam}(\operatorname{dom}(G)) = \sup\{\|x - y\| : x, y \in \operatorname{dom}(G)\}, \tag{9.2}$$
$x_* \in X$ satisfy
$$H(x_*) = \inf(H). \tag{9.3}$$
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_9
Given a function $f : X \to R^1 \cup \{\infty\}$, its convex conjugate is the function
$$f^*(y) = \sup\{\langle x, y\rangle - f(x) : x \in X\}, \quad y \in X. \tag{9.4}$$
Clearly, for all $x, u \in X$,
$$f(x) + f^*(u) \ge \langle x, u\rangle \tag{9.5}$$
(Fenchel inequality). For every $y \in X$ define
$$S(y) = \sup\{\langle F'(y), y - p\rangle + G(y) - G(p) : p \in X\}. \tag{9.6}$$
By (9.4) and (9.6), for $y \in X$,
$$S(y) = G(y) + \langle F'(y), y\rangle + \sup\{\langle -F'(y), p\rangle - G(p) : p \in X\} = G(y) + \langle F'(y), y\rangle + G^*(-F'(y)). \tag{9.7}$$
It follows from Fenchel inequality (9.5) and (9.7) that for every $y \in \operatorname{dom}(G)$,
$$S(y) \ge 0. \tag{9.8}$$
Let $y \in \operatorname{dom}(G)$. In view of (9.6) and (9.8), the following four properties are equivalent:
$$S(y) = 0; \tag{9.9}$$
$$G(p) \ge G(y) + \langle -F'(y), p - y\rangle \text{ for all } p \in X; \tag{9.10}$$
$$-F'(y) \in \partial G(y); \tag{9.11}$$
$$y \text{ is a minimizer of } H. \tag{9.12}$$
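For each $x, y \in X$ set $[x, y] = \{\alpha x + (1 - \alpha)y : \alpha \in [0, 1]\}$.
Lemma 9.1 ([16]) For every $y \in X$, $H(y) - \inf(H) \le S(y)$.
Proof Let $y \in X$. If $S(y) = \infty$, then the lemma holds. Assume that
$$S(y) < \infty. \tag{9.13}$$
In view of (9.7) and (9.13),
$$\sup\{\langle -F'(y), p\rangle - G(p) : p \in X\} < \infty \tag{9.14}$$
and
$$G(y) < \infty. \tag{9.15}$$
By (9.14), for each $\epsilon > 0$ there exists $p_\epsilon \in X$ such that
$$\langle -F'(y), p_\epsilon\rangle - G(p_\epsilon) \ge -\epsilon + \langle -F'(y), p\rangle - G(p) \text{ for all } p \in X. \tag{9.16}$$
It follows from (9.6) and (9.16) that
$$\begin{aligned}
S(y) &\ge \langle F'(y), y - p_\epsilon\rangle + G(y) - G(p_\epsilon) = \langle F'(y), y\rangle + G(y) - [\langle F'(y), p_\epsilon\rangle + G(p_\epsilon)] \\
&\ge \langle F'(y), y\rangle + G(y) - \langle F'(y), x_*\rangle - G(x_*) - \epsilon = \langle F'(y), y - x_*\rangle + G(y) - G(x_*) - \epsilon.
\end{aligned}$$
Since $\epsilon$ is any positive number, in view of (9.3),
$$S(y) \ge \langle F'(y), y - x_*\rangle + G(y) - G(x_*) \ge F(y) - F(x_*) + G(y) - G(x_*) = H(y) - \inf(H).$$
Lemma 9.1 is proved.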
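Lemma 9.1 says that $S(y)$ is a computable upper bound on the optimality gap $H(y) - \inf(H)$. A minimal sketch of this on the real line follows, under assumptions that are not in the text: $F(x) = (x - c)^2/2$ and $G$ is the indicator function of $[0, 1]$, so the supremum in (9.6) is attained at an endpoint of the interval. The function names are hypothetical.

```python
def gap_S(y, c):
    """S(y) from (9.6) for F(x) = 0.5*(x - c)**2 and G = indicator of [0, 1],
    evaluated at a point y in [0, 1]. The expression F'(y)*(y - p) is linear
    in p, so its supremum over [0, 1] is attained at p = 0 or p = 1."""
    grad = y - c
    return max(grad * (y - 0.0), grad * (y - 1.0))


def check_lemma_9_1(c=0.3, points=101):
    """Check H(y) - inf(H) <= S(y) on a uniform grid of [0, 1]."""
    def H(x):
        return 0.5 * (x - c) ** 2  # G vanishes on [0, 1]

    x_star = min(1.0, max(0.0, c))   # minimizer of H: projection of c onto [0, 1]
    inf_H = H(x_star)
    grid = (i / (points - 1) for i in range(points))
    return all(H(y) - inf_H <= gap_S(y, c) + 1e-12 for y in grid)


holds_interior = check_lemma_9_1(c=0.3)   # minimizer inside the interval
holds_boundary = check_lemma_9_1(c=1.7)   # minimizer at the endpoint 1
```

The quantity `gap_S` vanishes exactly at the minimizer in both cases, in line with the equivalence of (9.9) and (9.12).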
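Suppose that $L > 0$ and
$$\|F'(x) - F'(y)\| \le L\|x - y\| \text{ for all } x, y \in X. \tag{9.17}$$
For every $\gamma \ge 1$ and every $\bar{y} \in \operatorname{dom}(G)$, we say that $u(\bar{y}) \in \operatorname{dom}(G)$ is a $\gamma^{-1}$-predicted decrease approximation (PDA) vector of $H$ at $\bar{y}$ [16] if
$$\gamma^{-1}S(\bar{y}) \le \langle F'(\bar{y}), \bar{y} - u(\bar{y})\rangle + G(\bar{y}) - G(u(\bar{y})). \tag{9.18}$$
Note that for any $\tilde{\gamma} \ge \gamma \ge 1$ any $\gamma^{-1}$-PDA vector is also a $\tilde{\gamma}^{-1}$-PDA vector. Let $\gamma \ge 1$, $\bar{y} \in \operatorname{dom}(G)$, $\epsilon > 0$. We say that $u(\bar{y}) \in \operatorname{dom}(G)$ is a $(\gamma^{-1}, \epsilon)$-predicted decrease approximation (PDA) vector of $H$ at $\bar{y}$ if
$$\gamma^{-1}S(\bar{y}) \le \langle F'(\bar{y}), \bar{y} - u(\bar{y})\rangle + G(\bar{y}) - G(u(\bar{y})) + \epsilon. \tag{9.19}$$
For $x, y \in X$ define
$$Q(y, x) = F(x) + \langle F'(x), y - x\rangle + 2^{-1}L\|x - y\|^2. \tag{9.20}$$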
Let $\gamma \ge 1$, $\delta_1, \delta_2 \in (0, 1]$. Let us describe our algorithm.
Initialization: select an arbitrary $y_0 \in \operatorname{dom}(G)$.
Iterative Step: given a current iteration vector $y_k \in \operatorname{dom}(G)$ do the following:
(i) calculate $u(y_k)$ which is a $(\gamma^{-1}, \delta_1)$-predicted decrease approximation (PDA) vector of $H$ at $y_k$ such that
$$u(y_k) \in \operatorname{dom}(G), \tag{9.21}$$
$$\gamma^{-1}S(y_k) - \delta_1 \le \langle F'(y_k), y_k - u(y_k)\rangle + G(y_k) - G(u(y_k)); \tag{9.22}$$
(ii) choose a bounded set $X_k$ such that
$$[y_k, u(y_k)] \subset X_k \tag{9.23}$$
and calculate
$$y_{k+1} \in X_k \tag{9.24}$$
for which at least one of the following inequalities holds:
$$Q(y_{k+1}, y_k) + G(y_{k+1}) \le Q(y, y_k) + G(y) + \delta_2 \text{ for all } y \in X_k \tag{9.25}$$
(local model update);
$$H(y_{k+1}) \le H(y) + \delta_2 \text{ for all } y \in X_k \tag{9.26}$$
(global exact model update).
Note that an exact version of this algorithm (without computational errors $\delta_1$, $\delta_2$) was studied in [16].
Theorem 9.2 Assume that $\{y_k\}_{k=0}^{\infty}, \{u(y_k)\}_{k=0}^{\infty} \subset X$, $X_k \subset X$, $k = 0, 1, \dots$, (9.21)–(9.24) hold for all $k = 0, 1, \dots$ and that for all integers $k \ge 0$, at least one of the relations (9.25) and (9.26) holds. Then for every integer $k \ge 0$,
$$H(y_{k+1}) - \inf(H) \le 2\gamma(k + 2\gamma)^{-1}((2\gamma - 2)(k + 1)^{-1}(H(y_0) - \inf(H)) + LD^2\gamma) + (\delta_1 + \delta_2)(2\gamma + k - 1).$$
Theorem 9.2 is proved in Sect. 9.3. Its prototype without computational errors was obtained in [16].
9.2 Auxiliary Results
Lemma 9.3 Assume that $\{y_k\}_{k=0}^{\infty}, \{u(y_k)\}_{k=0}^{\infty} \subset X$, $X_k \subset X$, $k = 0, 1, \dots$, (9.21)–(9.24) hold for all $k = 0, 1, \dots$ and that for all nonnegative integers $k$, at least one of the relations (9.25) and (9.26) holds. Let $\{t_k\}_{k=0}^{\infty} \subset (0, 1]$. Then for every integer $k \ge 0$,
$$H(y_{k+1}) \le H(y_k) + 2^{-1}t_k^2LD^2 - \gamma^{-1}t_kS(y_k) + \delta_1 + \delta_2.$$
Proof Let $k \ge 0$ be an integer and set
$$u_k = u(y_k). \tag{9.27}$$
By (9.21)–(9.23) and (9.27),
$$[y_k, u_k] \subset X_k, \tag{9.28}$$
$$\gamma^{-1}S(y_k) - \delta_1 \le \langle F'(y_k), y_k - u_k\rangle + G(y_k) - G(u_k). \tag{9.29}$$
Since the function $G$ is convex it follows from (9.2), (9.20), (9.23), (9.28), and (9.29) that
$$\begin{aligned}
\inf\{Q(y, y_k) + G(y) : y \in X_k\} &\le \inf\{Q(y, y_k) + G(y) : y \in [y_k, u_k]\} \\
&\le \inf\{Q(tu_k + (1 - t)y_k, y_k) + G(tu_k + (1 - t)y_k) : t \in [0, 1]\} \\
&\le Q(t_ku_k + (1 - t_k)y_k, y_k) + G(t_ku_k + (1 - t_k)y_k) \\
&= F(y_k) + t_k\langle F'(y_k), u_k - y_k\rangle + 2^{-1}Lt_k^2\|y_k - u_k\|^2 + G(t_ku_k + (1 - t_k)y_k) \\
&\le F(y_k) + G(y_k) + t_k(\langle F'(y_k), u_k - y_k\rangle + G(u_k) - G(y_k)) + 2^{-1}t_k^2L\|y_k - u_k\|^2 \\
&\le F(y_k) + G(y_k) - t_k\gamma^{-1}S(y_k) + \delta_1 + 2^{-1}Lt_k^2D^2.
\end{aligned} \tag{9.30}$$
Proposition 4.3, (9.17), and (9.20) imply that
$$F(y) \le Q(y, y_k) \text{ for all } y \in X_k \tag{9.31}$$
and
$$F(y) + G(y) \le Q(y, y_k) + G(y), \quad y \in X_k. \tag{9.32}$$
In the case of the local update it follows from (9.25) and (9.32) that
$$F(y_{k+1}) + G(y_{k+1}) \le Q(y_{k+1}, y_k) + G(y_{k+1}) \le \inf\{Q(y, y_k) + G(y) : y \in X_k\} + \delta_2.$$
In the case of the exact model update it follows from (9.26) and (9.32) that
$$F(y_{k+1}) + G(y_{k+1}) \le \inf\{F(y) + G(y) : y \in X_k\} + \delta_2 \le \inf\{Q(y, y_k) + G(y) : y \in X_k\} + \delta_2.$$
Therefore in both cases in view of (9.30),
$$F(y_{k+1}) + G(y_{k+1}) \le \inf\{Q(y, y_k) + G(y) : y \in X_k\} + \delta_2 \le F(y_k) + G(y_k) + 2^{-1}t_k^2LD^2 - \gamma^{-1}t_kS(y_k) + \delta_1 + \delta_2.$$
Lemma 9.3 is proved.
Lemma 9.4 Let $c > 0$, $\gamma \ge 1$, $\delta \in (0, 2]$, $\{a_k\}_{k=0}^{\infty}, \{b_k\}_{k=0}^{\infty} \subset R^1$,
$$0 \le a_k \le b_k, \ k = 0, 1, \dots, \quad t_k = 2\gamma(2\gamma + k)^{-1}, \ k = 0, 1, \dots. \tag{9.33}$$
Assume that for all integers $k \ge 0$,
$$a_{k+1} \le a_k - t_kb_k\gamma^{-1} + 2^{-1}ct_k^2 + \delta. \tag{9.34}$$
Then for all integers $k \ge 0$,
$$a_{k+1} + \Big(\sum_{i=0}^{k}(b_i - a_i)(i + 2\gamma - 1)\Big)\Big(\sum_{i=0}^{k}(i + 2\gamma - 1)\Big)^{-1} \le 2\gamma(k + 2\gamma)^{-1}((2\gamma - 2)(k + 1)^{-1}a_0 + c\gamma) + \delta(2\gamma + k - 1).$$
Proof In view of (9.33) and (9.34), for any integer $i \ge 0$,
$$b_i \le \gamma t_i^{-1}(a_i - a_{i+1} + 2^{-1}t_i^2c + \delta)$$
and
$$\begin{aligned}
b_i - a_i &\le (\gamma t_i^{-1} - 1)a_i - \gamma t_i^{-1}a_{i+1} + 2^{-1}c\gamma t_i + \delta\gamma t_i^{-1} \\
&= 2^{-1}a_i(2\gamma + i - 2) - 2^{-1}a_{i+1}(2\gamma + i) + (2\gamma + i)^{-1}c\gamma^2 + 2^{-1}\delta(2\gamma + i).
\end{aligned}$$
Multiplying the relation above by $i + 2\gamma - 1 \ge 0$ we obtain that
$$\begin{aligned}
(b_i - a_i)(i + 2\gamma - 1) &\le 2^{-1}(i + 2\gamma - 2)(i + 2\gamma - 1)a_i - 2^{-1}(i + 2\gamma)(i + 2\gamma - 1)a_{i+1} \\
&\quad + c\gamma^2(i + 2\gamma - 1)(i + 2\gamma)^{-1} + 2^{-1}\delta(2\gamma + i)(i + 2\gamma - 1).
\end{aligned} \tag{9.35}$$
Let $k \ge 0$ be an integer. Summing up (9.35) for $i = 0, \dots, k$ we obtain that
$$\begin{aligned}
\sum_{i=0}^{k}(b_i - a_i)(i + 2\gamma - 1) &\le 2^{-1}(2\gamma - 2)(2\gamma - 1)a_0 - 2^{-1}(k + 2\gamma)(k + 2\gamma - 1)a_{k+1} \\
&\quad + c(k + 1)\gamma^2 + 2^{-1}\delta(k + 1)(2\gamma + k)(k + 2\gamma - 1).
\end{aligned} \tag{9.36}$$
Dividing both sides of (9.36) by $2^{-1}(k + 1)(k + 2\gamma)$ we obtain that for any integer $k \ge 0$,
$$\begin{aligned}
&\Big(\sum_{i=0}^{k}(b_i - a_i)(i + 2\gamma - 1)\Big)(2^{-1}(k + 1)(k + 2\gamma))^{-1} + (k + 2\gamma - 1)(k + 1)^{-1}a_{k+1} \\
&\le 2\gamma(k + 2\gamma)^{-1}((2\gamma - 2)(2\gamma - 1)(2\gamma)^{-1}(k + 1)^{-1}a_0 + c\gamma) + \delta(2\gamma + k - 1).
\end{aligned} \tag{9.37}$$
Clearly,
$$\sum_{i=0}^{k}(i + 2\gamma - 1) = 2^{-1}(k + 1)(k + 4\gamma - 2) \ge 2^{-1}(k + 1)(k + 2\gamma). \tag{9.38}$$
By (9.37) and (9.38),
$$\begin{aligned}
&\sum_{i=0}^{k}(b_i - a_i)(i + 2\gamma - 1)\Big(\sum_{i=0}^{k}(i + 2\gamma - 1)\Big)^{-1} + a_{k+1} \\
&\le \sum_{i=0}^{k}(b_i - a_i)(i + 2\gamma - 1)(2^{-1}(k + 1)(k + 2\gamma))^{-1} + (k + 2\gamma - 1)(k + 1)^{-1}a_{k+1} \\
&\le 2\gamma(k + 2\gamma)^{-1}((2\gamma - 2)(k + 1)^{-1}a_0 + c\gamma) + \delta(2\gamma + k - 1).
\end{aligned}$$
Lemma 9.4 is proved.
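The bound of Lemma 9.4 can be checked numerically on a sequence that satisfies (9.34) with equality. The sketch below makes the simplifying assumption $b_k = a_k$ (which is allowed by (9.33) and makes the weighted sum on the left-hand side vanish); the parameter values and the function name are hypothetical.

```python
def check_lemma_9_4(a0=5.0, c=2.0, gamma=2.0, delta=1e-3, steps=50):
    """Generate a_{k+1} = a_k - t_k*a_k/gamma + 0.5*c*t_k**2 + delta with
    t_k = 2*gamma/(2*gamma + k), i.e. (9.34) with equality and b_k = a_k,
    and check the bound of Lemma 9.4 for k = 0, ..., steps - 1."""
    a = a0
    for k in range(steps):
        t = 2.0 * gamma / (2.0 * gamma + k)
        a_next = a - t * a / gamma + 0.5 * c * t * t + delta
        bound = (2.0 * gamma / (k + 2.0 * gamma)
                 * ((2.0 * gamma - 2.0) / (k + 1) * a0 + c * gamma)
                 + delta * (2.0 * gamma + k - 1.0))
        if a_next < 0.0 or a_next > bound + 1e-12:
            return False
        a = a_next
    return True


ok_gamma_2 = check_lemma_9_4(gamma=2.0)
ok_gamma_1 = check_lemma_9_4(gamma=1.0)
```

The term $\delta(2\gamma + k - 1)$ in the bound grows linearly in $k$, which is why the text that follows balances it against the decaying term of order $1/k$.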
9.3 Proof of Theorem 9.2 and Examples
Theorem 9.2 follows from Lemmas 9.1, 9.3, and 9.4 applied with $\delta = \delta_1 + \delta_2$,
$$a_k = H(y_k) - \inf(H), \quad b_k = S(y_k), \quad t_k = 2\gamma(2\gamma + k)^{-1}, \quad k = 0, 1, \dots, \quad c = LD^2.$$
Clearly, the best choice of $k$ is at the same order as $(\delta_1 + \delta_2)^{-1/2}$ and in this case the right-hand side of the final inequality of Theorem 9.2 is $c_1(\delta_1 + \delta_2)^{1/2}$, where $c_1$ is a positive constant.
We consider two special cases of the PDA-based method. Let $\gamma = 1$ and for a given integer $k \ge 0$ define $y_k \in X$, $u_k \in \operatorname{dom}(G)$ such that
$$\langle F'(y_k), u_k\rangle + G(u_k) \le \langle F'(y_k), u\rangle + G(u) + \delta_1 \text{ for all } u \in X.$$
By (9.6), for every integer $k \ge 0$,
$$\begin{aligned}
S(y_k) &= \sup\{\langle F'(y_k), y_k - u\rangle + G(y_k) - G(u) : u \in X\} \\
&= \langle F'(y_k), y_k\rangle + G(y_k) + \sup\{-\langle F'(y_k), u\rangle - G(u) : u \in X\} \\
&\le \langle F'(y_k), y_k\rangle + G(y_k) - \langle F'(y_k), u_k\rangle - G(u_k) + \delta_1 \\
&= \langle F'(y_k), y_k - u_k\rangle + G(y_k) - G(u_k) + \delta_1
\end{aligned} \tag{9.39}$$
and (9.22) holds with $u(y_k) = u_k$. Then we choose $t_k \in [0, 1]$ such that
$$H(y_k + t_k(u_k - y_k)) \le H(y_k + t(u_k - y_k)) + \delta_2 \text{ for all } t \in [0, 1]$$
and set $y_{k+1} = y_k + t_k(u_k - y_k)$. Clearly, in this case we have a generalized conditional gradient algorithm.
Consider now another special case of the PDA-based method with $X_k = X$, $k = 0, 1, \dots$. For any integer $k \ge 0$ find $y_{k+1} \in X$ such that
$$F(y_k) + \langle F'(y_k), y_{k+1} - y_k\rangle + 2^{-1}L\|y_k - y_{k+1}\|^2 + G(y_{k+1}) \le F(y_k) + \langle F'(y_k), y - y_k\rangle + 2^{-1}L\|y_k - y\|^2 + G(y) + \delta_2$$
for all $y \in X$, which is equivalent to the relation
$$L^{-1}G(y_{k+1}) + 2^{-1}\|y_{k+1} - (y_k - L^{-1}F'(y_k))\|^2 \le L^{-1}G(y) + 2^{-1}\|y - (y_k - L^{-1}F'(y_k))\|^2 + L^{-1}\delta_2 \text{ for all } y \in X.$$
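The first special case above can be sketched in a few lines. The example is an illustration under assumptions that do not come from the text: $X = R^2$, $F(x) = \|x - c\|^2/2$ (so $L = 1$), $G$ is the indicator function of the box $[0, 1]^2$ (so $D^2 = 2$), and both steps are carried out exactly, that is, with $\gamma = 1$ and no computational errors. In that setting the bound of Theorem 9.2 reduces to $2LD^2(k + 2)^{-1}$.

```python
def conditional_gradient(c=(0.3, 1.5), steps=30):
    """Generalized conditional gradient steps for H = F + G with
    F(x) = 0.5*||x - c||^2 and G = indicator of the box [0, 1]^2.
    Returns the last iterate and the gaps H(y_{k+1}) - inf(H), k = 0, 1, ..."""
    def F(x):
        return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    # inf(H) is attained at the projection of c onto the box.
    inf_H = F([min(1.0, max(0.0, ci)) for ci in c])
    y = [0.0, 0.0]                                   # y_0 in dom(G)
    gaps = []
    for _ in range(steps):
        grad = [yi - ci for yi, ci in zip(y, c)]     # F'(y_k)
        # u_k minimizes <F'(y_k), u> + G(u): pick a vertex of the box.
        u = [0.0 if g > 0 else 1.0 for g in grad]
        d = [ui - yi for ui, yi in zip(u, y)]
        dd = sum(di * di for di in d)
        # Exact minimization of t -> H(y_k + t*(u_k - y_k)) over [0, 1].
        if dd == 0.0:
            t = 0.0
        else:
            t = min(1.0, max(0.0, -sum(g * di for g, di in zip(grad, d)) / dd))
        y = [yi + t * di for yi, di in zip(y, d)]
        gaps.append(F(y) - inf_H)
    return y, gaps


y_last, gaps = conditional_gradient()
```

The exact line search corresponds to the update (9.26) with $X_k = [y_k, u_k]$; replacing it by an approximate search with tolerance $\delta_2$ would give the inexact variant analysed in this chapter.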
Chapter 10
Minimization of Quasiconvex Functions
In this chapter we study minimization of a quasiconvex function. Our algorithm has two steps. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.
10.1 Preliminaries
Let $X$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$ and let $f : X \to R^1$. We consider the problem
$$f(x) \to \min, \quad x \in X$$
using the algorithm considered in [53]. Recall that
$$\inf(f) = \inf\{f(x) : x \in X\}, \quad \operatorname{argmin}(f) = \{x \in X : f(x) = \inf(f)\}.$$
Let $F \subset X$. Denote by int(F), cl(F), and bd(F) its interior, closure, and boundary respectively. For every $x \in X$, set
$$N(F, x) = \{q \in X : \langle q, y - x\rangle \le 0 \text{ for all } y \in F\}. \tag{10.1}$$
Set
$$S_X(0, 1) = \{z \in X : \|z\| = 1\}. \tag{10.2}$$
Suppose that for each $\alpha \in R^1$ the set $\{x \in X : f(x) \le \alpha\}$ is convex. In other words, the function $f$ is quasiconvex. Assume that
$$\inf(f) > -\infty. \tag{10.3}$$
For every $\epsilon > 0$ and every $x \in X$ set
$$G_\epsilon(x) = \{z \in X : f(z) < f(x) - \epsilon\}. \tag{10.4}$$
Let
$$G(x) = G_0(x). \tag{10.5}$$
For every $x \in X$ set
$$N(G(x), x) = \{q \in X : \langle q, y - x\rangle \le 0 \text{ for all } y \in X \text{ satisfying } f(y) < f(x)\}. \tag{10.6}$$
Let
$$\beta > 0, \quad L \ge 1, \quad x_* \in \operatorname{argmin}(f). \tag{10.7}$$
Assume that for every $z \in X$,
$$|f(z) - f(x_*)| \le L\|z - x_*\|^\beta. \tag{10.8}$$
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_10
10.2 An Auxiliary Result
Proposition 10.1 Let $\epsilon \ge 0$, $x \in X$, and
$$f(x) - \inf(f) > \epsilon. \tag{10.9}$$
Then
$$f(x) - \inf(f) \le L\langle q, x - x_*\rangle^\beta + \epsilon$$
for all $q \in S_X(0, 1) \cap N(G_\epsilon(x), x)$.
Proof Since the function $f$ is continuous and quasiconvex the set $G_\epsilon(x)$ is nonempty, open, and convex. Set
$$r = \inf\{\|z - x_*\| : z \in \operatorname{bd}(G_\epsilon(x))\}. \tag{10.10}$$
There exist sequences
$$\{z_k\}_{k=1}^{\infty} \subset \operatorname{bd}(G_\epsilon(x)) \tag{10.11}$$
and $\{\delta_k\}_{k=1}^{\infty} \subset (0, \infty)$ such that
$$\lim_{k \to \infty}\delta_k = 0, \quad \|z_k - x_*\| \le r + \delta_k, \ k = 1, 2, \dots. \tag{10.12}$$
By (10.8), (10.11), and (10.12), for all integers $k \ge 0$,
$$f(x) - \inf(f) - \epsilon \le f(z_k) - \inf(f) \le L\|z_k - x_*\|^\beta \le L(r + \delta_k)^\beta. \tag{10.13}$$
It follows from (10.11) and (10.13) that
$$f(x) - \inf(f) \le Lr^\beta + \epsilon. \tag{10.14}$$
By (10.7), (10.9), and (10.10),
$$x_* + rz \in \operatorname{cl}(G_\epsilon(x)) \text{ for all } z \in S_X(0, 1). \tag{10.15}$$
In view of (10.1), (10.4), and (10.15), for every $q \in S_X(0, 1) \cap N(G_\epsilon(x), x)$ we have
$$\langle q, x_* + rz - x\rangle \le 0.$$
Applying the inequality above with $z = q$ we obtain that $0 \ge \langle q, x_* + rq - x\rangle$ and
$$\langle q, x - x_*\rangle \ge r \text{ for all } q \in S_X(0, 1) \cap N(G_\epsilon(x), x). \tag{10.16}$$
Let $q \in S_X(0, 1) \cap N(G_\epsilon(x), x)$. By (10.14) and (10.16),
$$f(x) - \inf(f) \le Lr^\beta + \epsilon \le L\langle q, x - x_*\rangle^\beta + \epsilon.$$
Proposition 10.1 is proved.
10.3 The Main Result
Let $\delta_1, \delta_2 > 0$, $\gamma \in (0, 2)$. Let us describe our algorithm.
Initialization: select an arbitrary $x_0 \in X$.
Iterative Step: let $x_k \in X$ be a given current iteration vector. If $G_{\delta_1}(x_k) = \emptyset$, then $\inf(f) \ge f(x_k) - \delta_1$ and $x_k$ is considered as an approximate solution of our problem. If $G_{\delta_1}(x_k) \ne \emptyset$, then find $q_k \in X$ such that
$$B_X(q_k, \delta_2) \cap S_X(0, 1) \cap N(G_{\delta_1}(x_k), x_k) \ne \emptyset$$
and set
$$\lambda_k = \gamma[(f(x_k) - \inf(f) - \delta_1)L^{-1}]^{1/\beta}, \quad x_{k+1} = x_k - \lambda_kq_k.$$
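The iterative step can be tried out on the real line. The sketch below is an illustration under assumptions that are not in the text: $X = R^1$, $f(x) = |x|^{1/2}$ (quasiconvex but not convex), for which $\inf(f) = 0$, $x_* = 0$ and (10.8) holds with $L = 1$, $\beta = 1/2$. For $x \ne 0$ with $f(x) > \delta_1$ the set $G_{\delta_1}(x)$ is an open interval around $0$ that does not contain $x$, so the sign of $x$ is a unit normal; it is used exactly, which corresponds to a zero error in the direction.

```python
import math


def quasiconvex_descent(x0=4.0, gamma=1.0, delta1=1e-2, max_steps=100):
    """Iterate x_{k+1} = x_k - lambda_k * q_k for f(x) = sqrt(|x|) on the real line,
    with lambda_k = gamma * ((f(x_k) - inf(f) - delta1) / L) ** (1 / beta).
    Returns the final point and the number of steps performed."""
    L, beta, inf_f = 1.0, 0.5, 0.0

    def f(x):
        return math.sqrt(abs(x))

    x = x0
    for k in range(max_steps):
        if f(x) - inf_f <= delta1:
            # G_{delta1}(x_k) is empty: x_k is accepted as an approximate solution.
            return x, k
        q = math.copysign(1.0, x)     # unit vector in N(G_{delta1}(x_k), x_k)
        lam = gamma * ((f(x) - inf_f - delta1) / L) ** (1.0 / beta)
        x = x - lam * q
    return x, max_steps


x_end, n_steps = quasiconvex_descent()
f_end = math.sqrt(abs(x_end))
```

In this example the values $f(x_k)$ decrease towards the level $\inf(f) + \delta_1$, which is the kind of accuracy that the main result of this section guarantees when the direction error $\delta_2$ is small.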
Let $M > 0$ be such that
$$\|x_*\| \le M. \tag{10.17}$$
In this chapter we prove the following result.
Theorem 10.2 Let
$$T = (4\gamma)^{-1}(2 - \gamma)\delta_2^{-2} + 1, \tag{10.18}$$
$\{x_t\}_{t=0}^{T}, \{q_t\}_{t=0}^{T-1} \subset X$, $\{\lambda_t\}_{t=0}^{T} \subset (0, \infty)$,
$$\|x_0\| \le M, \tag{10.19}$$
$$G_{\delta_1}(x_t) \ne \emptyset, \quad t = 0, \dots, T - 1 \tag{10.20}$$
and for $t = 0, \dots, T - 1$,
$$B_X(q_t, \delta_2) \cap S_X(0, 1) \cap N(G_{\delta_1}(x_t), x_t) \ne \emptyset, \tag{10.21}$$
$$\lambda_t = \gamma[(f(x_t) - \inf(f) - \delta_1)L^{-1}]^{1/\beta}, \tag{10.22}$$
$$x_{t+1} = x_t - \lambda_tq_t. \tag{10.23}$$
Then there exists an integer $t \in [0, T]$ such that
$$f(x_t) \le (8\delta_2M)^\beta(2 - \gamma)^{-\beta} + \inf(f) + \delta_1.$$
Proof Assume that the theorem is not true. Then for all $t = 0, \dots, T$,
$$f(x_t) > (8\delta_2M)^\beta(2 - \gamma)^{-\beta} + \inf(f) + \delta_1. \tag{10.24}$$
In view of (10.24) for all $t = 0, \dots, T$,
$$(2 - \gamma)(f(x_t) - \inf(f) - \delta_1)^{1/\beta} > 8\delta_2M. \tag{10.25}$$
By (10.17) and (10.19), x0 − x∗ ≤ 2M.
(10.26)
Assume that S ∈ {0, . . . , T }, S < T and that xt − x∗ ≤ 2M, t = 0, . . . , S.
(10.27)
(Note that in view of (10.26), (10.27) holds for $S = 0$.) Let $t \in [0, S]$ be an integer. By (10.21) and (10.23),
$$\|x_{t+1} - x^*\|^2 = \|x_t - \lambda_t q_t - x^*\|^2 = \|x_t - x^*\|^2 - 2\lambda_t\langle q_t, x_t - x^*\rangle + \lambda_t^2. \tag{10.28}$$
In view of (10.21), there exists
$$\hat q_t \in B_X(q_t, \delta_2) \cap S_X(0,1) \cap N(G_{\delta_1}(x_t), x_t). \tag{10.29}$$
Proposition 10.1, (10.4), (10.20), (10.22), (10.25), (10.28), (10.29), and the inequality $f(x_t) - \inf(f) > \delta_1$ imply that
$$\begin{aligned}
\|x_{t+1} - x^*\|^2 &\le \|x_t - x^*\|^2 - 2\lambda_t\langle \hat q_t, x_t - x^*\rangle + \lambda_t^2 + 2\|q_t - \hat q_t\|\lambda_t\|x_t - x^*\| \\
&\le \|x_t - x^*\|^2 - 2\lambda_t\langle \hat q_t, x_t - x^*\rangle + \lambda_t^2 + 2\delta_2\lambda_t\|x_t - x^*\| \\
&\le \|x_t - x^*\|^2 - 2\gamma((f(x_t) - \inf(f) - \delta_1)L^{-1})^{1/\beta}(f(x_t) - \inf(f) - \delta_1)^{1/\beta} \\
&\quad + \gamma^2((f(x_t) - \inf(f) - \delta_1)L^{-1})^{2/\beta} + 2\delta_2\gamma((f(x_t) - \inf(f) - \delta_1)L^{-1})^{1/\beta}\|x_t - x^*\|.
\end{aligned} \tag{10.30}$$
By (10.24), (10.27), and (10.30),
$$\begin{aligned}
\|x_{t+1} - x^*\|^2 &\le \|x_t - x^*\|^2 - \gamma(f(x_t) - \inf(f) - \delta_1)^{2/\beta}(2L^{-1/\beta} - \gamma L^{-2/\beta}) \\
&\quad + 2\delta_2\gamma((f(x_t) - \inf(f) - \delta_1)L^{-1})^{1/\beta}\|x_t - x^*\| \\
&\le \|x_t - x^*\|^2 - \gamma(f(x_t) - \inf(f) - \delta_1)^{1/\beta}L^{-1/\beta}(2 - \gamma L^{-1/\beta})((f(x_t) - \inf(f) - \delta_1)^{1/\beta} - 4\delta_2 M) \\
&\le \|x_t - x^*\|^2 - (f(x_t) - \inf(f) - \delta_1)^{1/\beta}\gamma(4\delta_2 M) \\
&\le \|x_t - x^*\|^2 - 4\delta_2 M\gamma(2 - \gamma)^{-1}8\delta_2 M.
\end{aligned} \tag{10.31}$$
In view of (10.27) and (10.31),
$$\|x_{t+1} - x^*\| \le \|x_t - x^*\| \le 2M.$$
By induction we have shown that $\|x_t - x^*\| \le 2M$, $t = 0, \dots, T$, and that for all $t = 0, \dots, T - 1$,
$$\|x_{t+1} - x^*\|^2 \le \|x_t - x^*\|^2 - (4\delta_2 M)^2\gamma(2 - \gamma)^{-1}. \tag{10.32}$$
It follows from (10.26) and (10.32) that
$$4M^2 \ge \|x_0 - x^*\|^2 - \|x_T - x^*\|^2 = \sum_{t=0}^{T-1}(\|x_t - x^*\|^2 - \|x_{t+1} - x^*\|^2) \ge T(4\delta_2 M)^2\gamma(2 - \gamma)^{-1}$$
and
$$T \le 4\gamma^{-1}(2 - \gamma)4^{-2}\delta_2^{-2}.$$
This contradicts (10.18). The contradiction we have reached completes the proof of Theorem 10.2.
Chapter 11
Minimization of Sharp Weakly Convex Functions
In this chapter we study the subgradient projection algorithm for minimization of sharp weakly convex functions, under the presence of computational errors. The problem is described by an objective function and a set of feasible points. For this algorithm each iteration consists of two steps. The first step is a calculation of a subgradient of the objective function while in the second one we calculate a projection on the feasible set. In each of these two steps there is a computational error. In general, these two computational errors are different. We show that our algorithm generates a good approximate solution, if all the computational errors are bounded from above by a small positive constant. Moreover, if we know the computational errors for the two steps of our algorithm, we find out what approximate solution can be obtained and how many iterates one needs for this.
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_11

11.1 Preliminaries

The problem and the algorithm studied in this chapter were considered in [35] in a finite-dimensional setting and without computational errors. Let $X$ be a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$. Let $C \subset X$ be a nonempty closed convex set, $\rho \ge 0$, $\mu > 0$, and let $g : X \to R^1$ be a locally Lipschitz function such that the function
$$g(x) + 2^{-1}\rho\|x\|^2,\ x \in X \tag{11.1}$$
is convex. Following [35], the function $g$ is called $\rho$-weakly convex. Set
$$C_{\min} = \operatorname{argmin}(g, C) = \{x \in C : g(x) = \inf(g, C)\}. \tag{11.2}$$
Suppose that
$$C_{\min} \ne \emptyset. \tag{11.3}$$
Let
$$M_1 > 2 + 2\mu\rho^{-1},\quad C_{\min} \subset B_X(0, M_1 - 2 - \mu\rho^{-1}). \tag{11.4}$$
Let $L \ge 2$ be such that
$$|g(z_1) - g(z_2)| \le L\|z_1 - z_2\| \text{ for all } z_1, z_2 \in B_X(0, M_1 + 4). \tag{11.5}$$
For every $x \in X$, denote by $\partial g(x)$ the set of all $v \in X$ such that
$$g(y) \ge g(x) + \langle v, y - x\rangle + o(\|y - x\|) \text{ as } y \to x. \tag{11.6}$$
We can show (see Proposition 11.1) that for every $x \in X$,
$$\partial(g + 2^{-1}\rho\|\cdot\|^2)(x) = \partial g(x) + \rho x, \tag{11.7}$$
where the left-hand side of (11.7) is the subdifferential of the convex function. Suppose that the following sharpness property holds [35]: for all $z \in C$,
$$g(z) - \inf(g, C) \ge \mu d(z, C_{\min}). \tag{11.8}$$
A point $\bar x \in C$ is called stationary if
$$g(x) - g(\bar x) \ge o(\|x - \bar x\|) \text{ as } x \to \bar x \text{ in } C. \tag{11.9}$$
11.2 The Subdifferential of Weakly Convex Functions

Proposition 11.1 Let $x \in X$. Then $v \in \partial g(x)$ if and only if $v + \rho x \in \partial(g + 2^{-1}\rho\|\cdot\|^2)(x)$. Moreover, if $v \in \partial g(x)$, then for every $y \in X$,
$$g(y) \ge g(x) + \langle v, y - x\rangle - 2^{-1}\rho\|y - x\|^2.$$

Proof Let
$$v \in \partial g(x) \tag{11.10}$$
and $h \in X \setminus \{0\}$. In view of (11.6), for all $t > 0$,
$$g(x + th) \ge g(x) + \langle v, th\rangle + o(\|th\|) \text{ as } t \to 0^+.$$
This implies that
$$t^{-1}(g(x + th) - g(x)) \ge \langle v, h\rangle + o(\|th\|)t^{-1}$$
and
$$\liminf_{t\to 0^+} t^{-1}(g(x + th) - g(x)) \ge \langle v, h\rangle. \tag{11.11}$$
Thus we have shown that the following property holds:

(i) if $v \in \partial g(x)$, then (11.11) holds for every $h \in X$.

Note that the function $g(x) + 2^{-1}\rho\|x\|^2$, $x \in X$ is convex. For every $t > 0$ and every $h \in X$, we have
$$t^{-1}(g(x + th) + 2^{-1}\rho\|x + th\|^2 - g(x) - 2^{-1}\rho\|x\|^2) = t^{-1}(g(x + th) - g(x)) + t^{-1}(2^{-1}\rho\, 2t\langle h, x\rangle + 2^{-1}\rho t^2\|h\|^2). \tag{11.12}$$
By (11.12),
$$\lim_{t\to 0^+} t^{-1}(g(x + th) + 2^{-1}\rho\|x + th\|^2 - g(x) - 2^{-1}\rho\|x\|^2) = \lim_{t\to 0^+}(t^{-1}(g(x + th) - g(x)) + \rho\langle h, x\rangle). \tag{11.13}$$
It follows from (11.13) that $\lim_{t\to 0^+} t^{-1}(g(x + th) - g(x))$ exists and
$$\lim_{t\to 0^+} t^{-1}(g(x + th) - g(x)) = \lim_{t\to 0^+} t^{-1}((g + 2^{-1}\rho\|\cdot\|^2)(x + th) - (g + 2^{-1}\rho\|\cdot\|^2)(x)) - \rho\langle h, x\rangle. \tag{11.14}$$
Property (i) implies that if $v \in \partial g(x)$, then (11.11) holds and in view of (11.14),
$$v + \rho x \in \partial(g + 2^{-1}\rho\|\cdot\|^2)(x). \tag{11.15}$$
Let $v \in X$ and (11.15) hold. By (11.15), for every $y \in X$,
$$g(y) + 2^{-1}\rho\|y\|^2 \ge g(x) + 2^{-1}\rho\|x\|^2 + \langle v + \rho x, y - x\rangle$$
and
$$\begin{aligned}
g(y) &\ge g(x) + \langle v, y - x\rangle + 2^{-1}\rho\|x\|^2 - 2^{-1}\rho\|y\|^2 + \rho\langle x, y - x\rangle \\
&= g(x) + \langle v, y - x\rangle - 2^{-1}\rho\|y\|^2 - 2^{-1}\rho\|x\|^2 + \rho\langle x, y\rangle \\
&= g(x) + \langle v, y - x\rangle - 2^{-1}\rho\|y - x\|^2.
\end{aligned} \tag{11.16}$$
Relation (11.16) implies that $v \in \partial g(x)$. Thus (11.15) implies the inclusion above. Proposition 11.1 is proved.
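The inequality of Proposition 11.1 can be checked numerically on a simple one-dimensional example. The sketch below is an illustration only: the function $g(x) = |x^2 - 1|$ with $\rho = 2$ (for which $g(x) + x^2 = \max\{2x^2 - 1, 1\}$ is convex), the grid, and the variable names are assumptions made for this example and do not come from the text.

```python
import numpy as np

RHO = 2.0  # g(x) + (RHO / 2) * x**2 = max(2x^2 - 1, 1) is convex

xs = np.linspace(-3.0, 3.0, 61)          # base points x and test points y
gx = np.abs(xs**2 - 1.0)                 # g(x) = |x^2 - 1|
# One element of the subdifferential of g at each grid point.
v = np.where(xs**2 > 1.0, 2.0 * xs, -2.0 * xs)

diff = xs[None, :] - xs[:, None]         # diff[i, j] = y_j - x_i
# Slack of g(y) >= g(x) + v (y - x) - (RHO / 2) (y - x)^2.
slack = gx[None, :] - gx[:, None] - v[:, None] * diff + 0.5 * RHO * diff**2
worst_slack = float(slack.min())

# Without the quadratic correction the convex subgradient inequality fails,
# which shows that g itself is not convex.
plain = gx[None, :] - gx[:, None] - v[:, None] * diff
worst_plain = float(plain.min())
```

On this grid the corrected slack stays nonnegative up to rounding, while the uncorrected one becomes negative (for instance for $x = 0$, $y = 1$).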
11.3 An Auxiliary Result

Lemma 11.2 Let $\delta \in (0, 1]$, $\delta_0 \in (0, 1]$,
$$\delta_1 = \mu/16, \tag{11.17}$$
$$x, \xi \in X,\quad \|x\| \le M_1 + 3, \tag{11.18}$$
$$B(x, \delta) \cap C \ne \emptyset, \tag{11.19}$$
$$\xi \in \partial g(x), \tag{11.20}$$
$$\|\xi\| \le \delta_1. \tag{11.21}$$
Then at least one of the following inequalities holds:
$$d(x, C_{\min}) \le 16\delta; \tag{11.22}$$
$$d(x, C_{\min}) \le 16L\delta\mu^{-1}; \tag{11.23}$$
$$d(x, C_{\min}) \ge 2^{-1}3\mu\rho^{-1}.$$

Proof There exists
$$\{x_i\}_{i=1}^{\infty} \subset C_{\min} \tag{11.24}$$
such that
$$\lim_{i\to\infty}\|x_i - x\| = d(x, C_{\min}). \tag{11.25}$$
In view of (11.19), there exists $\hat x \in C$ such that
$$\|x - \hat x\| \le \delta. \tag{11.26}$$
By (11.5), (11.8), (11.17), (11.18), (11.24), and (11.26), for $i = 1, 2, \dots$,
$$L\delta + g(x) - g(x_i) \ge g(\hat x) - g(x) + g(x) - g(x_i) = g(\hat x) - g(x_i) \ge \mu d(\hat x, C_{\min}). \tag{11.27}$$
Proposition 11.1 and (11.20) imply that
$$\xi + \rho x \in \partial(g + 2^{-1}\rho\|\cdot\|^2)(x). \tag{11.28}$$
In view of (11.21) and (11.28), for every $y \in X$,
$$g(y) - g(x) \ge \langle \xi, y - x\rangle - 2^{-1}\rho\|y\|^2 + 2^{-1}\rho\|x\|^2 - \rho\|x\|^2 + \rho\langle x, y\rangle \ge -\delta_1\|y - x\| - 2^{-1}\rho\|x - y\|^2. \tag{11.29}$$
By (11.29), for $i = 1, 2, \dots$,
$$g(x) - g(x_i) \le \delta_1\|x_i - x\| + 2^{-1}\rho\|x_i - x\|^2. \tag{11.30}$$
It follows from (11.27) and (11.30) that for all integers $i \ge 1$,
$$\mu d(\hat x, C_{\min}) \le L\delta + g(x) - g(x_i) \le L\delta + \delta_1\|x_i - x\| + 2^{-1}\rho\|x_i - x\|^2. \tag{11.31}$$
By (11.24)–(11.26) and (11.31),
$$-\mu\delta + \mu d(x, C_{\min}) \le \mu d(\hat x, C_{\min}) \le L\delta + \delta_1 d(x, C_{\min}) + 2^{-1}\rho d(x, C_{\min})^2. \tag{11.32}$$
In view of (11.32),
$$\mu d(x, C_{\min}) \le \mu\delta + L\delta + \delta_1 d(x, C_{\min}) + 2^{-1}\rho d(x, C_{\min})^2. \tag{11.33}$$
Assume that (11.22) and (11.23) do not hold. Together with (11.17) and (11.33) this implies that
$$\begin{aligned}
\mu d(x, C_{\min}) &\le (\mu/16)d(x, C_{\min}) + (\mu/16)d(x, C_{\min}) + (\mu/16)d(x, C_{\min}) + 2^{-1}\rho d(x, C_{\min})^2 \\
&\le (\mu/4)d(x, C_{\min}) + 2^{-1}\rho d(x, C_{\min})^2.
\end{aligned} \tag{11.34}$$
By (11.34),
$$\mu \le \mu/4 + 2^{-1}\rho d(x, C_{\min})$$
and
$$\mu \le (2/3)\rho d(x, C_{\min}).$$
Lemma 11.2 is proved.
11.4 The First Main Result

Let $\gamma \in (0, 1)$, $\delta, \delta_0 \in (0, 1]$, $\delta_0 \le \mu/16$. Let us describe our algorithm.

Initialization: select an arbitrary $x_0 \in X$ such that
$$d(x_0, C_{\min}) < \gamma\mu\rho^{-1},\quad d(x_0, C) \le \delta.$$

Iterative Step: given a current iteration vector $x_t \in X$, if
$$g(x_t) \le \inf(g, C) + \max\{(16\delta(L + 1)^3\gamma(1 - \gamma)^{-1}\rho^{-1})^{1/2},\ 16\delta L,\ 16L^2\delta\mu^{-1},\ 4\delta L(\mu + L)\mu^{-1}(1 - \gamma)^{-1}\},$$
then $x_t$ is considered as an approximate solution; otherwise calculate $\xi_t \in X$ such that
$$B(\xi_t, \delta_0) \cap \partial g(x_t) \ne \emptyset$$
(note that by Lemma 11.2, $\xi_t \ne 0$) and calculate $x_{t+1} \in X$ such that
$$\|x_{t+1} - P_C(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t)\| \le \delta.$$

Proposition 11.3 Let $\gamma \in (0, 1)$, $\delta, \delta_0 \in (0, 1]$,
$$\delta_0 < \min\{\mu(1 - \gamma)/4,\ \mu/32\}, \tag{11.35}$$
$T > 0$ be an integer, $\{x_t\}_{t=0}^{T} \subset X$, $\{\xi_t\}_{t=0}^{T-1} \subset X$,
$$d(x_0, C) \le \delta, \tag{11.36}$$
$$d(x_0, C_{\min}) < \gamma\mu\rho^{-1}, \tag{11.37}$$
for all $t = 0, \dots, T - 1$,
$$g(x_t) > \inf(g, C) + (16\delta(L + 1)^3\gamma(1 - \gamma)^{-1}\rho^{-1})^{1/2}, \tag{11.38}$$
$$g(x_t) > \inf(g, C) + \max\{16\delta L,\ 16L^2\delta\mu^{-1},\ \delta(\mu + L)L,\ 4\delta L(\mu + L)\mu^{-1}(1 - \gamma)^{-1}\}, \tag{11.39}$$
$$B_X(\xi_t, \delta_0) \cap \partial g(x_t) \ne \emptyset \tag{11.40}$$
and
$$\|x_{t+1} - P_C(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t)\| \le \delta. \tag{11.41}$$
Then
$$\|\xi_t\| \ge \mu/32,\ t = 0, \dots, T - 1,\quad T \le 2M_1\delta^{-1}.$$

Proof By (11.4) and (11.37),
$$\|x_0\| \le M_1. \tag{11.42}$$
Assume that $t \in [0, T)$ is an integer, $x_t$ is well defined, and
$$d(x_t, C_{\min}) < \gamma\mu\rho^{-1}. \tag{11.43}$$
(Note that in view of (11.37), inequality (11.43) holds for $t = 0$.) By (11.4) and (11.43),
$$\|x_t\| \le M_1. \tag{11.44}$$
It follows from (11.4), (11.5), (11.43), and (11.44) that
$$|g(x_t) - \inf(g, C)| < Ld(x_t, C_{\min}). \tag{11.45}$$
In view of (11.45),
$$g(x_t) < \inf(g, C) + Ld(x_t, C_{\min}). \tag{11.46}$$
Relations (11.39) and (11.46) imply that
$$d(x_t, C_{\min}) > \max\{16\delta,\ 16\delta L\mu^{-1},\ 4\delta(\mu + L)\mu^{-1}(1 - \gamma)^{-1}\}. \tag{11.47}$$
By (11.40), there exists
$$\eta_t \in \partial g(x_t) \tag{11.48}$$
such that
$$\|\eta_t - \xi_t\| \le \delta_0. \tag{11.49}$$
Lemma 11.2, (11.17), (11.36), (11.41), (11.43), (11.44), (11.47), and (11.48) imply that
$$\|\eta_t\| > \mu/16. \tag{11.50}$$
It follows from (11.35), (11.49), and (11.50) that
$$\|\xi_t\| \ge \mu/32. \tag{11.51}$$
Clearly, $x_{t+1}$ is well defined by (11.41). (Thus we showed by induction that $x_i$, $i = 0, \dots, T$, and $\xi_i$, $i = 0, \dots, T - 1$, are well defined.) There exists a sequence
$$\{z_i\}_{i=1}^{\infty} \subset C_{\min} \tag{11.52}$$
such that
$$\lim_{i\to\infty}\|x_t - z_i\| = d(x_t, C_{\min}). \tag{11.53}$$
Lemma 2.2, (11.41), and (11.52) imply that for all natural numbers $i$,
$$\|x_{t+1} - z_i\| \le \delta + \|P_C(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t) - z_i\| \le \delta + \|x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t - z_i\|. \tag{11.54}$$
Proposition 11.1, (11.6), (11.48), (11.49), (11.51), and (11.52) imply that
$$\begin{aligned}
\|x_t - z_i - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t\|^2 &= \|x_t - z_i\|^2 + 2\|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\langle \xi_t, z_i - x_t\rangle + \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))^2 \\
&\le \|x_t - z_i\|^2 + (g(x_t) - \inf(g, C))^2\|\xi_t\|^{-2} \\
&\quad + 2(g(x_t) - \inf(g, C))\|\xi_t\|^{-2}\langle \xi_t - \eta_t, z_i - x_t\rangle + 2(g(x_t) - \inf(g, C))\|\xi_t\|^{-2}\langle \eta_t, z_i - x_t\rangle \\
&\le \|x_t - z_i\|^2 + (g(x_t) - \inf(g, C))^2\|\xi_t\|^{-2} + 2(g(x_t) - \inf(g, C))\|\xi_t\|^{-2}\delta_0\|z_i - x_t\| \\
&\quad + 2(g(x_t) - \inf(g, C))\|\xi_t\|^{-2}(g(z_i) - g(x_t) + 2^{-1}\rho\|z_i - x_t\|^2) \\
&= \|x_t - z_i\|^2 + (g(x_t) - g(z_i))\|\xi_t\|^{-2}(\rho\|x_t - z_i\|^2 - (g(x_t) - g(z_i))) \\
&\quad + 2\delta_0\|z_i - x_t\|\|\xi_t\|^{-2}(g(x_t) - \inf(g, C)) \\
&= \|x_t - z_i\|^2 + (g(x_t) - g(z_i))\|\xi_t\|^{-2}(\rho\|x_t - z_i\|^2 - (g(x_t) - g(z_i)) + 2\delta_0\|z_i - x_t\|).
\end{aligned} \tag{11.55}$$
By (11.41) and (11.46), there exists
$$\hat x \in C \tag{11.56}$$
such that
$$\|\hat x - x_t\| \le \delta. \tag{11.57}$$
By (11.5), (11.44), and (11.57), for all integers $i = 1, 2, \dots$,
$$|(g(x_t) - g(z_i)) - (g(\hat x) - g(z_i))| \le L\delta. \tag{11.58}$$
Let $i \ge 1$ be an integer. It follows from (11.8), (11.52), (11.56), and (11.57) that
$$g(\hat x) - g(z_i) \ge \mu d(\hat x, C_{\min}) \ge \mu d(x_t, C_{\min}) - \mu\delta. \tag{11.59}$$
In view of (11.58) and (11.59),
$$g(x_t) - g(z_i) \ge -L\delta - \mu\delta + \mu d(x_t, C_{\min}). \tag{11.60}$$
By (11.55) and (11.60),
$$\|x_t - z_i - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t\|^2 \le \|x_t - z_i\|^2 + \|\xi_t\|^{-2}(g(x_t) - g(z_i))(\rho\|x_t - z_i\|^2 - \mu d(x_t, C_{\min}) + L\delta + \mu\delta + 2\delta_0\|z_i - x_t\|). \tag{11.61}$$
It follows from (11.47) and (11.53) that for all sufficiently large natural numbers $i$,
$$16\delta \le d(x_t, C_{\min}) \le \|z_i - x_t\| \le 2d(x_t, C_{\min}). \tag{11.62}$$
By (11.35), (11.43), (11.52), (11.53), and (11.61),
$$\begin{aligned}
&d(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t, C_{\min})^2 \le \limsup_{i\to\infty}\|x_t - z_i - \|\xi_t\|^{-2}(g(x_t) - g(z_i))\xi_t\|^2 \\
&\quad \le d(x_t, C_{\min})^2 + \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))(\rho d(x_t, C_{\min})^2 - \mu d(x_t, C_{\min}) + L\delta + \mu\delta + 2\delta_0 d(x_t, C_{\min})) \\
&\quad = d(x_t, C_{\min})^2 + \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))(\rho d(x_t, C_{\min})^2 - (\mu - 2\delta_0)d(x_t, C_{\min}) + \delta(\mu + L)) \\
&\quad = d(x_t, C_{\min})^2 + \|\xi_t\|^{-2}\rho(g(x_t) - \inf(g, C))d(x_t, C_{\min})(d(x_t, C_{\min}) - (\mu - 2\delta_0)\rho^{-1} + d(x_t, C_{\min})^{-1}\delta(\mu + L)\rho^{-1}) \\
&\quad \le d(x_t, C_{\min})^2 + \|\xi_t\|^{-2}\rho(g(x_t) - \inf(g, C))d(x_t, C_{\min})(\gamma\mu\rho^{-1} - \mu\rho^{-1} + 2\delta_0\rho^{-1} + \delta(\mu + L)d(x_t, C_{\min})^{-1}\rho^{-1}) \\
&\quad \le d(x_t, C_{\min})^2 - \|\xi_t\|^{-2}\rho(g(x_t) - \inf(g, C))(d(x_t, C_{\min})((1 - \gamma)\mu\rho^{-1} - 2\delta_0\rho^{-1}) - \delta(\mu + L)\rho^{-1}) \\
&\quad \le d(x_t, C_{\min})^2 - \|\xi_t\|^{-2}\rho(g(x_t) - \inf(g, C))(d(x_t, C_{\min})(1 - \gamma)\mu\rho^{-1}/2 - \delta(\mu + L)\rho^{-1}).
\end{aligned} \tag{11.63}$$
By (11.47) and (11.63),
$$d(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t, C_{\min})^2 \le d(x_t, C_{\min})^2 - \|\xi_t\|^{-2}\rho(g(x_t) - \inf(g, C))d(x_t, C_{\min})(1 - \gamma)\mu\rho^{-1}/4. \tag{11.64}$$
In view of (11.39) and (11.64),
$$d(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t, C_{\min})^2 \le d(x_t, C_{\min})^2. \tag{11.65}$$
It follows from (11.43), (11.64), and (11.65) that
$$\begin{aligned}
&d(x_t, C_{\min}) - d(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t, C_{\min}) \\
&\quad = (d(x_t, C_{\min})^2 - d(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t, C_{\min})^2)(d(x_t, C_{\min}) + d(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t, C_{\min}))^{-1} \\
&\quad \ge (d(x_t, C_{\min})^2 - d(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t, C_{\min})^2)(2\gamma\mu/\rho)^{-1} \\
&\quad \ge d(x_t, C_{\min})\|\xi_t\|^{-2}(g(x_t) - \inf(g, C))8^{-1}\gamma^{-1}(1 - \gamma)\rho.
\end{aligned} \tag{11.66}$$
Lemma 2.2, (11.41), and (11.66) imply that
$$\begin{aligned}
d(x_{t+1}, C_{\min}) &\le \delta + d(P_C(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t), C_{\min}) \\
&\le \delta + d(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t, C_{\min}) \\
&\le \delta + d(x_t, C_{\min}) - d(x_t, C_{\min})\|\xi_t\|^{-2}(g(x_t) - \inf(g, C))8^{-1}\gamma^{-1}(1 - \gamma)\rho.
\end{aligned} \tag{11.67}$$
By (11.5), (11.44), (11.48), and (11.49),
$$\|\xi_t\| \le L + 1. \tag{11.68}$$
In view of (11.4), (11.5), and (11.44),
$$g(x_t) - \inf(g, C) \le Ld(x_t, C_{\min}). \tag{11.69}$$
It follows from (11.38), (11.68), and (11.69) that
$$d(x_t, C_{\min})\|\xi_t\|^{-2}(g(x_t) - \inf(g, C))8^{-1}\gamma^{-1}(1 - \gamma)\rho \ge (g(x_t) - \inf(g, C))^2L^{-1}8^{-1}\gamma^{-1}(1 - \gamma)\rho(L + 1)^{-2} \ge 2\delta. \tag{11.70}$$
By (11.67) and (11.70),
$$d(x_{t+1}, C_{\min}) \le d(x_t, C_{\min}) - \delta. \tag{11.71}$$
We assumed that (11.43) holds and obtained (11.71). This implies that for all integers $t = 0, \dots, T - 1$,
$$d(x_{t+1}, C_{\min}) \le d(x_t, C_{\min}) - \delta$$
and
$$d(x_t, C_{\min}) \le \gamma\mu\rho^{-1}$$
for all integers $t = 0, \dots, T$. Together with (11.4) and (11.42) this implies that
$$2M_1 \ge d(x_0, C_{\min}) \ge d(x_0, C_{\min}) - d(x_T, C_{\min}) = \sum_{t=0}^{T-1}(d(x_t, C_{\min}) - d(x_{t+1}, C_{\min})) \ge T\delta$$
and
$$T \le 2M_1\delta^{-1}.$$
This completes the proof of Proposition 11.3.

Proposition 11.3 implies the following result.
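Theorem 11.4 Let $\gamma \in (0, 1)$, $\delta, \delta_0 \in (0, 1]$,
$$\delta_0 < \min\{\mu(1 - \gamma)/4,\ \mu/32\},$$
$$T = 2M_1\delta^{-1},$$
$\{x_t\}_{t=0}^{T} \subset X$, $\{\xi_t\}_{t=0}^{T-1} \subset X$,
$$d(x_0, C) \le \delta,\quad d(x_0, C_{\min}) < \gamma\mu\rho^{-1},$$
for all $t = 0, \dots, T - 1$,
$$B(\xi_t, \delta_0) \cap \partial g(x_t) \ne \emptyset,$$
if $\xi_t \ne 0$, then
$$\|x_{t+1} - P_C(x_t - \|\xi_t\|^{-2}(g(x_t) - \inf(g, C))\xi_t)\| \le \delta,$$
otherwise $x_{t+1} = x_t$. Then there exists $t \in \{0, \dots, T\}$ such that
$$g(x_t) \le \inf(g, C) + \max\{(16\delta(L + 1)^3\gamma(1 - \gamma)^{-1}\rho^{-1})^{1/2},\ 16\delta L,\ 16L^2\delta\mu^{-1},\ \delta(\mu + L)L,\ 4\delta L(\mu + L)\mu^{-1}(1 - \gamma)^{-1}\}.$$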
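The iterative step of this section, with the step size $\|\xi_t\|^{-2}(g(x_t) - \inf(g, C))$, can be sketched as follows. The example is a hedged illustration and not the author's code: the one-dimensional function $g(x) = |x^2 - 1|$, the set $C = [0.5, 2]$ (so that $C_{\min} = \{1\}$ and $\inf(g, C) = 0$), the helper names, and the fixed error values standing in for $\delta_0$ and $\delta$ are all assumptions made for this sketch.

```python
def g(x):
    # Hypothetical weakly convex objective (rho = 2), sharp near x = 1.
    return abs(x * x - 1.0)

def subgradient(x):
    # One element of the subdifferential of g at x.
    return 2.0 * x if x * x > 1.0 else -2.0 * x

def project(x, lo=0.5, hi=2.0):
    # Exact projection onto the interval C = [lo, hi].
    return min(max(x, lo), hi)

def polyak_step(x, g_inf, grad_err, proj_err):
    """Inexact step: xi is within |grad_err| of a true subgradient and the
    returned point is within |proj_err| of the exact projection."""
    xi = subgradient(x) + grad_err
    if xi == 0.0:
        return x
    # In one dimension ||xi||^{-2} * xi equals xi / xi**2.
    y = project(x - (g(x) - g_inf) * xi / (xi * xi))
    return y + proj_err

x = 1.4
for _ in range(20):
    x = polyak_step(x, g_inf=0.0, grad_err=0.01, proj_err=1e-4)
distance = abs(x - 1.0)
```

In this run the distance to $C_{\min}$ drops quickly and then stalls at roughly the size of the projection error, which is the qualitative behaviour described by Theorem 11.4.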
11.5 An Algorithm with Constant Step Sizes

Let $\gamma \in (0, 1)$, $\delta, \delta_0 \in (0, 1]$, $\alpha > 0$. Let us describe our algorithm.

Initialization: select an arbitrary $x_0 \in X$ such that
$$d(x_0, C_{\min}) < \gamma\mu\rho^{-1},\quad d(x_0, C) \le \delta.$$

Iterative Step: given a current iteration vector $x_t \in X$, calculate $\xi_t \in X$ such that
$$B(\xi_t, \delta_0) \cap \partial g(x_t) \ne \emptyset;$$
if $\xi_t \ne 0$, then calculate $x_{t+1} \in X$ such that
$$\|x_{t+1} - P_C(x_t - \alpha\|\xi_t\|^{-1}\xi_t)\| \le \delta;$$
otherwise set $x_{t+1} = x_t$.
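The constant step variant moves a fixed distance $\alpha$ along the normalized approximate subgradient. A minimal sketch, under the same hypothetical setting as before ($g(x) = |x^2 - 1|$ on $C = [0.5, 2]$, fixed error values standing in for $\delta_0$ and $\delta$, illustrative helper names), might look like this.

```python
def g(x):
    # Hypothetical weakly convex objective with minimizer x = 1 on C.
    return abs(x * x - 1.0)

def subgradient(x):
    # One element of the subdifferential of g at x.
    return 2.0 * x if x * x > 1.0 else -2.0 * x

def project(x, lo=0.5, hi=2.0):
    # Exact projection onto the interval C = [lo, hi].
    return min(max(x, lo), hi)

def constant_step(x, alpha, grad_err, proj_err):
    """Inexact normalized step x -> P_C(x - alpha * xi / ||xi||) + error."""
    xi = subgradient(x) + grad_err
    if xi == 0.0:
        return x
    return project(x - alpha * xi / abs(xi)) + proj_err

alpha = 0.01
x = 1.4
tail = []
for t in range(200):
    x = constant_step(x, alpha, grad_err=0.01, proj_err=1e-4)
    if t >= 100:
        tail.append(abs(x - 1.0))  # distance to C_min on the last iterations
tail_max = max(tail)
```

The iterates first approach the minimizer at speed of about $\alpha$ per step and then oscillate around it at a distance of order $\alpha$, so the step size controls the final accuracy.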
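11.6 An Auxiliary Result

Lemma 11.5 Let $\alpha > 0$, $\delta, \delta_0 \in (0, 1]$, $\gamma \in (0, 1)$ satisfy
$$\alpha \le 2^{-1}\mu(L + 1)^{-1},\quad \delta_0 < \mu/32,\quad \delta_0 \le 64^{-1}\alpha\rho,\quad \delta < \mu\rho^{-1},$$
$$\delta \le \min\{64^{-2}\alpha\mu(L + \mu)^{-1},\ 2^{-1}\alpha^2L^{-2}\mu^{-1}\rho\}, \tag{11.72}$$
$x \in X$ satisfy
$$d(x, C) < \delta,\quad d(x, C_{\min}) < \gamma\mu\rho^{-1}, \tag{11.73}$$
$$d(x, C_{\min}) > \max\{16\delta,\ 16L\delta\mu^{-1},\ 6\alpha\mu^{-1}L\}. \tag{11.74}$$
Then for every $\eta \in \partial g(x)$,
$$\|\eta\| \ge \mu/16 \tag{11.75}$$
and the following assertion holds.

Let $\xi \in X$, $x^+ \in X$,
$$B_X(\xi, \delta_0) \cap \partial g(x) \ne \emptyset, \tag{11.76}$$
$$\|x^+ - P_C(x - \alpha\|\xi\|^{-1}\xi)\| \le \delta. \tag{11.77}$$
Then
$$\|\xi\| \ge \mu/32,\quad d(x^+, C_{\min})^2 \le d(x, C_{\min})^2 - 2^{-1}\alpha^2.$$

Proof By (11.4) and (11.73),
$$\|x\| \le M_1. \tag{11.78}$$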
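Lemma 11.2, (11.73), (11.74), and (11.78) imply that for every $\eta \in \partial g(x)$, (11.75) holds. Let us prove the assertion. In view of (11.76), there exists
$$\eta \in \partial g(x) \tag{11.79}$$
such that
$$\|\xi - \eta\| \le \delta_0. \tag{11.80}$$
It follows from (11.5), (11.72), (11.75), (11.76), and (11.78)–(11.80) that
$$\|\xi\| \ge \mu/32, \tag{11.81}$$
$$\|\xi\| \le L + 1.$$
Let
$$\{x_i\}_{i=1}^{\infty} \subset C_{\min} \tag{11.82}$$
and
$$\|x - x_i\| \to d(x, C_{\min}) \text{ as } i \to \infty. \tag{11.83}$$
In view of (11.73), we may assume that for all integers $i = 1, 2, \dots$,
$$\|x_i - x\| < \gamma\mu\rho^{-1}. \tag{11.84}$$
Let $i \ge 1$ be an integer. Lemma 2.2, (11.77), and (11.82) imply that
$$\|x^+ - x_i\| \le \delta + \|P_C(x - \alpha\|\xi\|^{-1}\xi) - x_i\| \le \delta + \|x - x_i - \alpha\|\xi\|^{-1}\xi\|. \tag{11.85}$$
Proposition 11.1, (11.6), (11.72), (11.79)–(11.81), and (11.84) imply that
$$\begin{aligned}
\|x - x_i - \alpha\|\xi\|^{-1}\xi\|^2 &= \|x - x_i\|^2 + 2\alpha\|\xi\|^{-1}\langle \xi, x_i - x\rangle + \alpha^2 \\
&\le \|x - x_i\|^2 + 2\alpha\|\xi\|^{-1}\|\eta - \xi\|\|x_i - x\| + 2\alpha\|\xi\|^{-1}\langle \eta, x_i - x\rangle + \alpha^2 \\
&\le \|x - x_i\|^2 + 64\alpha\mu^{-1}\delta_0\mu\rho^{-1} + \alpha^2 + 2\alpha\|\xi\|^{-1}(g(x_i) - g(x) + 2^{-1}\rho\|x - x_i\|^2).
\end{aligned} \tag{11.86}$$
In view of (11.73), there exists
$$\hat x \in C \cap B_X(x, \delta). \tag{11.87}$$
It follows from (11.5), (11.78), and (11.87) that
$$|g(x) - g(\hat x)| \le L\delta. \tag{11.88}$$
By (11.5), (11.8), (11.78)–(11.81), and (11.86)–(11.88),
$$\begin{aligned}
\|x - x_i - \alpha\|\xi\|^{-1}\xi\|^2 &\le \|x - x_i\|^2(1 + \alpha\rho\|\xi\|^{-1}) + \alpha^2 + 64\alpha\delta_0\rho^{-1} + 2\alpha\|\xi\|^{-1}(g(x_i) - g(\hat x) + g(\hat x) - g(x)) \\
&\le \|x - x_i\|^2(1 + \alpha\rho\|\xi\|^{-1}) + \alpha^2 + 64\alpha\delta_0\rho^{-1} + 2\alpha\|\xi\|^{-1}(L\delta - \mu d(\hat x, C_{\min})) \\
&\le \|x - x_i\|^2(1 + \alpha\rho\|\xi\|^{-1}) + \alpha^2 + 64\alpha\delta_0\rho^{-1} + 2\alpha\|\xi\|^{-1}(L\delta + \delta\mu - \mu d(x, C_{\min})) \\
&\le \|x - x_i\|^2(1 + \alpha\rho\|\xi\|^{-1}) - 2\alpha\mu d(x, C_{\min})\|\xi\|^{-1} + \alpha^2 + 64\alpha\delta_0\rho^{-1} + 2\alpha\|\xi\|^{-1}(L + \mu)\delta.
\end{aligned}$$
It follows from the relation above, (11.72), (11.73), and (11.81)–(11.83) that
$$d(x - \alpha\|\xi\|^{-1}\xi, C_{\min})^2 \le d(x, C_{\min})^2(1 + \alpha\rho\|\xi\|^{-1}) - 2\alpha\mu d(x, C_{\min})\|\xi\|^{-1} + \alpha^2 + 64\alpha\delta_0\rho^{-1} + 2\alpha\|\xi\|^{-1}(L + \mu)\delta. \tag{11.89}$$
By (11.89),
$$\begin{aligned}
d(x, C_{\min})^2 - d(x - \alpha\|\xi\|^{-1}\xi, C_{\min})^2 &\ge d(x, C_{\min})(2\alpha\mu\|\xi\|^{-1} - \alpha\rho\|\xi\|^{-1}d(x, C_{\min})) - \alpha^2 - 64\alpha\delta_0\rho^{-1} - 64\alpha\mu^{-1}(L + \mu)\delta \\
&\ge d(x, C_{\min})\alpha\mu\|\xi\|^{-1} - \alpha^2 - 64\alpha\delta_0\rho^{-1} - 64\alpha\mu^{-1}(L + \mu)\delta \\
&\ge d(x, C_{\min})\alpha\mu(L + 1)^{-1} - 3\alpha^2 \\
&\ge d(x, C_{\min})\alpha\mu(2L + 2)^{-1}.
\end{aligned} \tag{11.90}$$
By (11.90),
$$d(x - \alpha\|\xi\|^{-1}\xi, C_{\min}) \le d(x, C_{\min}). \tag{11.91}$$
Lemma 2.2 and (11.77) imply that for every $z \in C_{\min}$,
$$d(x^+, C_{\min}) \le \|x^+ - z\| \le \delta + \|P_C(x - \alpha\|\xi\|^{-1}\xi) - z\| \le \delta + \|x - \alpha\|\xi\|^{-1}\xi - z\|.$$
This implies that
$$d(x^+, C_{\min}) \le \delta + d(x - \alpha\|\xi\|^{-1}\xi, C_{\min}). \tag{11.92}$$
By (11.72), (11.73), (11.91), and (11.92),
$$d(x^+, C_{\min}) \le \delta + d(x, C_{\min}) \le \delta + \gamma\mu\rho^{-1} \le 2\mu\rho^{-1}. \tag{11.93}$$
It follows from (11.73) and (11.91)–(11.93) that
$$d(x^+, C_{\min})^2 - d(x - \alpha\|\xi\|^{-1}\xi, C_{\min})^2 = (d(x^+, C_{\min}) - d(x - \alpha\|\xi\|^{-1}\xi, C_{\min}))(d(x^+, C_{\min}) + d(x - \alpha\|\xi\|^{-1}\xi, C_{\min})) \le 3\delta\mu\rho^{-1}. \tag{11.94}$$
Lemma 2.2, (11.72), (11.74), (11.90), and (11.94) imply that
$$\begin{aligned}
d(x^+, C_{\min})^2 &\le d(x - \alpha\|\xi\|^{-1}\xi, C_{\min})^2 + 3\delta\mu\rho^{-1} \\
&\le d(x, C_{\min})^2 - d(x, C_{\min})\alpha\mu(2L + 2)^{-1} + 3\delta\mu\rho^{-1} \\
&\le d(x, C_{\min})^2 - 6\alpha^2\mu^{-1}L\mu(2L + 2)^{-1} + 3\mu\rho^{-1}(2^{-1}\alpha^2L^{-2}\mu^{-1}\rho) \\
&\le d(x, C_{\min})^2 - \alpha^2 + 2\alpha^2L^{-2} \\
&\le d(x, C_{\min})^2 - \alpha^2/2.
\end{aligned}$$
Lemma 11.5 is proved.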
11.7 The Second Main Result

Theorem 11.6 Let $\gamma \in (0, 1)$, $\delta, \delta_0 \in (0, 1]$, $\alpha > 0$,
$$\alpha \le 2^{-1}\mu(L + 1)^{-1},\quad \delta_0 < \mu/32,\quad \delta_0 \le 64^{-1}\rho\alpha,\quad \delta < \mu/\rho,$$
$$\delta \le \min\{2^{-1}\alpha^2L^{-2}\mu^{-1}\rho,\ 64^{-1}\mu(L + \mu)^{-1}\alpha\},$$
$x_0 \in X$ satisfy
$$d(x_0, C) \le \delta,\quad d(x_0, C_{\min}) < \gamma\mu\rho^{-1}, \tag{11.95}$$
$$T = 1 + 2(\mu/\rho)^2\alpha^{-2},$$
$\{x_t\}_{t=0}^{T} \subset X$, $\{\xi_t\}_{t=0}^{T-1} \subset X$, for all $t = 0, \dots, T - 1$,
$$B(\xi_t, \delta_0) \cap \partial g(x_t) \ne \emptyset,$$
if $\xi_t \ne 0$, then
$$\|x_{t+1} - P_C(x_t - \|\xi_t\|^{-1}\alpha\xi_t)\| \le \delta,$$
otherwise $x_{t+1} = x_t$. Then there exists $t \in \{0, \dots, T\}$ such that $\xi_s \ne 0$ for all nonnegative integers $s < t$ and
$$g(x_t) \le \inf(g, C) + \max\{16\delta,\ 16L\mu^{-1}\delta,\ 6\alpha\mu^{-1}L\}.$$

Proof Assume that the theorem does not hold. Then for all integers $t = 0, \dots, T$,
$$d(x_t, C_{\min}) > \max\{16\delta,\ 16\delta\mu^{-1}L,\ 6\alpha\mu^{-1}L\} + \inf(g, C).$$
Applying by induction Lemma 11.5 we obtain that for all integers $t = 0, \dots, T - 1$,
$$\|\xi_t\| \ge \mu/32,\quad d(x_{t+1}, C_{\min})^2 \le d(x_t, C_{\min})^2 - 2^{-1}\alpha^2.$$
By the relation above and (11.95),
$$(\mu/\rho)^2 \ge d(x_0, C_{\min})^2 \ge d(x_0, C_{\min})^2 - d(x_T, C_{\min})^2 = \sum_{t=0}^{T-1}(d(x_t, C_{\min})^2 - d(x_{t+1}, C_{\min})^2) \ge 2^{-1}T\alpha^2$$
and
$$T \le 2(\mu/\rho)^2\alpha^{-2}.$$
This contradicts the choice of $T$. The contradiction we have reached completes the proof of Theorem 11.6.
11.8 Convex Problems

Let $g : X \to R^1$ be a continuous convex function, $\mu > 0$, $M > 4$, $\gamma \in (0, 1)$, for every $z \in C$,
$$g(z) - \inf(g, C) \ge \mu d(z, \operatorname{argmin}(g, C)) \tag{11.96}$$
and
$$\operatorname{argmin}(g, C) \subset B_X(0, M - 4). \tag{11.97}$$
Fix $\alpha > 0$. Let $L \ge 1$ be such that
$$|g(z_1) - g(z_2)| \le L\|z_1 - z_2\| \text{ for all } z_1, z_2 \in B_X(0, M + 1) \tag{11.98}$$
and $\delta, \delta_0 \in (0, 1]$. We prove the following result.

Theorem 11.7 Let
$$\alpha \le 2^{-1}(L + 1)^{-1},\quad \alpha \le 4^{-1}\mu(L + 1)^{-2},\quad \delta_0 \le 4^{-1}M^{-1}\alpha^2(L + 1)^2,\quad \delta \le (2L)^{-1}\alpha(L + 1)^2, \tag{11.99}$$
$$\|x_0\| \le M,\quad d(x_0, C) \le \delta, \tag{11.100}$$
$\{x_t\}_{t=0}^{\infty} \subset X$, $\{\xi_t\}_{t=0}^{\infty} \subset X$, and for all integers $t \ge 0$,
$$B(\xi_t, \delta_0) \cap \partial g(x_t) \ne \emptyset,\quad \|x_{t+1} - P_C(x_t - \alpha\xi_t)\| \le \delta. \tag{11.101}$$
Then there exists a nonnegative integer $\tau \le 2M^2\alpha^{-2}(L + 1)^{-2}$ such that for all integers $t \in [0, \tau] \setminus \{\tau\}$,
$$d(x_{t+1}, C_{\min}) \le d(x_t, C_{\min})$$
and for all integers $t \ge \tau$,
$$d(x_t, C_{\min}) \le \alpha(L + 1)((4L + 4)\mu^{-1} + 3).$$
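The behaviour described in Theorem 11.7 can be observed on a small convex example. The following sketch is an illustration under assumptions of our own choosing: $g(x) = \|x\|_1$ on the box $C = [-5, 5]^2$ (so that $C_{\min} = \{0\}$, $\mu = 1$ and $L = \sqrt{2}$), fixed error vectors of norm $\delta_0$ and $\delta$, and a hypothetical function name.

```python
import numpy as np

def inexact_projected_subgradient(x0, alpha, delta0, delta, steps):
    """Iterate x <- P_C(x - alpha * xi) + error, where xi is within delta0
    of a subgradient of g(x) = ||x||_1 and C is the box [-5, 5]^2."""
    x = np.array(x0, dtype=float)
    distances = []
    for _ in range(steps):
        xi = np.sign(x) + delta0 * np.array([1.0, 0.0])   # inexact subgradient
        x = np.clip(x - alpha * xi, -5.0, 5.0) + delta * np.array([0.0, 1.0])
        distances.append(float(np.linalg.norm(x)))        # d(x, C_min), C_min = {0}
    return x, distances

L, mu, alpha = np.sqrt(2.0), 1.0, 0.04
# Bound of Theorem 11.7 on d(x_t, C_min) for all t >= tau.
bound = alpha * (L + 1.0) * ((4.0 * L + 4.0) / mu + 3.0)
x_final, distances = inexact_projected_subgradient([3.0, -4.0], alpha, 1e-4, 1e-3, 400)
tail_max = max(distances[200:])
```

With these values the iterates reach a neighbourhood of the minimizer after about one hundred steps and then stay well inside the bound of the theorem, which in this example is far from tight.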
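11.9 An Auxiliary Result

Lemma 11.8 Let
$$\alpha < (2L + 2)^{-1},\quad 4\delta_0M < \alpha^2(L + 1)^2,\quad \delta \le 2^{-1}\alpha(L + 1), \tag{11.102}$$
$x \in X$,
$$\|x\| \le M, \tag{11.103}$$
$$d(x, C) \le \delta, \tag{11.104}$$
$\xi \in X$,
$$B_X(\xi, \delta_0) \cap \partial g(x) \ne \emptyset, \tag{11.105}$$
$y \in X$,
$$\|y - P_C(x - \alpha\xi)\| \le \delta. \tag{11.106}$$
Then
$$d(y, C_{\min})^2 \le d(x, C_{\min})^2 + 6\alpha^2(L + 1)^2 - 2\alpha\mu d(x, C_{\min}).$$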
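Proof Clearly, $C_{\min}$ is a closed convex set. In view of (11.105), there exists
$$\eta \in \partial g(x) \tag{11.107}$$
such that
$$\|\xi - \eta\| \le \delta_0. \tag{11.108}$$
By (11.98), (11.103), and (11.107),
$$\|\eta\| \le L. \tag{11.109}$$
It follows from (11.108) and (11.109) that
$$\|\xi\| \le L + 1. \tag{11.110}$$
In view of (11.104), there exists
$$\hat x \in C \tag{11.111}$$
such that
$$\|x - \hat x\| \le \delta. \tag{11.112}$$
Lemma 2.2 and (11.106) imply that
$$\|y - P_{C_{\min}}(x)\| \le \|y - P_C(x - \alpha\xi)\| + \|P_C(x - \alpha\xi) - P_{C_{\min}}(x)\| \le \delta + \|x - \alpha\xi - P_{C_{\min}}(x)\|. \tag{11.113}$$
By (11.96), (11.98), (11.103), (11.104), (11.107), (11.108), and (11.110)–(11.112),
$$\begin{aligned}
\|x - \alpha\xi - P_{C_{\min}}(x)\|^2 &= \|x - P_{C_{\min}}(x)\|^2 + \alpha^2\|\xi\|^2 + 2\alpha\langle \xi, P_{C_{\min}}(x) - x\rangle \\
&\le \|x - P_{C_{\min}}(x)\|^2 + \alpha^2\|\xi\|^2 + 2\alpha\langle \eta, P_{C_{\min}}(x) - x\rangle + 2\alpha\|\eta - \xi\|\|P_{C_{\min}}(x) - x\| \\
&\le \|x - P_{C_{\min}}(x)\|^2 + \alpha^2(L + 1)^2 + 4\alpha\delta_0M + 2\alpha\langle \eta, P_{C_{\min}}(x) - x\rangle \\
&\le d(x, C_{\min})^2 + \alpha^2(L + 1)^2 + 4\alpha\delta_0M + 2\alpha(g(P_{C_{\min}}(x)) - g(x)) \\
&\le d(x, C_{\min})^2 + \alpha^2(L + 1)^2 + 4\alpha\delta_0M + 2\alpha(\inf(g, C) - g(\hat x)) + 2\alpha(g(\hat x) - g(x)) \\
&\le d(x, C_{\min})^2 + \alpha^2(L + 1)^2 + 4\alpha\delta_0M + 2\alpha\delta L - 2\alpha\mu d(\hat x, C_{\min}) \\
&\le d(x, C_{\min})^2 + \alpha^2(L + 1)^2 + 4\alpha\delta_0M + 2\alpha\delta L - 2\alpha\mu d(x, C_{\min}) + 2\alpha\delta\mu \\
&\le d(x, C_{\min})^2 + 4\alpha^2(L + 1)^2 - 2\alpha\mu d(x, C_{\min}).
\end{aligned} \tag{11.114}$$
It follows from (11.97), (11.102), (11.103), (11.110), (11.113), and (11.114) that
$$\begin{aligned}
\|y - P_{C_{\min}}(x)\|^2 &\le \|x - \alpha\xi - P_{C_{\min}}(x)\|^2 + \delta^2 + 2\delta\|x - \alpha\xi - P_{C_{\min}}(x)\| \\
&\le d(x, C_{\min})^2 + 4\alpha^2(L + 1)^2 - 2\alpha d(x, C_{\min})\mu + \delta^2 + 2\delta(2M + 2) \\
&\le d(x, C_{\min})^2 + 6\alpha^2(L + 1)^2 - 2\alpha d(x, C_{\min})\mu.
\end{aligned}$$
Lemma 11.8 is proved.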
11.10 Proof of Theorem 11.7

Assume that $t \ge 0$ is an integer,
$$\|x_t\| \le M. \tag{11.115}$$
Applying Lemma 11.8 with $x = x_t$, $y = x_{t+1}$, $\xi = \xi_t$ we obtain that
$$d(x_{t+1}, C_{\min})^2 \le d(x_t, C_{\min})^2 + 6\alpha^2(L + 1)^2 - 2\alpha d(x_t, C_{\min})\mu. \tag{11.116}$$
Assume in addition that
$$d(x_t, C_{\min}) \ge 4\alpha(L + 1)^2\mu^{-1}. \tag{11.117}$$
By (11.116) and (11.117),
$$d(x_{t+1}, C_{\min})^2 \le d(x_t, C_{\min})^2 - 2\alpha^2(L + 1)^2. \tag{11.118}$$
Thus we have shown that the following property holds:

(a) if $t \ge 0$ is an integer and (11.115) holds, then (11.116) holds, and if (11.115) and (11.117) are valid, then (11.118) is true.

Assume that $\tau \ge 1$ is an integer and that for all integers $t = 0, \dots, \tau - 1$, (11.117) holds. By induction, we apply property (a) and (11.110) and show that (11.118) holds for all $t = 0, \dots, \tau - 1$ and that (11.115) holds for all $t = 0, \dots, \tau$. It follows from (11.97), (11.100), and (11.118) holding for all $t = 0, \dots, \tau - 1$ that
$$4M^2 \ge d(x_0, C_{\min})^2 \ge d(x_0, C_{\min})^2 - d(x_\tau, C_{\min})^2 = \sum_{t=0}^{\tau-1}(d(x_t, C_{\min})^2 - d(x_{t+1}, C_{\min})^2) \ge 2\alpha^2(L + 1)^2\tau$$
and
$$\tau \le 2M^2\alpha^{-2}(L + 1)^{-2}.$$
This implies that there exists an integer
$$\tau \le 2M^2\alpha^{-2}(L + 1)^{-2} \tag{11.119}$$
such that
$$d(x_\tau, C_{\min}) < 4\alpha(L + 1)^2\mu^{-1}, \tag{11.120}$$
and if an integer $t \in [0, \tau] \setminus \{\tau\}$, then (11.117) holds and
$$d(x_{t+1}, C_{\min}) \le d(x_t, C_{\min}).$$
We show that for all integers $t \ge \tau$,
$$d(x_t, C_{\min}) \le 4\alpha(L + 1)^2\mu^{-1} + 3\alpha(L + 1).$$
Assume the contrary. Then in view of (11.120) there exists an integer $s > \tau$ such that
$$d(x_s, C_{\min}) > 4\alpha(L + 1)^2\mu^{-1} + 3\alpha(L + 1), \tag{11.121}$$
the analog of (11.121) does not hold for all $t = \tau, \dots, s - 1$ and in particular,
$$d(x_{s-1}, C_{\min}) \le 4\alpha(L + 1)^2\mu^{-1} + 3\alpha(L + 1). \tag{11.122}$$
By (11.97), (11.99), and (11.122),
$$\|x_{s-1}\| \le M. \tag{11.123}$$
There are two cases:
$$d(x_{s-1}, C_{\min}) > 4\alpha(L + 1)^2\mu^{-1}; \tag{11.124}$$
$$d(x_{s-1}, C_{\min}) \le 4\alpha(L + 1)^2\mu^{-1}. \tag{11.125}$$
Assume that (11.124) holds. Together with property (a) and (11.123) this implies that (11.118) holds with $t = s - 1$ and that
$$d(x_s, C_{\min}) \le d(x_{s-1}, C_{\min}) \le 4\alpha(L + 1)^2\mu^{-1} + 3\alpha(L + 1).$$
Assume that (11.125) holds. Property (a), (11.116) with $t = s - 1$, (11.123), and (11.125) imply that
$$d(x_s, C_{\min}) \le d(x_{s-1}, C_{\min}) + 3\alpha(L + 1) \le 4\alpha(L + 1)^2\mu^{-1} + 3\alpha(L + 1).$$
Thus in both cases
$$d(x_s, C_{\min}) \le 4\alpha(L + 1)^2\mu^{-1} + 3\alpha(L + 1).$$
This contradicts (11.121). The contradiction we have reached proves that
$$d(x_t, C_{\min}) \le 4\alpha(L + 1)^2\mu^{-1} + 3\alpha(L + 1) = \alpha(L + 1)((4L + 4)\mu^{-1} + 3)$$
for all integers $t \ge \tau$. This completes the proof of Theorem 11.7.
Chapter 12
A Projected Subgradient Method for Nonsmooth Problems
In this chapter we study the convergence of the projected subgradient method for a class of constrained optimization problems in a Hilbert space. For this class of problems, an objective function is assumed to be convex but a set of admissible points is not necessarily convex. Our goal is to obtain an $\epsilon$-approximate solution in the presence of computational errors, where $\epsilon$ is a given positive number.
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6_12

12.1 Preliminaries and Main Results

Let $(X, \langle\cdot,\cdot\rangle)$ be a Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ which induces a complete norm $\|\cdot\|$. For each $x \in X$ and each nonempty set $A \subset X$ put
$$d(x, A) = \inf\{\|x - y\| : y \in A\}.$$
Assume that $f : X \to R^1$ is a convex continuous function which is Lipschitz on all bounded subsets of $X$. For each point $x \in X$ and each positive number $\epsilon$ let
$$\partial f(x) = \{l \in X : f(y) - f(x) \ge \langle l, y - x\rangle \text{ for all } y \in X\}$$
be the subdifferential of $f$ at $x$ and let
$$\partial_\epsilon f(x) = \{l \in X : f(y) - f(x) \ge \langle l, y - x\rangle - \epsilon \text{ for all } y \in X\}$$
be the $\epsilon$-subdifferential of $f$ at $x$. Let $C$ be a closed nonempty subset of the space $X$. Assume that
$$\lim_{\|x\|\to\infty} f(x) = \infty. \tag{12.1}$$
It means that for each $M_0 > 0$ there exists $M_1 > 0$ such that if a point $x \in X$ satisfies the inequality $\|x\| \ge M_1$, then $f(x) > M_0$. Define
$$\inf(f, C) = \inf\{f(z) : z \in C\}.$$
Since the function $f$ is Lipschitz on all bounded subsets of the space $X$ it follows from (12.1) that $\inf(f, C)$ is finite. Set
$$C_{\min} = \{x \in C : f(x) = \inf(f, C)\}.$$
It is well known that if the set $C$ is convex, then the set $C_{\min}$ is nonempty. Clearly, the set $C_{\min} \ne \emptyset$ if the space $X$ is finite-dimensional. In this chapter we assume that
$$C_{\min} \ne \emptyset. \tag{12.2}$$
It is clear that $C_{\min}$ is a closed subset of $X$. We suppose that the following assumption holds.

(A1) For every positive number $\epsilon$ there exists $\delta > 0$ such that if a point $x \in C$ satisfies the inequality $f(x) \le \inf(f, C) + \delta$, then $d(x, C_{\min}) \le \epsilon$.

(It is clear that (A1) holds if the space $X$ is finite-dimensional.) We also suppose that $P_C : X \to X$ satisfies for all $x \in C$ and all $y \in X$,
$$\|x - P_C(y)\| \le \|x - y\| \tag{12.3}$$
and that
$$P_C(x) = x \text{ for all } x \in C. \tag{12.4}$$
(Note that in [85] we assumed that $P_C(X) = X$.) In this chapter we also use the following assumption.

(A2) For each $M > 0$ and each $r > 0$ there exists $\delta > 0$ such that for each $x \in B_X(0, M)$ satisfying $d(x, C) \ge r$ and each $z \in B_X(0, M) \cap C$, we have
$$\|P_C(x) - z\| \le \|x - z\| - \delta.$$

In Section 3.13, Chapter 3 of [93] mappings satisfying (A2), (12.3) and (12.4) are called (C)-quasi-contractive and it is shown that in appropriate spaces of mappings satisfying (12.3) and (12.4) most of the mappings are (C)-quasi-contractive. In the chapter we also use the following assumption.
(A3) The mapping $P_C$ is uniformly continuous on all bounded subsets of $X$ and for every $x \in X$, $P_C^n(x)$ converges in the norm topology to a point of $C$ uniformly on every bounded subset of $X$.

Clearly, if (A3) holds, then $\lim_{n\to\infty} P_C^n(x)$ is a fixed point of $P_C$ for every $x \in X$. Set $P_C^0 x = x$, $x \in X$. For every number $\epsilon \in (0, \infty)$ set
$$\phi(\epsilon) = \sup\{\delta \in (0, 1] : \text{if } x \in C \text{ satisfies } f(x) \le \inf(f, C) + \delta, \text{ then } d(x, C_{\min}) \le \min\{1, \epsilon\}\}. \tag{12.5}$$
In view of (A1), $\phi(\epsilon)$ is well defined for every positive number $\epsilon$. In this chapter we will prove the following two results.

Theorem 12.1 Suppose that at least one of the assumptions (A2) and (A3) holds. Let $\{\alpha_i\}_{i=0}^{\infty} \subset (0, 1]$ satisfy
$$\lim_{i\to\infty}\alpha_i = 0,\quad \sum_{i=1}^{\infty}\alpha_i = \infty \tag{12.6}$$
and let $M, \epsilon > 0$. Then there exist a natural number $n_0$ and $\delta > 0$ such that the following assertion holds.

Assume that an integer $n \ge n_0$,
$$\{x_k\}_{k=0}^{n} \subset X,\quad \|x_0\| \le M,$$
$$v_k \in \partial_\delta f(x_k) \setminus \{0\},\ k = 0, 1, \dots, n - 1,$$
$$\{\eta_k\}_{k=0}^{n-1},\ \{\xi_k\}_{k=0}^{n-1} \subset B_X(0, \delta),$$
and that for $k = 0, \dots, n - 1$,
$$x_{k+1} = P_C(x_k - \alpha_k\|v_k\|^{-1}v_k - \alpha_k\xi_k) - \alpha_k\eta_k.$$
Then the inequality $d(x_k, C_{\min}) \le \epsilon$ holds for all integers $k$ satisfying $n_0 \le k \le n$.

Theorem 12.2 Suppose that at least one of the assumptions (A2) and (A3) holds. Let $M, \epsilon > 0$. Then there exists $\beta_0 \in (0, 1)$ such that for each $\beta_1 \in (0, \beta_0)$ there exist a natural number $n_0$ and $\delta > 0$ such that the following assertion holds.

Assume that an integer $n \ge n_0$,
$$\{x_k\}_{k=0}^{n} \subset X,\quad \|x_0\| \le M,$$
$$v_k \in \partial_\delta f(x_k) \setminus \{0\},\ k = 0, 1, \dots, n - 1,$$
$$\{\alpha_k\}_{k=0}^{n-1} \subset [\beta_1, \beta_0],\quad \{\eta_k\}_{k=0}^{n-1},\ \{\xi_k\}_{k=0}^{n-1} \subset B_X(0, \delta)$$
and that for $k = 0, \dots, n - 1$,
$$x_{k+1} = P_C(x_k - \alpha_k\|v_k\|^{-1}v_k - \alpha_k\xi_k) - \eta_k.$$
Then the inequality $d(x_k, C_{\min}) \le \epsilon$ holds for all integers $k$ satisfying $n_0 \le k \le n$.

In this chapter we use the following definitions and notation. Define
$$X_0 = \{x \in X : f(x) \le \inf(f, C) + 4\}. \tag{12.7}$$
In view of (12.1), there exists a number $\bar K > 0$ such that
$$X_0 \subset B_X(0, \bar K). \tag{12.8}$$
Since the function $f$ is Lipschitz on all bounded subsets of the space $X$ there exists a number $\bar L > 1$ such that
$$|f(z_1) - f(z_2)| \le \bar L\|z_1 - z_2\| \text{ for all } z_1, z_2 \in B_X(0, \bar K + 4). \tag{12.9}$$
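The iteration of Theorem 12.1 can be sketched on a small example with diminishing step sizes. Everything specific below is an assumption made for illustration: the objective $f(x) = 2^{-1}\|x - a\|^2$ with $a = (2, 0)$, the closed unit ball as $C$ (so that $C_{\min} = \{(1, 0)\}$ and the nearest-point projection satisfies (12.3) and (12.4)), the step sizes $\alpha_k = 1/(k + 1)$, the fixed error vectors of norm $\delta$, and the function names.

```python
import numpy as np

a = np.array([2.0, 0.0])  # hypothetical data: f(x) = 0.5 * ||x - a||^2

def project_unit_ball(x):
    # Nearest-point projection onto the closed unit ball C.
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def run(x0, n, delta):
    """x_{k+1} = P_C(x_k - a_k v_k / ||v_k|| - a_k xi_k) - a_k eta_k
    with a_k = 1 / (k + 1) and fixed errors xi, eta of norm delta."""
    x = np.array(x0, dtype=float)
    xi = delta * np.array([1.0, 0.0])
    eta = delta * np.array([0.0, 1.0])
    for k in range(n):
        alpha_k = 1.0 / (k + 1)
        v = x - a                         # gradient of f at x (nonzero here)
        step = v / np.linalg.norm(v)      # normalized direction
        x = project_unit_ball(x - alpha_k * step - alpha_k * xi) - alpha_k * eta
    return x

x_n = run([-3.0, 2.0], 2000, 1e-3)
dist_to_min = float(np.linalg.norm(x_n - np.array([1.0, 0.0])))
```

The step sizes satisfy (12.6), and in this run the final iterate lies within a distance of order $\delta$ of the minimizer, in line with the conclusion $d(x_k, C_{\min}) \le \epsilon$ for large $k$ and small $\delta$.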
12.2 Auxiliary Results

Proposition 12.3 Let $\epsilon \in (0, 1]$. Then for each $x \in X$ satisfying
$$d(x, C) < \min\{\bar L^{-1}2^{-1}\phi(\epsilon/2),\ \epsilon/2\}, \tag{12.10}$$
$$f(x) \le \inf(f, C) + \min\{2^{-1}\phi(\epsilon/2),\ \epsilon/2\}, \tag{12.11}$$
the inequality $d(x, C_{\min}) \le \epsilon$ holds.

Proof In view of the definition of $\phi$, $\phi(\epsilon/2) \in (0, 1]$ and the following property holds:

(i) if $x \in C$ satisfies $f(x) < \inf(f, C) + \phi(\epsilon/2)$, then
$$d(x, C_{\min}) \le \min\{1, \epsilon/2\}. \tag{12.12}$$

Assume that a point $x \in X$ satisfies (12.10) and (12.11). In view of (12.10), there exists a point $y \in C$ which satisfies
$$\|x - y\| < 2^{-1}\bar L^{-1}\phi(\epsilon/2) \text{ and } \|x - y\| < \epsilon/2. \tag{12.13}$$
Relations (12.5), (12.7), (12.8), (12.11), and (12.13) imply that
$$x \in B_X(0, \bar K),\quad y \in B_X(0, \bar K + 1). \tag{12.14}$$
By (12.13), (12.14), and the definition of $\bar L$ (see (12.9)),
$$|f(x) - f(y)| \le \bar L\|x - y\| < \phi(\epsilon/2)2^{-1}. \tag{12.15}$$
It follows from the choice of the point $y$, (12.11), and (12.15) that $y \in C$ and
$$f(y) < f(x) + \phi(\epsilon/2)2^{-1} \le \inf(f, C) + \phi(\epsilon/2).$$
Combined with property (i) this implies that
$$d(y, C_{\min}) \le \epsilon/2.$$
Together with (12.13) this implies that
$$d(x, C_{\min}) \le \|x - y\| + d(y, C_{\min}) \le \epsilon.$$
This completes the proof of Proposition 12.3.
Lemma 12.4 Assume that $\epsilon > 0$, $x \in X$, $y \in X$,
$$f(x) > \inf(f, C) + \epsilon, \quad f(y) \le \inf(f, C) + \epsilon/4, \tag{12.16}$$
$$v \in \partial_{\epsilon/4} f(x). \tag{12.17}$$
Then $\langle v, y - x \rangle \le -\epsilon/2$.

Proof In view of (12.17),
$$f(u) - f(x) \ge \langle v, u - x \rangle - \epsilon/4 \text{ for all } u \in X. \tag{12.18}$$
By (12.16) and (12.18),
$$-(3/4)\epsilon \ge f(y) - f(x) \ge \langle v, y - x \rangle - \epsilon/4.$$
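This completes the proof of Lemma 12.4.

Lemma 12.5 Let
$$\bar x \in C_{\min}, \tag{12.19}$$
$K_0 > 0$, $\epsilon \in (0, 16\bar L]$, $\alpha \in (0, 1]$, let a number $\delta \in (0, 1]$ satisfy
$$\delta(K_0 + \bar K + 1) \le (8\bar L)^{-1} \epsilon, \tag{12.20}$$
let a point $x \in X$ satisfy
$$\|x\| \le K_0, \tag{12.21}$$
$$f(x) > \inf(f, C) + \epsilon, \tag{12.22}$$
$$\eta, \xi \in B_X(0, \delta), \tag{12.23}$$
$$v \in \partial_{\epsilon/4} f(x) \setminus \{0\}, \tag{12.24}$$
and let
$$y = P_C(x - \alpha \|v\|^{-1} v - \alpha \xi) - \eta. \tag{12.25}$$
Then
$$\|y - \bar x\|^2 \le \|x - \bar x\|^2 - \alpha (4\bar L)^{-1} \epsilon + 2\alpha^2 + \|\eta\|^2 + 2\|\eta\|(K_0 + \bar K + 2).$$

Proof In view of (12.7)–(12.9) and (12.19), for every point $z \in B_X(\bar x, 4^{-1} \bar L^{-1} \epsilon)$, we have
$$f(z) \le f(\bar x) + \bar L \|z - \bar x\| \le f(\bar x) + \epsilon/4 = \inf(f, C) + \epsilon/4. \tag{12.26}$$
Lemma 12.4, (12.22), (12.24), and (12.26) imply that for every point $z \in B_X(\bar x, 4^{-1} \bar L^{-1} \epsilon)$, we have
$$\langle v, z - x \rangle \le -\epsilon/2.$$
Combined with (12.24) the inequality above implies that
$$\|v\|^{-1} \langle v, z - x \rangle < 0 \text{ for all } z \in B_X(\bar x, (4\bar L)^{-1} \epsilon). \tag{12.27}$$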
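Set
$$\tilde z = \bar x + 4^{-1} \bar L^{-1} \epsilon \|v\|^{-1} v. \tag{12.28}$$
It is easy to see that
$$\tilde z \in B_X(\bar x, 4^{-1} \bar L^{-1} \epsilon). \tag{12.29}$$
Relations (12.27)–(12.29) imply that
$$0 > \|v\|^{-1} \langle v, \tilde z - x \rangle = \|v\|^{-1} \langle v, \bar x + 4^{-1} \bar L^{-1} \epsilon \|v\|^{-1} v - x \rangle. \tag{12.30}$$
By (12.30),
$$\|v\|^{-1} \langle v, \bar x - x \rangle < -4^{-1} \bar L^{-1} \epsilon. \tag{12.31}$$
Set
$$y_0 = x - \alpha \|v\|^{-1} v - \alpha \xi. \tag{12.32}$$
It follows from (12.7), (12.8), (12.19)–(12.21), (12.23), (12.31), and (12.32) that
$$\begin{aligned}
\|y_0 - \bar x\|^2 &= \|x - \alpha \|v\|^{-1} v - \alpha \xi - \bar x\|^2 \\
&= \|x - \alpha \|v\|^{-1} v - \bar x\|^2 + \alpha^2 \|\xi\|^2 - 2\alpha \langle \xi, x - \alpha \|v\|^{-1} v - \bar x \rangle \\
&\le \|x - \alpha \|v\|^{-1} v - \bar x\|^2 + \alpha^2 \delta^2 + 2\alpha \delta (K_0 + \bar K + 1) \\
&\le \|x - \bar x\|^2 - 2 \langle x - \bar x, \alpha \|v\|^{-1} v \rangle + \alpha^2 + \alpha^2 \delta^2 + 2\alpha \delta (K_0 + \bar K + 1) \\
&< \|x - \bar x\|^2 - 2\alpha (4^{-1} \bar L^{-1} \epsilon) + \alpha^2 (1 + \delta^2) + 2\alpha \delta (K_0 + \bar K + 1) \\
&\le \|x - \bar x\|^2 - \alpha (4\bar L)^{-1} \epsilon + 2\alpha^2.
\end{aligned} \tag{12.33}$$
In view of (12.7), (12.8), (12.19), (12.21), and (12.33),
$$\|y_0 - \bar x\|^2 \le (K_0 + \bar K)^2 + 2$$
and
$$\|y_0 - \bar x\| \le K_0 + \bar K + 2. \tag{12.34}$$
By (12.3), (12.19), (12.25), (A2), (12.32), (12.33), and (12.34),
$$\begin{aligned}
\|y - \bar x\|^2 &= \|P_C(y_0) - \eta - \bar x\|^2 \\
&\le \|P_C(y_0) - \bar x\|^2 + \|\eta\|^2 + 2\|\eta\| \|P_C(y_0) - \bar x\| \\
&\le \|y_0 - \bar x\|^2 + \|\eta\|^2 + 2\|\eta\| \|y_0 - \bar x\| \\
&\le \|x - \bar x\|^2 - \alpha (4\bar L)^{-1} \epsilon + 2\alpha^2 + \|\eta\|^2 + 2\|\eta\|(K_0 + \bar K + 2).
\end{aligned}$$
This completes the proof of Lemma 12.5.

Lemma 12.5 implies the following result.

Lemma 12.6 Let $K_0 > 0$, $\epsilon \in (0, 16\bar L]$, $\alpha \in (0, 1]$, a number $\delta \in (0, 1]$ satisfy
$$\delta(K_0 + \bar K + 1) \le (8\bar L)^{-1} \epsilon,$$
let $x \in X$ satisfy
$$\|x\| \le K_0, \quad f(x) > \inf(f, C) + \epsilon,$$
let $\eta, \xi \in B_X(0, \delta)$, $v \in \partial_{\epsilon/4} f(x) \setminus \{0\}$, and let
$$y = P_C(x - \alpha \|v\|^{-1} v - \alpha \xi) - \eta.$$
Then
$$d(y, C_{\min})^2 \le d(x, C_{\min})^2 - \alpha (4\bar L)^{-1} \epsilon + 2\alpha^2 + \|\eta\|^2 + 2\|\eta\|(K_0 + \bar K + 2).$$

Lemma 12.6 applied with $\epsilon = 16\bar L$ implies the next result.

Lemma 12.7 Let $K_0 > 0$, $\alpha \in (0, 1]$, a number $\delta \in (0, 1]$ satisfy
$$\delta(K_0 + \bar K + 1) \le 2,$$
let $x \in X$ satisfy
$$\|x\| \le K_0, \quad f(x) > \inf(f, C) + 16\bar L,$$
let $\eta, \xi \in B_X(0, \delta)$, $v \in \partial_4 f(x) \setminus \{0\}$, and let
$$y = P_C(x - \alpha \|v\|^{-1} v - \alpha \xi) - \eta.$$
Then
$$d(y, C_{\min})^2 \le d(x, C_{\min})^2 - 2\alpha + 2\|\eta\|(K_0 + \bar K + 3).$$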
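The update analysed in Lemmas 12.5 to 12.7 can be sketched in a few lines of Python. Everything concrete below is an assumption made for illustration and is not taken from the text: the space is $\mathbb{R}^2$, $f(x) = |x_1| + |x_2|$, $C$ is the box $[1,2] \times [-1,1]$ (so that $C_{\min} = \{(1, 0)\}$), $P_C$ is the usual metric projection onto the box, and the names `project_box` and `inexact_step` are hypothetical.

```python
import math

def norm(p):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in p))

def project_box(p, lo, hi):
    """Metric projection onto a box; it fixes points of C and is nonexpansive."""
    return [min(max(c, l), h) for c, l, h in zip(p, lo, hi)]

def inexact_step(x, v, alpha, xi, eta, project):
    """One step y = P_C(x - alpha*v/||v|| - alpha*xi) - eta, as in (12.25)."""
    nv = norm(v)
    if nv == 0.0:
        raise ValueError("the approximate subgradient v must be nonzero")
    moved = [xc - alpha * vc / nv - alpha * xic for xc, vc, xic in zip(x, v, xi)]
    return [pc - ec for pc, ec in zip(project(moved), eta)]

def dist_to_minimizer(p):
    """Distance to the assumed unique minimizer (1, 0)."""
    return norm([p[0] - 1.0, p[1]])

# Illustrative data (assumptions, not from the text).
lo, hi = [1.0, -1.0], [2.0, 1.0]
x = [2.0, 1.0]
v = [1.0, 1.0]                       # a subgradient of f at x
xi, eta = [1e-3, 0.0], [0.0, 1e-3]   # computational errors of norm delta = 1e-3
y = inexact_step(x, v, 0.5, xi, eta, lambda p: project_box(p, lo, hi))
```

On this data the step moves the point strictly closer to the minimizer, which is the qualitative behaviour the lemmas quantify when $f(x)$ is well above $\inf(f, C)$ and the errors are small.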
12.3 An Auxiliary Result with Assumption A2

Lemma 12.8 Assume that (A2) holds, $K_0, \epsilon > 0$ and
$$\bar x \in C_{\min}. \tag{12.35}$$
Then there exist a natural number $m_0$ and $\delta_0 \in (0, 1)$ such that for each integer $n \ge m_0$ and each finite sequence $\{x_i\}_{i=0}^n \subset X$ satisfying
$$\|x_i\| \le K_0, \quad i = 0, \dots, n \tag{12.36}$$
and
$$B_X(x_{i+1}, \delta_0) \cap P_C(B_X(x_i, \delta_0)) \ne \emptyset, \quad i = 0, \dots, n-1 \tag{12.37}$$
the inequality $d(x_i, C) \le \epsilon$ holds for all integers $i \in [m_0, n]$.

Proof Set
$$\bar K_1 = 2K_0 + \bar K + 5. \tag{12.38}$$
Assumption (A2) implies that there exists $\gamma_0 \in (0, 1)$ such that the following property holds:
(ii) for each $x \in X$ satisfying $\|x\| \le K_0 + 1$ and $d(x, C) \ge \epsilon/8$ and each $z \in B_X(0, \bar K_1) \cap C$, we have
$$\|P_C(x) - z\| \le \|x - z\| - \gamma_0.$$
Choose a natural number
$$m_0 > 2(K_0 + \bar K + 1)\gamma_0^{-1} + 2 \tag{12.39}$$
and a positive number
$$\delta_0 < \min\{\epsilon/8,\ 1,\ \gamma_0/4\}. \tag{12.40}$$
Assume that an integer $n \ge m_0$ and that a finite sequence $\{x_i\}_{i=0}^n \subset X$ satisfies (12.36) and (12.37) for all integers $i = 0, \dots, n-1$. We show that there exists $j \in \{0, \dots, m_0\}$ such that $d(x_j, C) \le \epsilon/4$. We may assume without loss of generality that
$$d(x_0, C) > \epsilon/4. \tag{12.41}$$
By (12.37), there exists
$$y_0 \in B_X(x_0, \delta_0) \tag{12.42}$$
such that
$$\|x_1 - P_C(y_0)\| \le \delta_0. \tag{12.43}$$
By (12.40)–(12.42),
$$d(y_0, C) \ge d(x_0, C) - \delta_0 > \epsilon/4 - \delta_0 > \epsilon/8. \tag{12.44}$$
In view of (12.36), (12.40), and (12.42),
$$\|y_0\| \le \|x_0\| + \delta_0 \le K_0 + 1. \tag{12.45}$$
Property (ii), (12.7), (12.8), (12.35), (12.38), (12.42), (12.44), and (12.45) imply that
$$\|P_C(y_0) - \bar x\| \le \|y_0 - \bar x\| - \gamma_0 \le \|x_0 - \bar x\| + \|x_0 - y_0\| - \gamma_0 \le \|x_0 - \bar x\| + \delta_0 - \gamma_0. \tag{12.46}$$
By (12.40), (12.43), and (12.46),
$$\begin{aligned}
\|x_1 - \bar x\| &\le \|x_1 - P_C(y_0)\| + \|P_C(y_0) - \bar x\| \\
&\le \delta_0 + \|x_0 - \bar x\| - \gamma_0 + \delta_0 \\
&= \|x_0 - \bar x\| - \gamma_0 + 2\delta_0 \le \|x_0 - \bar x\| - \gamma_0/2
\end{aligned}$$
and
$$\|x_1 - \bar x\| \le \|x_0 - \bar x\| - \gamma_0/2. \tag{12.47}$$
We may assume without loss of generality that
$$d(x_1, C) > \epsilon/4. \tag{12.48}$$
Assume that $p$ is a natural number such that for all $i = 0, \dots, p-1$,
$$\|x_{i+1} - \bar x\| \le \|x_i - \bar x\| - \gamma_0/2. \tag{12.49}$$
(In view of (12.47), this assumption holds for $p = 1$.) It follows from (12.7), (12.8), (12.35), (12.36), and (12.49) that
$$K_0 + \bar K \ge \|x_0 - \bar x\| \ge \|x_0 - \bar x\| - \|x_p - \bar x\| = \sum_{i=0}^{p-1} (\|x_i - \bar x\| - \|x_{i+1} - \bar x\|) \ge (\gamma_0/2) p$$
and
$$p \le 2(K_0 + \bar K)\gamma_0^{-1}.$$
Together with (12.39) this implies that
$$p < m_0. \tag{12.50}$$
In view of (12.50), we may assume without loss of generality that $p$ is the largest natural number such that (12.49) holds for $i = 0, \dots, p-1$. Thus
$$\|x_{p+1} - \bar x\| > \|x_p - \bar x\| - \gamma_0/2. \tag{12.51}$$
Assume that
$$d(x_p, C) > \epsilon/4. \tag{12.52}$$
By (12.37), there exists
$$y_p \in B_X(x_p, \delta_0) \tag{12.53}$$
such that
$$\|x_{p+1} - P_C(y_p)\| \le \delta_0. \tag{12.54}$$
Relations (12.36), (12.40), and (12.53) imply that
$$\|y_p\| \le K_0 + 1. \tag{12.55}$$
By (12.40), (12.52), and (12.53),
$$d(y_p, C) \ge d(x_p, C) - \|x_p - y_p\| > \epsilon/4 - \delta_0 \ge \epsilon/8. \tag{12.56}$$
It follows from (12.7), (12.8), (12.35), (12.53), (12.55), and (12.56) that
$$\|P_C(y_p) - \bar x\| \le \|y_p - \bar x\| - \gamma_0 \le \|x_p - \bar x\| + \|y_p - x_p\| - \gamma_0 \le \|x_p - \bar x\| - \gamma_0 + \delta_0. \tag{12.57}$$
By (12.40), (12.54), and (12.57),
$$\|x_{p+1} - \bar x\| \le \|x_{p+1} - P_C(y_p)\| + \|P_C(y_p) - \bar x\| \le \|x_p - \bar x\| - \gamma_0 + 2\delta_0 \le \|x_p - \bar x\| - \gamma_0/2.$$
This contradicts (12.51). The contradiction we have reached proves that $d(x_p, C) \le \epsilon/4$. Thus we proved the existence of an integer $p \ge 0$ such that
$$p < m_0, \tag{12.58}$$
$$d(x_p, C) \le \epsilon/4. \tag{12.59}$$
We show that $d(x_j, C) \le \epsilon$ for all integers $j \in [p, n]$. Assume that $j \ge p$ is an integer, $j < n$ and that
$$d(x_j, C) \le \epsilon. \tag{12.60}$$
There are two cases:
$$d(x_j, C) \le \epsilon/4; \tag{12.61}$$
$$d(x_j, C) > \epsilon/4. \tag{12.62}$$
By (12.37), there exists
$$y \in B_X(x_j, \delta_0) \tag{12.63}$$
such that
$$\|x_{j+1} - P_C(y)\| \le \delta_0. \tag{12.64}$$
Assume that (12.61) holds. It follows from (12.3), (12.40), (12.61), (12.63), and (12.64) that
$$\begin{aligned}
d(x_{j+1}, C) &\le \|x_{j+1} - P_C(y)\| + d(P_C(y), C) \le \delta_0 + d(y, C) \\
&\le \delta_0 + d(x_j, C) + \|y - x_j\| \le 2\delta_0 + d(x_j, C) \le \epsilon/4 + 2\delta_0 \le \epsilon.
\end{aligned} \tag{12.65}$$
Assume that (12.62) holds. By (12.7), (12.8), (12.35), (12.36), (12.40), and (12.63),
$$d(y, C) \le \|y - \bar x\| \le K_0 + \bar K + 1. \tag{12.66}$$
In view of (12.40) and (12.66), there exists $z \in C$ such that
$$\|z - y\| \le d(y, C) + \delta_0 \le K_0 + \bar K + 2. \tag{12.67}$$
By (12.36), (12.40), (12.63), and (12.67),
$$\|z\| \le K_0 + \bar K + 2 + \|y\| \le K_0 + \bar K + 2 + \|x_j\| + \|y - x_j\| \le 2K_0 + \bar K + 3. \tag{12.68}$$
It follows from (12.36), (12.40), and (12.63) that
$$\|y\| \le \|x_j\| + 1 \le K_0 + 1. \tag{12.69}$$
By (12.40), (12.62), and (12.63),
$$d(y, C) \ge d(x_j, C) - \|x_j - y\| \ge \epsilon/4 - \delta_0 > \epsilon/8. \tag{12.70}$$
Property (ii), the inclusion $z \in C$, (12.38), and (12.68)–(12.70) imply that
$$\|P_C(y) - z\| \le \|y - z\| - \gamma_0. \tag{12.71}$$
The inclusion $z \in C$, (12.40), (12.63)–(12.65), (12.67), and (12.71) imply that
$$\begin{aligned}
d(x_{j+1}, C) &\le \|x_{j+1} - z\| \le \|P_C(y) - z\| + \|x_{j+1} - P_C(y)\| \\
&\le \|y - z\| - \gamma_0 + \delta_0 \le d(y, C) + \delta_0 - \gamma_0 + \delta_0 \\
&\le d(x_j, C) + \|y - x_j\| + \delta_0 - \gamma_0 + \delta_0 \\
&\le d(x_j, C) + 2\delta_0 - \gamma_0 + \delta_0 \le d(x_j, C) - \gamma_0/4 \le \epsilon.
\end{aligned}$$
Thus in both cases $d(x_{j+1}, C) \le \epsilon$. Therefore, by induction on $j$, $d(x_i, C) \le \epsilon$ for all $i = p, \dots, n$. Since $p < m_0$, Lemma 12.8 is proved.
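The mechanism behind Lemma 12.8 can be seen on a one-dimensional toy model. The sketch below is only an illustration under stated assumptions: $C = [-1, 1] \subset \mathbb{R}$, and the map `relaxed_projection` (which moves a point halfway to its nearest point in $C$) is a hypothetical stand-in for $P_C$; it fixes the points of $C$ and does not increase distances to points of $C$. The perturbations are chosen adversarially, always pushing away from $C$, in the pattern allowed by (12.37).

```python
def clamp(t, lo=-1.0, hi=1.0):
    """Nearest point of C = [lo, hi] to t."""
    return min(max(t, lo), hi)

def relaxed_projection(t):
    """Hypothetical P_C: move halfway from t to its nearest point in C."""
    return t + 0.5 * (clamp(t) - t)

def dist_to_C(t):
    return abs(t - clamp(t))

def perturbed_orbit(x0, delta0, steps):
    """Sequence with x_{i+1} within delta0 of P_C(y_i) for some y_i within delta0 of x_i."""
    xs = [x0]
    for _ in range(steps):
        y = xs[-1] + delta0                        # y_i in B(x_i, delta0), pushed away from C
        xs.append(relaxed_projection(y) + delta0)  # |x_{i+1} - P_C(y_i)| <= delta0
    return xs

eps, delta0 = 0.1, 0.01                            # delta0 < eps/8, in the spirit of (12.40)
orbit = perturbed_orbit(5.0, delta0, 20)
first_ok = next(i for i, t in enumerate(orbit) if dist_to_C(t) <= eps)
```

In this model the distance to $C$ obeys $d_{i+1} = d_i/2 + 1.5\,\delta_0$, so it drops below $\epsilon$ after finitely many steps and then stays there, which mirrors the two stages of the proof (reaching $\epsilon/4$ before index $m_0$, then never exceeding $\epsilon$).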
12.4 An Auxiliary Result with Assumption A3

Lemma 12.9 Assume that (A3) holds, $K_0 > 0$, $\epsilon \in (0, 1)$, and
$$\bar x \in C_{\min}. \tag{12.72}$$
Then there exist a natural number $m_0$ and $\delta_0 \in (0, 1)$ such that for each integer $n \ge m_0$ and each finite sequence $\{x_i\}_{i=0}^n \subset X$ satisfying
$$\|x_i\| \le K_0, \quad i = 0, \dots, n \tag{12.73}$$
and
$$B_X(x_{i+1}, \delta_0) \cap P_C(B_X(x_i, \delta_0)) \ne \emptyset, \quad i = 0, \dots, n-1 \tag{12.74}$$
the inequality $d(x_i, C) \le \epsilon$ holds for all integers $i \in [m_0, n]$.

Proof Assumption (A3) implies that for every $x \in X$ there exists
$$\lim_{i \to \infty} P_C^i(x) \in C. \tag{12.75}$$
Set
$$Q(x) = \lim_{i \to \infty} P_C^i(x), \quad x \in X. \tag{12.76}$$
(A3) implies that there exists a natural number $m_0$ such that
$$\|P_C^{m_0}(x) - Q(x)\| \le \epsilon/4 \text{ for all } x \in B_X(0, K_0 + \bar K + 4). \tag{12.77}$$
Set
$$\delta_{m_0} = \epsilon/4. \tag{12.78}$$
In view of (A3), there exists $\delta_{m_0 - 1} \in (0, \delta_{m_0}/4)$ such that
$$\|P_C(z_1) - P_C(z_2)\| \le \delta_{m_0}/4$$
for all $z_1, z_2 \in B_X(0, K_0 + 1)$ satisfying $\|z_1 - z_2\| \le 4\delta_{m_0 - 1}$. By induction we construct a finite sequence $\{\delta_i\}_{i=0}^{m_0} \subset (0, \infty)$ such that for each integer $i \in [0, m_0 - 1]$ we have
$$\delta_i < \delta_{i+1}/4 \tag{12.79}$$
and for each $z_1, z_2 \in B_X(0, K_0 + 1)$ satisfying
$$\|z_1 - z_2\| \le 4\delta_i \tag{12.80}$$
we have
$$\|P_C(z_1) - P_C(z_2)\| \le \delta_{i+1}/4. \tag{12.81}$$
Assume that $n \ge m_0$ is an integer,
$$\{x_i\}_{i=0}^n \subset B_X(0, K_0) \tag{12.82}$$
and that (12.74) holds for each integer $i \in \{0, \dots, n-1\}$. Let $k \in [m_0, n]$ be an integer. In order to complete the proof it is sufficient to show that $d(x_k, C) \le \epsilon$. By (12.75)–(12.77) and (12.82),
$$\|P_C^{m_0}(x_{k-m_0}) - Q(x_{k-m_0})\| \le \epsilon/4$$
and
$$d(P_C^{m_0}(x_{k-m_0}), C) \le \epsilon/4. \tag{12.83}$$
In view of (12.74), there exists
$$y_{k-m_0} \in B_X(x_{k-m_0}, \delta_0) \tag{12.84}$$
such that
$$\|P_C(y_{k-m_0}) - x_{k-m_0+1}\| \le \delta_0. \tag{12.85}$$
By (12.80)–(12.82) and (12.84),
$$\|P_C(y_{k-m_0}) - P_C(x_{k-m_0})\| \le \delta_1/4. \tag{12.86}$$
It follows from (12.79), (12.85), and (12.86) that
$$\|x_{k-m_0+1} - P_C(x_{k-m_0})\| \le \delta_1/2. \tag{12.87}$$
We show that for all $i = 1, \dots, m_0$,
$$\|x_{i+k-m_0} - P_C^i(x_{k-m_0})\| \le \delta_i/2. \tag{12.88}$$
(Note that in view of (12.87), inequality (12.88) holds for $i = 1$.) Assume that $i \in \{1, \dots, m_0\} \setminus \{m_0\}$ and that (12.88) holds. By (12.74), there exists
$$y_{i+k-m_0} \in B_X(x_{i+k-m_0}, \delta_0) \tag{12.89}$$
such that
$$\|P_C(y_{i+k-m_0}) - x_{i+k-m_0+1}\| \le \delta_0. \tag{12.90}$$
In view of (12.88) and (12.89),
$$\|y_{i+k-m_0} - P_C^i(x_{k-m_0})\| \le \delta_i. \tag{12.91}$$
It follows from (12.80)–(12.82), (12.88), (12.89), and (12.91) that
$$\|P_C(y_{i+k-m_0}) - P_C^{i+1}(x_{k-m_0})\| \le \delta_{i+1}/4. \tag{12.92}$$
It follows from (12.79), (12.90), and (12.92) that
$$\|x_{i+k-m_0+1} - P_C^{i+1}(x_{k-m_0})\| \le \delta_{i+1}/2. \tag{12.93}$$
Thus our assumption holds for $i + 1$ too. Therefore we showed by induction that
$$\|x_{i+k-m_0} - P_C^i(x_{k-m_0})\| \le \delta_i/2$$
for all $i = 1, \dots, m_0$ and in particular (see (12.78))
$$\|x_k - P_C^{m_0}(x_{k-m_0})\| \le \delta_{m_0} = \epsilon/4.$$
Together with (12.83) this implies that $d(x_k, C) \le \epsilon/2$. Lemma 12.9 is proved.
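The choice of $m_0$ in (12.77) can be made concrete on a toy example. The sketch below rests on assumptions that are not in the text: $C = [-1, 1] \subset \mathbb{R}$, and `relaxed_projection` is a hypothetical map standing in for $P_C$, for which the iterates $P_C^i(x)$ converge to the nearest point $Q(x)$ of $C$. For this particular map the worst case over the ball $[-R, R]$ is attained at the endpoints, so it is enough to iterate from $x = R$.

```python
def clamp(t, lo=-1.0, hi=1.0):
    """Nearest point of C = [lo, hi] to t; this is the limit Q(t) for the map below."""
    return min(max(t, lo), hi)

def relaxed_projection(t):
    """Hypothetical P_C: move halfway from t to its nearest point in C."""
    return t + 0.5 * (clamp(t) - t)

def iterations_needed(radius, eps):
    """Smallest m with |P^m(x) - Q(x)| <= eps/4 at the endpoint x = radius."""
    x, m = radius, 0
    limit = clamp(radius)
    while abs(x - limit) > eps / 4:
        x = relaxed_projection(x)
        m += 1
    return m

m0 = iterations_needed(5.0, 0.1)
```

Here each application halves the distance to $C$, so starting from distance 4 the threshold $\epsilon/4 = 0.025$ is first met after eight applications.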
12.5 Proof of Theorem 12.1

We may assume without loss of generality that $\epsilon < 1$. In view of Proposition 12.3, there exists a number
$$\bar\epsilon \in (0, \min\{\epsilon/8,\ 1/8\}) \tag{12.94}$$
such that the following property holds:
(iii) if $x \in X$, $d(x, C) \le 2\bar\epsilon$ and $f(x) \le \inf(f, C) + 2\bar\epsilon$, then $d(x, C_{\min}) \le \epsilon$.
In view of (12.1), we may assume without loss of generality that
$$M > 4\bar K + 8, \tag{12.95}$$
$$\{x \in X : f(x) \le \inf(f, C) + 16\bar L\} \subset B_X(0, 2^{-1} M - 1). \tag{12.96}$$
Fix
$$\bar\epsilon_1 \in (0, \bar\epsilon (64\bar L)^{-1}). \tag{12.97}$$
Lemmas 12.8 and 12.9 imply that there exist $\delta_1 \in (0, 1)$ and a natural number $m_0$ such that the following property holds:
(iv) for each integer $n \ge m_0$ and each finite sequence $\{y_i\}_{i=0}^n \subset B_X(0, 3M)$ satisfying
$$B_X(y_{i+1}, \delta_1) \cap P_C(B_X(y_i, \delta_1)) \ne \emptyset, \quad i = 0, \dots, n-1$$
the inequality $d(y_i, C) \le \bar\epsilon_1$ holds for all integers $i \in [m_0, n]$.
Since $\lim_{i \to \infty} \alpha_i = 0$ (see (12.6)) there is an integer $p_0 > 0$ such that for all integers $i \ge p_0$, we have
$$\alpha_i \le \delta_1/2. \tag{12.98}$$
Fix
$$\bar x \in C_{\min}. \tag{12.99}$$
Since $\lim_{i \to \infty} \alpha_i = 0$ (see (12.6)) there is an integer
$$p_1 > p_0 \tag{12.100}$$
such that for all integers $i \ge p_1$, we have
$$\alpha_i < (32\bar L)^{-1} 16^{-1} \bar\epsilon_1. \tag{12.101}$$
Since $\sum_{i=0}^{\infty} \alpha_i = \infty$ (see (12.6)) there exists a natural number
$$n_0 > p_0 + p_1 + 4 + m_0 \tag{12.102}$$
such that
$$\sum_{i=p_0+p_1+m_0}^{n_0-1} \alpha_i > (3M + \bar K)^2\, 128\, \bar\epsilon^{-1} \bar L + 1. \tag{12.103}$$
Fix a positive number $\delta$ such that
$$\delta(3M + \bar K + 3) < 8^{-1} (64\bar L)^{-1} \bar\epsilon_1. \tag{12.104}$$
Assume that an integer $n \ge n_0$ and that
$$\{x_k\}_{k=0}^n \subset X, \quad \|x_0\| \le M, \tag{12.105}$$
$$\{\eta_k\}_{k=0}^{n-1}, \{\xi_k\}_{k=0}^{n-1} \subset B_X(0, \delta), \tag{12.106}$$
$$v_k \in \partial_\delta f(x_k) \setminus \{0\}, \quad k = 0, 1, \dots, n-1 \tag{12.107}$$
and that for all integers $k = 0, \dots, n-1$, we have
$$x_{k+1} = P_C(x_k - \alpha_k \|v_k\|^{-1} v_k - \alpha_k \xi_k) - \alpha_k \eta_k. \tag{12.108}$$
In order to prove the theorem it is sufficient to show that $d(x_k, C_{\min}) \le \epsilon$ for all integers $k$ satisfying $n_0 \le k \le n$. First we show that for all integers $i = 0, \dots, n$,
$$d(x_i, C_{\min}) \le 2M. \tag{12.109}$$
In view of (12.105), inequality (12.109) holds for $i = 0$. Assume that $i \in \{0, \dots, n\} \setminus \{n\}$ and that (12.109) is true. There are two cases:
$$f(x_i) \le \inf(f, C) + 16\bar L; \tag{12.110}$$
$$f(x_i) > \inf(f, C) + 16\bar L. \tag{12.111}$$
Assume that (12.110) holds. In view of (12.96) and (12.110),
$$\|x_i\| \le M/2 - 1. \tag{12.112}$$
By (12.7), (12.8), and (12.99),
$$\|\bar x\| \le \bar K. \tag{12.113}$$
It follows from (12.112) and (12.113) that
$$\|x_i - \bar x\| \le \bar K + M/2. \tag{12.114}$$
By (12.3), (12.95), (12.99), (12.104), (12.106), (12.108), and (12.114),
$$\begin{aligned}
\|x_{i+1} - \bar x\| &\le \alpha_i \|\eta_i\| + \|\bar x - P_C(x_i - \alpha_i \|v_i\|^{-1} v_i - \alpha_i \xi_i)\| \\
&\le \alpha_i \delta + \|\bar x - (x_i - \alpha_i \|v_i\|^{-1} v_i - \alpha_i \xi_i)\| \\
&\le \alpha_i \delta + \|\bar x - x_i\| + \alpha_i + \alpha_i \delta \\
&\le \|\bar x - x_i\| + 3 \le \bar K + M/2 + 3 < 2M
\end{aligned}$$
and
$$d(x_{i+1}, C_{\min}) \le 2M. \tag{12.115}$$
Assume that (12.111) holds. It follows from (12.104), (12.106), (12.107), (12.113), Lemma 12.7 applied with
$$K_0 = 2M,\ \alpha = \alpha_i,\ x = x_i,\ \xi = \xi_i,\ v = v_i,\ y = x_{i+1},\ \eta = \alpha_i \eta_i,$$
(12.95), and (12.109) that
$$d(x_{i+1}, C_{\min})^2 \le d(x_i, C_{\min})^2 - 2\alpha_i + \alpha_i \delta (4M) \le d(x_i, C_{\min})^2 \le 4M^2.$$
Thus in both cases $d(x_{i+1}, C_{\min}) \le 2M$. Thus we have shown by induction that for all integers $i = 0, \dots, n$, $d(x_i, C_{\min}) \le 2M$. Together with (12.7), (12.8), and (12.95) this implies that
$$\|x_i\| \le 3M, \quad i = 0, \dots, n. \tag{12.116}$$
Set
$$y_i = x_{i+p_0}, \quad i = 0, \dots, n - p_0. \tag{12.117}$$
By (12.116) and (12.117),
$$\{y_i\}_{i=0}^{n-p_0} \subset B_X(0, 3M). \tag{12.118}$$
In view of (12.102),
$$n - p_0 > m_0. \tag{12.119}$$
It follows from (12.98), (12.104), (12.106), (12.108), and (12.117) that for all $i = 0, \dots, n - p_0 - 1$,
$$\begin{aligned}
&\|y_i - (y_i - \alpha_{i+p_0} \|v_{i+p_0}\|^{-1} v_{i+p_0} - \alpha_{i+p_0} \xi_{i+p_0})\| \\
&\quad = \|x_{i+p_0} - (x_{i+p_0} - \alpha_{i+p_0} \|v_{i+p_0}\|^{-1} v_{i+p_0} - \alpha_{i+p_0} \xi_{i+p_0})\| \le 2\alpha_{i+p_0} \le \delta_1,
\end{aligned}$$
$$\|y_{i+1} - P_C(y_i - \alpha_{i+p_0} \|v_{i+p_0}\|^{-1} v_{i+p_0} - \alpha_{i+p_0} \xi_{i+p_0})\| \le \alpha_{i+p_0} \le \delta_1$$
and
$$B_X(y_{i+1}, \delta_1) \cap P_C(B_X(y_i, \delta_1)) \ne \emptyset. \tag{12.120}$$
It follows from property (iv) and (12.118)–(12.120) that
$$d(y_i, C) \le \bar\epsilon_1, \quad i = m_0, \dots, n - p_0.$$
Together with (12.96) and (12.117) this implies that
$$d(x_i, C) \le \bar\epsilon_1 < \bar\epsilon, \quad i = m_0 + p_0, \dots, n. \tag{12.121}$$
Assume that an integer
$$k \in [p_0 + p_1 + m_0,\ n - 1], \tag{12.122}$$
$$f(x_k) > \inf(f, C) + \bar\epsilon/8. \tag{12.123}$$
It follows from (12.94), (12.97), (12.99), (12.106), (12.114), (12.116), Lemma 12.5 applied with
$$K_0 = 3M,\ \epsilon = \bar\epsilon/8,\ \alpha = \alpha_k,\ x = x_k,\ \xi = \xi_k,\ v = v_k,\ y = x_{k+1},\ \eta = \alpha_k \eta_k,$$
(12.96), (12.101), (12.104), and (12.122) that
$$\begin{aligned}
\|x_{k+1} - \bar x\|^2 &\le \|x_k - \bar x\|^2 - \alpha_k (4\bar L)^{-1} \bar\epsilon/8 + 2\alpha_k^2 + \alpha_k^2 \|\eta_k\|^2 + 2\|\eta_k\| \alpha_k (3M + \bar K + 2) \\
&\le \|x_k - \bar x\|^2 - \alpha_k (32\bar L)^{-1} \bar\epsilon + 2\alpha_k^2 + \alpha_k^2 \delta^2 + 2\delta \alpha_k (3M + \bar K + 2) \\
&\le \|x_k - \bar x\|^2 - \alpha_k (64\bar L)^{-1} \bar\epsilon + 2\delta \alpha_k (3M + \bar K + 3) \\
&\le \|x_k - \bar x\|^2 - \alpha_k (128\bar L)^{-1} \bar\epsilon.
\end{aligned}$$
Thus we have shown that the following property holds:
(v) if an integer $k$ satisfies (12.122) and (12.123), then we have
$$\|x_{k+1} - \bar x\|^2 \le \|x_k - \bar x\|^2 - (128\bar L)^{-1} \alpha_k \bar\epsilon.$$
We claim that there exists an integer $j \in \{p_0 + p_1 + m_0, \dots, n_0\}$ such that $f(x_j) \le \inf(f, C) + \bar\epsilon/8$. Assume the contrary. Then
$$f(x_i) > \inf(f, C) + \bar\epsilon/8, \quad i = p_0 + p_1 + m_0, \dots, n_0. \tag{12.124}$$
Property (v) and (12.124) imply that
$$\|x_{i+1} - \bar x\|^2 \le \|x_i - \bar x\|^2 - (128\bar L)^{-1} \alpha_i \bar\epsilon, \quad i = p_0 + p_1 + m_0, \dots, n_0 - 1. \tag{12.125}$$
Relations (12.7), (12.8), (12.99), (12.116), and (12.125) imply that
$$\begin{aligned}
(3M + \bar K)^2 &\ge \|x_{p_0+p_1+m_0} - \bar x\|^2 \ge \|x_{p_0+p_1+m_0} - \bar x\|^2 - \|x_{n_0} - \bar x\|^2 \\
&= \sum_{i=p_0+p_1+m_0}^{n_0-1} [\|x_i - \bar x\|^2 - \|x_{i+1} - \bar x\|^2] \ge (128\bar L)^{-1} \bar\epsilon \sum_{i=p_0+p_1+m_0}^{n_0-1} \alpha_i.
\end{aligned} \tag{12.126}$$
In view of (12.126),
$$\sum_{i=p_0+p_1+m_0}^{n_0-1} \alpha_i \le 128 (3M + \bar K)^2 \bar L \bar\epsilon^{-1}.$$
This contradicts (12.103). The contradiction we have reached proves that there exists an integer
$$j \in \{p_0 + p_1 + m_0, \dots, n_0\} \tag{12.127}$$
such that
$$f(x_j) \le \inf(f, C) + \bar\epsilon/8. \tag{12.128}$$
By property (iii), (12.121), (12.127), and (12.128), we have
$$d(x_j, C_{\min}) \le \epsilon. \tag{12.129}$$
We claim that for all integers $i$ satisfying $j \le i \le n$, $d(x_i, C_{\min}) \le \epsilon$. Assume the contrary. Then there exists an integer $k \in [j, n]$ for which
$$d(x_k, C_{\min}) > \epsilon. \tag{12.130}$$
By (12.127), (12.129), and (12.130), we have
$$k > j \ge p_0 + p_1 + m_0. \tag{12.131}$$
We may assume without loss of generality that $d(x_i, C_{\min}) \le \epsilon$ for all integers $i$ satisfying $j \le i < k$. Thus
$$d(x_{k-1}, C_{\min}) \le \epsilon. \tag{12.132}$$
There are two cases:
$$f(x_{k-1}) \le \inf(f, C) + \bar\epsilon/8; \tag{12.133}$$
$$f(x_{k-1}) > \inf(f, C) + \bar\epsilon/8. \tag{12.134}$$
Assume that (12.133) is valid. It follows from (12.121) and (12.131) that
$$d(x_{k-1}, C) \le \bar\epsilon_1. \tag{12.135}$$
By (12.135), there exists a point
$$z \in C \tag{12.136}$$
such that
$$\|x_{k-1} - z\| < 2\bar\epsilon_1. \tag{12.137}$$
By (12.3), (12.101), (12.104), (12.106), (12.108), (12.131), (12.136), and (12.137),
$$\begin{aligned}
\|x_k - z\| &\le \alpha_{k-1} \delta + \|z - P_C(x_{k-1} - \alpha_{k-1} \|v_{k-1}\|^{-1} v_{k-1} - \alpha_{k-1} \xi_{k-1})\| \\
&\le \delta + \|z - x_{k-1}\| + \alpha_{k-1} + \delta \le 2\bar\epsilon_1 + 2\delta + \alpha_{k-1} < 3\bar\epsilon_1.
\end{aligned} \tag{12.138}$$
In view of (12.137) and (12.138),
$$\|x_k - x_{k-1}\| \le \|x_k - z\| + \|z - x_{k-1}\| < 5\bar\epsilon_1. \tag{12.139}$$
It follows from (12.7), (12.8), (12.94), (12.132), and (12.139) that
$$\|x_{k-1}\| \le \bar K + 2, \quad \|x_k\| \le \|x_{k-1}\| + 5\bar\epsilon_1 \le \bar K + 3$$
and
$$\|x_{k-1}\|, \|x_k\| \le \bar K + 4.$$
Combined with (12.9) and (12.139) the relation above implies that
$$|f(x_{k-1}) - f(x_k)| \le \bar L \|x_{k-1} - x_k\| \le 5\bar L \bar\epsilon_1.$$
Together with (12.97) and (12.133) this implies that
$$f(x_k) \le f(x_{k-1}) + 5\bar L \bar\epsilon_1 \le \inf(f, C) + \bar\epsilon/8 + 5\bar L \bar\epsilon_1 \le \inf(f, C) + \bar\epsilon/4. \tag{12.140}$$
Property (iii), (12.96), (12.136), (12.138), and (12.140) imply that $d(x_k, C_{\min}) \le \epsilon$. This inequality contradicts (12.130). The contradiction we have reached proves (12.134). It follows from (12.96), (12.104), (12.106), (12.116), and (12.134) that Lemma 12.6 holds with
$$K_0 = 3M,\ \epsilon = \bar\epsilon/8,\ x = x_{k-1},\ y = x_k,\ \xi = \xi_{k-1},\ v = v_{k-1},\ \alpha = \alpha_{k-1},\ \eta = \alpha_{k-1} \eta_{k-1}.$$
Combined with (12.101), (12.104), (12.106), (12.131), and (12.132) this implies that
$$\begin{aligned}
d(x_k, C_{\min})^2 &\le d(x_{k-1}, C_{\min})^2 - \alpha_{k-1} (4\bar L)^{-1} \bar\epsilon/8 + 2\alpha_{k-1}^2 + \delta^2 \alpha_{k-1}^2 + 2\alpha_{k-1} \delta (3M + \bar K + 2) \\
&\le d(x_{k-1}, C_{\min})^2 - (32\bar L)^{-1} \alpha_{k-1} \bar\epsilon + 4\alpha_{k-1}^2 + 2\alpha_{k-1} \delta (3M + \bar K + 2) \\
&\le d(x_{k-1}, C_{\min})^2 - (64\bar L)^{-1} \alpha_{k-1} \bar\epsilon + 2\delta \alpha_{k-1} (3M + \bar K + 2) \\
&\le d(x_{k-1}, C_{\min})^2 - (128\bar L)^{-1} \alpha_{k-1} \bar\epsilon
\end{aligned}$$
and
$$d(x_k, C_{\min}) \le d(x_{k-1}, C_{\min}) \le \epsilon.$$
This contradicts (12.130). The contradiction we have reached proves that $d(x_i, C_{\min}) \le \epsilon$ for all integers $i$ satisfying $j \le i \le n$. This completes the proof of Theorem 12.1.
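The iteration (12.108) with step sizes $\alpha_k \to 0$, $\sum \alpha_k = \infty$ can be tried numerically. The following is a minimal sketch under assumptions that do not come from the text: the space is $\mathbb{R}^2$, $f(x) = |x_1| + |x_2|$, $C = [1,2] \times [-1,1]$ with metric projection as $P_C$, the step sizes are $\alpha_k = 1/(k+1)$, and the errors $\xi_k$, $\eta_k$ are fixed vectors of norm $\delta$. The function name `run_diminishing` is hypothetical.

```python
import math

LO, HI = (1.0, -1.0), (2.0, 1.0)   # C = [1, 2] x [-1, 1] (illustrative choice)
X_STAR = (1.0, 0.0)                # unique minimizer of f(x) = |x_1| + |x_2| on C

def sign(t):
    return (t > 0) - (t < 0)

def project(p):
    """Metric projection onto the box C."""
    return [min(max(c, lo), hi) for c, lo, hi in zip(p, LO, HI)]

def run_diminishing(n, delta=1e-3):
    """Iterate (12.108) with alpha_k = 1/(k+1) and errors of norm delta."""
    x = [2.0, 1.0]
    for k in range(n):
        alpha = 1.0 / (k + 1)                 # alpha_k -> 0, sum of alpha_k diverges
        v = [sign(x[0]), sign(x[1])]          # a subgradient of f; v[0] = 1 along this run
        nv = math.hypot(v[0], v[1])
        xi, eta = (delta, 0.0), (0.0, delta)  # fixed computational errors
        moved = [x[i] - alpha * v[i] / nv - alpha * xi[i] for i in range(2)]
        p = project(moved)
        x = [p[i] - alpha * eta[i] for i in range(2)]
    return x

x_final = run_diminishing(2000)
dist_final = math.hypot(x_final[0] - X_STAR[0], x_final[1] - X_STAR[1])
```

Because the error terms in (12.108) are scaled by $\alpha_k$, they shrink together with the step size, and in this run the iterates settle within a small distance of the minimizer.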
12.6 Proof of Theorem 12.2

In view of (12.1), we may assume without loss of generality that
$$\epsilon < 1, \quad M > 8\bar K + 8 \tag{12.141}$$
and that
$$\{x \in X : f(x) \le \inf(f, C) + 16\bar L\} \subset B_X(0, 2^{-1} M - 1). \tag{12.142}$$
Proposition 12.3 implies that there exists a number
$$\bar\epsilon \in (0, \epsilon/8) \tag{12.143}$$
such that the following property holds:
(vi) if $x \in X$, $d(x, C) \le 2\bar\epsilon$ and $f(x) \le \inf(f, C) + 2\bar\epsilon$, then $d(x, C_{\min}) \le \epsilon/4$.
Fix
$$\bar x \in C_{\min} \tag{12.144}$$
and
$$\bar\epsilon_1 \in (0, \bar\epsilon (64\bar L)^{-1}). \tag{12.145}$$
Lemmas 12.8 and 12.9 imply that there exist $\delta_1 \in (0, 1)$ and a natural number $m_0$ such that the following property holds:
(vii) for each integer $n \ge m_0$ and each finite sequence $\{y_i\}_{i=0}^n \subset B_X(0, 3M)$ satisfying
$$B_X(y_{i+1}, \delta_1) \cap P_C(B_X(y_i, \delta_1)) \ne \emptyset, \quad i = 0, \dots, n-1$$
the inequality $d(y_i, C) \le \bar\epsilon_1$ holds for all integers $i \in [m_0, n]$.
Choose a positive number $\beta_0$ such that
$$\beta_0 \le \delta_1/2, \quad 2\beta_0 < (34\bar L)^{-1} \bar\epsilon_1. \tag{12.146}$$
Let
$$\beta_1 \in (0, \beta_0). \tag{12.147}$$
Fix a natural number $n_0$ such that
$$n_0 > m_0 + 32^2 M^2 \bar L \bar\epsilon^{-1} \beta_1^{-1}. \tag{12.148}$$
Fix a positive number $\delta$ such that
$$\delta(3M + \bar K + 3) \le (128)^{-1} \bar\epsilon_1 \beta_1. \tag{12.149}$$
Assume that an integer $n \ge n_0$,
$$\{x_k\}_{k=0}^n \subset X, \quad \|x_0\| \le M, \tag{12.150}$$
$$v_k \in \partial_\delta f(x_k) \setminus \{0\}, \quad k = 0, 1, \dots, n-1, \tag{12.151}$$
$$\{\alpha_k\}_{k=0}^{n-1} \subset [\beta_1, \beta_0], \tag{12.152}$$
$$\{\eta_k\}_{k=0}^{n-1}, \{\xi_k\}_{k=0}^{n-1} \subset B_X(0, \delta) \tag{12.153}$$
and that for all integers $k = 0, \dots, n-1$,
$$x_{k+1} = P_C(x_k - \alpha_k \|v_k\|^{-1} v_k - \alpha_k \xi_k) - \eta_k. \tag{12.154}$$
In order to complete the proof it is sufficient to show that $d(x_k, C_{\min}) \le \epsilon$ for all integers $k$ satisfying $n_0 \le k \le n$. First we show that for all integers $i = 0, \dots, n$,
$$d(x_i, C_{\min}) \le 2M. \tag{12.155}$$
In view of (12.7), (12.8), (12.141), and (12.150), inequality (12.155) holds for $i = 0$. Assume that $i \in \{0, \dots, n\} \setminus \{n\}$ and that (12.155) is true. There are two cases:
$$f(x_i) \le \inf(f, C) + 16\bar L; \tag{12.156}$$
$$f(x_i) > \inf(f, C) + 16\bar L. \tag{12.157}$$
Assume that (12.156) holds. In view of (12.142) and (12.156),
$$\|x_i\| \le M/2 - 1. \tag{12.158}$$
By (12.7), (12.8), and (12.144),
$$\|\bar x\| \le \bar K. \tag{12.159}$$
It follows from (12.158) and (12.159) that
$$\|x_i - \bar x\| \le \bar K + M/2. \tag{12.160}$$
By (12.3), (12.141), (12.144), (12.146), (12.149), (12.152)–(12.154), and (12.160),
$$\begin{aligned}
d(x_{i+1}, C_{\min}) &\le \|x_{i+1} - \bar x\| \le \|\eta_i\| + \|\bar x - P_C(x_i - \alpha_i \|v_i\|^{-1} v_i - \alpha_i \xi_i)\| \\
&\le \delta + \|\bar x - x_i\| + \alpha_i + \alpha_i \delta \le 2\beta_0 + \delta + \bar K + M/2 \le \bar K + M/2 + 3 \le M.
\end{aligned} \tag{12.161}$$
Assume that (12.157) holds. In view of (12.141), (12.144), and (12.155), $\|x_i\| \le 3M$. It follows from (12.141), (12.146), (12.147), (12.149), (12.151), (12.152), (12.157), and Lemma 12.7 applied with
$$K_0 = 3M,\ \alpha = \alpha_i,\ x = x_i,\ \xi = \xi_i,\ v = v_i,\ y = x_{i+1},\ \eta = \eta_i,$$
that
$$d(x_{i+1}, C_{\min})^2 \le d(x_i, C_{\min})^2 - 2\alpha_i + 2\|\eta_i\|(3M + \bar K + 3) \le d(x_i, C_{\min})^2 - 2\beta_1 + 8\delta M \le d(x_i, C_{\min})^2 - \beta_1$$
and in view of (12.155),
$$d(x_{i+1}, C_{\min}) \le d(x_i, C_{\min}) \le 2M.$$
Thus in both cases $d(x_{i+1}, C_{\min}) \le 2M$. Thus we have shown by induction that for all integers $i = 0, \dots, n$,
$$d(x_i, C_{\min}) \le 2M. \tag{12.162}$$
By (12.7), (12.8), (12.141), and (12.162),
$$\|x_i\| \le 3M, \quad i = 0, \dots, n. \tag{12.163}$$
It follows from (12.149), (12.152), and (12.153) that for all $i = 0, \dots, n-1$,
$$\|x_i - (x_i - \alpha_i \|v_i\|^{-1} v_i - \alpha_i \xi_i)\| \le \alpha_i + \alpha_i \delta \le 2\alpha_i \le 2\beta_0 \le \delta_1,$$
$$\|x_{i+1} - P_C(x_i - \alpha_i \|v_i\|^{-1} v_i - \alpha_i \xi_i)\| \le \|\eta_i\| \le \delta \le \beta_0 < \delta_1$$
and
$$B_X(x_{i+1}, \delta_1) \cap P_C(B_X(x_i, \delta_1)) \ne \emptyset. \tag{12.164}$$
Property (vii), (12.148), (12.155), (12.163), and (12.164) imply that
$$d(x_i, C) \le \bar\epsilon_1 < \bar\epsilon, \quad i = m_0, \dots, n. \tag{12.165}$$
Assume that an integer $k \in [m_0, n-1]$,
$$f(x_k) > \inf(f, C) + \bar\epsilon/4. \tag{12.166}$$
It follows from (12.143)–(12.146), (12.149), (12.151), (12.154), (12.163), and Lemma 12.5 applied with
$$K_0 = 3M,\ \epsilon = \bar\epsilon/4,\ \alpha = \alpha_k,\ x = x_k,\ \xi = \xi_k,\ v = v_k,\ y = x_{k+1},\ \eta = \eta_k$$
that
$$\begin{aligned}
\|x_{k+1} - \bar x\|^2 &\le \|x_k - \bar x\|^2 - \alpha_k (4\bar L)^{-1} \bar\epsilon/4 + 2\alpha_k^2 + \|\eta_k\|^2 + 2\|\eta_k\|(3M + \bar K + 2) \\
&\le \|x_k - \bar x\|^2 - \alpha_k (16\bar L)^{-1} \bar\epsilon + 2\alpha_k^2 + \delta^2 + 2\delta(3M + \bar K + 2) \\
&\le \|x_k - \bar x\|^2 - \alpha_k (32\bar L)^{-1} \bar\epsilon + 2\delta(3M + \bar K + 3) \\
&\le \|x_k - \bar x\|^2 - \beta_1 (32\bar L)^{-1} \bar\epsilon + 2\delta(3M + \bar K + 3) \\
&\le \|x_k - \bar x\|^2 - \beta_1 (64\bar L)^{-1} \bar\epsilon.
\end{aligned}$$
Thus we have shown that the following property holds:
(viii) if an integer $k \in [m_0, n-1]$ satisfies $f(x_k) > \inf(f, C) + \bar\epsilon/4$, then
$$\|x_{k+1} - \bar x\|^2 \le \|x_k - \bar x\|^2 - \beta_1 (64\bar L)^{-1} \bar\epsilon.$$
We claim that there exists an integer $j \in \{m_0, \dots, n_0\}$ such that $f(x_j) \le \inf(f, C) + \bar\epsilon/4$. Assume the contrary. Then
$$f(x_j) > \inf(f, C) + \bar\epsilon/4, \quad j = m_0, \dots, n_0. \tag{12.167}$$
Property (viii) and (12.167) imply that for all $k \in \{m_0, \dots, n_0 - 1\}$,
$$\|x_{k+1} - \bar x\|^2 \le \|x_k - \bar x\|^2 - \beta_1 (64\bar L)^{-1} \bar\epsilon. \tag{12.168}$$
Relations (12.7), (12.8), (12.141), (12.144), (12.163), and (12.168) imply that
$$\begin{aligned}
(4M)^2 &\ge \|x_{m_0} - \bar x\|^2 \ge \|x_{m_0} - \bar x\|^2 - \|x_{n_0} - \bar x\|^2 \\
&= \sum_{i=m_0}^{n_0-1} [\|x_i - \bar x\|^2 - \|x_{i+1} - \bar x\|^2] \ge (n_0 - m_0) \beta_1 (64\bar L)^{-1} \bar\epsilon
\end{aligned}$$
and
$$n_0 - m_0 \le 32^2 M^2 \bar L \bar\epsilon^{-1} \beta_1^{-1}.$$
This contradicts (12.148). The contradiction we have reached proves that there exists an integer
$$j \in \{m_0, \dots, n_0\} \tag{12.169}$$
such that
$$f(x_j) \le \inf(f, C) + \bar\epsilon/4. \tag{12.170}$$
Property (vi), (12.145), and (12.170) imply that
$$d(x_j, C_{\min}) \le \epsilon/4. \tag{12.171}$$
We claim that for all integers $i$ satisfying $j \le i \le n$, $d(x_i, C_{\min}) \le \epsilon$. Assume the contrary. Then there exists an integer $k \in [j, n]$ for which
$$d(x_k, C_{\min}) > \epsilon. \tag{12.172}$$
By (12.171) and (12.172), we have $k > j$. We may assume without loss of generality that $d(x_i, C_{\min}) \le \epsilon$ for all integers $i$ satisfying $j \le i < k$. Thus
$$d(x_{k-1}, C_{\min}) \le \epsilon. \tag{12.173}$$
There are two cases:
$$f(x_{k-1}) \le \inf(f, C) + \bar\epsilon/8; \tag{12.174}$$
$$f(x_{k-1}) > \inf(f, C) + \bar\epsilon/8. \tag{12.175}$$
Assume that (12.174) is valid. It follows from (12.165) and (12.169) that
$$d(x_{k-1}, C) \le \bar\epsilon_1. \tag{12.176}$$
By (12.176), there exists a point
$$z \in C \tag{12.177}$$
such that
$$\|x_{k-1} - z\| < 2\bar\epsilon_1. \tag{12.178}$$
By (12.3), (12.146), (12.149), and (12.152)–(12.154),
$$\begin{aligned}
\|x_k - z\| &\le \delta + \|z - P_C(x_{k-1} - \alpha_{k-1} \|v_{k-1}\|^{-1} v_{k-1} - \alpha_{k-1} \xi_{k-1})\| \\
&\le \delta + \|z - x_{k-1}\| + 2\alpha_{k-1} \le 2\bar\epsilon_1 + \delta + 2\beta_0 \le 3\bar\epsilon_1.
\end{aligned} \tag{12.179}$$
In view of (12.178) and (12.179),
$$\|x_k - x_{k-1}\| \le \|x_k - z\| + \|z - x_{k-1}\| < 5\bar\epsilon_1. \tag{12.180}$$
It follows from (12.7), (12.8), and (12.173) that
$$\|x_{k-1}\| \le \bar K + 1. \tag{12.181}$$
By (12.144), (12.180), and (12.181),
$$\|x_k\| \le \|x_{k-1}\| + 5\bar\epsilon_1 \le \bar K + 4. \tag{12.182}$$
Relations (12.9) and (12.180)–(12.182) imply that
$$|f(x_{k-1}) - f(x_k)| \le \bar L \|x_{k-1} - x_k\| \le 5\bar L \bar\epsilon_1.$$
Together with (12.145) and (12.174) this implies that
$$f(x_k) \le f(x_{k-1}) + 5\bar L \bar\epsilon_1 \le \inf(f, C) + \bar\epsilon/8 + 5\bar L \bar\epsilon_1 \le \inf(f, C) + \bar\epsilon/4. \tag{12.183}$$
By (12.145), (12.176), and (12.180),
$$d(x_k, C) \le 6\bar\epsilon_1 < \bar\epsilon. \tag{12.184}$$
Property (vi), (12.183), and (12.184) imply that $d(x_k, C_{\min}) \le \epsilon$. This inequality contradicts (12.172). The contradiction we have reached proves (12.175). It follows from (12.145), (12.149), (12.151)–(12.154), (12.163), (12.165), (12.169), and (12.175) that Lemma 12.6 holds with
$$K_0 = 3M,\ \epsilon = \bar\epsilon/8,\ x = x_{k-1},\ y = x_k,\ \xi = \xi_{k-1},\ v = v_{k-1},\ \alpha = \alpha_{k-1},\ \eta = \eta_{k-1}.$$
Combined with (12.145), (12.146), and (12.152) this implies that
$$\begin{aligned}
d(x_k, C_{\min})^2 &\le d(x_{k-1}, C_{\min})^2 - \alpha_{k-1} (4\bar L)^{-1} \bar\epsilon/8 + 2\alpha_{k-1}^2 + \|\eta_{k-1}\|^2 + 2\|\eta_{k-1}\|(3M + \bar K + 2) \\
&\le d(x_{k-1}, C_{\min})^2 - \alpha_{k-1} ((32\bar L)^{-1} \bar\epsilon - 2\beta_0) + \delta^2 + 2\delta(3M + \bar K + 2) \\
&\le d(x_{k-1}, C_{\min})^2 - \alpha_{k-1} (64\bar L)^{-1} \bar\epsilon + 2\delta(3M + \bar K + 3) \\
&\le d(x_{k-1}, C_{\min})^2 - \beta_1 (64\bar L)^{-1} \bar\epsilon + 2\delta(3M + \bar K + 3) \\
&\le d(x_{k-1}, C_{\min})^2 - \beta_1 (128\bar L)^{-1} \bar\epsilon.
\end{aligned}$$
In view of (12.173),
$$d(x_k, C_{\min}) \le d(x_{k-1}, C_{\min}) \le \epsilon.$$
This contradicts (12.172). The contradiction we have reached proves that $d(x_i, C_{\min}) \le \epsilon$ for all integers $i$ satisfying $j \le i \le n$. This completes the proof of Theorem 12.2.
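Theorem 12.2 differs from Theorem 12.1 in that the step sizes stay in a fixed interval $[\beta_1, \beta_0]$ and the error $\eta_k$ in (12.154) is not scaled by the step size. A minimal numerical sketch of this regime follows; the concrete setting is assumed for illustration only and is not part of the text: $X = \mathbb{R}^2$, $f(x) = |x_1| + |x_2|$, $C = [1,2] \times [-1,1]$ with metric projection, a constant step $\alpha_k = \alpha$, and fixed error vectors of norm $\delta$. The name `run_constant_step` is hypothetical.

```python
import math

LO, HI = (1.0, -1.0), (2.0, 1.0)   # C = [1, 2] x [-1, 1] (illustrative choice)
X_STAR = (1.0, 0.0)                # unique minimizer of f(x) = |x_1| + |x_2| on C

def sign(t):
    return (t > 0) - (t < 0)

def project(p):
    """Metric projection onto the box C."""
    return [min(max(c, lo), hi) for c, lo, hi in zip(p, LO, HI)]

def run_constant_step(n, alpha, delta):
    """Iterate (12.154) with a constant step alpha and unscaled error eta of norm delta."""
    x = [2.0, 1.0]
    for _ in range(n):
        v = [sign(x[0]), sign(x[1])]          # a subgradient of f; v[0] = 1 along this run
        nv = math.hypot(v[0], v[1])
        xi, eta = (delta, 0.0), (0.0, delta)  # fixed computational errors
        moved = [x[i] - alpha * v[i] / nv - alpha * xi[i] for i in range(2)]
        p = project(moved)
        x = [p[i] - eta[i] for i in range(2)]
    return x

alpha, delta = 0.01, 1e-4
x_end = run_constant_step(1000, alpha, delta)
dist_end = math.hypot(x_end[0] - X_STAR[0], x_end[1] - X_STAR[1])
```

With a constant step the iterates do not converge; in this run they keep oscillating around the minimizer at a distance of the order of the step size plus the error level, which is the kind of neighbourhood guarantee the theorem provides once $\beta_0$ and $\delta$ are small enough.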
References
1. Alber YI (1971) On minimization of smooth functional by gradient methods. USSR Comp Math Math Phys 11:752–758 2. Alber YI, Iusem AN, Solodov MV (1997) Minimization of nonsmooth convex functionals in Banach spaces. J Convex Anal 4:235–255 3. Alber YI, Iusem AN, Solodov MV (1998) On the projected subgradient method for nonsmooth convex optimization in a Hilbert space. Math Program 81:23–35 4. Alber YI, Yao JC (2009) Another version of the proximal point algorithm in a Banach space. Nonlinear Anal 70:3159–3171 5. Alvarez F, Lopez J, Ramirez CH (2010) Interior proximal algorithm with variable metric for second-order cone programming: applications to structural optimization and support vector machines. Optim Methods Softw 25:859–881 6. Antipin AS (1994) Minimization of convex functions on convex sets by means of differential equations. Differ Equ 30:1365–1375 7. Aragon Artacho FJ, Geoffroy MH (2007) Uniformity and inexact version of a proximal method for metrically regular mappings. J Math Anal Appl 335:168–183 8. Attouch H, Bolte J (2009) On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math Program Ser B 116:5–16 9. Baillon JB (1978) Un Exemple Concernant le Comportement Asymptotique de la Solution du Probleme 0 ∈ du/dt + ∂φ(u). J Funct Anal 28:369–376 10. Barbu V, Precupanu T (2012) Convexity and optimization in Banach spaces. Springer Heidelberg, London, New York 11. Barty K, Roy J-S, Strugarek C (2007) Hilbert-valued perturbed subgradient algorithms. Math Oper Res 32:551–562 12. Bauschke HH, Borwein JM (1996) On projection algorithms for solving convex feasibility problems. SIAM Rev 38:367–426 13. Bauschke HH, and Combettes PL (2011) Convex analysis and monotone operator theory in Hilbert spaces. Springer, New York 14. Bauschke HH, Goebel R, Lucet Y, Wang X (2008) The proximal average: basic theory. SIAM J Optim 19:766–785 15. Bauschke H, Wang C, Wang X, Xu J (2015) On subgradient projectors. 
SIAM J Optim 25:1064–1082 16. Beck A, Pauwels E, Sabach S (2018) Primal and dual predicted decrease approximation methods. Math Program 167:37–73 17. Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper Res Lett 31:167–175
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6
18. Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2:183–202 19. Benker H, Hamel A, Tammer C (1996) A proximal point algorithm for control approximation problems, I. Theoretical background. Math Methods Oper Res 43:261–280 20. Bolte J (2003) Continuous gradient projection method in Hilbert spaces. J Optim Theory Appl 119:235–259 21. Brezis H (1973) Opérateurs maximaux monotones. North Holland, Amsterdam 22. Burachik RS, Grana Drummond LM, Iusem AN, Svaiter BF (1995) Full convergence of the steepest descent method with inexact line searches. Optimization 32:137–146 23. Burachik RS, Iusem AN (1998) A generalized proximal point algorithm for the variational inequality problem in a Hilbert space. SIAM J Optim 8:197–216 24. Burachik RS, Kaya CY, Sabach S (2012) A generalized univariate Newton method motivated by proximal regularization. J Optim Theory Appl 155:923–940 25. Burachik RS, Lopes JO, Da Silva GJP (2009) An inexact interior point proximal method for the variational inequality problem. Comput Appl Math 28:15–36 26. Butnariu D, Kassay G (2008) A proximal-projection method for finding zeros of set-valued operators. SIAM J Control Optim 47:2096–2136 27. Butnariu D, Resmerita E (2002) Averaged subgradient methods for constrained convex optimization and Nash equilibria computation. Optimization 51:863–888 28. Ceng LC, Mordukhovich BS, Yao JC (2010) Hybrid approximate proximal method with auxiliary variational inequality for vector optimization. J Optim Theory Appl 146:267–303 29. Censor Y, Gibali A, Reich S (2011) The subgradient extragradient method for solving variational inequalities in Hilbert space. J Optim Theory Appl 148:318–335 30. Censor Y, Gibali A, Reich S, Sabach S (2012) Common solutions to variational inequalities. Set-Valued Var Anal 20:229–247 31. Censor Y, Zenios SA (1992) The proximal minimization algorithm with D-functions. J Optim Theory Appl 73:451–464 32. 
Chadli O, Konnov IV, Yao JC (2004) Descent methods for equilibrium problems in a Banach space. Comput Math Appl 48:609–616 33. Chen Z, Zhao K (2009) A proximal-type method for convex vector optimization problem in Banach spaces. Numer Funct Anal Optim 30:70–81 34. Chuong TD, Mordukhovich BS, Yao JC (2011) Hybrid approximate proximal algorithms for efficient solutions in for vector optimization. J Nonlinear Convex Anal 12:861–864 35. Davis D, Drusvyatskiy D, MacPhee KJ, Paquette C (2018) Subgradient methods for sharp weakly convex functions. J Optim Theory Appl 179:962–982 36. Demyanov VF, Vasilyev LV (1985) Nondifferentiable optimization. Optimization Software, New York 37. Drori Y, Sabach S, Teboulle M (2015) A simple algorithm for a class of nonsmooth convexconcave saddle-point problems. Oper Res Lett 43:209–214 38. Facchinei F, Pang J-S (2003) Finite-dimensional variational inequalities and complementarity problems, volume I and volume II. Springer-Verlag, New York 39. Gibali A, Jadamba B, Khan AA, Raciti F, Winkler B (2016) Gradient and extragradient methods for the elasticity imaging inverse problem using an equation error formulation: a comparative numerical study. Nonlinear Anal Optim Contemp Math 659:65–89 40. Gockenbach MS, Jadamba B, Khan AA, Tammer Chr, Winkler B (2015) Proximal methods for the elastography inverse problem of tumor identification using an equation error approach. Adv Var Hemivariational Inequal 33:173–197 41. Gopfert A, Tammer Chr, Riahi, H (1999) Existence and proximal point algorithms for nonlinear monotone complementarity problems. Optimization 45:57–68 42. Grecksch W, Heyde F, Tammer Chr (2000) Proximal point algorithm for an approximated stochastic optimal control problem. Monte Carlo Methods Appl 6:175–189 43. Griva I (2018) Convergence analysis of augmented Lagrangian-fast projected gradient method for convex quadratic problems. Pure Appl Funct Anal 3:417–428
44. Griva I, Polyak R (2011) Proximal point nonlinear rescaling method for convex optimization. Numer Algebra Control Optim 1:283–299
45. Hager WW, Zhang H (2008) Self-adaptive inexact proximal point methods. Comput Optim Appl 39:161–181
46. Hiriart-Urruty J-B, Lemarechal C (1993) Convex analysis and minimization algorithms. Springer, Berlin
47. Iusem A, Nasri M (2007) Inexact proximal point methods for equilibrium problems in Banach spaces. Numer Funct Anal Optim 28:1279–1308
48. Iusem A, Resmerita E (2010) A proximal point method in nonreflexive Banach spaces. Set-Valued Var Anal 18:109–120
49. Kaplan A, Tichatschke R (1998) Proximal point methods and nonconvex optimization. J Global Optim 13:389–406
50. Kaplan A, Tichatschke R (2007) Bregman-like functions and proximal methods for variational problems with nonlinear constraints. Optimization 56:253–265
51. Kassay G (1985) The proximal points algorithm for reflexive Banach spaces. Studia Univ Babes-Bolyai Math 30:9–17
52. Kiwiel KC (1996) Restricted step and Levenberg–Marquardt techniques in proximal bundle methods for nonconvex nondifferentiable optimization. SIAM J Optim 6:227–249
53. Konnov IV (2003) On convergence properties of a subgradient method. Optim Methods Softw 18:53–62
54. Konnov IV (2009) A descent method with inexact linear search for mixed variational inequalities. Russian Math (Iz VUZ) 53:29–35
55. Konnov IV (2018) Simplified versions of the conditional gradient method. Optimization 67:2275–2290
56. Korpelevich GM (1976) The extragradient method for finding saddle points and other problems. Ekon Matem Metody 12:747–756
57. Lemaire B (1989) The proximal algorithm. In: Penot JP (ed) International series of numerical mathematics, vol 87. Birkhauser-Verlag, Basel, pp 73–87
58. Mainge P-E (2008) Strong convergence of projected subgradient methods for nonsmooth and nonstrictly convex minimization. Set-Valued Anal 16:899–912
59. Minty GJ (1962) Monotone (nonlinear) operators in Hilbert space. Duke Math J 29:341–346
60. Minty GJ (1964) On the monotonicity of the gradient of a convex function. Pacific J Math 14:243–247
61. Mordukhovich BS (2006) Variational analysis and generalized differentiation, I: Basic theory. Springer, Berlin
62. Mordukhovich BS, Nam NM (2014) An easy path to convex analysis and applications. Morgan & Claypool Publishers, San Rafael, CA
63. Moreau JJ (1965) Proximité et dualité dans un espace Hilbertien. Bull Soc Math France 93:273–299
64. Nadezhkina N, Takahashi W (2004) Modified extragradient method for solving variational inequalities in real Hilbert spaces. Nonlinear analysis and convex analysis, pp 359–366. Yokohama Publ., Yokohama
65. Nedic A, Ozdaglar A (2009) Subgradient methods for saddle-point problems. J Optim Theory Appl 142:205–228
66. Nemirovski A, Yudin D (1983) Problem complexity and method efficiency in optimization. Wiley, New York
67. Nesterov Yu (1983) A method for solving the convex programming problem with convergence rate O(1/k^2). Dokl Akad Nauk 269:543–547
68. Nesterov Yu (2004) Introductory lectures on convex optimization. Kluwer, Boston
69. Nguyen TP, Pauwels E, Richard E, Suter BW (2018) Extragradient method in optimization: convergence and complexity. J Optim Theory Appl 176:137–162
70. Pallaschke D, Recht P (1985) On the steepest-descent method for a class of quasidifferentiable optimization problems. Nondifferentiable optimization: motivations and applications (Sopron, 1984), pp 252–263. Lecture Notes in Econom. and Math. Systems, vol. 255. Springer, Berlin
71. Polyak BT (1987) Introduction to optimization. Optimization Software, New York
72. Polyak RA (2015) Projected gradient method for non-negative least squares. Contemp Math 636:167–179
73. Qin X, Cho SY, Kang SM (2011) An extragradient-type method for generalized equilibrium problems involving strictly pseudocontractive mappings. J Global Optim 49:679–693
74. Rockafellar RT (1976) Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math Oper Res 1:97–116
75. Rockafellar RT (1976) Monotone operators and the proximal point algorithm. SIAM J Control Optim 14:877–898
76. Shor NZ (1985) Minimization methods for non-differentiable functions. Springer, Berlin
77. Solodov MV, Svaiter BF (2000) Error bounds for proximal point subproblems and associated inexact proximal point algorithms. Math Program 88:371–389
78. Solodov MV, Svaiter BF (2001) A unified framework for some inexact proximal point algorithms. Numer Funct Anal Optim 22:1013–1035
79. Solodov MV, Zavriev SK (1998) Error stability properties of generalized gradient-type algorithms. J Optim Theory Appl 98:663–680
80. Su M, Xu H-K (2010) Remarks on the gradient-projection algorithm. J Nonlinear Anal Optim 1:35–43
81. Takahashi W (2009) Introduction to nonlinear and convex analysis. Yokohama Publishers, Yokohama
82. Xu H-K (2006) A regularization method for the proximal point algorithm. J Global Optim 36:115–125
83. Xu H-K (2011) Averaged mappings and the gradient-projection algorithm. J Optim Theory Appl 150:360–378
84. Yamashita N, Kanzow C, Morimoto T, Fukushima M (2001) An infeasible interior proximal method for convex programming problems with linear constraints. J Nonlinear Convex Anal 2:139–156
85. Zaslavski AJ (2010) The projected subgradient method for nonsmooth convex optimization in the presence of computational errors. Numer Funct Anal Optim 31:616–633
86. Zaslavski AJ (2010) Convergence of a proximal method in the presence of computational errors in Hilbert spaces. SIAM J Optim 20:2413–2421
87. Zaslavski AJ (2011) Inexact proximal point methods in metric spaces. Set-Valued Var Anal 19:589–608
88. Zaslavski AJ (2011) Maximal monotone operators and the proximal point algorithm in the presence of computational errors. J Optim Theory Appl 150:20–32
89. Zaslavski AJ (2012) The extragradient method for convex optimization in the presence of computational errors. Numer Funct Anal Optim 33:1399–1412
90. Zaslavski AJ (2012) The extragradient method for solving variational inequalities in the presence of computational errors. J Optim Theory Appl 153:602–618
91. Zaslavski AJ (2013) The extragradient method for finding a common solution of a finite family of variational inequalities and a finite family of fixed point problems in the presence of computational errors. J Math Anal Appl 400:651–663
92. Zaslavski AJ (2016) Numerical optimization with computational errors. Springer, Cham
93. Zaslavski AJ (2016) Approximate solutions of common fixed point problems, Springer optimization and its applications. Springer, Cham
94. Zeng LC, Yao JC (2006) Strong convergence theorem by an extragradient method for fixed point problems and variational inequality problems. Taiwanese J Math 10:1293–1303
Index
A
Absolutely continuous function, 178
Algorithm, 1
Approximate solution, 1

B
Banach space, 173
Bochner integrable function, 173–174, 195
Borelian function, 181
Boundary, 287

C
Closure, 287
Compact set, 178
Concave function, 66, 182
Continuous function, 71, 80, 315
Continuous subgradient algorithm, 173, 181–184
Continuous subgradient projection algorithm, 193–194
Convex-concave function, 1, 25, 83
Convex conjugate function, 278
Convex function, 1, 5, 297–298, 315
Convex hull, 178
Convex minimization problem, 1, 84
Convex set, 2, 177

D
Derivative, 178

F
Feasible point, 1
Fenchel inequality, 278
Firmly nonexpansive operator, 243
Fréchet derivative, 18, 127
Fréchet differentiable function, 18, 127

G
Game, 53–58
Gradient-type method, 151

H
Hilbert space, 1, 2, 66

I
Infimal convolution, 243
Inner product, 2, 66
Interior, 287
Iteration, 1

L
Lebesgue measurable function, 173, 175
Lebesgue measure, 174
Locally Lipschitzian function, 193, 201
Lower semicontinuous function, 174, 191, 245

M
Minimizer, 76, 84
Mirror descent method, 10–17, 83–125
Moreau envelope, 244

N
Nonexpansive mapping, 244
Norm, 2
© Springer Nature Switzerland AG 2020 A. J. Zaslavski, Convex Optimization with Computational Errors, Springer Optimization and Its Applications 155, https://doi.org/10.1007/978-3-030-37822-6
O
Objective function, 1

P
Predicted decrease approximation (PDA), 277, 280
Projected gradient algorithm, 1, 18, 127
Projected subgradient method, 25–81
Proximal method, 246, 260

Q
Quasiconvex function, 287–293

S
Saddle point, 1, 25, 83
Sharp weakly convex function, 295–320
Strongly measurable function, 173
Subdifferential, 2, 296
Subgradient, 1–10
Subgradient projection algorithm, 2, 67

U
Upper semicontinuous function, 182, 191

Z
Zero-sum game, 53–58, 67, 181–184