International Series in Operations Research & Management Science
Kurt Marti
Optimization Under Stochastic Uncertainty Methods, Control and Random Search Methods
International Series in Operations Research & Management Science Volume 296
Series Editor Camille C. Price Department of Computer Science, Stephen F. Austin State University, Nacogdoches, TX, USA Associate Editor Joe Zhu Foisie Business School, Worcester Polytechnic Institute, Worcester, MA, USA Founding Editor Frederick S. Hillier Stanford University, Stanford, CA, USA
More information about this series at http://www.springer.com/series/6161
Kurt Marti, Institute for Mathematics and Computer Science, University of the Bundeswehr Munich, Munich, Germany
ISSN 0884-8289; ISSN 2214-7934 (electronic)
International Series in Operations Research & Management Science
ISBN 978-3-030-55661-7; ISBN 978-3-030-55662-4 (eBook)
https://doi.org/10.1007/978-3-030-55662-4
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
Preface
Optimization problems in practice mostly depend on several model parameters, noise factors, uncontrollable parameters, etc., which are not fixed quantities at the planning stage. Typical examples from engineering and economics/operations research are: material parameters, manufacturing errors, tolerances, noise terms, demand parameters, technological coefficients in input–output functions, cost factors, etc. Due to the several types of stochastic uncertainty (physical uncertainty, economic uncertainty, statistical uncertainty, and model uncertainty), these parameters must be modeled by random variables having a certain probability distribution; in most cases, at least certain moments of this distribution are known.

A basic procedure in engineering/economic practice for coping with these uncertainties is to first replace the unknown parameters by some chosen nominal values, e.g., estimates or guesses of the parameters. The resulting, mostly increasing deviation of the performance (output, behavior) of the structure/system from the prescribed performance, i.e., the "tracking error," is then compensated by (online) input corrections. However, the online correction of a system/structure is often time-consuming and causes mostly increasing expenses (correction or recourse costs). These can be avoided to a large extent by taking into account, already at the planning stage, the consequences of the possible decisions and the known prior and sample information about the random data of the problem. Hence, instead of relying on ordinary deterministic parameter optimization methods (based on some nominal parameter values) and then applying some correction actions, stochastic optimization methods should be applied: by incorporating stochastic parameter variations into the optimization process, expensive and increasing online correction expenses can be avoided or at least reduced to a large extent.

Consequently, for the computation of optimal decisions that are insensitive to random parameter variations, hence robust optimal decisions, appropriate deterministic substitute problems must be formulated first. Based on decision-theoretical principles, these substitute problems depend on probabilities of failure/success and/or on more general expected cost/loss terms. Since probabilities and expectations are in general defined by multiple integrals, the resulting, often nonlinear and also nonconvex, deterministic substitute problems can be solved by approximate methods only. Two basic types of deterministic substitute problems occur most frequently in practice: minimization of the expected primary costs subject to expected recourse (correction) cost constraints (reliability constraints) and remaining deterministic constraints, e.g., box constraints; and minimization of the expected total costs (costs of construction, design, recourse costs, etc.) subject to the remaining deterministic constraints.

After an introduction to the theory of dynamic control systems with random parameters, the major control laws are described: open-loop control, closed-loop (feedback) control, and open-loop feedback control, the latter used for the iterative construction of feedback controls. Taylor expansion methods are considered for solving stochastic optimization problems involving expected cost or loss functions. Applications of Taylor expansion methods to control problems with linear and sublinear cost functions and the computation of stochastic open-loop controls for tracking problems complete Chap. 1. A detailed presentation of the optimization of regulators under stochastic uncertainty follows in Chap. 2. Optimal stochastic open-loop control problems with more general dynamic equations are examined in Chap. 3. The construction of optimal feedback controls for control problems with random parameters by means of homotopy methods can be found in Chap. 4. Limit state functions for use in reliability-based design problems arising in static and time-dependent stochastic optimization problems are constructed in Chap. 5.

For the numerical solution of the deterministic substitute problems arising in the treatment of the stochastic optimization and stochastic optimal control problems considered in Part I, efficient numerical optimization routines are needed. Besides mathematical optimization techniques, one of the major methods for solving such deterministic parameter optimization problems are stochastic search methods. For many years, mostly (deterministic) mathematical programming (MP) procedures were used to solve (approximately) deterministic parameter optimization problems, hence mathematical programs. However, due to the complexity of concrete optimization/control problems and their frequent lack of the mathematical regularity required by MP techniques, other optimization techniques such as random search methods (RSM) became increasingly important. Hence, in Part II of this monograph, basic results on the convergence and convergence rates of random search methods are presented. Moreover, methods based on optimal stochastic decision processes for improving the (sometimes very low) convergence rates are presented: after the description of important classes of random search methods, such as random direction methods and procedures with absolutely continuous mutation sequences, conditions are considered which guarantee the convergence of random search methods, and convergence rates are determined. In order to improve the convergence behavior of RSM, the random search procedure is embedded into a stochastic decision process for the optimal control of the probability distributions of the mutation random variables.
Basic theoretical results on the convergence and convergence rate of random search methods are taken from the doctoral thesis of G. Rappl. Several mathematical prerequisites are provided in the appendix.
Last but not least, I would like to thank Dipl.-Math. Ina Stein for her excellent support in the LaTeX typesetting as well as in the final proofreading. Moreover, I am indebted to Springer-Verlag for the inclusion of the book in the Springer International Series in Operations Research & Management Science (ISOR). I would especially like to thank the Senior Editor for Business/Economics/Operations Research of Springer-Verlag Heidelberg, Germany, Christian Rauscher, and the Editor for Operations Research/Management Science and Information Systems of Springer-Verlag New York, USA, Neil Levine, for their advice until the completion of this book.

Munich, Germany
October 2020
Kurt Marti
Contents
Part I Stochastic Optimization Methods

1 Optimal Control Under Stochastic Uncertainty
  1.1 Stochastic Control Systems
    1.1.1 Differential and Integral Equations Under Stochastic Uncertainty
    1.1.2 Objective Function
  1.2 Control Laws
  1.3 Computation of Expectations by Means of Taylor Expansions
    1.3.1 Complete Taylor Expansion
    1.3.2 Inner or Partial Taylor Expansion
  1.4 Taylor Approximation of Control Problems Under Stochastic Uncertainty: General Procedure
  1.5 Control Problems with Linear and Sublinear Cost Functions
  1.6 Stochastic Optimal Open-Loop Feedback Control of Tracking Systems
    1.6.1 Approximation of the Expected Costs: Expansions of 1st Order
    1.6.2 Approximate Computation of the Fundamental Matrix
  References

2 Stochastic Optimization of Regulators
  2.1 Introduction
  2.2 Regulator Design Under Stochastic Uncertainty
  2.3 Optimal Feedback Functions Under Stochastic Uncertainty
    2.3.1 Quadratic Cost Functions
  2.4 Calculation of the Tracking Error Rates (Sensitivities)
    2.4.1 Partial Derivative with Respect to pD
    2.4.2 Partial Derivative with Respect to q0
    2.4.3 Partial Derivative with Respect to q̇0
    2.4.4 Partial Derivative with Respect to e0
  2.5 The Approximate Regulator Optimization Problem
  2.6 Active Structural Control Under Stochastic Uncertainty
    2.6.1 Example
  References

3 Optimal Open-Loop Control of Dynamic Systems Under Stochastic Uncertainty
  3.1 Optimal Control Problems Under Stochastic Uncertainty
    3.1.1 Computation of the Expectation of the Cost Functions L, G
  3.2 Solution of the Substitute Control Problem
  3.3 More General Dynamic Control Systems
  Reference

4 Construction of Feedback Control by Means of Homotopy Methods
  References

5 Constructions of Limit State Functions
  5.1 Introduction
  5.2 Optimization-Based Construction of Limit State Functions
  5.3 The (Limit) State Function s∗
    5.3.1 Characterization of Safe States
  5.4 Computation of the State Function for Concrete Cases
    5.4.1 Mechanical Structures Under Stochastic Uncertainty
    5.4.2 Linear-Quadratic Problems with Scalar Response Function
    5.4.3 Approximation of the General Operating Condition
    5.4.4 Two-Sided Constraints for the Response Functions
  5.5 Systems/Structures with Parameter-Dependent States
    5.5.1 Dynamic Control Systems
    5.5.2 Variational Problems
    5.5.3 Example to Systems with Control and Variational Problems
    5.5.4 Discretization of Control Systems
    5.5.5 Reliability-Based Optimal Control
  References

Part II Optimization by Stochastic Methods: Foundations and Optimal Control/Acceleration of Random Search Methods (RSM)

6 Random Search Procedures for Global Optimization
  6.1 Introduction
  6.2 The Convergence of the Basic Random Search Procedure
    6.2.1 Discrete Optimization Problems
  6.3 Adaptive Random Search Methods
    6.3.1 Infinite-Stage Search Processes
  6.4 Convex Problems
  References

7 Controlled Random Search Under Uncertainty
  7.1 The Controlled (or Adaptive) Random Search Method
    7.1.1 The Convergence of the Controlled Random Search Procedure
    7.1.2 A Stopping Rule
  7.2 Computation of the Conditional Distribution of F Given the Process History: Information Processing
  References

8 Controlled Random Search Procedures for Global Optimization
  8.1 Introduction
  8.2 Convergence of the Random Search Procedure
  8.3 Controlled Random Search Methods
  8.4 Computation of Optimal Controls
  8.5 Convergence Rates of Controlled Random Search Procedures
  8.6 Numerical Realizations of Optimal Control Laws
  References

Part III Random Search Methods (RSM): Convergence and Convergence Rates

9 Mathematical Model of Random Search Methods and Elementary Properties
  References

10 Special Random Search Methods
  10.1 R-S-M with Absolutely Continuous Mutation Sequence
  10.2 Random Direction Methods
  10.3 Relationships Between Random Direction Methods and Methods with an Absolutely Continuous Mutation Sequence
  References

11 Accessibility Theorems
  References

12 Convergence Theorems
  12.1 Convergence of Random Search Methods with an Absolutely Continuous Mutation Sequence
  12.2 Convergence of Random Direction Methods
  References

13 Convergence of Stationary Random Search Methods for Positive Success Probability
  Reference

14 Random Search Methods of Convergence Order O(n−α)
  References

15 Random Search Methods with a Linear Rate of Convergence
  15.1 Methods with a Rate of Convergence that Is at Least Linear
  15.2 Methods with a Rate of Convergence that Is at Most Linear
  15.3 Linear Convergence for Positive Probability of Success
  References

16 Success/Failure-Driven Random Direction Procedures
  References

17 Hybrid Methods
  References

Part IV Optimization Under Stochastic Uncertainty by Random Search Methods (RSM)

18 Solving Optimization Problems Under Stochastic Uncertainty by Random Search Methods (RSM)
  18.1 Introduction
  18.2 Convergence of the Search Process (Xt)
  18.3 Estimation of the Minimum, Maximum Entry, Leaving Probability, Resp., αt, rt
  References

A Properties of the Uniform Distribution on the Unit Sphere
B Analytical Tools
C Probabilistic Tools

Index
Symbols and Abbreviations
N             Set of natural numbers
N0            N ∪ {0}
R+            {x ∈ R : x ≥ 0}
Rd            d-dimensional Euclidean space
S             Unit sphere of the Rd
Bd            Borel σ-algebra of the Rd
Bd|M          Restriction of Bd to M ⊂ Rd
|A|           Cardinality of the set A
Aᶜ            Complement of A
Å             Interior of A
Ā             Closure of A
rd A          Boundary of A
xᵀ            Transposed vector x (vectors are generally regarded as column vectors)
|x|           Euclidean norm
dist(x, A)    Distance from x ∈ Rd to A ⊂ Rd
K(x, r)       Open sphere with center x and radius r
det C, |C| resp.  Determinant of C
⌊r⌋           Largest whole number ≤ r
Γ(x)          Euler gamma function
∼             an ∼ bn :⟺ an/bn → 1
f             Function to be minimized
D             Set of admissible points
Da            {x ∈ D : f(x) ≤ a}, level set for the level a
f∗            inf{f(x) : x ∈ D}
D∗            {x ∈ D : f(x) ≤ f(y) for all y ∈ D}, minimal set
x∗            Point from D∗
εx            Singular measure, concentrated on x
λd            d-dimensional Lebesgue measure
PX            Distribution of X
T(μ)          Pushforward measure of μ under T
A ⊗ B         Product σ-algebra
μ ⊗ ν         Product measure
pn            Mutation transition probability
qn            Selection transition probability
R-S-M         Random search method
R-D-M         Random direction method
γ             Uniform distribution on S
λn            Step length distribution
ZV            Random variable
TP            Transition probability
Part I
Stochastic Optimization Methods
Chapter 1
Optimal Control Under Stochastic Uncertainty
1.1 Stochastic Control Systems

Optimal control and regulator problems arise in many concrete applications (mechanical, electrical, thermodynamical, chemical, etc.) and are modeled [3, 29, 33] by dynamic control systems obtained from physical measurements and/or known physical (a priori) laws. The basic control system (input–output system) is mathematically represented [16, 34] by a system of first order differential equations with random parameters:

ż(t) = g(t, ω, z(t), u(t)), t0 ≤ t ≤ tf, ω ∈ Ω,  (1.1a)
z(t0) = z0(ω).  (1.1b)
Here, ω is the basic random element taking values in a probability space (Ω, A, P) and describing the random variations of model parameters or the influence of noise terms. The probability space (Ω, A, P) consists of the sample space or set of elementary events Ω, the σ-algebra A of events, and the probability measure P. The plant state vector z = z(t, ω) is an m-vector involving directly or indirectly measurable/observable quantities like displacements, stresses, voltage, current, pressure, concentrations, etc., and their time derivatives (velocities); z0(ω) is the random initial state. The plant control or control input u(t) is a deterministic or stochastic n-vector denoting system inputs like external forces or moments, voltages, field current, thrust program, fuel consumption, production rate, etc. Furthermore, ż denotes the derivative with respect to the time t. We assume that an input u = u(t) is chosen such that u(·) ∈ U, where U is a suitable linear space of input functions u(·) : [t0, tf] → Rn on the time interval [t0, tf]. Examples for U are subspaces of the space PC0n[t0, tf] of piecewise continuous functions u(·) : [t0, tf] → Rn, normed by the supremum norm

‖u(·)‖∞ = sup{|u(t)| : t0 ≤ t ≤ tf}.
Since feedback control (FB) laws can be approximated very efficiently, cf. [2, 17, 29], by means of open-loop feedback (OLF) control laws, see Sect. 1.2, for practical purposes we may confine ourselves to the computation of stochastic optimal open-loop (OL) controls u = u(·; tb), tb ≤ t ≤ tf, on arbitrary remaining time intervals [tb, tf] of [t0, tf]. Here, u = u(·; tb) is stochastic optimal with respect to the information Atb available at the corresponding initial time point tb.
1.1.1 Differential and Integral Equations Under Stochastic Uncertainty

In many technical applications the random variations are not caused by an additive white noise term, but by possibly time-dependent random parameters. Hence, in the following, the dynamics of the control system is represented by a differential equation under stochastic uncertainty, i.e., a system of ordinary differential equations (1.1a,b) with random parameters. Furthermore, solutions of differential equations under stochastic uncertainty are defined here in the parameter (ω-point)-wise sense, cf. [5]. In case of a discrete or discretized probability distribution of the random elements, model parameters, resp., i.e.,

Ω = {ω1, ω2, . . . , ωℓ}, P(ω = ωj) = αj > 0, j = 1, . . . , ℓ, Σ_{j=1}^{ℓ} αj = 1,

we can redefine (1.1a,b) by

ż(t) = g(t, z(t), u(t)), t0 ≤ t ≤ tf,  (1.1c)
z(t0) = z0,  (1.1d)

with the vectors and vector functions

z(t) := (z(t, ωj))_{j=1,...,ℓ},  z0 := (z0(ωj))_{j=1,...,ℓ},
g(t, z, u) := (g(t, ωj, z^(j), u))_{j=1,...,ℓ},  z := (z^(j))_{j=1,...,ℓ} ∈ R^{ℓm}.

Hence, for a discrete distribution P we have again an ordinary system of first order differential equations for the ℓ · m unknown functions

z_ij = z_i(t, ωj), i = 1, . . . , m, j = 1, . . . , ℓ.
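To make the stacking construction concrete, here is a minimal Python sketch (the scalar plant, the scenario values, and all names are illustrative assumptions, not taken from the book) that integrates the enlarged deterministic system (1.1c,d) for ℓ = 3 parameter scenarios; expectations of state functionals then reduce to finite weighted sums over the scenario components:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical example: scalar plant  z' = -a_j * z + u(t)  for each scenario a_j.
a = np.array([0.8, 1.0, 1.3])      # discrete scenarios a(omega_j), j = 1,...,l
alpha = np.array([0.3, 0.5, 0.2])  # scenario probabilities alpha_j (sum to 1)
u = lambda t: np.sin(t)            # fixed deterministic control input

def g_stacked(t, z):
    # Right-hand side of the enlarged system (1.1c): one copy per scenario.
    return -a * z + u(t)

z0 = np.full(3, 1.0)               # common initial state z0(omega_j) = 1
sol = solve_ivp(g_stacked, (0.0, 5.0), z0)

# E.g., expected terminal state  E z(tf) = sum_j alpha_j * z(tf, omega_j):
print(alpha @ sol.y[:, -1])
```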
Results on the existence and uniqueness of the systems (1.1a,b) and (1.1c,d) and their dependence on the inputs can be found in [8]. Also in the general case we consider a solution in the point-wise sense. This means that for each random element ω ∈ Ω, (1.1a,b) is interpreted as a system of ordinary first order differential equations with the initial values z0 = z0(ω) and control input u = u(t). Hence, we assume that for each deterministic control u(·) ∈ U and each random element ω ∈ Ω there exists a unique solution

z(·, ω) = S(ω, u(·)), z(·, ω) ∈ C0m[t0, tf],  (1.2a)

of the integral equation

z(t) = z0(ω) + ∫_{t0}^{t} g(s, ω, z(s), u(s)) ds, t0 ≤ t ≤ tf,  (1.2b)

such that (t, ω) → S(ω, u(·))(t) is measurable. This solution is also denoted by

z(t, ω) = zu(t, ω) = z(t, ω, u(·)), t0 ≤ t ≤ tf.  (1.2c)
Obviously, the integral equation (1.2a–c) is the integral version of the initial value problem (1.1a,b): indeed, if, for given ω ∈ Ω, z = z(t, ω) is a solution of (1.1a,b), i.e., z(·, ω) is absolutely continuous, satisfies (1.1a) for almost all t ∈ [t0, tf] and fulfills (1.1b), then z = z(t, ω) is also a solution of (1.2a–c). Conversely, if, for given ω ∈ Ω, z = z(t, ω) is a solution of (1.2a–c) such that the integral on the right-hand side exists in the Lebesgue sense for each t ∈ [t0, tf], then this integral as a function of the upper bound t, and therefore also the function z = z(t, ω), is absolutely continuous. Hence, by taking t = t0 and by differentiation of (1.2a–c) with respect to t, we see that z = z(t, ω) is also a solution of (1.1a,b).
1.1.1.1 Parametric Representation of the Differential/Integral Equation Under Stochastic Uncertainty

In the following we want to justify the above assumption that the initial value problem (1.1a,b), the equivalent integral equation (1.2a–c), resp., has a unique solution z = z(t, ω). For this purpose, let θ = θ(t, ω) be an r-dimensional stochastic process, as, e.g., time-varying disturbances, random parameters, etc., of the system, such that the sample functions θ(·, ω) are continuous with probability one. Furthermore, let
g̃ : [t0, tf] × Rr × Rm × Rn → Rm be a continuous function having continuous Jacobians Dθ g̃, Dz g̃, Du g̃ with respect to θ, z, u. Now consider the case that the function g of the process differential equation (1.1a,b) is given by

g(t, ω, z, u) := g̃(t, θ(t, ω), z, u), (t, ω, z, u) ∈ [t0, tf] × Ω × Rm × Rn.

The spaces U, Z of possible inputs, trajectories, resp., of the plant are chosen as follows: U := Reg([t0, tf]; Rn) is the Banach space of all regulated functions u(·) : [t0, tf] → Rn, normed by the supremum norm ‖ · ‖∞. Furthermore, we set Z := C0m[t0, tf] and Θ := C0r[t0, tf]. Here, for an integer ν, C0ν[t0, tf] denotes the Banach space of all continuous functions of [t0, tf] into Rν, normed by the supremum norm ‖ · ‖∞. By our assumption we have θ(·, ω) ∈ Θ a.s. (almost sure). Define Ξ := Rm × Θ × U; Ξ is the space of possible initial values, time-varying model/environmental parameters, and inputs of the dynamic system. Hence, Ξ may be considered as the total space of inputs

ξ := (z0, θ(·), u(·))

into the plant, consisting of the random initial state z0, the random input function θ = θ(t, ω), and the control function u = u(t). Let now the mapping τ : Ξ × Z → Z related to the plant equation (1.1a,b) or (1.2a–c) be given by

τ(ξ, z(·))(t) = z(t) − ( z0 + ∫_{t0}^{t} g̃(s, θ(s), z(s), u(s)) ds ), t0 ≤ t ≤ tf.  (1.3a)

Note that for each input vector ξ ∈ Ξ and function z(·) ∈ Z the integrand in (1.3a) is piecewise continuous, bounded, or at least essentially bounded on [t0, tf]. Hence, the integral in (1.3a) as a function of its upper limit t yields again a continuous function on the interval [t0, tf], and therefore an element of Z. This shows that τ maps Ξ × Z into Z. Obviously, the initial value problem (1.1a,b) or its integral form (1.2a–c) can be represented by the operator equation

τ(ξ, z(·)) = 0.  (1.3b)
Operators of the type (1.3a) are well studied, see, e.g., [8, 18]: it is known that τ is continuously Fréchet (F-)differentiable [8, 18]. Note that the F-differential is a generalization of the derivatives (Jacobians) of mappings between finite-dimensional spaces to mappings between arbitrary normed spaces. Thus, the F-derivative Dτ of τ at a certain point (ξ̄, z̄(·)) is given by

Dτ(ξ̄, z̄(·)) · (ξ, z(·))(t) = z(t) − ( z0 + ∫_{t0}^{t} Dz g̃(s, θ̄(s), z̄(s), ū(s)) z(s) ds
 + ∫_{t0}^{t} Dθ g̃(s, θ̄(s), z̄(s), ū(s)) θ(s) ds
 + ∫_{t0}^{t} Du g̃(s, θ̄(s), z̄(s), ū(s)) u(s) ds ), t0 ≤ t ≤ tf,  (1.3c)

where ξ̄ = (z̄0, θ̄(·), ū(·)) and ξ = (z0, θ(·), u(·)). Especially, for the derivative of τ with respect to z(·) we find

Dz τ(ξ̄, z̄(·)) · z(·)(t) = z(t) − ∫_{t0}^{t} Dz g̃(s, θ̄(s), z̄(s), ū(s)) z(s) ds, t0 ≤ t ≤ tf.  (1.3d)

The related equation

Dz τ(ξ̄, z̄(·)) · z(·) = y(·), y(·) ∈ Z,  (1.3e)

is a linear vectorial Volterra integral equation. By our assumptions this equation has a unique solution z(·) ∈ Z. Note that the corresponding result for scalar Volterra equations, see, e.g., [31], can be transferred to the present vectorial case. Therefore, Dz τ(ξ̄, z̄(·)) is a linear, continuous, one-to-one map from Z onto Z. Hence, its inverse (Dz τ(ξ̄, z̄(·)))⁻¹ exists. Using the implicit function theorem [8, 18], we now obtain the following result:

Lemma 1.1 For given ξ̄ = (z̄0, θ̄(·), ū(·)), let (ξ̄, z̄(·)) ∈ Ξ × Z be selected such that τ(ξ̄, z̄(·)) = 0; hence, z̄(·) ∈ Z is supposed to be the solution of
ż(t) = g̃(t, θ̄(t), z(t), ū(t)), t0 ≤ t ≤ tf,  (1.4a)
z(t0) = z̄0  (1.4b)

in the integral sense (1.2b). Then there is an open neighborhood of ξ̄, denoted by V0(ξ̄), such that for each open connected neighborhood V(ξ̄) of ξ̄ contained in V0(ξ̄) there exists a unique continuous mapping S : V(ξ̄) → Z such that

(a) S(ξ̄) = z̄(·);
(b) τ(ξ, S(ξ)) = 0 for each ξ ∈ V(ξ̄), i.e., z(t) = S(ξ)(t), t0 ≤ t ≤ tf, is the solution of

z(t) = z0 + ∫_{t0}^{t} g̃(s, θ(s), z(s), u(s)) ds, t0 ≤ t ≤ tf,  (1.4c)

where ξ = (z0, θ(·), u(·));
(c) S is continuously differentiable on V(ξ̄), and it holds that

Du S(ξ) = −(Dz τ(ξ, S(ξ)))⁻¹ Du τ(ξ, S(ξ)), ξ ∈ V(ξ̄).  (1.4d)
An immediate consequence is given next:

Corollary 1.1 The directional derivative ζ(·) = ζu,h(·) = Du S(ξ)h(·) ∈ Z, h(·) ∈ U, satisfies the integral equation

ζ(t) − ∫_{t0}^{t} Dz g̃(s, θ(s), S(ξ)(s), u(s)) ζ(s) ds = ∫_{t0}^{t} Du g̃(s, θ(s), S(ξ)(s), u(s)) h(s) ds,  (1.4e)

where t0 ≤ t ≤ tf and ξ = (z0, θ(·), u(·)).

Remark 1.1 Taking the time derivative of Eq. (1.4e) shows that this integral equation is equivalent to the so-called perturbation equation, see, e.g., [16].

For an arbitrary h(·) ∈ U the mappings

(t, ξ) → S(ξ)(t), (t, ξ) → (Du S(ξ)h(·))(t), (t, ξ) ∈ [t0, tf] × V(ξ̄),  (1.4f)

are continuous and therefore also measurable.

The existence of a unique solution z̄ = z̄(t), t0 ≤ t ≤ tf, of the reference differential equation (1.4a,b) can be guaranteed as follows, where the solution is
interpreted in the integral sense (1.2b), i.e., z̄ = z̄(t), t0 ≤ t ≤ tf, is absolutely continuous, satisfies equation (1.4a) almost everywhere in the time interval [t0, tf] and the initial condition (1.4b), cf. [6] and [36].

Lemma 1.2 Consider an arbitrary input vector ξ̄ = (z̄0, θ̄(·), ū(·)) ∈ Ξ, and define, see (1.4a,b), the function g̃_{θ̄(·),ū(·)}(t, z) := g̃(t, θ̄(t), z, ū(t)). Suppose that

(i) z → g̃_{θ̄(·),ū(·)}(t, z) is continuous for each time t ∈ [t0, tf],
(ii) t → g̃_{θ̄(·),ū(·)}(t, z) is measurable for each vector z,
(iii) (generalized Lipschitz condition) for each closed sphere K ⊂ Rn there exists a nonnegative, integrable function LK(·) on [t0, tf] such that
  (a) |g̃_{θ̄(·),ū(·)}(t, 0)| ≤ LK(t), and
  (b) |g̃_{θ̄(·),ū(·)}(t, z) − g̃_{θ̄(·),ū(·)}(t, w)| ≤ LK(t)|z − w| on [t0, tf] × K.

Then, the initial value problem (1.4a,b) has a unique solution z̄ = z̄(t; ξ̄).

Proof Proofs can be found in [6] and [36].
We observe that the controlled stochastic process z = z(t, ω) defined by the plant differential equation (1.1a,b) may be a non-Markovian stochastic process, see [3]. Moreover, note that the random input function θ = θ(t, ω) is not just an additive noise term, but may also describe a disturbance which is part of the nonlinear dynamics of the plant, randomly varying model parameters such as material, load, or cost parameters, etc.
1.1.2 Objective Function

The aim is to obtain optimal controls being robust, i.e., most insensitive with respect to stochastic variations of the model/environmental parameters and initial values of the process. Hence, incorporating stochastic parameter variations into the optimization process, for a deterministic control function u = u(t), t0 ≤ t ≤ tf, the objective function F = F(u(·)) of the controlled process z = z(t, ω, u(·)) is defined, cf. [24], by the conditional expectation of the total costs arising along the whole control process:

F(u(·)) := Ef(ω, S(ω, u(·)), u(·)).  (1.5a)

Here, E = E(·|At0) denotes the conditional expectation given the information At0 about the control process up to the considered starting time point t0. Moreover, f = f(ω, z(·), u(·)) denotes the stochastic total costs arising along the trajectory z = z(t, ω) and at the terminal point zf = z(tf, ω), cf. [3, 34]. Hence,
f(ω, z(·), u(·)) := ∫_{t0}^{tf} L(t, ω, z(t), u(t)) dt + G(tf, ω, z(tf)),  (1.5b)

z(·) ∈ Z, u(·) ∈ U. Here,

L : [t0, tf] × Ω × Rm × Rn → R,
G : [t0, tf] × Ω × Rm → R

are given measurable cost functions. We suppose that L(t, ω, ·, ·) and G(t, ω, ·) are convex functions for each (t, ω) ∈ [t0, tf] × Ω, having continuous partial derivatives ∇z L(·, ω, ·, ·), ∇u L(·, ω, ·, ·), ∇z G(·, ω, ·). Note that in this case

(z(·), u(·)) → ∫_{t0}^{tf} L(t, ω, z(t), u(t)) dt + G(tf, ω, z(tf))  (1.5c)
is a convex function on Z × U for each ω ∈ Ω. Moreover, assume that the expectation F(u(·)) exists and is finite for each admissible control u(·) ∈ D. In the case of feedback control u = u(t, y(t)), t0 ≤ t ≤ tf, where y = y(t) denotes the state z = z(t) or a certain state observation, the objective function F = F(u(·, ·)) reads

F(u(·, ·)) := Ef(ω, S(ω, u(·, ·)), u(·, ·)).  (1.5d)
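Since (1.5a) is an expectation over ω, a simple (if costly) way to evaluate it numerically is Monte Carlo sampling combined with pointwise ODE solves. The following Python sketch is only illustrative: the plant, the cost functions, and the distribution of the scalar parameter θ(ω) are assumptions made for the example, not the book's data. The running cost is integrated by augmenting the state with one extra component:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
u = lambda t: np.cos(t)                      # fixed open-loop control u(t)

def sample_cost(theta, t0=0.0, tf=2.0):
    # One realization: solve the plant pointwise in omega and evaluate
    # f = integral of L dt + terminal cost G, cf. (1.5b).
    rhs = lambda t, x: np.array([-theta * x[0] + u(t),     # plant z' = g(...)
                                 x[0]**2 + u(t)**2])       # running cost L
    x0 = np.array([1.0, 0.0])                # state z(t0) and accumulated cost
    sol = solve_ivp(rhs, (t0, tf), x0)
    z_tf, int_L = sol.y[0, -1], sol.y[1, -1]
    return int_L + z_tf**2                   # terminal cost G := z(tf)^2

# Monte Carlo estimate of F(u(.)) = E f(omega, S(omega, u(.)), u(.)), cf. (1.5a):
thetas = rng.normal(1.0, 0.1, size=200)      # assumed distribution of theta(omega)
F_hat = np.mean([sample_cost(th) for th in thetas])
print(F_hat)
```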
Example 1.1 (Tracking Problems) If a trajectory zf = zf(t, ω), e.g., the trajectory of a moving target, known up to a certain stochastic uncertainty, must be followed or reached during the control process, then the cost function L along the trajectory can be defined by

L(t, ω, z(t), u) := |Γz(z(t) − zf(t, ω))|² + ϕ(u).  (1.6a)

Here, Γz is a weight matrix, and ϕ = ϕ(u) denotes the control costs, as, e.g.,

ϕ(u) = |Γu u|²  (1.6b)

with a further weight matrix Γu. If a random target zf = zf(ω) has to be reached at the terminal time point tf only, then the terminal cost function G may be defined, e.g., by

G(tf, ω, z(tf)) := |ΓGf(z(tf) − zf(ω))|²  (1.6c)

with a weight matrix ΓGf.
Example 1.2 (Active Structural Control, Control of Robots) In case of active structural control or for optimal regulator design of robots, cf. [22, 33], the total cost function f is given by defining the individual cost functions L and G as follows:

L(t, ω, z, u) := zᵀ Q(t, ω) z + uᵀ R(t, ω) u,  (1.7a)
G(tf, ω, z) := G(ω, z).  (1.7b)

Here, Q = Q(t, ω) and R = R(t, ω), resp., are certain positive (semi)definite m × m, n × n matrices which may also depend on (t, ω), where in case of active structural control the external load term depends on the control u(t). For endpoint control, the cost function G is given by

G(ω, z) = (z − zf)ᵀ Gf(ω) (z − zf)  (1.7c)

with a certain desired, possibly random terminal point zf = zf(ω) and a positive (semi)definite, possibly random weight matrix Gf = Gf(ω).
1.1.2.1 Optimal Control Under Stochastic Uncertainty

For finding optimal controls u∗(·), u∗(·, ·), resp., being robust with respect to stochastic parameter variations, in this chapter we now present several methods for approximating the following minimum expected total cost problem:

min F(u(·))  s.t. (subject to)  u(·) ∈ D;  (1.8a)

for feedback problems the time function u(t) is replaced by u = u(t, y(t)):

min F(u(·, ·))  s.t.  u(·, ·) ∈ D.  (1.8b)

Information set At at time t: The information set At ⊂ A, t ≥ t0, is defined by the σ-algebra of events A ∈ A occurring up to time t. In many cases, as, e.g., for PD- and PID-(feedback) controllers, the information σ-algebra At is given by At = A(y(t, ·)), where y = y(t, ω) denotes the m̄-vector function of state measurements or observations at time t. Then, an At-measurable control u = u(t, ω) has the representation, cf. [4],

u(t, ω) = η(t, y(t, ω))  (1.8c)

with a measurable function η(t, ·) : Rm̄ → Rm.
Since parameter-insensitive optimal controls can be obtained by stochastic optimization methods incorporating random parameter variations into the optimization procedure, see [24], the aim is to determine stochastic optimal controls:

Definition 1.1 An optimal solution of the expected total cost minimization problem (1.8a,b), providing a robust optimal open-loop, optimal feedback control, resp., is called an optimal control under stochastic uncertainty or, for short, a stochastic optimal OL/FB control.

Note For controlled processes working on a time range tb ≤ t ≤ tf with an intermediate starting time point tb, the objective function F = F(u(·)) is also defined by (1.5a), but with the conditional expectation operator E = E(·|Atb), where Atb denotes the information set about the controlled process available up to time tb.

With E = E(·|At0), problem (1.8a) is of course equivalent to the optimal control problem under stochastic uncertainty:

min E( ∫_{t0}^{tf} L(t, ω, z(t), u(t)) dt + G(tf, ω, z(tf)) | At0 )  (1.9a)

s.t.

ż(t) = g(t, ω, z(t), u(t)), t0 ≤ t ≤ tf, a.s.,  (1.9b)
z(t0, ω) = z0(ω), a.s.,  (1.9c)
u(·) ∈ D,  (1.9d)
cf. [19, 20]. For further details see also [28].

Remark 1.2 Similar representations can be obtained also for stochastic optimal feedback control problems.

Remark 1.3 (State Constraints) In addition to the plant differential equation (dynamic equation) and the control constraint, we may still have some stochastic state constraints

hI(t, ω, z(t, ω)) ≤ (=) 0 a.s.,  (1.10a)

as well as state constraints involving (conditional) expectations

EhII(t, ω, z(t, ω)) = E(hII(t, ω, z(t, ω)) | At0) ≤ (=) 0.  (1.10b)
Here, hI = hI(t, ω, z), hII = hII(t, ω, z) are given vector functions of (t, ω, z). By means of (penalty) cost functions, the random condition can be incorporated into the objective function of the control problem. As explained in Sect. 1.3, the expectations arising in the mean value constraints and in the objective function can be computed approximately by means of Taylor expansion with respect to the vector ϑ = ϑ(ω) := (z0(ω), θ(ω)) of random initial values and model parameters at the conditional mean ϑ̄ = ϑ̄^(t0) := E(ϑ(ω)|At0). This then yields ordinary deterministic constraints for the extended deterministic trajectory (nominal state and sensitivity) t → (z(t, ϑ̄), Dϑ z(t, ϑ̄)), t ≥ t0.
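As an illustration of how a problem of the type (1.9a–d) can be attacked numerically, the following Python sketch uses a sample-average approximation of the expected total cost together with single shooting over a piecewise constant control; the plant, the costs, the discretization, and all names are illustrative assumptions, not the book's specific method:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

rng = np.random.default_rng(1)
t_grid = np.linspace(0.0, 2.0, 6)            # breakpoints of a piecewise constant u
thetas = rng.normal(1.0, 0.1, size=50)       # fixed parameter samples (sample average)

def total_cost(u_vals, theta):
    u = lambda t: u_vals[min(np.searchsorted(t_grid, t, side='right') - 1,
                             len(u_vals) - 1)]
    rhs = lambda t, x: np.array([-theta * x[0] + u(t),          # plant z' = g(...)
                                 x[0]**2 + 0.1 * u(t)**2])      # running cost L
    sol = solve_ivp(rhs, (t_grid[0], t_grid[-1]), [1.0, 0.0])
    return sol.y[1, -1] + sol.y[0, -1]**2                       # + terminal cost G

def objective(u_vals):                       # sample-average approximation of (1.9a)
    return np.mean([total_cost(u_vals, th) for th in thetas])

res = minimize(objective, np.zeros(len(t_grid) - 1), method='Nelder-Mead')
print(res.x)                                 # approximately optimal control levels
```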
1.2 Control Laws

Control or guidance usually refers [3, 16, 18] to direct influence on a dynamic system to achieve desired performance. In optimal control of dynamic systems, mostly the following types of control laws or control policies are considered:

(I) Open-Loop Control (OL)
Here, the control function u = u(t) is a deterministic function depending only on the (a priori) information It0 about the system, the model parameters, resp., available at the starting time point t0. Hence, for the selection of optimal (OL) controls

u(t) = u(t; t0, It0), t ≥ t0,  (1.11a)

we get optimal control problems of type (1.9a–d).

(II) Closed-Loop (CL) or Feedback Control
In this case the control function u = u(t) is a stochastic function

u = u(t, ω) = u(t, It), t ≥ t0,  (1.11b)

depending on time t and the total information It about the system available up to time t. In particular, It may contain information about the state z(t) = z(t, ω) up to time t. Optimal (CL) or feedback controls are obtained by solving problems of type (1.8b).

Remark 1.4 (Information Set At at Time t) Often the information It available up to time t is described by the information set or σ-algebra At ⊂ A of events A occurring up to time t. In the important case At = A(y(t, ·)), where y = y(t, ω) denotes the m̄-vector function of state measurements or observations at time t, an At-measurable control u = u(t, ω), see
problem (1.8b), has the representation, cf. [4],

u(t, ω) = ηt(y(t, ω))  (1.11c)

with a measurable function ηt : Rm̄ → Rm. Important examples of this type are the PD- and PID-controllers.

(III) Open-Loop Feedback (OLF) Control/Stochastic Open-Loop Feedback (SOLF) Control
Due to their large complexity, optimal feedback control laws can, in general, be determined only approximately. A very efficient approximation procedure for optimal feedback controls, being functions of the information It, is the approximation by means of optimal open-loop controls. In this combination of (OL) and (CL) control, at each intermediate time point tb := t, t0 ≤ t ≤ tf, given the information It up to time t, first the open-loop control function for the remaining time interval t ≤ s ≤ tf, see Fig. 1.1, is computed, hence,

u[t,tf](s) = u(s; t, It), s ≥ t.  (1.11d)

Then, an approximate feedback control policy, originally proposed by Dreyfus (1964), cf. [9], can be defined as follows:

Definition 1.2 The hybrid control law defined by

uOLF(t, It) := u(t; t, It), t ≥ t0,  (1.11e)

is called the open-loop feedback (OLF) control law.

Thus, the OL control u[t,tf](s), s ≥ t, for the remaining time interval [t, tf] is used only at time s = t, see also [2, 9–11, 15, 17, 35]. Optimal (OLF) controls are therefore obtained by solving again control problems of the type (1.9a–d) at each intermediate starting time point tb := t, t ∈ [t0, tf].

A major issue in optimal control is robustness, cf. [12], i.e., the insensitivity of an optimal control with respect to parameter variations. In case of random parameter variations, robust optimal controls can be obtained by means of stochastic optimization methods, cf. [24], incorporating the probability distribution, i.e., the random characteristics, of the random parameter variations into the optimization process, cf. Definition 1.1. Thus, constructing stochastic optimal open-loop feedback controls, hence optimal open-loop feedback control laws being insensitive as far as possible with respect to random parameter variations, means that besides the optimality of the control policy also its insensitivity with respect to stochastic parameter variations should be guaranteed. Hence, in the following sections we also develop a stochastic version of the optimal open-loop feedback control method, cf. [23, 25–27]. A short overview of this novel stochastic optimal open-loop feedback control concept is given below:
Fig. 1.1 Remaining time interval for intermediate time points t
At each intermediate time point tb = t ∈ [t0, tf], based on the given process observation It, e.g., the observed state zt = z(t) at tb = t, a stochastic optimal open-loop control u∗ = u∗(s) = u∗(s; t, It), t ≤ s ≤ tf, is first determined on the remaining time interval [t, tf], see Fig. 1.1, by stochastic optimization methods, cf. [24]. Having a stochastic optimal open-loop control u∗ = u∗(s; t, It), t ≤ s ≤ tf, on each remaining time interval [t, tf] with an arbitrary starting time point t, t0 ≤ t ≤ tf, a stochastic optimal open-loop feedback (SOLF) control law is then defined, corresponding to Definition 1.2, as follows:

Definition 1.3 The hybrid control law defined by

u∗OLF(t, It) := u∗(t; t, It), t ≥ t0,  (1.11f)

is called the stochastic optimal open-loop feedback (SOLF) control law.

Thus, at time tb = t only the first control value u∗(t) = u∗(t; t, It) of u∗ = u∗(·; t, It) is used. For finding stochastic optimal open-loop controls on the remaining time intervals tb ≤ t ≤ tf with t0 ≤ tb ≤ tf, the stochastic Hamilton function of the control problem is introduced. Then, the class of H-minimal controls, cf. [16], can be determined in the case of stochastic uncertainty by solving a finite-dimensional stochastic optimization problem for minimizing the conditional expectation of the stochastic Hamiltonian subject to the remaining deterministic control constraints at each time point t. Having an H-minimal control, the related two-point boundary value problem with random parameters is formulated for the computation of a stochastic optimal state and costate trajectory. In the important case of a linear-quadratic structure of the underlying control problem, the state and costate trajectory can be determined analytically to a large extent. Inserting these trajectories into the H-minimal control, stochastic optimal open-loop controls are found on an arbitrary remaining time interval. According to Definition 1.3, these controls then immediately yield a stochastic optimal open-loop feedback control law. Moreover, the obtained controls can be realized in real time, which has already been shown for applications in optimal control of industrial robots, cf. [30].
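Structurally, an (S)OLF law per Definitions 1.2/1.3 is just a wrapper that re-solves an open-loop problem from the current time and information and keeps only the first control value. A schematic Python sketch (solve_open_loop is a hypothetical placeholder for any open-loop solver, e.g., the sample-average scheme sketched in Sect. 1.1.2.1):

```python
# Schematic sketch of Definitions 1.2/1.3; solve_open_loop(t, tf, info) is assumed
# to return a (stochastic optimal) open-loop control s -> u*(s; t, I_t) on [t, tf].
def make_olf_law(solve_open_loop, tf):
    def u_olf(t, info_t):
        u_star = solve_open_loop(t, tf, info_t)  # open-loop control on [t, tf]
        return u_star(t)                         # use only its first value, (1.11e/f)
    return u_olf
```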
(IV) Nonlinear Model Predictive Control (NMPC)/Stochastic Nonlinear Model Predictive Control (SNMPC)
Optimal open-loop feedback (OLF) control is the basic tool in Nonlinear Model Predictive Control (NMPC). Corresponding to the approximation technique for feedback controls described above, (NMPC) is a method to solve complicated feedback control problems by means of stepwise computations of open-loop controls. Hence, in (NMPC), see [1, 13, 14, 29], optimal open-loop controls

u = u[t,t+Tp](s), t ≤ s ≤ t + Tp,  (1.11g)

cf. (1.11c), are first determined on the time interval [t, t + Tp] with a certain so-called prediction time horizon Tp > 0. In sampled-data MPC, cf. [13], optimal open-loop controls u = u[ti,ti+Tp] are determined at certain sampling instants ti, i = 0, 1, . . ., using the information Ati about the control process and its neighborhood up to time ti, i = 0, 1, . . ., see also [22]. The optimal open-loop control at stage i is then applied,

u = u[ti,ti+Tp](t), ti ≤ t ≤ ti+1,  (1.11h)

until the next sampling instant ti+1. This method is closely related to the Adaptive Optimal Stochastic Trajectory Planning and Control (AOSTPC) procedure described in [21, 22]. Corresponding to the extension of (OLF) control to (SOLF) control, (NMPC) can be extended to Stochastic Nonlinear Model Predictive Control (SNMPC). For control policies of this type, robust (NMPC) controls with respect to stochastic variations of model parameters and initial values are determined in the following way:

• Use the a posteriori distribution P(dω|At) of the basic random element ω ∈ Ω, given the process information At up to time t, and
• apply stochastic optimization methods to incorporate random parameter variations into the optimal (NMPC) control design.
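The sampled-data application (1.11g,h) then amounts to the following receding-horizon loop; this is a schematic sketch with hypothetical placeholders solve_open_loop and step, not the specific procedure of [13, 22]:

```python
def sampled_data_mpc(solve_open_loop, step, x0, t0, t_end, Tp, dt):
    """Receding-horizon loop, cf. (1.11g,h): at each sampling instant t_i an
    open-loop problem is solved on [t_i, t_i + Tp] and its control is applied
    until t_{i+1} = t_i + dt.  solve_open_loop(t, t + Tp, x) must return a
    control function; step(x, u, t, dt) the state reached at t + dt."""
    t, x, trajectory = t0, x0, [(t0, x0)]
    while t < t_end:
        u_i = solve_open_loop(t, t + Tp, x)   # prediction horizon Tp > 0
        x = step(x, u_i, t, dt)               # apply u_i on [t_i, t_i + dt] only
        t += dt
        trajectory.append((t, x))
    return trajectory
```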
1.3 Computation of Expectations by Means of Taylor Expansions

Corresponding to the assumptions in Sect. 1.1.1, based on a parametric representation of the stochastic uncertainty of the control problem, using a finite-dimensional random parameter vector θ = θ(ω), we suppose that the functions g, z0, L, G are described by

g = g̃(t, θ, z, u),  (1.12a)
z0 = z̃0(θ),  (1.12b)
L = L̃(t, θ, z, u),  (1.12c)
G = G̃(t, θ, z).  (1.12d)

Here,

θ = θ(ω), ω ∈ (Ω, A, P),  (1.12e)
denotes the time-independent r-vector of random model parameters and random initial values, and g, z0, L, G are sufficiently smooth functions of the variables indicated in (1.12a–d). Again, for simplification, the conditional expectation E(· · · |At0) given the information At0 up to the considered starting time t0 is denoted by E. Thus, let us denote by

θ̄ = θ̄^(t0) := Eθ(ω) = E(θ(ω)|At0)  (1.13a)

the conditional expectation of the random vector θ(ω) given the information At0 at time point t0. Taking into account the properties of the solution

z = z(t, θ) = S(z0(θ), θ, u(·))(t), t ≥ t0,  (1.13b)

of the dynamic equation (1.1a,b), the expectations arising in the objective function of (1.9a) can be computed approximately by means of Taylor expansion with respect to θ at θ̄.
1.3.1 Complete Taylor Expansion

Considering first the costs L along the trajectory, we obtain, cf. [24],

L(t, θ, z(t, θ), u(t)) = L(t, θ̄, z(t, θ̄), u(t))
 + (∇θ L(t, θ̄, z(t, θ̄), u(t)) + Dθ z(t, θ̄)ᵀ ∇z L(t, θ̄, z(t, θ̄), u(t)))ᵀ (θ − θ̄)
 + ½ (θ − θ̄)ᵀ QL(t, θ̄, z(t, θ̄), Dθ z(t, θ̄), u(t)) (θ − θ̄) + · · · .  (1.14a)
Retaining only 1st order derivatives of z = z(t, θ) with respect to θ, the approximate Hessian QL of θ → L(t, θ, z(t, θ), u) at θ = θ̄ is given by

QL(t, θ̄, z(t, θ̄), Dθ z(t, θ̄), u(t)) := ∇θ² L(t, θ̄, z(t, θ̄), u(t))
 + Dθ z(t, θ̄)ᵀ (∇θz² L(t, θ̄, z(t, θ̄), u(t)))ᵀ + ∇θz² L(t, θ̄, z(t, θ̄), u(t)) Dθ z(t, θ̄)
 + Dθ z(t, θ̄)ᵀ ∇z² L(t, θ̄, z(t, θ̄), u(t)) Dθ z(t, θ̄).  (1.14b)

Here, ∇θ L, ∇z L denote the gradient of L with respect to θ, z, resp., Dθ z is the Jacobian of z = z(t, θ) with respect to θ, and ∇θ² L, ∇z² L, resp., denote the Hessian of L with respect to θ, z. Moreover, ∇θz² L is the r × m matrix of partial derivatives of L with respect to θi and zk, in this order. Taking expectations in (1.14a), from (1.14b) we obtain the expansion

EL(t, θ(ω), z(t, θ(ω)), u(t)) = L(t, θ̄, z(t, θ̄), u(t))
 + ½ E(θ(ω) − θ̄)ᵀ QL(t, θ̄, z(t, θ̄), Dθ z(t, θ̄), u(t)) (θ(ω) − θ̄) + · · ·
 = L(t, θ̄, z(t, θ̄), u(t)) + ½ tr QL(t, θ̄, z(t, θ̄), Dθ z(t, θ̄), u(t)) cov(θ(·)) + · · · .  (1.15)

For the terminal costs G, corresponding to the above expansion we find

G(tf, θ, z(tf, θ)) = G(tf, θ̄, z(tf, θ̄))
 + (∇θ G(tf, θ̄, z(tf, θ̄)) + Dθ z(tf, θ̄)ᵀ ∇z G(tf, θ̄, z(tf, θ̄)))ᵀ (θ − θ̄)
 + ½ (θ − θ̄)ᵀ QG(tf, θ̄, z(tf, θ̄), Dθ z(tf, θ̄)) (θ − θ̄) + · · · ,  (1.16a)

where QG is defined in the same way as QL, see (1.14a). Taking expectations with respect to θ(ω), we get

EG(tf, θ(ω), z(tf, θ(ω))) = G(tf, θ̄, z(tf, θ̄))
 + ½ tr QG(tf, θ̄, z(tf, θ̄), Dθ z(tf, θ̄)) cov(θ(·)) + · · · .  (1.16b)
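The trace form in (1.15) and (1.16b) rests on the following standard identity for quadratic forms of a random vector, recorded here for reference:

E((θ(ω) − θ̄)ᵀ Q (θ(ω) − θ̄)) = E tr(Q (θ(ω) − θ̄)(θ(ω) − θ̄)ᵀ) = tr(Q cov(θ(·))),

which follows from xᵀQx = tr(Q x xᵀ) together with the linearity of trace and expectation; applied with Q = QL, QG, resp., it turns the quadratic terms of (1.14a) and (1.16a) into the above trace terms.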
Note Corresponding to (1.13a), for the mean and covariance matrix of the random parameter vector θ = θ(ω) we have

θ̄ = θ̄^(t0) := E(θ(ω)|At0),
cov(θ(·)) = cov^(t0)(θ(·)) := E((θ(ω) − θ̄^(t0))(θ(ω) − θ̄^(t0))ᵀ | At0).
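The approximation (1.15) is easy to check numerically. The following Python sketch (the cost function, mean, and covariance are illustrative assumptions) compares the second-order expansion with a Monte Carlo estimate for a Gaussian parameter vector, for which the third-order term vanishes:

```python
import numpy as np

rng = np.random.default_rng(2)
mean, cov = np.array([1.0, 0.5]), np.array([[0.04, 0.01], [0.01, 0.09]])

L = lambda th: np.sin(th[0]) * th[1]**2       # stand-in for theta -> L(t, theta, ...)

# Hessian of L at the mean, computed by hand for this simple L:
s, c = np.sin(mean[0]), np.cos(mean[0])
Q_L = np.array([[-s * mean[1]**2, 2 * c * mean[1]],
                [2 * c * mean[1], 2 * s]])

approx = L(mean) + 0.5 * np.trace(Q_L @ cov)  # second-order expansion, cf. (1.15)
mc = np.mean([L(th) for th in rng.multivariate_normal(mean, cov, 20000)])
print(approx, mc)                             # the two values should agree closely
```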
1.3.2 Inner or Partial Taylor Expansion

Instead of a complete expansion of L, G with respect to θ, appropriate approximations of the expected costs EL, EG, resp., may be obtained by the inner 1st order approximation of the trajectory, hence,

L(t, θ, z(t, θ), u(t)) ≈ L(t, θ, z(t, θ̄) + Dθ z(t, θ̄)(θ − θ̄), u(t)).  (1.17a)

Taking expectations in (1.17a), for the expected cost function we get the approximation

EL(t, θ(ω), z(t, θ(ω)), u(t)) ≈ EL(t, θ(ω), z(t, θ̄) + Dθ z(t, θ̄)(θ(ω) − θ̄), u(t)).  (1.17b)
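For instance, for a cost that is quadratic in the state, say L = zᵀQz + uᵀRu with θ-independent weight matrices Q, R (a simplification of Example 1.2 assumed here for illustration), (1.17b) can be evaluated in closed form: with b := z(t, θ̄) and A := Dθ z(t, θ̄),

E((b + A(θ(ω) − θ̄))ᵀ Q (b + A(θ(ω) − θ̄))) + u(t)ᵀ R u(t) = bᵀQb + tr(AᵀQA cov(θ(·))) + u(t)ᵀ R u(t),

since E(θ(ω) − θ̄) = 0 makes the cross term vanish; for positive semidefinite Q this expression is convex in (b, A), in line with the convexity statement below.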
In many important cases, as, e.g., for cost functions L being quadratic with respect to the state variable z, the above expectation can be computed analytically. Moreover, if the cost function L is convex with respect to z, then the expected cost function EL is convex with respect to both the state vector z(t, θ̄) and the Jacobian matrix of sensitivities Dθ z(t, θ̄) evaluated at the mean parameter vector θ̄.

Having the approximate representations (1.15), (1.16b), (1.17b), resp., of the expectations occurring in the objective function (1.9a), we still have to compute the trajectory t → z(t, θ̄), t ≥ t0, related to the mean parameter vector θ = θ̄, and the sensitivities t → ∂z/∂θi (t, θ̄), i = 1, . . . , r, t ≥ t0, of the state z = z(t, θ) with respect to the parameters θi, i = 1, . . . , r, at θ = θ̄. According to [28], for z = z(t, θ̄) we have the system of differential equations

ż(t, θ̄) = g(t, θ̄, z(t, θ̄), u(t)), t ≥ t0,  (1.18a)
z(t0, θ̄) = z0(θ̄).  (1.18b)
Moreover, assuming that differentiation with respect to θi, i = 1, . . . , r, and integration with respect to time t can be interchanged, cf. [28], we obtain the following system of linear perturbation differential equations for the Jacobian Dθ z(t, θ̄) = (∂z/∂θ1 (t, θ̄), ∂z/∂θ2 (t, θ̄), . . . , ∂z/∂θr (t, θ̄)), t ≥ t0:

d/dt Dθ z(t, θ̄) = Dz g(t, θ̄, z(t, θ̄), u(t)) Dθ z(t, θ̄) + Dθ g(t, θ̄, z(t, θ̄), u(t)), t ≥ t0,  (1.19a)
Dθ z(t0, θ̄) = Dθ z0(θ̄).  (1.19b)

Note Equation (1.19a,b) is closely related to the perturbation equation (1.4e) for representing the derivative Du z of z with respect to the control u. Moreover, the matrix differential equation (1.19a) can be decomposed into the following r differential equations for the columns ∂z/∂θj (t, θ̄), j = 1, . . . , r:

d/dt ∂z/∂θj (t, θ̄) = Dz g(t, θ̄, z(t, θ̄), u(t)) ∂z/∂θj (t, θ̄) + ∂g/∂θj (t, θ̄, z(t, θ̄), u(t)), t ≥ t0, j = 1, . . . , r.  (1.19c)
Denoting by L˜ = L˜ t, θ, z(t, θ ), Dθ z(t, θ), u(t) ,
(1.20a)
˜ =G ˜ tf , θ, z(tf , θ ), Dθ z(tf , θ ) , G
(1.20b)
the approximation of the cost functions L, G by complete, partial Taylor expansion, for the optimal control problem under stochastic uncertainty (2.8a–d) we obtain now the following approximation: Theorem 1.1 Suppose that differentiation with respect to the parameters θi , i = 1, . . . , r, and integration with respect to time t can be interchanged in (1.4c). Retaining only 1st order derivatives of z = z(t, θ ) with respect to θ , the optimal control problem under stochastic uncertainty (2.8a–d) can be approximated by the ordinary deterministic control problem:
tf min t0
s.t.
E L˜ t, θ (ω), z(t, θ ), Dθ z(t, θ ), u(t) dt
˜ tf , θ (ω), z(tf , θ , Dθ z(t, θ ) + EG
(1.21a)
1.4 Taylor Approximation of Control Problems Under Stochastic Uncertainty. . .
z˙ (t, θ ) = g t, θ , z(t, θ ), u(t) ,
t ≥ t0 ,
z(t0 , θ ) = z0 (θ) d Dθ z(t, θ) = Dz g t, θ , z(t, θ ), u(t) Dθ z(t, θ) dt + Dθ g t, θ , z(t, θ ), u(t) , t ≥ t0 , Dθ z(t0 , θ ) = Dθ z0 (θ )
21
(1.21b) (1.21c)
(1.21d) (1.21e)
u(·) ∈ D.
(1.21f)
Remark 1.5 Obviously, the trajectory of above deterministic substitute control problem (1.21a–f) of the original optimal control problem under stochastic uncertainty (2.8a–d) can be represented by the m(r + 1)-vector function: ⎛
⎞ z(t, θ) ⎜ ∂z (t, θ) ⎟ ⎜ ∂θ1 ⎟ ⎟, t → ξ(t) := ⎜ .. ⎜ ⎟ . ⎝ ⎠ ∂z ∂θr (t, θ )
t 0 ≤ t ≤ tf .
(1.22)
Remark 1.6 Constraints of the expectation type, i.e. EhI I t, θ (ω), z t, θ (ω) ≤ (=) 0, can be evaluated as in (1.15) and (1.16b). This yields then deterministic constraints for the unknown functions t → z(t, θ ) and t → Dθ z(t, θ ), t ≥ t0 .
1.4 Taylor Approximation of Control Problems Under Stochastic Uncertainty: General Procedure Let us denote u = u(·) the open-loop or closed-loop control input function defined on a certain time interval [t0 , tf ]. Moreover, let z = z(·) be the output, hence, the trajectory, of the underlying dynamic system and a = a(ω) the vector of random parameters, such as the dynamic parameters, the external load, and the initial state at time t0 . The control problem can then be represented formally by minz(·)∈Z,u(·)∈U Ef (a(ω), z(·), u(·))
(1.23a)
s.t. S(a(ω), z(·), u(·)) = 0,
(1.23b)
22
1 Optimal Control Under Stochastic Uncertainty
where f = f (a, z, u) denotes the costs along the trajectory and at the terminal point tf , U is the set of admissible controls, and S = S(a, z, u) is the dynamic input–output operator represented by a differential equation. Supposing that (1.23b) can be solved uniquely for z(·), we have z(·) = z(·, a, u(·)),
(1.23c)
where · refers to the time t. Putting (1.23c) into the objective function f , problem (1.23a) reads minu(·)∈U Ef (a(ω), z(·, a(ω), u(·)), u(·)).
(1.24a)
Having uncertain, hence, random initial values z0 = z0 (ω), the parameter vector reads p(ω) a(ω) = , (1.24b) z0 (ω) where p = p(ω) is the vector of dynamic parameters, load coefficients, and other exterior coefficients. In this case the open-loop control u = u(·, a) = u(·, z0 )
(1.24c)
depends also on the random vector z0 = z0 (ω) of initial values with respect to the current starting time t0 = tb . Then, the stochastic optimization problem (1.24a) reads minu(·)∈U Ef (a(ω), z(·, a(ω), u(·, a(ω))), u(·, a(ω))).
(1.24d)
Since the expectations are defined here by multiple integrals, the objective function of (1.24a,d), resp., can be treated in general by means of approximation methods only. Supposing that the derivatives under consideration exist, in the following the expectation in (1.24a,d), resp., are determined by means of Taylor expansion of the integrand in (1.24a,d), resp., with respect to the parameter vector a at the mean vector a := Ea(ω). For simplification we consider first problem (1.24a), where we assume that the control function u(·) does not depend explicitly on the parameter vector a(ω). Thus, by differentiation of f(a) := f (a, z(·, a, u(·)), u(·))
(1.25)
with respect to a at a = a, we get f(a) :=f (a, z(·, a, u(·)), u(·))
(1.26a)
1.5 Control Problems with Linear and Sublinear Cost Functions
Da f(a)T =∇ f(a)T = Da2 f(a) =
∂f ∂z (z(·, a, u(·)), u(·)) z(·, a, u(·)) ∂z ∂a
23
(1.26b)
∂ 2f ∂z ∂z z(·, a, u(·))T 2 (z(·, a, u(·)), u(·)) z(·, a, u(·)) ∂a ∂a ∂z +
m ∂f ∂ 2 zk (z(·, a, u(·)), u(·)) 2 z(·, a, u(·)). ∂zk ∂a
(1.26c)
k=1
Moreover, the remainder term for the second order Taylor polynomial of f can be approximated by |R2 (a, a)| ≤
a3 3!
ν sup0≤θ≤1 i,j,k=1
∂ 3 f (a + θ a). ∂ai , ∂aj , ∂ak
(1.26d)
1.5 Control Problems with Linear and Sublinear Cost Functions In the following we consider objective functions, cf. (1.5a,b),
F (u(·)) = E
tf
Ldt + G
t0
of the type L = L(t, θ, z, u) := LI (t, θ, z) + LI I (t, θ, u),
(1.27a)
where LI is a linear and/or sublinear [7] cost functions with respect to the state vector z: LI = LI (t, θ, z) := c(t, θ )T z + max ci (t, θ )T z. i∈I
(1.27b)
Here, c = c(t, θ ), ci = ci (t, θ ), i ∈ I, t0 ≤ t ≤ tf , are vectors of cost coefficients depending on a random parameter vector θ = θ (ω) and a finite index set I . In the same way, the terminal cost function is defined by G = G(tf , θ, z(tf )) := c(tf , θ )T z(tf ) + max ci (tf , θ )T z(tf ). i∈I
(1.27c)
24
1 Optimal Control Under Stochastic Uncertainty
Cost functions L, G of this type arise, e.g., in active structural control, see, e.g., q [32, 33], where the state vector z = is composed of the vector q = q(t) q˙ of structural displacements and the vector of velocities q˙ = q(t). ˙ Evaluating the performance of the controlled structure by the compliance, i.e. the work done by the external load w = w(t, θ ) along the trajectory z = z(t), we have q . LI (t, θ, z) = C(t, θ, q) := w(t, θ ) q, z = q˙ T
(1.28)
For the expectation of the linear, sublinear, resp., cost function (1.27b) we get the representation, the lower bound, resp., ELI (t, θ (ω), z(t, θ (ω), u(t, ·))) = Ec(t, θ (ω))T z(t, θ (ω), u(t, ·)) +E max ci (t, θ (ω))T z(t, θ (ω), u(t, ·)) i∈I
(=) ≥ L˜ I (t, θ (ω), z(t, θ (ω), u(t, ·))) := Ec(t, θ (ω))T z(t, θ (ω), u(t, ·)) + max Eci (t, θ (ω))T z(t, θ (ω), u(t, ·)). i∈I
(1.29)
The expectation of the scalar products c(t, θ (ω))T z(t, θ (ω), u(t, ·)) and ci (t, θ (ω))T z(t, θ (ω), u(t, ·)) arising in (1.29) can be obtained approximately by Taylor expansion of c(t, θ ) and z = z(t, θ, u(·)) with respect to θ at a nominal point θ . Using here only expansions up to first order, we get the approximation Ec(t, θ (ω))T z(t, θ (ω), u(t, ·)) T ∂c ∂z ≈ E c(t, θ ) + (t, θ )(θ (ω) − θ ) z(t, θ , u(t, ·)) + (t, θ , u(t, ·))(θ (ω) − θ) ∂θ ∂θ ∂c ∂z = c(t, θ )T z(t, θ, u(t, ·)) + E(θ (ω) − θ )T (t, θ )T (t, θ , u(t, ·))(θ (ω) − θ), ∂θ ∂θ (1.30) assuming that θ := Eθ (ω). According to the results shown in [28], from (1.29), (1.30) we get the following result: Theorem 1.2 By means of first order Taylor expansion the expectation ELI of the linear/sublinear cost function LI can be approximated by L˜ I (t, θ(ω), z(t, θ(ω), u(t, ·))) = c(t, θ )T z(t, θ , u(t, ·)) + tr
∂z ∂c (t, θ )T (t, θ , u(t, ·))cov(θ(·)) ∂θ ∂θ
1.6 Stochastic Optimal Open-Loop Feedback Control of Tracking Systems
∂z ∂ci + max ci (t, θ )T z(t, θ , u(t, ·)) + tr (t, θ )T (t, θ, u(t, ·))cov(θ(·)) . ∂θ ∂θ i∈I
25
(1.31)
Denoting by Rc,1 = Rc,1 (t, θ , θ ), Rz,1 = Rz,1 (t, θ , θ ), u(·), θ = θ −θ , the remainders of the first order Taylor formulas, the error eL1 of the first order Taylor approximation reads eL1 = Rc,1 (t, θ , θ ) + Rz,1 (t, θ , θ ), u(·) ⎛ 2 ν ∂ 2c θ ⎝ (t, a + ϑθ ) ≤ sup 2! ∂ai ∂ak 0≤ϑ≤1 i,k=1
⎞ 2z ν ∂ + sup (t, a + ϑθ ) ⎠ . ∂ai ∂ak 0≤ϑ≤1
(1.32)
i,k=1
Concerning the reminder terms, the following special cases may often occur: Lemma 1.3 (i) For cost functions c(θ matrix B, e.g., B = I , ) = Bθ with a fixed we have Rc,1 = 0. (ii) If c(θ ) = θ T B (1) , . . . , θ T B (n) , then Rc,1 (t, θ , θ ) ≤ θ2 (k) 2 . 1≤k≤n B 2!
1.6 Stochastic Optimal Open-Loop Feedback Control of Tracking Systems According to Sect. 1.2, a stochastic optimal open-loop feedback control (SOLF) is defined, see Definition 1.2, by ϕ ∗ (t, It ) := u∗ t; t, It ,
t ≥ t0 .
Here, u∗ = u∗ (s; t, It ), t ≤ s ≤ tf , is a stochastic optimal open-loop control on each remaining time interval [t, tf ] with an arbitrary starting time point t, t0 ≤ t ≤ tf and the set of information It up to the current time t. Thus, at time tb = t just the first control value u∗ (t) = u∗ (t; t, It ) of u∗ = u∗ (·; t, It ) is used only. Consider now tracking problems with a prescribed reference trajectory q R = R q (t), t0 ≤ t ≤ tf . Moreover, let the feedforward control uR = uR (t), t0 ≤ t ≤ tf , be defined by R R R , q (t), q ˙ , q ¨ (t) = uR (t), t0 ≤ t ≤ tf , F pR D
(1.33a)
where pR D is an estimate or expectation of the stochastic vector of dynamic parameters.
26
1 Optimal Control Under Stochastic Uncertainty
For simplification we consider a PD-regulator. An open-loop control on a remaining interval [tb , tf ] has then the form: u := ϕ(t; tb , zb ), tb ≤ t ≤ tf ,
(1.33b)
where the information set Itb is given here by the state deviation zb between the actual state and the reference state at the initial time point tb of the remaining time interval [tb , tf ], thus, R qb q(tb ) q (tb ) R := z(tb ) = z(tb ) − z (tb ) = zb = ˙ − R . q(t ˙ b) q˙ (tb ) qb
(1.33c) (R)
If we start at time tb with zero errors, i.e. with zb = 0 and pD = pD , then without a control correction u(t) at any time t, tb ≤ t ≤ tf , the system follows the prescribed reference trajectory zR = zR (t), tb ≤ t ≤ tf . Consequently, in the following we consider open-loop controls with the property: ϕ(t; tb , 0) = 0,
tb ≤ t ≤ tf .
(1.34)
In the present case we have then, cf. [28], the following optimal regulator problem under stochastic uncertainty: ⎛ ⎜ min E ⎝
tf
˙ ˙ q(t)T Cq q(t) + q(t)C q˙ q(t)
tb
T ˙ ˙ +ϕ t; tb , q(t), q(t) Cu ϕ t; tb , q(t), q(t) dt
⎞ ⎟ At ⎠ b
(1.35a)
s.t. (R) (t) + q(t), q˙ (R) (t) + q(t), (R) (t) + q(t) ˙ ¨ + p (ω), q q ¨ F p(R) D D ˙ b ) a.s., t ≥ tb , (1.35b) = u(R) (t) + ϕ t; tb , q(tb ), q(t ˙ b ) = q ˙ b (ω) a.s. q(tb ) = qb (ω), q(t
(1.35c)
For the tracking error we obtain the representation: z(t) = z (t, pD , zb ) =
q (t; pD , zb ) ˙ (t; pD , zb ) , t ≥ tb , q
and according to the system dynamics (1.35b,c) it holds
(1.35d)
1.6 Stochastic Optimal Open-Loop Feedback Control of Tracking Systems
z (t, 0, 0) = 0.
27
(1.35e)
Having a stochastic optimal open-loop control ϕ ∗ = ϕ ∗ (t; tb , z(tb )) , tb ≤ t ≤ tf , of (1.35a–c), the stochastic optimal open-loop feedback control reads ϕOLF (t, z(t)) := ϕ ∗ (t; t, z(t)) , t0 ≤ t ≤ tf .
(1.36)
1.6.1 Approximation of the Expected Costs: Expansions of 1st Order According to (1.35a), we have to determine the following conditional expectation: E f (b) (t)|Atb := T E z(t)T Qz(t) + ϕ t; tb , z(tb ) Cu ϕ t; tb , z(tb ) Atb ,
(1.37a)
with Q0 :=
Cq 0 . 0 Cq˙
(1.37b)
For the computation of the expectation in (1.37a) also the open-loop control function ϕ = ϕ t; tb , z(tb ) is approximated by means of Taylor expansion at zb = 0. Because of ϕ(t; tb , 0) = 0, see (1.34), we get ϕ(t; tb , zb ) = Dz(tb ) ϕ(t; tb , 0)zb + . . .
(1.38)
with the unknown Jacobian Dz(tb ) ϕ(t; tb , 0) to be determined. Using first order Taylor approximation of the open-loop control ϕ = ϕ(t; tb , zb ), we find, cf. (1.37a) E f (b) (t)|Atb ≈ E f˜(b) |Atb ,
(1.39a)
where E f˜(b) |Ab := E z(t)T Q0 z(t) +zb T Dz(tb ) ϕ(t; tb , 0)T Cu Dz(tb ) ϕ(t; tb , 0)zb Atb . Define now the stochastic parameter vector
(1.39b)
28
1 Optimal Control Under Stochastic Uncertainty
pD (ω) . a(ω) = zb (ω)
(1.40a)
Because of (1.35e), by 1st order Taylor expansion of the tracking error we get the approximation: z(t, a) ≈ Da z (t, 0) a, t ≥ tb .
(1.40b)
Inserting (1.40b) into (1.39a,b), for E f (b) (t)|Ab we have the approximation: E f (b) (t)|Atb ≈ E a T Da z (t, 0)T Q0 Da z (t, 0) a +zb
T
Dzb ϕ(t; tb , 0) Cu Dzb ϕ(t; tb , 0)zb Atb , tb ≤ t ≤ tf . (1.41) T
From (1.41) and [28] we obtain now the following result: Theorem 1.3 The integrand of the objective function of the optimal regulator problem under stochastic uncertainty and open-loop PD-control can be approximated by E f˜(b) Atb = tr Da z (t, 0)T Q0 Da z (t, 0) E (b) a(·)a(·)T + tr Dzb ϕ(t; tb , 0)T Cu Dzb ϕ(t; tb , 0)E (b) zb (·)zb (·)T , t ≥ tb ,
(1.42)
where E (b) a(·)a(·)T is the symmetric matrix of the mixed second order moments of the random vector a(·). While the Jacobian Dzb ϕ(t; tb , 0) contains the parameters for the approximate optimal open-loop control ϕ ∗ = ϕ ∗ t; tb , z(tb ) , t ≥ tb , to be determined by the optimal regulator problem under stochastic uncertainty, according to (1.42) we still have to derive equations for the Jacobian Da z (t, a). Taking the derivative of the initial value problem (1.35a–c) with respect to the parameter vector a at a = 0, for the derivative Da q (t, a) of the tracking error q (t, a) at a = 0 we get the initial value problem R ˙ (t, 0) + M R (t)Da q ¨ (t, 0) Y (t), 0, 0 + K R (t)Da q (t, 0) + D R (t)Da q = 0, Dzb ϕ(t; tb , 0) , t ≥ tb , (1.43a) 0I 0 Da z (tb , 0) = . (1.43b) 00I In (1.43a,b) the Jacobians M R , K R , D R , Y R of the vector function F are defined, cf. [28],
1.6 Stochastic Optimal Open-Loop Feedback Control of Tracking Systems
29
R R R K R (t) := Dq F (pR D , q (t), q˙ (t), q¨ (t)),
(1.44a)
R R R D R (t) := Dq˙ F (pR D , q (t), q˙ (t), q¨ (t)),
(1.44b)
R R R Y R (t) := DpD F (pR D , q (t), q˙ (t), q¨ (t)),
(1.44c)
R R R M R (t) := Dq¨ F (pR D , q (t), q˙ (t), q¨ (t)).
(1.44d)
Using (1.35d), the second order linear initial value problem (1.43a,b) for the tracking error can be converted into a first order linear initial value problem: Theorem 1.4 The Jacobian of the tracking error and its time derivative z(t) = z (t, pD , zb ) with respect to a = (pD T , zb T )T at a = 0 fulfills the first order initial value problem: 0 I Da z (t, 0, 0) −M R (t)−1 K R (t) −M R (t)−1 D R (t) 0 0 0 + , t ≥ tb , (1.45a) −M R (t)−1 Y R (t) −M R (t)−1 Dzb ϕ(t; tb , 0) 0I 0 Da z (tb , 0, 0) = . (1.45b) 00I
d dt Da z (t, 0, 0)
=
Based on the above shown results Theorems 1.3 and 1.4 we get this result: Theorem 1.5 Applying open-loop control under stochastic uncertainty yields the following approximate optimal regulator problem: min
tf
tr Da z (t, 0)T Q0 Da z (t, 0) E (b) a(·)a(·)T
t0
+ tr Dzb ϕ(t; tb , 0)T Cu Dzb ϕ(t; tb , 0)E (b) zb (·)zb (·)T dt
(1.46a)
s.t. 0 I Da z (t, 0, 0) = −M R (t)−1 K R (t) −M R (t)−1 D R (t) 0 0 0 + (1.46b) , t ≥ tb , −M R (t)−1 Y R (t) −M R (t)−1 Dzb ϕ(t; tb , 0) 0I 0 Da z (tb , 0, 0) = . (1.46c) 00I
d dt Da z (t, 0, 0)
Remark 1.7 Comparison of Optimal open-loop control—Optimal feedback control under stochastic uncertainty
30
1 Optimal Control Under Stochastic Uncertainty
(i) Properties of the optimal control problems: According to (1.46a–c), for the computation of the optimal gain matrix Dzb ϕ(t; tb , 0) of the optimal openloop control we have a deterministic control problem with a quadratic objective function and linear constraints. Comparing the present results for the optimal open-loop control under stochastic uncertainty with the corresponding results in Chap. 2, Theorem 2.2, we see that in case of optimal feedback control under stochastic uncertainty the objective function is also quadratic in the unknown gain matrices Kd , (Ki ), Kp , resp., to be determined, but the perturbation differential equation for the determination of the sensitivities depends in a nonlinear way on the gain matrices Kd , (Ki ), Kp , respectively. (ii) Stability properties of the optimal controls: In the feedback control case the control system can be stabilized in easy way, see Chap. 2, Lemma 2.1, by an appropriate selection of the diagonal parameter matrices Kd , (Ki ), Kp . On the other hand, in the open-loop case the stability of the resulting control system depends on the properties of the mass matrix M R (t), the damping matrix D R (t) , and the stiffness matrix K R (t).
1.6.2 Approximate Computation of the Fundamental Matrix As can be seen from (1.46a–c), the system matrix A = A(t), t0 ≤ t ≤ tf of the linear dynamic equation (1.46b) depends on the matrices M R (t), D R (t), K R (t), hence, A = A(t) is in general a function of time t. Consequently, the fundamental matrix S = S(t) related to A = A(t) cannot be determined explicitly in general. However, due to present concept of open-loop feedback control, see (1.2), the openloop control function u := ϕ(t; tb , zb ), tb ≤ t ≤ tf , has to be determined on relatively small intervals [t0 , tf ] = [tb , tb + δ], b = 0, 1, . . .. Hence, the homogeneous linear differential equation d Da z (t, 0, 0) = A(t)Da z (t, 0, 0) , t0 ≤ t dt
(1.47a)
related to (1.46b) can be approximated on [t0 , tf ] = [tb , tb + δ] by d Da z (t, 0, 0) = A(tb )Da z (t, 0, 0) , tb ≤ t ≤ tb + δ. dt
(1.47b)
Thus, the fundamental matrix S = S(t) can be approximated on [t0 , tf ] = [tb , tb +δ] by S(t) ≈ eA(tb )(t−tb ) , tb ≤ t ≤ tb + δ.
(1.47c)
References
31
References 1. Allgöwer, F.E.A. (ed.): Nonlinear Model Predictive Control. Birkhäuser Verlag, Basel (2000) 2. Aoki, M.: Optimization of Stochastic Systems – Topics in Discrete-Time Systems. Academic Press, New York (1967) 3. Åström, K.J.: Introduction to Stochastic Control Theory. Elsevier, Amsterdam (1970) 4. Bauer, H.: Wahrscheinlichkeitstheorie und Grundzüge der Masstheorie. Walter de Gruyter & Co., Berlin (1968) 5. Bunke, H.: Gewöhnliche Differentialgleichungen mit zufälligen Parametern. AkademieVerlag, Berlin (1972) 6. Carathéodory, C.: Vorlesungen über reelle Funktionen. Teubner, Leipzig (1918) 7. Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems. American Elsevier, Publishing Company, Inc., New York (1970) 8. Dieudonné, J.: Foundations of Modern Analysis. Academic Press, New York (1969) 9. Dreyfus, S.: Some types of optimal control of stochastic systems. J. SIAM Control 2(1), 120– 134 (1964) 10. Dreyfus, S.: Dynamic Programming and the Calculus of Variations. Academic Press, New York (1965) 11. Dreyfus, S.E., Law, A.M.: The Art of Dynamic Programming. Academic Press, New York (1977) 12. Dullerud, G., Paganini, F.: A Course in Robust Control Theory. Springer, New York (2000) 13. Findeisen, R., et al.: Sampled–Data Nonlinear Model Predictive Control for Constrained Continuous Time Systems, pp. 207–235. Springer, Berlin (2007) 14. Garcia, C.E., et al.: Model predictive control: theory and practice - a survey. Automatica 25(3), 335–348 (1989). https://doi.org/10.1016/0005-1098(89)90002-2 15. Gessing, R., Jacobs, O.L.R.: On the equivalence between optimal stochastic control and open-loop feedback control. Int. J. Control. 40(1), 193–200 (1984). https://doi.org/10.1080/ 00207178408933267 16. Kalman, R., et al.: Topics in Mathematical System Theory. McGraw-Hill Book Company, New York (1969) 17. Ku, R., Athans, M.: On the adaptive control of linear systems using the open-loop feedback optimal approach. IEEE Trans. Autom. Control 18, 489–493 (1973) 18. Luenberger, D.: Optimization by Vector Space Methods. J. Wiley, New York (1969) 19. Marti, K.: Convex approximations of stochastic optimization problems. Math. Meth. Oper. Res. 20, 66–76 (1975) 20. Marti, K.: Approximationen Stochastischer Optimierungsprobleme. Hain, Konigstein/Ts (1979) 21. Marti, K.: Stochastic optimization methods in robust adaptive control of robots. In: Groetschel, M.E.A. (ed.) Online Optimization of Large Scale Systems, pp. 545–577. Springer, Berlin (2001) 22. Marti, K.: Adaptive Optimal Stochastic Trajectory Planning and Control (AOSTPC) for Robots, pp. 155–206. Springer, Berlin (2004) 23. Marti, K.: Stochastic nonlinear model predictive control (SNMPC). In: 79th Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM), Bremen 2008, PAMM, vol. 8, Issue 1, pp. 10775–10776. Wiley-VCH, Weinheim (2008) 24. Marti, K.: Stochastic Optimization Methods, 2nd edn. Springer, Berlin (2008). https://doi.org/ 10.1007/978-3-540-79458-5 25. Marti, K.: Continuous-Time Control Under Stochastic Uncertainty. J. Wiley, Hoboken (2010). https://doi.org/10.1002/9780470400531.eorms0839 26. Marti, K.: Optimal control of dynamical systems and structures under stochastic uncertainty: stochastic optimal feedback control. Adv. Eng. Softw. 46, 43–62 (2012). https://doi.org/10. 1016/j.advengsoft.2010.09.008
32
1 Optimal Control Under Stochastic Uncertainty
27. Marti, K.: Stochastic optimal structural control: stochastic optimal open-loop feedback control. Adv. Eng. Softw. 44(1), 26–34 (2012). https://doi.org/10.1016/j.advengsoft.2011.05.040 28. Marti, K.: Stochastic Optimization Methods: Applications in Engineering and Operations Research, 3rd edn. Springer, Berlin (2015). https://doi.org/10.1007/978-3-662-46214-0 29. Richalet, J., et al.: Model predictive heuristic control: applications to industrial processes. Automatica 14, 413–428 (1978). https://doi.org/10.1016/0005-1098(78)90001-8 30. Schacher, M.: Stochastisch Optimale Regelung von Robotern. No. 1200 in Fortschritt-Berichte VDI, Reihe 8, Mess-, Steuerungs- und Regelungstechnik. VDI Verlag GmbH, Düsseldorf (2011) 31. Smirnov, W.: Lehrgang der Höheren Mathematik, Teil IV. Deutscher Verlag der Wissenschaft, Berlin (1966) 32. Soong, T.: Active structural control in civil engineering. Eng. Struct. 10, 74–84 (1988) 33. Soong, T.: Active Structural Control: Theory and Practice. John Wiley, New York (1990) 34. Stengel, R.: Stochastic Optimal Control: Theory and Application. J. Wiley, New York (1986) 35. Tse, E., Athans, M.: Adaptive stochastic control for a class of linear systems. IEEE Trans. Autom. Control 17(1), 38–52 (1972) 36. Walter, W.: Gewöhnliche Differentialgleichungen. Springer, Berlin (2000)
Chapter 2
Stochastic Optimization of Regulators
2.1 Introduction The optimal design of regulators is often based on the use of given, fixed nominal values of initial conditions, load, and other model parameters. However, due to variations of the material properties, measurement errors (e.g. in case of parameter identification), modeling errors (complexity of real systems), uncertainty on the working environment, the task to be executed, etc., the true initial conditions, external load, and further model parameters, like sizing parameters, mass values, gravity centers, moments of inertia, friction, tolerances, adjustment setting error, etc., are not known exactly in practice. Hence, a predetermined (optimal) regulator should be robust, i.e. the controller should guarantee satisfying results also in case of variations of the initial conditions, load, and other model parameters. Robust controls have been considered up to now mainly for uncertainty models based on given fixed sets of parameters, like multiple intervals, assumed to contain the unknown, true parameter. In this case one requires then often that the controlled system fulfills certain properties, as, e.g., certain stability properties for all parameter vectors in the given parameter domain. If the required property can be described by a scalar criterion, then the controller design is based on a minimax criterion, such as the H ∞ -criterion, see, e.g., [1, 4, 9, 10]. Since in many cases parameter uncertainty can be modeled more adequately by means of stochastic parameter models, in the following we suppose that the parameters involved in the regulator design problem are realizations of a random vector having a known or at least partly known joint probability distribution. The determination of an optimal controller under uncertainty with respect to model parameters, working neighborhood, modeling assumptions, etc. is a decision theoretical problem. Criteria of the type “holds for all parameter vectors in a given set” and the minmax-criterion are very pessimistic and often too strong. Indeed, © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_2
33
34
2 Stochastic Optimization of Regulators
in many cases the available a priori and empirical information on the dynamic system and its working neighborhood allows a more adequate, flexible description of the uncertainty situation by means of stochastic approaches. Thus, it is often more appropriate to model unknown and varying initial values, external loads, and other model parameters as well as modeling errors, e.g. incomplete representation of the dynamic system, by means of realizations of a random vector, a random function with a given or at least partly known probability distribution. Consequently, the optimal design of robust regulators is based on an optimization problem under stochastic uncertainty [2, 13, 14, 17]. Stochastic Optimal Design of Regulator For the consideration of stochastic parameter variations as well as other uncertainties within the optimal design process of a regulator one has to introduce—as for any other optimization problem under (stochastic) uncertainty—an appropriate deterministic substitute problem. In the present case of stochastic optimal design of a regulator, hence, a map from the state or observation space into the space of control corrections, one has a control problem under stochastic uncertainty. Then, for the solution of the occurring deterministic substitute problems the methods of stochastic optimization, cf. [17], are available. As is well known [3, 20], the optimization of a regulator presupposes an optimal reference trajectory q R (t) and a corresponding feedforward control uR (t). In case of stochastic uncertainties, the R R guiding functions q (t), u (t) can be determined also by stochastic optimization methods [3, 15, 21]. The computation of stochastic optimal regulators is based now on deterministic substitute control problems of the following type: Minimize the expected total costs composed of (i) the costs arising from the deviation z(t) between the (stochastic optimal) reference trajectory and the effective trajectory of the dynamic system and (ii) the costs for the control corrections u(t) Subject to the following constraints: – dynamic equation of the underlying stochastic system with the total control input u(t) = uR (t) + u(t) being the sum of the feedforward control uR (t) and the control correction u(t) = ϕ(t, z(t)), – stochastic initial conditions q(t0 ) = q0 (ω), q(t ˙ 0 ) = q˙0 (ω) for the state of the system at the starting time point t0 , – conditions for the feedback law ϕ = ϕ(t, z(t)), such as ϕ(t, 0) = 0 (if the effective state is equal to the state prescribed by the reference trajectory, then no control correction is needed). Here, as often in practice, we use quadratic cost functions. The resulting deterministic substitute problem can be interpreted again as a control problem for the unknown feedback control law u(t) = ϕ(t, z(t)). A main problem is then the computation of the (conditional) expectation arising in the objective function. Since the expectations are defined by multiple integrals, the expectations can be determined in general only approximately. In the following, approximations based on Taylor expansions with respect to the stochastic parameter vector a(ω) at its
2.2 Regulator Design Under Stochastic Uncertainty
35
conditional mean a are taken into account. Using quadratic cost functions and first order Taylor expansions, hence, linearization, one has the advantage that often a certain part of the calculations can be done analytically! An Example: Active Control Under Stochastic Uncertainty As a typical application of the topic treated in this work, in Sect. 2.6 we present an illustrative example from active control under stochastic uncertainty: In order to stabilize mechanical structures under dynamic applied loads, active control strategies are taken into account [5, 22, 23]. The structures usually are stationary, safe, and stable without external dynamic disturbances, such as strong earthquakes, wind turbulences, water waves, etc. Thus, in case of dynamic disturbances, additional control elements can be installed enabling active control actions. Active control strategies for mechanical structures are applied in order to counteract heavy applied dynamic loads, such as earthquakes, wind, water waves, etc., which would lead to large vibrations causing possible damages of the structure. Modeling the structural dynamics by means of a system of first order random differential equations for the state vector (displacement vector q and time derivative of q), robust optimal controls are determined in order to cope with the stochastic uncertainty involved in the dynamic parameters, the initial values, and the applied loadings. The problem is modeled in the framework of stochastic optimal control for minimizing the expected total costs arising from the tracking error (deviation from the reference trajectory) of the structure and the regulation costs.
2.2 Regulator Design Under Stochastic Uncertainty Feedforward and feedback control are based on the dynamic equation of the underlying control system ˙ q(t)) ¨ = u(t), t ≥ t0 F (pD , q(t), q(t), q(t0 ) = q0 ,
q(t ˙ 0 ) = q˙0 .
(2.1a) (2.1b)
Here, q = q(t) denotes the vector of configuration variables, and u = u(t) is the control or input vector. Moreover, the state trajectory of the system z(t) :=
q(t) , t0 ≤ t ≤ t f q(t) ˙
(2.1c)
is the solution of the dynamic equation (2.1a,b) related to (i) the initial state z0 := q0 , (ii) the control or input vector u = u(t), and (iii) the vector of dynamic q˙0 parameters pD .
36
2 Stochastic Optimization of Regulators
For the control of a dynamic system one has [11, 14, 15] the following procedure: The control function u = u(t), t0 ≤ t ≤ tf , is represented by the equation u(t) := uR (t) + u(t), t0 ≤ t ≤ tf .
(2.2a)
Here, uR = uR (t), t0 ≤ t ≤ tf , denotes the feedforward control, and u = u(t), t0 ≤ t ≤ tf , is a control correction, represented by a feedback control. In a preceding planning phase, the feedforward control uR = uR (t), t0 ≤ t ≤ tf , is selected (tracking problem) such that the effective trajectory z = z(t), t0 ≤ t ≤ tf , follows in the mean a given (optimal) reference trajectory q R = q R (t), t0 ≤ t ≤ tf . Hence, uR (·) is defined by uR (t) := F (pD , q R (t), q˙ R (t), q¨ R (t)), t0 ≤ t ≤ tf ,
(2.2b)
where pD := E pD (ω)|At0
(2.2c)
denotes the conditional expectation with respect to the information available at time t0 . For the correction (compensation) of the tracking error z(t) := z(t) − zR (t) =
R q(t) q(t) q (t) = , t0 ≤ t ≤ tf , − R q˙ (t) q(t) ˙ q(t) ˙
(2.3)
q(t) hence, the deviation between the effective state z(t) = and the reference q(t) ˙ R q (t) one considers the PID-regulator trajectory zR (t) = q˙ R (t) u(t) := ϕ(t, q(t), qI (t), q(t)), ˙ t0 ≤ t ≤ t f ,
(2.4a)
where
t qI (t) :=
q(τ ) dτ, t0 ≤ t ≤ tf , t0
denotes the integrated position, and
t qI (t) :=
q(τ ) dτ = t0
t t0
q(τ ) − q R (τ ) dτ
(2.4b)
2.2 Regulator Design Under Stochastic Uncertainty
t =
t q(τ ) dτ −
t0
37
q R (τ ) dτ = qI (t) − qIR (t)
(2.4c)
t0
is then the deviation of the integrated position. Moreover, ϕ = ϕ(t, zI ) = ϕ(t, q, qI (t), q) ˙
(2.4d)
is the feedback function (feedback law) to be determined. In case of a zero tracking error, then, without further information about the control system, no control correction is needed, thus, ϕ(t, 0) = 0, t0 ≤ t ≤ tf .
(2.4e)
Working here with PID-regulators, the state z = z(t), and the deviation z(t) of the state from the reference state, resp., is replaced by the generalized state, the generalized tracking error zI (t), resp., ⎛
⎛ ⎞ ⎞ q(t) q(t) zI (t) := ⎝qI (t)⎠ , zI (t) = ⎝qI (t)⎠ q(t) ˙ q(t) ˙ ⎛ ⎞ q(t) − q R (t) := zI (t) − zIR (t) = ⎝qI (t) − qIR (t)⎠ , t0 ≤ t ≤ tf . q(t) ˙ − q˙ R (t)
(2.4f)
q(t) In former papers, cf. [14, 21], we assumed that the state z(t) = can be q(t) ˙ observed or measured exactly. Since in practice one has always some measurement error/noise, see, e.g., [6], we suppose now, cf. [16], that only a certain estimate ⎛
⎞ q(t) ˆ zˆ I (t) := ⎝qˆI (t)⎠ , t0 ≤ t ≤ tf , ˆ˙ q(t)
(2.5a)
of the effective generalized state zI (t) is available, can be observed. Having then the estimated/observed generalized tracking error ⎛
⎞ q(t) ˆ − q R (t) ˆ I (t) := zˆ I (t) − zIR (t) = ⎝qˆI (t) − q R (t)⎠ , t0 ≤ t ≤ tf , z I ˆ˙ − q˙ R (t) q(t) the control correction u(t) is defined now (PID control), cf. (2.4a), by
(2.5b)
38
2 Stochastic Optimization of Regulators
model parameters, initial values
z R (t)
OSTP
uR (t)
u(t) +
⎞ ⎛ q(t) zI (t) = ⎝ qI (t) ⎠ q(t) ˙
dynamic system
Δu(t) +
feedback
ˆ I (t) Δz
ezI (t)
−
observation error
zIR (t)
Fig. 2.1 Regulation under observation error
ˆ I (t)) = ϕ(t, zˆ I (t) − zIR (t)) u(t) := ϕ(t, z ˆ˙ − q˙ R (t)), t0 ≤ t ≤ tf . = ϕ(t, q(t) ˆ − q R (t), qˆI (t) − qIR (t), q(t)
(2.5c)
For the further analysis of the problem, the estimate/observation zˆ I (t) of the effective generalized state zI (t) is represented (Fig. 2.1) as follows: ⎛
⎞ eq (t) zˆ I (t) = zI (t) + ezI (t), ezI (t) = ⎝eqI (t)⎠ , t0 ≤ t ≤ tf . eq˙ (t)
(2.6a)
Here, eq = eq (t, ω), eqI = eqI (t, ω), eq˙ = eq˙ (t, ω) denotes the possibly timedependent random estimation, observation or measurement error, resp., arising at the determination of q(t), qI (t), and q(t). ˙ Considering additive and multiplicative errors [6, 16, 26], the random error vector ezI (t) is defined by ⎛
⎞ eq (t, ω) ezI (t, ω) = ⎝eqI (t, ω)⎠ := e(t, ω) + E(t, ω)zI (t, ω). eq˙ (t, ω)
(2.6b)
Here, with a time-dependent matrix = (t) and a corresponding random parameter vector e0 = e0 (ω), the additive errors e(t, ω) are represented in the following parametric form: ⎛
⎞ q (t)eq0 (ω) e(t, ω) = (t)e0 (ω) = ⎝qI (t)eqI 0 (ω)⎠ q˙ (t)eq0 ˙ (ω)
(2.6c)
2.2 Regulator Design Under Stochastic Uncertainty
39
with submatrices q (t), qI (t), q˙ (t) of (t) and subvectors eq0 (ω), eqI 0 (ω), eq0 ˙ (ω) of e0 (ω). Assuming that the estimate at each time point t is correct in the conditional mean, for the random vector e0 (ω) we have the condition: E e0 (ω)|At = 0,
(2.6d)
where At denotes the σ -algebra of information (events) available up to the time point t. Moreover, the block-diagonal matrix function of multiplicative errors ⎛ ⎞ Eq (t, ω) 0 0 E = E(t, ω) = ⎝ 0 0 ⎠ EqI (t, ω) 0 0 Eq˙ (t, ω)
(2.6e)
is composed of the diagonal blocks Eq (t, ω) = q (t)H (eqm (ω))
(2.6f)
EqI (t, ω) = qI (t)H (eqI m (ω))
(2.6g)
Eq˙ (t, ω) = q˙ (t)H (eqm ˙ (ω))
(2.6h)
with the matrices q (t), qI (t), q˙ (t) and the matrices H (eqm ), H (eqI m ), H (eqm ˙ ) depending linearly on the random vectors eqm (ω), qI m (ω), eqm (ω), where, cf. ˙ (2.6d), E eqm (ω)|At = E qI m (ω)|At = E eqm ˙ (ω)|At = 0.
(2.6i)
Additive Measurement Errors Only In this part we suppose that we only have additive errors, hence, E(t, ω) = 0, cf. (2.6b), and therefore ezI (t, ω) = e(t, ω).
(2.7)
Putting the representation (2.6a,c), (2.7) of zˆ I (t) into formula (2.5c), for the control correction then we get u(t) = ϕ(t, zˆ I (t) − zIR (t)) = ϕ(t, zI (t) + ezI (t, ω)) = ϕ(t, zI (t) + (t)e0 (ω)),
(2.8a)
where the effective tracking error zI (t) is defined, cf. (2.4f), by zI (t) := zI (t) − zIR (t).
(2.8b)
40
2 Stochastic Optimization of Regulators
Using the control correction given by (2.5c), corresponding to the estimated q(t) tracking error (2.5b), the effective state z = z(t) = , t0 ≤ t ≤ tf , and q(t) ˙ the related effective generalized state zI = zI (t), t0 ≤ t ≤ tf , of the control system is the solution of the following system of second order differential equations: ˙ q(t)) ¨ = u(0) (t) + u(t) = u(0) (t) F (pD , q(t), q(t), + ϕ(t, zI (t) + (t)e0 (ω))), t0 ≤ t ≤ tf
(2.9a)
q(t0 ) = q0 , q(t ˙ 0 ) = q˙0 .
(2.9b)
According to the above consideration, the solution q = q(t) of (2.9a,b) depends on the following quantities: q(t) = q(t, q0 , q˙0 , pD , e0 (.) ), t0 ≤ t ≤ tf .
(2.10)
initial dynamic parameter values param. obs. error
Additive and Multiplicative Measurement Errors Putting the representation (2.6a–i) of zˆ I (t) into formula (2.5c) for the control correction, then we get u(t) = ϕ(t, zˆ I (t) − zIR (t)) = ϕ(t, zI (t) + ezI (t) − zIR (t)) = ϕ(t, (I + E(t, ω))zI (t) + E(t, ω)zIR (t) + (t)e0 (ω)),
(2.11)
where zI (t) := zI (t) − zIR (t) is the effective tracking error, cf. (2.8b). Using the control correction given by (2.11), corresponding to the estimated q(t) tracking error (2.5b), the effective state z = z(t) = , t0 ≤ t ≤ tf , and q(t) ˙ the related effective generalized state zI = zI (t), t0 ≤ t ≤ tf , of the control system is the solution of a second order initial value problem as in the above additive case of measurement errors, see (2.9a,b). For simplification of the presentation, in the following we consider additive measurement errors only.
2.3 Optimal Feedback Functions Under Stochastic Uncertainty According to (2.5a–c) and (2.6c), (2.7), (2.9a,b), in case of observation errors, the arguments of the feedback function ϕ are given as follows: ˆ I (t))) ϕ =ϕ(t, z :=ϕ t, q (t)eq0 (ω) + q(t), qI (t)eqI 0 (ω) + qI (t), q˙ (t)eq0 ˙ . ˙ (ω) + q(t)
(2.12)
2.3 Optimal Feedback Functions Under Stochastic Uncertainty
41
Due to (2.12), for the optimal selection of a PID-regulator ϕ ∗ = ϕ ∗ (t, q, qI , q) ˙ in case of stochastic uncertainty we have the following optimization problem [17]: Definition 2.1 A stochastic optimal feedback control law ϕ ∗ = ϕ ∗ (t, zI (t)) is a solution of the stochastic optimization problem: tf c t, q(t), qI (t), q(t) min E ˙ t0
+γ t, ϕ (t, q(t), qI (t), q(t); ˙ e0 (ω) dt At0
(2.13a)
s.t. F (pD (ω), q(t), q(t), ˙ q(t)) ¨ = uR (t)
+ϕ t, q (t)eq0 (ω) + q(t), qI (t)eqI 0 (ω) + qI (t), q˙ (t)eq0 ˙ , ˙ (ω) + q(t) t0 ≤ t ≤ tf , a.s.
(2.13b)
q(t0 , ω) = q0 (ω),
(2.13c)
q(t ˙ 0 , ω) = q˙0 (ω),
(2.13d)
ϕ(t, 0) = 0.
(2.13e)
Here, the term c t, q(t), qI (t), q(t) ˙ in the objective function (2.13a) describes the costs resulting from the tracking error, and the costs for the control corrections are represented by γ t, u(t) = γ t, ϕ(t, ·, ·, ·) . Possible loss functions are convex quadratic functions, different norms, and—more general— sublinear functionals [7]. Having the stochastic optimal feedback control ϕ ∗ = ϕ ∗ (t, q, qI , q), ˙ the effective trajectory q = q(t), t ≥ t0 , of the stochastic optimally controlled dynamic system is then given by the solution of the initial value problem (2.13b–d). In the following we use the definitions: pD (ω) := pD (ω) − pD = pD (ω) − E pD |At0 q0 := q(t0 ) − q R (t0 ) = q0 − E q0 |At0 , q˙0 := q(t ˙ 0 ) − q˙ R (t0 ) = q˙0 − E q˙0 |At0 ,
(2.14a) (2.14b) (2.14c)
vector where pD (ω) denotes the deviation of the effective dynamic parameter pD = pD (ω) from its nominal value or mean pD . Here, pD := E pD (ω)|At0 denotes the (conditional) expectation of the random vector pD = pD (ω). The constraints of the regulator optimization problem (2.13a–e) can be represented then also by
42
2 Stochastic Optimization of Regulators
F (pD + pD (ω), q R (t) + q(t), q˙ R (t) + q(t), ˙ q¨ R (t) + q(t)) ¨ = uR (t) +ϕ t, q (t)eq0 (ω) + q(t), qI (t)eqI 0 (ω) + qI (t), q˙ (t)eq0 ˙ , ˙ (ω) + q(t) t0 ≤ t ≤ tf , a.s.,
(2.15a)
q(t0 , ω) = q0 (ω),
(2.15b)
q(t ˙ 0 , ω) = q˙0 (ω),
(2.15c)
ϕ(t, 0) = 0.
(2.15d)
Remark 2.1 For given control law ϕ = ϕ(t, q(t), qI (t), q(t)), ˙ the solution of the initial value problem (2.15a–c) yields the position error function q := q(t, a(ω)), t0 ≤ t ≤ tf ,
(2.16a)
depending on the random parameter vector T a = a(ω) := pD (ω)T , q0 (ω)T , q˙0 (ω)T , e0 (ω)T .
(2.16b)
2.3.1 Quadratic Cost Functions In the following we consider quadratic cost functions. Hence, with symmetric, positive (semi-) definite matrices Cq , CqI , Cq˙ , the costs c(t, ·, ·) arising from the tracking error are defined by ˙ c t, q(t), qI (t), q(t) := q(t)T Cq q(t) + qI (t)T CqI qI (t) + q(t) ˙ T Cq˙ q(t). ˙
(2.17a)
Moreover, with a positive (semi-) definite matrix Cu the regulator costs γ (t, ·, ·; ·, ·) are defined by γ t, ϕ(t, zI (t) + (t)e0 (ω)) := u(t)T Cu u(t) = ϕ(t, zI (t) + (t)e0 (ω))T Cu ϕ(t, zI (t) + (t)e0 (ω)).
(2.17b)
The total cost function is then defined by f (t) := q(t)T Cq q(t) + qI (t)T CqI qI (t) +q(t) ˙ T Cq˙ q(t) ˙ + u(t)T Cu u(t).
(2.18)
Thus, for the objective function of the regulator optimization problem (2.13a-e) we obtain
2.3 Optimal Feedback Functions Under Stochastic Uncertainty
tf
tf f (t) dt At0 = E f (t)At0 dt
t0
t0
E
43
tf = E q(t)T Cq q(t) + qI (t)T CqI qI (t) + q(t) ˙ T Cq˙ q(t) ˙ t0
+ ϕ(t, zI (t) + (t)e0 (ω))T Cu ϕ(t, zI (t) + (t)e0 (ω))At0 dt.
2.3.1.1
(2.19)
Computation of the Expectation by Taylor Expansion
We use now Taylor expansion method to compute approximatively the arising conditional expectations arising above. Thus, by linearization of the feedback function ϕ at zero tracking error and zero observation error, hence, zI = 0 and e0 (ω) = 0, we get ϕ(t, zI (t) + (t)e0 (ω)) ≈ ϕL (t, zI (t) + (t)e0 (ω)) := Dq ϕ(t, 0) q (t)eq0 (ω) + q(t) + DqI ϕ(t, 0) qI (t)eqI 0 (ω) + qI (t) ˙ = DzI ϕ(t, 0) zI (t) + (t)e0 (ω) , + Dq˙ ϕ(t, 0) q˙ (t)eq0 ˙ (ω) + q(t) (2.20a) since ϕ(t, 0) = 0, cf. (2.4e). Here, we have the gain matrices, represented by the Jacobians, see, e.g., [8], Dq ϕ(t, 0), DqI ϕ(t, 0), Dq˙ ϕ(t, 0),
(2.20b)
˙ resp., at (t, 0). of ϕ with respect to q, qI , q, Thus, the unknown feedback function is replaced, cf. Definition 2.1, in the optimal regulator problem under stochastic uncertainty (2.13a–e) by the total gain matrix: Definition 2.2 Linearizing the feedback function ϕ = ϕ(t, q, qI , q) ˙ at zI = 0, the unknowns of the stochastic optimal regulator problem (2.13a-g) are represented by the total gain matrix DzI ϕ(t, 0) = Dq ϕ(t, 0), DqI ϕ(t, 0), Dq˙ ϕ(t, 0) ,
(2.20c)
˙ resp., at (t, 0). composed of the Jacobians of ϕ with respect to q, qI , q, Taking into account the linearization (2.20a–c) of ϕ, for the expected total cost function we get now the approximation:
44
2 Stochastic Optimization of Regulators
E f At0 ≈ E f˜At0 ,
(2.21)
where ˜ E f |At0 := E q(t)T Cq q(t) + qI (t)T CqI qI (t) + q(t) ˙ T Cq˙ q(t) ˙ T + DzI ϕ(t, 0) zI (t) + (t)e0 (ω) × Cu
DzI ϕ(t, 0) zI (t) + (t)e0 (ω) At0 .
(2.22)
Defining the positive (semi-)definite matrices Qϕ (t) := DzI ϕ(t, 0)T Cu DzI ϕ(t, 0), ⎛ ⎞ Cq 0 0 Q0 := ⎝ 0 CqI 0 ⎠ , 0 0 Cq˙
(2.23a) (2.23b)
according to (2.22) we find f˜ := zI (t)T (Q0 + Qϕ (t))zI (t) + 2e0T (t)T Qϕ (t)zI (t) + e0T (t)T Qϕ (t)(t)e0 .
(2.24)
Since Cq , CqI , Cq˙ , and Cu are positive (semi-)definite matrices, we have f˜ ≥ 0. In order to determine the expectation of f˜, in addition to the Taylor approxima tion of the feedback law ϕ = ϕ t, q(t), qI (t), q(t) ˙ , we need also the Taylor expansion of zI = zI (t, ω) = zI (t, a(ω))
(2.25)
with respect to the parameter vector a, cf. (2.16b), at zero deviation and zero observation error. Because of Eqs. (2.2b,c) and conditions (2.15a–c) we have q(t, 0) = 0, t ≥ t0 ,
(2.26a)
q(t, ˙ 0) = 0, t ≥ t0 .
(2.26b)
Hence, the tracking error vector can be approximated as follows: q(t) ≈ DpD q(t, 0)pD (ω) + Dq0 q(t, 0)q0 (ω) + Dq˙0 q(t, 0)q˙0 (ω) + De0 q(t, 0)e0 (t, ω),
(2.27a)
2.3 Optimal Feedback Functions Under Stochastic Uncertainty
45
q(t) ˙ ≈ DpD q(t, ˙ 0)pD (ω) + Dq0 q(t, ˙ 0)q0 (ω) + Dq˙0 q(t, ˙ 0)q˙0 (ω) + De0 q(t, ˙ 0)e0 (t, ω),
(2.27b)
qI (t) ≈ DpD qI (t, 0)pD (ω) + Dq0 qI (t, 0)q0 (ω) + Dq˙0 qI (t, 0)q˙0 (ω) + De0 qI (t, 0)e0 (τ, ω).
(2.27c)
Consequently, we also have zI (t, a) ≈ DpD zI (t, 0)pD (ω) + Dq0 zI (t, 0)q0 (ω) + Dq˙0 zI (t, 0)q˙0 (ω) + De0 zI (t, 0)e0 (ω) = Da zI (t, 0)a(ω). (2.28) Inserting this into Eq. (2.24), we get the approximation: f˜ ≈ a T Da zI (t, 0)T (Q0 + Qϕ (t))Da zI (t, 0)a + 2e0T (t)T Qϕ Da zI (t, 0)a + e0T (t)T Qϕ (t)e0 . 2.3.1.2
(2.29)
Approximation of the Expectation of the Total Cost Function
Without limitation we may suppose the following properties of the occurring random parameters: Assumption The random vectors q0 (ω), pD (ω) as well as pD (ω), q˙0 (ω) are stochastically independent. Moreover, the error vector e0 = e0 (t, ω) is stochastically independent from the other random variables, i.e. pD (ω), q0 (ω), q˙0 (ω). For the conditional expectation of f˜ we obtain now the approximation E f˜At0 ≈ E a(ω)T Da zI (t, 0T (Q0 + Qϕ (t))Da zI (t, 0)a(ω)
+ 2e0 (ω)T (t)T Qϕ Da zI (t, 0)a(ω) + e0 (ω)T (t)T Qϕ (t)e0 (ω)At0 . (2.30)
In order to evaluate the above conditional expectation, let P denote one of the three matrices arising in (2.30). Moreover, “tr” denotes the trace of a quadratic matrix. Abbreviating then the random variables a = a(ω), e0 = e0 (ω) occurring in (2.30) by α, β, resp., for the conditional expectations of one of the occurring terms in (2.30) we have
46
2 Stochastic Optimization of Regulators
E α(ω)T Pβ(ω)At0 =
ν
ν ν [P ]kl E β(ω)α(ω)T At0 = [P ]kl E β(ω)α(ω)T At0 lk
k,l=1
=
ν
k=1
!
lk
l=1
k-th row of P · k-th column of E β(ω)α(ω)T At0
k=1
= tr P E β(ω)α(ω)T At0 . Note that in case of random vectors α, β with zero mean, cov β, α At0 := E β(ω)α(ω)T At0
(2.31a)
(2.31b)
is the matrix of all covariances the vectors α, β. Using now (2.31a) in the approximation (2.30), for the objective function (2.18) we find the approximation: Theorem 2.1 Based on the Taylor expansions of the feedback law and the tracking error, for the expectation of the total cost function E f At0 we get the following approximation: E f (t, ω)At0 ≈ tr Da zI (t, 0)T (Q0 + Qϕ (t))Da zI (t, 0)cov a(·)At0 + 2tr(t)T Qϕ (t)Da zI (t, 0)E a(·)e0 (·)T At0 (2.32) + tr(t)T Qϕ (t)(t) cov e0 (·)At0 .
2.4 Calculation of the Tracking Error Rates (Sensitivities) Because of the expansions (2.27a–c), (2.32), resp., we need the tracking error rates, i.e. the derivatives of the tracking error zI (t, a) with respect to the parameter vectors pD , q0 , q˙0 , e0 . The calculation of the derivatives is based on the initial value problem (2.13b–d). Due to Eqs. (2.15a–d), the constraints of the regulator optimization problem (2.13a–e) will be replaced now by the following equivalent ones: F (pD + pD (ω), q R (t) + q(t), q˙ R (t) + q(t), ˙ q¨ R (t) + q(t)) ¨ = uR (t) +ϕ t, q (t)eq0 (ω) + q(t), qI (t)eqI 0 (ω) + qI (t), q˙ (t)eq0 ˙ , ˙ (ω) + q(t)
2.4 Calculation of the Tracking Error Rates (Sensitivities)
47
t0 ≤ t ≤ tf , a.s.,
(2.33a)
q(t0 , ω) = q0 (ω),
(2.33b)
q(t ˙ 0 , ω) = q˙0 (ω),
(2.33c)
ϕ(t0 , 0) = 0.
(2.33d)
By partial differentiation of the Eq. (2.33a–c) with respect to the above mentioned parameter vectors we obtain five systems of differential equations with related initial values for the sensitivities or partial derivatives of zI = zI (t, a) with respect to the total parameter vector a needed in (2.32). For the representation of these matrix functions we introduce the following definition: Definition 2.3 The Jacobians of the vector function F in (2.33a) taken at pD , q R (t), q˙ R (t), q¨ R (t) are denoted by K R (t) := Dq F (pD , q R (t), q˙ R (t), q¨ R (t)),
(2.34a)
D (t) := Dq˙ F (pD , q (t), q˙ (t), q¨ (t)),
(2.34b)
Y R (t) := DpD F (pD , q R (t), q˙ R (t), q¨ R (t)),
(2.34c)
M R (t) := M(pD , q R (t)) = Dq¨ F (pD , q R (t), q˙ R (t), q¨ R (t)).
(2.34d)
R
R
R
R
2.4.1 Partial Derivative with Respect to pD As was shown in [16], by means of differentiation of (2.33a–c) with respect to the parameter subvector pD and using that q(t) = q(t, a) = q(t, pD , q0 , q˙0 , e0 ), for the partial derivative η(t) := DpD q(t, 0), t0 ≤ t ≤ tf ,
(2.35)
we obtain, cf. (2.12), the following initial value problem: R ˙ (t)η(t) ¨ Y R (t) + K R (t)η(t) + D R (t)η(t)+M
t = Dq ϕ(t, 0)η(t)+DqI ϕ(t, 0)
η(τ ) dτ + Dq˙ ϕ(t, 0)η(t), ˙ t0
(2.36a) η(t0 ) = 0, η(t ˙ 0 ) = 0.
(2.36b)
48
2 Stochastic Optimization of Regulators
Since the matrix M R = M R (t) is regular, see, e.g., [14], the unknown Jacobians of ϕ are represented as follows: Definition 2.4 With given fixed matrices Kp , Ki , Kd , represent the gain matrices Gp := Dq ϕ(t, 0), Gd := Dq˙ ϕ(t, 0), Gi := DqI ϕ(t, 0) by Kd =:M R (t)−1 D R (t) − Dq˙ ϕ(t, 0) , Kp =:M R (t)−1 K R (t) − Dq ϕ(t, 0) , Ki =:M R (t)−1 − DqI ϕ(t, 0) .
(2.37a) (2.37b) (2.37c)
Remark 2.2 In many cases the matrices Kp , Ki , Kd in (2.37a–c) are supposed to be diagonal matrices. With the definitions (2.37a–c), the initial value problem (2.36a,b) reads
t η(t) ¨ + Kd η(t) ˙ + Kp η(t) + Ki
η(τ ) dτ = −M R (t)−1 Y R (t), t ≥ t0 ,
t0
(2.38a) η(t0 ) = 0, η(t ˙ 0 ) = 0.
(2.38b)
2.4.2 Partial Derivative with Respect to q0 Let us denote I the unit matrix. Defining ξ(t) := Dq0 q(t, 0), t ≥ t0 ,
(2.39)
and using again definitions (2.37a–c), for ξ = ξ(t) we get the initial value problem: ξ¨ (t) + Kd ξ˙ (t) + Kp ξ(t) + Ki
t ξ(τ ) dτ = 0, t ≥ t0 ,
(2.40a)
t0
ξ(t0 ) = I, ξ˙ (t0 ) = 0.
(2.40b)
2.4.3 Partial Derivative with Respect to q˙0 Also the partial derivative with respect to the deviations q˙0 in the initial velocities q˙0 follows in the same way: With a corresponding differentiation and evaluation at a = (pD , q0 , q˙0 , e0 (·)) = 0, taking into account (2.37a–c), for the derivative
2.4 Calculation of the Tracking Error Rates (Sensitivities)
ζ (t) := Dq˙0 q(t, 0), t ≥ t0 ,
49
(2.41)
we finally get the initial value problem ζ¨ (t) + Kd ζ˙ (t) + Kp ζ (t) + Ki
t ζ (τ ) dτ = 0, t ≥ t0 ,
(2.42a)
t0
ζ (t0 ) = 0, ζ˙ (t0 ) = I.
(2.42b)
2.4.4 Partial Derivative with Respect to e0 The partial derivatives with respect to the sub vectors eq0 , eqI 0 , eq0 ˙ , resp., are given as follows:
2.4.4.1
Partial Derivative with Respect to eq
For the partial derivative with respect to eq0 λ(t) := Deq0 q(t, 0), t ≥ t0 ,
(2.43)
with the same methods as applied above, and using again definitions (2.37a–c), for the partial derivative of q with respect to eq0 we find the following initial value problem: ˙ + Kp λ(t) + Ki ¨ + Kd λ(t) λ(t)
t
λ(τ ) dτ = M R (t)−1 Dq ϕ(t, 0)q (t), t ≥ t0 ,
t0
(2.44a) λ(t0 ) = 0, λ˙ (t0 ) = 0.
(2.44b)
For the remaining derivatives μ(t) := DeqI 0 q(t, 0), ν(t) := Deq0 ˙ q(t, 0), t ≥ t0 ,
(2.45)
we obtain again an integro-differential equation of type (2.44a,b) where the expression Dq ϕ(t, 0)q (t) is replaced by DqI ϕ(t, 0)qI (t), Dq˙ ϕ(t, 0)q˙ (t), respectively. According to the form of the above given linear initial value problems for the sensitivities of the deviation q with respect to the parameters
50
2 Stochastic Optimization of Regulators
pD , q0 , q˙0 , eq0 , eq˙0 , for the stability of the above linear systems of 3rd order we have, cf. [1, 18, 21], the following result: Lemma 2.1 Suppose that Kd , Kp , Ki are diagonal matrices with positive diagonal elements [Kp ]jj , [Kd ]jj , [Ki ]jj , j = 1, . . . , jK . If the Hurwitz-criterion [Kp ]jj [Kd ]jj − [Ki ]jj > 0, j = 1, . . . , jK
(2.46)
holds, then the above perturbation integro-differential equations for the tracking error rates (sensitivities) are input-output stable, hence, the sensitivities are bounded, provided that the corresponding right hand sides are bounded functions.
2.5 The Approximate Regulator Optimization Problem In the following we consider the position error rate function, hence, the matrix function w = w(t) defined by the partial derivative of the position error function q with respect to the parameter vector a, cf. (2.16a,b) and (2.35), (2.39), . . . , (2.45): w(t) := Da q(t, 0) = DpD q(t, 0), Dq0 q(t, 0), Dq˙0 q(t, 0), Deq0 q(t, 0), DeqI 0 q(t, 0), Deq0 ˙ q(t, 0) = η(t), ξ(t), ζ (t), λ(t), μ(t), ν(t) . (2.47) According to the integro-differential equations for the (sensitivities) derived in Sect. 2.4, for the matrix function w = w(t) defined by (2.47) we have the following joint integro-differential equation:
t w(t) ¨ + Kd w(t) ˙ + Kp w(t) + Ki
w(τ ) dτ = ψ(t), t0 ≤ t ≤ tf ,
(2.48a)
t0
w(t0 ) = w0 , w(t ˙ 0 ) = w˙ 0 .
(2.48b)
Here, Kp , Kd , Ki are the block-diagonal matrices having the diagonal blocks Kp , Kd , Ki , respectively. Moreover, w0 , w˙ 0 are the matrices containing the initial conditions and ψ = ψ(t), t0 ≤ t ≤ tf , is the matrix composed of the right hand sides of Eqs. (2.35), (2.39), . . . , (2.45). By means of (2.47), the tracking error rate function can then be represented, cf. (2.4f), (2.28), by
2.5 The Approximate Regulator Optimization Problem
⎛
w(t)
51
⎞
⎜t ⎟ ⎜ ⎟ Da zI (t, 0) = ⎜ w(τ )dτ ⎟ . ⎝t0 ⎠ w(t) ˙
(2.49a)
In order to replace system (2.48a,b) by an equivalent system of first order differential equations, according to (2.49a), we introduce the matrix function: ⎛
⎞ 0 0 I A := ⎝ I 0 0 ⎠. −Kp −Ki −Kd
(2.49b)
Defining the matrix ⎛
w(t)
⎞
⎜t ⎟ ⎜ ⎟ W (t) := Da zI (t, 0) = ⎜ w(τ )dτ ⎟ , t0 ≤ t ≤ tf , ⎝t0 ⎠ w(t) ˙
(2.49c)
according to (2.49a) and (2.48a,b) we find that the linear integro-differential system (2.48a,b) is equivalent to the following first order system of differential equations: ⎛
⎞ ⎛ ⎞ 0 w0 W˙ (t) = AW (t) + ⎝ 0 ⎠ , t0 ≤ t ≤ tf , W (t0 ) := ⎝ 0 ⎠ . ψ(t) w˙ 0
(2.49d)
Consequently, the approximation of the regulator optimization problem under stochastic uncertainty yields the following deterministic optimal control problem: Theorem 2.2 Let W (t) = Da zI (t, 0), t0 ≤ t ≤ tf , denote the error rate function according to (2.49c). Then, the approximate regulator optimization problem under stochastic uncertainty can be represented by the error rate minimization problem in terms of weighted deviation and correction costs:
tf min t0
trW (t)T (Q0 + Qϕ (t))W (t)cov a(·)At0
+ 2tr(t)T Qϕ (t)W (t)E a(·)e0 (·)T At0 + tr(t)T Qϕ (t)(t) cov e(·)At0 dt
(2.50a)
52
2 Stochastic Optimization of Regulators
s.t. ⎛
⎞ 0 W˙ (t) = AW (t) + ⎝ 0 ⎠ , t0 ≤ t ≤ tf ψ(t) ⎛ ⎞ w0 ⎝ W (t0 ) := 0 ⎠ . w˙ 0
(2.50b)
(2.50c)
Remark 2.3 According to (2.22), (2.23a,b), (2.24), the objective function (2.50a) is nonnegative. Moreover, for given gain matrix function DzI ϕ(t, 0), t0 ≤ t ≤ tf , in Qϕ (t), the objective function (2.50a) is convex in the error rate matrix function W = W (t).
2.6 Active Structural Control Under Stochastic Uncertainty Because of the development of high strength materials and more efficient methods in structural analysis and design, larger and more complex mechanical structures, like tall buildings, offshore platforms, have been constructed. Constructions of this type provide only small damping in alleviating vibrations under heavy environmental loads such as strong earthquakes, wind turbulences, water waves, etc. The structures usually are stationary, safe, and stable without external dynamic disturbances, and external dynamic loads are the main sources inducing structural vibrations that should be controlled. Obviously, environmental loads, such as earthquakes, wind, waves, etc., are stochastic processes, having unknown time paths. In order to omit severe structural damages and therefore high compensation (recourse) costs, in recent years active control techniques were developed in structural engineering, see, e.g., [5, 12, 22, 23]. Basically, active control depends on the supply of external energy to counteract the dynamic response of a structure. An active control system consists therefore [19, 22, 23, 25] of (i) sensors installed at suitable locations of the structure to measure either external excitations or structural response quantities, (ii) devices to process the measured information and to compute necessary control forces based on a given control algorithm, and (iii) actuators to produce the required control forces. Possible technical devices, concepts, actuators to realize active structural controls are, e.g., electro hydraulic servomechanisms, passive/active/hybrid/semi active damping strategies, viscoelastic dampers, tuned mass dampers, aerodynamic appendages, gas pulse generators, gyroscopes, active structural members, and joints. Active structural enhancement consists of the use of active control to modify structural behavior. This enhancement can be used to actively stiffen, or strengthen (against Euler buckling) a given structure. Therefore, actively controlled structures
2.6 Active Structural Control Under Stochastic Uncertainty
53
can adaptively modify their stiffness properties to be either stiff or flexible as demanded. For example, optimal control strategies maximize the critical buckling load using sensors and actuators. The aim is then to actively stabilize the structure to prevent it from collapsing. Vibration control of large space structures [23]: Large space structures face difficult problems of vibration control. Because they require low weight, such structures will lack the stiffness and damping necessary for the passive control of vibration. Hence current research is directed towards the design of active vibration control to reduce the mean square response of the system to a desired level within a reasonable span of time. A great deal of research is currently in progress on designing active vibration control system for large flexible space structures. While the actual time path of the random external load is not known at the planning stage, we may assume that the probability distribution or at least the occurring/needed moments of the applied load are known. The underlying dynamic control system with random parameters is described often, cf. [22, 24], by a second order linear system of differential equations M q(t) ¨ + D q(t) ˙ + Kq(t) = Cu(t) + Ef (t), t0 ≤ t ≤ tf ,
(2.51)
where q = q(t) is the vector of structural displacements, M, D, K denotes the mass, damping, stiffness matrix, resp., and C, E is the matrix denoting the locations of the control forces, the excitations/applied loads, resp., on the structure, see also the basic system of differential equations (2.1a–c) used in this paper as well as the perturbation equations (2.39), . . . , (2.45) derived in Sect. 2.4. The evaluation of the performance of the control system, the feedback law, resp., can be done by means of convex cost functions. In the following we use weighted quadratic cost function for evaluating the deviation from the reference trajectory as well as for evaluation of the control corrections, see (2.17a,b), (2.18). The approximation/solution method presented here can then be applied directly to the optimal control problems under stochastic uncertainty, as occurring in active structural control under stochastic uncertainty. An example is given in the following.
2.6.1 Example We consider now the structure according to Fig. 2.2, see [5], where we want to control the supplementary active system while minimizing the expected total costs for the tracking errors and the costs for the control correction. Corresponding to [24], the behavior of the vector of displacements q = q(t, ω) can be described by means of a system of differential equations of second order:
54
2 Stochastic Optimization of Regulators
Fig. 2.2 Principle of active structural control
M
q¨0 (t, ω) q¨z (t, ω)
+D
q˙0 (t, ω, t q˙z (t, ω)
+K
q0 (t, ω) qz (t, ω)
= fL (t, ω) + fA (t) (2.52a)
with the matrices and vectors m0 0 M= 0 mz
mass matrix (2.52b)
D=
d0 + dz −dz −dz dz
damping matrix (2.52c)
K=
k0 + kz −kz −kz kz
stiffness matrix (2.52d)
fA (t) :=u(t) =
u1 (t) u2 (t)
actuator force (2.52e)
2.6 Active Structural Control Under Stochastic Uncertainty
f0 (t, ω) fL (t, ω) = fz (t, ω)
:= fL0 + (t) pD + pD (ω) ,
55
applied load (2.52f)
where fL0 is a constant load vector, = (t) denotes a given time-dependent matrix, and pD = pD (ω) = pD + pD (ω) is random vector with a known probability distribution. Moreover, u = u(t) is the control function, and the vector of displacements and its time derivatives reads q(t, ω) =
q˙ (t, ω) q0 (t, ω) , q(t, ˙ ω) = 0 . qz (t, ω) q˙z (t, ω)
(2.52g)
According to (2.52a), the function F of the dynamic equation (2.1a) reads F (pD , q(t), q(t), ˙ q(t)) ¨ = M q¨ + D q˙ + Kq − (fL0 + (t)pD ).
(2.53)
Based on the definition (2.2b) of the feedforward control uR = uR (t), t0 ≤ t ≤ tf , we get uR (t) = − fL0 + (t)pD for q R (t) = 0, t0 ≤ t ≤ tf .
(2.54)
Thus, in the optimal case of zero displacements q R (t, ω) = 0, q˙ R (t) = 0, t0 ≤ t ≤ tf , we have the feedforward control according to (2.54). In order to see more details, in the following we use the following simplifications: First we consider a PD control only. Hence, we have, cf. (2.3), (2.4a-f), z(t) = q(t) and therefore, see (2.54), q(t) ˙ u(t) := ϕ(t, z(t)) = ϕ(t, q(t), q(t)), ˙ t0 ≤ t ≤ t f , where R q(t) q(t) q (t) = z(t), t0 ≤ t ≤ tf . z(t) := z(t) − z (t) = = − R q˙ (t) q(t) ˙ q(t) ˙ R
In addition we suppose that the deviations of pD , q0 , q˙0 from their nominal values are hence, pD = 0, q0 = 0, q˙0 = 0. Thus, the random parameter vector a = a(ω) consists of the error vector e0 only. Consequently, according to (2.29), the approximate total cost function reads f˜ ≈ e0T De0 z(t, 0)T (Q0 + Qϕ (t))De0 z(t, 0)e0 + 2e0T (t)T Qϕ De0 z(t, 0)e0 + e0T (t)T Qϕ (t)e0 .
(2.55a)
56
2 Stochastic Optimization of Regulators
Rearranging terms, we also have T f˜ ≈e0T (t) + De0 z(t, 0) Qϕ (t)De0 (t) + De0 z(t, 0) e0 + e0T De0 z(t, 0)T Q0 De0 z(t, 0)e0 .
(2.55b)
Finally, due to Theorem 2.1 and (2.32), in the present case for the objective function we have this approximation: Corollary Based on the Taylor expansions of the feedback law and the tracking error, in case pD = 0, q0 = 0, q˙0 = 0 the approximate objective function of the regulator optimization problem under stochastic uncertainty reads E f (t, ω)At0 T ≈ tr (t) + De0 z(t, 0) Qϕ (t) (t) + De0 z(t, 0) cov e0 (·)At0 (2.55c) + trDe0 z(t, 0)T Q0 De0 z(t, 0) cov e0 At0 , where the unknowns, i.e. the gain matrix G := Dz ϕ(t, 0) is contained in the matrix function Qϕ (t) := Dz ϕ(t, 0)T Cu Dz ϕ(t, 0).
(2.55d)
According to (2.56b) the derivative λ(t) := De0 z(t, 0) of z with respect to e0 at (t, 0) needed in the objective function (2.55c) is given by following initial value problem: λ¨ (t) + Kd λ˙ (t) + Kp λ(t) = M −1 Dz ϕ(t, 0)(t), t ≥ t0 , ˙ 0 ) = 0, λ(t0 ) = 0, λ(t
(2.56a) (2.56b)
where (t) denotes the block-diagonal matrix composed of the matrices q (t), q˙ (t). Summarizing the above considerations, according to Theorem 2.1, the stochastic optimal regulator problem for the present example from active control under stochastic uncertainty can be described approximatively as follows: Corollary Suppose that only the additive measurement error is random. Then the optimal gain matrix G(t) := Dz ϕ(t, 0) of the approximate active control problem under stochastic uncertainty is the solution of the following optimal control problem:
2.6 Active Structural Control Under Stochastic Uncertainty
57
tf T tr (t) + λ(t) G(t)T Cu G(t) (t) + λ(t) cov e0 (·)At0 min G
t0
+ tr λ(t)T Q0 λ(t) cov e0 At0 dt
(2.57a)
s.t. the matrix initial value problem: ¨ + Kd λ(t) ˙ + Kp λ(t) = M −1 G(t)(t), t ≥ t0 , λ(t)
(2.57b)
λ(t0 ) = 0, λ˙ (t0 ) = 0,
(2.57c)
Remark 2.4 Using stochastic optimization methods, hence, incorporating random variations of model parameters, initial values, external forces, measurement errors, etc., into the optimization process for regulators, by formulating and solving appropriate deterministic substitute problems, robust optimal feedback control laws are obtained. Based on a (stochastic optimal) reference trajectory q R (t) and a corresponding feedforward control uR (t), for the computation of stochastic optimal regulators, deterministic substitute control problems of the following type are provided: Optimize the performance of the feedback control function ϕ = ϕ(t, z(t)) by minimizing the expected total costs resulting from a) the deviation z(t) between the (stochastic optimal) reference trajectory and the effective trajectory of the underlying dynamic system and b) the control corrections u(t) = ϕ(t, z(t)). Due to their favorable properties, weighted quadratic cost functions are chosen. For finding a stochastic optimal feedback control law ϕ(t, ·) or the corresponding gain matrices one has then a (deterministic) optimal control problem. The expectations, hence, multiple integrals, arising in the objective function have to be determined by approximation methods. Here, Taylor expansions with respect to the parameter vector a = a(ω) at the conditional expectation a¯ t0 are used. For the computation of the needed derivatives (tracking error rates or sensitivities) with respect to the parameter vector a, linear integro-differential equations have been derived, which can be transformed into systems of ordinary differential equations. According to the structure of the perturbation equations one obtains a stochastic optimal parametric representation of the gain matrices guaranteeing the stability of the differential equations for the tracking error rates. Moreover, an explicit analytical form is obtained for the calculation of the expectation-valued objective function. Finally, the resulting approximate deterministic optimal control problem can be solved by standard numerical solution procedures of optimal control. The given stochastic optimization procedure for regulators is then applied to the important field of active structural control under stochastic uncertainty. An example is given.
58
2 Stochastic Optimization of Regulators
References 1. Ackermann, J.: Robuste Regelung: Analyse und Entwurf von linearen Regelungssytemen mit unsicheren physikalischen Parametern. Springer, Berlin (1993) 2. Åström, K.J.: Introduction to Stochastic Control Theory. Elsevier, Amsterdam (1970) 3. Aurnhammer, A.: Optimale Stochastische Trajektorienplanung und Regelung von Industrierobotern. No. 1032 in Fortschrittberichte VDI, Reihe 8. VDI-Verlag GmbH, Düsseldorf (2004) 4. Basar, T., Bernhard, P.: H ∞ -Optimal Control and Related Minimax Design. Birkhäuser, Boston (1995) 5. Block, C.: Aktive Minderung personeninduzierter Schwingungen an weit gespannten Strukturen im Bauwesen. No. 336 in Fortschrittberichte VDI, Reihe 11, Schwingungstechnik. VDI-Verlag GmbH, Düsseldorf (2008) 6. Buonaccorsi, J.: Measurement Error: Models, Methods and Application. Chapman and Hall/CRC Press, Boca Raton (2010) 7. Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems. American Elsevier, Publishing Company, Inc., New York (1970) 8. Dieudonné, J.: Foundations of Modern Analysis. Academic Press, New York (1969) 9. Dullerud, G., Paganini, F.: A Course in Robust Control Theory. Springer, New York (2000) 10. Guo, L.: H∞ output feedback control for delay systems with nonlinear and parametric uncertainties. IEE Proc. Control Theory Appl. 149(3), 226–236 (2002). https://doi.org/10. 1049/ip-cta:20020336 11. Kalman, R., et al.: Topics in Mathematical System Theory. McGraw-Hill Book Company, New York (1969) 12. Korkmaz, S.: Review: a review of active structural control: challenges for engineering informatics. Comput. Struct. 89(23–24), 2113–2132 (2011). https://doi.org/10.1016/j.compstruc. 2011.07.010 13. Marti, K.: Approximationen Stochastischer Optimierungsprobleme. Konigstein/Ts, Hain (1979) 14. Marti, K.: Stochastic optimization methods in robust adaptive control of robots. In: Groetschel, M.E.A. (ed.) Online Optimization of Large Scale Systems, pp. 545–577. Springer, Berlin (2001) 15. Marti, K.: Adaptive Optimal Stochastic Trajectory Planning and Control (AOSTPC) for Robots, pp. 155–206. Springer, Berlin (2004) 16. Marti, K.: Optimal design of regulators subject to stochastic uncertainty. In: Topping, B., Iványi, P. (eds.) Proceedings of the Twelfth International Conference on Computational Structures Technology, Paper 135. Civil-Comp Press, Stirlingshire (2014). https://doi.org/10. 4203/ccp.106.135 17. Marti, K.: Stochastic Optimization Methods: Applications in Engineering and Operations Research, 3rd edn. Springer, Berlin (2015). https://doi.org/10.1007/978-3-662-46214-0 18. Müller, P.: Stabilität und Matrizen: Matrizenverfahren in der Stabilitätstheorie linearer dynamischer Systeme. Springer, Berlin (1977) 19. Nagarajaiah, S., Narasimhan, S.: Optimal control of structures. In: Arora, J. (ed.) Optimization of Structural and Mechanical Systems, pp. 221–244. World Scientific, New Jersey (2007) 20. Pfeiffer, F., Johanni, R.: A concept for manipulator trajectory planning. IEEE J. Robot. Autom. 3(2), 115–123 (1987) 21. Schacher, M.: Stochastisch optimale Regelung von Robotern. No. 1200 in Fortschritt-Berichte VDI, Reihe 8, Mess-, Steuerungs- und Regelungstechnik. VDI Verlag GmbH, Düsseldorf (2011) 22. Soong, T.: Active Structural Control: Theory and Practice. John Wiley, New York (1990)
References
59
23. Soong, T., Costantinou, M.: Passive and Active Structural Vibration Control in Civil Engineering. CISM Courses and Lectures, vol. 345. Springer, Wien (1994) 24. Soong, T., Spencer, B.: Active, semi-active and hybrid control of structures. Bull. N. Z. Soc. Earthq. Eng. 33(3), 387–402 (2000) 25. Spencer, B., Nagarajaiah, S.: State of the art of structural control. J. Struct. Eng. 129(7), 845– 856 (2003). https://doi.org/10.1061/(ASCE)0733-9445(2003)129:7(845) 26. Wei, G., Wang, Z., Shen, B.: Error-constrained finite-horizon tracking control with incomplete measurements and bounded noises. Int. J. Robust Nonlinear Control 22(2), 223–238 (2012). https://doi.org/10.1002/rnc.1728
Chapter 3
Optimal Open-Loop Control of Dynamic Systems Under Stochastic Uncertainty
As shown in former sections on optimal tracking problems, optimal regulator problems, active structural optimization problems under stochastic uncertainty, approximate optimal feedback controls under stochastic uncertainties can be obtained by means of open-loop feedback control methods used also in model predictive control under stochastic uncertainty. In order to simplify the notation, the intermediate starting points tb of the open-loop control problems are here abbreviated here just by t0 . In the following the computation of approximate optimal open-loop controls is considered for more general optimal control problems under stochastic uncertainty. As a key tool stochastic optimization methods are used for the incorporation random variations of model parameters, initial values, external forces, measurement errors, etc. into the optimization process for finding robust optimal controls. Hence, corresponding deterministic substitute control problems are formulated as follows: Optimize the performance of the open-loop control u = u(t, z0 ), depending on the time t and the initial state z0 , by minimizing the expected total costs. The expectations, hence, multiple integrals, arising in the objective function have to be determined by approximation methods. Here, Taylor expansions with respect to the parameter vector a = a(ω) at the conditional expectation a¯ t0 are used. For the computation of the needed derivatives (tracking error rates or sensitivities) with respect to the parameter vector a, linear integro-differential equations are derived, which can be transformed into systems of ordinary differential equations.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_3
61
62
3 Optimal Open-Loop Control of Dynamic Systems Under Stochastic Uncertainty
3.1 Optimal Control Problems Under Stochastic Uncertainty Corresponding to the former chapters, dealing especially with optimal regulator, tracking and active structural control problems, here we consider following more general optimal control problem under stochastic uncertainty: Definition 3.1 A stochastic optimal open-loop control u∗ = u∗ (t; t0 , z0 ), t0 ≤ t ≤ tf , is a solution of the stochastic optimization problem: tf min E L t, a(ω), q(t), q(t), ˙ u(t, z0 (ω)) dt + G a(ω), q(tf ) At0
(3.1a)
t0
s.t. F (t, p(ω), q(t), q(t), ˙ q(t), ¨ u(t, z0 (ω))) = 0, t0 ≤ t ≤ tf , a.s.
(3.1b)
q(t0 , ω) = q0 (ω),
(3.1c)
q(t ˙ 0 , ω) = q˙0 (ω)
(3.1d)
u(t) ∈ Ut , t0 ≤ t ≤ tf ,
(3.1e)
where the open-loop control u(t) = u(t; t0 , z0 ) depends on the initial time t0 (tb ) q and on the initial state z0 (zb ) := 0 . q˙0
3.1.1 Computation of the Expectation of the Cost Functions L, G A first main problem is the computation of the expectations EL, EG of the cost function L along the trajectory and the terminal cost function G. In the following, let denote a = a(ω) the stochastic data vector available at time t0 ⎛ ⎞ p(ω) p(ω) = ⎝q0 (ω)⎠ . a(ω) := a(ω) := z0 (ω) q˙0 (ω)
(3.2a)
Moreover, let denote ⎛ N⎞ N p p q (t, a) ⎝q N ⎠ = , a N := z = z(t, a) := 0 0 z0N q˙0 (t, a) q˙0N
(3.2b)
3.1 Optimal Control Problems Under Stochastic Uncertainty
63
the state vector, a nominal data vector, resp., e.g. the expectation a N = Ea(ω) of a(ω). For simplification of the presentation we suppose that L = L(t, z, u) = LI (t, z) + LI I (t, u(t)), G = G(tf , z(tf )), with
(3.3)
z(t) = z(t, a), u(t) = u(t, a) = u(t, z0 ), hence, L, G do not depend explicitly on the random parameter vector a. Using second order Taylor expansions of L with respect to a at a N , we get L(t, z(t, a), u(t, z0 )) ≈ LI (t, z(t, a N )) + LI I (t, u(t, z0N )) (3.4a) ∂z ∂LI I ∂u ∂LI (t, z(t, a N )) (t, a N )(a − a N ) + (t, u(t, z0N )) (t, z0N )(z0 − z0N ) + ∂z ∂a ∂u ∂z0 1 ∂z ∂ 2 LI ∂z + (a − a N )T (t, a N )T (t, z(t, a N )) (t, a N )(a − a N ) 2 ∂a ∂a ∂z2 ∂LI ∂ 2z 1 (t, z(t, a N )) ⊗ 2 (t, a N )(a − a N ) + (a − a N )T 2 ∂z ∂a ∂u ∂ 2 LI I ∂u 1 (t, z0N )T (t, u(t, z0N )) (t, z0N )(z0 − z0N ) + (z0 − z0N )T 2 2 ∂z0 ∂z0 ∂u 1 ∂LI I ∂ 2u + (z0 − z0N )T (t, u(t, z0N )) ⊗ (t, z0N )(z0 − z0N ). 2 ∂u ∂z0 2 Moreover, the second order Taylor expansions of G with respect to a at a N reads G(tf , z(tf , a)) ≈ G(tf , z(tf , a N )) +
∂G ∂z (tf , z(tf , a N )) (tf , a N )(a − a N ) ∂z ∂a (3.4b)
∂z ∂ 2G ∂z 1 (tf , a N )T 2 (tf , z(t, a N )) (tf , a N )(a − a N ) + (a − a N )T 2 ∂a ∂a ∂z ∂G ∂ 2z 1 (tf , z(tf , a N )) ⊗ 2 (tf , a N )(a − a N ). + (a − a N )T 2 ∂z ∂a Here, for L = LI , LI I , G we put ∂L ∂ 2 zk ∂ 2z ∂L ⊗ 2 := . ∂z ∂zk ∂a 2 ∂a n
k=1
From (3.4a,b), we obtain, cf. [1]:
(3.5)
64
3 Optimal Open-Loop Control of Dynamic Systems Under Stochastic Uncertainty
Theorem 3.1 Assume that a N = a := Ea(ω), and let denote cov(a(·)), cov(z0 (·)), resp., the covariance matrix of a = a(ω), z0 = z0 (ω) available at time t = t0 . Retaining only first order derivatives of z, u with respect to a, z0 , resp., the expected cost function L can be approximated by EL(t, z(t, a(ω)), u(t)) ≈ LI (t, z(t, a) + LI I (t, u(t, z0 ))
(3.6a)
∂ 2 LI ∂z 1 ∂z (t, z(t, a)) (t, a)cov(a(·)) + tr (t, a)T 2 2 ∂a ∂a ∂z ∂ 2 LI I ∂u 1 ∂u (t, z0 )T (t, u(t, z0 )) (t, z0 )cov(z0 (·)). + tr 2 ∂z0 ∂z0 ∂u2 Moreover, for EG we have EG(tf , z(tf , a(ω))) ≈ G(tf , z(tf , a)
(3.6b)
∂ 2G
∂z 1 ∂z + tr (tf , a)T 2 (tf , z(tf , a)) (tf , a)cov(a(·)). 2 ∂a ∂a ∂z As can be seen from (3.6a,b), for the approximate computation of the expected ∂z cost function EL we need the trajectory t → z(t, a) ¯ and the Jacobian ∂a (t, a). ¯ Having the initial value problem, cf. (3.1a–e), F (t, p, q(t, a), q(t, ˙ a), q(t, ¨ a), u(t, z0 )) = 0, t0 ≤ t ≤ tf ,
(3.7a)
q(t0 , a) = q0 ,
(3.7b)
q(t ˙ 0 , a) = q˙0 ,
(3.7c)
the basic trajectory t → z(t, a) is obtained by solving the initial value problem (3.7a–c) with the parameter vector a := a. Based on this trajectory, an initial value ∂z problem for the Jacobian ∂a (t, a) of z = z(t, a) with respect to the parameter vector a at a is obtained by partial differentiation of (3.7a–c) with respect to a at a. Hence, define first the Jacobians Y (t, a), K(t, a), D(t, a), M(t, a), B(t, a) of the vector function F with respect to a, q, q, ˙ q, ¨ u at t, p, q(t, a), q(t, ˙ a), q(t, ¨ a), u(t, z0 ): ˙ a), q(t, ¨ a), u(t, z0 )), Y(z,u) (t, a) := Da F (t, p, q(t, a), q(t,
(3.8a)
K(z,u) (t, a) := Dq F (t, p, q(t, a), q(t, ˙ a), q(t, ¨ a), u(t, z0 )),
(3.8b)
D(z,u) (t, a) := Dq˙ F (t, p, q(t, a), q(t, ˙ a), q(t, ¨ a), u(t, z0 )),
(3.8c)
M(z,u) (t, a) := Dq¨ F (t, p, q(t, a), q(t, ˙ a), q(t, ¨ a), u(t, z0 )),
(3.8d)
B(z,u) (t, a) := Du F (t, p, q(t, a), q(t, ˙ a), q(t, ¨ a), u(t, z0 )).
(3.8e)
Next to, for the Jacobian
∂q ∂a (t, a)
we get the system of differential equations
3.1 Optimal Control Problems Under Stochastic Uncertainty
Y(z,u) (t, a) + K(z,u) (t, a)
∂q d ∂q (t, a) + D(z,u) (t, a) (t, a) ∂a dt ∂a
65
(3.9)
d 2 ∂q ∂u (t, a) + B(z,u) (t, a) (t, z0 ) = 0, t0 ≤ t ≤ tf , 2 ∂a dt ∂a ∂q (tf , a) = (0, I, 0), ∂a d ∂q (tf , a) = (0, 0, I ), dt ∂a ∂u where ∂u ∂a (t, z0 ) := 0, ∂z0 (t, z0 ) . +M(z,u) (t, a)
The second order linear initial value problem (3.9) can be converted into a first order linear initial value problem: ∂z Theorem 3.2 Given the parameter vector a = a := Ea(ω), the matrix ∂a (t, a) := ! ∂q ∂a (t, a) composed of the Jacobian of q = q(t, a) and its time derivative d ∂q dt ∂a (t, a)
fulfills the first order initial value problem: d ∂z ∂z 0 I (t, a) = (t, a) −1 −1 −M(z,u) (t, a) K(z,u) (t, a) −M(z,u) (t, a) D(z,u) (t, a) ∂a dt ∂a O O , + + −M(z,u) (t, a)−1 B(z,u) (t, a) ∂u −M(z,u) (t, a)−1 Y(z,u) (t, a) ∂a (t, z0 ) t 0 ≤ t ≤ tf , ∂z 0I 0 (tf , a) ¯ = . 00I ∂a
(3.10a) (3.10b)
According to the above shown Theorem 3.1 and Theorem 3.2 we find the following result: Theorem 3.3 For the approximate computation of an optimal open-loop control u∗ = u∗ (t, z0 ), according to Definition 3.1, we have the optimization problem:
tf min LI (t, z(t, a)) ¯ + LI I (t, u(t, z0 )) t0
∂ 2 LI ∂z 1 ∂z ¯ T (t, a)cov(a(·)) ¯ (t, z(t, a)) ¯ + tr (t, a) 2 ∂a ∂a ∂z2 1 ∂u ∂ 2 LI I ∂u + tr (t, z0 )T (t, u(t, z )) (t, z )cov(z (·)) dt 0 0 0 2 ∂z0 ∂z0 ∂u2
(3.11a)
66
3 Optimal Open-Loop Control of Dynamic Systems Under Stochastic Uncertainty
∂ 2G 1 ∂z ∂z + G(tf , z(tf , a) + tr (tf , a)T 2 (tf , z(tf , a)) (tf , a)cov(a(·)) 2 ∂a ∂a ∂z s.t. F (t, p, q(t, a), q(t, ˙ a), q(t, ¨ a), u(t, z0 )) = 0, t0 ≤ t ≤ tf ,
(3.11b)
q(t0 , a) = q¯0 ,
(3.11c)
q(t ˙ 0 , a) = q¯˙0 ,
(3.11d)
d ∂z ∂z 0 I (t, a) ¯ = (t, a) ¯ −1 −1 −M(z,u) (t, a) K(z,u) (t, a) −M(z,u) (t, a) D(z,u) (t, a) ∂a dt ∂a O O + , + −M(z,u) (t, a)−1 B(z,u) (t, a) ∂u −M(z,u) (t, a)−1 Y(z,u) (t, a) ∂a (t, z0 ) t 0 ≤ t ≤ tf , ∂z 0I 0 ¯ = (t0 , a) . 00I ∂a
(3.11e) (3.11f)
∂u∗ ∗ With the optimal solutions u∗ = u∗ (t, z0 ) and ∂u ∂a (t, z0 ) := 0, ∂z0 (t, z0 ) the approximate optimal open-loop control is given by u∗ (t, z0 ) = u∗ (t, z0 ) +
∂u∗ (t, z0 )(z0 − z0 ). ∂z0
(3.11g)
3.2 Solution of the Substitute Control Problem The present substitute problem (3.11a–f) can be interpreted as a two-stage control problem: In the inner loop or second stage, for given functions z = z(t, a), u = u(t, z0 ), t0 ≤ t ≤ tf ,
(3.12a)
corresponding to the expected parameter vector a, z0 , resp., we have the optimal computation of the sensitivities ∂z ∂u ∂z ∂u = (t, a), ¯ = (t, z0 ), t0 ≤ t ≤ tf . ∂a ∂a ∂z0 ∂z0
(3.12b)
Then, in the outer loop or first stage the optimal trajectory and control (3.12a) have to be determined.
3.2 Solution of the Substitute Control Problem
67
A complete separation of the two loops occurs if the Hessians ∂ 2 LI ∂ 2 LI ∂ 2 LI I ∂ 2 LI I = (t), = (t), t0 ≤ t ≤ tf , ∂z2 ∂z2 ∂u2 ∂u2
(3.12c)
and the matrices Y(z,u) (t, a) = Y (t, a), K(z,u) (t, a) = K(t, a), D(z,u) (t, a) = D(t, a), M(z,u) (t, a) = M(t, a), B(z,u) (t, a) = B(t, a)
(3.12d)
are independent of the guiding functions (3.12a). This holds, e.g., if the vector function F (t, p, q, q, ˙ q, ¨ u) := Y (t)p + K(t)q + D(t)q˙ + M(t)q¨ + B(t)u
(3.12e)
is linear in its arguments (p, q, q, ˙ q, ¨ u) and LI = LI (t, z), LI I = LI I (t, u) are quadratic cost functions with constant Hessians. An approximate decoupling can be obtained by solving first the following separate first- stage control problem with the mean parameter vector a:
tf min
LI (t, z(t, a)) + LI I (t, u(t, z0 )) dt + G(tf , z(tf , a))
(3.13a)
t0
s.t. F (t, p, q(t, a), q(t, ˙ a), q(t, ¨ a), u(t, z0 )) = 0, t0 ≤ t ≤ tf ,
(3.13b)
q(t0 , a) = q 0 ,
(3.13c)
q(t ˙ 0 , a) = q˙ 0
(3.13d)
u(t, z0 ) ∈ Ut , t0 ≤ t ≤ tf .
(3.13e)
Let then z∗ = z∗ (t, a), u = u∗ (t, z0 ), t0 ≤ t ≤ tf , denote an optimal solution of the control problem (3.13a–e). Inserting now the resulting functions z ∗ (·), u ∗ (·) into the matrices Y, K, D, M, B, hence defining, c.f. (3.8a–e), Y ∗ (t, a) := Y(z∗ ,u∗ ) (t, a), ∗
(3.14a)
K (t, a) := K(z∗ ,u∗ ) (t, a),
(3.14b)
D ∗ (t, a) := D(z∗ ,u∗ ) (t, a),
(3.14c)
∗
M (t, a) := M(z∗ ,u∗ ) (t, a),
(3.14d)
B ∗ (t, a) := B(z∗ ,u∗ ) (t, a),
(3.14e)
t0 ≤ t ≤ tf ,
68
3 Optimal Open-Loop Control of Dynamic Systems Under Stochastic Uncertainty
we obtain the following separated inner loop or second stage control problem:
tf min
1 ∂z ∂ 2 LI ∂z (t, z(t, a)) ¯ tr (t, a) ¯ T (t, a)cov(a(·)) ¯ 2 2 ∂a ∂a ∂z
(3.15a)
t0
∂ 2 LI I ∂u 1 ∂u (t, z0 )T (t, u(t, z )) (t, z )cov(z (·)) dt + tr 0 0 0 2 ∂z0 ∂z0 ∂u2 1 ∂z ∂ 2G ∂z + tr (tf , a)T 2 (tf , z(tf , a)) (tf , a)cov(a(·)) 2 ∂a ∂a ∂z s.t. d ∂z ∂z 0 I (t, a) ¯ = (t, a) ¯ −M ∗ (t, a)−1 K ∗ (t, a) −M ∗ (t, a)−1 D ∗ (t, a) ∂a dt ∂a O O + , + −M ∗ (t, a)−1 B ∗ (t, a) ∂u −M ∗ (t, a)−1 Y ∗ (t, a) ∂a (t, z0 ) t 0 ≤ t ≤ tf , ∂z 0I 0 (t0 , a) ¯ = . 00I ∂a
(3.15b) (3.15c)
Adding the optimal values of the separated control problems (3.13a–e) and (3.15a–c), we get an upper bound of the optimal value of the original coupled control problem (3.11a–f).
3.3 More General Dynamic Control Systems As stated in (1.1a,b), more general dynamic control systems can be described by a first order initial value problem z˙ (t) = g t, p(ω), z(t), u(t) , t0 ≤ t ≤ tf , z(t0 ) = z0 (ω).
(3.16a) (3.16b)
Corresponding to (3.6a,b), for the Taylor expansion of the expected costs we need ∂z the Jacobian ∂a (t, a). Proceeding as above, for this we get, cf. Theorem 3.2, the following initial value problem
Reference
69
Theorem 3.4 d ∂z ∂g ∂z (t, a) = (t, p, z(t, a), u(t, z0 )) (t, a) dt ∂a ∂z ∂a
∂g ∂g ∂u (t, z0 ) , + (t, p, z(t, a), u(t, z0 )), (t, p, z(t, a), u(t, z0 )) ∂p ∂u ∂z0
∂z (t0 , a) = 0, I . ∂a
(3.17a) (3.17b)
Reference 1. Marti, K.: Stochastic Optimization Methods: Applications in Engineering and Operations Research, 3rd edn. Springer, Berlin (2015). https://doi.org/10.1007/978-3-662-46214-0
Chapter 4
Construction of Feedback Control by Means of Homotopy Methods
Homotopy methods are known from topology [3] to describe deformations of certain mathematical objects O, e.g. geometrical bodies, sets, functions, etc., such that an initial object O1 is continuously transformed into a terminal one O2 . In the following we apply this procedure to the construction of feedback control laws by starting from open-loop controls which can be obtained in general much easier than feedback control laws. Thus, a given open-loop control u0 = ϕ(t, z0 ), depending besides the time t on the initial state z0 only, is continuously transformed (“deformed”) into a feedback control u1 = ϕ(t, z). We suppose now that the dynamic equation of the control system is modeled by a first order initial value problem: z˙ (t) =f (t, θ (ω), z(t), u(t)) ,
t 0 ≤ t ≤ tf ,
z(t0 ) =z0 (ω).
(4.1a) (4.1b)
Moreover, we consider the construction of a state-feedback control u(t) = ϕ(t, z(t)),
t ≥ t0 ,
(4.2a)
with a feedback control law ϕ = ϕ(t, z) optimizing a certain cost function to be defined later on. Using the homotopy theory, cf. [2, 4], the construction is based on a transfer u (t) = ϕ (t, z) := ϕ0 (t, z) + (ϕ(t, z) − ϕ0 (t, z)),
0 ≤ ≤ 1,
(4.2b)
from an open-loop control (put = 0) uOL = ϕ0 (t, z) := u0 (t, z0 ),
t 0 ≤ t ≤ tf ,
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_4
(4.2c) 71
72
4 Construction of Feedback Control by Means of Homotopy Methods
with a time-function u0 = u0 (t, z0 ), to a feedback control (put = 1) uF B = ϕ = ϕ(t, z),
t0 ≤ t ≤ tf .
(4.2d)
Supposing at the moment that the controls u0 = u0 (t, z0 ) and ϕ = ϕ(t, x) are given, and putting the hybrid control law (4.2b) into the dynamic equation (4.1a,b), for each , 0 ≤ ≤ 1, we get the initial value problem: z˙ (t) =f (t, θ (ω), z(t), ϕ (t, z)) ,
t 0 ≤ t ≤ tf ,
z(t0 ) =z0 (ω).
(4.3a) (4.3b)
Obviously, (4.3a,b) contains for = 0, 1, resp., the open-loop, closed loop dynamic control system. Under standard analytical assumptions on the function f = f (t, θ, z, u), system (4.3a,b) yields a unique solution, analytical in the variables (t, ), (t, θ, ), resp., cf. [1]: z = z(t, a, ),
t0 ≤ t ≤ t f ,
0 ≤ ≤ 1,
(4.4a)
where a is the total parameter vector z a := 0 . θ
(4.4b)
f (t, θ, z, u) = A(t, θ )z + B(t, θ )u + b(t, θ ),
(4.5a)
In case of an affine-linear function
from (4.3a,b) we get the (partly linear) dynamical system z˙ (t, a, ) =A(t, θ )z(t, a, ) + B(t, θ )ϕ (t, z(t, a, )), +b(t, θ ), t 0 ≤ t ≤ tf ,
0 ≤ ≤ 1,
z(t0 , a, ) =z0 .
(4.5b) (4.5c)
In the following, for = 0, = 1, resp. we compare now the solutions zOL = zOL (t, a) := z(t, a, 0), t0 ≤ t ≤ tf ,
(4.6a)
zF B = zF B (t, a) := z(t, a, 1), t0 ≤ t ≤ tf ,
(4.6b)
of the open-loop ( = 0), feedback ( = 1), resp., control system. Using first the mean value theorem [1], we have zF B (t, a) − zOL (t, a) = z(t, a, 1) − z(t, a, 0) =
∂z (t, a, ϑ) ∂
(4.7a)
4 Construction of Feedback Control by Means of Homotopy Methods
73
with 0 < ϑ = ϑ(t,a) < 1. By means of Taylor expansion of z = z(t, a, ) with respect to at = 0, we get zF B (t, a) = z(t, a, 1) = zOL (t, a) +
∞ 1 ∂kz (t, a, 0), k! ∂ k
(4.7b)
k=1
k
∂ z where the partial derivatives ∂ k of z = z(t, a, ) at = 0 can be obtained by successive differentiation of the initial value problem (4.3a,b) with respect to at = 0. Moreover, the Lagrange remainder term related to a Taylor polynomial of order κ reads
Rκ (t, a, ϑ) =
∂ κ+1 z 1 (t, a, ϑ), (κ + 1)! ∂ κ+1
0 < ϑ < 1.
(4.7c)
Thus, differentiating the initial value problem (4.3a,b) with respect to and using (4.2a,b) we obtain d ∂z ∂f ∂z (t, a, ) = (t, a, z(t, a, ), ϕ (t, z(t, a, ))) (t, a, ) dt ∂ ∂z ∂ ∂f + (t, a, z(t, a, ), ϕ (t, z(t, a, ))) ϕ(t, z(t, a, )) − u0 (t, z0 ) ∂u ∂ϕ ∂z + (t, z(t, a, )) (t, a, ) , (4.8a) ∂z ∂ ∂z (t0 , a, ) =0. (4.8b) ∂ Putting = 0 in (4.8a,b), we find the initial value problem for the first derivative in the Taylor expansion (4.7b):
∂z ∂ (t, a, 0)
d ∂z ∂f ∂z (t, a, 0) = (t, a, z(t, a, 0), u0 (t, z0 )) (t, a, 0) dt ∂ ∂z ∂ ∂f + (t, a, z(t, a, 0), u0 (t, z0 )) ϕ(t, z(t, a, 0)) − u0 (t, z0 ) , ∂u (4.9a) ∂z (t0 , a, 0) =0, ∂
(4.9b)
where the trajectory z = z(t, a, 0) is defined by the initial value problem, cf. (4.1a,b) z˙ (t, a, 0) =f t, θ, z(t, a, 0), u0 (t, z0 ) , z(t0 , a, 0) =z0 ,
t 0 ≤ t ≤ tf ,
(4.9c) (4.9d)
74
4 Construction of Feedback Control by Means of Homotopy Methods
with a = (z0T , θ T )T , see (4.4b). In the linear case (4.5a) Eqs. (4.5b,c) and (4.9a–d) read d ∂z ∂z (t, a, ) =A(t, θ ) (t, a, ) + B(t, θ ) ϕ(t, z(t, a, )) − u0 (t, z0 ) dt ∂ ∂ ∂ϕ ∂z + (t, z(t, a, )) (t, a, ) , (4.10a) ∂z ∂ ∂z (4.10b) (t0 , a, ) =0, ∂ and ∂z d ∂z (t, a, 0) =A(t, θ ) (t, a, 0) + B(t, θ ) ϕ(t, z(t, a, 0)) − u0 (t, z0 ) , dt ∂ ∂ (4.11a) ∂z (t0 , a, 0) =0, ∂
(4.11b)
z˙ (t, a, 0) =A(t, θ )z(t, a, 0) + B(t, θ )u0 (t, z0 ) + b(t, θ ), z(t0 , a, 0) =z0 .
(4.11c) (4.11d)
Arbitrary higher derivatives of z = z(t, a, ) with respect to can be obtained by further differentiation of system (4.9a,b) with respect to and putting then = 0. 2 For example, in the linear case, for the second order derivative ∂ z2 (t, a, 0) we get ∂ the condition ∂z d ∂ 2z ∂ 2z ∂ϕ (t, a, 0) =A(t, θ ) (t, a, 0) + 2B(t, θ ) (t, z(t, a, 0)) (t, a, 0), 2 2 dt ∂ ∂z ∂ ∂ (4.12a) ∂ 2z ∂ 2
(t0 , a, 0) =0.
(4.12b)
Proceeding this way and assuming a linear feedback law ϕ(t, z) = (t)z
(4.12c)
with the gain matrix = (t), for the determination of the higher derivatives d ∂kz (t, a, 0), k ≥ 2, we get the following initial value problems: dt ∂ k ∂kz ∂ k−1 z d ∂kz (t, a, 0), (t, a, 0) =A(t, θ ) (t, a, 0) + kB(t, θ )(t) dt ∂ k ∂ k−1 ∂ k
(4.12d)
4 Construction of Feedback Control by Means of Homotopy Methods
∂kz (t0 , a, 0) =0. ∂ k
75
(4.12e)
Consider now the optimal feedback control problem under stochastic uncertainty minimizing the expected costs tf E L t, θ (ω), z(t, a(ω)), ϕ(t, z(t, a(ω))) dt + G θ (ω), z(tf , a(ω)) At0 t0
(4.13) subject to the control system (4.1a,b) with the feedback control (4.2a). According to the above results, in a first approach we may approximate the trajectory z = zF B (t,a) resulting from (4.1a,b), (4.2a) by, cf. (4.7a), zF B (t, a) = z(t, a, 1) ≈ z(t, a, 0) +
∂z ∂z (t, a, 0) = zOL (t, a) + (t, a, 0), ∂ ∂ (4.14a)
∂z (t, a, 0) are defined by (4.9a–d). In the where the subtrajectories t → zOL (t, a), ∂ linear case (4.5a–c) and assuming the linear feedback law (4.12c) with the gain matrix = (t), for
w(t) :=
z(t, a, 0) , t ≥ t0 , ∂z ∂ (t, a, 0)
(4.14b)
we get the initial value problem w˙ =
A(t, θ ) 0 b(t, θ ) B(t, θ ) w(t) + , u0 (t, z0 ) + B(t, θ )(t) A(t, θ ) 0 −B(t, θ )
t 0 ≤ t ≤ tf , z w(t0 ) = 0 . 0
(4.15a) (4.15b)
For minimizing the expected cost function (4.13) subject to the dynamic system (4.1a,b), we represent the cost function as follows:
tf
L t, θ, z(t, a), ϕ(t, z(t, a)) dt + G θ, z(tf , a)
t0
tf = t0
LOL t, θ, zOL (t, a), u0 (t) dt + GOL θ, zOL (tf , a)
(4.16)
76
4 Construction of Feedback Control by Means of Homotopy Methods
tf +
LF B t, θ, (t, a), (t) dt + GF B θ, (tf , a) ,
t0 ∂z where (t, a) := ∂ (t, a, 0), t ≥ t0 . The optimization of the feedback control can then be split into two stages: Optimize first the open-loop control u0 (·) and then the gain matrix (·):
Step I:
Find u∗0 (t), t0 ≤ t ≤ tf , solving ⎛ ⎜ min E ⎝ u0 (·)
tf
⎞ ⎟ LOL t, θ, zOL (t, a), u0 (t, z0 ) dt + GOL θ, zOL (tf , a) ⎠
t0
(4.17a) s.t. z˙ OL (t, a) =A(t, θ )zOL (t, a) + B(t, θ )u0 (t, z0 ) + b(t, θ ),
(4.17b)
zOL (t0 , a) =z0 . Step II:
(4.17c)
Having u∗0 (t, z0 ), t0 ≤ t ≤ tf , find the gain matrix ∗ = ∗ (t) solving ⎛ ⎜ min E ⎝
tf
(·)
⎞ ⎟ LF B t, θ, (t, a), (t) dt + GF B θ, (tf , a) ⎠
(4.18a)
t0
s.t. ∗ ˙ a) =A(t, θ )(t, a) + B(t, θ ) (t)zOL (t, (t, a) − u∗0 (t, z0 ) , (t0 , a) =0,
(4.18b) (4.18c)
∗ ∗ (t, a) denotes the optimal open-loop trajectory obtained in where zOL = zOL from Step I, and θ = θ (ω), a = a(ω) are the random vectors defined above.
Remark 4.1 The optimal open-loop type problems under stochastic uncertainty (4.17a–c) and (4.18a–c) can be solved also by the developed in Chap. 3.
References
77
References 1. Dieudonné, J.: Foundations of Modern Analysis. Academic Press, New York (1969) 2. Liao, S.: Homotopy Analysis Method in Nonlinear Differential Equations. Springer, Heidelberg (2012) 3. Schubert, H.: Topologie: eine Einführung. Teubner, Stuttgart (1971) 4. Whitehead, G.: Elements of Homotopy Theory. Springer, New York (1995)
Chapter 5
Constructions of Limit State Functions
5.1 Introduction The observations or measurements from real devices or objects, as, e.g., in natural sciences (such as physics, chemistry, biology, earth sciences, meteorology) as well as in social sciences (such as economics, psychology, sociology) are analyzed and ordered in order to describe them (approximate) by mathematical concepts, in mathematical terms. An important application of these mathematical models is the prediction of the future behavior of the real devices or objects under consideration. Moreover, future measurements can be used to validate the model. The mathematical model of a technical or economic system or structure is based on the state my -vector y containing the minimum number of internal state variables y1 , y2 , . . . , ymy , needed to describe the varying, time-varying properties, resp., of the device. Besides the my -vector y of state variables, several other variables and parameters occur in the description of the input–output behavior of the system/structure. Important further variables are the ν-vector a of model parameters aj , 1 ≤ j ≤ ν, the r-vector x of nominal design variables xk , 1 ≤ j ≤ r, and, in some cases, the n-vector u of control variables uj , 1 ≤ j ≤ n, for handling/steering the underlying system/structure, as, e.g., by mounting additional devices to improve the performance of the modified system, see, e.g., the active control [17, 18, 23] of mechanical structures. Finally, there is often a vector z of output variables describing the (observable/measurable) response of the device. Remark 5.1 Since the control vector u, if present, is an internal variable, system/structure is represented parametrically = a(ω),x by the vector a = a(ω) of model parameters and the vector x of nominal design variables. Due to random variations of the material strengths (e.g., yield stress, modulus of elasticity), the external loadings, initial conditions, geometrical quantities (moments © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_5
79
80
5 Constructions of Limit State Functions
of inertia, cross- sections, sizing variables, etc.), manufacturing errors, etc., the vector of model parameters a = a(ω) is not a given fixed quantity, but has to be represented by a random vector on a certain probability space (, A, P ). On the other hand, the vectors x, u of nominal design and control variables are deterministic vectors. Originally, in reliability analysis [9, 15] and reliability theory of structural systems [20], the limit state function, failure function or also safety margin g = g(a, x) has been introduced to describe, to separate, resp., for given design vector x, the safe and unsafe domain (the regions of failure and survival) in the area A of random parameters (material strengths, external loads, sizing variables, geometrical quantities, etc.) a = a(ω), ω ∈ (, A, P ): g(a, x) ≤ 0, a failure may occur at parameter vector a g(a, x) = 0, the structure may be safe or unsafe at a, g(a, x) ≥ 0, the structure is safe with parameter a.
(5.1a)
Note that, besides the vectors a, x, the function g = g(a, x, ·) may depend also on further variables, such as the time t. An initial representation of the limit state function has been given by g(a) = γ (R(a) − L(a)),
(5.1b)
where R = R(a, ·) is the structural resistance, L = L(a, ·) denotes the external loading, and γ = γ (δ) is an appropriate function, e.g. γ (v) = v, indicating a failure of the structure if the loadings exceeds the resistance of the structure/system. In the literature on reliability analysis, one finds methods for the generation of safety margins for truss, frame and elasto-plastic structures [20] and procedures for the approximate representation of limit state functions, cf. [1, 4, 13]: Based on the evaluation of the status of the structural system of a sample of parameter vectors a l , l = 1, . . . , l0 , in A, by means of regression-type techniques an approximate polynomial surface is determined separating the unsafe and safe realizations of the random parameter vector a = a(ω). This way, the computation of the failure/survival probabilities can be simplified considerably. In the following an optimization-oriented approach is suggested for the construction of an appropriate limit state or performance function.
5.2 Optimization-Based Construction of Limit State Functions The methods for the analysis and optimal design of structures and systems are based, as already mentioned above, mostly on certain functions, called performance
5.2 Optimization-Based Construction of Limit State Functions
81
functions or limit state functions [20], indicating the existence of a safe, a failure state, respectively. Here, the state of a structure/system is described by certain observable, measurable variables characterizing the properties of the device or object under consideration. In mechanical systems the state is often described by the time, position variables, velocity of an object, forces. In electrical circuits state variables are the voltages in the nodes and the currents through the components as, e.g., inductors, capacitors, resistors. In thermodynamics possible state variables are: Temperature, pressure, volume, internal energy, enthalpy, entropy. Finally, in economic system the population sizes, resources (e.g., nutrients, organic material) are possible state variables. Corresponding to (5.1a), the meaning of a scalar or vectorial limit state or performance function, s ∗ = s ∗ (a, x),
(5.2a)
depending on the configuration variables a, x of the structure, i.e. the model parameter vector a and the nominal design vector x is: The safety/survival, failure, resp., of a system/structure is represented by a scalar or vectorial functional criterion of the following type: ⎧ ⎪ ⎪ < 0, ⎪ ⎪ ⎪ ⎨ ∗ s (a, x) > 0 ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ = 0,
a safe state y of S exists, (for a certain control input vector u - if u is available), no safe state y of S exists, (for every control vector u - if control inputs are available), S a safe or unsafe state may occur, (5.2b)
see, e.g., in reliability analysis [9, 15, 20], reliability-based optimal design [15, 19], stochastic optimization [11]. In the vectorial case, the above relations are understood component-wise. Note that by a simple transformation s˜ ∗ (a, x) := −s ∗ (a, x),
(5.2c)
with the transformed function s˜ ∗ = s˜ ∗ (a, x), state conditions having the same orientation as in (5.1a) can be obtained. For a general definition of the limit state function s ∗ = s ∗ a(ω), x , in static technical, economic systems/structures one has, after a possible discretization (e.g., by FEM), the following two basic relations [10]: (I) The state equation (equilibrium, balance) T y, u, a(ω), x = 0, y ∈ Y0 , u ∈ U0 .
(5.3a)
Here, y is the state my -vector of the system/structure , e.g. a stress-state vector or vector of displacements of a mechanical structure, lying in a range
82
5 Constructions of Limit State Functions
Y0 ⊂ Rmy ; in case of dynamical systems, the state y may represent the trajectory of the system. Moreover, u denotes a control or input n-vector of the system/structure with range U0 . Furthermore, for given configuration vectors (a, x) of model parameters and design variables, T = T y, u, (a, x) is a linear or nonlinear mapping from Rmy × Rn into RmT . Equation (5.3a) represents, e.g. the structural equilibrium in elastic or plastic structural design [7, 14] or the dynamical equation of a control system [24]. Remark 5.2 We assume that for each a(ω), x under consideration, Eq. (5.3a) has at least one solution y, (y, u), respectively. Depending on the application, (5.3a) may have a unique solution y, (y, u), resp., as in the case of elastic mechanical structures [14], or multiple solutions y, (y, u), resp., as in for plastic structures [7]. that condition (5.3a) may also include inequality Note state conditions T y, u, a(ω), x ≤ 0. In case of a linear system/structure with respect to the state y or a system/structure linearized with respect to y at y = 0, the state equation (5.3a) reads K u, a(ω), x y = L u, a(ω), x
(5.3b)
with the equilibrium matrix K u, a(ω), x and the vector L u, a(ω), x of external loadings. In many cases K, L are independent of the control vector u, while in active control [17, 18] the load vector L depends also on u. Hence, in many applications the right hand side of (5.3b) reads L(u, (a(ω), x)) := F (a(ω), x) + C(a(ω), x)u
(5.3c)
with a vector function F = F (a(ω), x) and a matrix function C = C(a(ω), x). = K a(ω), x independent of the Having then an equilibrium matrix K input u, then (5.3b) can be represented also by y K(a(ω), x), −C(a(ω), x) = F (a(ω), x). u
(5.3d)
(II) Admissibility condition (operating condition) In addition to the state equation (5.3a), (5.3b), resp., for the state vector y we have the admissibility or operating condition y ∈ Y u, a(ω), x) or y ∈ int Y u, a(ω), x
(5.4a)
with the admissible (feasible) state domain Y = Y u, a(ω), x depending on the model parameter vector a = a(ω) and the nominal input vectors u, x. Using certain response variables, such as displacements, stress variables, forces, discrepancies, etc., a frequent representation of the feasible domain, e.g.
5.2 Optimization-Based Construction of Limit State Functions
83
by means of lower and upper bounds of response variables reads Y u, a(ω), x := y ∈ Rmy : g y, u, a(ω), x ≤ 0 ,
(5.4b)
where g = g y, u, a(ω), x
(5.4c)
denotes a scalar or mg -vector response function of y, u, depending also on a(ω), x , and having appropriate analytical properties (e.g., convexity, linearity). In many applications or after linearization with respect to y at y = 0, the function g is affine–linear with respect to y, hence, g y, u, a(ω), x := H u, a(ω), x y − h u, a(ω), x
(5.4d)
with a matrix (H, h) = H u, a(ω), x , h u, a(ω), x . We assume that the zero state y = 0 is an interior point of the admissible state domain Y u, a(ω), x , i.e. 0 ∈ int Y u, a(ω), x or g 0, u, a(ω), x < 0
(5.4e)
for all variables u, a(ω), x under consideration. Thus, in case (5.4d) we have h u, a(ω), x > 0.
(5.4f)
Representation by a Minkowski Functional If the feasible state domain Y u, a (ω), x is a closed convex set containing the zero state as an interior point, then the admissibility condition (5.4a) can be represented also by π y|Y (u, a(ω), x) ≤ 1, π y|Y (u, a(ω), x) < 1, resp.,
(5.5a)
where π = π y|Y (u, a(ω), x) denotes the Minkowski or distance functional of the admissible state domain Y = Y u, a(ω), x , defined, cf. [22], by ' & y π(y|Y ) = inf λ > 0 : ∈ Y . λ
(5.5b)
Suppose that the state vector can be partitioned, y = (yi )1≤i≤nG ,
(5.5c)
84
5 Constructions of Limit State Functions
into a certain number nG of mi -subvectors yi , i = 1, . . ., such that the admissible state domain Y u, a(ω), x is represented then by the product G Yi u, a(ω), x Y u, a(ω), x = ⊗ni=1
(5.5d)
of nG convex factor domains & ' Yi u, a(ω), x := yi : gi yi , u, a(ω), x ≤ 0
(5.5e)
with the functions gi = gi yi , u, a(ω), x , i = 1, . . . , nG .
(5.5f)
In the above mentioned case, due to (5.5d), the distance functional of Y u, a(ω), x can be represented by π y|Y u, a(ω), x = max πi yi |Yi u, a(ω), x , 1≤i≤nG
(5.5g)
where yi πi yi |Yi u, a(ω), x := inf λ > 0 : ∈ Yi u, a(ω), x λ denotes the Minkowski functional of the factor domain Yi u, a(ω), x .
(5.5h)
Example 5.1 Suppose that mi = 1, i = 1, . . . , nG (= my ), and let Yi := [ai , bi ] be intervals on the real line such that a1 ≤ 0 ≤ bi , i = 1, . . . , my . Then, π(yi ) = max{ yaii , ybii }, yi ∈ R, and π(y) = max1≤i≤my max{ yaii , ybii }.
5.3 The (Limit) State Function s ∗ The role of the (limit) state function is to indicate whether a state vector y belongs to the safe or the failure domain. According to the (I,II) stated in conditions Sect. 5.1, for a given pair of configuration vectors a(ω), x , the state function s ∗ = s ∗ a(ω), x can be defined by the minimum value function of one of the following optimization problems: Problem A
min s
s,y,u
s.t.
(5.6a)
5.3 The (Limit) State Function s ∗
85
T y, u, a(ω), x = 0 g y, u, a(ω), x ≤ s1
(5.6b) (5.6c)
y ∈ Y0
(5.6d)
u ∈ U0 ,
(5.6e)
where 1 := (1, . . . , 1)T . Note that due to Remark 5.2, problem (5.6a–e) always has the feasible solutions (s, y,u) = (˜s , y, ˜ u) ˜ with any solution (y, ˜ u) ˜ of (5.3a), (5.6b) and s˜ with g y, ˜ u˜ a(ω), x ≤ s˜ 1. Obviously, if the state equation (5.3a) is linear (5.6a–e) is in y, u, problem convex, provided that the vector function g = g y, u, a(ω), x is convex with respect to y, u, and the ranges Y0 and U0 are convex. Problem B: Linear State Equation, Linear Operating Condition, and Convex Polyhedral Domains of the State and Control Variables Suppose that relations (5.3b,c) and (5.4d) hold, and the ranges Y0 , U0 are closed convex polyhedrons: Y0 := {y ∈ R my : Ay ≤ b}, U0 := {u ∈ R n : Bu ≤ c}
(5.7)
with fixed matrices (A, b), (B, c). Moreover, assume that K, H, h are independent of u, and the right hand side of (5.3b) is given by (5.3c). Then (5.6a–e) can be represented by the linear program (LP): min s
(5.8a)
K a(ω, x y − C a(ω, x u = F a(ω), x H a(ω), x y − h a(ω), x ≤ s1
(5.8b)
s,y,u
s.t.
(5.8c)
Ay ≤ b
(5.8d)
Bu ≤ c.
(5.8e)
The dual representation of the LP (5.8a–e) reads: T T max F a(ω), x θ − h a(ω), x v − bT w − cT z
(5.9a)
s.t. T T K a(ω), x θ − H a(ω), x v − AT w = 0
(5.9b)
CT θ + BT z = 0
(5.9c)
86
5 Constructions of Limit State Functions
1T v ≤ 1
(5.9d)
θ free, v, w, z ≥ 0.
(5.9e)
Remark 5.3 If the state vector y has full range Y0 = R my , then (5.8c) is canceled, and in (5.9a–e) the expressions bT w, AT w, w are canceled. In case of a closed convex admissible state domain Y a(ω), x , we have the also admissibility condition (5.5a) with the distance functional π = π y, a(ω), x of Y a(ω), x . Then, the state function s ∗ = s ∗ a(ω), x can be defined by the minimum value function of the following optimization problem: Problem C
min s
(5.10a)
s,y,u
s.t. T y, u, a(ω), x = 0 π y, a(ω), x − 1 ≤ s
(5.10b) (5.10c)
y ∈ Y0
(5.10d)
u ∈ U0 .
(5.10e)
Special forms of (5.10a–e) are obtained for a linear state equation (5.3b) and/or for special admissible domains Y a(ω), x , as e.g. closed convex polyhedrons, cf. (5.4b,d), & ' Y a(ω), x = y : H a(ω), x y ≤ h a(ω), x .
(5.11a)
If (Hi , hi ), i = 1, . . . , mH , denote the rows of (H, h), then, because of property (5.4f), the distance functional of (5.11a) is given by y ≤ hi a(ω), x , i = 1, . . . , nH π y, a(ω), x = inf λ > 0 : Hi a(ω), x λ ( ) 1 Hi a(ω), x y ≤ λ, i = 1, . . . , nH = inf λ > 0 : hi a(ω), x i a(ω), x y, = max H (5.11b) 0≤i≤mH
where H˜ i a(ω), x :=
(
0,
1 Hi hi a(ω),x
i=0 (a(ω), x) , 1 ≤ i ≤ mH .
(5.11c)
5.3 The (Limit) State Function s ∗
87
Problem D: Scalar/Vectorial Operating Condition If one has a scalar response function g, as in the above Problem C, then after elimination the variable s, problem (5.6a–e) reads: min g y, u, a(ω), x y,u
(5.12a)
s.t. T y, u, a(ω), x = 0
(5.12b)
y ∈ Y0
(5.12c)
u ∈ U0 ,
(5.12d)
where the state function s ∗ = s ∗ (a, x) is then given by s ∗ (a, x) := g y ∗ , u∗ , a(ω), x
(5.12e)
with a pair of optimal solutions (y ∗ , u∗ ). In case of an mg -vector response function g = (gi )1≤i≤mg , the vectorial conditions (5.6c) can be represented in the following scalar form: max gi y, u, (a(ω), x) ≤ s.
1≤i≤mg
(5.13)
Corresponding to (5.12a–d), in the vectorial case, problem (5.6a–e) can be represented also by min max gi y, u, (a(ω), x) y,u 1≤i≤mg
(5.14a)
s.t. T y, u, a(ω), x = 0
(5.14b)
y ∈ Y0
(5.14c)
u ∈ U0 ,
(5.14d)
Constrained Nonnegative Response Functions Suppose that the response functions gi , i = 1, ..., mg , in the admissibility condition (II) are given by gi :=
gi − 1, i = 1, ..., mg , di
(5.15a)
88
5 Constructions of Limit State Functions
with nonnegative primary response functions gi ≥ 0 and given positive upper bounds di > 0, i = 1, ..., mg for gi . Then, we have max gi = max
1≤i≤mg
1≤i≤mg
gi di
−1=
gi di
∞ − 1,
(5.15b)
i
and for the primary response variables gi the admissible condition (II) reads: gi y, u, (a(ω), x) ≤ di , 1 ≤ i ≤ mg .
(5.15c)
1 m p p of an m−vector w, where p ≥ 1, With the p-norm wp := j =1 |wj | the maximum-norm can be approximated, cf. [6], by m
− p1
wp ≤ w∞ ≤ wp .
(5.15d)
Moreover, we have wp → w∞ , p → ∞.
(5.15e)
Hence, using p-norms and the diagonal matrix D with the diagonal elements di , the maximum norm of the vector D −1 g occurring on the right hand side of (5.15b) can be estimated by 1
D −1 g ∞ ≤ D −1 g p ≤ mg p D −1 g ∞ , p ≥ 1
(5.15f)
1 mg with D −1 g p := ( i=1 ( gdii )p ) p . Consequently, taking the p-th power of the pnorm, the objective function of (5.14a–d) with the response functions (5.15a) can be represented approximately, see the following note, also by
min y,u
p mg 1 gi (y, u, (a(ω), x)) − 1. di
(5.15g)
i=1
Admissibility conditions occurring frequently in practice are box constraints, as considered in the following: Example 5.2 With given lower and upper bounds ai < bi , 1 ≤ i ≤ mg , consider the box conditions ai ≤ gi ≤ bi , 1 ≤ i ≤ mg with the primal response functions bi −ai i gi , 1 ≤ i ≤ mg . Defining the quantities ci = ai +b 2 and di = 2 , 1 ≤ i ≤ mg , the box conditions can be represented by means of the following response variables: gi − ci − 1, 1 ≤ i ≤ mg , gi := d i
which corresponds to the case (5.15a).
(5.15h)
5.3 The (Limit) State Function s ∗
89
A further approximation is obtained by taking into account that max1≤i≤mg gi ≥ gi for all 1 ≤ i ≤ mg . Thus, if si∗ = si∗ a(ω), x denotes the state functions of the systems/structures having the single performance functions gi , 1 ≤ i ≤ mg , then s ∗ a(ω), x := max si∗ a(ω), x . s ∗ a(ω), x ≥ 1≤i≤mg
(5.15i)
5.3.1 Characterization of Safe States Based on the state equation (I) and the admissibility condition (II), the safety of a structural system Σ = Σa(ω),x represented by the configuration variables a(ω), x can be defined as follows: Definition 5.1 Let a = a(ω), x, resp., be given vectors of model parameters, design variables. A vector y˜ ∈ Y0 is called a safe state of system Σa(ω),x if—applicable—a control vector u˜ ∈ U0 exists such that (y, ˜ u) ˜ fulfills (i) the state equation (5.3a) or (5.3b), hence, T y, ˜ u, ˜ a(ω), x = 0, K a(ω), x y˜ = L a(ω), x, u , resp., (5.16a) and (ii) the admissibility condition (5.4a,b) or (5.5a,b) in the standard form g y, ˜ a(ω), x ≤ 0, π y, ˜ a(ω), x ≤ 1, resp., (5.16b) or in the strict form g y, ˜ a(ω), x < 0, π y, ˜ a(ω), x < 1, resp.,
(5.16c)
depending on the application. System Σa(ω),x is called to be in a safe, unsafe state, resp., if a safe state y˜ exists, if no safe state vector y˜ exists, respectively. Remark 5.4 If Σa(ω),x is in an unsafe state, then, due to violations of the basic safety conditions (I) and (II), failures and corresponding damages of Σa(ω),x may (will) occur. For example in mechanical structures (trusses, frames) high external loads may cause internal stresses in certain elements which damage then the structure, at least partly. Thus, in this case failures and therefore compensation and repair costs occur. By means of the state function s ∗ = s ∗ a(ω), x , the safety or failure of a system or structure Σa(ω),x having configuration a(ω), x can be described now, cf. Definition 5.1 as follows: Theorem 5.1 Let s ∗ = s ∗ a(ω), x be the state function of Σa(ω),x defined by the minimum function of one of the above optimization problems (A)–(D). Then, the following basic relations hold:
90
5 Constructions of Limit State Functions
• If s ∗ a(ω), x < 0, then a safe state vector y˜ exists. Hence, Σa(ω),x is in a safe state. • If s ∗ a(ω), x > 0, then no safe state vector exists. Thus, Σa(ω),x is in an unsafe state and failures and damages may occur. • If a safe state vector σ˜ exists, then s ∗ a(ω), x ≤ 0, s ∗ a(ω), x < 0, resp., corresponding to the standard, the strict admissibility condition (5.16b), (5.16c). • If s ∗ a(ω), x ≥ 0, then no safe state vector exists with respect to the strict admissibility condition (5.16c). • Suppose that the minimum (s ∗ , y ∗ , u∗ ) is attained in the selected optimization problem defining the state function s ∗ . If s ∗ a(ω), x = 0, then a safe state vector exists in case of the standard admissibility condition (5.16b), and no safe state exists in case of the strict condition (5.16c). Proof
• If s ∗ a(ω), x < 0, then according to the definition of the minimum value function s ∗ , there exists a feasible point (˜s , y, ˜ u)) ˜ in Problems (A), (B), and (C), resp., such that, cf. (5.6c), (5.8c), (5.10c), and (5.12a) g y, ˜ a(ω), x ≤ s˜ 1, π y, ˜ a(ω), x − 1 ≤ s˜ ,
•
•
• •
s˜ < 0, s˜ < 0.
Consequently, (y, ˜ u) ˜ ∈ Y0 × U0 fulfills the state equation (5.16a) and the strict admissibility condition (5.16c). Hence, according to Definition 5.1, the vector y˜ is a safe state vector the strict admissibility condition (5.16c). fulfilling Suppose that s ∗ a(ω), x > 0, and assume that there exists a safe state vector y˜ with a related control vector u˜ for Σa(ω),x . Thus, according to Definition 5.1, (y, ˜ u) ˜ fulfills the state equation (5.16a) and the admissibility condition (5.16b) or (5.16c). Consequently, (˜s ,y, ˜ u) ˜ with (A), s˜ = 0 is feasible in Problems (B), (C) or (D), and we obtain s ∗ a(ω), x ≤ 0 in contradiction to s ∗ a(ω), x > 0. Thus, no safe state vector exists in this case. Suppose that y˜ is a safe state for Σa(ω),x . According to Definition 5.1 there exists a vector u˜ such that (y, ˜ u) ˜ fulfills (5.16a) and condition (5.16b), (5.16c), respectively. Thus, (˜s , y, ˜ u) ˜ with s˜ = 0, with a certain s˜ < 0, resp., is feasible ∗ a(ω), x ≤ 0 in case of the in Problems (A), (B) or (C), (D). Hence, we get s standard and s ∗ a(ω), x < 0 in case of the strict admissibility condition. This assertion follows directly from the third assertion. If s ∗ a(ω), x = 0, then under the present assumption there exists (y ∗ , u∗ ) ∈ Y0 × U0 such that (s ∗ , y ∗ , u∗ ) = (0, y ∗ , u∗ ) is an optimal solution of Problems (A) and (B) or (C), (D). Hence, (y ∗ , u∗ ) fulfills the state equation (5.16a) and the admissibility condition (5.16b). Thus, y ∗ is a safe state vector with respect to the standard admissibility condition. On the other hand, because of (5.16c), no safe state vector σ˜ exists with respect to the strict admissibility condition.
5.4 Computation of the State Function for Concrete Cases
91
5.4 Computation of the State Function for Concrete Cases In this section the state function is computed for several applications and special cases.
5.4.1 Mechanical Structures Under Stochastic Uncertainty 5.4.1.1
Trusses
For the state y of a truss, i.e. the vector y = (y1 , . . . , yB )T of axial (normal) forces acting in the bars i = 1, . . . , B, we have the linear state equation, cf. (5.3a–d), Ky = F with the m × B− equilibrium matrix K, where m ≤ B, rankK = m, and the vector F = F (a) of external forces depending on random parameter vector a = a(ω). Furthermore, let us denote y L = y L (a), y U = y U (a), resp., the vector of bounds for the forces in compression, tension. Thus, for a truss under external load F we have the safety condition: Ky = F (a),
(5.17a)
y L (a) ≤ y ≤ y U (a).
(5.17b)
Condition (5.17b) can be represented by | yi − ci | ≤ 1, i = 1, . . . , B, di
(5.18a)
with the vectors c(a) =
yU − yL yL + yU , d(a) = . 2 2
(5.18b)
Using the transformation y := y − c, the safety condition reads (a) := F (a) − Kc(a), K y=F
(5.18c)
| yi | ≤ 1, i = 1, . . . , B. di
(5.18d)
Due to (5.18d) we introduce now the response function g(y, (a, x)) := max
1≤i≤B
| yi | − 1. di
(5.18e)
92
5 Constructions of Limit State Functions
For the treatment of the safety condition (5.18c,d) we consider now, cf. (5.2a–c), the limit state function, involving a projection problem with the maximum-norm, s ∗ (a, x) := min max
1≤i≤B K y =F
| yi | − 1. di
(5.19a)
Using the upper bound, see (5.18d), B | yi | yi 2 ≤ , max 1≤i≤B di di
(5.19b)
i=1
for the limit state function we get the upper bound B yi 2 ∗ ∗ s (a, x) ≤ s (a, x) := min − 1. d K y =F i
(5.19c)
i=1
The optimization problem, arising in (5.19c) can be represented by the minimization of a quadratic function subject to a linear constraint, hence, a projection problem with a quadratic norm. Denoting by D the diagonal matrix with the diagonal elements di , i = 1, . . . , B, for the upper bound s = s ∗ (a, x) of the limit state function we get s ∗ (a, x) =
*
T (KD 2 K T )−1 F − 1. F
(5.19d)
Obviously, negative values of s ∗ = s ∗ (a, x) and therefore a safe truss result for larger values of the strengths di , i = 1, . . . , B of the bars, and/or moderate values . of the external load F
5.4.1.2
Elastic-Plastic Mechanical Structures
For mechanical structures made of elastic-plastic materials, the structural failure/survival can be described by means of the dual collapse theorems, cf. [12]. After a possible discretization of the structure by finite elements, the survival/failure condition can be formulated, see [21], by the equilibrium equation with the stress boundary conditions and the yield condition defining the elastic limits: Cσ = P
(5.20a)
−1 Ni Rid σi ≤ hi , i = 1, 2, . . . , nG .
(5.20b)
Here, C denotes the m × n equilibrium matrix with rankC = n, n = n0 nG , σ = (σ1T , . . . , σiT , . . . , σnTG )T is the stress-state n−vector involving the n0 −vectors σi of
5.4 Computation of the State Function for Concrete Cases
93
stress/stress-state components corresponding to the i−th reference (nodal, check or Gauss) points xi , i = 1, . . . , nG , of the structure. P is the m−vector of external load y of the closed, convex feasible domain components. The piecewise linearizations K i y (yield domains, plasticity domains) Ki corresponding to the reference points xi , i = 1, . . . , nG , containing the origin as an interior point, are represented, cf. [3], by y = {z ∈ Rn0 : Ni z ≤ hi } K i
(5.20c)
with the given matrices ⎞ Ni1 hi1 ⎜ N h ⎟ ⎜ i1 i1 ⎟ ⎟ ⎜ · ⎟ ⎜ (Ni , hi ) = ⎜ ⎟ , i = 1, 2, . . . , nG . ⎟ ⎜ · ⎟ ⎜ ⎠ ⎝ · Nimy himy ⎛
(5.20d)
y contains the origin as an Since also the piecewise linearized feasible domain K i interior point, we have hi > 0, i = 1, . . . , nG .
(5.20e)
Hence, one can put hi := (1, . . . , 1). Moreover, Ri = (Ri1 , Ri2 , . . . , Rin0 , ), i = 1, . . . , nG , denotes the n0 -vectors of positive material resistance (strength) parameters such as plastic capacities or yield stresses Rij , j = 1, . . . , n0 at the i-th reference points xi , i = 1, . . . , nG , and Rid is the diagonal matrix of the corresponding resistance parameters. In the following we suppose that y ⊂ K y ⊂ S(0; ηi ). K i i
(5.21)
y is an inner linearization of the admissible domain K y lying in the sphere Hence, K i i S(0; ηi ) with center in the origin 0 and radius ηi . Describing the admissibility conditions in the reference points xi , i = 1, . . . , ng , by means of Minkowski functionals, cf. (5.5a,b), we have ) ≥ π(z|K ) ≥ π(z|K i i y
y
z . ηi
(5.22a)
Thus, the approximate survival condition (5.20a,b) is sufficient for the exact y survival condition containing the true yield domains Ki , i = 1, . . . , nG . Moreover, if (5.20b) holds, hence, if the admissibility condition holds, then (5.22a) yields σi ≤ ηi , i = 1, . . . , nG .
(5.22b)
94
5 Constructions of Limit State Functions
By the above assumptions and derivations, for the admissibility condition (5.20b) we get the following, always valid lower bound: y
Theorem 5.2 Suppose that yield domains Ki , i = 1, . . . , nG at the reference y points of the structure are approximated from inside by the convex polygon K i represented by (5.20c). Moreover, assume that the yield domains are bounded according to condition (5.21). Then the admissibility condition (5.20b) is equivalent to −1 − Li ≤ Ni Rid σi ≤ hi , i = 1, 2, . . . , nG ,
(5.23a)
−1 where Li := ηi Rid Nij and ⎛ ⎞ Ni1 ⎜ ⎟ Nij := ⎝ ... ⎠ .
(5.23b)
Nimy Define now the block-diagonal matrices ⎡
⎡ ⎤ N1 0 . . . 0 R1d ⎢ 0 N2 . . . 0 ⎥ ⎢ 0 ⎢ ⎢ ⎥ Nd := ⎢ . .. ⎥ , Rd := ⎢ .. . . . ⎣ . 0 . . ⎦ ⎣ . 0 . . . 0 NnG 0
0 ... R2d . . . . 0 ..
0 0 .. .
⎤ ⎥ ⎥ ⎥ ⎦
(5.24a)
. . . 0 Rn G d
and the vectors T h := hT1 , hT2 . . . , hTnG , T L := LT1 , LT2 . . . , LTnG .
(5.24b)
Using the definitions (5.24a,b), the admissibility condition (5.20b) can be represented by Theorem 5.3 Suppose that the assumptions of the above theorem hold. Then, the admissibility condition (5.20b) is equivalent to −
h+L h+L h−L ≤ Nd Rd−1 σ − ≤ . 2 2 2
(5.25a)
Denoting by Dh+L the diagonal matrix with the components of the vector h + L on the diagonal, condition (5.25a) can be represented also by h − L 1 −1 ∞ ≤ , Nd Rd−1 σ − Dh+L 2 2 where · ∞ denotes the maximum-norm.
(5.25b)
5.4 Computation of the State Function for Concrete Cases
95
According to the representation (5.25b) of the admissibility condition (5.20b) the limit state function for elastic-plastic mechanical structures can be represented, (5.19a–d), (5.20a,b), as follows: Corollary 5.1 Suppose that the assumptions in Theorem 5.2, 5.3 hold. Then, the limit state functions of the above described elastic-plastic mechanical structures read h − L 1 −1 s ∗ (a, x) = min Dh+L ∞ − . Nd Rd−1 σ − Cσ =P 2 2
(5.26)
Note 5.1 Due to the linear constraint in (5.26), the linear function in the max-norm and the max-norm, the minimum value in (5.26) can be determined by a linear program and its dual, see (5.3a–e), (5.4a–e). Approximation of the Maximum-Norm by the Euclidean Norm Using the pnorm for p = 2, the max-norm can be approximated, see (5.18c,d), from above as follows: h−L 2 h−L 2 −1 −1 −1 −1 Dh+L Nd Rd σ − ∞ ≤ Dh+L Nd Rd σ − 2 2 2 ≤ σ T Qσ − σ T q + d,
(5.27a)
where −1 2 ) Nd Rd−1 , Q := Rd−1 NdT (Dh+L −1 2 q := Rd−1 NdT (Dh+L )
d :=
h−L , 2
(h − L) T −1 2 h − L . (Dh+L ) 2 2
(5.27b) (5.27c) (5.27d)
Note 5.2 In case of symmetric lower and upper bounds, hence, if h = L, cf. (5.23a), then q = 0, d = 0, and the right hand side of (5.27a) is a quadratic form in σ . Taking the minimum in the inequality (5.27a), for the limit state function of the above described class of elastic-plastic mechanical structures we get the following upper bound: Theorem 5.4 Suppose that the assumptions in Theorem 5.2, 5.3 hold. For the exact limit state function (5.26) we have the upper bound 1 s ∗ (a, x) = (CQ−1 q − P )T (CQ−1 C T )−1 (CQ−1 q − P ) − q T Q−1 q + d − , 2 (5.28a) where Q, q, d are defined by (5.27b–d). If h = L, then 1 s ∗ (a, x) = P T (CQ−1 C T )−1 P − . 2
(5.28b)
96
5 Constructions of Limit State Functions
5.4.2 Linear-Quadratic Problems with Scalar Response Function With positive (semi) definite matrices Q = Q(a, x), R = R(a, x), vectors q = q(a, x), p = p(a, x), which may depend on the vectors a, x of model parameters and design variables, and a maximum response value g, the response function g = g(y, u, (a, x)) is given by the convex, scalar function: g(y, u, (a, x)) := y T Q(a, x)y + q(a, x)T y + uT R(a, x)u + p(a, x)T u − g ˜ = y˜ T Q(a, x)y˜ + q(a, ˜ x)T y˜ − g,
(5.29a)
y ˜ where y˜ := is the modified state vector, and the matrix, vector Q(a, x), u q(a, ˜ x), is defined then correspondingly. Considering then the limit state function generated by Problem D, hence (5.3a–d), where Y0 = Rmy , U0 = Rn , and the operator T is given by (5.2a–c), the constraints of Problem D are reduced to the linear equation for y: ˜ K˜ y˜ = F,
(5.29b)
where the matrix K˜ is defined by K˜ := K, −C . Note that in most practical cases we can suppose that K˜ has full rank. Consequently, under the above assumptions the minimization problem (D) can be represented by the quadratic program: ˜ x)y˜ + q(a, ˜ x)T y˜ − g s.t. K˜ y˜ = F. min y˜ T Q(a, y˜
(5.29c)
˜ is a positive definite matrix, and K˜ has full rank, also the matrix Supposing that Q −1 T ˜ ˜ ˜ K Q K is positive definite. Hence, in the present case the limit state function s ∗ = s ∗ (a, x) reads 1 ˜ ˜ −1 1 T ˜ −1 1 ˜ ˜ −1 T ˜ ˜ −1 ˜ T −1 F + K Q q˜ s (a, x) = − q˜ Q q˜ + F + K Q q˜ (K Q K ) 4 2 2 ∗
− g,
(5.29d)
˜ K, ˜ F may depend on the vectors a and/or x of model parameters/design where q, ˜ Q, variables.
5.4 Computation of the State Function for Concrete Cases
97
5.4.3 Approximation of the General Operating Condition Assuming that g = g(y, u, (a, x)) is an mg -vector function with components gj , j = 1, . . . , mg , the operating condition (5.4c) can also be represented by the scalar condition max gj (y, u, (a, x)) ≤ s.
1≤j ≤mg
(5.30a)
In many cases we may suppose that the functions gj have a joint lower bound gj (y, u, (a, x)) ≥ g(a, x) for all j = 1, . . . , mg , and y ∈ Y0 , u ∈ U0
(5.30b)
with a given function g = g(a, x). Denoting by · ∞ the maximum-norm, then max1≤j ≤mg gj (y, u, (a, x)) can be represented by max gj (y, u, (a, x)) = g(y, u, (a, x)) − g(a, x)1∞ + g(a, x). (5.30c)
1≤j ≤mg
Hence, for an exponent p > 1, because of (5.18d,e), Problem A, i.e. (2.6a–d), can be approximated by the minimum p-norm problem: min g(y, u, (a, x)) − g(a, x)1p y,u
(5.31a)
s.t. T y, u, a(ω), x = 0
(5.31b)
y ∈ Y0
(5.31c)
u ∈ U0 .
(5.31d)
Consequently, if s˜p∗ = s˜p∗ (a, x) the minimum value of (5.31a–d), then the limit state function can be approximated from above by s ∗ (a, x) ≤ sp∗ (a, x) := s˜p∗ (a, x) + g(a, x).
(5.32)
5.4.4 Two-Sided Constraints for the Response Functions With the response functions gj (y, u, (a, x), j = 1, . . . , mg , the operating condition (5.4a–c) is often formulated in the following way: α ≤ g(y, u, (a, x)) ≤ β,
(5.33a)
98
5 Constructions of Limit State Functions
where α, β are the mg -vectors of lower, upper bounds, resp., for the mg -vector g of response functions gj (y, u, (a, x), j = 1, . . . , mg . Defining the diagonal matrix Dα,β having the diagonal elements βj −αj , j = 1, . . . , mg , the two-sided condition (5.33a) can be represented also by α+β 1 −1 g(y, u, (a, x)) − ∞ ≤ . Dα,β 2 2
(5.33b)
Using again the p-norm approximation for the maximum-norm, corresponding to (5.18d) we have the p-norm minimization problem: −1 min Dα,β y,u
α+β g(y, u, (a, x)) − p 2
(5.34a)
s.t. T y, u, a(ω), x = 0
(5.34b)
y ∈ Y0
(5.34c)
u ∈ U0 .
(5.34d)
Finally, if sp∗ denotes the minimum value of (5.34a–d), then the limit state function corresponding to problems with the operating condition (5.33a) can be approximated from above by s ∗ (a, x) ≤ sp∗ (a, x).
(5.35)
5.5 Systems/Structures with Parameter-Dependent States 5.5.1 Dynamic Control Systems In many practical cases, as, e.g., in the optimization and control of robots [11], active control of mechanical structures [17], analysis and control of vibrations of structures [18], etc., the state vector y depends on a certain parameter, such as the time t in dynamical problems, or dimensions, sizing variables, etc., in geometrical problems. Hence, e.g. in dynamical problems the state equation (2.6b) is given by a certain system of differential equations depending on random quantities: y˙ = f (t, y(t), u(t), (a, x)), t ≥ t0 , y(t0 ) = y0 (a, x),
(5.36a)
with a given function f and an initial value y = y(t0 ) = y0 (a, x). In case of a linear or linearized state differential equation we have
5.5 Systems/Structures with Parameter-Dependent States
y˙ = A(t, (a, x))y(t) + B(t, (a, x))u(t) + b(t, (a, x)),
99
(5.36b)
with matrix and vector functions, resp., A, B, b. Often the state differential equation is also represented by a second order system of linear differential equation M(t)y¨ + D(t)y˙ + K(t)y(t) = L(t, u(u), (a, x))
(5.36c)
with deterministic matrix functions M = M(t), D = D(t), K = K(t) and a vector function L = L(t, u(u), (a, x)). In dynamic control systems the response or performance function g in (2.6c) is very frequently represented in scalar way by
tf
g(y(·), u(·), (a, x)) =
γ t, y(t), u(t), (a, x) dt + ψ(tf , y(tf ), (a, x))
t0
(5.37) with a cost function along the trajectory γ = γ (t, y, u, (a, x)) and a terminal cost function ψ = ψ(tf , y(tf ), (a, x)). In many cases convex quadratic functions are taken into account. Consequently, in the present case, according to the optimization problem (2.6a– e), the limit state function s ∗ = s ∗ (a, x) is given by the optimal value function of the optimal control problem:
min
tf
y(·),u(·) t0
γ t, y(t), u(t), (a, x) dt + ψ(tf , y(tf ), (a, x)) − gmax
(5.38a)
s.t. y(t) ˙ = f (t, y(t), u(t), (a, x)),
(5.38b)
y(t0 ) = y0 (a, x),
(5.38c)
y(t) ∈ Y0 (t), u(t) ∈ U0 (t), t0 ≤ t ≤ tf .
(5.38d)
5.5.2 Variational Problems Corresponding to the representation of the limit state function s ∗ = s ∗ (a, x) by an optimal control problem, also representations of s ∗ by using methods of calculus of variation [2] may occur. Corresponding problems may found in analytical mechanics and shape optimization. In a basic version the performance or response function g = g(y, (a, x)) is given by
g(g(y, (a, x)) :=
tf t0
f τ, y(τ ), y (τ ), (a, x) dτ,
(5.39a)
100
5 Constructions of Limit State Functions
where τ is the (scalar) parameter in the state function y = y(τ ), y = y (τ ) denotes the derivative of the state function y with respect to τ , and f = f (τ, y, y ) is a given cost function. Moreover, the state equation T reads then y(t0 ) − y0 (a, x) . T (y, (a, x)) = y(tf ) − yf (a, x)
(5.39b)
In addition to the boundary conditions (5.39b) in some variational problems there is still a constraint of type:
tf
h τ, y(τ ), y (τ ), (a, x) dτ = h¯
(5.39c)
t0
¯ with a given function h and given value h. Thus, according to the optimization problem (5.6a–e), here, the limit state function s ∗ = s ∗ (a, x) is given by the optimal value function of the following problem of calculus of variation:
tf
min y(·)
f τ, y(τ ), y (τ ), (a, x) dτ − gmax
(5.40a)
t0
s.t. tf ¯ t0 h τ, y(τ ), y (τ ), (a, x) dτ = h
(5.40b)
y(t0 ) = y0 (a, x), y(tf ) = yf (a, x).
(5.40c)
Remark 5.5 If the system differential equation (5.38b) can be solved uniquely to the control input u(t), hence, u(t) = h(t, y(t), y(t), ˙ (a, x))
(5.41)
with some function h, and putting (5.41) into the objective function (5.38a) of the control problem, then the control problem is transformed into a variational problem of the type (5.40a–c).
5.5.3 Example to Systems with Control and Variational Problems 5.5.3.1
Control Problems
Suppose that the (open-loop) control problem has, cf. (5.36b) the linear system differential equation
5.5 Systems/Structures with Parameter-Dependent States
101
y˙ = Ay(t) + Bu(t), t ≥ t0 ,
(5.42a)
with fixed matrices A, B, and let the objective function be given (5.37) by
tf
t0
1 γ t, y(t), u(t), (a, x) dt = 2
tf
u(t)T Ru(t)dt
(5.42b)
t0
with a positive definite matrix R. Moreover, assume the boundary conditions y(t0 ) = y0 (a, x), y(tf ) = yf (a, x). With the solution P = P (t) of the Lyapunov equation, see, e.g., [8], the optimal open-loop control reads u(t) = R −1 B T eA
T (t
f −t)
P −1 (tf )d(t0 , tf )
(5.42c)
with d(t0 , tf ) = d(t0 , tf , (a, x)) := yf (a, x) − eA(tf −t0 ) y0 (a, x).
(5.42d)
Consequently, the related limit state function reads [8] s ∗ (a, x) =
1 d(t0 , tf , (a, x))T P −1 (tf )d(t0 , tf , (a, x)) − gmax . 2
(5.42e)
Similar results can be found for closed-loop control problems [8].
5.5.3.2
Variational Problems
Considering the problem of minimum distance of two points (t0 , y0 (a, x)), (tf , yf (a, x)) in the plane, the function f is given by f (τ, y(τ ), y (τ )) :=
*
1 + y (τ )2 .
(5.43a)
By application of the Euler–Lagrange differential equation, for the optimal solution y ∗ = y s t (τ ) of the variational problem (5.40a,c) we have the line segment y ∗ = y s t (τ ) =
yf − y0 y0 tf − yf t0 t+ . tf − t0 tf − t0
(5.43b)
Consequently, according to (5.40a,c), in the present case the limit state function reads * (5.43c) s ∗ = s ∗ (a, x) = (tf − t0 )2 + (yf (a, x) − y0 (a, x))2 − gmax .
102
5 Constructions of Limit State Functions
5.5.3.3
Transformation of Control Problems into Variational Problems
As can be seen from the variational problem (5.40a–c), in many cases one has to minimize an explicit objective function (5.40a) subject to simple constraints (5.40c). Hence, using the methods of calculus of variation [2], the corresponding limit state function can be represented directly by the minimum value of the objective function (5.40a). As shown in the following, by a certain transformation of optimal control problems to variational problems, this procedure can also be applied to the representation of the limit state function of a certain class of optimal control problems.
5.5.4 Discretization of Control Systems In the following we consider the approximate construction of limit state functions for optimal control systems by means of discretization techniques. Hence, first we have to treat the optimal control problem (5.38a–d). Starting with the time grid t0 ≤ t1 ≤ . . . ≤ tj ≤ tj +1 ≤ . . . tN −1 ≤ tN := tf with step lenghts hj := tj +1 − tj , j = 0, 1, . . . , N − 1, we define the state and control vectors y1 = y(t1 ), . . . , yj = y(tj ), . . . , yN −1 = y(tN −1 ), yN = y(tN ) = y(tf ),
(5.44a)
u0 = u(t0 ), . . . , uj = u(tj ), . . . , uN −1 = u(tN −1 ), . . . , uN = u(tN ).
(5.44b)
For the discretization of the initial value problem (5.38b,c) we consider the equivalent integral representation:
t y(t) − y(t0 ) =
f (s, y(s), u(s), (a, x))ds, t0 ≤ t ≤ tf .
(5.45)
t0
Numerous rules are available [16] for the discretization of the integral in (5.45). Applying the trapezoidal rule to the partial integrals over the subintervals [tj , tj +1 ], for the state values defined in (5.44a) we get the approximations: yj +1 − yj =
hj f (tj , yj , uj , (a, x)) + f (tj +1 , yj +1 , uj +1 , (a, x)) , 2
0 ≤ j ≤ N − 1.
(5.46)
For given initial value y0 , inputs uj , j = 1, . . . , N , parameter vector a and design vector x, (5.46) is a system of equations for the approximations yj of the states y(tj ), j = 1, . . . , N . In many cases the vector function f reads f (t, y, u, (a, x)) = fI (t, y) + b(t, u, (a, x))
(5.47)
5.5 Systems/Structures with Parameter-Dependent States
103
with given functions fI = fI (t, y), b = b(t, u, (a, x)). In this case system (5.46) can be represented by y1 −
=
h0 h0 fI (t1 , y1 ) = y0 + f (t0 , y0 , u0 , (a, x)) + b(t1 , u1 , (a, x)) , 2 2 (5.48a) hj hj yj +1 − fI (tj +1 , yj +1 ) − yj + fI (tj , yj ) 2 2
hj b(tj , uj , (a, x)) + b(tj +1 , uj +1 , (a, x)) , 1 ≤ j ≤ N − 1. 2
(5.48b)
A sufficient condition for the solvability of the system (5.48a,b) for the state vectors y1 , . . . , yN reads Lemma 5.1 Define the time, step size and control vectors tD := (t0 , t1 , . . . , tN −1 , tN ), tN = tf , hD := (h0 , h1 , . . . , hN −1 ), uD := (uT0 , uT1 , . . . , uTN −1 , uTN )T , yD := T )T . Moreover, suppose that (y1T , y2T , . . . , yN ϑj = ϑj (yj ) := yj − fI (tj , yj ), j = 1, 2, . . . , N,
(5.48c)
are bijective mappings onto the state domain under consideration. Then, system (5.48a,b) can be solved uniquely for yD , hence, yD = TD−1 (bD ),
(5.48d)
where bD = bD (tD , hD , y0 , uD , (a, x)) is the Nn-vector containing the subvectors on the right hand side of (5.48a,b), and TD = TD (yD ; t˜D , hD ), t˜D := (t1 , . . . , tN ) is the bijective operator defined by the left hand side of (5.48a,b). Proof The assertion follows from the bijectivity of the mappings ϑj , j 1, 2, . . . , N, and the successive solution of (5.48a,b) for y1 , . . . , yN .
=
Remark 5.6 Besides the trapezoidal rule, further well known methods for the numerical solution of initial value problems are: Euler polygonal-line, Heun, Runge–Kutta method, see [16]. Concerning the consistency, stability, and convergence of discretization methods for initial a vast amount of publications exist [5, 16]. If the function fI is linear in y, i.e. fI (t, y) = A(t)y with a matrix A = A(t), then system (5.48a,b) reads
(5.49)
104
5 Constructions of Limit State Functions
h0 h0 h0 I − A(t1 ) y1 = I + A(t0 ) y0 + (b(t0 , u0 , (a, x)) + b(t1 , u1 , (a, x))) , 2 2 2 (5.50a) hj hj I − A(tj +1 ) yj +1 − I + A(tj ) yj 2 2 =
hj b(tj , uj , (a, x)) + b(tj +1 , uj +1 , (a, x)) , 1 ≤ j ≤ N − 1. 2
(5.50b)
Concerning the properties of (5.50a,b) we get the following result: Theorem 5.5 Given the initial value y0 , system (5.50a) of equations for the vector yD of state vectors on tD is linear with a block-diagonal matrix AD = AD A(tj +1 ), hj , 0 ≤ j ≤ N − 1 . If the value h2j is no eigenvalue of A(tj +1 ) for each j = 0, 1, . . . , N − 1, then AD is regular, and (5.50a,b) can be solved uniquely for yD = A−1 D bD
(5.51)
with the vector bD = bD (tD , hD , y0 , uD , (a, x)) of right hand sides in (5.50a,b). (5.52) h
Proof If the matrices I − 2j A(tj +1 ), 0 ≤ j ≤ N − 1, are regular, i.e. h2j is no eigenvalue of A(tj +1 ) for each j = 0, 1, . . . , N − 1, then the system AD yD = bD can be solved uniquely for yD . Moreover, the N partial equations in (5.50a,b) can be solved sequentially for y1 , y2 , . . . , yN . Corresponding to the discretization of the dynamic system (5.36a), we may also discretize the integral gI (y(·), u(·), (a, x)) in the objective function g, see (5.37), by the trapezoidal rule: gI (y(·), u(·), (a, x)) ≈ gI D (tD , hD , y0 , yD , uD , (a, x)),
(5.53a)
where gI D (tD , hD , y0 , yD , uD , (a, x)) := +
N −2 j =0
h0 γ (t0 , y0 , u0 , (a, x)) 2
hj + hj +1 hN −1 γ (tj +1 , yj +1 , uj +1 , (a, x)) + γ (tN , yN , uN , (a, x)). 2 2 (5.53b)
By means of the above discretizations of the control problem (5.38a–d), the corresponding limit state function s ∗ = s ∗ (a, x) can be approximated by the optimal ∗ = s ∗ (a, x) of the following parameter optimization problem: value sD D
5.5 Systems/Structures with Parameter-Dependent States
105
Theorem 5.6 Suppose that condition (5.47) holds. The approximate limit state ∗ = s ∗ (a, x) is then given by the optimal value of the parameter function sD D optimization problem: min gI D (tD , hD , y0 , yD , uD , (a, x)) + ψ(tN , yN , (a, x)) − gmax
yD ,uD
(5.54a)
s.t. TD (yD ; t˜D , hD ) = bD (tD , hD , y0 , uD , (a, x)) yD ∈ Y0D , uD ∈ U0D ,
(5.54b) (5.54c)
where Y0D := Y0 (t1 ) × Y0 (t2 ) × . . . × Y0 (tN ), and U0D is defined in the same way. Under the assumptions of Theorem 5.5 condition (5.54b) can be solved for yD , hence, yD = AD (t˜D , hD )−1 bD (tD , hD , y0 , uD , (a, x)).
(5.54d)
Remark 5.7 In many applications the objective function (5.54a) is quadratic in (yD , uD ), and the right hand side bD of (5.54b) is linear in uD . Putting (5.54d) into (5.54a), we get the following Corollary 5.2 Suppose that the assumptions in Theorem 5.2 and 5.3 hold. Furthermore, suppose that Y0D := RN my . Then (5.54a–c) is reduced to min gI D (tD , hD , y0 , yD , uD , (a, x)) + ψ(tN , yN , (a, x)) − gmax
(5.55)
uD ∈U0D
with yD = yD (tD , hD , y0 , uD , (a, x)) given by (5.54d).
5.5.4.1
Control Problems with Quadratic Objective Functions
Suppose here that the cost functions γ , ψ are given by γ (t, y, u, (a, x)) := y T Q(t, (a, x))y + q(t, (a, x))T y + γ0 (t, (a, x)) +uT W (t, (a, x))u ψ(t, y, (a, x)) := y T P (t, (a, x))y + p(t, (a, x))T y + ψ0 (t, (a, x)),
(5.56a) (5.56b)
where Q = Q(t, (a, x)), W = W (t, (a, x)), P = P (t, (a, x)) are positive (semi)definite matrices, q = q(t, (a, x)), p = p(t, (a, x)) denote vectors, and γ0 (t, (a, x)), ψ0 (t, (a, x)) are scalars depending on the variables t, (a, x). According to (5.53a,b) we obtain then
106
5 Constructions of Limit State Functions
gI D (tD , hD , y0 , yD , uD , (a, x)) :=
h0 T y Q(t0 , (a, x))y0 2 0
+ q(t0 , (a, x))T y0 + γ (t0 , (a, x)) + uT0 W (t0 , (a, x))u0 +
N −2 j =0
hj + hj +1 T yj +1 Q(tj +1 , (a, x))yj +1 2
+ q(tj +1 , (a, x))T yj +1 + γ (t(j +1) , (a, x)) + uTj+1 W (tj +1 , (a, x))uj +1
hN −1 T yN Q(tN , (a, x))yN + q(tN , (a, x))T yN 2 + γ (tN , (a, x)) + uTN W (tN , (a, x))uN +
=
h0 T (y Q(t0 , (a, x))y0 + q(t0 , (a, x))T y0 + γ (t0 , (a, x))) 2 0
T + yD QD (t˜D , hD , (a, x))yD
+ qD (t˜D , hD , (a, x))T yD + γD (t˜D , hD , (a, x)) + uTD W (tD , hD , (a, x))uD (5.57a) with block-diagonal matrices QD (t˜D , hD , (a, x)), W (tD , hD , (a, x)), a vector qD (t˜D , hD , (a, x)) and a scalar γD (t˜D , hD , (a, x)). Moreover, the end point costs read: T P (tN , (a, x))yN + p(tN , (a, x))T yN + ψ0 (tN , (a, x)). ψ(tN , yN , (a, x)) := yN (5.57b)
Assuming that the vector b = b(t, u, (a, x)) in the representation (5.47) of f is given by b(t, u, (a, x)) := B(t, (a, x))u + β(t, (a, x)), (5.58a) then the vector bD can be represented, cf. (5.50a,b), (5.51), by bD (tD , hD , y0 , uD , (a, x)) = BD (tD , hD , (a, x))uD +βD (tD , hD , y0 , (a, x)) (5.58b) with a corresponding matrix BD and vector βD . With (5.58b), the assumptions of Theorem 5.2 and (5.51) the discretized state trajectory yD reads yD =AD (t˜D , hD )−1 BD (tD , hD , (a, x))uD +AD (t˜D , hD )−1 βD (tD , hD , y0 , (a, x)). (5.59a) The function βD = βD (tD , hD , y0 , (a, x)) can be represented by the sum βD (tD , hD , y0 , (a, x)) = βD0 (t0 , h0 , y0 ) + β˜D (tD , hD , (a, x)) of given functions βD0 = βD0 (t0 , h0 , y0 ) and β˜D = β˜D (tD , hD , (a, x)).
(5.59b)
5.5 Systems/Structures with Parameter-Dependent States
107
−1 If Ainv N denotes the submatrix of AD with the last n rows, then for the terminal subvector yN of yD we have inv ˜ yN = Ainv N (tD , hD )BD (tD , hD , (a, x))uD + AN βD (tD , hD , y0 , (a, x)).
(5.59c)
Inserting now (5.59a,b) into (5.57a,b) we get the following result: Lemma 5.2 For quadratic cost functions, the objective function of the minimization problem (5.55) can be represented by gD (tD , hD , y0 , yD , uD , (a, x)) = gI D + ψ =
h0 T y Q(t0 , (a, x))y0 2 0
+ q(t0 , (a, x))T y0 + uTD Q(tD , hD , (a, x))uD + q(tD , hD , y0 , (a, x))T uD + d(tD , hD , y0 , (a, x)),
(5.60a)
where −1 T T inv T AD QD A−1 P Ainv (5.60b) Q(tD , hD , (a, x)) := BD N BD + WD , D + AN T −1 T inv T A−1 βD q(tD , hD , y0 , (a, x)) := 2BD P Ainv N D QD AD + AN T T inv T A−1 +BD p , (5.60c) D qD + AN T −1 T inv T A−1 βD P Ainv d(tD , hD , y0 , (a, x)) := βD N D QD AD + AN T T inv T A−1 +βD p . (5.60d) D qD + AN 5.5.4.2
Tracking Problems
In case of tracking problems [11], with a prescribed trajectory y = y(t), t0 ≤ t ≤ tf , the cost functions γ , ψ, see (5.56a,b), reads γ (t, y, u, (a, x)) := (y − y)T Q(t, (a, x))(y − y) + uT W (t, (a, x))u ψ(tf , y, (a, x)) := 0.
(5.61a) (5.61b)
Thus, corresponding to (5.57a,b) the discretized total cost function of the tracking problem reads gD (tD , hD , y0 , yD , uD , (a, x)) =
h0 (y0 − y 0 )T Q(t0 , (a, x))(y0 − y 0 ) 2
+(yD − y D )T QD (t˜D , hD , (a, x))(yD − y D ) + uTD W (tD , hD , (a, x))uD . (5.62a)
108
5 Constructions of Limit State Functions
Inserting yD , represented by (5.58a,b), (5.59a) into (5.62a), we get gD (tD , hD , y0 , yD (uD ), uD , (a, x)) =
h0 (y0 − y 0 )T Q(t0 , (a, x))(y0 − y 0 ) 2
= uD QuD + uTD q + d,
(5.62b)
where T
T −1 AD QD A−1 Q := BD D BD , T
(5.62c)
T −1 q := −2BD AD QD (A−1 D βD − y D ),
(5.62d)
−1 T d := (A−1 D βD − y D ) QD (AD βD − y D ).
(5.62e)
−1 Let us denote xD the vector with the subvectors y0 − y 0 , A−1 D BD uD + AD βD − y D , uD let be R the positive (semi) definite block-diagonal matrix with the square matrices h20 Q(t0 ), QD , WD . Then, gD can be represented also by T gD = xD RxD = xD 2 ,
(5.62f)
where is defined by R = T .
5.5.4.3
Endpoint Control
Given a certain end point yf to be reached with minimum costs for the control input u, the cost functions γ , ψ, see (5.56a,b), can be selected as follows: γ (t, y, u, (a, x)) := uT W (t, (a, x))u ψ(tf , y, (a, x)) := (y − yf )T P (tN , (a, x))(yN − yf ).
(5.63a) (5.63b)
Here, corresponding to (5.57a,b) the discretized total cost function of the end point control problem reads gD (tD , hD , yN , uD , (a, x)) = (yN − yf )T P (tN , (a, x))(yN − yf ) + uTD W (tD , hD , (a, x))uD . 5.5.4.4
(5.64)
Control Problems with Sublinear Objective Functions
Considering here tracking and end point control problem, the cost functions γ , ψ are given by γ (t, y, u, (a, x)) := max cl (t, (a, x))T (y − y(t)) + cI I (t, u), l
(5.65a)
5.5 Systems/Structures with Parameter-Dependent States
ψ(t, y, (a, x)) := max cl (tf , (a, x))T (y − y(tf )), l
109
(5.65b)
where cl = cl (t, (a, x), 1 ≤ l ≤ lmax , are my -vectors of cost factors, y = y(t) is the prescribed function to be tracked, and cI I = cI I (t, u) denotes the costs for the control input u. Corresponding to (5.57a,b) we obtain then gI D (tD , hD , y0 , yD , uD , (a, x)) := +
h0 max cl0 (t0 , (a, x))T (y0 − y 0 ) + cI I (t0 , u0 ) l0 2
N −2 j =0
+
hj + hj +1 max cl (tj +1 , (a, x))T (yj +1 − y j +1 ) + cI I (tj +1 , uj +1 ) lj +1 2
hN −1 max cl (tN , (a, x))T (yN − y N ) + cI I (tN , uN ) . l 2
(5.66a)
Moreover, the end point costs read ψ(tN , yN , (a, x)) := max cl (tN , (a, x))T (yN − y N ). l
(5.66b)
Define now the transformed state vectors hj + hj +1 h0 hN −1 y0 , y˜j +1 := yj +1 , 0 ≤ j ≤ N − 2, y˜N := yN , 2 2 2 (5.67a) and in the same way the transformed states to be reached y˜ 0 , y˜ j , 1 ≤ j ≤ N − 1, y˜ N . Let us then denote y˜D , y˜ D , resp., the Nmy -vector containing the subvectors y˜j , y˜ j , resp., 1 ≤ j ≤ N. Moreover, for an index set lD := {l1 , l2 , . . . , lN }, 1 ≤ lj ≤ lmax , 1 ≤ j ≤ N, we define the total cost vector y˜0 :=
T clD (t˜D ) := clT1 , clT2 , . . . , clTN .
(5.67b)
According to (5.66a,b) we have the following result: Lemma 5.3 For sublinear cost functions, the total approximate cost function gD = gI D + ψ of the minimization problem (5.55) can be represented by gD (tD , hD , y0 , yD , uD , (a, x)) := + +
max
1≤lj ≤lmax ,1≤j ≤N
max
1≤lf ≤lmax
max
1≤l0 ≤lmax
cl0 (t0 , (a, x))T (y˜0 − y˜ 0 )
clD (t˜D , (a.x))T (y˜D − y˜ D )
clf (tN , (a, x))T (yN − y N ) + cI I D (tD , hD , uD ),
(5.68)
where the discretized trajectory yD , the terminal state yN , resp., are defined by (5.59a,b).
110
5 Constructions of Limit State Functions
5.5.5 Reliability-Based Optimal Control According to Sect. 5.5, see (5.38a–d,5.54a–d,5.55, 5.59a,b), the approximate per∗ = s ∗ (t , h , y , (a, x)) can be represented, assuming for formance function sD D D D 0 simplification that Y0D := RN my , by ∗ (tD , hD , y0 , (a, x)) := min gD (tD , hD , y0 , yD (uD ), uD , (a, x)) − gmax , sD uD ∈U0D
(5.69a) where yD (uD ) = yD (tD , hD , y0 , uD , (a, x)), yN (uD ) = yN (tD , hD , y0 , uD , (a, x)) is given by one of the above equations, as, e.g., (5.48d,5.51,5.59a,b). In case that the control vector uD is treated as an outer, selectable input variable, such as the design vector x, then, instead of (5.69a), we have the performance function—being a special case of (5.69a) ∗ sD (tD , hD , y0 ,yD (uD ), uD , (a, x)) := gD (tD , hD , y0 ,yD (uD ), uD , (a, x)) − gmax .
(5.69b) 5.5.5.1
Computation of the Probability of Survival
Suppose now that the initial value y0 = y0 (ω) and the parameter vector a = a(ω) are—without restrictions—stochastically independent random variables. In both cases (5.69a,b), the probability of survival ps can be approximated as follows: Theorem 5.7 Let h = h(t) denote a nonnegative, monotone increasing function on R. Then, ps (x) ≈ psD (x) = P
min gD (tD , hD , y0 ,yD (uD ), uD , (a, x)) ≤ gmax
uD ∈U0D
≥ max P gD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x)) ≤ gmax ,
uD ∈U0D
≥1−
1 min Eh gD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x) . h(gmax ) uD ∈U0D (5.70)
Improvements of the second inequality can be obtained by minimizing the quotient in the second inequality subject to a certain class of nonnegative, monotone increasing functions h(·). If all measurable nonnegative monotone increasing functions are taken into account, then the minimum is attained at h∗ := 1[gmax ,+∞) , and the second inequality in (5.70) holds with “=”. Proof Since gD (.., ω, ., uD , . . .) ≤ gmax implies that min gD (.., ω, ., uD , . . .) ≤ uD ∈U0D
gmax for each given uD ∈ U0D , for psD (x) we get the lower bound psD (x) ≥ max P gD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x)) ≤ gmax , . uD ∈U0D
5.5 Systems/Structures with Parameter-Dependent States
111
The second inequality in (5.70) follows from the Markov inequality. Improved lower bounds can be obtained by minimizing the quotient in this inequality subject to a certain class of nonnegative, monotone increasing functions h(·). If all measurable nonnegative monotone increasing functions are taken into account, then the minimum is attained at h∗ := 1[gmax ,+∞) , and the second inequality in (5.70) holds with “=”. From the above theorem we get the following consequences: Corollary (a) With a minimum probability of survival α, a sufficient condition for psD (x) ≥ α is given by min Eh gD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x) ≤ (1 − α)h(gmax ).
uD ∈U0D
(5.71a) (b) For nonnegative, monotone increasing concave functions on R we have psD (x) 1 ≥1− h h(gmax )
min EgD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x)
uD ∈U0D
.
(5.71b) The optimal, but nonsmooth function h∗ := 1[gmax ,+∞) can be approximated by various smooth sigmoid functions, such as: x − gmax 1 1 1 + erf , h2 (x) = h1 (x) = √ 1 + exp(−k (x − gmax )) 2 2σ x − gmax , x ∈ R, (5.71c) =! σ where erf(x) denotes the error function, ! is the Gaussian distribution function, and k > 0,σ > 0 are positive parameters. For k → +∞, σ ↓ 0, resp, the functions h1 ,h2 , resp, approach the 0−1−function h∗ . Expanding the sigmoid functions h1 ,h2 in Taylor series at x0 = gmax , we get h1 (x) =
1 k k3 k5 + (x − gmax ) − (x − gmax )3 + (x − gmax )5 − . . . , 2 4 48 480 (5.71d)
h2 (x) 1 1 = +√ 2 2π
x − gmax 1 − σ 2·3
x − gmax σ
3
1 + 2!22 5
x − gmax σ
5
! − ... . (5.71e)
112
5 Constructions of Limit State Functions
The expectation Eh(gD ) = Eh(gD (. . . , y0 (ω), . . . , a(ω), . . .)) can be deter˜ mined also by means of Taylor expansion of h(b) := h(gD )(b) with respect to b := (y0 ,a). The needed derivatives up to second order read, cf. [10, 11], ˜ ∇b h(b) = h (gD (b))∇b gD (b) ˜ ∇b2 h(b) = h (gD (b))∇b gD (b)∇b gD (b)T + h (gD (b))∇b2 gD (b).
(5.71f)
For approximate performance functions gD which are linear with respect to the parameter vector b, the second term in (5.71f) vanish.
5.5.5.2
Tracking Problems with Quadratic Cost Function
According to Corollary 5.2, (5.55), (5.62f) and Theorem 5.7, for the probability of survival we have √ psD (x) ≥ max P xD ≤ gmax ≥ 1 − uD ∈U0D
where
1 min Eh xD , √ h( gmax ) uD ∈U0D (5.72a)
⎛
⎞ y0 − y 0 −1 ⎠ xD (tD , hD , y0 ,(a, x), uD ) := ⎝A−1 D BD uD + AD βD − y D , uD ⎞ ⎛ h0 0 2 Q(t0 ) 0 R := ⎝ 0 QD 0 ⎠ , 0 0 WD and h = h(x) is a nonnegative, monotone increasing function on R. √ Taking h(x) := x 2 , we get h( gmax ) = gmax and Eh xD = trRM2 (xD (·)),
(5.72b)
(5.72c)
(5.72d)
where M2 (xD (·)) denotes the matrix of second order moments of xD .
5.5.5.3
Endpoint Control in Case of Sublinear Cost Function
With Lemma 5.3 and (5.59a,b), in case of endpoint control with a sublinear cost function the performance function reads gD (tD , hD , y0 ,yD , uD , (a, x)) :=
max
1≤lf ≤lmax
clf (tN , (a, x))T (yN − y N )
+ cI I D (tD , hD , uD ),
(5.73)
5.5 Systems/Structures with Parameter-Dependent States
113
where yN is given by (5.59c). According to (5.70) and (5.72a–d) for the approximate probability psD (x) we get the lower bound: Theorem 5.8 For optimal endpoint control problems with sublinear cost functions, lower bounds for the approximate probability of survival psD (x) are given by psD (x) ≥ max P gD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x)) ≤ gmax , uD ∈U0D
= max P clf (tN , (a, x))T (yN − y N ) uD ∈U0D
+ cI I D (tD , hD , uD ) ≤ gmax , for 1 ≤ lf ≤ lmax ,
(5.74)
where yN is defined as in (5.59c). Corresponding lower bounds for psD (x) hold also for general optimal control problems with sublinear cost functions. With the representation (5.59a,c) of yD , yN and (5.59b) of the function β, from (5.74) we also get ˜ psD (x) ≥ max P clf (tN , (a, x))T (Ainv N (tD , hD )BD (tD , hD , (a, x))uD uD ∈U0D
˜ +Ainv N bD0 (t0 ,h0 ,y0 ) + βD (tD , hD , (a, x)) − y N ) + cI I D (tD , hD , uD ) ≤ gmax , for 1 ≤ lf ≤ lmax .
(5.75a)
In many cases the function bD0 is linear in y0 , cf. (5.49), and β˜D is affine linear with respect to the parameter vector a = a(ω), hence bD0 (t0 .h0 ,y0 ) = 0 (t0 ,h0 )y0 , β˜D (tD , hD , (a, x)) = 1 (tD , hD , x)a + γ1 (tD , hD , x).
5.5.5.4
(5.75b)
Further Lower Bound for psD
For deriving a further lower bound for psD we model the control input uD by a random vector uD = u˜ D (ω) with a certain distribution on the discretized control domain U0D . We have the following result: Theorem 5.9 If uD = u˜ D (ω), ω ∈ , is a random vector taking values in U0D with probability 1, then psD (x) ≥ P gD (tD , hD , y0 (ω), yD (u˜ D (ω)), u˜ D (ω), (a(ω), x)) ≤ gmax .
(5.76a)
Conversely, if u˜ ∗D (ω) is defined by u˜ ∗D (ω) := argminuD ∈U0D g(tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x)),
(5.76b)
114
5 Constructions of Limit State Functions
then psD (x) = P gD (tD , hD , y0 (ω), yD (u˜ ∗D (ω)), u˜ ∗D (ω), (a(ω), x)) ≤ gmax . (5.76c) Proof For each ω ∈ such that gD (tD , hD , y0 (ω), yD (u˜ D (ω)), u˜ D (ω), (a(ω), x)) ≤ gmax we also get min gD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x)) ≤ gmax ,
uD ∈U0D
which yields the first assertion. If u˜ ∗D is defined by (5.76b), then P (u˜ ∗D (ω) ∈ U0D ) = 1 and
psD (x) =P
min gD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x)) ≤ gmax
uD ∈U0D
=P gD (tD , hD , y0 (ω), yD (u˜ ∗D (ω)), u˜ ∗D (ω), (a(ω), x)) ≤ gmax . Note 5.3 If u˜ D = u˜ D (ω) is independent of (y0 (ω), a(ω)), then the right hand side of (5.76a) can be represented by P (gD (tD , hD , y0 (ω), yD (u˜ D (ω)), u˜ D (ω), (a(ω), x)) ≤ gmax )
= P (gD (tD , hD , y0 (ω), yD (uD ), uD , (a(ω), x)) ≤ gmax ) Pu˜ D (duD ), uD ∈U0D
where Pu˜ D is the probability distribution of u˜ D . 5.5.5.5
Reliability Computation by Using Copulas
Having a multiple cost function γk = γk (t, y(t), u(t), (a, x)), k = 1, . . . , r,
(5.77)
the objective function of the control problem may be defined, see (5.38a), by the maximum of the individual total costs along the time interval [t0 ,tf ]: g(y(·), u(·), (a, x)) := max gk (y(·), u(·), (a, x)) 1≤k≤r
5.5 Systems/Structures with Parameter-Dependent States
= max
1≤k≤r
tf
γk t, y(t), u(t), (a, x) dt + ψk (tf , y(tf ), (a, x)) .
115
(5.78a)
t0
Consequently, after discretization, the operating condition g(y(·), u(·), (a, x)) ≤ gmax reads, cf. (5.4a–c), max gkD (tD , hD , y0 ,yD , uD , (a, x)) ≤ gmax .
1≤k≤r
(5.78b)
In control problems, dynamic systems we can have certain integral constraints
tf
ηk t, y(t), u(t), (a, x) dt ≤ ηk,max , k = 1, . . . , r.
(5.78c)
t0
After discretization we get again conditions of the type (5.78b): ηkD (tD , hD , y0 ,yD , uD , (a, x)) ≤ ηk,max , k = 1, . . . , r.
(5.78d)
Furthermore, also pointwise constraints may occur: ζk tj , y(tj ), u(tj ), (a, x) ≤ ζj,max , j ∈ J,
(5.78e)
with a certain set J ⊂ {1,2, . . . , N }, which are special constraints of the type (5.78b,d). Hence, for simplification, suppressing the other arguments in (5.77–5.78e) in the following we represent the above conditions by systems of constraints involving a (joint) random vector b = b(ω) and a (joint) decision vector w = (x, uD ): hk (b(ω), w) ≤ hk,max , k = 1, . . . , r,
(5.78f)
with given functions hk = hk (b, w), k = 1, . . . , r. Note 5.4 If the approximate control uD does not belong to the outer decision variables, then uD is just an inner parameter, see (5.69a), or uD = uD (ω) belongs to the stochastic variables, see (5.76a–c). In many applications one has then to determine and/or to maximize the joint probability of safety/survival: ps (w) := P hk (b(ω), w)) ≤ hk,max , k = 1, . . . , r,
(5.79a)
that the conditions (5.78f) hold. Consider the separated probabilities ps,k (w) = F (hk,max ; w) := P hk (b(ω), w) ≤ hk,max , k = 1, . . . , r,
(5.79b)
116
5 Constructions of Limit State Functions
and the copula C = C(p1 ,p2 , . . . , pr ), 0 ≤ pk ≤ 1,k = 1, . . . , r,
(5.79c)
representing the stochastic dependence or correlations of the one-dimensional random variables hk = hk (b(ω), w), k = 1,2, . . . , r. The probability of safety ps = ps (w) can be represented by ps (w) = C F (h1,max ; w), . . . , F (hr,max ; w)
(5.79d)
with the marginal distribution functions F = F (hk,max ; w) of the 1 − dim random variables hk = hk (b(ω), w), k = 1, . . . , r. Consider now the set & ' P oS := P oS(w) : w ∈ W
(5.80a)
with ⎛ ⎞ F (h1,max ; w) ⎜F (h2,max ; w)⎟ ⎜ ⎟ P oS(w) := ⎜ ⎟ .. ⎝ ⎠ .
(5.80b)
F (hr,max ; w) of marginal probabilities of safety, where W denotes the domain of w. For the maximization max ps (w) of the probability of safety ps = ps (w) on W we have w∈W
the following result: Theorem 5.10 Suppose that the copula C = C(p) is strictly monotone increasing in each of its variables pk , k = 1, . . . , r. Moreover, assume that the copula C(·), describing the correlations of the 1 − dim random variables hk (·), k = 1, . . . , r, is independent of the decision variable w. If w∗ is an optimal solution of max ps (w), then P oS(w ∗ ) is a Pareto-optimal point of P oS.
w∈W
Proof Let w ∗ be an optimal solution of max ps (w). Assuming that P oS(w ∗ ) is not w∈W
a Pareto-optimal point of P oS, then there is a point w˜ ∈ W such that ˜ j = 1, . . . , r, F (hj,max ; w ∗ ) ≤ F (hj,max ; w), F (hj,max ; w ∗ ) < F (hj,max ; w) ˜ for an index j = j0 . According to (5.79d, 5.80a) and the assumption on C we get the contradiction ps (w ∗ ) < ps (w). ˜ Sufficient conditions for the condition in the above theorem that the copula C(·) is independent if of the variables w are given in the following.
5.5 Systems/Structures with Parameter-Dependent States
117
Lemma 5.4 (a) A copula C(·) related to the performance or cost functions hk , k = 1, . . . , r, is independent of the variables w if this representation holds ) (I I ) hk = hk (b, w) = h(I k (b) + hk (w), k = 1, . . . , r, (I )
(5.81a)
(I I )
with certain functions hk , gk , k = 1, . . . , r. (b) If the performance or cost functions hk , k = 1, . . . , r, are approximated, hk (b, w) ≈ h˜ k (b, w) by the semi-linearizations ¯ w) + ∇b hk (b, ¯ w0 )T (b − b) ¯ h˜ k (b, w) := hk (b,
(5.81b)
¯ w0 ), then the copula with respect to the approximates of hk , k = 1, . . . , r, at (b, h˜ k , k = 1, . . . , r, is independent of w. Proof In case (a) the condition (5.78f) can be represented by (I )
(I I )
hk (b(ω) ≤ hk,max − hk
(w), k = 1, . . . , r.
(5.81c)
) Hence, the C(·) is determined only by the random variables h(I k (b(ω), k = 1, . . . , r, which are independent of w. In case (b) for condition (5.78f) we get
¯ w0 )T (b − b) ¯ ≤ hk,max − hk (b, ¯ w), k = 1, . . . , r. ∇b hk (b,
(5.81d)
Thus, also in this case the random variables on the left side of (5.81d) are independent of w. For the accuracy of the semi-linearization (5.81b) we have this result: ¯ w0 ; b, w), R ¯ = R ¯ (w0 ; w), resp., Lemma 5.5 Let us denote Rk = Rk (b, k,b k,b denote the remainder term of the first order Taylor expansion of hk = hk (b, w), ¯ w) at (b, ¯ w0 ), w0 , respectively. Then, hk = hk (b, hk (b¯ + b, w0 + w) − h˜ k (b¯ + b, w0 + w) ¯ w0 ; b, w) − R ¯ (w0 ; w), k = 1, . . . , r. = Rk (b, k,b
(5.82a)
The difference between the errors of the (full) linearization and the semilinearization can be evaluated by the above lemma as follows: Corollary Denoting by L hk , SL hk , resp., the errors of the full, semi¯ w0 ), then for all (b, w) it holds linearization, resp., of hk at (b, L hk (b, w) − SL hk (b, w) = Rk,b¯ (w0 ; w), k = 1, . . . , r.
(5.82b)
118
5 Constructions of Limit State Functions
Some special cases are shown next: Example 5.3 (a) If hk is linear in its argument (b, w), then hk = h˜ k and L hk = SL hk . (b) For a quadratic function hk with the Hessian
Qbb Qwb ∇ hk (b, w) = Qbw Qww
2
(5.82c)
we get L hk (b, w) − SL hk (b, w) = 12 w T Qww w. c) If hk is convex in w, then L hk (b, w) ≥ SL hk (b, w) for all (b, w)
5.5.5.6
Approximations of P oS
Approximations of the set P oS of marginal probabilities of safety can be obtained by using the Markov inequality: Given monotonous increasing functions φj = φj (t), t ∈ R, j = 1, . . . , r, for each index j we have F (hj,max ; w) = P (hj (b(ω), w) ≤ hj,max ) ≥1−
1 Eφj (hj (b(ω), w)). φj (hj,max )
(5.83a)
Consequently, (5.83a) yields this result: Theorem 5.11 For given monotonous increasing functions φj = φj (t), t ∈ R, j = 1, . . . , r, P oS can be approximated from below as follows: ⎫ ⎧⎛ ⎞ 1 − φ1 (h11,max ) Eφ1 (h1 (b(ω), w)) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎟ ⎪ 1 ⎬ ⎨⎜ 1 − Eφ (h (b(ω), w)) ⎜ ⎟ 2 2 φ2 (h2,max ) ⎜ ⎟ , , w ∈ W P oS ≥ ⎜ . ⎟ ⎪ ⎪ .. ⎪ ⎪ ⎝ ⎠ ⎪ ⎪ ⎪ ⎪ ⎭ ⎩ 1 − φr (h1r,max ) Eφr (hr (b(ω), w))
(5.83b)
where the inequality corresponds to the representation of the two sets by means of the vectors w ∈ W . Note 5.5 If the performance or cost functions φj = φj (t), t ∈ R, j = 1, . . . , r, are nonnegative, then also functions φ(t) = t, t 2 ,etc., can be taken.
References 1. Basudhar, A., Missoum, S., Harrison Sanchez, A.: Limit State Function Identification Using Support Vector Machines for Discontinuous Responses and Disjoint Failure Domains, pp. 1– 11. Elsevier, Amsterdam (2008). https://doi.org/10.1016/j.probengmech.2007.08.004
References
119
2. van Brunt, B.: The Calculus of Variations. Springer, New York (2006) 3. Cocchetti, G., Maier, G.: Static shakedown theorems in piecewise linearized poroplasticity. Arch. Appl. Mech. 68, 651–661 (1998). https://doi.org/10.1007/s004190050194 4. Gavin, H.P. Yan, S.: High-order limit state functions in the response surface method for structural reliability analysis. Struct. Saf. 30(2), 162–179 (2008). https://doi.org/10.1016/j. strusafe.2006.10.003 5. Gerdts, M.: Optimal Control of ODES and ADES. De Gruyter, Berlin (2012) 6. Hardy, G., Littlewood, J., Pólya, G.: Inequalities. Cambridge Univ. Press, London (1973) 7. Kaliszky, S.: Plasticity: Theory and Engineering Applications. Elsevier, Amsterdam (1989) 8. Lewis, F.: Applied Optimal Control and Estimation. Prentice-Hall Inc., Englewood Cliffs (1992) 9. Marti, K.: Stochastic optimization methods in optimal engineering design under stochastic uncertainty. ZAMM 83(11), 1–18 (2003) 10. Marti, K.: Stochastic Optimization Methods, 2nd edn. Springer, Berlin (2008). https://doi.org/ 10.1007/978-3-540-79458-5 11. Marti, K.: Stochastic Optimization Methods: Applications in Engineering and Operations Research, 3rd edn. Springer, Berlin (2015). https://doi.org/10.1007/978-3-662-46214-0 12. Neal, B.: The Plastic Methods of Structural Analysis. Chapman and Hall, London (1965) 13. Proppe, C.: Estimation of failure probabilities by local approximation of the limit state function. Struct. Saf. 30, 277–290 (2008). https://doi.org/10.1016/j.strusafe.2007.04.001 14. Rozvany, G.: Structural Design via Optimality Criteria. Kluwer, Dordrecht (1989) 15. Schuëller, G., Gasser, M.: Some Basic Principles of Reliability-Based Optimization (RBO) of Structure and Mechanical Components. Lecture Notes in Economics and Mathematical Systems (LNEMS), vol. 458,pp. 80–103. Springer, Berlin (1998) 16. Schwarz, H., Köckler, N.: Numerische Mathematik, 8th edn. Vieweg und Teubner, Wiesbaden (2011) 17. Soong, T.: Active Structural Control: Theory and Practice. John Wiley, New York (1990) 18. Soong, T., Costantinou, M.: Passive and Active Structural Vibration Control in Civil Engineering. CISM Courses and Lectures, vol. 345. Springer, Wien (1994) 19. Straub, D. (ed.): Reliability and Optimization of Structural Systems. CRC Press, Taylor and Francis, London (2010) 20. Thoft-Christensen, P., Murotsu, Y.: Application of Structural Systems Reliability Theory. Springer, Berlin (1986) 21. Tin-Loi, F.: On the optimal plastic synthesis of frames. Eng. Optim. 16, 91–108 (1990) 22. Valentine, F.: Convex Sets. McGraw-Hill, Inc., New York (1964) 23. Yang, J., Soong, T.: Recent advances in active control of civil engineering structures. Probab. Eng. Mech. 3(4), 179–188 (1988). https://doi.org/10.1016/0266-8920(88)90010-0 24. Zabczyk, J.: Mathematical Control Theory, 2nd edn. Birkhäuser, Basel (1995)
Part II
Optimization by Stochastic Methods: Foundations and Optimal Control/Acceleration of Random Search Methods (RSM)
Chapter 6
Random Search Procedures for Global Optimization
6.1 Introduction Solving optimization problems from engineering, as, e.g., parameter—or process— optimization problems min F (x) s.t. x ∈ D,
(6.1)
where D is a subset of Rn , one meets often the following situation: (a) One should find the global optimum in (6.1), hence most of the deterministic programming procedures, which are based on local improvements of the performance index F (x), will fail. (b) Concerning the objective function F one has a blackbox—situation, i.e. there is only few a priori information about the structure of F , especially there is no knowledge about the direct functional relationship between the control or input vector x ∈ D and its index of performance F (x); hence—besides the more or less detailed a priori information about F —the only way of getting objective information about the structure of F is via evaluations of its values F (x) by experiments or by means of a numerical procedure simulating the technical plant. Consequently, engineers use in these situations often a certain search procedure for finding an optimal vector x, see, e.g., Box’ EVOP method in [2] and the random search methods as first proposed by Anderson [1], Brooks [3], and Karnopp [6]. Obviously, deterministic search methods can be considered as special stochastic ones. Further classes of random search procedures are simulated annealing (SA) procedures [7], genetic algorithms (GA) [4], and further nature-oriented and evolution-type search methods [12]. For further information, see [11]. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_6
123
124
6 Random Search Procedures for Global Optimization
An important property of (SA) methods is that also non-improving search steps are possible—with decreasing probability (cooling). In (GA) procedures, mutation, crossover and selection rules are applied to generate a sequence of sets (population) of points. Random changes occur during mutations. The optimal solution is then approximated by the sequence of best points in the current population. In the basic random search routine considered in this section—allowing not only local improvements as in mathematical programming—a sequence of nrandom vectors X0 , X1 , . . . , Xt , . . . in D is constructed according to the following recurrence relation: 4 ∈ D and F (zt+1 ) < F (Xt ) z ,z Xt+1 := t+1 t+1 (6.2a) Xt , if zt+1 ∈ D or F (zt+1 ) ≥ F (Xt ), t = 0, 1, 2, . . ., where the starting point X0 := x0 is a realization x0 of the random vector X0 having the given distribution PX0 := πstart concentrated on the domain Dstart . In many cases we have Dstart ⊂ D. If the search process starts at a given, fixed point x0 , then πstart = εx0 , where εx0 denotes the one-point measure at the point x0 . Moreover, z1 = Z1 (ω), z2 = Z2 (ω), . . . are realizations of n-random vectors Z1 , Z2 , . . . such that P Zt+1 ∈ B|X0 = x0 , X1 = x1 , . . . , Xt = xt , Z1 = z1 , Z2 = z2 , . . . , Zt = zt = P Zt+1 ∈ B|X0 = x0 , X1 = x1 , . . . , Xt = xt = πt (x t , B), (6.2b) where x t := (x0 , x1 , . . . , xt ) and πt (x t , ·) is a given transition probability distribution, as, e.g., a joint normal distribution with mean xt and covariance matrix Q = Qt . According to Definition (6.2a), given the states Xt = x t , first an n-vector zt+1 is generated randomly according to the distribution πt (x t , ·). Then, if zt+1 drops into the area of success GF (xt ), where & ' GF (x) := y ∈ D : F (y) < F (x) ,
(6.3)
we move to Xt+1 = zt+1 , otherwise we stay at Xt+1 = Xt . Thus, the whole search process Xt stays within the union D0 := Dstart ∪ D of the domain of the starting points and the feasible domain of the basic optimization problem (6.1). If the set of starting points Dstart ⊂ D is contained in the feasible domain D, then D0 = D. Moreover, we observe that if& Xt+1 ∈ GF (x ' t ), then also Xs ∈ GF (xt ) for all s > t. Furthermore, if F ∗ = inf F (x) : x ∈ D and, for given levels ε > 0, M < 0, resp., the set of ε−, M−optimal solutions of (6.1) is defined by & ' Bε,M := y ∈ D : F (y) ≤ F ∗ + ε, if F ∗ ∈ R, F (y) ≤ M, if F ∗ = −∞, resp. , (6.4a) then
6.2 The Convergence of the Basic Random Search Procedure
125
Xs ∈ Bε,M ⇒ Xs+1 ∈ Bε,M , s = 0, 1, 2, . . . .
(6.4b)
P (Xs ∈ Bε,M ) ≤ P (Xs+1 ∈ Bε,M ), s = 0, 1, 2, . . . .
(6.4c)
Hence,
In the following we assume that the objective function F of (6.1) is a measurable function on Rn .
6.2 The Convergence of the Basic Random Search Procedure For considering the convergence behavior of the search method (6.2a), we examine the probability P (Xt ∈ Bε,M ), t = 0, 1, 2, . . . , that the t-th iterate Xt is an ε-, M-optimal solution, resp., of (6.1), where ε > 0, M < 0 are given numbers. According to the considerations at the end of Sect. 6.1 these probabilities form a nondecreasing, convergent sequence, and due to (6.4b) we have that Xt ∈ Bε,M ⇔ X0 ∈ Bε,M , X1 ∈ Bε,M , . . . , Xt ∈ Bε,M ,
(6.5a)
hence, Bε,M , X1 ∈ Bε,M , . . . , Xt ∈ Bε,M ) (6.5b) P (Xt ∈ Bε,M ) = 1 − P (X0 ∈
P (X1 ∈ Bε,M , . . . , Xt ∈ Bε,M |X0 = x0 )πt (dx0 ). = 1− x0 ∈Bε,M
Denoting by Kt (x t , · · · ) the conditional distribution of Xt+1 given X0 = x0 , X1 = x1 , . . . , Xt = xt , we have Kt (x t , B) = πt x t , B ∩ GF (xt ) + 1 − πt ( x t , GF (xt ) εxt (B),
(6.6a)
where εx is the one-point-measure at x. Thus, with B ε,M := D0 \ Bε,M , we get
P (X1 ∈ Bε,M , . . . , Xt ∈ Bε,M |X0 = x0 ) =
K0 (x0 , dx1 ) . . . x1 ∈B ε,M
126
6 Random Search Procedures for Global Optimization
×·
Kt−2 (x t−2 , dxt−1 ) ·
xt−1 ∈B ε,M
Kt−1 (x t−1 , dxt ).
(6.6b)
xt ∈B ε,M
Considering first the t-th integral in the above equation, we obtain
Kt−1 (x t−1 , dxt ) = Kt−1 (x t−1 , D0 \ Bε,M )
(6.6c)
xt ∈B ε,M
= Kt−1 (x t−1 , D0 ) − Kt−1 (x t−1 , Bε,M ) = 1 − Kt−1 (x t−1 , Bε,M ). Suppose now that D0 = D. Having xt−1 ∈ Bε,M , we get εxt−1 (Bε,M ) = 0 and Bε,M ⊂ GF (xt−1 ), see the definitions (6.3), (6.4a). Hence, (6.6a,c) yield
Kt−1 (x t−1 , dxt ) ≤ 1 − πt−1 (x t−1 , Bε,M )
(6.7)
xt ∈B ε,M
& ' ≤ 1 − inf πt−1 (x t−1 , Bε,M ) : xs ∈ D0 \ Bε,M , 0 ≤ s ≤ t − 1
for all xs ∈ D0 \ Bε,M , s = 0.1, . . . , t − 1. Defining now αt , t = 0, 1, . . . , by & ' αt := αt (Bε,M ) = inf πt (x t , Bε,M ) : xs ∈ D0 \ Bε,M , 0 ≤ s ≤ t ,
(6.8)
from (6.6a,b) we obtain now P (X1 ∈ Bε,M , . . . , Xt ∈ Bε,M |X0 = x0 ) ≤
t−1 5
(1 − αs (Bε,M ))
(6.9a)
s=0
Hence, by (6.5b) and (6.9a) it is P (Xt ∈ Bε,M )
=1−
P (X1 ∈ Bε,M , . . . , Xt ∈ Bε,M |X0 = x0 )πstart (dx0 )
x0 ∈Bε,M
≥ 1 − (1 − πstart (Bε,M ))
t−1 5
(1 − αs (Bε,M )).
s=0
Since log u ≤ u − 1, for u > 0 we have
(6.9b)
6.2 The Convergence of the Basic Random Search Procedure
(1 − πstart (Bε,M ))
t−1 5
127
(1 − αt ) ≤ exp −πstart (Bε,M ) −
s=0
t−1
! αs
(6.9c)
s=0
and therefore also P (Xt ∈ Bε,M ) ≥ 1 − exp −πstart (Bε,M ) −
t−1
! αs .
(6.9d)
s=0
Thus, from (6.9d) we get the following convergence result: Theorem 6.1 Suppose now that D0 = D. The search process (6.2a) has the following convergence properties: (a) If for an ε > 0, M < 0, resp., ∞
αs (Bε,M ) = +∞,
(6.10)
s=0
then lim P (Xt ∈ Bε,M ) = 1. t→∞
(b) Suppose that F ∗ ∈ R and lim P (Xn ∈ Bε ) = 1 for every ε > 0.
n→∞
(6.11)
Then lim F (Xn ) = F ∗ w.p.1 (with probability one), n→∞
(c) Assume that F ∗ ∈ R and F is continuous and that the level sets Dε are nonempty and compact for each ε > 0. Then lim F (Xt ) = F ∗ implies that t→∞
also lim dist (Xt , D ∗ ) = 0, where dist (Xt , D ∗ ) denotes the distance between t→∞
Xt and the set D ∗ of global minimum points of (6.1). Proof For a proof of assertions (b) and (c), see [9].
Note 6.1 (a) For the case that the distribution πt of Zt+1 does not depend on the states x t , preliminary versions of the decisive inequality (6.9a) may already be found in the early Random Search literature, see, e.g., [3]. (b) Comparing the above theorem with the 0-1-laws of probability theory we observe that this result is essentially a consequence from the Borel–Cantellitype laws, see, e.g., [10, p. 400] and [5, p.1–6, 51–52]. (c) If the feasible domain D can be reached also from starting points x0 outside D, then the above result holds also. Working with random search procedures, one observes that the rate of convergence—especially near to the optimum—may be very poor. Hence, in the following we consider a modified random search procedures with an improved convergence behavior.
128
6 Random Search Procedures for Global Optimization
6.2.1 Discrete Optimization Problems Consider now the case that D contains a finite number r of elements di ∈ Rn , thus, D = {d1 , d2 , . . . , dr }.
(6.12a)
Furthermore, assume that Dstart ⊂ D, hence D0 = D, and let P Zt+1 ∈ B|X0 , X1 , . . . , Xt = P Zt+1 ∈ B|Xt .
(6.12b)
Hence, (Zt ) and (Xt ) are discrete-time stochastic processes. Therefore (Zt ) is described by a transition matrix (πijt ) from Xt = i to Zt+1 = j , and the iterates (Xt ) t ) from X = i to X are described by the transition matrix (pij t t+1 = j , t = 0, 1, . . . . & ' For Xt = di we have GF (di ) = GF (i) = j : F (dj ) < F (di ) . The relationship t ) reads then: between (πijt ) and (pij
(t,t+1) t = pij pij
⎧ ⎪ 0, if j ∈ GF (i) and j = i ⎪ ⎨ t , if j = i π 1 − il = ⎪ ⎪ t l∈GF (i) ⎩ πij , if j ∈ GF (i).
(6.12c)
Assuming now stationary search variables Zt (ω), i.e. in case πijt = πij for all t = 0, 1, . . ., then also (Xt ) is stationary and by searching for stationary distributions of (pij ) we get this result: Theorem 6.2 Let πij > 0 for all i, j = 1, . . . , r or suppose that πij > 0 j ∈GF (i)
for all 1 ≤ i ≤ r such that di is not a solution of the optimization problem (6.1). Then Xt (ω) converges with probability one to a solution of problem (6.1). Proof Without limitation we may assume here that 0 < ε < Fmax − F ∗ . According to (6.8) and the above assumptions, for the minimum probabilities αt = αt (ε) we have ' & ' & αt (ε) = inf πt (x t , Bε ) : xs ∈ D \ Bε , 0 ≤ s ≤ t = inf π(xt , Bε ) : xt ∈ D \ Bε ⎧ ⎫ ⎨ ⎬ = inf πij : di ∈ Bε =: α0 > 0, (6.12d) ⎩ ⎭ dj ∈Bε
provided that πij > 0 for all indices i, j . Since αt (ε) = α0 > 0 for all t = 0, 1, . . . , and each ε, 0 < ε < Fmax −F ∗ , the assertion follows now from Theorem 6.1. Since in the present case there are a finite number elements of feasible points di , i . . . , r, and the sets GF (i), Bε are contained in each other for corresponding values of ε, F (di ), resp., the proof for the second case follows then also from (6.12d).
6.3 Adaptive Random Search Methods
129
Example 6.1 For illustration we may assume—without limitation—that the elements d1 , d2 , . . . , dr of the feasible domain D are arranged such that (6.13a)
F (d1 ) < F (d2 ) < . . . < F dr ).
Hence, d1 is the unique minimum point, and the remaining points dj are arranged in strictly increasing order of the function values F (dj ). With the stationary transition (t,t+1) probabilities πij = πij from Xt to Zt+1 , the stationary transition matrix P t = P := (pij ) from Xt to Xt+1 reads ⎛
1 0 0 ⎜π t 1 − π t 0 21 ⎜ 21 t t t ⎜ t P = ⎜π31 π32 1 − (π31 + π32 ) ⎜ . .. .. ⎝ .. . . t t t πr2 πr3 πr1
...0 ...0 ...0 . . .. ..
⎞ ⎟ ⎟ ⎟ ⎟. ⎟ ⎠
(6.13b)
t . . . πrr
Corresponding to Theorem 6.2 we find that q T := (1, 0, . . . , 0) is a left fixed point of P and lim Xt = q with probability 1. t→∞
6.3 Adaptive Random Search Methods In this section we describe a general method how to find search variables (Zt ) such that the convergence of (Xt ) towards a solution of our basic problem (6.1) is accelerated. This can be achieved by an adaptive selection of the probability distribution of the search variates Z1 , Z2 , . . .. In order to control the sequence (Zt ) we assume that the probability distribution πt (x0 , x1 , . . . xt , ·) = πt (at , x0 , x1 , . . . , xt , ·)
(6.14a)
of Zt+1 depends on a control parameter vector at ∈ At (x0 , x1 , . . . , xt ), where At ⊂ A is the set of admissible controls at time t and given state-history x t := (x0 , x1 , . . . , xt ). Moreover At is assumed to be contained in a fixed set A. By δ = (δt )t≥0 , δt : Rn(1+t) → A, t = 0, 1, . . .
(6.14b)
we denote a decision rule, composed of the control functions or strategies δt , t = 0, 1, . . ., such that the control parameter vectors at are given by at := δt (x t ) ∈ At (x t ) for xs ∈ D, 0 ≤ s ≤ t, t = 0, 1, . . . . The set of admissible decision rules δ is defined then by
(6.14c)
130
6 Random Search Procedures for Global Optimization
& := δ : δ = (δt )t≥0 , δt (x0 , x1 , . . . , xt ) ∈ At (x0 , x1 , . . . , xt ) ' for xs ∈ D, 0 ≤ s ≤, t = 0, 1, . . . .
(6.14d)
Note 6.2 Since the transition probabilities πt (at , x0 , x1 , . . . , xt , ·) depend on the controls at , the expectation operator E = E δ depends on the decision rule δ. Looking for an optimal decision rule δ ∗ , clearly we have to guarantee that the process (Xt ) generated by δ ∗ converges actually to a solution of (6.1). Note that the reachability property in Theorem 6.1 holds, e.g. if the decision rules satisfies the condition, see (6.8), ∞
inf πt δt (x t ), x t , Bε ) : x t = (x0 , x1 , . . . , xt ), xs ∈ D \ Bε , 0 ≤ s ≤ t = + ∞,
t=0
(6.15a)
where cf. (6.4a) & ' Bε := y ∈ D : F (y) ≤ F ∗ + ε .
(6.15b)
In the stationary case πt (at , x t , ·) = π(at , xt , ·) and δt (x t ) = δ(xt ), t = 0, 1, . . . (6.15a) is reduced to the much simpler condition inf π δ(x), x, Bε : x ∈ D \ Bε > 0.
(6.15c)
Appropriate utility- or reward-criterion for the evaluation of the individual steps Xt → Xt+1 of the search process (Xt ) are, e.g. (a) Probability of success 4 ut (xt , xt+1 ) =
1, xt+1 ∈ GF (xt ) 0, otherwise,
(6.16a)
hence E δ ut (at , Xt , Xt+1 )|Xt = P Xt+1 ∈ G(Xt )|Xt is the (conditional) probability of a success in the state Xt . (b) Step length 4 ut (Xt , Xt+1 ) =
Xt+1 − Xt p , Xt+1 ∈ GF (Xt ) 0 , otherwise,
(6.16b)
where p ≥ 1 is a fixed number. Here E ut (at , Xt , Xt+1 |Xt is the average step length of Xt+1 into the area of success G(Xt ).
6.3 Adaptive Random Search Methods
131
A modification of the above case is (c) Relative step length ( ut (Xt , Xt+1 ) =
Xt+1 −Xt p Xt 0
, Xt+1 ∈ GF (Xt )
(6.16c)
, otherwise.
Obviously, any linear combination of the above three criterion yields a further criterion. In the following we suppose F ∗ = inf{F (x) : x ∈ D} > −∞. Search procedures with an improved performance can be constructed now by maximizing [8] the expected (total) reward function U∞ (x0 , δ) := E δ
∞
s us (δs (x s ), xs , xs+1 ),
(6.17a)
s=0
with respect to the decision rule δ = (δs ) involving the control functions δs satisfying the constraints (6.14c). Here, , 0 < < 1, denotes still a certain discount factor. For the maximization of the expectation U∞ (x0 , δ) next to we consider the (T − t)-stage search processes (Xs ) starting at time t and running then up to time T > t. Hence, with Xs = (X0 , X1 , . . . , Xs ), X0 := x0 , and as = δ(Xs ), s = t, t + 1, . . . , T − 1, let UT (t, x t ; δt , . . . , δT −1 ) := E δ
−1 T
(s−t) us (δs (Xs ), Xs , Xs+1 )|X0 = x0 , X1 = x1 , . . . , Xt = xt
s=t
(6.17b) denote the conditional expected reward of this (T −t)−stage process, given the time history Xs = xs , s = 0, 1, . . . , t, and the control functions δt , . . . , δT −1 . Denote by Kt (at , x t , ·) the transition probabilities for Xt → Xt+1 Kt (at , x t , B) = P (Xt+1 ∈ B|Xt = x t )
(6.18a)
of the process (Xt ) based on search variates (Zt ) controlled by control inputs at , t = 0, 1, . . . , where Xt = (X0 , X1 , . . . , Xt ) and B is any Borel subset of Rn . According to the basic definition (6.2a,b) of Xt and (6.14a-d) it holds
132
6 Random Search Procedures for Global Optimization
Kt (x t , B) = Kt (at , x t , B) = πt at , x t , B ∩ G(xt ) + 1 − πt at , x t , G(xt ) εxt (B),
(6.18b)
cf. (6.6a), where at = δt (x t ), and εx denotes again the one-point measure at the point x ∈ Rn . Due to the above definitions, the reward functions UT (t, x t ; δt , . . . , δT −1 ), t = 0, 1 . . . , T − 2, T − 1, see (6.17b), satisfy the recurrence rations
ut δt (x t ), xt , xt+1 UT (t, x ; δt , . . . , δT −1 ) = t
+ UT t + 1, (x t , xt+1 ); δt+1 , . . . , δT −1 ) Kt (δt (x t ), dxt+1 ) = u¯ t (δt (x t ), x t )
+ UT t + 1, (x t , y); δt+1 , . . . , δT −1 ) Kt (δt (x t ), x t , dy), (6.19a) where u¯ t (at , x t ) = E ut (at , Xt , Xt+1 )|Xt = x t =
ut (at , xt , y)Kt (at , x t , dy). (6.19b)
With the set of admissible decision rules δ, cf. (6.14d), the value function of the (T − t)-stage process Xt , . . . , XT with given state-history x t is now defined by VtT (x t ) :=
sup δt ,...,δT −1
&
UT (t, x t ; δt , . . . , δT −1 ) :
' δs (x s ) ∈ As (x s ), x s ∈ D (s+1) , t ≤ s ≤ T − 1 ,
(6.20)
where D (s+1) denotes the (s + 1)-fold Cartesian product of D. As mentioned already above, the set of restrictions in (6.20) should also include a condition guaranteeing that the whole search process (Xt ) controlled by the decision rule δ = (δt ) satisfies a reachability condition according to Theorem 6.1. However, in many practical problems this condition may be deleted since the optimal decision functions δt∗ defined by the optimization problem (6.20) can be shown to generate a search process (Xt∗ ) fulfilling a sufficient reachability condition. From (6.19a,b), for the value functions VtT (x t ) we get then the following recurrence relation: Theorem 6.3 Let VTT (x T ) = 0 for all x T ∈ R(T +1)n . If for all steps t under consideration the maximum is attained in (6.20), then the following backwards recurrence relation holds
t T t T Vt (x ) = sup (x , y) Kt (a, x t , dy) (6.21) ut (a, xt , y) + Vt+1 a∈At (x t )
6.3 Adaptive Random Search Methods
=
sup a∈At (x t )
133
u¯ t (a, x t ) +
T Vt+1 (x t , y)Kt (a, x t , dy) ,
t = T − 1, T − 2, . . . , 1, 0, where a = δt (x t ). Proof Omitting for simplification the constraint set in (6.20), from (6.19a,b) we get VtT (x t ) :=
sup δt ,...,δT −1
= sup
UT (t, x t ; δt , . . . , δT −1 )
sup
δt δt+1 ,...,δT −1
+ =
×
sup
u¯ t (δt (x t ), x t )
δt δt+1 ,...,δT −1
UT t + 1, (x t , y); δt+1 , . . . , δT −1 ) Kt (δ(x t ), x t , dy)
sup at ∈At
UT (t, x t ; δt , . . . , δT −1 ) = sup
(x t )
u¯ t (at , x t ) +
sup δt+1 ,...,δT −1
UT t + 1, (x t , y); δt+1 , . . . , δT −1 ) Kt (δ(x t ), x t , dy) . (6.22a)
Now, according to (6.17b) we have UT (t + 1, (x t , y); δt+1 , . . . , δT −1 ) = Ex t UT (t + 1, (x t , y), δt+1 (x t , y), δt+2 (x t , y, Xt+2 ), . . . , δT −1 (x t , y, Xt+2 , . . . , XT −1 )),
(6.22b)
where Ex t ,y denotes the conditional expectation given Xt+1 = (x t , y) and Xj are random vectors defined by (6.2a,b) and UT is the expected sum in (6.17b). Taking now, cf. (6.22a), the integral in (6.22b) with respect to y and then the supremum with respect to δt+1 , . . . , δT −1 under the constraints δs (x s ) ∈ As (x s ), x s ∈ D (s+1) , t + 1 ≤ s ≤ T − 1, see (6.20), the question is whether the integral and the supremum can be interchanged. Assuming that the suprema in (6.22b) are attained at as∗ = δs∗ (x s ), x s ∈ D (s+1) , t + 1 ≤ s ≤ T − 1, with the conditional expectation operator Ex t with respect to Xt = x t , from (6.22b) we get ∗ , . . . , δT∗ −1 Ex t UT t + 1, (x t , y); δt+1 ≤ sup Ex t UT t + 1, (x t , y); δt+1 , . . . , δT −1 ) δt+1 ,...,δT −1
≤ Ex t
sup δt+1 ,...,δT −1
UT t + 1, (x t , y); δt+1 , . . . , δT −1 )
∗ = Ex t UT t + 1, (x t , y); δt+1 , . . . , δT∗ −1 ) . Thus, (6.22c) yields
(6.22c)
134
6 Random Search Procedures for Global Optimization
sup δt+1 ,...,δT −1
Ex t UT t + 1, (x t , y); δt+1 , . . . , δT −1 )
= Ex t
sup δt+1 ,...,δT −1
T UT t + 1, (x t , y); δt+1 , . . . , δT −1 ) = Vt+1 (x t , y). (6.22d)
The assertion follows now from Eq. (6.22a) and (6.22d).
Remark 6.1 According to the definition (6.18b) of Kt (at , x t , ·) we have
T (x t , y)Kt (a, x t , dy) = Vt+1
T Vt+1 (x t , y)πt (a, x t , dy) y∈G(xt )
T + Vt+1 (x t , xt ) 1 − πt a, x t , G(xt ) .
(6.23a)
Furthermore, assuming ut (a, x, x) = 0 for all t = 0, 1, . . ., and x ∈ Rn , we have, cf. (6.19b),
t u¯ t (a, x ) = ut (a, xt , y)πt (a, x t , dy). (6.23b) y∈G(xt )
In the important Markovian case, i.e. if πt (a, x t , ·) = πt (a, xt , ·) and At (x t ) = At (xt ),
(6.23c)
the value function VtT depends only on xt , see (6.14a-d), (6.17b), (6.18a,b), and (6.21) has the form VtT (xt )
=
T u¯ t (a, xt ) + Vt+1 (y)Kt (a, xt , dy) . sup
(6.23d)
a∈At (xt )
In the one-stage case t = T − 1 Eq. (6.21) has the simple form VTT−1 x T −1 = sup u¯ T −1 a, x T −1 : a ∈ AT −1 x T −1 .
(6.23e)
6.3.1 Infinite-Stage Search Processes The decision process defined by (6.21) is called the sequential stochastic decision process associated with the random search procedure (6.2a,b). An important variant of this decision process results in the infinite-stage stationary Markovian case:
6.4 Convex Problems
135
Let πt (at , x t , ·) = π(at , xt , ·), At (x t ) = A(xt ), ut (at , xt , xt+1 ) = u(at , xt , xt+1 ), δt (x t ) = δ(xt ), t = 0, 1, . . . . Moreover, let 0 < < 1 be a certain discount factor. According to Theorem 6.3, the value function VtT = VtT (x) of the (T-t)-stage process depends only on the state xt = x and fulfills the recurrence relation:
T T ¯ x) + Vt+1 (y)K(a, x, dy) , Vt (x) = sup u(a, (6.24a) a∈A(x)
t = T − 1, T − 2, . . . , 1, 0, where a = δ(xt ). Introducing the stage transformation (T − t) → t, the transformed value function Wt (x) := VTT−t (x), t = 0, 1, . . . ,
(6.24b)
satisfies (insert s := T − t and replace then again s → t) the forward recurrence relations
¯ x) + Wt−1 (y)K(a, x, dy) , t = 0, 1, . . . , Wt (x) = sup u(a, (6.24c) a∈A(x)
where the functional equation (6.24c) holds for each integer T , and we have, cf. Theorem 6.3, W0 (x) = 0. Under certain conditions the sequence Wt (x) is convergent to the function W ∗ (x) satisfying the asymptotic functional equation
∗
W (x) = sup
u(a, ¯ x) +
W (y)K(a, x, dy) . ∗
(6.24d)
a∈A(x)
Moreover, an optimal decision rule δ ∗ is then given by δ ∗ (x) = a ∗ ∈ A(x), where a ∗ is a solution of the maximization problem in (6.24d).
6.4 Convex Problems For simplicity we consider here only the minimization of a real valued convex function F : R → R with respect to D = R, where we assume that the & second derivative F 'exists and F (x) > 0 for all x ∈ R. The interval G(x) = y ∈ R : F (y) < F (x) may now be approximated by the interval 6 4 F (x) (y − x)2 < 0 . H (x) = y ∈ K : F (x)(y − x) + 2 It is easy to see that
(6.25a)
136
6 Random Search Procedures for Global Optimization
6 4 F (x) , if F (x) < 0, H (x) = y ∈ R : x < y < x − 2 F (x) H (x) = G(x) = ∅ , if F (x) = 0, 4 6 F (x) H (x) = y ∈ R : x − 2 < y < x , if F (x) > 0. F (x)
(6.25b) (6.25c) (6.25d)
For the conditional distribution π(a, x, ·) of the search variables (Zt ) given Xt = x we choose now a normal distribution with mean μ = x and variance σ 2 = a 2 . Hence, in this case our decision parameter a is then the standard deviation σ . Furthermore, according to the above approximation of G(x) by H (x), we approximate the utility function u(a, x, y) of Sect. 6.3, by 4 u(a, ˜ x, y) =
|y − x| , y ∈ H (x) . 0 , otherwise
(6.26a)
Obviously, the stochastic decision process associated with the random search procedure (6.2a,b) is stationary and u(a, ¯ x) may be approximated by ¯˜ x) := u(a,
y∈H (x)
σ u(a, ˜ x, y)π(a, x, dy) = √ 2π
1 1 − exp − 2
2F (x) σ F (x)
2 !! . (6.26b)
¯˜ 0 (x) = W0 (x) = 0, the approximate W 1 (x) = sup u(σ, Starting from W x) to the σ >0
value function W1 (x), see (6.24c) and the approximative decision function σ˜ 1 = ¯˜ σ˜ 1 , x), are given by the following theorem: 1 (x) = u( σ˜ 1 (x), defined by W 1 1 Theorem 6.4 Let g be the function g(t) = √ 1 − exp(− t 2 ) , and let us 2 2π t denote t ∗ > 0 the number where g attains its maximum g ∗ . Then, 1 (x) = W
2F (x) and g ∗ F (x)
1 σ˜ 1 (x) = ∗ t
2F (x) F (x) .
(6.27)
2F (x) F (x) , according to (6.26b) and the (x) above definition of the function g = g(t), we have u(σ, ˜¯ x) = 2F F (x) g(t). This yields the assertion. 1 (x) = g ∗ 2F (x) is an approximate to the Obviously, according to (6.26a), W F (x) average step length s1 (x) of the first step X0 → X1 of the search process (Xt ). F (x) Comparing this result with Newton’s method x → y = x − α(x) , α(x) > 0 F (x)
Proof Using the transformation σ → t :=
1 σ
References
137
for the minimization of F that in Newton’s method the step length , we observe F (x) sN (x) = |y − x| = α(x) has—up to a normalizing factor—the same form F (x) as s1 (x). Similar results are obtained from comparisons of Theorem 6.4 with deterministic and stochastic gradient procedures. t and δ˜t , t = 2, 3, . . . will be In general, the computation of the further iterates W in general carried out in practice, because of its difficulty and because σ˜ (x) = hardly 2F (x) with a normalizing factor τ (x) > 0 is a reasonable approximate to τ (x) F (x) the optimal decision rule. This is also confirmed by numerical experiments. On the other hand, for the quadratic case F (x) = x 2 we can obtain the exact results. In fact, then we have that H (x) = G(x) and 1 (x) = 2g ∗ |x| as also σ1 (x) = σ˜ 1 (x) = 2 |x|. For solving now the W1 (x) = W t∗ functional equation (6.24b) we work therefore with the assumptions W ∗ (x) = C|x| and σ ∗ (x) = c|x|
(6.28)
where C, c are positive constants. Theorem 6.5 The optimal value W ∗ and the optimal decision rule δ ∗ (x) = σ ∗ (x) of the infinite-stage stationary stochastic decision process associated with the random search procedure for the minimization of F (x) = x 2 has the form (6.28), √ 1 8 π . where c ≈ √ − √ and C ≈ √ 4π 2π 4 π − 2 Note 6.3 As was mentioned in Sect. 6.1, often an analytic expression for F is not known and only the function values F (x) may be obtained. Hence the derivatives 2F (x) must be F (x), F (x) in the “optimal” decision rule σ˜ (x) = τ (x) F (x) estimated from observations of F .
References 1. Anderson, R.L.: Recent advances in finding best operating conditions. J. Am. Stat. Assoc. 48(264), 789–798 (1953). http://www.jstor.org/stable/2281072 2. Box, G.: Evolutionary operation: a method for increasing industrial productivity. J. R. Stat. Soc. Ser. C 6(2), 81–101 (1957). https://doi.org/10.2307/2985505 3. Brooks, S.: A discussion of random methods for seeking maxima. Oper. Res. 6(2), 244–251 (1958). https://doi.org/10.1287/opre.6.2.244
138
6 Random Search Procedures for Global Optimization
4. Goldberg, D.: The Design of Innovation: Lessons from and for Competent Genetic Algorithms. Kluwer Acad, Publ., Boston (2002) 5. Iosifescu, M., Theodorescu, R.: Random Process and Learning. Springer, Berlin (1969) 6. Karnopp, D.C.: Random search techniques for optimization problems. Automatica 1(2–3), 111–121 (1963). https://doi.org/10.1016/0005-1098(63)90018-9 7. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983) 8. Neumann, K.: Dynamische Optimierung. Bibliographisches Institut, Mannheim (1969) 9. Rappl, G.: Konvergenzraten von Random-Search-Verfahren zur globalen Optimierung. Ph.D. Thesis, UniBw München (1984) 10. Richter, H.: Wahrscheinlichkeitstheorie. Springer, Berlin (1966) 11. Spall, J.: Introduction to stochastic search and optimization. J. Wiley, Hoboken (2003) 12. Zabinsky, Z.: Stochastic Adaptation Search for Global Optimization. Springer, New York (2003)
Chapter 7
Controlled Random Search Under Uncertainty
7.1 The Controlled (or Adaptive) Random Search Method In order to increase the rate of convergence of the basic search method (6.2a), according to Sect. 6.3 we consider the following procedure, cf. [2–4]. Based on the basic random search method (6.2a), by means of the definitions (I)–(III) we describe first an (infinite-stage) sequential stochastic decision process associated with (6.2a). (I) We use next to the fact that the transition probabilities πt (x t , ·) depend πt (x t , ·) = πt (a, x t , ·) usually on certain parameters a = (aj )j εJ ∈ A, as, e.g., on certain (mixed) moments of the random vector Zt+1 . Let hˆ t = (x0 , x1 , . . . , xt , z1 , . . . , zt ) be the process history of Xt , Zt up to time t. The idea, developed first in [2, 3], is now to run the random search not with a fixed parameter a, but to use an “optimal” control a = at∗ (x t ) or a = at∗ (hˆ t ) of the parameter a such that a certain criterion measuring the progress of the search, as, e.g., the probability of a search success or the mean step length into the area of success at each step Xt → Xt+1 , is as large as possible. In the following,
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_7
139
140
7 Controlled Random Search Under Uncertainty
ht = (x0 , x1 , . . . , xt , z1 , z2 , . . . , zt , a0 , a1 , . . . , at−1 ) denotes the total process history up to time t. (II) To each step xt → Xt+1 there is associated a conditional mean (search-) gain E ut (at , xt , Xt+1 )|ht = E ut (at , xt , Zt+1 )|ht .
(7.1a)
Working, e.g., with the probability of a search success resp. the mean improvement of F resp. the mean (relative) step length into the area of success, ut is given by ut (at , xt , zt+1 ) = 1, = F (xt ) − F (zt+1 ), = xt − zt+1 =
xt − zt+1 , resp., if zt+1 ∈ GF (xt ), xt
ut = 0, if zt+1 ∈ GF (xt ).
(7.1b)
Calculating the conditional mean again in (7.1a) we have to solve next the integrals of the type
J (x , F ) = t
ut (at , xt , zt+1 )πt (at , x t , dzt+1 ).
(7.2)
F (zt+1 ) n. Let the set Dε of ε-optimal solutions of our global minimization problem (8.1) be defined by & ' Dε = y ≤ D : F (y) ≤ F ∗ + c , where c > 0 and F ∗ is given by & ' F ∗ = inf F (x) : x ∈ D ; let F ∗ > −∞. Note that Xn ∈ Dε implies Xt ∈ Dε for all t > n, hence P (Xn ∈ Dε ), n = 1, 1, . . . is a non decreasing sequence for each fixed ε > 0.
(8.5)
8.2 Convergence of the Random Search Procedure
153
8.2 Convergence of the Random Search Procedure Let αn (Dε ) denote the minimal probability that at the nth iteration step Xn → Sn+1 we reach the set Dε from any point Xn = xn outside this set, i.e. ' & αn (Dε ) = inf πn (xn , Dε ) : xn ∈ D \ Dε .
(8.6)
According to [6] we have this Theorem 8.1 (a) If for an ε > 0 ∞
αn (Dε ) = +∞,
(8.7)
n=0
then lim P (Xn ∈ Dε ) = 1 for every ε > 0. n→∞ (b) Suppose that lim P (Xn ∈ Dε ) = 1 for every ε > 0.
(8.8)
n→∞
Then lim F (Xn ) = F ∗ w.p.1 (with probability one) for every starting point n→∞ x0 ∈ D. (c) Assume that F is continuous and that the level sets Dε are nonempty and compact for each ε > 0. Then lim F (Xn ) = F ∗ implies that also lim dist n→∞
n→∞
(Xn , D ∗ ) = 0, where dist (Xn , D ∗ ) denotes the distance between Xn and the set D ∗ = D0 of global minimum points x ∗ of (8.1). Example 8.1 If πn (xn , ·) = π(·) is a fixed probability measure, then lim F (Xn ) = n→∞
F ∗ w.p.1 holds, provided that π
&
y ∈ D : F (y) ≤ F ∗ + ε
'
> 0 for each ε > 0.
This if true e.g. if Dε has a nonzero Lebesgue measure for all ε > 0 and π has a probability density φ with φ(x) > 0 almost everywhere. Note 8.1 Further convergence results of this type were given by Oppel [11], Rappl [13]. Knowing several (weak) conditions which guarantee the convergence w.p.1 of (Xn ) to the global minimum F ∗ , to the set of global minimum points D ∗ , resp., one should also have some information concerning the rate of convergence of F (Xn ), (Xn ) to F ∗ , D ∗ , respectively.
154
8 Controlled Random Search Procedures for Global Optimization
By Rappl [13] we have now the following result. Of course, as in the deterministic optimization, in order to prove theorems about the speed of convergence, the optimization problem (8.1) must fulfill some additional regularity conditions. Theorem 8.2 Suppose that D ∗ = ∅ and the probability measure π(xn , ·) transition is a d-dimensional normal distribution N μ(xn ), with a fixed covariance matrix . (a) Let Dε be bonded for some ε = ε0 > 0 and assume that F is convex in a certain neighborhood of D ∗ . Then lim nγ F (Xn ) − F ∗ = 0 w.p.1
(8.9)
n→∞
for each constant γ such that 0 < γ < d1 and every starting point x0 . (b) Let Dε be compact, D ∗ = {x ∗ }, where x ∗ ∈ int(D) ( = interior of D), and suppose that F is continuous and twice continuously differentiable in a certain neighborhood of x ∗ . Moreover, assume that F has a positive definite Hessian matrix at x ∗ . Then for each starting point x0 ∈ D it is 2 lim nγ F (Xn ) − F ∗ = 0 w.p. 1 for each 0 < γ < , n→∞ d 1 lim nγ Xn − x ∗ = 0 w.p. 1 for each 0 < γ < , (8.10) n→∞ d 2 lim sup n d E F (Xn ) − F ∗ ≤ τ (x0 ) < +∞,
n→∞
where τ (x0 ) is a nonnegative finite constant depending on the starting point x0 ∈ D and E denotes the expectation operator. (c) Under the same assumptions as in (b) we also have for each starting point x0 ∈ D, x0 = x ∗ , 2 lim inf n d E F (Xn ) − F ∗ ≥ h(x0 ),
n→∞
(8.11)
where h(x0 ) is a nonnegative constant depending on the starting point x0 . Furthermore for each x0 ∈ D, x0 = x ∗ , it is lim inf nγ EXn − x ∗ = +∞ for each γ >
n→∞
2 . d
(8.12)
Note 8.2 (a) Theorem 8.2 holds also for many non-normal classes of transition probability measures πn (xn , ·)), see [13]. (b) It turns out that under the assumptions of Theorem 8.2b the speed of convergence of (8.2) to the global minimum of (8.1) is exactly given by
8.3 Controlled Random Search Methods
155
2 E F (Xn ) − F ∗ = O(n− d ).
(8.13)
(c) The above convergence rates reflect the fact that in practice one observes that the speed of convergence may be very poor—especially near to the optimum of (8.1). Hence, using random search procedures, a main problem is the control of the basic random search algorithm (8.2) such that the speed of convergence of (Xn ) of F ∗ , D ∗ , resp., is increased.
8.3 Controlled Random Search Methods A general procedure how to speed up the search routine (8.2) is described in [6– 8]. The idea is to associate with the random search routine (8.2) a sequential stochastic decision process defined by the following items (I)–(III): (I) We observe that the conditional probability distribution πn (xn , ·) of Zn+1 given Xn = xn depends in general on a certain (vector valued) parameter a, i.e. πn (xn , ·) = πn (a, xn , ·), a ∈ A,
(8.14)
where A is the set of admissible parameters a. The method, developed first in [6–8], is now to run the algorithm (8.2) not with a fixed parameter a, but to use an optimal control a = an∗ (xn )
(8.15)
of a such that a certain criterion—to be explained in (II)—is maximized. Exemplary, πn (xn , ·) is assumed to be a d-dimensional normal distribution with mean μn and covariance matrix n . Hence, in this case we have a = (μ, ) ∈ A := M × Q,
(8.16)
where M ⊂ Rd and Q is the set containing all symmetric, positive definite d × d matrices and the zero matrix. (II) To search step Xn → Xn+1 there is associated a mean search gain Un (an , xn ) = E u(xn , Xn+1 )|Xn = xn ,
(8.17)
where the gain function u(xn , xn+1 ) is defined, e.g., by 4 u(xn , xn+1 ) =
1, if xn+1 ∈ GF (xn ) 0, else
(8.18a)
156
8 Controlled Random Search Procedures for Global Optimization
4 u(xn , xn+1 ) = 4 u(xn , xn+1 ) =
F (xn ) − F (xn+1 ), if xn+1 ∈ GF (xn ) 0, else
(8.18b)
xn − xn+1 , if xn+1 ∈ GF (xn ) 0, else
(8.18c)
Hence, in the first case Un (an , xn ) is the probability of a search success , in the second case Un (an , xn ) is the mean improvement of the value of the objective function and in case (8.18c) Un (an , xn ) is the mean step length of a successful iteration step Xn → Xn+1 . (III) Obviously, the convergence behavior of the random search process (Sn ) can be improved now by maximizing the mean total search gain U∞ = U∞ (a0 , a1 , . . .) := E
∞
ρ n u(Xn , Xn+1 )
n=0
subject to the controls an = an (xn ) ∈ A, n = 0, 1, . . ., where ρ > 0 is a certain discount factor. This maximization can be done in principle by the methods of stochastic dynamic programming, see e.g. [10].
8.4 Computation of Optimal Controls In order to weaken the computational complexity, the infinite-stage stochastic decision process defined in Sect. 8.3 is replaced by the sequence of 1-stage decision problems max Un (an , xn )
s.t. an ∈ A,
n = 0, 1, 2, . . .. Hence, the optimal control an∗ = a ∗ (xn ) is defined as a solution of
max a∈A
u(x, y)π(a, x, dy).
(8.19)
y∈GF (x)
In the following we consider the gain function (8.18b), i.e. u(x, y) = F (x) − F (y). Since an exact analytical solution of (8.19) is not possible in general, we have to apply some approximations. Firstly, the area of success GF (x) is approximated according to
8.4 Computation of Optimal Controls
157
1 d T T 2 GF (x) ≈ y ∈ R : ∇F (x) (y − x) + (y − x) ∇ F (x)(y − x) < 0 , 2 (8.20) where ∇F (x) denotes the gradient of F and ∇ 2 F is the Hessian matrix of F at x. We assume that ∇ 2 F (x) is regular and ∇F (x) = 0. Defining then the vector w ∈ Rd by y − x = w − ∇ 2 F (x)−1 ∇F (x),
(8.21)
the quadratic inequality contained in (8.20) has the form wT
∇ 2 F (x) w < 1, r
where r > 0 is defined by r = ∇F (x)T ∇ 2 F (x)−1 ∇F (x). By the Cholesky-decomposition of
∇ 2 (x) we can determine a matrix such that r
∇ 2 F (x) = T . r
(8.22)
v = T w,
(8.23)
Defining
the approximation (8.20) of GF (x) can be represented according to (8.21) and (8.22) by T GF (x) ≈ xN + −1 v : v < 1 , where · is the Euclidean norm and xN is given by xN = x − ∇ 2 F (x)−1 ∇F (x). It is then easy to verify that by the same transformation v = v(ω) := T (y(ω) − xN ) the search gain u(x, y) = F (x) − F (y) can be approximated by
(8.24)
158
8 Controlled Random Search Procedures for Global Optimization
u(x, y) ≈
r 1 − v2 . 2
(8.25)
By means of (8.24) and (8.25) the objective function U (a, x), a = (μ, ), of (8.19) can be approximated by (a, Q) = r U 2
1 − v2 f (q, Q, v)dv,
(8.26)
v 0. Now (8.30), (8.33), and (8.22) yield μ∗ = xN + k ∗ −1 1, 2 −1 1 −1 ∗ ∗ −1 ∗ ∇ F (x) = ∗ = c ( ) = c c r = c∗ r∇ 2 F (x)−1 . Hence, we have this result. Theorem 8.4 The 1-stage optimal control a ∗ (x) = (μ∗ , ∗ ) of the random search procedure (8.2) is given approximately by μ∗ = xN + k ∗ −1 , ∗ = c∗ ∇F (x) ∇ 2 F (x)−1 ∇F (x) ∇ 2 F (x)−1 , where k ∗ ∈ R, c∗ > 0 are certain fixed parameters.
(8.34)
160
8 Controlled Random Search Procedures for Global Optimization
8.5 Convergence Rates of Controlled Random Search Procedures Assume that the random search procedure (8.2) has normal distributed search variates Z1 (ω), Z2 (ω), . . . , Zn (ω), . . . controlled by means of the following control law μ0 (x) = x 0 (x) = c ∇F (x) ∇ 2 F (x)−1 ∇F (x) ∇ 2 F (x)−1 ,
(8.35)
where c > 0 is a fixed parameter. For control (8.35) we obtain, see the later considerations, this result: Theorem 8.5 Suppose that D is a compact, convex subset of Rd and let x ∗ be the unique optimal solution of (8.1). Let x ∗ ∈ int (D) (= interior of D) and assume that F is twice continuously differentiable in a certain neighborhood of x ∗ . Moreover, suppose that ∇ 2 F is positive definite at x ∗ . Then there is a constant κ > 1 such that κ n E F (Xn ) − F ∗ → 0 as n → ∞ and κ n F (Xn ) − F ∗ → 0 as n → ∞, w.p. 1
(8.36)
for all starting points contained in a certain neighborhood of x ∗ . Note 8.3 (a) Comparing Theorems 8.2 and 8.5, we find that—at least locally—the convergence rate of (8.2) is increased very much by applying a suitable control, as e.g. the control (8.35). (b) However, the high convergence rate (8.36) holds only if the starting point x0 is sufficiently close to x ∗ , while the low convergence rate found in Theorem 8.2 holds for arbitrary starting points x0 ∈ D. Hence, the question arises whether by a certain combination of a controlled random search procedure we also can guarantee a linear convergence rate for all starting points x0 ∈ D. Given an increasing sequence N of integers n1 < n2 < . . . < nk < nk+1 < . . . , let the controls an = (μn , n ) of the normal distributed search variates Zn+1 (ω), n = 0, 1, 2, . . ., be defined by
8.6 Numerical Realizations of Optimal Control Laws
161
μn = xn and 4 n =
0 (xn ) , if n ∈ N R , if n ∈ N,
(8.37)
where 0 (x) is defined by (8.35) and R is a fixed positive definite d × d matrix. Hence, according to (8.37), the search procedure is controlled only at the times n1 , n2 , . . .. Now, we have this result. Theorem 8.6 Suppose that D is a compact, convex subset of Rd and let x ∗ ∈ int (D) be the unique optimal solution of (8.1). Assume that F is twice continuously differentiable in a certain neighborhood of x ∗ and let ∇ 2 F (x ∗ ) be positive definite. Define then hn = max{k : nk < n}. Then for every starting point x0 ∈ D there is a constant β > 1 such that β hn E F (Xn ) − F ∗ → 0 as n → ∞ and β hn F (Xn ) − F ∗ → 0 as n → ∞, w.p. 1,
(8.38)
hn < 1. n→∞ n
provided that lim sup Note 8.4
(a) Hence, the linear convergence rate (8.38) can be obtained by a suitable control of the type (8.37) for each starting points x0 ∈ D. √ k (b) If nk = for some p ∈ N, then β hn = ( p β)n . p (c) The assertion of Theorem 8.6 follows by using results of [13].
8.6 Numerical Realizations of Optimal Control Laws In order to realize the control laws obtained in (8.34), (8.35), (8.37), one has to compute the gradient ∇F (x) and the inverse Hessian Matrix ∇ 2 F (x)−1 of F at x. However, since the derivatives ∇F and ∇ 2 F of F are not given in analytical form in practice, the gradient and the Hessian matrix of F must be approximated by
162
8 Controlled Random Search Procedures for Global Optimization
means of the information obtained about F during the search process. Hence, for an approximate computation of F and ∇ 2 F we may use the sequence of sample points, iteration points, and function values x0 , F (x0 ), z1 , F (z1 ), x1 , z2 , F (z2 ), x2 , . . . . In order to define a recursive approximation procedure, for n = 0, 1, 2, . . . let denote gn the approximation of ∇F (xn ), Bn the approximation of ∇ 2 F (xn ), Hn the approximation of ∇ 2 F (xn )−1 . Proceeding recursively, we suppose at the n-th stage of the search process we know the approximations gn , Bn , and Hn of ∇F (xn ), ∇ 2 F (xn ), and ∇ 2 F (xn )−1 , respectively. Hence, we may compute—approximately—the control an = (μn , n ) according to one of the formulas (8.34), (8.35), or (8.37) by replacing ∇F (xn ) and ∇ 2 F (xn )−1 by gn , Hn , respectively. The search process (8.2) yields then the sample point zn+1 , its function value F (zn+1 ) and the next iteration point xn+1 . Now we have to perform the update gn → gn+1 , Bn , → Bn+1 and Hn → Hn+1
(8.39)
by using the information sn = xn+1 − xn , F (xn ), zn+1 , F (zn+1 ), xn+1 about F . (a) Search failure at xn If zn+1 ∈ D or F (zn+1 ) ≥ F (xn ), then xn+1 = xn . Since in this case we stay at xn , we may define the update (8.39) by gn+1 = gn , Bn+1 = Bn , Hn+1 = Hn . (b) Search success at xn In this case it is zn+1 ∈ D and F (zn+1 ) < F (xn ), hence xn+1 = zn+1 = xn . By a quadratic approximation of F at xn+1 we find then F (xn ) ≈ F (xn+1 ) + ∇F (xn+1 )T (xn − xn+1 ) 1 + (xn − xn+1 )T ∇ 2 F (xn+1 )(xn − xn+1 ) 2
8.6 Numerical Realizations of Optimal Control Laws
163
and therefore 1 T ∇F (xn+1 )sn − snT ∇ 2 F (xn+1 )sn ≈ Fn , 2
(8.40)
where sn = xn+1 − xn = zn+1 − xn , Fn = F (xn+1 ) − F (xn ) = F (zn+1 ) − F (xn ). Now we have to define the new approximations gn+1 and Bn+1 of ∇F (xn n + 1) and ∇ 2 F (xn+1 ), respectively. Because of (8.40), in order to define the update (8.39), we demand next to the following Modified Quasi–Newton Condition 1 T gn+1 sn − snT Bn+1 sn = Fn 2
(8.41)
1 T gn+1 sn − snT Bn+1 sn < 0. 2
(8.42)
or
Note 8.5 (i) In contrary to (8.41), the modified Quasi-Newton condition (8.42) uses only the information that the function value of F at xn+1 is less than that at xn . (ii) If Fn = F (xn+1 ) · F (xn ) < 0, then −sn = xn − xn+1 is an ascent direction of F at xn+1 . Hence, since ∇F (xn+1 ) is the best ascent direction of F at xn+1 , −sn may be used to define the approximation gn+1 of ∇F (xn+1 ). Since gn+1 is not completely determined by the modified Quasi-Newton condition (8.41) or (8.42), resp., there are still many possibilities to define the update formulas (8.39). Clearly, since Bn is an approximation to a symmetric matrix, we suppose that Bn is a symmetric matrix. (A) Additive rank-one-updates In order to select a particular tuple (gn+1 , Bn+1 ) we may require that (gn+1 , Bn+1 ) is an optimal solution (g, B) of the distance minimization problem min d1 (B, B) + d2 (g, g)
(8.43)
1 s.t. g T s − s T Bs = F, 2 where B = Bn , g = gn , F = Fn and d1 , d2 are certain distance measures. We suppose here that d1 , d2 are defined by
164
8 Controlled Random Search Procedures for Global Optimization
d1 (B, B) =
d 1 (bij − bij )2 , 2
(8.44)
i,j −1
1 (g j − gj )2 , 2 d
d2 (g, g) =
j −1
where bij , bij are the elements of B and B, resp., and g j , gj denote the components of g, g, respectively. Note 8.6 The minimization (8.43) is a generalization of the minimality principles characterizing some of the well known Quasi-Newton update formulas, see e.g. [4]. Solving (8.43), (8.44), we find that g, B are given by g = g − λs B=B+
(8.45)
λ T ss , 2
(8.46)
where the Lagrange multiplier λ is given by λ=
g T s − 12 s T Bs − F . s T s 1 + 14 s T s
(8.47)
If the distance functions d1 , d2 are changed, then other update formulas may 1 be generated. If e.g. d2 is replaced by d2 (g, g) = (g − g)B −1 (g − g), then 2 g = g − λBs. Supposing now that B is positive definite, it is known that the matrix B defined by (8.46) is positive definite if and only if 1+
λ T s H s > 0, 2
(8.48)
where H = B −1 is our approximation to the inverse Hessian matrix ∇ 2 F (x)−1 −1 of F at x = xn . Hence, if H = B denotes the approximation of the inverse Hessian matrix of F at xn+1 , then by (8.46) and (8.48) the following update formula H → H for the inverse Hessian matrix of may be established: ( H =
H−
λ 1 T 2 1+ λ s T H s H ss H, 2
H, else,
if (8.48) holds
(8.49)
8.6 Numerical Realizations of Optimal Control Laws
165
Updates in the Case of a Search Failure If zn+1 ∈ D or F (zn+1 ) ≥ F (xn ), then we stay at xn+1 = xn and we may define therefore g = g, B = B, and H = H . However, also in the case of a search failure the tuple zn+1 , F (zn+1 ) yields new information about F , provided only that zn+1 = xn . Hence, replacing the modified Quasi-Newton condition (8.41) by 1 g T s + s T Bs = F, 2 where now s = zn+1 − xn , F = F (zn+1 ) − F (xn ), we may derive by the above procedure also update formulas g → g, B → B, H → H for defining improved approximation g, B, H of ∇F, ∇ 2 and ∇ 2 F −1 , respectively, at xn+1 = xn . (B) Multiplicative rank-one-updates By (8.45)–(8.49) we have given a first concrete procedure for the realization of the optimal control laws (8.34), (8.35), and (8.37), respectively. Indeed, having e.g. the mean μn = xn and the covariance matrix n = c∗ (gnT Hn gn )Hn ,
(8.50)
the random variable Zn+1 may be defined by 0 , Zn+1 = μn + n Zn+1 0 where Zn+1 is a normal distributed with mean zero and covariance matrix equal to the identity matrix, and n is a d × d matrix such that
n nT = n .
(8.51)
Hence, at each iteration point xn the (Cholesky-)decomposition (8.51) of n has to be computed. In order to omit this time-consuming step, we still ask whether update formulas n → n+1 for the Cholesky-factors n may be obtained. Since Hn = Bn−1 we suppose that Bn may be represented by Tn TnT = Bn . Then n is given by n = c∗ (Tn−1 gn )T Tn−1 gn Tn−1T Tn−1 and the factor n may be defined, cf. (8.50) by
166
8 Controlled Random Search Procedures for Global Optimization
n =
√
c∗ Tn−1 gn Tn−1T .
(8.52)
In order to define the update T → T , where T = Tn and T = Tn+1 with T Tn+1 Tn+1 = Bn+1 , we require that T is changed only in the direction of s = xn+1 − xn . Hence, we assume that
γ −1 T = I + T , ss T s s
T,
where γ is real parameter to be determined. Furthermore, the distance minimization problem (8.43) is then replaced by min d1 (T , T ) + d2 (g, g)
(8.53)
1 s.t. g T s − s T Bs = F, 2 where now γ −1 γ −1 B = I + T ss T B I + T ss T , with B = T T T . s s s s
(8.54)
If the distance functions d1 , d2 are again defined corresponding to (8.44), then 1 d1 (T , T ) = 2
γ −1 ss
2 ss T T 2E ,
(8.55)
where T E denotes the Euclidian norm of T . Hence, by (8.53) a particular tuple (g, γ ) is selected. Because of (8.54) and (8.55), the minimization problem (8.53) has the form ss T T 2E min 2 s.t. g T s −
γ −1 ss
2
1 + g − g2E 2
(8.56)
γ2 T s Bs − F, 2
hence, the tuple (g, γ ) is projected onto the parabola in Rd+1 defined by the constraint in (8.56). Note 8.7 (a) Other update formulas may be gained by changing the objective function of (8.56). (b) Also for multiplicative updates, similar update rules may be derived in case of search failures.
References
167
References 1. Anderson, R.L.: Recent advances in finding best operating conditions. J. Am. Stat. Assoc. 48(264), 789–798 (1953). http://www.jstor.org/stable/2281072 2. Box, G.: Evolutionary operation: a method for increasing industrial productivity. J. R. Stat. Soc. Ser. C 6(2), 81–101 (1957). https://doi.org/10.2307/2985505 3. Brooks, S.: A discussion of random methods for seeking maxima. Oper. Res. 6(2), 244–251 (1958). https://doi.org/10.1287/opre.6.2.244 4. Dennis, J., Schnabel, R.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Englewood Cliffs, Prentice-Hall (1983) 5. Karnopp, D.C.: Random search techniques for optimization problems. Automatica 1(2-3), 111–121 (1963). https://doi.org/10.1016/0005-1098(63)90018-9 6. Marti, K.: Random search in optimization problems as a stochastic decision process (adaptive random search). Meth. Oper. Res. 36, 223–234 (1979) 7. Marti, K.: Adaptive zufallssuche in der optimierung. ZAMM 60, 357–359 (1980) 8. Marti, K.: On accelerations of the convergence in random search methods. Meth. Oper. Res. 38, 391–406 (1980) 9. Müller, P.E.A.: Optimierung mittels stochastischer suchverfahren. Wissenschaftliche Zeitschrift der TU Dresden 21(1), 69–75 (1983) 10. Müller, P., Nollau, V.: Steuerung Stochastischer Prozesse. Akademie-Verlag, Berlin (1984) 11. Oppel, U.: Random search and evolution. In: Transactions Symposium Applied Mathematics. Thessaloniki (1976) 12. Rappl, G.: Optimierung durch Zufallsirchungsverfahren, Simulation und Optimierung det. und stoch. dynam. Systeme (1980) 13. Rappl, G.: Konvergenzraten von Random-Search-Verfahren zur globalen Optimierung. Ph.D. Thesis, UniBw München (1984) 14. Zielinski, R., Neumann, P.: Stochastische Verfahren zur Suche nach dem Minimum einer Funktion. Akademie-Verlag, Berlin (1983)
Part III
Random Search Methods (RSM): Convergence and Convergence Rates
Chapter 9
Mathematical Model of Random Search Methods and Elementary Properties
Random search methods [2–4] are special stochastic optimization methods to solve the following problem: Minimize f (x) with regard to x ∈ D, where f : M → R and D ⊂ M ⊂ Rd (d ∈ N).
The principle on which this method is based is extremely simple and clear. It is similar to the mutation and selection of the biological evolution. The search is started at an arbitrary starting point x0 ∈ D. Now a distribution p1 (x0 , ·), e.g. a d-dimensional normal distribution with the mean value x0 and the covariance matrix C(x0 ), is used to generate a point y1 . This process is equivalent to mutation. If y1 lies in D, f is evaluated at y1 and the two functional values f (x0 ) and f (y1 ) are compared with each other. If y1 is better than x0 ( i.e. f (y1 ) ≤ f (x0 ) ), then x1 := y1 . If y1 does not lie in D or y1 is not better than x0 , i.e. f (y1 ) ≥ f (x0 ), then x0 is retained, i.e. x1 := x0 . This process is equivalent to the selection. Thus the first step of the method is completed. In the second step a distribution p2 (x0 , x1 , ·) is used to generate an additional mutation point y2 which is compared with x1 . Analogously to the first step, this provides the selection point x2 . If this process is continued, it leads to a sequence of random points x1 , x2 , . . . from D with f (x0 ) ≥ f (x1 ) ≥ f (x2 ) ≥ . . .. The description also shows that in order to use such a method it only has to be assumed that the function f can be evaluated at any value x ∈ D. Furthermore, it can be seen that one and the same method being applied to two functions f1 and f2 , which are identical on D, gives the same selection points x1 , x2 , . . . etc. In other words: A method is not influenced by the way f behaves outside of D. Therefore we may assume without loss of generality (and we will do this in the further course of the discussion) that the domain set M is identical with the set of admissible points D. Therefore we will have to deal with the following basic problem:
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_9
171
172
9 Mathematical Model of Random Search Methods and Elementary Properties
Minimize f (x) with regard to x ∈ D, where f : D → R and D ⊂ Rd not empty .
(9.1)
For technical reasons we will furthermore assume that D and f are measurable, i.e. D ∈ Bd and f Bd|D −B1 measurable (Bd = Borel-σ -algebra on Rd ). In addition, we will always suppose that f is not constant. This does not represent a restriction, since the minimization problem 9.1 is trivial if f is constant. Let us now once more have a look at the initial description of the search algorithm. We notice that, if D and f are given, the starting point x0 ∈ D and the distributions pn are the decisive quantities for the practical execution of a random search method. With these quantities the mutation points y1 , y2 , . . . are generated, and the selection rule 4 yn if yn ∈ D and f (yn ) < f (xn−1 ) xn = , therefore xn−1 otherwise (9.2) xn =yn · 1{x∈D:f (x) f ∗ , which is crucial for the next chapters. Lemma 11.1 Let (Xn : n ∈ N0 ) be an R-S-M with an arbitrary mutation sequence and an arbitrary starting point x0 ∈ D. Then the following statements are equivalent: (1) f (Xn ) → f ∗
P − a.s.
(2) lim P ({Xn ∈ Da }) = 1
n→∞
for any a > f ∗ .
Proof It is generally known (cf. [2], p. 171) that f (Xn ) − f ∗ → 0
P − a.s. ⇐⇒ sup | f (Xm ) − f ∗ | −→ 0 m≥n
n→∞
P-stoch.
From this it follows that f (Xn ) − f ∗ → 0 P − a.s. ⇐⇒ ∗ lim P sup | f (Xm ) − f |≤ a = 1 for all a > 0 ⇐⇒ n→∞
lim P
n→∞
m≥n
8
! {Xm ∈ Da } = 1 for all a > f ∗ .
m≥n
Now the assertion follows from Remark 11.1.
References 1. Driml, M., Hanš, O.: On a randomized optimization procedure. In: Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions and Random Processes (held at Prague 1965), pp. 273–276 (1967) 2. Gänssler, P., Stute, W.: Wahrscheinlichkeitstheorie. Springer, Berlin (1977) 3. Marti, K.: On accelerations of the convergence in random search methods. Methods Oper. Res. 37, 391–406 (1980) 4. Oppel, U.: Random search and evolution. In: Trans. Symp. Applied Math. Thessaloniki (1976)
Chapter 12
Convergence Theorems
12.1 Convergence of Random Search Methods with an Absolutely Continuous Mutation Sequence Theorem 12.1 Let (Xn : n ∈ N0 ) be an R-S-M with an absolutely continuous mutation sequence (pn : n ∈ N). Let the associated density sequence (gn : n ∈ N) be uniformly bounded towards zero, i.e.: For each x0 ∈ D \ D ∗ and a ∈ (f ∗ , f (x0 )) there exists a c > 0 with gn (x0 , . . . , xn−1 , yn ) ≥ c for all n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) \ Da ,
(12.1)
yn ∈ Da .
If furthermore λd (Da ) > 0
for all a > f ∗ ,
(12.2)
then we have for every starting point x0 ∈ D f (Xn ) −→ f ∗ n→∞
P − a.s.
Proof Let a > f ∗ be arbitrary. To prove the assertion, it is sufficient to demonstrate according to Lemma 11.1 that the condition limn→∞ P (Xn ∈ Da ) = 1 is satisfied. However, this follows from Theorem 11.1, because for all starting points x0 ∈ D \ D ∗ and all x1 , . . . , xn−1 ∈ Df (x0 ) \ Da we have
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_12
195
196
12 Convergence Theorems
pn (x0 , x1 , . . . , xn−1 , Da ) =
gn (x0 , . . . , xn−1 , yn )λd (dyn ) Da
≥ cλd (Da ) and this, which is trivial, satisfies condition (11.2). For starting points x0 ∈ D ∗ the assertion is trivial. Remark 12.1 If f is continuous and Da is compact for all a ≥ x0 ∈ D and sequences (xn : n ∈ N) from Df (x0 ) with lim f (xn ) = f ∗
then for all
lim dist(xn , D ∗ ) = 0.
also
n→∞
f ∗,
n→∞
Proof Let (xn : n ∈ N) be a sequence from Df (x0 ) with the above property. If it is assumed that lim supn→∞ dist(xn , D ∗ ) > 0, then there is an ε > 0 and one subsequence of the natural numbers (nk : k ∈ N) so that dist(xnk , D ∗ ) ≥ ε for each k ∈ N. Due to the compactness of the set Df (x0 ) a subsequence (xnk ) ⊂ (xnk ) converges towards an x ∈ D. From the continuity of f it follows that limk→∞ f (xnk ) = f (x). From the continuity of the representation z → dist(z, D ∗ ) (cf. [4, p. 78]) it follows that dist(x, D ∗ ) ≥ ε, i.e. f (x) > f ∗ in contradiction to f (x) = limk→∞ f (xnk ) = limn→∞ f (xn ) = f ∗ . −
◦
D∗
◦
Remark 12.2 If ∩ D = ∅ and if f is continuous on D ∗ , then there is a y ∈ D and an r > 0 with K(y, r) ⊂ Da for all a > f ∗ . Therefore in particular condition (12.2) is satisfied. −
Proof Let a >
f∗
be arbitrary and
x∗
∈
◦
D∗
◦
∩ D.
Then for each ε > 0 there is a y ∈ D with | y − x ∗ |< ε. Since f is continuous at x ∗ , ε > 0 can be chosen in a way that z ∈ Da , if | z − x ∗ |< ε. ◦
◦
Let now be y ∈ D with | y − x ∗ |< 2ε and σ ∈ (0, 2ε ) with K(y, σ ) ⊂ D. From the triangular inequality it follows that K(y, σ ) ⊂ K(x ∗ , ε) and therefore K(y, σ ) ⊂ Da . −
Note 12.1 The condition
D∗
◦
∩ D = ∅ states that the points x ∗ ∈ D ∗ must not be ◦
isolated points with regard to D. −
◦
For, if D ∗ ∩ D = ∅, then any x ∗ ∈ D ∗ would have a neighborhood K(x ∗ , r)
◦
(r > 0) with K(x ∗ , r) ∩ D = ∅.
12.1 Convergence of Random Search Methods with an Absolutely Continuous. . .
197
Remark 12.3 Condition (12.1) is satisfied if any Da (a > f ∗ ) is compact, f is continuous and, furthermore, in the stationary case, i.e. if gn (x0 , . . . , xn−1 , yn ) = g(xn−1 , yn ), and g is continuous and positive on (D \ D ∗ ) × D. Proof Let x0 ∈ D \ D ∗ and a ∈ (f ∗ , f (x0 )). The compactness of the set {x ∈ D : f (x0 ) ≥ f (x) ≥ a} follows from the fact that f is continuous and Df (x0 ) is compact. Since Df (x0 ) \ Da is included in this set, it follows that there is a c > 0 with g(x, y) ≥ c for all x ∈ Df (x0 ) \ Da and y ∈ Da . Example 12.1 gn (x0 , . . . , xn−1 , yn ) =g(xn−1 , yn ) =
9 : d | yn − xn−1 |2 =(2π(σ (xn−1 ))2 )− 2 exp − 2(σ (xn−1 ))2
with σ : D → R+ measurable (cf. Example 10.1). If f is continuously and Da is compact for all a > f ∗ , then σ (x) := σ > 0 Condition (12.1) is satisfied according to Note 12.3. If f is continuously differentiable and ∇f (x) = 0 is satisfied for all x ∈ D \ D ∗ (which is the case, e.g., for a convex function), the control σ (x) :=| ∇f (x) | also satisfies condition (12.1) according to Note 12.3. This follows from the fact that σ is continuous and positive on D\D ∗ . If therefore −
◦
additionally D ∗ ∩ D = ∅, then the conditions (12.1), respectively, (12.2) are met in both cases according to the Remark 12.3 and 12.2, and it follows from Theorem 12.1 and Remark 12.1 for all starting points x0 ∈ D that: f (Xn ) − f ∗ → 0
P − a.s.
and
dist(Xn , D ∗ ) → 0
P − a.s.
The next example shows that the convergence of f (Xn ) towards f ∗ does not generally imply the convergence of dist(Xn , D ∗ ) towards zero. Example 12.2 Let D = [0, 1],
gn (x0 , . . . , xn−1 , yn ) = 1[0,1] (yn ) and
198
12 Convergence Theorems
f (x) :=
⎧ ⎪ ⎪ ⎨1 x ⎪ ⎪ ⎩0
if
x=0
if
0 0. K d−1
Let now x1 , . . . , xn−1 ∈ Df (x0 ) \ Da be arbitrary. Then: E f (xn−1 ) − f (Xn ) | Xn−1 = xn−1 , . . . , X1 = x1 , X0 = x0
= f (xn−1 ) − f (xn ) qn (x0 , x1 , . . . , xn−1 , dxn )
f (xn−1 ) − f (yn ) pn (x0 , x1 , . . . , xn−1 , dyn )
=
Df (xn−1 )
f (xn−1 ) − f (yn ) pn (x0 , x1 , . . . , xn−1 , dyn )
≥ Da˜
≥ c · inf hn (x0 , x1 , . . . , xn−1 , | yn − xn−1 |) : yn ∈ Da˜ ≥ c · inf hn (x0 , x1 , . . . , xn−1 , r) : r ∈ (0, K) . Therefore: inf E f (xn−1 ) − f (Xn ) | Xn−1 = xn−1 , . . . , X1 = x1 , X0 = x0 : x1 , . . . , xn−1 ∈ Df (x0 ) \ Da ≥ c · inf hn (x0 , x1 , . . . , xn−1 , r) : x1 , . . . , xn−1 ∈ Df (x0 ) \ Da , r ∈ (0, K) . From (12.3) it follows that condition (11.5) is valid. Then the assertion follows from Theorem 11.2 and Lemma 11.1 for all x0 ∈ D \ D∗. For x0 ∈ D ∗ the assertion is trivial. Example 12.3 (1) Let (an : n ∈ N) be a bounded sequence of positive numbers with
∞ n=1
an = ∞.
Let λn (x0 , x1 , . . . , xn−1 , ·) = λn (·) be the exponential distribution with the parameter an , i.e. hn (x0 , . . . , xn−1 , r) = an exp(−an r). With a := sup{an : n ∈ N} it follows for all K > 0 that
200
12 Convergence Theorems ∞
∞ ∞ inf hn (r) : r ∈ (0, K] ≥ an exp(−an K) ≥ exp(−aK) an .
n=1
n=1
Thus (12.3) is satisfied. (2) Let hn (x0 , x1 , . . . , xn−1 , r) :=
*
n=1
> 1 2 π σ (xn−1 )
?
exp −
r2
2 σ (xn−1 )
2
with σ : D →
(0, ∞). Let σ be continuous on D \ D ∗ . Because of 9 : 4 6 r2 1 exp − 2 : x ∈ Df (x0 ) \ Da , r ∈ (0, K) inf σ (x) 2σ (x) 6 9 : 4 K2 1 exp − 2 : x ∈ Df (x0 ) \ Da = inf σ (x) 2σ (x) it follows from Note 12.3 that (12.3) f is valid, if f is continuous and any Da (a > f ∗ ) is compact. Theorem 12.3 Let (Xn : n ∈ N0 ) be an R-D-M with the step length distributions ◦
λn . Let D be convex, D ∗ = ∅, D = ∅ and f : D → R continuous and convex. Furthermore, let Da be bounded for all a ≥ f ∗ . It is also assumed that we have for all x0 ∈ D, a ∈ (f ∗ , f (x0 )) and c > 0 ∞ n=1
4
rλn (x0 , . . . , xn−1 , dr) :
inf (0,c)
6
(12.4)
x1 , . . . , xn−1 ∈ Df (x0 ) \ Da = ∞. Then for all starting points x0 ∈ D f (Xn ) → f ∗
P − a.s.
Proof It is assumed that x0 ∈ D \ D ∗ without loss of generality (for x0 ∈ D ∗ the assertion is trivial). Let a ∈ (f ∗ , f (x0 )) and x1 , . . . , xn−1 ∈ Df (x0 ) \ Da be arbitrary. Furthermore, let a˜ ∈ (f ∗ , a). ◦
From the convexity of D it follows that D = D (cf. [2, p. 16]) and therefore ◦
D ∗ ⊂ D. ◦ Because of Remark 12.2 there exist a z ∈ D and an ε > 0 with K(z, 2ε) ⊂ Da˜ .
12.2 Convergence of Random Direction Methods
201
From the boundedness of Df (x0 ) it follows that ' & K := sup | x − y |: x, y ∈ Df (x0 ) < ∞. For x ∈ Df (x0 ) \ Da let β(x) := arcsin
ε . |x−z|
Because of xn−1 ∈ Df (x0 ) \ Da and K(z, 2ε) ⊂ Da˜ ⊂ Da , it follows that | z − xn−1 |> 2ε and therefore | z − xn−1 | −ε > ε > 0. From Lemma B.10 it follows that xn−1 + C (β(xn−1 ), z − xn−1 , ε) ⊂ xn−1 + C (β(xn−1 ), z − xn−1 , | z − xn−1 | −ε) ⊂ conv {xn−1 } ∪ K(z, ε) . From the convexity of D and from {xn−1 } ∪ K(z, ε) ⊂ D it follows that: xn−1 + C (β(xn−1 ), z − xn−1 , ε) ⊂ D. Let now y ∈ xn−1 + C (β(xn−1 ), z − xn−1 , ε) be arbitrary. Then according to Lemma B.10, there is a t > 0 with | xn−1 + t (y − xn−1 ) − z |≤ ε. Let z := xn−1 + t (y − xn−1 ). Because of | z − xn−1 |
≥
| z − xn−1 |
−
| z − z |
and | y − xn−1 |
≤ε≤
| z − xn−1 |
−ε
it follows that | z − xn−1 | and therefore t =
≥
| y − xn−1 |
|z −xn−1 | |y−xn−1 |
≥ 1.
+ε−
| z − z |
≥
| y − xn−1 |
202
12 Convergence Theorems
From the central inequality for convex functions (cf. Lemma B.1) it follows now that f (z ) − f (xn−1 ) f (y) − f (xn−1 ) ≤ | y − xn−1 | | z − xn−1 | therefore f (xn−1 ) − f (y) ≥| y − xn−1 | ≥| y − xn−1 |
f (xn−1 ) − f (z ) | xn−1 − z | a − a˜ . K
If β := arcsin Kε , then β ≤ β(x) for all x ∈ Df (x0 ) \ Da . Therefore xn−1 + C (β, z − xn−1 , ε) 6 4 a − a˜ | y − xn−1 | ⊂ Df (xn−1 ) . ⊂ y ∈ D : f (xn−1 ) − f (y) ≥ K From this it follows that
12.2 Convergence of Random Direction Methods
203
E f (xn−1 ) − f (Xn ) | Xn−1 = xn−1 , . . . , X1 = x1 , X0 = x0
f (xn−1 ) − f (y) pn (x0 , . . . , xn−1 , dy) = Df (xn−1 )
a − a˜ ≥ K a − a˜ = K
xn−1 +C(β,z−xn−1 ,ε)
| y − xn−1 | pn (x0 , . . . , xn−1 , dy)
S
R+
1C(β,z−xn−1 ,ε) (rs) | rs | λn (x0 , . . . , xn−1 , dr)γ (ds)
a − a˜ γ S ∩ C(β, z − xn−1 ) = K
rλn (x0 , . . . , xn−1 , dr) (0,ε]
with the last but one equation following from the integration property (10.4). According to Lemma A.3, λ(S ∩ C(β, z − xn−1 )) does not depend on z − xn−1 . Therefore a positive constant M exists with E f (xn−1 ) − f (Xn ) | Xn−1 = xn−1 , . . . , X1 = x1 , X0 = x0
rλn (x0 , . . . , xn−1 , dr) ≥M (0,ε]
for all x1 , . . . , xn−1 ∈ Df (x0 ) \ Da . Because of (12.4) the condition (11.5) is satisfied. Now the assertion follows from Theorem 11.2 and Lemma 11.1.
Example 12.4 (1) Let (an : n ∈ N) be a null sequence of positive numbers. Let λn (x0 , . . . , xn−1 , ·) = εan . In this case step length is therefore selected deterministically, namely an in the nth step (cf. [1], where an = n−s ). Condition (12.4) is therefore satisfied if and only if ∞
an = ∞.
n=1
(2) Let (an : n ∈ N) be a bounded sequence of positive numbers with ∞
an = ∞.
n=1
Let λn (x0 , . . . , xn−1 , ·) be the uniform distribution on the Interval (0, an ). Then the condition (12.4) is satisfied.
204
12 Convergence Theorems
This follows from
rλn (x0 , . . . , xn−1 , dr) ≥ (0,c)
≥
6 4 2 1 c min , an 2 a
4 2 6 c 1 min , an 2 an a := sup {an : n ∈ N} .
where
(3) Let λn (x0 , . . . , xn−1 , ·) have the density @ hn (r) =
2 1 r2 exp − π σn 2σn 2
(σn > 0).
* 2 Because of (0,c) rhn (r)dr = π2 1 − exp − c 2 σn (12.4) is satisfied if 2σn an only if the sequence c2 σn 1 − exp − 2σn 2 n−1
∞
is divergent. This is, for example, the case if σn → 0
and
∞
σn = ∞.
n−1
(4) Let h : R+ → R+ be a Lebesgue density. Let there be a ε > 0 such that h is positive on (0, ε). Let σ : D → (0, ∞) on D \ D ∗ be continuous. Let λn (x0 , . . . , xn−1 , ·) = λ(xn−1 , ·) have the density g(xn−1 , r) :=
r 1 h . σ (xn−1 ) σ (xn−1 )
Then we have for all c > 0
rλn (x0 , . . . , xn−1 , dr) = σ (xn−1 ) (0,c)
0, σ (x c
rh(r)dr.
n−1 )
If f is continuous and Da is compact for all a > f ∗ , then for all a ∈ (f ∗ , f (x0 )) there are constants σ1 > 0 and σ2 < ∞ with σ1 ≤ σ (x) ≤ σ2
for all
x ∈ Df (x0 ) \ Da .
References
205
With m := σ1
0, σc
rh(r)dr
it follows that
2
4
inf (0,c)
6 rλn (x0 , . . . , xn−1 , dr) : x1 , . . . , xn−1 ∈ Df (x0 ) \ Da ≥ m.
Since h is positive on (0, ε), we have m > 0. Therefore condition (12.4) is satisfied. Additional examples and convergence statements with regard to R-D-M can be found in [3].
References 1. Guseva, O.: Convergence of a random search algorithm. Kibernetika 7, 1104–1106 (1971) 2. Marti, J.: Konvexe Analysis. Birkhäuser, Basel (1977) 3. Rastrigin, L.: The convergence of the random search method in the extremal control of a many parameter system. Autom. Remote Control 24, 1337–1342 (1963) 4. Schwefel, K.P.: Numerische Optimierung von Computer-Modellen mittels der EvolutionsStrategie. Birkhäuser, Basel-Stuttgart (1977)
Chapter 13
Convergence of Stationary Random Search Methods for Positive Success Probability
Let (Xn : n ∈ N0 ) be in this section—in the sense of Definition 9.3—a stationary RS-M with the mutation transition probability p. Let us now again pose the question on which conditions (Xn : n ∈ N0 ) converges, i.e. when does f (Xn ) → f ∗
P − a.s.
apply to any starting point x0 ∈ D? For this it is obviously necessary that the following condition is satisfied: p x, y ∈ D : f (y) < f (x) > 0
for all x ∈ D \ D ∗ .
(13.1)
& ' For if an x ∈ D \ D ∗ would exist with p x, f < f (x) = 0, then P Xn+1 = x | Xn = x = q x, {x} & ' & & ' ' = p x, x ∩ f < f (x) + p x, Rd \ f < f (x) & ' = 1 − p x, f < f (x) = 1 and therefore P (Xn = x | X0 = x) = 1
for all n ∈ N.
Then because of P (X0 = x) the equality P (Xn = x) = 1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_13
207
208
13 Convergence of Stationary Random Search Methods for Positive Success. . .
applies to the R-S-M that starts to x0 = x for all n ∈ N0 . Therefore P
lim f (Xn ) = f (x) = 1
n→∞
in contradiction to f (x) > f ∗ and P lim f (Xn ) = f ∗ = 1. n→∞ It remains to be determined when (13.1), i.e. the requirement for a positive success probability, is sufficient for the convergence. For this purpose initially the following lemma will be used. Lemma 13.1 (
inf Df (y)
f (y) − f (z) p(y, dz) : y ∈ Df (x0 ) \ Da
) > 0.
(13.2)
For all x0 ∈ D \ D ∗ and all a ∈ (f ∗ , f (x0 )). Then we have f (Xn ) → f ∗ P − a.s. for all starting points x0 ∈ D. Proof Because of
f (y) − f (z) p(y, dz)
Df (y)
=
{u∈D:f (u) 0 with &
' & ' y ∈ D : f (y) < f (x) = y ∈ D : f (y) ≤ f (x) − b
for all x ∈ D.
There is furthermore a c > 0 with p(x, Df (x) ) ≥ c for all x ∈ D \ D ∗ . Therefore Df (y) (f (y) − f (z))p(y, dz) ≥ bc for all y ∈ D \ D ∗ . Now the assertion follows from Lemma 10.1.
Definition 13.1 Let E be a topological space and let BE be the Borel-σ -algebra on E.
13 Convergence of Stationary Random Search Methods for Positive Success. . .
209
(1) A function g : E → R is called lower semi-continuous, if lim inf g(xn ) ≥ g(x) n→∞
for all x ∈ E and all sequences (xn : n ∈ N) from E with lim xn = x. n→∞
(2) A transition probability K from (E, BE ) to (E, BE ) is called continuous, if the representation
x→
h(y)K(x, dy)
is continuous for all continuous and bounded functions h : E → R. Lemma 13.2 Let E be a Polish space (i.e. completely metrisable and separable) and let h : E × E → R be a lower semi-continuous and bounded from below. Let K be a continuous transition probability from E to E. Then the representation
x→
h(x, y)K(x, dy)
is lower semi-continuous.
Proof cf. [1, p. 18].
Theorem 13.2 Let the condition (13.1) be satisfied. Also let f and p be continuous and Da compact for all a ≥ f ∗ . Then for all starting points x0 ∈ D f (Xn ) → f ∗
P − a.s.
Proof Let x0 ∈ D \ D ∗ and let a ∈ (f ∗ , f (x0 )) be arbitrary. Let h : Df (x0 ) × Df (x0 ) → R+ h(y, z) := 1{(y,z):f (z)f ∗ +ε}
...
∗
{f >f ∗ +ε}
qn (x0 , x1 , . . . , xn−1 , dxn ) . . . q1 (x0 , dx1 ) 1 − pn x0 , x1 , . . . , xn−1 , {f ≤ f ∗ + ε} ·
qn−1 (x0 , . . . , xn−2 , dxn−1 ) . . . q1 (x0 , dx1 )
... ≤ 1 − a0 λd {f ≤ f ∗ + ε} · {f >f ∗ +ε}
...
{f >f ∗ +ε}
qn−1 (x0 , . . . , xn−2 , dxn−1 ) . . . q1 (x0 , dx1 )
n−1 1 − q1 x0 , {f ≤ f ∗ + ε} . ≤ . . . ≤ 1 − a0 λd {f ≤ f ∗ + ε} If ε < f (x0 ) − f ∗ and therefore f (x0 ) > f ∗ + ε, then q1 x0 , {f ≤ f ∗ + ε} = p1 x0 , {f ≤ f ∗ + ε} ≥ a0 λd {f ≤ f ∗ + ε}. Thus, n P f (Xn ) − f ∗ > ε ≤ 1 − a0 λd {f ≤ f ∗ + ε} If additionally ε ≤ c1 , then, according to (11.2), n P f (Xn ) − f ∗ > ε ≤ 1 − a0 λd K x ∗ , b1 ερ1 n = 1 − a0 mb1 d ερ1 d and therefore P f (Xn ) − f ∗ ≤ ε ≥ Hn (ε) n for all ε ∈ (0, min{c1 , f (x0 ) − f ∗ }), with Hn (ε) := 1 − 1 − a0 mb1 d ερ1 d . Obviously, P f (Xn ) − f ∗ ≤ ε = 1 for ε ≥ f (x0 ) − f ∗ P f (Xn ) − f ∗ ≤ ε = 0 for ε < 0. Defining α :=
1 2
min{c1 , f (x0 ) − f ∗ } and
and
14 Random Search Methods of Convergence Order O(n−α )
219
⎧ ⎪ ⎪ ⎨0 for ε ≤ 0 αε for 0 < ε ≤ f (x0 ) − f ∗ Gn (ε) := Hn f (x0 )−f ∗ ⎪ ⎪ ⎩ 1 for ε > f (x0 ) − f ∗ , Gn is a monotone increasing function, and for all ε ∈ R we have Gn (ε) ≤ Fn (ε) := P f (Xn ) − f ∗ ≤ ε . For ε ≤ 0 and ε ≥ f (x0 ) − f ∗ , it is clear that this inequality holds. For ε ∈ (0, α), this inequality follows because of α ≤ 1. f (x0 ) − f ∗ For ε ∈ [α, f (x0 ) − f ∗ ) it follows from Gn (ε) ≤ Gn f (x0 ) − f ∗ = Hn (α) ≤ Fn (α) ≤ Fn (ε). Hence, assertion (i) is shown. (c) In addition to (14.1) and (14.2), let also (14.3) be fulfilled. Furthermore, assume that Df (x0 ) is bounded. 1 ρ If 0 ≤ ε ≤ b2 c2 ρ2 , hence, bε2 2 ≤ c2 , condition (14.3) yields ( f ≤ f∗ +
ε b2
1) ρ2
⊂ D ∩ K(x ∗ , ε)
and therefore ( 1) & ' ε ρ2 y ∈ D :| y − x ∗ |> ε ⊂ f > f ∗ + . b2 According to (b), for ε ∈ [0, b2 c2 ρ2 ] we get P | Xn − x ∗ |≤ ε ≥P
∗
f (Xn ) − f ≤
≥Gn
ε b2
1! ρ2
.
ε b2
1! ρ2
14 Random Search Methods of Convergence Order O(n−α )
220
Furthermore, if
ε b2
1 ρ2
≤ f (x0 ) − f ∗ , i.e. ε ≤ b2 (f (x0 ) − f ∗ )ρ2 , we get
P | Xn − x ∗ |≤ ε ≥Gn
=Hn
ε b2
1! ρ2
α f (x0 ) − f ∗
ε b2
1! ρ2
.
From the boundedness of Df (x0 ) it follows that ' & r := sup | x − x ∗ |: x ∈ Df (x0 ) < ∞. ' & Defining β := min r, b2 (f (x0 ) − f ∗ )ρ2 , b2 c2 ρ2 and ⎧ ⎪ 0 for ε ≤ 0 ⎪ ⎪ ⎨ 1 ρ εβ α 2 ˜ n (ε) := Hn G for 0 < ε ≤ r f (x0 )−f ∗ rb2 ⎪ ⎪ ⎪ ⎩ 1 for ε > r, with the same argument as in (b), for all ε ∈ R we get the inequality ˜ n (ε). P | Xn − x ∗ |≤ ε ≥ G This shows now assertion (iv). (d) To prove assertion (ii), it is sufficient to show that for all x0 ∈ D, α < ε > 0 we have ∞
P nα f (Xn ) − f ∗ > ε < ∞.
n=1
With the lemma of Borel–Cantelli we obtain then P nα f (Xn ) − f ∗ > ε for infinite many n ∈ N = 0. Taking the complement, we have P nα f (Xn ) − f ∗ ≤ ε for asymptotically all n ∈ N = 1. This yields P lim sup nα f (Xn ) − f ∗ ≤ ε = 1. n→∞
1 dρ2 ,
and
(14.5)
14 Random Search Methods of Convergence Order O(n−α )
221
Because this is satisfied for each ε > 0, we get (ii). Now we show (14.5): If εn−α ≤ f (x0 ) − f ∗ , then (i) yields n P nα f (Xn ) − f ∗ > ε ≤ 1 − c(x0 )(εn−α )dρ1 . Because of α < (dρ1 )−1 it is αdρ1 < 1. Now (14.5) follows from Lemma B.16. Part (v) can be shown analogously by using (iv). (e) Because of Fn (x) = 0 for x < 0, part (b) yields E f (Xn ) − f ∗ =
∞
0
=
∞
(1 − Fn (x))dx ≤ 0
f (x0 )−f ∗
1 − a0 mb1
0
d
(1 − Gn (x))dx αx f (x0 ) − f ∗
ρ1 d !n dx.
4 6 − 1 For αˆ := min α, a0 mb1 d ρ1 d we have αˆ ≤ α and therefore E f (Xn ) − f ∗ ≤
f (x0 )−f ∗
1 − a0 mb1
0
d
αx ˆ f (x0 ) − f ∗
ρ1 d !n dx
− 1 f (x ) − f ∗ αˆ a0 mb1 d ρ1 d n ρ1 d 0 1 − uρ1 d du = a0 mb1 d αˆ 0 − 1 f (x ) − f ∗ 1 n ρ1 d 0 1 − uρ1 d du. ≤ a0 mb1 d αˆ 0 1
Now it follows that 1 lim sup n ρ1 d E f (Xn ) − f ∗ n→∞
1 − 1 f (x ) − f ∗ n 1 ρ1 d 0 lim sup n ρ1 d 1 − uρ1 d du ≤ a0 mb1 d αˆ n→∞ 0 − 1 f (x ) − f ∗ 1 ρ1 d 0 1+ . = a0 mb1 d αˆ ρ1 d
Here, the last equation follows again from Lemma B.16. This proves now assertion (iii). From (c) it follows that
14 Random Search Methods of Convergence Order O(n−α )
222
∗
∞
E | Xn − x | ≤ 0
= 0
˜ n (x) dx 1−G
⎛ r
1 α − 1 ⎝1 − a0 m b1 ρ1 b2 ρ2 ∗ f (x0 ) − f
dρ1
xβ r
dρ1 ρ2
⎞n ⎠ dx
− ρ2 f (x ) − f ∗ ρ2 b r 1 dρ1 n ρ1 d 0 2 1 − u ρ2 ≤ a0 mb1 d du, α βˆ 0 6 4 − ρρ2d f (x0 )−f ∗ ρ2 d ˆ 1 b2 . where β := min β, a0 mb1 α Corresponding to the above, part (vi) follows now again by using Lemma B.16. (f) If in addition to (14.1) and (14.2) condition (14.4) holds, then analogously to part (a) we get P f (Xn ) − f ∗ > 0 = P
n 8 &
' f (Xi ) − f > 0
!
∗
i=0
n ≤ 1 − a0 λd {f ≤ f ∗ } = (1 − a0 μ)n . From this it follows on the one hand that
∗ E f (Xn ) − f = {f (Xn
)−f ∗ >0}
f (Xn ) − f ∗ dP
≤ f (x0 ) − f ∗ P f (Xn ) − f ∗ > 0 ≤ f (x0 ) − f ∗ (1 − a0 μ)n . Therefore, with β = (1 − a0 μ)−1 , for all n ∈ N we have the inequality β n E f (Xn ) − f ∗ ≤ f (x0 ) − f ∗ . On the other hand, we have P
∞ 8 &
f (Xi ) > f
∗
'
i=0
P
∞ ; &
f (Xi ) = f
' ∗
! = 0,
hence
! = 1.
i=0
Because of the monotonicity of the sequence f (Xn ) we even have that (cf. Remark 11.1)
14 Random Search Methods of Convergence Order O(n−α )
⎛ P⎝
∞ 8 ; &
223
⎞ ' f (Xj ) = f ∗ ⎠ = 1
i=0 j ≥i
and therefore ⎛ P⎝
∞ 8 ; &
⎞ ' aj (f (Xj ) − f ∗ ) = 0 ⎠ = 1
i=0 j ≥i
for each sequence (an : n ∈ N) of real numbers. This proves assertion (vii). Remark 14.2 Let D ∗ = ∅, and assume that f is convex on an open sphere K D ∗ . Then condition (14.2) holds (with ρ1 = 1). Proof Let x ∗ ∈ D ∗ and r > 0 be chosen such that D ∗ K(x ∗ , r) ⊂ D, and ∗ suppose that f is continuous and convex on K(x , r). Let c := sup f (x) : x ∈ K(x ∗ , r) . Because of D ∗ K(x ∗ , r) it follows that c − f ∗ > 0. r Define c1 := c − f ∗ and b1 := c−f ∗. Let ε ∈ (0, c1 ] be arbitrary. Because of b1 ε ≤ b1 c1 ≤ r we get K(x ∗ , b1 ε) ⊂ D. Let now y ∈ K(x ∗ , b1 ε)\{x ∗ } be arbitrary. r ∗ ∗ With z := x ∗ + |y−x ∗ | (y−x ) we find | z−x |= r, and, according to Lemma B.1 it follows that f (z) − f ∗ f (y) − f ∗ ≤ . | y − x∗ | | z − x∗ | From this we obtain f (y) − f ∗ ≤| y − x ∗ |
f (z) − f ∗ c − f∗ < b1 ε =ε r r
Hence, K(x ∗ , b1 ε) ⊂ {f ≤ f ∗ + ε} for all ε ∈ (0, c1 ]
Remark 14.3 Assume that = ⊂ D˙ with compact D and continuous f . Furthermore, assume that f is twice continuously differentiable on a neighborhood of x ∗ , and has a positive definite Hessian matrix at x ∗ . Then the conditions (11.2) and (11.3) are fulfilled (with ρ1 = ρ2 = 12 ). This follows directly from Lemma B.10, Lemma B.11, and Remark B.1. D∗
{x ∗ }
Remark 14.4 Assume that the mutation sequence (pn : n ∈ N) is absolutely continuous with the sequence of densities (gn : n ∈ N). Furthermore, suppose that
14 Random Search Methods of Convergence Order O(n−α )
224
⎧ ∗ ⎪ ⎪ ⎨For all x0 ∈ D\D there exists a c0 > 0 with gn (x0 , x1 , . . . , xn−1 , yn ) ≥ c0 for all n ∈ N, ⎪ ⎪ ⎩ x ,...,x ∗ 1 n−1 ∈ Df (x0 ) \D and yn ∈ Df (x0 ) .
(14.6)
(Cf. Condition (12.1). Obviously, (12.1) follows from(14.6).) Then Condition (14.1) holds. Equation (14.6) is satisfied in particular in the stationary case, i.e. if gn (x0 , x1 , . . . , xn−1 , yn ) = g(xn−1 , yn ), as well as all Da (a > f ∗ ) are bounded, ¯ and g is positive and continuous on D¯ × D. Proof Let x0 ∈ D\D ∗ , n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) and B ⊂ Df (x0 ) be measurable. If (14.6) is satisfied, then
pn (x0 , x1 , . . . , xn−1 , B) = gn (x0 , x1 , . . . , xn−1 , yn )λd (dyn ) ≥ c0 λd (B), B
and also (14.1) holds. Based on the above assumptions, in the stationary case the density g takes its minimum on Df (x0 ) × Df (x0 ) . However, this is positive according to the assumptions. Hence, (14.6) is therefore satisfied. Theorem 14.2 Assume that condition (14.3) of Theorem 14.1 be satisfied. Additionally, suppose: ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩
There exist a b ≥ 0 and an R ∈ (0, ∞) ∪ {∞} with pn (x0 , x1 , . . . , xn−1 , B ∩ K(xn−1 , R)) ≤ bλd (B ∩ K(xn−1 , R)) for all n ∈ N, x0 , . . . , xn−1 ∈
D\D ∗
and measurable B ⊂ D .
Then, there exists f0 > f ∗ with (i)
α ∗ P lim inf n f (Xn ) − f =∞ =1 n→∞
for all α >
2 and x0 ∈ Df0 \D ∗ ; dρ2
(ii)
α ∗ P lim inf n | Xn − x | = ∞ = 1 n→∞
for all α >
2ρ1 and x0 ∈ Df0 \D ∗ . dρ2
(14.7)
14 Random Search Methods of Convergence Order O(n−α )
225
if additionally condition (14.2) of Theorem 14.1 is satisfied. (iii) 1 1 lim inf n dρ2 E f (Xn ) − f ∗ ≥ f (x0 ) − f ∗ 1 + n→∞ dρ2 for all x0 ∈ Df0 \D ∗ . In the stationary case there exists a function h with h(x) > 0 for x ∈ D\D ∗ and 1 ∗ dρ 2 lim inf n E f (Xn ) − f ≥ h(x0 ) n→∞
for all x0 ∈ D\D ∗ . Note 14.1 For R = ∞ and arbitrary x ∈ Rd we put K(x, R) := Rd . Proof of Theorem 14.2 (a) Choose sufficiently small c > 0 such that (
R c ≤ min c2 , 2b2
1) ρ2
and bλd K x ∗ , b2 cρ2 < 1
(As already mentioned, we have λd (K (x ∗ , b2 cρ2 )) = mb2 d c2 dρ2 , d
where m =
π 2 is 1+ d2 := f ∗ + c.
the volume of the unit sphere).
Let f0 Because of c ≤ c2 , according to (14.3) we have D f0
∗ ρ2 ∗ R . = {f ≤ f + c} ⊂ K x , b2 c ⊂K x , 2 ∗
Therefore, for x, y ∈ Df0 we have | x − y |< R. Hence, Df0 ⊂ K(x, R) for all x ∈ Df0 . (b) Let now x0 ∈ Df0 \D ∗ and ε ∈ (0, f (x0 ) − f ∗ ]. Moreover, x0 , . . . , xn−1 ∈ Df (x0 ) \Df ∗ +ε . Then Df ∗ +ε ⊂ Df (xi−1 ) ⊂ Df (x0 ) ⊂ Df0 ⊂ K(xi−1 , R) for i = 1, . . . , n Thus, for i = 1, . . . , n we have qi (x0 , . . . , xi−1 , Df (x0 ) \Df ∗ +ε ) = 1 − qi (x0 , . . . , xi−1 , Df ∗ +ε ) + qi (x0 , . . . , xi−1 , Rd \Df (x0 ) ) = 1 − qi (x0 , . . . , xi−1 , Df ∗ +ε ) = 1 − pi (x0 , . . . , xi−1 , Df ∗ +ε )
14 Random Search Methods of Convergence Order O(n−α )
226
= 1 − pi x0 , . . . , xi−1 , Df ∗ +ε ∩ K(xi−1 , R) ≥ 1 − bλd Df ∗ +ε ∩ K(xi−1 , R) ≥ 1 − bλd K(x ∗ , b2 ερ2 ) ≥ 0. Here, the last but one inequality follows from (14.3), since ε ≤ f (x0 ) − f ∗ ≤ f0 − f ∗ = c ≤ c2 . Applying this inequality to i = n, n − 1, . . . , 1, we obtain P f (Xn ) − f ∗ ≤ ε n 8
=1−P
{Xi ∈ (Df (x0 ) \Df ∗ +ε )}
i=1
=1−
!
Df (x0 ) \Df ∗ +ε
Df (x0 ) \Df ∗ +ε
...
qn (x0 , . . . , xn−1 , dxn ) . . . q1 (x0 , dx1 )
n n = 1 − 1 − bmb2 d εdρ2 . ≤ 1 − 1 − bλd K(x ∗ , b2 ερ2 ) (c) If additionally condition (14.2) holds, then for ε > 0 with
2ε b1
1 ρ1
we obtain the inclusion ( ∗
f ≤f +
≤ c1 , hence, 0 < ε ≤
2ε b1
1) ρ1
b1 ρ2 c1 , 2
⊃ K(x ∗ , 2ε) ⊃ K(x ∗ , ε).
This yields &
∗
'
( ∗
y ∈ D :| y − x |> ε ⊃ f > f +
2ε b1
1) ρ1
.
Using part (b), for 0 < ε ≤ min f (x0 ) − f ∗ , b21 c1 ρ2 we obtain
14 Random Search Methods of Convergence Order O(n−α )
P | Xn − x ∗ |> ε ≥ P
227
f (Xn ) − f ∗ > ⎛
≥ ⎝1 − bmb2 d
2ε b1
2ε b1
1!
dρ2 ρ1
ρ1
⎞n ⎠ .
(d) Let now N ∈ N be arbitrary and α > dρ2 2 . Let K ∈ N be chosen such that bmb2 d (N n−α )dρ2 ≤ 1 and N n−α ≤ f (x0 ) − ∗ f , if n ≥ K. Then part (b) and the Bernoulli inequality yield n P f (Xn ) − f ∗ ≤ Nn−α ≤ 1 − 1 − bmb2 d (N n−α )dρ2 ≤ bmb2 d (N n−α )dρ2 n = bmb2 d N dρ2 n1−αdρ2 . Because of α > ∞ n=1
2 dρ2
it follows that 1 − αdρ2 < −1 and therefore
∞ ∗ d dρ2 P n f (Xn ) − f ≤ N ≤ K + bmb2 N n1−αdρ2 < ∞.
α
n=K
From Borel–Cantelli’s lemma it follows therefore that P nα f (Xn ) − f ∗ ≤ N for infinite many n ∈ N = 0 , P nα f (Xn ) − f ∗ > N for finally all n ∈ N = 1, respectively. Because this is true for all n ∈ N, the assertion (i) follows. Similarly, Part (c) is used to show assertion (ii). (e) Let again x0 ∈ Df0 \D ∗ . Define Fn (ε) := P (f (Xn ) − f ∗ ≤ ε). From part (b) for all ε ∈ (0, f (x0 ) − f ∗ ] we get the inequality n Fn (ε) ≤ 1 − 1 − bmb2 d εdρ2 . The continuity of Fn from the right guarantees this inequality also for ε = 0. Let ⎧ ⎪ 0 if ε < 0 ⎪ ⎪ ⎨ dρ2 n Gn (ε) := 1 − 1 − f (x ε)−f ∗ if 0 ≤ ε ≤ f (x0 ) − f ∗ 0 ⎪ ⎪ ⎪ ⎩ 1 if ε > f (x0 ) − f ∗ .
14 Random Search Methods of Convergence Order O(n−α )
228
Because of the assumption bλd (K(x ∗ , b2 cρ2 )) < 1 (see part (a)), we get − 1 dρ2 f (x0 ) − f ∗ ≤ f0 − f ∗ = c < bmb2 d , hence,
1 f (x0 ) − f ∗
dρ2
≥ bmb2 d
and therefore Fn (ε) ≤ Gn (ε) for all ε ∈ R. Therefore, n
1 dρ2
1 E f (Xn ) − f ∗ = n dρ2
∞
(1 − Fn (x))dx
0
≥n
1 dρ2
f (x0 )−f ∗
1−
0 1 = n dρ2 f (x0 ) − f ∗
1
x f (x0 ) − f ∗
dρ2 !n dx
(1 − udρ2 )n du.
0
Now, Lemma B.16 yields assertion (iii). (f) Select an arbitrary x0 ∈ D\D ∗ . Let ( inf{n ∈ N0 : Xn ∈ (Df0 \D ∗ )} if{n ∈ N0 : Xn ∈ (Df0 \D ∗ )} = ∅ . T := ∞ else Because of the monotonicity of the sequence E(f (Xn ) − f ∗ ) and due to E(f (Xn ) − f ∗ ) ≥ 0, there exists b ≥ 0 with lim E(f (Xn ) − f ∗ ) = b.
n→∞
1
If b > 0, the last assertion of Theorem 14.2 follows because of n dρ2 → ∞. If b = 0, the theorem of monotone convergence yields E
lim (f (Xn ) − f ∗ ) = 0.
n→∞
Because of lim (f (Xn ) − f ∗ ) ≥ 0 n→∞
P − a.s., we therefore have
lim (f (Xn ) − f ∗ ) = 0
n→∞
P − a.s..
14 Random Search Methods of Convergence Order O(n−α )
229
From this and on account of P (Xn ∈ D ∗ ) = P (f (Xn ) − f ∗ = 0) = Fn (0) = 0 (cf. e)) it follows that P (T < ∞) = 1. Thus, we have E f (Xn ) − f ∗ ≥ E f (Xn+T ) − f ∗
= E f (Xn+T ) − f ∗ | XT = x PXT (dx). Since the stationary case is present, (cf. [2]), we have E f (Xn+T ) − f ∗ | XT = x = E f (Xn ) − f ∗ | X0 = x
PXT − a.s..
For x0 = x, according to part (e) we get 1 1 n dρ2 E f (Xn ) − f ∗ | X0 = x = n dρ2 E f (Xn ) − f ∗ 9
1 n : 1 1 − udρ2 du (f (x) − f ∗ ). ≥ n dρ2
0
Because of P XT ∈ (Df0 \D ∗ ) = 1, this implies that 1 n dρ2 E f (Xn ) − f ∗
1 E f (Xn+T ) − f ∗ | XT = x PXT (dx) ≥ n dρ2
Df0 \D ∗
1
= n dρ2
Df0 \D ∗
9
1 ≥ n dρ2
E f (Xn ) − f ∗ | X0 = x PXT (dx)
1
1 − udρ2
0
n
:
du
Df0 \D ∗
(f (x) − f ∗ )PXT (dx).
Now, it is f (x) − f ∗ > 0 on Df0 \D ∗ and therefore
Df0
\D ∗
(f (x) − f ∗ )PXT (dx) > 0.
14 Random Search Methods of Convergence Order O(n−α )
230
With h(x) := 1 + dρ1 2 Df \D ∗ (f (x) − f ∗ )PXT (dx), the last assertion 0 Theorem 14.2 follows from Lemma B.16. Remark 14.5 Suppose that the mutation sequence (pn : n ∈ N) is absolutely continuous with the sequence of densities (gn : n ∈ N). Furthermore, assume ⎧ ⎪ ⎪ ⎨There exist constants b0 ≥ 0 and R ∈ R+ ∪ {∞}with ⎪ ⎪ ⎩
gn (x0 , . . . , xn−1 , yn ) ≤ b0 for all n ∈ N, x0 , . . . , xn−1 ∈ D\D ∗
(14.8)
and yn ∈ D ∩ K(xn−1 , R).
Then condition (14.7) is satisfied. Condition (14.8) is satisfied in particular in the stationary case, D is bounded, and g is continuous on D¯ × D¯ This can be shown analogously to the proof of Remark 14.4. Remark 14.6 Let the RSM (Xn : n ∈ N0 ) be stationary with the mutation transition probability p, which is assumed to be absolutely continuous with the continuous and positive Lebesgue density g. Furthermore, let the conditions of Remark 14.3 be fulfilled. Then all assumptions of Theorem 14.1 and Theorem 14.2 hold (with ρ1 = ρ2 = 1 2 ). This follows from the Remarks 14.3 and 14.5. In particular, for each starting point x0 ∈ D\D ∗ there exist constants c1 > 0 and c2 < ∞ with 2 c1 ≤ lim inf n d E(f (Xn ) − f ∗ ) n→∞ 2 ≤ lim sup n d E(f (Xn ) − f ∗ ) ≤ c2 . n→∞
The first inequality means rate of convergence of the sequence E(f (Xn ) − that − d2 , while the last inequality means that the rate of is at most of order O n convergence is at least of this order. 2 Hence, the rate of convergence is exactly of order O n− d . f ∗)
Example 14.1 Let f : D → R, D = R, be continuously differentiable and convex. Furthermore, let D ∗ be a singleton, hence, D ∗ = {x ∗ }. Let denote h : R → R a positive, continuous, and bounded probability density. Let yn − xn−1 1 h gn (x0 , . . . , xn−1 , yn ) =g(xn−1 , yn ) = σ (xn−1 ) σ (xn−1 ) (cf. Example 12.1).
14 Random Search Methods of Convergence Order O(n−α )
231
(Note: If the random variable X has the density fX (x), then the random variable f ( x−a ) Y = a + bX(b > 0) has the density fY (x) = X b b ). Let us consider the following situation: (i) uncontrolled case: σ (x) ≡ σ > 0. (ii) controlled case: σ (x) =| ∇f (x) |=| f (x) |. In the uncontrolled case, obviously condition (14.8) is satisfied because of g(x, y) = 1 σ sup{h(z) : z ∈ D}. Since D ∗ is bounded and f is convex, any Da (a > f ∗ ) is bounded (cf. Lemma B.3), and Remark 14.4 implies that condition (14.1) is satisfied too. If in the controlled case x0 = x ∗ + 2 is chosen for the starting point and B = ∗ [x + 1, x ∗ + 2], then
p(x, B) =
g(x, y)λ1 (dy) = B
=
(x ∗ +2−x) σ (x) (x ∗ +1−x) σ (x)
since
1 σ (x)
h(u)du ≤
x ∗ +2 x ∗ +1
∞ (x ∗ +1−x) σ (x)
h
y−x σ (x)
dy
h(u)du −→∗ 0 x→x
1 → ∞ , if x → x ∗ . σ (x)
If it is taken into consideration that because of the convexity of f the set B is included Df (x0 ) , then this indicates that in the controlled case condition (14.1) is violated. The same is also true in case of condition(14.7). Indeed, if we select C C Bx := x ∗ − σ (x), x ∗ + σ (x) , then on one hand we have C λ1 (Bx ) = 2 σ (x) → 0 for x ∗ → x. But on the other hand it is 1 p(x, Bx ) = σ (x)
=
√ x ∗ + σ (x)
√ x ∗ − σ (x)
√ (x ∗ −x+ σ (x)) σ (x) √ (x ∗ −x− σ (x)) σ (x)
h
y−x (x)
dy
h(u)du −→∗ x→x
+∞
−∞
h(u)du = 1.
Therefore, condition (14.7) cannot be satisfied. With the methods of Chap. 14 it is therefore not possible to make a statement about the rate of convergence in the controlled case. This gap will be closed in the next chapter.
232
14 Random Search Methods of Convergence Order O(n−α )
Additionally, it should be noted that in the controlled case it is possible to prove the validity of condition (12.1) with Remark 12.3. Thus, from Remark 12.2 and Theorem 12.1 at least the convergence of the controlled random search procedure can be obtained.
References 1. Gänssler, P., Stute, W.: Wahrscheinlichkeitstheorie. Springer, Berlin (1977) 2. Neveu, J.: Mathematische Grundlagen der Wahrscheinlichkeitstheorie. Oldenburg Verlag, Munich (1969)
Chapter 15
Random Search Methods with a Linear Rate of Convergence
15.1 Methods with a Rate of Convergence that Is at Least Linear We return now once again to the example of Chap. 14. In this example we have D=R
,
f (x) = | x |s
(s > 0 arbitrary), and
p(x, ·)
denotes the uniform distribution on the interval [x − r(x), x + r(x)]. We know that in the uncontrolled case r(x) = r > 0
for any
x∈R
the rate of convergence of the sequence E(f (Xn ) − f ∗ ) is of the order O(n−s ). Of course, we may choose a different function r(x) in order to improve the type of the convergence rate. However, what is an appropriate choice of r(x)? To find an answer to this question, we remember that for the process history x0 , x1 , . . . , xn the next mutation point yn+1 is the realization of the uniform distribution on the interval [xn −r(xn ), xn +r(xn )]. If r(x) is chosen as a constant function, then the probability of a search success (i.e., f (yn+1 ) < f (xn )) becomes very small if xn is close to x ∗ . It even becomes all the smaller, the closer xn approaches x ∗ . Obviously, this has a negative influence on the rate of convergence. Hence, to remedy this, we have to adjust the length of the mutation interval [xn − r(xn ), xn + r(xn )] to the distance between xn and x ∗ . In the case of an optimal point x ∗ , this is not a problem. But how to obtain information about | x − x ∗ |, if x ∗ is not known? In general this is very difficult. However, if some information about the form of f are known, things may be different. For example, for quadratic functions f we have the equation
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_15
233
234
15 Random Search Methods with a Linear Rate of Convergence
x − x ∗ = Hf −1 (x)∇f (x)
(Hf = Hessian)
If f is convex and twice continuously differentiable with a regular Hessian at x ∗ , then for each R > 0 at least constants c1 , c2 > 0 can be found such that on K(x ∗ , R)\{x ∗ } the estimate c1 ≤
| ∇f (x) | ≤ c2 | x − x∗ |
holds (cf. Lemma B.13). Now we will return again to our example and adapt the length of the interval [x − r(x), x + r(x)] to the distance between x and x ∗ by setting r(x) =| x − x ∗ |=| x | . Furthermore, with this control of the interval length we examine whether the convergence rate O(n−s ) of the uncontrolled case is exceeded under the above control law. According to Lemma 9.1, for x > 0 we have
∗
∗
(f (y) − f )q(x, dy) = f (x) − f − = f (x) − f ∗ −
= | x |s −
−|x|
= | x |s − = | x |s −
|x|
|x|
−|x|
=
(f (x) − f (y))q(x, dy)
{f 0, it follows that With β := 12 s+2 s+1 we then obtain
1 s+2 2 s+1
< 1.
E(f (Xn ) − f ∗ ) = O(β n ). Comparing this result with the uncontrolled case, a considerable improvement of the rate of convergence is found. The reason for this is the fact that the step length parameter r(x) has been adapted to the distance between x and x ∗ . As already mentioned, such an adjustment is possible only if specific demands are made on the shape of f and D. Hence, in the following we consider the following assumptions: (
D ∗ is assumed to be bounded and D ∗ ∩ D˙ = ∅ . Furthermore, let there exist an α > f ∗ with a convex Dα , such that f is continuous and convex on Dα . (15.1) ⎧ ∗ ∗ ∗ ⎪ ˙ ˜ ⎪ ⎨There exist an x ∈ D ∩ D as well as a˜ > f and b > 0 with ∗ inf{| y − x |: y ∈ D, f (y) = f (x)} ≥ ⎪ ⎪ ⎩b˜ · sup{| y − x ∗ |: y ∈ D, f (y) = f (x)} for all x ∈ D \D ∗ a˜
(15.2)
Remark 15.1 Condition (15.2) ensures, roughly spoken, that the level sets {x ∈ D : f (x) = c} do not get too small for values of c close to f ∗ . This will be illustrated by means of two simple examples. If D = Rd and f (x, y) := ax 2 + by 2 (a, b fixed with 0 < a < b), then * the level * sets {x ∈ D : f (x) = c} (c > 0) are ellipses with the half axis ac and bc . Therefore, @
∗
inf{| z − x |: f (z) = c} = @ = i.e. condition (15.2) holds with b˜ :=
*
a b.
c = b
@ @ a c b a
a sup{| z − x ∗ |: f (z) = c}, b
236
15 Random Search Methods with a Linear Rate of Convergence
Conversely, for the function f (x, y) := x 2 +y 4 condition (15.2) violated. Indeed, for 0 < c ≤ 1 we have 1
1 inf{| z − x ∗ |: f (z) = c} c2 = 1 = c 4 −→ 0. c→0 sup{| z − x ∗ |: f (z) = c} c4
˙ and let f be twice continuously differentiable Remark 15.2 Let D ∗ = {x ∗ } ⊂ D, in a neighborhood of x ∗ with a positive definite Hessian at x ∗ . Furthermore, assume that f is continuous on D. If D is compact or D and f are convex, then the conditions (12.1) and (12.2) are satisfied. Indeed, by Lemma B.13 we know that all assumptions of Lemma B.12 are satisfied. According to Remark B1, the assumptions (15.1) and (15.2) are then satisfied. Definition 15.1 (a) A random search procedure (Xn : n ∈ N0 ) starting in x0 ∈ D is called convergent in the mean, linearly convergent P -a.s., resp., with respect to the starting point x0 ∈ D if a q > 1 exists such that q n E(f (Xn ) − f ∗ ) −→ 0 , n→∞ P -a.s. , respectively. q n f (Xn ) − f ∗ −→ 0 n→∞
(b) A random search procedure (Xn : n ∈ N0 ) is called convergent in the mean, globally linearly convergent P − a.s., provided that for each starting point x0 ∈ D the procedure (Xn : n ∈ N0 ) is convergent in the mean, linearly convergent P -a.s., respectively. For the proof of a linear rate of convergence the following two lemmas are of crucial importance. Instead of {x ∈ D : f (x) < c} we will use again the shorter notation {f < c}. Lemma 15.1 Let denote (Xn : n ∈ N0 ) the random search procedure starting in x0 ∈ D\D ∗ and having the mutation sequence (pn : n ∈ N). Furthermore, let there exist a c > 0 with
(f (xn−1 ) − f (yn ))pn (x0 , x1 , . . . , xn−1 , dyn ) {f f ∗ the following conditions hold: ⎧ ⎪ There exists ε > 0 with ⎪ ⎪ ⎪ ⎨ ∗ rμ (x , x , . . . , x n 0 1 n−1 , s, dr) ≥ ε0 | xn−1 − x | 0, 12 |xn−1 −x ∗ | ⎪ ⎪ ⎪ ⎪ ⎩ for all n ∈ N, x0 , . . . , xn−1 ∈ Da \D ∗ , s ∈ S (15.3) and
15.1 Methods with a Rate of Convergence that Is at Least Linear
⎧ ⎪ ⎪ ⎨There exists τ > 0 with ρn (x0 , x1 , . . . , xn−1 , A) ≥ τ γ (A) ⎪ ⎪ ⎩ for all n ∈ N, x , . . . , x ∗ 0 n−1 ∈ Da \D , A ⊂ S measurable .
241
(15.4)
Then there is a0 > f ∗ such that the random search procedure (Xn : n ∈ N0 ) is linearly convergent in the mean for any starting point x0 ∈ Da0 . (ii) If for an a > f ∗ condition (12.4) as well as the following condition ⎧ ⎪ There exist ε1 ∈ 0, 12 and σ > 0 such that ⎪ ⎪ ⎨ μn x0 , x1 , . . . , xn−1 , s, ε1 | xn−1 − x ∗ |, 12 | xn−1 − x ∗ | ≥ σ ⎪ ⎪ ⎪ ⎩ for any n ∈ N, x0 , . . . , xn−1 ∈ Da \D ∗ , s ∈ S (15.5) is satisfied, then there is a1 > f ∗ such that the procedure (Xn : n ∈ N0 ) is linearly convergent for all starting points x0 ∈ Da1 P -a.s. (iii) Suppose that D is convex and f is convex and continuous. If for each a > f ∗ the conditions (12.4) and (15.4), the conditions (12.4) and (15.5), are satisfied, then the procedure (Xn : n ∈ N0 ) is globally linearly convergent in the mean, P -a.s., respectively. Proof (1) Suppose that the conditions (15.3) and (15.4) are satisfied for an a > f ∗ . If in Lemma B.11 the set D is replaced by Dα (α according to assumption (15.1)), π B then, together with assumption (15.2), it follows that there is a φ ∈ 0, 2 with 1 ∗ ∗ x + C φ, x − x, | x − x | 2 6 4 |y−x | ∗ (f (x) − f ) ⊂ y ∈ D : f (x) − f (y) ≥ 3 | x∗ − x | ⊂ {f ≤ f (x)} for all x ∈ Dα \D ∗ . For the definition of C φ, x ∗ − x, 12 | x ∗ − x | , see Definition B.1. Let a0 := min{a, α}, and let denote x0 an arbitrary starting point from Da0 . Without loss of generality we may assume that f (x0 ) > f ∗ , because for x0 ∈ D ∗ the assertion is trivial. Select any n ∈ N and arbitrary x1 , . . . , xn−1 ∈ Df (x0 ) \D ∗ . Because of Df (x0 ) \D ∗ ⊂ Da0 \D ∗ ⊂ Dα \D ∗
242
15 Random Search Methods with a Linear Rate of Convergence
we have
(f (xn−1 ) − f (yn )pn (x0 , x1 , . . . , xn−1 , dyn ) xn−1 +C φ,x ∗ −xn−1 , 12 |xn−1 −x ∗ |
≥
f (xn−1 ) − f ∗ 3 | xn−1 − x ∗ |
Rd
· C φ,x ∗ −xn−1 , 12 |xn−1 −x ∗ |
| yn − xn−1 | 1
· (yn − xn−1 )pn (x0 , x1 , . . . , xn−1 , dyn ) =
f (xn−1 ) − f ∗ 3 | xn−1 − x ∗ |
S
R+
(rs)· C φ,x ∗ −xn−1 , 12 |xn−1 −x ∗ |
| rs | 1
· μn (x0 , x1 , . . . , xn−1 , s, dr)ρn (x0 , x1 , . . . , xn−1 , ds) f (xn−1 ) − f ∗ = 3 | xn−1 − x ∗ |
S
R+
r1C (φ,x ∗ −xn−1 ) (s)1
0, 12 |xn−1 −x ∗ |
(r)·
· μn (x0 , x1 , . . . , xn−1 , s, dr)ρn (x0 , x1 , . . . , xn−1 , ds) f (xn−1 ) − f ∗ = 3 | xn−1 − x ∗ |
S∩C (φ,x ∗ −xn−1 )
0, 12 |xn−1 −x ∗ |
r·
· μn (x0 , x1 , . . . , xn−1 , s, dr)ρn (x0 , x1 , . . . , xn−1 , ds) ≥
ε0 (f (xn−1 ) − f ∗ )ρn x0 , x1 , . . . , xn−1 , S ∩ C φ, x ∗ − xn−1 3
(due to (15.3)) ≥
ε0 (f (xn−1 ) − f ∗ )τ γ S ∩ C φ, x ∗ − xn−1 . 3
(because of (15.4)) We know that γ (S ∩ C (φ, x ∗ − xn−1 )) is positive and does not depend on xn−1 (cf. Lemma A.3). Considering the inclusion {f < f (xn−1 )} ⊂ xn−1 + C (φ, x ∗ − xn−1 ), Lemma 15.1 yields the statement of part (i). (2) Suppose that for a certain a > f ∗ conditions (15.4) and (15.5) are fulfilled. Let α be selected according to assumption (15.1), and let a1 := min{a, α}. Select x0 ∈ Da1 \D ∗ and x1 , . . . , xn−1 ∈ Df (x0 ) \D ∗ arbitrarily. Applying Lemma B.11 to Dα , we obtain
15.1 Methods with a Rate of Convergence that Is at Least Linear
243
1 ∗ ∗ ∗ xn−1 + C φ, x − xn−1 , ε1 | x − xn−1 |, | x − xn−1 | 2 (cf. Definition B.1)) ε1 ⊂ y ∈ D : f (xn−1 ) − f (y) ≥ (f (xn−1 ) − f ∗ ) 3 ε 1 ⊂ y ∈ D : f (y) − f ∗ ≤ 1 − (f (xn−1 ) − f ∗ ) . 3 Consequently, we have ε1 (f (xn−1 ) − f ∗ ) pn x0 , x1 , . . . , xn−1 , y ∈ D : f (y) − f ∗ ≤ 1 − 3 ≥ pn x0 , x1 , . . . , xn−1 , xn−1 + 1 ∗ ∗ ∗ + C φ, x − xn−1 , ε1 | x − xn−1 |, | x − xn−1 | 2
= μn (x0 , x1 , . . . , xn−1 , s, dr)· S∩C (φ,x ∗ −xn−1 ) ε1 |x ∗ −xn−1 |, 12 |x ∗ −xn−1 | · ρn (x0 , x1 , . . . , xn−1 , ds) ≥ σρn x0 , x1 , . . . , xn−1 , S ∩ C φ, x ∗ − xn−1 ≥ σ τ γ S ∩ C φ, x ∗ − xn−1 . Lemma 15.2 yields now the assertion in part (ii). (3) Assume that D is convex and f is convex and continuous. Let x0 ∈ D\D ∗ be selected arbitrarily. According to the assumptions, for a := f (x0 ) (15.3) and (15.4) are satisfied. Due to Lemma B.11 there exists a φ > 0 with 1 x + C φ, x ∗ − x, | x ∗ − x | 2 4 6 |x−y | ∗ (f (x) − f ⊂ y ∈ D : f (x) − f (y) ≥ ) 3 | x − x∗ | ⊂ {f ≤ f (x)}
for all x ∈ Da \D ∗ .
The assertions analogous to (i), (ii), resp., follow now exactly in the same way as in part (1), part (2), respectively. Theorem 15.2 Suppose that the assumptions (15.1) and (15.2) are satisfied.
244
15 Random Search Methods with a Linear Rate of Convergence
Let (Xn : n ∈ N0 ) be a random direction method with the step length distribution λn . Let h denote a bounded probability density on R+ and let αn be a positive, measurable function with λn (x0 , x1 , . . . , xn−1 , B) = (αn (x0 , . . . , xn−1 ))−1
h B
r dr αn (x0 , . . . , xn−1 )
for any x0 , . . . , xn−1 ∈ D\D ∗ , B ⊂ R+ measurable (cf. Example 14.1). Furthermore, let h be positive on the interval (0, c) Then:
(c > 0).
(i) If for an a > f ∗ the following condition ⎧ ⎪ ⎪ ⎨There exist α0 , α1 > 0 with 0 ,...,xn−1 ) α0 ≤ αn|x(xn−1 −x ∗ | ≤ α1 ⎪ ⎪ ⎩for any n ∈ N, x , . . . , x ∗ 0 n−1 ∈ Da \D .
(15.6)
holds, then there is an a0 > f ∗ such that (Xn : n ∈ N0 ) converges linearly for any starting point x0 ∈ Da0 in the mean and P -a.s. (ii) Provided that D is convex, the function f is continuous and convex, and for each a > f ∗ Condition (15.6) holds, then (Xn : n ∈ N0 ) converges globally linear in the mean and P -a.s. Proof According to Lemma 10.1 the integration property (10.4) is satisfied with μn (x0 , x1 , . . . , xn−1 , s, ·) = λn (x0 , x1 , . . . , xn−1 , ·) and ρn (x0 , x1 , . . . , xn−1 , ·) = γ . Obviously, for each a > f ∗ condition (15.4) holds. Hence, it remains to show that—under the assumptions of Theorem 15.2—assumption (15.6) yields conditions (15.3) and (15.5). Let (15.6) be satisfied for a > f ∗ . Moreover, let x0 , . . . , xn−1 ∈ Da \D ∗ and s ∈ S be arbitrary points. Then:
rμn (x0 , x1 , . . . , xn−1 , s, dr) 0, 12 |xn−1 −x ∗ |
=
0, 12 |xn−1 −x ∗ |
rλn (x0 , x1 , . . . , xn−1 , dr)
15.1 Methods with a Rate of Convergence that Is at Least Linear
245
r r = h dr αn (x0 , x1 , . . . , xn−1 ) αn (x0 , x1 , . . . , xn−1 ) 0
1 |xn−1 −x ∗ |αn −1 2 rh(r)dr = αn (x0 , x1 , . . . , xn−1 ) ·
1 ∗ 2 |xn−1 −x |
0 ∗
1 −1 2 α1
≥| xn−1 − x | α0 ·
! rh(r)dr .
0
Since h is positive on the interval (0, c), also the constant in the round brackets is positive. This yields (15.3). If ε ∈ 0, 12 is chosen so small that αε10 < c and αε10 < 12 α1 −1 , then under the above assumptions the constant
σ :=
1 −1 2 α1
ε1 α0 −1
h(r)dr
is positive. A simple calculation shows that 9 : 1 μn x0 , x1 , . . . , xn−1 , s, ε1 | xn−1 − x ∗ |, | xn−1 − x ∗ | ≥ σ 2 Thus, also (15.5) holds. Now, Theorem 15.1 yields the assertion.
˚ D are compact, and the function f Remark 15.3 Suppose that D ∗ = {x ∗ } ⊂ D, is continuous and twice continuous differentiable on a neighborhood of x ∗ with a positive definite Hessian at x ∗ . Assume that (1) αn (x0 , x1 , . . . , xn−1 ) =| A(xn−1 )∇f (xn−1 ) | with a continuous matrix function A : Rd → Rd×d , where A is invertible at ∗ x . (2) αn (x0 , x1 , . . . , xn−1 ) =| ∇f (xn−1 ) | η(xn−1 ) with a continuous function η : Rd → (0, ∞). Then condition (15.6) is satisfied for an a > f ∗ . In addition, according to Remark 15.1, also the assumptions (15.1) and (15.2) hold. Proof In the first case the regularity of A(x ∗ ) and the continuity of A yield that A(x) is invertible for each x contained in a certain neighborhood of K(x ∗ , r). Hence,
246
15 Random Search Methods with a Linear Rate of Convergence
the matrices (A(x))T A(x) are positive definite (and of course symmetric) for any x ∈ K(x ∗ , r). Define now λ1 := max λ(x) : λ(x) eigenvalue of (A(x))T A(x), x ∈ K(x ∗ , r) and λ0 := min λ(x) : λ(x) eigenvalue of (A(x))T A(x), x ∈ K(x ∗ , r) . For x ∈ K(x ∗ , r) we have then the following estimate (cf. Lemma B.7): | ∇f (x) |
C
λ0 ≤| A(x)∇f (x) |≤| ∇f (x) |
C
λ1 .
In the first case we get the assertion due to λ0 > 0 and λ1 < ∞, since according to Lemma B.13, for an a > f ∗ the set Da is included in K(x ∗ , r). In the second case it follows from Lemma B.13 und Lemma B.10 that there exist an a > f ∗ and positive constants b1 , b2 with b1 ≤
| ∇f (x) | ≤ b2 | x − x∗ |
for any x ∈ Da \D ∗ .
Furthermore, due to the continuity of η and the compactness of Da , constants c1 , c2 > 0 exist with c1 ≤ η(x) ≤ c2
for all x ∈ Da ,
cf. Lemma B.4. Therefore, b1 c1 ≤
| ∇f (x) | η(x) ≤ b2 c2 | x − x∗ |
which yields the assertion in the second case.
for any x ∈ Da \D ∗
Remark 15.4 Dropping in Remark 15.2 the requirement for compactness of D and demands instead that D := Rd and f is a convex, twice continuously differentiable function, then according to Remark 15.1 assumptions (15.1) and (15.2) are satisfied again. Let αn be defined as in (1) and (2), respectively, of Remark 15.2, where in the first case A(x) is assumed to be invertible for any x ∈ Rd . Then condition (15.6) holds for all a > f ∗ . Taking into account that all sets Da (a ≥ f ∗ ) are compact (cf. Lemma B.4), the proof follows similarly to that of Remark 15.2.
15.1 Methods with a Rate of Convergence that Is at Least Linear
247
Example 15.1 Let (Xn : n ∈ N0 ) be a stationary random search procedure having the step length distribution λ. Let λ(x, ·) denote the uniform distribution on the interval (0, α(x)). Hence, λ(x, ·) has the density r 1 1 1(0,α(x)) (r) = 1(0,1) α(x) α(x) α(x) r 1 h with h(r) = 1(0,1) (r). = α(x) α(x)
g(x, r) =
Suppose that D = Rd , D ∗ = {x ∗ }, and let the function f be convex and twice continuously differentiable with a positive definite Hessian at x ∗ . From Remark 15.3 and Theorem 15.2 it follows then that α(x) =| ∇f (x) | yields a procedure (Xn : n ∈ N0 ) being globally linear convergent in the mean and P -a.s. Analogously to Theorem 15.2, for methods with an absolutely continuous mutation sequence we have the following result, where its proof is again primarily based on Theorem 15.1. Theorem 15.3 Assume that the assumptions (15.1) and (15.2) are satisfied. Let (Xn : n ∈ N0 ) denote a random search procedure with an absolutely continuous mutation sequence (pn : n ∈ N) and associated sequence of densities (gn : n ∈ N). Let h1 , h2 be measurable, nonnegative functions on R+ . Suppose that h1 is positiveon the interval (0, c) (c > 0). Furthermore, assume that R+ r d−1 h2 (r)λ1 (dr) < ∞. Then: (i) Provided that for an a > f ∗ the following condition ⎧ There are sequences of measurable, positive functions ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (αn : n ∈ N) und (τn : n ∈ N) as well as positive constants ⎪ ⎪ ⎪ ⎪ ⎪ α0 , α1 such that ⎪ ⎪ ⎪ ) ⎪ ⎨α0 ≤ αn (x0 ,...,xn−1 and |xn−1 −x ∗ | ≤ α1 |z| ⎪ (αn (x0 , . . . , xn−1 ))−d h1 αn (x0 ,...,x ⎪ ) n−1 ⎪ ⎪ ⎪ ⎪ ⎪ (x , . . . , x , x + z) ≤ g n 0 n−1 n−1 ⎪ ⎪ ⎪ |z| ⎪ −d ⎪ ≤ (x , . . . , x )) h (τ n 0 n−1 2 ⎪ τn (x0 ,...,xn−1 ) ⎪ ⎪ ⎩ for any n ∈ N, x0 , . . . , xn−1 ∈ Da \D ∗ , z ∈ Rd
(15.7)
248
15 Random Search Methods with a Linear Rate of Convergence
is satisfied, then an a0 > f ∗ exists such that (Xn : n ∈ N0 ) converges linearly in the mean and P − a.s. for any starting point x0 ∈ Da0 . (ii) If D is convex and f is continuous and convex, and if furthermore condition (15.7) holds for any a > f ∗ , then (Xn : n ∈ N0 ) is globally linearly convergent in the mean and P -a.s. Proof According to Lemma 10.1, for μn and ρn defined according to (10.3), the integration property (10.4) is satisfied. Theorem 15.1 yields then the assertion, provided that the conditions (15.3), (15.4) and (15.5) are satisfied for an a > f ∗ . In the following it will be shown that this is true. According to (15.3) we have d
ρn (x0 , . . . , xn−1 , A) =
2π 2 ( d2 )
A R+
r d−1 gn (x0 , . . . , xn−1 , xn−1 + rs)·
· λ1 (dr)γ (ds). From (15.7) we get then :d r 1 ρn (x0 , . . . , xn−1 , A) ≥ · d α (x , . . . , x ) r ( 2 ) A R+ n 0 n−1 r drγ (ds) · h1 αn (x0 , . . . , xn−1 ) ! d
2π 2 d−1 = u h1 (u)du γ (A). ( d2 ) R+ d
2π 2
9
According to the assumption, the constant in the brackets is positive. This implies condition (15.4). The term occurring in (10.3) at μn , hence,
R+
r d−1 gn (x0 , . . . , xn−1 , xn−1 + rs)λ1 (dr)
can be estimated from below—as just shown—by R+ ud−1 h1 (u)du > 0. Moreover, this term can be estimated from above by I := R+ ud−1 h2 (u)du. Since by assumption I < ∞, we get
0, 12 |xn−1 −x ∗ |
rμn (x0 , x1 , . . . , xn−1 , s, dr)
15.1 Methods with a Rate of Convergence that Is at Least Linear
1 I
≥
1 ≥ I
249
· r d−1 gn (x0 , . . . , xn−1 , xn−1 + rs)λ1 (dr)
r 0, 12 |xn−1 −x ∗ |
0, 12 |xn−1 −x ∗ |
αn (x0 , . . . , xn−1 ) = I α0 I
≥| xn−1 − x ∗ |
r αn (x0 , . . . , xn−1 )
|xn−1 −x ∗ | 2αn (x0 ,...,xn−1 )
1 −1 2 α1
h1
r dr αn (x0 , . . . , xn−1 )
ud h1 (u)du
0
d
! ud h1 (u)du .
0
This implies (15.3). Condition (15.5) can be shown in the same way (cf. also the proof of Theorem 15.2). Example 15.2 (1) Let r : Rd → R+ be measurable. Moreover, let pn (x0 , . . . , xn−1 , ·) denote the uniform distribution on the ddimensional rectangle Rxn−1 := xn−1 + (−r(xn−1 ), r(xn−1 ))d . Here, gn (x0 , . . . , xn−1 , yn ) = (2r(xn−1 ))−d 1Rxn−1 (yn ), hence, gn (x0 , . . . , xn−1 , xn−1 + z) = (2r(xn−1 ))−d 1(−r(x ),r(x ))d (z) n−1 n−1 z . = (2r(xn−1 ))−d 1(−1,1)d r(xn−1 ) Because of K(0, 1) ⊂ (−1, 1)d ⊂ K(0, (2r(xn−1 ))
−d
1K(0,1)
d), we have
z r(xn−1 )
≤ (2r(xn−1 ))−d 1K(0,√d)
√
≤ gn (x0 , . . . , xn−1 , xn−1 + z) z
r(xn−1 )
250
15 Random Search Methods with a Linear Rate of Convergence
and therefore (2r(xn−1 ))
−d
1(0,1)
|z| r(xn−1 )
≤ gn (x0 , . . . , xn−1 , xn−1 + z)
|z| . r(xn−1 )
≤ (2r(xn−1 ))−d 1(0,√d) Choosing
r(x) :=| ∇f (x) |, then with αn (x0 , . . . , xn−1 ) := r(xn−1 )
h1 := 1(0,1)
h2 := 1(0,√d)
and
and under suitable assumptions on D and f (cf. Remark 15.2, Remark 15.3), resp., condition (15.7) is satisfied for an (for all, resp.) a > f ∗ . (2) Let pn (x0 , . . . , xn−1 , ·) denote a d-dimensional normal distribution with mean xn−1 and regular covariance matrix C = C(xn−1 ). Then, d 1 1 gn (x0 , . . . , xn−1 , xn−1 + z) = (2π )− 2 | C(xn−1 ) |− 2 exp (− zT C(xn−1 )−1 z). 2
With λ1 := max {λ : λ eigenvalue of C} and λ0 := min{λ : λ eigenvalue of C} we get − d2
λ1
1
− d2
≤ | C |− 2 ≤ λ0
and 2 2 − 21 − 12 T −1 | z | λ1 ≤ z C z ≤ | z | λ0 . This yields the estimate (2π )
− d2
λ0 λ1
d
2
−d λ0 2
2 ! 1 − 12 | z | λ0 exp − 2
≤ gn (x0 , . . . , xn−1 , xn−1 + z) − d2
≤ (2π )
λ1 λ0
d
2
−d λ1 2
2 ! 1 − 21 | z | λ1 exp − . 2
15.1 Methods with a Rate of Convergence that Is at Least Linear
251
Supposing that for a given a > f ∗ the following condition ⎧ ⎪ α ,α > 0 ⎪ ⎨There are √ 0 1 λ α0 ≤ |x−x ∗ | ≤ α1 ⎪ ⎪ ⎩
with for all x ∈ Da \D ∗ and all
(15.8)
eigenvalues λ of C(x)
holds, then for this given a > f ∗ condition (15.7) holds (with αn (x0 , . . . , C C d λ0(xn−1), τn (x0 , . . . , xn−1 ) = λ1 (xn−1 ), h1 (r) := (2π )− 2 xn−1 ) = d d d 2 2 α0 α1 −1 exp − r2 and h2 (r) := (2π )− 2 α1 α0 −1 exp − r2 ). If D = Rd , D ∗ = {x ∗ } and if f is a twice continuously differentiable function with a Hessian being positive definite at all points, then (15.1) and (15.2) are satisfied. This follows from Remark 15.1 taking into account that the positive definiteness of the Hessian implies the convexity of f (cf. Lemma B.4). Choosing C(x) = | ∇f (x) |2 I
(I = d-dimensional unit matrix),
then the Remark 15.3 implies the validity of (15.8). Choosing (cf. [3]) C(x) = ∇f (x)T H −1 (x)∇f (x) H −1 (x) (H = Hessian of f ), then the supposition (15.8) is satisfied as well. This follows from Lemma B.14. Hence, according to Theorem 15.3, in both cases we know that the corresponding procedure converges globally linear in the mean and P -a.s. Conversely, choosing C(x) ≡ C
(C regular),
according to Remark 14.4 and Theorem 14.1, for all starting points x0 ∈ Rd we have the considerably smaller convergence rate 2 O n− d . The above two examples have two important properties in common. Firstly, the stationary case applies. Secondly, in both examples the mutation distribution p(x, ·) has the same structure. For if we choose in the first, second example, resp.
252
15 Random Search Methods with a Linear Rate of Convergence
Ax := r(x)I and for μ the uniform distribution on (−1, 1)d ,
1
Ax := (C(x)) 2
i.e. Ax Ax T = C(x)
and for μ the d-dimensional standard normal distribution, then in both cases we have p(x, ·) = τx (μ)
τx (z) := x + Ax z.
with
Let us consider now this situation in more detail. According to the transformation rule for densities, p(x, ·) has the Lebesgue density g(x, y) =
1 h Ax −1 (y − x) | det Ax |
provided that μ has the density h and Ax is regular. Furthermore, if functions h1 , h2 : R+ → R+ exist with h1 (| z |) ≤ h(z) ≤ h2 (| z |)
for all z = 0
(this holds in both examples), then—analogously to the second example—we get the following chain of inequalities:
λ0 λ1
d
2
−d −1 λ0 2 h1 | z | λ0 2 ≤ g(x, x + z) ≤
λ1 λ0
d
2
−d −1 λ1 2 h2 | z | λ 1 2 ,
where λ0 = λ0 (x) (λ1 = λ1 (x)) denotes the smallest (the largest, resp.) eigenvalue of Ax Ax T . Note that under suitable additional conditions by means of Theorem 15.3 it is possible to show—just as in the second example—a linear rate of convergence. However, the next theorem shows that in case p(x, ·) = τx (μ) we don’t need very restrictive requirements concerning the shape of h. Theorem 15.4 Let the assumptions (15.1) and (15.2) be satisfied.
15.1 Methods with a Rate of Convergence that Is at Least Linear
253
Let μ be a probability measure on (Rd , Bd ) having the Lebesgue density h. Moreover, suppose that there are positive constants σ1 , σ2 such that h(x) ≥ σ1 for any x ∈ K(0, σ2 ). Let denote (Ax : x ∈ D) a family of regular d × d-matrices, and assume that the mapping x → Ax is continuous. Consider a stationary random search procedure (Xn : n ∈ N0 ) having the mutation transition probability p given by p(x, ·) = τx (μ)
, where τx (y) := x + Ax y.
Then, (i) Assuming that for an a > f ∗ the following condition ⎧ ⎪ ⎪ ⎨There are α0 , α1 > 0 α ≤ ⎪ 0 ⎪ ⎩
√ λ |x−x ∗ |
≤ α1
with for any x ∈ Da \D ∗ and all eigenvalues λ of
(15.9)
Ax T Ax
is satisfied, then there is an a0 > f ∗ such that for any starting point x0 ∈ Da0 the algorithm (Xn : n ∈ N0 ) converges linearly in the mean and P -a.s. (ii) Let D be convex and f be continuous and convex. Furthermore, if for any a > f ∗ the condition (15.9) is satisfied, then (Xn : n ∈ N0 ) is globally linearly convergent in the mean and P -a.s. Proof (1) Let (15.9) be satisfied for an a > f ∗ . If in Lemma B.9 the set D is replaced by Dα (α selected according to assumption then together with assumption (15.2) we have the existence (15.1)), B of a φ ∈ 0, π2 with 1 x + C φ, x ∗ − x, | x ∗ − x | 2 6 4 |y−x | ∗ f (x) − f ⊂ y ∈ D : f (x) − f (y) ≥ 3 | x∗ − x | ⊂ {f ≤ f (x)}
for any x ∈ Da \D ∗ .
Let a0 := min {a, α}, and let denote x0 an arbitrary starting point from Da0 . Without loss of generality we may assume that f (x0 ) > f ∗ . Furthermore, let x ∈ Df (x0 ) \D ∗ be an arbitrary point. Because of Df (x0 ) \D ∗ ⊂ Da0 \D ∗ ⊂ Dα \D ∗
254
15 Random Search Methods with a Linear Rate of Convergence
we have
{f ≤f (x)}
(f (x) − f (y)) p(x, dy)
≥
f (x) − f ∗ 3 | x − x∗ |
=
f (x) − f ∗ 3 | x − x∗ |
f (x) − f ∗ = 3 | x − x∗ |
x+C φ,x ∗ −x, 12 |x−x ∗ |
| x − y | τx (μ)(dy)
τx −1 x+C φ,x ∗ −x, 12 |x ∗ −x| Ax −1 C φ,x ∗ −x, 12 |x ∗ −x|
| Ax y | μ(dy)
| Ax y | μ(dy).
According to Lemma B.9, there is ϕ > 0 with 1 ∗ ∗ C φ, x − x, | x − x | Ax 2 1 1 ⊃ C ϕ, Ax −1 (x ∗ − x), | x ∗ − x | λ1 − 2 2 −1
' & where λ1 := max λ : λ eigenvalue of Ax T Ax . Note that Df (x0 ) is compact—due to Lemma B.4. Because of 1 1 1 | x − x ∗ | λ1 − 2 ≥ α1 −1 2 2
with z := Ax −1 (x ∗ − x) it follows that
(f (x) − f (y)) p(x, dy) {f ≤f (x)}
≥
f (x) − f ∗ 3 | x − x∗ |
C ϕ,z, 12 α1 −1
f (x) − f ∗ 1 λ0 2 ≥ 3 | x − x∗ |
| Ax y | h(y)λd (dy)
C ϕ,z, 12 α1 −1
| y | h(y)λd (dy),
' & where λ0 := min λ : λ eigenvalue of Ax T Ax . From (15.9) we get f (x) − f ∗ 1 1 λ0 2 ≥ α0 f (x) − f ∗ . ∗ 3|x−x | 3
15.1 Methods with a Rate of Convergence that Is at Least Linear
255
· min { 12 α1 −1 , σ2 }. Because of C ϕ, z, 12 α1 −1 ⊃ C ϕ, z, σ3 , 12 α1 −1 and due to h(y) ≥ σ1 for any y ∈ C ϕ, z, σ3 , 12 α1 −1 , we find
Let σ3 :=
1 2
C ϕ,z, 12 α1 −1
1 −1 . | y | h(y)λ (dy) ≥ σ3 σ1 λ C ϕ, z, σ3 , α1 2 d
d
From the rotation invariance of the Lebesgue measure (cf. also Appendix A in the appendix) it follows that 1 λd C ϕ, z, σ3 , α1 −1 2 does not depend on z. Therefore, a c > 0 exists such that for any x ∈ Df (x0 ) \D ∗ we have
{f ≤f (x)}
(f (x) − f (y)) p(x, dy) ≥ c f (x) − f ∗ .
Hence, Lemma 15.1 yields then the linear convergence in the mean. ∗ (2) Select again an arbitrary point x ∈ Df (x0 ) \D . Furthermore, consider an arbitrary ε ∈ 0, 12 . According to Lemma B.9, there exists a ϕ > 0 with
1 Ax −1 C φ, x ∗ − x, ε | x ∗ − x |, | x ∗ − x | 2 1 1 1 ⊃ C ϕ, Ax −1 (x ∗ − x), ε | x ∗ − x | λ0 − 2 , | x ∗ − x | λ1 − 2 2 (λ0 , λ1 as in (1)) −1 ∗ −1 1 −1 ⊃ C ϕ, Ax (x − x), εα0 , α1 2 (according to (15.9)). If now ε ∈ 0, 12 is chosen as small that
ε α0
< 12 α1 −1 , then it follows that
ε f (x) − f ∗ p x, y ∈ D : f (y) − f ∗ ≤ 1 − 3 ε f (x) − f ∗ ≥ p x, y ∈ D : f (x) − f (y) ≥ 3
256
15 Random Search Methods with a Linear Rate of Convergence
1 ∗ ∗ ∗ ≥ p x, x + C φ, x − x, ε | x − x |, | x − x | 2 1 = μ Ax −1 C φ, x ∗ − x, ε | x ∗ − x |, | x ∗ − x | 2 1 . ≥ μ C ϕ, Ax −1 (x ∗ − x), εα0 −1 , α1 −1 2 The second inequality follows from the inclusion 1 ∗ ∗ x + C φ, x − x, | x − x | 2 4 6 |y−x | ∗ ⊂ y ∈ D : f (x) − f (y) ≥ f (x) − f 3 | x∗ − x | and the definition of C φ, x ∗ − x, ε | x ∗ − x |, 12 | x ∗ − x | (cf. Definition B.1). Analogously to part (1), the linear convergence P -a.s. follows now from Lemma 15.2. (3) Suppose that D is convex and f is a convex and continuous function. Consider an arbitrary point x0 ∈ D\D ∗ . According to the assumption, condition (15.9) is satisfied for a = f (x0 ). According to Lemma B.11, there is again a φ > 0 with 1 ∗ ∗ x + C φ, x − x, | x − x | 2 6 4 |y−x | ∗ f (x) − f ⊂ y ∈ D : f (x) − f (y) ≥ 3 | x∗ − x | ⊂ {f ≤ f (x)}
for any x ∈ Da \D ∗ .
Now the statements analogously to (i) follow exactly as in part (1) and part (2).
15.2 Methods with a Rate of Convergence that Is at Most Linear Under suitable assumptions, the preceding theorems provide statements about a linear rate of convergence, i.e. it was shown the existence of a constant q > 1 such that
15.2 Methods with a Rate of Convergence that Is at Most Linear
257
and q n E f (Xn ) − f ∗ → 0 q n f (Xn ) − f ∗ → 0 P -a.s. At least theoretically, a procedure with a linear rate of convergence could converge still faster, e.g. superlinear. Hence, we are interested therefore to make also estimates of the convergence rate in the other direction. It will turn out (cf. Theorems 15.5, 15.6, 15.7, and 15.8) that under assumptions implying a linear rate of convergence (cf. Theorems 15.2 and 15.3), stronger results are not possible. Indeed, we will show the existence of an r > 1 with r n E f (Xn ) − f ∗ → ∞ and n ∗ r f (Xn ) − f → ∞ P -a.s. Thus, the rate of convergence cannot be superlinear: In the other case, i.e. if, e.g., the sequence E (f (Xn ) − f ∗ ) would have a superlinear rate of convergence, hence, E (f (Xn+1 ) − f ∗ ) = 0, n→∞ E (f (Xn ) − f ∗ ) lim
then—as it is easy to see—we would have r n E f (Xn ) − f ∗ → 0
for any r > 1
The same holds true of course also for the sequence f (Xn ) − f ∗ . The rates of convergence of the sequences E (f (Xn ) − f ∗ ) and f (Xn ) − f ∗ are not necessarily always identical. As the next example shows, they may differ considerably. It is therefore not sufficient to limit the examination of the rate of convergence to one of the two sequences. Example 15.3 Let D = R and consider the function f (x) =
( 0 | x | −1
if
x ∈ [−1, 1]
else.
Let (Xn : n ∈ N0 ) denote a stationary random search procedure having a mutation distribution p(x, ·) being normal with mean x and fixed variance σ 2 > 0. Because of D ∗ = [−1, 1], it follows in particular that λ1 (D ∗ ) > 0. Therefore assumption (14.4) is satisfied. From the Remarks 14.4, 14.2, resp., we obtain the validity of the assumptions (14.1) and (14.2). Theorem 14.1, part (vii) firstly shows now that the sequence E (f (Xn ) − f ∗ ) is globally linear convergent, and, secondly, that the sequence f (Xn ) − f ∗ is P -a.s. convergent of any order, i.e.
258
15 Random Search Methods with a Linear Rate of Convergence
an f (Xn ) − f ∗ → 0
P -a.s.
for each sequence (an : n ∈ N) in R. Theorem 15.5 Suppose that D ∗ = ∅, and D and f are convex. Let (Xn : n ∈ N0 ) be a random direction procedure. Then pn (x0 , . . . , xn−1 , {f < f (xn−1 )}) ≤
1 2
for any n ∈ N, x0 , . . . , xn−1 ∈ D\D ∗ and E f (Xn ) − f ∗ ≥ 2−n f (x0 ) − f ∗ for any n ∈ N and all starting points x0 ∈ D. & ' Proof For z ∈ Rd \{0} define Hz := y ∈ Rd : y T z ≤ 0 . For arbitrary n ∈ N, x0 , . . . , xn−1 ∈ D and z ∈ Rd \{0} we have then: pn (x0 , . . . , xn−1 , xn−1 + Hz )
= (γ ⊗ λn (x0 , . . . , xn−1 , ·)) d(s, r) {(s,r)∈S×R+ :sr∈Hz }
=
γ (ds)λn (x0 , . . . , xn−1 , dr) {s∈S:s T z≤0} R+ 1 = γ s ∈ S : sT z ≤ 0 = . 2 Since f is convex, also the level set {f ≤ f (x)} is convex. Furthermore, according to Lemma B.6 we have {f = f (x)} ⊂ rd{f ≤ f (x)} for x ∈ D\D ∗ . Hence, the set {f ≤ f (x)} has a supporting hyperplane at x. Consequently, there is a z = 0 with (cf. Lemma B.5) {f ≤ f (x)} ⊂ x + Hz . This yields pn (x0 , . . . , xn−1 , {f < f (xn−1 )}) ≤ 12 and therefore
1 f (xn−1 ) − f ∗ . (f (xn−1 ) − f (yn )) pn (x0 , . . . , xn−1 , dyn ) ≤ 2 {f 0 and c ∈ (0, 1] such that ' ε a & pn x0 , . . . , xn−1 , y ∈ D : f (y) − f ∗ ≤ ε f (xn−1 ) − f ∗ ≤ c for any ε ∈ (0, c], n ∈ N and x1 , . . . , xn−1 ∈ Df (x0 ) \D ∗ . For n ∈ N define Zn :=
f (Xn )−f ∗ cn (f (x0 )−f ∗ )
a
.
15.2 Methods with a Rate of Convergence that Is at Most Linear
261
Then for any n ∈ N and x ∈ (0, 1] we have P (Zn ≤ x) ≤ x
n−1 1 i 1 ln . i! x i=0
Proof
B (1) Next we show that for x ∈ (0, c (f (x0 ) − f ∗ ) and n ∈ N we have P f (Xn ) − f ∗ ≤ x | f (Xn−1 ) − f ∗ = y a x 1( x ,f (x0 )−f ∗ B (y) + 1A0, x B (y) ≤ c c cy Pf (Xn−1 )−f ∗ − a.s. Obviously, this holds for y = 0. Hence, to prove the assertion, the following B inequality must be shown for B ⊂ (0, f (x0 ) − f ∗ . Note that Pf (Xn−1 )−f ∗ A arbitrary measurable B ∗ 0, f (x0 ) − f = 1: P
' & ' f (Xn ) − f ∗ ≤ x ∩ f (Xn−1 ) − f ∗ ∈ B
a x 1( x ,f (x0 )−f ∗ B (y) + 1A0, x B (y) Pf (Xn−1 )−f ∗ (dy). ≤ c c cy B
&
B B Thus, let B ⊂ (0, f (x0 ) − f ∗ and take x ∈ (0, c (f (x0 ) − f ∗ ) . Then: P
' & ' f (Xn ) − f ∗ ≤ x ∩ f (Xn−1 ) − f ∗ ∈ B
= P f (Xn ) − f ∗ ≤ x | Xn−1 , . . . , X0 dP {f (Xn−1 )−f ∗ ∈B } 4
= qn X0 , . . . , Xn−1 , y ∈ D : f (y) − f ∗ ≤ {f (Xn−1 )−f ∗ ∈B } 6 x ∗ f (X dP ) − f ≤ n−1 f (Xn−1 ) − f ∗
= pn (. . .)dP + x ∈(0,c) {f (Xn−1 )−f ∗ ∈B }∩ f (Xn−1 )−f ∗
+ qn (. . .)dP x ≥c {f (Xn−1 )−f ∗ ∈B }∩ f (Xn−1 )−f ∗ a
x ≤ · B ∗ {f (Xn−1 )−f ∗ ∈B }∩{f (Xn−1 )−f ∗ ∈( xc ,f (x0 )−f ∗ } c (f (Xn−1 ) − f )
&
262
15 Random Search Methods with a Linear Rate of Convergence
· dP +
=
{f (Xn−1 )−f ∗ ∈B }
1A0, x B f (Xn−1 ) − f ∗ dP c
a
x
c (f (Xn−1 ) − f ∗ )
{f (Xn−1 )−f ∗ ∈B }
1( x ,f (x0 )−f ∗ B · c
∗ ∗ A B dP · f (Xn−1 ) − f + 1 0, x f (Xn−1 ) − f c
= B
x cy
a 1( x ,f (x0 )−f ∗ c
B (y) + 1A
B x (y)
0, c
Pf (Xn−1 )−f ∗ (dy).
(2) For n ∈ N and x ≥ 0 define Fn (x) := P f (Xn ) − f ∗ ≤ x and > n i ? a n−1 x c (f (x0 ) − f ∗ ) a 1 ln Gn (x) := · cn (f (x0 ) − f ∗ ) i! x i=0
· 1(0,cn (f (x0 )−f ∗ )] (x) + 1(cn (f (x0 )−f ∗ ),∞) (x). Furthermore, for x ≥ 0 and a fixed n ∈ N let ! n−1 1 G(x) := x (− ln x)i 1(0,1] (x) + 1(1,∞) (x). i! i=0
For x ∈ (0, 1) we have then G (x) =
n−1 n−1 1 1 (− ln x)i − (− ln x)i−1 i! (i − 1)! i=0
=
i=1
(− ln x)n−1 > 0, (n − 1)!
which means that G is a distribution function. a Hence, because of Gn (x) = G cn (f (xx0 )−f ∗ ) , also Gn is a distribution function having the Lebesgue density > gn (x) :=
9 9 n : :n−1 ? ax a−1 c (f (x0 ) − f ∗ ) a ln · (n − 1)! (cn (f (x0 ) − f ∗ ))a x
· 1[0,cn (f (x0 )−f ∗ )] (x). According to the definition B of Zn , the assertion follows by showing for any x ∈ (0, cn (f (x0 ) − f ∗ ) that the inequality
15.2 Methods with a Rate of Convergence that Is at Most Linear
263
P f (Xn ) − f ∗ ≤ x ≤ Gn (x) holds. This will be done by means of complete induction. (3) Let n = 1. Because of P (f (X0 ) − f ∗ = f (x0 ) − f ∗ ) = 1, for arbitrary x (0, c (f (x0 ) − f ∗ )) we have
∈
P f (X1 ) − f ∗ ≤ x = P f (X1 ) − f ∗ ≤ x | f (X0 ) − f ∗ = f (x0 ) − f ∗ a x 1( x ,f (x0 )−f ∗ B f (x0 ) − f ∗ + ≤ ∗ c c (f (x0 ) − f ) a x . + 1A0, x B f (x0 ) − f ∗ = c c (f (x0 ) − f ∗ ) Due to c (f (x0 ) − f ∗ ) ≤ f (x0 ) − f ∗ and P (f (X1 ) − f ∗ ≤ f (x0 ) − f ∗ ) = 1, obviously, the above inequality holds also for x = c (f (x0 ) − f ∗ ). Consider the step n → An + 1: B Select an arbitrary x ∈ 0, cn+1 (f (x0 ) − f ∗ ) . With b := f (x0 ) − f ∗ and because of Fn (b) = 1 we get P f (Xn+1 ) − f ∗ ≤ x
P f (Xn+1 ) − f ∗ ≤ x | f (Xn ) − f ∗ = y Fn (dy) = [0,b]
x a A B B ≤ 1( x ,b (y) + 1 0, x (y) Fn (dy) c c cy [0,b]
x x a −a + = Fn B y Fn (dy) x c c ( c ,b !
x x a −a b −a−1 + Fn (y)dy y Fn (y) | x +a · = Fn By c c c ( xc ,b x x a + ≤ Fn · c c !
x x −a −a −a−1 +a· Fn Gn (y)dy , · b Fn (b) − By c c ( xc ,b
where the last inequality follows from induction assumption, and the last equation follows from the rule of integration by parts. Note that the use of this rule is not obvious, because here we do not have a Riemann integral, but a Lebesgue integral.
264
15 Random Search Methods with a Linear Rate of Convergence
However, replacing in the corresponding integral above the term y −a by x −a c
−a·
(
x c ,y
Bt
−a−1
dt,
then we obtain a double integral. Applying Fubini’s theorem, i.e. by interchanging the integrals, the rule of integration by parts follows in the present case. again the rule of the integration by parts to the integral Applying B y −a−1 G (y)dy, we find x n ( ,b c
P f (Xn+1 ) − f ∗ ≤ x !
x a x a −a b −a + ≤ −y Gn (y) | x + B y gn (y)dy c bc c ( xc ,b x b x a x a x −a −a + = + Gn y −a gn (y)dy −b + x bc c c c c = Gn
= Gn
x c
+
x a
c
cn b x c
!
y −a gn (y)dy
x a cn b a + y a−1 y −a · x c (n − 1)!(cn b)a c c
x
n−1 dy · ln(cn b)a − a ln y = Gn
x
ax a + c (n − 1)!(cn+1 b)a
x
n n c b −1 n a ln(c b) − a ln y x an c
x a 1 x n + n+1 ln(cn b)a − a ln = Gn c c c b n! ? > a n x x a 1 cn+1 b + n+1 = Gn = Gn+1 (x), ln c x c b n! since
x c
∈ [0, cn b].
Lemma 15.4 Let the assumptions of Lemma 15.3 be satisfied. Then there is a q = q(x0 ) > 1 with q n f (Xn ) − f ∗ → ∞
P -a.s.
15.2 Methods with a Rate of Convergence that Is at Most Linear
265
Proof
a ∗ (1) Defining Zn := cnf(f(X(xn0)−f ∗ )−f ) , for x ∈ [0, 1] we get 1 P (Zn ≤ x) = P f (Xn ) − f ∗ ≤ x a cn f (x0 ) − f ∗ ≤x
n−1 1 i 1 ln . i! x i=0
To prove the assertion, it is sufficient to show that q n Zn → ∞
P -a.s.
holds for a q > 1. Indeed, in this case we get 1
qa c
!n
f (Xn ) − f ∗ → ∞
P -a.s.
(2) A sufficient, but not necessary condition for q n Zn → ∞ P − a.s. reads: ∞ n=0
K P Zn ≤ n < ∞ q
for any K ∈ N.
Namely, in this case, the Lemma of Borel–Cantelli yields P q n Zn ≤ K for ∞-many n ∈ N = 0 , hence P q n Zn > K for finally all n ∈ N = 1 for any K ∈ N and therefore P lim inf q n Zn ≥ K = 1 for any K ∈ N. n→∞
(3) Let q > 1 be chosen such that the inequality 4e ln q < q − 1 is satisfied (e.g. q ≥ 41, 404). Let K ∈ N be arbitrary. Because of q n → ∞, there is an N ∈ N with Kq −n ≤ 1 for any n ≥ N .
266
15 Random Search Methods with a Linear Rate of Convergence
Then: ∞
∞ P Zn ≤ Kq −n ≤ N + P Zn ≤ Kq −n n=N
n=0
≤N +K ·
∞
n−1 i 1 ln q n − ln K i!
q −n
n=N
≤N +K ·
∞
i=0
q −n
n=1
=N +K ·
∞
n−1 i 1 2 ln q n i! i=0
q −n
n−1 (2n)i
n=1
=N +K ·
∞
i=0 ∞
i=0 n=i+1
=N +K ·
i!
∞ i=0
(ln q)i
(ln q 2 )i i −n nq i!
∞ ∞ (ln q 2 )i i=0
=N +K ·
i!
(n + 1 + i)i q −(n+i+1)
n=0 ∞
(ln q 2 )i −i−1 (n + 1 + i)i q −n . q i! n=0
According to Remark B.5 we have (n + 1 + i)i ≤ 2i (n + 1)i + (2i)i . Furthermore, due to Remark B.4 it is ∞ i! (n + 1) . . . (n + i)q −n = i+1 . n=0 1 − q1
This yields the estimate ∞ ∞ 1 −1 i −n i (n + 1 + i) q ≤ (2i) 1 − + 2i (n + 1)i q −n q n=0
n=0
∞ 1 −1 ≤ (2i)i 1 − + 2i (n + 1) . . . (n + i)q −n q n=0
1 −1 1 −i−1 i i = (2i) 1 − + 2 i! 1 − q q −1 1 1 −i−1 i i i ≤ 2 e i! 1 − + 2 i! 1 − q q
according to Remark B.6.
15.2 Methods with a Rate of Convergence that Is at Most Linear
267
From this it follows that ∞
(ln q 2 )i −i−1 q (n + 1 + i)i q −n i! ≤
n=0
4e ln q q
i
1 1 + q −1 q −1
4 ln q q −1
i
≤
4e ln q q −1
i
2 , q −1
hence, ∞ n=0
∞ 2K 4e ln q i −n ≤N+ P Zn ≤ Kq < ∞. q −1 q −1 i=0
This implies the assertion. Remark 15.5 Suppose that the assumptions of Lemma 15.3 are satisfied. According to the proof of Lemma 15.4, for q ≥ 42 it follows then that 1
qa c
!n
f (Xn ) − f ∗ → ∞
P -a.s.
The stochastic convergence towards ∞ is obtained for much smaller values of q. Here, it follows already for q > e that 1
qa c
!n
f (Xn ) − f ∗ → ∞
P − stoch.
Proof Consider a sequence (Yn : n ∈ N) of random variables having values in [0, 1] such that n−1 1 i 1 ln P (Yn ≤ x) = x i! x i=0
for x ∈ [0, 1]. Let then Qn := ln Y1n for n ∈ N. A simple calculation shows that Qn -distributed with the parameters 1 and n. The characteristic function ϕn of Qn is then given by ϕn (t) = (1 − it)−n , hence, the for characteristic function φn of n1 Qn we get
268
15 Random Search Methods with a Linear Rate of Convergence
φn (t) = ϕn
t it −n = 1− → eit . n n
However, eit is the characteristic function of the one-point probability distribution at the point 1. Because of the convergence of the characteristic function we obtain the convergence of n1 Qn towards constant 1 in distribution and therefore also P -stochastic. Consequently, −n
1 Qn − s n
P −st.
= −Qn + ns −→ ∞
if s > 1
and therefore s n n P −st. e Yn = es exp (−Qn ) −→ ∞
for s > 1,
hence r n Yn
P −stoch.
−→
∞
if r > e.
Defining Zn as in Lemma 15.3, and because of P (Zn ≤ x) ≤ P (Yn ≤ x) , for any K > 0 we get the inequality P (Zn > K) ≥ P (Yn > K) . Now, if r n Yn converges stochastically towards infinity, i.e. if P r n Yn > K → 1 then this also applies to the sequence r n Zn . Thus, the assertion is shown.
for any K > 0,
Lemma 15.5 Let (Xn : n ∈ N0 ) be a random search procedure with the mutation sequence (pn : n ∈ N). Suppose that there are transition probabilities ρn from (Rd )n to S and transition probabilities μn from (Rd )n × S to R+ fulfilling the integration property (10.4). Moreover, let the following assumptions be satisfied: (1) D is compact or D and f are convex ˚ (2) D ∗ = {x ∗ }, where x ∗ ∈ D
15.2 Methods with a Rate of Convergence that Is at Most Linear
269
(3) f is twice continuously differentiable in a neighborhood of x ∗ with a Hessian matrix that is positively definite at x ∗ . Furthermore, let f be continuous on D. (4) Forany x0 ∈ D\D ∗ there are constants a > 0 and c ∈ (0, 1] withB A a μn x0 , . . . , xn−1 , s, (1 − ε) | xn−1 − x ∗ |, (1 + ε) | xn−1 − x ∗ | ≤ εc for all n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) \D ∗ , ε ∈ (0, c], s ∈ S. Then, for any starting point x0 ∈ D\D ∗ , a constant q > 1 exists with q n f (Xn ) − f ∗ → ∞
P -a.s.
Proof According to Lemma B.13, the assumptions (1), (2), and (3) imply all assumptions of Lemma B.12. Hence, due to Corollary B.3, for any x0 ∈ D\D ∗ there exist constants d1 ∈ (0, 1] and d2 > 0 with √ {y ∈ D : f (y) − f ∗ ≤ ε f (x) − f ∗ } ⊂ K x ∗ , d2 ε | x − x ∗ | for any x ∈ Df (x0 ) and any ε ∈ (0, d1 ]. Without loss of generality we may assume that c ≤ d1 , and let x0 ∈ D\D ∗ be arbitrary. For any n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) \D ∗ and ε ∈ (0, c] we have & ' pn x0 , . . . , xn−1 , y ∈ D : f (y) − f ∗ ≤ ε f (xn−1 ) − f ∗ √ ≤ pn x0 , . . . , xn−1 , K x ∗ , d2 ε | xn−1 − x ∗ |
1K (x ∗ −xn−1 ,d2 √ε|xn−1 −x ∗ |) (yn − xn−1 ) pn (x0 , . . . , xn−1 , dyn ) = Rd
= S
R+
1K (x ∗ −xn−1 ,d2 √ε|xn−1 −x ∗ |) (rs)μn (x0 , . . . , xn−1 , s, dr)·
· ρn (x0 , . . . , xn−1 , ds)
= 1K (x ∗ ,d2 √ε|xn−1 −x ∗ |) (xn−1 + rs)μn (x0 , . . . , xn−1 , s, dr)· S
R+
· ρn (x0 , . . . , xn−1 , ds). If |r− | xn−1 − x ∗ || ≥ ρ, then | xn−1 + rs − x ∗ |≥ | rs | − | xn−1 − x ∗ | = r− | xn−1 − x ∗ | ≥ ρ. Thus, | xn−1 + rs − x ∗ |< ρ implies the inequality r− | xn−1 − x ∗ | < ρ.
270
15 Random Search Methods with a Linear Rate of Convergence
This yields 1K (x ∗ ,d2 √ε|xn−1 −x ∗ |) (xn−1 + rs) ≤ 1((1−d2 √ε)|xn−1 −x ∗ |,(1+d2 √ε)|xn−1 −x ∗ |) (r) and therefore ' & pn x0 , . . . , xn−1 , y ∈ D : f (y) − f ∗ ≤ ε f (xn−1 ) − f ∗
√ ≤ μn x0 , . . . , xn−1 , s, 1 − d2 ε | xn−1 − x ∗ |, S
√ 1 + d2 ε | xn−1 − x ∗ | · · ρn (x0 , . . . , xn−1 , ds). & ' Let τ := min c, c2 d2 −2 . √ Thus, if ε ∈ (0, τ ], then d2 ε ≤ c, and according to Assumption (4) we have ' & pn x0 , . . . , xn−1 , y ∈ D : f (y) − f ∗ ≤ ε f (xn−1 ) − f ∗ √ a a ε a d2 ε 2 2 ≤ = εc−2 d2 2 ≤ c τ for all ε ∈ (0, τ ].
The assertion follows now from Lemma 15.4.
Theorem 15.7 Let (Xn : n ∈ N0 ) denote a random direction procedure with the step length distribution λn . Let h be a bounded probability density on R+ , and let αn denote positive, measurable functions with λn (x0 , . . . , xn−1 , B) = (αn (x0 , . . . , xn−1 ))−1
h B
r dr αn (x0 , . . . , xn−1 )
∗
for any n ∈ N, x0 , . . . , xn−1 ∈ D\D , B ⊂ R+ measurable . Moreover, assume that the assumptions (1), (2), and (3) of Lemma 15.5 are satisfied. Provided that for all x0 ∈ D\D ∗ there exists a constant K < ∞ such that | xn−1 − x ∗ | ≤K αn (x0 , . . . , xn−1 )
for all n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) \D ∗ ,
then for each starting point x0 ∈ D\D ∗ there is a constant q > 1 with q n f (Xn ) − f ∗ → ∞
P -a.s.
15.2 Methods with a Rate of Convergence that Is at Most Linear
271
Proof Since (Xn : n ∈ N0 ) is a random direction procedure, according to Lemma 10.1 we have μn (x0 , . . . , xn−1 , s, ·) = λn (x0 , . . . , xn−1 , ·) . For arbitrary n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) \D ∗ and ε ∈ [0, 1] this yields μn x0 , . . . , xn−1 , s, (1 − ε) | xn−1 − x ∗ |, (1 + ε) | xn−1 − x ∗ |
(1+ε)|xn−1 −x ∗ | r −1 dr = (αn (x0 , . . . , xn−1 )) h αn (x0 , . . . , xn−1 ) (1−ε)|xn−1 −x ∗ |
=
(1+ε)|xn−1 −x ∗ |(αn (x0 ,...,xn−1 ))
(1−ε)|xn−1 −x ∗ |(αn (x0 ,...,xn−1 ))
≤ const · 2ε
−1
−1
h(u)du
| xn−1 − x ∗ | ≤ const · ε. αn (x0 , . . . , xn−1 )
The assertion follows now directly from Lemma 15.5.
Theorem 15.8 Let (Xn : n ∈ N0 ) be a random search procedure with an absolutely continuous mutation sequence (pn : n ∈ N) and associated density sequence (gn : n ∈ N). Let h1 , h2 denote measurable, nonnegative functions on R+ . Assume that h1 is positive on the interval (0, c) (c > 0). Furthermore, suppose that h2 is bounded. Let (αn : n ∈ N) and (τn : n ∈ N) denote sequences of measurable, positive functions with
|z| ≤ gn (x0 , . . . , xn−1 , xn−1 + z) (αn (x0 , . . . , xn−1 )) h1 αn (x0 , . . . , xn−1 ) |z| −d ≤ (τn (x0 , . . . , xn−1 )) h2 τn (x0 , . . . , xn−1 ) −d
for any n ∈ N, x0 , . . . , xn−1 ∈ D\D ∗ , z ∈ Rd . Furthermore, let there exist for all x0 ∈ D\D ∗ a positive constant K < ∞ with | xn−1 − x ∗ | ≤K τn (x0 , . . . , xn−1 )
for any n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) \D ∗ .
If f and D satisfy assumptions (1), (2), and (3) of Lemma 15.5, then there is for each starting point x0 ∈ D\D ∗ a q > 1 such that q n f (Xn ) − f ∗ → ∞
P -a.s.
272
15 Random Search Methods with a Linear Rate of Convergence
Proof Let x0 ∈ D\D ∗ be an arbitrary point, and consider n ∈ N, x1 , . . . , xn−1 ∈ Df(x0 ) \D ∗ , ε ∈ [0, 1] and an arbitrary s ∈ S. According to (10.3) and with I := R+ r d−1 gn (x0 , . . . , xn−1 , xn−1 + rs)dr we have μn x0 , . . . , xn−1 , s, (1 − ε) | xn−1 − x ∗ |, (1 + ε) | xn−1 − x ∗ |
∗ 1 (1+ε)|xn−1 −x | d−1 = r gn (x0 , . . . , xn−1 , xn−1 + rs)dr I (1−ε)|xn−1 −x ∗ | d−1
(1+ε)|xn−1 −x ∗ | r 1 1 · ≤ I τn (x0 , . . . , xn−1 ) (1−ε)|xn−1 −x ∗ | τn (x0 , . . . , xn−1 ) r · h2 dr τn (x0 , . . . , xn−1 ) −1
∗ 1 (1+ε)|xn−1 −x |(τn (x0 ,...,xn−1 )) = ud−1 h2 (u)du I (1−ε)|xn−1 −x ∗ |(τn (x0 ,...,xn−1 ))−1 ≤
1 2ε | xn−1 − x ∗ | (τn (x0 , . . . , xn−1 ))−1 · I 6 4 2 | xn−1 − x ∗ | · sup ud−1 h2 (u) : 0 ≤ u ≤ τn (x0 , . . . , xn−1 )
≤
1 2εK(2K)d−1 c, ˜ I
where c˜ := sup {h2 (u) : u ∈ R+ }. Since, according to the assumption, I can be estimated from below by
R+
ud−1 h1 (u)du > 0,
we have I1 < ∞. The assertion follows now from Lemma 15.5.
15.3 Linear Convergence for Positive Probability of Success The considerations at the beginning of this section have shown that a decreasing probability of success, in particular near x ∗ , has negative consequences for the rate of convergence. The next example shows that the same holds true if the probability of success becomes too large. Example 15.5 Let D = R2 and f (x) = | x |2 .
15.3 Linear Convergence for Positive Probability of Success
273
Consider a stationary random direction method (Xn : n ∈ N0 ) having the step length distribution λ(x, ·) = ερ (with arbitrary ρ > 0). The probability of success p (x, {f < f (x)}) is then given by p (x, {f < f (x)}) = p (x, {y :| y | f ∗ there are constants c1 , c2 ∈ 0, 12 with pn (x0 , . . . , xn−1 , {f < f (xn−1 )}) ∈ [c1 , c2 ] for any n ∈ N, x0 , . . . , xn−1 ∈ Da \D ∗ , then (Xn : n ∈ N0 ) is globally linear convergent in the mean and P -a.s. Proof (1) From Remark 15.1 and Lemma B.2 it follows that the assumptions (15.1) and (12.2) are satisfied. The assertion follows then from Theorem 15.2, part (ii), provided that condition (15.6) holds for all a > f ∗ . In the following we show that this holds true. For this purpose let a > f ∗ be arbitrary. (2) For x ∈ Rd and r > 0 define Ax,r := {s ∈ S : f (x + rs) − f (x) < 0} . From Corollary B.1 we know that for any x, y ∈ Rd we have (∇f (x))T (x − y) ≥ f (x) − f (y). Hence, if f (x + rs) − f (x) < 0, then (∇f (x))T (−rs) > 0 and therefore (∇f (x))T s < 0. This yields Ax,r ⊂ s ∈ S : (∇f (x))T s < 0 . From the compactness of the set Da (cf. Lemma B.2 and Lemma B.4), and because f is twice continuously differentiable, we obtain the Lipschitz continuity of ∇f on the set Da .
15.3 Linear Convergence for Positive Probability of Success
275
Hence, there exists a constant L > 0 with | ∇f (x) − ∇f (y) |≤ L | x − y |
for any x, y ∈ Da .
Moreover, the mean value theorem of calculus yields for all x, y ∈ Da , with a certain τ ∈ [0, 1], the equation f (x) − f (y) = (∇f (y + τ (x − y)))T (x − y) = (∇f (y))T (x − y) + (∇f (y + τ (x − y)) − ∇f (y))T (x − y). Since Da is convex, we have y + τ (x − y) ∈ Da . Thus, from the inequality of Schwarz we obtain f (x) − f (y) ≤ (∇f (y))T (x − y)+ | ∇f (y + τ (x − y)) − ∇f (y) || x − y | ≤ (∇f (y))T (x − y) + L| x − y |2 . Consequently, for all x ∈ Da \D ∗ , r > 0, and s ∈ Ax,r we have f (x + rs) − f (x) ≤ (∇f (x))T rs + Lr 2 and therefore (
) Lr s 0 we have | ∇f (x) |≥
| x − x ∗ |2 f (x) − f ∗ ≥ m = m | x − x∗ | | x − x∗ | | x − x∗ |
and therefore (
Ax,r ⊃ s ∈ S :
f (x) | ∇f (x) |
T
) Lr s
@
= λn x0 , . . . , xn−1 , 0, 1 + 1 = αn (x0 , . . . , xn−1 ) =
− x∗
| | xn−1 αn (x0 , . . . , xn−1 ) − x∗
0
M m
M m
0
| xn−1 | ≤ K 1+ αn (x0 , . . . , xn−1 )
h @
M m
| xn−1 − x ∗ | ?!
! ∗
| xn−1 − x |
* ∗ 1+ M m |xn−1 −x |
* 1+ M m
!!
!
h
r dr αn (x0 , . . . , xn−1 )
| xn−1 − x ∗ | u du αn (x0 , . . . , xn−1 ) ! ,
where K > 0 is chosen such that h ≤ K, which is possible according to the assumption. This yields * K 1+ M m
αn (x0 , . . . , xn−1 ) ≤ . | xn−1 − x ∗ | pn (x0 , . . . , xn−1 , {f < f (xn−1 )}) If pn (x0 , . . . , xn−1 , {f < f (xn−1 )}) ≥ c1 , it follows therefore that
αn (x0 , . . . , xn−1 ) ≤ | xn−1 − x ∗ |
* K 1+ M m c1
=: d2 .
15.3 Linear Convergence for Positive Probability of Success
277
(4) Let μ denote the probability measure on (R+ , Bl |R+ ) having the density h. Then for arbitrary x0 , . . . , xn−1 ∈ Da \D ∗ we have again pn (x0 , . . . , xn−1 , {f < f (xn−1 )})
= (γ ⊗ λn (x0 , . . . , xn−1 , ·)) (d(s, r)) {(s,r)∈S×R+ :f (xn−1 +rs) f ∗ there exists ε0 > 0 with pn (x0 , . . . , xn−1 , {f < f (xn−1 )})
0 with αn (x0 , . . . , xn−1 ) ≥σ | xn−1 − x ∗ |
for any x0 , . . . , xn−1 ∈ Da \D ∗ .
Thus, Condition (15.6) holds for all a > f ∗ . As already mentioned, the assertion follows now from Theorem 15.2.
References 1. Gänssler, P., Stute, W.: Wahrscheinlichkeitstheorie. Springer, Berlin (1977) 2. Lawson, G., Lavi, A.: Monte Carlo programming: random searching in the solution of parameter optimization problems. In: Conference on Systems Science and Cybernetics. Pittsburgh (1969) 3. Marti, K.: Optimierungsverfahren. vorlesung an der hochschule der bundeswehr münchen (1983) 4. Richter, H.: Wahrscheinlichkeitstheorie. Springer, Berlin (1966) 5. Schumer M.A.; Steiglitz, K.: Adaptive step size random search. IEEE Trans. Autom. control AC (13), 270–276 (1968) 6. Tarasenko, G.: Über die konvergenzgeschwindigkeit der adaptiven zufälligen suche. Russisch), Problemy slucajnogo poiska (1980)
Chapter 16
Success/Failure-Driven Random Direction Procedures
In this section we consider a random direction procedure, cf. G.S. Tarasenko [2], based on a very simple step length control. At any iteration step n the step length 1n is chosen deterministically according to the following algorithm: %1 = % > 0 4 γ1 %n in case of success at the n-th step %n+1 = γ2 %n in case of failure in the n-th step, where
γ1 > %, γ2 ∈ (0, %)
are fixed parameters. Hence, after successes the step lengths will be increased, and after failures they will be reduced. If the function f is convex, which is generally assumed in this section, this type of step size control appears to be quite plausible. Indeed, under not too restrictive assumptions, in Chap. 16 we derive now global linear (even P a.s.) for a rather broad class of random direction methods. Definition 16.1 (Model) Suppose that D = Rd and f : D → R is convex. Furthermore, let D ∗ = {x ∗ }. Choose coefficients γ1 > %, γ2 ∈ (0, %) and 0 < %0 ≤ L0 < ∞, and let λ0 denote a probability measure on [%0 , L0 ] (initial distribution). Let the sequence of mappings %n : D n → R+ be defined as follows: %1 (x0 ) := 1 for all x0 ∈ D 4 γ1 %n (x0 , . . . , xn−1 ) if f (xn ) < f (xn−1 ) %n+1 (x0 , . . . , xn ) := γ2 %n (x0 , . . . , xn−1 ) if f (xn ) ≥ f (xn−1 ).
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4_16
279
280
16 Success/Failure-Driven Random Direction Procedures
For 1 ∈ R+ and x ∈ D, let T1 (x) := 1 · x. Furthermore, for n ∈ N and x0 , . . . , xn−1 ∈ D, define λn (x0 , . . . , xn−1 , ·) := T1n (x0 , . . . , xn−1 )(λ0 ) (in particular we have λ1 (x0 , ·) = λ0 ). From the recursive definition of %n it follows that %n+k (x0 , . . . , xn−1 , xn , . . . , xn+k−1 ) = γ1enk · γ2k−enk · %n (x0 , . . . , xn−1 ), where the quantity enk :=
n+k−1
%{(x,y):f (x)>f (y)} (xi−1 , xi )
i=n
represents the number of “successes” in the time interval [n, n + k − 1]. Let denote (Xn : n ∈ N0 ) the random direction procedure with the just defined step length distribution λn . Therefore, pn (x0 , . . . , xn−1 , B) = γ ⊗ λn (x0 , . . . , xn−1 , ·) × (x, r) ∈ S × R+ : xn−1 + rs ∈ B , where γ is the uniform distribution on the d-dimensional unit sphere S. Let An := σ (X0 , . . . , Xn ) denote the σ -algebra generated by X0 , . . . , Xn . Furthermore, let γx,r be the uniform distribution on the sphere with center x and radius r (cf. Appendix A). For a stopping time with respect to (An : n ∈ N0 ), let AT := A ∈ A : A ∩ {T = n} ∈ An for all n ∈ N0 . Sets of the form x ∈ D : f (x) ≤ c} are denoted again shortly by {f ≤ c}. The following theorem represents the basic results of the present Chap. 16. Its proof is very complicated and requires several preparations. For simplification, we start with an outline of the proof illustrating the most important ideas of the proof. This outline can be found after Theorem 16.3. Theorem 16.1 Let the following conditions be satisfied: ⎫ forall x0 = x ∗ there exists ⎪ b = b(x0 ) > 0 with ⎪ ⎬ inf |y − x ∗ | : f (y) = α ≤ b sup |y − x ∗ | : f (y) = α ⎪ ⎪ ⎭ for all α ∈ f ∗ , f (x0 )
(16.1)
16 Success/Failure-Driven Random Direction Procedures
for all x0 = x ∗ there exists p > 0 with p 1−p
(i) γ1 γ2
> % and
(ii) for all δ > 0 there is ε > 0 such that the inequality γx,r f < f (x) ≥ p − δ is satisfied for all r r > 0 and x ∈ Df (x0 ) > D ∗ with < ε. |x − x ∗ |
281
⎫ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎬ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎭
(16.2)
Then, (Xn : n ∈ N0 ) is P -a.s. globally linear convergent. Remark 16.1 (I) The assumption (16.1) is already satisfied, provided that an a1 > f ∗ and a b1 > 0 exist such that inf |y −x ∗ | : f (y) = α ≥ b1 sup |y −x ∗ | : f (y) = α for all α ∈ (f ∗ , a1 ]. (II) If assumption (16.1) is satisfied, then even inf |y−x ∗ | : f (y) = α ≤ b sup |y−x ∗ | : f (y) ≤ α for all α ∈ f ∗ , f (x0 ) . Proof (I) Let x0 = x ∗ be arbitrary. If f (x0 ) ≤ a1 , then the assertion is clear. Hence, let f (x0 ) > a1 . From the boundedness of D ∗ , with Lemmas B.2 and B.4 we obtain the boundedness of the set Df (x0 ) . Thus, R := sup |y − x ∗ | : f (y) ≤ f (x0 ) < ∞. Now, for α ∈ a1 , f (x0 ) we have
inf |y − x ∗ | : f (y) = α inf |y − x ∗ | : f (y) ≤ a1 ≤ =: b˜ > 0. R sup |y − x ∗ | : f (y) = α For α ∈ (f ∗ , a1 ], according to the assumption it is inf |y − x ∗ | : f (y) = α ≥ b1 sup |y − x ∗ | : f (y) = α . The assertion with b := min{b1 , b}. follows now ∗ (II) Select α ∈ f , f (x0 ) and y ∈ D\D ∗ with f (y) ≤ α arbitrarily. For t ≥ 0, define g(t) := f x ∗ + t (y − x ∗ ) .
282
16 Success/Failure-Driven Random Direction Procedures
From the boundedness of Dα and the continuity of g, we obtain the existence of a point t0 > 0 with g(t0 ) = α. If t0 < 1, then x ∗ + t0 (y − x ∗ ) lies between x ∗ and y. Lemma B.1 yields then ∗ + t (y − x ∗ ) − f ∗ ∗ f x 0 f (y) − f ≤ , |y − x ∗ | t0 |y − x ∗ | hence, f (y) ≥ f ∗ + t0 ≥ 1 and therefore
1 (α − f ∗ ) > α in contradiction to f (y) ≤ α. Thus, t0
|y − x ∗ | ≤ t0 |y − x ∗ | = x ∗ + t0 (y − x ∗ ) − x ∗ ≤ sup |z − x ∗ | : f (z) = α . Since y was arbitrary, it follows that sup |y − x ∗ | : f (y) ≤ α ≤ sup |z − x ∗ | : f (z) = α . This inequality and assumption (16.1) yield the assertion. Remark 16.2 Suppose that condition (16.1) is fulfilled. Then, for all x0 = x ∗ there exists q = q(x0 ) > 0 with f (y) − f ∗ |y − x ∗ | for all x ∈ Df (x0 ) \D ∗ and y ∈ Df (x) . ≤ q f (x) − f ∗ |x − x ∗ | Proof Let x0 = x ∗ be arbitrary. Choose b = b(x0 ) according to assumption (16.1). Select arbitrary vectors x ∈ Df (x0 ) \D ∗ and y ∈ Df (x) .
Moreover, let t ∈ [0, 1] be chosen such that f x ∗ + t (x − x ∗ ) = f (y). This is possible because of f (x ∗ ) = f ∗ ≤ f (y) and f (x) ≥ f (y). Define z := x ∗ + t (x − x ∗ ). Because of f (z) = f (y) we have |z − x ∗ | ≤ sup |z − x ∗ | : f (z) = f (y) ≤ =
|y − x ∗ | 1 inf |z − x ∗ | : f (z) = f (y) ≤ . b b
16 Success/Failure-Driven Random Direction Procedures
283
Now it follows from Lemma B.1 that b
f (z) − f ∗ f (x) − f ∗ f (y) − f ∗ ≤ ≤ |y − x ∗ | |z − x ∗ | |x − x ∗ |
and therefore f (y) − f ∗ 1 |y − x ∗ | . ≤ f (x) − f ∗ b |x − x ∗ | The assertion follows then with q = 1/b.
Theorem 16.2 Let γ1 γ2 > 1, and assume that ∇f is Lipschitz continuous on compact sets, i.e., for all compact set M ⊂ D there exists L > 0 with ∇f (x) − ∇f (y) ≤ L|x − y| for all x, y ∈ M. Furthermore: 4 for all R > 0 there exist mR > 0, MR > 0 and sR ∈ (1, 2] with mR |x − x ∗ |sR ≤ f (x) − f ∗ ≤ MR |x − x ∗ |sR for all x ∈ K(x ∗ , R). (16.3) Then, (Xn : n ∈ N0 ) is P -a.s. globally linear convergent. Remark 16.3 Note that assumption (16.1) holds (with sR = 2), provided that f is twice continuously differentiable in neighborhood of x ∗ with a positive definite Hessian at x ∗ . This follows from Corollary B.3. Remark 16.4 If f is twice continuously differentiable, then ∇f is Lipschitz continuous on compact sets. This follows from the mean value theorem of differential calculus and from Schwarz’s inequality. Remark 16.5 Assumption (16.3) implies assumption (16.1). Proof Let (16.3) be satisfied and take an arbitrary x0 = x∗ . If R > 0 is chosen such that Df (x0 ) ⊂ K(x ∗ , R), then (16.1) implies for all α ∈ f ∗ , f (x0 ) the inclusions
α − f∗ {f = α} ⊂ D\K x , MR ∗
α − f∗ {f = α} ⊂ K x , mR ∗
1/sR ! and
1/sR ! .
284
16 Success/Failure-Driven Random Direction Procedures
Consequently, α − f ∗ 1/sR inf |y − x | : f (y) = α ≥ = MR mR 1/sR α − f ∗ 1/sR mR 1/sR ≤ sup |y − x ∗ | : f (y) = α . MR mR MR
∗
Hence, (16.1) follows with b =
mR MR
1/sR .
Proof of Theorem 16.2 In order to apply Theorem 16.1, according to Remark 16.5 we have only to show that assumption (16.3) holds. For this purpose, consider an arbitrary x0 = x ∗ . For x ∈ Df (x0 ) \D ∗ and r > 0, define Ax,r := s ∈ S : f (x + rs) < f (x) . Then, γx,r f < f (x) = γ (Ax,r ). From the compactness of Df (x0 ) (cf. Lemma B.2 and Lemma B.4) we get the existence of a constant L > 0 with ∇f (x) − ∇f (y) ≤ L|x − y| for all x, y ∈ Df (x0 ) . In the same way as in the proof of Theorem 15.9, the following inclusion can be shown: ⎫ ⎧ ⎞Ts ⎛ ⎪ ⎪ ⎨ ∇f (x) ⎠ Lr ⎬ ⊂ Ax,r . s ∈ S : ⎝ < − ⎭ ⎪ ⎩ ∇f (x) ∇f (x) ⎪ Because of T f (x) − f ∗ ≤ ∇f (x) (x − x ∗ ) ≤ ∇f (x)|x − x ∗ | (cf. Corollary B.1), according to assumption (16.3) we have the inequality f (x) − f ∗ |x − x ∗ |s ≥m = m|x − x ∗ |s−1 ∇f (x) ≥ ∗ |x − x | |x − x ∗ | with a suitable m > 0 and a certain s ∈ (1, 2]. Hence, ⎧ ⎫ ⎞Ts ⎛ ⎪ ⎪ ⎨ ⎬ ∇f (x) ⎠ Lr ∗ 2−s ⎝ |x − x Ax,r ⊃ s ∈ S : 1 with α n |Xn − x ∗ | → 0 P − a.s. Proof For each starting point x0 ∈ D there is a constant R > 0 with Df (x0 ) ⊂ K(x ∗ , R). Choosing α = β 1/sR then (16.1) yields αn |Xn − x ∗ | ≤
1/sR 1 n β f (Xn ) − f ∗ P -a.s. mR
Of course, β > 1 implies α > 1.
Theorem 16.3 Suppose that the function f is twice continuously differentiable with a Hessian which is positive definite at x ∗ . If γ1 γ2 > %, then for each starting point x0 ∈ D there is a β > 1 such that β n f (Xn ) − f ∗ → 0 P -a.s.
286
16 Success/Failure-Driven Random Direction Procedures
and β n |Xn − x ∗ | → 0 P -a.s. The proof of this assertion follows directly from Theorem 16.2 taking into account the remarks 16.3, 16.4, and 16.6. Outline of the proof of Theorem 16.1 Let us consider the special case %0 = L0 , i.e. λ0 = ε%0 . The step length distribution λn is then given by λn (x0 , . . . , xn−1 , ) = ε%n (x0 , . . . , xn−1 ). Once the selection points x0 , . . . , xn−1 are generated, then, according to the construction of a random direction procedure, in the present case the mutation points yn are obtained as realizations of the uniform distribution on the sphere with center xn−1 and radius r = %n (x0 , . . . , xn−1 ). Therefore xn = xn−1 or |xn−1 − xn | = %n (x0 , . . . , xn.1 ). Of crucial importance to the proof of Theorem 16.1 are then the quotients Qn =
%n+1 (X0 , . . . , Xn ) |Xn − x ∗ |
and the angles Zn = ϕ(Xn+1 − Xn , x ∗ − Xn )
cf. Definition B.1.
According to Lemma B.9 we know that for starting points x0 ∈ D\D ∗ there all 1 we have exists an angle ψ > 0 such that for all a ∈ 0, 2 4 4
:6 9 1 Qn ∈ a, ∩ Zn ∈ [0, &] = 2
6 1 ∗ a|Xn − x | ≤ |Xn+1 − Xn | ≤ |Xn − x | ∩ Zn ∈ [0, ψ] = 2 4 6 ∗ ∗ 1 ∗ (Xn+1 − Xn ) ∈ C ψ, x − Xn , a|Xn − x | |Xn − x | ⊂ 2 4 6 6 4 |X |Xn+1 − Xn | n+1 − Xn | ∗ f (Xn ) − f (Xn+1 ) ≥ f (X ≥ a ⊂ ∩ ) − f n 3|x ∗ − Xn | |x ∗ − Xn | ∗
(according to Lemma B.11, part (2)
16 Success/Failure-Driven Random Direction Procedures
287
a f (Xn ) − f ∗ = f (Xn ) − f (Xn+1 ) ≥ 3 a ∗ f (Xn+1 ) − f ≤ 1 − f (Xn ) − f ∗ . 3
To obtain, firstly, the convergence of the random search procedure (Xn : n ∈ N0 ), it is sufficient to show that for infinite many n ∈ N the event 4
:6 9 1 Qn ∈ a, ∩ Zn ∈ [0, ψ] 2
occurs. Then it is also true that for infinite many n ∈ N the event a f (Xn ) − f ∗ f (Xn+1 ) − f ∗ ≤ 1 − 3 occurs. Since for the remaining iteration stages n the inequality F (Xn+1 ) − f ∗ ≤ f (Xn ) − f ∗ holds, we get then the convergence of f (Xn ) towards f ∗ . For the event 4
9
1 Qn ∈ a, 2
:6
∩ Zn ∈ [0, ψ]
4 :6 9 1 to occur for infinite many n, it is sufficient that Qn ∈ a, infinite often occurs. 2 Indeed, there exists a p > 0 with :6 4 9 1 p Zn ∈ [0, ψ]An ≥ p P -a.s. on Qn ∈ a, 2 (cf. part 2 of the proof of Lemma 16.10). For the meaning of the above inequality, cf. Definition C.1, Lemma C.5, respectively. Fromthis the desired result can be derived. Hence, it is sufficient to show that for 1 an a ∈ 0, the event 2 4
:6 9 1 Qn ∈ a, 2
will occur for infinite many n ∈ N. However, the proof of this assertion is rather difficult, since the stochastic behavior of the sequence (Qn : n ∈ N0 ) cannot be described easily. In this context the following properties are of crucial importance:
288
16 Success/Failure-Driven Random Direction Procedures
(i) 1 P Qn+1 = γ2 Qn |An ≥ . 2 This follows from the fact that in a random direction procedure the probability 1 of failure is at least (cf. Theorem 15.5 and Lemma 16.1; note that f is 2 convex) and a failure in the n + 1-ten step implies the event {Qn+1 = γ2 Qn }. (ii) If c is chosen sufficiently large, a failure will occur in the n + 1-th step (i.e. Xn+1 = Xn and therefore Qn+1 = γ2 Qn ), provided that Qn > c.
The mathematical formulation of this fact is: P (Qn+1 = γ2 Qn |A) = 1P -a.s. on {Qn > 0}. This means that after entering the interval (c, ∞), the sequence (Qn : n ∈ N) will behave deterministically for some time, namely k times, where k = min{i ∈ N : γ2i Qn < c} (here, n = time point at which the algorithm enters the interval (c, ∞)). Consequently, after each entry of (Qn : n ∈ N0 ) into the interval (c, ∞) the sequence will leave this interval after a finite number of iteration steps. Therefore, the event {Qn < c} will occur for infinite many time points n ∈ N. Using probabilistic arguments, from property (i) we even obtain that for arbitrary a > 0 the event {Qn < a} occurs also for infinite many n ∈ N (cf. Lemma 16.8).
16 Success/Failure-Driven Random Direction Procedures
289
1 (iii) If Qn > , then in case of a failure in the n+1-th step we get Qn+1 = γ2 Qn > 2 1 γ2 . 2 In case of success we have Qn+1 =
|Xn − x ∗ | γ1 %n+1 (X0 , . . . , Xn ) 1 = γ1 Q n ≥ bγ1 Qn > γ1 b ∗ |Xn+1 − x | |Xn+1 − x ∗ | 2
(here, b is according to assumption (16.1), cf. Remark 16.1, part (2). 1 If therefore Qn > , then in the next step Qn+1 cannot be arbitrarily close to 2 zero. Hence, there exists a1 > 0 such that the events 6 4 1 (n ∈ N0 ) Qn > · Qn+1 < a1 2 are impossible, i.e. that their probability is zero (cf. Lemma 16.6). If it were now also known that the event {Qn ≥ a1 } occurs for infinite many n ∈ N, then we would be finished. Then the sequence (Qn : n ∈ N) infinite many leave the interval (0, a1 ), and according to part (ii), infinite often enters this interval. : However, then it would be necessary for (Qn : n ∈ N0 ) to be in the interval 9 1 immediately before each entry. Since this process occurs infinite often, a1 , 2 4 :6 9 1 the event, too, must occur Qn ∈ a1 , infinite often. This implies then, as 2 already mentioned, the convergence of the algorithm. The proof of the existence of an a1 > 0 such that the event {Qn ≥ a1 } occurs for infinite many n ∈ N is given in Lemmas 16.2 to 16.5. The basic idea for this is that that the event {Qn < a1 , . . . , Qn+K < a1 }K can occur only if the number of successes in the time interval [n, n + K] does not exceed a certain number, which is essentially proportional to K. The corresponding probability can be estimated using the Binomial distribution. To show a linear rate of convergence, it is necessary to evaluate in more detail how often the event :6 4 9 1 ∩ Zn ∈ [0, ψ] Qn ∈ a, 2 and therefore also the event a f (Xn ) − f ∗ f (Xn+1 ) − f ∗ ≤ 1 − 3 occurs.
290
16 Success/Failure-Driven Random Direction Procedures
For example, the following condition—unfortunately not known whether it holds—would be sufficient for the linear convergence: ⎛
⎞ n 1 ≥ α > 0⎠ = 1. P ⎝lim inf 1 n→∞ 2 Qi ∈[a,1/2] ∩ Zi ∈[0,ψ] i=1
For α = 1/3 this would mean, for example, that the event 4
9
1 Qn ∈ a, 2
:6
∩ Zn ∈ [0, ψ]
occurs “in the mean at every third step.” Then it follows for large n that a n f (x0 ) − f ∗ f (X3n ) − f ∗ ≤ 1 − 3 or a 1/3 n f (Xn ) − f ≤ 1 − f (x0 ) − f ∗ , 3 ∗
i.e., linear convergence. However, linear convergence is also obtained if ⎛
⎞ n 1 + 1{Q ≥c} ≥ α > 0⎠ = 1 P ⎝lim inf 1 i n→∞ n Qi ∈[a,1/2],Zi ∈[0,ψ]
(16.4)
i=1
is satisfied for a sufficiently large c = c(x0 ). The reason for this is that the event {Qn−1 < c, Qn ≥ c} can only occur if there is a search success in the n-th step. It can be shown that this success must be greater, the greater c. Moreover, it can be shown that the following inequality is true for sufficiently large values of c (cf. Lemma 16.11): f (XTn ) − f ∗ ≤ γ2Mn f (XTn−1 ) − f ∗ (where Tn is the n-th time of entry of the sequence (Qn : n ∈ N0 ) into the interval [c, ∞), and Mn denotes the n-th sojourn time of this sequence in the interval [c, ∞)). This means that the improvement of the function at the time of the n-th entry into the interval [c, ∞) has within one iteration step the same effect (note the factor γ2Mn ) as if the event f (Xi ) − f ∗ ≤ γ2 f (Xi−1 ) − f ∗
16 Success/Failure-Driven Random Direction Procedures
291
occurred at any time i = Tn , Tn + 1, . . . , Tn + Mn − 1. From this consideration and from : 4 9 6 1 a , Zi ∈ [0, ψ] ⊂ f (Xi+1 ) − f ∗ ≤ 1 − f (Xi ) − f ∗ Qi ∈ a, 2 3 the following inclusion can be derived (cf. the proof of Theorem 16.1): (
n i=1
(1
) = k
Qi ∈[a,1/2],Zi ∈[0,ψ]
⎫ ⎬ ⎭
⊂ f (Xn+1 ) − f ∗ ≤ γ k f (x0 ) − f ∗ a . with γ := max γ2 , 1 − 3
Then linear convergence follows, provided that (16.4) holds. To prove (16.4), it is necessary to make a suitable estimate of the average sojourn times 1 Mi n n
i=1
of the sequence (Qn : n ∈ N0 ) in the intervals (0, a) and 12 , c (cf. Lemma 16.12). The basic tool for this estimate is Lemma C.1. The following Lemmas 16.1 to 16.12 will serve exclusively the purpose to prepare the proof of Theorem 16.1. Lemma 16.1 For n ∈ N0 , define ( Qn :=
1n+1 (X0 ,...,Xn ) |Xn −x ∗ |
∞
if Xn = x ∗ else.
Then, for all starting points x0 ∈ D\D ∗ we have (1) P
;
{Qn < ∞} = 1
n∈N
(2) P (Qn+1 < Qn |An ) ≥ P (Qn+1 = γ2 Qn |An ) ≥ 1 ≥ P f (Xn+1 ) ≥ f (Xn )|An ≥ P -a.s. 2
292
16 Success/Failure-Driven Random Direction Procedures
(3) P (Xn+1 = Xn |An ) = P (Qn+1 = γ2 Qn |An ) = 1 P -a.s. on {Qn ≥ c} for all c ≥ 2(1 + 1/b)/10 (with b according to assumption (16.1)). For the notation, cf. Definition C.1 and Lemma C.2. Proof (1) For arbitrary n ∈ N and x0 , . . . , xn−1 ∈ D\D ∗ we have qn x0 , . . . , xn−1 , {x ∗ } = pn x0 , . . . , xn−1 , {x ∗ } (x, r) : xn−1 + rs = x ∗ = γ ⊗ λn (x0 , . . . , xn−1 , ·) 6 4 ∗ |x − xn−1 | ∗ · λ x , . . . , x , {|x − x |} = 0. γ n 0 n−1 n−1 |x ∗ − xn−1 | This yields qn x0 , . . . , xn−1 , D\{x ∗ } = 1 and therefore ⎛ P⎝
n 8
⎞ {Xi = x ∗ }⎠ =
i=0
···
D\{x ∗ }
qn (x0 , . . . , xn−1 , dxn ) . . . q1 (x0 , dx1 ) = 1.
D\{x ∗ }
This holds for arbitrary n ∈ N. ! D Therefore P {Xn = x ∗ } = 0. n∈N0
(2) Because of γ2 < 1, the first inequality is obvious. Theorem 15.5 yields qn x0 , . . . , xn−1 , f < f (xn−1 ) 1 = pn x0 , . . . , xn−1 , f > f (xn−1 ) ≤ 2 for all n ∈ N, x0 , . . . , xn−1 ∈ D\D ∗ and therefore 1 qn x0 , . . . , xn−1 , f ≥ f (xn−1 ) ≥ . 2 Hence, 1 P f (Xn+1 ) ≥ f (Xn )|An ≥ P -a.s. 2
16 Success/Failure-Driven Random Direction Procedures
293
The assertion follows now from P (f (Xn+1 ) ≥ f (Xn )|An ) = P (Xn+1 = Xn |An ) ≤ P (Qn+1 = γ2 Qn |An ) P -a.s. (3) Let x0 ∈ D\D ∗ be an arbitrary starting point. Furthermore, let b = b(x0 ) be chosen according to assumption (16.1). Then, for x, y ∈ Df (x0 ) with f (y) ≤ f (x), according to Remark 16.1, part (2) we have |y − x| ≤
1 |x − x ∗ |, b
and therefore |y − x| ≤ |y − x ∗ | + |x − x ∗ | ≤ (1 + 1/b)|x − x ∗ |. Hence, f ≤ f (x) ⊂ K x, (1 + 1/b)|x − x ∗ | . Choose now an arbitrary c ≥ 2(1 + 1/b)/ l0 , and select x1 , . . . , xn ∈ Df (x0 ) with
ln+1 (x0 , . . . , xn ) ≥ c. |xn − x ∗ |
Then, Pn+1 x0 , . . . , xn , f < f (xn ) ≤ Pn+1 x0 , . . . , xn , K xn , (1 + 1/b)|xn − x ∗ | ∗ (s, r) : xn + rs ∈ K xn , (1 + 1/b)|xn − x |) = γ ⊗ λn+1 (x0 , . . . , xn , ·) = γn+1 x0 , . . . , xn , 0, (1 + 1/b)|xn − x ∗ | = Tln+1 (x0 ,...,xn ) (λ0 ) 0, (1 + 1/b)|xn − x ∗ | 9 : |xn − x ∗ | ≤ λ0 0, (1 + 1/b)/c = λ0 0, (1 + 1/b) ln+1 (x0 , . . . , xn ) 9 : 1 ≤ λ0 0, l0 = 0, da λ0 [l0 , L0 ] = 1. 2
Because of P (Xn = Xn+1 |Xn = xn , . . . , X0 = x0 ) v = Pn+1 x0 , . . . , xn , f < f (xn ) = 0
294
16 Success/Failure-Driven Random Direction Procedures
and {Xn+1 = Xn } ⊂ {Qn+1 = γ2 Qn }, the assertion follows now.
Lemma 16.2 Let Qn be defined as in Lemma 16.2. Moreover, assume that assumption (16.1) of Theorem 16.1 holds, and let x0 denote an arbitrary starting point from D\D ∗ . Let b be chosen according to assumption (16.1) and Remark 16.1 such that for all α ∈ f ∗ , f (x0 ) the inequality inf |y − x ∗ | : f (y) = α ≥ b · sup |y − x ∗ | : f (y) ≤ α is satisfied. Then, with ρ1 :=
ln γ2 , for all n, s ∈ N and δ, ε > 0, we have ln γ2 − ln γ1
{Qn ≥ δ, Qn+1 < ε, . . . , Qn+s < ε} ⊂ {Qn ≥ δ, Qn+s < ε} ⊂
n+s
1{f (Xi−1 )>f (Xi )} < ρ1 s + ρ2 P -a.s.
i=n+1
where ρ2 :=
ln(δb) − ln ε . ln γ2 − ln γ1
Proof Let ω ∈ {Q n ≥ δ, Qn+s < ε}. Because of P X0 = x0 , f (X0 ) ≥ f (X1 ) ≥ f (X2 ) ≥ . . . = 1, without loss of generality we can assume that X0 (ω) und f (x0 ) ≥ f X1 (ω) ≥ f X2 (ω) ≥ . . . holds. Define e(ω) :=
n+s i=n+1
1{f (Xi+1 )>f (Xi )} (ω).
Obviously, e(ω) ≤ s. If, e(ω) ≥ ρ1 s + ρ2 , then e(ω) ≥
s ln γ2 + ln(bδ/ε) ln γ2 − ln γ1
and therefore ε . s − e(ω) ln γ2 + e(ω) ln γ1 ≥ ln δb ε This yields gγ1e(ω) γ2s−e(ω) ≥ . δ Because of f (Xn+s (ω)) ≤ f (Xn (ω)), we get
16 Success/Failure-Driven Random Direction Procedures
295
1 |Xn (ω) − x ∗ | b
|Xn+s (ω) − x ∗ | ≤ and therefore (cf. Definition 16.1) Qn+s (ω) = ≥
1n+s+1 (X0 (ω), . . . , Xn+s (ω)) |Xn+s (ω) − x ∗ | 1n+s+1 (X0 (ω), . . . , Xn+s (ω)) b |Xn (ω) − x ∗ |
1n+1 (X0 (ω), . . . , Xn (ω)) e(ω) s−e(ω) bγ1 γ2 |Xn (ω) − x ∗ | ε ε e(ω) s−e(ω) = Qn (ω)bγ1 γ2 ≥ Qn (ω) ≥ δ = ε δ δ =
in contradiction to Qn+s (ω) < . Lemma 16.3 Consider an arbitrary starting point x0 ∈ D
\ D∗.
For ε > 0, let
pε := inf{pn (x0 , . . . , Xn−1 , {f < f (xn−1 )}) : n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) ,
1n (x0 , . . . , xn−1 ) < ε}. |xn−1 − x ∗ |
Then, for all δ, ε > 0 and n, s ∈ N we have P ({Qn−1 ≥ δ} ∩
n+s−1 8
{Qi < ε} ≤
[p1 s+p2 ] j =1
i=1
s j p (1 − pε )s−j j ε
with p1 , p2 according to Lemma 16.2. Proof According to the Definition of pε for all n ∈ N we have P (f (Xn ) < f (Xn−1 )|An−1 ) ≥ Pε P -a.s. on {Qn−1 < ε}. Lemma 16.2 yields: P ({Qn−1 ≥ δ} ∩
≤P
(n+s−1
n+s−1 8
{Qi < })
i=1
)
1{f (Xi−1 )>f (Xi )} < p1 s + p2 ∩
i=n+1
Lemma C.4, part (2) applied to
n+s−1 8 i=1
! {Qi < ε} .
296
16 Success/Failure-Driven Random Direction Procedures
Zn = 1{f (Xi−1 )>f (Xi )} (Xn−1 , Xn ), T = n, n = s and K = [p1 s + p2 ], yields the assertion.
\ D∗.
Lemma 16.4 Consider an arbitrary starting point x0 ∈ D Let p be defined as in Lemma 16.3. Moreover, let the assumption (16.2) of Theorem 16.1 be fulfilled. Then there exists an ε > 0 with p ε > p1 = p 1−p
Proof Because of γ1 γ2
ln γ2 . ln γ2 − ln γ1
> 1, for continuity reasons there is δ˜ > 0 with p−δ˜ 1−p+δ˜ γ2
γ1
> 1.
This inequality is equivalent to p1 =
ln γ2 ˜ < p − δ. ln γ2 − ln γ1
According to assumption (16.2) (ii) there is an ε > 0 with γx,r {f < f (x)} ≥ p − δ˜ for all xDf (x0 ) with
r < L0 ε. |x − x ∗ |
Let now n ∈ N, x1 , . . . , xn−1 ∈ Df (x0 ) be chosen arbitrarily with < ε. Then we have
ln (x0 , . . . , xn−1 ) |xn−1 − x ∗ |
pn (x0 , . . . , xn−1 , {f < f (xn−1 )}) = (γ ⊗ λn (x0 , . . . , xn−1 , ·))({(s, r) : xn−1 + rs ∈ {f < f (xn−1 )
= γ ({s ∈ S : xn−1 + rs ∈ {f < f (xn−1 )}})λn (x0 , . . . , xn−1 , dr) R+
γxn−1 ,r {f < f (xn−1 )}λn (x0 , . . . , xn−1 , dr)
= R+
γxn−1 ,rln (x0 ,...,xn−1 ) {f < f (xn−1 )}λ0 (dr)
= R+
γxn−1 ,rln (x0 ,...,xn−1 ) {f < f (xn−1 )}λ0 (dr).
= [l0 ,L0 ]
16 Success/Failure-Driven Random Direction Procedures
297
rln (x0 , . . . , xn−1 ) < l0 ε and therefore |xn−1 − x ∗ |
For r ∈ [l0 , L0 ] we have
˜ γxn−1 ,rln (x0 ,...,xn−1 ) {f < f (xn−1 )} ≥ p − δ. From this it follows that pn (x0 , . . . , xn−1 , {f < f (xn−1 )}) ≥ p − δ˜ and therefore also pε ≥ p − δ˜ > p1 .
Lemma 16.5 Assume that the conditions of Theorem 16.1 are satisfied. Then for all starting points x0 ∈ D \ D ∗ there exists an a0 = a0 (x0 ) > 0 such that P (Qn ≥ a0 for infinite many n ∈ N0 ) = 1. Proof Let x0 ∈ D \ D ∗ be an arbitrary starting point. According to Lemma 16.4 there is an ε > 0 with pε > p1 . Define a0 := ε. From Lemma 16.3 it follows for all n ∈ N, s ∈ N and δ > 0 that P ({Qn−1 ≥ δ})
n+s−1 8
{Qi < a0 } ≤
[p1 s+p2 ] j =0
i=n
s j pa0 (1 − pa0 )s−j . j
Because of Pa0 > p1 , Lemma B.15 yields lim P ({Qn−1 ≥ δ}
s→∞
n+s−1 8
{Qi < a0 }) = 0.
i=n
By definition we have for all Qn > 0. Hence, for each n ∈ N we get P(
8
{Qi < a0 } = P ({Qn−1 > 0} ∩
i≥n
8
{Qi < a0 }
i≥n
⎛
= lim P ⎝{Qn−1 ≥ δ} ∩ δ→0
= lim
δ→0
8
⎞ {Qi < a0 }⎠
i≥n
lim P
s→∞
{Qn−1 ≥ δ}
n+s−1 8 i=n
for all n ∈ N we therefore have P(
8
i≥n
{Qi < a0 }) = 0.
!! {Qi < a0 }
= 0.
298
16 Success/Failure-Driven Random Direction Procedures
Hence, P(
;8
{Qi < a0 }) = 0.
n∈N i≥n
Taking the complement, we get the assertion.
Lemma 16.6 Let the assumptions of Theorem 16.1 be satisfied. Then, for all 1 starting points x0 ∈ D \ D ∗ there exists an a1 = a1 (x0 ) ∈ (0, ) with 2L0 P (Qn < a1 , Qn+1 >
1 1 |Ak ) = P (Qn > Qn+1 < a1 |Ak ) = 0 2L0 2L0
P -a.s. for all n ∈ N and k ≤ n. Proof
(
) γ2 γ1b 1 (1) Let k = n, and assume that a1 := m in , , , where b 2L0 2L0 L0 + 2L0 γ1 is chosen, according to assumption (16.1), such that for all α ∈ (f ∗ , f (x0 )] the inequality inf{|y − x ∗ | : f (y) = a} ≥ b sup{|y − x ∗ : f (y) = α} holds. Then for all x1 , . . . , xn ∈ Df (x0 ) with
ln+1 (x0 , . . . , xn ) < a1 |xn − x ∗ |
we have 6 4 1 d ln+2 (x0 , . . . , xn , xn+1 ) > qn+1 x0 , . . . , xn , xn+1 ∈ R : |xn+1 − x ∗ | 2L0 6 4 1 γ1 ln+1 (x0 , . . . , xn ) ≤ qn+1 x0 , . . . , xn , xn+1 : > |xn+1 − x ∗ | 2L0 6 4 1 γ1 a1 |x0 , . . . , xn | > ≤ qn+1 x0 , . . . , xn , xn+1 : |xn+1 − x ∗ | 2L0 = qn+1 (x0 , . . . , xn , {xn+1 : |xn+1 − x ∗ | < 2L0 γ1 a1 |xn − x ∗ |}) × qn+1 (x0 , . . . , xn , {xn+1 : |xn − x ∗ | − |xn − xn+1 | < 2L0 a1 γ1 |xn − x ∗ |}) = qn+1 (x0 , . . . , xn , {xn+1 : |xn+1 − xn | > (1 − 2L0 γ1 a1 )|xn − x ∗ |} = 0
16 Success/Failure-Driven Random Direction Procedures
299
since on the one hand qn+1 (x0 , . . . , xn , {xn+1 : |xn+1 − xn | ≤ L0 ln+1 (x0 , . . . , xn )}) = 0. However, because of ln+1 (x0 , . . . , xn ) 1 < a1 ≤ |xn − x ∗ | LO + 2γ1 L0 the inequality (1 − 2L0 γ1 a1 )|xn − x ∗ | >
1 − 2L0 γ1 a1 ln+1 (x0 , . . . , xn ) a1
≥ L0 ln+1 (x0 , . . . , xn ) is satisfied.
This yields P ({Qn+1 >
1 }|An ) = 0P -a.s. on {Qn < a1 }. 2L0
Because of {Qn < a1 } ∈ An , for arbitrary An ∈ An we have P ({Qn < a1 }∩{Qn+1 >
1 }∩An ) = 2L0
An ∩{Qn
1 |An )dP = 0. 2L0
Lemma C.2 yields then P (Qn < a1 , Qn+1 >
1 |An ) = 0 P -a.s. 2L0
For k < n it follows that P (Qn < a1 , Qn+1 >
1 1 |An ) = E(P (Qn < a1 , Qn+1 > |An )|Ak ) = 0 2L0 2L0
and thus the first part of the assertion. 1 ln+1 (x0 , . . . , xn ) > (2) Select x1 , . . . , xn ∈ Df (x0 ) such that . ∗ |xn − x | 2L0 Then, {xn+1 ∈ Rd :
ln+2 (x0 , . . . , xn , xn+1 ) < a1 } |xn+1 − x ∗ |
300
16 Success/Failure-Driven Random Direction Procedures
= {xn+1 : {xn+1 : = {xn+1 :
γ2 ln+1 (x0 , . . . , xn ) < a1 } ∩ {xn+1 = xn } ∪ |xn − x ∗ | γ1 ln+2 (x0 , . . . , xn ) < a1 } ∩ {xn+1 = xn } |xn+1 − x ∗ | γ1 ln+2 (x0 , . . . , xn ) < a1 } ∩ {xn+1 = xn }, since |xn+1 − x ∗ |
γ2 ln+2 (x0 , . . . , xn ) 1 > γ2 ≥ a1 . ∗ |xn − x | 2L0 Consequently, we get 6 4 γ1 ln+1 (x0 , . . . , xn ) qn+1 x0 , . . . , xn , xn+1 : < a1 |xn+1 − x ∗ | 4 ln+2 (x0 , . . . , xn , xn+1 ) = pn+1 x0 , . . . , xn , xn+1 : |xn+1 − x ∗ | < a1 , f (xn+1 ) < f (xn )}) . 1 If f (xn+1 ) < f (xn ), then |xn+1 | ≤ |xn − x ∗ | b (cf. Remark 16.1, part 2) and therefore 4 6 γ1 ln+1 (x0 , . . . , xn ) xn+1 : < a , f (x ) < f (x ) 1 n+1 n |xn+1 − x ∗ | 6 4 γ1 ln+1 (x0 , . . . , xn ) = ∅, since b < a ⊂ xn+1 : 1 |xn+1 − x ∗ | γ1 ln+1 (x0 , . . . , xn ) 1 ≥ a1 . b > γ1 b |xn+1 − x ∗ | 2L0 The remainder is now done analogously to part 1. Lemma 16.7 Let the assumptions of Theorem 16.1 be satisfied. Define B :=9 [b1 , b2 ] with :0 < b1 < b2 . ln b2 − ln b1 + 1. Then for all stopping times T with regard to Let K := − ln γ2 (An : n ∈ N0 )(for the definition, cf. [1, p. 215]), for all i ∈ N and all starting points x0 ∈ D \ D ∗ we have ⎛
T8 +j K
P⎝
j =T
⎞ {Qj ∈ B}|AT ⎠ ≤ (1 − 2−K )i P -a.s.
16 Success/Failure-Driven Random Direction Procedures
301
Proof (1) Let n ∈ N be arbitrary, but fixed. For j ∈ N0 , define Aj := {Qn+Kj +1 = γ2 Qn+Kj } ∩ · · · ∩ {Qn+K(j +1) = γ2 Qn+K(j +1)−1 }. A simple calculation yields K = inf{n ∈ N : b2 γ2n < b1 }. From this it follows (where Aj denotes the complement of Aj ) i−1 8 j =0
Aj ⊃
n+Ki 8
{Qj ∈ B} for all i ∈ N, since: Let ω ∈
j =n
n+Ki 8
i=1 8
{Qj B} with ω ∈ /
j =n
j =0
Aj ;
then there is a j ∈ {0, . . . , i − 1} with ω ∈ Aj . Because of Qn+Kj ∈ B, it follows that ω ∈ {Qn+Kj ∈ B}∩Aj and therefore Qn+k(j +1) < b1 , hence, Qn+k(j +1) (ω) ∈ / B. However, because of j + 1 ≤ i, we find Qn+k(j +1) (ω) ∈ B. 1 (2) Because of P (Qn+1 = γ2 Q2 |Aj ) ≥ P -a.s. for all j ∈ N (cf. Lemma 16.1), 2 we get P ({Qj +1 = γ2 Qj } ∩ {Qj +2 = γ2 Qj +1 }|Aj ) = E(E(l{Qj +1 =γ2 Qj } l{Qj +2 =γ2 Qj +1 } |Aj +1 )|Aj ) = E(l{Qj +1 =γ2 Qj } P (Qj +2 = γ2 Qj +1 |Aj +1 )|Aj ) ≥
1 P (Qj +1 = γ2 Qj |Aj ) Continuing this way, one obtains 2 P(
j +L−l 8 m=j
1 {Qm+1 γ2 Qm }|Aj ) ≥ ( )L P -a.s. 2
for all L ∈ N and j ∈ N. When considering the definition of Aj , it follows that 1 P (Aj |An+Kj ) ≥ ( )K 2 and therefore P (Aj |An+Kj ) ≤ 1 − 2−K P -a.s. for all j ∈ N0 .
302
16 Success/Failure-Driven Random Direction Procedures
From this we get again P (A0 ∩ A1 |An ) = E(lA · P (A1 |An+k )|An ) 0
≤ (1 − 2−K )P (A0 |An ) ≤ (1 − 2−K )2 P -a.s. Proceeding this way, we finally get P(
i−1 8
Aj |An ) ≤ (1 − 2−K )i P -a.s.
j =0
Now the assertion follows from Lemma C.3. Lemma 16.8 Let the assumptions of Theorem 16.1 be satisfied, and select an arbitrary starting point x0 ∈ D \ D ∗ . Let c ≥ 2(1 + 1/b)/ l0 (with b according to assumption (16.1)) be arbitrary. Denote by Tn , Mn , resp., the n-th time of entry, the sojourn time of the sequence (Qn : n ∈ N0 ) in the interval [c, ∞) (for the definition, see Definition C.2). Then, (1) 9
ln QTn − ln c Mn = 1 + − ln γ2
: P -a.s. on {Tn < ∞}
(2) P (Qn ≤ α for infinite many n ∈ N0 ) = 1 for all α > 0. Proof (1) (i) To prove the first assertion, by complete induction we prove next to the following assertion: For all k ∈ N and any stopping time T with regard to (An : n ∈ N0 ) we have ! k 8 {QT +i = γ2 QT +i−1 |AT } = 1 P -a.s. for {γ2k−1 QT ≥ c}. P i=1
k=1: From Lemma 16.1 it follows for all n ∈ N0 that P (Qn+1 = γ2 Qn |An ) = 1 P -a.s. on {Qn ≥ c}. Lemma C.3 yields then the basis of the induction.
16 Success/Failure-Driven Random Direction Procedures
303
k −→ k + 1 tδ . x∈B
(18.22a) For rt () we now have the following result (Fig. 18.2):
Fig. 18.2 Shell for D = R2+ , F0 (y) := y12 + y22
References
349
Lemma 18.3 With condition (18.21) and assuming stochastic independence of Zt+1 and Fˆt , given Xt = xt , it holds rt () ≤ sup πt (x, B+δ \ B ),
t > tδ .
(18.22b)
x∈B
Note 18.1 The set B+δ \ B is a shell of the level set B+δ = {y ∈ D : F0 (y) ≤ + δ}:
References 1. Devroye, L., Krzyzak, A.: Random Search Under Additive Noise, pp. 383–417. Kluwer Academic Publishers, Boston (2002) 2. Marti, K.: Minimizing noisy objective function by random search methods. ZAMM 62, 377–380 (1982) 3. Marti, K.: Stochastic Optimization Methods, 2nd edn. Springer, Berlin (2008). https://doi.org/ 10.1007/978-3-540-79458-5 4. Polyak, B.: Introduction to optimization. Translation series in mathematics and engineering. Optimization Software, New York (1987)
Appendix A
Properties of the Uniform Distribution on the Unit Sphere
Let denote S := {x ∈ Rd : |x| = 1} the unit sphere in the space Rd (d ∈ N). Moreover, for d ≥ 2 let T : R+ × [o, π ]d−2 × [o, 2ϕ] → Rd be the d-dimensional polar coordinate representation, i.e., ⎞ r cos ϕ1 ⎟ ⎜ r sin ϕ cos ϕ 1 2 ⎟ ⎜ ⎟ ⎜ . ⎟ ⎜ .. T (r, ϕ1 , . . . , ϕd−1 ) := ⎜ ⎟ ⎟ ⎜ ⎝ r sin ϕ1 sin ϕ2 . . . sin ϕd−2 cos ϕd−1 ⎠ r sin ϕ1 sin ϕ2 . . . sin ϕd−2 cos ϕd−1 ⎛
and define T1 : [o, π ]d−2 × [o, 2π ] → Rd , T1 (ϕ1 , . . . , ϕd−1 ) := T (1, ϕ1 , . . . , ϕd−1 ).
We know that T1 ([o, π ]d−2 × [o, 2π ]) = S. Definition A.1 The probability measure γ on (S, B|S), defined by (i) γ (B) := 12 −1 (B) + 12 1 (B) for d = 1 (ii) γ (B) := (d/2) sind−2 ϕ1 sind−3 ϕ2 . . . sin ϕd−2 d(ϕ1 , . . . , ϕd−1 ) 2π d/2 −1(B)
for d ≥ 2
T1
is called the normalized uniform distribution on S. Obviously, this is a measure. The fact that this measure is normalized follows from
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4
351
352
A Properties of the Uniform Distribution on the Unit Sphere
π
π 2π
0
2π d/2 (d/2)
sin ϕd−2 dϕd−1 . . . dϕ1 =
... 0
0
Furthermore, from the definition of γ , for each nonnegative function g : S → R we have
π
(d/2) g(s)γ (ds) = 2π d/2
π 2π ...
0
S
g(T1 (ϕ1 , . . . , ϕd−1 ))| 0
dT1 |dϕd−1 . . . dϕ1 , dϕ
0
d−2 1 ϕ1 sind−3 ϕ2 . . . sin ϕd−2 . where | dT dϕ | := sin
Lemma A.1 Given a nonnegative, measurable function h : Rd → R, we have
2π d/2 (d/2)
h(x)λd (dx) = Rd
2π d/2 (d/2)
=
f d−1 h(rs)γ (ds)λ1 (dr) R+ S
f d−1 h(rs)λ1 (dr)γ (ds). S R+
Proof Using the d-dimensional substitution rule, we get
h(x)λd (dx) =
h(T (r, ϕ1 , . . . , ϕd−1 ))| T −1 (Rd )
Rd
dT |λd (d(r, ϕ)). d(r, ϕ)
dT 1 Because of | d(r,ϕ) | = r d−1 sind−2 ϕ1 sind−3 ϕ2 . . . sin ϕd−2 = r d−1 | dT dϕ | and due to T (r, ϕ1 , . . . , ϕd−1 ) = rT1 (ϕ1 , . . . , ϕd−1 ) , we find
∞ π
h(x)λ (dx) = d
Rd
π 2π r d−1 h(rT1 (ϕ1 , . . . , ϕd−1 ))|
... 0
0
2π d/2 = (d/2)
0
dT1 |dϕd−1 . . . dϕ1 dr dϕ
0
∞
r d−1 h(rs)γ (ds)dr. 0 S
Thus, the first equality is shown, and the second one follows from Fubini’s theorem. Corollary A.1 Let X be a d-dimensional random vector such its distribution PX has a Lebesgue density g.
A Properties of the Uniform Distribution on the Unit Sphere
Defining Y :=
X |X| and
353
R := |X| , we get
(1) P ((Y, R) ∈ C) =
2π d/2 (d/2)
r d−1 g(rs), γ ⊗ λ1 (d(s, r)) C
C ∈ (B1 |R+ ). (2) If g(x) = h(|x|) (with a measurable function h : R → R), then Y and R are independent. Moreover, PY = γ . Proof It is sufficient to show the first assertion for product sets C = A × B(A ∈ Bd |S, B ∈ B1 |R) , since those form an intersection-stable generator of the product−σ −algebra (Bd |S) ⊗ (B1 |R+ ) . Hence, let C = A × B . x Furthermore, define D := {x ∈ Rd \{0} : |x| ∈ A, |x| ∈ B}. Because of {(Y, R) ∈ C} = {X ∈ D} , we get
P ((Y, R) ∈ C) =
g(x)λd (dx) D
=
2π d/2 (d/2)
r d−1 1D (rs)g(rs)drγ (ds) S R+
=
2π d/2 (d/2)
r d−1 1A (s)1B (r)g(rs)drγ (ds) S R+
=
2π d/2 (d/2)
r d−1 g(rs)drγ (ds) A R+
2π d/2 = (d/2)
r d−1 g(rs)γ ⊗ λ1 (d(s, r)). A×B
Assertion (2) follows now directly from the first one, since g(rs) = h(|rs|) = h(r) and
r d−1 h(r)dr = R+
(d/2) 2π d/2
according to A.2.
354
A Properties of the Uniform Distribution on the Unit Sphere
Lemma A.2 If T : Rd → R is an orthogonal mapping , then T (γ ) = γ . Proof Because of |T (x)| = |x| for arbitrary x ∈ Rd , we have T (S) ⊂ S. Since T is bijective and the mapping T −1 is also orthogonal, it follows that T (S) = S. Let X denote a d-dimensional standardized normal distributed random vector. Then PX has the density |x|2 . fX (x) = (2π )−d/2 exp − 2 X According to A.1, (2), we have PY = γ , where Y = |X| . It is known that the random vector T (X) has the density
−1 dT fX (T −1 (y)) = fX (y); fT (X) (y) = dy indeed, because of the orthogonality of T we have |
dT −1 | = 1 and |T −1 | = |y|. dy
Thus, T (X) is also a d-dimensional standardized normal distributed random vector. Because of T(
X 1 T (X) )= T (X) = , |X| |X| |T (X)|
we get T (γ ) = T (Pγ ) = PT (Y ) = PT (X/|X|) = P T (X) = γ , |T (X)|
which yields the assertion.
Lemma A.3 Consider a cone C(α, x) with the axis of symmetry x and the apex angle 2α (cf. Definition B.1). Then for all x, y = 0 and α ∈ [0, π ] we have γ (S ∩ C(α, x)) = γ (S ∩ C(α, y)). Proof Consider x, y = 0 and α ∈ [0, π ]. Without loss of generality we may assume that |x| = |y| = 1 . Let T denote the reflection at the hyperbola H through the point x+y 2 and with the normal vector x − y, i.e.
A Properties of the Uniform Distribution on the Unit Sphere
H = {z ∈ R : (x − y) d
T
x+y z− 2
355
= 0}.
We know that T is orthogonal, hence, T = T −1 , moreover, T (x) = y. Because of x T z = (T (x))T T (z) = y T T (z), we have z ∈ S∩C(α, x) and therefore x T z ≤ cos α if and only if T (z) ∈ S∩C(α, y). This yields T (S ∩ C(α, x)) = S ∩ C(α, y). Now the assertion follows with Lemma (A.2) . Definition A.2 For given x ∈ Rd , r > 0 and B ∈ Bd define γx,r (B) := γ ({s ∈ S : x + rs ∈ B}). The probability measure γx,r is called the uniform distribution on the sphere around x and with radius r. In particular we have γ0,1 = γ .
Appendix B
Analytical Tools
At first we will compile some characteristics of convex functions and convex sets. Lemma B.1 Let f : Rd → R. Then the following statements are equivalent: (i) f is convex (y) (z) (ii) f (x)−f ≥ f (x)−f for all x, z ∈ Rd , x = z |x−y| |x−z| and for all y between x and z (i.e. y = (1 − t)x + tz with t ∈ (0, 1)). (y) (z) (iii) f (x)−f ≥ f (x)−f for all x, z ∈ Rd , x = z |x−y| |x−z| and any y between x and z. Proof To prove the first equivalence, let x, z ∈ Rd , x = z and y = (1 − t)x + tz with t ∈ (0, 1). From the continuity of f it follows that f (y) ≤ (1 − t)f (x) + tf (z), therefore f (x) − f (y) ≥ t (f (x) − f (z)) and thus f (x) − f (y) f (x) − f (z) f (x) − f (y) = ≥ . |x − y| |x − z| |x − z| If, conversely, (ii) is satisfied, it follows that f ((1 − t)x + tz) = f (y) ≤ f (x) − |z − y|
f (x) − f (z) |x − z|
= f (x) − t (f (x) − f (z)) = (1 − t)f (x) + tf (z), i.e. f is convex. The equivalence of (i) and (iii) is proven analogically.
Corollary B.1 Let f : Rd → R be convex and differentiable.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 K. Marti, Optimization Under Stochastic Uncertainty, International Series in Operations Research & Management Science 296, https://doi.org/10.1007/978-3-030-55662-4
357
358
B Analytical Tools
Then we have for all x, z ∈ Rd (∇f (z))T (x − z) ≤ f (x) − f (z) ≤ (∇f (x))T (x − z) Proof Let first be d = 1 and let x, z ∈ Rd . Without loss of generality let z > x. For all y ∈ (x, z) Lemma B.1, (ii) yields the inequality and thus f (x) − f (z) ≤ (z − x) lim y↓x
f (x)−f (z) |z−x|
≤
f (x)−f (y) |y−x|
f (x) − f (y) = f (x)(x − z). |y − x|
The inequality f (x) − f (z) ≥ (f (z))(x − z) can be shown analogically. Let now d ∈ N be arbitrary. Let x, z ∈ Rd be arbitrary, but fixed. Let g(t) := f ((1 − t)x + tz). From the differentiability of f follows the differentiability of g. If f is convex, g is also convex. Therefore −g (1) ≤ g(0) − g(1) ≤ −g(0) , i.e. (∇f (z))T (x − z) ≤ f (x) − f (z) ≤ (∇f (x))T (x − z). Rd
Lemma B.2 Let D ⊂ be open and convex, and let f : D → R be convex. Then f is continuous. Lemma B.3 Let D ⊂ Rd be open and convex, and let f : D → R be twice continuously differentiable. Then f is convex if and only if the Hessian Matrix of f is positive semidefinite definite in all points x ∈ D. Lemma B.4 Let D ⊂ Rd be convex and let f : D → R be continuous and convex. Let there be an aˆ ∈ R such that the set {x ∈ D : f (x) ≤ a} ˆ is not empty and bounded. Then the sets {x ∈ D : f (x) ≤ a} are bounded and convex for any a ≥ inf{f (x) : x ∈ D}. Lemma B.5 Let A ⊂ Rd be convex and let x ∈ rdA. Then A has supporting hyperplane at x, i.e. there is a z ∈ Rd \{0} with A ⊂ x + {y ∈ Rd : y T z ≤ 0}. Lemma B.6 Let D ⊂ Rd be convex. Let f : D → R be convex. Then we have for all x ∈ D with f (x) > inf{f (z) : z ∈ D}: {y ∈ D : f (y) = f (x)} ⊂ rd{y ∈ D : f (y) ≤ f (x)}
B Analytical Tools
359
Proof Let x ∈ D with f (x) > inf{f (z) : z ∈ D} and let r > 0 be arbitrary. Then it is trivial that we have for y ∈ D with f (y) = f (x): K(y, r) ∩ {u ∈ D : f (u) ≤ f (x)} = ∅. But we also have K(y, r) ⊂ (Rd \{u ∈ D : f (u) ≤ f (x)} = ∅. For if this intersection were empty, then we would have K(y, r) ⊂ {u ∈ D : f (u) ≤ f (x)}. Now Lemma B.1 yields a contradiction to the convexity of f . For since we have f (x) > inf{f (u) : u ∈ D}, there is a z ∈ D with f (x) > f (z). r If we now choose y1 := y + 2|y−z| (y − z), then y1 ∈ K(y, r), therefore f (y1 ) ≤ f (x) = f (y). r (z − y1 )), it follows from Since y lies between y1 and z (for we have y = y1 + 2|y−z| Lemma B.1
f (y1 ) − f (y) f (y) − f (z) ≥ , therefore |y1 − y| |y − z| f (y1 ) ≤ f (y) +
r f (y) − f (z) > f (y). 2 |y − z|
Definition B.1 T
x y (1) For x, y ∈ Rd \{0} let ϕ(x, y) := arccos |x||y| (angle between x and y) d (2) For ϕ ∈ [0, π ] and x ∈ R \{0} let C(ϕ, x) := {y ∈ Rd : ϕ(x, y) ≤ ϕ} (a cone with the “axis of symmetry” x and the “apex angle” 2ϕ) (3) For ϕ ∈ [0, π ], x ∈ Rd \{0} and r > 0 let C(ϕ, x, r) := {y ∈ Rd : ϕ(x, y) ≤ ϕ, |y| ≤ r} (4) For ϕ ∈ [0, π ], x ∈ Rd \{0}, and r1 , r2 > 0 let C(ϕ, x, r1 , r2 ) := {y ∈ Rd : ϕ(x, y) ≤ ϕ, r1 ≤ |y| ≤ r2 }
Lemma B.7 Let (At : t ∈ K) be a family of invertible d × d matrices with a compact index set K. Let the representation t → At be continuous. Then: −1 (1) ∀ φ > 0 ∃ ϕ > 0 ∀ x ∈ Rd \*{0}, t ∈ K : C(φ, A−1 t x) ⊂ At (C(φ, x)) C |x| ≤ |At x| ≤ λmax |x|, where λmin := (2) ∀ x ∈ Rd \ {0}, t ∈ K : λmin t t t
min{λ : λ eigenvalue of ATt At } and λmax := max{λ : λ eigenvalue of ATt At } t
360
B Analytical Tools
(3) ∀ φ > 0 ∃ ϕ > 0 ∀ x ∈ Rd \ {0}, t ∈ K : r1 r2 −1 C(ϕ, A−1 t x, √ min , √ max ) ⊂ At (C(φ, x, r1 , r2 )) λt
λt
and λmax according to (2). with λmin t t Proof (i) ∀ φ > 0 ∃ ϕ > 0 ∀ x, y = 0, t ∈ K : [ϕ(A−1 t x, y) ≤ ϕ ⇒ ϕ(x, At y) ≤ φ] ⇐⇒ ∀ φ > 0 ∃ ϕ > 0 ∀ x, y = 0, t ∈ K : [ϕ(x, y) ≤ ϕ ⇒ ϕ(At x, At y) ≤ φ] ⇐⇒ ∀ φ > 0 ∃ ϕ > 0 ∀ x, y with |x| = |y| = 1, t ∈ K : (At x)T At y [x T y ≥ 1 − ϕ ⇒ arccos ≤ φ] |At x||At y| ⇐⇒ ∀ > 0 ∃ δ > 0 ∀ x, y with |x| = |y| = 1, t ∈ K : [x T y ≥ 1 − δ ⇒
(At x)T At y ≤ φ] |At x||At y|
For an arbitrary d × d matrix A and for arbitrary x, y ∈ Rd it follows that 1−
(Ax)T Ay (At x)T At y ≤ |1 − | |Ax||Ay| |At x||At y| =
Ax T 1 ||Ay| − |Ax| − ( ) A(y − x)| ≤ |Ay| |Ax|
≤
Ax T 1 (||Ay| − |Ax|| + |( ) A(y − x)|) ≤ |Ay| |Ax|
≤
2|A(y − x)| 1 (|A(y − x)| + |A(y − x)|) = ≤ |Ay| |Ay|
≤
2|A||y − x| , |Ay|
where |A| stands for the Euclidean norm of the matrix A. For t ∈ K and y ∈ Rd let g(t, y) := |At y|. From the continuity of the representation t → At follows the continuity of the representation g. From the compactness of K × {y ∈ Rd : |y| = 1} = 1 and the invertibility of the matrices At it follows that there is a c > 0 with inf{|At y| : t ∈ K, |y| = 1} ≥ c. From the continuity of the representation t → At it follows also that the representation t → |At | is continuous and thus that there is an R < ∞ with
B Analytical Tools
361
sup{|At | : t ∈ K} ≤ R. For arbitrary x, y ∈ Rd with |x| = |y| = 1 and arbitrary t ∈ K and because of |x − y|2 = |x|2 + |y|2 − 2x T y = 2(1 − x T y) this now yields the inequality 1−
* (At x)T At y 2R ≤ 2(1 − x T y). |At x||At y| c
Thus the assertion is proven.

(ii) From the invertibility of A_t it follows that A_tᵀA_t is a positive definite symmetric matrix. There are therefore a diagonal matrix Λ with positive diagonal elements and an orthogonal matrix S with A_tᵀA_t = SᵀΛS. Hence |A_t x|² = xᵀ(A_tᵀA_t)x = (Sx)ᵀΛ(Sx). Since the diagonal elements of Λ are exactly the eigenvalues of A_tᵀA_t, it follows that

λ_t^min·|Sx|² ≤ |A_t x|² ≤ λ_t^max·|Sx|².

Because of the orthogonality of S we have |Sx| = |x|. This yields the assertion.

(iii) If r1/√(λ_t^min) ≤ |y| ≤ r2/√(λ_t^max), the inequality r1 ≤ |A_t y| ≤ r2 follows from (2). Together with (1) this yields the assertion.
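A quick numerical confirmation of part (2) of Lemma B.7 (a sketch in Python/NumPy; the particular matrix is an arbitrary choice): the ratio |Ax|/|x| must lie between the square roots of the extreme eigenvalues of AᵀA.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 5.0 * np.eye(4)   # generically invertible test matrix
lam = np.linalg.eigvalsh(A.T @ A)               # eigenvalues of A^T A, ascending
for _ in range(1000):
    x = rng.normal(size=4)
    ratio = np.linalg.norm(A @ x) / np.linalg.norm(x)
    assert np.sqrt(lam[0]) - 1e-9 <= ratio <= np.sqrt(lam[-1]) + 1e-9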
Lemma B.8 Let x, z ∈ Rd and r > 0 with r < |x − z|. Then with ϕ := arcsin(r/|x − z|):
(i) for all y ∈ x + C(ϕ, z − x) there is a t > 0 with |x + t(y − x) − z| ≤ r;
(ii) x + C(ϕ, z − x, |z − x| − r) ⊂ conv({x} ∪ K(z, r)) (conv A = convex hull of A).

Proof
(i) Since y lies in x + C(ϕ, z − x), it follows that φ(y − x, z − x) ≤ arcsin(r/|x − z|) < π/2 and hence (y − x)ᵀ(z − x) > 0. Let t := (z − x)ᵀ(y − x)/|y − x|², and let z' := x + t(y − x) and u := x + (1/t)(z − x) (z' is the projection of z onto the straight line through x and y, and y is the projection of u onto this straight line). Then (y − x)ᵀ(u − y) = (z' − x)ᵀ(z − z') = 0, which yields

|u − y|² + |y − x|² = |u − x|²,

hence

|y − x|²/|u − x|² + |u − y|²/|u − x|² = 1 and |z' − x|²/|z − x|² + |z − z'|²/|z − x|² = 1.

According to the definition of z' and u, we have

|y − x|/|u − x| = t|y − x|/|z − x| = |z' − x|/|z − x|.

From this and from the above equalities it follows that

|u − y|/|u − x| = |z − z'|/|z − x|

and thus |z − z'| = |z − x|·|u − y|/|u − x|. Because of
√(1 − |u − y|²/|u − x|²) = |y − x|/|u − x| = (y − x)ᵀ(y − x)/(|u − x|·|y − x|)
= ( (u − x)ᵀ(y − x) + (y − u)ᵀ(y − x) )/(|u − x|·|y − x|) = (u − x)ᵀ(y − x)/(|u − x|·|y − x|)
= cos φ(u − x, y − x) = cos φ(z − x, y − x),

it follows that

sin φ(z − x, y − x) = |u − y|/|u − x|

and thus

|u − y|/|u − x| ≤ sin ϕ = r/|x − z|,

hence |z − z'| ≤ r.

(ii) Let y ∈ x + C(ϕ, z − x, |z − x| − r) be arbitrary. According to part (i), there is a t > 0 with |x + t(y − x) − z| ≤ r. Let z' := x + t(y − x). Because of |z' − z| ≤ r we have

|z' − x| ≥ |z − x| − |z − z'| ≥ |z − x| − r

and thus |y − x|/|z' − x| ≤ (|z − x| − r)/(|z − x| − r) = 1. With
α := |y − x|/|z' − x| we have

y = x + (y − x) = x + (|y − x|/|z' − x|)(z' − x) = αz' + (1 − α)x.

Since α ∈ [0, 1] and z' ∈ K(z, r), we have y ∈ conv({x} ∪ K(z, r)).
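Part (i) of Lemma B.8 is easy to test numerically (a sketch; the points x, z and the radius r are arbitrary choices): for y in the cone x + C(ϕ, z − x), the foot of the perpendicular from z onto the ray through x and y, with t = (z − x)ᵀ(y − x)/|y − x|² as in the proof, lies within distance r of z.

import numpy as np

rng = np.random.default_rng(2)
x, z, r = np.zeros(3), np.array([4.0, 0.0, 0.0]), 1.0
ph = np.arcsin(r / np.linalg.norm(x - z))        # phi = arcsin(r/|x - z|)
for _ in range(1000):
    y = x + rng.normal(size=3)
    c = (y - x) @ (z - x) / (np.linalg.norm(y - x) * np.linalg.norm(z - x))
    if np.arccos(np.clip(c, -1.0, 1.0)) > ph:
        continue                                  # y lies outside x + C(ph, z - x)
    t = (z - x) @ (y - x) / ((y - x) @ (y - x))   # foot of the perpendicular from z
    assert t > 0 and np.linalg.norm(x + t * (y - x) - z) <= r + 1e-9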
Lemma B.9 Let D ⊂ Rd be convex. Let f : D → R be convex and continuous. Let f∗ := inf{f(x) : x ∈ D} > −∞, let the minimum set D∗ := {x ∈ D : f(x) = f∗} be bounded, and let there exist an x∗ ∈ D∗ ∩ D⁰. Then:

(i) for all x ∈ D\D∗ there is a φ > 0 with

x + C(φ, x∗ − x, (1/2)|x∗ − x|) ⊂ { y ∈ D : f(x) − f(y) ≥ (1/2)·(|y − x|/(3|x∗ − x|))·(f(x) − f∗) };

(ii) if ã > f∗ and b̃ > 0 exist in such a way that the condition

inf{|y − x∗| : y ∈ D, f(y) = f(x)} ≥ b̃·sup{|y − x∗| : y ∈ D, f(y) = f(x)}

is satisfied for all x ∈ {y ∈ D : f(y) ≤ ã}\D∗, then we have furthermore: for all a > f∗ there is a φ ∈ (0, π/2] with

x + C(φ, x∗ − x, (1/2)|x∗ − x|) ⊂ { y ∈ D : f(x) − f(y) ≥ (1/2)·(|y − x|/(3|x∗ − x|))·(f(x) − f∗) }

for all x ∈ {y ∈ D : f(y) ≤ a}\D∗.

Proof Because of x∗ ∈ D⁰ there is a ρ > 0 with K(x∗, ρ) ⊂ D. For x ∈ D let r(x) := inf{|y − x∗| : y ∈ D, f(y) = f(x)}. From the continuity of f it follows that r(x) > 0 for all x ∈ D\D∗. For x ∈ D\D∗ let ρ(x) := min{ρ, r(x)} and

φ(x) := arcsin( ρ(x)/(2|x − x∗|) ).

Let now x ∈ D\D∗ be arbitrary, and let z ∈ K(x∗, (1/2)ρ(x)), z ≠ x∗. Let x̃ := x∗ + (ρ(x)/|z − x∗|)(z − x∗). Because of |x̃ − x∗| = ρ(x) > 0 we have f(x̃) ≤ f(x). For if f(x̃) > f(x), then there would be a t ∈ [0, 1) with f(x∗ + t(x̃ − x∗)) = f(x) and thus

ρ(x) ≤ r(x) ≤ |x∗ + t(x̃ − x∗) − x∗| = t|x̃ − x∗| = tρ(x),

i.e. ρ(x) = 0, which is in contradiction to ρ(x) > 0. It then follows from Lemma B.1 that

(f(z) − f∗)/|z − x∗| ≤ (f(x̃) − f∗)/|x̃ − x∗| ≤ (f(x) − f∗)/ρ(x)

and thus

f(z) − f∗ ≤ (1/2)(f(x) − f∗).

Let now y ∈ x + C(φ(x), x∗ − x, (1/2)|x∗ − x|) be arbitrary. According to Lemma B.8 we now have, because of the convexity of D,
x + C(φ(x), x∗ − x, (1/2)|x∗ − x|) ⊂ conv({x} ∪ K(x∗, (1/2)ρ(x))) ⊂ D.

Likewise according to Lemma B.8, there is a t > 0 with

|x + t(y − x) − x∗| ≤ (1/2)ρ(x).

Because of |y − x| ≤ (1/2)|x∗ − x| we even have t > 1, because for t < 1 it follows that

|x + t(y − x) − x∗| ≥ |x − x∗| − t|y − x| > |x − x∗| − |y − x| ≥ (1/2)|x − x∗| ≥ (1/2)r(x) ≥ (1/2)ρ(x),

which is in contradiction to |x + t(y − x) − x∗| ≤ (1/2)ρ(x). From Lemma B.1 it follows furthermore, with z := x + t(y − x), that

(f(x) − f(y))/|x − y| ≥ (f(x) − f(z))/|x − z|.

Because of |z − x∗| ≤ (1/2)ρ(x) we have f(z) − f∗ ≤ (1/2)(f(x) − f∗), and because of

|x − z| ≤ |x − x∗| + |z − x∗| ≤ |x − x∗| + (1/2)ρ(x) ≤ |x − x∗| + (1/2)r(x) ≤ (3/2)|x − x∗|

it finally follows that

f(x) − f(y) ≥ (|x − y|/|x − z|)·( f(x) − f∗ − (f(z) − f∗) ) ≥ (|x − y|/|x − z|)·(f(x) − f∗)/2 ≥ (1/2)·(|y − x|/(3|x∗ − x|))·(f(x) − f∗).
Thus the first assertion is proven. The second assertion follows from the first if for all a > f∗ there is a φ > 0 such that φ(x) ≥ φ for all x ∈ D with f∗ < f(x) ≤ a. We now show that such a φ exists. Without loss of generality let a ≥ ã. Let R := sup{|x − x∗| : x ∈ D, f(x) ≤ a}; according to Lemma B.4 we have R < ∞. For x ∈ {y ∈ D : ã ≤ f(y) ≤ a} we have

inf{|y − x∗| : y ∈ D, f(y) = f(x)}/sup{|y − x∗| : y ∈ D, f(y) = f(x)} ≥ inf{|y − x∗| : y ∈ D, f(y) ≥ ã}/R =: b̂.

Because of ã > f∗ and since f is continuous, we have b̂ > 0. With b := min{b̂, b̃} it now follows from the condition that

inf{|y − x∗| : y ∈ D, f(y) = f(x)} ≥ b·sup{|y − x∗| : y ∈ D, f(y) = f(x)}

for all x ∈ D with f∗ < f(x) ≤ a. Let now x ∈ D with f∗ < f(x) ≤ a. If ρ ≤ r(x), it follows in accordance with the definition that ρ(x) = ρ and thus

ρ(x)/(2|x − x∗|) ≥ ρ/(2R).

If ρ > r(x), it follows in accordance with the definition that ρ(x) = r(x) and thus

ρ(x)/(2|x − x∗|) = r(x)/(2|x − x∗|) ≥ b/2.

If we now choose φ := min{arcsin(b/2), arcsin(ρ/(2R))}, then φ(x) ≥ φ, and thus the assertion, follows for all x ∈ D with f∗ < f(x) ≤ a.

In the following it is a matter of formulating conditions that are sufficient for the conditions (14.2) and (14.3) and the conditions (15.1) and (15.2) to be satisfied.
Lemma B.10 Let D ⊂ Rd be not empty and let f : D → R be a map with the following properties:
(1) There is an x∗ ∈ D⁰ with f∗ := f(x∗) < f(x) for x ≠ x∗.
(2) f is twice continuously differentiable in a neighborhood of x∗.
(3) The Hessian matrix of f is positive definite at x∗.
(4) An r0 > 0 exists such that for every r ∈ (0, r0] there is a c > 0 with {y ∈ D : f(y) − f∗ < c} ⊂ K(x∗, r).

Then c0, a1, a2 > 0 exist with
K(x∗, a1·√c) ⊂ {y ∈ D : f(y) − f∗ < c} ⊂ K(x∗, a2·√c) for all c ∈ (0, c0].
Proof
(1) Let H be the Hessian matrix of f at x∗. According to (2), f is twice continuously differentiable on a ball K(x∗, r). From the multidimensional Taylor formula it follows then for h = (h1, ..., hd)ᵀ ∈ K(0, r) that

f(x∗ + h) = f(x∗) + (∇f(x∗))ᵀh + (1/2)·∑_{i,j=1}^d ( ∂²f(x∗ + ϑh)/∂x_i∂x_j )·h_i h_j with ϑ = ϑ(h) ∈ [0, 1].

Because of (1) and (2) we have ∇f(x∗) = 0. Therefore f(x∗ + h) = f(x∗) + (1/2)hᵀHh + R(h), with

R(h) = (1/2)·∑_{i,j=1}^d ( ∂²f(x∗ + ϑh)/∂x_i∂x_j − ∂²f(x∗)/∂x_i∂x_j )·h_i h_j.

Let

ρ(h) := (1/2)·∑_{i,j=1}^d | ∂²f(x∗ + ϑh)/∂x_i∂x_j − ∂²f(x∗)/∂x_i∂x_j |.

For i, j ∈ {1, ..., d} we have |h_i h_j| ≤ |h|² and therefore |R(h)| ≤ ρ(h)|h|². Furthermore, it follows from (2) that lim_{h→0} ρ(h) = 0. For y ∈ K(x∗, r) we have therefore

f(y) = f∗ + (1/2)(y − x∗)ᵀH(y − x∗) + R(y − x∗),

where |R(h)| ≤ |h|²ρ(h) and lim_{h→0} ρ(h) = 0. Let λ0 and λ1 be the smallest and the largest eigenvalue of H, respectively. Then
(1/2)λ0|y − x∗|² ≤ (1/2)(y − x∗)ᵀH(y − x∗) ≤ (1/2)λ1|y − x∗|²

(cf., for example, the proof of Lemma B.7, part (2)). Hence

(1/2)λ0|y − x∗|² − |R(y − x∗)| ≤ f(y) − f∗ ≤ (1/2)λ1|y − x∗|² + |R(y − x∗)|,

and because of |R(y − x∗)| ≤ |y − x∗|²ρ(y − x∗) we have

( (1/2)λ0 − ρ(y − x∗) )·|y − x∗|² ≤ f(y) − f∗ ≤ ( (1/2)λ1 + ρ(y − x∗) )·|y − x∗|².

This inequality is valid for all y ∈ K(x∗, r).

(2) Let now r1 ≤ r be chosen so that ρ(y − x∗) < (1/4)λ0 is satisfied for all y ∈ K(x∗, r1). Then the inequality

(1/4)λ0|y − x∗|² ≤ f(y) − f∗ ≤ λ1|y − x∗|²

follows for y ∈ K(x∗, r1). According to condition (4), r0 ≤ r1 and c1 > 0 exist with {y ∈ D : f(y) − f∗ < c1} ⊂ K(x∗, r0). Let now

c0 := min{ c1, (1/4)r0²λ0 } and let c ∈ (0, c0].

Because of c ≤ c1 we have {y ∈ D : f(y) − f∗ < c} ⊂ K(x∗, r0). Furthermore, because of c ≤ (1/4)r0²λ0 it follows that

K(x∗, 2√c/√λ0) ⊂ K(x∗, r0),

and of course also

K(x∗, √c/√λ1) ⊂ K(x∗, r0).

Therefore

K(x∗, 2√c/√λ0) = {y ∈ Rd : (1/4)λ0|y − x∗|² < c} ∩ K(x∗, r0)
⊃ {y ∈ D : f(y) − f∗ < c} ∩ K(x∗, r0) = {y ∈ D : f(y) − f∗ < c} ⊃ {y ∈ Rd : λ1|y − x∗|² < c} ∩ K(x∗, r0) = K(x∗, √c/√λ1).

Lemma B.11 Property (4) of Lemma B.10 follows from the properties (1), (2), (3) and one of the additional properties
(4') D is compact and f is continuous, or
(4'') D is convex and f is convex.

Proof Let property (4') be satisfied. From (2) and (3) it follows that the Hessian matrix is positive definite on a closed neighborhood K(x∗, r1); therefore, according to Lemma B.3, f is convex there. Since D is compact, also D\K(x∗, r1) is compact. Let a := min{f(y) : y ∈ D\K(x∗, r1)}. Because of (1) we have a > f∗. Because f is continuous and x∗ lies in the interior of D, there is an r0 ∈ (0, r1) with K(x∗, r0) ⊂ {x ∈ D : f(x) < a}. Let r ∈ (0, r0) be arbitrary and c := min{f(y) : |y − x∗| = r}. To prove property (4) it is necessary to prove the inclusion {y ∈ D : f(y) < c} ⊂ K(x∗, r). Let for this purpose x ∈ D\K(x∗, r) be arbitrary.

If |x − x∗| ≥ r1, then we have by definition f(x) ≥ a. But the inequality c ≤ a is satisfied because of r ≤ r0. For if c > a, a y ∈ Rd would exist with |y − x∗| = r and f(y) = c > a. For z := x∗ + (r0/r)(y − x∗) it follows then that |z − x∗| = r0. Since y lies between x∗ and z, and because of the convexity of f on K(x∗, r1), Lemma B.1 yields the inequality

(f(y) − f∗)/r ≤ (f(z) − f∗)/r0, therefore

f(z) ≥ f∗ + (r0/r)(f(y) − f∗).

Because of r0/r ≥ 1, this yields the inequality
f(z) ≥ f(y) > a, which is in contradiction to K(x∗, r0) ⊂ {x ∈ D : f(x) < a}. Therefore we have c ≤ a and thus f(x) ≥ c.

If r ≤ |x − x∗| < r1, we have, with z' := x∗ + (r/|x − x∗|)(x − x∗), at first |z' − x∗| = r and thus f(z') ≥ c. From the convexity of f on K(x∗, r1) it follows again with Lemma B.1 that

(f(z') − f∗)/r ≤ (f(x) − f∗)/|x − x∗|.

From this it follows that f(z') ≤ f(x) and from this in turn that f(x) ≥ c. In both cases we therefore have f(x) ≥ c. Thus D\K(x∗, r) ⊂ {y ∈ D : f(y) ≥ c}, and therefore property (4) is satisfied.

If D and f are convex, the inclusion {y ∈ D : f(y) < c_r} ⊂ K(x∗, r), and thus the assertion, follows analogously for all r > 0 with c_r := inf{f(y) : y ∈ D, |y − x∗| = r}.

Corollary B.2 Let the conditions of Lemma B.10 be satisfied. Then r, a, b > 0 exist with

a|x − x∗|² ≤ f(x) − f∗ ≤ b|x − x∗|² for all x ∈ K(x∗, r).

Proof The assertion follows from the first displayed inequality in part (2) of the proof of Lemma B.10.

Corollary B.3 Let the conditions of Lemma B.10 be satisfied. Then for all x0 ∈ D\D∗ constants c ∈ (0, 1] and d > 0 exist with

{y ∈ D : f(y) − f∗ ≤ ε(f(x) − f∗)} ⊂ K(x∗, d·√ε·|x − x∗|)

for all x ∈ {y ∈ D : f(y) ≤ f(x0)} and any ε ∈ (0, c].
Proof Let x0 ∈ D\D∗. According to Lemma B.10 there are a c0 > 0 and an a > 0 with

{y ∈ D : f(y) − f∗ ≤ ρ} ⊂ K(x∗, a·√ρ) for all ρ ∈ (0, c0].
If we choose c := c0/(f(x0) − f∗), then we have for arbitrary x ∈ {y ∈ D : f(y) ≤ f(x0)} and for arbitrary ε ∈ (0, c]:

{y ∈ D : f(y) − f∗ ≤ ε(f(x) − f∗)} ⊂ K(x∗, a·√(ε(f(x) − f∗))).

According to Corollary B.2, r > 0 and b > 0 exist with f(x) − f∗ ≤ b|x − x∗|² for all x ∈ K(x∗, r). If therefore x ∈ K(x∗, r), we have

K(x∗, a·√(ε(f(x) − f∗))) ⊂ K(x∗, a·√(εb)·|x − x∗|).

If |x − x∗| ≥ r, we have

f(x) − f∗ ≤ f(x0) − f∗ = ( (f(x0) − f∗)/|x − x∗|² )·|x − x∗|² ≤ ( (f(x0) − f∗)/r² )·|x − x∗|²

and thus

K(x∗, a·√(ε(f(x) − f∗))) ⊂ K(x∗, (a/r)·√(ε(f(x0) − f∗))·|x − x∗|).

Now the assertion follows with d := a·max{ √b, (1/r)·√(f(x0) − f∗) }.
Corollary B.4 Let D ⊂ Rd be convex and let f : D → R be convex and continuous. Assume there is an x∗ ∈ D⁰ with f(x) > f∗ := f(x∗) for all x ∈ D\{x∗}. Let f be twice continuously differentiable in a neighborhood of x∗ with a Hessian matrix that is positive definite at x∗. Then for all x0 ∈ D constants m > 0, M < ∞ exist with

m|x − x∗|² ≤ f(x) − f∗ ≤ M|x − x∗|² for all x ∈ {y ∈ D : f(y) ≤ f(x0)}.

Proof Since D and f are convex, it follows from Lemma B.11 that all conditions of Lemma B.10 are satisfied. According to Corollary B.2, constants m̃, r > 0 and M̃ < ∞ exist with

m̃|x − x∗|² ≤ f(x) − f∗ ≤ M̃|x − x∗|² for all x ∈ K(x∗, r).

Let now x0 ∈ D be arbitrary. According to Lemma B.4 there is an R ≥ r with {y ∈ D : f(y) ≤ f(x0)} ⊂ K(x∗, R).
Let m := (r/R)·m̃ and M := max{ (f(x0) − f∗)/r², M̃ }. Let x ∈ {y ∈ D : f(y) ≤ f(x0)} be arbitrary. If x ∈ K(x∗, r), the assertion is trivial. If r ≤ |x − x∗|, we initially have

(f(x) − f∗)/|x − x∗|² ≤ (f(x0) − f∗)/r² ≤ M, therefore f(x) − f∗ ≤ M|x − x∗|².

To prove the second inequality, let y := x∗ + (r/|x − x∗|)(x − x∗). Because of |y − x∗| = r it follows from the convexity of f, according to Lemma B.1, that

(f(x) − f∗)/|x − x∗|² = (1/|x − x∗|)·(f(x) − f∗)/|x − x∗| ≥ (1/|x − x∗|)·(f(y) − f∗)/|y − x∗|
= (|y − x∗|/|x − x∗|)·(f(y) − f∗)/|y − x∗|² ≥ (r/R)·m̃ = m.
Remark B.1 The conditions of Lemma B.10 imply the conditions (14.2), (14.3), (15.1), and (15.2).

Proof Because of

{y ∈ D : f(y) − f∗ ≤ c/2} ⊂ {y ∈ D : f(y) − f∗ < c} ⊂ {y ∈ D : f(y) − f∗ ≤ c}

(c > 0 arbitrary), the conditions (14.2) and (14.3) follow directly from Lemma B.10. Let c0, a1, a2 > 0 be chosen according to Lemma B.10. Then we have (cf. the proof of Corollary B.2) for all x ∈ {y ∈ D : f(y) − f∗ ≤ c0/2} the inequality

(1/(2a2²))·|x − x∗|² ≤ f(x) − f∗ ≤ (1/a1²)·|x − x∗|².

If we choose ã := f∗ + c0/2, then it follows for all x ∈ D with f(x) ≤ ã that

inf{|y − x∗| : y ∈ D, f(y) = f(x)} ≥ a1·√(f(x) − f∗) ≥ (a1/a2)·(1/√2)·sup{|y − x∗| : y ∈ D, f(y) = f(x)}.

Condition (15.2) follows now with b̃ := (a1/a2)·(1/√2).
According to the conditions (2), (3), and (4) of Lemma B.10 there are r, c > 0 such that f is twice continuously differentiable on K(x∗, r), the Hessian matrix of f is positive definite on K(x∗, r), and the set {y ∈ D : f(y) − f∗ ≤ c} is contained in K(x∗, r). With Lemma B.3 this implies condition (15.1).

Lemma B.12 Let the conditions (1), (2), and (3) of Lemma B.10 be satisfied, and let H(x) be the Hessian matrix of f at x. Then

lim_{x→x∗} ( H(x)⁻¹∇f(x) − (x − x∗) )/|x − x∗| = 0.

Proof Because of ∇f(x∗) = 0, the Taylor expansion of f at x∗ yields the representation

f(x) = f(x∗) + (1/2)(x − x∗)ᵀH(x∗)(x − x∗) + o(|x − x∗|²).

Hence

∇f(x) = H(x∗)(x − x∗) + o(|x − x∗|),

therefore

lim_{x→x∗} ( ∇f(x) − H(x∗)(x − x∗) )/|x − x∗| = 0,

which, because of the continuity of H(x)⁻¹ in a neighborhood of x∗, yields

lim_{x→x∗} ( H(x)⁻¹∇f(x) − H(x)⁻¹H(x∗)(x − x∗) )/|x − x∗| = 0,

and thus

lim_{x→x∗} ( H(x)⁻¹∇f(x) − (x − x∗) )/|x − x∗| = 0.

Lemma B.13 Let the conditions of Lemma B.10 be satisfied. Then constants a > f∗ and b1, b2 > 0 exist with

b1 ≤ |∇f(x)|/|x − x∗| ≤ b2 for all x ∈ {y ∈ D : f(y) ≤ a}\{x∗}.

If D = Rd and if f is also continuously differentiable and convex, then for all a > f∗ there are constants c1, c2 > 0 with

c1 ≤ |∇f(x)|/|x − x∗| ≤ c2 for all x ∈ {y ∈ D : f(y) ≤ a}\{x∗}.
Proof From ∇f(x) = H(x∗)(x − x∗) + o(|x − x∗|) (cf. the proof of Lemma B.12) and because of

| |∇f(x)|/|x − x∗| − |H(x∗)·(x − x∗)/|x − x∗|| | ≤ | ∇f(x)/|x − x∗| − H(x∗)·(x − x∗)/|x − x∗| |,

for each ε > 0 a δ > 0 exists with
|H(x∗)·(x − x∗)/|x − x∗|| − ε ≤ |∇f(x)|/|x − x∗| ≤ |H(x∗)·(x − x∗)/|x − x∗|| + ε

if 0 < |x − x∗| ≤ δ. Let

M := max{|H(x∗)y| : y ∈ Rd, |y| = 1} and m := min{|H(x∗)y| : y ∈ Rd, |y| = 1}.

From the compactness of the unit sphere in Rd it follows that M < ∞. From the invertibility of H(x∗) it follows that m > 0. Thus for ε = m/2 an r > 0 exists with

m/2 ≤ |∇f(x)|/|x − x∗| ≤ M + m/2 for all x ∈ K(x∗, r)\{x∗}.

According to Lemma B.10, it can be assumed without loss of generality that there is a c > 0 with {x ∈ D : f(x) − f∗ < c} ⊂ K(x∗, r). This proves the first assertion.

Let us now assume that D = Rd and that f is convex and twice continuously differentiable, and let a > f∗ be arbitrary. According to Lemma B.4, there is an R > 0 with {y ∈ D : f(y) ≤ a} ⊂ K(x∗, R). If R < r, the second assertion follows with c1 = m/2 and c2 = M + m/2. Let therefore r ≤ R, and let

m1 := inf{|∇f(x)| : x ∈ D, f(x) ≤ a, r ≤ |x − x∗| ≤ R} and m2 := sup{|∇f(x)| : x ∈ D, f(x) ≤ a, r ≤ |x − x∗| ≤ R}.

Because f is convex and continuously differentiable, we have m1 > 0; furthermore, m2 < ∞. Then for x ∈ Rd with r ≤ |x − x∗| ≤ R:

m1/R ≤ |∇f(x)|/|x − x∗| ≤ m2/r.

Now the second assertion follows with c1 := min{m/2, m1/R} and c2 := max{M + m/2, m2/r}.

Lemma B.14 Let the conditions of Lemma B.10 be satisfied. Let (A_x : x ∈ D) be a family of positive definite and symmetric matrices, and let the map x ↦ A_x be continuous. Let C_x := (∇f(x)ᵀA_x∇f(x))·A_x. Then constants a > f∗ and b1, b2 > 0 exist with

b1 ≤ λ/|x − x∗|² ≤ b2

for all x ∈ {y ∈ D : f(y) ≤ a}\{x∗} and any eigenvalue λ of C_x. If D = Rd and if f is also continuously differentiable and convex, then for all a > f∗ there are constants c1, c2 > 0 with
c1 ≤ λ/|x − x∗|² ≤ c2

for all x ∈ {y ∈ D : f(y) ≤ a}\{x∗} and any eigenvalue λ of C_x.

Remark B.2 If we set A_x = I (I = unit matrix), then C_x has only the d-fold eigenvalue λ = |∇f(x)|². Thus Lemma B.13 is a special case of Lemma B.14.

Proof According to Lemma B.13, constants a > f∗ and a1, a2 > 0 exist with

a1 ≤ |∇f(x)|/|x − x∗| ≤ a2

for all x ∈ {y ∈ D : f(y) ≤ a}\{x∗}. Let

λ1 := max{λ : λ eigenvalue of A_x, f(x) ≤ a} and λ0 := min{λ : λ eigenvalue of A_x, f(x) ≤ a}.

Let λ be an arbitrary eigenvalue of A_x with associated normalized eigenvector z. Since λ is positive, it follows that

λ = |λz| = |A_x z| ≤ sup{|A_x z| : |z| = 1, f(x) ≤ a}.

Because the set {(z, x) : |z| = 1, f(x) ≤ a} is compact (cf. Lemma B.2 and Lemma B.4) and the map (z, x) ↦ |A_x z| is continuous, it follows that λ1 < ∞. If it is taken into consideration that the inequality |A_x z| > 0 is satisfied for all z, x with |z| = 1 and f(x) ≤ a, it follows analogously that λ0 > 0. Since λ0|y|² ≤ yᵀA_x y ≤ λ1|y|² for all y ∈ Rd and any x ∈ {y ∈ D : f(y) ≤ a} (cf. the proof of Lemma B.7, part (2)), we have

λ0 ≤ yᵀA_x y/|y|² ≤ λ1 for y ≠ 0.

From

∇f(x)ᵀA_x∇f(x)/|x − x∗|² = ( |∇f(x)|²/|x − x∗|² )·( ∇f(x)ᵀA_x∇f(x)/|∇f(x)|² )

we have then for all x ∈ {y ∈ D : f(y) ≤ a}\{x∗} the inequality

λ0·a1² ≤ ∇f(x)ᵀA_x∇f(x)/|x − x∗|² ≤ λ1·a2².
Since each eigenvalue λ_x of C_x has the form λ_x = [∇f(x)ᵀA_x∇f(x)]·α_x, where α_x is an eigenvalue of A_x, the assertion follows with b1 = (λ0·a1)² and b2 = (λ1·a2)².

The remainder of this section contains primarily statements about the convergence behavior of special series.

Remark B.3 Let (an : n ∈ N), (bn : n ∈ N) be sequences of nonnegative real numbers, and let a := lim inf_{n→∞} an > 0 and b := lim sup_{n→∞} bn > 0. Then

lim inf_{n→∞} an/bn ≥ a/b and lim sup_{n→∞} bn/an ≤ b/a.

Proof For arbitrary ε > 0 define c := εb²/(a + b). Then

(a − c)/(b + c) ≥ a/b − ε,

because

a/b − (a − c)/(b + c) = c(a + b)/(b(b + c)) ≤ c(a + b)/b² = ε.

The definition of a and b implies that finally (i.e., for all but finitely many n ∈ N)

an ≥ a − c and bn ≤ b + c.

From this it follows finally for all n that

an/bn ≥ (a − c)/(b + c) ≥ a/b − ε,

and since ε > 0 was arbitrary, lim inf_{n→∞} an/bn ≥ a/b. The second assertion follows analogously.
Remark B.4 For arbitrary q ∈ (−1, 1) and i ∈ N we have

∑_{n=0}^∞ (n + 1)···(n + i)·qⁿ = i!/(1 − q)^{i+1}.

Proof Differentiating the geometric series ∑_{n=0}^∞ qⁿ = 1/(1 − q) i times term by term yields

∑_{n=0}^∞ (n + i)···(n + 1)·qⁿ = i!/(1 − q)^{i+1}.

This implies the assertion.
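A numerical spot check of Remark B.4 (a sketch; truncating the series at 2000 terms, which is ample for q = 0.7):

import math

q, i = 0.7, 3
total = sum(math.prod(range(n + 1, n + i + 1)) * q**n for n in range(2000))
assert abs(total - math.factorial(i) / (1 - q)**(i + 1)) < 1e-6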
Remark B.5 For all x, y ≥ 0 and any n ∈ N we have

(x + y)ⁿ ≤ 2^{n−1}·(xⁿ + yⁿ).
Proof Complete induction on n. For n = 1 the assertion is trivial. n → n + 1: for a, b ≥ 0,

(a + b)^{n+1} = (a + b)(a + b)ⁿ ≤ (a + b)·2^{n−1}(aⁿ + bⁿ) = 2^{n−1}(a^{n+1} + b^{n+1} + abⁿ + baⁿ) ≤ 2ⁿ(a^{n+1} + b^{n+1}),

since abⁿ + baⁿ ≤ a^{n+1} + b^{n+1}. This last inequality is proved as follows: without loss of generality, let a ≤ b. For x ∈ [a, b] let f(x) := xbⁿ + bxⁿ − x^{n+1}. Then

f'(x) = bⁿ + nbx^{n−1} − (n + 1)xⁿ = bⁿ − xⁿ + nx^{n−1}(b − x).

For x ∈ [a, b] we therefore have f'(x) ≥ 0. Thus f is nondecreasing on the interval [a, b]. Therefore we have f(a) ≤ f(b), i.e., abⁿ + baⁿ − a^{n+1} ≤ b^{n+1}, which is the claimed inequality abⁿ + baⁿ ≤ a^{n+1} + b^{n+1}.
Remark B.6 For all n ∈ N we have n! ≥ (n/e)ⁿ.

Proof Complete induction on n. For n = 1 the assertion is trivial. n → n + 1: it is generally known that the sequence (1 + 1/i)^i is strictly increasing and converges towards e. Therefore

(1 + 1/i)^i ≤ e for all i ∈ N.

Hence

(n + 1)! = n!(n + 1) ≥ (n/e)ⁿ·(n + 1) = ((n + 1)/e)^{n+1}·( e·nⁿ/(n + 1)ⁿ ) = ((n + 1)/e)^{n+1}·( e/(1 + 1/n)ⁿ ) ≥ ((n + 1)/e)^{n+1}.
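Both elementary inequalities are cheap to verify numerically (a sketch):

import math, random

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    n = random.randint(1, 12)
    assert (x + y)**n <= 2**(n - 1) * (x**n + y**n) * (1 + 1e-12)   # Remark B.5
for n in range(1, 20):
    assert math.factorial(n) >= (n / math.e)**n                     # Remark B.6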
Lemma B.15 Let 0 < r < p < 1 and c ∈ R, and let

an := ∑_{k=0}^{[rn+c]} (n choose k)·p^k·(1 − p)^{n−k}.

Then:
(1) ∑_{n=1}^∞ n·an < ∞
(2) ∑_{n=1}^∞ an < ∞
(3) lim_{n→∞} an = 0
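Before the proof, a quick numerical look at the an (a sketch, assuming SciPy is available; the parameter values are arbitrary choices): the partial sums of n·an stabilize rapidly, in line with assertion (1).

import numpy as np
from scipy.stats import binom

r, p, c = 0.3, 0.5, 2.0
n = np.arange(1, 401)
a = binom.cdf(np.floor(r * n + c), n, p)   # a_n = P(X_n <= [rn + c])
print(np.cumsum(n * a)[[49, 199, 399]])    # partial sums of n*a_n, nearly constant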
Proof Let (Xn : n ∈ N) be a sequence of independent random variables on a probability space (Ω, A, P), where Xn is binomially distributed with the parameters n and p. Then

an = P(Xn ≤ [rn + c]) ≤ P(Xn ≤ rn + c) = P( (Xn − np)/√(np(1 − p)) ≤ ((r − p)n + c)/√(np(1 − p)) ).

Since p − r > 0, it follows finally for all n ∈ N (i.e., for all sufficiently large n) that

an ≤ P( |Xn − np|/√(np(1 − p)) ≥ ((p − r)n − c)/√(np(1 − p)) ).

Furthermore, finally for all n ∈ N we have

((p − r)n − c)/√(np(1 − p)) ≥ n^{1/8},

and therefore

an ≤ P( |Xn − np|/√(np(1 − p)) ≥ n^{1/8} ) finally for all n ∈ N.

By the normal approximation of the binomial distribution we have, finally for all n,

P( |Xn − np|/√(np(1 − p)) ≥ n^{1/8} ) ≤ (2/√(2π))·n^{−1/8}·exp(−(1/2)n^{1/4}).

From

∑_{m=1}^∞ m·m^{−1/8}·exp(−(1/2)m^{1/4}) = ∑_{n=1}^∞ ∑_{m=n⁴}^{(n+1)⁴−1} m^{7/8}·exp(−(1/2)m^{1/4}) ≤ ∑_{n=1}^∞ ((n + 1)⁴ − n⁴)·(n + 1)^{7/2}·exp(−(1/2)n) < ∞

it follows that ∑_{n=1}^∞ n·an < ∞. This proves (1); the assertions (2) and (3) are immediate consequences.

Lemma B.16
(i) For all a > 0 we have

lim_{n→∞} n^{1/a} ∫₀¹ (1 − x^a)ⁿ dx = Γ(1 + 1/a)

(Γ = Euler gamma function).
(ii) For all b ∈ (0, 1) and c > 0 we have

∑_{n=1}^∞ (1 − cn^{−b})ⁿ < ∞.
Proof
(i) Partial integration yields

∫₀¹ (1 − x^a)ⁿ dx = [x(1 − x^a)ⁿ]₀¹ + ∫₀¹ x·n(1 − x^a)^{n−1}·a·x^{a−1} dx = an ∫₀¹ x^a(1 − x^a)^{n−1} dx.

Using partial integration again, this yields

∫₀¹ (1 − x^a)ⁿ dx = ( a²n(n − 1)/(a + 1) ) ∫₀¹ x^{2a}(1 − x^a)^{n−2} dx.

When this is continued, we finally obtain

∫₀¹ (1 − x^a)ⁿ dx = ( aⁿ·n!/((a + 1)(2a + 1)···((n − 1)a + 1)) ) ∫₀¹ x^{na} dx
= aⁿ·n!/((a + 1)(2a + 1)···(na + 1))
= n!/((1/a + 1)(1/a + 2)···(1/a + n)).

We have
lim_{n→∞} n^{1/a}·n!/( (1/a)(1/a + 1)(1/a + 2)···(1/a + n) ) = Γ(1/a).

From this it follows that

lim_{n→∞} n^{1/a} ∫₀¹ (1 − x^a)ⁿ dx = (1/a)·Γ(1/a) = Γ(1 + 1/a).
(ii) Because of b > 0, we have lim_{n→∞} n^b = ∞ and thus

lim_{n→∞} (1 − cn^{−b})^{n^b} = e^{−c}.

For q ∈ [e^{−c}, 1) there exists therefore an N ∈ N with

(1 − cn^{−b})^{n^b} ≤ q for all n ≥ N.

For n ≥ N it follows therefore that

(1 − cn^{−b})ⁿ = ( (1 − cn^{−b})^{n^b} )^{n^{1−b}} ≤ q^{n^{1−b}}.

According to the condition, we have 1 − b > 0. Let s ∈ N with 0 < 1/s ≤ 1 − b. Then

∑_{n=1}^∞ q^{n^{1−b}} ≤ ∑_{n=1}^∞ q^{n^{1/s}} = ∑_{j=1}^∞ ∑_{k=j^s}^{(j+1)^s−1} q^{k^{1/s}} ≤ ∑_{j=1}^∞ ((j + 1)^s − j^s)·q^{(j^s)^{1/s}} = ∑_{j=1}^∞ ((j + 1)^s − j^s)·q^j < ∞.
This implies directly the assertion.
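Part (i) of Lemma B.16 can be checked numerically (a sketch, assuming SciPy's quadrature routine is available):

import math
from scipy.integrate import quad

a = 2.0
for n in (10, 100, 1000):
    integral, _ = quad(lambda x: (1.0 - x**a)**n, 0.0, 1.0)
    print(n**(1.0 / a) * integral)     # approaches Gamma(1 + 1/a)
print(math.gamma(1.0 + 1.0 / a))       # = sqrt(pi)/2 for a = 2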
Appendix C
Probabilistic Tools
Lemma C.1 Consider a sequence (Xn : n ∈ N) of random variables with

∑_{n=1}^∞ E(Xn²)/n² < ∞.

If E(X_{n+1}|Xn, ..., X1) ≥ p > 0 P-a.s. for all n ∈ N, then

lim inf_{n→∞} (1/n) ∑_{i=1}^n Xi ≥ p P-a.s.

If E(X_{n+1}|Xn, ..., X1) ≤ p P-a.s. for all n ∈ N, then

lim sup_{n→∞} (1/n) ∑_{i=1}^n Xi ≤ p P-a.s.

Proof Because of ∑_{n=1}^∞ E(Xn²)/n² < ∞ we have

lim_{n→∞} (1/n) ∑_{i=1}^n ( Xi − E(Xi|X_{i−1}, ..., X1) ) = 0 P-a.s.

In the first case, because of Xi − E(Xi|X_{i−1}, ..., X1) ≤ Xi − p P-a.s., we obtain the inequality

lim inf_{n→∞} (1/n) ∑_{i=1}^n (Xi − p) ≥ 0 P-a.s.
In the second case we get

lim sup_{n→∞} (1/n) ∑_{i=1}^n (Xi − p) ≤ 0 P-a.s.

In both cases, this yields the assertion.
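A Monte Carlo illustration of Lemma C.1 (a sketch; we take the simplest case of i.i.d. bounded variables, for which ∑ E(Xn²)/n² < ∞ holds automatically and E(X_{n+1}|Xn, ..., X1) = p):

import numpy as np

p, n = 0.3, 200_000
rng = np.random.default_rng(4)
X = (rng.uniform(size=n) < p).astype(float)      # i.i.d. Bernoulli(p) sequence
means = np.cumsum(X) / np.arange(1, n + 1)
print(means[-1])   # close to p: both bounds of Lemma C.1 force (1/n)*sum -> p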
Definition C.1 Let X denote a random variable on a probability space (Ω, A, P). For a ∈ R and A ∈ A we define:

X ≥ a P-a.s. on A if and only if (X − a)·1_A ≥ 0 P-a.s.
Lemma C.2 Let X be a random variable on a probability space (Ω, A, P), and let C denote a sub-σ-algebra of A. Moreover, let C ∈ C, and consider an arbitrary a ∈ R and B ∈ A. Then:
(1) E(X|C) ≥ a P-a.s. on C if and only if ∫_{A∩C} X dP ≥ a·P(A ∩ C) for all A ∈ C.
(2) P(B|C) ≥ a P-a.s. on C if and only if P(A ∩ B ∩ C) ≥ a·P(A ∩ C) for all A ∈ C.

Proof Because of P(B|C) = E(1_B|C), the second assertion follows from the first. Moreover, the first one is essentially based on the fact that for an arbitrary B-measurable random variable Z (B ⊂ A a sub-σ-algebra) the following equivalence holds:

Z ≥ 0 P-a.s. if and only if ∫_B Z dP ≥ 0 for all B ∈ B.

The implication from the left to the right is obviously true. The implication from the right to the left is obtained by means of a special selection of B, namely B = {Z < 0}: indeed, if P(Z < 0) > 0, then we would have ∫_{{Z<0}} Z dP < 0, in contradiction to the right-hand side.

Lemma C.4 Let (Zn : n ∈ N) be a sequence of random variables with values in {0, 1} and (Yn : n ∈ N0) a sequence of random variables, both adapted to a filtration (An : n ∈ N0) of A. Suppose there are a p > 0 and an a > 0 such that

P(Z_{n+1} = 1|An) ≥ p P-a.s. on {Yn ≤ a} for all n ∈ N0.
Then for all n ∈ N, K ∈ {0, ..., n} and each stopping time T relative to (An : n ∈ N0) we have:

(1) P( {∑_{i=T+1}^{T+n} Zi ≤ K} ∩ ⋂_{i=T+1}^{T+n} {Yi ≤ a} | A_T ) ≤ ∑_{j=0}^K (n choose j)·p^j(1 − p)^{n−j} P-a.s. on {Y_T ≤ a};

(2) P( {∑_{i=T+1}^{T+n} Zi ≤ K} ∩ ⋂_{i=T}^{T+n} {Yi ≤ a} ) ≤ P(Y_T ≤ a)·∑_{j=0}^K (n choose j)·p^j(1 − p)^{n−j}.
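Before turning to the proof, a simulation sketch of assertion (2) in its simplest configuration (our own choices: T = 0, Yi ≡ 0 ≤ a, and Zi i.i.d. Bernoulli with success probability 0.4 ≥ p = 0.3; SciPy is assumed available):

import numpy as np
from scipy.stats import binom

p, p_true, n, K, trials = 0.3, 0.4, 20, 4, 100_000
rng = np.random.default_rng(5)
counts = rng.binomial(n, p_true, size=trials)    # realizations of Z_1 + ... + Z_n
print((counts <= K).mean())                      # empirical P(sum <= K)
print(binom.cdf(K, n, p))                        # dominating binomial bound of (2)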
Proof
(1) We proceed by complete induction on n. Let A ∈ A_T := {C ∈ A : C ∩ {T = k} ∈ A_k for all k ∈ N0}. From

P(Z_{n+1} = 1|An) ≥ p P-a.s. on {Yn ≤ a} for all n ∈ N0
we get (cf. Lemma C.3)

P(Z_{T+1} = 1|A_T) ≥ p P-a.s. on {Y_T ≤ a}.

Assume first n = 1 and K ∈ {0, 1}. For K = 1 the assertion is trivial; thus, put K = 0. Then

P( {∑_{i=T+1}^{T+1} Zi ≤ 0} ∩ {Y_{T+1} ≤ a} | A_T ) ≤ P(Z_{T+1} = 0|A_T) ≤ 1 − p P-a.s. on {Y_T ≤ a}.

Thus, the assertion is shown for n = 1. Assume now that the assertion holds for a fixed n ∈ N, for all K ∈ {0, ..., n} and for all stopping times T relative to (An : n ∈ N0). We have to show now that the assertion also holds for n + 1, i.e., for all stopping times T relative to (An : n ∈ N0), for each K ∈ {0, ..., n + 1} and for arbitrary A ∈ A_T it must be shown that the following inequality holds:

P( A ∩ {Y_T ≤ a} ∩ {∑_{i=T+1}^{T+1+n} Zi ≤ K} ∩ ⋂_{i=T+1}^{T+1+n} {Yi ≤ a} ) ≤ P(A ∩ {Y_T ≤ a})·∑_{j=0}^K (n+1 choose j)·p^j(1 − p)^{n+1−j}.
Without loss of generality we can assume that K ∈ {0, ..., n} (in case K = n + 1 the inequality obviously holds true). Applying the induction hypothesis to the stopping time T + 1, we get

P( A ∩ {Y_T ≤ a} ∩ {∑_{i=T+1}^{T+1+n} Zi ≤ K} ∩ ⋂_{i=T+1}^{T+1+n} {Yi ≤ a} )
= P( A ∩ {Y_T ≤ a} ∩ {Y_{T+1} ≤ a} ∩ {Z_{T+1} = 1} ∩ {∑_{i=T+2}^{T+1+n} Zi ≤ K − 1} ∩ ⋂_{i=T+2}^{T+1+n} {Yi ≤ a} )
+ P( A ∩ {Y_T ≤ a} ∩ {Y_{T+1} ≤ a} ∩ {Z_{T+1} = 0} ∩ {∑_{i=T+2}^{T+1+n} Zi ≤ K} ∩ ⋂_{i=T+2}^{T+1+n} {Yi ≤ a} ).
For the first term we have

P( A ∩ {Y_T ≤ a} ∩ {Y_{T+1} ≤ a} ∩ {Z_{T+1} = 1} ∩ {∑_{i=T+2}^{T+1+n} Zi ≤ K − 1} ∩ ⋂_{i=T+2}^{T+1+n} {Yi ≤ a} )
= ∫_{A ∩ {Y_T ≤ a} ∩ {Y_{T+1} ≤ a} ∩ {Z_{T+1} = 1}} P( {∑_{i=T+2}^{T+1+n} Zi ≤ K − 1} ∩ ⋂_{i=T+2}^{T+1+n} {Yi ≤ a} | A_{T+1} ) dP
≤ ( ∑_{j=0}^{K−1} (n choose j)·p^j(1 − p)^{n−j} )·P(A ∩ {Y_T ≤ a} ∩ {Z_{T+1} = 1}).

Analogously, for the second term we find the inequality

P( A ∩ {Y_T ≤ a} ∩ {Y_{T+1} ≤ a} ∩ {Z_{T+1} = 0} ∩ {∑_{i=T+2}^{T+1+n} Zi ≤ K} ∩ ⋂_{i=T+2}^{T+1+n} {Yi ≤ a} )
≤ ( ∑_{j=0}^{K} (n choose j)·p^j(1 − p)^{n−j} )·P(A ∩ {Y_T ≤ a} ∩ {Z_{T+1} = 0})
= ( P(A ∩ {Y_T ≤ a}) − P(A ∩ {Y_T ≤ a} ∩ {Z_{T+1} = 1}) )·∑_{j=0}^K (n choose j)·p^j(1 − p)^{n−j}.
Hence,

P( A ∩ {Y_T ≤ a} ∩ {∑_{i=T+1}^{T+1+n} Zi ≤ K} ∩ ⋂_{i=T+1}^{T+1+n} {Yi ≤ a} )
≤ P(A ∩ {Y_T ≤ a})·∑_{j=0}^K (n choose j)·p^j(1 − p)^{n−j} − P(A ∩ {Y_T ≤ a} ∩ {Z_{T+1} = 1})·(n choose K)·p^K(1 − p)^{n−K}
≤ P(A ∩ {Y_T ≤ a})·( ∑_{j=0}^K (n choose j)·p^j(1 − p)^{n−j} − p·(n choose K)·p^K(1 − p)^{n−K} )
= P(A ∩ {Y_T ≤ a})·( (1 − p)·∑_{j=0}^K (n choose j)·p^j(1 − p)^{n−j} + p·∑_{j=0}^{K−1} (n choose j)·p^j(1 − p)^{n−j} )
= P(A ∩ {Y_T ≤ a})·( ∑_{j=0}^K (n choose j)·p^j(1 − p)^{n−j+1} + ∑_{j=1}^K (n choose j−1)·p^j(1 − p)^{n−j+1} )
= P(A ∩ {Y_T ≤ a})·∑_{j=0}^K (n+1 choose j)·p^j(1 − p)^{n+1−j},

where the second inequality follows from

P(A ∩ {Y_T ≤ a} ∩ {Z_{T+1} = 1}) = ∫_{A ∩ {Y_T ≤ a}} P(Z_{T+1} = 1|A_T) dP ≥ p·P(A ∩ {Y_T ≤ a}).

Thus, the first assertion is shown.
(2) The second assertion is a direct consequence of the first one.

Lemma C.5 Let (Mn : n ∈ N) be a sequence of random variables with values in N. If P(Mn ≥ k) ≤ a_k for all k, n ∈ N and ∑_{k=1}^∞ a_k < ∞, then lim_{n→∞} Mn/n = 0 P-a.s.
Proof For arbitrary integers i, n ∈ N we have
P( sup_{m≥n} Mm/m > 1/i ) ≤ P( sup_{m≥n} Mm/m ≥ 1/i ) ≤ ∑_{m≥n} P(Mm ≥ m/i) ≤ ∑_{m≥n} P(Mm ≥ [m/i])
≤ ∑_{m≥[n/i]·i} P(Mm ≥ [m/i]) = ∑_{j=[n/i]}^∞ ∑_{m=j·i}^{(j+1)i−1} P(Mm ≥ j) ≤ ∑_{j=[n/i]}^∞ i·a_j = i·∑_{j=[n/i]}^∞ a_j.

If n → ∞, then [n/i] → ∞ and therefore

lim_{n→∞} ∑_{j=[n/i]}^∞ a_j = 0, since ∑_{k=1}^∞ a_k < ∞.

This implies P( sup_{m≥n} Mm/m > 1/i ) → 0 (n → ∞) for all i ∈ N and thus the assertion.
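A simulation sketch of Lemma C.5 (our own choices; geometric variables satisfy P(Mn ≥ k) = q^{k−1} with q = 1/2, which is summable in k):

import numpy as np

rng = np.random.default_rng(6)
n = np.arange(1, 100_001)
M = rng.geometric(0.5, size=n.size)   # P(M_n >= k) = 0.5**(k-1), summable over k
print((M / n)[-5:])                   # ratios M_n/n tend to 0, as the lemma asserts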
Lemma C.6 Let (Yn : n ∈ N) be a sequence of random variables, B, C ⊂ R measurable sets and c ∈ R, p1, p2 > 0 such that

lim inf_{n→∞} (1/n) ∑_{i=1}^n 1_{{Yi ∈ B}} ≥ p1 P-a.s. and lim inf_{n→∞} (1/n) ∑_{i=1}^n 1_{{Yi ∈ C}} ≥ p2 P-a.s.

If a ∈ R and a1, a2 > 0 are such that p1 − a1 > 0, p2 − a2 > 0 and (p1 − a1)(p2 − a2) ≥ p1p2 − a, then P-a.s. finally for all n ∈ N

∑_{i=1}^n 1_{{Yi ∈ B}} ≥ (p1 − a1)·n and ∑_{i=1}^n 1_{{Yi ∈ C}} ≥ (p2 − a2)·n.